
Plasma Effect for TMS9918

A project log for Game Boards for RC2014

Run classic video games on your RC2014

J.B. Langston (jb-langston), 09/07/2020 at 15:39

Plascii Petsma by Cruzer/Camelot has one of the nicest-looking plasma effects I've seen for the C64. Since he included the source code, I was able to port it to the Z80 and TMS9918.

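On top of the features in the original C64 version, I have added interactive features such as a hold mode, an animation toggle, and randomly generated effects, all controlled through simple keyboard commands.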

In this article, I will explain how the code works. Before getting into the specific implementation details, it helps to understand how plasma effects work in general. Rather than write another explanation when others have already done it well, I'll refer you to this one, which covers the basic concepts using C code.

The implementation used here defines each plasma effect using a set of constant parameters:

SineAddsX:    defb    $fa,$05,$03,$fa,$07,$04,$fe,$fe
SineAddsY:    defb    $fe,$01,$fe,$02,$03,$ff,$02,$02
SineStartsY:  defb    $5e,$e8,$eb,$32,$69,$4f,$0a,$41
SineSpeeds:   defb    $fe,$fc
PlasmaFreqs:  defb    $06,$07
CycleSpeed:   defb    $ff
ColorPalette: defw    Pal01

Each of these parameters governs a specific aspect of the plasma calculation, which will be described in detail below. This scheme allows a huge variety of unique plasma effects to be specified in a compact format, and random effects can be easily produced by generating parameters in the appropriate ranges.

When implementing a plasma effect on an 8-bit computer, there are several challenges to overcome.  First, the processor doesn't support floating point math or provide a sin function. This can be overcome using a sine table, which contains pre-computed sine values converted to 8-bit integers.  

Normally, the amplitude of a sine wave ranges from -1 to 1.  When converted to 8-bit integers, the amplitude ranges from -128 to 127.  For ease of lookup, the sine table will contain 256 bytes representing a full period of the sine wave.  

A single period of the sine wave first rises from 0 to 1, then falls from 1 to 0, falls further from 0 to -1, then rises back from -1 to 0.  Each quarter of the period is the same curve, flipped vertically and/or horizontally.  To save space, only the first quarter needs to be pre-computed and these values can be mirrored at runtime.

I used a Python script to generate the first quarter of the sine table.  The resulting data looks like this:

SineSrc:
        defb    $81,$84,$87,$8a,$8d,$90,$93,$96
        defb    $99,$9c,$9f,$a2,$a5,$a8,$ab,$ae
        defb    $b1,$b4,$b7,$ba,$bc,$bf,$c2,$c4
        defb    $c7,$ca,$cc,$cf,$d1,$d3,$d6,$d8
        defb    $da,$dc,$df,$e1,$e3,$e5,$e7,$e8
        defb    $ea,$ec,$ed,$ef,$f1,$f2,$f3,$f5
        defb    $f6,$f7,$f8,$f9,$fa,$fb,$fc,$fc
        defb    $fd,$fe,$fe,$ff,$ff,$ff,$ff,$ff

The MakeSineTable routine builds the sine table for a complete period from the precalculated quarter period.  The first 64 values are copied verbatim from the precomputed values. The next 64 values are flipped horizontally by copying them in reverse order. The last 128 values are flipped vertically by complementing them. The vertically flipped values are written twice, first in forward order, and then in reverse order to flip them horizontally and complete the period. The resulting lookup table is stored on a 256-byte boundary so that a sine value can be looked up by loading a single register with the input value.
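The mirroring steps above can be modelled in a few lines of Python. This is only a sketch of the idea, not the Z80 routine itself: the names `SINE_SRC`, `make_sine_table`, and `SINE` are made up for illustration, and the quarter-period data is copied from the SineSrc listing.

```python
# First quarter of the sine wave, copied from the SineSrc listing.
SINE_SRC = bytes([
    0x81, 0x84, 0x87, 0x8a, 0x8d, 0x90, 0x93, 0x96,
    0x99, 0x9c, 0x9f, 0xa2, 0xa5, 0xa8, 0xab, 0xae,
    0xb1, 0xb4, 0xb7, 0xba, 0xbc, 0xbf, 0xc2, 0xc4,
    0xc7, 0xca, 0xcc, 0xcf, 0xd1, 0xd3, 0xd6, 0xd8,
    0xda, 0xdc, 0xdf, 0xe1, 0xe3, 0xe5, 0xe7, 0xe8,
    0xea, 0xec, 0xed, 0xef, 0xf1, 0xf2, 0xf3, 0xf5,
    0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfc,
    0xfd, 0xfe, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff,
])


def make_sine_table(quarter=SINE_SRC):
    """Expand a 64-entry quarter period into a full 256-entry period."""
    assert len(quarter) == 64
    table = bytearray(256)
    for i, value in enumerate(quarter):
        flipped = value ^ 0xFF        # vertical flip: one's complement
        table[i] = value              # 1st quarter: verbatim copy
        table[127 - i] = value        # 2nd quarter: reverse order
        table[128 + i] = flipped      # 3rd quarter: complemented, forward
        table[255 - i] = flipped      # 4th quarter: complemented, reversed
    return bytes(table)


SINE = make_sine_table()
```

Because the table has exactly 256 entries, any 8-bit value is a valid index, which is the Python equivalent of the page-aligned lookup: `SINE[angle & 0xFF]`.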

The next challenge to overcome is representing a smooth gradient of color when the video chip only has 15 colors to work with.  To do this, I use dithering to combine different ratios of the 15 colors available.  This is the full set of 64 gradient tiles used in Produkthandler Kom Her on the C64:

The TMS9918 tile mode defines 256 tile patterns, each of which is associated with a specific foreground and background color. For palettes of 8 colors each, I can use 32 tiles per color, so I only use every other tile from this set. The LoadPatternTable routine loads 8 copies of the 32 tiles into the TMS9918 pattern table.
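As a rough Python model of that layout: the sketch below builds 64 stand-in gradient tiles with an 8x8 Bayer ordered-dither matrix, keeps every other one, and repeats the resulting 32 tiles 8 times. The Bayer tiles are an assumption made purely for illustration; the real program uses the tile artwork from Produkthandler Kom Her, and the function names here are hypothetical.

```python
def bayer_matrix(size=8):
    """Build a size x size Bayer threshold matrix (size a power of two)."""
    m = [[0]]
    while len(m) < size:
        n = len(m)
        bigger = [[0] * (2 * n) for _ in range(2 * n)]
        for y in range(n):
            for x in range(n):
                v = 4 * m[y][x]
                bigger[y][x] = v
                bigger[y][x + n] = v + 2
                bigger[y + n][x] = v + 3
                bigger[y + n][x + n] = v + 1
        m = bigger
    return m


def make_gradient_tiles():
    """Stand-in tiles: tile number `level` has exactly `level` pixels set."""
    m = bayer_matrix(8)
    tiles = []
    for level in range(64):
        rows = []
        for y in range(8):
            byte = 0
            for x in range(8):
                if m[y][x] < level:
                    byte |= 0x80 >> x   # leftmost pixel is the high bit
            rows.append(byte)
        tiles.append(bytes(rows))
    return tiles


def load_pattern_table(tiles):
    """Keep every other tile (32 of 64) and repeat the set 8 times."""
    selected = tiles[::2]
    return b"".join(selected) * 8       # 256 patterns x 8 bytes = 2048 bytes


PATTERN_TABLE = load_pattern_table(make_gradient_tiles())
```

Each group of 32 patterns then pairs with one foreground/background color combination from the color table.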

The color table in Graphics I mode consists of 32 bytes. Each byte defines two colors for 8 consecutive patterns in the pattern table.  The upper nybble defines the color of the 1 bits and the lower nybble defines the color of the 0 bits. For simplicity, palettes are stored with one color per byte, and the LoadColorTable routine combines each adjacent color into a single byte for the color table. Since I am using 8 colors and 32 tiles per color combination, I need to load each color combination into the color table 4 times.
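The packing described above might look like this in Python. Two details are assumptions rather than facts from the original code: that the next color in the palette becomes the foreground (upper nybble), and that the last color wraps around to pair with the first. The palette values are also just an example using standard TMS9918 color codes.

```python
def load_color_table(palette):
    """Pack 8 one-color-per-byte palette entries into 32 color table bytes."""
    assert len(palette) == 8
    table = bytearray()
    for i in range(8):
        background = palette[i] & 0x0F             # color of the 0 bits
        foreground = palette[(i + 1) % 8] & 0x0F   # color of the 1 bits (assumed)
        packed = (foreground << 4) | background
        # Each color table byte covers 8 patterns, and each color pair
        # spans 32 tiles, so the same byte is written 4 times.
        table.extend([packed] * 4)
    return bytes(table)


# Hypothetical palette: black, dark blue, light blue, cyan, white and back.
EXAMPLE_PALETTE = [1, 4, 5, 7, 15, 7, 5, 4]
COLOR_TABLE = load_color_table(EXAMPLE_PALETTE)
```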

With the pattern and color tables pre-loaded, I need only update the name table on each frame with the 768 bytes defining which tile will be shown in each of the 32x24 tile positions.

This brings us to the third challenge: calculating the values for the name table quickly enough to achieve a 60 Hz frame rate.  I use a couple of tricks to achieve this. First, the base image is pre-calculated only once for each effect and then distorted on each frame using simpler calculations.

The initial value for each tile is calculated in the CalcPlasmaStarts routine by summing together 8 sine waves of varying frequencies.  Each of the 8 sine waves is defined by a starting value and coefficients for the x and y coordinates. The sine waves combine to create the complex contours of the image. The constants S, X, and Y in this formula stand for the SineStartsY, SineAddsX and SineAddsY parameters.
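One plausible reading of that sum, written as a Python model rather than the Z80 routine: for each tile, wave i is looked up at S[i] + X[i]*x + Y[i]*y, and all eight lookups are added with 8-bit wraparound. The exact form of the argument is an assumption, and the sine table here is a `math.sin` stand-in rather than the program's own table. The parameter values are the ones from the example effect listed earlier.

```python
import math

WIDTH, HEIGHT = 32, 24

# Stand-in 256-entry sine table scaled to 0..255 (not the program's table).
SINE = [round(127.5 + 127.5 * math.sin(2 * math.pi * i / 256))
        for i in range(256)]

SINE_ADDS_X = [0xFA, 0x05, 0x03, 0xFA, 0x07, 0x04, 0xFE, 0xFE]
SINE_ADDS_Y = [0xFE, 0x01, 0xFE, 0x02, 0x03, 0xFF, 0x02, 0x02]
SINE_STARTS_Y = [0x5E, 0xE8, 0xEB, 0x32, 0x69, 0x4F, 0x0A, 0x41]


def calc_plasma_starts():
    """Return 768 starting values, one per tile, in row-major order."""
    starts = []
    for y in range(HEIGHT):
        for x in range(WIDTH):
            total = 0
            for s, add_x, add_y in zip(SINE_STARTS_Y, SINE_ADDS_X, SINE_ADDS_Y):
                # Assumed argument of each wave; all arithmetic wraps at 8 bits.
                total += SINE[(s + add_x * x + add_y * y) & 0xFF]
            starts.append(total & 0xFF)
    return starts


PLASMA_STARTS = calc_plasma_starts()
```

Treating values like $fa as unsigned and masking with 0xFF gives the same result as treating them as small negative steps, which is why the parameters can move a wave in either direction.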

For each frame, the code applies a distortion effect to the original image.  For each row y of frame n, the distortion value is calculated as follows.  The constants S, P, and C stand for the SineSpeeds, PlasmaFreqs, and CycleSpeed parameters.

For each row, the distortion value is calculated as described above, then for each column within the row, the distortion value is added to the starting value pre-calculated for that tile and saved into a back buffer for the screen.
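The per-frame step can be sketched in Python as follows. The row/column structure follows the description above, but the body of `row_distortion` is only a guess at how the SineSpeeds, PlasmaFreqs, and CycleSpeed parameters combine (two sine lookups plus a per-frame color cycle), so treat it as a placeholder. The sine table is again a `math.sin` stand-in.

```python
import math

WIDTH, HEIGHT = 32, 24

# Stand-in 256-entry sine table scaled to 0..255 (not the program's table).
SINE = [round(127.5 + 127.5 * math.sin(2 * math.pi * i / 256))
        for i in range(256)]

SINE_SPEEDS = [0xFE, 0xFC]
PLASMA_FREQS = [0x06, 0x07]
CYCLE_SPEED = 0xFF


def row_distortion(y, n):
    """Assumed distortion for row y of frame n; the real formula may differ."""
    a = SINE[(SINE_SPEEDS[0] * n + PLASMA_FREQS[0] * y) & 0xFF]
    b = SINE[(SINE_SPEEDS[1] * n + PLASMA_FREQS[1] * y) & 0xFF]
    return (a + b + CYCLE_SPEED * n) & 0xFF


def calc_plasma_frame(starts, n):
    """Add each row's distortion to the precomputed starting values."""
    back_buffer = bytearray(WIDTH * HEIGHT)
    for y in range(HEIGHT):
        distortion = row_distortion(y, n)      # computed once per row
        for x in range(WIDTH):
            index = y * WIDTH + x
            back_buffer[index] = (starts[index] + distortion) & 0xFF
    return back_buffer
```

The point of the split is cost: the inner loop does one addition per tile, while the sine lookups happen only 24 times per frame.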

In order to calculate each frame quickly, the loops are unrolled and all values are kept in registers.  The CalcPlasmaFrame function sets up the registers for the unrolled loop that will follow.  The registers are assigned as follows:

The MakeSpeedCode routine runs at the beginning of the program to copy the row and column sections of code into memory repeatedly to create the unrolled loops.
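The idea of building unrolled code by copying can be illustrated with a small Python model. The contents of the row and column fragments below are placeholders (NOP bytes), and ending the generated code with a RET is an assumption; only the repetition pattern, one row fragment followed by 32 column fragments for each of 24 rows, reflects the description above.

```python
RET = 0xC9  # Z80 opcode for RET


def make_speed_code(row_code, col_code, rows=24, cols=32):
    """Concatenate code fragments into one long unrolled routine."""
    code = bytearray()
    for _ in range(rows):
        code += row_code            # per-row setup, emitted once per row
        code += col_code * cols     # per-tile work, emitted once per column
    code.append(RET)                # assumed: return to the caller at the end
    return bytes(code)


# Placeholder fragments: NOPs standing in for the real instructions.
ROW_FRAGMENT = bytes([0x00] * 6)
COL_FRAGMENT = bytes([0x00] * 3)
SPEED_CODE = make_speed_code(ROW_FRAGMENT, COL_FRAGMENT)
```

Unrolling trades memory for speed: there are no loop counters or jumps to execute, at the cost of a few kilobytes of generated code.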

The main loop of the program repeatedly calls CalcPlasmaFrame for each frame, and then copies the back buffer into VRAM during Vsync. 

A counter is used to determine when to switch to a new plasma effect. Each effect is displayed for 256 frames. Hold mode can be enabled to display the current effect indefinitely.  Animation can also be disabled to display the initial image so that the effect of changing the parameters can be observed.

After the last frame, the main loop calls NextEffect to load the parameters for the next effect, then calculates the starting image and loads the color table with the specified palette.

A number of predefined effects are provided in the format described above, or the RandomParameters routine can generate parameters for a random effect if this mode is enabled.

Random numbers are generated using a combined LFSR/LCG pseudo-random number generator with 16-bit seeds.  The random number generator is called repeatedly to generate each of the required parameters, which are then masked and adjusted as necessary to create the desired range of values.
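For readers unfamiliar with this kind of generator, here is a minimal Python sketch of one way to combine a 16-bit LCG with a 16-bit LFSR, followed by the mask-and-adjust step. The multiplier, feedback taps, default seeds, and the example range are illustrative assumptions, not values taken from the program.

```python
class CombinedPrng:
    """Toy 16-bit LCG + LFSR combination (constants are illustrative)."""

    def __init__(self, seed1=0x1234, seed2=0x5678):
        self.seed1 = seed1 & 0xFFFF
        # An LFSR stuck at zero never leaves zero, so force a nonzero state.
        self.seed2 = (seed2 & 0xFFFF) or 1

    def next16(self):
        previous = self.seed1
        # LCG step: multiply by 5 and add 1, wrapping at 16 bits.
        self.seed1 = (previous * 5 + 1) & 0xFFFF
        # LFSR step: shift left, apply the feedback taps if a 1 fell out.
        carry = self.seed2 & 0x8000
        self.seed2 = (self.seed2 << 1) & 0xFFFF
        if carry:
            self.seed2 ^= 0x002D
        # Combine the two streams by addition.
        return (self.seed2 + previous) & 0xFFFF

    def next8(self):
        return self.next16() >> 8


def random_in_range(rng, mask, offset):
    """Mask a random byte, then shift it into the desired range."""
    return (rng.next8() & mask) + offset


rng = CombinedPrng()
example_freq = random_in_range(rng, 0x07, 1)   # hypothetical range 1..8
```

Masking with a power of two minus one is cheap on an 8-bit CPU, which is why parameter ranges are naturally expressed as a mask plus an offset rather than an arbitrary modulus.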

At the beginning of the program, the RandomSeed routine seeds the PRNG from the screen buffer area of memory.  This area of memory should contain relatively random data regardless of whether this program or some other program was loaded into that memory previously.  To increase randomness, the offset into the screen buffer is determined by the refresh register. On the Z80, this register (R) is incremented on every instruction fetch to drive memory refresh. Only its low 7 bits count, so it provides a source of relatively random 7-bit values.

Finally, the ProcessCommand routine and associated functions provide a simple keyboard command interface to control the program.

Translating 6502 to Z80 as efficiently as possible was an interesting exercise. The lack of indexed indirect mode made it pretty tricky sometimes. I tried to play to the Z80's strengths when I could. I keep almost everything in registers and use the shadow register set extensively. The speed code is composed entirely of one-byte instructions and the only things held in memory are the input and output arrays.

A lot of the complexity in Plascii Petsma's code was because he didn't want to use a custom character set. Since I didn't have this self-imposed restriction, I was able to remove that code and just calculate the tile names directly from the sine table, which simplified my code significantly.

The Plascii Petsma source code is well written and ingenious, but it's not well commented or explained anywhere, so it took me a long time to understand it.  Although good explanations exist for the general concept of plasma effects in high-level languages, using assembly on an 8-bit platform adds another layer of complexity. Hopefully my explanation removes some of the mystery from how to implement plasmas on 8-bit machines.
