
PIC Graphics Demo

Generate 640x480 64-color VGA graphics with an 8-bit PIC and an SRAM framebuffer

I wanted to do some ray-tracing on an 8-bit PIC16 in 1kB. I haven't fit the ray-tracer in 1kB yet (currently 8.3kB), but I got pretty fractals working in 1kB. I also wanted actual by-the-spec full-resolution 640x480 VGA output, which meant an external framebuffer - implemented in a minimal style. That part works according to plan.

All code and design files for this project are released under an MIT License.

OK, I want to do some flashy graphics on a PIC. And on a VGA monitor. I've generated composite video on PICs before, and seen VGA implemented, but it's always a pain, and you're forced to accept marginal timings and low resolution. One particularly nice job of navigating the constraints is at #VGA Blinking Lights - actually another contest entry! But, I want a full-resolution display and while I fondly remember the timing liberties you could take with "multisync" CRTs back in the Linux modeline days, I'm not sure modern LCDs are so forgiving. So, I'm planning an actual 640x480 display with standard timings, which means a 25.175 MHz dot clock - too fast to bitbang - and a bunch of external memory to store the pretty pixels.

Luckily, I've been following @esot.eric's adventures with #sdramThingZero. His work has inspired me to store the whole digitized VGA signal, sync pulses and all, in a 4 Mbit SRAM. The PIC can write the required waveforms to the RAM at its leisure, then the signal can be clocked out at 25.175 MHz with a few 74AC163 counters and a handful of latches and glue logic.

I got fractals working in 1kB on a PIC16F1718 coded in pure C with the hardware 640x480 VGA adapter. The results are in this log and the code is here. The hardware is discussed in this log, and a pdf schematic can be downloaded here.

wrencher.c

PIC16F1718 code for displaying 640x480 wrencher on VGA adapter

x-csrc - 4.92 kB - 01/06/2017 at 23:40


rle_wrencher.h

640x480 wrencher logo compressed with RLE

x-chdr - 8.91 kB - 01/06/2017 at 23:39


LICENSE.txt

MIT license for all files here

text/plain - 1.06 kB - 01/05/2017 at 17:40


xkB_ray_tracer.c

8.3kB ray tracer for PIC16F1718 with VGA hardware

text/x-csrc - 7.88 kB - 01/05/2017 at 17:28


vga_schematic.pdf

Schematic for VGA generation hardware

application/pdf - 22.79 kB - 01/03/2017 at 14:10



  • 1 × PIC16F1718 Microcontroller
  • 1 × AS7C4096A (-12) 512k x 8 SRAM (10ns grade also OK)
  • 5 × 74AC163 Synchronous 4-bit binary counters
  • 1 × 74ACT574 octal D flip-flop with three-state output
  • 1 × 74AC02 quad NOR gate


  • Fractals re-written in assembly: 560 bytes

    Ted Yapo - 01/17/2017 at 19:41 - 0 comments

    I re-wrote the fractal-generation code in assembly to see what gains I could make. Just a simple manual translation of the code came out to 560 bytes (320 instructions), as opposed to the 1019 bytes of compiled C. It produces the same output as the C version. There are places it could be optimized, but I just wanted a baseline written this way. As I often find in PIC assembly, there's a lot of loading and storing to do (you gotta love RISC), and that overhead before each subroutine call is expensive. The PIC16F1718 has a linearly-addressable memory and two auto-incrementing (or decrementing) pointer registers (FSR0/FSR1), and these can be used to automate the parameter passing overhead. I have some prototype code written with a new framework that should streamline numerical code like this, but haven't had a chance to re-implement the fractal code in it yet. I am anxious to see how much it improves things here.

    All this is leading up to trying to get the ray tracer in 1kB (I can't let go). I also worked out a way to avoid square roots in testing intersections with spheres, which I'll detail in another log. That alone could save a decent chunk of code.

    The naive assembly fractal code is listed here. I know of at least a few places it could be tuned up - not the least of which is inlining the four main calls - that's 8 instructions (14 bytes!) right there. But not terribly interesting ones.

    ;;;
    ;;; generate VGA Mandelbrot set in assembly
    ;;;
    #include <p16f1718.inc>
    
      ;;RADIX           DEC
      ERRORLEVEL     -302
      ERRORLEVEL     -305  
      
      __CONFIG  _CONFIG1,  _FOSC_INTOSC & _WDTE_OFF & _PWRTE_ON & _MCLRE_ON & _CP_OFF & _BOREN_ON & _CLKOUTEN_OFF & _FCMEN_ON
      __CONFIG  _CONFIG2,  _WRT_ALL & _PPS1WAY_OFF & _ZCDDIS_ON & _PLLEN_ON & _STVREN_OFF & _BORV_LO & _LPBOR_OFF & _LVP_ON
    
    ;;;
    ;;;  h/w interface definition
    ;;;
    #define REG_OE_bar  b'00010000'
    #define TIMER_en    b'00001000'
    #define WE_bar      b'00000001'
    #define WE_bit      0
    #define OE_bar      b'00000010'
    #define CP_en       b'00001000'
    #define MR_en       b'00100000'
    #define CP_bar      b'00000100'
    #define CP_bit      2  
    #define MR_bar      b'00010000'
    
    #define VSYNC       b'10000000'
    #define HSYNC       b'01000000'
    #define RGB(r, g, b) (((r & 0x3) << 4) | ((g & 0x3) << 2) | (b & 0x3))
    
    #define H_FRONT_PORCH   .16
    #define H_SYNC_PULSE    .96
    #define H_BACK_PORCH    .48
    
    #define V_FRONT_PORCH   .10
    #define V_SYNC_PULSE    .2
    #define V_BACK_PORCH    .33  
    
    #define MAXITER         .255
    #define ESCAPE_RADIUS   0xc0
    #define COMPONENT_RADIUS 0xf8
    
    ;#define WHOLE_SET  
    #ifdef WHOLE_SET  
    #define IMAG_MIN        0xec00
    #define REAL_MIN        0xd800
    #define IMAG_STEP       0x0015
    #define REAL_STEP       0x0015
    #else
    #define IMAG_MIN        0xff00
    #define REAL_MIN        0xe800
    #define IMAG_STEP       0x0001
    #define REAL_STEP       0x0001  
    #endif  
      
    ;;;
    ;;; variables
    ;;; 
      cblock  0x20
        ;; WRITE_SRAM_BYTES
        sram_value
        sram_count
        ;; WRITE_LINE
        line_vsync_value
        line_rgb_value
        line_count
        line_loop_count
        ;; WRITE_FRAME
        row_l
        row_h
        col_l
        col_h
        ;; mandelbrot calculation
        iter
        a_l
        a_h
        b_l
        b_h
        c_l
        c_h
        d_l
        d_h
        dc_l
        dc_h
        dd_l
        dd_h
        aa_l
        aa_h
        bb_l
        bb_h
        aa_plus_bb_l
        aa_plus_bb_h
        red
        green
        blue
        ;; MUL16
        mul_a_l
        mul_a_h
        mul_a_2
        mul_a_3
        mul_b_l
        mul_b_h
        mul16_flags
        prod_0  
        prod_1  
        prod_2  
        prod_3
        ;; ABS16
        abs_l
        abs_h
        ;; MUL16_SIGNED
        mul_sign
        ;; MUL16_SHIFT
        mul_shift_count
      endc
    
    ;;;
    ;;; reset vector
    ;;; 
        ORG         0
        call        SETUP_PERIPHERALS
        call        LOAD_MODE
        call        WRITE_FRAME
        call        RUN_MODE
    MAIN_LOOP:
        bra         MAIN_LOOP
    
    SETUP_PERIPHERALS:
        ;; intosc 32 MHz
        BANKSEL     OSCCON
        movlw       b'11110000'
        movwf       OSCCON
        ;; select digital I/O
        BANKSEL     ANSELA
        clrf        ANSELA
        clrf        ANSELB
        clrf        ANSELC    
        ;; set TRIS bits: all outputs
        BANKSEL     LATA
        clrf        LATA
        clrf        LATB
        clrf        LATC
        BANKSEL     TRISA
        clrf        TRISA
        clrf        TRISB
        clrf        TRISC    
        return
        
    LOAD_MODE:
        BANKSEL     LATA
        movlw       REG_OE_bar | TIMER_en
        movwf       LATA
        ;; toggle CP with MR low to reset address counter
        movlw       WE_bar | OE_bar | CP_en | CP_bar   | MR_en | MR_bar
        movwf       LATB
        movlw       WE_bar | OE_bar | CP_en | CP_bar&0 | MR_en | MR_bar
        movwf       LATB    
        movlw       WE_bar | OE_bar | CP_en | CP_bar   | MR_en | MR_bar
    ...

  • 640x480 RLE Wrencher (4409 Bytes)

    Ted Yapo - 01/06/2017 at 23:50 - 19 comments

    I had to do it.

    The logo plus decoding code takes up 2519 14-bit instructions (4409 bytes). I used a simple RLE compression and stored the compressed data in a header file. I only compressed the left half image, then decompressed the runs in reverse for the right half. The whole code is uploaded here, but the decompression part looks like:

    void GenerateFrame()
    {
      GenerateLine( VSYNC   , RGB(0, 0, 0),  33);  // V back porch
    
      const uint8_t *data_ptr = rle_wrencher;
      uint16_t row = 480;
      do {
        write_SRAM_bytes( VSYNC | HSYNC   | 0 , 16);  // H front porch
        write_SRAM_bytes( VSYNC | HSYNC&0 | 0 , 96);  // H sync pulse
        write_SRAM_bytes( VSYNC | HSYNC   | 0 , 48);  // H back porch
    
        const uint8_t *row_ptr = data_ptr;
    
        // left half image
        uint8_t color = 0;
        uint8_t num_runs = *data_ptr++;
        if (num_runs){
          do {
            uint8_t run_length = *data_ptr++;
            write_SRAM_bytes( VSYNC | HSYNC   | color, run_length);
            color ^= 0x3f;
          } while (--num_runs);
        } else {
          write_SRAM_bytes( VSYNC | HSYNC   | 0, 160);
          write_SRAM_bytes( VSYNC | HSYNC   | 0, 160);
        }
    
        // right half image (runs processed in reverse)
        color ^= 0x3f;
        num_runs = *row_ptr;
        if (num_runs){
          do {
            uint8_t run_length = *--data_ptr;
            write_SRAM_bytes( VSYNC | HSYNC   | color, run_length);
            color ^= 0x3f;
          } while (--num_runs);
        } else {
          write_SRAM_bytes( VSYNC | HSYNC   | 0, 160);
          write_SRAM_bytes( VSYNC | HSYNC   | 0, 160);
        }
        data_ptr += *row_ptr;
    
      } while (--row);
    
      GenerateLine( VSYNC   , RGB(0, 0, 0),  10);  // V front porch
      GenerateLine( VSYNC&0 , RGB(0, 0, 0),  2);   // V sync pulse
      write_SRAM_bytes( VSYNC | HSYNC | 0, 2);     // end of vsync; resets counter
    }
    I really wanted to fit this into 1kB, but ran out of time.
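
    To check the mirrored decoding on a PC before committing to a multi-hour run, the row logic above can be modeled against a plain 640-byte buffer. This is only a host-side sketch: `decode_row`, `HALF_WIDTH` and `WHITE` are names I made up, and it assumes every run length is 1..255 and that the runs of a row sum to 320 (the PIC code would write 256 bytes for a zero count, which this model does not reproduce).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HALF_WIDTH 320   /* pixels in the compressed left half (assumed) */
#define WHITE      0x3f  /* all six color bits set */

/* Decode one row into out[0..639].
 * data[0] is the run count, data[1..count] are run lengths whose colors
 * alternate starting from black. Returns the number of bytes consumed. */
size_t decode_row(const uint8_t *data, uint8_t *out)
{
    unsigned num_runs = data[0];
    uint8_t color = 0;
    size_t pos = 0;

    if (num_runs == 0) {                 /* empty row: all black */
        memset(out, 0, 2 * HALF_WIDTH);
        return 1;
    }

    /* left half: runs in stored order */
    for (unsigned i = 1; i <= num_runs; i++) {
        memset(out + pos, color, data[i]);
        pos += data[i];
        color ^= WHITE;
    }

    /* right half: same runs in reverse; undo the last toggle first so the
     * mirror starts with the color the left half ended on */
    color ^= WHITE;
    for (unsigned i = num_runs; i >= 1; i--) {
        memset(out + pos, color, data[i]);
        pos += data[i];
        color ^= WHITE;
    }
    return (size_t)num_runs + 1;
}
```

    Feeding it a row such as {3, 100, 120, 100} should give a buffer that reads the same from both ends, which is a quick way to catch an off-by-one in the encoder.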

    Quadtrees?

    I also experimented with quadtree compression, which should take better advantage of the solid 2D areas (not just 1D runs). The best strategy I found was to compress the top and bottom halves as 256x256 blocks (bottom visualized here):

    Again, symmetry would be used to create the right side. I got the data down to 1047 bytes this way, but didn't think I could add the decompression code *and* find a way to make it all fit in 1kB, so I abandoned the effort.

    I think you probably could get this image into 1kB, though, if you really worked at it.

    Now, back to other projects...


    For those that want to try compressing this 640x480 rendering of the wrencher, here is the one I used as a png. I do not know anything about the intellectual property status of this image, so, you know...don't sue me or anything. I make no representations whatsoever about rights to use this logo or this specific rendering of it. I hear it's a touchy subject - but then again, there are a number of instances of people using the logo, then not being sued for trademark infringement, so that sounds like failure-to-enforce. But failing to enforce copyright doesn't weaken the copyright, so ... whatever. Then again, this is their site, so if they have an issue with this, they should send themselves a takedown notice.

  • Other Video Formats (SVGA/Composite)?

    Ted Yapo - 01/06/2017 at 14:08 - 10 comments

    So, 640x480 VGA is nice, but what else could you do with this hardware (or something very similar)? The vertical-sync address reset is very flexible regarding frame size - my initial tests of the board used only an 8-pixel frame so I could see it all on the scope. By changing the dot clock, you could have the board play out video in whatever format you wanted, limited basically by two factors: SRAM size and maximum clock frequency.

    SRAM Size

    I took a look at this page to determine the SRAM requirements for various SVGA modes. With this adapter, each dot clock in the whole frame needs a slot in the SRAM. For example, in 640x480, a total of 800x525 = 420000 bytes are required, since the video frame consists of 525 lines of 800 dot clocks each. Here's a summary of some common modes and what size memories they fit into:

    Mode       Line width  Lines  Bytes req'd  512k x 8  1M x 8  2M x 8  4M x 8
    640x480           800    525       420000         X       X       X       X
    768x576           976    597       582672         -       X       X       X
    800x600          1024    625       640000         -       X       X       X
    1024x768         1344    806      1083264         -       -       X       X
    1280x1024        1688   1066      1799408         -       -       X       X

    I didn't bother with modes above this, because the required clock speed becomes the limiting factor.

    As you can see, 640x480 is the only common resolution that fits in the 512k SRAM I used - to go any higher, you'd need a bigger one. The (5x) 74AC163 counters could generate 1M addresses (20 bits); beyond that, and you'd need to expand the counter.
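
    The sizing rule is simple enough to put in a few lines of C. This is a small sketch, not code from the project; `frame_bytes` and `frame_fits` are names invented for illustration, and `address_bits` is the width of the address counter (19 for the 512k part, 20 for the full five-counter chain).

```c
#include <stdint.h>
#include <stdbool.h>

/* Bytes needed when every dot clock of the frame, blanking included,
 * gets its own SRAM location. */
uint32_t frame_bytes(uint32_t dots_per_line, uint32_t lines_per_frame)
{
    return dots_per_line * lines_per_frame;
}

/* True if the whole frame fits in an SRAM of 2^address_bits bytes. */
bool frame_fits(uint32_t dots_per_line, uint32_t lines_per_frame,
                unsigned address_bits)
{
    return frame_bytes(dots_per_line, lines_per_frame)
           <= ((uint32_t)1 << address_bits);
}
```

    For example, frame_fits(800, 525, 19) holds for 640x480, while frame_fits(1344, 806, 20) does not, which is why 1024x768 needs both a bigger SRAM and a longer counter.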

    Clock Frequency

    I looked at the minimum VESA-standard refresh rates for the above modes (typically 60 Hz), what dot clock frequencies are required, and the period of this frequency:

    Mode       Refresh  Dot clock   Period
    640x480    60 Hz    25.175 MHz  39.7 ns
    768x576    60 Hz    34.96 MHz   28.6 ns
    800x600    56 Hz    36 MHz      27.8 ns
    1024x768   60 Hz    44.9 MHz    22.3 ns
    1280x1024  60 Hz    108 MHz     9.26 ns

    According to TI's datasheet, the 74AC163 will count at 103 MHz over commercial temperatures at 5V. But, the maximum propagation delay from the clock to the outputs is 15 ns. I used a 12ns SRAM, although I see them (in less hacker-friendly packages) down to 6ns. With the SRAM I used, you might be limited to a (12+15 =) 27 ns cycle time, which could do 800x600 but no higher. Moving to a 6ns SRAM allows a cycle time of (6+15 =) 21 ns, which with some tricks and tweaks might get you to 1024x768.
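
    The same budget can be written as a quick check. This is a rough sketch under the simplification used above (counter delay plus SRAM access time must fit in one dot period); it ignores the setup time of the output register and any skew, and the function names are made up.

```c
#include <stdbool.h>

/* Dot-clock period in ns for a frequency given in MHz. */
double dot_period_ns(double dot_clock_mhz)
{
    return 1000.0 / dot_clock_mhz;
}

/* Simplified budget: counter clock-to-output delay plus SRAM access time
 * has to fit inside one dot-clock period. Register setup time and clock
 * skew are deliberately left out of this sketch. */
bool timing_ok(double dot_clock_mhz, double counter_tpd_ns,
               double sram_access_ns)
{
    return counter_tpd_ns + sram_access_ns <= dot_period_ns(dot_clock_mhz);
}
```

    With the numbers from the text, timing_ok(36.0, 15.0, 12.0) passes for 800x600, and timing_ok(44.9, 15.0, 6.0) only just passes, which matches the "tricks and tweaks" caveat.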

    FPGAs?

    I couldn't implement the counting and reset logic in an FPGA for this project because I thought the FPGA code might count against the 1kB limit - maybe it didn't; who knows. But, it certainly seems like a small FPGA might do the counting and reset logic easily, and do it faster and more compactly than the discrete logic packages. Combined with a larger, faster SRAM, this might make a nice system. With the extra logic afforded by the FPGA, you might even add a way to write to the SRAM while the VGA output is active - the biggest missing piece in this simple system. Of course, you could also implement a more traditional split-counter and sync-generation system while you were at it.

    Composite Video?

    Last, and possibly least, would be generation of composite video (NTSC or PAL). There's enough memory on the board I built to store a nice NTSC frame. The clock frequency of 25.175 MHz is more than 7x the color-burst frequency at 3.58 MHz, so you might even be able to generate a full-color signal by directly synthesizing the 3.58 MHz color subcarrier along with the luminance signal. I think the only hardware modification required would be to ditch the three 2-bit DACs and replace them with a single 6-bit DAC. For 6 bits, you can use 1% resistors in an R/2R ladder.

    Arbitrary Waveforms?

    At its heart, this system is an arbitrary waveform generator, so why not use it as such? I'm guessing you could go to 35 MHz clock frequency with this board, maybe a little more. That would give you a theoretical maximum sinewave output frequency of 17.5 MHz (yes, you'd have to filter it heavily). A more...


  • Zero Instruction Elapsed Timer

    Ted Yapo - 01/06/2017 at 02:04 - 8 comments

    Since these images take so long (hours) to generate, I wanted to be able to time them and report exact run-times. But, I didn't want to add any extra instructions to do it, so I decided to use a very simple, low-tech approach.

    I found this clock for $3.88 at Walmart. It normally runs off an AA battery. Instead, I'm running it from an unused output line on the PIC:

    The resistor and diodes form a crude 1.4 V regulator. The clock wouldn't run with small capacitors - the hand would twitch but not fully advance. The mechanism must need a hefty current pulse to actually advance it.

    I was able to set and clear a bit in the code just by changing the constants that get written to the ports, so I didn't increase the code size at all.

    To use the timer, you set the clock to 12:00 before a run, then come back and read off the elapsed time after it's done. For very long runs, you just have to check the clock every 12 hours. This would at first appear to violate the Nyquist criterion, since the period of the clock is 12 hours, meaning you have to sample at least every 6 hours, but it doesn't :-)

    I probably won't have timing data before the contest deadline, but I'll post it here when I have it.


    The ray-tracing took 1 hour and 28 minutes to run:

    I'm timing the fractal code now. I think that takes longer...


    Technically, I was correct, the fractal code took longer, but not much: 1 hour and 32 minutes. Funny, they're pretty well matched.

    I didn't wait around while I was running these things before - I always kicked them off before going to bed, or something like that.

  • Finally: Ray-tracing on an 8-bit PIC (not 1kB)

    Ted Yapo - 01/05/2017 at 17:33 - 0 comments

    OK, I'll clog your feed with one more portrait-aspect image (sorry for all of them). I finally got the ray-tracer doing the right thing. It's still 8.3kB of code, which is the best I'm going to be able to do before the contest deadline, but it works. The fractals in 1kB will have to be the code for the contest, but here's ray-tracing on an 8-bit PIC:

    The wavy lines are moiré patterns caused by the monitor pixels beating with the camera pixels; they're not in the generated image. I uploaded updated source code with the latest epsilon fix. The code ain't pretty, but "it works."

  • 24-bit Float Fail

    Ted Yapo - 01/05/2017 at 13:00 - 18 comments

    It turns out that the 24-bit floating point implemented in the XC8 compiler requires a few tweaks of the ray-tracing code. My goal is to re-write this all in 16 (or 24) bit fixed point, anyway, but the code I had ready used floats. Here's the problem:

    The noise on the spheres is caused when rays bounce off, then are found to intersect the sphere again immediately. This happens because if the origin of the reflected ray is right on the surface of the sphere, it's ambiguous which side the ray originates on - the dropout points above are where the origin of the reflected ray was found to be inside the sphere, so the ray got trapped in there instead of bouncing off normally. The classic solution to this classic problem is to add a small offset (epsilon) to the reflected ray origin to ensure the reflected ray remains outside - and the required magnitude of this offset depends on the numerical precision used.

    I tested this code on my Linux box with IEEE 32-bit floating point, where my chosen epsilon was sufficient. Porting the code to the PIC with 24-bit floats looks like it requires a few tweaks. I changed one line to bump epsilon:

          // reflect from sphere
          float eps = 0.1;

    and I have it running again.
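
    For readers who haven't met the epsilon trick, here is one common way to apply it, as a minimal sketch. It is not lifted from the ray tracer (the uploaded source may apply `eps` differently); the `vec3` type and all function names are made up, and the offset is taken along the outward surface normal.

```c
typedef struct { float x, y, z; } vec3;

static vec3 v_add(vec3 a, vec3 b)
{
    vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

static vec3 v_scale(vec3 a, float s)
{
    vec3 r = { a.x * s, a.y * s, a.z * s };
    return r;
}

static float v_dot(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Mirror direction d about the unit normal n: r = d - 2 (d.n) n */
vec3 reflect_dir(vec3 d, vec3 n)
{
    return v_add(d, v_scale(n, -2.0f * v_dot(d, n)));
}

/* Start the reflected ray slightly off the surface, along the outward
 * normal, so rounding error cannot leave its origin inside the sphere. */
vec3 reflected_origin(vec3 hit, vec3 n, float eps)
{
    return v_add(hit, v_scale(n, eps));
}
```

    The coarser the float format, the larger eps has to be relative to the scene dimensions, which is why a value that worked with 32-bit floats can fail with 24-bit ones.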

    CONTEST DISCLAIMER: this code is 8.3kB in size.


    Second Try - Slightly Less Fail

    My epsilon is still too large - dropouts on the left-hand sphere only now. It's running again...

  • Can I have 9kB if I enter nine times?

    Ted Yapo - 01/05/2017 at 01:35 - 4 comments

    I got the ray-tracer to fit - into the PIC, anyway. At 8473 bytes, it's way over the 1kB limit, unfortunately. The good news is that this is without any serious attempts at size optimization - it's using the free XC8 compiler's native 24-bit floating point in pure C, and it was written more for simplicity than compactness. I figured once I got it working, I could start cutting corners to make a version fit.

    The bad news is that since this thing is going to take so long to run, I need to start running the code I have if I want a chance at it finishing before the contest deadline - my rough estimates put it at 8 hours, but it could be twice as long or more. So, I kicked it off a few minutes ago.

    So, it won't be in 1kB, but I might get a ray-tracer on an 8-bit PIC in true VGA resolution. In some universe, that's worth doing anyway :-)

    I'll post an image when it appears. I uploaded the code already - you can see that it could use some serious optimization.

    The one issue I had was that the XC8 compiler doesn't support recursive functions, which make ray tracing so much easier. I had to convert to an iterative approach. I've written four (or maybe five) ray tracers of varying complexity for different applications (graphics / optics / solar concentrator design) since the late 1980s, but I think this was the first time I couldn't use recursion. It was a neat little twist.
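
    The conversion itself is mechanical as long as each hit spawns at most one reflected ray (that is my assumption here; a tracer with refraction would need an explicit stack). The toy below is not the ray tracer's code: it stands in for a ray path with two arrays, `local[i]` for the surface's own shade at bounce i and `refl[i]` for its reflectivity, and shows that a running weight gives the same result as the recursive formulation.

```c
/* Recursive form: shade = local + reflectivity * shade(next bounce). */
float shade_recursive(const float *local, const float *refl,
                      int depth, int n)
{
    if (depth >= n)
        return 0.0f;
    return local[depth]
         + refl[depth] * shade_recursive(local, refl, depth + 1, n);
}

/* Iterative form: carry the product of reflectivities seen so far and
 * accumulate each bounce's contribution as it is found. */
float shade_iterative(const float *local, const float *refl, int n)
{
    float total = 0.0f;
    float weight = 1.0f;
    for (int i = 0; i < n; i++) {
        total += weight * local[i];
        weight *= refl[i];
    }
    return total;
}
```

    In a real tracer the loop body would also compute the next hit point and reflected direction, and it can exit early once the weight becomes negligible.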

  • Final Hardware Design

    Ted Yapo - 01/03/2017 at 03:09 - 6 comments

    Here's the hardware as-built. Click here for a nicer pdf version. It took a few more ICs than I originally thought, but it's still very simple.

    All five waveforms (vsync, hsync, red, green, blue) are stored as a sequence of bytes in a 512k x 8 SRAM; one byte for each pixel clock in the entire frame - blanking intervals and all. Five 74AC163 synchronous binary counters cycle the addresses into the SRAM. The output of the SRAM is latched in a 74ACT574 register - the ACT part is used here since the SRAM has TTL-level outputs. The vsync and hsync signals are output through 61.9 ohm resistors, source-terminating the 75-ohm cables. The color signals are each formed with a simple 3-resistor DAC, providing a total palette of 64 colors, again matched to 75 ohms. Gamma correction is willfully ignored.
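
    As a concrete picture of what one SRAM byte holds, here is a small packing helper. The bit positions follow the VSYNC, HSYNC and RGB definitions in the project's code (bit 7, bit 6, then two bits each for red, green and blue), and the sync lines idle high with a low pulse, as in the GenerateLine() calls; the function name `pack_sample` is made up for this sketch.

```c
#include <stdint.h>

#define VSYNC 0x80u  /* bit 7, as in the project's code */
#define HSYNC 0x40u  /* bit 6 */

/* Pack one SRAM byte. The sync arguments are line levels (1 = idle high,
 * 0 = inside the sync pulse); r, g, b are 2-bit intensities 0..3. */
uint8_t pack_sample(unsigned vsync_level, unsigned hsync_level,
                    unsigned r, unsigned g, unsigned b)
{
    return (uint8_t)((vsync_level ? VSYNC : 0u) |
                     (hsync_level ? HSYNC : 0u) |
                     ((r & 3u) << 4) | ((g & 3u) << 2) | (b & 3u));
}
```

    A white pixel outside any sync pulse is pack_sample(1, 1, 3, 3, 3) = 0xFF, and a byte inside the horizontal sync pulse is pack_sample(1, 0, 0, 0, 0) = 0x80.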

    I used a 74AC02 quad NOR gate as a pair of MUXes for the clock and reset signals to the counters. When loading data into the SRAM, the PIC bitbangs the clock and reset lines to address sequential locations. After the PIC finishes loading the data, it switches control of the clock and reset lines back to the free-running circuit.
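
    The load sequence can be tried out on a PC with a tiny model of the counter and SRAM. This is a sketch under loud assumptions: only the WE_bar and CP_bar bits of LATB are modeled, the SRAM is shrunk to 1 kB, and the address is assumed to advance when CP_bar returns high (the real edge depends on the NOR-gate mux in the schematic). All `sim_` names are invented.

```c
#include <stdint.h>
#include <string.h>

#define WE_bar        0x01u   /* same bit as in the project's code */
#define CP_bar        0x04u
#define SIM_SRAM_SIZE 1024u   /* small stand-in for the 512k part */

struct sim {
    uint8_t  sram[SIM_SRAM_SIZE];
    uint32_t addr;   /* models the 74AC163 counter chain */
    uint8_t  data;   /* models PORTC driving the data bus */
    uint8_t  latb;   /* last value written to LATB */
};

void sim_init(struct sim *s)
{
    memset(s, 0, sizeof *s);
    s->latb = WE_bar | CP_bar;   /* both control lines idle high */
}

/* Apply a new LATB value and react to rising edges on the two lines. */
void sim_write_latb(struct sim *s, uint8_t value)
{
    uint8_t rising = (uint8_t)(~s->latb & value);
    if (rising & WE_bar)         /* write completes as WE_bar goes high */
        s->sram[s->addr % SIM_SRAM_SIZE] = s->data;
    if (rising & CP_bar)         /* assumed: counter advances on this edge */
        s->addr++;
    s->latb = value;
}

/* Same pin sequence as write_SRAM_bytes(), other LATB bits omitted.
 * As on the PIC, a count of 0 writes 256 bytes. */
void sim_write_bytes(struct sim *s, uint8_t value, uint8_t count)
{
    s->data = value;
    sim_write_latb(s, WE_bar | CP_bar);
    do {
        sim_write_latb(s, CP_bar);           /* WE_bar low */
        sim_write_latb(s, WE_bar | CP_bar);  /* WE_bar high: byte written */
        sim_write_latb(s, WE_bar);           /* CP_bar low */
        sim_write_latb(s, WE_bar | CP_bar);  /* CP_bar high: next address */
    } while (--count);
}
```

    Writing a 16-byte front porch followed by a 96-byte sync pulse should leave the model's address at 112, with the two values in consecutive locations.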

    The address reset circuit is fed by a synchronous edge detector (a 74AC74 flip-flop and a 74AC00 NAND gate) that detects the rising edge of the vsync pulse to reset the counter. This arrangement means that the vertical back porch has to be stored first in the SRAM, but this is easily handled by the PIC software.
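
    Because the counter restarts at the end of the vsync pulse, address 0 is the first dot clock of the vertical back porch. The sketch below lays out the vertical segments in SRAM order using the 640x480 line counts from the project's code (33, 480, 10, 2) and 800 dot clocks per line; the struct and function names are made up. Note that GenerateFrame() in the source also writes two extra bytes with VSYNC high after the sync lines so that the edge detector sees a rising edge.

```c
#include <stdint.h>
#include <stddef.h>

#define DOTS_PER_LINE 800u   /* 16 + 96 + 48 + 640 */

struct v_segment { const char *name; uint16_t lines; };

/* Vertical segments in the order they sit in SRAM, from address 0. */
static const struct v_segment frame_order[] = {
    { "back porch",    33 },
    { "active video", 480 },
    { "front porch",   10 },
    { "sync pulse",     2 },
};
#define NUM_SEGMENTS (sizeof frame_order / sizeof frame_order[0])

/* Total lines in the frame. */
uint16_t total_lines(void)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < NUM_SEGMENTS; i++)
        sum = (uint16_t)(sum + frame_order[i].lines);
    return sum;
}

/* SRAM address of the first byte of segment `index`; passing
 * NUM_SEGMENTS gives the total frame size in bytes. */
uint32_t segment_start_address(size_t index)
{
    uint32_t lines_before = 0;
    for (size_t i = 0; i < index && i < NUM_SEGMENTS; i++)
        lines_before += frame_order[i].lines;
    return lines_before * DOTS_PER_LINE;
}
```

    With these numbers the visible image starts at address 26400 and the sync pulse at 418400, for 420000 bytes in all.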

    Finally, since the 74AC logic edges are so fast, and the counters and flip-flops edge sensitive, I took extra care routing the clock signals around the board. The 25.175 MHz dot clock comes from a pre-packaged oscillator "can". I used a 74AC244 octal buffer as a clock-distribution amp, with each of the seven clock lines required on the board driven by a dedicated buffer. To prevent distortions of the clock signals, each line was run with a twisted-pair of wire-wrap wire. These home-brew pairs have an impedance of about 102 ohms, so 86.6 ohm resistors were used to source-terminate each of them at the 74AC244 outputs. The resulting clock signals look good at each of the clocked ICs around the board - this is not the place where you want ringing and possible double-triggering.

    It might not have been obvious from other photos of the board, but most of the ICs are SOIC/SOJ and are mounted on some adapters I designed and had made at OSH Park:

    These worked really well. The pads are spaced out just enough to make soldering manageable. I've soldered directly to SOIC pins before, and I don't like it. Too small. The boards assume the standard corner power pins for logic ICs and include sites for MLCC bypass caps.

    Oh, and as for power consumption - when the PIC is calculating the fractals and loading the SRAM, the circuit draws around 19 mA. Once the VGA generation starts, this jumps to 128 mA. Still not too bad, I guess - you could run it off a USB port if you asked the USB host nicely for more than 100 mA.

    EDIT 20170103

    I updated the schematic to include the CE_bar line on the SRAM, which just gets tied low.

    I have also been thinking about the whole clock distribution/twisted pair thing. It might be avoided by using a 74HC gate as the clock driver - with the slower edge rates, maybe you don't need to worry so much about wire lengths. I discussed why 74AC163's were required a few logs ago, but that doesn't mean everything has to be 74AC. If I build another one, this would be an interesting place to try to simplify even more. Maybe a 74HC02 substituted for the 74AC02 could serve as the MUXes and clock driver.

    I found a reference that says 74HC edge rates can be as short as 5ns. This is 1.5m of wire at a velocity factor of 1. 1/6 of this length is 25cm. That might make it doable, assuming the rest of the timing still holds.
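
    The arithmetic in that estimate, as a small sketch for trying other edge rates or a more realistic velocity factor. The function names are made up, and the one-sixth figure is just the rule of thumb used above, not a hard limit.

```c
/* Distance a signal travels during one edge, in metres. */
double edge_length_m(double rise_time_ns, double velocity_factor)
{
    const double c_m_per_ns = 0.299792458;   /* speed of light */
    return rise_time_ns * c_m_per_ns * velocity_factor;
}

/* Rule-of-thumb limit from the text: one sixth of the edge length. */
double max_unterminated_length_m(double rise_time_ns,
                                 double velocity_factor)
{
    return edge_length_m(rise_time_ns, velocity_factor) / 6.0;
}
```

    A 5 ns edge at a velocity factor of 1 gives about 0.25 m; real wire has a velocity factor below 1, which shortens the limit proportionally.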

  • Official Contest Entry Log

    Ted Yapo - 01/02/2017 at 16:27 - 11 comments

    OK. I got pretty fractals onto a VGA monitor using 1019 bytes of code on a PIC16F1718. Unless I get something better done by Thursday, I'm going to consider this my contest entry. I let this one run overnight - when I turned on the monitor this morning, here's what I saw:

    The straight C-code takes up 582 14-bit instructions, which is equivalent to 582 * 14 / 8 = 1018.5 bytes.
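
    Since the same conversion comes up for every size quoted in these logs, here it is as a one-line helper. It is only a sketch (the name `pic14_words_to_bytes` is made up) and it rounds up to whole bytes.

```c
#include <stdint.h>

/* Bits in one PIC16F1xxx program-memory word. */
#define PIC14_WORD_BITS 14u

/* Convert a count of 14-bit instruction words to bytes, rounding up. */
uint32_t pic14_words_to_bytes(uint32_t words)
{
    return (words * PIC14_WORD_BITS + 7u) / 8u;
}
```

    It reproduces the figures used elsewhere on this page: 582 words give 1019 bytes, 320 words give 560 bytes, and 2519 words give 4409 bytes.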

    The code just fit - actually, as originally written, it was one instruction over. I had to inline the SetupPeripherals() call to shave off four instructions. Note that this code is compiled with the free Microchip XC8 compiler. The free version tells me that the code could be 232 words smaller if I used the Pro version - you have to wonder how it knows :-)

    Yes, writing the whole thing in C is lazy. Perhaps I can redeem myself in the next few days. I did really want to do a hardware project, though. I'll draw up a schematic for the board as-built today for completeness.

    Here's the code. I'll upload it as a file, too.

    //
    // vga_test.c - create first VGA frame
    //
    #include <xc.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <pic16f1718.h>
    
    // CONFIG1
    #pragma config FOSC = INTOSC
    #pragma config WDTE = OFF
    #pragma config PWRTE = ON
    #pragma config MCLRE = ON
    #pragma config CP = OFF
    #pragma config BOREN = ON
    #pragma config CLKOUTEN = OFF
    #pragma config FCMEN = ON
    
    // CONFIG2
    #pragma config WRT = ALL
    #pragma config PPS1WAY = OFF
    #pragma config ZCDDIS = ON
    #pragma config PLLEN = ON
    #pragma config STVREN = OFF
    #pragma config BORV = LO
    #pragma config LPBOR = OFF
    #pragma config LVP = ON
    
    //
    // h/w interface definition
    //
    #define REG_OE_bar 0b00010000
    #define WE_bar     0b00000001
    #define OE_bar     0b00000010
    #define CP_en      0b00001000
    #define MR_en      0b00100000
    #define CP_bar     0b00000100
    #define MR_bar     0b00010000
    
    #define VSYNC 0b10000000
    #define HSYNC 0b01000000
    #define RGB(r, g, b) (((r) << 4) | ((g) << 2) | (b))
    
    #if 0 // manually inlined below to save 4 instructions
    void SetupPeripherals() {
      // intosc 32 MHz
      OSCCON = 0b11110000;
    
      // select digital I/O
      ANSELA = 0;
      ANSELB = 0;
      ANSELC = 0;
    
      // set TRIS bits: all outputs
      PORTA = 0x00;
      TRISA = 0x00;
      PORTB = 0x00;
      TRISB = 0x00;
      PORTC = 0x00;
      TRISC = 0x80;
    }
    #endif
    
    //
    // set control lines for free-running VGA signal generation
    //
    void RunMode()
    {
      TRISC = 0xff; // data lines all inputs
      // reset address counter, then let it rip
      LATB = WE_bar | OE_bar&0 | CP_en&0 | CP_bar&0 | MR_en   | MR_bar   ;
      LATB = WE_bar | OE_bar&0 | CP_en&0 | CP_bar&0 | MR_en&0 | MR_bar&0 ;
      LATA = REG_OE_bar&0;
    }
    
    //
    // set control lines for bitbanging waveforms into SRAM, and
    //   reset SRAM address counter to 0
    //
    void LoadMode()
    {
      LATA = REG_OE_bar;
      // toggle CP with MR low to reset address counter
      LATB = WE_bar | OE_bar | CP_en | CP_bar   | MR_en | MR_bar   ;
      LATB = WE_bar | OE_bar | CP_en | CP_bar&0 | MR_en | MR_bar   ;
      LATB = WE_bar | OE_bar | CP_en | CP_bar   | MR_en | MR_bar   ;
      // bring out of reset
      LATB = WE_bar | OE_bar | CP_en | CP_bar   | MR_en | MR_bar&0 ;
      TRISC = 0x00; // data lines all outputs
    }
    
    //
    // bitbang a number of identical bytes into sequential SRAM addresses
    //
    void write_SRAM_bytes(uint8_t value, uint8_t count)
    {
      PORTC = value;
      LATB = WE_bar | OE_bar | CP_en | CP_bar | MR_en | MR_bar&0 ;
      do {
        // toggle WE to write data
        LATB = WE_bar&0 | OE_bar | CP_en | CP_bar   | MR_en | MR_bar&0 ;
        LATB = WE_bar   | OE_bar | CP_en | CP_bar   | MR_en | MR_bar&0 ;
        // toggle CP to advance address
        LATB = WE_bar   | OE_bar | CP_en | CP_bar&0 | MR_en | MR_bar&0 ;
        LATB = WE_bar   | OE_bar | CP_en | CP_bar   | MR_en | MR_bar&0 ;
      } while (--count);
    }
    
    void GenerateLine(uint8_t vsync, uint8_t rgb, uint8_t count)
    {
      do {
        write_SRAM_bytes( vsync | HSYNC   | rgb&0 , 16);  // front porch
        write_SRAM_bytes( vsync | HSYNC&0 | rgb&0 , 96);  // sync pulse
        write_SRAM_bytes( vsync | HSYNC   | rgb&0 , 48);  // back porch
        write_SRAM_bytes( vsync | HSYNC   | rgb   , 200); // video
        write_SRAM_bytes( vsync | HSYNC   | rgb   , 200); // video
     write_SRAM_bytes( vsync...

  • Gamma correction? We don't need no stinkin' gamma correction.

    Ted Yapo - 01/02/2017 at 03:50 - 0 comments

    It actually could use gamma correction, but I'm not going to do it. The problem is that the VGA video intensity response curve is non-linear: a hold-over from CRTs. To compensate, the video signal levels need to be pre-distorted so that the overall system response is linear. Without correction, here's the 64-color palette of the adapter:

    It's not bad, but it would look better corrected. Of course, you can't do this kind of non-linear transform with just a resistor network. I had thought about decoding each 2-bit color component into a 1-of-4 with a 74AC138 decoder, then giving each level its own resistor, but didn't want to add that much more hardware.

    Next, I considered making a non-linear network with resistors and diodes, but thought that color drift with temperature would be a bizarre side-effect.

    In the end, I decided to just leave it.



Discussions

jaromir.sukuba wrote 12/02/2016 at 13:20

What is the actual PIC you are going to use?

Btw. your spellcheck is too pic-ky



Ted Yapo wrote 12/02/2016 at 13:53

LOL, I saw that wavy line once I posted it - and actually created a new image, but then decided I liked the emphasis.  I added "PIC" to the dictionary for next time :-)

I still haven't decided on the PIC - except for 8-bits.  I'd get more instructions on a 12-bit PIC, but coding graphics without the extra instructions probably more than makes up for the 14-bit word (ADDWFC alone might be worth it).

I might end up with a larger one, just to get more RAM - of course, most of the flash will be empty :-)

I'm going to start playing with whatever I have around here - and send pixels over the UART to display them with some python code on a PC.  That was the original plan, but I'm not sure if the whole PC O/S would have to fit in the 1kB :-) 


jaromir.sukuba wrote 12/02/2016 at 20:29

12-bit PICs do have a narrow instruction word, but the instruction set is very RISC-ish, the stack is shallow, and there is usually very little memory (RAM is the problem here). PIC16xxx are better, but the 14-bit instruction word is wider, with not many instructions added. PIC16F1xxx have the same 14-bit instruction width, but with more instructions, and you can get parts with lots of RAM, more CPU speed, better peripherals. Personally, I'd go to PIC18. You have a 16-bit instruction width, but an even more powerful instruction set. I made some decisions when designing my #Brainf*cktor computer - and without the PIC18 core I'd be constantly juggling FSRs; together with the worse compare instructions, that would probably cancel out the advantage of the 14-bit instruction width.


Ted Yapo wrote 12/02/2016 at 23:45 point

I'll look into the PIC18.  I haven't used them before, so I'd probably do better with the one I'm already familiar with.

Then again, maybe that's exactly what contests are for - to try something new that you wouldn't usually choose.

I'll check out your #Brainf*cktor!


jaromir.sukuba wrote 12/03/2016 at 06:19 point

It is easy to migrate to PIC18. Instruction mnemonics are mostly the same (I think the only exceptions are the shifts), so you can use a PIC18 like a PIC16 and explore the new instructions as you need them.

But anyway, I wouldn't use PIC16Fxxx (like the 16F690), as it doesn't have any advantages over PIC16F1xxx (like the 16F1829), only a few disadvantages.


Eric Hertz wrote 12/02/2016 at 06:29 point

LOL Thanks for the shout-out! 

You're gonna go with SRAM and "all that logic" eh? Good times! Good call about it not needing to be in rows/cols *in the RAM* though... sequential is much simpler. Storing the syncs/porches there, as well... then feeding-back that sync to reset the counter. Great ideas reducing the logic-count dramatically. 

My 15y/o self is kicking himself for not thinking of 'em: https://sites.google.com/site/geekattempts/old-projects

And if you don't care about how long it takes to load, then no muxing necessary :) 

Cycle through three of 'em and have a short "animated-GIF" of your bullet-movie dice-experiments.


Ted Yapo wrote 12/02/2016 at 12:55 point

Yeah, I'm hopeful I can keep the counter and glue logic to maybe 5 chips (OK - 10 or less - you always end up with a few more).  

Loading time: who cares :-) It's going to take forever to render graphics on this thing anyway. So, the counter can own the address lines - the PIC can bitbang the counter clock to load the RAM sequentially. I only need to mux the counter clock and reset lines - and I probably won't even synchronize the clock mux - I'll accept a possible runt pulse on the first pixel.  The PIC has tri-state outputs which hook right onto the 8-bit data bus, so no muxing parts required there, either.
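The loading scheme described above can be modeled in a few lines of Python. This is only a behavioral sketch, not the PIC firmware: the class and method names are made up for illustration, and real hardware details such as the clock mux and write-strobe timing are left out.

```python
class FrameBufferModel:
    """Behavioral model: an external counter owns the SRAM address lines."""

    def __init__(self, size=512 * 1024):
        self.ram = bytearray(size)  # 512k x 8 SRAM
        self.addr = 0               # current value of the address counter

    def reset_counter(self):
        """Model asserting the counter reset line."""
        self.addr = 0

    def write_and_clock(self, value):
        """Drive one byte onto the data bus, then pulse the counter clock."""
        self.ram[self.addr] = value & 0xFF
        self.addr = (self.addr + 1) % len(self.ram)


def load_frame(fb, samples):
    """Load samples strictly in order; no address is ever sent."""
    fb.reset_counter()
    for sample in samples:
        fb.write_and_clock(sample)


fb = FrameBufferModel()
load_frame(fb, [0x3F, 0x00, 0x15])
print(fb.addr, list(fb.ram[:3]))
```

The point of the model is that the writer never supplies an address: data lands wherever the counter currently points, which is why only the counter's clock and reset lines need to be switched between the PIC and the dot clock.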


Ted Yapo wrote 12/04/2016 at 14:27 point

I was checking out your TTL LCD controller this morning - wow, that's pretty cool.  At 15 yrs?  Impressive.  I'd be psyched if I could get something like that working today :-) I think the biggest project I did in my teenage years was an LED oscilloscope (10x10).  There was an idea for it in one of those old Forrest Mims RadioShack books - I tweaked it a bunch, and it came in handy for audio-frequency work until I could afford a real one.  Unfortunately, nothing of it survived.


Eric Hertz wrote 12/04/2016 at 16:29 point

haha "I'd be psyched if I could get something like that working today" so would I... so would I... Yep, I tried an LED 'scope, never did get it running... 4017's don't cascade the way I thought. 10x10 woulda been the way to go!


danjovic wrote 12/02/2016 at 02:56 point

Your project looks awesome! As for refreshing the DRAM, I suggest you take a look at this article from Mr Chan: http://elm-chan.org/works/vp/report.html


Ted Yapo wrote 12/02/2016 at 03:00 point

Thanks!  I found 512k x 8 static RAMs - no refresh required!

https://www.digikey.com/scripts/DkSearch/dksus.dll?Detail&itemSeq=213190055&uq=636162090757737599

I originally wanted to use 8 of them and implement page-flipping for animation, but at $4 each, I figured I'd start with one :-)


K.C. Lee wrote 12/02/2016 at 04:04 point

Or you can use SPI RAM.  Save a lot of CPU cycles as it has built-in address generation and can shift out bit(s) sequentially.  There is a 2-bit and a 4-bit mode.

I originally started off the project looking at that approach, but I went the other direction rendering everything live from the CPU.

https://hackaday.io/project/9992-low-cost-vga-terminal-module/log/33436-spi-ram-buffer-idea


Ted Yapo wrote 12/02/2016 at 04:18 point

Thanks - I hadn't seen that project before.  Interesting stuff - I'm going to have to sift through it - looks like good info.

I had some SPI RAMs in my DigiKey cart at one point this week, but then thought about those nice, fast, socketed DIP cache RAMs I used to see on motherboards.  The 8-bit parallel outputs hooked me.

Three 74HC393's can count to 2^19 no problem - and no CPU cycles :-)
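As a sanity check on the counter width, the sketch below works through the arithmetic. It assumes the standard 640x480 at 60 Hz timing of 800 pixel clocks per line and 525 lines per frame (blanking and sync included), one byte stored per pixel clock, and that each 74HC393 provides two 4-bit counters. The variable names are illustrative only.

```python
H_TOTAL = 800             # pixel clocks per line, including blanking (assumed standard timing)
V_TOTAL = 525             # lines per frame, including blanking (assumed standard timing)
DOT_CLOCK_HZ = 25_175_000
SRAM_BYTES = 512 * 1024   # 512k x 8 = 4 Mbit
COUNTER_BITS = 3 * 2 * 4  # three 74HC393s, two 4-bit counters each

samples_per_frame = H_TOTAL * V_TOTAL
address_bits_needed = (samples_per_frame - 1).bit_length()
refresh_hz = DOT_CLOCK_HZ / samples_per_frame

print(f"samples per frame:   {samples_per_frame}")
print(f"address bits needed: {address_bits_needed} (counter has {COUNTER_BITS})")
print(f"fits in SRAM:        {samples_per_frame <= SRAM_BYTES}")
print(f"refresh rate:        {refresh_hz:.2f} Hz")
```

With these numbers a whole frame, sync pulses included, is 420,000 samples. That needs 19 address bits and fits inside the 524,288-byte SRAM, so three '393s have bits to spare.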

I'm not worried about real-time on the PIC.  If it blanks the VGA for a few hours while it renders its pretty, pretty picture into RAM, so be it.


K.C. Lee wrote 12/02/2016 at 04:39 point

The traditional MUX between the processor and display buffer, the address generator, the RAM, etc. all add up to a lot of parts.  All of that can be done relatively easily with an SPI RAM.

You can use 74HC4040 - 12-bit ripple counter.

http://www.nxp.com/documents/data_sheet/74HC_HCT4040.pdf


Ted Yapo wrote 12/02/2016 at 12:32 point

Yes, I was looking at the '4040 counter.  I somehow mis-remembered that not all the count outputs were brought out to the pins, so I had ruled it out.  I was probably confusing it with the 74HC4060, which I had used before - 14 stages, but you don't get the first two outputs.  The 4040 looks good - and they make a 74VHC4040 in case I can't meet timing with the HC part.

Thanks for the brain reset!

