
Adding Some Texture

A project log for QuickSilver Neo: Open Source GPU

A 3D Graphics Accelerator for FPGAs

Ruud Schellekens 06/09/2016 at 23:11

Enough formulas and bit-counting, let's get to something more exciting: Texture Mapping!

Texture mapping is, at its core, pretty straightforward. You have an image (the texture) that you want to project onto your triangle. The texture has horizontal and vertical coordinates U and V. For a triangle, you can take the (U,V) coordinates at each vertex, and just interpolate to get the coordinates at each point within the triangle. Then, when drawing the triangle, instead of directly drawing a colour, we calculate the (U,V) coordinates, look up the corresponding texture element (texel), and use that colour to draw to the screen.
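To make that concrete, here's a minimal C sketch of this kind of interpolation using barycentric weights. This isn't QuickSilver's actual pipeline (the hardware works quite differently); the Vertex type and function names are made up for illustration:

```c
typedef struct { float x, y; float u, v; } Vertex;

/* Barycentric weights of point (px,py) with respect to triangle a,b,c. */
static void barycentric(const Vertex *a, const Vertex *b, const Vertex *c,
                        float px, float py,
                        float *wa, float *wb, float *wc)
{
    float area = (b->x - a->x) * (c->y - a->y)
               - (b->y - a->y) * (c->x - a->x);
    *wa = ((b->x - px) * (c->y - py) - (b->y - py) * (c->x - px)) / area;
    *wb = ((c->x - px) * (a->y - py) - (c->y - py) * (a->x - px)) / area;
    *wc = 1.0f - *wa - *wb;
}

/* Affine texture mapping: linearly interpolate U and V across the screen. */
static void affine_uv(const Vertex *a, const Vertex *b, const Vertex *c,
                      float px, float py, float *u, float *v)
{
    float wa, wb, wc;
    barycentric(a, b, c, px, py, &wa, &wb, &wc);
    *u = wa * a->u + wb * b->u + wc * c->u;
    *v = wa * a->v + wb * b->v + wc * c->v;
}
```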

Pretty simple, right? We can actually already do this in QuickSilver: just use the R and G values as the U and V coordinates, put some memory nearby to hold our texture, look up the texel, and output to the screen. Unfortunately, this approach is both theoretically incorrect and practically infeasible. We'll ignore the theory, and solve the practical.
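In software terms, that naive lookup amounts to something like this sketch (the flat row-major layout and the fetch_texel name are my own illustration, sized to match the texture budget discussed below):

```c
#include <stdint.h>

#define TEX_W 256
#define TEX_H 128

static uint8_t texture[TEX_W * TEX_H];   /* 8-bit texels, row-major */

/* Naive lookup: reuse the interpolated R and G channels as U and V,
 * wrap them into the texture, and fetch the texel. */
static uint8_t fetch_texel(uint8_t r_as_u, uint8_t g_as_v)
{
    unsigned u = r_as_u % TEX_W;
    unsigned v = g_as_v % TEX_H;
    return texture[v * TEX_W + u];
}
```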

Alright, so that is a bit of a bold claim; let me explain the theoretical incorrectness and why we'll ignore it. The simple linear interpolation I talked about above works great for triangles that are viewed straight on, flat relative to the screen, but it doesn't work for triangles that have some depth to them. Compare it to a brick wall. When you take a photograph of the wall straight on, each brick will be the same size in the photograph. Take a picture of the same wall from an angle, though, and the bricks that are close will appear larger than those that are further away. A simple linear interpolation would render each brick as equally wide on the photograph. This is called "affine texture mapping". Wikipedia has a nice image comparing it to the perspective-correct version.

The trick to correcting this uses an interesting property: while you can't interpolate Z, or the U,V coordinates, linearly across the screen, you can linearly interpolate 1/Z, U/Z and V/Z. Then, when you want to render a pixel, you just divide the interpolated values of U/Z and V/Z by the interpolated 1/Z to get the correct U and V.
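In code, the correction looks something like this sketch, continuing the hypothetical barycentric weights from above; note the two divisions at the end, which have to happen for every pixel:

```c
typedef struct { float x, y, z; float u, v; } Vertex3D;

/* Perspective-correct texture mapping: U/Z, V/Z and 1/Z interpolate
 * linearly in screen space, so blend those and divide per pixel. */
static void perspective_uv(const Vertex3D *a, const Vertex3D *b,
                           const Vertex3D *c,
                           float wa, float wb, float wc,  /* barycentric */
                           float *u, float *v)
{
    float inv_z = wa / a->z + wb / b->z + wc / c->z;
    float u_z   = wa * (a->u / a->z) + wb * (b->u / b->z) + wc * (c->u / c->z);
    float v_z   = wa * (a->v / a->z) + wb * (b->v / b->z) + wc * (c->v / c->z);
    *u = u_z / inv_z;   /* these per-pixel divisions are the real cost */
    *v = v_z / inv_z;
}
```

In practice the per-vertex U/Z, V/Z and 1/Z terms would be computed once at triangle setup, so the per-pixel cost comes down to that final divide.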

We're not going to do that. It's a simple fix, I know, but it requires two additional divisions (or one reciprocal and two multiplications) for every pixel, and dividers in hardware are bulky and slow. Instead, we'll follow the great example of the Sony PlayStation and Sega Saturn and just completely ignore perspective correctness, rendering everything with affine mapping.

The second part of the problem is more practical (with loads more little details further down the line). Storage and bandwidth limitations once again rear their ugly little heads. Let's start by calculating how much of each we can expect to need.

Seeing how the VGA output on the Nexys 2 board only has 8 bits per pixel, I'll use the same limit for the textures. As a minimum, I'd like to display one unique texel per pixel. The 640x480 resolution runs at a pixel clock of 25MHz, but the actual time between scanlines is a bit longer due to the horizontal blanking time. Including blanking, each scanline is equivalent to 800 pixels, so we have (800/640) / 25MHz = 50ns per texel, or 20MByte/s.
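The same arithmetic as a quick sanity check:

```c
#include <stdio.h>

/* Back-of-the-envelope texel bandwidth for 640x480 VGA timing. */
int main(void)
{
    const double pixel_clock_hz = 25e6;  /* 25 MHz VGA pixel clock         */
    const double line_total     = 800;   /* pixels per line incl. blanking */
    const double line_visible   = 640;   /* visible pixels per line        */

    double ns_per_texel = (line_total / line_visible) / pixel_clock_hz * 1e9;
    double mbyte_per_s  = 1000.0 / ns_per_texel;   /* 1 byte per texel */

    printf("%.0f ns per texel, %.0f MByte/s\n", ns_per_texel, mbyte_per_s);
    return 0;   /* prints: 50 ns per texel, 20 MByte/s */
}
```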

As always, we can choose between two extremes: the very fast but small Block RAM, or the large but slow on-board RAM. The BRAM is easy to interface and can easily meet the bandwidth requirement. But even in the most optimistic situation, we can only spare 16 BRAMs for texture storage, which at 2KByte per BRAM means a total of 32KByte of texture memory, or a total texture space of about 256x128 texels. That's small even for a single texture, and this space would have to be shared among all textures on the screen.

The on-board RAM is much bigger: at 16MByte it could fit a whopping 4096x4096 texels. Its random-access latency is rather high at 70ns, though. That's too slow to read textures at the rate we would like, and that's before we've even considered doing anything else with the RAM.
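Putting the two options side by side, using the figures above:

```c
#include <stdio.h>

/* Compare both texture-memory options against the 50 ns/texel budget. */
int main(void)
{
    /* Block RAM: 16 BRAMs x 2 KByte = 32 KByte, one byte per texel. */
    unsigned bram_texels = 16 * 2048;
    printf("BRAM: %u texels (256 x 128), fast enough\n", bram_texels);

    /* On-board RAM: 16 MByte = 4096 x 4096 texels, but ~70 ns per
     * random access, well over the 50 ns/texel budget. */
    unsigned dram_texels = 16u * 1024u * 1024u;
    printf("DRAM: %u texels (4096 x 4096), too slow\n", dram_texels);
    return 0;
}
```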

In the next post, I'll talk about how texture compression can be applied to reduce both storage and bandwidth requirements for textures. After that I'll talk about how I use caching to combine the benefits of both types of memory.
