Recently I learned that an older video game console, the Nintendo 64, would generate 480i interlaced video signals from a 240-row framebuffer by following exactly these recommendations—the alternating 0.25, 0.75 weights applied to each row. The way interlacing works, you’d end up with [0.25, 0.75] for one field and then [0.75, 0.25] for the other field.
Some other consoles did not address this issue and fine lines would vibrate up and down.
(You could also put the N64 in “interlaced mode” and generate a proper 480i signal instead of asking the hardware to upsample a 240p buffer, but the performance requirements were very strict and the hardware was not very powerful.)
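Roughly, in numpy terms, the field generation described above looks something like this (a sketch only - the exact pairing of rows and the field ordering are my guesses, not taken from any N64 documentation):

    import numpy as np

    def make_fields(framebuffer):
        # Blend adjacent rows of a 240-row buffer into two 240-line fields,
        # using the alternating [0.25, 0.75] / [0.75, 0.25] weights described above.
        fb = framebuffer.astype(np.float32)
        padded = np.vstack([fb, fb[-1:]])   # repeat the last row so every row has a neighbour below
        field_a = 0.25 * padded[:-1] + 0.75 * padded[1:]
        field_b = 0.75 * padded[:-1] + 0.25 * padded[1:]
        return field_a, field_b

    even_field, odd_field = make_fields(np.random.rand(240, 320))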
The half-pixel offset is also a Gaussian convolution (the simplest one), with filter [1/2, 1/2]. It's centered on the corners between pixels, though, so it's not an obvious way to describe it.
You can calculate Gaussian filters by scaling a particular row k of Pascal's triangle to sum to 1. Or rather, a square pyramid for the 2D case.
You can calculate any particular Gaussian (i.e. binomial) convolution by iterating this "half pixel offset" operation (averaging the adjacent pixels) k times.
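For concreteness, here's a quick numpy check of that equivalence (the function names are mine):

    import numpy as np

    def binomial_filter(k):
        # Row k of Pascal's triangle, scaled to sum to 1.
        row = np.array([1.0])
        for _ in range(k):
            row = np.convolve(row, [1.0, 1.0])
        return row / row.sum()

    def iterated_half_pixel(k):
        # Apply the "half pixel offset" averaging [1/2, 1/2] to an impulse k times.
        f = np.array([1.0])
        for _ in range(k):
            f = np.convolve(f, [0.5, 0.5])
        return f

    print(binomial_filter(4))       # [0.0625 0.25 0.375 0.25 0.0625]
    print(iterated_half_pixel(4))   # same coefficients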
In image processing we often call these "binomial filters" because of this binomial distribution / Pascal's triangle connection. The cool thing is that a) they sum to powers of 2, which makes the normalization division in fixed point super efficient - just a bit shift! - and b) thanks to the central limit theorem, they approximate a Gaussian better and better as the order grows.
A very useful tool in image processing, especially on DSPs, where fixed-point / shorter integer arithmetic can be faster than floats or allow wider vector widths.
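As an illustration of the bit-shift trick (the [1, 4, 6, 4, 1] kernel and the 8-bit types below are just one example I picked, not anything specific to a particular DSP):

    import numpy as np

    def binomial_blur_u8(row):
        # 1D binomial blur with [1, 4, 6, 4, 1] / 16; the kernel sums to 2^4,
        # so normalization is a right shift by 4 instead of a division.
        x = row.astype(np.uint16)                  # widen so the weighted sums don't overflow
        p = np.pad(x, 2, mode='edge')
        acc = p[:-4] + 4 * p[1:-3] + 6 * p[2:-2] + 4 * p[3:-1] + p[4:]
        return ((acc + 8) >> 4).astype(np.uint8)   # +8 rounds to nearest before the shift

    print(binomial_blur_u8(np.array([0, 0, 255, 0, 0], dtype=np.uint8)))   # [16 64 96 64 16]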
I prefer to treat pixel coordinates as lying on the integer coordinates, and if you really need to imagine a box around a pixel, treat it as stretching from -0.5 to +0.5 around it. This makes different resolutions match up better, particularly when mixing raster and vector graphics.
No, this isn't a misconception. Yes, a pixel is (theoretically) a point sample, but it matters where we site those samples.
> I prefer to treat pixel coordinates as being on the integer coordinates
If we have a continuous grid, defined by point samples, where everywhere outside of the point samples is undefined (classic sampling theorem), we need to choose the location of those points. Pixels themselves might not be boxes, but during the process of rasterization (sampling an abstract mathematical shape to define coverage), we treat pixels as little squares we compute the area over.
For a lot of good mathematical reasons, we often will choose to site these points at +0.5,+0.5 inside the box [0], and this is what all GPUs do. This is well-defined.
On top of this, we can define the rest of the grid -- usually through some form of interpolation, as the article suggests. When resampling, we need to be extra careful about the location of our pixel samples here.
Please don't place pixel point-sample centers on the "integer grid lines"; that leads to a lot of trouble when doing all sorts of things.
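In code, the convention being argued for here is just (my helper names, but the +0.5 mapping itself is the standard GPU one described above):

    def pixel_to_uv(i, size):
        # Normalized [0, 1] texture coordinate of pixel i's center under the
        # "centers at +0.5" convention: the image spans [0, size], pixel 0 is
        # centered at 0.5, pixel size-1 at size-0.5, nothing hangs off the edge.
        return (i + 0.5) / size

    def uv_to_pixel(u, size):
        # Inverse mapping: continuous pixel-space position of a normalized coordinate.
        return u * size - 0.5

    print([pixel_to_uv(i, 4) for i in range(4)])   # [0.125, 0.375, 0.625, 0.875]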
Hey, I'm the author of the post, and I wish I could simply write that I disagree with you and explain how - but you haven't explained how integer coordinates help align the grids, so it's hard for me to disagree with any specifics.
I show in my post how integer coordinates work and what the problems are ("hanging" pixels on the right, problems with resampling). Integer coordinates lead to all those nasty resampling bugs like the original TensorFlow one.
There is a reason why all the libraries and GPUs switched to this convention: it's the sanest from the POV of signal processing - and for the most common box reconstruction filter.
I obviously agree with Alvy Ray Smith and his work is super influential. I like his essay (as it's not really a paper!) a lot, and I think that programmers don't think about reconstruction filters enough (I have a few blog posts touching lightly on the topic as well). But this is very different - whether a pixel is a square or not, and the optimal reconstruction filters for filtering Monte Carlo renderings, don't answer how to align the pixel grids when resampling images, or which conventions are useful.
Finally, I also emphasize at least a few times that while my "default" convention of half pixel offsets is reasonable (i.e. you can multiply UVs by 2 and get decent resampling behavior with any filter - including windowed sincs, Cat-Rom etc), it's just important to understand the one your signal is already represented in and follow it.
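To make the "multiply UVs by 2" remark concrete, here's a minimal 1D linear upsample written against the half-pixel-center convention (the helper is mine; any of the other filters mentioned would slot into the same coordinate mapping):

    import numpy as np

    def upsample_linear_halfpixel(src, scale):
        # Output pixel i's center maps to the same normalized coordinate in the
        # source, i.e. source position (i + 0.5) / scale - 0.5 in index space.
        src = np.asarray(src, dtype=float)
        n_out = int(round(len(src) * scale))
        x = (np.arange(n_out) + 0.5) / scale - 0.5
        x = np.clip(x, 0, len(src) - 1)            # clamp-to-edge at the borders
        i0 = np.floor(x).astype(int)
        i1 = np.minimum(i0 + 1, len(src) - 1)
        t = x - i0
        return (1 - t) * src[i0] + t * src[i1]

    print(upsample_linear_halfpixel([0.0, 1.0], 2))   # -> 0, 0.25, 0.75, 1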
Sorry, my reply missed an important point - you did an excellent job of analyzing how GPU shaders work, and why they work that way. And for that I thank you. But it just means that I disagree with the entire GPU industry.
My reasoning was in the reply, but perhaps it was too subtle. If you have an infinitesimally thin vector line running through a raster pixel, where should it intersect the pixel? I contend that it should run through the center of the raster pixel, so that it continues to align if you make the vector line thicker. If your raster pixel is at (0,0), you don't want to have to offset your vector line to (0.5,0.5) to make it match. That way lies madness, I know from painful experience.
Also when you resize you don't want your input and output coordinates to line up exactly. Why? Because your output should be independent of your input size. For an integer multiple it might not matter so much, but consider for example resizing to 2.5x. If you double the size of your input in both directions, the upper-left corner of the output shouldn't change just because of alignment issues. So for example, if you're doubling the size of a 1-D image with samples at [0, 1] you should be interpolating the points at [-0.25, 0.25, 0.75, 1.25]. That way when you double the input to [0, 1, 2, 3] your output will also double to [-0.25, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.25]. Of course when you pass that output to the next stage in your pipeline the coordinates are on a new axis and revert to integers again.
I've given this subject years of thought, and I back it up with sample code that I use for my own private image resizing. Here's your example downsized to 50% then upsized to 200% using a Lanczos-5 filter: http://marksblog.com/share/LanczosDownUp.png. No half-pixel offsets here.
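For what it's worth, the sample positions listed above drop out of a one-line formula (my phrasing of it; whether it matches the implementation behind that image is an assumption on my part):

    def resize_sample_positions(n_in, n_out):
        # Input-space positions sampled when resizing n_in samples to n_out.
        return [(i + 0.5) * n_in / n_out - 0.5 for i in range(n_out)]

    print(resize_sample_positions(2, 4))   # [-0.25, 0.25, 0.75, 1.25]
    print(resize_sample_positions(4, 8))   # [-0.25, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.25]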
Well, you can model pixels as you wish and you can translate coordinates as you wish, as long as you are consistent with your assumptions. Modelling pixels as points (i.e., delta distributions) instead of squares or other continuous distributions makes everything more prone to aliasing artifacts and is arguably less physically justified, but can sometimes be the right thing.
Did you even read the link I gave? Sampling theory relies on treating samples as sizeless points rather than rectangles, and aliasing is completely predicted by sampling theory.
Maybe we're just using different words. To me the "shape" of a pixel is the "shape" of the sampling filter you're using, and assuming that, I stand by my first comment: you can choose whatever distribution you want (provided that you can couple it with the function space you're choosing for the non-sampled images, but let's avoid technicalities). Unless I am mistaken, this aspect is not discussed in that essay, only lightly touched on. They speak more about reconstruction.
In other words, let's say a non-sampled (black and white, for simplicity) image is a function [0,1]^2 -> [0,1]. You have to choose a function space, like C^0, L^1, L^2, L^\infty; there are many. Whatever function space you choose, it will probably be infinite-dimensional, so you want to sample it to reduce it to a finite dimension and be able to represent it in a computer. Usually you do that by convolving it with a kernel. Using a delta kernel (i.e., taking the value of the function at the location of a pixel) is a choice, but not the only one, and certainly not the one that models physical processes like a camera or a scanner (as the essay itself discusses).
Once you have chosen your sampling filter, you can proceed to choose your reconstruction filter, and again you have a lot of freedom. One property that you will reasonably want to retain is that the reconstruction filter is a right inverse of the sampling filter, i.e., if you take a sampled image, reconstruct the non-sampled image and then sample it again, you should get the same image you started with. But since the sampling filter is a map from an infinite-dimensional space to a finite-dimensional one, it has a lot of right inverses, and you can choose among them.
Depending on these choices you get different aliasing and interpolation results. My point is that there is no hardcoded predefined choice: depending on your models you will have different outcomes, and you have to choose wisely depending on what you want. This is probably not far from what that essay says, but then I don't see why its conclusion is that a pixel is a sizeless point: it can be whatever you want.
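As a tiny concrete instance of the right-inverse property (box sampling paired with piecewise-constant reconstruction; this pairing is just one example of many possible choices):

    import numpy as np

    def box_sample(f, n):
        # Sample a function on [0, 1] with n box filters: average f over each pixel's
        # cell, approximated here with a fine midpoint rule.
        k = 1000
        return np.array([np.mean(f((i + (np.arange(k) + 0.5) / k) / n)) for i in range(n)])

    def reconstruct_nearest(samples):
        # Piecewise-constant reconstruction: constant on each pixel's cell of [0, 1].
        s = np.asarray(samples, dtype=float)
        n = len(s)
        return lambda x: s[np.clip((np.asarray(x) * n).astype(int), 0, n - 1)]

    samples = np.array([0.2, 0.9, 0.4, 0.7])
    roundtrip = box_sample(reconstruct_nearest(samples), len(samples))
    print(np.allclose(roundtrip, samples))   # True: sample -> reconstruct -> sample is the identity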