Tuesday, April 21, 2015

Calculation of Average Luminance of Texture

When we implement a tone-mapping pipeline, we always need to calculate the average luminance of a rendered texture, which is usually in a float format. Generally, there are two algorithms for calculating the average luminance: the geometric mean and the arithmetic mean.
From the wiki, suppose we have a data set containing the values x1, x2, …, xn. The geometric mean G is given by:

G = (x1 · x2 · … · xn)^(1/n)

The arithmetic mean A is defined by the formula:

A = (x1 + x2 + … + xn) / n

We let xi denote the ith texel of the texture. Apparently, the arithmetic mean is easy to implement, but how about the geometric mean? Yes, you know it: we employ the so-called log average to get the geometric mean. The process is like this:

G = exp( (1/n) · (log x1 + log x2 + … + log xn) )

Hence, we can calculate the geometric mean in the same way as the arithmetic mean, except for appending a log and an exp operation.
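As a concrete sketch of the two means (plain Python rather than shader code; the small delta guarding log(0) for pure-black texels is my assumption, borrowed from common tone-mapping practice, not something stated above):

```python
import math

def arithmetic_mean(texels):
    return sum(texels) / len(texels)

def log_average(texels, delta=1e-6):
    # Sum the logs instead of multiplying the raw values, which would
    # underflow/overflow on a large texture; delta guards log(0) for
    # black texels (assumed here, as in typical tone-mapping operators).
    n = len(texels)
    return math.exp(sum(math.log(delta + x) for x in texels) / n)
```

For texels [1, 2, 4, 8] the arithmetic mean is 3.75 while the log average recovers the geometric mean 64^(1/4) ≈ 2.83.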

First we explore the calculation of the arithmetic mean of a texture; the only difference for the geometric mean is the extra log and exp operations in the shader.
1.  Based on DX11
There are three methods to calculate the average luminance:
(1)   D3D API: with the function “GenerateMips”, you first translate the texels (RGB) to luminance, then call “GenerateMips” and read the average luminance back from the 1x1 mip level. This works because mipmaps are generated with a box filter, which for now is fine on D3D.
(2)   Compute shader: with the new DX11 shader stage, the compute shader, we can write a GPGPU reduction to calculate the average luminance. It is easy to implement, just a bit more work than calling “GenerateMips”.
(3)   Rendering multi-pass: this primarily applies to DX9, so we explore it there.
2.  Based on DX9
As far as I know, we have only one approach here: rendering multiple passes.
(1)   Rendering multi-pass: with DX9 we have neither “GenerateMips” nor compute shaders, so we calculate the average luminance with repeated rendering passes, exploiting the GPU’s linear (bilinear) sampling to down-sample pass by pass. Is a simple pixel shader enough? Yes, but you should be cautious with the sampling. Suppose we have an original rendering texture of size 1024x1024; the usual queue of down-sample passes would be 1024x1024 -> 256x256 -> 64x64 -> 16x16 -> 4x4 -> 1x1. A quarter of the size is chosen at each pass so that we get down to 1x1 quickly; the only extra work is that you have to place the filtering taps yourself in the shader. For instance, in the 1024x1024 -> 256x256 pass, a texel of the 256x256 target is just the average of 16 source texels, as illustrated below:

Figure 1: down sampling
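The pass chain can be simulated on the CPU to convince ourselves that repeated 4x down-sampling really converges to the arithmetic mean (a Python sketch; square power-of-four texture sizes are assumed for simplicity):

```python
def downsample_pass(tex):
    # One rendering pass: each destination texel is the average of a
    # 4x4 block of source texels (16 values; on the GPU this would be
    # 4 bilinear taps instead of 16 point taps).
    n = len(tex)
    out = [[0.0] * (n // 4) for _ in range(n // 4)]
    for y in range(n // 4):
        for x in range(n // 4):
            block = [tex[4 * y + j][4 * x + i]
                     for j in range(4) for i in range(4)]
            out[y][x] = sum(block) / 16.0
    return out

def average_luminance(tex):
    # Repeat until a single texel remains, e.g. 1024 -> 256 -> ... -> 1.
    while len(tex) > 1:
        tex = downsample_pass(tex)
    return tex[0][0]
```

Running this on a 16x16 texture reproduces the exact arithmetic mean of all 256 texels.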
where the GPU’s linear sampling works like this:

Figure 2: linear sampling
Hence, in the shader we do something like the following (assuming texCoord addresses the top-left corner of the 4x4 source block; each bilinear tap then lands on an interior texel corner and averages a 2x2 quad, so four taps cover all 16 texels):

    float2 offset[4] =
    {
        float2(0.0f, 0.0f),
        float2(1.0f, 0.0f),
        float2(0.0f, 1.0f),
        float2(1.0f, 1.0f)
    };

    float4 oColor = 0.0f;
    for (int i = 0; i < 4; ++i)
    {
        // Each tap averages a 2x2 quad of source texels.
        oColor += tex2D(baseTex, texCoord.xy + (offset[i] * 2.0f + 1.0f) * invTexSize.xy);
    }
    oColor *= 0.25f;
Note: we need the half-pixel offset for DX9 render targets, but not on DX11.
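The whole down-sampling trick relies on one property: a bilinear tap placed exactly on a shared texel corner returns the plain average of the surrounding 2x2 quad. A small sketch of that filter (coordinates in texel units, i.e. uv * textureSize, are my assumption for clarity; the footprint is assumed to stay inside the texture):

```python
def bilinear_sample(tex, u, v):
    # Texel centers sit at (i + 0.5, j + 0.5) in texel units.
    x, y = u - 0.5, v - 0.5
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x0 + 1] * fx
    bottom = tex[y0 + 1][x0] * (1 - fx) + tex[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy
```

Sampling a 2x2 texture at its central corner (1.0, 1.0) yields the average of all four texels, while sampling at a texel center returns that texel unchanged.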

Monday, March 9, 2015

Deep in Perspective Projection Transformation

A typical GPU rendering pipeline is just the process of creating a 2D image from 3D models and showing it on the computer monitor. The perspective projection transformation is fundamental to that process (3D -> 2D). Usually we use a 4x4 matrix to perform the transformation. The mainstream 3D APIs (OpenGL/D3D) provide functions to produce the matrix, yet strangely enough, very little information about it can be found in the function specs or in formal books. It can seem mysterious to newcomers stepping into the real-time rendering field, who wonder: why does the projection matrix look like that? How is it derived? This blog will try to answer these questions and give you a comprehensive guide to the perspective projection transformation.


Figure 1: Project M on the image plane (at m).
2. Essential Background Knowledge
Basically, the principle of perspective projection is rather simple. In Fig. 1 you can work out the point m from the properties of similar triangles; it’s easy. The story, however, does not stop here.
Our real purpose is to encode this projection process into a matrix, so that projecting a point onto the image plane can be done via a basic matrix multiplication. Before investigating the projection transformation, we first review homogeneous coordinates briefly.


Figure 2: Point and Vector in coordinate system
In geometric algebra, subtracting two points (P - O) yields a vector, as depicted in Figure 2. The vector is called a position vector when the point O is the origin. Algebraically, we can’t distinguish between a point and a vector if we are only given a triplet of three components such as (2, 3, 5). With homogeneous coordinates, points and vectors can be represented separately and clearly.
Conversion rules between ordinary coordinates and homogeneous coordinates:
(1).  From ordinary coordinates to homogeneous coordinates:
       If (x, y, z) is a point, it becomes (x, y, z, 1), so the point P is (2, 3, 5, 1);
       If (x, y, z) is a vector, it becomes (x, y, z, 0), so the vector P - O is (2, 3, 5, 0);
(2).  A homogeneous coordinate (x, y, z, w) is equivalent to (x/w, y/w, z/w, 1), where w is not 0.
On the other hand, homogeneous coordinates make it easy to combine translation with the other affine transforms (rotation, reflection, shear, scale, …) into a single matrix multiplication.
“Homogeneous coordinates are an important tool of computer graphics: they can be used both to clearly distinguish between vectors and points, and to make affine (linear) geometric transformations easier.” - F.S. Hill, Jr.
Definitely, the projection transformation is convenient as a matrix multiplication with homogeneous coordinates:
[x' y' z' w'] = [x y z w] * Mp,     where Mp is the projection matrix
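A tiny numeric sketch of these rules (plain Python; the translation matrix is my own example of an affine transform in the row-vector convention used above):

```python
def from_homogeneous(h):
    x, y, z, w = h
    if w == 0.0:
        return (x, y, z)          # a vector: w stays 0 under translation
    return (x / w, y / w, z / w)  # a point: divide through by w

def mul_row_vec(v, m):
    # Row-vector convention from the text: v' = v * M (4x4 matrix).
    return tuple(sum(v[i] * m[i][j] for i in range(4)) for j in range(4))

def translation(tx, ty, tz):
    # Translation as a single matrix multiply, possible only because of
    # the homogeneous w component.
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [tx, ty, tz, 1]]
```

Multiplying the point (2, 3, 5, 1) by a translation moves it, while the vector (2, 3, 5, 0) passes through unchanged, exactly as the w = 1 / w = 0 rule promises.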
Next, let’s review the GPU’s general transformation pipeline and its interpolation. Here are some notations:
OCS: Object Coordinate System
WCS: World Coordinate System
VCS: View Coordinate System
CCS: Clip Coordinate System
NDCS: Normalized Device Coordinate System
CVV: Canonical View Volume


Figure 3: Transformation

Figure 4: Interpolation
In order to eliminate redundant workload, the GPU has to clip objects against the six planes of the view frustum, but clipping against arbitrary 3D planes requires considerable computation. For fast clipping, the GPU transforms the view volume into a CVV (Canonical View Volume) against which clipping is easy. The coordinate system of the CVV is referred to as the normalized device coordinate system, or NDCS. The CVV is a cuboid in D3D where the x- and y-coordinates are within the range [-1, 1] and the z-coordinate is within [0, 1]. Points whose projected coordinates are not contained in this range are invisible and thereby are not drawn. Actually, the projection transformation on the GPU is a composite of two steps:
(1). Transform the vertices from the view frustum into the CVV
(2). Perspective divide
If you have a vertex shader, you can carry out the transformation however you like, but remember that you must not implement the perspective divide (usually the division by w) yourself; the perspective divide is carried out by the GPU eventually. Consequently, the vertex (x, y, z, w) produced by the vertex shader is in the CCS (Clip Coordinate System), where the x- and y-coordinates are within the range [-w, w] and the z-coordinate is within [0, w].

Figure 5: CVV
Another GPU operation to mention is interpolation. As we know, the pixel shader is invoked per pixel with the attributes output from the vertex shader, and the GPU interpolates the attributes of each pixel from the vertices’ attributes. This stage is generally referred to as the rasterizer: it performs the interpolation of attributes for the pixels that contribute to the final image. As depicted in Figure 4, the attributes include position, texture coordinates, color, tangent and so on. All of these attributes are consumed by the pixel shader for data processing such as arithmetic calculation and texture sampling.
The rasterizer runs after the geometric transformations, so you can imagine that it performs its interpolation on the image plane. As depicted in Fig. 6, after the projection transformation of the spatial line AB we get the line AsBs on the image plane, the red line in the picture. We already know the screen coordinates of As and Bs, and all of their attributes; what we want are the attributes of any position (pixel) on the line, from the screen line equation and the vertex attributes. Obviously, interpolation should be done on the spatial line AB, not on the screen line AsBs, because the projection transformation is not linear. This is the essential reason the GPU uses the perspective divide and perspective correction.
From the similar triangles in Fig. 6 we get X/Z = Xs/d, where d is constant. That is to say, Xs is linear in X/Z; in other words, Xs·Z is linear in X. The attributes are linear in X, so the attributes divided by Z are linear in Xs. Consequently, if the vertex attributes are divided by the real depth value (view-space Z), we can interpolate them linearly in the screen coordinate Xs to get the value at any position (pixel); finally we multiply by the real depth value (view-space Z) again, which is called perspective correction, and the correct result is obtained.

Figure 6: Projection Interpolation
Let’s conclude the derivation:
Condition (1): X/Z = Xs/d, with d constant -> Xs is linear in X/Z
Condition (2): attributes are linear in X (basic rule)
Condition (1) + (2) -> attributes/Z are linear in Xs.
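The conclusion above can be checked numerically: interpolate attribute/Z and 1/Z linearly in screen space, then divide (a Python sketch; the parameter t along the projected segment stands in for the screen coordinate Xs):

```python
def perspective_correct(a0, z0, a1, z1, t):
    # Interpolate attribute/Z and 1/Z linearly in screen space, then
    # divide to recover the true attribute at that pixel.
    a_over_z = (a0 / z0) * (1 - t) + (a1 / z1) * t
    inv_z = (1 / z0) * (1 - t) + (1 / z1) * t
    return a_over_z / inv_z
```

With endpoints at depths 1 and 3 carrying attribute values 0 and 6, the screen-space midpoint corresponds to depth 1.5 (not 2), where the attribute is 1.5; a naive screen-space lerp would wrongly give 3.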
3. Perspective Projection Matrix Derivation
Assume we put the image plane on the near plane; let N denote the distance of the near plane and F the distance of the far plane. After the projection we have the point Ps = (NX/Z, NY/Z, N). The z-coordinate N is useless after the projection, so we can store some useful value in the projected z-coordinate instead. Remembering the property of homogeneous coordinates, we could substitute 1/Z for the z-coordinate N, making Ps = (NX/Z, NY/Z, 1/Z). As you might suspect, it isn’t quite that simple: since attributes interpolate linearly alongside 1/Z, we set the z-coordinate to (aZ + b)/Z. The reasons for choosing this expression are as follows:
      (1).   Generally, the GPU employs a z-buffer to compare the relative depth of objects, and (aZ + b)/Z can be stored in the z-buffer to implement that comparison. For z-buffer purposes the perspective correction never needs to be undone, because (aZ + b)/Z is monotonic in Z, so the comparison still gives the right answer.
      (2).   It is easy to represent with homogeneous coordinates and matrix multiplication.
      (3).   The CVV’s z-coordinate is within the range [0, 1], so we can find suitable values (a, b) such that (aZ + b)/Z equals 0 at Z = N and 1 at Z = F.
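The two boundary conditions z'(N) = 0 and z'(F) = 1 pin down a and b; a quick numeric sketch (the values N = 1 and F = 100 are arbitrary examples):

```python
def depth_coeffs(n, f):
    # Solve z'(Z) = (a*Z + b)/Z with z'(n) = 0 and z'(f) = 1:
    #   a*n + b = 0   and   a*f + b = f
    a = f / (f - n)
    b = -n * f / (f - n)
    return a, b
```

Besides hitting 0 and 1 at the near and far planes, (aZ + b)/Z increases monotonically in Z, which is what makes it usable for depth comparison.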
Now our projected point Ps is (NX/Z, NY/Z, (aZ + b)/Z), and the corresponding homogeneous coordinate is (NX/Z, NY/Z, (aZ + b)/Z, 1.0), which simplifies to (NX, NY, aZ + b, Z). The perspective divide just takes (NX, NY, aZ + b, Z) back to (NX/Z, NY/Z, (aZ + b)/Z, 1.0) by dividing by Z.
(1) Actually, the GPU interpolates (aZ + b)/Z directly in the screen coordinate Xs and stores it in the z-buffer.
(2) Perspective divide and perspective correction are different things in this blog: the first is the division by Wc (view-space Z), the second the multiplication by 1/Ws, where Ws = 1/Wc.
According to the boundary conditions (a·N + b)/N = 0 and (a·F + b)/F = 1, we solve:

a = F/(F - N),    b = -N·F/(F - N)

Putting it all together, the first perspective projection matrix (row-vector convention, [x y z w] * Mp) is:

Mp = | N    0    0            0 |
     | 0    N    0            0 |
     | 0    0    F/(F-N)      1 |
     | 0    0    -N·F/(F-N)   0 |
Similar to the z-coordinate derivation, the x- and y-coordinates are mapped linearly from the near-plane window into [-1, 1]:

x_ndc = (x_p - left)/(right - left) · 2 - 1,    y_ndc = (y_p - bottom)/(top - bottom) · 2 - 1

where x_p = NX/Z and y_p = NY/Z. This can be rearranged into:

x_ndc · Z = (2N/(right - left)) · X - ((right + left)/(right - left)) · Z
y_ndc · Z = (2N/(top - bottom)) · Y - ((top + bottom)/(top - bottom)) · Z
Figure 7 illustrates the terms top, bottom, left and right and the transformation into the CVV.
Figure 7: CVV transformation
Two conditions need to be taken into account:
(1) The center of the image plane is exactly the center of the x-y window
(2) The center is offset
For condition (1), we have left = -right and bottom = -top, so the (right + left) and (top + bottom) terms vanish.
So the perspective transformation matrix can be derived as follows:

Mp = | N/right   0       0            0 |
     | 0         N/top   0            0 |
     | 0         0       F/(F-N)      1 |
     | 0         0       -N·F/(F-N)   0 |
Moreover, by basic trigonometry these parameters (top, bottom, left, right) can be replaced with the FOV (Field of View); the relation, depicted in Fig. 8, is:

top = N · tan(fovY/2),    right = top · aspect,    so N/top = cot(fovY/2)
Figure 8: FOV
For condition (2), we keep the general offset terms:

x_ndc · Z = (2N/(r-l)) · X - ((r+l)/(r-l)) · Z
y_ndc · Z = (2N/(t-b)) · Y - ((t+b)/(t-b)) · Z

Consequently, the perspective transformation matrix is (writing r, l, t, b for right, left, top, bottom):

Mp = | 2N/(r-l)       0              0            0 |
     | 0              2N/(t-b)       0            0 |
     | -(r+l)/(r-l)   -(t+b)/(t-b)   F/(F-N)      1 |
     | 0              0              -N·F/(F-N)   0 |
The D3D API generates these matrices with functions such as D3DXMatrixPerspectiveLH, D3DXMatrixPerspectiveRH, D3DXMatrixPerspectiveFovLH, D3DXMatrixPerspectiveFovRH, D3DXMatrixPerspectiveOffCenterLH and D3DXMatrixPerspectiveOffCenterRH.
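As a sanity check of the derivation, here is a plain-Python sketch of the FovLH variant (the same layout as D3DXMatrixPerspectiveFovLH, but this is my own illustration, not the D3DX code):

```python
import math

def perspective_fov_lh(fov_y, aspect, n, f):
    # Row vectors, left-handed view space, z mapped to [0, 1].
    y_scale = 1.0 / math.tan(fov_y / 2.0)   # cot(fovY/2) = N / top
    x_scale = y_scale / aspect
    return [[x_scale, 0, 0, 0],
            [0, y_scale, 0, 0],
            [0, 0, f / (f - n), 1],
            [0, 0, -n * f / (f - n), 0]]

def project(p, m):
    # [x y z 1] * Mp, then the perspective divide by w (= view-space Z).
    c = [sum(p[i] * m[i][j] for i in range(4)) for j in range(4)]
    return (c[0] / c[3], c[1] / c[3], c[2] / c[3])
```

With a 90-degree FOV and N = 1, F = 100, a point on the near plane projects to z_ndc = 0, a point on the far plane to z_ndc = 1, and the frustum edge at the near plane lands exactly at x_ndc = 1.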

Some differences for OpenGL:
(1). The z-coordinate of the CVV is within the range [-1, 1]
(2). The view space is right-handed (the camera looks down the negative z-axis), and OpenGL conventionally uses column vectors (v' = M·v), so the matrix is transposed relative to the D3D layout
So the perspective projection transformation (as produced by glFrustum) is:

Mp = | 2n/(r-l)   0          (r+l)/(r-l)     0           |
     | 0          2n/(t-b)   (t+b)/(t-b)     0           |
     | 0          0          -(f+n)/(f-n)    -2f·n/(f-n) |
     | 0          0          -1              0           |

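A matching sketch of the OpenGL convention (glFrustum-style parameters; again my own illustration, using column vectors and a right-handed view space):

```python
def frustum_gl(l, r, b, t, n, f):
    # glFrustum-style matrix: clip = M * view, camera looking down -z,
    # so visible points have view-space z between -n and -f.
    return [[2 * n / (r - l), 0, (r + l) / (r - l), 0],
            [0, 2 * n / (t - b), (t + b) / (t - b), 0],
            [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
            [0, 0, -1, 0]]

def project_gl(m, p):
    # M * [x y z 1]^T, then the perspective divide by w (= -view-space z).
    clip = [sum(m[i][j] * p[j] for j in range(4)) for i in range(4)]
    return tuple(c / clip[3] for c in clip[:3])
```

A point on the near plane (z = -n) maps to z_ndc = -1 and a point on the far plane (z = -f) to z_ndc = +1, confirming the [-1, 1] depth range.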

[1] D3D SDK documentation
[2] Some pictures come from the internet
[3] Peter Shirley, “Fundamentals of Computer Graphics”
[4] Steve Baker, “Learning to Love your Z-buffer”
[5] OpenGPU Forum