On Storing Normals in the G-buffer
Published: 2019-05-10


Reposted from: http://aras-p.info/texts/CompactNormalStorage.html

Compact Normal Storage for Small G-Buffers

Intro

Various deferred shading/lighting approaches or image postprocessing effects need to store normals as part of their G-buffer. Let’s figure out a compact storage method for view space normals. In my case, the main target is a minimalist G-buffer, where depth and normals are packed into a single 32 bit (8 bits/channel) render texture. I try to minimize both the error and the shader cycles needed to encode/decode.

Now of course, 8 bits/channel storage for normals may not be enough for shading, especially if you want specular (low precision & quantization leads to specular “wobble” when camera or objects move). However, everything below should Just Work (tm) for 10 or 16 bits/channel integer formats. For 16 bits/channel half-float formats, some of the computations are not necessary (e.g. bringing normal values into 0..1 range).

If you know other ways to store/encode normals, please let me know in the comments!

Various normal encoding methods and their comparison below. Notes:

  • Error images are: 1-pow(dot(n1,n2),1024) and abs(n1-n2)*30, where n1 is the actual normal, and n2 is the normal encoded into a texture, read back & decoded. MSE and PSNR are computed on the difference (abs(n1-n2)) image.
  • Shader code is HLSL. Compiled into ps_3_0 by d3dx9_42.dll (February 2010 SDK).
  • Radeon GPU performance numbers from AMD’s GPU ShaderAnalyzer 1.53, using Catalyst 9.12 driver.
  • GeForce GPU performance numbers from NVIDIA’s NVShaderPerf 2.0, using 174.74 driver.
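
The error measures from the first note can be reproduced on the CPU. Below is a minimal Python sketch, not code from the test application: the function names are made up for illustration, and the PSNR definition (10*log10(peak²/MSE) with a peak value of 1) is an assumption that happens to agree with the MSE/PSNR pairs quoted in this article.

```python
import math

def power_error(n1, n2):
    """1 - pow(dot(n1, n2), 1024): small angular errors get strongly amplified."""
    d = sum(a * b for a, b in zip(n1, n2))
    return 1.0 - max(d, 0.0) ** 1024

def abs_error(n1, n2, scale=30.0):
    """abs(n1 - n2) * 30, per component."""
    return tuple(abs(a - b) * scale for a, b in zip(n1, n2))

def mse(pairs):
    """Mean squared error over all components of all (n1, n2) pairs."""
    total, count = 0.0, 0
    for n1, n2 in pairs:
        for a, b in zip(n1, n2):
            total += (a - b) ** 2
            count += 1
    return total / count

def psnr_db(mse_value, peak=1.0):
    """Assumed definition: PSNR = 10 * log10(peak^2 / MSE)."""
    return 10.0 * math.log10(peak * peak / mse_value)
```

With this definition, an MSE of 0.000134 (the value quoted for Method #8) comes out at about 38.73 dB.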

Note: there was an error!

The original version of my article had some stupidity: the encoding shaders did not normalize the incoming per-vertex normal. This made the quality evaluation results somewhat wrong. Also, if the normal is assumed to be normalized, then three methods in the original article (Sphere Map, Cry Engine 3 and Lambert Azimuthal) are in fact completely equivalent. The old version is kept for the sake of integrity of the internets.

Test Playground Application

I used a small Windows application (4.8MB, source included) to test everything below.

It requires GPU with Shader Model 3.0 support. When it writes fancy shader reports, it expects AMD’s GPUShaderAnalyzer and NVIDIA’s NVShaderPerf to be installed. Source code should build with Visual C++ 2008.

Baseline: store X&Y&Z

Just to set the basis, store all three components of the normal. It’s not suitable for our quest, but I include it here to evaluate “base” encoding error (which happens here only because of quantization to 8 bits per component).

Encoding, Error to Power, Error * 30 images below. MSE: 0.000008; PSNR: 51.081 dB.

  

Encoding:

    half4 encode (half3 n, float3 view)
    {
        return half4(n.xyz*0.5+0.5,0);
    }

    ps_3_0
    def c0, 0.5, 0, 0, 0
    dcl_texcoord_pp v0.xyz
    mad_pp oC0, v0.xyzx, c0.xxxy, c0.xxxy

    1 ALU
    Radeon HD 2400: 1 GPR, 1.00 clk
    Radeon HD 3870: 1 GPR, 1.00 clk
    Radeon HD 5870: 1 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 1.00 clk
    GeForce 7800GT: 1 GPR, 1.00 clk
    GeForce 8800GTX: 6 GPR, 8.00 clk

Decoding:

    half3 decode (half4 enc, float3 view)
    {
        return enc.xyz*2-1;
    }

    ps_3_0
    def c0, 2, -1, 0, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp oC0.xyz, r0, c0.x, c0.y
    mov_pp oC0.w, c0.z

    2 ALU, 1 TEX
    Radeon HD 2400: 1 GPR, 1.00 clk
    Radeon HD 3870: 1 GPR, 1.00 clk
    Radeon HD 5870: 1 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 1.00 clk
    GeForce 7800GT: 1 GPR, 1.00 clk
    GeForce 8800GTX: 6 GPR, 10.00 clk
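
To see where the baseline error comes from, here is a small Python sketch of the same encode/decode with an explicit 8-bit quantization step in between. The `quantize8` helper is hypothetical, and it assumes the render target rounds to the nearest 8-bit step; real hardware rounding may differ.

```python
def quantize8(x):
    """Round a 0..1 value to the nearest 8-bit step (assumed rounding mode)."""
    x = min(max(x, 0.0), 1.0)
    return round(x * 255.0) / 255.0

def encode_xyz(n):
    # n * 0.5 + 0.5, then what an 8 bits/channel texture would keep of it
    return tuple(quantize8(c * 0.5 + 0.5) for c in n)

def decode_xyz(enc):
    # enc * 2 - 1
    return tuple(c * 2.0 - 1.0 for c in enc)

n = (0.6, 0.0, 0.8)
decoded = decode_xyz(encode_xyz(n))
max_err = max(abs(a - b) for a, b in zip(n, decoded))
```

Under that rounding assumption, each component is off by at most 1/255 after decoding: half a quantization step, doubled by the `*2-1` remap.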

Method #1: store X&Y, reconstruct Z

Used by Killzone 2 among others.

Encoding, Error to Power, Error * 30 images below. MSE: 0.013514; PSNR: 18.692 dB.

  

Pros:
  • Very simple to encode/decode
Cons:
  • Normal can point away from the camera. My test scene setup actually has that. See the Resistance 2 Prelighting paper for explanation.
Encoding:

    half4 encode (half3 n, float3 view)
    {
        return half4(n.xy*0.5+0.5,0,0);
    }

    ps_3_0
    def c0, 0.5, 0, 0, 0
    dcl_texcoord_pp v0.xy
    mad_pp oC0, v0.xyxx, c0.xxyy, c0.xxyy

    1 ALU
    Radeon HD 2400: 1 GPR, 1.00 clk
    Radeon HD 3870: 1 GPR, 1.00 clk
    Radeon HD 5870: 1 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 1.00 clk
    GeForce 7800GT: 1 GPR, 1.00 clk
    GeForce 8800GTX: 5 GPR, 7.00 clk

Decoding:

    half3 decode (half2 enc, float3 view)
    {
        half3 n;
        n.xy = enc*2-1;
        n.z = sqrt(1-dot(n.xy, n.xy));
        return n;
    }

    ps_3_0
    def c0, 2, -1, 1, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp r0.xy, r0, c0.x, c0.y
    dp2add_pp r0.z, r0, -r0, c0.z
    mov_pp oC0.xy, r0
    rsq_pp r0.x, r0.z
    rcp_pp oC0.z, r0.x
    mov_pp oC0.w, c0.w

    7 ALU, 1 TEX
    Radeon HD 2400: 1 GPR, 1.00 clk
    Radeon HD 3870: 1 GPR, 1.00 clk
    Radeon HD 5870: 1 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 4.00 clk
    GeForce 7800GT: 1 GPR, 3.00 clk
    GeForce 8800GTX: 5 GPR, 15.00 clk
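
The "normal can point away from the camera" problem is easy to demonstrate. A minimal Python sketch of the same math follows (no quantization; the function names are illustrative, and the `max()` guard is an addition that the HLSL above does not have):

```python
import math

def encode_xy(n):
    # Keep only X and Y, remapped to 0..1
    return (n[0] * 0.5 + 0.5, n[1] * 0.5 + 0.5)

def decode_xy(enc):
    x = enc[0] * 2.0 - 1.0
    y = enc[1] * 2.0 - 1.0
    # Guard against a slightly negative value caused by rounding
    z = math.sqrt(max(0.0, 1.0 - (x * x + y * y)))
    return (x, y, z)

front = (0.6, 0.0, 0.8)    # positive Z: what the decoder assumes
back = (0.6, 0.0, -0.8)    # negative Z: points away from the camera

decoded_back = decode_xy(encode_xy(back))
```

Both normals encode to the same two values, so `decoded_back` comes out as `front` (up to rounding): the sign of Z is simply gone, which is what drags the PSNR of this method down in my test scene.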

Method #3: Spherical Coordinates

It is possible to use spherical coordinates to encode the normal. Since we know it’s unit length, we can just store the two angles.

Suggested by Pat Wilson of Garage Games.

Encoding, Error to Power, Error * 30 images below. MSE: 0.000062; PSNR: 42.042 dB.

  

Pros:
  • Suitable for normals in general (not necessarily view space)
Cons:
  • Uses trig instructions (quite heavy on ALU). Possible to replace some of that with texture lookups though.
Encoding:

    #define kPI 3.1415926536f
    half4 encode (half3 n, float3 view)
    {
        return half4(
          (half2(atan2(n.y,n.x)/kPI, n.z)+1.0)*0.5,
          0,0);
    }

    ps_3_0
    def c0, 0.999866009, 0, 1, 3.14159274
    def c1, 0.0208350997, -0.0851330012, 0.180141002, -0.330299497
    def c2, -2, 1.57079637, 0.318309873, 0.5
    dcl_texcoord_pp v0.xyz
    add_pp r0.xy, -v0_abs, v0_abs.yxzw
    cmp_pp r0.xz, r0.x, v0_abs.xyyw, v0_abs.yyxw
    cmp_pp r0.y, r0.y, c0.y, c0.z
    rcp_pp r0.z, r0.z
    mul_pp r0.x, r0.x, r0.z
    mul_pp r0.z, r0.x, r0.x
    mad_pp r0.w, r0.z, c1.x, c1.y
    mad_pp r0.w, r0.z, r0.w, c1.z
    mad_pp r0.w, r0.z, r0.w, c1.w
    mad_pp r0.z, r0.z, r0.w, c0.x
    mul_pp r0.x, r0.x, r0.z
    mad_pp r0.z, r0.x, c2.x, c2.y
    mad_pp r0.x, r0.z, r0.y, r0.x
    cmp_pp r0.y, v0.x, -c0.y, -c0.w
    add_pp r0.x, r0.x, r0.y
    add_pp r0.y, r0.x, r0.x
    add_pp r0.z, -v0.x, v0.y
    cmp_pp r0.zw, r0.z, v0.xyxy, v0.xyyx
    cmp_pp r0.zw, r0, c0.xyyz, c0.xyzy
    mul_pp r0.z, r0.w, r0.z
    mad_pp r0.x, r0.z, -r0.y, r0.x
    mul_pp r0.x, r0.x, c2.z
    mov_pp r0.y, v0.z
    add_pp r0.xy, r0, c0.z
    mul_pp oC0.xy, r0, c2.w
    mov_pp oC0.zw, c0.y

    26 ALU
    Radeon HD 2400: 1 GPR, 17.00 clk
    Radeon HD 3870: 1 GPR, 4.25 clk
    Radeon HD 5870: 2 GPR, 0.95 clk
    GeForce 6200: 2 GPR, 12.00 clk
    GeForce 7800GT: 2 GPR, 9.00 clk
    GeForce 8800GTX: 9 GPR, 43.00 clk

Decoding:

    half3 decode (half2 enc, float3 view)
    {
        half2 ang = enc*2-1;
        half2 scth;
        sincos(ang.x * kPI, scth.x, scth.y);
        half2 scphi = half2(sqrt(1.0 - ang.y*ang.y), ang.y);
        return half3(scth.y*scphi.x, scth.x*scphi.x, scphi.y);
    }

    ps_3_0
    def c0, 2, -1, 0.5, 1
    def c1, 6.28318548, -3.14159274, 1, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp r0.xy, r0, c0.x, c0.y
    mad r0.x, r0.x, c0.z, c0.z
    frc r0.x, r0.x
    mad r0.x, r0.x, c1.x, c1.y
    sincos_pp r1.xy, r0.x
    mad_pp r0.x, r0.y, -r0.y, c0.w
    mul_pp oC0.zw, r0.y, c1
    rsq_pp r0.x, r0.x
    rcp_pp r0.x, r0.x
    mul_pp oC0.xy, r1, r0.x

    17 ALU, 1 TEX
    Radeon HD 2400: 1 GPR, 17.00 clk
    Radeon HD 3870: 1 GPR, 4.25 clk
    Radeon HD 5870: 2 GPR, 0.95 clk
    GeForce 6200: 2 GPR, 7.00 clk
    GeForce 7800GT: 1 GPR, 5.00 clk
    GeForce 8800GTX: 6 GPR, 23.00 clk
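
For reference, the same mapping as a CPU-side Python sketch (function names are illustrative, and no quantization is applied). Note that the shader stores n.z directly in the second channel, which is the cosine of the polar angle rather than the angle itself, so only the azimuth needs trig functions.

```python
import math

def encode_spherical(n):
    # First channel: azimuth atan2(y, x) scaled from -pi..pi to 0..1
    theta = math.atan2(n[1], n[0]) / math.pi
    # Second channel: z scaled from -1..1 to 0..1
    return ((theta + 1.0) * 0.5, (n[2] + 1.0) * 0.5)

def decode_spherical(enc):
    ang_x = enc[0] * 2.0 - 1.0
    ang_y = enc[1] * 2.0 - 1.0
    s = math.sin(ang_x * math.pi)
    c = math.cos(ang_x * math.pi)
    # Length of the XY part follows from unit length: sqrt(1 - z^2)
    r = math.sqrt(max(0.0, 1.0 - ang_y * ang_y))
    return (c * r, s * r, ang_y)

n = (0.48, -0.6, -0.64)             # unit length, negative Z on purpose
decoded = decode_spherical(encode_spherical(n))
```

The round trip recovers `n` up to floating point error, including the negative Z, which is why this method suits normals in any space.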

Method #4: Spheremap Transform

Spherical environment mapping (indirectly) maps reflection vector to a texture coordinate in [0..1] range. The reflection vector can point away from the camera, just like our view space normals. Bingo! In sphere map math terms, the normal we want to encode is R and the resulting values are (s,t).

If we assume that incoming normal is normalized, then there are methods derived from elsewhere that end up being exactly equivalent:

  • Used in Cry Engine 3, presented by Martin Mittring in “A bit more Deferred” presentation (slide 13). For Unity, I had to negate Z component of view space normal to produce good results; I guess Unity’s and Cry Engine’s coordinate systems are different. The code would be:

        half2 encode (half3 n, float3 view)
        {
            half2 enc = normalize(n.xy) * (sqrt(-n.z*0.5+0.5));
            enc = enc*0.5+0.5;
            return enc;
        }
        half3 decode (half4 enc, float3 view)
        {
            half4 nn = enc*half4(2,2,0,0) + half4(-1,-1,1,-1);
            half l = dot(nn.xyz,-nn.xyw);
            nn.z = l;
            nn.xy *= sqrt(l);
            return nn.xyz * 2 + half3(0,0,-1);
        }

  • Lambert Azimuthal Equal-Area projection. Suggested by Sean Barrett in comments for this article. The code would be:

        half2 encode (half3 n, float3 view)
        {
            half f = sqrt(8*n.z+8);
            return n.xy / f + 0.5;
        }
        half3 decode (half4 enc, float3 view)
        {
            half2 fenc = enc*4-2;
            half f = dot(fenc,fenc);
            half g = sqrt(1-f/4);
            half3 n;
            n.xy = fenc*g;
            n.z = 1-f/2;
            return n;
        }
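
The equivalence claim can be checked numerically. Here is a minimal Python sketch that ports both code snippets above (names are illustrative, no quantization, and the `max()` guards are additions) and compares them on unit-length normals:

```python
import math

def encode_ce3(n):
    # normalize(n.xy) * sqrt(-n.z*0.5+0.5), then remap to 0..1.
    # Undefined when n.xy is zero (normal exactly along +Z or -Z).
    length_xy = math.hypot(n[0], n[1])
    s = math.sqrt(-n[2] * 0.5 + 0.5)
    return tuple((c / length_xy) * s * 0.5 + 0.5 for c in n[:2])

def decode_ce3(enc):
    x, y = enc[0] * 2.0 - 1.0, enc[1] * 2.0 - 1.0
    l = 1.0 - x * x - y * y            # dot(nn.xyz, -nn.xyw) in the HLSL
    s = math.sqrt(max(0.0, l))
    return (2.0 * x * s, 2.0 * y * s, 2.0 * l - 1.0)

def encode_lambert(n):
    # n.xy / sqrt(8*n.z+8) + 0.5. Undefined when n.z == -1.
    f = math.sqrt(8.0 * n[2] + 8.0)
    return tuple(c / f + 0.5 for c in n[:2])

def decode_lambert(enc):
    fx, fy = enc[0] * 4.0 - 2.0, enc[1] * 4.0 - 2.0
    f = fx * fx + fy * fy
    g = math.sqrt(max(0.0, 1.0 - f / 4.0))
    return (fx * g, fy * g, 1.0 - f / 2.0)

samples = [(0.48, -0.6, 0.64), (0.48, -0.6, -0.64)]
```

For a unit normal, |n.xy| = sqrt((1-z)(1+z)), so n.xy / sqrt(8(1+z)) reduces to normalize(n.xy) * sqrt((1-z)/2) * 0.5, which is exactly the first encoder after its 0..1 remap. The two differ only in where they break down: the first at n.xy = 0, the second at n.z = -1.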

Encoding, Error to Power, Error * 30 images below. MSE: 0.000016; PSNR: 48.071 dB.

  

Pros:
  • Quality pretty good!
  • Quite cheap to encode/decode.
  • Similar derivation used by Cry Engine 3, so it must be good :)
Cons:
  • ???
Encoding:

    half4 encode (half3 n, float3 view)
    {
        half p = sqrt(n.z*8+8);
        return half4(n.xy/p + 0.5,0,0);
    }

    ps_3_0
    def c0, 8, 0.5, 0, 0
    dcl_texcoord_pp v0.xyz
    mad_pp r0.x, v0.z, c0.x, c0.x
    rsq_pp r0.x, r0.x
    mad_pp oC0.xy, v0, r0.x, c0.y
    mov_pp oC0.zw, c0.z

    4 ALU
    Radeon HD 2400: 2 GPR, 3.00 clk
    Radeon HD 3870: 2 GPR, 1.00 clk
    Radeon HD 5870: 2 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 4.00 clk
    GeForce 7800GT: 1 GPR, 2.00 clk
    GeForce 8800GTX: 5 GPR, 12.00 clk

Decoding:

    half3 decode (half2 enc, float3 view)
    {
        half2 fenc = enc*4-2;
        half f = dot(fenc,fenc);
        half g = sqrt(1-f/4);
        half3 n;
        n.xy = fenc*g;
        n.z = 1-f/2;
        return n;
    }

    ps_3_0
    def c0, 4, -2, 0, 1
    def c1, 0.25, 0.5, 1, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp r0.xy, r0, c0.x, c0.y
    dp2add_pp r0.z, r0, r0, c0.z
    mad_pp r0.zw, r0.z, -c1.xyxy, c1.z
    rsq_pp r0.z, r0.z
    mul_pp oC0.zw, r0.w, c0.xywz
    rcp_pp r0.z, r0.z
    mul_pp oC0.xy, r0, r0.z

    8 ALU, 1 TEX
    Radeon HD 2400: 2 GPR, 3.00 clk
    Radeon HD 3870: 2 GPR, 1.00 clk
    Radeon HD 5870: 2 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 6.00 clk
    GeForce 7800GT: 1 GPR, 3.00 clk
    GeForce 8800GTX: 6 GPR, 15.00 clk

Method #7: Stereographic Projection

What the title says: use Stereographic Projection, plus rescaling so that the “practically visible” range of normals maps into the unit circle (regular stereographic projection maps the sphere to a circle of infinite size). In my tests, a scaling factor of 1.7777 produced the best results; in practice it depends on the FOV used and how much you care about normals that point away from the camera.

Suggested by Sean Barrett and Ignacio Castano in comments for this article.

Encoding, Error to Power, Error * 30 images below. MSE: 0.000038; PSNR: 44.147 dB.

  

Pros:
  • Quality pretty good!
  • Quite cheap to encode/decode.
Cons:
  • ???
Encoding:

    half4 encode (half3 n, float3 view)
    {
        half scale = 1.7777;
        half2 enc = n.xy / (n.z+1);
        enc /= scale;
        enc = enc*0.5+0.5;
        return half4(enc,0,0);
    }

    ps_3_0
    def c0, 1, 0.281262308, 0.5, 0
    dcl_texcoord_pp v0.xyz
    add_pp r0.x, c0.x, v0.z
    rcp r0.x, r0.x
    mul_pp r0.xy, r0.x, v0
    mad_pp oC0.xy, r0, c0.y, c0.z
    mov_pp oC0.zw, c0.w

    5 ALU
    Radeon HD 2400: 2 GPR, 4.00 clk
    Radeon HD 3870: 2 GPR, 1.00 clk
    Radeon HD 5870: 2 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 2.00 clk
    GeForce 7800GT: 1 GPR, 2.00 clk
    GeForce 8800GTX: 5 GPR, 12.00 clk

Decoding:

    half3 decode (half4 enc, float3 view)
    {
        half scale = 1.7777;
        half3 nn =
            enc.xyz*half3(2*scale,2*scale,0) +
            half3(-scale,-scale,1);
        half g = 2.0 / dot(nn.xyz,nn.xyz);
        half3 n;
        n.xy = g*nn.xy;
        n.z = g-1;
        return n;
    }

    ps_3_0
    def c0, 3.55539989, 0, -1.77769995, 1
    def c1, 2, -1, 0, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp r0.xyz, r0, c0.xxyw, c0.zzww
    dp3_pp r0.z, r0, r0
    rcp r0.z, r0.z
    add_pp r0.w, r0.z, r0.z
    mad_pp oC0.z, r0.z, c1.x, c1.y
    mul_pp oC0.xy, r0, r0.w
    mov_pp oC0.w, c0.y

    7 ALU, 1 TEX
    Radeon HD 2400: 2 GPR, 4.00 clk
    Radeon HD 3870: 2 GPR, 1.00 clk
    Radeon HD 5870: 2 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 4.00 clk
    GeForce 7800GT: 1 GPR, 4.00 clk
    GeForce 8800GTX: 6 GPR, 12.00 clk
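
A CPU-side Python sketch of the same projection, useful for seeing what the scale factor buys. Function names and `MIN_Z` are illustrative; no quantization or clamping is modeled.

```python
SCALE = 1.7777

def encode_stereo(n):
    # Project from the point (0, 0, -1) onto the XY plane, then rescale to 0..1
    d = n[2] + 1.0
    return tuple((c / d) / SCALE * 0.5 + 0.5 for c in n[:2])

def decode_stereo(enc):
    x = enc[0] * 2.0 * SCALE - SCALE
    y = enc[1] * 2.0 * SCALE - SCALE
    # x^2 + y^2 + 1 equals 2 / (1 + n.z), so g recovers 1 + n.z
    g = 2.0 / (x * x + y * y + 1.0)
    return (g * x, g * y, g - 1.0)

# Smallest n.z whose projection still lies inside the unit circle:
# (1 - z) / (1 + z) <= SCALE^2  =>  z >= (1 - SCALE^2) / (1 + SCALE^2)
MIN_Z = (1.0 - SCALE * SCALE) / (1.0 + SCALE * SCALE)

samples = [(0.48, -0.6, 0.64), (0.0, 0.96, -0.28)]
```

With a scale of 1.7777, `MIN_Z` works out to roughly -0.52, so normals tilted away from the camera by up to that much still fit inside the unit circle. A larger scale covers more back-facing normals but spends fewer bits on the front-facing ones.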

Method #8: Per-pixel View Space

If we compute view space per-pixel, then Z component of a normal can never be negative. Then just store X&Y, and compute Z.

Suggested by Yuriy O’Donnell.

Encoding, Error to Power, Error * 30 images below. MSE: 0.000134; PSNR: 38.730 dB.

  

Pros:
  • ???
Cons:
  • Quite heavy on ALU
Shared helper:

    float3x3 make_view_mat (float3 view)
    {
        view = normalize(view);
        float3 x,y,z;
        z = -view;
        x = normalize (float3(z.z, 0, -z.x));
        y = cross (z,x);
        return float3x3 (x,y,z);
    }

Encoding:

    half4 encode (half3 n, float3 view)
    {
        return half4(mul (make_view_mat(view), n).xy*0.5+0.5,0,0);
    }

    ps_3_0
    def c0, 1, -1, 0, 0.5
    dcl_texcoord_pp v0.xyz
    dcl_texcoord1 v1.xyz
    mov r0.x, c0.z
    nrm r1.xyz, v1
    mov r1.w, -r1.z
    mul r0.yz, r1.xxzw, c0.xxyw
    dp2add r0.w, r1.wxzw, r0.zyzw, c0.z
    rsq r0.w, r0.w
    mul r0.xyz, r0, r0.w
    mul r2.xyz, -r1.zxyw, r0
    mad r1.xyz, -r1.yzxw, r0.yzxw, -r2
    dp2add r0.x, r0.zyzw, v0.xzzw, c0.z
    dp3 r0.y, r1, v0
    mad_pp oC0.xy, r0, c0.w, c0.w
    mov_pp oC0.zw, c0.z

    17 ALU
    Radeon HD 2400: 3 GPR, 11.00 clk
    Radeon HD 3870: 3 GPR, 2.75 clk
    Radeon HD 5870: 2 GPR, 0.80 clk
    GeForce 6200: 4 GPR, 12.00 clk
    GeForce 7800GT: 4 GPR, 8.00 clk
    GeForce 8800GTX: 8 GPR, 24.00 clk

Decoding:

    half3 decode (half4 enc, float3 view)
    {
        half3 n;
        n.xy = enc*2-1;
        n.z = sqrt(1+dot(n.xy,-n.xy));
        n = mul(n, make_view_mat(view));
        return n;
    }

    ps_3_0
    def c0, 2, -1, 1, 0
    dcl_texcoord1 v0.xyz
    dcl_texcoord2 v1.xy
    dcl_2d s0
    mov r0.y, c0.w
    nrm r1.xyz, v0
    mov r1.w, -r1.z
    mul r0.xz, r1.zyxw, c0.yyzw
    dp2add r0.w, r1.wxzw, r0.xzzw, c0.w
    rsq r0.w, r0.w
    mul r0.xyz, r0, r0.w
    mul r2.xyz, -r1.zxyw, r0.yzxw
    mad r2.xyz, -r1.yzxw, r0.zxyw, -r2
    texld_pp r3, v1, s0
    mad_pp r3.xy, r3, c0.x, c0.y
    mul r2.xyz, r2, r3.y
    mad r0.xyz, r3.x, r0, r2
    dp2add_pp r0.w, r3, -r3, c0.z
    rsq_pp r0.w, r0.w
    rcp_pp r0.w, r0.w
    mad_pp oC0.xyz, r0.w, -r1, r0
    mov_pp oC0.w, c0.w

    21 ALU, 1 TEX
    Radeon HD 2400: 3 GPR, 11.00 clk
    Radeon HD 3870: 3 GPR, 2.75 clk
    Radeon HD 5870: 2 GPR, 0.80 clk
    GeForce 6200: 3 GPR, 12.00 clk
    GeForce 7800GT: 3 GPR, 9.00 clk
    GeForce 8800GTX: 12 GPR, 29.00 clk
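
To make the idea concrete, here is a Python sketch of the same per-pixel basis and round trip. All names are illustrative; it assumes `view` is the view-space vector from the camera towards the pixel, with the camera looking down -Z.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def make_view_basis(view):
    # Rows of the HLSL matrix. Like the HLSL, this breaks down when the
    # view vector is parallel to the Y axis (x becomes a zero vector).
    z = tuple(-c for c in normalize(view))
    x = normalize((z[2], 0.0, -z[0]))
    y = cross(z, x)
    return x, y, z

def encode_ppview(n, view):
    x, y, _ = make_view_basis(view)       # mul(M, n).xy
    return (dot(x, n) * 0.5 + 0.5, dot(y, n) * 0.5 + 0.5)

def decode_ppview(enc, view):
    x, y, z = make_view_basis(view)
    a = enc[0] * 2.0 - 1.0
    b = enc[1] * 2.0 - 1.0
    c = math.sqrt(max(0.0, 1.0 - (a * a + b * b)))
    return tuple(a * x[i] + b * y[i] + c * z[i] for i in range(3))  # mul(n, M)

view = (1.0, 0.0, -1.0)                   # a pixel far off to the side
n = normalize((-1.0, 0.0, -0.2))          # negative view space Z
decoded = decode_ppview(encode_ppview(n, view), view)
```

In this example the normal has a negative Z in regular view space, so Method #1 would flip it, yet it faces the pixel's own view direction and survives the round trip here.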

Performance Comparison

GPU performance comparison in a single table:

                 #1: X & Y   #3: Spherical   #4: Spheremap   #7: Stereo   #8: PPView
Encoding, GPU cycles
Radeon HD2400    1.00        17.00           3.00            4.00         11.00
Radeon HD5870    0.50        0.95            0.50            0.50         0.80
GeForce 6200     1.00        12.00           4.00            2.00         12.00
GeForce 8800     7.00        43.00           12.00           12.00        24.00
Decoding, GPU cycles
Radeon HD2400    1.00        17.00           3.00            4.00         11.00
Radeon HD5870    0.50        0.95            0.50            1.00         0.80
GeForce 6200     4.00        7.00            6.00            4.00         12.00
GeForce 8800     15.00       23.00           15.00           12.00        29.00
Encoding, D3D ALU+TEX instruction slots
SM3.0            1           26              4               5            17
Decoding, D3D ALU+TEX instruction slots
SM3.0            8           18              9               8            22

Quality Comparison

Quality comparison in a single table. PSNR based, higher numbers are better.

Method                PSNR, dB
#1: X & Y             18.692
#3: Spherical         42.042
#4: Spheremap         48.071
#7: Stereographic     44.147
#8: Per pixel view    38.730

Changelog

  • 2010 03 25: Added Method #8: Per-pixel View Space. Suggested by Yuriy O’Donnell.
  • 2010 03 24: Stop! Everything before was wrong! The old article is kept for reference.
  • 2009 08 12: Added Method #7: Stereographic projection. Suggested by Sean Barrett and Ignacio Castano.
  • 2009 08 12: Optimized Method #5, suggested by Steve Hill.
  • 2009 08 08: Added power difference images.
  • 2009 08 07: Optimized Method #4: Sphere map. Suggested by Irenee Caroulle.
  • 2009 08 07: Added Method #6: Lambert Azimuthal Equal Area. Suggested by Sean Barrett.
  • 2009 08 05: Added Method #5: Cry Engine 3. Suggested by Steve Hill.
  • 2009 08 05: Improved quality of Method #3a: round values in texture LUT.
  • 2009 08 05: Added MSE and PSNR values for all methods.
  • 2009 08 04: Added Method #3a: Spherical Coordinates w/ texture LUT.
  • 2009 08 04: Method #1: 1-dot(n.xy,n.xy) is slightly better than 1-n.x*n.x-n.y*n.y (better pipelining on NV and ATI).

Repost address: http://nhwqb.baihongyu.com/
