cinera_handmade.network/cmuratori/hero/code/code556.hmml

[video output=day556 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Optimizing Depth Peeling and Multisample Resolves" vod_platform=youtube id=M6qE6ncZV68 annotator=Miblo]
2019-09-29 23:43:30 +00:00
[0:02][Plug Molly Rocket's Discord channel[ref
site=Discord
page=MollyRocket
url=https://discord.gg/mollyrocket]][:speech]
[1:44][Recap and set the stage for the day with praise for RenderDoc][:speech]
[3:15][Note that lots of our time is spent doing the multisample resolve][:performance :rendering :speech]
[7:32][Configure our project in RenderDoc][:admin]
[8:42][Capture a frame in RenderDoc and consult the event timings to see that the multisample buffer took 10 times longer to resolve than to draw][:performance :rendering :run]
[10:54][The possible bandwidth cost of resolving our multisample buffer][:performance :rendering :run]
[14:16][Our best case solution: Resolve the multisample buffer independently of the depth peel][:performance :rendering :run]
[16:29][Demo ~Milton's new grid feature, and its smoothing bug][:blackboard]
[17:32][:Rendering requirements of Sprites vs Geometry][:blackboard]
[20:16][Demo some undesirable alpha blending when traversing stairs][:rendering :run]
[22:15][Consider that two multisample resolves may be necessary][:performance :rendering :run]
[24:00][Plan to render the sprites with depth peel in a separate pass, then composite in the geometry with two multisample resolves][:blackboard :performance :rendering]
[30:17][Segregating our sprites and geometry into separate buffers][:blackboard :performance :memory]
[33:28][Hesitate to impose this separation requirement on the renderer][:blackboard :library]
[35:31][Producing a separate multisample buffer of the edge information][:blackboard :performance :rendering]
[36:52][The edge information in question][:performance :rendering :run]
[38:53][Using conservative rasterization to enable recovery of our edge blend from a single high- / low-coverage multisample resolve][:performance :rendering :run]
[43:48][Using conservative rasterization just to tell us how much a pixel is covered by a primitive][:performance :rendering :run]
[45:28][:Research conservative rasterization[ref
site="NVIDIA GameWorks Documentation"
page="Conservative Rasterization Sample"
url=https://docs.nvidia.com/gameworks/content/gameworkslibrary/graphicssamples/opengl_samples/conservativerasterizationsample.htm][ref
author="Jon Story"
title="Don't be conservative with Conservative Rasterization"
publisher="NVIDIA GameWorks Blog"
url=https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization][ref
site="NVIDIA Developer"
page="NV_conservative_raster"
url=https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_conservative_raster.txt]][:rendering]
[49:14][Hunt for a coverage-to-alpha function in NV_shading_rate_image,[ref
site="Khronos"
page="NV_shading_rate_image"
url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shading_rate_image.txt] ARB_sample_locations[ref
site="Khronos"
page="ARB_sample_locations"
url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_sample_locations.txt] and ARB_sample_shading[ref
site="Khronos"
page="ARB_sample_shading"
url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_sample_shading.txt]][:rendering :research]
[58:13][Find that conservative rasterization is not widely available[ref
site="OpenGL Hardware Database"
url=https://opengl.gpuinfo.org/]][:rendering :research]
[59:10][Consider running the multisample routine without a multisample buffer, only recording which samples were covered[ref
site="Khronos Wiki"
page="Multisampling"
url=https://www.khronos.org/opengl/wiki/Multisampling][ref
site="Khronos"
page="NV_multisample_coverage"
url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_multisample_coverage.txt][ref
site="Khronos"
page="NV_multisample_filter_hint"
url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_multisample_filter_hint.txt]][:rendering :research]
[1:05:40][Consider attaching multisample and non-multisample render targets at the same time][:rendering :research]
[1:07:05][Consult glext.h for coverage-related functions[ref
site="Khronos"
page="glext.h"
url=https://www.khronos.org/registry/OpenGL/api/GL/glext.h]][:rendering :research]
[1:11:25][:Research NV_fragment_coverage_to_color[ref
site="Khronos"
page="NV_fragment_coverage_to_color"
url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_fragment_coverage_to_color.txt][ref
site="OpenGL Hardware Database"
url=https://opengl.gpuinfo.org/]][:rendering]
[1:14:11][:Research ARB_post_depth_coverage[ref
site="Khronos"
page="ARB_post_depth_coverage"
url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_post_depth_coverage.txt] and gl_SampleMaskIn[ref
site="Khronos"
page="gl_SampleMaskIn"
url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/gl_SampleMaskIn.xhtml]][:rendering]
[1:17:53][NVIDIA request: When enabling conservative rasterization, let us set the SampleMask to the number of samples we want][:rendering :speech]
[1:19:08][Lament the tremendous amount of bandwidth required to smooth out our edges][:memory :rendering :speech]
[1:20:36][Reflect on the need for the multisample buffer to tell if primitives are coplanar][:rendering :run]
[1:21:40][Consider augmenting our Colour Pass shader to skip pixels whose prior pass produced a fully opaque pixel][:rendering :run]
[1:28:10][Make CompileResolveMultisample() set the gl_FragDepth of opaque pixels to 1.0f, enabling CompileZBiasProgram() to discard obscured pixels][:rendering]
[1:36:24][Capture a frame to see that depth sorting is still working, and our Colour Passes are now more efficient][:rendering :run]
[1:38:28][Switch to the non-multisampling fast path in CompileResolveMultisample()][:rendering]
[1:39:50][Capture a frame to see that our glDrawArrays() calls have sped up][:rendering :run]
[1:41:01][Options for creating our fast path in CompileResolveMultisample(): 1. Read from the previous depth peel and do not resolve opaque pixels; 2. Resolve into a separate "blend" buffer][:rendering :speech]
[1:43:30][Make CompileResolveMultisample() only blend non-opaque pixels][:rendering]
[1:44:42][Reacquaint ourselves with the final CompilePeelComposite() with a view to instead accumulating the colour as we go][:rendering :research]
[1:46:39][Introduce a MaskSampler in CompileResolveMultisample() to contain the opacity][:rendering]
[1:48:49][Crash the game under RenderDoc][:run]
[1:49:11][Hit a shader error "unable to find overloaded function texelFetch()"][:rendering :run]
[1:49:45][Fix CompileResolveMultisample() to fetch the Mask's texel from the 0th texture][:rendering]
[1:50:24][See that the multisampling is a little busted][:rendering :run]
[1:51:01][Just let CompileResolveMultisample() always blend, including some shader parser mayhem][:rendering]
[1:53:16][See that the multisampling artefacts are gone][:rendering :run]
[1:53:42][Make CompileResolveMultisample() initialise the samplers in the order in which they are passed to OpenGLLinkSamplers()][:rendering]
[1:56:56][Make OpenGLEndFrame() set the Mask for CompileResolveMultisample() to acquire pixel opacity, introducing a SinglePixelAllZeroesTexture][:rendering]
[2:07:05][See a few multisampling artefacts in there][:rendering :run]
[2:08:08][Quickly scrutinise our new Mask code][:rendering :research]
[2:09:18][Capture a frame to see that our second and fourth Colour Passes are not as expected][:rendering :run]
[2:10:01][Check the Mask test in CompileResolveMultisample()][:rendering :research]
[2:11:00][Take a close look at our first glDrawArrays() call, to see that our MaskSampler is 0×0 pixels][:rendering :run]
[2:12:57][Fix OpenGLInit() to correctly bind our SinglePixelAllZeroesTexture][:rendering]
[2:13:19][Capture a frame to see that our third Colour Pass drew much more than expected][:rendering :run]
[2:15:29][Make CompileResolveMultisample() set gl_FragDepth of opaque pixels][:rendering]
[2:16:46][Crash in RenderDoc][:run]
[2:17:11][Make CompileResolveMultisample() set BlendUnitColor of opaque pixels][:rendering]
[2:17:36][Battle with the shader parser][:rendering :programming :run]
[2:21:16][Prevent CompileResolveMultisample() from setting the BlendUnitColor of opaque pixels][:rendering]
[2:21:39][Capture a frame to see that we are still busted][:rendering :run]
[2:24:31][Make CompileResolveMultisample() always blend][:rendering]
[2:24:40][Find that the shader parser is unhappy, and wonder if it's a problem with ~4coder's virtual whitespace][:rendering :run]
[2:25:14][Revert CompileResolveMultisample() to a supposedly working state][:rendering]
[2:25:48][Find that the shader parser continues to be unhappy][:rendering :run]
[2:26:48][Change all the comments in CompileResolveMultisample() to be C-style ones][:language]
[2:27:23][Find that that doesn't solve the problem][:rendering :run]
[2:28:00][See how ~RemedyBG sees the code in CompileResolveMultisample()][:language :run]
[2:30:06][Trim out some possibly problematic code from CompileResolveMultisample()][:language]
[2:30:18][Find that the shader parser is happy now][:language :run]
[2:30:35][Make CompileResolveMultisample() set the Mask to 0.0][:rendering]
[2:31:08][Find that the shader parser remains happy, but we still see our multisampling artefact][:rendering :run]
[2:31:23][Make CompileResolveMultisample() always blend][:rendering]
[2:31:33][See our multisampling artefact][:rendering :run]
[2:32:09][Leave in our fast path in CompileResolveMultisample()][:rendering]
[2:32:33][See new artefacts][:rendering :run]
[2:32:52][Make CompileResolveMultisample() always blend][:rendering]
[2:33:05][See that everything looks nice and smooth][:rendering :run]
[2:33:53][Reinsert our fast path in CompileResolveMultisample()][:rendering]
[2:34:06][See new MIP map anisotropic filtering artefacts][:rendering :run]
[2:34:44][Capture a frame to see that our third and fourth Colour Passes are not as free as expected][:rendering :run]
[2:35:11][Fix CompileResolveMultisample() to black out the BlendUnitColor of opaque pixels][:rendering]
[2:36:16][Capture a frame to see that our Colour Passes are way more efficient][:rendering :run]
[2:38:31][Q&A][:speech]
[2:38:38][@dithinas][Q: Are you exceeding the size of your buffer for the shader code text? The EOF error seems kind of suspicious]
[2:38:50][Make CompileResolveMultisample() record the shader string BufferSize][:language]
[2:39:42][See that the shader string is 3877 characters][:language :run]
[2:39:59][Increase the FragmentCode buffer size from 4096 to 16000][:language]
[2:41:04][Find that the shader parser is happy, with a few words on tooling][:language :run]
[2:43:33][@sagian2005][Q: Couldn't you just not define the dimension and let the compiler figure out the size?][:language]
[2:44:10][@letambourinroyal][Q: Why can't you just run the preprocessor on the shaders, though?][:language]
[2:46:53][@dithinas][Q: Do you have any concept of like a string builder that you could use with your format string function to have it pass you back a string allocated from temp memory? It's a small thing, but you won't need to think / worry about that anymore][:"string manipulation"]
[2:48:06][@enyo_enev][Q: I think that Dual Depth Peeling.[ref
site="NVIDIA Developer"
page="Dual Depth Peeling"
url=http://developer.download.nvidia.com/SDK/10/opengl/screenshots/samples/dual_depth_peeling.html][ref
author="Louis Bavoil, Kevin Myers"
title="Order Independent Transparency with Dual Depth Peeling"
publisher="NVIDIA Corporation"
url=http://developer.download.nvidia.com/SDK/10/opengl/src/dual_depth_peeling/doc/DualDepthPeeling.pdf] This paper may be able to help you out, because I looked at their OpenGL code and it seems possible to resolve only once when you render to the screen. You can download the code and see. I am not sure. It is pure OpenGL. They have not enabled multisampling, however I think it is possible to work][:rendering]
[2:53:20][@samyuutsu][Q: Have you considered using RenderDoc directly to recompile the shaders at runtime during rapid development?]
[2:54:40][@eddiesutrecht][Q: Hi [@cmuratori Casey], some episodes ago you said you didn't understand why [@naysayer88 Jon] credited you on Braid, since you didn't do any work on it. But you wrote the collision detector, right? Maybe I misunderstood and it was about the rewind only]
[2:59:08][@vaualbus][Q: How would you implement a way of also changing v3 / v4 values in the debug system?][:"debug system"]
[2:59:56][@zrizi][Q: I just watched the video about sub-pixel :sampling for pixel art assets. Thank you for explaining it, it was really good. I have two related questions, though: 1) When you viewed the sprite sheet you noticed that it's not alpha pre-multiplied and kind of mentioned that it might be a problem. I was wondering why. 2) You mentioned that they don't use MIP maps and that got me thinking… How should we produce MIPs for pixel art assets? Box-filtering would not work since that's made for bilinear]
[3:02:44][@zrizi][Q: 1) You're right. But I think Unity's shaders were set up for point :sampling]
[3:03:47][@zrizi][Q: And thank you very much for the skin mesh video. I still have to watch it]
[3:04:00][Time for bed, with a plug of Molly Rocket's Discord channel[ref
site=Discord
page=MollyRocket
url=https://discord.gg/mollyrocket]][:speech]
[/video]