diff --git a/cmuratori/hero/code/code556.hmml b/cmuratori/hero/code/code556.hmml
new file mode 100644
index 0000000..84348b1
--- /dev/null
+++ b/cmuratori/hero/code/code556.hmml
@@ -0,0 +1,157 @@
+[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Optimizing Depth Peeling and Multisample Resolves" vod_platform=youtube id=M6qE6ncZV68 annotator=Miblo]
+[0:02][Plug Molly Rocket's Discord channel[ref
+    site=Discord
+    page=MollyRocket
+    url=https://discord.gg/mollyrocket]][:speech]
+[1:44][Recap and set the stage for the day with praise for RenderDoc][:speech]
+[3:15][Note that lots of our time is spent doing the multisample resolve][:performance :rendering :speech]
+[7:32][Configure our project in RenderDoc][:admin]
+[8:42][Capture a frame in RenderDoc and consult the event timings to see that the multisample buffer took 10 times longer to resolve than to draw][:performance :rendering :run]
+[10:54][The possible bandwidth cost of resolving our multisample buffer][:performance :rendering :run]
+[14:16][Our best case solution: Resolve the multisample buffer independently of the depth peel][:performance :rendering :run]
+[16:29][Demo ~Milton's new grid feature, and its smoothing bug][:blackboard]
+[17:32][:Rendering requirements of Sprites vs Geometry][:blackboard]
+[20:16][Demo some undesirable alpha blending when traversing stairs][:rendering :run]
+[22:15][Consider that two multisample resolves may be necessary][:performance :rendering :run]
+[24:00][Plan to render the sprites with depth peel in a separate pass, then composite in the geometry with two multisample resolves][:blackboard :performance :rendering]
+[30:17][Segregating our sprites and geometry into separate buffers][:blackboard :performance :memory]
+[33:28][Hesitate to impose this separation requirement on the renderer][:blackboard :library]
+[35:31][Producing a separate multisample buffer of the edge information][:blackboard :performance :rendering]
+[36:52][The edge information in question][:performance :rendering :run]
+[38:53][Using conservative rasterization to enable recovery of our edge blend from a single high- / low-coverage multisample resolve][:performance :rendering :run]
+[43:48][Using conservative rasterization just to tell us how much a pixel is covered by a primitive][:performance :rendering :run]
+[45:28][:Research conservative rasterization[ref
+    site="NVIDIA GameWorks Documentation"
+    page="Conservative Rasterization Sample"
+    url=https://docs.nvidia.com/gameworks/content/gameworkslibrary/graphicssamples/opengl_samples/conservativerasterizationsample.htm][ref
+    author="Jon Story"
+    title="Don't be conservative with Conservative Rasterization"
+    publisher="NVIDIA GameWorks Blog"
+    url=https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization][ref
+    site="NVIDIA Developer"
+    page="NV_conservative_raster"
+    url=https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_conservative_raster.txt]][:rendering]
+[49:14][Hunt for a coverage-to-alpha function, in NV_shading_rate_image,[ref
+    site="Khronos"
+    page="NV_shading_rate_image"
+    url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shading_rate_image.txt] ARB_sample_locations[ref
+    site="Khronos"
+    page="ARB_sample_locations"
+    url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_sample_locations.txt] and ARB_sample_shading[ref
+    site="Khronos"
+    page="ARB_sample_shading"
+    url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_sample_shading.txt]][:rendering :research]
+[58:13][Find that conservative rasterization is not widely available[ref
+    site="OpenGL Hardware Database"
+    url=https://opengl.gpuinfo.org/]][:rendering :research]
+[59:10][Consider running the multisample routine without a multisample buffer, only recording which samples were covered[ref
+    site="Khronos Wiki"
+    page="Multisampling"
+    url=https://www.khronos.org/opengl/wiki/Multisampling][ref
+    site="Khronos"
+    page="NV_multisample_coverage"
+    url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_multisample_coverage.txt][ref
+    site="Khronos"
+    page="NV_multisample_filter_hint"
+    url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_multisample_filter_hint.txt]][:rendering :research]
+[1:05:40][Consider attaching a multisample and non-multisample render target at the same time][:rendering :research]
+[1:07:05][Consult glext.h for coverage-related functions[ref
+    site="Khronos"
+    page="glext.h"
+    url=https://www.khronos.org/registry/OpenGL/api/GL/glext.h]][:rendering :research]
+[1:11:25][:Research NV_fragment_coverage_to_color[ref
+    site="Khronos"
+    page="NV_fragment_coverage_to_color"
+    url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_fragment_coverage_to_color.txt][ref
+    site="OpenGL Hardware Database"
+    url=https://opengl.gpuinfo.org/]][:rendering]
+[1:14:11][:Research ARB_post_depth_coverage[ref
+    site="Khronos"
+    page="ARB_post_depth_coverage"
+    url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_post_depth_coverage.txt] and gl_SampleMaskIn[ref
+    site="Khronos"
+    page="gl_SampleMaskIn"
+    url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/gl_SampleMaskIn.xhtml]][:rendering]
+[1:17:53][NVIDIA request: When enabling conservative rasterization, let us set the SampleMask to the number of samples we want][:rendering :speech]
+[1:19:08][Lament the tremendous amount of bandwidth required to smooth out our edges][:memory :rendering :speech]
+[1:20:36][Reflect on the need for the multisample buffer to tell if primitives are coplanar][:rendering :run]
+[1:21:40][Consider augmenting our Colour Pass shader to skip pixels whose prior pass produced a fully opaque pixel][:rendering :run]
+[1:28:10][Make CompileResolveMultisample() set the gl_FragDepth of opaque pixels to 1.0f, enabling CompileZBiasProgram() to discard obscured pixels][:rendering]
+[1:36:24][Capture a frame to see that depth sorting is still working, and our Colour Passes are now more efficient][:rendering :run]
+[1:38:28][Switch to the non-multisampling fast path in CompileResolveMultisample()][:rendering]
+[1:39:50][Capture a frame to see that our glDrawArrays() calls have sped up][:rendering :run]
+[1:41:01][Options for creating our fast path in CompileResolveMultisample(): 1. Read from the previous depth peel and do not resolve opaque pixels; 2. Resolve into a separate "blend" buffer][:rendering :speech]
+[1:43:30][Make CompileResolveMultisample() only blend non-opaque pixels][:rendering]
+[1:44:42][Reacquaint ourselves with the final CompilePeelComposite() with a view to instead accumulating the colour as we go][:rendering :research]
+[1:46:39][Introduce a MaskSampler in CompileResolveMultisample() to contain the opacity][:rendering]
+[1:48:49][Crash the game under RenderDoc][:run]
+[1:49:11][Hit a shader error "unable to find overloaded function texelFetch()"][:rendering :run]
+[1:49:45][Fix CompileResolveMultisample() to fetch the Mask's texel from the 0th texture][:rendering]
+[1:50:24][See that the multisampling is a little busted][:rendering :run]
+[1:51:01][Just let CompileResolveMultisample() always blend, including some shader parser mayhem][:rendering]
+[1:53:16][See that the multisampled artefacts are gone][:rendering :run]
+[1:53:42][Make CompileResolveMultisample() initialise the samplers in the order in which they are passed to OpenGLLinkSamplers()][:rendering]
+[1:56:56][Make OpenGLEndFrame() set the Mask for CompileResolveMultisample() to acquire pixel opacity, introducing a SinglePixelAllZeroesTexture][:rendering]
+[2:07:05][See a few multisampling artefacts in there][:rendering :run]
+[2:08:08][Quickly scrutinise our new Mask code][:rendering :research]
+[2:09:18][Capture a frame to see that our second and fourth Colour Passes are not as expected][:rendering :run]
+[2:10:01][Check the Mask test in CompileResolveMultisample()][:rendering :research]
+[2:11:00][Take a close look at our first glDrawArrays() call, to see that our MaskSampler is 0×0 pixels][:rendering :run]
+[2:12:57][Fix OpenGLInit() to correctly bind our SinglePixelAllZeroesTexture][:rendering]
+[2:13:19][Capture a frame to see that our third Colour Pass drew much more than expected][:rendering :run]
+[2:15:29][Make CompileResolveMultisample() set gl_FragDepth of opaque pixels][:rendering]
+[2:16:46][Crash in RenderDoc][:run]
+[2:17:11][Make CompileResolveMultisample() set BlendUnitColor of opaque pixels][:rendering]
+[2:17:36][Battle with the shader parser][:rendering :programming :run]
+[2:21:16][Prevent CompileResolveMultisample() from setting the BlendUnitColor of opaque pixels][:rendering]
+[2:21:39][Capture a frame to see that we are still busted][:rendering :run]
+[2:24:31][Make CompileResolveMultisample() always blend][:rendering]
+[2:24:40][Find that the shader parser is unhappy, and wonder if it's a problem with ~4coder's virtual whitespace][:rendering :run]
+[2:25:14][Revert CompileResolveMultisample() to a supposedly working state][:rendering]
+[2:25:48][Find that the shader parser continues to be unhappy][:rendering :run]
+[2:26:48][Change all the comments in CompileResolveMultisample() to be C-style ones][:language]
+[2:27:23][Find that that doesn't solve the problem][:rendering :run]
+[2:28:00][See how ~RemedyBG sees the code in CompileResolveMultisample()][:language :run]
+[2:30:06][Trim out some possibly problematic code from CompileResolveMultisample()][:language]
+[2:30:18][Find that the shader parser is happy now][:language :run]
+[2:30:35][Make CompileResolveMultisample() set the Mask to 0.0][:rendering]
+[2:31:08][Find that the shader parser remains happy, but we still see our multisampling artefact][:rendering :run]
+[2:31:23][Make CompileResolveMultisample() always blend][:rendering]
+[2:31:33][See our multisampling artefact][:rendering :run]
+[2:32:09][Leave in our fast path in CompileResolveMultisample()][:rendering]
+[2:32:33][See new artefacts][:rendering :run]
+[2:32:52][Make CompileResolveMultisample() always blend][:rendering]
+[2:33:05][See that everything looks nice and smooth][:rendering :run]
+[2:33:53][Reinsert our fast path in CompileResolveMultisample()][:rendering]
+[2:34:06][See new MIP map anisotropic filtering artefacts][:rendering :run]
+[2:34:44][Capture a frame to see that our third and fourth Colour Passes are not as free as expected][:rendering :run]
+[2:35:11][Fix CompileResolveMultisample() to black out the BlendUnitColor of opaque pixels][:rendering]
+[2:36:16][Capture a frame to see that our Colour Passes are way more efficient][:rendering :run]
+[2:38:31][Q&A][:speech]
+[2:38:38][@dithinas][Q: Are you exceeding the size of your buffer for the shader code text? The EOF error seems kind of suspicious]
+[2:38:50][Make CompileResolveMultisample() record the shader string BufferSize][:language]
+[2:39:42][See that the shader string is 3877 characters][:language :run]
+[2:39:59][Increase the FragmentCode buffer size from 4096 to 16000][:language]
+[2:41:04][Find that the shader parser is happy, with a few words on tooling][:language :run]
+[2:43:33][@sagian2005][Q: Couldn't you just not define the dimension and let the compiler figure out the size?][:language]
+[2:44:10][@letambourinroyal][Q: Why can’t you just run the preprocessor on the shaders, though?][:language]
+[2:46:53][@dithinas][Q: Do you have any concept of like a string builder that you could use with your format string function to have it pass you back a string allocated from temp memory? It's a small thing, but you won't need to think / worry about that anymore][:"string manipulation"]
+[2:48:06][@enyo_enev][Q: I think that Dual Depth Peeling.[ref
+    site="NVIDIA Developer"
+    page="Dual Depth Peeling"
+    url=http://developer.download.nvidia.com/SDK/10/opengl/screenshots/samples/dual_depth_peeling.html][ref
+    author="Louis Bavoil, Kevin Myers"
+    title="Order Independent Transparency with Dual Depth Peeling"
+    publisher="NVIDIA Corporation"
+    url=http://developer.download.nvidia.com/SDK/10/opengl/src/dual_depth_peeling/doc/DualDepthPeeling.pdf] This paper may be able to help you out, because I looked at their OpenGL code and it seems possible to resolve only once when you render to the screen. You can download the code and see. I am not sure. It is pure OpenGL. They have not enabled multisampling, however I think it is possible to work][:rendering]
+[2:53:20][@samyuutsu][Q: Have you considered using RenderDoc directly to recompile the shaders at runtime during rapid development?]
+[2:54:40][@eddiesutrecht][Q: Hi [@cmuratori Casey], some episodes ago you said you didn't understand why [@naysayer88 Jon] credited you on Braid, since you didn't do any work on it. But you wrote the collision detector, right? Maybe I misunderstood and it was about the rewind only]
+[2:59:08][@vaualbus][Q: How you would implement in debug systems the way of changing also v3 / v4, values?][:"debug system"]
+[2:59:56][@zrizi][Q: I just watched the video about sub-pixel :sampling for pixel art assets. Thank you for explaining it, was really good. I have two related questions, though: 1) When you viewed the sprite sheet you noticed that it’s not alpha pre-multiplied and kind of mentioned that it might be a problem. I was wondering why. 2) You mentioned that they don’t use MIP maps and that got me thinking… How should we produce MIPs for pixel art assets? Box-filtering would not work since that’s made for bilinear]
+[3:02:44][@zrizi][Q: 1) You're right. But I think Unity's shaders were set up for point :sampling]
+[3:03:47][@zrizi][Q: And thank you very much for the skin mesh video. I still have to watch it]
+[3:04:00][Time for bed, with a plug of Molly Rocket's Discord channel[ref
+    site=Discord
+    page=MollyRocket
+    url=https://discord.gg/mollyrocket]][:speech]
+[/video]