[video output=day556 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Optimizing Depth Peeling and Multisample Resolves" vod_platform=youtube id=M6qE6ncZV68 annotator=Miblo] [0:02][Plug Molly Rocket's Discord channel[ref site=Discord page=MollyRocket url=https://discord.gg/mollyrocket]][:speech] [1:44][Recap and set the stage for the day with praise for RenderDoc][:speech] [3:15][Note that lots of our time is spent doing the multisample resolve][:performance :rendering :speech] [7:32][Configure our project in RenderDoc][:admin] [8:42][Capture a frame in RenderDoc and consult the event timings to see that the multisample buffer took 10 times longer to resolve than to draw][:performance :rendering :run] [10:54][The possible bandwidth cost of resolving our multisample buffer][:performance :rendering :run] [14:16][Our best case solution: Resolve the multisample buffer independently of the depth peel][:performance :rendering :run] [16:29][Demo ~Milton's new grid feature, but with its smoothing bug][:blackboard] [17:32][:Rendering requirements of Sprites vs Geometry][:blackboard] [20:16][Demo some undesirable alpha blending when traversing stairs][:rendering :run] [22:15][Consider that two multisample resolves may be necessary][:performance :rendering :run] [24:00][Plan to render the sprites with depth peel in a separate pass, then composite in the geometry with two multisample resolves][:blackboard :performance :rendering] [30:17][Segregating our sprites and geometry into separate buffers][:blackboard :performance :memory] [33:28][Hesitate to impose this separation requirement on the renderer][:blackboard :library] [35:31][Producing a separate multisample buffer of the edge information][:blackboard :performance :rendering] [36:52][The edge information in question][:performance :rendering :run] [38:53][Using conservative rasterization to enable recovery of our edge blend from a single high- / low-coverage multisample 
resolve][:performance :rendering :run] [43:48][Using conservative rasterization just to tell us how much a pixel is covered by a primitive][:performance :rendering :run] [45:28][:Research conservative rasterization[ref site="NVIDIA GameWorks Documentation" page="Conservative Rasterization Sample" url=https://docs.nvidia.com/gameworks/content/gameworkslibrary/graphicssamples/opengl_samples/conservativerasterizationsample.htm][ref author="Jon Story" title="Don't be conservative with Conservative Rasterization" publisher="NVIDIA GameWorks Blog" url=https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization][ref site="NVIDIA Developer" page="NV_conservative_raster" url=https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_conservative_raster.txt]][:rendering] [49:14][Hunt for a coverage-to-alpha function, in NV_shading_rate_image,[ref site="Khronos" page="NV_shading_rate_image" url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shading_rate_image.txt] ARB_sample_locations[ref site="Khronos" page="ARB_sample_locations" url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_sample_locations.txt] and ARB_sample_shading[ref site="Khronos" page="ARB_sample_shading" url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_sample_shading.txt]][:rendering :research] [58:13][Find that conservative rasterization is not widely available[ref site="OpenGL Hardware Database" url=https://opengl.gpuinfo.org/]][:rendering :research] [59:10][Consider running the multisample routine without a multisample buffer, only recording which samples were covered[ref site="Khronos Wiki" page="Multisampling" url=https://www.khronos.org/opengl/wiki/Multisampling][ref site="Khronos" page="NV_multisample_coverage" url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_multisample_coverage.txt][ref site="Khronos" page="NV_multisample_filter_hint" 
url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_multisample_filter_hint.txt]][:rendering :research] [1:05:40][Consider attaching a multisample and non-multisample render target at the same time][:rendering :research] [1:07:05][Consult glext.h for coverage-related functions[ref site="Khronos" page="glext.h" url=https://www.khronos.org/registry/OpenGL/api/GL/glext.h]][:rendering :research] [1:11:25][:Research NV_fragment_coverage_to_color[ref site="Khronos" page="NV_fragment_coverage_to_color" url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_fragment_coverage_to_color.txt][ref site="OpenGL Hardware Database" url=https://opengl.gpuinfo.org/]][:rendering] [1:14:11][:Research ARB_post_depth_coverage[ref site="Khronos" page="ARB_post_depth_coverage" url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_post_depth_coverage.txt] and gl_SampleMaskIn[ref site="Khronos" page="gl_SampleMaskIn" url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/gl_SampleMaskIn.xhtml]][:rendering] [1:17:53][NVIDIA request: When enabling conservative rasterization, let us set the SampleMask to the number of samples we want][:rendering :speech] [1:19:08][Lament the tremendous amount of bandwidth required to smooth out our edges][:memory :rendering :speech] [1:20:36][Reflect on the need for the multisample buffer to tell if primitives are coplanar][:rendering :run] [1:21:40][Consider augmenting our Colour Pass shader to skip pixels whose prior pass produced a fully opaque pixel][:rendering :run] [1:28:10][Make CompileResolveMultisample() set the gl_FragDepth of opaque pixels to 1.0f, enabling CompileZBiasProgram() to discard obscured pixels][:rendering] [1:36:24][Capture a frame to see that depth sorting is still working, and our Colour Passes are now more efficient][:rendering :run] [1:38:28][Switch to the non-multisampling fast path in CompileResolveMultisample()][:rendering] [1:39:50][Capture a frame to see that our glDrawArrays() calls have sped 
up][:rendering :run] [1:41:01][Options for creating our fast path in CompileResolveMultisample(): 1. Read from the previous depth peel and do not resolve opaque pixels; 2. Resolve into a separate "blend" buffer][:rendering :speech] [1:43:30][Make CompileResolveMultisample() only blend non-opaque pixels][:rendering] [1:44:42][Reacquaint ourselves with the final CompilePeelComposite() with a view to instead accumulating the colour as we go][:rendering :research] [1:46:39][Introduce a MaskSampler in CompileResolveMultisample() to contain the opacity][:rendering] [1:48:49][Crash the game under RenderDoc][:run] [1:49:11][Hit a shader error "unable to find overloaded function texelFetch()"][:rendering :run] [1:49:45][Fix CompileResolveMultisample() to fetch the Mask's texel from the 0th texture][:rendering] [1:50:24][See that the multisampling is a little busted][:rendering :run] [1:51:01][Just let CompileResolveMultisample() always blend, including some shader parser mayhem][:rendering] [1:53:16][See that the multisampled artefacts are gone][:rendering :run] [1:53:42][Make CompileResolveMultisample() initialise the samplers in the order in which they are passed to OpenGLLinkSamplers()][:rendering] [1:56:56][Make OpenGLEndFrame() set the Mask for CompileResolveMultisample() to acquire pixel opacity, introducing a SinglePixelAllZeroesTexture][:rendering] [2:07:05][See a few multisampling artefacts in there][:rendering :run] [2:08:08][Quickly scrutinise our new Mask code][:rendering :research] [2:09:18][Capture a frame to see that our second and fourth Colour Passes are not as expected][:rendering :run] [2:10:01][Check the Mask test in CompileResolveMultisample()][:rendering :research] [2:11:00][Take a close look at our first glDrawArrays() call, to see that our MaskSampler is 0×0 pixels][:rendering :run] [2:12:57][Fix OpenGLInit() to correctly bind our SinglePixelAllZeroesTexture][:rendering] [2:13:19][Capture a frame to see that our third Colour Pass drew much more than 
expected][:rendering :run] [2:15:29][Make CompileResolveMultisample() set gl_FragDepth of opaque pixels][:rendering] [2:16:46][Crash in RenderDoc][:run] [2:17:11][Make CompileResolveMultisample() set BlendUnitColor of opaque pixels][:rendering] [2:17:36][Battle with the shader parser][:rendering :programming :run] [2:21:16][Prevent CompileResolveMultisample() from setting the BlendUnitColor of opaque pixels][:rendering] [2:21:39][Capture a frame to see that we are still busted][:rendering :run] [2:24:31][Make CompileResolveMultisample() always blend][:rendering] [2:24:40][Find that the shader parser is unhappy, and wonder if it's a problem with ~4coder's virtual whitespace][:rendering :run] [2:25:14][Revert CompileResolveMultisample() to a supposedly working state][:rendering] [2:25:48][Find that the shader parser continues to be unhappy][:rendering :run] [2:26:48][Change all the comments in CompileResolveMultisample() to be C-style ones][:language] [2:27:23][Find that that doesn't solve the problem][:rendering :run] [2:28:00][See how ~RemedyBG sees the code in CompileResolveMultisample()][:language :run] [2:30:06][Trim out some possibly problematic code from CompileResolveMultisample()][:language] [2:30:18][Find that the shader parser is happy now][:language :run] [2:30:35][Make CompileResolveMultisample() set the Mask to 0.0][:rendering] [2:31:08][Find that the shader parser remains happy, but we still see our multisampling artefact][:rendering :run] [2:31:23][Make CompileResolveMultisample() always blend][:rendering] [2:31:33][See our multisampling artefact][:rendering :run] [2:32:09][Leave in our fast path in CompileResolveMultisample()][:rendering] [2:32:33][See new artefacts][:rendering :run] [2:32:52][Make CompileResolveMultisample() always blend][:rendering] [2:33:05][See that everything looks nice and smooth][:rendering :run] [2:33:53][Reinsert our fast path in CompileResolveMultisample()][:rendering] [2:34:06][See new MIP map anisotropic filtering 
artefacts][:rendering :run] [2:34:44][Capture a frame to see that our third and fourth Colour Passes are not as free as expected][:rendering :run] [2:35:11][Fix CompileResolveMultisample() to black out the BlendUnitColor of opaque pixels][:rendering] [2:36:16][Capture a frame to see that our Colour Passes are way more efficient][:rendering :run] [2:38:31][Q&A][:speech] [2:38:38][@dithinas][Q: Are you exceeding the size of your buffer for the shader code text? The EOF error seems kind of suspicious] [2:38:50][Make CompileResolveMultisample() record the shader string BufferSize][:language] [2:39:42][See that the shader string is 3877 characters][:language :run] [2:39:59][Increase the FragmentCode buffer size from 4096 to 16000][:language] [2:41:04][Find that the shader parser is happy, with a few words on tooling][:language :run] [2:43:33][@sagian2005][Q: Couldn't you just not define the dimension and let the compiler figure out the size?][:language] [2:44:10][@letambourinroyal][Q: Why can’t you just run the preprocessor on the shaders, though?][:language] [2:46:53][@dithinas][Q: Do you have any concept of like a string builder that you could use with your format string function to have it pass you back a string allocated from temp memory? It's a small thing, but you won't need to think / worry about that anymore][:"string manipulation"] [2:48:06][@enyo_enev][Q: I think that Dual Depth Peeling.[ref site="NVIDIA Developer" page="Dual Depth Peeling" url=http://developer.download.nvidia.com/SDK/10/opengl/screenshots/samples/dual_depth_peeling.html][ref author="Louis Bavoil, Kevin Myers" title="Order Independent Transparency with Dual Depth Peeling" publisher="NVIDIA Corporation" url=http://developer.download.nvidia.com/SDK/10/opengl/src/dual_depth_peeling/doc/DualDepthPeeling.pdf] This paper may be able to help you out, because I looked at their OpenGL code and it seems possible to resolve only once when you render to the screen. You can download the code and see. 
I am not sure. It is pure OpenGL. They have not enabled multisampling, however I think it is possible to work][:rendering] [2:53:20][@samyuutsu][Q: Have you considered using RenderDoc directly to recompile the shaders at runtime during rapid development?] [2:54:40][@eddiesutrecht][Q: Hi [@cmuratori Casey], some episodes ago you said you didn't understand why [@naysayer88 Jon] credited you on Braid, since you didn't do any work on it. But you wrote the collision detector, right? Maybe I misunderstood and it was about the rewind only] [2:59:08][@vaualbus][Q: How you would implement in debug systems the way of changing also v3 / v4, values?][:"debug system"] [2:59:56][@zrizi][Q: I just watched the video about sub-pixel :sampling for pixel art assets. Thank you for explaining it, was really good. I have two related questions, though: 1) When you viewed the sprite sheet you noticed that it’s not alpha pre-multiplied and kind of mentioned that it might be a problem. I was wondering why. 2) You mentioned that they don’t use MIP maps and that got me thinking… How should we produce MIPs for pixel art assets? Box-filtering would not work since that’s made for bilinear] [3:02:44][@zrizi][Q: 1) You're right. But I think Unity's shaders were set up for point :sampling] [3:03:47][@zrizi][Q: And thank you very much for the skin mesh video. I still have to watch it] [3:04:00][Time for bed, with a plug of Molly Rocket's Discord channel[ref site=Discord page=MollyRocket url=https://discord.gg/mollyrocket]][:speech] [/video]