[video output=day556 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Optimizing Depth Peeling and Multisample Resolves" vod_platform=youtube id=M6qE6ncZV68 annotator=Miblo]
[0:02][Plug Molly Rocket's Discord channel[ref
site=Discord
page=MollyRocket
url=https://discord.gg/mollyrocket]][:speech]
[1:44][Recap and set the stage for the day with praise for RenderDoc][:speech]
[3:15][Note that lots of our time is spent doing the multisample resolve][:performance :rendering :speech]
[7:32][Configure our project in RenderDoc][:admin]
[8:42][Capture a frame in RenderDoc and consult the event timings to see that the multisample buffer took 10 times longer to resolve than to draw][:performance :rendering :run]
[10:54][The possible bandwidth cost of resolving our multisample buffer][:performance :rendering :run]
[14:16][Our best case solution: Resolve the multisample buffer independently of the depth peel][:performance :rendering :run]
[16:29][Demo ~Milton's new grid feature, noting its smoothing bug][:blackboard]
[17:32][:Rendering requirements of Sprites vs Geometry][:blackboard]
[20:16][Demo some undesirable alpha blending when traversing stairs][:rendering :run]
[22:15][Consider that two multisample resolves may be necessary][:performance :rendering :run]
[24:00][Plan to render the sprites with depth peel in a separate pass, then composite in the geometry with two multisample resolves][:blackboard :performance :rendering]
[30:17][Segregating our sprites and geometry into separate buffers][:blackboard :performance :memory]
[33:28][Hesitate to impose this separation requirement on the renderer][:blackboard :library]
[35:31][Producing a separate multisample buffer of the edge information][:blackboard :performance :rendering]
[36:52][The edge information in question][:performance :rendering :run]
[38:53][Using conservative rasterization to enable recovery of our edge blend from a single high- / low-coverage multisample resolve][:performance :rendering :run]
[43:48][Using conservative rasterization just to tell us how much a pixel is covered by a primitive][:performance :rendering :run]
[45:28][:Research conservative rasterization[ref
site="NVIDIA GameWorks Documentation"
page="Conservative Rasterization Sample"
url=https://docs.nvidia.com/gameworks/content/gameworkslibrary/graphicssamples/opengl_samples/conservativerasterizationsample.htm][ref
author="Jon Story"
title="Don't be conservative with Conservative Rasterization"
publisher="NVIDIA GameWorks Blog"
url=https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization][ref
site="NVIDIA Developer"
page="NV_conservative_raster"
url=https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_conservative_raster.txt]][:rendering]
[49:14][Hunt for a coverage-to-alpha function in NV_shading_rate_image,[ref
site="Khronos"
page="NV_shading_rate_image"
url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shading_rate_image.txt] ARB_sample_locations[ref
site="Khronos"
page="ARB_sample_locations"
url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_sample_locations.txt] and ARB_sample_shading[ref
site="Khronos"
page="ARB_sample_shading"
url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_sample_shading.txt]][:rendering :research]
[58:13][Find that conservative rasterization is not widely available[ref
site="OpenGL Hardware Database"
url=https://opengl.gpuinfo.org/]][:rendering :research]
[59:10][Consider running the multisample routine without a multisample buffer, only recording which samples were covered[ref
site="Khronos Wiki"
page="Multisampling"
url=https://www.khronos.org/opengl/wiki/Multisampling][ref
site="Khronos"
page="NV_multisample_coverage"
url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_multisample_coverage.txt][ref
site="Khronos"
page="NV_multisample_filter_hint"
url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_multisample_filter_hint.txt]][:rendering :research]
[1:05:40][Consider attaching a multisample and non-multisample render target at the same time][:rendering :research]
[1:07:05][Consult glext.h for coverage-related functions[ref
site="Khronos"
page="glext.h"
url=https://www.khronos.org/registry/OpenGL/api/GL/glext.h]][:rendering :research]
[1:11:25][:Research NV_fragment_coverage_to_color[ref
site="Khronos"
page="NV_fragment_coverage_to_color"
url=https://www.khronos.org/registry/OpenGL/extensions/NV/NV_fragment_coverage_to_color.txt][ref
site="OpenGL Hardware Database"
url=https://opengl.gpuinfo.org/]][:rendering]
[1:14:11][:Research ARB_post_depth_coverage[ref
site="Khronos"
page="ARB_post_depth_coverage"
url=https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_post_depth_coverage.txt] and gl_SampleMaskIn[ref
site="Khronos"
page="gl_SampleMaskIn"
url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/gl_SampleMaskIn.xhtml]][:rendering]
[1:17:53][NVIDIA request: When enabling conservative rasterization, let us set the SampleMask to the number of samples we want][:rendering :speech]
[1:19:08][Lament the tremendous amount of bandwidth required to smooth out our edges][:memory :rendering :speech]
[1:20:36][Reflect on the need for the multisample buffer to tell if primitives are coplanar][:rendering :run]
[1:21:40][Consider augmenting our Colour Pass shader to skip pixels whose prior pass produced a fully opaque pixel][:rendering :run]
[1:28:10][Make CompileResolveMultisample() set the gl_FragDepth of opaque pixels to 1.0f, enabling CompileZBiasProgram() to discard obscured pixels][:rendering]
[1:36:24][Capture a frame to see that depth sorting is still working, and our Colour Passes are now more efficient][:rendering :run]
[1:38:28][Switch to the non-multisampling fast path in CompileResolveMultisample()][:rendering]
[1:39:50][Capture a frame to see that our glDrawArrays() calls have sped up][:rendering :run]
[1:41:01][Options for creating our fast path in CompileResolveMultisample(): 1. Read from the previous depth peel and do not resolve opaque pixels; 2. Resolve into a separate "blend" buffer][:rendering :speech]
[1:43:30][Make CompileResolveMultisample() only blend non-opaque pixels][:rendering]
[1:44:42][Reacquaint ourselves with the final CompilePeelComposite() with a view to instead accumulating the colour as we go][:rendering :research]
[1:46:39][Introduce a MaskSampler in CompileResolveMultisample() to contain the opacity][:rendering]
[1:48:49][Crash the game under RenderDoc][:run]
[1:49:11][Hit a shader error "unable to find overloaded function texelFetch()"][:rendering :run]
[1:49:45][Fix CompileResolveMultisample() to fetch the Mask's texel from the 0th texture][:rendering]
[1:50:24][See that the multisampling is a little busted][:rendering :run]
[1:51:01][Just let CompileResolveMultisample() always blend, including some shader parser mayhem][:rendering]
[1:53:16][See that the multisampled artefacts are gone][:rendering :run]
[1:53:42][Make CompileResolveMultisample() initialise the samplers in the order in which they are passed to OpenGLLinkSamplers()][:rendering]
[1:56:56][Make OpenGLEndFrame() set the Mask for CompileResolveMultisample() to acquire pixel opacity, introducing a SinglePixelAllZeroesTexture][:rendering]
[2:07:05][See a few multisampling artefacts in there][:rendering :run]
[2:08:08][Quickly scrutinise our new Mask code][:rendering :research]
[2:09:18][Capture a frame to see that our second and fourth Colour Passes are not as expected][:rendering :run]
[2:10:01][Check the Mask test in CompileResolveMultisample()][:rendering :research]
[2:11:00][Take a close look at our first glDrawArrays() call, to see that our MaskSampler is 0×0 pixels][:rendering :run]
[2:12:57][Fix OpenGLInit() to correctly bind our SinglePixelAllZeroesTexture][:rendering]
[2:13:19][Capture a frame to see that our third Colour Pass drew much more than expected][:rendering :run]
[2:15:29][Make CompileResolveMultisample() set gl_FragDepth of opaque pixels][:rendering]
[2:16:46][Crash in RenderDoc][:run]
[2:17:11][Make CompileResolveMultisample() set BlendUnitColor of opaque pixels][:rendering]
[2:17:36][Battle with the shader parser][:rendering :programming :run]
[2:21:16][Prevent CompileResolveMultisample() from setting the BlendUnitColor of opaque pixels][:rendering]
[2:21:39][Capture a frame to see that we are still busted][:rendering :run]
[2:24:31][Make CompileResolveMultisample() always blend][:rendering]
[2:24:40][Find that the shader parser is unhappy, and wonder if it's a problem with ~4coder's virtual whitespace][:rendering :run]
[2:25:14][Revert CompileResolveMultisample() to a supposedly working state][:rendering]
[2:25:48][Find that the shader parser continues to be unhappy][:rendering :run]
[2:26:48][Change all the comments in CompileResolveMultisample() to be C-style ones][:language]
[2:27:23][Find that that doesn't solve the problem][:rendering :run]
[2:28:00][See how ~RemedyBG sees the code in CompileResolveMultisample()][:language :run]
[2:30:06][Trim out some possibly problematic code from CompileResolveMultisample()][:language]
[2:30:18][Find that the shader parser is happy now][:language :run]
[2:30:35][Make CompileResolveMultisample() set the Mask to 0.0][:rendering]
[2:31:08][Find that the shader parser remains happy, but we still see our multisampling artefact][:rendering :run]
[2:31:23][Make CompileResolveMultisample() always blend][:rendering]
[2:31:33][See our multisampling artefact][:rendering :run]
[2:32:09][Leave in our fast path in CompileResolveMultisample()][:rendering]
[2:32:33][See new artefacts][:rendering :run]
[2:32:52][Make CompileResolveMultisample() always blend][:rendering]
[2:33:05][See that everything looks nice and smooth][:rendering :run]
[2:33:53][Reinsert our fast path in CompileResolveMultisample()][:rendering]
[2:34:06][See new MIP map anisotropic filtering artefacts][:rendering :run]
[2:34:44][Capture a frame to see that our third and fourth Colour Passes are not as free as expected][:rendering :run]
[2:35:11][Fix CompileResolveMultisample() to black out the BlendUnitColor of opaque pixels][:rendering]
[2:36:16][Capture a frame to see that our Colour Passes are way more efficient][:rendering :run]
[2:38:31][Q&A][:speech]
[2:38:38][@dithinas][Q: Are you exceeding the size of your buffer for the shader code text? The EOF error seems kind of suspicious]
[2:38:50][Make CompileResolveMultisample() record the shader string BufferSize][:language]
[2:39:42][See that the shader string is 3877 characters][:language :run]
[2:39:59][Increase the FragmentCode buffer size from 4096 to 16000][:language]
[2:41:04][Find that the shader parser is happy, with a few words on tooling][:language :run]
[2:43:33][@sagian2005][Q: Couldn't you just not define the dimension and let the compiler figure out the size?][:language]
[2:44:10][@letambourinroyal][Q: Why can't you just run the preprocessor on the shaders, though?][:language]
[2:46:53][@dithinas][Q: Do you have any concept of like a string builder that you could use with your format string function to have it pass you back a string allocated from temp memory? It's a small thing, but you won't need to think / worry about that anymore][:"string manipulation"]
[2:48:06][@enyo_enev][Q: I think that Dual Depth Peeling.[ref
site="NVIDIA Developer"
page="Dual Depth Peeling"
url=http://developer.download.nvidia.com/SDK/10/opengl/screenshots/samples/dual_depth_peeling.html][ref
author="Louis Bavoil, Kevin Myers"
title="Order Independent Transparency with Dual Depth Peeling"
publisher="NVIDIA Corporation"
url=http://developer.download.nvidia.com/SDK/10/opengl/src/dual_depth_peeling/doc/DualDepthPeeling.pdf] This paper may be able to help you out, because I looked at their OpenGL code and it seems possible to resolve only once when you render to the screen. You can download the code and see. I am not sure. It is pure OpenGL. They have not enabled multisampling, however I think it is possible to work][:rendering]
[2:53:20][@samyuutsu][Q: Have you considered using RenderDoc directly to recompile the shaders at runtime during rapid development?]
[2:54:40][@eddiesutrecht][Q: Hi [@cmuratori Casey], some episodes ago you said you didn't understand why [@naysayer88 Jon] credited you on Braid, since you didn't do any work on it. But you wrote the collision detector, right? Maybe I misunderstood and it was about the rewind only]
[2:59:08][@vaualbus][Q: How would you implement in the debug system a way of also changing v3 / v4 values?][:"debug system"]
[2:59:56][@zrizi][Q: I just watched the video about sub-pixel :sampling for pixel art assets. Thank you for explaining it, was really good. I have two related questions, though: 1) When you viewed the sprite sheet you noticed that it's not alpha pre-multiplied and kind of mentioned that it might be a problem. I was wondering why. 2) You mentioned that they don't use MIP maps and that got me thinking… How should we produce MIPs for pixel art assets? Box-filtering would not work since that's made for bilinear]
[3:02:44][@zrizi][Q: 1) You're right. But I think Unity's shaders were set up for point :sampling]
[3:03:47][@zrizi][Q: And thank you very much for the skin mesh video. I still have to watch it]
[3:04:00][Time for bed, with a plug of Molly Rocket's Discord channel[ref
site=Discord
page=MollyRocket
url=https://discord.gg/mollyrocket]][:speech]
[/video]