[video output=day555 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Looking for GPU Performance Issues" vod_platform=youtube id=fduWZsh1riQ annotator=Miblo]
[0:02][Welcome to the stream][:speech]
[0:14][A few words on RenderDoc's crash message yesterday, with praise for their tech support, and plans to enable the game to fail gracefully when launched with incorrect parameters][:speech]
[3:48][Launch [~hero Handmade Hero] in RenderDoc][:run]
[4:13][Set the wrong Working Directory in RenderDoc][:admin]
[4:34][Crash RenderDoc upon launching [~hero Handmade Hero]][:run]
[6:12][Launch [~hero Handmade Hero] in ~RemedyBG with the Working Directory set wrong][:run]
[7:15][Hit our assertion in GetFontInfo()][:"asset system" :run]
[7:39][Make GetFontInfo() additionally assert that the Asset's Type is HHAAsset_None][:"asset system"]
[8:34][Hit our TextureIndex assertion in PushQuad()][:"asset system" :run]
[9:55][Enable all our asset Get*() functions to handle the absence of assets][:"asset system"]
[16:43][:Run successfully with our incorrect Working Directory]
[17:13][Enable AllocateGameAssets() to issue a warning notification when no assets were available][:"error handling"]
[23:17][Crash ~RemedyBG apparently on a jump-to-zero][:run]
[23:58][Create jump_to_zero_crash.cpp]
[25:07][:Run jump_to_zero to find that ~RemedyBG is fine with it]
[25:48][:Run [~hero Handmade Hero] successfully with our new warning code][:"error handling"]
[26:38][Introduce Win32ErrorMessage()][:"error handling" :"platform layer"]
[28:31][:Run and close cleanly][:"error handling" :"platform layer"]
[28:44][Make Win32ErrorMessage() print out an error message[ref
    site=MSDN
    page="MessageBoxExA function"
    url=https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-messageboxexa]][:"error handling" :"platform layer"]
[34:16][:Run and close with our warning box][:"error handling" :"platform layer"]
[34:39][Fix the MBoxType in Win32ErrorMessage()[ref
    site=MSDN
    page="MessageBox function"
    url=https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-messagebox]]
[35:46][:Run and close with our warning box][:"error handling" :"platform layer"]
[35:51][Temporarily try making AllocateGameAssets() produce a Fatal error][:"error handling"]
[36:17][:Run and close with our error box][:"error handling" :"platform layer"]
[36:29][Make WinMainCRTStartup() emit errors using Win32ErrorMessage()][:"error handling" :"platform layer"]
[38:58][:Run and close with our warning box][:"error handling" :"platform layer"]
[39:06][:Run unsuccessfully with the correct Working Directory][:"error handling" :"platform layer"]
[41:16][We've got a [@Molly saucy bean]][:speech]
[42:12][Fix GetBitmap() to correctly set the TextureHandle][:"asset system"]
[43:45][:Run successfully][:"asset system"]
[44:02][:Run in RenderDoc with the wrong Working Directory]
[44:25][Crash RenderDoc post-exit][:run]
[45:40][:Run in RenderDoc with the correct Working Directory, noting that we sometimes miss 60 FPS][:performance]
[46:54][Capture a frame in RenderDoc and see that it took 35590 µs][:run]
[48:48][Look at our four Colour Passes, and plan to submit vertices economically][:rendering :run]
[51:37][Make OpenGLInit() disable RequestVSync][:rendering]
[52:40][Find that our frame time hovers around 14ms per frame][:performance :rendering :run]
[53:27][Reacquaint ourselves with our render dispatch in OpenGLEndFrame()][:rendering :research]
[55:31][Understanding glMapBuffer()[ref
    site=docs.GL
    page=glMapBuffer
    url=http://docs.gl/gl3/glMapBuffer]][:api :rendering :research]
[57:57][Nsight rendering time: 14ms to 18ms / frame][:rendering :run]
[59:37][Scrub through Events to see that glDrawArrays takes 1.27ms][:rendering :run]
[1:00:35][Scrutinise ResolveMultisample() with a view to speeding it up][:rendering :research]
[1:05:00][Make CompileResolveMultisample() bake the SampleCount into the shader, to hopefully permit the loop to be unrolled][:hardware :rendering]
[1:08:03][Find that it actually works][:hardware :rendering :run]
[1:08:19][Make CompileResolveMultisample() bake InvSampleCount into the shader, and slightly reorganise it][:hardware :rendering]
[1:12:35][Find that that made no difference to the frame time, and that UpdateAndRenderEntities() takes a while][:hardware :performance :rendering :run]
[1:14:04][Temporarily disable DrawGroundCover()][:rendering]
[1:14:37][Find that Game Update takes less of the total time, but we are not hitting 60 FPS][:performance :rendering :run]
[1:17:11][Disable HANDMADE_SLOW]
[1:18:23][Find that Debug Collation takes a lot of the total time][:"debug system" :performance :run]
[1:18:46][Compile out some of the :"debug system" if HANDMADE_SLOW is off]
[1:20:43][See the debug UI][:"debug system" :run]
[1:21:07][Instead compile out that part of the :"debug system" if HANDMADE_SLOW is on, and rearrange the code to fix compile errors]
[1:23:33][Find that lots of the :"debug system" is absent][:run]
[1:24:06][Compile out timing if HANDMADE_SLOW is off][:"debug system"]
[1:25:05][Still see jerkiness with debug collation off][:performance :run]
[1:26:09][Compile in our frame marker in all situations][:"debug system"]
[1:26:45][See that our frame time is well below 16ms, but we are not actually seeing 60 FPS][:performance :run]
[1:27:22][Nsight rendering time: 10ms to 14ms / frame][:rendering :run]
[1:29:07][Capture a frame in Nsight to see that glDrawArrays remains by far the most expensive call][:performance :rendering :run]
[1:33:47][Hard set SampleCount to 1 in CompileResolveMultisample()][:rendering]
[1:34:44][Nsight rendering time: 4ms / frame][:rendering :run]
[1:35:03][Capture a frame in Nsight to find that the resolve and draw calls take similar times][:performance :rendering :run]
[1:36:09][Understanding multisampling][:performance :rendering :speech]
[1:37:47][See our crispy lines, without multisampling][:rendering :run]
[1:38:31][:Research multisampling in GLSL[ref
    site="OpenGL Registry"
    page="The OpenGL ES Shading Language"
    url=https://khronos.org/registry/OpenGL/specs/es/3.0/GLSL_ES_Specification_3.00.pdf][ref
    site="Khronos Wiki"
    page="Multisampling"
    url=https://www.khronos.org/opengl/wiki/Multisampling][ref
    site=NVIDIA
    page="Deferred Shading MSAA Sample"
    url=http://gameworksdocs.nvidia.com/GraphicsSamples/DeferredShadingMSAASample.htm][ref
    site="OpenGL 4 Reference Pages"
    page="texelFetch"
    url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/texelFetch.xhtml][ref
    author="Johan Andersson"
    title="DirectX 11 Rendering in Battlefield 3"
    url=http://www.dice.se/wp-content/uploads/2014/12/GDC11_DX11inBF3_Public.pdf]][:rendering]
[1:45:42][Spec out our desired fast path in CompileResolveMultisample(), based on there being only one sample in a multisample][:rendering]
[1:48:00][See nothing][:rendering :run]
[1:48:04][Revert CompileResolveMultisample()][:rendering]
[1:50:08][See everything][:rendering :run]
[1:50:10][Spec out our desired fast path in CompileResolveMultisample() again][:rendering]
[1:51:40][See nothing][:rendering :run]
[1:51:52][Cut out the else and the if(1) in CompileResolveMultisample()][:rendering]
[1:52:41][Find that the if(1) was the problem, somehow][:rendering :run]
[1:53:08][Reinsert the if(1) in CompileResolveMultisample()][:rendering]
[1:53:30][Capture a frame in Nsight, but still see no error message][:rendering :run]
[1:55:24][Enable HANDMADE_SLOW]
[1:55:46][Trigger a fragment shader error: implicit cast from "int" to "bool"][:rendering :run]
[1:56:21][Change the if(1) to if(true) in CompileResolveMultisample()][:rendering]
[1:57:20][See our standard output][:rendering :run]
[1:57:29][Trigger our non-multisampled fast path in CompileResolveMultisample()][:rendering]
[1:57:34][See our crispy edges][:rendering :run]
[1:57:47][Make a note to ask GPU people about our fast path]
[1:58:28][Q&A][:speech]
[1:59:13][@nxsy][Q: textureSamples(sampler)?[ref
    site="OpenGL 4 Reference Pages"
    page="textureSamples"
    url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/textureSamples.xhtml]][:rendering]
[2:00:27][@Brian][Q: You can have Windows automatically capture crash dumps for you. Check if this key exists: HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\Windows Error Reporting\\LocalDumps and, if so, it will automatically capture dumps to %LOCALAPPDATA%\\CrashDumps. You can read more[ref
    site="Windows Dev Center"
    page="Collecting User-Mode Dumps"
    url=https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps]]
[2:03:47][@aaronnickovich][Q: The Watch page[ref
    site="Handmade Hero"
    page="Watch"
    url=https://handmadehero.org/watch] has no more scheduled appearances. When will you be streaming next?]
[2:04:14][@blazeitfury][Q: You should switch to Linux][:"operating system"]
[2:05:41][@rationalcoder][Q: Someone had asked whether or not that would optimize anything since the GPU has to execute both branches][:language]
[2:07:34][@bulmanator][Q: I’m not sure if you were joking when you said to ask, but if you have the time and feel like it I would 100% be down to see lectures from you about the rest of the :animation system like you did with skinning]
[2:09:26][@philliptrudeau][Q: I think that putting it under "Chat" made it less clickable. Also, flashy thumbnails are really important]
[2:11:20][@blazeitfury][Q: My girlfriend watches and tells you most of the things you said about Linux are old school issues][:"operating system"]
[2:16:13][@aaronnickovich][I have an Arch Linux machine. Its dependencies are now broken and will take weeks to get working again][:"operating system"]
[2:18:37][@ivereadthesequel][Q: [@cmuratori Casey] please dispel the myth that @molly123 and I are the same person. @rupan3 has a crazy conspiracy going on I can't shake]
[2:19:01][@pythno][Q: For what reason do you like Linux if it breaks all the time?][:"operating system"]
[2:23:12][Wrap it up][:speech]
[/video]