[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Looking for GPU Performance Issues" vod_platform=youtube id=fduWZsh1riQ annotator=Miblo] [0:02][Welcome to the stream][:speech] [0:14][A few words on RenderDoc's crash message yesterday, with praise for their tech support, and plans to enable the game to fail gracefully when launched with incorrect parameters][:speech] [3:48][Launch [~hero Handmade Hero] in RenderDoc][:run] [4:13][Set the wrong Working Directory in RenderDoc][:admin] [4:34][Crash RenderDoc upon launching [~hero Handmade Hero]][:run] [6:12][Launch [~hero Handmade Hero] in ~RemedyBG with the Working Directory set wrong][:run] [7:15][Hit our assertion in GetFontInfo()][:"asset system" :run] [7:39][Make GetFontInfo() additionally assert that the Asset's Type is HHAAsset_None][:"asset system"] [8:34][Hit our TextureIndex assertion in PushQuad()][:"asset system" :run] [9:55][Enable all our asset Get*() functions to handle the absence of assets][:"asset system"] [16:43][:Run successfully with our incorrect Working Directory] [17:13][Enable AllocateGameAssets() to issue a warning notification when no assets were available][:"error handling"] [23:17][Crash ~RemedyBG apparently on a jump-to-zero][:run] [23:58][Create jump_to_zero_crash.cpp] [25:07][:Run jump_to_zero to find that ~RemedyBG is fine with it] [25:48][:Run [~hero Handmade Hero] successfully with our new warning code][:"error handling"] [26:38][Introduce Win32ErrorMessage()][:"error handling" :"platform layer"] [28:31][:Run and close cleanly][:"error handling" :"platform layer"] [28:44][Make Win32ErrorMessage() print out an error message[ref site=MSDN page="MessageBoxExA function" url=https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-messageboxexa]][:"error handling" :"platform layer"] [34:16][:Run and close with our warning box][:"error handling" :"platform layer"] [34:39][Fix the MBoxType in Win32ErrorMessage()[ref site=MSDN 
page="MessageBox function" url=https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-messagebox]] [35:46][:Run and close with our warning box][:"error handling" :"platform layer"] [35:51][Temporarily try making AllocateGameAssets() produce a Fatal error][:"error handling"] [36:17][:Run and close with our error box][:"error handling" :"platform layer"] [36:29][Make WinMainCRTStartup() emit errors using Win32ErrorMessage()][:"error handling" :"platform layer"] [38:58][:Run and close with our warning box][:"error handling" :"platform layer"] [39:06][:Run unsuccessfully with the correct Working Directory][:"error handling" :"platform layer"] [41:16][We've got a [@Molly saucy bean]][:speech] [42:12][Fix GetBitmap() to correctly set the TextureHandle][:"asset system"] [43:45][:Run successfully][:"asset system"] [44:02][:Run in RenderDoc with the wrong Working Directory] [44:25][Crash RenderDoc post-exit][:run] [45:40][:Run in RenderDoc with the correct Working Directory, noting that we sometimes miss 60 FPS][:performance] [46:54][Capture a frame in RenderDoc and see that it took 35590 µs][:run] [48:48][Look at our four Colour Passes, and plan to submit vertices economically][:rendering :run] [51:37][Make OpenGLInit() disable RequestVSync][:rendering] [52:40][Find that our frame time hovers around 14 ms per frame][:performance :rendering :run] [53:27][Reacquaint ourselves with our render dispatch in OpenGLEndFrame()][:rendering :research] [55:31][Understanding glMapBuffer()[ref site=docs.GL page=glMapBuffer url=http://docs.gl/gl3/glMapBuffer]][:api :rendering :research] [57:57][Nsight rendering time: 14ms to 18ms / frame][:rendering :run] [59:37][Scrub through Events to see that glDrawArrays takes 1.27ms][:rendering :run] [1:00:35][Scrutinise ResolveMultisample() with a view to speeding it up][:rendering :research] [1:05:00][Make CompileResolveMultisample() bake the SampleCount in to the shader, to hopefully permit the loop to be unrolled][:hardware 
:rendering] [1:08:03][Find that it actually works][:hardware :rendering :run] [1:08:19][Make CompileResolveMultisample() bake InvSampleCount in to the shader, and slightly reorganise it][:hardware :rendering] [1:12:35][Find that that made no difference to the frame time, and that UpdateAndRenderEntities() takes a while][:hardware :performance :rendering :run] [1:14:04][Temporarily disable DrawGroundCover()][:rendering] [1:14:37][Find that Game Update takes less of the total time, but we are not hitting 60 FPS][:performance :rendering :run] [1:17:11][Disable HANDMADE_SLOW] [1:18:23][Find that Debug Collation takes a lot of the total time][:"debug system" :performance :run] [1:18:46][Compile out some of the :"debug system" if HANDMADE_SLOW is off] [1:20:43][See the debug UI][:"debug system" :run] [1:21:07][Instead compile out that part of the :"debug system" if HANDMADE_SLOW is on, and rearrange the code to fix compile errors] [1:23:33][Find that lots of the :"debug system" is absent][:run] [1:24:06][Compile out timing if HANDMADE_SLOW is off][:"debug system"] [1:25:05][Still see jerkiness with debug collation off][:performance :run] [1:26:09][Compile in our frame marker in all situations][:"debug system"] [1:26:45][See that our frame time is well below 16ms, but we are not actually seeing 60 FPS][:performance :run] [1:27:22][Nsight rendering time: 10ms to 14ms / frame][:rendering :run] [1:29:07][Capture a frame in Nsight to see that glDrawArrays remains by far the most expensive call][:performance :rendering :run] [1:33:47][Hard set SampleCount to 1 in CompileResolveMultisample()][:rendering] [1:34:44][Nsight rendering time: 4ms / frame][:rendering :run] [1:35:03][Capture a frame in Nsight to find that the resolve and draw calls take similar times][:performance :rendering :run] [1:36:09][Understanding multisampling][:performance :rendering :speech] [1:37:47][See our crispy lines, without multisampling][:rendering :run] [1:38:31][:Research multisampling in GLSL[ref
site="OpenGL Registry" page="The OpenGL ES Shading Language" url=https://khronos.org/registry/OpenGL/specs/es/3.0/GLSL_ES_Specification_3.00.pdf][ref site="Khronos Wiki" page="Multisampling" url=https://www.khronos.org/opengl/wiki/Multisampling][ref site=NVIDIA page="Deferred Shading MSAA Sample" url=http://gameworksdocs.nvidia.com/GraphicsSamples/DeferredShadingMSAASample.htm][ref site="OpenGL 4 Reference Pages" page="texelFetch" url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/texelFetch.xhtml][ref author="Johan Andersson" title="DirectX 11 Rendering in Battlefield 3" url=http://www.dice.se/wp-content/uploads/2014/12/GDC11_DX11inBF3_Public.pdf]][:rendering] [1:45:42][Spec out our desired fast path in CompileResolveMultisample(), based on there being only one sample in a multisample][:rendering] [1:48:00][See nothing][:rendering :run] [1:48:04][Revert CompileResolveMultisample()][:rendering] [1:50:08][See everything][:rendering :run] [1:50:10][Spec out our desired fast path in CompileResolveMultisample() again][:rendering] [1:51:40][See nothing][:rendering :run] [1:51:52][Cut out the else and the if(1) in CompileResolveMultisample()][:rendering] [1:52:41][Find that the if(1) was the problem, somehow][:rendering :run] [1:53:08][Reinsert the if(1) in CompileResolveMultisample()][:rendering] [1:53:30][Capture a frame in Nsight, but still see no error message][:rendering :run] [1:55:24][Enable HANDMADE_SLOW] [1:55:46][Trigger a fragment shader error: implicit cast from "int" to "bool"][:rendering :run] [1:56:21][Change the if(1) to if(true) in CompileResolveMultisample()][:rendering] [1:57:20][See our standard output][:rendering :run] [1:57:29][Trigger our non-multisampled fast path in CompileResolveMultisample()][:rendering] [1:57:34][See our crispy edges][:rendering :run] [1:57:47][Make a note to ask GPU people about our fast path] [1:58:28][Q&A][:speech] [1:59:13][@nxsy][Q: textureSamples(sampler)?[ref site="OpenGL 4 Reference Pages" 
page="textureSamples" url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/textureSamples.xhtml]][:rendering] [2:00:27][@Brian][Q: You can have Windows automatically capture crash dumps for you. Check if this key exists: HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\Windows Error Reporting\\LocalDumps and, if so, it will automatically capture dumps to %LOCALAPPDATA%\\CrashDumps. You can read more[ref site="Windows Dev Center" page="Collecting User-Mode Dumps" url=https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps]] [2:03:47][@aaronnickovich][Q: The Watch page[ref site="Handmade Hero" page="Watch" url=https://handmadehero.org/watch] has no more scheduled appearances. When will you be streaming next?] [2:04:14][@blazeitfury][Q: You should switch to Linux][:"operating system"] [2:05:41][@rationalcoder][Q: Someone had asked whether or not that would optimize anything since the GPU has to execute both branches][:language] [2:07:34][@bulmanator][Q: I’m not sure if you were joking when you said to ask, but if you have the time and feel like it I would 100% be down to see lectures from you about the rest of the :animation system like you did with skinning] [2:09:26][@philliptrudeau][Q: I think that putting it under "Chat" made it less clickable. Also, flashy thumbnails are really important] [2:11:20][@blazeitfury][Q: My girlfriend watches and tells you most of the things you said about Linux are old school issues][:"operating system"] [2:16:13][@aaronnickovich][I have an Arch Linux machine. Its dependencies are now broken and will take weeks to get working again][:"operating system"] [2:18:37][@ivereadthesequel][Q: [@cmuratori Casey] please dispel the myth that @molly123 and I are the same person. @rupan3 has a crazy conspiracy going on I can't shake] [2:19:01][@pythno][Q: For what reason do you like Linux if it breaks all the time?][:"operating system"] [2:23:12][Wrap it up][:speech] [/video]