[video output=day593 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Debugging Lighting Validation" vod_platform=youtube id=uoVm_59w03o annotator=Miblo]
[0:03][Plug the Meow the Infinite printed comic Kickstarter[ref
site=Kickstarter
page="Meow the Infinite: Book One"
url=https://www.kickstarter.com/projects/annarettberg/meow-the-infinite-book-one] and tease the related fun stuff in celebration of it][:research]
[0:52][Recap our :lighting discrepancy between the game and hhlightprof][:speech]
[2:00][Change the dump file paths in hhlightprof, with the determination to dump new :lighting data in a single-threaded run of the game][:threading]
[2:58][Make InternalLightingCore() disable the LightBoxDumpTrigger() after dumping one set of data][:lighting]
[4:05][Hit our Work alignment assertion in InternalLightingCore()][:lighting :run]
[4:42][Build in -Od]
[4:55][Hit our Work alignment assertion in InternalLightingCore()][:lighting :run]
[5:06][~RemedyBG bug report: AND'ing a location with an integer][:lighting :run]
[6:13][@x13pixels][Sheeet. I'll get that fixed]
[6:27][Fix the BigPad in lighting_work][:"data structure"]
[7:52][:Run without hitting that alignment assertion in InternalLightingCore][:lighting]
[8:13][:Run the game with the determination to capture :lighting dumps]
[9:13][Add the LightBoxDumpTrigger to the debug :UI in EndLightingComputation()][:lighting]
[10:06][Dump our multithreaded :lighting][:run :threading]
[10:25][Disable multi-threading of the :lighting][:threading]
[10:43][Dump our single-threaded :lighting][:run :threading]
[11:33][Re-enable multi-threading of the :lighting][:threading]
[11:50][:Run hhlightprof on the single-threaded :lighting data, with errors][:threading]
[12:23][:Run hhlightprof on the multi-threaded :lighting data, also with errors][:threading]
[12:43][Scour hhlightprof for bugs][:lighting :research]
[16:37][Make InternalLightingCore() dump the light Boxes and BoxTable after the BuildSpatialPartitionForLighting() call][:lighting]
[20:09][Dump our :lighting][:run]
[20:37][Make hhlightprof load in and validate the Boxes and BoxTable][:"file io" :lighting]
[24:52][:Run hhlightprof to find that the light boxes don't match][:lighting]
[25:10][Make hhlightprof validate the BoxTable][:lighting]
[26:16][:Run hhlightprof to find that the boxrefs don't match][:lighting]
[26:41][Note the simplicity of BuildSpatialPartitionForLighting()][:lighting :research]
[27:33][Enable the LightBoxDumpTrigger, to dump the first frame of :lighting]
[27:59][Break in to InternalLightingCore()][:lighting :run]
[28:43][Fix InternalLightingCore() to dump the correct number of light boxes at the head][:"file io" :lighting]
[29:11][Dump our :lighting][:run]
[29:34][:Run hhlightprof to find that the light boxes still don't match, but the error / texel is much lower][:lighting]
[32:36][Fix InternalLightingCore() to dump the correct number of light boxes after BuildSpatialPartitionForLighting()][:"file io" :lighting]
[33:07][Dump our :lighting][:run]
[33:22][:Run hhlightprof to find that the light boxes now match][:lighting]
[33:40][Check InternalLightingCore() for BoxTable dumping errors][:lighting :research]
[34:18][Step in to InternalLightingCore() and inspect the BoxTable values][:lighting :run]
[35:31][Step through hhlightprof and find the BoxTable file size to be wrong][:"file io" :lighting :run]
[38:38][Step through the DEBUGDumpData() of the BoxTable][:"file io" :lighting :run]
[39:24][Add a SetFileSize() function pointer to the platform[ref
site="Windows Dev Center"
page="SetEndOfFile function"
url=https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setendoffile][ref
site="Windows Dev Center"
page="SetFilePointerEx function"
url=https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfilepointerex]][:"data structure" :"file io" :"platform layer"]
[49:35][Make DEBUGDumpData() call SetFileSize()][:"file io" :lighting]
[50:03][Dump our :lighting][:run]
[50:31][:Run hhlightprof to find that the light refs now match][:lighting]
[50:49][Make hhlightprof record the max error / texel][:lighting]
[51:27][Max error / texel: 0.001786][:lighting :run]
[52:56][Dump our :lighting in -O2, with a max error / texel of 0.001814][:run]
[53:09][Disable the LightBoxDumpTrigger][:lighting]
[53:34][Walk through the orphanage and dump our :lighting][:run]
[53:54][Max error / texel: 0.001195][:lighting :run]
[54:00][Walk outside and dump our :lighting][:run]
[54:12][Max error / texel: 0.002587][:lighting :run]
[54:21][Walk down to the dungeon and dump our :lighting][:run]
[54:31][Max error / texel: 0.009179, and the light boxes don't match][:lighting :run]
[55:18][Consider how to proceed][:lighting :run]
[56:25][:Run our "Instructions Per Clock" analysis of hhlightprof][:lighting :profiling]
[58:33][Make InternalLightingCore() compute 5 seconds of :lighting]
[1:00:44][Introduce ProfileRun() in hhlightprof, to run it multiple times][:lighting]
[1:02:05][:Run hhlightprof for 9 seconds, to completion][:lighting]
[1:02:23][Decrease the iterations of ProfileRun() from 60*5 to 60][:lighting]
[1:02:42][:Run hhlightprof for 3 seconds, to completion][:lighting]
[1:02:48][Increase the iterations of ProfileRun() from 60 to 60*2][:lighting]
[1:03:02][:Run our "Instructions Per Clock" analysis of hhlightprof][:lighting :profiling]
[1:03:23][Consult our "Instructions Per Clock" VTune analysis][:lighting :profiling :run]
[1:04:02][:Optimisation Opportunities: 1) Post-processing textures][:lighting :research]
[1:04:43][:Optimisation Opportunities: 2) Accessing the lighting_box in the spatial partition][:lighting :research]
[1:05:43][:Optimisation Opportunities: 3) Efficient loading of data][:lighting :research]
[1:06:07][Prepare to pack our ray casting data more concisely][:"data structure" :lighting :research]
[1:09:05][:Run hhlightprof][:lighting]
[1:09:19][Make hhlightprof record its execution time[ref
site="Windows Dev Center"
page="QueryPerformanceCounter function"
url=https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter][ref
site="Windows Dev Center"
page="QueryPerformanceFrequency function"
url=https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancefrequency]][:lighting :timing]
[1:12:55][hhlightprof total seconds elapsed: 7.173237][:lighting :optimisation :run]
[1:13:58][Introduce ray_cast_stack_entry to more concisely store the data needed by RayCast()][:"data structure" :lighting :optimisation]
[1:19:43][hhlightprof total seconds elapsed: 8.112103][:lighting :optimisation :run]
[1:20:32][Inspect the assembly of RayCast()][:asm :lighting :optimisation :run]
[1:22:08][Replace ray_cast_stack_entry with a PACK_CAST_ENTRY() for RayCast() to use][:"data structure" :lighting :optimisation]
[1:26:48][hhlightprof total seconds elapsed: 7.259090][:lighting :optimisation :run]
[1:27:44][Inspect the assembly of RayCast() to see many jmp instructions][:asm :lighting :optimisation :run]
[1:29:03][Introduce lighting_box_pack for lighting_box to contain, and RayCast() to use][:"data structure" :lighting :optimisation]
[1:33:45][hhlightprof total seconds elapsed: 7.063448][:lighting :optimisation :run]
[1:34:16][Inspect the assembly of RayCast() to still see jmp instructions][:asm :lighting :optimisation :run]
[1:36:55][Consult AnyTrue()][:optimisation :research :simd]
[1:38:41][Let RayCast() push on a box regardless of its proximity][:lighting :optimisation]
[1:39:06][hhlightprof total seconds elapsed: 8.120685][:lighting :optimisation :run]
[1:39:36][Revert RayCast() to only push on boxes within a certain distance of the ray's origin][:lighting :optimisation]
[1:39:46][Consider determining more efficiently if RayTest() should push a box][:lighting :optimisation :research]
[1:40:50][Make RayTest() determine more efficiently if it should push a box][:lighting :optimisation]
[1:42:20][hhlightprof total seconds elapsed: 6.932234][:lighting :optimisation :run]
[1:42:48][Inspect the assembly of RayCast() to still see jmp instructions][:asm :lighting :optimisation :run]
[1:44:13][Try to make RayTest() determine even more efficiently if it should push a box][:lighting :optimisation]
[1:45:20][Inspect the assembly of RayCast() to still see jmp instructions][:asm :lighting :optimisation :run]
[1:45:47][:Research cmov intrinsic generation][:language]
[1:47:03][Read the gamedev.net forum post "Dependable cmov in Visual C++"[ref
site=gamedev.net
page="Dependable cmov in Visual C++"
url=https://gamedev.net/forums/topic/501223-dependable-cmov-in-visual-c/]][:language :research]
[1:47:37][Try to make the compiler generate a cmov for the conditional box pushing code in RayTest()][:language :lighting :optimisation]
[1:49:16][Inspect the assembly of RayCast() to still see no cmov instructions][:asm :lighting :optimisation :run]
[1:49:27][Try again to make the compiler generate a cmov for the conditional box pushing code in RayTest()][:language :lighting :optimisation]
[1:50:15][Inspect the assembly of RayCast() to see one cmov instruction][:asm :lighting :optimisation :run]
[1:50:41][Try again to make the compiler generate a cmov for the conditional box pushing code in RayTest()][:language :lighting :optimisation]
[1:50:52][Inspect the assembly of RayCast() to see no further cmov instructions][:asm :lighting :optimisation :run]
[1:51:55][Consult the Intel Intrinsics Guide[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] for mask instructions][:research :simd]
[1:53:53][Try making RayCast() determine in :SIMD if it should push a box][:lighting :optimisation]
[1:58:19][Inspect the assembly of RayCast() to still see no further cmov instructions][:asm :lighting :optimisation :run]
[1:59:44][Try changing RayCast() to store off the StackAt to write back to the BoxStack][:lighting :optimisation]
[2:00:30][Inspect the assembly of RayCast() to see cmov instructions][:asm :lighting :optimisation :run]
[2:01:02][hhlightprof total seconds elapsed: 7.496200][:lighting :optimisation :run]
[2:01:33][Toggle RayCast() back to determine in scalar if it should push a box][:lighting :optimisation]
[2:01:41][Inspect the assembly of RayCast() to see our dreaded jmp instructions][:asm :lighting :optimisation :run]
[2:01:57][Try making RayCast() set ShouldPush using a bitwise, rather than a conditional, OR][:lighting :optimisation]
[2:02:22][Inspect the assembly of RayCast() to see cmov instructions][:asm :lighting :optimisation :run]
[2:02:31][Consider our ShouldPush setting, in terms of OR'ing][:lighting :optimisation :research]
[2:04:12][hhlightprof total seconds elapsed: 7.360275][:lighting :optimisation :run]
[2:04:35][Inspect the assembly of RayCast()][:asm :lighting :optimisation :run]
[2:05:20][Try making RayCast() compute ShouldPush by bitwise OR'ing and AND'ing only tInside, Mask and CloseEnough][:lighting :optimisation]
[2:07:01][Inspect the assembly of RayCast()][:asm :lighting :optimisation :run]
[2:07:23][hhlightprof total seconds elapsed: 7.307276][:lighting :optimisation :run]
[2:07:40][Note that it seems cheaper to jmp than cmov][:lighting :optimisation :research]
[2:08:27][Try making RayCast() compute ShouldPush using conditional tests][:lighting :optimisation]
[2:08:47][hhlightprof total seconds elapsed: 7.253258][:lighting :optimisation :run]
[2:08:56][Revert RayCast() to the original box pushing code][:lighting :optimisation]
[2:09:29][hhlightprof total seconds elapsed: 6.878540][:lighting :optimisation :run]
[2:09:45][Save off our jmp and cmov versions of the box pushing code in RayCast()][:lighting :optimisation]
[2:12:16][hhlightprof total seconds elapsed: 6.888590][:lighting :optimisation :run]
[2:12:31][Toggle RayCast() to the slower cmov box pushing code][:lighting :optimisation]
[2:12:41][hhlightprof total seconds elapsed: 7.079671][:lighting :optimisation :run]
[2:13:11][Q&A][:speech]
[2:13:24][Realise why the cmov version isn't faster][:lighting :optimisation :research]
[2:14:56][Q&A][:speech]
[2:15:09][@vaualbus][Q: I think the MSDN you were looking at is SetFileInformationByHandle, maybe?]
[2:16:00][@yurasniper][Q: Logical || and && are short-circuited, so they will always have a jump, unless compiler can figure out some property that will allow it to collapse the expression. So to avoid jumps one should use bitwise | and & if possible. But also there were some people saying that cmov is worse than a jump over a few instructions. I think LLVM people, but I may be wrong]
[2:16:35][@somebody_took_my_name][Q: Can't you just use CloseEnough instead of CloserCloseEnough in the ShouldPush assignment? CloseEnough is already AND'd with Mask. Oh, and the assignment of StackY is busted. There is a 0, 1 instead of 2, 3 at the end][:lighting :optimisation]
[2:16:59][Fix the StackY setting in RayCast() and toggle to the faster jmp box pushing code][:lighting :optimisation :simd]
[2:17:37][hhlightprof total seconds elapsed: 6.963080][:lighting :optimisation :run]
[2:18:20][Toggle RayCast() to the slower, :SIMD cmov box pushing code][:lighting :optimisation]
[2:18:27][hhlightprof total seconds elapsed: 7.533641][:lighting :optimisation :run]
[2:18:39][Toggle RayCast() to the faster, scalar cmov box pushing code][:lighting :optimisation]
[2:18:44][hhlightprof total seconds elapsed: 7.265596][:lighting :optimisation :run]
[2:19:00][@somebody_took_my_name][Q: Can't you just use CloseEnough instead of CloserCloseEnough in the ShouldPush assignment? CloseEnough is already AND'd with Mask][:lighting :optimisation]
[2:19:31][Remove the superfluous CloserCloseEnough from RayCast()][:lighting :optimisation]
[2:20:17][hhlightprof total seconds elapsed: 7.195799][:lighting :optimisation :run]
[2:20:29][Toggle RayCast() to the faster jmp box pushing code][:lighting :optimisation]
[2:20:45][hhlightprof total seconds elapsed: 6.874347][:lighting :optimisation :run]
[2:21:00][Make a note to try pushing boxes using a circular buffer][:lighting :memory :optimisation]
[2:22:17][@somebody_took_my_name][Q: Somebody in chat had the idea of loading the ray caster through the dll for testing. Could this remove the floating point errors in the test code?][:lighting]
[2:23:25][@tjom2000][Q: Is it possible to structure the game code so it would run reasonably fast in debug mode? Would that be worth the hassle?]
[2:23:53][@emperormetallix][Q: Will you come to the dark side and try const?][:language]
[2:24:40][@martinsmemory][Q: When you dropping that low level course?]
[2:24:51][@vaualbus][Q: Why have you uploaded the last [~hero Handmade Hero] episode on Molly Rocket's YouTube account?]
[2:25:47][@isfoo][Q: MSVC is very often not optimizing away obvious things (basically you cannot do so-called zero cost abstractions with it). For example it always does short circuiting.[ref
site="Compiler Explorer"
page=oVz2i7
url=https://godbolt.org/z/oVz2i7] Or from experience, I also remember it sometimes calling empty constructors. Why not use some reasonable compiler like clang / gcc / icc?][:language]
[2:26:58][@x1bzzr][Q: In one of the first episodes of [~hero Handmade Hero] you mentioned maybe it would be a good idea to recreate the window if WM_DESTROY was caught in the window procedure. In what sort of scenario does that happen?]
[2:27:48][A few words on the C++ spec effectively preventing the optimising compiler from using const][:language :speech]
[2:28:08][@emperormetallix][Q: These optimisations seem to be very low level. How do you know when it is worth going to this level vs zooming out to examine the overall algorithm, or memory layout, data volume, etc?][:optimisation]
[2:30:20][@vaualbus][Q: Will you be in [@naysayer88 Jon]'s talk today? It's already started! Let's go there after]
[2:30:30][@x1bzzr][Q: I guess what I'm curious about is why you shouldn't just terminate the application if you get WM_DESTROY in the window procedure]
[2:30:42][@martinsmemory][Q: How many hours do you work a week normally?]
[2:31:38][@maavelar][Q: Has your RSI problem improved from a couple of years back? If so, what helped you?][:health]
[2:32:00][@internationalizationist][Q: Is it possible to not call VirtualAlloc at all? You can create global array of bytes of whatever size you want and point Persistent / Transient storages to those arrays. Global storage should go to BSS (is it?), and the operating system must allocate enough :memory at startup time (and you already allocated a determined amount of space (so you know how much it is at compile time))]
[2:32:32][@yakvie][@handmade_hero Hey [@cmuratori Casey], I was wondering what's the current plan for [~hero Handmade Hero]? Will you be releasing pieces of this code to the public domain?]
[2:32:47][@emperormetallix][Q: What do you use instead of Ctrl / Alt keys? Vim controls?]
[2:33:38][@internationalizationist][Q: Could you stream your daily working process (of 1935, for example) sometime?]
[2:33:52][@zzyzzyxx][Here's an example where adding const with clang induced some optimization (minutes 27-28)[ref
author=CppCon
publisher=YouTube
title="CppCon 2016: Jason Turner “Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17”"
url=https://youtu.be/zBkNBP00wJE?t=1620]][:language]
[2:34:37][@cencetv][Is it important nowadays to still support x86?][:isa]
[2:35:06][@oisincar][Apologies if you've answered this before but would there be any chance we'd see Vulkan on [~hero Handmade Hero]?][:api :hardware]
[2:35:28][Wrap it up with a plug of the Meow the Infinite printed comic Kickstarter[ref
site=Kickstarter
page="Meow the Infinite: Book One"
url=https://www.kickstarter.com/projects/annarettberg/meow-the-infinite-book-one] and related fun videos at Molly Rocket's YouTube channel,[ref
site=YouTube
page="Molly Rocket"
url=https://www.youtube.com/c/MollyRocket] and [@naysayer88 Jon]'s stream[ref
site=Twitch
page=Naysayer88
url=https://twitch.tv/naysayer88]][:speech]
[/video]