cinera_handmade.network/cmuratori/hero/code/code607.hmml

[video output=day607 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Finishing Debugging the Grid Raycaster" vod_platform=youtube id=th7U72CBj3A annotator=Miblo]
[0:02][Recap and set the stage for the day][:speech]
[0:35][Demo the differently coloured ray walk boxes, notably the magenta leaf ones beyond the occluding box][:"debug visualisation" :lighting :run]
[3:48][Describe the ray colouring :"debug visualisation" code in GridRayCast()][:lighting :research]
[5:04][Make GridRayCast() draw the occluders within leaf boxes][:"debug visualisation" :lighting]
[8:09][See no occluders within those leaf boxes][:"debug visualisation" :lighting :run]
[9:03][Make GridRayCast() colour the occluder boxes white][:"debug visualisation" :lighting]
[9:30][See no white occluder boxes][:"debug visualisation" :lighting :run]
[9:40][Make GridRayCast() enlarge the occluder boxes][:"debug visualisation" :lighting]
[10:12][See no occluders][:"debug visualisation" :lighting :run]
[10:24][Compile in -Od]
[10:44][Try unsuccessfully to break on PushDebugBox() for an occluder][:"debug visualisation" :lighting :run]
[13:08][Fix GridBuildSpatialPartition() to modify StartIndex only after having used it to set OnePastLastIndex][:lighting]
[13:19][How to succeed in the game industry, according to [@naysayer88 Jon Blow]: Always update OnePastLastIndex before StartIndex][:speech]
[14:18][Again try unsuccessfully to break on PushDebugBox() for an occluder][:"debug visualisation" :lighting :run]
[15:23][Fix GridBuildSpatialPartition() to increment OnePastLastIndex][:lighting]
[17:34][Break on PushDebugBox() for an occluder, and inspect the values][:"debug visualisation" :lighting :run]
[18:32][See that our ray walk :"debug visualisation" now reports more correct results][:lighting :run]
[18:56][In an -O2 build, with our ray hitting, we are not drawing the ray itself][:"debug visualisation" :lighting :run]
[19:39][Prevent GridRayCast() from enlarging the occluder boxes, and make it draw the ray][:"debug visualisation" :lighting]
[21:59][See our ray][:"debug visualisation" :lighting :run]
[22:34][Determine the ProbeSampleP computation in GridRayCast() to be busted][:"debug visualisation" :lighting :research]
[23:31][Step into the origin ray drawing PushDebugLine() in GridRayCast(), to see that tRay is erroneous][:"debug visualisation" :lighting :run]
[26:27][Add a break location in the SomethingHit branch in GridRayCast()][:"debug visualisation" :lighting]
[27:01][Break into the SomethingHit branch in GridRayCast(), scrutinise the HComp shuffling code, and consider the hit detection code to be wrong][:"debug visualisation" :lighting :run :simd]
[30:50][Scrutinise GridRayCast() for hit bugs, keeping it in the back of our mind][:lighting]
[33:01][Again break into the SomethingHit branch in GridRayCast(), and scrutinise the HComp shuffling code][:"debug visualisation" :lighting :run :simd]
[36:03][Fix GridRayCast() to make _mm_extract_epi16() select the 1st 16-bit integer when setting ShuffleIndex][:lighting :simd]
[36:34][Break into the SomethingHit branch in GridRayCast(), and past the ray selection][:"debug visualisation" :lighting :run :simd]
[38:56][Compile in -O2]
[39:05][Our hit detection is now correct][:"debug visualisation" :lighting :run]
[39:19][Make GridRayCast() draw a small box at the hit point][:"debug visualisation" :lighting]
[40:17][Our hit detection is definitely close][:"debug visualisation" :lighting :run]
[40:58][Make GridRayCast() decrease the distance walked by our rays][:lighting]
[42:05][See our shorter ray travel distance][:"debug visualisation" :lighting :run]
[42:11][Make GridRayCast() slightly increase the distance walked by our rays][:lighting]
[42:27][See our better ray travel distance, and check the :performance][:"debug visualisation" :lighting :run]
[42:55][Make GridRayCast() slightly decrease the distance walked by our rays][:lighting]
[43:04][Our frame time is basically unchanged][:lighting :performance :run]
[43:17][Rerun the game to regenerate the walk table][:lighting :performance :run]
[43:46][Make GridRayCast() increase the distance walked by our rays][:lighting]
[44:00][Make GridRayCast() draw the normal of our hit surface][:"debug visualisation" :lighting]
[44:45][Check out our correct normal][:"debug visualisation" :lighting :run]
[45:02][Add a DebugGridIndex for FullCast() to use][:"debug system" :lighting]
[47:24][See the DebugGridIndex in our :UI][:lighting :run]
[47:50][Begin to make DEBUGBeginInteract() and DEBUGInteract() support editable u32 values][:"debug system"]
[51:08][Revert, and instead make the DebugGridIndex be a float, for editing][:"debug system" :lighting]
[51:50][Try editing the DebugGridIndex][:"debug system" :"debug visualisation" :lighting :run]
[52:24][Change DEBUGInteract() to edit draggable values in X][:"debug system" :lighting]
[52:37][Try editing the DebugGridIndex][:"debug system" :"debug visualisation" :lighting :run]
[53:21][Make FullCast() set the DebugGridIndex to 5×4677][:lighting]
[53:43][Edit the DebugGridIndex][:"debug system" :"debug visualisation" :lighting :run]
[53:54][Make FullCast() set the DebugGridIndex to 3×4677][:lighting]
[54:08][Edit the DebugGridIndex to 14889][:"debug system" :"debug visualisation" :lighting :run]
[55:47][Make FullCast() set the DebugGridIndex to 14889 + (24×16)][:lighting]
[56:31][Check out our ray][:"debug system" :"debug visualisation" :lighting :run]
[56:42][Make FullCast() set the DebugGridIndex to 14889 + (26×18)][:lighting]
[56:58][Try to check out our ray][:"debug system" :"debug visualisation" :lighting :run]
[57:07][Make FullCast() set the DebugGridIndex to 14889 + (SpatialGrid.CellCount.x × SpatialGrid.CellCount.y)][:lighting]
[57:52][Check out our ray][:"debug system" :"debug visualisation" :lighting :run]
[58:05][Augment lighting_solution with a DebugGridIndex and DebugRayIndex for the single-threaded UpdateLighting() to set][:"debug system" :lighting :threading]
[1:01:51][Try editing our DebugGridIndex and DebugRayIndex to 16317 and 42][:"debug system" :"debug visualisation" :lighting :run]
[1:04:47][Make UpdateLighting() set DebugGridIndex and DebugRayIndex to 16317 and 42][:"debug system" :lighting :threading]
[1:05:13][Confirm that we've picked the right ray][:"debug system" :"debug visualisation" :lighting :run]
[1:05:18][Consider scrutinising ComputeWalkTable() for bugs][:lighting :research]
[1:05:54][Show our erroneous hit][:"debug visualisation" :lighting :run]
[1:06:24][Fix ComputeWalkTable() to correctly set tTerminate][:lighting]
[1:07:22][Check out our correct hit][:"debug visualisation" :lighting :run]
[1:07:48][Try editing our DebugGridIndex and DebugRayIndex, and consider the ray cast to be correct][:"debug system" :"debug visualisation" :lighting :run]
[1:08:54][Consider scrutinising GridRayCast() for bugs in the TransferPPS][:lighting :research]
[1:10:46][Disable LIGHTING_USE_GRID][:lighting]
[1:11:07][Check out the old AABB ray traced :lighting][:run]
[1:11:21][Enable LIGHTING_USE_GRID][:lighting]
[1:11:31][Check out the new grid ray traced :lighting][:run]
[1:12:18][Scrutinise the TransferPPS computation in GridRayCast()][:lighting :research]
[1:19:19][Make GridRayCast() at least index into a different TransferPPS for each ray][:lighting]
[1:20:16][Our :lighting remains wrong][:run]
[1:20:50][Prevent GridRayCast() from applying the moon light, and make it always set ProbeSamplePSingle][:lighting]
[1:23:24][See immediate full-bright light][:lighting :run]
[1:25:01][Edit our DebugRayIndex and note that our speed has reduced][:"debug system" :"debug visualisation" :lighting :performance :run]
[1:25:51][Rerun the game to see that our speed begins fine, but degrades][:lighting :performance :run]
[1:30:16][Make GridRayCast() force the TransferPPS to 0][:lighting]
[1:31:15][Rerun the game to see that our speed begins and remains fine][:lighting :performance :run]
[1:33:05][Let GridRayCast() compute the TransferPPS as normal][:lighting]
[1:33:18][Rerun the game to see that our speed begins fine, but degrades][:lighting :performance :run]
[1:33:59][Make GridRayCast() add our SpecTexel DEBUG_VALUE to the :"debug system"][:lighting]
[1:35:45][Do not see a SpecTexel in the profiler][:lighting :run]
[1:36:29][Make GridRayCast() add a SpecTexel DEBUG_VALUE before disabling Debugging][:lighting]
[1:36:56][See our SpecTexel begin large and descend to 0][:lighting :run]
[1:37:57][Make GridRayCast() add each SpecTexel DEBUG_VALUE to the :"debug system"][:lighting]
[1:38:06][Our SpecTexel values begin the same but progress differently][:lighting :run]
[1:38:55][Make GridRayCast() zero-initialise TransferPPS][:lighting]
[1:39:42][Our SpecTexel values remain the same][:lighting :run]
[1:39:49][Revert the zero-initialisation][:lighting]
[1:40:06][Step through GridRayCast() and inspect the TransferPPS values][:lighting :run]
[1:41:12][Decide to disable multithreading of the :lighting][:threading :speech]
[1:41:28][~RemedyBG feature request: Stepping through the current thread][:admin :threading]
[1:42:26][Disable multithreading of the :lighting][:threading]
[1:42:52][Step through GridRayCast() and inspect the TransferPPS and SpecTexel values][:lighting :run]
[1:44:55][Assert in GridRayCast() that the SpecTexel doesn't look fishy][:lighting]
[1:45:58][See that the first few frames look good, until we hit our assertion][:lighting :run]
[1:47:46][Try decreasing the W modification from ×0.75 to ×0.01 in BuildDiffuseLightMaps()][:lighting]
[1:48:33][Consider the TransferPPS to perhaps be feeding back][:lighting :run]
[1:49:24][Note the slowness of a -O2 build][:lighting :performance :run]
[1:49:56][Try increasing the W modification from ×0.01 to ×0.1 in BuildDiffuseLightMaps()][:lighting]
[1:50:14][See that the TransferPPS is fascinatingly unstable][:lighting :run]
[1:51:54][Consult the Intel Developer Zone,[ref
site="Intel Developer Zone"
page="Reducing the Impact of Denormal Exceptions"
url=https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/floating-point-operations/tuning-performance/reducing-the-impact-of-denormal-exceptions.html][ref
site="Intel Developer Zone"
page="Setting the FTZ and DAZ Flags"
url=https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/floating-point-operations/understanding-floating-point-operations/setting-the-ftz-and-daz-flags.html#setting-the-ftz-and-daz-flags] Intrinsics Guide[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] and JUCE Forum[ref
site="JUCE Forum"
page="State of the Art Denormal Prevention"
url=https://forum.juce.com/t/state-of-the-art-denormal-prevention/16802] for denormal prevention information][:research]
[1:56:40][Make WinMainCRTStartup() set the DAZ and FZ bits, for denormal prevention][:"platform layer"]
[1:58:13][The :lighting transfer still happens slowly][:performance :run]
[1:58:22][Read 10.2.3 MXCSR Control and Status Register in the Intel 64 and IA-32 Architectures Software Developer Manuals[ref
site="Intel"
page="Intel 64 and IA-32 Architectures Software Developer Manuals"
url=https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html][ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:research]
[2:01:16][Fully define and set our desired MXCSR Control and Status Register[ref
site="Intel"
page="Intel 64 and IA-32 Architectures Software Developer Manuals"
url=https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html]][:"platform layer"]
[2:07:27][Hit a "Floating-point Inexact Result" exception][:run]
[2:08:14][Set our DesiredBits to be the ControlMask][:"platform layer"]
[2:08:35][:Run successfully, but still with slow :lighting transfer][:performance]
[2:08:59][Compile in -Od]
[2:09:15][Step through WinMainCRTStartup() to see the CSR bits being set][:"platform layer" :run]
[2:10:10][Move the CSR-setting code into UpdateLighting()][:"platform layer"]
[2:10:56][Step through UpdateLighting() to see the CSR bits being set][:"platform layer" :run]
[2:11:08][Compile in -O2]
[2:11:22][The :lighting transfer still happens slowly][:performance :run]
[2:11:47][Introduce SetDefaultFPBehavior() for WinMainCRTStartup() to call][:"platform layer"]
[2:12:29][The :lighting transfer still happens slowly][:performance :run]
[2:12:52][Enable multithreading of the :lighting][:threading]
[2:13:09][Our :lighting transfer is quicker, but still feeds back][:run]
[2:13:29][Remove the LooksFishy() assertions from GridRayCast()][:lighting]
[2:13:42][Try increasing the W modification from ×0.1 to ×0.75 in BuildDiffuseLightMaps()][:lighting]
[2:14:20][Our speed begins fine, but degrades slightly][:lighting :performance :run]
[2:15:27][Make UpdateLighting() call SetDefaultFPBehavior() every frame][:"platform layer"]
[2:15:59][Our speed begins fine, but still degrades slightly][:lighting :performance :run]
[2:17:05][@xxthebigfoxx][@handmade_hero The control register is per thread, right? Are you setting it for the worker threads?][:"platform layer" :threading]
[2:17:23][Make ComputeLightPropagationWork() call SetDefaultFPBehavior() for each thread][:"platform layer" :threading]
[2:17:55][Excellent catch, @xxthebigfoxx][:speech]
[2:18:05][Our speed begins and remains fine, confirming that our problem was due to denormals][:lighting :performance :run]
[2:19:37][@krrsplat][Go crazy with the :camera to see if it causes shenanigans][:performance]
[2:20:22][Q&A][:speech]
[2:20:57][@imaginaryfreedom][Q: I'm building my first BVH for my raytracer. Do the previous episodes where you work on a k-d tree go into how to traverse the tree using :SIMD without slamming your face into the wall nose-first? I'm having a hard time understanding how to traverse the structure effectively without throwing all the :performance benefits of SIMD out the window][:"data structure" :lighting]
[2:23:53][@mindmark42][Q: Can you explain more why denormals could cause :performance to degrade?]
[2:26:31][@tonewexperiences][Q: "The volatile portion consists of the six status flags, in MXCSR\[0:5\], while the rest of the register, MXCSR\[6:15\], is considered nonvolatile." By the way, so much for flush to zero once per thread being enough][:"platform layer" :threading]
[2:26:51][@sagian2005][Q: Could you put the code back up where you set the control register bits?][:"platform layer"]
[2:27:29][@relvet][Q: Would it be worth special-casing the occurrences when a ray runs parallel to the grid of the walk table? Or would the special-casing cost as much as / more than the potential time save?][:lighting]
[2:28:13][@mindmark42][Q: Do you think GPUs would have the denormalized issue?]
[2:31:12][@internationalizationist][Q: Hi, I came from Day 523 (Introduction to Git), and that's just… The terminology that git uses in such commands like "git bless --by-gnome --assume-mutable-gnomes" and any other command piss me off and makes me feel like git was developed by some Star Wars fan or something and now it's Industrial Standard! Am I doing something wrong? Is it okay to name things in computer science like "gnomes"? Is there any tutorial that explains git in an accessible format? Or do I have to bite the bullet and live through that?][:vcs]
[2:33:41][@jimdopango][So [~torvalds Linus Torvalds] is not a good programmer?]
[2:35:36][@igorovich][Isn't Linus pronounced lainus?]
[2:36:52][@frozenzerker][I guess it's more Swedish than Finnish?]
[2:38:05][@internationalizationist][Q: So it was a joke‽][:vcs]
[2:41:29][@jimdopango][@cmuratori Not wasteful if your job is to integrate patches from hundreds of branches from thousands of developers, day in, day out. I'd guess 50% of a kernel maintainer's workload is dealing with source control and not programming][:vcs]
[2:45:18][@lanquemar][Which :VCS do you use?]
[2:46:34][Call it, with a glimpse into the future debugging the :lighting transfer][:speech]
[/video]
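
The denormal fix reached between 1:56:40 and 2:18:05 can be sketched roughly as below. This is a minimal illustration, not the episode's actual code: SetDefaultFPBehaviorSketch() and DenormalResultsAreFlushed() are hypothetical names, the bit positions are taken from section 10.2.3 of the Intel manual referenced above, and the sketch assumes an x86 or x64 target with SSE. As the 2:17:05 chat comment points out, MXCSR is per thread, so a call like this would need to run on every worker thread, not only on the main one.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <xmmintrin.h> /* _mm_getcsr, _mm_setcsr (x86 / x64 with SSE only) */

/* MXCSR bit layout, per section 10.2.3 of the Intel SDM. */
#define MXCSR_DAZ             (1u << 6)    /* treat denormal inputs as zero   */
#define MXCSR_EXCEPTION_MASKS (0x3Fu << 7) /* mask all six SSE exceptions     */
#define MXCSR_FZ              (1u << 15)   /* flush denormal results to zero  */

/* Hypothetical stand-in for the episode's SetDefaultFPBehavior().
   Rounding control (bits 13-14) is left at 0, i.e. round to nearest,
   and the six sticky status flags (bits 0-5) are cleared. */
static void SetDefaultFPBehaviorSketch(void)
{
    uint32_t DesiredBits = MXCSR_DAZ | MXCSR_EXCEPTION_MASKS | MXCSR_FZ;
    _mm_setcsr(DesiredBits);

    /* The control bits should read back as written on the calling thread. */
    assert((_mm_getcsr() & DesiredBits) == DesiredBits);
}

/* Returns nonzero if a multiply whose true result is denormal yields 0.
   volatile keeps the compiler from folding the multiply at compile time. */
static int DenormalResultsAreFlushed(void)
{
    volatile float Smallest = FLT_MIN;  /* smallest normal float           */
    volatile float Tiny = Smallest * 0.3f; /* true result is below FLT_MIN */
    return Tiny == 0.0f;
}
```

Leaving the exception masks set matters here: writing the register with the masks cleared is the kind of change that would produce the "Floating-point Inexact Result" exception seen at 2:07:27. A thread entry point would call SetDefaultFPBehaviorSketch() once before doing any lighting work.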