cinera_handmade.network/cmuratori/hero/code/code607.hmml

2020-06-04 21:14:31 +00:00
[video output=day607 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Finishing Debugging the Grid Raycaster" vod_platform=youtube id=th7U72CBj3A annotator=Miblo]
[0:02][Recap and set the stage for the day][:speech]
[0:35][Demo the differently coloured ray walk boxes, notably the magenta leaf ones beyond the occluding box][:"debug visualisation" :lighting :run]
[3:48][Describe the ray colouring :"debug visualisation" code in GridRayCast()][:lighting :research]
[5:04][Make GridRayCast() draw the occluders within leaf boxes][:"debug visualisation" :lighting]
[8:09][See no occluders within those leaf boxes][:"debug visualisation" :lighting :run]
[9:03][Make GridRayCast() colour the occluder boxes white][:"debug visualisation" :lighting]
[9:30][See no white occluder boxes][:"debug visualisation" :lighting :run]
[9:40][Make GridRayCast() enlarge the occluder boxes][:"debug visualisation" :lighting]
[10:12][See no occluders][:"debug visualisation" :lighting :run]
[10:24][Compile in -Od]
[10:44][Try unsuccessfully to break on PushDebugBox() for an occluder][:"debug visualisation" :lighting :run]
[13:08][Fix GridBuildSpatialPartition() to modify StartIndex only after having used it to set OnePastLastIndex][:lighting]
[13:19][How to succeed in the game industry, according to [@naysayer88 Jon Blow]: Always update OnePastLastIndex before StartIndex][:speech]
[14:18][Again try unsuccessfully to break on PushDebugBox() for an occluder][:"debug visualisation" :lighting :run]
[15:23][Fix GridBuildSpatialPartition() to increment OnePastLastIndex][:lighting]
[17:34][Break on PushDebugBox() for an occluder, and inspect the values][:"debug visualisation" :lighting :run]
[18:32][See that our ray walk :"debug visualisation" now reports more correct results][:lighting :run]
[18:56][In an -O2 build, with our ray hitting, we are not drawing the ray itself][:"debug visualisation" :lighting :run]
[19:39][Prevent GridRayCast() from enlarging the occluder boxes, and make it draw the ray][:"debug visualisation" :lighting]
[21:59][See our ray][:"debug visualisation" :lighting :run]
[22:34][Determine the ProbeSampleP computation in GridRayCast() to be busted][:"debug visualisation" :lighting :research]
[23:31][Step into the origin ray drawing PushDebugLine() in GridRayCast(), to see that tRay is erroneous][:"debug visualisation" :lighting :run]
[26:27][Add a break location in the SomethingHit branch in GridRayCast()][:"debug visualisation" :lighting]
[27:01][Break into the SomethingHit branch in GridRayCast(), scrutinise the HComp shuffling code, and consider the hit detection code to be wrong][:"debug visualisation" :lighting :run :simd]
[30:50][Scrutinise GridRayCast() for hit bugs, keeping it in the back of our mind][:lighting]
[33:01][Again break into the SomethingHit branch in GridRayCast(), and scrutinise the HComp shuffling code][:"debug visualisation" :lighting :run :simd]
[36:03][Fix GridRayCast() to make _mm_extract_epi16() select the 1st 16-bit integer when setting ShuffleIndex][:lighting :simd]
[36:34][Break into the SomethingHit branch in GridRayCast(), and past the ray selection][:"debug visualisation" :lighting :run :simd]
[38:56][Compile in -O2]
[39:05][Our hit detection is now correct][:"debug visualisation" :lighting :run]
[39:19][Make GridRayCast() draw a small box at the hit point][:"debug visualisation" :lighting]
[40:17][Our hit detection is definitely close][:"debug visualisation" :lighting :run]
[40:58][Make GridRayCast() decrease the distance walked by our rays][:lighting]
[42:05][See our shorter ray travel distance][:"debug visualisation" :lighting :run]
[42:11][Make GridRayCast() slightly increase the distance walked by our rays][:lighting]
[42:27][See our better ray travel distance, and check the :performance][:"debug visualisation" :lighting :run]
[42:55][Make GridRayCast() slightly decrease the distance walked by our rays][:lighting]
[43:04][Our frame time is basically unchanged][:lighting :performance :run]
[43:17][Rerun the game to regenerate the walk table][:lighting :performance :run]
[43:46][Make GridRayCast() increase the distance walked by our rays][:lighting]
[44:00][Make GridRayCast() draw the normal of our hit surface][:"debug visualisation" :lighting]
[44:45][Check out our correct normal][:"debug visualisation" :lighting :run]
[45:02][Add a DebugGridIndex for FullCast() to use][:"debug system" :lighting]
[47:24][See the DebugGridIndex in our :UI][:lighting :run]
[47:50][Begin to make DEBUGBeginInteract() and DEBUGInteract() support editable u32 values][:"debug system"]
[51:08][Revert, and instead make the DebugGridIndex be a float, for editing][:"debug system" :lighting]
[51:50][Try editing the DebugGridIndex][:"debug system" :"debug visualisation" :lighting :run]
[52:24][Change DEBUGInteract() to edit draggable values in X][:"debug system" :lighting]
[52:37][Try editing the DebugGridIndex][:"debug system" :"debug visualisation" :lighting :run]
[53:21][Make FullCast() set the DebugGridIndex to 5×4677][:lighting]
[53:43][Edit the DebugGridIndex][:"debug system" :"debug visualisation" :lighting :run]
[53:54][Make FullCast() set the DebugGridIndex to 3×4677][:lighting]
[54:08][Edit the DebugGridIndex to 14889][:"debug system" :"debug visualisation" :lighting :run]
[55:47][Make FullCast() set the DebugGridIndex to 14889 + (24×16)][:lighting]
[56:31][Check out our ray][:"debug system" :"debug visualisation" :lighting :run]
[56:42][Make FullCast() set the DebugGridIndex to 14889 + (26×18)][:lighting]
[56:58][Try to check out our ray][:"debug system" :"debug visualisation" :lighting :run]
[57:07][Make FullCast() set the DebugGridIndex to 14889 + (SpatialGrid.CellCount.x × SpatialGrid.CellCount.y)][:lighting]
[57:52][Check out our ray][:"debug system" :"debug visualisation" :lighting :run]
[58:05][Augment lighting_solution with a DebugGridIndex and DebugRayIndex for the single-threaded UpdateLighting() to set][:"debug system" :lighting :threading]
[1:01:51][Try editing our DebugGridIndex and DebugRayIndex to 16317 and 42][:"debug system" :"debug visualisation" :lighting :run]
[1:04:47][Make UpdateLighting() set DebugGridIndex and DebugRayIndex to 16317 and 42][:"debug system" :lighting :threading]
[1:05:13][Confirm that we've picked the right ray][:"debug system" :"debug visualisation" :lighting :run]
[1:05:18][Consider scrutinising ComputeWalkTable() for bugs][:lighting :research]
[1:05:54][Show our erroneous hit][:"debug visualisation" :lighting :run]
[1:06:24][Fix ComputeWalkTable() to correctly set tTerminate][:lighting]
[1:07:22][Check out our correct hit][:"debug visualisation" :lighting :run]
[1:07:48][Try editing our DebugGridIndex and DebugRayIndex, and consider the ray cast to be correct][:"debug system" :"debug visualisation" :lighting :run]
[1:08:54][Consider scrutinising GridRayCast() for bugs in the TransferPPS][:lighting :research]
[1:10:46][Disable LIGHTING_USE_GRID][:lighting]
[1:11:07][Check out the old AABB ray traced :lighting][:run]
[1:11:21][Enable LIGHTING_USE_GRID][:lighting]
[1:11:31][Check out the new grid ray traced :lighting][:run]
[1:12:18][Scrutinise the TransferPPS computation in GridRayCast()][:lighting :research]
[1:19:19][Make GridRayCast() at least index into a different TransferPPS for each ray][:lighting]
[1:20:16][Our :lighting remains wrong][:run]
[1:20:50][Prevent GridRayCast() from applying the moon light, and make it always set ProbeSamplePSingle][:lighting]
[1:23:24][See immediate full-bright light][:lighting :run]
[1:25:01][Edit our DebugRayIndex and note that our speed has reduced][:"debug system" :"debug visualisation" :lighting :performance :run]
[1:25:51][Rerun the game to see that our speed begins fine, but degrades][:lighting :performance :run]
[1:30:16][Make GridRayCast() force the TransferPPS to 0][:lighting]
[1:31:15][Rerun the game to see that our speed begins and remains fine][:lighting :performance :run]
[1:33:05][Let GridRayCast() compute the TransferPPS as normal][:lighting]
[1:33:18][Rerun the game to see that our speed begins fine, but degrades][:lighting :performance :run]
[1:33:59][Make GridRayCast() add our SpecTexel DEBUG_VALUE to the :"debug system"][:lighting]
[1:35:45][Do not see a SpecTexel in the profiler][:lighting :run]
[1:36:29][Make GridRayCast() add a SpecTexel DEBUG_VALUE before disabling Debugging][:lighting]
[1:36:56][See our SpecTexel begin large and descend to 0][:lighting :run]
[1:37:57][Make GridRayCast() add each SpecTexel DEBUG_VALUE to the :"debug system"][:lighting]
[1:38:06][Our SpecTexel values begin the same but progress differently][:lighting :run]
[1:38:55][Make GridRayCast() zero-initialise TransferPPS][:lighting]
[1:39:42][Our SpecTexel values remain the same][:lighting :run]
[1:39:49][Revert the zero-initialisation][:lighting]
[1:40:06][Step through GridRayCast() and inspect the TransferPPS values][:lighting :run]
[1:41:12][Determine to disable multithreading of the :lighting][:threading :speech]
[1:41:28][~RemedyBG feature request: Stepping through the current thread][:admin :threading]
[1:42:26][Disable multithreading of the :lighting][:threading]
[1:42:52][Step through GridRayCast() and inspect the TransferPPS and SpecTexel values][:lighting :run]
[1:44:55][Assert in GridRayCast() that the SpecTexel doesn't look fishy][:lighting]
[1:45:58][See that the first few frames look good, until we hit our assertion][:lighting :run]
[1:47:46][Try decreasing the W modification from ×0.75 to ×0.01 in BuildDiffuseLightMaps()][:lighting]
[1:48:33][Consider the TransferPPS to perhaps be feeding back][:lighting :run]
[1:49:24][Note the slowness of a -O2 build][:lighting :performance :run]
[1:49:56][Try increasing the W modification from ×0.01 to ×0.1 in BuildDiffuseLightMaps()][:lighting]
[1:50:14][See that the TransferPPS is fascinatingly unstable][:lighting :run]
[1:51:54][Consult the Intel Developer Zone,[ref
site="Intel Developer Zone"
page="Reducing the Impact of Denormal Exceptions"
url=https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/floating-point-operations/tuning-performance/reducing-the-impact-of-denormal-exceptions.html][ref
site="Intel Developer Zone"
page="Setting the FTZ and DAZ Flags"
url=https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/floating-point-operations/understanding-floating-point-operations/setting-the-ftz-and-daz-flags.html#setting-the-ftz-and-daz-flags] Intrinsics Guide[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] and JUCE Forum[ref
site="JUCE Forum"
page="State of the Art Denormal Prevention"
url=https://forum.juce.com/t/state-of-the-art-denormal-prevention/16802] for denormal prevention information][:research]
[1:56:40][Make WinMainCRTStartup() set the DAZ and FZ bits, for denormal prevention][:"platform layer"]
[1:58:13][The :lighting transfer still happens slowly][:performance :run]
[1:58:22][Read 10.2.3 MXCSR Control and Status Register in the Intel 64 and IA-32 Architectures Software Developer Manuals[ref
site="Intel"
page="Intel 64 and IA-32 Architectures Software Developer Manuals"
url=https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html][ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:research]
[2:01:16][Fully define and set our desired MXCSR Control and Status Register[ref
site="Intel"
page="Intel 64 and IA-32 Architectures Software Developer Manuals"
url=https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html]][:"platform layer"]
[2:07:27][Hit a "Floating-point Inexact Result" exception][:run]
[2:08:14][Set our DesiredBits to be the ControlMask][:"platform layer"]
[2:08:35][:Run successfully, but still with slow :lighting transfer][:performance]
[2:08:59][Compile in -Od]
[2:09:15][Step through WinMainCRTStartup() to see the CSR bits being set][:"platform layer" :run]
[2:10:10][Move the CSR setting code into UpdateLighting()][:"platform layer"]
[2:10:56][Step through UpdateLighting() to see the CSR bits being set][:"platform layer" :run]
[2:11:08][Compile in -O2]
[2:11:22][The :lighting transfer still happens slowly][:performance :run]
[2:11:47][Introduce SetDefaultFPBehavior() for WinMainCRTStartup() to call][:"platform layer"]
[2:12:29][The :lighting transfer still happens slowly][:performance :run]
[2:12:52][Enable multithreading of the :lighting][:threading]
[2:13:09][Our :lighting transfer is quicker, but still feeds back][:run]
[2:13:29][Remove the LooksFishy() assertions from GridRayCast()][:lighting]
[2:13:42][Try increasing the W modification from ×0.1 to ×0.75 in BuildDiffuseLightMaps()][:lighting]
[2:14:20][Our speed begins fine, but degrades slightly][:lighting :performance :run]
[2:15:27][Make UpdateLighting() call SetDefaultFPBehavior() every frame][:"platform layer"]
[2:15:59][Our speed begins fine, but still degrades slightly][:lighting :performance :run]
[2:17:05][@xxthebigfoxx][@handmade_hero The control register is per thread, right? Are you setting it for the worker threads?][:"platform layer" :threading]
[2:17:23][Make ComputeLightPropagationWork() call SetDefaultFPBehavior() for each thread][:"platform layer" :threading]
[2:17:55][Excellent catch, @xxthebigfoxx][:speech]
[2:18:05][Our speed begins and remains fine, confirming that our problem was due to denormals][:lighting :performance :run]
[2:19:37][@krrsplat][Go crazy with the :camera to see if it causes shenanigans][:performance]
[2:20:22][Q&A][:speech]
[2:20:57][@imaginaryfreedom][Q: I'm building my first BVH for my raytracer. Do the previous episodes where you work on a k-d tree go into how to traverse the tree using :SIMD without slamming your face into the wall nose-first? I'm having a hard time understanding how to traverse the structure effectively without throwing all the :performance benefits of SIMD out the window][:"data structure" :lighting]
[2:23:53][@mindmark42][Q: Can you explain more why denormals could cause :performance to degrade?]
[2:26:31][@tonewexperiences][Q: "The volatile portion consists of the six status flags, in MXCSR\[0:5\], while the rest of the register, MXCSR\[6:15\], is considered nonvolatile." By the way, so much for flush to zero once per thread being enough][:"platform layer" :threading]
[2:26:51][@sagian2005][Q: Could you put the code back up where you set the control register bits?][:"platform layer"]
[2:27:29][@relvet][Q: Would it be worth special-casing the occurrences when a ray runs parallel to the grid of the walk table? Or would the special-casing cost as much as / more than the potential time save?][:lighting]
[2:28:13][@mindmark42][Q: Do you think GPUs would have the denormalized issue?]
[2:31:12][@internationalizationist][Q: Hi, I came from Day 523 (Introduction to Git), and that's just… The terminology that git uses in such commands like "git bless --by-gnome --assume-mutable-gnomes" and any other command piss me off and makes me feel like git was developed by some Star Wars fan or something and now it's Industrial Standard! Am I doing something wrong? Is it okay to name things in computer science like "gnomes"? Is there any tutorial that explains git in an accessible format? Or do I have to bite the bullet and live through that?][:vcs]
[2:33:41][@jimdopango][So [~torvalds Linus Torvalds] is not a good programmer?]
[2:35:36][@igorovich][Isn't Linus pronounced lainus?]
[2:36:52][@frozenzerker][I guess it's more Swedish than Finnish?]
[2:38:05][@internationalizationist][Q: So it was a joke‽][:vcs]
[2:41:29][@jimdopango][@cmuratori Not wasteful if your job is to integrate patches from hundreds of branches from thousands of developers, day in, day out. I'd guess 50% of a kernel maintainer's workload is dealing with source control and not programming][:vcs]
[2:45:18][@lanquemar][Which :VCS do you use?]
[2:46:34][Call it, with a glimpse into the future debugging the :lighting transfer][:speech]
[/video]