[video output=day433 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Optimizing Ray vs. AABB Intersections" vod_platform=youtube id=vohsUKjg9tU annotator=Miblo]
[0:02][Recap and set the stage for a day of refining the :lighting][:speech :rendering]
[0:31][Take Visual Studio's quick feedback survey][:admin :rant]
[8:15][:Run the game to see our current :lighting situation][:rendering]
[11:59][Consider the :performance of ComputeLightPropagation()][:lighting :rendering :run]
[14:18][Track TotalPartitionLeavesUsed in ComputeLightPropagation()][:lighting :profiling :rendering]
[17:57][Consult the TotalPartitionLeavesUsed :performance figures][:lighting :rendering]
[18:22][Decide to change SplitBox() from k-d tree to quadtree partitioning][:geometry]
[20:15][k-d tree vs. quadtree][:blackboard :geometry]
[21:00][Allow SplitBox() to have 8 leaves per child, increased from 4][:geometry]
[21:32][:Run the game to see that that doesn't appreciably change our runtime][:performance]
[21:44][Track PartitionsPerLeaf in ComputeLightPropagation()][:lighting :profiling :rendering]
[22:37][:Run the game and compare the PartitionsPerLeaf with 8 and 4 leaves per child][:lighting :performance :rendering]
[23:09][Toggle off ComputeLightPropagation()][:lighting :rendering]
[23:41][:Run the game to determine that our shader is the bottleneck][:lighting :performance :rendering]
[26:00][Toggle off wglSwapIntervalEXT in Win32InitOpenGL()][:hardware :platform]
[26:25][:Run the game and watch the threads view, determined to investigate the time delay][:performance]
[28:29][There's a knock at the door][:admin]
[28:50][:afk]
[29:08][Return and consider the threads view to be misleading][:performance :run]
[31:33][Introduce HUD_TIMED_FUNCTION(), called in LightingTest(), to enable more direct textual :profiling of the :lighting :rendering][:"debug system"]
[36:53][:Run the game to see that it does nothing different][:performance]
[37:21][Enable DEBUGEnd() and DEBUGInit() to handle HUD_TIMED_FUNCTION(), renaming AddTooltip() to AddLine() and DrawTooltips() to DrawLineBuffer()][:"debug system" :profiling]
[57:53][Crash in DEBUGDrawElement() and inspect what's going on][:"debug system" :run]
[1:02:39][Enable DrawTreeLink() to handle the case when a tree has no element and no children, and change HasChildren() to CanHaveChildren()][:"debug system" :profiling]
[1:04:52][:Run the game to see that we do not crash, and see our new HUD element][:"debug system" :profiling]
[1:06:24][Make DEBUGEnd() display only the function name of our HUD_TIMED_FUNCTION(), expandable to contain textual :profiling information for its child functions][:"debug system"]
[1:17:27][Step into DEBUGEnd() and inspect the HUD element][:"debug system" :profiling :run]
[1:22:41][Enable DEBUGEnd() to gather :profiling information for our HUD element][:"debug system"]
[1:31:08][:Run the game and consult our LightingTest() HUD to see that our average value is busted][:lighting :performance :rendering]
[1:32:24][Fix DEBUGEnd() to correctly compute the average cycle count][:"debug system" :profiling]
[1:33:55][:Run the game and consult the :performance of LightingTest() in the new HUD][:lighting :rendering]
[1:35:37][Tweak the leaves per child value in SplitBox(), comparing their :performance in the HUD, and consider that our spatial partitioning is not effective enough][:geometry :run]
[1:39:35][Consider making RayCast() more performant with AABB testing][:geometry :lighting :optimisation :rendering :research]
[1:41:49][Read 'Fast, Branchless Ray / Bounding Box Intersections'[ref
site=tavianator
page="Fast, Branchless Ray / Bounding Box Intersections"
url=https://tavianator.com/fast-branchless-raybounding-box-intersections/] and 'Fast Ray / Axis-Aligned Bounding Box Overlap Tests using Ray Slopes'[ref
author="Martin Eisemann, Thorsten Grosch, Stefan Mueller, Marcus Magnor"
title="Fast Ray/Axis-Aligned Bounding Box Overlap Tests using Ray Slopes"
url=https://pdfs.semanticscholar.org/1bba/317ebf98dd67a2dea7c42924311628b6d215.pdf]][:geometry :research]
[1:45:36][Ray vs. AABB testing][:blackboard :geometry]
[1:47:18][Continue to read these papers on ray AABB testing[ref
site=tavianator
page="Fast, Branchless Ray / Bounding Box Intersections"
url=https://tavianator.com/fast-branchless-raybounding-box-intersections/][ref
author="Martin Eisemann, Thorsten Grosch, Stefan Mueller, Marcus Magnor"
title="Fast Ray/Axis-Aligned Bounding Box Overlap Tests using Ray Slopes"
url=https://pdfs.semanticscholar.org/1bba/317ebf98dd67a2dea7c42924311628b6d215.pdf]][:geometry :research]
[1:56:23][Change RayCast() to perform AABB testing[ref
site=tavianator
page="Fast, Branchless Ray / Bounding Box Intersections"
url=https://tavianator.com/fast-branchless-raybounding-box-intersections/]][:geometry :lighting :optimisation :rendering]
[1:59:15][Computing X, Y and Z intersections in T][:blackboard :geometry]
[2:06:21][Doing this without caring which direction the normal is facing][:blackboard :geometry]
[2:09:31][Continue to enable RayCast() to perform fast AABB testing][:geometry :lighting :optimisation :rendering]
[2:13:05][Determining the tMin and tMax intersection points][:blackboard :geometry]
[2:14:12][Enable RayCast() to compute those tMin and tMax][:geometry :lighting :optimisation :rendering]
[2:15:50][The Maximum-Minimum and Minimum-Maximum][:blackboard :geometry]
[2:16:29][Set tMin and tMax and continue enabling RayCast() to perform fast AABB testing][:geometry :lighting :optimisation :rendering]
[2:27:31][Consider how to determine which side of the box we hit][:geometry :lighting :optimisation :rendering :research]
[2:30:24][Enable RayCast() to select the correct BoxSurfaceIndex][:geometry :lighting :optimisation :rendering :research]
[2:35:27][Introduce f32_4x and v3_4x versions of Min() and Max(), and the v3_4x / operator][:mathematics :optimisation]
[2:38:03][:Run the game to see that it looks pretty similar, and is faster][:geometry :lighting :performance :rendering]
[2:39:51][Introduce the v3_4x * operator and make RayCast() precompute RayD][:lighting :mathematics :optimisation :rendering]
[2:40:53][Compare the :performance of this with the previous way][:lighting :optimisation :rendering :run]
[2:41:29][Consider breaking down our :profiling of LightingTest()][:lighting :rendering :research]
[2:43:25][Consult the threads view to see that the :performance is more reasonable][:lighting :rendering]
[2:43:52][Try to stress the system][:lighting :optimisation :programming :rendering :run]
[2:45:17][Q&A][:speech]
[2:46:01][@0b0000000000000][SurfaceIndexLookupTable\[movemask(tmin == tBoxMin)\], and just choose the first set bit in the return from the movemask for which index][:optimisation]
[2:47:15][@vaualbus][Q: Would we get an improvement if we switched all the v3s to v3_4x, so that we don't have to load the values into an __m128 each frame?][:optimisation]
[2:47:40][@0b0000000000000][It would only be 256 entries][:optimisation]
[2:48:05][@alexkelbo][Q: Could you recap how we retain the state of the debug UI between frames?][:"debug system"]
[2:48:31][Consider the movemask instruction][:optimisation :speech]
[2:49:46][@0b0000000000000][You can collapse the 4 bytes into 2 bytes][:optimisation]
[2:52:19][@roam00010011][Q: tMin3 will always be the closest in distance to RayOrigin, since RayD that hits BoxMax will result in tBoxMax < tBoxMin, no?][:geometry]
[2:53:00][Determining the closest hit when rays can be cast in all directions][:blackboard :geometry]
[2:53:33][Sketch out RayCast() computing the RayPosition from the ray direction][:geometry :optimisation]
[2:56:08][Glimpse into the future of optimising the spatial hierarchy][:geometry :optimisation :speech]
[2:57:00][Consult the profiler in terms of multithreading][:run :threading]
[2:57:41][Try quadrupling PointsPerWork in ComputeLightPropagation()][:threading :optimisation]
[2:57:54][Consult the threads view to see that there is more empty space now][:run :optimisation :threading]
[2:59:40][Optimising lane usage when :threading][:blackboard :optimisation]
[3:01:54][Note that we are not measuring dead computation time, and just enjoy our :lighting][:optimisation :rendering :run :threading]
[3:03:05][Close down, with one last look at 'Fast, Branchless Ray / Bounding Box Intersections'[ref
site=tavianator
page="Fast, Branchless Ray / Bounding Box Intersections"
url=https://tavianator.com/fast-branchless-raybounding-box-intersections/]][:research]
[3:04:08][Glimpse into the future, either continuing with :optimisation or investigating the :lighting flicker][:rendering :speech]
[/video]
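The RayCast() rewrite annotated from 1:56:23 to 2:16:29 follows the slab method described in 'Fast, Branchless Ray / Bounding Box Intersections'. For each pair of axis-aligned planes, compute where the ray crosses them in t. The entry point is the maximum of the per-axis minimums, the exit point is the minimum of the per-axis maximums, and the ray hits when entry does not exceed exit. The block below is a minimal scalar sketch of that idea in C. It assumes a plain v3 struct and the hypothetical names RayHitsAABB(), InvertDirection(), MinF() and MaxF(). It is not the Handmade Hero code, which works on four rays at once through v3_4x and f32_4x.

```c
#include <stdbool.h>

/* Hypothetical scalar vector type for this sketch only; Handmade Hero
   uses its own v3 and the 4-wide v3_4x instead. */
typedef struct { float x, y, z; } v3;

static float MinF(float a, float b) { return (a < b) ? a : b; }
static float MaxF(float a, float b) { return (a > b) ? a : b; }

/* Precompute 1/Dir once per ray. A zero component becomes +/-infinity,
   which the min/max steps below absorb. */
static v3 InvertDirection(v3 Dir)
{
    v3 Result = { 1.0f / Dir.x, 1.0f / Dir.y, 1.0f / Dir.z };
    return Result;
}

/* Slab test (hypothetical helper): returns true if Origin + t*Dir meets
   the box [BoxMin, BoxMax] for some t in [0, tLimit]. On a hit, *tHit
   receives the entry distance (0 if the origin is inside the box). */
static bool RayHitsAABB(v3 Origin, v3 InvDir, v3 BoxMin, v3 BoxMax,
                        float tLimit, float *tHit)
{
    /* Distances along the ray to each pair of axis-aligned planes. */
    float tx1 = (BoxMin.x - Origin.x) * InvDir.x;
    float tx2 = (BoxMax.x - Origin.x) * InvDir.x;
    float ty1 = (BoxMin.y - Origin.y) * InvDir.y;
    float ty2 = (BoxMax.y - Origin.y) * InvDir.y;
    float tz1 = (BoxMin.z - Origin.z) * InvDir.z;
    float tz2 = (BoxMax.z - Origin.z) * InvDir.z;

    /* Taking min/max per axis means we never branch on which way the
       ray points. Entry is the maximum of the minimums; exit is the
       minimum of the maximums. */
    float tMin = MaxF(MaxF(MinF(tx1, tx2), MinF(ty1, ty2)), MinF(tz1, tz2));
    float tMax = MinF(MinF(MaxF(tx1, tx2), MaxF(ty1, ty2)), MaxF(tz1, tz2));

    /* Restrict the overlap to the part of the ray we care about. */
    tMin = MaxF(tMin, 0.0f);
    tMax = MinF(tMax, tLimit);

    *tHit = tMin; /* only meaningful when the function returns true */
    return tMin <= tMax;
}
```

Precomputing the inverse direction turns the per-box work into subtractions, multiplies and min/max operations with no branches on the ray's direction. That is the property a 4-wide version can exploit. One edge case is not handled: if a direction component is zero and the origin lies exactly on that slab's plane, the product is NaN. The linked article discusses that case.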
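The Q&A suggestion at 2:46:01 is to index a lookup table with a movemask to choose the box surface a ray entered through. The idea is to compare each axis's entry distance against the overall entry distance, pack the comparison results into a bitmask, and take the first set bit as the axis. Below is a hedged sketch of that idea for a single ray held in one SSE register, which is an assumption: lanes x, y and z carry the three axes and lane w is unused. Handmade Hero's RayCast() instead keeps four rays per register. The face numbering, the EntryFace() name and the FirstSetBit table are hypothetical. The intrinsics require an x86 target with SSE.

```c
#include <xmmintrin.h>

/* Hypothetical face numbering: 0/1 = -X/+X, 2/3 = -Y/+Y, 4/5 = -Z/+Z. */

/* Lowest set bit for every 4-bit mask; mask 0 (no lane matched) maps to 0. */
static const int FirstSetBit[16] = {0, 0, 1, 0, 2, 0, 1, 0,
                                    3, 0, 1, 0, 2, 0, 1, 0};

/* tNear holds the per-axis entry distances min(t1, t2) in lanes x, y, z.
   Lane w should hold a value that cannot equal tEntry (e.g. a large
   negative number). tEntry is the maximum of those lanes, i.e. the
   distance at which the ray enters the box. */
static int EntryFace(__m128 tNear, float tEntry, const float Dir[3])
{
    __m128 Equal = _mm_cmpeq_ps(tNear, _mm_set1_ps(tEntry));
    int Mask = _mm_movemask_ps(Equal);   /* bit i set where lane i == tEntry */
    int Axis = FirstSetBit[Mask & 0xF];  /* ties resolve to the lowest axis */

    /* A ray moving in +Axis enters through the min (negative) face;
       a ray moving in -Axis enters through the max (positive) face. */
    return 2 * Axis + ((Dir[Axis] < 0.0f) ? 1 : 0);
}
```

A 16-entry table is enough because _mm_movemask_ps returns only four bits, one per lane. That is smaller than the 256-entry table mentioned in chat at 2:47:40, which would correspond to a mask over more lanes.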