[video output=day611 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Examining the CPU Voxel Sampling" vod_platform=youtube id=MOqV_5x3qkg annotator=Miblo]
[0:00][Recap and set the stage for the day][:speech]
[0:31][Determine to blur the :lighting samples across voxels to reduce the flicker, after speeding up the :performance of the grid ray tracer][:run :sampling]
[3:02][Prepare to enable hhlightprof to capture a grid-based ray cast run][:lighting :performance]
[6:56][Update InternalLightingCore() to dump out the new source_lightboxes][:lighting :performance]
[9:47][Compile in hhlightprof and update it to work with our grid ray caster][:lighting :performance]
[16:27][Break in to InternalLightingCore()][:lighting :performance :run]
[18:24][Reload and see the walk table break][:"hot reloading" :lighting :run]
[18:48][Investigate the walk table breakage on :"hot reloading"][:lighting :research]
[20:52][:Run in -Od, hot-reload and see the walk table break][:"hot reloading" :lighting]
[21:32][Refamiliarise ourselves with the walk table structure and code][:lighting :research]
[22:42][Break in to InternalLightingCore() and inspect the LightSamplingWalkTable][:"hot reloading" :lighting :run]
[25:28][Break in to GridRayCast() and inspect the WalkTable usage][:"hot reloading" :lighting :run]
[27:02][Fix our walk table breakage by making ComputeWalkTable() block copy the SampleDirections, and set a fresh RayD and WalkTableOffset][:"hot reloading" :lighting]
[33:32][:Run in -Od, hot-reload and see the walk table remain intact][:"hot reloading" :lighting]
[34:08][:Run in -O2, hot-reload and see the walk table remain intact][:"hot reloading" :lighting]
[34:42][Enable then disable the LightBoxDumpTrigger, to dump the :lighting][:run]
[35:17][Check out our :lighting dump files, noting the large size of the source_lighting.dump][:admin]
[36:44][Hit a read access violation in GetAlignmentOffset() from hhlightprof][:lighting :memory :run]
[37:14][Make ProfileRun() push the SampleDirectionTable onto the TempArena][:lighting :memory]
[37:34][Hit a write access violation in PushDebugLine()][:"debug system" :lighting :run]
[38:35][:Run an -O2 build of hhlightprof][:lighting]
[38:59][hhlightprof total seconds elapsed: 7.646482][:lighting :performance :run]
[40:03][Disable LIGHTING_USE_GRID][:lighting]
[40:20][hhlightprof total seconds elapsed: 7.287836][:lighting :performance :run]
[40:42][Save off our timings, enable LIGHTING_USE_GRID and make GridRayCast() return early if not Debugging][:lighting]
[42:34][A few words on replacing :language / compiler "Errors" and "Warnings" with "I could not compile this" and "Things I noticed about the code"][:rant :speech]
[45:29][hhlightprof total seconds elapsed (without ray casting): 1.246065][:lighting :performance :run]
[46:44][Let GridRayCast() do its work][:lighting]
[46:54][hhlightprof total seconds elapsed: 7.706672][:lighting :performance :run]
[47:10][Reorganise GridRayCast() to decrement the CostMetric after the loops, and comment out debugging code][:lighting]
[50:18][hhlightprof total seconds elapsed: 7.479611][:lighting :performance :run]
[50:46][Note why the AABB testing loop in GridRayCast() does not tend to use all four :SIMD lanes, and the simplicity of ComputeWalkTable()][:lighting :research]
[52:53][Determine to decouple the spatial and :lighting voxel grids][:lighting :research]
[53:55][Try decreasing the CostMetric from 16 to 4 in GridRayCast()][:lighting]
[54:28][hhlightprof total seconds elapsed: 6.321697][:lighting :performance :run]
[55:00][Try decreasing the CostMetric from 4 to 0 in GridRayCast()][:lighting]
[55:13][hhlightprof total seconds elapsed: 4.679288][:lighting :performance :run]
[55:34][Interpret our :performance of 4.679288 seconds when casting no rays][:lighting :research]
[58:16][Alignment of Atlas Cells][:blackboard :lighting :memory :performance :threading]
[1:06:48][Determine to remove the mutex from our atlas traversal code][:lighting :performance :research :threading]
[1:10:18][Change ComputeLightPropagationWork() to distribute the :lighting computation along the Y axis][:performance :threading]
[1:12:00][hhlightprof total seconds elapsed: 4.422329][:lighting :performance :run]
[1:12:41][Consider compacting the lighting atlases][:lighting :performance :research]
[1:14:44][Consider the :performance of ComputeVoxelIrradianceAt()][:lighting :research]
[2:07:23][Let GridRayCast() use the original CostMetric][:lighting]
[2:07:48][We are back to normal][:lighting :run]
[2:07:53][Q&A][:speech]
[2:08:15][@sagian2005][Q: Would things go better if you started with U, V and W each 4 wide?][:lighting :simd]
[2:09:36][@cirdanvalen][Q: Could you not pad by 8 bytes to fix the overlap?][:threading]
[2:10:03][@mindmark42][Q: The L0 determines how many cache lines the CPU can hold, right?][:hardware :memory]
[2:10:43][@rooctag][Q: Would you ever consider just going over how an operation takes x amount of CPU ops? Or will that be in the Intro to C?][:hardware :performance]
[2:10:59][@ali4410][Q: Hi, do you recommend learning vi keybindings, emacs keybindings, or neither?]
[2:12:00][@mindmark42][Q: The :memory caches on the CPU][:hardware]
[2:22:50][@mindmark42][Q: Yes, that answers my question. I was just off by one index][:hardware :memory]
[2:23:15][@billdstrong][Q: How much time are you expecting to shave off from making this routine wide? About 2 / 3 or so? What are your expectations for your grid walk optimization? Are you trying to get under 4 seconds or lower?][:lighting :optimisation :performance :simd]
[2:23:51][@mindmark42][Q: Yeah, I mentioned those L caches. I was wondering how to determine how many cache lines a core holds][:memory]