From 68190971ed92efe2aecd6ba0a84e30f222e68333 Mon Sep 17 00:00:00 2001
From: Matt Mascarenhas
Date: Mon, 22 Jun 2020 15:50:15 +0100
Subject: [PATCH] Index hero/code611

---
 cmuratori/hero/code/code611.hmml | 98 ++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)
 create mode 100644 cmuratori/hero/code/code611.hmml

diff --git a/cmuratori/hero/code/code611.hmml b/cmuratori/hero/code/code611.hmml
new file mode 100644
index 0000000..aea038a
--- /dev/null
+++ b/cmuratori/hero/code/code611.hmml
@@ -0,0 +1,98 @@
+[video output=day611 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Examining the CPU Voxel Sampling" vod_platform=youtube id=MOqV_5x3qkg annotator=Miblo]
+[0:00][Recap and set the stage for the day][:speech]
+[0:31][Determine to blur the :lighting samples across voxels and reducing the flicker, after speeding up the :performance of the grid ray tracer][:run :sampling]
+[3:02][Prepare to enable hhlightprof to capture a grid-based ray cast run][:lighting :performance]
+[6:56][Update InternalLightingCore() to dump out the new source_lightboxes][:lighting :performance]
+[9:47][Compile in hhlightprof and update it to work with our grid ray caster][:lighting :performance]
+[16:27][Break in to InternalLightingCore()][:lighting :performance :run]
+[18:24][Reload, to see the walk table break][:"hot reloading" :lighting :run]
+[18:48][Investigate the walk table breakage on :"hot reloading"][:lighting :research]
+[20:52][:Run in -Od, hot-reload and see the walk table break][:"hot reloading" :lighting]
+[21:32][Refamiliarise ourselves with the walk table structure and code][:lighting :research]
+[22:42][Break in to InternalLightingCore() and inspect the LightSamplingWalkTable][:"hot reloading" :lighting :run]
+[25:28][Break in to GridRayCast() and inspect the WalkTable usage][:"hot reloading" :lighting :run]
+[27:02][Fix our walk table breakage by making ComputeWalkTable() block copy the SampleDirections, and set a fresh RayD and WalkTableOffset][:"hot reloading" :lighting]
+[33:32][:Run in -Od, hot-reload and see the walk table remain intact][:"hot reloading" :lighting]
+[34:08][:Run in -O2, hot-reload and see the walk table remain intact][:"hot reloading" :lighting]
+[34:42][Enable then disable the LightBoxDumpTrigger, to dump the :lighting][:run]
+[35:17][Check out our :lighting dump files, noting the large size of the source_lighting.dump][:admin]
+[36:44][Hit a read access violation in GetAlignmentOffset() from hhlightprof][:lighting :memory :run]
+[37:14][Make ProfileRun() push the SampleDirectionTable onto the TempArena][:lighting :memory]
+[37:34][Hit a write access violation in PushDebugLine()][:"debug system" :lighting :run]
+[37:44][Make ProfileRun() disable UpdateDebugLines][:"debug system" :lighting]
+[38:03][:Run hhlightprof successfully][:lighting]
+[38:35][:Run an -O2 build of hhlightprof][:lighting]
+[38:59][hhlightprof total seconds elapsed: 7.646482][:lighting :performance :run]
+[40:03][Disable LIGHTING_USE_GRID][:lighting]
+[40:20][hhlightprof total seconds elapsed: 7.287836][:lighting :performance :run]
+[40:42][Save off our timings, enable LIGHTING_USE_GRID and make GridRayCast() return early if not Debugging][:lighting]
+[42:34][A few words on replacing :language / compiler "Errors" and "Warnings" with "I could not compile this" and "Things I noticed about the code"][:rant :speech]
+[45:29][hhlightprof total seconds elapsed (without ray casting): 1.246065][:lighting :performance :run]
+[46:44][Let GridRayCast() do its work][:lighting]
+[46:54][hhlightprof total seconds elapsed: 7.706672][:lighting :performance :run]
+[47:10][Reorganise GridRayCast() to decrement the CostMetric after the loops, and comment out debugging code][:lighting]
+[50:18][hhlightprof total seconds elapsed: 7.479611][:lighting :performance :run]
+[50:46][Note why the AABB testing loop in GridRayCast() does not tend to use all four :SIMD lanes, and the simplicity of ComputeWalkTable()][:lighting :research]
+[52:53][Determine to decouple the spatial and :lighting voxel grids][:lighting :research]
+[53:55][Try decreasing the CostMetric from 16 to 4 in GridRayCast()][:lighting]
+[54:28][hhlightprof total seconds elapsed: 6.321697][:lighting :performance :run]
+[55:00][Try decreasing the CostMetric from 4 to 0 in GridRayCast()][:lighting]
+[55:13][hhlightprof total seconds elapsed: 4.679288][:lighting :performance :run]
+[55:34][Interpret our 4.679288 seconds :performance when casting no rays][:lighting :research]
+[58:16][Alignment of Atlas Cells][:blackboard :lighting :memory :performance :threading]
+[1:06:48][Determine to remove the mutex from our atlas traversal code][:lighting :performance :research :threading]
+[1:10:18][Change ComputeLightPropagationWork() to distribute the :lighting computation along the Y axis][:performance :threading]
+[1:12:00][hhlightprof total seconds elapsed: 4.422329][:lighting :performance :run]
+[1:12:41][Consider compacting the lighting atlases][:lighting :performance :research]
+[1:14:44][Consider the :performance of ComputeVoxelIrradianceAt()][:lighting :research]
+[1:15:41][Try greatly simplifying ComputeVoxelIrradianceAt()][:lighting :performance]
+[1:16:46][hhlightprof total seconds elapsed: 3.160531][:lighting :performance :run]
+[1:16:59][Try further simplifying ComputeVoxelIrradianceAt()][:lighting :performance]
+[1:17:33][hhlightprof total seconds elapsed: 1.815950][:lighting :performance :run]
+[1:17:50][Determine to speed up ComputeVoxelIrradianceAt()][:lighting :performance :research]
+[1:18:22][Instrument ComputeVoxelIrradianceAt() to more specifically gauge its :performance][:lighting]
+[1:21:48][hhlightprof total seconds elapsed: 2.806293][:lighting :performance :run]
+[1:22:00][Break in to ComputeVoxelIrradianceAt() and inspect the assembly][:asm :lighting :performance :run]
+[1:23:31][Determine to optimise out some of the math in ComputeVoxelIrradianceAt()][:lighting :optimisation :research]
+[1:26:46][Make ComputeVoxelIrradianceAt() operate wide[ref
+ site=Intel
+ page="Intel Intrinsics Guide"
+ url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/][ref
+ site=uops.info
+ url=https://uops.info/table.html]][:lighting :optimisation :simd]
+[1:49:07][Dependents, and Cycle Ordering][:hardware :performance]
+[1:52:32][Finish making ComputeVoxelIrradianceAt() operate wide[ref
+ site=Intel
+ page="Intel Intrinsics Guide"
+ url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/][ref
+ site=uops.info
+ url=https://uops.info/table.html]][:lighting :optimisation :simd]
+[2:07:23][Let GridRayCast() use the original CostMetric][:lighting]
+[2:07:48][We are back to normal][:lighting :run]
+[2:07:53][Q&A][:speech]
+[2:08:15][@sagian2005][Q: Would things go better if you started with U, V and W each 4 wide?][:lighting :simd]
+[2:09:36][@cirdanvalen][Q: Could you not pad by 8 bytes to fix the overlap?][:threading]
+[2:10:03][@mindmark42][Q: The L0 determines how many cache lines the CPU can hold, right?][:hardware :memory]
+[2:10:43][@rooctag][Q: Would you ever consider just going over explaining how an operations takes x amount of CPU ops? Or will that be in the Intro to C?][:hardware :performance]
+[2:10:59][@ali4410][Q: Hi, do you recommend learning vi keybindings, emacs keybindings, or neither?]
+[2:12:00][@mindmark42][Q: The :memory caches on the CPU][:hardware]
+[2:13:17][:Hardware Caches[ref
+ site=WikiChip
+ page="Skylake (client) - Microarchitectures - Intel"
+ url=https://en.wikichip.org/wiki/intel/microarchitectures/skylake][ref
+ site=TechPowerUp
+ page="Intel \"Skylake\" Die Layout Detailed"
+ url=https://www.techpowerup.com/215333/intel-skylake-die-layout-detailed]][:memory :research]
+[2:22:50][@mindmark42][Q: Yes, that answers my question. I just was just off by one index][:hardware :memory]
+[2:23:15][@billdstrong][Q: How much time are you expecting to shave off from making this routine wide? About 2 / 3 or so? What are your expectations for your grid walk optimization? Are you trying to get under 4 seconds or lower?][:lighting :optimisation :performance :simd]
+[2:23:51][@mindmark42][Q: Yeah, I mentioned those L caches. I was wondering how to determine how many cache lines a core holds][:memory]
+[2:24:23][8-Way Caches[ref
+ site=WikiChip
+ page="Skylake (client) - Microarchitectures - Intel"
+ url=https://en.wikichip.org/wiki/intel/microarchitectures/skylake][ref
+ author="Katrina Yim"
+ title="Cache Associativity"
+ publisher="University of California, Berkeley"
+ url=http://csillustrated.berkeley.edu/PDFs/handouts/cache-3-associativity-handout.pdf]][:hardware :memory :research]
+[2:29:01][Thanks, everyone][:speech]
+[/video]