[video output=day611 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Examining the CPU Voxel Sampling" vod_platform=youtube id=MOqV_5x3qkg annotator=Miblo]
[0:00][Recap and set the stage for the day][:speech]
[0:31][Determine to blur the :lighting samples across voxels and reduce the flicker, after speeding up the :performance of the grid ray tracer][:run :sampling]
[3:02][Prepare to enable hhlightprof to capture a grid-based ray cast run][:lighting :performance]
[6:56][Update InternalLightingCore() to dump out the new source_lightboxes][:lighting :performance]
[9:47][Compile in hhlightprof and update it to work with our grid ray caster][:lighting :performance]
[16:27][Break in to InternalLightingCore()][:lighting :performance :run]
[18:24][Reload, to see the walk table break][:"hot reloading" :lighting :run]
[18:48][Investigate the walk table breakage on :"hot reloading"][:lighting :research]
[20:52][:Run in -Od, hot-reload and see the walk table break][:"hot reloading" :lighting]
[21:32][Refamiliarise ourselves with the walk table structure and code][:lighting :research]
[22:42][Break in to InternalLightingCore() and inspect the LightSamplingWalkTable][:"hot reloading" :lighting :run]
[25:28][Break in to GridRayCast() and inspect the WalkTable usage][:"hot reloading" :lighting :run]
[27:02][Fix our walk table breakage by making ComputeWalkTable() block copy the SampleDirections, and set a fresh RayD and WalkTableOffset][:"hot reloading" :lighting]
[33:32][:Run in -Od, hot-reload and see the walk table remain intact][:"hot reloading" :lighting]
[34:08][:Run in -O2, hot-reload and see the walk table remain intact][:"hot reloading" :lighting]
[34:42][Enable then disable the LightBoxDumpTrigger, to dump the :lighting][:run]
[35:17][Check out our :lighting dump files, noting the large size of the source_lighting.dump][:admin]
[36:44][Hit a read access violation in GetAlignmentOffset() from hhlightprof][:lighting :memory :run]
[37:14][Make ProfileRun() push the SampleDirectionTable onto the TempArena][:lighting :memory]
[37:34][Hit a write access violation in PushDebugLine()][:"debug system" :lighting :run]
[37:44][Make ProfileRun() disable UpdateDebugLines][:"debug system" :lighting]
[38:03][:Run hhlightprof successfully][:lighting]
[38:35][:Run an -O2 build of hhlightprof][:lighting]
[38:59][hhlightprof total seconds elapsed: 7.646482][:lighting :performance :run]
[40:03][Disable LIGHTING_USE_GRID][:lighting]
[40:20][hhlightprof total seconds elapsed: 7.287836][:lighting :performance :run]
[40:42][Save off our timings, enable LIGHTING_USE_GRID and make GridRayCast() return early if not Debugging][:lighting]
[42:34][A few words on replacing :language / compiler "Errors" and "Warnings" with "I could not compile this" and "Things I noticed about the code"][:rant :speech]
[45:29][hhlightprof total seconds elapsed (without ray casting): 1.246065][:lighting :performance :run]
[46:44][Let GridRayCast() do its work][:lighting]
[46:54][hhlightprof total seconds elapsed: 7.706672][:lighting :performance :run]
[47:10][Reorganise GridRayCast() to decrement the CostMetric after the loops, and comment out debugging code][:lighting]
[50:18][hhlightprof total seconds elapsed: 7.479611][:lighting :performance :run]
[50:46][Note why the AABB testing loop in GridRayCast() does not tend to use all four :SIMD lanes, and the simplicity of ComputeWalkTable()][:lighting :research]
[52:53][Determine to decouple the spatial and :lighting voxel grids][:lighting :research]
[53:55][Try decreasing the CostMetric from 16 to 4 in GridRayCast()][:lighting]
[54:28][hhlightprof total seconds elapsed: 6.321697][:lighting :performance :run]
[55:00][Try decreasing the CostMetric from 4 to 0 in GridRayCast()][:lighting]
[55:13][hhlightprof total seconds elapsed: 4.679288][:lighting :performance :run]
[55:34][Interpret our 4.679288 seconds :performance when casting no rays][:lighting :research]
[58:16][Alignment of Atlas Cells][:blackboard :lighting :memory :performance :threading]
[1:06:48][Determine to remove the mutex from our atlas traversal code][:lighting :performance :research :threading]
[1:10:18][Change ComputeLightPropagationWork() to distribute the :lighting computation along the Y axis][:performance :threading]
[1:12:00][hhlightprof total seconds elapsed: 4.422329][:lighting :performance :run]
[1:12:41][Consider compacting the lighting atlases][:lighting :performance :research]
[1:14:44][Consider the :performance of ComputeVoxelIrradianceAt()][:lighting :research]
[1:15:41][Try greatly simplifying ComputeVoxelIrradianceAt()][:lighting :performance]
[1:16:46][hhlightprof total seconds elapsed: 3.160531][:lighting :performance :run]
[1:16:59][Try further simplifying ComputeVoxelIrradianceAt()][:lighting :performance]
[1:17:33][hhlightprof total seconds elapsed: 1.815950][:lighting :performance :run]
[1:17:50][Determine to speed up ComputeVoxelIrradianceAt()][:lighting :performance :research]
[1:18:22][Instrument ComputeVoxelIrradianceAt() to more specifically gauge its :performance][:lighting]
[1:21:48][hhlightprof total seconds elapsed: 2.806293][:lighting :performance :run]
[1:22:00][Break in to ComputeVoxelIrradianceAt() and inspect the assembly][:asm :lighting :performance :run]
[1:23:31][Determine to optimise out some of the math in ComputeVoxelIrradianceAt()][:lighting :optimisation :research]
[1:26:46][Make ComputeVoxelIrradianceAt() operate wide[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/][ref
    site=uops.info
    url=https://uops.info/table.html]][:lighting :optimisation :simd]
[1:49:07][Dependents, and Cycle Ordering][:hardware :performance]
[1:52:32][Continue to make ComputeVoxelIrradianceAt() operate wide[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/][ref
    site=uops.info
    url=https://uops.info/table.html]][:lighting :optimisation :simd]
[2:07:23][Let GridRayCast() use the original CostMetric][:lighting]
[2:07:48][We are back to normal][:lighting :run]
[2:07:53][Q&A][:speech]
[2:08:15][@sagian2005][Q: Would things go better if you started with U, V and W each 4 wide?][:lighting :simd]
[2:09:36][@cirdanvalen][Q: Could you not pad by 8 bytes to fix the overlap?][:threading]
[2:10:03][@mindmark42][Q: The L0 determines how many cache lines the CPU can hold, right?][:hardware :memory]
[2:10:43][@rooctag][Q: Would you ever consider just going over explaining how an operation takes x amount of CPU ops? Or will that be in the Intro to C?][:hardware :performance]
[2:10:59][@ali4410][Q: Hi, do you recommend learning vi keybindings, emacs keybindings, or neither?]
[2:12:00][@mindmark42][Q: The :memory caches on the CPU][:hardware]
[2:13:17][:Hardware Caches[ref
    site=WikiChip
    page="Skylake (client) - Microarchitectures - Intel"
    url=https://en.wikichip.org/wiki/intel/microarchitectures/skylake][ref
    site=TechPowerUp
    page="Intel \"Skylake\" Die Layout Detailed"
    url=https://www.techpowerup.com/215333/intel-skylake-die-layout-detailed]][:memory :research]
[2:22:50][@mindmark42][Q: Yes, that answers my question. I was just off by one index][:hardware :memory]
[2:23:15][@billdstrong][Q: How much time are you expecting to shave off from making this routine wide? About 2 / 3 or so? What are your expectations for your grid walk optimization? Are you trying to get under 4 seconds or lower?][:lighting :optimisation :performance :simd]
[2:23:51][@mindmark42][Q: Yeah, I mentioned those L caches. I was wondering how to determine how many cache lines a core holds][:memory]
[2:24:23][8-Way Caches[ref
    site=WikiChip
    page="Skylake (client) - Microarchitectures - Intel"
    url=https://en.wikichip.org/wiki/intel/microarchitectures/skylake][ref
    author="Katrina Yim"
    title="Cache Associativity"
    publisher="University of California, Berkeley"
    url=http://csillustrated.berkeley.edu/PDFs/handouts/cache-3-associativity-handout.pdf]][:hardware :memory :research]
[2:29:01][Thanks, everyone][:speech]
[/video]