diff --git a/cmuratori/hero/code/code615.hmml b/cmuratori/hero/code/code615.hmml
new file mode 100644
index 0000000..203f65e
--- /dev/null
+++ b/cmuratori/hero/code/code615.hmml
@@ -0,0 +1,118 @@
+[video output=day615 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Optimized Grid Step Selection" vod_platform=youtube id=wAfhYY4GSYU annotator=Miblo]
+[0:02][Welcome to the stream with a plug of Handmade Seattle 2020[ref
+ site="Handmade Seattle"
+ page=Tickets
+ url=https://www.handmade-seattle.com/#tickets] and thanks to [@abnercoimbre Abner]][:research]
+[5:25][Demo the current state of the :lighting][:run]
+[7:01][Explain our :lighting system's two hot zones GridRayCast() and ComputeLightPropagation()][:research]
+[8:52][hhlightprof total seconds elapsed: 4.534990][:lighting :performance :run]
+[9:39][Toggle off the DiffuseWeightMap update in ComputeLightPropagation()][:lighting]
+[9:46][hhlightprof total seconds elapsed: 3.599488][:lighting :performance :run]
+[11:36][Determine to further optimise GridRayCast()][:lighting :speech]
+[11:56][Try decreasing the CostMetric from 16 to 0 in GridRayCast()][:lighting]
+[12:17][hhlightprof total seconds elapsed: 2.211856][:lighting :performance :run]
+[12:33][Try increasing the CostMetric from 0 to 1 in GridRayCast()][:lighting]
+[12:57][hhlightprof total seconds elapsed: 2.629898][:lighting :performance :run]
+[13:22][Note the sensitivity of GridRayCast() to repetition][:lighting :speech]
+[14:36][Let GridRayCast() set the CostMetric to our default 16][:lighting]
+[14:55][Seek improvements to GridRayCast()][:lighting :optimisation :research]
+[18:28][Note the fine-grained nature of our :lighting grid][:run]
+[20:03][Make ProfileRun() print the spatial grid occupancy[ref
+ site="Microsoft Docs"
+ page="__popcnt16, __popcnt, __popcnt64"
+ url=https://docs.microsoft.com/en-us/cpp/intrinsics/popcnt16-popcnt-popcnt64?view=vs-2019]][:lighting :simd]
+[31:45][Step in to ProfileRun()][:lighting :run]
+[32:08][Try to demo ~RemedyBG's , \[comma\] Watch window syntax, with thanks to @x13pixels][:admin]
+[33:57][~RemedyBG feature request: Formatters for regular variables in the Watch window][:admin]
+[34:33][Check the box occupancy values produced by ProfileRun()][:lighting :run :simd]
+[35:05][hhlightprof box occupancy: Low][:lighting :performance :run]
+[36:34][Determine to perform ComputeWalkTable() inline][:lighting :optimisation :research]
+[39:12][Introduce ComputeWalkTableFast(), which does not return anything, but may be used to verify our results][:lighting :optimisation]
+[43:03][:Run hhlightprof successfully][:lighting :optimisation]
+[43:12][Induce an error in ComputeWalkTableFast()][:lighting :optimisation]
+[43:22][:Run hhlightprof without faulting][:lighting :optimisation]
+[44:18][Step through ComputeWalkTableFast()][:lighting :run]
+[46:11][Use a hand-coded assertion in ComputeWalkTableFast()][:lighting]
+[47:13][:Run hhlightprof with a fault][:lighting :optimisation]
+[47:22][Remove our induced error from ComputeWalkTableFast()][:lighting :optimisation]
+[47:30][:Run hhlightprof successfully][:lighting :optimisation]
+[47:48][Embark on optimising ComputeWalkTableFast() in :SIMD][:lighting :optimisation]
+[55:07][:Run hhlightprof with a fault, due to tTerminateResult being totally wrong][:lighting :optimisation :simd]
+[56:02][Fix ComputeWalkTableFast() to compute At4 inside the loop][:lighting :optimisation :simd]
+[56:55][:Run hhlightprof successfully][:lighting :optimisation]
+[57:15][Optimise ComputeWalkTableFast() to compute BestDim using an HCompShuffler][:lighting :optimisation :simd]
+[1:02:26][:Run hhlightprof with a fault, due to dGridResult being wrong][:lighting :optimisation :simd]
+[1:03:17][Remove 14 and 15 from the HCompShuffler in ComputeWalkTableFast()][:lighting :optimisation :simd]
+[1:04:57][:Run hhlightprof with a fault, due to tBestRef and tBest differing][:lighting :optimisation :simd]
+[1:06:57][Consider how best to traverse the walk table][:lighting :optimisation :research]
+[1:10:43][Look into _mm_minpos_epu16() at the Intel Intrinsics Guide[ref
+ site=Intel
+ page="Intel Intrinsics Guide"
+ url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :optimisation :research]
+[1:13:14][Introduce a second HCompShufflerLow to compare the low 16-bits of values with equivalent high 16-bits][:lighting :optimisation]
+[1:15:14][Revert the HCompShufflerLow][:lighting :optimisation]
+[1:15:54][Optimise our WalkTable traversal using all four :SIMD lanes, replacing the HCompShuffler with BestTable][:lighting :optimisation]
+[1:31:13][:Run hhlightprof with a verification fault][:lighting :optimisation :simd]
+[1:34:00][Assert in ComputeWalkTableFast() that the CompMask is within bounds of the BestTable][:lighting :optimisation :simd]
+[1:34:47][:Run hhlightprof with a verification fault not on the BestTable bounds][:lighting :optimisation :simd]
+[1:36:35][Add a breakpoint in ComputeWalkTableFast() on SampleDirIndex 135][:lighting :optimisation :simd]
+[1:36:58][Step through ComputeWalkTableFast() on SampleDirIndex 135][:lighting :optimisation :run :simd]
+[1:40:55][Linguistically flip the Best checker in (the working) ComputeWalkTable()][:lighting :optimisation]
+[1:41:21][:Run hhlightprof with a verification fault on SampleDirIndex 256][:lighting :optimisation :simd]
+[1:42:18][Logically flip the sense of the Best checker in ComputeWalkTable(), and redo the BestTable in ComputeWalkTableFast() in line with the original logic][:lighting :optimisation :simd]
+[1:45:14][:Run hhlightprof with a verification fault right off the bat][:lighting :optimisation :simd]
+[1:45:24][Verify the BestTable in ComputeWalkTableFast()][:lighting :optimisation :research :simd]
+[1:47:08][Reacquaint ourselves with the Best picking in ComputeWalkTable()][:lighting :optimisation :run :simd]
+[1:47:44][Revert the sense of the Best checker in ComputeWalkTable()][:lighting :optimisation]
+[1:48:57][:Run hhlightprof successfully][:lighting :optimisation :simd]
+[1:49:26][Introduce a ShuffleTable in ComputeWalkTableFast()][:lighting :optimisation :simd]
+[1:51:04][:Run hhlightprof successfully][:lighting :optimisation :simd]
+[1:51:12][Optimise ComputeWalkTableFast() to pick the tBest out of the ShuffleTable][:lighting :optimisation :simd]
+[1:54:12][:Run hhlightprof successfully][:lighting :optimisation :simd]
+[1:54:20][Optimise ComputeWalkTableFast() to track tTerminate in :SIMD][:lighting :optimisation]
+[1:55:18][:Run hhlightprof successfully][:lighting :optimisation :simd]
+[1:55:22][Optimise ComputeWalkTableFast() to initialise At4 before the loop, and individually offset the four steps by the CellDim][:lighting :optimisation :simd]
+[2:00:05][:Run hhlightprof successfully][:lighting :optimisation :simd]
+[2:00:08][Optimise ComputeWalkTableFast() to offset all four steps in :SIMD, branchless, using a MaskTable][:lighting :optimisation]
+[2:04:40][:Run hhlightprof with a verification fault][:lighting :optimisation :simd]
+[2:04:55][Scrutinise our MaskTable][:lighting :optimisation :research :simd]
+[2:05:37][Compute a Compare for At4 in ComputeWalkTableFast()][:lighting :optimisation :simd]
+[2:06:05][Break in to ComputeWalkTableFast() and compare the Compare with our actual At4][:lighting :optimisation :run :simd]
+[2:07:04][Set At4 equal to Compare, saving off the OldAt4][:lighting :optimisation :simd]
+[2:07:19][:Run hhlightprof successfully][:lighting :optimisation :simd]
+[2:07:34][Try making ComputeWalkTableFast() offset the At4 in two steps][:lighting :optimisation :simd]
+[2:08:18][:Run hhlightprof successfully][:lighting :optimisation :simd]
+[2:08:27][Gauge the :performance of our ComputeWalkTableFast()[ref
+ site=uops.info
+ url=https://uops.info/table.html]][:lighting :research :simd]
+[2:11:48][Build in -O2]
+[2:12:12][:Run the game successfully][:lighting :optimisation]
+[2:12:26][Make ComputeWalkTable() compute InvRayD before the stepping loop, to remove a divide within it][:lighting :optimisation]
+[2:13:01][The :lighting looks completely different][:optimisation :run]
+[2:13:35][Make ComputeWalkTable() compute the InvRayD using a safe ratio][:lighting :optimisation]
+[2:15:37][The :lighting remains different][:optimisation :run]
+[2:16:06][Fix ComputeWalkTable() to compute InvRayD after RayD itself][:lighting :optimisation]
+[2:16:26][The :lighting is back to how it was][:optimisation :run]
+[2:16:30][Make ComputeWalkTable() compute InvRayD as normal][:lighting :optimisation]
+[2:16:41][The :lighting is fine][:optimisation :run]
+[2:16:58][Build in -Od]
+[2:17:14][:Run hhlightprof with a verification fault][:lighting :optimisation :simd]
+[2:17:22][Make ComputeWalkTableFast() also precompute InvRayD][:lighting :optimisation :simd]
+[2:17:50][:Run hhlightprof successfully][:lighting :optimisation :simd]
+[2:18:50][Q&A][:speech]
+[2:19:11][@centhusiast][Q: Hi [@cmuratori Casey]! I was very sick and my health condition was very bad and for the last three months and now I am fortunately back to life and to [~hero Handmade Hero]. Could you briefly say what the focus of [~hero Handmade Hero] was in the last three months? Thank you!]
+[2:20:16][@infinum][Q: @handmade_hero Hello [@cmuratori Casey], this question may be off-topic but it's really important for me. I know you were doing some :UI development. I saw your video on immediate mode UI. I have the only job opportunity to develop UI for mobile app but I've never done that and I need this job. So can you please give me some advice on where to find information, maybe some guides on UI development and were you using some :library or did you write everything from scratch? It would be very helpful for me]
+[2:23:14][@somebody_took_my_name][Q: What is the best way to debug something that only happens in optimized code?]
+[2:23:50][@mindmark42][Q: How expensive do you think the table lookups are?[ref
+ site=uops.info
+ url=https://uops.info/table.html][ref
+ site=Intel
+ page="Intel Intrinsics Guide"
+ url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:performance]
+[2:28:03][@tomtetlaw][Q: Can you give a general idea of how to optimise branches out of a function?]
+[2:29:49][@legendarior][Q: Hello, thank you for all the videos. I am a bored CS student that aced his exams and now does not know what to do during his vacation]
+[2:29:58][@lucid_frost][Q: What kinds of things would you like the compiler to do to help with this table stuff (if any)?][:language]
+[2:30:12][@centhusiast][Q: Could you explain the compile time execution as we have in jai?][:language]
+[2:30:19][@billdstrong][Q: Would meowhash be suitable to create a custom UUID? It wouldn't be part of the UUID spec, but could it serve the same purpose?][:hashing]
+[2:30:42][End it there][:speech]
+[/video]