cinera_handmade.network/cmuratori/hero/code/code615.hmml

2020-07-07 01:33:27 +00:00
[video output=day615 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Optimized Grid Step Selection" vod_platform=youtube id=wAfhYY4GSYU annotator=Miblo]
[0:02][Welcome to the stream with a plug for Handmade Seattle 2020[ref
site="Handmade Seattle"
page=Tickets
url=https://www.handmade-seattle.com/#tickets] and thanks to [@abnercoimbre Abner]][:research]
[5:25][Demo the current state of the :lighting][:run]
[7:01][Explain our :lighting system's two hot zones GridRayCast() and ComputeLightPropagation()][:research]
[8:52][hhlightprof total seconds elapsed: 4.534990][:lighting :performance :run]
[9:39][Toggle off the DiffuseWeightMap update in ComputeLightPropagation()][:lighting]
[9:46][hhlightprof total seconds elapsed: 3.599488][:lighting :performance :run]
[11:36][Determine to further optimise GridRayCast()][:lighting :speech]
[11:56][Try decreasing the CostMetric from 16 to 0 in GridRayCast()][:lighting]
[12:17][hhlightprof total seconds elapsed: 2.211856][:lighting :performance :run]
[12:33][Try increasing the CostMetric from 0 to 1 in GridRayCast()][:lighting]
[12:57][hhlightprof total seconds elapsed: 2.629898][:lighting :performance :run]
[13:22][Note the sensitivity of GridRayCast() to repetition][:lighting :speech]
[14:36][Let GridRayCast() set the CostMetric to our default 16][:lighting]
[14:55][Seek improvements to GridRayCast()][:lighting :optimisation :research]
[18:28][Note the fine-grained nature of our :lighting grid][:run]
[20:03][Make ProfileRun() print the spatial grid occupancy[ref
site="Microsoft Docs"
page="__popcnt16, __popcnt, __popcnt64"
url=https://docs.microsoft.com/en-us/cpp/intrinsics/popcnt16-popcnt-popcnt64?view=vs-2019]][:lighting :simd]
[31:45][Step into ProfileRun()][:lighting :run]
[32:08][Try to demo ~RemedyBG's , \[comma\] Watch window syntax, with thanks to @x13pixels][:admin]
[33:57][~RemedyBG feature request: Formatters for regular variables in the Watch window][:admin]
[34:33][Check the box occupancy values produced by ProfileRun()][:lighting :run :simd]
[35:05][hhlightprof box occupancy: Low][:lighting :performance :run]
[36:34][Determine to perform ComputeWalkTable() inline][:lighting :optimisation :research]
[39:12][Introduce ComputeWalkTableFast(), which does not return anything, but may be used to verify our results][:lighting :optimisation]
[43:03][:Run hhlightprof successfully][:lighting :optimisation]
[43:12][Induce an error in ComputeWalkTableFast()][:lighting :optimisation]
[43:22][:Run hhlightprof without faulting][:lighting :optimisation]
[44:18][Step through ComputeWalkTableFast()][:lighting :run]
[46:11][Use a hand-coded assertion in ComputeWalkTableFast()][:lighting]
[47:13][:Run hhlightprof with a fault][:lighting :optimisation]
[47:22][Remove our induced error from ComputeWalkTableFast()][:lighting :optimisation]
[47:30][:Run hhlightprof successfully][:lighting :optimisation]
[47:48][Embark on optimising ComputeWalkTableFast() in :SIMD][:lighting :optimisation]
[55:07][:Run hhlightprof with a fault, due to tTerminateResult being totally wrong][:lighting :optimisation :simd]
[56:02][Fix ComputeWalkTableFast() to compute At4 inside the loop][:lighting :optimisation :simd]
[56:55][:Run hhlightprof successfully][:lighting :optimisation]
[57:15][Optimise ComputeWalkTableFast() to compute BestDim using an HCompShuffler][:lighting :optimisation :simd]
[1:02:26][:Run hhlightprof with a fault, due to dGridResult being wrong][:lighting :optimisation :simd]
[1:03:17][Remove 14 and 15 from the HCompShuffler in ComputeWalkTableFast()][:lighting :optimisation :simd]
[1:04:57][:Run hhlightprof with a fault, due to tBestRef and tBest differing][:lighting :optimisation :simd]
[1:06:57][Consider how best to traverse the walk table][:lighting :optimisation :research]
[1:10:43][Look into _mm_minpos_epu16() at the Intel Intrinsics Guide[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :optimisation :research]
[1:13:14][Introduce a second HCompShufflerLow to compare the low 16-bits of values with equivalent high 16-bits][:lighting :optimisation]
[1:15:14][Revert the HCompShufflerLow][:lighting :optimisation]
[1:15:54][Optimise our WalkTable traversal using all four :SIMD lanes, replacing the HCompShuffler with BestTable][:lighting :optimisation]
[1:31:13][:Run hhlightprof with a verification fault][:lighting :optimisation :simd]
[1:34:00][Assert in ComputeWalkTableFast() that the CompMask is within bounds of the BestTable][:lighting :optimisation :simd]
[1:34:47][:Run hhlightprof with a verification fault not on the BestTable bounds][:lighting :optimisation :simd]
[1:36:35][Add a breakpoint in ComputeWalkTableFast() on SampleDirIndex 135][:lighting :optimisation :simd]
[1:36:58][Step through ComputeWalkTableFast() on SampleDirIndex 135][:lighting :optimisation :run :simd]
[1:40:55][Linguistically flip the Best checker in (the working) ComputeWalkTable()][:lighting :optimisation]
[1:41:21][:Run hhlightprof with a verification fault on SampleDirIndex 256][:lighting :optimisation :simd]
[1:42:18][Logically flip the sense of the Best checker in ComputeWalkTable(), and redo the BestTable in ComputeWalkTableFast() in line with the original logic][:lighting :optimisation :simd]
[1:45:14][:Run hhlightprof with a verification fault right off the bat][:lighting :optimisation :simd]
[1:45:24][Verify the BestTable in ComputeWalkTableFast()][:lighting :optimisation :research :simd]
[1:47:08][Reacquaint ourselves with the Best picking in ComputeWalkTable()][:lighting :optimisation :run :simd]
[1:47:44][Revert the sense of the Best checker in ComputeWalkTable()][:lighting :optimisation]
[1:48:57][:Run hhlightprof successfully][:lighting :optimisation :simd]
[1:49:26][Introduce a ShuffleTable in ComputeWalkTableFast()][:lighting :optimisation :simd]
[1:51:04][:Run hhlightprof successfully][:lighting :optimisation :simd]
[1:51:12][Optimise ComputeWalkTableFast() to pick the tBest out of the ShuffleTable][:lighting :optimisation :simd]
[1:54:12][:Run hhlightprof successfully][:lighting :optimisation :simd]
[1:54:20][Optimise ComputeWalkTableFast() to track tTerminate in :SIMD][:lighting :optimisation]
[1:55:18][:Run hhlightprof successfully][:lighting :optimisation :simd]
[1:55:22][Optimise ComputeWalkTableFast() to initialise At4 before the loop, and individually offset the four steps by the CellDim][:lighting :optimisation :simd]
[2:00:05][:Run hhlightprof successfully][:lighting :optimisation :simd]
[2:00:08][Optimise ComputeWalkTableFast() to offset all four steps in :SIMD, branchless, using a MaskTable][:lighting :optimisation]
[2:04:40][:Run hhlightprof with a verification fault][:lighting :optimisation :simd]
[2:04:55][Scrutinise our MaskTable][:lighting :optimisation :research :simd]
[2:05:37][Compute a Compare for At4 in ComputeWalkTableFast()][:lighting :optimisation :simd]
[2:06:05][Break into ComputeWalkTableFast() and compare the Compare with our actual At4][:lighting :optimisation :run :simd]
[2:07:04][Set At4 equal to Compare, saving off the OldAt4][:lighting :optimisation :simd]
[2:07:19][:Run hhlightprof successfully][:lighting :optimisation :simd]
[2:07:34][Try making ComputeWalkTableFast() offset the At4 in two steps][:lighting :optimisation :simd]
[2:08:18][:Run hhlightprof successfully][:lighting :optimisation :simd]
[2:08:27][Gauge the :performance of our ComputeWalkTableFast()[ref
site=uops.info
url=https://uops.info/table.html]][:lighting :research :simd]
[2:11:48][Build in -O2]
[2:12:12][:Run the game successfully][:lighting :optimisation]
[2:12:26][Make ComputeWalkTable() compute InvRayD before the stepping loop, to remove a divide within it][:lighting :optimisation]
[2:13:01][The :lighting looks completely different][:optimisation :run]
[2:13:35][Make ComputeWalkTable() compute the InvRayD using a safe ratio][:lighting :optimisation]
[2:15:37][The :lighting remains different][:optimisation :run]
[2:16:06][Fix ComputeWalkTable() to compute InvRayD after RayD itself][:lighting :optimisation]
[2:16:26][The :lighting is back to how it was][:optimisation :run]
[2:16:30][Make ComputeWalkTable() compute InvRayD as normal][:lighting :optimisation]
[2:16:41][The :lighting is fine][:optimisation :run]
[2:16:58][Build in -Od]
[2:17:14][:Run hhlightprof with a verification fault][:lighting :optimisation :simd]
[2:17:22][Make ComputeWalkTableFast() also precompute InvRayD][:lighting :optimisation :simd]
[2:17:50][:Run hhlightprof successfully][:lighting :optimisation :simd]
[2:18:50][Q&A][:speech]
[2:19:11][@centhusiast][Q: Hi [@cmuratori Casey]! I was very sick and my health condition was very bad for the last three months, and now I am fortunately back to life and to [~hero Handmade Hero]. Could you briefly say what the focus of [~hero Handmade Hero] was in the last three months? Thank you!]
[2:20:16][@infinum][Q: @handmade_hero Hello [@cmuratori Casey], this question may be off-topic but it's really important for me. I know you were doing some :UI development. I saw your video on immediate mode UI. I have the only job opportunity to develop UI for mobile app but I've never done that and I need this job. So can you please give me some advice on where to find information, maybe some guides on UI development and were you using some :library or did you write everything from scratch? It would be very helpful for me]
[2:23:14][@somebody_took_my_name][Q: What is the best way to debug something that only happens in optimized code?]
[2:23:50][@mindmark42][Q: How expensive do you think the table lookups are?[ref
site=uops.info
url=https://uops.info/table.html][ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:performance]
[2:28:03][@tomtetlaw][Q: Can you give a general idea of how to optimise branches out of a function?]
[2:29:49][@legendarior][Q: Hello, thank you for all the videos. I am a bored CS student that aced his exams and now does not know what to do during his vacation]
[2:29:58][@lucid_frost][Q: What kinds of things would you like the compiler to do to help with this table stuff (if any)?][:language]
[2:30:12][@centhusiast][Q: Could you explain the compile time execution as we have in jai?][:language]
[2:30:19][@billdstrong][Q: Would meowhash be suitable to create a custom UUID? It wouldn't be part of the UUID spec, but could it serve the same purpose?][:hashing]
[2:30:42][End it there][:speech]
[/video]
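
The annotations from 1:15:54 to 1:48:57 describe replacing a branchy "best" pick with a BestTable indexed by a CompMask, and checking the fast path against the reference ComputeWalkTable(). The following is a minimal scalar sketch of that general idea, not the code written on stream: the function names PickBestDimRef, PickBestDimTable and CountMismatches, the bit layout of the mask, and the table contents are all assumptions made for illustration. It picks the axis with the smallest t value out of three, once with branches and once with a table lookup.

```c
#include <assert.h>

/* Reference pick with branches: index (0 = x, 1 = y, 2 = z) of the
   smallest t value, with ties resolved toward the lowest index. */
static int PickBestDimRef(float tx, float ty, float tz)
{
    int Best = 0;
    float tBest = tx;
    if (ty < tBest) { Best = 1; tBest = ty; }
    if (tz < tBest) { Best = 2; }
    return Best;
}

/* Hypothetical table, indexed by a 3-bit compare mask:
     bit 0: tx <= ty,  bit 1: ty <= tz,  bit 2: tx <= tz
   Indices 3 and 4 cannot occur for ordered (non-NaN) inputs and are
   filled with arbitrary valid values. */
static const int BestTable[8] = {2, 2, 1, 0, 2, 0, 1, 0};

/* Branch-free pick: build the mask from three compares, then look up. */
static int PickBestDimTable(float tx, float ty, float tz)
{
    unsigned CompMask = ((unsigned)(tx <= ty) << 0) |
                        ((unsigned)(ty <= tz) << 1) |
                        ((unsigned)(tx <= tz) << 2);
    assert(CompMask < 8); /* mask must stay within the table bounds */
    return BestTable[CompMask];
}

/* Verification in the spirit of running the fast path beside the
   reference: count disagreements over every triple drawn from a small
   set of values, including all tie cases. */
static int CountMismatches(void)
{
    static const float Values[4] = {0.0f, 0.5f, 1.0f, 2.0f};
    int Mismatches = 0;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                if (PickBestDimRef(Values[i], Values[j], Values[k]) !=
                    PickBestDimTable(Values[i], Values[j], Values[k]))
                    ++Mismatches;
    return Mismatches;
}
```

Calling CountMismatches() and checking that it returns zero plays the same role as the hand-coded assertion at 46:11: the slow version stays in place as the source of truth while the fast version is reworked. In a SIMD version the mask would more likely come from a movemask-style instruction than from shifted scalar compares; that detail is also an assumption here.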