65 lines
5.4 KiB
Plaintext
[video output=day612 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="First Pass Optimization of Voxel Sampling" vod_platform=youtube id=W3ml7cO96F0 annotator=Miblo]
[0:01][Recap and set the stage for the day][:speech]
[2:08][Describe our vectorisation of ComputeVoxelIrradianceAt()][:lighting :optimisation :research :simd]
[3:11][Instrument ComputeVoxelIrradianceAt() to verify the new :SIMD against the old scalar code][:lighting :optimisation]
[9:08][Continue to make ComputeVoxelIrradianceAt() operate wide][:lighting :optimisation :simd]
[13:14][Introduce an f32_4x version of Clamp01(), with a few words on optimising compilers][:language :mathematics :simd]
[16:19][Continue to make ComputeVoxelIrradianceAt() operate wide][:lighting :optimisation :simd]
[29:10][Change the f32_4x version of Clamp01() to use ZeroF32_4x()][:mathematics :simd]
[29:57][@billdstrong][How is [@cmuratori he] going to test the Clamp01() if [@cmuratori he] deleted it from the original code?]
[30:04][Introduce an f32_4x version of Floor()[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:mathematics :simd]
[31:46][Fix compile errors in our ComputeVoxelIrradianceAt() vectorisation][:lighting :optimisation :simd]
[35:52][Optimise ComputeVoxelIrradianceAt() to sum weights before broadcasting them][:lighting :optimisation :simd]
[38:19][On the cognitive demand of :SIMD, as opposed to instruction sets like AVX-512 and NEON][:isa :speech]
[42:01][Continue to make ComputeVoxelIrradianceAt() operate wide, loading in the tiles[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/][ref
site=uops.info
url=https://uops.info/table.html]][:lighting :optimisation :simd]
[1:09:06][Introduce ConvertS32()[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:simd]
[1:12:44][Finish making ComputeVoxelIrradianceAt() operate wide, introducing an f32_4x version of Clamp()][:lighting :optimisation :simd]
[1:18:39][:Run the game][:lighting :optimisation :simd]
[1:19:02][Step through ComputeVoxelIrradianceAt() to find that our vectorised code has been compiled out][:lighting :optimisation :run :simd]
[1:19:28][Make ComputeVoxelIrradianceAt() return the :SIMD computed result][:lighting :optimisation]
[1:19:50][Step through ComputeVoxelIrradianceAt() and try to check out our vectorised code][:asm :lighting :optimisation :run :simd]
[1:21:15][Disable multithreading of the :lighting][:threading]
[1:21:45][Step through our multithreaded ComputeVoxelIrradianceAt()][:asm :lighting :optimisation :run :simd]
[1:22:19][Comment out the old scalar ComputeVoxelIrradianceAt()][:lighting :optimisation]
[1:23:16][Step through our single-threaded ComputeVoxelIrradianceAt()][:asm :lighting :optimisation :run :simd]
[1:23:54][Update ~RemedyBG][:admin]
[1:26:30][Step through the assembly of our new vectorised ComputeVoxelIrradianceAt()][:asm :lighting :optimisation :run :simd]
[1:28:41][Our :lighting looks like the vectorisation just worked][:optimisation :run :simd]
[1:28:47][Enable multithreading of the :lighting][:threading]
[1:29:05][Our :lighting looks like it did before][:optimisation :run :simd]
[1:29:18][hhlightprof total seconds elapsed: 5.110175][:lighting :performance :run]
[1:30:56][Disable LIGHTING_USE_GRID][:lighting]
[1:31:13][hhlightprof total seconds elapsed: 6.390334][:lighting :performance :run]
[1:32:36][Enable LIGHTING_USE_GRID][:lighting]
[1:32:54][77% of our frame time spent in ComputeLightPropagationWork][:lighting :performance :run]
[1:34:09][Q&A][:speech]
[1:35:00][@billdstrong][Q: Do you plan on bringing your editor on stream, or not? You keep bragging about it]
[1:35:05][@mindmark42][Q: Can you run lightprof without any days?][:lighting]
[1:35:17][@mindmark42][rays][:lighting]
[1:35:28][Try decreasing the CostMetric from 16 to 0 in GridRayCast()][:lighting]
[1:36:03][hhlightprof total seconds elapsed: 2.583887][:lighting :performance :run]
[1:36:47][@vaualbus][Q: Can we time that function with the :"debug system"? So we see how long the top part of that function takes?]
[1:37:01][@equivocatorrr][Q: Why is frame time stability such a rare / impossible thing without leaving headroom?][:performance]
[1:38:54][@pragmascrypt][Q: Did you activate :threading again for the benchmark?]
[1:39:18][@sagian2005][Q: [@cmuratori Casey], I just sent you an email. It's re: the SSE stuff you did on today's stream. You might get a smile out of it][:simd]
[1:39:29][@nobodad][Q: @naysayer88 mentioned that you discussed with him why programming languages shouldn't have unsigned integers. Have you posted your rationale somewhere that I can read? Would you be willing to?][:language]
[1:40:49][@fl_aw3n][Q: Can I compile all files in all subdirectories with CL recursively?][:language]
[1:41:06][@yesyesyourmother][Q: Can you use some of the :lighting work you do on [~hero Handmade Hero] in different projects?]
[1:41:31][@relvet][Q: When do we add special sauce, and how much of it? I feel this game needs a Sauce-O-Meter]
[1:42:31][@mindmark42][Q: Couldn't the v3 XYZ be loaded with a single load if we pad them?][:simd]
[1:46:11][@exp_ix][Q: Are there any fundamental differences between game engines that use low poly models vs this one?]
[1:47:34][@noobgirrafe][How can I get your emacs config?]
[1:48:08][Shut it down][:speech]
[/video]