[video output=day612 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="First Pass Optimization of Voxel Sampling" vod_platform=youtube id=W3ml7cO96F0 annotator=Miblo]
[0:01][Recap and set the stage for the day][:speech]
[2:08][Describe our vectorisation of ComputeVoxelIrradianceAt()][:lighting :optimisation :research :simd]
[3:11][Instrument ComputeVoxelIrradianceAt() to verify the new :SIMD against the old scalar code][:lighting :optimisation]
[9:08][Continue to make ComputeVoxelIrradianceAt() operate wide][:lighting :optimisation :simd]
[13:14][Introduce an f32_4x version of Clamp01(), with a few words on optimising compilers][:language :mathematics :simd]
[16:19][Continue to make ComputeVoxelIrradianceAt() operate wide][:lighting :optimisation :simd]
[29:10][Change the f32_4x version of Clamp01() to use ZeroF32_4x()][:mathematics :simd]
[29:57][@billdstrong][How is [@cmuratori he] going to test the Clamp01() if [@cmuratori he] deleted it from the original code?]
[30:04][Introduce an f32_4x version of Floor()[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:mathematics :simd]
[31:46][Fix compile errors in our ComputeVoxelIrradianceAt() vectorisation][:lighting :optimisation :simd]
[35:52][Optimise ComputeVoxelIrradianceAt() to sum weights before broadcasting them][:lighting :optimisation :simd]
[38:19][On the cognitive demand of :SIMD, as opposed to instruction sets like AVX-512 and NEON][:isa :speech]
[42:01][Continue to make ComputeVoxelIrradianceAt() operate wide, loading in the tiles[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/][ref site=uops.info url=https://uops.info/table.html]][:lighting :optimisation :simd]
[1:09:06][Introduce ConvertS32()[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:simd]
[1:12:44][Finish making ComputeVoxelIrradianceAt() operate wide, introducing an f32_4x version of Clamp()][:lighting :optimisation :simd]
[1:18:39][:Run the game][:lighting :optimisation :simd]
[1:19:02][Step through ComputeVoxelIrradianceAt() to find that our vectorised code has been compiled out][:lighting :optimisation :run :simd]
[1:19:28][Make ComputeVoxelIrradianceAt() return the :SIMD computed result][:lighting :optimisation]
[1:19:50][Step through ComputeVoxelIrradianceAt() and try to check out our vectorised code][:asm :lighting :optimisation :run :simd]
[1:21:15][Disable multithreading of the :lighting][:threading]
[1:21:45][Step through our multithreaded ComputeVoxelIrradianceAt()][:asm :lighting :optimisation :run :simd]
[1:22:19][Comment out the old scalar ComputeVoxelIrradianceAt()][:lighting :optimisation]
[1:23:16][Step through our single-threaded ComputeVoxelIrradianceAt()][:asm :lighting :optimisation :run :simd]
[1:23:54][Update ~RemedyBG][:admin]
[1:26:30][Step through the assembly of our new vectorised ComputeVoxelIrradianceAt()][:asm :lighting :optimisation :run :simd]
[1:28:41][Our :lighting looks like the vectorisation just worked][:optimisation :run :simd]
[1:28:47][Enable multithreading of the :lighting][:threading]
[1:29:05][Our :lighting looks like it did before][:optimisation :run :simd]
[1:29:18][hhlightprof total seconds elapsed: 5.110175][:lighting :performance :run]
[1:30:56][Disable LIGHTING_USE_GRID][:lighting]
[1:31:13][hhlightprof total seconds elapsed: 6.390334][:lighting :performance :run]
[1:32:36][Enable LIGHTING_USE_GRID][:lighting]
[1:32:54][77% of our frame time spent in ComputeLightPropagationWork][:lighting :performance :run]
[1:34:09][Q&A][:speech]
[1:35:00][@billdstrong][Q: Do you plan on bringing your editor on stream, or not? You keep bragging about it]
[1:35:05][@mindmark42][Q: Can you run lightprof without any days?][:lighting]
[1:35:17][@mindmark42][rays][:lighting]
[1:35:28][Try decreasing the CostMetric from 16 to 0 in GridRayCast()][:lighting]
[1:36:03][hhlightprof total seconds elapsed: 2.583887][:lighting :performance :run]
[1:36:47][@vaualbus][Q: Can we time that function with the :"debug system"? So we see how long the top part of that function takes?]
[1:37:01][@equivocatorrr][Q: Why is frame time stability such a rare / impossible thing without leaving headroom?][:performance]
[1:38:54][@pragmascrypt][Q: Did you activate :threading again for the benchmark?]
[1:39:18][@sagian2005][Q: [@cmuratori Casey], I just sent you an email. It's re: the SSE stuff you did on today's stream. You might get a smile out of it][:simd]
[1:39:29][@nobodad][Q: @naysayer88 mentioned that you discussed with him why programming languages shouldn't have unsigned integers. Have you posted your rationale somewhere that I can read? Would you be willing to?][:language]
[1:40:49][@fl_aw3n][Q: Can I compile all files in all subdirectories with CL recursively?][:language]
[1:41:06][@yesyesyourmother][Q: Can you use some of the :lighting work you do on [~hero Handmade Hero] in different projects?]
[1:41:31][@relvet][Q: When do we add special sauce, and how much of it? I feel this game needs a Sauce-O-Meter]
[1:42:31][@mindmark42][Q: Couldn't the v3 XYZ be loaded with a single load if we pad them?][:simd]
[1:46:11][@exp_ix][Q: Are there any fundamental differences between games engines that use low poly models vs this one?]
[1:47:34][@noobgirrafe][Q: How can I get your emacs config?]
[1:48:08][Shut it down][:speech]
[/video]