[video output=day614 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Continuing Streamlining the Raycaster" vod_platform=youtube id=IxeKOAcvgK0 annotator=Miblo]
[0:01][Welcome to the stream][:speech]
[0:06][Determine to continue with :optimisation][:lighting :run]
[0:57][Recap yesterday's welding :optimisation in GridRayCast()][:lighting :research]
[4:09][Consider :optimisation potential of the SpecTexel load / stores in GridRayCast()][:lighting :research]
[7:22][Illustrate the possibility of loading in the SpecTexel values and InvBlend at the outset][:lighting :optimisation]
[9:23][Seek easier :optimisation opportunities in GridRayCast()][:lighting :research]
[11:43][Simplify out OcclusionN from GridRayCast()][:lighting :optimisation]
[12:27][Seek :optimisation with OcclusionD and RayD in GridRayCast()][:lighting :research]
[18:48][Streamline the SignRayD and NormalXYZ computations in GridRayCast()][:lighting :optimisation :simd]
[25:35][Reacquaint ourselves with the hit testing and shuffling code in GridRayCast()][:lighting :research :simd]
[30:30][Streamline the Normal selection in GridRayCast()][:lighting :optimisation :simd]
[34:46][Check out the port usage of various instructions, noting that we may get an AND for free[ref site=uops.info url=https://uops.info/table.html]][:isa :research]
[40:23][Continue to streamline the Normal selection in GridRayCast(), introducing a NormalTable, before toggling back to the old code][:lighting :optimisation :simd]
[48:12][:Run successfully][:lighting]
[48:31][Streamline the ProbeSampleNSingle usage in GridRayCast()][:lighting :optimisation :simd]
[55:01][:Run successfully, and consider unit testing the grid ray cast][:lighting]
[56:49][Treat ProbeSampleNSingle wide in GridRayCast()][:lighting :optimisation :simd]
[1:01:34][:Run successfully][:lighting]
[1:01:50][Treat OcclusionD wide in GridRayCast()][:lighting :optimisation :simd]
[1:03:28][:Run successfully][:lighting]
[1:04:02][Finish streamlining the Normal selection in GridRayCast()][:lighting :optimisation :simd]
[1:07:46][:Run successfully][:lighting]
[1:08:13][Temporarily try hard setting the NormalIndex to 0 in GridRayCast()][:lighting :optimisation :simd]
[1:08:27][We can't tell it's wrong][:lighting :optimisation :run :simd]
[1:08:56][Let GridRayCast() set the computed NormalIndex and make a note to test this][:lighting :optimisation :simd]
[1:09:36][hhlightprof total seconds elapsed: 4.534789][:lighting :performance :run :simd]
[1:10:20][Simplify out tUpdateBlend in GridRayCast()][:lighting :optimisation :simd]
[1:12:49][Augment light_atlas with StrideXYZ_4x and VoxelDim_4x][:"data structure" :lighting :optimisation :simd]
[1:17:45][:Run successfully][:lighting]
[1:17:54][Make MakeLightAtlas() set the StrideXYZ and VoxelDim, for GridRayCast() to load out of that atlas, changing their format in light_atlas to be an array of 4][:"data structure" :lighting :optimisation :simd]
[1:20:37][:Run successfully][:lighting]
[1:20:46][hhlightprof total seconds elapsed: 4.513986][:lighting :performance :run :simd]
[1:22:09][Remove the old AABBRayCast()][:lighting]
[1:24:42][:Run successfully][:lighting]
[1:24:51][Prepare lighting_box to pack down to 64-bits total, propagating this change][:"data structure" :lighting]
[1:28:29][:Run successfully][:lighting]
[1:28:38][Clean out the sprawl from FullCast()][:lighting :optimisation :simd]
[1:36:20][:Run successfully][:lighting]
[1:36:25][Look into welding the GridRayCast() calling loop from FullCast() into GridRayCast() itself][:lighting :optimisation :research :simd]
[1:39:21][hhlightprof total seconds elapsed: 4.511818][:lighting :performance :run :simd]
[1:39:36][Extend GridRayCast() to operate on twice as many samples][:lighting :optimisation :simd]
[1:40:44][:Run successfully][:lighting]
[1:40:46][hhlightprof total seconds elapsed: 4.394170][:lighting :performance :run :simd]
[1:41:52][Toggle off the debug code in FullCast()][:"debug system" :lighting]
[1:43:26][hhlightprof total seconds elapsed: 4.392245][:lighting :performance :run :simd]
[1:43:41][Consider welding the GridRayCast() calling loop from FullCast() into GridRayCast() itself][:lighting :optimisation :research :simd]
[1:45:57][Q&A][:speech]
[1:47:07][@mindmark42][Q: Yesterday you changed your :SIMD extract functions to use shuffles instead. Could you explain again why that is better?][:performance]
[1:47:26][Extract vs Shuffle][:blackboard :performance :simd]
[1:56:14]["Semantic" Extraction][:blackboard :language :performance :simd]
[1:58:02][Unnecessary extract and cast, with thanks to @mmozeiko][:blackboard :performance :simd]
[1:59:05][Shuffle][:blackboard :performance :simd]
[2:00:41][@3ygun][Q: Is there such a thing as smooching too much and causing the compiler to bail before doing optimizations?][:language]
[2:01:11][@billdstrong][Q: Would we gain any speed by moving ahead 16 and doing 12 ops per pass?][:lighting :performance]
[2:01:40][Thank you, everyone]
[/video]