diff --git a/cmuratori/hero/code/code614.hmml b/cmuratori/hero/code/code614.hmml
new file mode 100644
index 0000000..527b192
--- /dev/null
+++ b/cmuratori/hero/code/code614.hmml
@@ -0,0 +1,59 @@
+[video output=day614 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Continuing Streamlining the Raycaster" vod_platform=youtube id=IxeKOAcvgK0 annotator=Miblo]
+[0:01][Welcome to the stream][:speech]
+[0:06][Determine to continue with :optimisation][:lighting :run]
+[0:57][Recap yesterday's welding :optimisation in GridRayCast()][:lighting :research]
+[4:09][Consider :optimisation potential of the SpecTexel load / stores in GridRayCast()][:lighting :research]
+[7:22][Illustrate the possibility of loading in the SpecTexel values and InvBlend at the outset][:lighting :optimisation]
+[9:23][Seek easier :optimisation opportunities in GridRayCast()][:lighting :research]
+[11:43][Simplify out OcclusionN from GridRayCast()][:lighting :optimisation]
+[12:27][Seek :optimisation with OcclusionD and RayD in GridRayCast()][:lighting :research]
+[18:48][Streamline the SignRayD and NormalXYZ computations in GridRayCast()][:lighting :optimisation :simd]
+[25:35][Reacquaint ourselves with the hit testing and shuffling code in GridRayCast()][:lighting :research :simd]
+[30:30][Streamline the Normal selection in GridRayCast()][:lighting :optimisation :simd]
+[34:46][Check out the port usage of various instructions, noting that we may get an AND for free[ref
+ site=uops.info
+ url=https://uops.info/table.html]][:isa :research]
+[40:23][Continue to streamline the Normal selection in GridRayCast(), introducing a NormalTable, before toggling back to the old code][:lighting :optimisation :simd]
+[48:12][:Run successfully][:lighting]
+[48:31][Streamline the ProbeSampleNSingle usage in GridRayCast()][:lighting :optimisation :simd]
+[55:01][:Run successfully, and consider unit testing the grid ray cast][:lighting]
+[56:49][Treat ProbeSampleNSingle wide in GridRayCast()][:lighting :optimisation :simd]
+[1:01:34][:Run successfully][:lighting]
+[1:01:50][Treat OcclusionD wide in GridRayCast()][:lighting :optimisation :simd]
+[1:03:28][:Run successfully][:lighting]
+[1:04:02][Finish streamlining the Normal selection in GridRayCast()][:lighting :optimisation :simd]
+[1:07:46][:Run successfully][:lighting]
+[1:08:13][Temporarily try hard setting the NormalIndex to 0 in GridRayCast()][:lighting :optimisation :simd]
+[1:08:27][We can't tell it's wrong][:lighting :optimisation :run :simd]
+[1:08:56][Let GridRayCast() set the computed NormalIndex and make a note to test this][:lighting :optimisation :simd]
+[1:09:36][hhlightprof total seconds elapsed: 4.534789][:lighting :performance :run :simd]
+[1:10:20][Simplify out tUpdateBlend in GridRayCast()][:lighting :optimisation :simd]
+[1:12:49][Augment light_atlas with StrideXYZ_4x and VoxelDim_4x][:"data structure" :lighting :optimisation :simd]
+[1:17:45][:Run successfully][:lighting]
+[1:17:54][Make MakeLightAtlas() set the StrideXYZ and VoxelDim, for GridRayCast() to load out of that atlas, changing their format in light_atlas to be an array of 4][:"data structure" :lighting :optimisation :simd]
+[1:20:37][:Run successfully][:lighting]
+[1:20:46][hhlightprof total seconds elapsed: 4.513986][:lighting :performance :run :simd]
+[1:22:09][Remove the old AABBRayCast()][:lighting]
+[1:24:42][:Run successfully][:lighting]
+[1:24:51][Prepare lighting_box to pack down to 64-bits total, propagating this change][:"data structure" :lighting]
+[1:28:29][:Run successfully][:lighting]
+[1:28:38][Clean out the sprawl from FullCast()][:lighting :optimisation :simd]
+[1:36:20][:Run successfully][:lighting]
+[1:36:25][Look into welding the GridRayCast() calling loop from FullCast() into GridRayCast() itself][:lighting :optimisation :research :simd]
+[1:39:21][hhlightprof total seconds elapsed: 4.511818][:lighting :performance :run :simd]
+[1:39:36][Extend GridRayCast() to operate on twice as many samples][:lighting :optimisation :simd]
+[1:40:44][:Run successfully][:lighting]
+[1:40:46][hhlightprof total seconds elapsed: 4.394170][:lighting :performance :run :simd]
+[1:41:52][Toggle off the debug code in FullCast()][:"debug system" :lighting]
+[1:43:26][hhlightprof total seconds elapsed: 4.392245][:lighting :performance :run :simd]
+[1:43:41][Consider welding the GridRayCast() calling loop from FullCast() into GridRayCast() itself][:lighting :optimisation :research :simd]
+[1:45:57][Q&A][:speech]
+[1:47:07][@mindmark42][Q: Yesterday you changed your :SIMD extract functions to use shuffles instead. Could you explain again why that is better?][:performance]
+[1:47:26][Extract vs Shuffle][:blackboard :performance :simd]
+[1:56:14]["Semantic" Extraction][:blackboard :language :performance :simd]
+[1:58:02][Unnecessary extract and cast, with thanks to @mmozeiko][:blackboard :performance :simd]
+[1:59:05][Shuffle][:blackboard :performance :simd]
+[2:00:41][@3ygun][Q: Is there such a thing as smooching too much and causing the compiler to bail before doing optimizations?][:language]
+[2:01:11][@billdstrong][Q: Would we gain any speed by moving ahead 16 and doing 12 ops per pass?][:lighting :performance]
+[2:01:40][Thank you, everyone]
+[/video]