[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="SIMD Raycasting" vod_platform=youtube id=ZnDtlj-_LYE annotator=Miblo] [0:02][Recap and set the stage for the day with a few words on the pace of the project][:speech] [4:09][:Run the game to see our current software-rendered :lighting, with the determination to see how much :performance we can get out of the CPU][:rendering] [6:47][Prevent RayCast() from summing the TotalCastsInitiated][:lighting :rendering] [7:03][:Run the game to see that that does not appreciably affect our :performance][:lighting :rendering] [8:05][Augment lighting_work with some of the lighting_solution, to avoid bouncing cache lines around][:profiling :rendering] [16:08][:Run the game and inspect the profiler][:lighting :performance :rendering] [16:52][Make lighting_work and lighting_solution cache-aligned][:memory] [19:53][Cache-alignment and false sharing considerations when :threading][:blackboard :memory] [24:15][Introduce InitLighting() to align our :lighting data][:memory :rendering] [31:21][:Run the game and determine to double check that everything is aligned and figure out why the tests are slow] [32:22][Assert in ComputeLightPropagation() that the lighting_work is aligned][:memory] [33:17][:Run the game and hit that assertion][:memory] [34:01][Pad the lighting_work to 64 bytes][:memory] [34:40][:Run the game to see that we are running at full speed again][:memory :performance] [35:53][Profile PartitionsPerCast and LeavesPerCast in ComputeLightPropagation()][:lighting :profiling :rendering] [37:54][:Run the game and consult the profiler to see that our PartitionsPerCast is high, and consider how to reduce them][:performance] [41:50][Add a TIMED_FUNCTION() in RayCast()][:profiling] [42:28][:Run the game and consult the profiler to see that our RayCast() is not too bad][:lighting :performance :rendering] [45:24][Remove the recursion in RayCastRecurse()][:optimisation] [49:25][:Run the game to see that that greatly improved our :performance] [50:35][Just make RayCast() perform our new code from RayCastRecurse()][:lighting :rendering] [52:06][:Run the game to see that that doesn't change our runtime][:performance] [52:13][Prevent RayCast() from taking Solution and add a TIMED_FUNCTION() in it][:lighting :rendering] [53:09][Add a TIMED_FUNCTION() in RayCast()][:profiling] [53:25][:Run the game and see the :performance of RayCast(), disable HANDMADE_SLOW and consider what to tackle next][:optimisation] [55:45][Consider performing RayCast() in SIMD][:lighting :optimisation :rendering :research] [1:00:21][Reduce RayCount from 64 to 16 in ComputeLightPropagation()][:lighting :rendering] [1:00:36][:Run the game to see what kind of a speedup SIMD could provide][:optimisation] [1:01:42][Enable ComputeLightPropagation() to call SampleHemisphere() in SIMD, introducing V3_4x()][:lighting :optimisation :rendering] [1:10:19][Update RayCast() to work with our SIMD data, introducing GetComponent()][:lighting :optimisation :rendering] [1:22:52][:Run the game, crash in RayCast() and investigate why][:lighting :optimisation :rendering] [1:24:30][Temporarily disable :threading in ComputeLightPropagation()][:lighting :optimisation :rendering] [1:26:14][Step through RayCast() to try and see what's going wrong][:lighting :optimisation :rendering :run] [1:32:12][Assert in RayCast() that the Depth is < BoxStack][:lighting :optimisation :rendering] [1:33:04][Step back into RayCast(), and eventually hit that assertion][:lighting :optimisation :rendering :run] [1:35:05][Increase BoxStack from 32 to 64 in RayCast()][:lighting :optimisation :rendering] [1:35:30][:Run the game and see nothing] [1:35:56][Prevent RayCast() from pushing on the boxes four times][:lighting :optimisation :rendering] [1:37:42][:Run the game to see that our :performance has decreased] [1:38:18][Enable RayCast() to call IsInRectangleCenterHalfDim() in SIMD][:lighting :optimisation :rendering] [1:45:35][Introduce Any3TrueInAtLeastOneLane(), All3TrueInAtLeastOneLane(), AbsoluteValue() and v3_4x versions of <=, −, | and &[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:optimisation] [1:59:56][Step in to RayCast() and inspect our SIMD data][:lighting :optimisation :rendering :run] [2:01:59][Prevent AbsoluteValue() from converting the Mask to a float][:optimisation] [2:03:15][Step back in to AbsoluteValue() and through our other new SIMD functions in RayCast()][:optimisation :run] [2:06:13][:Run the game to see that our :lighting is a little bit messed up][:optimisation :rendering] [2:07:19][Make RayCast() perform the old scalar code after our SIMD code, to verify the SIMD][:lighting :optimisation :rendering] [2:11:28][:Run the game and fail to hit our verification assertion][:lighting :optimisation :rendering] [2:13:10][Prevent RayCast() from erroneously breaking out of the child box loop][:lighting :optimisation :rendering] [2:13:48][:Run the game to see that we are correct][:lighting :optimisation :rendering] [2:14:07][Streamline and switch RayCast() almost entirely over to SIMD, introducing ZeroV34x(), versions of V3_4x() that load four scalars, and one wide set, into lanes, v3_4x versions of − for full 4-wide negation, < and *, a f32_4x struct and versions of −, * and / that use this type[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :optimisation :rendering] [3:19:18][Step in to RayCast() to inspect its values][:lighting :optimisation :rendering :run] [3:21:33][Make RayCast() subtract only one element of the FaceRelOrigin, introducing a f32_4x version of -=] [3:28:15][Step in to RayCast() and inspect the FaceRelOrigin][:lighting :optimisation :rendering :run] [3:29:19][:Run the game to see that we are producing the correct :lighting a little faster][:optimisation :rendering] [3:29:50][Enable RayCast() to perform the bounds checking in SIMD][:lighting :optimisation :rendering] [3:41:11][Organise handmade_simd.h and introduce f32_4x versions of every operator and function we need][:optimisation] [4:02:30][:Run the game to see that it's totally fine][:lighting :optimisation :rendering] [4:03:01][Read through RayCast() with the determination to finish doing it all in SIMD][:lighting :optimisation :rendering :research] [4:04:22][Start to enable RayCast() to update and load out the boxes in SIMD, before saving it for tomorrow][:lighting :optimisation :rendering] [4:11:01][Q&A][:speech] [4:12:01][Clarify the video capture card situation mentioned in the pre-stream][:speech] [4:12:58][@wired_life][Q: I think you call AnyTrue() from within AllTrue(). Can you check that's intended?] [4:14:29]["And" and "All" in a matrix][:blackboard :optimisation] [4:15:59][Call it there][:speech] [/video]