[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="SIMD Raycasting" vod_platform=youtube id=ZnDtlj-_LYE annotator=Miblo]
[0:02][Recap and set the stage for the day with a few words on the pace of the project][:speech]
[4:09][:Run the game to see our current software-rendered :lighting, with the determination to see how much :performance we can get out of the CPU][:rendering]
[6:47][Prevent RayCast() from summing the TotalCastsInitiated][:lighting :rendering]
[7:03][:Run the game to see that that does not appreciably affect our :performance][:lighting :rendering]
[8:05][Augment lighting_work with some of the lighting_solution, to avoid bouncing cache lines around][:profiling :rendering]
[16:08][:Run the game and inspect the profiler][:lighting :performance :rendering]
[16:52][Make lighting_work and lighting_solution cache-aligned][:memory]
[19:53][Cache-alignment and false sharing considerations when :threading][:blackboard :memory]
[24:15][Introduce InitLighting() to align our :lighting data][:memory :rendering]
[31:21][:Run the game and determine to double check that everything is aligned and figure out why the tests are slow]
[32:22][Assert in ComputeLightPropagation() that the lighting_work is aligned][:memory]
[33:17][:Run the game and hit that assertion][:memory]
[34:01][Pad the lighting_work to 64 bytes][:memory]
[34:40][:Run the game to see that we are running at full speed again][:memory :performance]
[35:53][Profile PartitionsPerCast and LeavesPerCast in ComputeLightPropagation()][:lighting :profiling :rendering]
[37:54][:Run the game and consult the profiler to see that our PartitionsPerCast is high, and consider how to reduce it][:performance]
[41:50][Add a TIMED_FUNCTION() in RayCast()][:profiling]
[42:28][:Run the game and consult the profiler to see that our RayCast() is not too bad][:lighting :performance :rendering]
[45:24][Remove the recursion in RayCastRecurse()][:optimisation]
[49:25][:Run the game to see that that greatly improved our :performance]
[50:35][Just make RayCast() perform our new code from RayCastRecurse()][:lighting :rendering]
[52:06][:Run the game to see that that doesn't change our runtime][:performance]
[52:13][Prevent RayCast() from taking Solution and add a TIMED_FUNCTION() in it][:lighting :rendering]
[53:09][Add a TIMED_FUNCTION() in RayCast()][:profiling]
[53:25][:Run the game and see the :performance of RayCast(), disable HANDMADE_SLOW and consider what to tackle next][:optimisation]
[55:45][Consider performing RayCast() in SIMD][:lighting :optimisation :rendering :research]
[1:00:21][Reduce RayCount from 64 to 16 in ComputeLightPropagation()][:lighting :rendering]
[1:00:36][:Run the game to see what kind of a speedup SIMD could provide][:optimisation]
[1:01:42][Enable ComputeLightPropagation() to call SampleHemisphere() in SIMD, introducing V3_4x()][:lighting :optimisation :rendering]
[1:10:19][Update RayCast() to work with our SIMD data, introducing GetComponent()][:lighting :optimisation :rendering]
[1:22:52][:Run the game, crash in RayCast() and investigate why][:lighting :optimisation :rendering]
[1:24:30][Temporarily disable :threading in ComputeLightPropagation()][:lighting :optimisation :rendering]
[1:26:14][Step through RayCast() to try and see what's going wrong][:lighting :optimisation :rendering :run]
[1:32:12][Assert in RayCast() that the Depth is < BoxStack][:lighting :optimisation :rendering]
[1:33:04][Step back into RayCast(), and eventually hit that assertion][:lighting :optimisation :rendering :run]
[1:35:05][Increase BoxStack from 32 to 64 in RayCast()][:lighting :optimisation :rendering]
[1:35:30][:Run the game and see nothing]
[1:35:56][Prevent RayCast() from pushing on the boxes four times][:lighting :optimisation :rendering]
[1:37:42][:Run the game to see that our :performance has decreased]
[1:38:18][Enable RayCast() to call IsInRectangleCenterHalfDim() in SIMD][:lighting :optimisation :rendering]
[1:45:35][Introduce Any3TrueInAtLeastOneLane(), All3TrueInAtLeastOneLane(), AbsoluteValue() and v3_4x versions of <=, −, | and &[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:optimisation]
[1:59:56][Step in to RayCast() and inspect our SIMD data][:lighting :optimisation :rendering :run]
[2:01:59][Prevent AbsoluteValue() from converting the Mask to a float][:optimisation]
[2:03:15][Step back in to AbsoluteValue() and through our other new SIMD functions in RayCast()][:optimisation :run]
[2:06:13][:Run the game to see that our :lighting is a little bit messed up][:optimisation :rendering]
[2:07:19][Make RayCast() perform the old scalar code after our SIMD code, to verify the SIMD][:lighting :optimisation :rendering]
[2:11:28][:Run the game and fail to hit our verification assertion][:lighting :optimisation :rendering]
[2:13:10][Prevent RayCast() from erroneously breaking out of the child box loop][:lighting :optimisation :rendering]
[2:13:48][:Run the game to see that we are correct][:lighting :optimisation :rendering]
[2:14:07][Streamline and switch RayCast() almost entirely over to SIMD, introducing ZeroV34x(); versions of V3_4x() that load four scalars, and one wide set, into lanes; v3_4x versions of − (for full 4-wide negation), < and *; an f32_4x struct; and versions of −, * and / that use this type[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :optimisation :rendering]
[3:19:18][Step in to RayCast() to inspect its values][:lighting :optimisation :rendering :run]
[3:21:33][Make RayCast() subtract only one element of the FaceRelOrigin, introducing an f32_4x version of -=]
[3:28:15][Step in to RayCast() and inspect the FaceRelOrigin][:lighting :optimisation :rendering :run]
[3:29:19][:Run the game to see that we are producing the correct :lighting a little faster][:optimisation :rendering]
[3:29:50][Enable RayCast() to perform the bounds checking in SIMD][:lighting :optimisation :rendering]
[3:41:11][Organise handmade_simd.h and introduce f32_4x versions of every operator and function we need][:optimisation]
[4:02:30][:Run the game to see that it's totally fine][:lighting :optimisation :rendering]
[4:03:01][Read through RayCast() with the determination to finish doing it all in SIMD][:lighting :optimisation :rendering :research]
[4:04:22][Start to enable RayCast() to update and load out the boxes in SIMD, before saving it for tomorrow][:lighting :optimisation :rendering]
[4:11:01][Q&A][:speech]
[4:12:01][Clarify the video capture card situation mentioned in the pre-stream][:speech]
[4:12:58][@wired_life][Q: I think you call AnyTrue() from within AllTrue(). Can you check that's intended?]
[4:14:29]["And" and "All" in a matrix][:blackboard :optimisation]
[4:15:59][Call it there][:speech]
[/video]