From 6680a51c04df8b8298fe997f8dff78c22aca9ef2 Mon Sep 17 00:00:00 2001
From: Matt Mascarenhas
Date: Sun, 4 Mar 2018 19:35:35 +0000
Subject: [PATCH] Annotate hero/code431

---
 cmuratori/hero/code/code431.hmml | 72 ++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 cmuratori/hero/code/code431.hmml

diff --git a/cmuratori/hero/code/code431.hmml b/cmuratori/hero/code/code431.hmml
new file mode 100644
index 0000000..0906bb6
--- /dev/null
+++ b/cmuratori/hero/code/code431.hmml
@@ -0,0 +1,72 @@
+[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="SIMD Raycasting" vod_platform=youtube id=ZnDtlj-_LYE annotator=Miblo]
+[0:02][Recap and set the stage for the day with a few words on the pace of the project][:speech]
+[4:09][:Run the game to see our current software-rendered :lighting, with the determination to see how much :performance we can get out of the CPU][:rendering]
+[6:47][Prevent RayCast() from summing the TotalCastsInitiated][:lighting :rendering]
+[7:03][:Run the game to see that that does not appreciably affect our :performance][:lighting :rendering]
+[8:05][Augment lighting_work with some of the lighting_solution, to avoid bouncing cache lines around][:profiling :rendering]
+[16:08][:Run the game and inspect the profiler][:lighting :performance :rendering]
+[16:52][Make lighting_work and lighting_solution cache-aligned][:memory]
+[19:53][Cache-alignment and false sharing considerations when :threading][:blackboard :memory]
+[24:15][Introduce InitLighting() to align our :lighting data][:memory :rendering]
+[31:21][:Run the game and determine to double check that everything is aligned and figure out why the tests are slow]
+[32:22][Assert in ComputeLightPropagation() that the lighting_work is aligned][:memory]
+[33:17][:Run the game and hit that assertion][:memory]
+[34:01][Pad the lighting_work to 64 bytes][:memory]
+[34:40][:Run the game to see that we are running at full speed again][:memory :performance]
+[35:53][Profile PartitionsPerCast and LeavesPerCast in ComputeLightPropagation()][:lighting :profiling :rendering]
+[37:54][:Run the game and consult the profiler to see that our PartitionsPerCast is high, and consider how to reduce them][:performance]
+[41:50][Add a TIMED_FUNCTION() in RayCast()][:profiling]
+[42:28][:Run the game and consult the profiler to see that our RayCast() is not too bad][:lighting :performance :rendering]
+[45:24][Remove the recursion in RayCastRecurse()][:optimisation]
+[49:25][:Run the game to see that that greatly improved our :performance]
+[50:35][Just make RayCast() perform our new code from RayCastRecurse()][:lighting :rendering]
+[52:06][:Run the game to see that that doesn't change our runtime][:performance]
+[52:13][Prevent RayCast() from taking Solution and add a TIMED_FUNCTION() in it][:lighting :rendering]
+[53:09][Add a TIMED_FUNCTION() in RayCast()][:profiling]
+[53:25][:Run the game and see the :performance of RayCast(), disable HANDMADE_SLOW and consider what to tackle next][:optimisation]
+[55:45][Consider performing RayCast() in SIMD][:lighting :optimisation :rendering :research]
+[1:00:21][Reduce RayCount from 64 to 16 in ComputeLightPropagation()][:lighting :rendering]
+[1:00:36][:Run the game to see what kind of a speedup SIMD could provide][:optimisation]
+[1:01:42][Enable ComputeLightPropagation() to call SampleHemisphere() in SIMD, introducing V3_4x()][:lighting :optimisation :rendering]
+[1:10:19][Update RayCast() to work with our SIMD data, introducing GetComponent()][:lighting :optimisation :rendering]
+[1:22:52][:Run the game, crash in RayCast() and investigate why][:lighting :optimisation :rendering]
+[1:24:30][Temporarily disable :threading in ComputeLightPropagation()][:lighting :optimisation :rendering]
+[1:26:14][Step through RayCast() to try and see what's going wrong][:lighting :optimisation :rendering :run]
+[1:32:12][Assert in RayCast() that the Depth is < BoxStack][:lighting :optimisation :rendering]
+[1:33:04][Step back into RayCast(), and eventually hit that assertion][:lighting :optimisation :rendering :run]
+[1:35:05][Increase BoxStack from 32 to 64 in RayCast()][:lighting :optimisation :rendering]
+[1:35:30][:Run the game and see nothing]
+[1:35:56][Prevent RayCast() from pushing on the boxes four times][:lighting :optimisation :rendering]
+[1:37:42][:Run the game to see that our :performance has decreased]
+[1:38:18][Enable RayCast() to call IsInRectangleCenterHalfDim() in SIMD][:lighting :optimisation :rendering]
+[1:45:35][Introduce Any3TrueInAtLeastOneLane(), All3TrueInAtLeastOneLane(), AbsoluteValue() and v3_4x versions of <=, −, | and &[ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:optimisation]
+[1:59:56][Step in to RayCast() and inspect our SIMD data][:lighting :optimisation :rendering :run]
+[2:01:59][Prevent AbsoluteValue() from converting the Mask to a float][:optimisation]
+[2:03:15][Step back in to AbsoluteValue() and through our other new SIMD functions in RayCast()][:optimisation :run]
+[2:06:13][:Run the game to see that our :lighting is a little bit messed up][:optimisation :rendering]
+[2:07:19][Make RayCast() perform the old scalar code after our SIMD code, to verify the SIMD][:lighting :optimisation :rendering]
+[2:11:28][:Run the game and fail to hit our verification assertion][:lighting :optimisation :rendering]
+[2:13:10][Prevent RayCast() from erroneously breaking out of the child box loop][:lighting :optimisation :rendering]
+[2:13:48][:Run the game to see that we are correct][:lighting :optimisation :rendering]
+[2:14:07][Streamline and switch RayCast() almost entirely over to SIMD, introducing ZeroV34x(), versions of V3_4x() that load four scalars, and one wide set, into lanes, v3_4x versions of − for full 4-wide negation, < and *, a f32_4x struct and versions of −, * and / that use this type[ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :optimisation :rendering]
+[3:19:18][Step in to RayCast() to inspect its values][:lighting :optimisation :rendering :run]
+[3:21:33][Make RayCast() subtract only one element of the FaceRelOrigin, introducing a f32_4x version of -=]
+[3:28:15][Step in to RayCast() and inspect the FaceRelOrigin][:lighting :optimisation :rendering :run]
+[3:29:19][:Run the game to see that we are producing the correct :lighting a little faster][:optimisation :rendering]
+[3:29:50][Enable RayCast() to perform the bounds checking in SIMD][:lighting :optimisation :rendering]
+[3:41:11][Organise handmade_simd.h and introduce f32_4x versions of every operator and function we need][:optimisation]
+[4:02:30][:Run the game to see that it's totally fine][:lighting :optimisation :rendering]
+[4:03:01][Read through RayCast() with the determination to finish doing it all in SIMD][:lighting :optimisation :rendering :research]
+[4:04:22][Start to enable RayCast() to update and load out the boxes in SIMD, before saving it for tomorrow][:lighting :optimisation :rendering]
+[4:11:01][Q&A][:speech]
+[4:12:01][Clarify the video capture card situation mentioned in the pre-stream][:speech]
+[4:12:58][@wired_life][Q: I think you call AnyTrue() from within AllTrue(). Can you check that's intended?]
+[4:14:29]["And" and "All" in a matrix][:blackboard :optimisation]
+[4:15:59][Call it there][:speech]
+[/video]