[video output=day431 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="SIMD Raycasting" vod_platform=youtube id=ZnDtlj-_LYE annotator=Miblo]
[0:02][Recap and set the stage for the day with a few words on the pace of the project][:speech]
[4:09][:Run the game to see our current software-rendered :lighting, with the determination to see how much :performance we can get out of the CPU][:rendering]
[6:47][Prevent RayCast() from summing the TotalCastsInitiated][:lighting :rendering]
[7:03][:Run the game to see that that does not appreciably affect our :performance][:lighting :rendering]
[8:05][Augment lighting_work with some of the lighting_solution's data, to avoid bouncing cache lines between cores][:profiling :rendering]
[16:08][:Run the game and inspect the profiler][:lighting :performance :rendering]
[16:52][Make lighting_work and lighting_solution cache-aligned][:memory]
[19:53][Cache-alignment and false sharing considerations when :threading][:blackboard :memory]
[24:15][Introduce InitLighting() to align our :lighting data][:memory :rendering]
[31:21][:Run the game and determine to double-check that everything is aligned, and figure out why the tests are slow]
[32:22][Assert in ComputeLightPropagation() that the lighting_work is aligned][:memory]
[33:17][:Run the game and hit that assertion][:memory]
[34:01][Pad the lighting_work to 64 bytes][:memory]
[34:40][:Run the game to see that we are running at full speed again][:memory :performance]
[35:53][Profile PartitionsPerCast and LeavesPerCast in ComputeLightPropagation()][:lighting :profiling :rendering]
[37:54][:Run the game and consult the profiler to see that our PartitionsPerCast is high, and consider how to reduce it][:performance]
[41:50][Add a TIMED_FUNCTION() in RayCast()][:profiling]
[42:28][:Run the game and consult the profiler to see that our RayCast() is not too bad][:lighting :performance :rendering]
[45:24][Remove the recursion in RayCastRecurse()][:optimisation]
[49:25][:Run the game to see that that greatly improved our :performance]
[50:35][Just make RayCast() perform our new code from RayCastRecurse()][:lighting :rendering]
[52:06][:Run the game to see that that doesn't change our runtime][:performance]
[52:13][Prevent RayCast() from taking Solution and add a TIMED_FUNCTION() in it][:lighting :rendering]
[53:09][Add a TIMED_FUNCTION() in RayCast()][:profiling]
[53:25][:Run the game and see the :performance of RayCast(), disable HANDMADE_SLOW and consider what to tackle next][:optimisation]
[55:45][Consider performing RayCast() in SIMD][:lighting :optimisation :rendering :research]
[1:00:21][Reduce RayCount from 64 to 16 in ComputeLightPropagation()][:lighting :rendering]
[1:00:36][:Run the game to see what kind of a speedup SIMD could provide][:optimisation]
[1:01:42][Enable ComputeLightPropagation() to call SampleHemisphere() in SIMD, introducing V3_4x()][:lighting :optimisation :rendering]
[1:10:19][Update RayCast() to work with our SIMD data, introducing GetComponent()][:lighting :optimisation :rendering]
[1:22:52][:Run the game, crash in RayCast() and investigate why][:lighting :optimisation :rendering]
[1:24:30][Temporarily disable :threading in ComputeLightPropagation()][:lighting :optimisation :rendering]
[1:26:14][Step through RayCast() to try to see what's going wrong][:lighting :optimisation :rendering :run]
[1:32:12][Assert in RayCast() that the Depth is < BoxStack][:lighting :optimisation :rendering]
[1:33:04][Step back into RayCast(), and eventually hit that assertion][:lighting :optimisation :rendering :run]
[1:35:05][Increase BoxStack from 32 to 64 in RayCast()][:lighting :optimisation :rendering]
[1:35:30][:Run the game and see nothing]
[1:35:56][Prevent RayCast() from pushing on the boxes four times][:lighting :optimisation :rendering]
[1:37:42][:Run the game to see that our :performance has decreased]
[1:38:18][Enable RayCast() to call IsInRectangleCenterHalfDim() in SIMD][:lighting :optimisation :rendering]
[1:45:35][Introduce Any3TrueInAtLeastOneLane(), All3TrueInAtLeastOneLane(), AbsoluteValue() and v3_4x versions of <=, , | and &[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:optimisation]
[1:59:56][Step into RayCast() and inspect our SIMD data][:lighting :optimisation :rendering :run]
[2:01:59][Prevent AbsoluteValue() from converting the Mask to a float][:optimisation]
[2:03:15][Step back into AbsoluteValue() and through our other new SIMD functions in RayCast()][:optimisation :run]
[2:06:13][:Run the game to see that our :lighting is a little bit messed up][:optimisation :rendering]
[2:07:19][Make RayCast() perform the old scalar code after our SIMD code, to verify the SIMD][:lighting :optimisation :rendering]
[2:11:28][:Run the game and fail to hit our verification assertion][:lighting :optimisation :rendering]
[2:13:10][Prevent RayCast() from erroneously breaking out of the child box loop][:lighting :optimisation :rendering]
[2:13:48][:Run the game to see that we are correct][:lighting :optimisation :rendering]
[2:14:07][Streamline and switch RayCast() almost entirely over to SIMD, introducing ZeroV34x(), versions of V3_4x() that load four scalars, and one wide set, into lanes, v3_4x versions of for full 4-wide negation, < and *, an f32_4x struct and versions of , * and / that use this type[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :optimisation :rendering]
[3:19:18][Step into RayCast() to inspect its values][:lighting :optimisation :rendering :run]
[3:21:33][Make RayCast() subtract only one element of the FaceRelOrigin, introducing an f32_4x version of -=]
[3:28:15][Step into RayCast() and inspect the FaceRelOrigin][:lighting :optimisation :rendering :run]
[3:29:19][:Run the game to see that we are producing the correct :lighting a little faster][:optimisation :rendering]
[3:29:50][Enable RayCast() to perform the bounds checking in SIMD][:lighting :optimisation :rendering]
[3:41:11][Organise handmade_simd.h and introduce f32_4x versions of every operator and function we need][:optimisation]
[4:02:30][:Run the game to see that it's totally fine][:lighting :optimisation :rendering]
[4:03:01][Read through RayCast() with the determination to finish doing it all in SIMD][:lighting :optimisation :rendering :research]
[4:04:22][Start to enable RayCast() to update and load out the boxes in SIMD, before saving it for tomorrow][:lighting :optimisation :rendering]
[4:11:01][Q&A][:speech]
[4:12:01][Clarify the video capture card situation mentioned in the pre-stream][:speech]
[4:12:58][@wired_life][Q: I think you call AnyTrue() from within AllTrue(). Can you check that's intended?]
[4:14:29]["And" and "All" in a matrix][:blackboard :optimisation]
[4:15:59][Call it there][:speech]
[/video]
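Entries 16:52 through 34:40 cache-align lighting_work, assert that alignment, and pad the struct to 64 bytes. A minimal C++17 sketch of that idea follows, assuming a 64-byte cache line; lighting_work_sketch, IsAligned() and AllocateWork() are hypothetical stand-ins, not the Handmade Hero definitions. alignas(64) also rounds sizeof() up to a multiple of 64, so neighbouring work items in an array never share a cache line, which is what keeps threads from false sharing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for lighting_work; the fields are illustrative only.
// alignas(64) places each instance on a cache-line boundary and forces
// sizeof() up to a multiple of 64, so adjacent array elements (each owned by
// a different thread) never share a 64-byte line.
struct alignas(64) lighting_work_sketch {
    std::uint32_t FirstSamplePointIndex;
    std::uint32_t OnePastLastSamplePointIndex;
    std::uint32_t TotalCastsInitiated;  // written frequently by one thread
    // The compiler inserts the remaining padding up to 64 bytes.
};

static_assert(alignof(lighting_work_sketch) == 64, "cache-line aligned");
static_assert(sizeof(lighting_work_sketch) == 64, "exactly one cache line");

// The kind of check an alignment assert in the worker could perform.
inline bool IsAligned(const void *Ptr, std::size_t Alignment) {
    return (reinterpret_cast<std::uintptr_t>(Ptr) % Alignment) == 0;
}

// C++17 aligned operator new honours alignas(64) for heap arrays.
inline lighting_work_sketch *AllocateWork(std::size_t Count) {
    return new lighting_work_sketch[Count]();
}
```

Letting the compiler pad via alignas, rather than a hand-sized byte array, keeps the struct at one cache line even if fields are added later, up to the 64-byte limit the static_assert enforces.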
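Entries 45:24 through 1:35:05 remove the recursion from RayCastRecurse(), assert that Depth stays below the BoxStack size, and grow that stack from 32 to 64. The sketch below shows the general technique of replacing recursion with a fixed-size explicit stack on a hypothetical box tree; box_node, HitByRay and CountLeavesHit() are illustrative names, and the single boolean stands in for a real ray/box intersection test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical box-tree node: interior nodes own a contiguous run of children.
struct box_node {
    std::uint32_t FirstChild;  // index of the first child in the node array
    std::uint32_t ChildCount;  // 0 marks a leaf
    bool HitByRay;             // stand-in for a real ray/box intersection test
};

// Iterative traversal with a fixed-size explicit stack in place of recursion.
// Counts the leaves reached through nodes that the ray hits.
inline std::uint32_t CountLeavesHit(const box_node *Nodes, std::uint32_t RootIndex) {
    constexpr std::uint32_t StackSize = 64;  // plays the role of BoxStack
    std::uint32_t Stack[StackSize];
    std::uint32_t Depth = 0;
    std::uint32_t LeavesHit = 0;

    Stack[Depth++] = RootIndex;
    while (Depth > 0) {
        const box_node &Node = Nodes[Stack[--Depth]];
        if (!Node.HitByRay) {
            continue;  // prune this whole subtree
        }
        if (Node.ChildCount == 0) {
            ++LeavesHit;
            continue;
        }
        for (std::uint32_t Child = 0; Child < Node.ChildCount; ++Child) {
            // Same guard as a "Depth < BoxStack" assert: a stack that is too
            // small would otherwise silently overwrite memory.
            assert(Depth < StackSize);
            Stack[Depth++] = Node.FirstChild + Child;
        }
    }
    return LeavesHit;
}
```

Because every child is pushed before any is visited, the required stack size grows with the branching factor as well as the tree depth, which is one reason a stack can need enlarging even when the tree is shallow.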
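Entries 1:01:42 and 1:10:19 move the ray data into a 4-wide form with V3_4x() and read individual values back with GetComponent(). A structure-of-arrays sketch of that layout follows, assuming SSE: one __m128 per component, with lane i holding the i-th vector. The signatures of V3_4x(), GetComponent() and the helper GetLane() are assumptions for illustration; the stream's versions may differ.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_setr_ps, _mm_store_ps

struct v3 { float x, y, z; };

// Structure-of-arrays: four 3D vectors, lane i of each register holds vector i.
struct v3_4x { __m128 x, y, z; };

// Hypothetical signature: packs four scalar vectors into SIMD lanes.
// _mm_setr_ps stores its first argument in lane 0.
inline v3_4x V3_4x(v3 A, v3 B, v3 C, v3 D) {
    v3_4x Result;
    Result.x = _mm_setr_ps(A.x, B.x, C.x, D.x);
    Result.y = _mm_setr_ps(A.y, B.y, C.y, D.y);
    Result.z = _mm_setr_ps(A.z, B.z, C.z, D.z);
    return Result;
}

// Hypothetical signature: reads one lane back out as a scalar, which is
// handy while only part of a routine has been converted to SIMD.
inline float GetComponent(__m128 Value, int Lane) {
    alignas(16) float Lanes[4];
    _mm_store_ps(Lanes, Value);
    return Lanes[Lane];
}

// Hypothetical helper: rebuilds the scalar vector stored in one lane.
inline v3 GetLane(const v3_4x &V, int Lane) {
    return {GetComponent(V.x, Lane), GetComponent(V.y, Lane), GetComponent(V.z, Lane)};
}
```

Storing components rather than whole vectors per register means one _mm_mul_ps processes the x values of four rays at once, instead of wasting a lane on an unused w component.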
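Entry 1:45:35 introduces Any3TrueInAtLeastOneLane() and All3TrueInAtLeastOneLane(), and the Q&A at 4:12:58 asks whether AllTrue() calling AnyTrue() is intended. The sketch below guesses at the semantics from the names, assuming SSE comparison masks (all bits set in a lane where the comparison holds): the 3-component tests first OR or AND the x, y and z masks within each lane, then ask whether any lane survived. It shows how an "all" test over components can legitimately end in an any-lane check, while an all-lanes test instead compares the movemask to 0xF. These notes do not record the stream's actual implementation.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: compare, logic and movemask intrinsics

struct v3_4x { __m128 x, y, z; };

// _mm_movemask_ps packs the sign bit of each lane into bits 0..3.
inline bool AnyTrue(__m128 Mask) { return _mm_movemask_ps(Mask) != 0; }
// An all-lanes test needs every one of the four bits.
inline bool AllTrue(__m128 Mask) { return _mm_movemask_ps(Mask) == 0xF; }

// Component-wise <=, producing one lane mask per component.
inline v3_4x operator<=(v3_4x A, v3_4x B) {
    return {_mm_cmple_ps(A.x, B.x), _mm_cmple_ps(A.y, B.y), _mm_cmple_ps(A.z, B.z)};
}

// Assumed semantics: some lane has at least one of x, y, z true.
inline bool Any3TrueInAtLeastOneLane(v3_4x Mask) {
    return AnyTrue(_mm_or_ps(_mm_or_ps(Mask.x, Mask.y), Mask.z));
}

// Assumed semantics: some lane has x, y and z all true (for example, a point
// inside a box on every axis for at least one of the four rays).
inline bool All3TrueInAtLeastOneLane(v3_4x Mask) {
    return AnyTrue(_mm_and_ps(_mm_and_ps(Mask.x, Mask.y), Mask.z));
}
```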
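Entry 2:01:59 stops AbsoluteValue() from converting its Mask to a float. A common form of that bug, assumed here rather than taken from the stream, is building the sign-clearing mask with _mm_set1_ps(0x7FFFFFFF): the integer is converted numerically to 2147483648.0f (bit pattern 0x4F000000) instead of being reinterpreted as bits. The sketch builds the mask in the integer domain with SSE2 and reinterprets it with _mm_castsi128_ps, which performs no conversion.

```cpp
#include <cassert>
#include <cmath>
#include <emmintrin.h>  // SSE2: _mm_set1_epi32, _mm_castsi128_ps

// Clears the sign bit of every lane.
inline __m128 AbsoluteValue(__m128 A) {
    // 0x7FFFFFFF as a *bit pattern*: every bit except the sign bit.
    // _mm_castsi128_ps reinterprets the bits; no int-to-float conversion occurs.
    __m128 Mask = _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF));
    return _mm_and_ps(A, Mask);
}

// The buggy variant, kept for comparison: the constant becomes the float
// 2147483648.0f (bits 0x4F000000), so the AND keeps the wrong bits.
inline __m128 AbsoluteValueWrongMask(__m128 A) {
    __m128 Mask = _mm_set1_ps(static_cast<float>(0x7FFFFFFF));
    return _mm_and_ps(A, Mask);
}
```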
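Entry 2:07:19 runs the old scalar code after the new SIMD code to verify one against the other. A minimal sketch of that verification pattern follows, using a hypothetical clamp-and-scale step in place of the real RayCast() work: compute all four lanes with SIMD, recompute each lane with the scalar reference, and report the first mismatch. ScalarStep(), WideStep(), WideStepBuggy() and FirstMismatchingLane() are illustrative names; the wide function is passed in so a deliberately wrong version can show the check firing.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Scalar reference: clamp X to Limit, then scale. Hypothetical stand-in.
inline float ScalarStep(float X, float Limit, float Scale) {
    float Clamped = (X < Limit) ? X : Limit;
    return Clamped * Scale;
}

// Intended 4-wide equivalent of ScalarStep.
inline __m128 WideStep(__m128 X, __m128 Limit, __m128 Scale) {
    return _mm_mul_ps(_mm_min_ps(X, Limit), Scale);
}

// A deliberately wrong version (max instead of min) to exercise the check.
inline __m128 WideStepBuggy(__m128 X, __m128 Limit, __m128 Scale) {
    return _mm_mul_ps(_mm_max_ps(X, Limit), Scale);
}

using wide_step_fn = __m128 (*)(__m128, __m128, __m128);

// Returns the first lane whose SIMD result differs from the scalar
// reference, or -1 if all four lanes agree exactly.
inline int FirstMismatchingLane(wide_step_fn Wide, const float X[4], float Limit, float Scale) {
    alignas(16) float Result[4];
    _mm_store_ps(Result, Wide(_mm_loadu_ps(X), _mm_set1_ps(Limit), _mm_set1_ps(Scale)));
    for (int Lane = 0; Lane < 4; ++Lane) {
        if (Result[Lane] != ScalarStep(X[Lane], Limit, Scale)) {
            return Lane;
        }
    }
    return -1;
}
```

Exact comparison works here because both paths perform the same single-precision operations; for longer computations a small tolerance may be needed if the scalar and SIMD paths order their arithmetic differently.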
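Entries 2:14:07 and 3:21:33 introduce an f32_4x struct with arithmetic, negation, comparison and -= operators. A minimal sketch of such a wrapper over __m128 follows, assuming SSE; the helper F32_4x(), the accessor Lane() and the exact operator set are hypothetical, chosen for illustration rather than copied from handmade_simd.h. Comparisons return lane masks in the same type, so they can feed straight into AND/OR logic.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE arithmetic, compare and logic intrinsics

// Thin wrapper so SIMD math reads like scalar math.
struct f32_4x {
    __m128 P;
};

inline f32_4x F32_4x(float A) { return {_mm_set1_ps(A)}; }
inline f32_4x F32_4x(float A, float B, float C, float D) { return {_mm_setr_ps(A, B, C, D)}; }

inline f32_4x operator+(f32_4x A, f32_4x B) { return {_mm_add_ps(A.P, B.P)}; }
inline f32_4x operator-(f32_4x A, f32_4x B) { return {_mm_sub_ps(A.P, B.P)}; }
inline f32_4x operator*(f32_4x A, f32_4x B) { return {_mm_mul_ps(A.P, B.P)}; }
inline f32_4x operator/(f32_4x A, f32_4x B) { return {_mm_div_ps(A.P, B.P)}; }

// Full 4-wide negation: flip the sign bit of every lane.
inline f32_4x operator-(f32_4x A) { return {_mm_xor_ps(A.P, _mm_set1_ps(-0.0f))}; }

// Comparison yields a lane mask (all bits set where true).
inline f32_4x operator<(f32_4x A, f32_4x B) { return {_mm_cmplt_ps(A.P, B.P)}; }

inline f32_4x &operator-=(f32_4x &A, f32_4x B) {
    A = A - B;
    return A;
}

// Reads one lane back as a scalar.
inline float Lane(f32_4x A, int Index) {
    alignas(16) float Out[4];
    _mm_store_ps(Out, A.P);
    return Out[Index];
}
```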