diff --git a/cmuratori/hero/code/code613.hmml b/cmuratori/hero/code/code613.hmml
new file mode 100644
index 0000000..5390be9
--- /dev/null
+++ b/cmuratori/hero/code/code613.hmml
@@ -0,0 +1,91 @@
+[video output=day613 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Merging the Raycaster with the Sampler" vod_platform=youtube id=yLOCk-utMbE annotator=Miblo]
+[0:04][Recap and set the stage for the day][:speech]
+[0:25][Our world remains in the dark][:lighting :run]
+[1:08][Let GridRayCast() set a non-zero CostMetric][:lighting]
+[1:25][Demo the current :lighting][:run]
+[1:36][Describe and consider the :performance of our :lighting][:speech]
+[4:39][Rare ~4coder crash][:admin]
+[5:26][Consider gauging the grid ray casting :performance if ComputeVoxelIrradianceAt() was optimal][:lighting :speech]
+[8:25][Break into ComputeVoxelIrradianceAt()][:lighting :run]
+[9:29][~RemedyBG feature request: Tabulated / colourised disassembly][:admin :asm :ui]
+[12:17][Inspect the assembly of ComputeVoxelIrradianceAt()][:asm :lighting :run]
+[12:26][:Research the comiss instruction[ref
+    site=uops.info
+    url=https://uops.info/table.html][ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:asm]
+[15:40][Try to interpret the origin of the comiss instructions][:asm :lighting :research]
+[21:06][Compare our f32_4x and f32 versions of AbsoluteValue() in the Compiler Explorer[ref
+    site="Compiler Explorer"
+    url=https://godbolt.org]][:asm :mathematics :research]
+[29:58][Point out our comiss instructions][:asm :lighting :run]
+[30:16][Redo our f32 version of AbsoluteValue() based on the f32_4x version[ref
+    site=uops.info
+    url=https://uops.info/table.html][ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:mathematics :simd]
+[33:09][Our comiss instructions are replaced with a call, which in turn retains the comiss ones][:asm :lighting :run]
+[34:15][Revert the f32 version of AbsoluteValue() to use fabs()][:mathematics :simd]
+[34:27][Our comiss instructions are back, welded in][:asm :lighting :run]
+[34:47][Weld GetOctahedralOffset() in to ComputeVoxelIrradianceAt()][:lighting]
+[36:37][Our comiss instructions remain][:asm :lighting :run]
+[38:01][Weld OctahedralFromUnitVector() in to ComputeVoxelIrradianceAt()][:lighting]
+[39:10][SignOf() is the source of two comiss instructions][:lighting :research]
+[40:04][Make SignOf() branchless[ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:mathematics]
+[44:34][Two of our comiss instructions are gone][:asm :lighting :run]
+[45:26][The :lighting still looks the same][:run]
+[46:12][Make the UV computation in ComputeVoxelIrradianceAt() fully branchless][:lighting]
+[50:06][Note why _mm_extract_ps() is often not a good idea, as @mmozeiko pointed out[ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:simd :speech]
+[53:18][Finish making the UV computation in ComputeVoxelIrradianceAt() fully branchless, also noting to change Extract1() and Extract2() to use SHUF+CVTSS][:lighting]
+[54:16][All of our comiss instructions are gone][:asm :lighting :run]
+[54:46][Seek improvements to our ComputeVoxelIrradianceAt() vectorisation][:lighting :optimisation :simd :research]
+[59:54][Remove the stupidity from our ComputeVoxelIrradianceAt() vectorisation[ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :optimisation :simd]
+[1:05:02][A few words on being aware of gotchas in poorly designed instruction sets][:isa :speech]
+[1:06:45][Continue to remove the stupidity from our ComputeVoxelIrradianceAt() vectorisation, introducing an f32_4x version of SignOf()][:lighting :optimisation :simd]
+[1:12:38][Hit a read access violation in ComputeVoxelIrradianceAt()][:lighting :optimisation :run :simd]
+[1:13:38][Remove the problematic part of the BaseXYZ computation in ComputeVoxelIrradianceAt()][:lighting :optimisation :simd]
+[1:13:50][We no longer hit that read access violation][:lighting :optimisation :run :simd]
+[1:14:08][Scrutinise ComputeVoxelIrradianceAt() for bugs][:lighting :optimisation :research :simd]
+[1:21:25][Make ComputeVoxelIrradianceAt() call GetOctahedralOffset() as originally][:lighting :optimisation :simd]
+[1:22:41][Step into ComputeVoxelIrradianceAt() and compare the Txy and Check][:lighting :optimisation :run :simd]
+[1:24:48][Weld GetOctahedralOffset() into ComputeVoxelIrradianceAt() to facilitate closer comparison][:lighting :optimisation :simd]
+[1:27:47][Hit our assertion in ComputeVoxelIrradianceAt(), and compare the check and newly computed values][:lighting :optimisation :run :simd]
+[1:29:49][Weld OctahedralFromUnitVector() into ComputeVoxelIrradianceAt() to facilitate comparison][:lighting :optimisation :simd]
+[1:32:25][Hit our assertion in ComputeVoxelIrradianceAt(), and compare the check and newly computed values][:lighting :optimisation :run :simd]
+[1:32:54][Fix ComputeVoxelIrradianceAt() to use AbsoluteValue() when computing the OneNorm][:lighting :mathematics :optimisation :simd]
+[1:33:26][We run successfully][:lighting :optimisation :run :simd]
+[1:33:35][Remove the checking code from ComputeVoxelIrradianceAt()][:lighting :optimisation :simd]
+[1:33:51][Our ray casting :performance is improving][:lighting :optimisation :run :simd]
+[1:35:38][hhlightprof total seconds elapsed: 5.244874][:lighting :performance :run :simd]
+[1:37:33][Inspect the assembly of ComputeVoxelIrradianceAt()][:asm :lighting :run]
+[1:38:26][Replace Extract0(), Extract1() and Extract2() with ConvertF32() and ConvertS32()][:simd]
+[1:42:08][Our :lighting looks the same][:run]
+[1:42:15][hhlightprof total seconds elapsed: 5.055217][:lighting :performance :run :simd]
+[1:43:11][Change ComputeVoxelIrradianceAt() to return the f32_4x ResultRGB, for the callers to use directly][:lighting :optimisation :simd]
+[1:46:13][Our :lighting looks the same][:run]
+[1:46:18][hhlightprof total seconds elapsed: 4.963887][:lighting :performance :run :simd]
+[1:47:14][Weld ComputeVoxelIrradianceAt() straight in to GridRayCast(), to save computing values twice][:lighting :optimisation :simd]
+[1:49:16][Our :lighting looks the same][:run]
+[1:49:30][hhlightprof total seconds elapsed: 4.701094][:lighting :performance :run :simd]
+[1:50:40][Seek further improvements to GridRayCast()][:lighting :optimisation :simd :research]
+[1:54:19][Our :lighting looks the same][:run]
+[1:54:22][Q&A][:speech]
+[1:54:43][@golido3868][Q: Sorry to be off-topic. I've finished all the five days in the Intro to C and it was awesome. But there's a huge gap between the intro and the main course that I'm not able to fully understand. What's your suggestion? I'm new to programming, started reading K&R recently.[ref
+    site="Star Code Galaxy"
+    url=https://starcodegalaxy.com]]
+[1:56:35][Plug Star Code Galaxy[ref
+    site="Star Code Galaxy"
+    url=https://starcodegalaxy.com]][:research]
+[1:57:39][Close it down][:speech]
+[/video]