[video output=day613 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Merging the Raycaster with the Sampler" vod_platform=youtube id=yLOCk-utMbE annotator=Miblo]
[0:04][Recap and set the stage for the day][:speech]
[0:25][Our world remains in the dark][:lighting :run]
[1:08][Let GridRayCast() set a non-zero CostMetric][:lighting]
[1:25][Demo the current :lighting][:run]
[1:36][Describe and consider the :performance of our :lighting][:speech]
[4:39][Rare ~4coder crash][:admin]
[5:26][Consider gauging the grid ray casting :performance if ComputeVoxelIrradianceAt() was optimal][:lighting :speech]
[8:25][Break into ComputeVoxelIrradianceAt()][:lighting :run]
[9:29][~RemedyBG feature request: Tabulated / colourised disassembly][:admin :asm :ui]
[12:17][Inspect the assembly of ComputeVoxelIrradianceAt()][:asm :lighting :run]
[12:26][:Research the comiss instruction[ref site=uops.info url=https://uops.info/table.html][ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:asm]
[15:40][Try to interpret the origin of the comiss instructions][:asm :lighting :research]
[21:06][Compare our f32_4x and f32 versions of AbsoluteValue() in the Compiler Explorer[ref site="Compiler Explorer" url=https://godbolt.org]][:asm :mathematics :research]
[29:58][Point out our comiss instructions][:asm :lighting :run]
[30:16][Redo our f32 version of AbsoluteValue() based on the f32_4x version[ref site=uops.info url=https://uops.info/table.html][ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:mathematics :simd]
[33:09][Our comiss instructions are replaced with a call, which in turn retains the comiss ones][:asm :lighting :run]
[34:15][Revert the f32 version of AbsoluteValue() to use fabs()][:mathematics :simd]
[34:27][Our comiss instructions are back, welded in][:asm :lighting :run]
[34:47][Weld GetOctahedralOffset() in to ComputeVoxelIrradianceAt()][:lighting]
[36:37][Our comiss instructions remain][:asm :lighting :run]
[38:01][Weld OctahedralFromUnitVector() in to ComputeVoxelIrradianceAt()][:lighting]
[39:10][SignOf() is the source of two comiss instructions][:lighting :research]
[40:04][Make SignOf() branchless[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:mathematics]
[44:34][Two of our comiss instructions are gone][:asm :lighting :run]
[45:26][The :lighting still looks the same][:run]
[46:12][Make the UV computation in ComputeVoxelIrradianceAt() fully branchless][:lighting]
[50:06][Note why _mm_extract_ps() is often not a good idea, as @mmozeiko pointed out[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:simd :speech]
[53:18][Finish making the UV computation in ComputeVoxelIrradianceAt() fully branchless, also noting to change Extract1() and Extract2() to use SHUF+CVTSS][:lighting]
[54:16][All of our comiss instructions are gone][:asm :lighting :run]
[54:46][Seek improvements to our ComputeVoxelIrradianceAt() vectorisation][:lighting :optimisation :simd :research]
[59:54][Remove the stupidity from our ComputeVoxelIrradianceAt() vectorisation[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :optimisation :simd]
[1:05:02][A few words on being aware of gotchas in poorly designed instruction sets][:isa :speech]
[1:06:45][Continue to remove the stupidity from our ComputeVoxelIrradianceAt() vectorisation, introducing an f32_4x version of SignOf()][:lighting :optimisation :simd]
[1:12:38][Hit a read access violation in ComputeVoxelIrradianceAt()][:lighting :optimisation :run :simd]
[1:13:38][Remove the problematic part of the BaseXYZ computation in ComputeVoxelIrradianceAt()][:lighting :optimisation :simd]
[1:13:50][We no longer hit that read access violation][:lighting :optimisation :run :simd]
[1:14:08][Scrutinise ComputeVoxelIrradianceAt() for bugs][:lighting :optimisation :research :simd]
[1:21:25][Make ComputeVoxelIrradianceAt() call GetOctahedralOffset() as originally][:lighting :optimisation :simd]
[1:22:41][Step into ComputeVoxelIrradianceAt() and compare the Txy and Check][:lighting :optimisation :run :simd]
[1:24:48][Weld GetOctahedralOffset() into ComputeVoxelIrradianceAt() to facilitate closer comparison][:lighting :optimisation :simd]
[1:27:47][Hit our assertion in ComputeVoxelIrradianceAt(), and compare the check and newly computed values][:lighting :optimisation :run :simd]
[1:29:49][Weld OctahedralFromUnitVector() into ComputeVoxelIrradianceAt() to facilitate comparison][:lighting :optimisation :simd]
[1:32:25][Hit our assertion in ComputeVoxelIrradianceAt(), and compare the check and newly computed values][:lighting :optimisation :run :simd]
[1:32:54][Fix ComputeVoxelIrradianceAt() to use AbsoluteValue() when computing the OneNorm][:lighting :mathematics :optimisation :simd]
[1:33:26][We run successfully][:lighting :optimisation :run :simd]
[1:33:35][Remove the checking code from ComputeVoxelIrradianceAt()][:lighting :optimisation :simd]
[1:33:51][Our ray casting :performance is improving][:lighting :optimisation :run :simd]
[1:35:38][hhlightprof total seconds elapsed: 5.244874][:lighting :performance :run :simd]
[1:37:33][Inspect the assembly of ComputeVoxelIrradianceAt()][:asm :lighting :run]
[1:38:26][Replace Extract0(), Extract1() and Extract2() with ConvertF32() and ConvertS32()][:simd]
[1:42:08][Our :lighting looks the same][:run]
[1:42:15][hhlightprof total seconds elapsed: 5.055217][:lighting :performance :run :simd]
[1:43:11][Change ComputeVoxelIrradianceAt() to return the f32_4x ResultRGB, for the callers to use directly][:lighting :optimisation :simd]
[1:46:13][Our :lighting looks the same][:run]
[1:46:18][hhlightprof total seconds elapsed: 4.963887][:lighting :performance :run :simd]
[1:47:14][Weld ComputeVoxelIrradianceAt() straight in to GridRayCast(), to save computing values twice][:lighting :optimisation :simd]
[1:49:16][Our :lighting looks the same][:run]
[1:49:30][hhlightprof total seconds elapsed: 4.701094][:lighting :performance :run :simd]
[1:50:40][Seek further improvements to GridRayCast()][:lighting :optimisation :simd :research]
[1:54:19][Our :lighting looks the same][:run]
[1:54:22][Q&A][:speech]
[1:54:43][@golido3868][Q: Sorry to be off-topic. I've finished all the five days in the Intro to C and it was awesome. But there's a huge gap between the intro and the main course that I'm not able to fully understand. What's your suggestion? I'm new to programming, started reading K&R recently.[ref site="Star Code Galaxy" url=https://starcodegalaxy.com]]
[1:56:35][Plug Star Code Galaxy[ref site="Star Code Galaxy" url=https://starcodegalaxy.com]][:research]
[1:57:39][Close it down][:speech]
[/video]