[video output=day613 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Merging the Raycaster with the Sampler" vod_platform=youtube id=yLOCk-utMbE annotator=Miblo]
[0:04][Recap and set the stage for the day][:speech]
[0:25][Our world remains in the dark][:lighting :run]
[1:08][Let GridRayCast() set a non-zero CostMetric][:lighting]
[1:25][Demo the current :lighting][:run]
[1:36][Describe and consider the :performance of our :lighting][:speech]
[4:39][Rare ~4coder crash][:admin]
[5:26][Consider gauging the grid ray casting :performance if ComputeVoxelIrradianceAt() were optimal][:lighting :speech]
[8:25][Break into ComputeVoxelIrradianceAt()][:lighting :run]
[9:29][~RemedyBG feature request: Tabulated / colourised disassembly][:admin :asm :ui]
[12:17][Inspect the assembly of ComputeVoxelIrradianceAt()][:asm :lighting :run]
[12:26][:Research the comiss instruction[ref
    site=uops.info
    url=https://uops.info/table.html][ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:asm]
[15:40][Try to interpret the origin of the comiss instructions][:asm :lighting :research]
[21:06][Compare our f32_4x and f32 versions of AbsoluteValue() in the Compiler Explorer[ref
    site="Compiler Explorer"
    url=https://godbolt.org]][:asm :mathematics :research]
[29:58][Point out our comiss instructions][:asm :lighting :run]
[30:16][Redo our f32 version of AbsoluteValue() based on the f32_4x version[ref
    site=uops.info
    url=https://uops.info/table.html][ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:mathematics :simd]
[33:09][Our comiss instructions are replaced with a call to a function that itself retains them][:asm :lighting :run]
[34:15][Revert the f32 version of AbsoluteValue() to use fabs()][:mathematics :simd]
[34:27][Our comiss instructions are back, welded in][:asm :lighting :run]
[34:47][Weld GetOctahedralOffset() into ComputeVoxelIrradianceAt()][:lighting]
[36:37][Our comiss instructions remain][:asm :lighting :run]
[38:01][Weld OctahedralFromUnitVector() into ComputeVoxelIrradianceAt()][:lighting]
[39:10][SignOf() is the source of two comiss instructions][:lighting :research]
[40:04][Make SignOf() branchless[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:mathematics]
[44:34][Two of our comiss instructions are gone][:asm :lighting :run]
[45:26][The :lighting still looks the same][:run]
[46:12][Make the UV computation in ComputeVoxelIrradianceAt() fully branchless][:lighting]
[50:06][Note why _mm_extract_ps() is often not a good idea, as @mmozeiko pointed out[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:simd :speech]
[53:18][Finish making the UV computation in ComputeVoxelIrradianceAt() fully branchless, also noting to change Extract1() and Extract2() to use SHUF+CVTSS][:lighting]
[54:16][All of our comiss instructions are gone][:asm :lighting :run]
[54:46][Seek improvements to our ComputeVoxelIrradianceAt() vectorisation][:lighting :optimisation :simd :research]
[59:54][Remove the stupidity from our ComputeVoxelIrradianceAt() vectorisation[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :optimisation :simd]
[1:05:02][A few words on being aware of gotchas in poorly designed instruction sets][:isa :speech]
[1:06:45][Continue to remove the stupidity from our ComputeVoxelIrradianceAt() vectorisation, introducing an f32_4x version of SignOf()][:lighting :optimisation :simd]
[1:12:38][Hit a read access violation in ComputeVoxelIrradianceAt()][:lighting :optimisation :run :simd]
[1:13:38][Remove the problematic part of the BaseXYZ computation in ComputeVoxelIrradianceAt()][:lighting :optimisation :simd]
[1:13:50][We no longer hit that read access violation][:lighting :optimisation :run :simd]
[1:14:08][Scrutinise ComputeVoxelIrradianceAt() for bugs][:lighting :optimisation :research :simd]
[1:21:25][Make ComputeVoxelIrradianceAt() call GetOctahedralOffset() as originally][:lighting :optimisation :simd]
[1:22:41][Step into ComputeVoxelIrradianceAt() and compare the Txy and Check][:lighting :optimisation :run :simd]
[1:24:48][Weld GetOctahedralOffset() into ComputeVoxelIrradianceAt() to facilitate closer comparison][:lighting :optimisation :simd]
[1:27:47][Hit our assertion in ComputeVoxelIrradianceAt(), and compare the check and newly computed values][:lighting :optimisation :run :simd]
[1:29:49][Weld OctahedralFromUnitVector() into ComputeVoxelIrradianceAt() to facilitate comparison][:lighting :optimisation :simd]
[1:32:25][Hit our assertion in ComputeVoxelIrradianceAt(), and compare the check and newly computed values][:lighting :optimisation :run :simd]
[1:32:54][Fix ComputeVoxelIrradianceAt() to use AbsoluteValue() when computing the OneNorm][:lighting :mathematics :optimisation :simd]
[1:33:26][We run successfully][:lighting :optimisation :run :simd]
[1:33:35][Remove the checking code from ComputeVoxelIrradianceAt()][:lighting :optimisation :simd]
[1:33:51][Our ray casting :performance is improving][:lighting :optimisation :run :simd]
[1:35:38][hhlightprof total seconds elapsed: 5.244874][:lighting :performance :run :simd]
[1:37:33][Inspect the assembly of ComputeVoxelIrradianceAt()][:asm :lighting :run]
[1:38:26][Replace Extract0(), Extract1() and Extract2() with ConvertF32() and ConvertS32()][:simd]
[1:42:08][Our :lighting looks the same][:run]
[1:42:15][hhlightprof total seconds elapsed: 5.055217][:lighting :performance :run :simd]
[1:43:11][Change ComputeVoxelIrradianceAt() to return the f32_4x ResultRGB, for the callers to use directly][:lighting :optimisation :simd]
[1:46:13][Our :lighting looks the same][:run]
[1:46:18][hhlightprof total seconds elapsed: 4.963887][:lighting :performance :run :simd]
[1:47:14][Weld ComputeVoxelIrradianceAt() straight into GridRayCast(), to save computing values twice][:lighting :optimisation :simd]
[1:49:16][Our :lighting looks the same][:run]
[1:49:30][hhlightprof total seconds elapsed: 4.701094][:lighting :performance :run :simd]
[1:50:40][Seek further improvements to GridRayCast()][:lighting :optimisation :simd :research]
[1:54:19][Our :lighting looks the same][:run]
[1:54:22][Q&A][:speech]
[1:54:43][@golido3868][Q: Sorry to be off-topic. I've finished all the five days in the Intro to C and it was awesome. But there's a huge gap between the intro and the main course that I'm not able to fully understand. What's your suggestion? I'm new to programming, started reading K&R recently.[ref
    site="Star Code Galaxy"
    url=https://starcodegalaxy.com]]
[1:56:35][Plug Star Code Galaxy[ref
    site="Star Code Galaxy"
    url=https://starcodegalaxy.com]][:research]
[1:57:39][Close it down][:speech]
[/video]