From c74acbb031ee23955b6f78bc717d5162c39db72f Mon Sep 17 00:00:00 2001
From: Matt Mascarenhas <miblodelcarpio@gmail.com>
Date: Tue, 3 Mar 2020 02:05:17 +0000
Subject: [PATCH] Index hero/code582

---
 cmuratori/hero/code/code582.hmml | 100 +++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)
 create mode 100644 cmuratori/hero/code/code582.hmml

diff --git a/cmuratori/hero/code/code582.hmml b/cmuratori/hero/code/code582.hmml
new file mode 100644
index 0000000..8ba9961
--- /dev/null
+++ b/cmuratori/hero/code/code582.hmml
@@ -0,0 +1,100 @@
+[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Converting Specular Maps to Diffuse" vod_platform=youtube id=YyEvNfCgkJ0 annotator=Miblo]
+[0:03][Recap and set the stage for the day, debugging the :lighting stability over time, and efficiently :sampling diffuse surface reflections][:speech]
+[4:29][Diffuse :sampling efficiency: 1) Downsampling our map][:lighting :speech]
+[7:13][Diffuse :sampling efficiency: 2) Take multiple interior samples, e.g. 4×4][:lighting :speech]
+[8:48][Diffuse :sampling efficiency: 3) Downsampling, with pre-computed diffusion][:lighting :speech]
+[9:48][Full diffuse :sampling solution: Cosine-weight an 8×8 specular to 8×8 diffuse solution][:lighting :speech]
+[14:30][Remove OutputLightingPointsRecurse() and GetCurrentQuads(), and switch EndLightingComputation() to TEST_LIGHT_SPHERE][:lighting]
+[15:51][Demo our specular :lighting][:run]
+[16:46][Set up to convert our specular maps to diffuse][:lighting :research]
+[17:59][Switch EndLightingComputation() to TestCastFromProbes() and TEST_LIGHT_TRANSFER][:lighting]
+[19:10][See :lighting weirdness][:run]
+[20:27][Introduce TestLightSphere() to perform the TEST_LIGHT_SPHERE :lighting voxel writing code from EndLightingComputation()]
+[24:16][See our light sphere test working as before][:lighting :run]
+[24:31][Remove the TEST_LIGHT_SPHERE code from EndLightingComputation()][:lighting]
+[25:45][See our light sphere test continuing to work as before][:lighting :run]
+[25:53][Embark on our full-fat specular–diffuse conversion in EndLightingComputation()][:lighting :sampling]
+[27:54][Calculate our operations per map: 4096][:admin :lighting]
+[28:54][Set up EndLightingComputation() to sum up weighted samples][:lighting :sampling]
+[30:20][Big O notation, and separable filters][:performance :speech]
+[32:50][Introduce diffuse_weight_map for lighting_solution to contain and EndLightingComputation() to use][:"data structure" :lighting :sampling]
+[36:45][Introduce BuildDiffuseLightMaps() for InitLighting() to call, and DirectionFromTxTy() based on TestLightSphere()][:lighting :sampling]
+[41:52][See slow, but beautiful diffuse :lighting][:performance :run :sampling]
+[45:21][Toggle on the light map viewers in OpenGLEndFrame()][:"debug visualisation" :lighting]
+[45:46][Check out the light maps][:"debug visualisation" :lighting :run]
+[46:56][Set up to optimise the diffuse :lighting][:optimisation :research]
+[49:05][Compute 4-wide the diffuse light :sampling in EndLightingComputation()][:lighting :optimisation :simd]
+[53:36][Consider how to store our diffuse :lighting data for :SIMD computation][:"data structure" :research]
+[59:57][Store our diffuse :lighting data RRRR GGGG BBBB and swizzle it on output, introducing Transpose()][:simd]
+[1:03:35][Swizzling RRRR GGGG BBBB to RGBR GBRG BRGB][:blackboard :simd]
+[1:09:38][Consider how best to swizzle our diffuse :lighting data][:research :simd]
+[1:11:07][Load-Without-Broadcast][:blackboard :simd]
+[1:14:11][Trying _mm_shuffle_ps() or _mm_unpacklo_ps()[ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] to swizzle][:blackboard :research :simd]
+[1:26:00][Non-interleaved unpack using _mm_unpackhi_pd()[ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
+[1:30:41][Masked picking from two :SIMD values using _mm_blend_ps()[ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
+[1:32:43][Trying _mm_blend_ps() to swizzle][:blackboard :research :simd]
+[1:35:29][Trying our full RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:blackboard :research :simd]
+[1:41:19][All the possible unpacks][:blackboard :research :simd]
+[1:46:32][Constraints on Final Op][:blackboard :research :simd]
+[1:52:46][Introduce Transpose() from six Shuffle4x() operations, for RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:lighting :simd]
+[2:00:03][Define Shuffle4x() macro[ref
+    site=Intel
+    page="Intel Intrinsics Guide"
+    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :simd]
+[2:03:09][Fill Transpose() with known values for testing][:lighting :simd]
+[2:06:26][Switch diffuse_weight_map and BuildDiffuseLightMaps() to be :SIMD][:"data structure" :lighting]
+[2:08:42][See faster diffuse :lighting][:performance :run :sampling :simd]
+[2:09:16][Try to step in to Transpose(), but hit a read-access violation on DestD\[Tx4\] in EndLightingComputation()][:lighting :run :sampling :simd]
+[2:10:19][Force DestC and DestD to be stored unaligned using _mm_storeu_ps() in EndLightingComputation()][:lighting :simd]
+[2:11:56][Try to step in to Transpose()][:lighting :run :sampling :simd]
+[2:12:13][Make EndLightingComputation() call Transpose()][:lighting :simd]
+[2:12:28][Step in to Transpose() to see what it produces][:lighting :run :simd]
+[2:12:54][Fix Transpose() to set the Order as desired][:lighting :simd]
+[2:13:18][Step in to Transpose() to see that it swizzles incorrectly][:lighting :run :simd]
+[2:13:42][Fix Shuffle4x() to shift by 2 bytes][:lighting :simd]
+[2:14:39][Step in to Transpose() to see that it swizzles more sanely, but still incorrectly][:lighting :run :simd]
+[2:19:39][Check our 6-shuffle swizzle][:blackboard :research :simd]
+[2:20:13][Fix Transpose() to interleave every third lane][:lighting :simd]
+[2:21:47][Step in to Transpose() to see that it swizzles 100% correctly][:lighting :run :simd]
+[2:22:17][Add a VERIFY_SHUFFLE preprocessor path in Transpose()][:lighting :simd]
+[2:23:03][Admire our almost right :lighting][:run :simd]
+[2:24:26][Fix BuildDiffuseLightMaps() and diffuse_weight_map inspired by EndLightingComputation()][:"data structure" :lighting :simd]
+[2:31:22][Admire our closer to correct :lighting][:run :simd]
+[2:33:12][Q&A][:speech]
+[2:34:48][Transposes are symmetric][:simd :speech]
+[2:35:30][@vaualbus][Q: Why can't you do the unpack on the GPU side?][:simd]
+[2:36:47][@xxthebigfoxx][Q: Looks like in this article[ref
+    author="Stan Melax"
+    publisher="Intel Developer Zone"
+    title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)"
+    url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] they do it in six shuffles similarly to how you do it][:simd]
+[2:37:29][@squareysgames][Q: Could it be feasible to pack into a different GPU-supported format, maybe even compressed?][:simd]
+[2:38:28][@xxthebigfoxx][Q: They[ref
+    author="Stan Melax"
+    publisher="Intel Developer Zone"
+    title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)"
+    url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] have an example on 128 bits, second picture][:simd]
+[2:38:41][Consider the :performance of dependent shuffles, all on port 5[ref
+    site=uops.info
+    url=https://uops.info/table.html]][:research :simd]
+[2:42:56][@runamar][Q: Would you consider using the vector extension from clang if you would go clang only?][:simd]
+[2:43:22][@vaualbus][Q: Do all the work we did to generate the sphere samples and not use them anymore?][:lighting]
+[2:43:26][Show the sphere samples-based :lighting][:run]
+[2:45:04][@brian_nevec][Q: 15FPS? Ship it!][:performance]
+[2:45:37][@sc4llywag][Q: What's next on the TODO list after :lighting?]
+[2:46:04][@xxthebigfoxx][Q: Is there a reasonable :performance difference between aligned and unaligned move in SSE2?]
+[2:49:00][@jessem3y3r][Q: How do you reason about separable filters when, say, running along the Y axis for an image kernel might incur a cache miss on each sequential access (for non-block storage)?][:filtering]
+[2:50:05][@xxthebigfoxx][Q: But is there a reason to use MOVAPS instead of MOVUPS, then? Other than MOVAPS throws an exception if you are not aligned][:simd]
+[2:51:22][@runamar][Q: So how do we go back to 60FPS now?][:performance]
+[2:52:52][@maliusarth][Q: You said, after :lighting you'll start with level design etc. so how far from release is [~hero Handmade Hero]?]
+[2:53:52][Wrap it up with a glimpse into the future][:speech]
+[/video]