[video output=day582 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Converting Specular Maps to Diffuse" vod_platform=youtube id=YyEvNfCgkJ0 annotator=Miblo]
[0:03][Recap and set the stage for the day, debugging the :lighting stability over time, and efficiently :sampling diffuse surface reflections][:speech]
[4:29][Diffuse :sampling efficiency: 1) Downsampling our map][:lighting :speech]
[7:13][Diffuse :sampling efficiency: 2) Take multiple interior samples, e.g. 4×4][:lighting :speech]
[8:48][Diffuse :sampling efficiency: 3) Downsampling, with pre-computed diffusion][:lighting :speech]
[9:48][Full diffuse :sampling solution: Cosine-weight an 8×8 specular map into an 8×8 diffuse map][:lighting :speech]
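A minimal scalar sketch of the full gather described above. It assumes each texel has a precomputed unit direction (the Dirs array, hypothetical here) and normalises by the total cosine weight rather than by texel solid angle. With an 8×8 map, Count is 64. All names are illustrative, not the Handmade Hero code.

```c
/* Hypothetical 3-vector type for this sketch. */
typedef struct { float X, Y, Z; } v3;

static float Inner(v3 A, v3 B)
{
    return A.X*B.X + A.Y*B.Y + A.Z*B.Z;
}

/* For every output (diffuse) texel, sum every input (specular) texel,
   weighted by the clamped cosine between the two texel directions.
   Assumption: Dirs[i] is the unit direction of texel i in both maps. */
static void SpecularToDiffuse(int Count, const v3 *Dirs,
                              const float *Specular, float *Diffuse)
{
    for (int Out = 0; Out < Count; ++Out)
    {
        float Sum = 0.0f;
        float WeightSum = 0.0f;
        for (int In = 0; In < Count; ++In)
        {
            float CosTheta = Inner(Dirs[Out], Dirs[In]);
            if (CosTheta > 0.0f) /* directions behind the surface contribute nothing */
            {
                Sum += CosTheta * Specular[In];
                WeightSum += CosTheta;
            }
        }
        /* Normalise so that a constant input map yields the same constant. */
        Diffuse[Out] = (WeightSum > 0.0f) ? (Sum / WeightSum) : 0.0f;
    }
}
```

The nested loop over all texel pairs is the part that the later entries count, time, and vectorise.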
[14:30][Remove OutputLightingPointsRecurse() and GetCurrentQuads(), and switch EndLightingComputation() to TEST_LIGHT_SPHERE][:lighting]
[15:51][Demo our specular :lighting][:run]
[16:46][Set up to convert our specular maps to diffuse][:lighting :research]
[17:59][Switch EndLightingComputation() to TestCastFromProbes() and TEST_LIGHT_TRANSFER][:lighting]
[19:10][See :lighting weirdness][:run]
[20:27][Introduce TestLightSphere() to contain the TEST_LIGHT_SPHERE :lighting voxel-writing code from EndLightingComputation()]
[24:16][See our light sphere test working as before][:lighting :run]
[24:31][Remove the TEST_LIGHT_SPHERE code from EndLightingComputation()][:lighting]
[25:45][See our light sphere test continuing to work as before][:lighting :run]
[25:53][Embark on our full-fat specular–diffuse conversion in EndLightingComputation()][:lighting :sampling]
[27:54][Calculate our operations per map: 4096][:admin :lighting]
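Assuming the full gather from the 9:48 entry, where each of the 8×8 diffuse texels reads every one of the 8×8 specular texels once, the count works out as:

$$
\underbrace{8 \times 8}_{\text{diffuse texels}} \times \underbrace{8 \times 8}_{\text{specular texels read per output}} = 64 \times 64 = 4096
$$

The cost grows with the square of the texel count: a 16×16 map under the same scheme would need 256 × 256 = 65536.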
[28:54][Set up EndLightingComputation() to sum up weighted samples][:lighting :sampling]
[30:20][Big O notation and separable filters][:performance :speech]
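A generic illustration of separability, not the lighting code itself. A 3×3 kernel that is the outer product of a 1D kernel K can be applied as a horizontal pass followed by a vertical pass: 3 + 3 taps per pixel instead of 3 × 3, or 2N instead of N² for an N-tap kernel. The 4×4 image size, the 1/4, 1/2, 1/4 kernel, and clamp-to-edge borders are arbitrary choices for this sketch.

```c
#define IMG_W 4
#define IMG_H 4

/* 1D kernel; the equivalent 2D kernel is K[y] * K[x]. */
static const float K[3] = {0.25f, 0.5f, 0.25f};

static int ClampIndex(int V, int Max)
{
    return (V < 0) ? 0 : ((V > Max) ? Max : V);
}

/* Direct 2D convolution: 9 taps per pixel. */
static void FilterDirect(const float *In, float *Out)
{
    for (int Y = 0; Y < IMG_H; ++Y)
        for (int X = 0; X < IMG_W; ++X)
        {
            float Sum = 0.0f;
            for (int DY = -1; DY <= 1; ++DY)
                for (int DX = -1; DX <= 1; ++DX)
                {
                    int SY = ClampIndex(Y + DY, IMG_H - 1);
                    int SX = ClampIndex(X + DX, IMG_W - 1);
                    Sum += K[DY + 1] * K[DX + 1] * In[SY*IMG_W + SX];
                }
            Out[Y*IMG_W + X] = Sum;
        }
}

/* Separable version: 3 taps along X into Temp, then 3 taps along Y. */
static void FilterSeparable(const float *In, float *Out)
{
    float Temp[IMG_W*IMG_H];
    for (int Y = 0; Y < IMG_H; ++Y)
        for (int X = 0; X < IMG_W; ++X)
        {
            float Sum = 0.0f;
            for (int D = -1; D <= 1; ++D)
                Sum += K[D + 1] * In[Y*IMG_W + ClampIndex(X + D, IMG_W - 1)];
            Temp[Y*IMG_W + X] = Sum;
        }
    for (int Y = 0; Y < IMG_H; ++Y)
        for (int X = 0; X < IMG_W; ++X)
        {
            float Sum = 0.0f;
            for (int D = -1; D <= 1; ++D)
                Sum += K[D + 1] * Temp[ClampIndex(Y + D, IMG_H - 1)*IMG_W + X];
            Out[Y*IMG_W + X] = Sum;
        }
}
```

The vertical pass walks memory with a stride of IMG_W floats, which is the access pattern raised in the 2:49:00 question.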
[32:50][Introduce diffuse_weight_map for lighting_solution to contain and EndLightingComputation() to use][:"data structure" :lighting :sampling]
[36:45][Introduce BuildDiffuseLightMaps() for InitLighting() to call, and DirectionFromTxTy() based on TestLightSphere()][:lighting :sampling]
[41:52][See slow but beautiful diffuse :lighting][:performance :run :sampling]
[45:21][Toggle on the light map viewers in OpenGLEndFrame()][:"debug visualisation" :lighting]
[45:46][Check out the light maps][:"debug visualisation" :lighting :run]
[46:56][Set up to optimise the diffuse :lighting][:optimisation :research]
[49:05][Compute the diffuse light :sampling 4-wide in EndLightingComputation()][:lighting :optimisation :simd]
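One way to go 4-wide, sketched with SSE intrinsics: process four output texels at once by broadcasting each input sample to all four lanes and multiplying it by that input's weights for the four outputs. The weight layout (four consecutive floats per input texel) and the function name are assumptions for this sketch, not the layout chosen on stream.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_set1_ps, _mm_mul_ps, _mm_add_ps */

/* Weights4[4*In + Lane] = weight of input texel In for output texel (First + Lane).
   Returns the four weighted sums in one register. */
static __m128 GatherFourOutputs(const float *Weights4, const float *Input, int InputCount)
{
    __m128 Sum = _mm_setzero_ps();
    for (int In = 0; In < InputCount; ++In)
    {
        __m128 W = _mm_loadu_ps(Weights4 + 4*In); /* weights for four outputs */
        __m128 L = _mm_set1_ps(Input[In]);        /* same input in all lanes */
        Sum = _mm_add_ps(Sum, _mm_mul_ps(W, L));
    }
    return Sum;
}
```

The result comes out as four sums of one channel side by side, which is why the following entries deal with how to store and swizzle the data.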
[53:36][Consider how to store our diffuse :lighting data for :SIMD computation][:"data structure" :research]
[59:57][Store our diffuse :lighting data RRRR GGGG BBBB and swizzle it on output, introducing Transpose()][:simd]
[1:03:35][Swizzling RRRR GGGG BBBB to RGBR GBRG BRGB][:blackboard :simd]
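In scalar terms, the swizzle interleaves three planar channels into packed RGB triples. The three output registers are just the 12 floats of four RGB texels cut into groups of four. A plain-C reference like this (hypothetical name) is useful for checking a SIMD version against:

```c
/* Scalar reference: planar R, G, B -> packed RGB triples.
   Out[0..3] = r0 g0 b0 r1, Out[4..7] = g1 b1 r2 g2, Out[8..11] = b2 r3 g3 b3. */
static void InterleaveRGB4(const float *R, const float *G, const float *B, float *Out)
{
    for (int I = 0; I < 4; ++I)
    {
        Out[3*I + 0] = R[I];
        Out[3*I + 1] = G[I];
        Out[3*I + 2] = B[I];
    }
}
```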
[1:09:38][Consider how best to swizzle our diffuse :lighting data][:research :simd]
[1:11:07][Load-Without-Broadcast][:blackboard :simd]
[1:14:11][Trying _mm_shuffle_ps() or _mm_unpacklo_ps()[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] to swizzle][:blackboard :research :simd]
[1:26:00][Non-interleaved unpack using _mm_unpackhi_pd()[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
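For reference, the lane patterns of the two unpack families on A = a0 a1 a2 a3 and B = b0 b1 b2 b3. The _pd forms treat each pair of floats as one 64-bit lane, so casting to double and back moves floats two at a time instead of interleaving them one by one. The demo function is hypothetical; the intrinsics are standard SSE/SSE2.

```c
#include <emmintrin.h> /* SSE2: _mm_unpacklo_pd, _mm_unpackhi_pd, _mm_castps_pd, _mm_castpd_ps */

static void UnpackDemo(float Out[4][4])
{
    __m128 A = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f);
    __m128 B = _mm_setr_ps(4.0f, 5.0f, 6.0f, 7.0f);
    __m128d AD = _mm_castps_pd(A); /* reinterpret bits, no conversion */
    __m128d BD = _mm_castps_pd(B);

    _mm_storeu_ps(Out[0], _mm_unpacklo_ps(A, B));                  /* a0 b0 a1 b1 */
    _mm_storeu_ps(Out[1], _mm_unpackhi_ps(A, B));                  /* a2 b2 a3 b3 */
    _mm_storeu_ps(Out[2], _mm_castpd_ps(_mm_unpacklo_pd(AD, BD))); /* a0 a1 b0 b1 */
    _mm_storeu_ps(Out[3], _mm_castpd_ps(_mm_unpackhi_pd(AD, BD))); /* a2 a3 b2 b3 */
}
```

The low _pd unpack gives the same result as _mm_movelh_ps(A, B).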
[1:30:41][Masked picking from two :SIMD values using _mm_blend_ps()[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
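A small sketch of _mm_blend_ps (SSE4.1): bit i of the immediate picks lane i from the second operand when set and from the first operand when clear. The target attribute is a GCC/Clang extension that lets this compile without a global -msse4.1 flag, and the code assumes the CPU supports SSE4.1. The function name is hypothetical.

```c
#include <smmintrin.h> /* SSE4.1: _mm_blend_ps */

/* 0x5 = binary 0101: lanes 0 and 2 come from B, lanes 1 and 3 from A. */
static __attribute__((target("sse4.1"))) __m128 BlendEvenLanes(__m128 A, __m128 B)
{
    return _mm_blend_ps(A, B, 0x5);
}
```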
[1:32:43][Trying _mm_blend_ps() to swizzle][:blackboard :research :simd]
[1:35:29][Trying our full RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:blackboard :research :simd]
[1:41:19][All the possible unpacks][:blackboard :research :simd]
[1:46:32][Constraints on Final Op][:blackboard :research :simd]
[1:52:46][Introduce Transpose(), built from six Shuffle4x() operations, for the RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:lighting :simd]
[2:00:03][Define Shuffle4x() macro[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :simd]
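One 6-shuffle arrangement for the RRRR GGGG BBBB to RGBR GBRG BRGB swizzle, worked out independently for illustration; it is not necessarily the exact sequence used on stream. SHUFFLE4X is a hypothetical stand-in for Shuffle4x(): it wraps _mm_shuffle_ps, taking lanes 0 and 1 from its first operand and lanes 2 and 3 from its second, with each lane selector occupying a 2-bit field of the immediate. Three shuffles gather pairs that belong together; three more put them in place.

```c
#include <xmmintrin.h> /* SSE: _mm_shuffle_ps */

/* Result = X[X0], X[X1], Y[Y0], Y[Y1]; each selector is a 2-bit field. */
#define SHUFFLE4X(X, Y, X0, X1, Y0, Y1) \
    _mm_shuffle_ps((X), (Y), ((X0) << 0) | ((X1) << 2) | ((Y0) << 4) | ((Y1) << 6))

/* RRRR GGGG BBBB -> RGBR GBRG BRGB in six shuffles. */
static void TransposeRGB(__m128 R, __m128 G, __m128 B, __m128 Out[3])
{
    __m128 T0 = SHUFFLE4X(R, G, 0, 2, 0, 2);  /* r0 r2 g0 g2 */
    __m128 T1 = SHUFFLE4X(B, R, 0, 2, 1, 3);  /* b0 b2 r1 r3 */
    __m128 T2 = SHUFFLE4X(G, B, 1, 3, 1, 3);  /* g1 g3 b1 b3 */

    Out[0] = SHUFFLE4X(T0, T1, 0, 2, 0, 2);   /* r0 g0 b0 r1 */
    Out[1] = SHUFFLE4X(T2, T0, 0, 2, 1, 3);   /* g1 b1 r2 g2 */
    Out[2] = SHUFFLE4X(T1, T2, 1, 3, 1, 3);   /* b2 r3 g3 b3 */
}
```

The first three shuffles depend only on the inputs and the last three only on T0 to T2, so the dependency chain is two shuffles deep even though there are six in total.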
[2:03:09][Fill Transpose() with known values for testing][:lighting :simd]
[2:06:26][Switch diffuse_weight_map and BuildDiffuseLightMaps() to be :SIMD][:"data structure" :lighting]
[2:08:42][See faster diffuse :lighting][:performance :run :sampling :simd]
[2:09:16][Try to step into Transpose(), but hit a read-access violation on DestD\[Tx4\] in EndLightingComputation()][:lighting :run :sampling :simd]
[2:10:19][Force DestC and DestD to be stored unaligned using _mm_storeu_ps() in EndLightingComputation()][:lighting :simd]
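The distinction this change relies on: _mm_store_ps is an aligned store (MOVAPS) that may fault when the address is not 16-byte aligned, whereas _mm_storeu_ps (MOVUPS) accepts any address. A minimal sketch; the helper name is hypothetical.

```c
#include <xmmintrin.h> /* SSE: _mm_loadu_ps, _mm_storeu_ps */

/* Copies four floats between addresses with no alignment guarantee.
   Using _mm_store_ps(Dest, V) here could fault whenever Dest is not 16-byte aligned. */
static void Copy4Unaligned(float *Dest, const float *Src)
{
    __m128 V = _mm_loadu_ps(Src);  /* unaligned load */
    _mm_storeu_ps(Dest, V);        /* unaligned store */
}
```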
[2:11:56][Try to step into Transpose()][:lighting :run :sampling :simd]
[2:12:13][Make EndLightingComputation() call Transpose()][:lighting :simd]
[2:12:28][Step into Transpose() to see what it produces][:lighting :run :simd]
[2:12:54][Fix Transpose() to set the Order as desired][:lighting :simd]
[2:13:18][Step into Transpose() to see that it swizzles incorrectly][:lighting :run :simd]
[2:13:42][Fix Shuffle4x() to shift by 2 bits][:lighting :simd]
[2:14:39][Step into Transpose() to see that it swizzles more sanely, but still incorrectly][:lighting :run :simd]
[2:19:39][Check our 6-shuffle swizzle][:blackboard :research :simd]
[2:20:13][Fix Transpose() to interleave every third lane][:lighting :simd]
[2:21:47][Step into Transpose() to see that it swizzles 100% correctly][:lighting :run :simd]
[2:22:17][Add a VERIFY_SHUFFLE preprocessor path in Transpose()][:lighting :simd]
[2:23:03][Admire our almost-right :lighting][:run :simd]
[2:24:26][Fix BuildDiffuseLightMaps() and diffuse_weight_map, inspired by EndLightingComputation()][:"data structure" :lighting :simd]
[2:31:22][Admire our closer-to-correct :lighting][:run :simd]
[2:33:12][Q&A][:speech]
[2:34:48][Transposes are symmetric][:simd :speech]
[2:35:30][@vaualbus][Q: Why can't you do the unpack on the GPU side?][:simd]
[2:36:47][@xxthebigfoxx][Q: Looks like in this article[ref
    author="Stan Melax"
    publisher="Intel Developer Zone"
    title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)"
    url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] they do it in six shuffles similarly to how you do it][:simd]
[2:37:29][@squareysgames][Q: Could it be feasible to pack into a different GPU-supported format, maybe even compressed?][:simd]
[2:38:28][@xxthebigfoxx][Q: They[ref
    author="Stan Melax"
    publisher="Intel Developer Zone"
    title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)"
    url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] have an example on 128 bits, second picture][:simd]
[2:38:41][Consider the :performance of dependent shuffles, all on port 5[ref
    site=uops.info
    url=https://uops.info/table.html]][:research :simd]
[2:42:56][@runamar][Q: Would you consider using the vector extension from clang if you would go clang only?][:simd]
[2:43:22][@vaualbus][Q: Do all the work we did to generate the sphere samples and not use them anymore?][:lighting]
[2:43:26][Show the sphere samples-based :lighting][:run]
[2:45:04][@brian_nevec][Q: 15FPS? Ship it!][:performance]
[2:45:37][@sc4llywag][Q: What's next on the TODO list after :lighting?]
[2:46:04][@xxthebigfoxx][Q: Is there a reasonable :performance difference between aligned and unaligned move in SSE2?]
[2:49:00][@jessem3y3r][Q: How do you reason about separable filters when, say, running along the Y axis for an image kernel might incur a cache miss on each sequential access (for non-block storage)?][:filtering]
[2:50:05][@xxthebigfoxx][Q: But is there a reason to use MOVAPS instead of MOVUPS, then? Other than MOVAPS throws an exception if you are not aligned][:simd]
[2:51:22][@runamar][Q: So how do we go back to 60FPS now?][:performance]
[2:52:52][@maliusarth][Q: You said, after :lighting you'll start with level design etc. so how far from release is [~hero Handmade Hero]?]
[2:53:52][Wrap it up with a glimpse into the future][:speech]
[/video]