cinera_handmade.network/cmuratori/hero/code/code582.hmml

[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Converting Specular Maps to Diffuse" vod_platform=youtube id=YyEvNfCgkJ0 annotator=Miblo]
[0:03][Recap and set the stage for the day, debugging the :lighting stability over time, and efficiently :sampling diffuse surface reflections][:speech]
[4:29][Diffuse :sampling efficiency: 1) Downsampling our map][:lighting :speech]
[7:13][Diffuse :sampling efficiency: 2) Take multiple interior samples, e.g. 4×4][:lighting :speech]
[8:48][Diffuse :sampling efficiency: 3) Downsampling, with pre-computed diffusion][:lighting :speech]
[9:48][Full diffuse :sampling solution: Cosine-weight an 8×8 specular to 8×8 diffuse solution][:lighting :speech]
[14:30][Remove OutputLightingPointsRecurse() and GetCurrentQuads(), and switch EndLightingComputation() to TEST_LIGHT_SPHERE][:lighting]
[15:51][Demo our specular :lighting][:run]
[16:46][Set up to convert our specular maps to diffuse][:lighting :research]
[17:59][Switch EndLightingComputation() to TestCastFromProbes() and TEST_LIGHT_TRANSFER][:lighting]
[19:10][See :lighting weirdness][:run]
[20:27][Introduce TestLightSphere() to perform the TEST_LIGHT_SPHERE :lighting voxel writing code from EndLightingComputation()]
[24:16][See our light sphere test working as before][:lighting :run]
[24:31][Remove the TEST_LIGHT_SPHERE code from EndLightingComputation()][:lighting]
[25:45][See our light sphere test continuing to work as before][:lighting :run]
[25:53][Embark on our full-fat specular-to-diffuse conversion in EndLightingComputation()][:lighting :sampling]
[27:54][Calculate our operations per map: 4096][:admin :lighting]
[28:54][Set up EndLightingComputation() to sum up weighted samples][:lighting :sampling]
[30:20][Big O notation, and separable filters][:performance :speech]
[32:50][Introduce diffuse_weight_map for lighting_solution to contain and EndLightingComputation() to use][:"data structure" :lighting :sampling]
[36:45][Introduce BuildDiffuseLightMaps() for InitLighting() to call, and DirectionFromTxTy() based on TestLightSphere()][:lighting :sampling]
[41:52][See slow, but beautiful diffuse :lighting][:performance :run :sampling]
[45:21][Toggle on the light map viewers in OpenGLEndFrame()][:"debug visualisation" :lighting]
[45:46][Check out the light maps][:"debug visualisation" :lighting :run]
[46:56][Set up to optimise the diffuse :lighting][:optimisation :research]
[49:05][Compute 4-wide the diffuse light :sampling in EndLightingComputation()][:lighting :optimisation :simd]
[53:36][Consider how to store our diffuse :lighting data for :SIMD computation][:"data structure" :research]
[59:57][Store our diffuse :lighting data RRRR GGGG BBBB and swizzle it on output, introducing Transpose()][:simd]
[1:03:35][Swizzling RRRR GGGG BBBB to RGBR GBRG BRGB][:blackboard :simd]
[1:09:38][Consider how best to swizzle our diffuse :lighting data][:research :simd]
[1:11:07][Load-Without-Broadcast][:blackboard :simd]
[1:14:11][Trying _mm_shuffle_ps() or _mm_unpacklo_ps()[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] to swizzle][:blackboard :research :simd]
[1:26:00][Non-interleaved unpack using _mm_unpackhi_pd()[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
[1:30:41][Masked picking from two :SIMD values using _mm_blend_ps()[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
[1:32:43][Trying _mm_blend_ps() to swizzle][:blackboard :research :simd]
[1:35:29][Trying our full RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:blackboard :research :simd]
[1:41:19][All the possible unpacks][:blackboard :research :simd]
[1:46:32][Constraints on Final Op][:blackboard :research :simd]
[1:52:46][Introduce Transpose() from six Shuffle4x() operations, for RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:lighting :simd]
[2:00:03][Define Shuffle4x() macro[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :simd]
[2:03:09][Fill Transpose() with known values for testing][:lighting :simd]
[2:06:26][Switch diffuse_weight_map and BuildDiffuseLightMaps() to be :SIMD][:"data structure" :lighting]
[2:08:42][See faster diffuse :lighting][:performance :run :sampling :simd]
[2:09:16][Try to step in to Transpose(), but hit a read-access violation on DestD\[Tx4\] in EndLightingComputation()][:lighting :run :sampling :simd]
[2:10:19][Force DestC and DestD to be stored unaligned using _mm_storeu_ps() in EndLightingComputation()][:lighting :simd]
[2:11:56][Try to step in to Transpose()][:lighting :run :sampling :simd]
[2:12:13][Make EndLightingComputation() call Transpose()][:lighting :simd]
[2:12:28][Step in to Transpose() to see what it produces][:lighting :run :simd]
[2:12:54][Fix Transpose() to set the Order as desired][:lighting :simd]
[2:13:18][Step in to Transpose() to see that it swizzles incorrectly][:lighting :run :simd]
[2:13:42][Fix Shuffle4x() to shift by 2 bits][:lighting :simd]
[2:14:39][Step in to Transpose() to see that it swizzles more sanely, but still incorrectly][:lighting :run :simd]
[2:19:39][Check our 6-shuffle swizzle][:blackboard :research :simd]
[2:20:13][Fix Transpose() to interleave every third lane][:lighting :simd]
[2:21:47][Step in to Transpose() to see that it swizzles 100% correctly][:lighting :run :simd]
[2:22:17][Add a VERIFY_SHUFFLE preprocessor path in Transpose()][:lighting :simd]
[2:23:03][Admire our almost right :lighting][:run :simd]
[2:24:26][Fix BuildDiffuseLightMaps() and diffuse_weight_map inspired by EndLightingComputation()][:"data structure" :lighting :simd]
[2:31:22][Admire our closer to correct :lighting][:run :simd]
[2:33:12][Q&A][:speech]
[2:34:48][Transposes are symmetric][:simd :speech]
[2:35:30][@vaualbus][Q: Why can't you do the unpack on the GPU side?][:simd]
[2:36:47][@xxthebigfoxx][Q: Looks like in this article[ref
author="Stan Melax"
publisher="Intel Developer Zone"
title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)"
url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] they do it in six shuffles similarly to how you do it][:simd]
[2:37:29][@squareysgames][Q: Could it be feasible to pack into a different GPU-supported format, maybe even compressed?][:simd]
[2:38:28][@xxthebigfoxx][Q: They[ref
author="Stan Melax"
publisher="Intel Developer Zone"
title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)"
url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] have an example on 128 bits, second picture][:simd]
[2:38:41][Consider the :performance of dependent shuffles, all on port 5[ref
site=uops.info
url=https://uops.info/table.html]][:research :simd]
[2:42:56][@runamar][Q: Would you consider using the vector extension from clang if you would go clang only?][:simd]
[2:43:22][@vaualbus][Q: Do all the work we did to generate the sphere samples and not use them anymore?][:lighting]
[2:43:26][Show the sphere samples-based :lighting][:run]
[2:45:04][@brian_nevec][Q: 15FPS? Ship it!][:performance]
[2:45:37][@sc4llywag][Q: What's next on the TODO list after :lighting?]
[2:46:04][@xxthebigfoxx][Q: Is there a reasonable :performance difference between aligned and unaligned move in SSE2?]
[2:49:00][@jessem3y3r][Q: How do you reason about separable filters when, say, running along the Y axis for an image kernel might incur a cache miss on each sequential access (for non-block storage)?][:filtering]
[2:50:05][@xxthebigfoxx][Q: But is there a reason to use MOVAPS instead of MOVUPS, then? Other than MOVAPS throws an exception if you are not aligned][:simd]
[2:51:22][@runamar][Q: So how do we go back to 60FPS now?][:performance]
[2:52:52][@maliusarth][Q: You said, after :lighting you'll start with level design etc. so how far from release is [~hero Handmade Hero]?]
[2:53:52][Wrap it up with a glimpse into the future][:speech]
[/video]
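
The RRRR GGGG BBBB to RGBR GBRG BRGB swizzle discussed from 1:03:35 onwards can be illustrated with a small scalar model. The sketch below is not the episode's Transpose() or Shuffle4x(); it follows the six-shuffle pattern described in the Intel article cited in the Q&A, and it models the two-operand behaviour of _mm_shuffle_ps() in plain C so that it runs without SSE. The names lane4, SHUFFLE_IMM, Shuffle and TransposeRGB are hypothetical, introduced only for this illustration.

```c
/* Portable stand-in for a 4-lane SSE register (hypothetical type). */
typedef struct { float E[4]; } lane4;

/* Pack four 2-bit lane selectors into one immediate, using the layout
   _mm_shuffle_ps() expects: selector N occupies bits 2*N and 2*N+1. */
#define SHUFFLE_IMM(S0, S1, S2, S3) \
    ((S0) | ((S1) << 2) | ((S2) << 4) | ((S3) << 6))

/* Scalar model of _mm_shuffle_ps(A, B, Imm): the two low result lanes
   are picked from A, the two high result lanes are picked from B. */
static lane4 Shuffle(lane4 A, lane4 B, int Imm)
{
    lane4 Result;
    Result.E[0] = A.E[(Imm >> 0) & 3];
    Result.E[1] = A.E[(Imm >> 2) & 3];
    Result.E[2] = B.E[(Imm >> 4) & 3];
    Result.E[3] = B.E[(Imm >> 6) & 3];
    return Result;
}

/* Six shuffles: three to pair up the channels, three to interleave them.
   Input:  R0 R1 R2 R3 | G0 G1 G2 G3 | B0 B1 B2 B3
   Output: R0 G0 B0 R1 | G1 B1 R2 G2 | B2 R3 G3 B3 */
static void TransposeRGB(lane4 R, lane4 G, lane4 B, lane4 Out[3])
{
    lane4 RG = Shuffle(R, G, SHUFFLE_IMM(0, 2, 0, 2)); /* R0 R2 G0 G2 */
    lane4 GB = Shuffle(G, B, SHUFFLE_IMM(1, 3, 1, 3)); /* G1 G3 B1 B3 */
    lane4 BR = Shuffle(B, R, SHUFFLE_IMM(0, 2, 1, 3)); /* B0 B2 R1 R3 */

    Out[0] = Shuffle(RG, BR, SHUFFLE_IMM(0, 2, 0, 2)); /* R0 G0 B0 R1 */
    Out[1] = Shuffle(GB, RG, SHUFFLE_IMM(0, 2, 1, 3)); /* G1 B1 R2 G2 */
    Out[2] = Shuffle(BR, GB, SHUFFLE_IMM(1, 3, 1, 3)); /* B2 R3 G3 B3 */
}
```

Filling the inputs with recognisable values (for example 10..13 for R, 20..23 for G, 30..33 for B), as done at 2:03:09, makes it easy to check lane by lane that the three outputs, read back to back, hold four consecutive RGB triples. Every output depends on two of the first-stage shuffles, which is the dependency chain considered at 2:38:41.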