[video output=day582 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Converting Specular Maps to Diffuse" vod_platform=youtube id=YyEvNfCgkJ0 annotator=Miblo]
[0:03][Recap and set the stage for the day, debugging the :lighting stability over time, and efficiently :sampling diffuse surface reflections][:speech]
[4:29][Diffuse :sampling efficiency: 1) Downsampling our map][:lighting :speech]
[7:13][Diffuse :sampling efficiency: 2) Take multiple interior samples, e.g. 4×4][:lighting :speech]
[8:48][Diffuse :sampling efficiency: 3) Downsampling, with pre-computed diffusion][:lighting :speech]
[9:48][Full diffuse :sampling solution: Cosine-weight an 8×8 specular to 8×8 diffuse solution][:lighting :speech]
[14:30][Remove OutputLightingPointsRecurse() and GetCurrentQuads(), and switch EndLightingComputation() to TEST_LIGHT_SPHERE][:lighting]
[15:51][Demo our specular :lighting][:run]
[16:46][Set up to convert our specular maps to diffuse][:lighting :research]
[17:59][Switch EndLightingComputation() to TestCastFromProbes() and TEST_LIGHT_TRANSFER][:lighting]
[19:10][See :lighting weirdness][:run]
[20:27][Introduce TestLightSphere() to perform the TEST_LIGHT_SPHERE :lighting voxel writing code from EndLightingComputation()]
[24:16][See our light sphere test working as before][:lighting :run]
[24:31][Remove the TEST_LIGHT_SPHERE code from EndLightingComputation()][:lighting]
[25:45][See our light sphere test continuing to work as before][:lighting :run]
[25:53][Embark on our full-fat specular–diffuse conversion in EndLightingComputation()][:lighting :sampling]
[27:54][Calculate our operations per map: 4096][:admin :lighting]
[28:54][Set up EndLightingComputation() to sum up weighted samples][:lighting :sampling]
[30:20][Big O notation, and separable filters][:performance :speech]
[32:50][Introduce diffuse_weight_map for lighting_solution to contain and EndLightingComputation() to use][:"data structure" :lighting :sampling]
[36:45][Introduce BuildDiffuseLightMaps() for InitLighting() to call, and DirectionFromTxTy() based on TestLightSphere()][:lighting :sampling]
[41:52][See slow, but beautiful diffuse :lighting][:performance :run :sampling]
[45:21][Toggle on the light map viewers in OpenGLEndFrame()][:"debug visualisation" :lighting]
[45:46][Check out the light maps][:"debug visualisation" :lighting :run]
[46:56][Set up to optimise the diffuse :lighting][:optimisation :research]
[49:05][Compute 4-wide the diffuse light :sampling in EndLightingComputation()][:lighting :optimisation :simd]
[53:36][Consider how to store our diffuse :lighting data for :SIMD computation][:"data structure" :research]
[59:57][Store our diffuse :lighting data RRRR GGGG BBBB and swizzle it on output, introducing Transpose()][:simd]
[1:03:35][Swizzling RRRR GGGG BBBB to RGBR GBRG BRGB][:blackboard :simd]
[1:09:38][Consider how best to swizzle our diffuse :lighting data][:research :simd]
[1:11:07][Load-Without-Broadcast][:blackboard :simd]
[1:14:11][Trying _mm_shuffle_ps() or _mm_unpacklo_ps()[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] to swizzle][:blackboard :research :simd]
[1:26:00][Non-interleaved unpack using _mm_unpackhi_pd()[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
[1:30:41][Masked picking from two :SIMD values using _mm_blend_ps()[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
[1:32:43][Trying _mm_blend_ps() to swizzle][:blackboard :research :simd]
[1:35:29][Trying our full RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:blackboard :research :simd]
[1:41:19][All the possible unpacks][:blackboard :research :simd]
[1:46:32][Constraints on Final Op][:blackboard :research :simd]
[1:52:46][Introduce Transpose() from six Shuffle4x() operations, for RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:lighting :simd]
[2:00:03][Define Shuffle4x() macro[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :simd]
[2:03:09][Fill Transpose() with known values for testing][:lighting :simd]
[2:06:26][Switch diffuse_weight_map and BuildDiffuseLightMaps() to be :SIMD][:"data structure" :lighting]
[2:08:42][See faster diffuse :lighting][:performance :run :sampling :simd]
[2:09:16][Try to step in to Transpose(), but hit a read-access violation on DestD\[Tx4\] in EndLightingComputation()][:lighting :run :sampling :simd]
[2:10:19][Force DestC and DestD to be stored unaligned using _mm_storeu_ps() in EndLightingComputation()][:lighting :simd]
[2:11:56][Try to step in to Transpose()][:lighting :run :sampling :simd]
[2:12:13][Make EndLightingComputation() call Transpose()][:lighting :simd]
[2:12:28][Step in to Transpose() to see what it produces][:lighting :run :simd]
[2:12:54][Fix Transpose() to set the Order as desired][:lighting :simd]
[2:13:18][Step in to Transpose() to see that it swizzles incorrectly][:lighting :run :simd]
[2:13:42][Fix Shuffle4x() to shift by 2 bits][:lighting :simd]
[2:14:39][Step in to Transpose() to see that it swizzles more sanely, but still incorrectly][:lighting :run :simd]
[2:19:39][Check our 6-shuffle swizzle][:blackboard :research :simd]
[2:20:13][Fix Transpose() to interleave every third lane][:lighting :simd]
[2:21:47][Step in to Transpose() to see that it swizzles 100% correctly][:lighting :run :simd]
[2:22:17][Add a VERIFY_SHUFFLE preprocessor path in Transpose()][:lighting :simd]
[2:23:03][Admire our almost right :lighting][:run :simd]
[2:24:26][Fix BuildDiffuseLightMaps() and diffuse_weight_map inspired by EndLightingComputation()][:"data structure" :lighting :simd]
[2:31:22][Admire our closer to correct :lighting][:run :simd]
[2:33:12][Q&A][:speech]
[2:34:48][Transposes are symmetric][:simd :speech]
[2:35:30][@vaualbus][Q: Why can't you do the unpack on the GPU side?][:simd]
[2:36:47][@xxthebigfoxx][Q: Looks like in this article[ref author="Stan Melax" publisher="Intel Developer Zone" title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)" url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] they do it in six shuffles similarly to how you do it][:simd]
[2:37:29][@squareysgames][Q: Could it be feasible to pack into a different GPU-supported format, maybe even compressed?][:simd]
[2:38:28][@xxthebigfoxx][Q: They[ref author="Stan Melax" publisher="Intel Developer Zone" title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)" url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] have an example on 128 bits, second picture][:simd]
[2:38:41][Consider the :performance of dependent shuffles, all on port 5[ref site=uops.info url=https://uops.info/table.html]][:research :simd]
[2:42:56][@runamar][Q: Would you consider using the vector extension from clang if you would go clang only?][:simd]
[2:43:22][@vaualbus][Q: Do all the work we did to generate the sphere samples and not use them anymore?][:lighting]
[2:43:26][Show the sphere samples-based :lighting][:run]
[2:45:04][@brian_nevec][Q: 15FPS? Ship it!][:performance]
[2:45:37][@sc4llywag][Q: What's next on the TODO list after :lighting?]
[2:46:04][@xxthebigfoxx][Q: Is there a reasonable :performance difference between aligned and unaligned move in SSE2?]
[2:49:00][@jessem3y3r][Q: How do you reason about separable filters when, say, running along the Y axis for an image kernel might incur a cache miss on each sequential access (for non-block storage)?][:filtering]
[2:50:05][@xxthebigfoxx][Q: But is there a reason to use MOVAPS instead of MOVUPS, then? Other than MOVAPS throws an exception if you are not aligned][:simd]
[2:51:22][@runamar][Q: So how do we go back to 60FPS now?][:performance]
[2:52:52][@maliusarth][Q: You said, after :lighting you'll start with level design etc. so how far from release is [~hero Handmade Hero]?]
[2:53:52][Wrap it up with a glimpse into the future][:speech]
[/video]