From c74acbb031ee23955b6f78bc717d5162c39db72f Mon Sep 17 00:00:00 2001
From: Matt Mascarenhas
Date: Tue, 3 Mar 2020 02:05:17 +0000
Subject: [PATCH] Index hero/code582

---
 cmuratori/hero/code/code582.hmml | 100 +++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)
 create mode 100644 cmuratori/hero/code/code582.hmml

diff --git a/cmuratori/hero/code/code582.hmml b/cmuratori/hero/code/code582.hmml
new file mode 100644
index 0000000..8ba9961
--- /dev/null
+++ b/cmuratori/hero/code/code582.hmml
@@ -0,0 +1,100 @@
+[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Converting Specular Maps to Diffuse" vod_platform=youtube id=YyEvNfCgkJ0 annotator=Miblo]
+[0:03][Recap and set the stage for the day, debugging the :lighting stability over time, and efficiently :sampling diffuse surface reflections][:speech]
+[4:29][Diffuse :sampling efficiency: 1) Downsampling our map][:lighting :speech]
+[7:13][Diffuse :sampling efficiency: 2) Take multiple interior samples, e.g. 4×4][:lighting :speech]
+[8:48][Diffuse :sampling efficiency: 3) Downsampling, with pre-computed diffusion][:lighting :speech]
+[9:48][Full diffuse :sampling solution: Cosine-weight an 8×8 specular to 8×8 diffuse solution][:lighting :speech]
+[14:30][Remove OutputLightingPointsRecurse() and GetCurrentQuads(), and switch EndLightingComputation() to TEST_LIGHT_SPHERE][:lighting]
+[15:51][Demo our specular :lighting][:run]
+[16:46][Set up to convert our specular maps to diffuse][:lighting :research]
+[17:59][Switch EndLightingComputation() to TestCastFromProbes() and TEST_LIGHT_TRANSFER][:lighting]
+[19:10][See :lighting weirdness][:run]
+[20:27][Introduce TestLightSphere() to perform the TEST_LIGHT_SPHERE :lighting voxel writing code from EndLightingComputation()]
+[24:16][See our light sphere test working as before][:lighting :run]
+[24:31][Remove the TEST_LIGHT_SPHERE code from EndLightingComputation()][:lighting]
+[25:45][See our light sphere test continuing to work as before][:lighting :run]
+[25:53][Embark on our full-fat specular–diffuse conversion in EndLightingComputation()][:lighting :sampling]
+[27:54][Calculate our operations per map: 4096][:admin :lighting]
+[28:54][Set up EndLightingComputation() to sum up weighted samples][:lighting :sampling]
+[30:20][Big O notation, and separable filters][:performance :speech]
+[32:50][Introduce diffuse_weight_map for lighting_solution to contain and EndLightingComputation() to use][:"data structure" :lighting :sampling]
+[36:45][Introduce BuildDiffuseLightMaps() for InitLighting() to call, and DirectionFromTxTy() based on TestLightSphere()][:lighting :sampling]
+[41:52][See slow, but beautiful diffuse :lighting][:performance :run :sampling]
+[45:21][Toggle on the light map viewers in OpenGLEndFrame()][:"debug visualisation" :lighting]
+[45:46][Check out the light maps][:"debug visualisation" :lighting :run]
+[46:56][Set up to optimise the diffuse :lighting][:optimisation :research]
+[49:05][Compute 4-wide the diffuse light :sampling in EndLightingComputation()][:lighting :optimisation :simd]
+[53:36][Consider how to store our diffuse :lighting data for :SIMD computation][:"data structure" :research]
+[59:57][Store our diffuse :lighting data RRRR GGGG BBBB and swizzle it on output, introducing Transpose()][:simd]
+[1:03:35][Swizzling RRRR GGGG BBBB to RGBR GBRG BRGB][:blackboard :simd]
+[1:09:38][Consider how best to swizzle our diffuse :lighting data][:research :simd]
+[1:11:07][Load-Without-Broadcast][:blackboard :simd]
+[1:14:11][Trying _mm_shuffle_ps() or _mm_unpacklo_ps()[ref
+ site=Intel
+ page="Intel Intrinsics Guide"
+ url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] to swizzle][:blackboard :research :simd]
+[1:26:00][Non-interleaved unpack using _mm_unpackhi_pd()[ref
+ site=Intel
+ page="Intel Intrinsics Guide"
+ url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
+[1:30:41][Masked picking from two :SIMD values using _mm_blend_ps()[ref
+ site=Intel
+ page="Intel Intrinsics Guide"
+ url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:blackboard :research :simd]
+[1:32:43][Trying _mm_blend_ps() to swizzle][:blackboard :research :simd]
+[1:35:29][Trying our full RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:blackboard :research :simd]
+[1:41:19][All the possible unpacks][:blackboard :research :simd]
+[1:46:32][Constraints on Final Op][:blackboard :research :simd]
+[1:52:46][Introduce Transpose() from six Shuffle4X() operations, for RRRR GGGG BBBB to RGBR GBRG BRGB swizzle][:lighting :simd]
+[2:00:03][Define Shuffle4x() macro[ref
+ site=Intel
+ page="Intel Intrinsics Guide"
+ url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:lighting :simd]
+[2:03:09][Fill Transpose() with known values for testing][:lighting :simd]
+[2:06:26][Switch diffuse_weight_map and BuildDiffuseLightMaps() to be :SIMD][:"data structure" :lighting]
+[2:08:42][See faster diffuse :lighting][:performance :run :sampling :simd]
+[2:09:16][Try to step in to Transpose(), but hit a read-access violation on DestD\[Tx4\] in EndLightingComputation()][:lighting :run :sampling :simd]
+[2:10:19][Force DestC and DestD to be stored unaligned using _mm_storeu_ps() in EndLightingComputation()][:lighting :simd]
+[2:11:56][Try to step in to Transpose()][:lighting :run :sampling :simd]
+[2:12:13][Make EndLightingComputation() call Transpose()][:lighting :simd]
+[2:12:28][Step in to Transpose() to see what it produces][:lighting :run :simd]
+[2:12:54][Fix Transpose() to set the Order as desired][:lighting :simd]
+[2:13:18][Step in to Transpose() to see that it swizzles incorrectly][:lighting :run :simd]
+[2:13:42][Fix Shuffle4x() to shift by 2-bytes][:lighting :simd]
+[2:14:39][Step in to Transpose() to see that it swizzles more sanely, but still incorrectly][:lighting :run :simd]
+[2:19:39][Check our 6-shuffle swizzle][:blackboard :research :simd]
+[2:20:13][Fix Transpose() to interleave every third lane][:lighting :simd]
+[2:21:47][Step in to Transpose() to see that it swizzles 100% correctly][:lighting :run :simd]
+[2:22:17][Add a VERIFY_SHUFFLE preprocessor path in Transpose()][:lighting :simd]
+[2:23:03][Admire our almost right :lighting][:run :simd]
+[2:24:26][Fix BuildDiffuseLightMaps() and diffuse_weight_map inspired by EndLightingComputation()][:"data structure" :lighting :simd]
+[2:31:22][Admire our closer to correct :lighting][:run :simd]
+[2:33:12][Q&A][:speech]
+[2:34:48][Transposes are symmetric][:simd :speech]
+[2:35:30][@vaualbus][Q: Why can't you do the unpack on the GPU side?][:simd]
+[2:36:47][@xxthebigfoxx][Q: Looks like in this article[ref
+ author="Stan Melax"
+ publisher="Intel Developer Zone"
+ title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)"
+ url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] they do it in six shuffles similarly to how you do it][:simd]
+[2:37:29][@squareysgames][Q: Could it be feasible to pack into a different GPU-supported format, maybe even compressed?][:simd]
+[2:38:28][@xxthebigfoxx][Q: They[ref
+ author="Stan Melax"
+ publisher="Intel Developer Zone"
+ title="3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX)"
+ url=https://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx] have an example on 128 bits, second picture][:simd]
+[2:38:41][Consider the :performance of dependent shuffles, all on port 5[ref
+ site=uops.info
+ url=https://uops.info/table.html]][:research :simd]
+[2:42:56][@runamar][Q: Would you consider using the vector extension from clang if you would go clang only?][:simd]
+[2:43:22][@vaualbus][Q: Do all the work we did to generate the sphere samples and not use them anymore?][:lighting]
+[2:43:26][Show the sphere samples-based :lighting][:run]
+[2:45:04][@brian_nevec][Q: 15FPS? Ship it!][:performance]
+[2:45:37][@sc4llywag][Q: What's next on the TODO list after :lighting?]
+[2:46:04][@xxthebigfoxx][Q: Is there a reasonable :performance difference between aligned and unaligned move in SSE2?]
+[2:49:00][@jessem3y3r][Q: How do you reason about separable filters when, say, running along the Y axis for an image kernel might incur a cache miss on each sequential access (for non-block storage)?][:filtering]
+[2:50:05][@xxthebigfoxx][Q: But is there a reason to use MOVAPS instead of MOVUPS, then? Other than MOVAPS throws an exception if you are not aligned][:simd]
+[2:51:22][@runamar][Q: So how do we go back to 60FPS now?][:performance]
+[2:52:52][@maliusarth][Q: You said, after :lighting you'll start with level design etc. so how far from release is [~hero Handmade Hero]?]
+[2:53:52][Wrap it up with a glimpse into the future][:speech]
+[/video]