[video output=day587 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Optimizing the Specular to Diffuse Transform" vod_platform=youtube id=J0Z4rdTYM0Y annotator=Miblo]
[0:03][Demo the current state and :performance of our :lighting][:run]
[1:37][Reacquaint ourselves with the :lighting's blend-over-time parameter in EndLightingComputation()][:research]
[3:02][Demo the fast-response :lighting blend][:run]
[3:11][Decrease tUpdateBlend from 10/60 to 1/60][:lighting]
[3:14][Check out the slower-response, but noiseless :lighting blend][:run]
[3:36][Increase tUpdateBlend from 1/60 to 5/60][:lighting]
[3:50][Check out the usable-response, but flickery :lighting blend][:run]
[4:21][Decrease tUpdateBlend from 5/60 to 2/60][:lighting]
[4:22][Check out the slower-response, but less flickery :lighting blend][:run]
[4:39][Increase tUpdateBlend from 2/60 to 8/60][:lighting]
[4:45][Check out the faster-response, but noisy :lighting blend][:run]
[5:22][Notice light buildup in the dungeon][:lighting :run]
[5:56][Check that light buildup in the dungeon, possibly due to the voxel switch][:lighting :run]
[6:58][Determine to gauge the :performance of our specular–diffuse transform][:lighting :speech]
[7:39][Consider shrinking the :lighting lookup voxel in Z][:run]
[9:13][Comment out LIGHT_LOOKUP_VOXEL_DIM, and respecify ComputeLightPropagationWork() and EndLightingComputation() to operate in X-slices][:lighting]
[14:52][Define MAX_LIGHT_LOOKUP_VOXEL_DIM for InitLighting() to use][:lighting]
[17:16][Replace mentions of LIGHT_LOOKUP_VOXEL_DIM in ComputeLightPropagationWork()][:lighting]
[20:35][Replace mentions of LIGHT_LOOKUP_VOXEL_DIM in CompileZBiasProgram()][:lighting]
[27:34][Reintroduce LIGHT_LOOKUP_VOXEL_DIM for Win32InitOpenGL() to use][:lighting]
[27:56][Get the same thing we saw before][:lighting :run]
[28:10][Split out LIGHT_LOOKUP_VOXEL_DIM to all three dimensions for Win32InitOpenGL() to use][:lighting]
[28:28][Check out our cubic :lighting lookup voxel][:run]
[28:41][Decrease the LIGHT_LOOKUP_VOXEL_DIM_Z from 32 to 16][:lighting]
[28:49][Check out our squatter, faster :lighting lookup voxel][:run]
[29:34][Increase the LIGHT_LOOKUP_VOXEL_DIM_Z from 16 to 32][:lighting]
[29:43][125ms per frame, with a 32×32×32 voxel][:lighting :performance :run]
[29:55][Decrease the LIGHT_LOOKUP_VOXEL_DIM_Z from 32 to 16][:lighting]
[30:04][75ms per frame, with a 32×32×16 voxel][:lighting :performance :run]
[31:02][24% frame time spent in ComputeLightPropagationWork()][:lighting :performance :run]
[31:17][Disable the specular–diffuse transform in ComputeLightPropagationWork()][:lighting]
[31:45][65ms per frame with 3% frame time spent in ComputeLightPropagationWork(), without the specular–diffuse transform][:lighting :performance :run]
[32:18][Prepare to optimise the specular–diffuse transform][:lighting :research]
[33:36][Re-enable the specular–diffuse transform in ComputeLightPropagationWork()][:lighting]
[33:52][25% frame time spent in ComputeLightPropagationWork()][:lighting :performance :run]
[33:59][Disable the specular–diffuse transform in ComputeLightPropagationWork()][:lighting]
[34:03][Hit assertion in DEBUGGetArenaByLookupBlock()][:"debug system" :run]
[34:33][3% frame time spent in ComputeLightPropagationWork(), without the specular–diffuse transform][:lighting :performance :run]
[35:14][Enable ComputeLightPropagationWork() to count up the zero weights][:lighting :optimisation]
[38:01][Step in to ComputeLightPropagationWork() to find a ZeroWCount of 196][:lighting :optimisation :run]
[39:22][Consider our potential for optimising ComputeLightPropagationWork()][:optimisation :research]
[40:09][Inspect the assembly of the specular–diffuse transform in ComputeLightPropagationWork()][:asm :lighting :run]
[41:54][Define LIGHT_ATLAS_ASSERT()][:lighting]
[43:19][Inspect the assembly of the specular–diffuse transform in ComputeLightPropagationWork()][:asm :lighting :run]
[43:33][Disable multithreading of the :lighting, wondering if ~RemedyBG supports step-single-thread][:threading]
[44:43][Inspect the assembly of the specular–diffuse transform in ComputeLightPropagationWork()][:asm :lighting :run]
[49:35][Optimise ComputeLightPropagationWork() to load and shuffle a row at once, introducing LoadF32_4X() and Broadcast4x()][:lighting :optimisation :simd]
[1:15:35][Inspect the assembly of the specular–diffuse transform in ComputeLightPropagationWork()][:asm :lighting :run]
[1:16:28][Re-enable multithreading of the :lighting][:threading]
[1:16:46][11% frame time spent in ComputeLightPropagationWork(), but with chromatic aberration][:lighting :performance :run]
[1:17:32][Double-check the specular–diffuse transform][:lighting :research]
[1:20:21][Fix ComputeLightPropagationWork() to load the specular texels in strides of 4, rather than 12][:lighting :optimisation :simd]
[1:20:52][Admire our correct and faster :lighting][:performance :run]
[1:21:51][Consider our potential for optimising the specular–diffuse transform: Separable blur[ref site=Desmos page="Untitled Graph" url=https://desmos.com/calculator][ref site=Wikipedia page="Gaussian function" url=https://en.wikipedia.org/wiki/Gaussian_function][ref site=Wikipedia page="Raised-cosine filter" url=https://en.wikipedia.org/wiki/Raised-cosine_filter]][:lighting :optimisation :research]
[1:31:00][Check out our :lighting][:run]
[1:31:10][Decrease the light transmission rate from 0.975 to 0.75 in BuildDiffuseLightMaps()][:lighting]
[1:31:23][More readily see our darker light map viewer][:lighting :run]
[1:32:28][Set up ComputeLightPropagationWork() to perform the specular–diffuse transform as a separable filter][:lighting :optimisation]
[1:48:04][Check out our :lighting][:run]
[1:48:20][Q&A][:speech]
[1:49:15][@lucid_frost][Q: Are there any :caching concerns? I'm not familiar with how much data is being pushed around here][:lighting :performance]
[1:52:16][@philliptrudeau][Q: This scene has a little bit of variance in the :lighting between frames. Is there a way to set up this solution so that the scene looks more "static", without taking a significant :performance hit?]
[1:53:35][@somebody_took_my_name][Q: The light seems to be repeating outside of the light box (before the rewrite). Is it still there and, if so, is it a modulus issue?][:lighting]
[1:54:53][@mattiamanzati][Q: You mentioned something about shaders API being better at this kind of job. I lost your point on that because of me being unfamiliar with the environment. Can you please explain that a little bit more?][:hardware :lighting]
[1:59:38][@czapa10][Q: You often say that there should be some high level :language feature which allows you to write :SIMD code easier. Can you tell how this feature would exactly look like? Do you mean something like [@naysayer88 Jon Blow] has in Jai (fast SOA, AOS switching)? Can't you do this feature yourself using :metaprogramming?]
[2:01:05][@vtlmks][Intel Intrinsics Guide[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] is broken, it seems]
[2:01:45][@i_am_seabass][He's got it cached]
[2:01:56][@czapa10][You can't specify specific architecture]
[2:02:25][Plug uops[ref site=uops.info url=https://uops.info/table.html]][:research]
[2:03:47][Admire the :lighting][:run]
[2:04:22][Close it on up][:speech]
[/video]