cinera_handmade.network/cmuratori/hero/code/code587.hmml

[video output=day587 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Optimizing the Specular to Diffuse Transform" vod_platform=youtube id=J0Z4rdTYM0Y annotator=Miblo]
[0:03][Demo the current state and :performance of our :lighting][:run]
[1:37][Reacquaint ourselves with the :lighting's blend-over-time parameter in EndLightingComputation()][:research]
[3:02][Demo the fast-response :lighting blend][:run]
[3:11][Decrease tUpdateBlend from 10/60 to 1/60][:lighting]
[3:14][Check out the slower-response, but noiseless :lighting blend][:run]
[3:36][Increase tUpdateBlend from 1/60 to 5/60][:lighting]
[3:50][Check out the usable-response, but flickery :lighting blend][:run]
[4:21][Decrease tUpdateBlend from 5/60 to 2/60][:lighting]
[4:22][Check out the slower-response, but less flickery :lighting blend][:run]
[4:39][Increase tUpdateBlend from 2/60 to 8/60][:lighting]
[4:45][Check out the faster-response, but noisy :lighting blend][:run]
[5:22][Notice light buildup in the dungeon][:lighting :run]
[5:56][Check that light buildup in the dungeon, possibly due to the voxel switch][:lighting :run]
[6:58][Determine to gauge the :performance of our specular-to-diffuse transform][:lighting :speech]
[7:39][Consider shrinking the :lighting lookup voxel in Z][:run]
[9:13][Comment out LIGHT_LOOKUP_VOXEL_DIM, and respecify ComputeLightPropagationWork() and EndLightingComputation() to operate in X-slices][:lighting]
[14:52][Define MAX_LIGHT_LOOKUP_VOXEL_DIM for InitLighting() to use][:lighting]
[17:16][Replace mentions of LIGHT_LOOKUP_VOXEL_DIM in ComputeLightPropagationWork()][:lighting]
[20:35][Replace mentions of LIGHT_LOOKUP_VOXEL_DIM in CompileZBiasProgram()][:lighting]
[27:34][Reintroduce LIGHT_LOOKUP_VOXEL_DIM for Win32InitOpenGL() to use][:lighting]
[27:56][Get the same thing we saw before][:lighting :run]
[28:10][Split out LIGHT_LOOKUP_VOXEL_DIM to all three dimensions for Win32InitOpenGL() to use][:lighting]
[28:28][Check out our cubic :lighting lookup voxel][:run]
[28:41][Decrease the LIGHT_LOOKUP_VOXEL_DIM_Z from 32 to 16][:lighting]
[28:49][Check out our squatter, faster :lighting lookup voxel][:run]
[29:34][Increase the LIGHT_LOOKUP_VOXEL_DIM_Z from 16 to 32][:lighting]
[29:43][125ms per frame, with a 32×32×32 voxel][:lighting :performance :run]
[29:55][Decrease the LIGHT_LOOKUP_VOXEL_DIM_Z from 32 to 16][:lighting]
[30:04][75ms per frame, with a 32×32×16 voxel][:lighting :performance :run]
[31:02][24% frame time spent in ComputeLightPropagationWork()][:lighting :performance :run]
[31:17][Disable the specular-to-diffuse transform in ComputeLightPropagationWork()][:lighting]
[31:45][65ms per frame with 3% frame time spent in ComputeLightPropagationWork(), without the specular-to-diffuse transform][:lighting :performance :run]
[32:18][Prepare to optimise the specular-to-diffuse transform][:lighting :research]
[33:36][Re-enable the specular-to-diffuse transform in ComputeLightPropagationWork()][:lighting]
[33:52][25% frame time spent in ComputeLightPropagationWork()][:lighting :performance :run]
[33:59][Disable the specular-to-diffuse transform in ComputeLightPropagationWork()][:lighting]
[34:03][Hit assertion in DEBUGGetArenaByLookupBlock()][:"debug system" :run]
[34:33][3% frame time spent in ComputeLightPropagationWork(), without the specular-to-diffuse transform][:lighting :performance :run]
[35:14][Enable ComputeLightPropagationWork() to count up the zero weights][:lighting :optimisation]
[38:01][Step in to ComputeLightPropagationWork() to find a ZeroWCount of 196][:lighting :optimisation :run]
[39:22][Consider our potential for optimising ComputeLightPropagationWork()][:optimisation :research]
[40:09][Inspect the assembly of the specular-to-diffuse transform in ComputeLightPropagationWork()][:asm :lighting :run]
[41:54][Define LIGHT_ATLAS_ASSERT()][:lighting]
[43:19][Inspect the assembly of the specular-to-diffuse transform in ComputeLightPropagationWork()][:asm :lighting :run]
[43:33][Disable multithreading of the :lighting, wondering if ~RemedyBG supports step-single-thread][:threading]
[44:43][Inspect the assembly of the specular-to-diffuse transform in ComputeLightPropagationWork()][:asm :lighting :run]
[49:35][Optimise ComputeLightPropagationWork() to load and shuffle a row at once, introducing LoadF32_4X() and Broadcast4x()][:lighting :optimisation :simd]
[1:15:35][Inspect the assembly of the specular-to-diffuse transform in ComputeLightPropagationWork()][:asm :lighting :run]
[1:16:28][Re-enable multithreading of the :lighting][:threading]
[1:16:46][11% frame time spent in ComputeLightPropagationWork(), but with chromatic aberration][:lighting :performance :run]
[1:17:32][Double-check the specular-to-diffuse transform][:lighting :research]
[1:20:21][Fix ComputeLightPropagationWork() to load the specular texels in strides of 4, rather than 12][:lighting :optimisation :simd]
[1:20:52][Admire our correct and faster :lighting][:performance :run]
[1:21:51][Consider our potential for optimising the specular-to-diffuse transform: Separable blur[ref
site=Desmos
page="Untitled Graph"
url=https://desmos.com/calculator][ref
site=Wikipedia
page="Gaussian function"
url=https://en.wikipedia.org/wiki/Gaussian_function][ref
site=Wikipedia
page="Raised-cosine filter"
url=https://en.wikipedia.org/wiki/Raised-cosine_filter]][:lighting :optimisation :research]
[1:31:00][Check out our :lighting][:run]
[1:31:10][Decrease the light transmission rate from 0.975 to 0.75 in BuildDiffuseLightMaps()][:lighting]
[1:31:23][More readily see our darker light map viewer][:lighting :run]
[1:32:28][Set up ComputeLightPropagationWork() to perform the specular-to-diffuse transform as a separable filter][:lighting :optimisation]
[1:48:04][Check out our :lighting][:run]
[1:48:20][Q&A][:speech]
[1:49:15][@lucid_frost][Q: Are there any :caching concerns? I'm not familiar with how much data is being pushed around here][:lighting :performance]
[1:52:16][@philliptrudeau][Q: This scene has a little bit of variance in the :lighting between frames. Is there a way to set up this solution so that the scene looks more "static", without taking a significant :performance hit?]
[1:53:35][@somebody_took_my_name][Q: The light seems to be repeating outside of the light box (before the rewrite). Is it still there and, if so, is it a modulus issue?][:lighting]
[1:54:53][@mattiamanzati][Q: You mentioned something about shaders API being better at this kind of job. I lost your point on that because of me being unfamiliar with the environment. Can you please explain that a little bit more?][:hardware :lighting]
[1:59:38][@czapa10][Q: You often say that there should be some high level :language feature which allows you to write :SIMD code easier. Can you tell how this feature would exactly look like? Do you mean something like [@naysayer88 Jon Blow] has in Jai (fast SOA, AOS switching)? Can't you do this feature yourself using :metaprogramming?]
[2:01:05][@vtlmks][Intel Intrinsics Guide[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] is broken, it seems]
[2:01:45][@i_am_seabass][He's got it cached]
[2:01:56][@czapa10][You can't specify specific architecture]
[2:02:25][Plug uops[ref
site=uops.info
url=https://uops.info/table.html]][:research]
[2:03:47][Admire the :lighting][:run]
[2:04:22][Close it on up][:speech]
[/video]
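
The annotations at 1:21:51 and 1:32:28 refer to treating the transform as a separable filter. The following is a minimal sketch of that general idea only, not the code written in the episode: it blurs a small 2D grid once with a full 2D kernel and once as two 1D passes, so the two results can be compared. GRID_DIM, KERNEL_RADIUS, the clamped borders and every function name here are hypothetical choices made for illustration.

```c
#include <assert.h>

#define GRID_DIM 8        /* hypothetical grid size, not the episode's voxel dims */
#define KERNEL_RADIUS 1   /* the 1D kernel has 2*KERNEL_RADIUS + 1 taps */

/* Clamp an index into [0, GRID_DIM - 1] so edge texels reuse the border value. */
static int ClampIndex(int I)
{
    if(I < 0) {I = 0;}
    if(I > GRID_DIM - 1) {I = GRID_DIM - 1;}
    return I;
}

/* Reference version: full 2D convolution whose weights are the outer
   product Kernel[dY]*Kernel[dX]. */
static void Blur2D(const float *Source, float *Dest, const float *Kernel)
{
    for(int Y = 0; Y < GRID_DIM; ++Y)
    {
        for(int X = 0; X < GRID_DIM; ++X)
        {
            float Sum = 0.0f;
            for(int dY = -KERNEL_RADIUS; dY <= KERNEL_RADIUS; ++dY)
            {
                for(int dX = -KERNEL_RADIUS; dX <= KERNEL_RADIUS; ++dX)
                {
                    float Weight = Kernel[dY + KERNEL_RADIUS]*Kernel[dX + KERNEL_RADIUS];
                    Sum += Weight*Source[ClampIndex(Y + dY)*GRID_DIM + ClampIndex(X + dX)];
                }
            }
            Dest[Y*GRID_DIM + X] = Sum;
        }
    }
}

/* Separable version: one 1D pass along X into Temp, then one 1D pass along Y. */
static void BlurSeparable(const float *Source, float *Temp, float *Dest, const float *Kernel)
{
    /* The second pass reads Temp while writing Dest, so they must not alias. */
    assert((Temp != Source) && (Temp != Dest));

    for(int Y = 0; Y < GRID_DIM; ++Y)
    {
        for(int X = 0; X < GRID_DIM; ++X)
        {
            float Sum = 0.0f;
            for(int dX = -KERNEL_RADIUS; dX <= KERNEL_RADIUS; ++dX)
            {
                Sum += Kernel[dX + KERNEL_RADIUS]*Source[Y*GRID_DIM + ClampIndex(X + dX)];
            }
            Temp[Y*GRID_DIM + X] = Sum;
        }
    }

    for(int Y = 0; Y < GRID_DIM; ++Y)
    {
        for(int X = 0; X < GRID_DIM; ++X)
        {
            float Sum = 0.0f;
            for(int dY = -KERNEL_RADIUS; dY <= KERNEL_RADIUS; ++dY)
            {
                Sum += Kernel[dY + KERNEL_RADIUS]*Temp[ClampIndex(Y + dY)*GRID_DIM + X];
            }
            Dest[Y*GRID_DIM + X] = Sum;
        }
    }
}

/* Largest per-texel absolute difference between two grids. */
static float MaxAbsDifference(const float *A, const float *B)
{
    float Result = 0.0f;
    for(int Index = 0; Index < GRID_DIM*GRID_DIM; ++Index)
    {
        float Difference = A[Index] - B[Index];
        if(Difference < 0.0f) {Difference = -Difference;}
        if(Difference > Result) {Result = Difference;}
    }
    return Result;
}
```

With a kernel of radius R, the 2D version does (2R+1)² multiplies per texel while the two-pass version does 2(2R+1), at the cost of one temporary buffer. The two only agree when the 2D weights really are an outer product of 1D weights, which is the property a separable blur relies on.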