cinera_handmade.network/cmuratori/hero/code/code614.hmml

2020-07-01 00:28:09 +00:00
[video output=day614 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Continuing Streamlining the Raycaster" vod_platform=youtube id=IxeKOAcvgK0 annotator=Miblo]
[0:01][Welcome to the stream][:speech]
[0:06][Determine to continue with :optimisation][:lighting :run]
[0:57][Recap yesterday's welding :optimisation in GridRayCast()][:lighting :research]
[4:09][Consider :optimisation potential of the SpecTexel load / stores in GridRayCast()][:lighting :research]
[7:22][Illustrate the possibility of loading in the SpecTexel values and InvBlend at the outset][:lighting :optimisation]
[9:23][Seek easier :optimisation opportunities in GridRayCast()][:lighting :research]
[11:43][Simplify out OcclusionN from GridRayCast()][:lighting :optimisation]
[12:27][Seek :optimisation with OcclusionD and RayD in GridRayCast()][:lighting :research]
[18:48][Streamline the SignRayD and NormalXYZ computations in GridRayCast()][:lighting :optimisation :simd]
[25:35][Reacquaint ourselves with the hit testing and shuffling code in GridRayCast()][:lighting :research :simd]
[30:30][Streamline the Normal selection in GridRayCast()][:lighting :optimisation :simd]
[34:46][Check out the port usage of various instructions, noting that we may get an AND for free[ref
site=uops.info
url=https://uops.info/table.html]][:isa :research]
[40:23][Continue to streamline the Normal selection in GridRayCast(), introducing a NormalTable, before toggling back to the old code][:lighting :optimisation :simd]
[48:12][:Run successfully][:lighting]
[48:31][Streamline the ProbeSampleNSingle usage in GridRayCast()][:lighting :optimisation :simd]
[55:01][:Run successfully, and consider unit testing the grid ray cast][:lighting]
[56:49][Treat ProbeSampleNSingle wide in GridRayCast()][:lighting :optimisation :simd]
[1:01:34][:Run successfully][:lighting]
[1:01:50][Treat OcclusionD wide in GridRayCast()][:lighting :optimisation :simd]
[1:03:28][:Run successfully][:lighting]
[1:04:02][Finish streamlining the Normal selection in GridRayCast()][:lighting :optimisation :simd]
[1:07:46][:Run successfully][:lighting]
[1:08:13][Temporarily try hard setting the NormalIndex to 0 in GridRayCast()][:lighting :optimisation :simd]
[1:08:27][We can't tell it's wrong][:lighting :optimisation :run :simd]
[1:08:56][Let GridRayCast() set the computed NormalIndex and make a note to test this][:lighting :optimisation :simd]
[1:09:36][hhlightprof total seconds elapsed: 4.534789][:lighting :performance :run :simd]
[1:10:20][Simplify out tUpdateBlend in GridRayCast()][:lighting :optimisation :simd]
[1:12:49][Augment light_atlas with StrideXYZ_4x and VoxelDim_4x][:"data structure" :lighting :optimisation :simd]
[1:17:45][:Run successfully][:lighting]
[1:17:54][Make MakeLightAtlas() set StrideXYZ and VoxelDim so that GridRayCast() can load them out of the atlas, changing their format in light_atlas to arrays of 4][:"data structure" :lighting :optimisation :simd]
[1:20:37][:Run successfully][:lighting]
[1:20:46][hhlightprof total seconds elapsed: 4.513986][:lighting :performance :run :simd]
[1:22:09][Remove the old AABBRayCast()][:lighting]
[1:24:42][:Run successfully][:lighting]
[1:24:51][Prepare lighting_box to pack down to 64 bits total, propagating this change][:"data structure" :lighting]
[1:28:29][:Run successfully][:lighting]
[1:28:38][Clean out the sprawl from FullCast()][:lighting :optimisation :simd]
[1:36:20][:Run successfully][:lighting]
[1:36:25][Look into welding the GridRayCast() calling loop from FullCast() into GridRayCast() itself][:lighting :optimisation :research :simd]
[1:39:21][hhlightprof total seconds elapsed: 4.511818][:lighting :performance :run :simd]
[1:39:36][Extend GridRayCast() to operate on twice as many samples][:lighting :optimisation :simd]
[1:40:44][:Run successfully][:lighting]
[1:40:46][hhlightprof total seconds elapsed: 4.394170][:lighting :performance :run :simd]
[1:41:52][Toggle off the debug code in FullCast()][:"debug system" :lighting]
[1:43:26][hhlightprof total seconds elapsed: 4.392245][:lighting :performance :run :simd]
[1:43:41][Consider welding the GridRayCast() calling loop from FullCast() into GridRayCast() itself][:lighting :optimisation :research :simd]
[1:45:57][Q&A][:speech]
[1:47:07][@mindmark42][Q: Yesterday you changed your :SIMD extract functions to use shuffles instead. Could you explain again why that is better?][:performance]
[1:47:26][Extract vs Shuffle][:blackboard :performance :simd]
[1:56:14]["Semantic" Extraction][:blackboard :language :performance :simd]
[1:58:02][Unnecessary extract and cast, with thanks to @mmozeiko][:blackboard :performance :simd]
[1:59:05][Shuffle][:blackboard :performance :simd]
[2:00:41][@3ygun][Q: Is there such a thing as smooching too much and causing the compiler to bail before doing optimizations?][:language]
[2:01:11][@billdstrong][Q: Would we gain any speed by moving ahead 16 and doing 12 ops per pass?][:lighting :performance]
[2:01:40][Thank you, everyone]
[/video]