diff --git a/cmuratori/hero/code/code591.hmml b/cmuratori/hero/code/code591.hmml
new file mode 100644
index 0000000..42eec78
--- /dev/null
+++ b/cmuratori/hero/code/code591.hmml
@@ -0,0 +1,111 @@
+[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Making a Stand-alone Lighting Performance Test" vod_platform=youtube id=Rj7nCMEuhMQ annotator=Miblo]
+[0:01][Recap and set the stage for the day][:speech]
+[1:00][Plug @x13pixels' ~RemedyBG version 0.3.0.0, with a brief history of Microsoft Visual Studio][:speech]
+[5:09][Conditional Breakpoints in ~RemedyBG][:run]
+[6:58][@x13pixels][I didn't end up doing that, no]
+[7:17][@x13pixels][It was already 15x faster just doing it the "normal" way. Well, okay, there are some tricks under the covers]
+[7:44][@x13pixels][Yup!]
+[7:48][Further love for ~RemedyBG[ref
+ site=RemedyBG
+ url=https://remedybg.itch.io/remedybg]][:run]
+[12:37][Demo the current state of the :lighting][:run]
+[13:51][30ms per frame][:lighting :performance :run]
+[15:00][Make ComputeLightPropagationWork() a TIMED_FUNCTION][:"debug system" :lighting]
+[16:00][Check the Threads :performance][:lighting :run]
+[18:58][Determine to reduce our time spent ray tracing][:lighting :performance :run]
+[19:50][Describe our two-branch RayCast()][:lighting :performance :research]
+[22:25][Why we separated the collision detection and hierarchy traversal code in RayCast()][:lighting :performance :research]
+[24:22][Inspect the assembly of RayCast()][:asm :lighting :performance :run]
+[26:21][Describe our k-d-tree-like SplitBox()][:lighting :performance :research]
+[27:05][Consider speeding up the hierarchy traversal code in RayCast()][:lighting :performance :research]
+[29:54][Launch VTune][:run]
+[30:59][Set up to write our :lighting data out to file][:research]
+[33:50][Make BuildSpatialPartitionForLighting() write out the :lighting boxes to file, introducing DEBUGDumpData() and a Dump platform_file_type][:"file io"]
+[46:57][Traverse the world out to the dungeon with a view to triggering a debug dump of the :lighting boxes][:"file io" :lighting :run]
+[48:50][~RemedyBG feature request: Editable values][:run]
+[49:06][Enable the LightBoxDumpTrigger][:"file io" :lighting]
+[49:22][Dump the :lighting boxes to file][:"file io" :run]
+[49:50][Create hhlightprof.cpp, adding it to build.bat][:lighting]
+[56:28][Invoke hhlightprof][:admin :lighting]
+[57:12][Fix hhlightprof to correctly get the DumpName][:lighting]
+[57:21][Add hhlightprof to ~RemedyBG][:admin]
+[58:52][@x13pixels][Might have to append EXE? Thought that worked, though]
+[58:57][Launch hhlightprof in ~RemedyBG][:lighting :run]
+[59:34][Introduce TestRayCast() in hhlightprof][:lighting]
+[1:02:36][Make hhlightprof set up the :lighting Solution from our dump]
+[1:06:44][Make hhlightprof initialise the SpecAtlas and DiffuseAtlas, and derive the BoxCount from the dump][:"file io" :lighting]
+[1:13:41][Hit a write access violation on the Solution][:lighting :run]
+[1:14:06][Initialise the Solution globally][:lighting]
+[1:14:29][Successfully :run hhlightprof][:lighting]
+[1:14:56][Step through hhlightprof][:lighting :run]
+[1:15:45][Fix the order of the arguments to fseek()][:"file io"]
+[1:16:03][Step through hhlightprof and inspect the Solution][:lighting :run]
+[1:17:26][Increase the BoxCount in an effort to allow room for all our child boxes][:lighting :memory]
+[1:18:06][Hit a read access violation on the Box->Radius in BuildSpatialPartitionForLighting()][:lighting :run]
+[1:19:00][Revert the BoxCount and instead allocate :memory for four times that number to allow room for child boxes][:lighting :memory]
+[1:19:27][Hit a read access violation on the Solution->tUpdateBlend in RayCast()][:lighting :run]
+[1:19:34][Make TestRayCast() initialise the Work][:lighting]
+[1:20:06][:Run hhlightprof successfully][:lighting]
+[1:20:22][Make TestRayCast() set up the :sampling sphere and cast many rays][:lighting]
+[1:23:19][:Run hhlightprof, casting all its rays][:lighting]
+[1:23:26][Prepare to cast enough rays to last a minimum of 10 seconds][:lighting :research]
+[1:25:15][Make TestRayCast() multiply the rays cast by 256][:lighting]
+[1:25:25][:Run hhlightprof for just over 10 seconds, without completing][:lighting]
+[1:25:36][Decrease the ray multiplier from 256 to 32 in TestRayCast()][:lighting]
+[1:25:45][:Run hhlightprof for almost 10 seconds, without completing][:lighting]
+[1:25:54][Decrease the ray multiplier from 32 to 8 in TestRayCast()][:lighting]
+[1:26:02][:Run hhlightprof for 9 seconds, to completion][:lighting]
+[1:26:12][Decrease the ray multiplier from 8 to 4 in TestRayCast()][:lighting]
+[1:26:20][:Run hhlightprof for 5 seconds, to completion][:lighting]
+[1:26:25][Prepare to time our ray caster in VTune][:lighting :speech]
+[1:28:12][Create a project in VTune for hhlightprof][:admin :lighting]
+[1:28:49][A few words on the sheer plethora of performance counters][:speech :profiling]
+[1:30:44][Set up our project for hhlightprof][:admin :lighting]
+[1:34:03][:Run hhlightprof in VTune][:lighting :profiling]
+[1:36:59][:Run a -O2 build of hhlightprof in VTune][:lighting :profiling]
+[1:37:11][Increase the ray multiplier from 8 to 32 in TestRayCast()][:lighting]
+[1:37:28][:Run hhlightprof for under 1 second, to completion][:lighting]
+[1:37:30][Increase the ray multiplier from 32 to 256 in TestRayCast()][:lighting]
+[1:37:39][:Run hhlightprof for 5 seconds, to completion][:lighting]
+[1:37:46][:Run hhlightprof in VTune][:lighting :profiling]
+[1:39:41][Check the Hotspots of hhlightprof in VTune][:lighting :run :profiling]
+[1:41:45][Microarchitecture Exploration in VTune][:run :profiling]
+[1:43:15][:Run a Microarchitecture Exploration of hhlightprof in VTune][:lighting :profiling]
+[1:54:00][:Run a :Memory Access analysis of hhlightprof in VTune][:lighting :profiling]
+[1:55:09][Reflect on our isolated ray caster][:lighting :speech]
+[1:55:41][Q&A][:speech]
+[1:56:23][@yurasniper][Q: Might be a good idea to explain the difference between sampling and instrumentation profilers and how they work on some basic level, and why sampling :profiling is not great idea, despite most people believing and saying that it is very good]
+[1:58:39][@lucid_frost][Q: VTune organizes those metrics by something called the "top-down performance analysis methodology". There is a pretty detailed paper that introduced this that would likely help]
+[1:58:49][@dragoonx6][Q: Have you ever tried using clang-cl? It's a drop-in MSVC compatible compiler that has much better codegen than MSVC CL. It's compatible with link.exe, but even lld-link will give you usable PDBs. When I used it in my ray tracer, it ended up being 15 times faster than with regular MSVC CL]
+[2:01:00][@euphius][Q: Games like CS:GO go up to 300fps. Are they pretty good optimized? Seems like getting [~hero Handmade Hero] to that FPS would be hard?]
+[2:01:43][@robgeel][Q: I think you never use the sphere sampling direction in hhlightprof, also when dumping boxes, you write out Solution->BoxCount * sizeof(Solution->Boxes), the sizeof takes the size of a pointer instead of a lighting_box]
+[2:01:47][Fix the SampleDirB setting in TestRayCast()][:lighting]
+[2:02:04][@lobsang2][Q: What's the status of meowhash? Will it be reaching a new version soon?][:hashing]
+[2:02:51][@jim0_o][Q: Have you tried debugging why the stream loses so many frames when you move the character around?]
+[2:03:44][@kniffel5][Q: What should meowhash (not) be used for?[ref
+ site=NohatCoder
+ page="Hash levels"
+ url=http://nohatcoder.dk/2019-05-19-1.html]][:hashing]
+[2:07:27][@desu_used][Q: Are you sure the hash is "secure"? People have previously pointed out some issues with meowhash, if I recall correctly, generating collisions[ref
+ site=xxHash
+ page="Collision ratio comparison"
+ url=https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison]][:hashing]
+[2:10:03][@brian_nevec][Q: What do you use meowhash for?][:hashing]
+[2:10:14][@temdisponivel][Q: Would [~hero Handmade Hero] run on a 32-bit system as it is now, or would it need porting?]
+[2:10:22][@dragoonx6][Even for security?][:hashing]
+[2:10:58][@vtlmks][Q: Not using the SampleDir?][:lighting]
+[2:11:01][Fix TestRayCast() to set (and use) RayD][:lighting]
+[2:11:22][@kniffel5][Q: Is meowhash cross platform? For ARM, PowerPC, etc?]
+[2:12:29][@mindmark42][Q: Any reason we're not using the checkerboard rendering?][:lighting]
+[2:12:54][Begin to wind down the stream, with a plug of the upcoming [@naysayer88 Jon] and [@nothings Sean] talk[ref
+ author="Jonathan Blow"
+ publisher=Twitter
+ title="Tomorrow at 3pm Pacific time I'll be streaming another in-depth programming conversation, this time with Sean Barrett (@nothings). We'll start with the topic of making compilers go fast, but who knows where we'll end up. I will post the link when it happens."
+ url=https://twitter.com/Jonathan_Blow/status/1246143730706337792]][:speech]
+[2:14:09][@rationalcoder][Q: You use meowhash for normal hash tables in your everyday code, strings, vectors, etc?][:hashing]
+[2:14:39][Anticipate the [@naysayer88 Jon] and [@nothings Sean] talk[ref
+ site=twitch
+ page=naysayer88
+ url=https://twitch.tv/naysayer88]][:speech]
+[2:15:32][Wind down the stream][:speech]
+[/video]