cinera_handmade.network/cmuratori/hero/code/code591.hmml

[video output=day591 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Making a Stand-alone Lighting Performance Test" vod_platform=youtube id=Rj7nCMEuhMQ annotator=Miblo]
[0:01][Recap and set the stage for the day][:speech]
[1:00][Plug @x13pixels' ~RemedyBG version 0.3.0.0, with a brief history of Microsoft Visual Studio][:speech]
[5:09][Conditional Breakpoints in ~RemedyBG][:run]
[6:58][@x13pixels][I didn't end up doing that, no]
[7:17][@x13pixels][It was already 15x faster just doing it the "normal" way. Well, okay, there are some tricks under the covers]
[7:44][@x13pixels][Yup!]
[7:48][Further love for ~RemedyBG[ref
site=RemedyBG
url=https://remedybg.itch.io/remedybg]][:run]
[12:37][Demo the current state of the :lighting][:run]
[13:51][30ms per frame][:lighting :performance :run]
[15:00][Make ComputeLightPropagationWork() a TIMED_FUNCTION][:"debug system" :lighting]
[16:00][Check the Threads :performance][:lighting :run]
[18:58][Determine to reduce our time spent ray tracing][:lighting :performance :run]
[19:50][Describe our two-branch RayCast()][:lighting :performance :research]
[22:25][Why we separated the collision detection and hierarchy traversal code in RayCast()][:lighting :performance :research]
[24:22][Inspect the assembly of RayCast()][:asm :lighting :performance :run]
[26:21][Describe our k-d-tree-like SplitBox()][:lighting :performance :research]
[27:05][Consider speeding up the hierarchy traversal code in RayCast()][:lighting :performance :research]
[29:54][Launch VTune][:run]
[30:59][Set up to write our :lighting data out to file][:research]
[33:50][Make BuildSpatialPartitionForLighting() write out the :lighting boxes to file, introducing DEBUGDumpData() and a Dump platform_file_type][:"file io"]
[46:57][Traverse the world out to the dungeon with a view to triggering a debug dump of the :lighting boxes][:"file io" :lighting :run]
[48:50][~RemedyBG feature request: Editable values][:run]
[49:06][Enable the LightBoxDumpTrigger][:"file io" :lighting]
[49:22][Dump the :lighting boxes to file][:"file io" :run]
[49:50][Create hhlightprof.cpp, adding it to build.bat][:lighting]
[56:28][Invoke hhlightprof][:admin :lighting]
[57:12][Fix hhlightprof to correctly get the DumpName][:lighting]
[57:21][Add hhlightprof to ~RemedyBG][:admin]
[58:52][@x13pixels][Might have to append EXE? Thought that worked, though]
[58:57][Launch hhlightprof in ~RemedyBG][:lighting :run]
[59:34][Introduce TestRayCast() in hhlightprof][:lighting]
[1:02:36][Make hhlightprof set up the :lighting Solution from our dump]
[1:06:44][Make hhlightprof initialise the SpecAtlas and DiffuseAtlas, and derive the BoxCount from the dump][:"file io" :lighting]
[1:13:41][Hit a write access violation on the Solution][:lighting :run]
[1:14:06][Initialise the Solution globally][:lighting]
[1:14:29][Successfully :run hhlightprof][:lighting]
[1:14:56][Step through hhlightprof][:lighting :run]
[1:15:45][Fix the order of the arguments to fseek()][:"file io"]
[1:16:03][Step through hhlightprof and inspect the Solution][:lighting :run]
[1:17:26][Increase the BoxCount in an effort to allow room for all our child boxes][:lighting :memory]
[1:18:06][Hit a read access violation on the Box->Radius in BuildSpatialPartitionForLighting()][:lighting :run]
[1:19:00][Revert the BoxCount and instead allocate :memory for four times that number to allow room for child boxes][:lighting :memory]
[1:19:27][Hit a read access violation on the Solution->tUpdateBlend in RayCast()][:lighting :run]
[1:19:34][Make TestRayCast() initialise the Work][:lighting]
[1:20:06][:Run hhlightprof successfully][:lighting]
[1:20:22][Make TestRayCast() set up the :sampling sphere and cast many rays][:lighting]
[1:23:19][:Run hhlightprof, casting all its rays][:lighting]
[1:23:26][Prepare to cast enough rays to last a minimum of 10 seconds][:lighting :research]
[1:25:15][Make TestRayCast() multiply the rays cast by 256][:lighting]
[1:25:25][:Run hhlightprof for just over 10 seconds, without completing][:lighting]
[1:25:36][Decrease the ray multiplier from 256 to 32 in TestRayCast()][:lighting]
[1:25:45][:Run hhlightprof for almost 10 seconds, without completing][:lighting]
[1:25:54][Decrease the ray multiplier from 32 to 8 in TestRayCast()][:lighting]
[1:26:02][:Run hhlightprof for 9 seconds, to completion][:lighting]
[1:26:12][Decrease the ray multiplier from 8 to 4 in TestRayCast()][:lighting]
[1:26:20][:Run hhlightprof for 5 seconds, to completion][:lighting]
[1:26:25][Prepare to time our ray caster in VTune][:lighting :speech]
[1:28:12][Create a project in VTune for hhlightprof][:admin :lighting]
[1:28:49][A few words on the sheer plethora of performance counters][:speech :profiling]
[1:30:44][Set up our project for hhlightprof][:admin :lighting]
[1:34:03][:Run hhlightprof in VTune][:lighting :profiling]
[1:36:59][:Run a -O2 build of hhlightprof in VTune][:lighting :profiling]
[1:37:11][Increase the ray multiplier from 8 to 32 in TestRayCast()][:lighting]
[1:37:28][:Run hhlightprof for under 1 second, to completion][:lighting]
[1:37:30][Increase the ray multiplier from 32 to 256 in TestRayCast()][:lighting]
[1:37:39][:Run hhlightprof for 5 seconds, to completion][:lighting]
[1:37:46][:Run hhlightprof in VTune][:lighting :profiling]
[1:39:41][Check the Hotspots of hhlightprof in VTune][:lighting :run :profiling]
[1:41:45][Microarchitecture Exploration in VTune][:run :profiling]
[1:43:15][:Run a Microarchitecture Exploration of hhlightprof in VTune][:lighting :profiling]
[1:54:00][:Run a :Memory Access analysis of hhlightprof in VTune][:lighting :profiling]
[1:55:09][Reflect on our isolated ray caster][:lighting :speech]
[1:55:41][Q&A][:speech]
[1:56:23][@yurasniper][Q: Might be a good idea to explain the difference between sampling and instrumentation profilers and how they work on some basic level, and why sampling :profiling is not a great idea, despite most people believing and saying that it is very good]
[1:58:39][@lucid_frost][Q: VTune organizes those metrics by something called the "top-down performance analysis methodology". There is a pretty detailed paper that introduced this that would likely help]
[1:58:49][@dragoonx6][Q: Have you ever tried using clang-cl? It's a drop-in MSVC compatible compiler that has much better codegen than MSVC CL. It's compatible with link.exe, but even lld-link will give you usable PDBs. When I used it in my ray tracer, it ended up being 15 times faster than with regular MSVC CL]
[2:01:00][@euphius][Q: Games like CS:GO go up to 300fps. Are they pretty well optimized? Seems like getting [~hero Handmade Hero] to that FPS would be hard?]
[2:01:43][@robgeel][Q: I think you never use the sphere sampling direction in hhlightprof, also when dumping boxes, you write out Solution->BoxCount * sizeof(Solution->Boxes), the sizeof takes the size of a pointer instead of a lighting_box]
[2:01:47][Fix the SampleDirB setting in TestRayCast()][:lighting]
[2:02:04][@lobsang2][Q: What's the status of meowhash? Will it be reaching a new version soon?][:hashing]
[2:02:51][@jim0_o][Q: Have you tried debugging why the stream loses so many frames when you move the character around?]
[2:03:44][@kniffel5][Q: What should meowhash (not) be used for?[ref
site=NohatCoder
page="Hash levels"
url=http://nohatcoder.dk/2019-05-19-1.html]][:hashing]
[2:07:27][@desu_used][Q: Are you sure the hash is "secure"? People have previously pointed out some issues with meowhash, if I recall correctly, generating collisions[ref
site=xxHash
page="Collision ratio comparison"
url=https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison]][:hashing]
[2:10:03][@brian_nevec][Q: What do you use meowhash for?][:hashing]
[2:10:14][@temdisponivel][Q: Would [~hero Handmade Hero] run on a 32-bit system as it is now, or would it need porting?]
[2:10:22][@dragoonx6][Even for security?][:hashing]
[2:10:58][@vtlmks][Q: Not using the SampleDir?][:lighting]
[2:11:01][Fix TestRayCast() to set (and use) RayD][:lighting]
[2:11:22][@kniffel5][Q: Is meowhash cross platform? For ARM, PowerPC, etc?]
[2:12:29][@mindmark42][Q: Any reason we're not using the checkerboard rendering?][:lighting]
[2:12:54][Begin to wind down the stream, with a plug of the upcoming [@naysayer88 Jon] and [@nothings Sean] talk[ref
author="Jonathan Blow"
publisher=Twitter
title="Tomorrow at 3pm Pacific time I'll be streaming another in-depth programming conversation, this time with Sean Barrett (@nothings). We'll start with the topic of making compilers go fast, but who knows where we'll end up. I will post the link when it happens."
url=https://twitter.com/Jonathan_Blow/status/1246143730706337792]][:speech]
[2:14:09][@rationalcoder][Q: You use meowhash for normal hash tables in your everyday code, strings, vectors, etc?][:hashing]
[2:14:39][Anticipate the [@naysayer88 Jon] and [@nothings Sean] talk[ref
site=twitch
page=naysayer88
url=https://twitch.tv/naysayer88]][:speech]
[2:15:32][Wind down the stream][:speech]
[/video]