[video output=day591 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Making a Stand-alone Lighting Performance Test" vod_platform=youtube id=Rj7nCMEuhMQ annotator=Miblo] [0:01][Recap and set the stage for the day][:speech] [1:00][Plug @x13pixels' ~RemedyBG version 0.3.0.0, with a brief history of Microsoft Visual Studio][:speech] [5:09][Conditional Breakpoints in ~RemedyBG][:run] [6:58][@x13pixels][I didn't end up doing that, no] [7:17][@x13pixels][It was already 15x faster just doing it the "normal" way. Well, okay, there are some tricks under the covers] [7:44][@x13pixels][Yup!] [7:48][Further love for ~RemedyBG[ref site=RemedyBG url=https://remedybg.itch.io/remedybg]][:run] [12:37][Demo the current state of the :lighting][:run] [13:51][30ms per frame][:lighting :performance :run] [15:00][Make ComputeLightPropagationWork() a TIMED_FUNCTION][:"debug system" :lighting] [16:00][Check the Threads :performance][:lighting :run] [18:58][Determine to reduce our time spent ray tracing][:lighting :performance :run] [19:50][Describe our two-branch RayCast()][:lighting :performance :research] [22:25][Why we separated the collision detection and hierarchy traversal code in RayCast()][:lighting :performance :research] [24:22][Inspect the assembly of RayCast()][:asm :lighting :performance :run] [26:21][Describe our k-d-tree-like SplitBox()][:lighting :performance :research] [27:05][Consider speeding up the hierarchy traversal code in RayCast()][:lighting :performance :research] [29:54][Launch VTune][:run] [30:59][Set up to write our :lighting data out to file][:research] [33:50][Make BuildSpatialPartitionForLighting() write out the :lighting boxes to file, introducing DEBUGDumpData() and a Dump platform_file_type][:"file io"] [46:57][Traverse the world out to the dungeon with a view to triggering a debug dump of the :lighting boxes][:"file io" :lighting :run] [48:50][~RemedyBG feature request: Editable values][:run] [49:06][Enable the 
LightBoxDumpTrigger][:"file io" :lighting] [49:22][Dump the :lighting boxes to file][:"file io" :run] [49:50][Create hhlightprof.cpp, adding it to build.bat][:lighting] [56:28][Invoke hhlightprof][:admin :lighting] [57:12][Fix hhlightprof to correctly get the DumpName][:lighting] [57:21][Add hhlightprof to ~RemedyBG][:admin] [58:52][@x13pixels][Might have to append EXE? Thought that worked, though] [58:57][Launch hhlightprof in ~RemedyBG][:lighting :run] [59:34][Introduce TestRayCast() in hhlightprof][:lighting] [1:02:36][Make hhlightprof set up the :lighting Solution from our dump] [1:06:44][Make hhlightprof initialise the SpecAtlas and DiffuseAtlas, and derive the BoxCount from the dump][:"file io" :lighting] [1:13:41][Hit a write access violation on the Solution][:lighting :run] [1:14:06][Initialise the Solution globally][:lighting] [1:14:29][Successfully :run hhlightprof][:lighting] [1:14:56][Step through hhlightprof][:lighting :run] [1:15:45][Fix the order of the arguments to fseek()][:"file io"] [1:16:03][Step through hhlightprof and inspect the Solution][:lighting :run] [1:17:26][Increase the BoxCount in an effort to allow room for all our child boxes][:lighting :memory] [1:18:06][Hit a read access violation on the Box->Radius in BuildSpatialPartitionForLighting()][:lighting :run] [1:19:00][Revert the BoxCount and instead allocate :memory for four times that number to allow room for child boxes][:lighting :memory] [1:19:27][Hit a read access violation on the Solution->tUpdateBlend in RayCast()][:lighting :run] [1:19:34][Make TestRayCast() initialise the Work][:lighting] [1:20:06][:Run hhlightprof successfully][:lighting] [1:20:22][Make TestRayCast() set up the :sampling sphere and cast many rays][:lighting] [1:23:19][:Run hhlightprof, casting all its rays][:lighting] [1:23:26][Prepare to cast enough rays to last a minimum of 10 seconds][:lighting :research] [1:25:15][Make TestRayCast() multiply the rays cast by 256][:lighting] [1:25:25][:Run hhlightprof for 
just over 10 seconds, without completing][:lighting] [1:25:36][Decrease the ray multiplier from 256 to 32 in TestRayCast()][:lighting] [1:25:45][:Run hhlightprof for almost 10 seconds, without completing][:lighting] [1:25:54][Decrease the ray multiplier from 32 to 8 in TestRayCast()][:lighting] [1:26:02][:Run hhlightprof for 9 seconds, to completion][:lighting] [1:26:12][Decrease the ray multiplier from 8 to 4 in TestRayCast()][:lighting] [1:26:20][:Run hhlightprof for 5 seconds, to completion][:lighting] [1:26:25][Prepare to time our ray caster in VTune][:lighting :speech] [1:28:12][Create a project in VTune for hhlightprof][:admin :lighting] [1:28:49][A few words on the sheer plethora of performance counters][:speech :profiling] [1:30:44][Set up our project for hhlightprof][:admin :lighting] [1:34:03][:Run hhlightprof in VTune][:lighting :profiling] [1:36:59][:Run a -O2 build of hhlightprof in VTune][:lighting :profiling] [1:37:11][Increase the ray multiplier from 8 to 32 in TestRayCast()][:lighting] [1:37:28][:Run hhlightprof for under 1 second, to completion][:lighting] [1:37:30][Increase the ray multiplier from 32 to 256 in TestRayCast()][:lighting] [1:37:39][:Run hhlightprof for 5 seconds, to completion][:lighting] [1:37:46][:Run hhlightprof in VTune][:lighting :profiling] [1:39:41][Check the Hotspots of hhlightprof in VTune][:lighting :run :profiling] [1:41:45][Microarchitecture Exploration in VTune][:run :profiling] [1:43:15][:Run a Microarchitecture Exploration of hhlightprof in VTune][:lighting :profiling] [1:54:00][:Run a :Memory Access analysis of hhlightprof in VTune][:lighting :profiling] [1:55:09][Reflect on our isolated ray caster][:lighting :speech] [1:55:41][Q&A][:speech] [1:56:23][@yurasniper][Q: Might be a good idea to explain the difference between sampling and instrumentation profilers and how they work on some basic level, and why sampling :profiling is not great idea, despite most people believing and saying that it is very good] 
[1:58:39][@lucid_frost][Q: VTune organizes those metrics by something called the "top-down performance analysis methodology". There is a pretty detailed paper that introduced this that would likely help] [1:58:49][@dragoonx6][Q: Have you ever tried using clang-cl? It's a drop-in MSVC compatible compiler that has much better codegen than MSVC CL. It's compatible with link.exe, but even lld-link will give you usable PDBs. When I used it in my ray tracer, it ended up being 15 times faster than with regular MSVC CL] [2:01:00][@euphius][Q: Games like CS:GO go up to 300fps. Are they pretty good optimized? Seems like getting [~hero Handmade Hero] to that FPS would be hard?] [2:01:43][@robgeel][Q: I think you never use the sphere sampling direction in hhlightprof, also when dumping boxes, you write out Solution->BoxCount * sizeof(Solution->Boxes), the sizeof takes the size of a pointer instead of a lighting_box] [2:01:47][Fix the SampleDirB setting in TestRayCast()][:lighting] [2:02:04][@lobsang2][Q: What's the status of meowhash? Will it be reaching a new version soon?][:hashing] [2:02:51][@jim0_o][Q: Have you tried debugging why the stream loses so many frames when you move the character around?] [2:03:44][@kniffel5][Q: What should meowhash (not) be used for?[ref site=NohatCoder page="Hash levels" url=http://nohatcoder.dk/2019-05-19-1.html]][:hashing] [2:07:27][@desu_used][Q: Are you sure the hash is "secure"? People have previously pointed out some issues with meowhash, if I recall correctly, generating collisions[ref site=xxHash page="Collision ratio comparison" url=https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison]][:hashing] [2:10:03][@brian_nevec][Q: What do you use meowhash for?][:hashing] [2:10:14][@temdisponivel][Q: Would [~hero Handmade Hero] run on a 32-bit system as it is now, or would it need porting?] 
[2:10:22][@dragoonx6][Even for security?][:hashing] [2:10:58][@vtlmks][Q: Not using the SampleDir?][:lighting] [2:11:01][Fix TestRayCast() to set (and use) RayD][:lighting] [2:11:22][@kniffel5][Q: Is meowhash cross platform? For ARM, PowerPC, etc?] [2:12:29][@mindmark42][Q: Any reason we're not using the checkerboard rendering?][:lighting] [2:12:54][Begin to wind down the stream, with a plug of the upcoming [@naysayer88 Jon] and [@nothings Sean] talk[ref author="Jonathan Blow" publisher=Twitter title="Tomorrow at 3pm Pacific time I'll be streaming another in-depth programming conversation, this time with Sean Barrett (@nothings). We'll start with the topic of making compilers go fast, but who knows where we'll end up. I will post the link when it happens." url=https://twitter.com/Jonathan_Blow/status/1246143730706337792]][:speech] [2:14:09][@rationalcoder][Q: You use meowhash for normal hash tables in your everyday code, strings, vectors, etc?][:hashing] [2:14:39][Anticipate the [@naysayer88 Jon] and [@nothings Sean] talk[ref site=twitch page=naysayer88 url=https://twitch.tv/naysayer88]][:speech] [2:15:32][Wind down the stream][:speech] [/video]