[video output=day592 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Capturing the Entire Lighting Data" vod_platform=youtube id=YTIz_eV_BsE annotator=Miblo] [0:02][Welcome to the stream][:speech] [0:08][Plug the Meow the Infinite printed comic Kickstarter[ref site=Kickstarter page="Meow the Infinite: Book One" url=https://www.kickstarter.com/projects/annarettberg/meow-the-infinite-book-one] and the related fun videos at Molly Rocket's YouTube channel[ref site=YouTube page="Molly Rocket" url=https://www.youtube.com/c/MollyRocket]][:research] [4:06][Dive into the code][:speech] [4:46][Fix hhlightprof to allocate :memory for four times the BoxCount to allow room for child boxes][:lighting] [5:12][Determine to pseudo-verify our standalone ray tracer][:lighting :speech] [6:44][Determine to learn VTune][:profiling :speech] [7:19][:Run hhlightprof for 16 seconds, to completion][:lighting] [7:41][Decrease the ray multiplier from 256 to 64 in TestRayCast()][:lighting] [7:51][:Run hhlightprof for 4 seconds, to completion][:lighting] [8:00][Increase the ray multiplier from 64 to 128 in TestRayCast()][:lighting] [8:08][:Run hhlightprof for 7 seconds, to completion][:lighting] [8:15][Launch VTune and hunt its interface for custom analysis capabilities][:profiling :run] [11:45][Launch VTune as administrator][:profiling :run] [12:11][Consult VTune's documentation[ref site="Intel Developer Zone" page="Intel® VTune™ Profiler: Featured Documentation" url=https://software.intel.com/en-us/vtune/documentation/featured-documentation]][:profiling :research] [14:09][:Research VTune's Hardware Event-based Sampling Collection[ref site="Intel Developer Zone" page="Intel® VTune™ Profiler User Guide: Hardware Event-based Sampling Collection" url=https://software.intel.com/en-us/vtune-help-hardware-event-based-sampling-collection]][:profiling] [15:45][:Research VTune's Microarchitecture Exploration Analysis for Hardware Issues[ref site="Intel Developer 
Zone" page="Intel® VTune™ Profiler User Guide: Microarchitecture Exploration Analysis for Hardware Issues" url=https://software.intel.com/en-us/vtune-help-general-exploration-analysis]][:profiling] [16:02][:Research VTune's Custom Analysis[ref site="Intel Developer Zone" page="Intel® VTune™ Profiler User Guide: Custom Analysis" url=https://software.intel.com/en-us/vtune-help-custom-analysis]][:profiling] [16:27][Create a new "Fruit Salad x12" custom analysis in VTune][:admin :profiling] [18:15][Some words on x64 performance counters][:timing :speech] [21:14][Continue to create our "Fruit Salad x12" custom analysis in VTune][:admin :profiling] [23:25][Rename "Fruit Salad x12" to "No Counters Template" and clone it as "The Ultimate Fruit Salad"][:admin :profiling] [24:36][Enable our desired counters in "The Ultimate Fruit Salad"][:admin :profiling] [25:42][:Run hhlightprof in VTune][:lighting :profiling] [25:58][Consult our VTune Hardware Events analysis][:lighting :profiling :run] [27:44][A few words on 4 µOPs per cycle issuance][:hardware :speech] [29:54][Continue to consult our VTune Hardware Events analysis][:lighting :profiling :run] [30:44][Rename "The Ultimate Fruit Salad" to "Instructions Per Clock" and swap out the UOPS_EXECUTED.CORE_CYCLES_NONE counter for UOPS_EXECUTED.CORE_CYCLES_GE_4][:admin :profiling] [31:58][:Run our "Instructions Per Clock" analysis of hhlightprof][:lighting :profiling] [32:13][Consult our "Instructions Per Clock" VTune analysis][:lighting :profiling :run] [32:42][Interpret our "Instructions Per Clock" VTune analysis][:lighting :profiling :run] [34:36][Get the Work In More Efficiently vs Do the Work More Efficiently][:performance :speech] [35:04][Create a new "Arithmetic Port Usage" custom analysis in VTune][:admin :profiling] [38:16][Understanding port usage with uops.info[ref site=uops.info url=https://uops.info/table.html]][:profiling :research] [43:08][Enable the counters of ports 0, 1 and 5, renaming "Arithmetic Port Usage" to 
"Float Port Usage"][:admin :profiling] [44:08][:Run our "Float Port Usage" analysis of hhlightprof][:lighting :profiling] [44:23][Consult our "Float Port Usage" VTune analysis][:lighting :profiling :run] [46:54][Create a new "All Ports Usage" custom analysis in VTune][:admin :profiling] [47:44][:Run our "All Ports Usage" analysis of hhlightprof][:lighting :profiling] [48:00][Consult our "All Ports Usage" VTune analysis, noting that port 4 is the write port[ref site=uops.info url=https://uops.info/table.html]][:lighting :profiling :run] [50:26][Interpret our "All Ports Usage" VTune analysis, desiring to Get the Work In More Efficiently][:lighting :profiling :run] [52:34][Determine to validate our code][:lighting :profiling :speech] [53:39][Step through an -Od build of hhlightprof][:lighting :profiling :run] [58:38][Disable the LightBoxDumpTrigger][:"file io" :lighting] [58:46][Continue to step through hhlightprof][:lighting :profiling :run] [1:02:25][Make EndLightingComputation() call DEBUGDumpData() on the SpecAtlas, DiffuseAtlas, and the entire :lighting Solution][:"file io"] [1:07:31][Dump the :lighting data to file][:"file io" :run] [1:08:29][Check out our :lighting debug dumps][:admin] [1:08:52][Make TestRayCast() in hhlightprof do the full ComputeLightPropagationWork(), replacing Commands in the lighting_work with the DiffuseAtlas and SpecAtlas][:"data structure" :lighting] [1:14:53][Introduce LoadEntireFile() in hhlightprof, and load in all our :lighting dumps][:"file io"] [1:21:01][Set up hhlightprof to use our loaded dumps][:lighting] [1:25:12][Hit a write access violation on the Solution->SamplingSpheres][:lighting :run] [1:25:41][Hit a write access violation on Byte in ZeroSize()][:lighting :run] [1:25:45][Allocate :memory for the Solution->Works][:lighting] [1:28:11][:Run hhlightprof successfully][:lighting] [1:28:21][Make hhlightprof validate our SpecAtlas texels][:lighting] [1:31:14][:Run hhlightprof with non-zero errors][:lighting] [1:31:57][:Run 
hhlightprof in -O2 with non-zero errors][:lighting] [1:32:02][Make hhlightprof print the TexelCount][:lighting] [1:32:28][:Run hhlightprof with non-zero errors][:lighting] [1:32:51][Try making hhlightprof overwrite the tUpdateBlend as 1.0f][:lighting] [1:33:49][:Run hhlightprof with a similar number of errors][:lighting] [1:34:32][Double-check our SpecAtlas validity checking][:lighting :research] [1:35:04][Try making hhlightprof overwrite the tUpdateBlend as 0.0f][:lighting] [1:35:22][:Run hhlightprof with a changed number of errors][:lighting] [1:35:41][Try making hhlightprof overwrite the tUpdateBlend as 100.0f][:lighting] [1:35:55][:Run hhlightprof with an infinite number of errors][:lighting] [1:36:03][Try making hhlightprof overwrite the tUpdateBlend as 10.0f][:lighting] [1:36:14][:Run hhlightprof with many errors][:lighting] [1:36:16][Try making hhlightprof overwrite the tUpdateBlend as 1.0f][:lighting] [1:36:22][:Run hhlightprof with few errors][:lighting] [1:36:33][Prevent hhlightprof from overwriting the tUpdateBlend][:lighting] [1:36:41][Continue to scour hhlightprof and [~hero Handmade Hero]'s :lighting system for inconsistencies][:research] [1:43:47][Make TestRayCast() set the Work->VoxelX][:lighting] [1:44:39][:Run hhlightprof, still with errors][:lighting] [1:44:54][Introduce InternalLightingCore() to perform our dumps][:lighting] [1:49:22][:Run hhlightprof, still with errors][:lighting] [1:49:33][Q&A][:speech] [1:49:58][@somebody_took_my_name][Q: Is it still building under the overdose flag?] 
[1:50:32][@uplinkcoder][Q: Maybe capture a single threaded run?][:lighting :threading] [1:52:05][@wakeuphate][Q: On your UOPS info table[ref site=uops.info url=https://uops.info/table.html] earlier in the stream, you were set to Skylake rather than Kaby Lake, just in case that's why you were seeing issues then] [1:55:29][@dataqsloth][Q: Have you considered releasing your C course earlier as a beta release (even full price) just because so many of us and people we know are currently on quarantine!] [1:55:59][@ikojan][Q: How long would the course be?] [1:57:24][@vaualbus][Q: Is it really bad on x86 to do unaligned pointers? Because I have a packed structure with pointers and the compiler is warning me about unaligned pointers. (Apparently there is a declspec to allow unaligned pointers?)][:memory] [2:00:15][@leo0230][Q: Do instructions with memory operands actually use the arithmetic units before the data is ready?[ref site=uops.info url=https://uops.info/table.html]][:hardware] [2:03:03][@enyo_enev][Q: Is it going to be useful for hardcore fans of [~hero Handmade Hero]?] [2:03:51][@ikojan][Q: Will the course be project-based, where throughout the course you'll be making x or y project?] [2:04:00][@mindmark42][Q: Is there an API to access the performance counters so you don't have to use VTune?] [2:05:45][Wrap it up with a plug of the Meow the Infinite printed comic Kickstarter[ref site=Kickstarter page="Meow the Infinite: Book One" url=https://www.kickstarter.com/projects/annarettberg/meow-the-infinite-book-one] and related fun videos at Molly Rocket's YouTube channel[ref site=YouTube page="Molly Rocket" url=https://www.youtube.com/c/MollyRocket]][:speech] [/video]