From d599e075c2af597bb3c618bf1ac4e306e4da5e8f Mon Sep 17 00:00:00 2001
From: Matt Mascarenhas
Date: Thu, 28 Dec 2017 18:35:32 +0000
Subject: [PATCH] Annotate ray02

---
 cmuratori/hero/ray/ray02.hmml | 93 +++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 cmuratori/hero/ray/ray02.hmml

diff --git a/cmuratori/hero/ray/ray02.hmml b/cmuratori/hero/ray/ray02.hmml
new file mode 100644
index 0000000..504fc05
--- /dev/null
+++ b/cmuratori/hero/ray/ray02.hmml
@@ -0,0 +1,93 @@
+[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=ray title="Replacing rand() and Preparing for SIMD" vod_platform=youtube id=dpvrPYdTkPw annotator=Miblo]
+[0:06][Recap and set the stage for the day][:speech]
+[1:38][Note that we're building in optimised mode][:speech]
+[2:15][:Run and see our output image]
+[3:39][ray.cpp: Walk through the code][:speech]
+[5:23][Consider two areas of :optimisation: 1) Bounding Volume Hierarchy][:speech]
+[6:57][2) Using better :math operations][:optimisation :speech]
+[7:42][Step into RenderTile() and inspect the :asm, noting down routines to improve]
+[15:51][Check out PCG, A Family of Better Random Number Generators[ref
+ site="PCG, A Family of Better Random Number Generators"
+ url=http://www.pcg-random.org/] with a recommendation to read the full paper[ref
+ author="Melissa E. O’Neill"
+ title="PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation"
+ url=http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf]][:prng :research]
+[24:13][Check out the x86 SSE2 shift-left instructions[ref
+ site=Intel
+ page="Intel Intrinsics Guide"
+ url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:isa :research]
+[27:59][Read 6.3 - Specific Implementations[ref
+ author="Melissa E. O’Neill"
+ title="PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation"
+ url=http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf] and the Xorshift wiki article[ref
+ site=Wikipedia
+ page=Xorshift
+ url=https://en.wikipedia.org/wiki/Xorshift]][:prng :research]
+[31:15][Introduce XOrShift32() from Wikipedia[ref
+ site=Wikipedia
+ page=Xorshift
+ url=https://en.wikipedia.org/wiki/Xorshift] with a check into doing this in a 64-bit[ref
+ author="Melissa E. O’Neill"
+ title="PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation"
+ url=http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf]]
+[37:45][:Run our program to get a benchmark :timing]
+[38:59][Replace rand() with our new XOrShift32(), packing Entropy in the work_order struct][:optimisation :prng]
+[46:39][:Run to see no obvious problems with our output, and note our dramatically improved :performance]
+[48:47][Step into the code and inspect the :asm to see a lot of mulss calls]
+[51:30][Introduce CastSampleRays() to do some of the work of RenderTile()][:lighting :rendering]
+[58:21][:Run to see that we lose some speed]
+[59:12][Make RenderTile() only use a random_series in its inner loop][:lighting :rendering]
+[1:00:07][:Run to see that that's a little bit better]
+[1:01:07][Rename cast_result to cast_state which contains both the input and output data][:lighting :rendering]
+[1:08:12][:Run to see some busted imagery]
+[1:08:58][Fix RenderTile() to correctly fill out the cast_state State][:lighting :rendering]
+[1:12:20][:Run to see that that helps]
+[1:13:05][Consider how to perform this ray casting wide][:lighting :optimisation :rendering :speech]
+[1:18:04][Transform CastSampleRays() to handle the notion of operating wide][:lighting :optimisation :rendering :speech]
+[1:19:06][:Run to see that it runs roughly four times faster, and that the image now contains tile-boundary artifacts]
+[1:21:08][Temporarily revert RandomUnilateral() to use rand()][:prng]
+[1:21:38][:Run to see no artifacts, and note that the XOrShift32() needs improving]
+[1:22:46][Sketch in the code to enable CastSampleRays() to operate wide][:lighting :optimisation :rendering]
+[1:33:17][Describe our current situation][:speech]
+[1:34:11][Set up CastSampleRays() to let all rays in all lanes finish][:lighting :optimisation :rendering]
+[1:38:21][Consider how to track the materials wide][:lighting :optimisation :rendering :speech]
+[1:40:08][Set up CastSampleRays() to track the materials wide and collate all the computations][:lighting :optimisation :rendering]
+[1:52:29][Create ray_lane.h to #define the lanes, and introduce RandomBilateralLane(), various permutations of ConditionalAssign(), a Max(), MaskIsZeroed() and versions of HorizontalAdd()][:optimisation :prng]
+[2:03:39][:Run and see totally busted imagery]
+[2:04:23][Build in debug mode and on one core]
+[2:05:40][Step in to CastSampleRays() and inspect its values]
+[2:05:56][Make CastSampleRays() set FilmX and FilmY to their centres][:lighting :rendering]
+[2:07:14][Step in to CastSampleRays() and see that the State->Series and Order->Entropy are both 0]
+[2:08:36][Make CastSampleRays() offset the Entropy and use different random series per ray][:prng]
+[2:09:27][Step in to CastSampleRays() and note that the ConditionalAssign() is wrong]
+[2:10:44][Make ConditionalAssign() zero the Mask if there is nothing set in it]
+[2:11:20][Step in to ConditionalAssign() to see that that is better]
+[2:11:41][:Run to see how the picture looks]
+[2:13:24][View the image][:run]
+[2:13:49][Reduce the RayCount and increase the CoreCount][:lighting :rendering]
+[2:14:49][Investigate the summation][:lighting :rendering]
+[2:17:53][Make CastSampleRays() correctly set the LaneMask][:lighting :rendering]
+[2:18:35][:Run and see a more correct image]
+[2:18:52][Switch back to the optimised version, with more RaysPerPixel]
+[2:19:09][:Run to see that we're darker]
+[2:20:13][Correctly set the LaneWidth][:lighting :rendering]
+[2:21:20][:Run and see that the images are basically indistinguishable]
+[2:22:12][Set up to support a constrained set of LANE_WIDTH values][:optimisation]
+[2:30:05][:Run to see that XOrShift32() is actually fine]
+[2:31:45][Do LANE_WIDTH==8 too][:optimisation]
+[2:32:43][Q&A]
+[2:33:46][@yurasniper][Q: How would one implement something like bloom effect in a raytracer?][:lighting :rendering]
+[2:39:46][:Run our program to capture its :performance statistics]
+[2:42:07][@macielda][Q: Is the Halton 2,3 sequence a good way to generate sample positions? I've heard about some people using it. It is a low discrepancy series][:prng]
+[2:43:11][Rename our image and stat files][:admin]
+[2:44:30][@vaualbus][Q: When you learn this way of doing SIMD? I remember in [~hero Handmade Hero] when we had optimized the renderer we use __m128 every way][:optimisation]
+[2:46:05][@macielda][Q: What is your take on AA methods? I'm currently looking for one for my game. I see The Witness has MSAA option only (no FXAA, TXAA and friends)?][:rendering]
+[2:46:31][@longboolean][Q: Are there any machines with :hardware RNG that just puts random values into a register with one instruction?[ref
+ site=Wikipedia
+ page=RdRand
+ url=https://en.wikipedia.org/wiki/RdRand]][:prng]
+[2:48:44][@pseudonym73][Q: G'day, long time no stream. Low-discrepancy sequences do exhibit blue noise behaviours if you do them right, but their main advantage is that you can access the quasi-random streams in an arbitrary order. Not really relevant yet. Also, you can do better than 2,3 Halton][:prng]
+[2:49:37][@macielda][Q: Do shader languages expose things like "Conditional Assign"?][:language]
+[2:51:14][Ensure that everything is in good shape][:admin]
+[2:52:14][Shut down][:speech]
+[/video]