cinera_handmade.network/cmuratori/hero/ray/ray02.hmml

[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=ray title="Replacing rand() and Preparing for SIMD" vod_platform=youtube id=xBBEkn1x7So annotator=Miblo]
[0:06][Recap and set the stage for the day][:speech]
[1:38][Note that we're building in optimised mode][:speech]
[2:15][:Run and see our output image]
[3:39][ray.cpp: Walk through the code][:speech]
[5:23][Consider two areas of :optimisation: 1) Bounding Volume Hierarchy][:speech]
[6:57][2) Using better :math operations][:optimisation :speech]
[7:42][Step into RenderTile() and inspect the :asm, noting down routines to improve]
[15:51][Check out PCG, A Family of Better Random Number Generators[ref
site="PCG, A Family of Better Random Number Generators"
url=http://www.pcg-random.org/] with a recommendation to read the full paper[ref
author="Melissa E. O'Neill"
title="PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation"
url=http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf]][:prng :research]
[24:13][Check out the x86 SSE2 shift-left instructions[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/]][:isa :research]
[27:59][Read 6.3 - Specific Implementations[ref
author="Melissa E. O'Neill"
title="PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation"
url=http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf] and the Xorshift wiki article[ref
site=Wikipedia
page=Xorshift
url=https://en.wikipedia.org/wiki/Xorshift]][:prng :research]
[31:15][Introduce XOrShift32() from Wikipedia[ref
site=Wikipedia
page=Xorshift
url=https://en.wikipedia.org/wiki/Xorshift] and look into doing this in 64-bit[ref
author="Melissa E. O'Neill"
title="PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation"
url=http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf]]
[37:45][:Run our program to get a benchmark :timing]
[38:59][Replace rand() with our new XOrShift32(), packing Entropy in the work_order struct][:optimisation :prng]
[46:39][:Run to see no obvious problems with our output, and note our dramatically improved :performance]
[48:47][Step into the code and inspect the :asm to see a lot of mulss instructions]
[51:30][Introduce CastSampleRays() to do some of the work of RenderTile()][:lighting :rendering]
[58:21][:Run to see that we lose some speed]
[59:12][Make RenderTile() only use a random_series in its inner loop][:lighting :rendering]
[1:00:07][:Run to see that that's a little bit better]
[1:01:07][Rename cast_result to cast_state which contains both the input and output data][:lighting :rendering]
[1:08:12][:Run to see some busted imagery]
[1:08:58][Fix RenderTile() to correctly fill out the cast_state State][:lighting :rendering]
[1:12:20][:Run to see that that helps]
[1:13:05][Consider how to perform this ray casting wide][:lighting :optimisation :rendering :speech]
[1:18:04][Transform CastSampleRays() to handle the notion of operating wide][:lighting :optimisation :rendering :speech]
[1:19:06][:Run to see that it runs roughly four times faster, and that the image now contains tile-boundary artifacts]
[1:21:08][Temporarily revert RandomUnilateral() to use rand()][:prng]
[1:21:38][:Run to see no artifacts, and note that the XOrShift32() needs improving]
[1:22:46][Sketch in the code to enable CastSampleRays() to operate wide][:lighting :optimisation :rendering]
[1:33:17][Describe our current situation][:speech]
[1:34:11][Set up CastSampleRays() to let all rays in all lanes finish][:lighting :optimisation :rendering]
[1:38:21][Consider how to track the materials wide][:lighting :optimisation :rendering :speech]
[1:40:08][Set up CastSampleRays() to track the materials wide and collate all the computations][:lighting :optimisation :rendering]
[1:52:29][Create ray_lane.h to #define the lanes, and introduce RandomBilateralLane(), various permutations of ConditionalAssign(), a Max(), MaskIsZeroed() and versions of HorizontalAdd()][:optimisation :prng]
[2:03:39][:Run and see totally busted imagery]
[2:04:23][Build in debug mode and on one core]
[2:05:40][Step in to CastSampleRays() and inspect its values]
[2:05:56][Make CastSampleRays() set FilmX and FilmY to their centres][:lighting :rendering]
[2:07:14][Step in to CastSampleRays() and see that the State->Series and Order->Entropy are both 0]
[2:08:36][Make CastSampleRays() offset the Entropy and use different random series per ray][:prng]
[2:09:27][Step in to CastSampleRays() and note that the ConditionalAssign() is wrong]
[2:10:44][Make ConditionalAssign() zero the Mask if there is nothing set in it]
[2:11:20][Step in to ConditionalAssign() to see that that is better]
[2:11:41][:Run to see how the picture looks]
[2:13:24][View the image][:run]
[2:13:49][Reduce the RayCount and increase the CoreCount][:lighting :rendering]
[2:14:49][Investigate the summation][:lighting :rendering]
[2:17:53][Make CastSampleRays() correctly set the LaneMask][:lighting :rendering]
[2:18:35][:Run and see a more correct image]
[2:18:52][Switch back to the optimised version, with more RaysPerPixel]
[2:19:09][:Run to see that we're darker]
[2:20:13][Correctly set the LaneWidth][:lighting :rendering]
[2:21:20][:Run and see that the images are basically indistinguishable]
[2:22:12][Set up to support a constrained set of LANE_WIDTH values][:optimisation]
[2:30:05][:Run to see that XOrShift32() is actually fine]
[2:31:45][Do LANE_WIDTH==8 too][:optimisation]
[2:32:43][Q&A][:speech]
[2:33:46][@yurasniper][Q: How would one implement something like bloom effect in a raytracer?][:lighting :rendering]
[2:39:46][:Run our program to capture its :performance statistics]
[2:42:07][@macielda][Q: Is the Halton 2,3 sequence a good way to generate sample positions? I've heard about some people using it. It is a low discrepancy series][:prng]
[2:43:11][Rename our image and stat files][:admin]
[2:44:30][@vaualbus][Q: When did you learn this way of doing SIMD? I remember in [~hero Handmade Hero] when we optimized the renderer we used __m128 everywhere][:optimisation]
[2:46:05][@macielda][Q: What is your take on AA methods? I'm currently looking for one for my game. I see The Witness has MSAA option only (no FXAA, TXAA and friends)?][:rendering]
[2:46:31][@longboolean][Q: Are there any machines with :hardware RNG that just puts random values into a register with one instruction?[ref
site=Wikipedia
page=RdRand
url=https://en.wikipedia.org/wiki/RdRand]][:prng]
[2:48:44][@pseudonym73][Q: G'day, long time no stream. Low-discrepancy sequences do exhibit blue noise behaviours if you do them right, but their main advantage is that you can access the quasi-random streams in an arbitrary order. Not really relevant yet. Also, you can do better than 2,3 Halton][:prng]
[2:49:37][@macielda][Q: Do shader languages expose things like "Conditional Assign"?][:language]
[2:51:14][Ensure that everything is in good shape][:admin]
[2:52:14][Shut down][:speech]
[/video]
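
For readers following the :prng annotations above, here is a minimal sketch of the kind of generator they describe. It is not the code written on stream. It assumes the 13/17/5 shift triple given for the 32-bit variant in the Wikipedia Xorshift article, and the `State` field name and the exact bodies of RandomUnilateral() and RandomBilateral() are hypothetical. Only the names XOrShift32(), random_series and RandomUnilateral() come from the annotations.

```cpp
#include <cstdint>

// Hypothetical stand-in for the random_series mentioned in the annotations.
struct random_series
{
    uint32_t State; // must be seeded non-zero: zero is a fixed point of xorshift
};

// One 32-bit xorshift step, using the 13/17/5 shift triple from the
// Wikipedia Xorshift article (an assumption, not confirmed by the annotations).
static uint32_t XOrShift32(random_series *Series)
{
    uint32_t X = Series->State;
    X ^= X << 13;
    X ^= X >> 17;
    X ^= X << 5;
    Series->State = X;
    return X;
}

// Map the 32-bit result to a float in [0, 1] (hypothetical body).
static float RandomUnilateral(random_series *Series)
{
    return (float)XOrShift32(Series) / (float)UINT32_MAX;
}

// Map the 32-bit result to a float in [-1, 1] (hypothetical body).
static float RandomBilateral(random_series *Series)
{
    return -1.0f + 2.0f*RandomUnilateral(Series);
}
```

A series whose state is 0 returns 0 forever, which is one reason a zero Entropy value like the one observed at 2:07:14 is a problem for this family of generators. Giving each work order, and each ray within it, a distinct non-zero seed avoids that.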