cinera_handmade.network/cmuratori/hero/ray/ray01.hmml

[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=ray title="Multithreading" vod_platform=youtube id=ZAeU3Z0PmcU annotator=Miblo]
[0:09][Recap and set the stage for the day]
[1:15][ray.cpp: Rename RayCount to BounceCount]
[1:46][View our image and determine to perform better material processing and to optimise]
[8:28][ray.cpp: Figure out the resolution of clock_t[ref
    site="Microsoft Docs"
    page="clock"
    url="https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/clock"]]
[12:56][ray.cpp: Introduce a timer using clock() from time.h]
[19:15][Run our ray caster and see the timer results, noting that the ms/bounce remains constant for various bounce counts]
[20:55][Consider the performance gain we may achieve by multithreading]
[25:29][Determine to perform the usual way of doing ray tracing, using tiles that are square shaped]
[26:26][ray.cpp: Introduce RenderTile() and GetPixelPointer()]
[32:25][ray.cpp: Break up the world into multiple tiles]
[43:36][ray.cpp: Add TileRetiredCount to the world struct]
[48:38][ray.cpp: Inline the contents of RayCast() in RenderTile()]
[54:08][ray.h: Introduce work_queue and work_order to enable multithreading]
[56:17][Describe volatile]
[1:01:09][ray.cpp: Initialise a work_queue and push our RenderTile() calls onto it]
[1:06:55][Run our ray tracer to see that everything's good]
[1:07:14][ray.cpp: Make RenderTile() only take a work_queue and get its work_order off that queue itself]
[1:10:51][Run to see that all is well]
[1:12:09][ray.cpp: Set us up for multithreading by introducing LockedAddAndReturnPreviousValue() and CreateWorkThread()]
[1:21:09][ray.cpp: Implement CreateWorkThread() using CreateThread()[ref
    site=MSDN
    page="CreateThread function"
    url=https://msdn.microsoft.com/en-us/library/windows/desktop/ms682453.aspx][ref
        site=MSDN
        page="ThreadProc callback function"
        url=https://msdn.microsoft.com/en-us/library/windows/desktop/ms686736.aspx] and introduce WorkerThread()]
[1:27:08][Run to see that our speed has improved dramatically]
[1:28:25][ray.cpp: Make LockedAddAndReturnPreviousValue() call InterlockedExchangeAdd64()[ref
    site=MSDN
    page="InterlockedExchangeAdd function"
    url=https://msdn.microsoft.com/en-us/library/windows/desktop/ms683597.aspx]]
[1:29:27][Run to see our 5x speed-up]
[1:30:24][ray.cpp: Fuss with the CoreCount and Tile sizes to see if they affect the speed]
[1:32:38][Blackboard: Drain-out]
[1:35:23][ray.cpp: Introduce GetCPUCoreCount()[ref
    site="MSDN"
    page="SYSTEM_INFO structure"
    url="https://msdn.microsoft.com/en-us/library/windows/desktop/ms724958.aspx"]]
[1:39:18][win32_ray.cpp: Pull in Windows-specific functions from ray.cpp]
[1:42:29][build.bat: Prevent our program from running if compilation fails]
[1:47:04][ray.cpp: Crank up the RaysPerPixel from 16 to 512 and view our smooth image]
[1:48:20][Run our program on the command line]
[1:49:32][ray.h: Add RaysPerPixel and MaxBounceCount to work_queue]
[1:52:11][Run to see all the information we need, and note that the next step will be SIMD]
[1:53:34][Q&A][:speech]
[1:53:45][@dautor][Could you please make a portal (changing ray position and orientation upon hitting the portal to come out of the second portal)?]
[1:54:00][@syanoks][Do you assume the x86 memory model in your code?]
[1:54:27][@dautor][What about const volatile and register together in a single declaration? (I saw it once in an implementation for a BLE stack)]
[1:55:25][ray.cpp: Log stats to stderr]
[1:56:43][@macielda][What is the difference between Path Tracing and Ray Tracing?]
[1:59:24][@LongBoolean][Now that the raysPerPixel and maxBounceCount are parameters, you could potentially give different tiles different settings, to compare quality side by side on the same image?]
[1:59:44][@siltnamis][What do you think about OpenMP?]
[1:59:48][@3dextended][What is SIMD \[sic\]?]
[2:01:05][@seventh_chord][Do you get effects like bloom / lens flare "for free" if you implement more realistic camera stuff?]
[2:02:46][@butwhynot1][Isn't the uniform random sampling bad? I thought you need to use blue noise or Poisson-Disc sampling or something?]
[2:03:40][@syanoks][ARM memory model is extremely relaxed, though]
[2:03:56][@garryjohanson][So you mentioned that hyperthreading can schedule instructions on separate ALUs of the same core. Did I hear that correctly or did I misconstrue what you said in my mind?]
[2:06:30][@ttbjm][Would you change anything if you were using a Ryzen / Threadripper CPU that has 4 core core-complexes?]
[2:07:20][@macielda][Can you give a quick overview of how Importance Sampling works?]
[2:07:25][@ray_caster][Why were you trying to kill me? I was working very hard]
[2:07:36][@dautor][Could you maybe do a GJK implementation on a test stream some time?]
[2:07:45][@thisdrunkdane][Is AVX2 common enough to use it in, like, a game?]
[2:08:27][@gordolani][Will the raytracer program in the future use GPU to do the ray traces?]
[2:08:31][@y_ah][Where can we find the code?]
[2:08:46][@3dextended][Wouldn't it be more sufficient to make an entry for each ray to split up the work even more evenly and also this could work on the GPU?]
[2:09:16][@jim0_o][How much different is this raytracer to those with "live previews" where you see the image get better starting low res and goes higher and higher? (I assume they don't render line by line?)]
[2:10:23][@LongBoolean][Are there any things that you need to do to take extra advantage of hyperthreading CPUs or do you treat threading the same as a non-hyperthreading CPU?]
[2:11:33][The ARM memory model[ref
    site="Paul E. McKenney"
    page="Is Parallel Programming Hard, And, If So, What Can You Do About It?"
    url="https://arxiv.org/pdf/1701.00854.pdf"]][:research]
[2:18:50][That's it for today][:speech]
[/video]