[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Abstracting the Work Queue" vod_platform=youtube id=ZAZV_PGlQ0s annotator=Miblo annotator=dspecht]
[0:07][We are absolute control freaks here, people][quote 99]
[1:25][Recap and set the stage for today]
[3:46][win32_handmade.cpp: Introduce DoWorkerWork]
[5:43][Let our normal thread do work]
[6:55][Run and see what the threads are doing]
[7:37][Follow the compression oriented programming approach]
[9:10][handmade_render_group.cpp: Figure out a way to do TiledRenderGroupToOutput on multiple threads]
[10:38][handmade_platform.h: Consider pulling in work_queue_entry]
[11:54][win32_handmade.cpp: Rewrite PushString as AddWorkQueueEntry]
[15:18][Note the necessity of _mm_sfence]
[16:13][Pull work_queue_entry down into the test code]
[17:18][Split DoWorkerWork in two]
[20:34][Think]
[21:18][Put while(EntryCount != EntryCompletionCount) into QueueWorkStillInProgress]
[22:49][Rename and finish writing these functions]
[30:48][Compile and run and see what the threads are doing]
[31:14][Discuss our options]
[34:31][Rename GetNextWorkQueueEntry to CompleteAndGetNextWorkQueueEntry and make it take work_queue_entry Completed]
[35:29][Rearrange ThreadProc slightly]
[37:24][Massage DoWorkerWork]
[37:51][Tweak the QueueWorkStillInProgress loop]
[38:55][Compile and consider removing one more call]
[40:12][Go for it and make the work_queue two separate things]
[44:22][Run this again]
[45:19][handmade_platform.h: Hoist these functions in]
[46:01][Think about this a little bit more]
[47:11][handmade_render_group.cpp: Write the usage code first]
[50:06][Compile and express hate for const][quote 100]
[50:40][Finish writing TiledRenderGroupToOutput]
[54:58][Compile and run and crash][quote 101]
[55:22][Moment of realisation: Gotta increment by the correct value]
[55:32][Recap and glimpse into the multithreaded future]
[56:09][Q&A]
[57:35][@BrainCruser][Will you start new threads for every queue that you make?]
[59:29][@niegrfiegr0][Still don't understand the use of volatile and memory barrier]
[59:53][Blackboard: Memory and Code Fences]
[1:07:46][@kelimion][Can Entry.IsValid be removed and replaced with a test to see if Entry.Data != NULL?]
[1:08:04][@kil4h][What is your take on Naughty Dog's approach using fibers (+ manual management) and thread affinity to core instead of using classic worker / job approach for multithreaded gameplay?]
[1:08:23][@robrobby][The work queue will take any function to do it multithreaded? Does the function need to be special so that this will work?]
[1:08:53][@waterlimon][Please write a lock free queue, even though I don't know what those are and if you used one]
[1:09:33][@boogie0815][How many CPU cycles does spawning a thread cost? Or better: what's the minimum amount of cycles to work in 2 threads to gain speed?]
[1:10:45][@gasto5][I don't understand why you call it a queue if it is done potentially simultaneously]
[1:11:46][@flyingwafflenyc][Wasn't there already a bit of thread-related code in the win32 file?]
[1:12:04][@waterlimon][Will you add a cool graph over time that shows what task (e.g. from which subsystem) each thread is working on at each moment?]
[1:14:19][@zuurr_][Is false sharing between the entries in the work queue potentially problematic (from a performance standpoint)?]
[1:14:34][@popcorn0x90][Does volatile clear the assembly registers by pushing them into the stack and then restore by popping?]
[1:14:56][@grubuck][Why would you want a compiler fence and not a process fence, and vice versa?]
[1:15:41][@Pseudonym73][Shouldn't _mm_fence() imply a compiler fence? Surely there's no point otherwise...]
[1:16:44][@thordura][Have you implemented friction?]
[1:16:54][@jameswidman][So, thread management is a bit like memory management (in that you want to set it up ahead of time rather than allocating them on-demand)]
[1:17:57][@robrobby][The code to ask how many threads will be done simultaneously by the processor is to be added?]
[1:18:35][@alephant][Is it possible that a work queue entry spawn another work queue entry?]
[1:19:26][@zuurr_][(Not an expert at all, which was part of why I asked the question) False sharing causes the processor to skip the cache when different threads access stuff on the same cache line]
[1:21:41][@Pseudonym73][Actually, I can think of one use case for a compiler fence without a memory fence: writing to CPU special registers like control registers or MSRs]
[1:22:34][@kemosabe76][Getting back into C/C++ coding after many years. Don't know why you are mixing C-style structs and C++ structs?]
[1:22:52][Close things down]
[/video]