[video output=day125 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Abstracting the Work Queue" vod_platform=youtube id=ZAZV_PGlQ0s annotator=Miblo annotator=dspecht]
[0:07][We are absolute control freaks here, people][quote 99]
[1:25][Recap and set the stage for today]
[3:46][win32_handmade.cpp: Introduce DoWorkerWork]
[5:43][Let our normal thread do work]
[6:55][Run and see what the threads are doing]
[7:37][Follow the compression oriented programming approach]
[9:10][handmade_render_group.cpp: Figure out a way to do TiledRenderGroupToOutput on multiple threads]
[10:38][handmade_platform.h: Consider pulling in work_queue_entry]
[11:54][win32_handmade.cpp: Rewrite PushString as AddWorkQueueEntry]
[15:18][Note the necessity of _mm_sfence]
[16:13][Pull work_queue_entry down into the test code]
[17:18][Split DoWorkerWork in two]
[20:34][Think]
[21:18][Put while(EntryCount != EntryCompletionCount) into QueueWorkStillInProgress]
[22:49][Rename and finish writing these functions]
[30:48][Compile and run and see what the threads are doing]
[31:14][Discuss our options]
[34:31][Rename GetNextWorkQueueEntry to CompleteAndGetNextWorkQueueEntry and make it take work_queue_entry Completed]
[35:29][Rearrange ThreadProc slightly]
[37:24][Massage DoWorkerWork]
[37:51][Tweak the QueueWorkStillInProgress loop]
[38:55][Compile and consider removing one more call]
[40:12][Go for it and make the work_queue two separate things]
[44:22][Run this again]
[45:19][handmade_platform.h: Hoist these functions in]
[46:01][Think about this a little bit more]
[47:11][handmade_render_group.cpp: Write the usage code first]
[50:06][Compile and express hate for const][quote 100]
[50:40][Finish writing TiledRenderGroupToOutput]
[54:58][Compile and run and crash][quote 101]
[55:22][Moment of realisation: Gotta increment by the correct value]
[55:32][Recap and glimpse into the multithreaded future]
[56:09][Q&A][:speech]
[57:35][@BrainCruser][Will you start new threads for every queue that you make?]
[59:29][@niegrfiegr0][Still don't understand the use of volatile and memory barrier]
[59:53][Blackboard: Memory and Code Fences]
[1:07:46][@kelimion][Can Entry.IsValid be removed and replaced with a test to see if Entry.Data != NULL?]
[1:08:04][@kil4h][What is your take on Naughty Dog's approach using fibers (+ manual management) and thread affinity to core instead of using classic worker / job approach for multithreaded gameplay?]
[1:08:23][@robrobby][Will the work queue take any function and run it multithreaded? Does the function need to be special for this to work?]
[1:08:53][@waterlimon][Please write a lock free queue, even though I don't know what those are and if you used one]
[1:09:33][@boogie0815][How many CPU cycles does spawning a thread cost? Or better: what's the minimum amount of cycles to work in 2 threads to gain speed?]
[1:10:45][@gasto5][I don't understand why you call it a queue if it is done potentially simultaneously]
[1:11:46][@flyingwafflenyc][Wasn't there already a bit of thread-related code in the win32 file?]
[1:12:04][@waterlimon][Will you add a cool graph over time that shows what task (e.g. from which subsystem) each thread is working on at each moment?]
[1:14:19][@zuurr_][Is false sharing between the entries in the work queue potentially problematic (from a performance standpoint)?]
[1:14:34][@popcorn0x90][Does volatile clear the assembly registers by pushing them into the stack and then restore by popping?]
[1:14:56][@grubuck][Why would you want a compiler fence and not a processor fence, and vice versa?]
[1:15:41][@Pseudonym73][Shouldn't _mm_sfence() imply a compiler fence? Surely there's no point otherwise...]
[1:16:44][@thordura][Have you implemented friction?]
[1:16:54][@jameswidman][So, thread management is a bit like memory management (in that you want to set it up ahead of time rather than allocating them on-demand)]
[1:17:57][@robrobby][Is the code that asks how many threads the processor can run simultaneously still to be added?]
[1:18:35][@alephant][Is it possible for a work queue entry to spawn another work queue entry?]
[1:19:26][@zuurr_][(Not an expert at all, which was part of why I asked the question) False sharing causes the processor to skip the cache when different threads access stuff on the same cache line]
[1:21:41][@Pseudonym73][Actually, I can think of one use case for a compiler fence without a memory fence: writing to CPU special registers like control registers or MSRs]
[1:22:34][@kemosabe76][Getting back into C/C++ coding after many years. Don't know why you are mixing C-style structs and C++ structs?]
[1:22:52][Close things down][:speech]
[/video]