[video output=day144 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="SSE Mixer Pre and Post Loops" vod_platform=youtube id=l3zbzEYRLJc annotator=ZedZull annotator=Miblo annotator=debiatan]
[00:02:11][Plan for today: SIMDizing the mixer]
[00:03:41][Aligning the temporary buffer]
[00:05:00][Making sure the temporary sound buffers are big enough to fit all samples]
[00:05:29][Explanation of Align16]
[00:06:23][Alignment macro for any power of two: AlignPow2]
[00:11:17][Clamping samples to the signed 16-bit integer range]
[00:18:09][(intermission) Two's complement]
[00:34:44][Back to SIMD]
[00:36:48][Rounding the samples]
[00:37:37][Downconverting from 32-bit to 16-bit integers. No clamping necessary!]
[00:39:54][Looking for intrinsics that interleave 16-bit values]
[00:44:18][Interleaving the samples before packing them]
[00:47:27][Making sure we don't write out of bounds]
[00:49:00][Debugging output using structured input]
[00:52:50][Padding the buffer in the platform layer to make sure we always have space for overwrites]
[00:54:20][Casey remembers that the horizontal mouse position was linked to music panning]
[00:54:52][Getting rid of unnecessary clamping operations]
[00:55:45][Using aligned loads and stores]
[00:57:24][Plan for next episode]
[01:01:30][More two's complement. Full example]
[01:11:30][Q&A][:speech]
[01:11:37][@cubercaleb][Why isn't 2's complement used for floating-point numbers if it makes signed arithmetic easy?]
[01:16:35][@poohshoes][Are you not going to profile it to see how much faster it gets?]
[01:16:55][@dr_s80][When you implemented streaming in chunks of audio, I believe the code actually loads the entire file (with a platform layer VirtualAlloc) for each chunk. Is this just an artifact of the debug nature of that code?]
[01:17:33][@ishytarus][Does the audio make the framerate in debug mode?]
[01:26:09][@cubercaleb][If 1111 (-1) is supposed to be less than 0000 (0) then how do number comparisons work at the CPU level?]
[01:32:39][@marumoto][Do you have any tips for speeding up compile time when using multiple translation units?]
[01:32:55][@sssmcgrath][It's movsx for signed]
[/video]