[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="SSE Mixer Pre and Post Loops" vod_platform=youtube id=l3zbzEYRLJc annotator=ZedZull annotator=Miblo annotator=debiatan] [00:02:11][Plan for today: SIMDizing the mixer] [00:03:41][Aligning the temporary buffer] [00:05:00][Making sure the temporary sound buffers are big enough to fit all samples] [00:05:29][Explanation of Align16] [00:06:23][Alignment macro for any power of two: AlignPow2] [00:11:17][Clamping samples to the signed 16-bit integer range] [00:18:09][(intermission) Two's complement] [00:34:44][Back to SIMD] [00:36:48][Rounding the samples] [00:37:37][Downconverting from 32-bit to 16-bit integers. No clamping necessary!] [00:39:54][Looking for intrinsics that interleave 16-bit values] [00:44:18][Interleaving the samples before packing them] [00:47:27][Making sure we don't write out of bounds] [00:49:00][Debugging output using structured input] [00:52:50][Padding the buffer in the platform layer to make sure we always have space for overwrites] [00:54:20][Casey remembers that the horizontal mouse position was linked to music panning] [00:54:52][Getting rid of unnecessary clamping operations] [00:55:45][Using aligned loads and stores] [00:57:24][Plan for next episode] [01:01:30][More 2's complement. Full example] [01:11:30][Q&A][:speech] [01:11:37][@cubercaleb][Why isn't 2's complement used for floating-point numbers if it makes signed arithmetic easy?] [01:16:35][@poohshoes][Are you not going to profile it to see how much faster it gets?] [01:16:55][@dr_s80][When you implemented streaming in chunks of audio; I believe the code actually loads the entire file (with a platform layer VirtualAlloc) for each chunk. Is this just an artifact of the debug nature of that code?] [01:17:33][@ishytarus][Does the audio make the framerate in debug mode?]
[01:26:09][@cubercaleb][If 1111 (-1) is supposed to be less than 0000 (0) then how do number comparisons work on the CPU level?] [01:32:39][@marumoto][Do you have any tips for speeding up compile time when using multiple translation units?] [01:32:55][@sssmcgrath][It's movsx for signed] [/video]