[1:02:03][@grumpygiant256][Could you not just align the X coord to a 4-pixel boundary up front, and thereby use aligned loads and stores?]
[1:03:03][@garlandobloom][Are you pulling this code over into ground splats soon?]
[1:05:15][@ostrovskivlad][Is it just me, or are the cycles per pixel much more consistent after this whole SIMD conversion?]
[1:05:44][@ifingerbangedurcat][I've kind of missed the past few days. I'm wondering whether using SSE2 intrinsics exclusively in your game code is bad, or are we targeting SSE2? For example, should we wrap everything into platform-specific files so it's easier to target other platforms?]
[1:08:35][@flyingsand][What does it mean when an intrinsic doesn't have a specified throughput?]
[1:08:51][@kelimion][Instead of loading the destination first, would it be faster to skip that and do a masked write instead, e.g. _mm_maskmoveu_si128?]
[1:11:56][@tobeypeters][Would it be a good idea to just use SIMD for all our math operations in all our programs?]
[1:15:36][@flyingsand][Example of an intrinsic with no listed throughput: _mm_cmpgt_ps]
[1:21:00][@grumpygiant][Agner Fog says the throughput is 1]
[1:22:16][@mrstone56][\[What is latency vs throughput?\]]
[1:22:46][@themarsala][What is the end goal of the optimization: getting below a certain threshold, or just getting everything converted?]
[1:23:54][@tobeypeters][Does the size of variables matter for SIMD, e.g. 32-bit vs 64-bit?]
[1:25:45][@hellotanjent][Is the SSE code doing any cache prefetching or hinting yet?]
[1:27:12][@allaizn][Couldn't we use half-floats instead of floats, since we don't need that much precision with only 256 discrete values?]
[1:28:50][@ttbjm][Is the normal map code going to be converted to SIMD?]
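The 1:02:03 question proposes aligning the X coordinate to a 4-pixel boundary up front so that aligned loads and stores can be used. Below is a minimal sketch of that idea, not code from the stream. It assumes 32-bit pixels, a 16-byte-aligned buffer whose pitch (in pixels) is a multiple of 4, and coordinates already clipped to [0, Pitch]. `FillRectAligned` and its parameters are hypothetical names. Because MinX is rounded down and MaxX rounded up, the first and last 4-pixel groups can cover pixels outside the rectangle, so a compare mask is used to keep their existing values.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills pixels [MinX, MaxX) x [MinY, MaxY) with Color.
   Assumes 32-bit pixels, a 16-byte-aligned Pixels pointer, a Pitch (in pixels)
   that is a multiple of 4, and coordinates already clipped to [0, Pitch].
   Under those assumptions every group of 4 pixels starting at an X that is a
   multiple of 4 sits on a 16-byte boundary. */
static void FillRectAligned(uint32_t *Pixels, int Pitch,
                            int MinX, int MinY, int MaxX, int MaxY,
                            uint32_t Color)
{
    if (MinX >= MaxX || MinY >= MaxY) return;

    /* Round MinX down and MaxX up to 4-pixel boundaries. */
    int AlignedMinX = MinX & ~3;
    int AlignedMaxX = (MaxX + 3) & ~3;

    __m128i ColorWide = _mm_set1_epi32((int)Color);
    __m128i LaneIndex = _mm_setr_epi32(0, 1, 2, 3);
    __m128i MinXMinus1 = _mm_set1_epi32(MinX - 1);
    __m128i MaxXWide = _mm_set1_epi32(MaxX);

    for (int Y = MinY; Y < MaxY; ++Y)
    {
        uint32_t *Row = Pixels + (size_t)Y * (size_t)Pitch;
        for (int X = AlignedMinX; X < AlignedMaxX; X += 4)
        {
            /* X coordinate of each of the 4 lanes in this group. */
            __m128i PixelX = _mm_add_epi32(_mm_set1_epi32(X), LaneIndex);

            /* All-ones in a lane only if MinX <= PixelX < MaxX. */
            __m128i Inside = _mm_and_si128(_mm_cmpgt_epi32(PixelX, MinXMinus1),
                                           _mm_cmplt_epi32(PixelX, MaxXWide));

            __m128i *Dest = (__m128i *)(Row + X);
            __m128i Old = _mm_load_si128(Dest); /* aligned load */

            /* Take Color where Inside is set, keep Old elsewhere. */
            __m128i New = _mm_or_si128(_mm_and_si128(Inside, ColorWide),
                                       _mm_andnot_si128(Inside, Old));
            _mm_store_si128(Dest, New);         /* aligned store */
        }
    }
}
```

For example, with a pitch of 8, filling X in [1, 6) on row 0 touches the aligned groups at X = 0 and X = 4, writes pixels 1 through 5, and leaves pixels 0, 6 and 7 as they were. The destination still has to be loaded for the partial groups, which is the cost the 1:08:51 question about masked writes is asking about.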