[1:04:30][@thesizik][Would it be faster to unpack pixels using a union of an int32 with a struct of 4 int8's, instead of doing 4 shifts and masks per pixel?]
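For reference, the two unpacking approaches this question compares can be sketched as below. This is only an illustration: the `0xAARRGGBB` packing, the `pixel_view` type, and all function names are assumptions and are not taken from the stream's code. The union version relies on a little-endian byte order, while the shift-and-mask version does not. Whether either is faster depends on the compiler and target, so it would need to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a packed 0xAARRGGBB pixel viewed as four bytes.
   The field order b, g, r, a only matches on little-endian machines. */
typedef union {
    uint32_t packed;
    struct { uint8_t b, g, r, a; } channel;
} pixel_view;

/* Shift-and-mask unpack: the result does not depend on byte order. */
static void unpack_shift(uint32_t packed, uint8_t out[4])
{
    out[0] = (uint8_t)((packed >>  0) & 0xFF); /* blue  */
    out[1] = (uint8_t)((packed >>  8) & 0xFF); /* green */
    out[2] = (uint8_t)((packed >> 16) & 0xFF); /* red   */
    out[3] = (uint8_t)((packed >> 24) & 0xFF); /* alpha */
}

/* Union unpack: reads the same four bytes through the struct view. */
static void unpack_union(uint32_t packed, uint8_t out[4])
{
    pixel_view v;
    assert(sizeof(v) == sizeof(uint32_t)); /* no padding expected */
    v.packed = packed;
    out[0] = v.channel.b;
    out[1] = v.channel.g;
    out[2] = v.channel.r;
    out[3] = v.channel.a;
}

/* Returns 1 when both methods give the same channels for this pixel
   (expected on little-endian targets only). */
static int unpack_methods_agree(uint32_t packed)
{
    uint8_t s[4], u[4];
    unpack_shift(packed, s);
    unpack_union(packed, u);
    return s[0] == u[0] && s[1] == u[1] && s[2] == u[2] && s[3] == u[3];
}
```

Calling `unpack_methods_agree(0x11223344u)` on a little-endian machine should return 1, since both paths read blue as `0x44` and alpha as `0x11`.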
[1:05:15][@houb_][Why don't we go: Y<2 and X<2 and go through in blocks, instead of a line?]
[1:07:44][@culver_fly][Is it better if we calculate if the pixel should be filled and queue it up and only do the calculations once we hit 4 of them?]
[1:10:45][@hmh_bot][Casey was using a Das Keyboard 4, but it broke, so he is currently using an unknown keyboard he had lying around]
[1:11:30][@hguleryuz][Sorry, maybe this is off-topic: Would it be correct to say anyone coding in Java, by default, is not making use of any of the SIMD stuff, or do you think the JIT compiler is smart enough to make use of it in certain circumstances, maybe with some analysis of the bytecode?]
[1:12:29][@guit4rfreak][How often do you optimize for cache misses vs optimizing with SIMD? I got the impression that cache misses are by far the most important things to look out for]
[1:14:40][@culver_fly][Please send my best regards to Jeff]
[1:14:52][@sharlock93][Schedule-wise, how many more weeks until you are done with optimization of the renderer?]
[1:15:01][@ray_caster][Will you be covering Morton order texture swizzling?]
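As background for this question, Morton (Z-order) addressing interleaves the bits of a texel's x and y coordinates so that texels close together in 2D tend to be close together in memory. A minimal sketch, assuming 16-bit coordinates and hypothetical helper names `part1by1` and `morton2d` (this is not code from the stream):

```c
#include <stdint.h>

/* Spread the low 16 bits of x so that a zero bit sits between each
   original bit: abcd becomes 0a0b0c0d. */
static uint32_t part1by1(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Interleave x (even bit positions) and y (odd bit positions)
   into a single Morton index. */
static uint32_t morton2d(uint32_t x, uint32_t y)
{
    return part1by1(x) | (part1by1(y) << 1);
}
```

With this layout the four texels of a 2x2 block, (0,0), (1,0), (0,1), (1,1), map to the consecutive indices 0, 1, 2, 3.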
[1:16:54][@dr_fubar][Possibly a noob Q: Have you ever run into problems with floating point arithmetic, and what are some good approaches to avoiding those problems?[ref
author="Forman S. Acton"
title="Real Computing Made Real"
isbn=9780691036632][ref
author="Forman S. Acton"
title="Numerical Methods that Work"
isbn=9780883854501]]
[1:21:37][@starchypancakes][\[...\] Casey said SSE2 was standard, I guess I'll start there[ref
site="Valve"
page="Steam Hardware & Software Survey"
url="https://store.steampowered.com/hwsurvey"]]
[1:24:06][@houb_][Is there a way to track how memory gets stored to cache?]
[1:28:01][@hguleryuz][Off-topic: Do you know if JAI will have extensions / a method for using SIMD?]
[1:28:50][@xaitra][How much do you need to think about the intrinsic instructions while programming, or does the compiler usually take care of that? Is this the big difference between using GNU and Intel compiler, for example?]
[1:30:37][@ray_caster][I think he's essentially asking how proficient compilers are at automatically emitting SIMD instructions[ref
site="LLVM"
page="Auto-Vectorization in LLVM"
url="http://llvm.org/docs/Vectorizers.html"]]
[1:33:54][@rooctag][Do you have to take the instruction cache into account? Or is it large enough?]
[1:34:39][@goodoldmalk][How do intrinsics and parallel processing work together? Does each CPU have registers to do intrinsics? If so, could we increase the number of pixels rendered in our code X-fold if we computed in parallel?]