[1:11:32][@kknewkles][How do you cover multiple CPU technologies intrinsic-wise? Preprocessor switches on dedicated intrinsics for each? Also, whom to read on ASM? I'm thinking Mike Abrash?]
[1:13:09][@houb_][We have come from 385 cycles to ~123. Does something like the 80%-20% rule apply? Do you think we will get down to 50 cycles?]
[1:15:22][@maexono][The way we use mmSquare, does it calculate the argument twice?]
[1:15:41][Debugger: Determine if the compiler is doing common subexpression elimination for these multiplies]
[1:26:19][@cvaucher][Where do OpenCL and other GPGPU frameworks fit into optimization? It seems like if something is SIMD-able, it could just be done wider on a GPU. Are there workloads that are better suited to the CPU and SIMD?]
[1:29:06][@garlandobloom][We have optimizations still on?]
[1:29:19][@gasto5][Why are there optimizing options in the compiler if one will end up typing SIMD functions?]
[1:31:01][@quylthulg][Do you know of the _mm_setr_ps intrinsic (and _pd etc) - note the r in setr? It loads the values in reverse order, i.e. in the order that is more intuitive]
[1:31:38][@garlandobloom][When do you think we will thread the renderer?]
[1:31:57][@goodoldmalk][Possibly misguided question, is there a way to overload operators to use SIMD instructions instead?]
[1:32:45][@digitaldomovoi][Is padding and alignment still something you have to concern yourself with? I remember doing SIMD in the mid 2000s, and SIMD was essentially worthless (much of the time) if your data wasn't aligned]
[1:33:43][@digitaldomovoi][Addendum: By "concern yourself", I mean, is it something the compiler now handles more autonomously when you "engage" SIMD]
[1:34:15][@kil4h][Will you generate asm for NEON (if you port to ARM of course)? GCC seems to be pretty bad at generating correct code with intrinsics (from my experience on Android)]
[1:35:03][@culver_fly][How would you know if doing something will speed up the code? Especially when it's a fairly large change to the codebase and when time is limited, I find myself reluctant to perform such optimizations in fear of introducing bugs]
[1:36:46][@miblo][What do you think you'll next want to convert to SIMD, in case I want to practise over the weekend?]
[1:38:52][@flaturated][Can you compile it -Od and show how SIMD has helped there?]
[1:39:32][@kknewkles][Would it be a good exercise (albeit a large one) to study a simple CPU and write some software for it? Arduino or something ancient? I wanted to learn coding for GBA for a while]
[1:41:04][@kknewkles][Let's rephrase: what CPU would you advise to study that would be simple enough yet representative enough of the general stuff you should know about when working with CPUs?][quote 87]
[1:42:52][@theitchyninja][How long have you been working on this and when do you think you will finish?]
[1:43:29][@gasto5][Are you going to optimize gameplay code as well?]
[1:43:45][@houb_][Have you heard of the JayStation2 Project from Jaymin Kessler, working with the Raspberry Pi 2 B+?]