[1:47:37][Try to make the compiler generate a cmov for the conditional box pushing code in RayCast()][:language :lighting :optimisation]
[1:49:16][Inspect the assembly of RayCast() to still see no cmov instructions][:asm :lighting :optimisation :run]
[1:49:27][Try again to make the compiler generate a cmov for the conditional box pushing code in RayCast()][:language :lighting :optimisation]
[1:50:15][Inspect the assembly of RayCast() to see one cmov instruction][:asm :lighting :optimisation :run]
[1:50:41][Try again to make the compiler generate a cmov for the conditional box pushing code in RayCast()][:language :lighting :optimisation]
[1:50:52][Inspect the assembly of RayCast() to see no further cmov instructions][:asm :lighting :optimisation :run]
[1:51:55][Consult the Intel Intrinsics Guide[ref
site=Intel
page="Intel Intrinsics Guide"
url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/] for mask instructions][:research :simd]
[1:53:53][Try making RayCast() determine in :SIMD if it should push a box][:lighting :optimisation]
[1:58:19][Inspect the assembly of RayCast() to still see no further cmov instructions][:asm :lighting :optimisation :run]
[1:59:44][Try changing RayCast() to store off the StackAt to write back to the BoxStack][:lighting :optimisation]
[2:00:30][Inspect the assembly of RayCast() to see cmov instructions][:asm :lighting :optimisation :run]
[2:01:02][hhlightprof total seconds elapsed: 7.496200][:lighting :optimisation :run]
[2:01:33][Toggle RayCast() back to determine in scalar if it should push a box][:lighting :optimisation]
[2:01:41][Inspect the assembly of RayCast() to see our dreaded jmp instructions][:asm :lighting :optimisation :run]
[2:01:57][Try making RayCast() set ShouldPush using a bitwise, rather than a conditional, OR][:lighting :optimisation]
[2:02:22][Inspect the assembly of RayCast() to see cmov instructions][:asm :lighting :optimisation :run]
[2:02:31][Consider our ShouldPush setting, in terms of OR'ing][:lighting :optimisation :research]
[2:04:12][hhlightprof total seconds elapsed: 7.360275][:lighting :optimisation :run]
[2:04:35][Inspect the assembly of RayCast()][:asm :lighting :optimisation :run]
[2:05:20][Try making RayCast() compute ShouldPush by bitwise OR'ing and AND'ing only tInside, Mask and CloseEnough][:lighting :optimisation]
[2:07:01][Inspect the assembly of RayCast()][:asm :lighting :optimisation :run]
[2:07:23][hhlightprof total seconds elapsed: 7.307276][:lighting :optimisation :run]
[2:07:40][Note that it seems cheaper to jmp than cmov][:lighting :optimisation :research]
[2:08:27][Try making RayCast() compute ShouldPush using conditional tests][:lighting :optimisation]
[2:08:47][hhlightprof total seconds elapsed: 7.253258][:lighting :optimisation :run]
[2:08:56][Revert RayCast() to the original box pushing code][:lighting :optimisation]
[2:09:29][hhlightprof total seconds elapsed: 6.878540][:lighting :optimisation :run]
[2:09:45][Save off our jmp and cmov versions of the box pushing code in RayCast()][:lighting :optimisation]
[2:12:16][hhlightprof total seconds elapsed: 6.888590][:lighting :optimisation :run]
[2:12:31][Toggle RayCast() to the slower cmov box pushing code][:lighting :optimisation]
[2:12:41][hhlightprof total seconds elapsed: 7.079671][:lighting :optimisation :run]
[2:13:11][Q&A][:speech]
[2:13:24][Realise why the cmov version isn't faster][:lighting :optimisation :research]
[2:14:56][Q&A][:speech]
[2:15:09][@vaualbus][Q: I think the MSDN you were looking at is SetFileInformationByHandle, maybe?]
[2:16:00][@yurasniper][Q: Logical || and && are short-circuited, so they will always have a jump, unless the compiler can figure out some property that will allow it to collapse the expression. So to avoid jumps one should use bitwise | and & if possible. But also there were some people saying that cmov is worse than a jump over a few instructions. I think LLVM people, but I may be wrong]
[2:16:35][@somebody_took_my_name][Q: Can't you just use CloseEnough instead of CloserCloseEnough in the ShouldPush assignment? CloseEnough is already AND'd with Mask. Oh, and the assignment of StackY is busted. There is a 0, 1 instead of 2, 3 at the end][:lighting :optimisation]
[2:16:59][Fix the StackY setting in RayCast() and toggle to the faster jmp box pushing code][:lighting :optimisation :simd]
[2:17:37][hhlightprof total seconds elapsed: 6.963080][:lighting :optimisation :run]
[2:18:20][Toggle RayCast() to the slower, :SIMD cmov box pushing code][:lighting :optimisation]
[2:18:27][hhlightprof total seconds elapsed: 7.533641][:lighting :optimisation :run]
[2:18:39][Toggle RayCast() to the faster, scalar cmov box pushing code][:lighting :optimisation]
[2:18:44][hhlightprof total seconds elapsed: 7.265596][:lighting :optimisation :run]
[2:19:00][@somebody_took_my_name][Q: Can't you just use CloseEnough instead of CloserCloseEnough in the ShouldPush assignment? CloseEnough is already AND'd with Mask][:lighting :optimisation]
[2:19:31][Remove the superfluous CloserCloseEnough from RayCast()][:lighting :optimisation]
[2:20:17][hhlightprof total seconds elapsed: 7.195799][:lighting :optimisation :run]
[2:20:29][Toggle RayCast() to the faster jmp box pushing code][:lighting :optimisation]
[2:20:45][hhlightprof total seconds elapsed: 6.874347][:lighting :optimisation :run]
[2:21:00][Make a note to try pushing boxes using a circular buffer][:lighting :memory :optimisation]
[2:22:17][@somebody_took_my_name][Q: Somebody in chat had the idea of loading the ray caster through the dll for testing. Could this remove the floating point errors in the test code?][:lighting]
[2:23:25][@tjom2000][Q: Is it possible to structure the game code so it would run reasonably fast in debug mode? Would that be worth the hassle?]
[2:23:53][@emperormetallix][Q: Will you come to the dark side and try const?][:language]
[2:24:40][@martinsmemory][Q: When you dropping that low level course?]
[2:24:51][@vaualbus][Q: Why have you uploaded the last [~hero Handmade Hero] episode on Molly Rocket's YouTube account?]
[2:25:47][@isfoo][Q: MSVC is very often not optimizing away obvious things (basically you cannot do so-called zero cost abstractions with it). For example it always does short circuiting.[ref
site="Compiler Explorer"
page=oVz2i7
url=https://godbolt.org/z/oVz2i7] Or from experience, I also remember it sometimes calling empty constructors. Why not use some reasonable compiler like clang / gcc / icc?][:language]
[2:26:58][@x1bzzr][Q: In one of the first episodes of [~hero Handmade Hero] you mentioned maybe it would be a good idea to recreate the window if WM_DESTROY was caught in the window procedure. In what sort of scenario does that happen?]
[2:27:48][A few words on the C++ spec effectively preventing the optimising compiler from using const][:language :speech]
[2:28:08][@emperormetallix][Q: These optimisations seem to be very low level. How do you know when it is worth going to this level vs zooming out to examine the overall algorithm, or memory layout, data volume, etc?][:optimisation]
[2:30:20][@vaualbus][Q: Will you be in [@naysayer88 Jon]'s talk today? It's already started! Let's go there after]
[2:30:30][@x1bzzr][Q: I guess what I'm curious about is why you shouldn't just terminate the application if you get WM_DESTROY in the window procedure]
[2:30:42][@martinsmemory][Q: How many hours do you work a week normally?]
[2:31:38][@maavelar][Q: Has your RSI problem improved from a couple of years back? If so, what helped you?][:health]
[2:32:00][@internationalizationist][Q: Is it possible to not call VirtualAlloc at all? You can create a global array of bytes of whatever size you want and point Persistent / Transient storages to those arrays. Global storage should go to BSS (is it?), and the operating system must allocate enough :memory at startup time (and you already allocated a determined amount of space, so you know how much it is at compile time)]
[2:32:32][@yakvie][@handmade_hero Hey [@cmuratori Casey], I was wondering what's the current plan for [~hero Handmade Hero]? Will you be releasing pieces of this code to the public domain?]
[2:32:47][@emperormetallix][Q: What do you use instead of Ctrl / Alt keys? Vim controls?]
[2:33:38][@internationalizationist][Q: Could you stream your daily working process (of 1935, for example) sometime?]
[2:33:52][@zzyzzyxx][Here's an example where adding const with clang induced some optimization (minutes 27–28)[ref
author=CppCon
publisher=YouTube
title="CppCon 2016: Jason Turner “Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17”"