[video output=day112 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="A Mental Model of CPU Performance" vod_platform=youtube id=qin-Eps3U_E annotator=dspecht annotator=Miblo]
[2:17][Blackboard: Optimization]
[3:58][Blackboard: CPU + GPU instructions]
[5:28][Blackboard: Math operations done wide (SIMD)]
[8:26][Blackboard: An example instruction]
[10:10][Blackboard: Issuing an instruction is expensive]
[13:13][Blackboard: Optimization considerations]
[15:56][Blackboard: Memory access costs]
[17:54][Blackboard: Cycles]
[22:05][Blackboard: You should always know how many cycles you have to work with]
[24:37][Blackboard: You won't always have all cycles available for use]
[25:57][Blackboard: What is a cycle?]
[31:38][Blackboard: Pipeline stages]
[34:36][Blackboard: Why pipeline? (Doing the laundry)]
[39:41][Blackboard: Latency and Throughput]
[43:53][Blackboard: Where latency causes us a problem]
[48:33][Blackboard: Cache miss]
[51:01][Blackboard: Hyperthreading]
[52:42][Blackboard: Optimization, the platform]
[55:07][Blackboard: So that is optimization][quote 83]
[55:25][Blackboard: Efficiency]
[59:30][Q&A][:speech]
[1:00:13][@atomiclich][Would you be willing to make more blackboard episodes? This is very informative]
[1:00:46][@grumpygiant256][Are you going to be using anything like VTune for measuring performance?]
[1:01:26][@bakeheart][How are instructions written in cache memory?]
[1:02:33][@d7samurai][Do we manually issue prefetching or is that something inferred by the CPU by looking at how we access memory?]
[1:05:08][@childz][I know this is a long way off, but after Handmade Hero is done, do you plan to continue educational streams?]
[1:05:34][@andsz_][How often do you estimate the actual amount of work prior to implementing a feature vs just implementing it and measuring it?]
[1:08:59][@snobrdr97][So if memory takes a few hundred cycles if the instructions have to reach out to the hard drive, what impact would that have?]
[1:11:14][@starchypancakes][Two questions: 1) Are there ever any cases where we have to worry about one of our instructions being decoded into multiple microcode instructions without our knowledge?]
[1:13:16][@starchypancakes][2) In optimizing, have you set up the code in such a way that you can optimize things function-by-function with this eventuality in mind, or will we have to restructure some of the functions to allow them to be optimized?]
[1:16:37][@hyco24][Would it be inefficient to offload the cache to an SSD over/or with minimal RAM usage or would the latency be too much?]
[1:17:32][@vertex_]["Premature Optimization is the root of all evil." What's your take on that quote?]
[1:19:28][@noxy_key][Is there any way to use or avoid hyperthreading to your advantage?]
[1:19:44][@zjadekkarenvae][What do you tell someone who doesn't like emacs?]
[1:20:39][@bakeheart][Does hyperthreading reduce maximum bandwidth because it has to switch between states, or can both states operate at the same time?]
[1:21:20][@quatzequatel][In your experience what drives the "good enough" optimization and how do the novice guys get a handle on that?]
[1:24:00][Wrap things up][:speech]
[/video]