[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=chat title="Modern x64 Architectures and the Cache" vod_platform=youtube id=tk5P7mt2fAw annotator=Miblo]
[4:55][@vateferfout][@handmade_hero Hello, it's nice to finally be able to catch a stream live]
[5:11][@ivereadthesequel][@handmade_hero Hey [@cmuratori Casey], it's my birthday today and I'm glad to catch some [~hero Handmade Hero] on it! Woo!]
[5:48][Desire a sponsor for [~hero Handmade Hero]]
[6:54][@simpalaxy][Q: You mentioned on Twitter being interested in a hiring process that includes an option for people to record themselves doing their normal programming work. Is there a way you think the industry could be led to feasibly adopt that?]
[9:26][@ivereadthesequel][@handmade_hero Had you seen the RLM parody of those "nerdbox" services that send you some cheap stuff in a box every month?]
[10:33][@culdevu][Q: I started my first programming job a couple months ago and have been thrown into an unfamiliar codebase a couple of times now. I wouldn't have said so previously, but now I'd say that the hardest part of learning a new codebase is the :threading. That stuff can get crazy if it's not thought out carefully beforehand. Thoughts?]
[19:30][@blaster_junior][Q: Can you explain cache misses and how to avoid them? I'm coming from the Java world and never had to think about that][:hardware :performance]
[22:07][Modern Caches][:blackboard :hardware :isa :memory :performance]
[25:32][The structure and work of an x64 CPU][:blackboard :hardware :isa :memory :performance]
[36:19][x64 Scheduler out-of-order processing][:blackboard :hardware :isa :performance :scheduling]
[38:25][x64 Caches[ref site=WikiChip url=https://en.wikichip.org/wiki/WikiChip]][:blackboard :caching :hardware :isa :memory :performance]
[47:23][Cache misses, and how to avoid them][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[57:16][IPC (Instructions Per Clock) vs. Cache Lines][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:08:35][Intel's undocumented L1 ← L2 "fill" penalty][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:10:59][Avoiding cache misses: 1) Learn cache sizes][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:12:58][Avoiding cache misses: 2) Organize for the cache][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:20:16][Avoiding cache misses: 3) Linear, simple access patterns (prefetching)][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:26:07][x64 Line buffers][:blackboard :hardware :isa :performance :scheduling]
[1:31:04][Hyperthreading][:blackboard :hardware :isa :performance :scheduling]
[1:31:58][Measuring cache utilisation with VTune or perf][:blackboard :hardware :isa :performance :scheduling]
[1:33:58][@saidwho12][Q: Is there a way to transfer data to the cache manually?][:caching :hardware :isa :performance :scheduling]
[1:34:39][Intel's PREFETCH instructions[ref site=Intel page="Intel Intrinsics Guide" url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/][ref site=uops.info url=https://uops.info/table.html][ref site=Intel page="Intel® 64 and IA-32 Architectures Software Developer Manuals" url=https://software.intel.com/en-us/articles/intel-sdm]][:caching :hardware :isa :performance :research :scheduling]
[1:41:33][Manual cache control in the Nintendo GameCube's Dolphin CPU and the Sony PlayStation 3][:caching :hardware :isa :performance :scheduling]
[1:43:58][@saidwho12][Is 64 bytes the cache line size on every CPU?][:caching :hardware :performance :scheduling]
[1:45:31][@sapper123][Q: Have you tested MeowHash on the new Zen2 processors?]
[1:45:58][@printf_armin][@handmade_hero This is also important for DMA :memory][:caching :hardware :performance :scheduling]
[1:47:09][@kkrabz][@handmade_hero How does the processor handle multiple programs running at the same time, in regards to the cache?][:caching :hardware :performance :scheduling]
[1:49:21][The structure of a Zen CPU[ref site=Wikichip page="Zen - Microarchitectures - AMD" url=https://en.wikichip.org/wiki/amd/microarchitectures/zen]][:hardware :isa :memory :performance :research]
[1:53:12][Yield and extreme ultraviolet lithography[ref site=TechPowerUp page="AMD Ryzen Threadripper 1900X Core Configuration Detailed" url=https://www.techpowerup.com/236680/amd-ryzen-threadripper-1900x-core-configuration-detailed][ref site=Wikipedia page="Extreme ultraviolet lithography" url=https://en.wikipedia.org/wiki/Extreme_ultraviolet_lithography]][:fabrication :hardware :research]
[1:55:06][Chip :fabrication[ref site="Taiwan Semiconductor Manufacturing Company Limited" url=https://www.tsmc.com/english/default.htm][ref site=AnandTech page="Intel Details Manufacturing through 2023: 7nm, 7+, 7++, with Next Gen Packaging" url=https://www.anandtech.com/show/14312/intel-process-technology-roadmap-refined-nodes-specialized-technologies] and transistor density[ref site=Wikipedia page="Transistor count" url=https://en.wikipedia.org/wiki/Transistor_count]][:blackboard :hardware]
[2:10:11][Failure in chip :fabrication due to the sheer precision of the process[ref site=Wikipedia page="Extreme ultraviolet lithography" url=https://en.wikipedia.org/wiki/Extreme_ultraviolet_lithography]][:hardware :research]
[2:27:43][:Fabrication Yield and multiple cores][:blackboard :hardware]
[2:37:14][Cache Between Cores: 1) NUMA (Non-Uniform Memory Access) architecture][:blackboard :caching :hardware]
[2:43:38][Cache Between Cores: 2) MESI (Modified, Exclusive, Shared, Invalid) protocol[ref site=Wikipedia page="MESI protocol" url=https://en.wikipedia.org/wiki/MESI_protocol]][:blackboard :caching :hardware]
[2:48:40][@cultofrig][Q: @handmade_hero 64-byte lines are pervasive due to the burst size of DDR controllers. And yes, Arm also moved from 32-byte to 64-byte][:fabrication]
[2:49:25][@rroohhh][Q: The process for the silicon ingots is called Czochralski process][:fabrication]
[2:49:37][@sanchopanzo][@handmade_hero Why don't they focus on increasing the cache sizes? Is there a hard limit to it or are they happy with the sizes as they are?][:fabrication]
[2:51:05][@pythno][Q: Isn't wavelength and size of electron kind of the same? Just different models?]
[2:51:17][@printf_armin][Q: Did have the privilege to wear one at Infineon. Really interesting experience]
[2:51:35][@cubercaleb][Q: If the probability of failure for one chip is P(F), and the probability of failure for two combined chips is denoted P(FC), then P(FC) = 2 * P(F) - P(F) * P(F)][:fabrication :mathematics]
[2:51:56][Failure probability of two combined chips][:blackboard :fabrication :mathematics]
[3:00:48][@cubercaleb][Q: It's the union of either chip being a dud, minus the intersection of both chips being a dud (since this is already accounted for in the union)][:fabrication :mathematics]
[3:01:38][@cubercaleb][Q: I over simplified. It should just be Bayes' theorem and these events should be independent, so just multiply the failure rate][:fabrication :mathematics]
[3:02:53][Close this down]
[/video]