[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=chat title="Modern x64 Architectures and the Cache" vod_platform=youtube id=tk5P7mt2fAw annotator=Miblo]
[4:55][@vateferfout][@handmade_hero Hello, it's nice to finally be able to catch a stream live]
[5:11][@ivereadthesequel][@handmade_hero Hey [@cmuratori Casey], it's my birthday today and I'm glad to catch some [~hero Handmade Hero] on it! Woo!]
[5:48][Desire a sponsor for [~hero Handmade Hero]]
[6:54][@simpalaxy][Q: You mentioned on Twitter being interested in a hiring process that includes an option for people to record themselves doing their normal programming work. Is there a way you think the industry could be led to feasibly adopt that?]
[9:26][@ivereadthesequel][@handmade_hero Had you seen the RLM parody of those "nerdbox" services that send you some cheap stuff in a box every month?]
[10:33][@culdevu][Q: I started my first programming job a couple months ago and have been thrown into an unfamiliar codebase a couple of times now. I wouldn't have said so previously, but now I'd say that the hardest part of learning a new codebase is the :threading. That stuff can get crazy if it's not thought out carefully beforehand. Thoughts?]
[19:30][@blaster_junior][Q: Can you explain cache misses and how to avoid them? I'm coming from the Java world and never had to think about that][:hardware :performance]
[22:07][Modern Caches][:blackboard :hardware :isa :memory :performance]
[25:32][The structure and work of an x64 CPU][:blackboard :hardware :isa :memory :performance]
[36:19][x64 Scheduler out-of-order processing][:blackboard :hardware :isa :performance :scheduling]
[38:25][x64 Caches[ref
    site=WikiChip
    url=https://en.wikichip.org/wiki/WikiChip]][:blackboard :caching :hardware :isa :memory :performance]
[47:23][Cache misses, and how to avoid them][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[57:16][IPC (Instructions Per Clock) vs. Cache Lines][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:08:35][Intel's undocumented L1 ← L2 "fill" penalty][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:10:59][Avoiding cache misses: 1) Learn cache sizes][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:12:58][Avoiding cache misses: 2) Organize for the cache][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:20:16][Avoiding cache misses: 3) Linear, simple access patterns (prefetching)][:blackboard :caching :hardware :isa :memory :performance :scheduling]
[1:26:07][x64 Line buffers][:blackboard :hardware :isa :performance :scheduling]
[1:31:04][Hyperthreading][:blackboard :hardware :isa :performance :scheduling]
[1:31:58][Measuring cache utilisation with VTune or perf][:blackboard :hardware :isa :performance :scheduling]
[1:33:58][@saidwho12][Q: Is there a way to transfer data to the cache manually?][:caching :hardware :isa :performance :scheduling]
[1:34:39][Intel's PREFETCH instructions[ref
    site=Intel
    page="Intel Intrinsics Guide"
    url=https://software.intel.com/sites/landingpage/IntrinsicsGuide/][ref
    site=uops.info
    url=https://uops.info/table.html][ref
    site=Intel
    page="Intel® 64 and IA-32 Architectures Software Developer Manuals"
    url=https://software.intel.com/en-us/articles/intel-sdm]][:caching :hardware :isa :performance :research :scheduling]
[1:41:33][Manual cache control in the Nintendo GameCube's Dolphin CPU and the Sony PlayStation 3][:caching :hardware :isa :performance :scheduling]
[1:43:58][@saidwho12][Is 64 bytes the cache line size on every CPU?][:caching :hardware :performance :scheduling]
[1:45:31][@sapper123][Q: Have you tested MeowHash on the new Zen2 processors?]
[1:45:58][@printf_armin][@handmade_hero This is also important for DMA :memory][:caching :hardware :performance :scheduling]
[1:47:09][@kkrabz][@handmade_hero How does the processor handle multiple programs running at the same time, in regards to the cache?][:caching :hardware :performance :scheduling]
[1:49:21][The structure of a Zen CPU[ref
    site=WikiChip
    page="Zen - Microarchitectures - AMD"
    url=https://en.wikichip.org/wiki/amd/microarchitectures/zen]][:hardware :isa :memory :performance :research]
[1:53:12][Yield and extreme ultraviolet lithography[ref
    site=TechPowerUp
    page="AMD Ryzen Threadripper 1900X Core Configuration Detailed"
    url=https://www.techpowerup.com/236680/amd-ryzen-threadripper-1900x-core-configuration-detailed][ref
    site=Wikipedia
    page="Extreme ultraviolet lithography"
    url=https://en.wikipedia.org/wiki/Extreme_ultraviolet_lithography]][:fabrication :hardware :research]
[1:55:06][Chip :fabrication[ref
    site="Taiwan Semiconductor Manufacturing Company Limited"
    url=https://www.tsmc.com/english/default.htm][ref
    site=AnandTech
    page="Intel Details Manufacturing through 2023: 7nm, 7+, 7++, with Next Gen Packaging"
    url=https://www.anandtech.com/show/14312/intel-process-technology-roadmap-refined-nodes-specialized-technologies] and transistor density[ref
    site=Wikipedia
    page="Transistor count"
    url=https://en.wikipedia.org/wiki/Transistor_count]][:blackboard :hardware]
[2:10:11][Failure in chip :fabrication due to the sheer precision of the process[ref
    site=Wikipedia
    page="Extreme ultraviolet lithography"
    url=https://en.wikipedia.org/wiki/Extreme_ultraviolet_lithography]][:hardware :research]
[2:27:43][:Fabrication Yield and multiple cores][:blackboard :hardware]
[2:37:14][Cache Between Cores: 1) NUMA (Non-Uniform Memory Access) architecture][:blackboard :caching :hardware]
[2:43:38][Cache Between Cores: 2) MESI (Modified, Exclusive, Shared, Invalid) protocol[ref
    site=Wikipedia
    page="MESI protocol"
    url=https://en.wikipedia.org/wiki/MESI_protocol]][:blackboard :caching :hardware]
[2:48:40][@cultofrig][Q: @handmade_hero 64-byte lines are pervasive due to the burst size of DDR controllers. And yes, Arm also moved from 32-byte to 64-byte][:fabrication]
[2:49:25][@rroohhh][Q: The process for the silicon ingots is called Czochralski process][:fabrication]
[2:49:37][@sanchopanzo][@handmade_hero Why don't they focus on increasing the cache sizes? Is there a hard limit to it or are they happy with the sizes as they are?][:fabrication]
[2:51:05][@pythno][Q: Isn't wavelength and size of electron kind of the same? Just different models?]
[2:51:17][@printf_armin][Q: Did have the privilege to wear one at Infineon. Really interesting experience]
[2:51:35][@cubercaleb][Q: If the probability of failure for one chip is P(F), and the probability of failure for two combined chips is denoted P(FC), then P(FC) = 2 * P(F) - P(F) * P(F)][:fabrication :mathematics]
[2:51:56][Failure probability of two combined chips][:blackboard :fabrication :mathematics]
[3:00:48][@cubercaleb][Q: It's the union of either chip being a dud, minus the intersection of both chips being a dud (since this is already accounted for in the union)][:fabrication :mathematics]
[3:01:38][@cubercaleb][Q: I over simplified. It should just be Bayes' theorem and these events should be independent, so just multiply the failure rate][:fabrication :mathematics]
[3:02:53][Close this down]
[/video]