Stuffing a Cat Through Internet Tubes in 4 Easy Steps

When you want to send a picture of a cat to your buddy, or get the latest weather, or watch a video, your computer chunks up a bunch of information about what you're looking for into a structured message, usually referred to as a packet.

At a high level, packets are split into a few important sections. We're going to start with the simple case: sending a packet over Ethernet with HTTP over TCP.

Our packets look like this:

ETHERNET > IP > TCP > HTTP
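One way to picture that nesting is as each layer putting its own header in front of whatever the layer above handed it. Here's a minimal sketch of the idea. The bracketed "headers" are made-up placeholders, not real header bytes, and the `wrap` helper and the example request are assumptions for illustration only.

```python
def wrap(header: bytes, payload: bytes) -> bytes:
    """Put one layer's header in front of the payload from the layer above."""
    return header + payload


# The innermost thing is the HTTP message itself (example request, assumed).
http_message = b"GET /cat.jpg HTTP/1.1\r\nHost: example.com\r\n\r\n"

# Each layer wraps the previous one. Real headers are binary structures;
# these labels only mark where each header would sit.
tcp_segment = wrap(b"[TCP]", http_message)
ip_packet = wrap(b"[IP]", tcp_segment)
ethernet_frame = wrap(b"[ETHERNET]", ip_packet)

print(ethernet_frame)
```

Reading the result left to right gives the same order as the diagram: the outermost header comes first on the wire, and the HTTP message sits at the very end.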

[NOTE(cloin): Figure out how to break out / reflow Hayden's analogy, this makes a useful textual transition]

Relate it to what a USPS facility might be responsible for on a given day, perhaps?

  • HTTP is the request to Amazon to buy a product? "I want this product sent to my address"
  • HTML is the contents of the package you will receive -- the final product itself
  • TCP is the information USPS uses internally to move the package from facility to facility, until it arrives at your address. Each facility is a proxy.
  • IP is the SHIPPING LABEL telling USPS where the information should arrive

HTTP -- Where Your Message Lives

Browsers use HTTP (Hypertext Transfer Protocol) to communicate with web servers: to request and receive web pages, and to perform basic transactions like updating your account, posting images, and getting the latest weather. HTTP defines a format for the messages sent back and forth over the network. Each request names a method, like GET, PUT, POST, or DELETE, that says what kind of action is being asked for, and headers describe the format of the data being sent, like images, text, or JSON.
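An HTTP/1.1 request is just text with a particular layout, so it can be built by hand. The sketch below is one hedged way to do that. The `build_request` helper, the paths, and the `example.com` host are all assumptions for illustration, and real clients send more headers than this.

```python
def build_request(method: str, path: str, host: str,
                  body: bytes = b"", content_type: str = "") -> bytes:
    """Build the raw bytes of a minimal HTTP/1.1 request."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        # These headers tell the server what format the body is in
        # and how many bytes of it to expect.
        lines.append(f"Content-Type: {content_type}")
        lines.append(f"Content-Length: {len(body)}")
    # A blank line (CRLF CRLF) separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


get_request = build_request("GET", "/cat.jpg", "example.com")
post_request = build_request("POST", "/cats", "example.com",
                             body=b'{"mood": "purring"}',
                             content_type="application/json")
print(get_request.decode())
print(post_request.decode())
```

The method is the first word of the first line, and the `Content-Type` header is where the "images, text, or JSON" part shows up.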

Good learning resources for HTTP:

[NOTE(cloin): Do we want to grab images from the mdn HTTP guide, and only point users to specific sections? Find better intro materials? Proxies and "HTTP is simple" are pretty in the weeds at this point]

<-- Link to expanded, less curated library of topical info -->
<-- Branch into REST/GraphQL here -->

IP -- What's My Address?

IP (Internet Protocol) sits before the TCP section of the packet. It contains the information needed for routing across larger networks: where the packet is coming from (the source IP address) and where it needs to go (the destination IP address). Routers use these addresses to route the packet across the internet.
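To make the source and destination addresses concrete, here is a rough sketch that packs a 20-byte IPv4 header and reads the two addresses back out. The helper names and the addresses (taken from the ranges reserved for documentation) are assumptions, and the checksum is left at zero, so this is not a header you could put on a real network as-is.

```python
import socket
import struct


def build_ipv4_header(src: str, dst: str, payload_len: int, ttl: int = 64) -> bytes:
    """Pack a minimal IPv4 header (no options, checksum left as 0)."""
    version_ihl = (4 << 4) | 5        # version 4, header length 5 x 32-bit words
    total_length = 20 + payload_len   # header plus whatever follows it
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,
        0,                    # DSCP/ECN
        total_length,
        0,                    # identification
        0,                    # flags + fragment offset
        ttl,
        socket.IPPROTO_TCP,   # the next section is TCP
        0,                    # header checksum (not computed in this sketch)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def read_addresses(header: bytes) -> tuple[str, str]:
    """Pull the source and destination addresses out of an IPv4 header."""
    src, dst = struct.unpack("!4s4s", header[12:20])
    return socket.inet_ntoa(src), socket.inet_ntoa(dst)


header = build_ipv4_header("192.0.2.1", "198.51.100.7", payload_len=0)
print(read_addresses(header))
```

A router only needs to look at the destination field (bytes 16 to 19 here) to decide where the packet goes next.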

Good learning resources for IP:

<-- Link to expanded, less curated library of topical info -->
<-- Branch into TUN via link here -->

TCP -- Retry Retry Retry

TCP (Transmission Control Protocol) comes right after IP. TCP's job is to ensure messages get to their destination as reliably as possible.

[NOTE(cloin): rephrase me?] The internet has a big problem to deal with. Messages don't always arrive as expected. TCP does a handful of important things to make communication possible. https://www.youtube.com/watch?v=IP-rGJKSZ3s

TCP provides a few nice guarantees which make writing network code a little easier. When a message gets sent via TCP, the receiving application gets its chunks in order. If packets get dropped along the way, or arrive out of order, TCP handles resending the missed packets and holding on to the ones that arrived early until it can pass everything to your program in order. TCP also handles congestion control: it monitors network capacity and uses that to automatically scale how fast it sends messages.
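The "hold on to early packets until the gap is filled" part can be sketched in a few lines. This is a toy model, not how a real TCP stack is written: the `Reassembler` class is hypothetical, sequence numbers are simple byte offsets starting at 0, and it assumes segments never partially overlap. Acknowledgements, retransmission timers, and congestion control are left out entirely.

```python
class Reassembler:
    """Toy in-order delivery: buffer early segments, release them once the gap fills."""

    def __init__(self) -> None:
        self.next_seq = 0                    # byte offset the application expects next
        self.pending: dict[int, bytes] = {}  # early segments, keyed by byte offset

    def receive(self, seq: int, data: bytes) -> bytes:
        """Accept one segment and return the bytes that can now be delivered in order."""
        if seq >= self.next_seq:
            self.pending[seq] = data         # anything older is a duplicate; ignore it
        delivered = b""
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            delivered += chunk
            self.next_seq += len(chunk)
        return delivered


r = Reassembler()
print(r.receive(3, b"cat"))   # arrives early, so nothing is delivered yet
print(r.receive(0, b"my "))   # fills the gap, so both chunks come out in order
```

The application reading from this never sees that "cat" showed up first. That hiding of the mess is the guarantee that makes network code easier to write.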

TCP contains an important bit of information, called a port, that your operating system uses to send packets to the right program on your machine. If you listen on a port, like 80 (the typical port used for HTTP), your OS will direct all traffic tagged with port 80 to you. Likewise, you can send to a specific port, and the destination will use that to route your message to the right program.
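Here's a small sketch of both sides of that, listening on a port and sending to it, using Python's standard `socket` module on the loopback address. The `send_to_port` function is a made-up name for this example. It binds to port 0, which asks the OS to pick any free port, since listening on port 80 usually needs elevated privileges.

```python
import socket


def send_to_port(message: bytes) -> bytes:
    """Listen on a local TCP port, send a message to it, and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))    # port 0: let the OS choose a free port
        server.listen(1)
        port = server.getsockname()[1]   # find out which port we were given

        # The "other program": connect to that same port and send the message.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)   # signal that we're done sending

            # The OS matched the connection to our listener by port number.
            conn, _addr = server.accept()
            with conn:
                received = b""
                while True:
                    chunk = conn.recv(1024)
                    if not chunk:
                        break
                    received += chunk
    return received


print(send_to_port(b"meow"))
```

Both ends live in one process here only to keep the example short. Normally the listener and the sender are separate programs, often on separate machines.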

Good learning resources for TCP:

<-- Link to expanded, less curated library of topical info -->
<-- Branch into UDP, QUIC, TLS, etc. via link here -->


Exercise Time

So, how do you actually send that cat though? How do you send a real packet yourself? It's time to make that theory stick.

[NOTE(cloin): HTTP Server from Scratch is python, Beej uses C, might cause user confusion. It's worth thinking about article transition, how do we flow from A -> B, and how much prerequisite knowledge we expect]

DNS -- Wait, how do I get an IP?

The last really important bit you need to know is DNS. The job of DNS is to help you find IP addresses for domain names, like "handmade.network".
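Under the hood, a DNS lookup is a small binary message, typically sent over UDP port 53. As a rough sketch, this is what building a query for an A record (an IPv4 address) might look like. The function names and the query ID are assumptions, and the code only builds the bytes; it does not send them anywhere.

```python
import struct


def encode_name(domain: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in domain.rstrip(".").split("."):
        encoded = label.encode("ascii")
        out += bytes([len(encoded)]) + encoded
    return out + b"\x00"


def build_query(domain: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query asking for the A record of a domain."""
    header = struct.pack(
        "!HHHHHH",
        query_id,   # echoed back in the response so we can match it up
        0x0100,     # flags: standard query, recursion desired
        1,          # one question
        0, 0, 0,    # no answer, authority, or additional records
    )
    question = encode_name(domain) + struct.pack("!HH", 1, 1)  # type A, class IN
    return header + question


query = build_query("handmade.network")
print(query.hex())
```

Sending those bytes to a resolver would get back a response in the same format, with the answer records carrying the IP addresses.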

Here's a great tool for playing with setting up DNS records yourself. Don't worry if you don't 100% get it, there are a bunch of great resources below for grokking the details.

[NOTE(cloin): Hey, this one is in Go. Do we want to worry about having a consistent language for these?] DNS detailed:

^ this is the simple DNS resolver mentioned in b0rk's 80 lines of Go tutorial

<-- Link to expanded, less curated library of topical info -->
<-- Branch into DNS over HTTPS / DNS Lookup Security via link here -->

Rock Bottom (Ethernet and PHY)

Ok, so you've got some of the basics down, and you're ready for some serious spelunking? Let's talk bits and bytes.

Watch these, stopping after video 8 (The Internet Protocol).


Fun Tangents

  • Network Bridging
  • DHCP Robin Hood
  • PXE Booting
  • SMTP
  • Routing and Switching
  • BGP
  • TLS/SSL
  • Inspection and Testing Tools: tcpdump, wireshark, netcat and more
  • HTTP/2, HTTP/3

NET RAMBLE

physical cables -- bits on wire / optics

BGP -- Major Routing Hub to Major Routing Hub https://blog.benjojo.co.uk/post/bgp-battleships

IP Distribution via IANA / ICANN -- blocks of IPv4 addresses get allocated (through regional registries) to autonomous systems / organizations, who communicate routes for their blocks via BGP

TTL / congestion control / TCP_NODELAY vs TCP_QUICKACK / TCP_CORK https://news.ycombinator.com/item?id=9048947

DNS -- registries operate TLDs (ex: .com, .org, .io); registrars sell domain names under them

  • https://www.iana.org/domains/root/db
  • https://messwithdns.net/
  • https://wizardzines.com/zines/dns/
  • https://jvns.ca/blog/2022/05/10/pages-that-didn-t-make-it-into--how-dns-works-/

Switching -- happens on the Ethernet / MAC level (layer 2); VLANs can happen here too

Spanning Tree Protocol -- prevents loops between switches, and the broadcast storms (like ARP storms) that come with them

Link speed negotiation

(intel) NUC with two (usb) NICs -- VMs that would tag traffic with VLAN. Ethernet packet tagged with VLAN 1,

| 1 1 1 1 1 1 1 2 | | 2i 2o |
|               2 | |  NUC  |
|               2 | |       |

layer 2 ethernet -- hamachi / layer 3 ip -- openvpn

Router in bridge mode -- Router A <=====> Router B

  • Hubs are layer 1
  • Switches are layer 2
  • Routers are layer 3

Home "router" is a router / switch combo

Network Topology -- this is mostly outside my wheelhouse; infiniband/optics?

"crossover cable"
A            B
TX ---\/--- TX
RX ---/\--- RX

"standard cable"
A            B
TX -------- TX
RX -------- RX

A switch maintains a MAC address table (a forwarding table, not a routing table) that maps MAC addresses to ports, and uses it to decide which port each frame should be forwarded out of
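A sketch of how that table could get filled in and used, assuming the common "learning switch" behavior: remember which port each source MAC was seen on, and flood frames whose destination hasn't been seen yet. The `LearningSwitch` class and the shortened MAC strings are made up for illustration, and details like VLANs and table entry aging are ignored.

```python
class LearningSwitch:
    """Toy layer 2 switch: learns MAC -> port from traffic, floods unknown destinations."""

    def __init__(self, port_count: int) -> None:
        self.ports = list(range(port_count))
        self.mac_table: dict[str, int] = {}

    def handle_frame(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the list of ports the frame should be sent out of."""
        # Learn: the sender is reachable through the port the frame arrived on.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the one it came in on.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []          # destination is on the same port; nothing to forward
        return [out_port]


sw = LearningSwitch(port_count=4)
print(sw.handle_frame(0, "aa:aa", "bb:bb"))  # bb:bb unknown, so flood
print(sw.handle_frame(1, "bb:bb", "aa:aa"))  # aa:aa was learned on port 0
print(sw.handle_frame(0, "aa:aa", "bb:bb"))  # bb:bb is now known on port 1
```

Broadcast frames fall out of the same logic in this sketch: the broadcast address never appears as a source, so it is never learned and always floods.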

TTL -- preventing packets from hopping forever on layer 3 connections

ICMP is a totally separate thing -- https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol

SEND CHEESEBURGER TO GOOGLE
A -> HR -> ISP -> | | | | -> GOOGLE
subnet 192.168.1.X
HR -> ISP | DESTINATION UNREACHABLE {ICMP 3} | TIME EXCEEDED {ICMP 11} TTL Expires
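The TTL part of that diagram can be sketched as a tiny simulation: every router knocks one off the TTL, and whichever router takes it to zero drops the packet and reports TIME EXCEEDED. The function names and router labels are made up, and the "network" is just a list. The second function sends probes with TTL 1, 2, 3 and so on, which is the trick traceroute-style tools rely on.

```python
def forward(ttl: int, routers: list[str]) -> str:
    """Send one packet through the routers in order, decrementing TTL at each hop."""
    for router in routers:
        ttl -= 1
        if ttl <= 0:
            # The packet dies here; the router may report back to the sender.
            return f"TIME EXCEEDED (ICMP type 11) from {router}"
    return "delivered"


def trace(routers: list[str]) -> list[str]:
    """Probe with increasing TTLs until a packet makes it all the way through."""
    results = []
    for ttl in range(1, len(routers) + 2):
        result = forward(ttl, routers)
        results.append(result)
        if result == "delivered":
            break
    return results


for line in trace(["HR", "ISP", "BACKBONE"]):
    print(line)
```

In this toy version every router answers. On the real internet some stay silent, which is why traceroute output often has gaps.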

Blocking ICMP is messy, be careful!

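https://en.wikipedia.org/wiki/Black_hole_(networking)

PING sends ICMP echo requests and waits for echo replies.

traceroute leans on TTL: it sends probes with TTL 1, 2, 3, ... so that each router on the chain is asked to send back TIME EXCEEDED {ICMP 11} when a probe expires there. They don't have to respond; they can just stay silent. ({ICMP 30} was a dedicated traceroute message, but it's deprecated.) Try: traceroute bad.horse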

Network Tomography -- Mapping networks by gathering a bunch of timing data sending packets between nodes https://en.wikipedia.org/wiki/Network_tomography

DHCP is automatic IP handouts. But it also tells you where your mail server is, how to get fortune cookies, and is a source of fun vulnerabilities

TLS 1.2/1.3 -- https://tls12.ulfheim.net/ and BearSSL


Hayden's Notes:

  • If HTTP is a transfer protocol, I'm confused about why TCP is necessary -- why does a transfer protocol need a second transfer protocol to transfer itself? lol
  • I think this article could use a brief description of what we mean by HTTP to PHY. From reading the article, it seems to just mean explaining the entire network stack all the way down to the way ethernet cables physically transfer network information. Putting a short explainer at the very top may help with motivating the reader (and helping them understand whether or not the article would be worth reading)
  • Each of the links should probably have their own slight preface, explaining what you will learn by following the link. The Odin Project does this, and I think that's generally a good pedagogical approach (before you learn, you are given context about what you are going to learn)
  • Holistic learning should probably include video and articles, so I'd recommend having both types whenever possible for each topic. I can certainly help with this!
  • Starting at the "Taking the Real Plunge" section, I started feeling more lost. I think this may have been intentional since it starts branching off, but it seems like there is a through-line between almost everything listed. Like at some point, information travels through the ethernet cable, and everything listed here is used in some specific order. Is there a higher level article we can find that covers exactly how all of that information is passed along from protocol to protocol? I think even just a simple image (that still manages to cover basically everything in this article) would help a lot here! I think "The Recipe" thing at the top is kinda close to what I'm after, but it's not detailed enough in terms of the linear procedure. I could even make a graphic potentially (it wouldn't be super beautiful, but it would get the job done), if we can put it into text at a very high level first. I also wonder if we should recommend the reader just watch the entire Ben Eater Networking tutorial in order at some point, since it might help a good bit with having a holistic understanding of these things
  • Overall, I think this is great work so far! I definitely learned a good bit about how the internet works, and I have to admit, there are several things about this that piqued my interest! I think you did a really good job with picking out articles. They all seemed high quality to me with great information and pedagogical approaches to the way the information was presented

My attempt to understand:

  • HTTP messages are simple requests for information, and responses to those requests
    • On top of this, TCP is used to "oversee" the reliable transmission of these files, ensuring proper ordering and that nothing is lost. This is necessary, since it often requires many computers all over the globe to route information to the correct location
      • IP is an additional layer, often bundled with TCP, that provides information regarding where the HTTP requests should be routed
  • I get much more lost when it comes to where exactly in the pipeline Discovery Protocols, DNS, DHCP, and ARP come into play
  • An analogy for the above would be amazing. Relate it to what a USPS facility might be responsible for on a given day, perhaps? Is the below analogy accurate at all?
    • HTTP is the request to Amazon to buy a product? "I want this product sent to my address"
    • HTML is the contents of the package you will receive -- the final product itself
    • TCP is the information USPS uses internally to move the package from facility to facility, until it arrives at your address. Each facility is a proxy.
    • IP is the SHIPPING LABEL telling USPS where the information should arrive
    • Is there an analogy for the box itself -- a container that holds the "product"? Packets, perhaps?
      • Maybe the entire analogy should be multiple packages. Amazon sending you one part of a product at a time rather than an entire product?