# So You Want to Make Internet Lasagna?
## The Recipe
| Communication | Discovery |
| ------------- | --------- |
| HTTP | DNS |
| TCP | |
| IP | DHCP |
| Ethernet | ARP |
<br>
## Starting from HTTP
HTTP (Hypertext Transfer Protocol) is what browsers use to talk to web servers: to send and receive web pages,
and to do basic transactions like submitting a form from your browser to the server, requesting some database
information for display, or updating your account settings.
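
To make that concrete: an HTTP/1.1 message is just text with a particular shape. Here's a minimal Python sketch of building a `GET` request and pulling apart a response. `build_get_request` and `parse_response` are made-up helper names, and the parser skips real-world details like chunked bodies or repeated headers.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # the one header HTTP/1.1 requires
        "Connection: close",      # ask the server to hang up after responding
        "",                       # an empty line marks the end of the headers
        "",
    ]
    # Each line ends in CRLF ("\r\n"), as HTTP/1.1 requires.
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, str, dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")        # headers end at the first blank line
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)                  # e.g. "HTTP/1.1 200 OK"
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return code, reason, headers, body
```

For example, `build_get_request("handmade.network")` produces `b'GET / HTTP/1.1\r\nHost: handmade.network\r\nConnection: close\r\n\r\n'`. Those bytes are what get handed down to TCP.
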
Good learning resources for HTTP:
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview [NOTE(hayden): This link mentions proxies -- do those proxies use TCP to transfer the HTTP messages?]
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages#http_requests
<-- Link to expanded, less curated library of topical info -->
<-- Branch into REST/GraphQL here -->
## Diving into the meat and potatoes with TCP and IP
HTTP requests and responses are sequences of bytes, chunked up and sent in packets,
often [NOTE(hayden): "often" -- should exceptions be mentioned?] through a protocol called TCP (Transmission Control Protocol). TCP provides a few
nice guarantees that make writing reliable network code a little easier.
When a message is sent via TCP, TCP ensures that the data gets passed to the receiving application
in order. If packets get dropped along the way or arrive out of order, TCP handles resending the
missing packets and buffers whatever has already arrived until the gaps are filled, so the application only ever sees the bytes in their original order.
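
Here's a toy Python model of TCP's receive side, just to show the buffering idea. It rests on a few simplifying assumptions: sequence numbers are plain byte offsets starting at 0 (real TCP starts from a random 32-bit number and wraps around), segments never partially overlap, and `Reassembler` is a hypothetical name, not a real API.

```python
class Reassembler:
    """Toy model of TCP's receive side: buffers out-of-order segments and
    hands the application only contiguous, in-order bytes."""

    def __init__(self, initial_seq: int = 0):
        self.next_seq = initial_seq   # next byte offset the application is waiting for
        self.pending = {}             # seq -> payload, for segments that arrived early

    def receive(self, seq: int, payload: bytes) -> tuple[bytes, int]:
        """Accept one segment; return (bytes ready for the app, cumulative ACK)."""
        if seq >= self.next_seq:      # ignore duplicates of already-delivered data
            self.pending.setdefault(seq, payload)
        delivered = bytearray()
        # Drain every buffered segment that now lines up with the next expected byte.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            delivered += chunk
            self.next_seq += len(chunk)
        # The ACK means "I have everything before next_seq". A sender that keeps
        # getting the same ACK back knows to resend the segment starting there.
        return bytes(delivered), self.next_seq
```

Feeding it `receive(5, b"world")` first returns `(b"", 0)`: nothing can be delivered yet, since byte 0 hasn't shown up. Then `receive(0, b"hello")` returns `(b"helloworld", 10)`.
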
Good learning resources for TCP:
[NOTE(hayden): The first video here mentions ports -- as if they are already understood (and previous articles briefly mentioned them). Should ports have their own section before this?]
- https://www.youtube.com/watch?v=4IMc3CaMhyY
- https://www.youtube.com/watch?v=F27PLin3TV0
- https://www.youtube.com/watch?v=IP-rGJKSZ3s [NOTE(hayden): I think this link in particular needs further context (although I do understand why it is relevant, especially given the previous video)]
<-- Link to expanded, less curated library of topical info -->
<-- Branch into UDP, QUIC, TLS, etc. via link here -->
IP (Internet Protocol) sits directly below TCP, and the two often get bundled together (hence "TCP/IP"). IP adds a
small header placed right in front of the transport protocol's header (TCP, here) [NOTE(hayden): Which protocol? TCP?], and that header contains important information,
like where the packet is coming from and where it needs to go, so network hardware
along the way can route it from A->B to reach its destination.
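
To see how small that header really is, here's a Python sketch that packs a 20-byte IPv4 header (no options) and fills in the header checksum that routers use to detect corruption. The zeroed identification/flags fields and the default TTL of 64 are simplifying assumptions, and the addresses are whatever you pass in.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, payload_len: int, proto: int = 6, ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header with no options. proto=6 means TCP."""
    version_ihl = (4 << 4) | 5          # version 4, header length = 5 x 32-bit words
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + payload_len,   # version/IHL, type of service, total length
        0, 0,                               # identification, flags/fragment offset
        ttl, proto, 0,                      # checksum is 0 while we compute it
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    # The checksum lives at byte offset 10-11.
    return header[:10] + struct.pack("!H", checksum(header)) + header[12:]
```

Running `checksum()` over a finished header gives 0, which is how a receiver checks that the header arrived intact.
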
Good learning resources for IP:
- https://www.youtube.com/watch?v=rPoalUa4m8E [NOTE(hayden): Should point to point links, 'frames', and MAC addresses be covered before this point?]
- https://www.youtube.com/watch?v=VWJ8GmYnjTs
<-- Link to expanded, less curated library of topical info -->
<-- Branch into TUN via link here -->
<br>
## Making it Tractable
So, how do you take all that theory and make it stick? How do you send a real packet yourself?
Some good outlets for exercises:
- https://beej.us/guide/bgnet/html/ [NOTE(hayden): This seems like an excellent resource!]
- https://github.com/shuveb/zerohttpd [NOTE(hayden): If we can provide a brief description of the different folders in this repo, that might be helpful]
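
If you'd like to poke at it before diving into C, Python's `socket` module wraps the same BSD socket calls Beej walks through (`socket`, `bind`, `listen`, `accept`, `connect`, `send`, `recv`). Here's a minimal sketch that runs a one-shot HTTP server and client over TCP on your own machine (127.0.0.1), so it needs no internet connection. The helper names and the response text are made up for illustration.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and answer it with a fixed HTTP response."""
    conn, _addr = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:        # read until the request headers end
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        body = b"hello from localhost\n"
        header = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\nConnection: close\r\n\r\n"
        conn.sendall(header.encode("ascii") + body)


def fetch(host: str, port: int) -> bytes:
    """Open a TCP connection, send a GET request, read until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii"))
        response = b""
        while chunk := sock.recv(4096):          # recv returns b"" once the peer closes
            response += chunk
    return response


if __name__ == "__main__":
    # Port 0 asks the OS for any free port, so this runs without extra setup.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    print(fetch("127.0.0.1", port).decode())
    server.join()
    listener.close()
```

The OS takes care of TCP (the handshake, ordering, retransmission) and IP underneath; your code only ever sees an ordered stream of bytes.
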
## Taking the Real Plunge
Ok, so you've got some of the basics down, and you're ready for some serious spelunking?
Let's talk Ethernet and PHY.
- https://www.youtube.com/watch?v=XaGXPObx2Gs&list=PLowKtXNTBypH19whXTVoG3oKSuOcw_XeW
<br>
## Buttoning up with Discovery Protocols
So, how does the computer get an IP address? How do we know what the router's IP is?
How do we find the IP address behind "https://handmade.network/" so we can send it a request?
Discovery protocols to the rescue!
[NOTE(hayden): Do we need links for this section?]
## Using the DNS Phonebook
DNS sits at the top, acting as the final, important icing on the cake. The job of DNS is primarily to
provide lookup services for domain names. To resolve "https://handmade.network/" into an IP address
so we can send it an HTTP request, we send a lookup request for its hostname (handmade.network) to a DNS server, and it does the requisite
forwarding and lookups until it either has an IP address to send back or fails.
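
A DNS lookup is a small binary message, usually sent over UDP to port 53. Here's a Python sketch that builds a query asking for an A (IPv4) record, following the message layout in RFC 1035. The query ID is an arbitrary example value, parsing the reply is left out, and `query_server` is a hypothetical helper you'd point at a DNS server of your choice.

```python
import socket
import struct


def build_dns_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a DNS query message (RFC 1035 layout); qtype 1 asks for an A (IPv4) record."""
    flags = 0x0100  # standard query, with the "recursion desired" bit set
    # Header: ID, flags, then counts: 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed and a zero byte ends the name,
    # so "handmade.network" becomes b"\x08handmade\x07network\x00".
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN (internet)
    return header + question


def query_server(name: str, server: str, port: int = 53, timeout: float = 2.0) -> bytes:
    """Send the query over UDP and return the raw, unparsed reply bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(name), (server, port))
        reply, _addr = sock.recvfrom(512)  # classic size limit for DNS over UDP
        return reply
```

For example, `query_server("handmade.network", "1.1.1.1")` would ask Cloudflare's public resolver; the answer section of the reply holds the IP address(es).
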
Good learning resources for DNS:
- https://www.cloudflare.com/learning/dns/what-is-dns/
<-- Link to expanded, less curated library of topical info -->
<-- Branch into DNS over HTTPS / DNS Lookup Security via link here -->
## Finding the Mailman with DHCP
DHCP sits near the middle, but is incredibly important. When you want to send a packet to a network
beyond your own, somebody has to deliver that packet. To find the packet post office, your computer
broadcasts a DHCP discover packet and collects IP offers from all DHCP servers on the network.
At that point, it typically fires off a request for the first IP offer it receives, and gets a
confirmation (ACK) or denial (NAK) for that request. DHCP ACKs typically also contain the IP address of the router,
the local DNS server, and more.
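
The discover packet itself is a fixed-layout binary message inherited from BOOTP. Here's a Python sketch that builds one, based on the layouts in RFC 2131 and RFC 2132. The MAC address and transaction ID you pass in are example values, and actually broadcasting it (UDP from port 68 to port 67 at 255.255.255.255) usually needs elevated privileges, so that part isn't shown.

```python
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options


def build_dhcp_discover(mac: bytes, xid: int) -> bytes:
    """Build a DHCPDISCOVER message (the UDP payload, without IP/UDP headers)."""
    zero_ip = b"\x00\x00\x00\x00"         # we don't have an address yet
    fixed = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                                # op: 1 = BOOTREQUEST (client -> server)
        1, 6, 0,                          # htype 1 = Ethernet, hlen 6 = MAC length, hops
        xid,                              # transaction ID, echoed back in the server's OFFER
        0, 0x8000,                        # secs, flags (broadcast bit: "reply by broadcast")
        zero_ip, zero_ip, zero_ip, zero_ip,   # ciaddr, yiaddr, siaddr, giaddr
        mac,                              # chaddr: client MAC, zero-padded to 16 bytes
        b"", b"",                         # sname, file: unused here, zero-filled
    )
    options = bytes([
        53, 1, 1,                         # option 53 (message type), length 1, 1 = DISCOVER
        55, 3, 1, 3, 6,                   # option 55 (parameter request list): mask, router, DNS
        255,                              # end of options
    ])
    return fixed + DHCP_MAGIC_COOKIE + options
```

A server answers with an OFFER (message type 2) carrying the same transaction ID; the client then sends a REQUEST (type 3) and waits for the ACK (type 5) or NAK (type 6).
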
Good learning resources for DHCP:
- https://docs.microsoft.com/en-us/windows-server/networking/technologies/dhcp/dhcp-top
<-- Link to expanded, less curated library of topical info -->
<-- Branch into PXE Booting via link here -->
## Putting on the ARP Goggles
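At the bottom of the protocol stack, ARP (Address Resolution Protocol) is how your computer
reaches out and understands the local network it lives on: it maps IP addresses on the local network to MAC addresses.
When your computer needs to send a packet to a local IP address (the router's, for example), it broadcasts an ARP
request asking "who has this IP?". The request contains the MAC and IP address of the computer sending it, and the
machine that owns the requested IP fires back a reply with its MAC address in tow. Your computer caches that answer in
its ARP table so it can address Ethernet frames directly to that machine. Many machines also broadcast an ARP
announcement ("gratuitous ARP") when they join a network or get a new IP, so their neighbors learn about them up front.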
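
Here's what that "who has this IP?" question looks like on the wire: a Python sketch that builds an Ethernet broadcast frame carrying an ARP request. The MAC and IP addresses you pass in are placeholders, and the sketch stops short of sending, since that needs a raw socket (e.g. `AF_PACKET` on Linux) and root.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Ethernet frame carrying an ARP request: "who has target_ip? tell src_ip"."""
    # Ethernet header: destination MAC (everyone), source MAC, EtherType.
    eth_header = struct.pack("!6s6sH", BROADCAST_MAC, src_mac, ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                   # hardware type: Ethernet
        0x0800,                              # protocol type: IPv4
        6, 4,                                # MAC address length, IPv4 address length
        1,                                   # operation: 1 = request (2 = reply)
        src_mac, socket.inet_aton(src_ip),   # sender MAC and IP
        b"\x00" * 6,                         # target MAC: unknown, that's the question
        socket.inet_aton(target_ip),         # the IP we want a MAC address for
    )
    return eth_header + arp
```

The result is 42 bytes (14 for the Ethernet header, 28 for ARP); the network card pads it to Ethernet's 60-byte minimum and appends the frame check sequence.
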
Good learning resources for ARP:
- https://www.youtube.com/watch?v=aamG4-tH_m8&list=PLowKtXNTBypH19whXTVoG3oKSuOcw_XeW&index=9
- https://www.saminiir.com/lets-code-tcp-ip-stack-1-ethernet-arp/
<-- Link to expanded, less curated library of topical info -->
<-- Branch into TAP via link here -->
## Fun Tangents
- Network Bridging
- DHCP Robin Hood
- PXE Booting
- SMTP
- Routing and Switching
- BGP
- TLS/SSL
- Inspection and Testing Tools: tcpdump, Wireshark, netcat, and more
- HTTP/2, HTTP/3
## NET RAMBLE
physical cables -- bits on wire / optics
BGP -- Major Routing Hub to Major Routing Hub
https://blog.benjojo.co.uk/post/bgp-battleships
IP Distribution via IANA / ICANN
-- blocks of IPv4 addresses auctioned to autonomous systems / organizations, who communicate routes for those blocks via BGP
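
Once BGP has filled a router's table with blocks, forwarding a packet comes down to longest-prefix match: the most specific block containing the destination wins. A Python sketch using the standard `ipaddress` module; the routes and next-hop names are hypothetical (203.0.113.0/24 is a range reserved for documentation), and BGP itself (sessions, path selection) isn't modeled.

```python
import ipaddress

# Hypothetical routing table: prefix -> next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-upstream",
    ipaddress.ip_network("203.0.113.0/24"): "peer-a",
    ipaddress.ip_network("203.0.113.128/25"): "peer-b",
}


def next_hop(address: str) -> str:
    """Pick the most specific (longest) prefix that contains the address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in ROUTES if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

`next_hop("203.0.113.200")` picks `peer-b` (the /25 is more specific than the /24), while `next_hop("8.8.8.8")` falls through to the default route.
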
TTL / congestion control / TCP_NODELAY vs TCP_QUICKACK / TCP_CORK
https://news.ycombinator.com/item?id=9048947
DNS -- TLDs (ex: .com, .org, .io) are run by registries; Domain Name Registrars sell names under them
https://www.iana.org/domains/root/db
https://messwithdns.net/
https://wizardzines.com/zines/dns/
https://jvns.ca/blog/2022/05/10/pages-that-didn-t-make-it-into--how-dns-works-/
Switching -- on the Ethernet / MAC level, layer 2, VLANs can happen here
`<ETHERNET><><ETHERNET>`
Spanning Tree Protocol -- prevents switching loops, which cause broadcast storms (ARPSTORMs)
Link speed negotiation
(Intel) NUC with two (USB) NICs -- VMs that would tag traffic with VLAN.
Ethernet packet tagged with VLAN 1,
```
| 1 1 1 1 1 1 1 2 | | 2i 2o |
| 2 | | NUC |
| 2 | | |
```
layer 2 ethernet -- hamachi / layer 3 ip -- openvpn
Router in bridge mode -- Router A <=====> Router B
- Hubs are layer 1
- Switches are layer 2
- Routers are layer 3
Home "router" is a router / switch combo
Network Topology -- this is mostly outside my wheelhouse; infiniband/optics?
```
"crossover cable"
A B
TX ---\/--- TX
RX ---/\--- RX
"standard cable"
A B
TX -------- TX
RX -------- RX
```
Switch maintains a MAC address table (a.k.a. CAM table), using destination MAC addresses to determine which port each frame gets forwarded to
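
A toy Python model of that table, assuming classic "learning switch" behavior: remember which port each source MAC was seen on, forward to the known port, and flood frames whose destination is unknown or broadcast. `LearningSwitch` is a made-up name, and real switches also age out entries and handle VLANs.

```python
class LearningSwitch:
    """Toy Ethernet switch: learns which port each source MAC lives on and
    forwards frames by destination MAC, flooding when it doesn't know yet."""

    BROADCAST = "ff:ff:ff:ff:ff:ff"

    def __init__(self, ports: list[int]):
        self.ports = ports
        self.mac_table: dict[str, int] = {}    # MAC address -> port

    def handle_frame(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the list of ports the frame gets sent out of."""
        self.mac_table[src_mac] = in_port      # learn: src_mac is reachable via in_port
        out = self.mac_table.get(dst_mac)
        if dst_mac == self.BROADCAST or out is None:
            # Broadcast or unknown destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []                          # destination is on the same segment; drop
        return [out]
```

That flooding is also why loops between switches are dangerous: a broadcast can circulate forever, which is the problem Spanning Tree Protocol exists to prevent.
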
TTL -- preventing packets from hopping forever on layer 3 connections, ICMP is a totally separate thing
ICMP -- https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol
```
SEND CHEESEBURGER TO GOOGLE
A -> HR -> ISP -> | | | | -> GOOGLE
subnet 192.168.1.X
HR -> ISP | DESTINATION UNREACHABLE {ICMP 3} | TIME EXCEEDED {ICMP 11} TTL Expires
```
Blocking ICMP is messy, be careful!
https://en.wikipedia.org/wiki/Black_hole_(networking)
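PING sends ICMP echo requests {ICMP 8} and gets back echo replies {ICMP 0}; the TTL on the reply hints at how many hops it took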
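traceroute sends probes with increasing TTL (1, 2, 3, ...); each router on the chain that drops a probe because its TTL ran out sends back TIME EXCEEDED {ICMP 11}, revealing itself. They don't have to respond, they can just drop it silently. ({ICMP 30} was a proposed traceroute message where routers would report back and keep forwarding; it's deprecated.)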
`traceroute bad.horse`
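
A toy Python simulation of the TTL trick traceroute relies on. The path is a hypothetical list of hops borrowed from the cheeseburger sketch above; real traceroute sends actual probes (UDP by default on most Unix systems, ICMP echo on Windows) and measures round-trip times, which this skips.

```python
def send_probe(path: list[str], ttl: int) -> tuple[str, str]:
    """Forward one probe along `path` (routers..., destination), decrementing
    the TTL at each router. Returns (who answered, what they answered with)."""
    *routers, destination = path
    for router in routers:
        ttl -= 1                     # every router decrements TTL before forwarding
        if ttl == 0:
            return router, "ICMP 11 time exceeded"
    return destination, "reply from destination"


def traceroute(path: list[str], max_hops: int = 30, silent: frozenset[str] = frozenset()) -> list[str]:
    """Probe with TTL = 1, 2, 3, ... until the destination itself answers."""
    lines = []
    for ttl in range(1, max_hops + 1):
        node, answer = send_probe(path, ttl)
        # A router that chooses not to reply just shows up as "* * *".
        lines.append(f"{ttl:2}  * * *" if node in silent else f"{ttl:2}  {node}  ({answer})")
        if answer == "reply from destination":
            break
    return lines
```

`traceroute(["HR", "ISP", "GOOGLE"])` lists HR and ISP as hops 1 and 2 (each answering with time exceeded) and GOOGLE as hop 3.
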
Network Tomography -- Mapping networks by gathering a bunch of timing data sending packets between nodes
https://en.wikipedia.org/wiki/Network_tomography
DHCP is automatic IP handouts
But also, it tells you where your mail server is, how to get fortune cookies, and is a source of fun vulnerabilities
TLS 1.2/1.3
https://tls12.ulfheim.net/
BearSSL

----------
Hayden's Notes:
* If HTTP is a transfer protocol, I'm confused about why TCP is necessary -- why does a transfer protocol need a second transfer protocol to transfer itself? lol
* I think this article could use a brief description of what we mean by HTTP to PHY. From reading the article, it seems to just mean explaining the entire network stack all the way down to the way ethernet cables physically transfer network information. Putting a short explainer at the very top may help with motivating the reader (and helping them understand whether or not the article would be worth reading)
* Each of the links should probably have their own slight preface, explaining what you will learn by following the link. The Odin Project does this, and I think that's generally a good pedagogical approach (before you learn, you are given context about what you are going to learn)
* Holistic learning should probably include video and articles, so I'd recommend having both types whenever possible for each topic. I can certainly help with this!
* Starting at the "Taking the Real Plunge" section, I started feeling more lost. I think this may have been intentional since it starts branching off, but it seems like there is a through-line between almost everything listed. Like at some point, information travels through the ethernet cable, and everything listed here is used in some specific order. Is there a higher level article we can find that covers exactly how all of that information is passed along from protocol to protocol? I think even just a simple image (that still manages to cover basically everything in this article) would help a lot here! I think "The Recipe" thing at the top is kinda close to what I'm after, but it's not detailed enough in terms of the linear procedure. I could even make a graphic potentially (it wouldn't be super beautiful, but it would get the job done), if we can put it into text at a very high level first
* Overall, I think this is great work so far! I definitely learned a good bit about how the internet works, and I have to admit, there are several things about this that piqued my interest! I think you did a really good job with picking out articles. They all seemed high quality to me with great information and pedagogical approaches to the way the information was presented
My attempt to understand:
- HTTP is made up of simple requests for information, and responses to those requests
- On top of this, TCP is used to "oversee" the reliable transmission of these files, ensuring proper ordering and that nothing is lost. This is necessary, since it often requires many computers all over the globe to route information to the correct location
- IP is an additional layer, often bundled with TCP, that provides information regarding where the HTTP requests should be routed
- I get much more lost when it comes to where exactly in the pipeline Discovery Protocols, DNS, DHCP, and ARP come into play
* An analogy for the above would be amazing. Relate it to what a USPS facility might be responsible for on a given day, perhaps? Is the below analogy accurate at all?
  * HTTP is the request to Amazon to buy a product? "I want this product sent to my address"
  * HTML is the contents of the package you will receive -- the final product itself
  * TCP is the information USPS uses internally to move the package from facility to facility, until it arrives at your address. Each facility is a proxy.
  * IP is the SHIPPING LABEL telling USPS where the information should arrive
  * Is there an analogy for the box itself -- a container that holds the "product"? Packets, perhaps?
  * Maybe the entire analogy should be multiple packages. Amazon sending you one part of a product at a time rather than an entire product?