Thursday, 19 March 2026

Digesting Networking Concepts From Beej's Guide Using Claude's Explanation - Lesson 2



LESSON 2 — IP Addresses, Ports, and Identity


Part 1 — The Problem of Identity on a Network

For data to travel from one machine to another, two questions must be answered with absolute precision:

  1. Which machine? — Out of billions of connected devices, which exact one?
  2. Which program on that machine? — Your machine runs dozens of programs simultaneously. Which one should receive this data?

IP addresses answer question 1. Port numbers answer question 2. Together they form a complete address for a network endpoint.


Part 2 — IP Addresses (IPv4)

What an IP Address Actually Is

An IPv4 address is a 32-bit number. That's it. 32 ones and zeroes.

But 32 ones and zeroes are hard for humans to read, so we represent it as four groups of 8 bits (called octets), each converted to decimal, separated by dots.

Binary:   11000000 . 10101000 . 00000001 . 00000101
Decimal:    192    .   168    .    1     .    5
Human:              192.168.1.5

Each octet can be 0–255 (because 8 bits can represent 2⁸ = 256 values, 0 to 255).

Total possible IPv4 addresses: 2³² = 4,294,967,296 (about 4.3 billion)
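
The octet layout above can be checked with a few lines of Python. This is only a sketch: the helper names dotted_to_int and int_to_binary_octets are made up for illustration, and the final line compares the result against the standard library's ipaddress module.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Pack the four octets of a dotted-quad string into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift left one octet, append the next
    return value


def int_to_binary_octets(value: int) -> str:
    """Render a 32-bit integer as four dot-separated groups of 8 bits."""
    bits = f"{value:032b}"
    return ".".join(bits[i:i + 8] for i in range(0, 32, 8))


value = dotted_to_int("192.168.1.5")
print(value)                        # 3232235781
print(int_to_binary_octets(value))  # 11000000.10101000.00000001.00000101

# The standard library agrees with the hand-rolled conversion.
assert value == int(ipaddress.IPv4Address("192.168.1.5"))
```

The dotted form is purely a display convention; the integer is what actually travels in the packet header.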

The Critical Architectural Insight About 4.3 Billion

In the 1970s and 80s when IPv4 was designed, 4.3 billion addresses seemed impossibly large. There were only a few thousand computers on the entire Internet.

The designers made a decision that seemed perfectly reasonable at the time: they allocated huge blocks of addresses generously to organizations. MIT got 16 million addresses. Apple got 16 million. The US Department of Defense got 13 blocks of 16 million each. Ford, GE, IBM, and HP all got enormous blocks.

By the 2000s, it was clear we were running out. Today every phone, tablet, laptop, smart TV, and IoT device needs an address. We've dealt with the shortage through NAT (which we'll cover), but the fundamental problem remains.

This is why IPv6 was invented.


Part 3 — IPv6 — The Solution to Address Exhaustion

What IPv6 Actually Is

IPv6 addresses are 128-bit numbers. Four times the bits of IPv4.

Those 96 extra bits are far more than "a bit more space." Each added bit doubles the number of addresses, so the total grows exponentially:

IPv4:  2³²  = 4,294,967,296 addresses (4.3 billion)
IPv6:  2¹²⁸ = 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses

That's about 340 undecillion addresses, roughly 340 followed by 36 zeros. Every square millimeter of Earth's surface could have hundreds of quadrillions of addresses, and every grain of sand could have its own.

We will never run out.
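
To get a feel for the scale, the arithmetic can be done directly, since Python integers have arbitrary precision. This is a rough sketch only: the figure of about 510 million square kilometres for Earth's surface area is an assumption brought in for illustration, not something from the lesson.

```python
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

# Assumed figure: Earth's surface area is roughly 510 million square kilometres.
EARTH_SURFACE_KM2 = 510_000_000
MM2_PER_KM2 = 10 ** 12  # 1 km = 10^6 mm, so 1 km^2 = 10^12 mm^2

earth_surface_mm2 = EARTH_SURFACE_KM2 * MM2_PER_KM2
per_mm2 = ipv6_total // earth_surface_mm2

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")
print(f"IPv6 is 2**{128 - 32} times larger than IPv4")
print(f"IPv6 addresses per square millimetre of Earth: about {per_mm2:.2e}")
```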

How IPv6 Addresses Are Written

Because 128 bits in decimal would be unreadable, IPv6 uses hexadecimal, grouped into 8 chunks of 16 bits (2 bytes each), separated by colons:

Full form:   2001:0db8:c9d2:aee5:73e3:934a:a5ae:9551

Two compression rules exist to shorten addresses:

Rule 1: Leading zeros in each group can be dropped:

2001:0db8:0012:0000  →  2001:db8:12:0

Rule 2: One consecutive run of all-zero groups can be replaced with "::" (a double colon):

2001:0db8:0000:0000:0000:0000:0000:0051  →  2001:db8::51

You can only use :: once per address (otherwise you couldn't tell how many zero groups it represents).
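
Both compression rules can be tried out with Python's standard ipaddress module, which exposes the short and the full spelling of any address. A small sketch, using the example address from above:

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0051")

print(addr.compressed)  # 2001:db8::51
print(addr.exploded)    # 2001:0db8:0000:0000:0000:0000:0000:0051

# Both spellings name the same 128-bit value.
assert ipaddress.IPv6Address("2001:db8::51") == addr

# Using "::" twice is ambiguous, so the parser rejects it.
try:
    ipaddress.IPv6Address("2001::db8::51")
    double_colon_rejected = False
except ipaddress.AddressValueError as exc:
    double_colon_rejected = True
    print("rejected:", exc)
```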

Special IPv6 Addresses You Must Know

::1          →  Loopback address (equivalent to 127.0.0.1 in IPv4)
             →  Always means "this machine itself"

::           →  All zeros — used to mean "any address" (like 0.0.0.0 in IPv4)

::ffff:192.0.2.33  →  IPv4-mapped IPv6 address
                   →  Represents an IPv4 address in IPv6 notation
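
These special addresses can be recognised programmatically. The sketch below uses properties of Python's ipaddress module; it only inspects the addresses and does not touch the network.

```python
import ipaddress

loopback = ipaddress.ip_address("::1")
unspecified = ipaddress.ip_address("::")
mapped = ipaddress.ip_address("::ffff:192.0.2.33")

print(loopback.is_loopback)        # True
print(unspecified.is_unspecified)  # True
print(mapped.ipv4_mapped)          # 192.0.2.33

# The IPv4 counterparts behave the same way.
print(ipaddress.ip_address("127.0.0.1").is_loopback)   # True
print(ipaddress.ip_address("0.0.0.0").is_unspecified)  # True
```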

The Loopback Address — Why It Matters

The loopback address (127.0.0.1 in IPv4, ::1 in IPv6) is special. Data sent to this address never leaves your machine. It goes through the network stack but loops back to the same machine.

This is invaluable for development — you can run a server and client on the same machine and test them talking to each other without needing a real network.
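
As a minimal sketch of that workflow, the snippet below runs a tiny "server" and "client" in one process and has them exchange a UDP datagram over 127.0.0.1. UDP is chosen here only to keep the example short, and binding to port 0 (letting the OS choose a free port) is a convenience of this sketch rather than anything the lesson prescribes.

```python
import socket

# "Server": a UDP socket bound to the loopback address. Port 0 asks the
# OS to pick any free port, which avoids clashing with other programs.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)
server_addr = server.getsockname()

# "Client": a second socket in the same process sends one datagram.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"hello over loopback", server_addr)

# The server receives it and echoes it back upper-cased.
payload, client_addr = server.recvfrom(1024)
server.sendto(payload.upper(), client_addr)
reply, _ = client.recvfrom(1024)

print("server listened on", server_addr)
print("client sent from  ", client_addr)
print("reply:", reply)

client.close()
server.close()
```

Nothing here needs a network cable or Wi-Fi: both ends live on the same machine, and the client's port is picked automatically by the OS.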


Part 4 — Subnets — How Networks Are Organized

The Problem Subnets Solve

An IP address identifies a specific machine. But a network has structure — there are groups of machines (like all machines in one office, or one department, or one country). How do you express that structure?

Subnets divide the IP address into two parts:

  • The network portion — identifies which network (shared by all machines in that network)
  • The host portion — identifies which specific machine within that network

The Netmask

The netmask (or subnet mask) tells you which bits of the IP address are the network portion and which are the host portion.

IP Address:  192.168.1.5
             11000000.10101000.00000001.00000101

Netmask:     255.255.255.0
             11111111.11111111.11111111.00000000

The netmask has 1s where the network bits are and 0s where the host bits are.

To find the network address, you AND the IP address with the netmask:

  11000000.10101000.00000001.00000101   (192.168.1.5)
& 11111111.11111111.11111111.00000000   (255.255.255.0)
= 11000000.10101000.00000001.00000000   (192.168.1.0)

So 192.168.1.5 is host 5 on network 192.168.1.0.
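
The same AND operation in Python, as a small sketch. The ipaddress module is used only to convert between dotted notation and integers; the masking itself is plain bitwise arithmetic.

```python
import ipaddress

ip = int(ipaddress.IPv4Address("192.168.1.5"))
mask = int(ipaddress.IPv4Address("255.255.255.0"))

network = ip & mask             # keep only the network bits
host = ip & ~mask & 0xFFFFFFFF  # keep only the host bits (32-bit result)

print(ipaddress.IPv4Address(network))  # 192.168.1.0
print(host)                            # 5
```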

CIDR Notation — The Modern Way

Writing out the full netmask is verbose. CIDR (Classless Inter-Domain Routing) notation just counts the number of 1-bits in the netmask:

192.168.1.5/24   →  24 bits of network, 8 bits of host
                 →  256 addresses in this network (254 usable by hosts; the first is the network address, the last is broadcast)
                 →  Same as netmask 255.255.255.0

10.0.0.1/8       →  8 bits of network, 24 bits of host
                 →  About 16.7 million addresses on this network

2001:db8::/32    →  IPv6 subnet with 32-bit network prefix
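
CIDR notation maps directly onto Python's ipaddress module. The sketch below reads the prefix, netmask, and size of a network, and adds a hypothetical helper, same_network, to test whether two addresses share a network for a given prefix length.

```python
import ipaddress

iface = ipaddress.ip_interface("192.168.1.5/24")
net = iface.network

print(net)                # 192.168.1.0/24
print(net.netmask)        # 255.255.255.0
print(net.num_addresses)  # 256


def same_network(a: str, b: str, prefix_len: int) -> bool:
    """Return True if both addresses fall inside the same /prefix_len network."""
    net_a = ipaddress.ip_interface(f"{a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{b}/{prefix_len}").network
    return net_a == net_b


print(same_network("192.168.1.5", "192.168.1.200", 24))  # True
print(same_network("192.168.1.5", "192.168.2.5", 24))    # False
print(ipaddress.ip_network("10.0.0.0/8").num_addresses)  # 16777216
```

With a shorter prefix such as /16, the last two example hosts would count as being on the same network, which is why the prefix length always has to be stated alongside the address.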

Why You Need to Know This

As a Software Architect, you'll design systems where you need to understand whether two machines are on the same network (can communicate directly) or different networks (need to go through a router). Subnet knowledge is essential for understanding network topology, firewall rules, and cloud infrastructure (AWS VPCs, subnets, security groups all use this directly).


Part 5 — Port Numbers — The Second Half of Identity

The Problem Ports Solve

An IP address gets data to the right machine. But your machine is running many programs simultaneously — a web browser, a code editor, a Slack client, a database, your own server. How does the operating system know which program should receive incoming data?

Port numbers solve this. A port is a 16-bit number (0–65535) that identifies a specific process/service on a machine.

The complete address of a network endpoint is therefore:

IP Address + Port Number = Socket Address

Example:  192.168.1.5:80
          └───────────┘ └┘
          Which machine  Which program

This combination (IP + Port) is called a socket address or endpoint.

Well-Known Ports

The Internet Assigned Numbers Authority (IANA) maintains a list of standard port assignments:

Port 20, 21  →  FTP (file transfer)
Port 22      →  SSH (secure shell)
Port 23      →  Telnet
Port 25      →  SMTP (email sending)
Port 53      →  DNS (domain name lookup)
Port 80      →  HTTP (web)
Port 443     →  HTTPS (secure web)
Port 3306    →  MySQL database
Port 5432    →  PostgreSQL database
Port 6379    →  Redis
Port 27017   →  MongoDB

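Ports 0–1023 are reserved (the well-known ports). On Unix-like systems, only privileged processes (root, or a process explicitly granted that right) can bind to them by default. This is why you typically can't run a web server on port 80 without elevated privileges.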

Ports 1024–49151 are registered ports — can be used by user applications, many are assigned to known services.

Ports 49152–65535 are ephemeral (dynamic) ports: the OS automatically assigns these to client-side connections. This is the range IANA recommends; some systems default to a different one (Linux commonly uses 32768–60999). When your browser connects to a server on port 80, your browser itself gets an arbitrary high port (like 54832) for the other side of the conversation.
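
The three ranges are easy to encode. A minimal sketch, with classify_port as a made-up helper name and the range labels taken from the text above:

```python
def classify_port(port: int) -> str:
    """Name the range that a port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range for a 16-bit number: {port}")
    if port <= 1023:
        return "reserved"
    if port <= 49151:
        return "registered"
    return "ephemeral"


for port in (22, 443, 5432, 54832):
    print(port, classify_port(port))
# 22 reserved
# 443 reserved
# 5432 registered
# 54832 ephemeral
```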

The Full Picture of a Connection

A TCP connection is uniquely identified by four values:

Source IP     + Source Port     (your machine, ephemeral port)
Destination IP + Destination Port (server, well-known port)

Example:
  192.168.1.5:54832  →  93.184.216.34:80

This 4-tuple uniquely identifies this connection in the entire network.

This is why a server can handle thousands of connections all to port 80 simultaneously — each connection has a different source IP + port combination, making each 4-tuple unique.
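
This can be observed directly. The sketch below opens one listening TCP socket on the loopback address and connects two clients to it. Using 127.0.0.1 and an OS-chosen port (port 0) are choices made for this example so it runs anywhere without special privileges.

```python
import socket

# One listening socket on one port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(5)
server.settimeout(2.0)
server_addr = server.getsockname()

# Two clients connect to the same server IP and port.
clients = [socket.create_connection(server_addr, timeout=2.0) for _ in range(2)]
accepted = [server.accept()[0] for _ in clients]

four_tuples = set()
for conn in accepted:
    src_ip, src_port = conn.getpeername()  # the client's end
    dst_ip, dst_port = conn.getsockname()  # the server's end
    four_tuples.add((src_ip, src_port, dst_ip, dst_port))
    print(f"{src_ip}:{src_port} -> {dst_ip}:{dst_port}")

for sock in clients + accepted + [server]:
    sock.close()
```

Both printed lines share the same destination IP and port; only the source port differs, and that difference alone is enough for the OS to keep the two connections apart.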


Part 6 — NAT — The Hidden Architecture of the Modern Internet

The Problem NAT Solves

With only 4.3 billion IPv4 addresses and billions of devices, we ran out. But we couldn't just replace IPv4 overnight — too much infrastructure depended on it. NAT (Network Address Translation) was invented as a stopgap measure that ended up defining how most of the world's Internet works.

How NAT Works

Your home router has one public IP address (assigned by your ISP). Behind it, all your home devices have private IP addresses from a reserved range that is not routable on the public Internet.

Private IP ranges (RFC 1918):
  10.0.0.0    –  10.255.255.255    (10.x.x.x)
  172.16.0.0  –  172.31.255.255   (172.16-31.x.x)
  192.168.0.0 –  192.168.255.255  (192.168.x.x)

These addresses are never routed on the public Internet. Every home network in the world can use 192.168.1.x — no conflict, because they're all private.
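
A sketch of a check for these three ranges follows. It spells out the RFC 1918 networks explicitly rather than relying on ipaddress's is_private property, because that property also covers other special-purpose ranges (loopback and documentation addresses, for example). The helper name is_rfc1918 is made up for this example.

```python
import ipaddress

# The three private ranges, written as CIDR blocks.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0    - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0  - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 - 192.168.255.255
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address lies in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


print(is_rfc1918("192.168.1.5"))     # True
print(is_rfc1918("172.31.255.255"))  # True
print(is_rfc1918("172.32.0.1"))      # False
print(is_rfc1918("142.250.80.46"))   # False
```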

When your laptop (192.168.1.5) makes a request to google.com:

Your laptop:
  Sends packet: FROM 192.168.1.5:54832  TO 142.250.80.46:443

Your router (NAT):
  Records the mapping: 192.168.1.5:54832 ↔ 203.0.113.1:61234  (its public IP, plus a port it picks)
  Rewrites packet:     FROM 203.0.113.1:61234  TO 142.250.80.46:443
  Sends it out

Google responds:
  Sends packet: FROM 142.250.80.46:443  TO 203.0.113.1:61234

Your router (NAT):
  Looks up its table: 203.0.113.1:61234 → 192.168.1.5:54832
  Rewrites packet:    FROM 142.250.80.46:443  TO 192.168.1.5:54832
  Delivers to your laptop

Google only ever sees your router's public IP. Your laptop's private IP is completely hidden.
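
The table lookup at the heart of this exchange can be modelled in a few lines. This is a toy sketch, not how any particular router is implemented: the Nat class, its method names, and the starting port 61234 are all invented for illustration, and real NATs also track the protocol, the remote endpoint, and timeouts.

```python
from itertools import count


class Nat:
    """Toy port-translating NAT that keeps only the core mapping table."""

    def __init__(self, public_ip, first_port=61234):
        self.public_ip = public_ip
        self._ports = count(first_port)  # hypothetical port allocator
        self._out = {}                   # private (ip, port) -> public port
        self._in = {}                    # public port -> private (ip, port)

    def outbound(self, src, dst):
        """Rewrite the source of an outgoing packet, adding a mapping if new."""
        if src not in self._out:
            port = next(self._ports)
            self._out[src] = port
            self._in[port] = src
        return (self.public_ip, self._out[src]), dst

    def inbound(self, src, dst):
        """Rewrite the destination of an incoming packet, or return None
        (drop it) when no mapping exists."""
        if dst[0] != self.public_ip or dst[1] not in self._in:
            return None
        return src, self._in[dst[1]]


nat = Nat("203.0.113.1")
laptop = ("192.168.1.5", 54832)
google = ("142.250.80.46", 443)

sent = nat.outbound(laptop, google)
print(sent)   # (('203.0.113.1', 61234), ('142.250.80.46', 443))

reply = nat.inbound(google, sent[0])
print(reply)  # (('142.250.80.46', 443), ('192.168.1.5', 54832))

print(nat.inbound(google, ("203.0.113.1", 40000)))  # None: unsolicited, dropped
```

The last call shows the other side of the design: a packet arriving for a port with no table entry has nowhere to go.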

The Architectural Implications of NAT

NAT has profound implications you'll encounter as an architect:

Servers (or any outside host) cannot initiate connections to NAT'd devices: the router has no mapping for unsolicited incoming connections, so it drops them unless port forwarding has been configured explicitly. This is why you can't just connect to a computer on someone's home network from the Internet. This is what "being behind a NAT" means.

Peer-to-peer is hard — when both parties are behind NAT, neither can initiate to the other directly. Techniques like STUN, TURN, and ICE exist to work around this (used in WebRTC, video calling).

IPv6 eliminates the need for NAT — every device gets a globally routable address. This is one of IPv6's major benefits beyond just more addresses.


Part 7 — Byte Order — The Hidden War Inside Your CPU

The Problem

When you store a multi-byte number in memory, which byte do you store first?

Consider the number 0xb34f (hexadecimal). It's two bytes: 0xb3 and 0x4f. When storing these two bytes in memory at addresses 1000 and 1001, which goes where?

Different CPU architectures made different choices:

Big-Endian (Network Byte Order):
  Address 1000: 0xb3  (most significant byte first)
  Address 1001: 0x4f

Little-Endian (Intel x86, x64, ARM in most modes):
  Address 1000: 0x4f  (least significant byte first)
  Address 1001: 0xb3

Big-Endian stores the most significant byte at the lowest address — the "big end" first. Like writing numbers left to right in decimal — the most significant digit comes first.

Little-Endian stores the least significant byte first. Intel processors do this. Most consumer computers today are little-endian.
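
The two layouts can be reproduced with int.to_bytes, which lets you pick the byte order explicitly. A small sketch using the same value as above; sys.byteorder reports which order the machine running the code uses.

```python
import sys

value = 0xB34F

big = value.to_bytes(2, byteorder="big")
little = value.to_bytes(2, byteorder="little")

print(big.hex())     # b34f  (most significant byte first)
print(little.hex())  # 4fb3  (least significant byte first)
print("this machine is", sys.byteorder, "endian")

# Reading little-endian bytes as if they were big-endian gives a different number.
misread = int.from_bytes(little, byteorder="big")
print(hex(misread))  # 0x4fb3
```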

Why This Matters for Networking

The Internet standardized on Big-Endian as the Network Byte Order. This decision was made so that all machines on the Internet agree on how to interpret multi-byte numbers in packet headers.

If you're on an Intel machine (little-endian) and you put the number 3490 directly into a packet header without conversion, the bytes will be in the wrong order. The receiving machine will read a completely different number.

The Conversion Functions

The C sockets API provides four functions to convert between host byte order and network byte order:

htons()  →  Host TO Network Short  (16-bit)
htonl()  →  Host TO Network Long   (32-bit)
ntohs()  →  Network TO Host Short  (16-bit)
ntohl()  →  Network TO Host Long   (32-bit)

The rule is simple and absolute:

  • Any multi-byte number going into a packet → convert with hton*()
  • Any multi-byte number coming out of a packet → convert with ntoh*()

On a big-endian machine these functions do nothing (it's already in the right order). On a little-endian machine they swap the bytes. You always call them regardless of your machine — that's what makes your code portable.

Port 3490 in host byte order (little-endian Intel):
  Memory: 0xa2 0x0d  (3490 = 0x0DA2, stored as 0xA2 0x0D)

After htons(3490) — Network Byte Order:
  Memory: 0x0d 0xa2  (big-endian: most significant first)
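
Python exposes the same conversion functions through its socket module, so the port 3490 example can be checked without writing C. This is a sketch: struct.pack with "!" (network order) and "=" (host order) is used here only to make the in-memory bytes visible.

```python
import socket
import struct

port = 3490  # 0x0DA2

# Bytes as they should appear on the wire: network (big-endian) order.
wire = struct.pack("!H", port)
print(wire.hex())  # 0da2

# htons returns the integer whose host-order bytes match the wire bytes.
converted = socket.htons(port)
print(struct.pack("=H", converted).hex())  # 0da2 on any machine

# ntohs undoes the conversion on the receiving side.
print(socket.ntohs(converted))  # 3490
```

In everyday Python code, packing with struct and the "!" prefix is the more common way to get network byte order; htons and friends are shown here because they mirror the C API one-to-one.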


Lessons 1 & 2 — Master Summary

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
LESSON 1 — WHAT IS A NETWORK?

Problem solved:    Moving data between machines reliably
Solution:          Agreed protocols organized in layers

Four layers:
  Application    →  your code
  Transport      →  TCP (reliable) | UDP (fast)
  Internet       →  IP (routing)
  Network Access →  physical bits

Data encapsulation:
  Each layer adds its header going DOWN the stack
  Each layer strips its header coming UP the stack
  Layers are opaque to each other

TCP:  connection + handshake + ordered + reliable + slower
UDP:  connectionless + fire-and-forget + fast + unreliable

Sockets: the API boundary between your app and the network

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
LESSON 2 — IDENTITY ON A NETWORK

IP Address:   32-bit (IPv4) or 128-bit (IPv6) number
              Identifies which MACHINE

IPv4:         4.3 billion addresses, running out
IPv6:         340 undecillion addresses, never running out
              Written in hex groups: 2001:db8::1
              ::1 = loopback | :: = any address

Subnet:       Network portion + Host portion of IP address
              Netmask tells you where the split is
              CIDR: /24 means 24 bits are network portion
              AND (IP, netmask) = network address

Port:         16-bit number
              Identifies which PROGRAM on the machine
              0–1023: reserved (need root)
              1024–49151: registered
              49152–65535: ephemeral (auto-assigned to clients)

Full address: IP + Port = Socket Address (endpoint)
Connection:   identified by 4-tuple:
              (src_ip, src_port, dst_ip, dst_port)

NAT:          Router translates private IPs to one public IP
              Private ranges: 10.x.x.x, 172.16-31.x.x,
                              192.168.x.x
              Implication: servers can't initiate to NAT'd devices

Byte Order:
  Big-Endian    = Network Byte Order (Internet standard)
  Little-Endian = Intel/most consumer CPUs
  htons/htonl   = Host → Network (use before sending)
  ntohs/ntohl   = Network → Host (use after receiving)
  Always call these. No exceptions.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━