BOLD 2003: Development and the Internet


NOTES - Architecture

[1] The Internet Protocol

If you want to dive into the excruciating technical details, check out the IP's technical specification, known as RFC 791. Even though that document was published in 1981 (long before the invention of the World Wide Web made the Internet so popular), it still remains the authoritative definition of the IP.

If you are interested in the history of the Internet and the Internet protocol, Robert Kahn and Vint Cerf have published a very readable and informative paper: "Internet History: What Is the Internet (And What Makes It Work)?". A few years earlier, a larger gang of the Internet's founding fathers published a terrific, and somewhat more detailed, overview: "A Brief History of the Internet." If you want to delve into still greater detail, the Internet Society maintains a useful collection of Internet history links.

[2] Internet Protocol Addresses

Of course, things have gotten far more complicated as the Internet has grown larger. The current version of the Internet protocol (known as IP version 4, or IPv4) uses a 32-bit address, which theoretically provides for more than 4 billion (4,294,967,296, to be exact) possible unique addresses. Because IP addresses are distributed in a hierarchical manner (a central registry -- the IANA -- allocates huge blocks of numbers to four regional IP address registries -- APNIC, ARIN, LACNIC, and RIPE NCC -- which in turn allocate smaller blocks to large Internet service providers, which allocate blocks to smaller ISPs, which allocate smaller blocks to network customers, which finally assign addresses to individual machines), the actual number of computers and devices that can be assigned unique IPv4 addresses is much, much smaller.

This has led to fears in recent years that the vast growth in the number of Internet-connected devices would exhaust the IPv4 address space. These fears have not yet been realized, however, in large part due to a technique -- known as Network Address Translation (NAT) -- that allows a network to use a single public IP address for many hundreds of Internet-connected machines. As many network engineers will tell you, though, NAT creates all kinds of problems for the routing of Internet packets and for the writing of Internet-enabled software. (And NAT violates the Internet's "end-to-end" design principle, which holds that Internet machines in the middle should only deliver packets, never altering them in any way.) Even with NAT, the pool of unassigned IPv4 space continues to shrink, which means that some day we may indeed run out of IPv4 addresses.
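The NAT idea can be sketched as a simple translation table. This is a toy illustration only -- the addresses and port numbers are invented, and a real NAT device rewrites the headers of live packets in hardware or in the operating system's network stack:

```python
# Toy sketch of Network Address Translation (NAT): many private hosts
# share one public IP address. A table remembers which private host
# opened each connection, so replies can be routed back correctly.

PUBLIC_IP = "203.0.113.5"          # the network's single public address (invented)

class NatTable:
    def __init__(self):
        self.next_port = 40000     # public ports handed out in sequence
        self.mappings = {}         # (private_ip, private_port) -> public port
        self.reverse = {}          # public port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite an outgoing packet's source to the shared public address."""
        key = (private_ip, private_port)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.reverse[self.next_port] = key
            self.next_port += 1
        return PUBLIC_IP, self.mappings[key]

    def inbound(self, public_port):
        """Route a reply back to the private host that opened the connection."""
        return self.reverse[public_port]

nat = NatTable()
print(nat.outbound("192.168.1.10", 5000))   # ('203.0.113.5', 40000)
print(nat.inbound(40000))                   # ('192.168.1.10', 5000)
```

Notice that the outside world only ever sees the one public address -- which is exactly why NAT conserves IPv4 space, and also why it breaks the end-to-end principle: the middle of the network is rewriting packets.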

As a result of these worries about IPv4, a new version of the Internet protocol has been developed by the Internet Engineering Task Force, the Internet's leading standards body. That new version is called IPv6 (don't ask us what happened to poor IP version 5; we're sure it must have been ugly) and uses a 128-bit address, rather than IPv4's 32-bit address. Doesn't sound like such a big difference, does it? In fact, it creates an almost unimaginably huge expansion of the IP address space. How's that? Because each additional bit doubles the size of the space, and IPv6 adds 96 of them.

Thus:

  - a 32-bit address has 4,294,967,296 possible IP addresses
  - a 128-bit address has 340,282,366,920,938,463,463,374,607,431,768,211,456 possible IP addresses

The latter number is so big that if every human projected to be alive in 2050 (about 10 billion humans) were each given the total amount of existing IPv4 address space today -- a full 32-bit block of addresses -- it wouldn't really make more than a tiny dent in the total amount of 128-bit address space. Since we don't really expect each human to own more than 4 billion Internet-connected devices, the conclusion is that IPv6 should give the human race a workable addressing system for a long long long time to come. (We're not even sure what to call that latter number, though "over 340 kazillion" seems about right.)
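You can check this arithmetic yourself. The size of an n-bit address space is 2 raised to the nth power, and even the generous "every human gets an IPv4's worth of addresses" scenario barely registers:

```python
# The size of an n-bit address space is 2**n: each extra bit doubles it.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

print(ipv4_space)   # 4294967296
print(ipv6_space)   # 340282366920938463463374607431768211456

# Even if the ~10 billion people projected for 2050 each received an
# entire IPv4-sized block, the fraction of IPv6 space consumed is tiny:
used = 10_000_000_000 * ipv4_space
print(used / ipv6_space)   # on the order of 1e-19 -- a vanishingly small dent
```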

It will probably be a while, though, before you find yourself using an IPv6 device. Because of the major worldwide effort required to convert the software on existing Internet-connected computers to IPv6, that protocol's widespread adoption will likely take at least a few years. Many expect that the first widespread consumer devices to use IPv6 will be next-generation mobile telephones, for which the big carriers are starting to build entirely new network infrastructure from the ground up.

For more on IP addressing, see 3Com's really comprehensive guide "Understanding IP Addressing: Everything You Ever Wanted to Know."

[3] RFCs

We have twice now referred you to a document in the RFC series. What's an RFC? The RFC series is a set of technical and organizational documents relating to the Internet and its ancestors. Think of the RFCs as notes and memos published by techies primarily for other techies. The series was started in 1969, in the early days of the ARPANET, a predecessor to today's Internet. Memos in the RFC series discuss a vast range of topics in computer networking, including "protocols, procedures, programs, and concepts, as well as meeting notes, opinions, and sometimes humor." The RFCs are published by the RFC Editor, who maintains a searchable RFC database at rfc-editor.org.

The term "RFC" is a bit anachronistic; it stands for "Request for Comments," which is how the documents were originally viewed. Currently, the RFC series includes various kinds of documents, all of which have been subjected to some form of review and approval within the Internet standards process of the Internet Engineering Task Force. In particular, all official specification documents relating to the Internet protocol suite are published as "standards-track RFCs," which means that many exciting new Internet services are first defined as protocol specifications in the RFC series.

For more details, see RFC 2555 ("30 Years of RFCs"), published in 1999 as a tribute to the late, much-beloved Jon Postel, who served as the RFC Editor for nearly 30 years.

Not all RFCs are serious, by the way. The IETF has a tradition of publishing one or two April Fools' Day RFCs each year; RFC 1149, "A Standard for the Transmission of IP Datagrams on Avian Carriers," is a classic of the genre.

[4] DHCP

DHCP stands for "Dynamic Host Configuration Protocol." For a nice overview of DHCP (with good sections on IP addressing and Network Address Translation), see Webmonkey's "DHCP Primer" by Michael Calore.

[5] Traceroute

Traceroute is a clever tool that allows network administrators to isolate and debug complex network problems. It's also a critical tool for folks interested in how the Internet is connected together. It's available on most Unix systems and on many Windows installations. From a Unix shell, the command is "traceroute n", where n is either a domain name or an IP address. On a Windows system, select "Run" from the Start menu, type "command" (or "cmd") into the "Open:" box, and click OK to open a command-line window. At the prompt, type "tracert n", where n is either a domain name or an IP address.
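Traceroute's output is just text, which makes it easy to process programmatically if you're doing network mapping at any scale. Here is a small sketch that parses hop lines in the common Unix output format; the sample line, hostname, and addresses are invented for illustration, and real output varies somewhat by platform:

```python
import re

# Matches the common Unix traceroute hop format:
#   "<hop>  <hostname> (<ip>)  <rtt> ms ..."
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+([\d.]+)\s+ms")

def parse_hop(line):
    """Return (hop_number, hostname, ip, first_rtt_ms), or None if the
    line isn't a hop line (e.g. the 'traceroute to ...' header)."""
    m = HOP_RE.match(line)
    if not m:
        return None
    hop, host, ip, rtt = m.groups()
    return int(hop), host, ip, float(rtt)

sample = " 3  border1.example.net (198.51.100.1)  21.4 ms  20.9 ms  22.0 ms"
print(parse_hop(sample))   # (3, 'border1.example.net', '198.51.100.1', 21.4)
```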

If you don't have access to traceroute on your local system, you're not out of luck. There are some terrific online traceroute tools that allow you to trace the routes from different Internet backbones to an arbitrary host. These tools are often very helpful even if you have a local instance of traceroute, as traceroute only allows you to trace from your system to another host. If you're trying to do network mapping, it's important to be able to trace paths between two arbitrary machines, and these online tools can help you do this.

Should you be lucky enough to have access to a good Unix shell, you may also find the "whois" and "host" commands useful. Whois is most useful with the -h flag, which lets you direct your query to a particular whois server - for instance, when looking up hosts in Asia, it's useful to use the syntax "whois -h whois.apnic.net domainname.com" to query the Asia Pacific whois server.

Happy mapping.

[6] HTML

HTML is a standardized markup language maintained by the World Wide Web Consortium (W3C), the standards body responsible for the World Wide Web, which is a hugely popular application that runs over the Internet protocol. HTML "marks up" text with attributes and tags that define its structure, appearance, and layout on a web page.
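For instance, a minimal web page might look like this (the title and text are invented for illustration; the tags are standard HTML):

```html
<html>
  <head>
    <title>A Minimal Page</title>
  </head>
  <body>
    <h1>Hello, Web</h1>
    <p>This paragraph is <b>marked up</b> with HTML tags.</p>
  </body>
</html>
```

Each pair of tags, like &lt;p&gt; and &lt;/p&gt;, brackets a piece of the page and tells the browser what that piece is and how to display it.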

Dave Raggett of the W3C has published a straightforward introduction: "Getting Started with HTML".

[7] Estimating Routes

A certain amount of educated guesswork was involved in developing the packet routes you see documented in this lecture. We have the capability of tracing routes from Harvard machines to machines across the 'net, so those routes are quite close to being accurate. For other routes, we used a combination of techniques to guess at the actual routing of packets. As a result, there may be egregious errors in our routing logic that could lead to these paths being inaccurate representations of the path packets actually take. Even so, we're confident that the routes we describe are reasonably solid guesses, and make reasonable scenarios for the lecture.

[8] WiFi

WiFi stands for "wireless fidelity," and is a popular term for the 802.11b standard for high-frequency wireless local area networks. Over the past two years, WiFi has become the leading standard for wireless home and office networks. WiFi uses an Ethernet-style framing protocol and operates in the 2.4 GHz range, offering data speeds of up to 11 megabits per second.

[9] Switch

A switch is a device that filters and forwards packets between different networks, or between different segments of a local area network. A switch examines each data packet as it is received, determines the packet's source and destination devices, and forwards the packet appropriately.
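The "examine, learn, and forward" behavior can be sketched as a learning table. This is a toy model -- a real switch does this in hardware, keyed on MAC addresses, and ages old entries out of the table:

```python
# Toy sketch of a learning switch: it remembers which port each source
# address was last seen on, and forwards a frame out the port where the
# destination was learned -- or floods all ports if it hasn't seen it yet.

class Switch:
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.table = {}            # address -> port number

    def receive(self, frame_src, frame_dst, in_port):
        self.table[frame_src] = in_port          # learn the sender's port
        if frame_dst in self.table:
            return [self.table[frame_dst]]       # forward out one known port
        # Unknown destination: flood everywhere except the ingress port.
        return [p for p in range(self.num_ports) if p != in_port]

sw = Switch(num_ports=4)
print(sw.receive("aa", "bb", in_port=0))   # [1, 2, 3] -- "bb" unknown, flood
print(sw.receive("bb", "aa", in_port=2))   # [0]       -- "aa" was learned
```

After a few frames, most traffic goes out exactly one port, which is what makes switched networks so much more efficient than shared ones.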

[10] Router

A router is a more sophisticated piece of network hardware than a switch. It connects networks together and is capable of filtering or blocking packets at the boundary of the network.

[11] BGP-4

"BGP-4" refers to the Border Gateway Protocol (version 4), the widely-used exterior routing protocol. Adjacent networks use BGP-4 to determine how to route a given outbound packet. BGP-4 allows neighboring networks to inform each other about the set of destinations served by their own networks.
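The core idea -- neighbors announce which destinations they can reach and by what path, and a router prefers, roughly speaking, the shortest path -- can be sketched in a few lines. This is a toy illustration only: the AS numbers and prefixes are invented, and real BGP-4 applies many more tie-breaking rules than path length:

```python
# Toy sketch of BGP-style route selection: neighbors advertise prefixes
# together with an AS path (the chain of networks a packet would cross),
# and we keep the advertisement with the shortest AS path.

routes = {}   # prefix -> (as_path, next_hop)

def advertise(prefix, as_path, next_hop):
    """Install the route if it's new, or better (shorter AS path)."""
    best = routes.get(prefix)
    if best is None or len(as_path) < len(best[0]):
        routes[prefix] = (as_path, next_hop)

# Two neighbors advertise the same prefix with different paths:
advertise("10.1.0.0/16", [64500, 64510], "peer-A")
advertise("10.1.0.0/16", [64501, 64520, 64530], "peer-B")

print(routes["10.1.0.0/16"][1])   # peer-A -- the shorter AS path wins
```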

[12] ISP Tiers

Internet service providers are often categorized by a hierarchy of tiers. Tier-1 ISPs are the largest. To oversimplify a bit, the term "Tier 1" is self-defining, in a sense: Tier-1 ISPs are those ISPs that peer with the other Tier-1 ISPs. Another often-used definition is that Tier-1 ISPs are those that run no-default routing tables on their backbones. Tier-2 ISPs buy connectivity (upstream transit) from one or more Tier-1 ISPs; in a sense, a Tier-2 ISP's network is a subset of its upstream Tier-1 ISPs' networks. Of course, Tier-2 ISPs will seek to peer with each other to minimize the amount of traffic sent to and from the upstream Tier-1 ISPs, whom they must pay for transit to all non-peer routes. Tier-3 ISPs purchase upstream transit from Tier-2 ISPs, and so on. At the lower tiers, this hierarchical classification gets quite murky, however, since a given ISP might buy upstream transit from both a Tier-1 ISP and a Tier-2 ISP, may peer with Tier-2 and Tier-3 ISPs, and occasionally a Tier-1 ISP, and so on. In general, the term is only useful to distinguish between Tier-1 ISPs (who do not need to buy upstream transit, because they peer with other Tier-1 ISPs) and all other ISPs, who must pay for at least some upstream transit to obtain global connectivity.

contact: BOLD@cyber.law.harvard.edu