Introduction to TCP/IP and Networking
What is the Internet
The Internet is a global interconnected network
of computers. Using the Internet, you can look at documents and images, view
videos, or listen to sound files from anywhere in the world using your computer.
You can also use the Internet to publish, so that others can look at your
information in any of a number of standard file formats.
You can also use the Internet to send messages
through e-mail, as long as you know the e-mail address of the recipient. The
Internet can also be used to transfer files between any two people or
computers. The Internet also creates new communities of individuals, belonging
to newsgroups where information is shared between people with similar
interests, even though individuals could be geographically dispersed. Letters
and files can be posted to newsgroups, where others can share them.
The Internet is the world's largest computer network,
the network of networks, scattered all over the world. It was
created nearly 25 years ago as a project for the U.S. Department of Defense.
Its goal was to create a method for widely separated computers to transfer data
efficiently even in the event of a nuclear attack. From a handful of computers
and users in the 1960s, the Internet has grown to thousands of regional
networks that connect millions of users. No single individual, company, or
country owns this global network.
How the Internet works
When you connect computers together, you get a
"network" which allows computers to "talk" to each other.
This communication was originally part of the "operating system" of a
computer. The Internet originally arose as a bunch of UNIX-based (UNIX is an
operating system) computers linked together, so a lot of the terms on the
Internet have their origins in the UNIX world. This means that many cryptic
terms and acronyms (words made up from the initial letters of longer phrases)
are in use. The standard for communicating on the Internet is called "TCP/IP"
(pronounced "T-C-P-I-P", without the slash), which is short for Transmission
Control Protocol/Internet Protocol.
The key concept in TCP/IP is that every computer
either knows or can find out how to reach any other computer on the network, and
can send data by the quickest available route, even if part of the usual route
is down. A route might be down because a computer has been shut down or because
a phone line is disconnected or under repair. This is done by maintaining
indexes of the IP addresses in a domain on multiple servers spread strategically
around the country, so that messages can be routed quickly along the fastest path.
TCP/IP transfers information in small chunks
called "packets." Each packet includes the following information: the
computer (or last few computers) the data came from, the computer to which it
is headed, the data itself, and error-checking information (to ensure that the
individual packet was accurately and completely sent and received). The
elegance of TCP/IP is that a large file can be broken into multiple packets,
each of which may be sent over a different path in the network. These packets
are then reassembled at the other end into one file and saved on the
destination computer.
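The splitting and reassembly described above can be sketched in a few lines of Java. This is only an illustration, not how a real TCP/IP stack is implemented: the class name PacketSketch, the Packet type, and the chunk size are hypothetical, and the sketch leaves out the source address, destination address, and error-checking fields mentioned above. Shuffling the list stands in for packets arriving by different routes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PacketSketch {

    /** A toy packet: a sequence number plus one slice of the original data. */
    public static final class Packet {
        final int sequence;
        final byte[] data;

        Packet(int sequence, byte[] data) {
            this.sequence = sequence;
            this.data = data;
        }
    }

    /** Breaks a message into packets that each carry at most chunkSize bytes. */
    public static List<Packet> split(byte[] message, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Packet> packets = new ArrayList<>();
        int sequence = 0;
        for (int start = 0; start < message.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, message.length);
            packets.add(new Packet(sequence++, Arrays.copyOfRange(message, start, end)));
        }
        return packets;
    }

    /** Puts packets back into sequence order and joins their data. */
    public static byte[] reassemble(List<Packet> received) {
        List<Packet> ordered = new ArrayList<>(received);
        ordered.sort(Comparator.comparingInt(p -> p.sequence));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Packet packet : ordered) {
            out.write(packet.data, 0, packet.data.length);
        }
        return out.toByteArray();
    }

    /** Splits text, shuffles the packets to mimic different routes, then reassembles. */
    public static String roundTrip(String text, int chunkSize) {
        List<Packet> packets = split(text.getBytes(StandardCharsets.UTF_8), chunkSize);
        Collections.shuffle(packets);
        return new String(reassemble(packets), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("A large file broken into packets", 5));
    }
}
```

Whatever order the shuffled packets arrive in, sorting by sequence number restores the original text, which is the essence of reassembly at the destination.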
To access the Internet you need an Internet
Service Provider or "ISP". The ISP is connected to the Internet
"backbone" which is the permanent cabling of the Internet. This
backbone may consist of copper wire, fiber optic cable, microwave, and even
satellite connections between any two points. To you it doesn't matter – the
Internet's TCP/IP works this out for you. You can connect to the Internet in
one of two basic ways: by dialing into an Internet Service Provider's computer,
or through a direct connection to an Internet Service Provider. The difference
is mainly in the speed and cost.
The previous figure gives a pictorial
representation of how the Internet works. Suppose you are in India and want to
access a web site that is hosted on a server somewhere else in the world (say,
the USA). You connect (using a computer and modem, that is, dial-up access) to
your local Internet access provider, then type in the address of the site. Your
request is sent from the local ISP's server through the different computers in
the network (the Internet backbone) until it reaches the server where the site
is hosted. It is like a letter traveling through the various postal networks
and reaching the addressee's place. The information stored on the web site is
then sent back to your computer so that you can view it.
Networking Basics
The Internet
protocol suite is the set of protocols that
implement the protocol stack on which the Internet runs. It is
sometimes called the TCP/IP protocol suite, after two of the
many protocols that make up the suite: the Transmission Control Protocol (TCP) and the Internet Protocol (IP),
which were the first two defined. The authoritative reference on this subject
is RFC 1122.
The Internet
protocol suite can be described by analogy with the OSI model,
which describes the layers of a protocol
stack, not all of which correspond well with Internet practice. In a
protocol stack, each layer solves a set of problems involving the transmission
of data, and provides a well-defined service to the higher layers. Higher
layers are logically closer to the user and deal with more abstract data,
relying on lower layers to translate data into forms that can eventually be
physically manipulated.
The Internet
model was produced as the solution to a practical engineering problem. The OSI
model, on the other hand, was a more theoretical approach, and was also
produced at an earlier stage in the evolution of networks.
Therefore, the OSI model is easier to understand, but the TCP/IP model is the
one in actual use. It is helpful to have an understanding of the OSI model
before learning TCP/IP, as the same principles apply, but are easier to
understand in the OSI model. The following diagram attempts to show where
various TCP/IP and other protocols would reside in the original OSI model:
7 | Application
6 | Presentation
5 | Session
4 | Transport
3 | Network
2 | Data Link
1 | Physical
Commonly,
the top three layers of the OSI model (Application, Presentation and Session)
are considered as a single Application Layer in the TCP/IP suite. Because the
TCP/IP suite has no unified session layer on which higher layers are built,
these functions are typically carried out (or ignored) by individual
applications. A simplified TCP/IP interpretation of the stack is shown below:
As the diagram
above shows, computers running on the Internet communicate with each other using either
the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP). When
you write Java programs that communicate over the network, you are programming
at the application layer. Typically, you don't need to concern yourself with
the TCP and UDP layers. Instead, you can use the classes in the java.net
package. These classes provide system-independent
network communication. However, to decide which Java classes your programs
should use, you do need to understand how TCP and UDP differ.
TCP
When two applications want
to communicate with each other reliably, they establish a connection and send
data back and forth over that connection. This is analogous to making a
telephone call. If you want to speak to Aunt Beatrice in Kentucky, a connection is established when
you dial her phone number and she answers. You send data back and forth over
the connection by speaking to one another over the phone lines. Like the phone
company, TCP guarantees that data sent from one end of the connection actually
gets to the other end and in the same order it was sent. Otherwise, an error is
reported.
TCP provides a
point-to-point channel for applications that require reliable communications.
The Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and
Telnet are all examples of applications that require a reliable communication
channel. The order in which the data is sent and received over the network is
critical to the success of these applications. When HTTP is used to read from a
URL, the data must be received in the order in which it was sent. Otherwise, you
end up with a jumbled HTML file, a corrupt zip file, or some other invalid
information.
Definition: TCP (Transmission Control Protocol) is a
connection-based protocol that provides a reliable flow of data between two
computers.
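As a minimal sketch of this connection-based model, the following Java program opens a TCP connection over the loopback interface and sends a short message through it. The class name TcpLoopbackSketch and the method sendOverTcp are hypothetical names chosen for illustration. To stay short, the sketch runs both ends in one thread, which works only for small messages; a real server would normally accept and serve connections in separate threads or processes. It also assumes Java 9 or later for InputStream.readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class TcpLoopbackSketch {

    /** Sends text over a loopback TCP connection and returns what the other end received. */
    public static String sendOverTcp(String text) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system to pick any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket server = listener.accept()) {

            // "Dial" side: write the bytes, then signal that no more data will follow.
            OutputStream out = client.getOutputStream();
            out.write(text.getBytes(StandardCharsets.UTF_8));
            client.shutdownOutput();

            // "Answer" side: read until end of stream; TCP delivers the bytes in order.
            InputStream in = server.getInputStream();
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendOverTcp("Hello, Aunt Beatrice"));
    }
}
```

The connection is established when the Socket constructor returns, much like the moment the phone is answered; everything written afterwards arrives at the other end in the order it was sent, or an IOException is raised.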
UDP
The UDP protocol provides
for communication that is not guaranteed between two applications on the
network. UDP is not connection-based like TCP. Rather, it sends independent
packets of data, called datagrams, from one application to another.
Sending datagrams is much like sending a letter through the postal service: The
order of delivery is not important and is not guaranteed, and each message is
independent of any other.
Definition: UDP (User Datagram Protocol)
is a protocol that sends independent packets of data, called datagrams, from
one computer to another with no guarantees about arrival. UDP is not
connection-based like TCP.
For many
applications, the guarantee of reliability is critical to the success of the
transfer of information from one end of the connection to the other. However,
other forms of communication don't require such strict standards. In fact, they
may be slowed down by the extra overhead or the reliable connection may
invalidate the service altogether.
Consider, for
example, a clock server that sends the current time to its client when
requested to do so. If the client misses a packet, it doesn't really make sense
to resend it because the time will be incorrect when the client receives it on
the second try. If the client makes two requests and receives packets from the
server out of order, it doesn't really matter because the client can figure out
that the packets are out of order and make another request. The reliability of
TCP is unnecessary in this instance because it causes performance degradation
and may hinder the usefulness of the service.
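The clock-server idea can be sketched with the DatagramSocket and DatagramPacket classes from java.net. This is an assumption-laden illustration rather than a real time service: the class name UdpClockSketch, the one-byte request, and the eight-byte reply format are all hypothetical, and both "server" and "client" run in one thread on the loopback interface. The two-second timeout is there because UDP gives no guarantee that a reply ever arrives.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;

public class UdpClockSketch {

    /** Asks a loopback "clock server" for the time in milliseconds since the epoch. */
    public static long requestTime() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the operating system choose free ports for both sockets.
        try (DatagramSocket server = new DatagramSocket(0, loopback);
             DatagramSocket client = new DatagramSocket(0, loopback)) {
            server.setSoTimeout(2000);
            client.setSoTimeout(2000); // give up rather than wait forever for a lost datagram

            // Client: send a one-byte request; no connection is set up first.
            byte[] request = {1};
            client.send(new DatagramPacket(request, request.length, loopback, server.getLocalPort()));

            // Server: receive the request and learn where the reply should go.
            DatagramPacket incoming = new DatagramPacket(new byte[1], 1);
            server.receive(incoming);
            byte[] now = ByteBuffer.allocate(Long.BYTES).putLong(System.currentTimeMillis()).array();
            server.send(new DatagramPacket(now, now.length, incoming.getAddress(), incoming.getPort()));

            // Client: read the reply datagram.
            DatagramPacket reply = new DatagramPacket(new byte[Long.BYTES], Long.BYTES);
            client.receive(reply);
            return ByteBuffer.wrap(reply.getData(), 0, reply.getLength()).getLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Server time: " + requestTime());
    }
}
```

If a datagram is lost, the receive call times out and the caller can simply ask again, which suits a clock better than having an old reading retransmitted.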
Another example of
a service that doesn't need the guarantee of a reliable channel is the ping
command. The purpose of the ping command is to test the communication between
two hosts over the network. (Standard ping implementations send ICMP echo
requests rather than UDP datagrams, but the reasoning is the same.) In fact,
ping needs to know about dropped or out-of-order packets to determine how good
or bad the connection is. A reliable channel would invalidate this service
altogether.
Note: Many firewalls and routers have been configured
not to allow UDP packets. If you're having trouble connecting to a service
outside your firewall, or if clients are having trouble connecting to your
service, ask your system administrator if UDP is permitted.
Understanding Ports
Generally speaking, a
computer has a single physical connection to the network. All data destined for
a particular computer arrives through that connection. However, the data may be
intended for different applications running on the computer. So how does the
computer know to which application to forward the data? Through the use of ports.
Data transmitted
over the Internet is accompanied by addressing information that identifies the
computer and the port for which it is destined. The computer is identified by
its 32-bit IP address, which IP uses to deliver data to the right computer on
the network. Ports are identified by a 16-bit number, which TCP and UDP use to
deliver the data to the right application.
In
connection-based communication such as TCP, a server application binds a socket
to a specific port number. This has the effect of registering the server with
the system to receive all data destined for that port. A client can then
rendezvous with the server at the server's port, as illustrated here:
Definition: The TCP and UDP protocols use ports to map
incoming data to a particular process running on a computer.
In datagram-based
communication such as UDP, the datagram packet contains the port number of its
destination and UDP routes the packet to the appropriate application, as
illustrated in this figure:
Port numbers range from 0 to
65,535 because ports are represented by 16-bit numbers. The port numbers
ranging from 0 - 1023 are restricted; they are reserved for use by well-known
services such as HTTP and FTP and other system services. These ports are called
well-known ports. Your applications should not attempt to bind to
them.
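The port ranges above can be expressed as a short Java sketch. The class name PortSketch and its method names are hypothetical; the only library behavior relied on is that a ServerSocket created with port 0 is bound to a free port chosen by the operating system, which is one common way for an application to avoid the well-known range.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class PortSketch {

    /** The largest value a 16-bit port number can hold (65,535). */
    public static final int MAX_PORT = 0xFFFF;

    /** True if the number fits in the 16-bit port range. */
    public static boolean isValidPort(int port) {
        return port >= 0 && port <= MAX_PORT;
    }

    /** True if the port lies in the reserved well-known range 0 to 1023. */
    public static boolean isWellKnownPort(int port) {
        return port >= 0 && port <= 1023;
    }

    /** Binds to port 0 so that the operating system picks a free port, and returns it. */
    public static int pickFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Port 80 well-known? " + isWellKnownPort(80));
        System.out.println("Port 8080 well-known? " + isWellKnownPort(8080));
        System.out.println("A free port right now: " + pickFreePort());
    }
}
```

Note that the port returned by pickFreePort is released when the method returns, so another program could claim it before you bind to it again; the method is meant only to show how port 0 behaves.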
Networking Classes in the JDK
Through the classes in java.net, Java programs can use TCP or UDP to
communicate over the Internet. The URL, URLConnection, Socket, and ServerSocket
classes all use TCP to communicate over the network. The DatagramPacket,
DatagramSocket, and MulticastSocket classes are for use with UDP.
The Internet Protocol Suite
The java.net package provides a set of classes that support
network programming using the communication protocols employed by the Internet.
These protocols are known as the Internet protocol suite and include the
Internet Protocol (IP), the Transmission Control Protocol (TCP), and
the User Datagram Protocol (UDP) as well as other, less-prominent
supporting protocols. Although this section cannot provide a full description
of the Internet protocols, it gives you the basic information that you need to
get started with Java network programming. In order to take full advantage of
this chapter, you need an Internet connection.
What Is the Internet and How Does It Work?
Asking the
question "What is the Internet?" may bring about a heated discussion in some
circles. In this book, the Internet is defined as the collection of all
computers that are able to communicate, using the Internet protocol suite, with
the computers and networks registered with the Internet Network Information
Center (InterNIC). This definition includes all computers to which you can
directly (or indirectly through a firewall) send Internet Protocol packets.
Computers on the
Internet communicate by exchanging packets of data, known as Internet Protocol,
or IP, packets. IP is the network protocol used to send information from one
computer to another over the Internet. All computers on the Internet (by our
definition in this book) communicate using IP. IP moves information contained
in IP packets. The IP packets are routed via special routing algorithms from a
source computer that sends the packets to a destination computer that receives
them. The routing algorithms figure out the best way to send the packets from
source to destination.
In order for IP to
send packets from a source computer to a destination computer, it must have
some way of identifying these computers. All computers on the Internet are
identified using one or more IP addresses. A computer may have more than one IP
address if it has more than one interface to computers that are connected to
the Internet.
IP addresses
are 32-bit numbers (in IPv4, the version of IP described here). They may be
written in decimal, hexadecimal, or other formats, but the most common format
is dotted decimal notation. This format breaks the 32-bit address up into four
bytes and writes each byte as an unsigned decimal integer, with the four values
separated by dots. For example, one of my IP addresses is 0xCCD499C1. Because
0xCC = 204, 0xD4 = 212, 0x99 = 153, and 0xC1 = 193, my address in dotted
decimal form is 204.212.153.193.
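The conversion above is plain bit arithmetic, as the following sketch shows. The class name DottedDecimalSketch and the method toDottedDecimal are hypothetical; the code simply shifts each byte of the 32-bit value into the low position and masks it with 0xFF so that it is read as an unsigned number.

```java
public class DottedDecimalSketch {

    /** Formats a 32-bit address as four unsigned bytes separated by dots. */
    public static String toDottedDecimal(int address) {
        return ((address >>> 24) & 0xFF) + "."
                + ((address >>> 16) & 0xFF) + "."
                + ((address >>> 8) & 0xFF) + "."
                + (address & 0xFF);
    }

    public static void main(String[] args) {
        // The example address from the text.
        System.out.println(toDottedDecimal(0xCCD499C1)); // prints 204.212.153.193
    }
}
```

The unsigned shift (>>>) and the mask matter because Java's int type is signed, and 0xCCD499C1 is a negative number when read as an int.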
IP addresses are
not easy to remember, even using dotted decimal notation. The Internet has
adopted a mechanism, referred to as the Domain
Name System (DNS), whereby computer names can be associated with IP
addresses. These computer names are referred to as domain names. The DNS
has several rules that determine how domain names are constructed and how they
relate to one another. For the purposes of this chapter, it is sufficient to
know that domain names are computer names and that they are mapped to IP
addresses.
The mapping of
domain names to IP addresses is maintained by a system of domain name
servers. These servers are able to look up the IP address corresponding to
a domain name. They also provide the capability to look up the domain name
associated with a particular IP address, if one exists.
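In Java, both kinds of lookup are available through the InetAddress class in java.net. The sketch below is a minimal illustration; the class name LookupSketch and its two method names are hypothetical, and what the calls return depends on how name resolution is configured on the machine where the program runs, so no particular output is promised.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LookupSketch {

    /** Looks up the IP address for a domain name (or parses a literal address). */
    public static String addressOf(String hostName) {
        try {
            return InetAddress.getByName(hostName).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve " + hostName, e);
        }
    }

    /** Looks up the domain name for an IP address; falls back to the address text if none exists. */
    public static String nameOf(String ipAddress) {
        try {
            return InetAddress.getByName(ipAddress).getCanonicalHostName();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a usable address: " + ipAddress, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("localhost -> " + addressOf("localhost"));
        System.out.println("127.0.0.1 -> " + nameOf("127.0.0.1"));
    }
}
```

When getByName is given a literal address such as 127.0.0.1, it only checks the format and does not contact a name server; the reverse lookup happens later, when the host name is requested.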
As I mentioned, IP
enables communication between computers on the Internet by routing data from a
source computer to a destination computer. However, computer-to-computer
communication only solves half of the network communication problem. In order
for an application program, such as a mail program, to communicate with another
application, such as a mail server, there needs to be a way to send data to
specific programs within a computer.
Ports are
used to enable communication between programs. A port is an address
within a computer. Port addresses are 16-bit addresses that are usually
associated with a particular application protocol. An application server, such
as a Web server or an FTP server, listens on a particular port for service
requests, performs whatever service is requested of it, and returns information
to the port used by the application program requesting the service.
Internet services
Servers on the
Internet provide various services that are accessible to Internet users.
Examples of such services are WWW, FTP, IRC, and e-mail.
Popular Internet
application protocols are associated with well-known
ports and well-known Internet services. The server programs implementing
these protocols listen on these ports for service requests. The well-known
ports for some common Internet application protocols are:
Port | Protocol | Service description
21 | File Transfer Protocol (FTP) | Transfers files
22 | Secure Shell Protocol (SSH) | Allows secure remote administration through a standard shell (console)
25 | Simple Mail Transfer Protocol (SMTP) | Sends email
80 | HyperText Transfer Protocol (HTTP) | Accesses the WWW (World Wide Web)
110 | Post Office Protocol (POP3) | Receives email
The well-known
ports are used to standardize the location of Internet services.
What is WWW
A technical definition of the World Wide Web is: all the resources and users on
the Internet that are using the Hypertext Transfer Protocol (HTTP).
A broader definition is:
"The World Wide Web is the universe of network-accessible information, an
embodiment of human knowledge."
Actually, the World Wide Web is a distributed information system of Internet
servers that support specially formatted documents. The documents are formatted
in a markup language called HTML (HyperText Markup Language) that supports
links to other documents, as well as graphics, audio, and video files. This
means you can jump from one document to another simply by clicking on hot
spots. Not all Internet servers are part of the World Wide Web. The World Wide
Web is not synonymous with the Internet!
Connection-Oriented Versus Connectionless Communication
Transport
protocols are used to deliver information from one port to another and thereby
enable communication between application programs. They use either a
connection-oriented or connectionless method of communication. TCP is a
connection-oriented protocol and UDP is a connectionless transport protocol.
The TCP
connection-oriented protocol establishes a communication link between a source
port/IP address and a destination port/IP address. The ports are bound together
via this link until the connection is terminated and the link is broken. An
example of a connection-oriented protocol is a telephone conversation. A
telephone connection is established, communication takes place, and then the
connection is terminated.
The reliability of
the communication between the source and destination programs is ensured
through error-detection and error-correction mechanisms that are implemented
within TCP. TCP implements the connection as a stream of bytes from source to
destination. This feature allows the use of the stream I/O classes provided by java.io.
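Because a TCP connection looks like a stream of bytes, the text-oriented classes in java.io can be layered directly on top of a socket. The sketch below shows one way this might look; the class name StreamSketch, the method shout, and the toy "reply in upper case" behavior are all hypothetical. As a simplification, both ends of the connection live in one thread on the loopback interface, which is workable only because each side sends a single short line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class StreamSketch {

    /** Wraps a socket's input stream so that it can be read line by line. */
    private static BufferedReader reader(Socket socket) throws IOException {
        return new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    /** Wraps a socket's output stream in a writer that flushes after each println. */
    private static PrintWriter writer(Socket socket) throws IOException {
        return new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Sends one line (without line breaks) to a loopback server and returns its upper-case reply. */
    public static String shout(String line) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket server = listener.accept()) {

            writer(client).println(line);                  // client sends a request line
            String request = reader(server).readLine();    // server reads it from the stream
            writer(server).println(request.toUpperCase(Locale.ROOT)); // server replies
            return reader(client).readLine();              // client reads the reply
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shout("hello over a byte stream"));
    }
}
```

The same wrapping pattern applies to any stream class in java.io, since the socket only ever hands out an InputStream and an OutputStream.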
The UDP
connectionless protocol differs from the TCP connection-oriented protocol in
that it does not establish a link between the two ends before data is
exchanged. An everyday analogy for a connectionless protocol is postal mail. To
mail something, you just write down a destination address (and an optional
return address) on the envelope of the item you're sending and drop it in a
mailbox. When using UDP, an application program writes the destination port and
IP address on a datagram and then sends the datagram to its destination. UDP is
less reliable than TCP because there are no delivery-assurance, ordering, or
retransmission mechanisms built into the protocol; its only safeguard is a
checksum that lets the receiver discard corrupted datagrams.
Application
protocols such as FTP, SMTP, and HTTP use TCP to provide reliable, stream-based
communication between client and server programs. Other protocols, such as the
Time Protocol, use UDP because speed of delivery is more important than
end-to-end reliability.
The Client/Server Computing Model and the Internet
The Internet
provides a variety of services that contribute to its appeal. These services include
e-mail, newsgroups, file transfer, remote login, and the Web. Internet services
are organized according to a client/server architecture. Client programs, such
as Web browsers and file transfer programs, create connections to servers, such
as Web and FTP servers. The clients make requests of the server, and the server
responds to the requests by providing the service requested by the client.
The Web provides a
good example of client/server computing. Web browsers are the clients and Web
servers are the servers. Browsers request HTML files from Web servers on your
behalf by establishing a connection with a Web server and submitting file
requests to the server. The server receives the file requests, retrieves the
files, and sends them to the browser over the established connection. The
browser receives the files and displays them in its window.
Sockets and Client/Server Communication
Clients and
servers establish connections and communicate via sockets. Connections
are communication links that are created over the Internet using TCP. Some
client/server applications are also built around the connectionless UDP. These
applications also use sockets to communicate.
Sockets are the
endpoints of Internet communication. Clients create client sockets and connect
them to server sockets. Sockets are associated with a host address and a port
address. The host address is the IP address of the host where the client or
server program is located. The port address is the communication port used by
the client or server program. Server programs use the well-known port number
associated with their application protocol.
A client
communicates with a server by establishing a connection to the socket of the
server. The client and server then exchange data over the connection.
Connection-oriented communication is more reliable than connectionless
communication because the underlying TCP provides message-acknowledgment,
error-detection, and error-recovery services.
When a
connectionless protocol is used, the client and server communicate by sending
datagrams to each other's sockets. UDP is the transport protocol used for
connectionless communication; unlike TCP, it does not guarantee reliable delivery.