Computer Network Programming

August 1, 2014
© USQ, August 1, 2014
Table of Contents
I Elementary Sockets
1 Introduction to TCP/UDP
2 Sockets Introduction
3 Elementary TCP Sockets
4 TCP Client-Server Example
5 Elementary UDP Sockets
II Advanced Sockets
6 Daemon Processes and inetd Superserver
7 Advanced UDP Sockets
8 SCTP Sockets and Programming
Part I
Elementary Sockets
This strand covers the basic sockets functions for TCP and UDP network programming. The first module serves as the introduction to the entire course. It covers
various important concepts of networking with TCP/UDP from the application developers’ view. The remaining modules cover the elementary functions on sockets
used to write basic network applications.
Module 1
Introduction to TCP/UDP
This module introduces the TCP and UDP transport layer protocols and serves as the preliminary for the entire course. It does two things:
• reviews some of the basic concepts of computer networking; and
• introduces the basic concepts of two different transport layer protocols: TCP and UDP.
It also contains information about the network programs used in the textbook and the network environment in which these programs were run.
Module contents
1.1 Layered Structure of Network Protocols
1.2 TCP and UDP
1.3 TCP Connection Establishment and Termination
1.4 Ports, Sockets, and Concurrent Servers
1.5 Summary
The text for this module is Chapters 1 and 2 of Stevens’ UNIX Network Programming (3rd Edition), Vol. 1, referred to as UNP3ev1.
Main Aims:
• to understand the layered structure of network protocols;
• to understand the basic concepts of two different transport layer protocols: TCP and UDP;
• to understand how a TCP connection is established and terminated, and the TCP state transition diagram; and
• to understand the basic concepts of port, socket pair and concurrent server associated with TCP.
1.1 Layered Structure of Network Protocols
Most network applications use the client-server programming model, in which an
application is basically divided into two parts: client and server. The client and the
server use the network support from the operating systems of the hosts to exchange
information between them. This course is about how to develop such applications.
Obviously, before we can talk about the software development methodology for such network applications, we need to know the kinds of support and functions provided by the computer networks between the hosts of the client and the server.
Note: Network protocols are organised into many layers.
The network support from operating systems is provided through many layers of
network software in the operating systems in the form of communication protocols.
The most popular layered structure of network software is the TCP/IP protocol suite, which consists of four layers:
• Application Layer: This is the top layer, where client-server network applications are developed using application-specific protocols.
• Transport Layer: This is the layer where the two transport protocols used by the applications, TCP and UDP, are implemented. The interface between this layer and the application layer is the one used by network programmers to develop network applications (also known as the Application Programming Interface, or API) and is therefore the subject of the first part of this course. The services provided by this layer are the communication channels between the client and the server processes.
• Network Layer: This layer implements the IP protocol to transmit and route IP packets. The transport layer uses the IP protocol to implement the transport layer protocols (TCP and UDP). The services provided by this layer are the communication channels between the hosts involved.
• Datalink Layer: This is the layer where local network transmission and routing, such as the Ethernet protocol or PPP (Point-to-Point Protocol), are implemented. It is concerned mainly with data transmission between machines that are physically connected.
The design and implementation of the whole stack of layered network protocols is the subject of the computer science course normally known as Computer Networks. We will not discuss them in detail in this course.
Chapter 1 of the text gives a brief review of this layered architecture of network protocols. It also provides a simple example of a network application using the TCP API.
Reading 1.A: Read §1.1-5 [UNP3ev1, 1-15] and §1.7-8
[UNP3ev1, 18-20] for the review of layered network protocols and
the simple network daytime application.
Chapter 1 of the text also gives an introduction to the software of network programs
throughout the book and the test environment in which these programs run.
Reading 1.B: Read §1.6 [UNP3ev1, 16-7] and §1.9-11 [UNP3ev1, 20-7] for the road map of the network programs in the text and the test environment.
1.2 TCP and UDP
TCP and UDP are two different transport layer protocols for data communication between processes.
Note: TCP and UDP are two popular data transmission protocols, while SCTP is a new and more efficient data transmission protocol.
UDP is a connectionless and unreliable datagram protocol, while TCP is a connection-oriented, reliable byte-stream protocol. As a consequence, the implementation of TCP is far more complicated than that of UDP. It includes:
• flow control for byte-stream transmission after the connection is established; and
• reliability through time-out and retransmission.
However, both are end-to-end protocols which transmit data from one process on a host to the peer process on another host. Both TCP and UDP use a port, an abstract locator, to locate the end processes.
The UDP of the receiving host is simply a demultiplexer which sorts incoming UDP packets according to their port numbers. Each UDP port maintains a queue to store the packets destined for the port. The receiving process takes the packets from this queue through a socket bound to the corresponding port.
Reading 1.C: Read §2.3 [UNP3ev1, 32] for a brief description
of UDP.
TCP is implemented with buffers on both the sender and receiver sides. The sending process first writes data to the sending buffer, and the TCP then transmits the data in segments to the receiver TCP, which puts them in the receiving buffer. The transmission of segments is done by the IP layer and is unreliable. The segments may be out of order when they arrive.
The heart of the TCP protocol is the sliding window flow control, which ensures a reliable and ordered byte-stream between the two ends. Each byte in the stream has a sequence number. A segment contains the sequence number of the first byte in the segment. The receiving TCP acknowledges the segment by sending an ACK packet with the sequence number for the data received. It also sends another number, called AdvertisedWindow, to tell the sender how much further data it can receive at a given time. Figure 1.1 shows the sliding window mechanism for implementing a reliable and ordered byte-stream in TCP.
[Figure 1.1: Sliding Window for TCP Flow Control. The sending application writes bytes into the buffer of the sending TCP, which is marked by LastByteAcked, LastByteSent and LastByteWritten; the receiving application reads bytes from the buffer of the receiving TCP, which is marked by LastByteRead, NextByteExpected and LastByteRcvd.]
The TCP of the sender maintains three pointers to the sending buffer:
• LastByteAcked points to the last byte acknowledged by the receiver.
• LastByteSent points to the last byte sent to the receiver. The bytes between LastByteAcked and LastByteSent are those which have been sent but not yet acknowledged.
• LastByteWritten points to the last byte written by the application into the sending buffer.
We should have
LastByteAcked ≤ LastByteSent and LastByteSent ≤ LastByteWritten.
At the receiver site, the TCP maintains another three pointers:
• LastByteRead points to the last byte read by the application from the buffer.
• LastByteRcvd points to the last byte received by the receiver.
• NextByteExpected points to the position of the next byte expected from the sender. It is the largest position before which all the bytes have been received (i.e. there are no gaps between LastByteRead and NextByteExpected). We should have
LastByteRead ≤ NextByteExpected – 1
Note that NextByteExpected – 1 is not guaranteed to be equal to LastByteRcvd, because the segments may arrive at the receiver out of order. In that case NextByteExpected points to the first byte of the first gap. Therefore we should have
NextByteExpected ≤ LastByteRcvd + 1.
The flow control between the sender and receiver works as follows. Assuming the size of the receiver buffer is MaxRcvBuffer, the size of the window available to receive further data is
AdvertisedWindow = MaxRcvBuffer – (LastByteRcvd – LastByteRead)
because all the bytes from LastByteRead up to LastByteRcvd need to be preserved until they are read by the application. This is the number sent back to the sender as the so-called AdvertisedWindow.
At the sender site, the bytes between LastByteAcked and LastByteSent are those that have been sent but not acknowledged yet, so (LastByteSent – LastByteAcked) is the number of bytes sent to the receiver for which no acknowledgement has been received. This number must not exceed the AdvertisedWindow received from the receiver:
LastByteSent – LastByteAcked ≤ AdvertisedWindow
otherwise some of the acknowledged bytes in the receiver buffer (between LastByteRead and NextByteExpected) may be overwritten before they are read by the receiving application. At any time, the number of further bytes the sender can send to the receiver is called the EffectiveWindow:
EffectiveWindow = AdvertisedWindow – (LastByteSent – LastByteAcked)
On the other hand, the TCP of the sender should also prevent the application from writing too much data to the sending buffer. Assuming the size of the sending buffer is MaxSendBuffer, the TCP must maintain
LastByteWritten – LastByteAcked ≤ MaxSendBuffer
otherwise some of the unacknowledged bytes in the buffer will be overwritten and lost. In other words, if the sending application tries to write x bytes, but
(LastByteWritten – LastByteAcked) + x > MaxSendBuffer
it should be blocked.
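The window arithmetic of this section can be sketched in a few lines of C. This is only an illustration: the structure and function names are hypothetical, positions are treated as plain byte offsets, and the wrap-around of real 32-bit TCP sequence numbers is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the receiving and sending TCP.
   Field names mirror the pointers described in the text. */
struct rcv_state {
    uint64_t last_byte_read, next_byte_expected, last_byte_rcvd;
    uint64_t max_rcv_buffer;
};
struct snd_state {
    uint64_t last_byte_acked, last_byte_sent, last_byte_written;
    uint64_t max_send_buffer;
};

/* AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead) */
uint64_t advertised_window(const struct rcv_state *r)
{
    assert(r->last_byte_read <= r->last_byte_rcvd);
    assert(r->last_byte_rcvd - r->last_byte_read <= r->max_rcv_buffer);
    return r->max_rcv_buffer - (r->last_byte_rcvd - r->last_byte_read);
}

/* EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked),
   clamped at zero if the receiver has shrunk its window. */
uint64_t effective_window(const struct snd_state *s, uint64_t advertised)
{
    uint64_t in_flight = s->last_byte_sent - s->last_byte_acked;
    return in_flight >= advertised ? 0 : advertised - in_flight;
}

/* True if writing x more bytes would overflow the sending buffer. */
bool write_would_block(const struct snd_state *s, uint64_t x)
{
    return (s->last_byte_written - s->last_byte_acked) + x > s->max_send_buffer;
}
```

For example, with a 4096-byte receive buffer, LastByteRead = 100 and LastByteRcvd = 200, the advertised window is 3996 bytes; a sender with 500 bytes in flight may then send 3496 more.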
Reading 1.D: Read §2.4 [UNP3ev1, 32-33] and §2.9 [UNP3ev1,
46-50] for a brief description of TCP and other aspects of the sending buffer of TCP.
1.3 TCP Connection Establishment and Termination

A TCP connection has to be established before data can be transmitted between the end processes. TCP connection establishment and termination are network operations in the operating system kernel to set up the relevant sockets, sending and receiving buffers, etc. They are accomplished by exchanging control packets such as SYN and ACK.
Note: A TCP connection is a sort of virtual link. Establishing a TCP connection is a process of exchanging a lot of information between the end processes, such as the IP addresses and the sizes of buffers.
Section 2.6 of the textbook provides the details of TCP connection establishment and termination. These processes are complicated and best described by state transition diagrams (STDs). A state transition diagram is one way to define a finite state machine, the most common abstract model for describing processes with internal states. We will give a formal model of the finite state machine in Module ??. Almost all network protocols and applications can be defined using the finite state machine model. In short, the state transitions of a finite state machine are triggered by external events. In the context of network protocols and applications, an event is mostly the arrival of a message or request from the network, or a system call from the application. Events trigger state transitions, during which some actions may be taken. An action can be sending a message or request over the network, or sending a signal to the application.
The state transition diagrams for the TCP connection establishment as well as termination are shown in Figure 2.4 of the textbook [UNP3ev1, 41].
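The finite state machine idea can be made concrete with a small transition table. The sketch below is an assumption-laden illustration, not the textbook's code: it covers only the path a client takes through an active open and an active close, and all type, constant and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A subset of the TCP states and events (client side only). */
enum tcp_state { ST_CLOSED, ST_SYN_SENT, ST_ESTABLISHED,
                 ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_TIME_WAIT };
enum tcp_event { EV_APP_ACTIVE_OPEN, EV_RCV_SYN_ACK, EV_APP_CLOSE,
                 EV_RCV_ACK, EV_RCV_FIN, EV_TIMEOUT_2MSL };

struct transition {
    enum tcp_state from;
    enum tcp_event on;
    enum tcp_state to;
    const char *action;   /* what is done during the transition */
};

static const struct transition table[] = {
    { ST_CLOSED,      EV_APP_ACTIVE_OPEN, ST_SYN_SENT,    "send SYN" },
    { ST_SYN_SENT,    EV_RCV_SYN_ACK,     ST_ESTABLISHED, "send ACK" },
    { ST_ESTABLISHED, EV_APP_CLOSE,       ST_FIN_WAIT_1,  "send FIN" },
    { ST_FIN_WAIT_1,  EV_RCV_ACK,         ST_FIN_WAIT_2,  "none"     },
    { ST_FIN_WAIT_2,  EV_RCV_FIN,         ST_TIME_WAIT,   "send ACK" },
    { ST_TIME_WAIT,   EV_TIMEOUT_2MSL,    ST_CLOSED,      "none"     },
};

/* Apply one event. Returns the action taken, or NULL (state unchanged)
   if this sketch has no transition for the event in the current state. */
const char *fsm_step(enum tcp_state *state, enum tcp_event ev)
{
    size_t i;
    assert(state != NULL);
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].from == *state && table[i].on == ev) {
            *state = table[i].to;
            return table[i].action;
        }
    }
    return NULL;
}
```

Keeping the transitions in a table rather than in nested switch statements makes it easy to compare the code line by line against a state transition diagram.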
Reading 1.E: Read §2.6-7 [UNP3ev1, 36-44] for the details of
• three-way handshake for connection establishment
• four-way handshake for connection termination
• state transition diagram of connection establishment and termination
• the meanings and roles of various states including the TIME-WAIT state.
1.4 Ports, Sockets, and Concurrent Servers
Both TCP and UDP use ports as the mechanism to identify the end processes involved. Each process attaches to a port number, and the TCP or UDP of the operating system kernel uses this port number to communicate with the process. Therefore, an end process in a network application is identified by a pair (host, port), where host is the IP address of the host and port is the port number used by the process for the application. The pair (host, port) is called a socket.
In the case of UDP, the socket (host, port) is used as the key to demultiplex UDP datagrams in the Internet. It is like the address on a piece of mail delivered through the post office system.
In the case of TCP, a TCP connection is fully defined by two sockets, or a socket pair, one for the client process and one for the server process which use the connection.
The port number used by the client process can be arbitrarily chosen by the operating system. The server will be notified of this port number when the connection is established. The client process does not need to know it. However, the client process needs to know the port number used by the server process before it can send UDP datagrams to it or make a TCP connection with it. Most well-known Internet applications use certain fixed well-known port numbers for their servers. Of course, if you develop a server for a new application (service), you need to let the clients know the port number used by the server.
Note: These well-known port numbers can be seen in /etc/services.
This mechanism of using ports (and port numbers) for connecting with end processes allows concurrent servers (normally for TCP) to be built easily. A concurrent server creates a separate process or thread upon each TCP connection request for the actual data communication. Although the (host, port) pair at the server remains the same, the (host, port) pairs for the client processes will be different. Therefore, different TCP connections will be established with the different clients. Recall that a TCP connection is fully defined by the two (host, port) pairs of the client and the server.
Reading 1.F: Read §2.7-8 [UNP3ev1, 41-6] for the concepts of
• port and port number
• socket and socket pair
• concurrent TCP servers
1.5 Summary
We have described the major concepts of TCP and UDP networking from the application developers’ view as the introduction to the entire course. The understanding
of these concepts is obviously very important to the study of this course.
Module 2
Sockets Introduction
This module covers various socket address structures and associated auxiliary functions. Almost all socket API functions need to deal with socket addresses, either in arguments or in function returns.
Note: API functions are Application Programming Interface functions for implementing specific applications.
This module also includes a couple of useful functions to convert IP addresses between their different forms.
The knowledge in this module paves the way to subsequent modules about how to use these socket API functions.
Module contents
2.1 Internet Socket Address Structure sockaddr_in
2.1.1 sockaddr_in
2.1.2 Other Sockets Address Structures
2.1.3 Generic Socket Address Structure
2.2 Byte Ordering Functions
2.3 Byte Manipulation Functions
2.4 IP Address Conversion Functions
2.5 read() and write() Functions for TCP
2.6 Summary
The text for this module is Chapter 3 of Stevens’ UNIX Network Programming (3rd Edition), Vol. 1, referred to as UNP3ev1.
Main Aims:
• understand the Internet socket address structure sockaddr_in and its use;
• understand different byte orders and the functions for their conversion;
• understand various byte manipulation functions;
• understand various IP address conversion functions; and
• understand the read and write system calls for TCP sockets.
2.1 Internet Socket Address Structure sockaddr_in
2.1.1 sockaddr_in
The Internet socket address structure sockaddr_in is the data structure used to define sockets in network applications.
The concept of a socket is very simple: it is a pair consisting of the IP address of the host and the port number used for the application, (host, port). Therefore, the major components of sockaddr_in are the IP address of the host and the port number used by the end process (the client or the server). It also records the address family (AF_INET for IPv4) for which the socket is intended.
The IP address in sockaddr_in is defined as type struct in_addr, which, for historical reasons, is simply a wrapper for a 32-bit IPv4 address.
Note: struct in_addr is 4 bytes long.
The other fields of sockaddr_in use Posix.1g-defined data types such as int8_t and in_port_t. They are shown in Figure 3.2 of the text.
Note: int8_t and in_port_t are 1 and 2 bytes long, respectively.
The structure sockaddr_in has a fixed size of 16 bytes.
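A minimal sketch of filling in a sockaddr_in follows. The helper name make_ipv4_addr is hypothetical; it relies on htons() and inet_pton(), which are described in Sections 2.2 and 2.4, and it assumes a POSIX system.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill an IPv4 socket address from a dotted-decimal string and a port
   given in host byte order. Returns 0 on success, -1 if the string is
   not a valid IPv4 address. */
int make_ipv4_addr(struct sockaddr_in *sa, const char *ip, uint16_t port)
{
    assert(sa != NULL && ip != NULL);
    memset(sa, 0, sizeof(*sa));        /* clear padding and unused fields */
    sa->sin_family = AF_INET;          /* address family: IPv4 */
    sa->sin_port = htons(port);        /* port in network byte order */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}
```

A call such as make_ipv4_addr(&servaddr, "139.86.26.189", 13) prepares the address of a daytime server; the structure is then passed, cast to struct sockaddr *, to functions such as connect() or bind().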
2.1.2 Other Sockets Address Structures
To make network applications run in different domains (for instance, the Internet domain and the UNIX domain for interprocess communication), network system calls should accept different types of socket addresses. In fact, in addition to sockaddr_in, which is for the IPv4 family (type AF_INET), we have other socket address structures, sockaddr_in6, sockaddr_un and sockaddr_dl, for IPv6, the UNIX domain, and the datalink domain, respectively.
2.1.3 Generic Socket Address Structure
Since the network software for the TCP and UDP protocols predates the ANSI C programming language, which introduced the generic pointer type void *, all network system calls use a generic socket address structure called sockaddr as the type of socket addresses in their interfaces. The application programmer needs to cast a pointer to whatever socket address structure is used in the application to a pointer to sockaddr when calling these functions.
Reading 2.A: Read §3.1-3 [UNP3ev1, 67-77] for the details of
the various socket address structures and the way they are used.
2.2 Byte Ordering Functions
There are two byte orderings for any data type which occupies multiple (K) bytes in computer memory:
• little-endian: the address a of the data is the address of the least significant byte, and the address of the most significant byte is a+K–1
• big-endian: the address a of the data is the address of the most significant byte, and the address of the least significant byte is a+K–1
When multiple-byte data are transmitted in the network, they are ordered according to big-endian. This is called network byte order. In particular, port numbers (2 bytes) and IP addresses (4 bytes) in protocol headers such as TCP and UDP headers are transmitted in big-endian order. These numbers are taken from the socket address structures, in which IP addresses and port numbers could otherwise be stored in different orders on different machines.
To avoid confusion and errors, both history and the Posix.1g standard require the IP address and port number in socket address structures to be stored in network byte order regardless of the machine byte order.
To store IP addresses and port numbers in network byte order, we need to convert them before storing them. (We also need to convert them back to the machine byte order after reading them from the structure.)
Two functions, htons() and htonl(), convert 16-bit and 32-bit values, respectively, from the machine byte order to the network byte order. Another two functions, ntohs() and ntohl(), do the inverse.
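The effect of these conversions can be observed by looking at individual bytes in memory. The two helper functions below are hypothetical names used only for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Report the machine byte order by inspecting the first byte of a
   known 16-bit value. */
bool host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x02;   /* least significant byte stored first */
}

/* Copy the network-byte-order form of a port number into out[0..1],
   i.e. the two bytes as they would appear in a TCP or UDP header. */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);
    assert(out != NULL);
    memcpy(out, &net, sizeof(net));
}
```

Whatever the machine byte order, port_to_wire(0x1234, b) leaves the most significant byte 0x12 in b[0], because network byte order is big-endian.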
Reading 2.B: Read §3.5 [UNP3ev1, 77-80] for the concept of
machine byte and network byte orderings and the conversion functions between them:
htons(), htonl(), ntohs() and ntohl()
2.3 Byte Manipulation Functions
Three functions, bzero(), bcopy(), and bcmp(), are used for preparing socket address structures. They originated in BSD UNIX.
Another three ANSI C functions, memset(), memcpy() and memcmp(), can be used for the same purpose, but they are used less frequently.
Reading 2.C: Read §3.5 [UNP3ev1, 80-81] for the details
of byte manipulation functions
bzero(), bcopy(), bcmp(),
memset(), memcpy() and memcmp().
2.4 IP Address Conversion Functions
There are two forms of IP addresses:
• a dotted-decimal character string such as 139.86.26.189 (IPv4); and
• a binary value in network byte order, to be stored in struct in_addr in socket address structures.
The dotted-decimal form is more readable.
There are two groups of functions to convert IP addresses between these two forms.
The first group of functions includes inet_aton(), inet_ntoa() and the deprecated inet_addr(). They make conversions between the character string pointed to by a variable of type char * and the binary IP address in a variable of type struct in_addr.
The second group of functions includes inet_pton() and inet_ntop(). These functions work for both IPv4 and IPv6. For this reason they do not use struct in_addr in their argument types directly. Instead, they take the address family and a generic pointer (void *) to the binary address, which is a struct in_addr for IPv4 or a struct in6_addr for IPv6, such as the address field inside a socket address structure. The functions store or read the binary IP address value through this pointer.
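A short sketch of the second group of functions for IPv4 follows. The wrapper names parse_ipv4 and format_ipv4 are hypothetical; inet_pton() and inet_ntop() are the standard POSIX calls.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Dotted-decimal string -> binary address in network byte order.
   Returns 0 on success, -1 if text is not a valid IPv4 address. */
int parse_ipv4(const char *text, struct in_addr *addr)
{
    assert(text != NULL && addr != NULL);
    return inet_pton(AF_INET, text, addr) == 1 ? 0 : -1;
}

/* Binary address -> dotted-decimal string in out (at least
   INET_ADDRSTRLEN bytes). Returns 0 on success, -1 on failure. */
int format_ipv4(const struct in_addr *addr, char *out, socklen_t outlen)
{
    assert(addr != NULL && out != NULL);
    return inet_ntop(AF_INET, addr, out, outlen) != NULL ? 0 : -1;
}
```

For IPv6 the same two library calls are used with AF_INET6 and a struct in6_addr, which is why their address argument is a generic pointer.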
Reading 2.D: Read §3.6-7 [UNP3ev1, 82-5] for
• the details of these two groups of functions
• a simple implementation of inet_pton() and inet_ntop()
Another group of functions provided in the package of this textbook operate on socket address structures to manipulate the IP address or port fields inside. They are:
sock_ntop(),
sock_bind_wild(),
sock_cmp_addr(),
sock_cmp_port(),
sock_get_port(),
sock_ntop_host(),
sock_set_addr(),
sock_set_port() and
sock_set_wild().
Reading 2.E: Read §3.8 [UNP3ev1, 86-8] for details of these
functions.
2.5 read() and write() Functions for TCP
The read() and write() functions on a TCP connection have a subtle difference from those on ordinary files: they might read or write fewer bytes than requested. The reason for this difference is the flow control of the byte-stream connection between the two ends discussed in Module 1.
To finish reading or writing the number of bytes requested, you need to put read() and write() in a while loop until the entire task is completed. These actions are coded in two functions, readn() and writen(), provided by the author of the textbook.
Another function, readline(), which reads a text line from a TCP socket, is also provided in the text.
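The looping idea can be sketched as follows. This is not the textbook's source for readn() and writen(); the names read_full and write_all are hypothetical, and the code only illustrates how short reads and writes are absorbed by a while loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling write() until all n bytes are written.
   Returns n on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;
    assert(buf != NULL || n == 0);
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += w;                    /* short write: advance and loop */
        left -= (size_t)w;
    }
    return (ssize_t)n;
}

/* Keep calling read() until n bytes are read or EOF is reached.
   Returns the number of bytes read (less than n on EOF), -1 on error. */
ssize_t read_full(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;
    assert(buf != NULL || n == 0);
    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;                 /* EOF: the peer closed its end */
        p += r;
        left -= (size_t)r;
    }
    return (ssize_t)(n - left);
}
```

Both functions work on any descriptor, so they can be tried on a pipe before being used on a connected TCP socket.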
Reading 2.F: Read §3.9 [UNP3ev1, 88-92] for
• the source code of functions readn(), writen() and readline()
• how to improve the efficiency of readline() with buffering
2.6 Summary
In this module, we introduced various socket address structures including the most commonly used sockaddr_in. We also introduced many C functions which are useful to prepare and manipulate socket address structures. Finally, we pointed out an important difference in the behaviour of the system calls read() and write() between TCP sockets and ordinary file descriptors.
Module 3
Elementary TCP Sockets
This module covers all the system calls used to establish and terminate a TCP connection between the client and the server.
Note: Programming with the Transport Layer Interface is the most interesting part of network application programming.
The system calls used by the client are socket(), connect() and close().
The system calls used by the server are socket(), bind(), listen(), accept() and close().
When introducing these system calls, the module also addresses several important concepts and issues in TCP connection establishment and termination, such as connection errors, the wildcard IP address, backlog queues, listening and connected sockets, and concurrent servers.
Module contents
3.1 Overview of System Call Sequences of Client and Server
3.2 socket() Function
3.3 connect() Function
3.4 bind() Function
3.5 listen() Function
3.6 accept() Function
3.7 fork() and exec() Functions
3.8 Concurrent Server
3.9 close() Function
3.10 Functions for Querying Socket Addresses
3.11 Summary
The text for this module is Chapter 4 of Stevens’ UNIX Network Programming (3rd Edition), Vol. 1, referred to as UNP3ev1.
Main Aims:
• understand the normal system call sequences for the client and the server to establish and terminate TCP connections;
• understand all the functions used by the client and the server to establish and terminate TCP connections; and
• understand the concept of concurrent server and the way concurrent servers are constructed.
3.1 Overview of System Call Sequences of Client and Server
In network applications, the client and the server play different roles. A TCP server normally runs as a background daemon process on the server host, waiting for connection requests from clients. A client on another host initiates the application by sending a connection request. Therefore, the sequences of system calls used by the server and the client are different.
Note: A network application usually consists of a server and a client.
The normal system call sequence for the client is as follows:
1. call socket() to create a socket
2. call connect() to make a connection with the server
3. use the connection (through the connected socket) for data transmission with
read() and write() functions.
4. call close() to initiate the termination of the connection.
The normal system call sequence for the server is as follows:
1. call socket() to create a listening socket
2. call bind() to bind the listening socket to its port number
3. call listen() to turn the socket into the listen state and specify the listen
queue length
4. call accept() to wait for connection requests; it returns a connected socket after the three-way handshake for the connection establishment is finished.
5. use the connection (through the connected socket) for data transmission with
read() and write() functions.
6. call close() when receiving an EOF from read() function to complete the
four-way handshake for the connection termination.
These two system call sequences are illustrated in Figure 4.1 of the text [UNP3ev1,
96].
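Both call sequences can be traced in a single process over the loopback interface, because the kernel completes the three-way handshake and queues the connection before accept() is called. The sketch below is an illustration under that assumption, with a hypothetical function name; a real client and server would be separate programs. It uses getsockname(), one of the querying functions covered later in this module, to learn the port chosen by the kernel.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if the server end received "hello" followed by EOF. */
int loopback_sequence_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[16];
    ssize_t n = -1, total = 0;
    int listenfd, clientfd, connfd;

    /* Server steps 1-3: socket(), bind(), listen(). */
    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);              /* let the kernel pick a port */
    if (bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listenfd, 5) < 0 ||
        getsockname(listenfd, (struct sockaddr *)&addr, &len) < 0) {
        close(listenfd);
        return -1;
    }

    /* Client steps 1-2: socket(), connect(). */
    clientfd = socket(AF_INET, SOCK_STREAM, 0);
    if (clientfd < 0 ||
        connect(clientfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (clientfd >= 0)
            close(clientfd);
        close(listenfd);
        return -1;
    }

    /* Server step 4: accept() returns the connected socket. */
    connfd = accept(listenfd, NULL, NULL);
    if (connfd < 0) {
        close(clientfd);
        close(listenfd);
        return -1;
    }

    /* Client steps 3-4: write(), then close() to start termination. */
    if (write(clientfd, "hello", 5) != 5)
        total = -1;
    close(clientfd);

    /* Server steps 5-6: read() until EOF, then close(). */
    while (total >= 0 && total < (ssize_t)sizeof(buf) &&
           (n = read(connfd, buf + total, sizeof(buf) - (size_t)total)) > 0)
        total += n;
    close(connfd);
    close(listenfd);
    return (total == 5 && n == 0 && memcmp(buf, "hello", 5) == 0) ? 0 : -1;
}
```

The comments map each call to the numbered steps of the two sequences above.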
3.2 socket() Function
This function is called by both clients and servers to create a socket for an intended type of communication protocol. Its interface is
int socket(int family, int type, int protocol);
The combination of family and type determines the type of communication protocol as shown in Figure 4.5 of the text. The returned integer value is a socket descriptor, or sockfd, analogous to a file descriptor.
Reading 3.A: Read §4.2 [UNP3ev1, 95-9] for all the details of
function
socket().
3.3 connect() Function
The socket returned from function socket() is not yet bound to any IP address and port number, let alone a TCP connection. In order to make a TCP connection to a remote socket, the client must call function connect() with the local socket just returned from function socket().
The interface of function connect() is
int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);
The local socket is passed as the first argument sockfd, which is the value just returned from function socket(). The IP address and port number for this local socket are determined by the operating system kernel during connect(). The client does not need to know these details.
The remote socket is specified by a socket address structure in the second argument servaddr, which must contain the IP address and port number of the remote socket. This implies that the remote socket of the server must be bound to the same port number by using function bind(), to be discussed in the next section.
Function connect() goes through the three-way handshake to establish a TCP connection. On success, a TCP connection is established between the local and remote sockets, and the socket sockfd is the local end of the connection. The client then uses this socket for read() and write().
There are three kinds of errors this function may return:
• ETIMEDOUT: no response to the client’s SYN was received;
• ECONNREFUSED: the server host replied with a reset, typically because no process is listening on the given port; and
• EHOSTUNREACH or ENETUNREACH: the destination was reported as unreachable.
Reading 3.B: Read §4.3 [UNP3ev1, 99-101] for all the details
of function
connect().
3.4 bind() Function
This function is called by a TCP server to bind its local socket to a socket address. This socket address structure must contain at least the port number for the service of the server. The clients use the same socket address (IP address plus port number) in their connect() function calls to make TCP connections with the server.
The interface of this function is
int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);
The first argument sockfd is the local socket of the server just returned from function socket(). The second argument is the socket address of the service of the server. The IP address of this socket address is normally INADDR_ANY (wildcard) because the host may have multiple IP addresses and we want the server to be able to use any of these IP addresses. The port number can be 0, in which case the kernel chooses a port, or a chosen number. Figure 4.6 of the text shows the way the actual IP address and port number are chosen for the socket according to the IP address and port number provided in the socket address.
The wildcard IP address, INADDR_ANY, is used to let the kernel decide the IP address of this socket according to the client’s request when a TCP connection is established later on.
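The use of the wildcard address can be sketched as below. The function name bind_wildcard is hypothetical, and getsockname() is used only to report which port the kernel assigned when port 0 is requested.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to INADDR_ANY and the given port
   (0 lets the kernel choose). On success returns the socket
   descriptor and stores the actual port in *bound_port;
   returns -1 on failure. */
int bind_wildcard(uint16_t port, uint16_t *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd;

    assert(bound_port != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local IP address */
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

A server for a well-known service would pass its fixed port number instead of 0, so that clients know where to connect.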
Reading 3.C: Read §4.4 [UNP3ev1, 101-3] for all the details of
function
bind().
3.5 listen() Function
The purpose of this function is (1) to turn an unconnected socket (the one after the bind() call) into a listening socket, or passive socket, and (2) to tell the kernel the size of the backlog queues.
A listening socket is one that is ready to accept connection requests from clients. According to the TCP state transition diagram, a listening socket is a socket in the LISTEN state, and function listen() makes the transition from CLOSED to LISTEN.
The interface of this function is
int listen(int sockfd, int backlog);
The first argument sockfd is the unconnected socket (input) that is turned into a listening socket (output). For each listening socket, the kernel maintains two queues
whose total size is specified by the second integer argument backlog in the listen()
function call:
incomplete queue: connection requests waiting for the completion of the three-way handshake
complete queue: connection requests having completed the three-way handshake and waiting for the accept() call
The second argument backlog has historically specified the maximum value of the sum of the lengths of
these two queues; its exact interpretation varies between implementations.
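As a rough sketch of the CLOSED to LISTEN transition, the following C fragment creates, binds and listens on a TCP socket, and then asks the kernel whether the socket is really in the listening state. The helper names make_listener() and is_listening() are hypothetical, and the use of the SO_ACCEPTCONN socket option for the check is our own choice for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: socket() + bind() + listen() on the wildcard
 * address. Returns the listening socket, or -1 on error. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in servaddr;
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);

    if (listenfd < 0)
        return -1;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(port);

    if (bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0 ||
        listen(listenfd, backlog) < 0) {     /* CLOSED -> LISTEN */
        close(listenfd);
        return -1;
    }
    return listenfd;
}

/* Returns 1 if sockfd is a listening socket, 0 if not, -1 on error. */
int is_listening(int sockfd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(sockfd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A call such as make_listener(0, 5) yields a listening socket on a kernel-chosen port; how many pending connections the backlog value 5 really allows depends on the operating system.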
Reading 3.D: Read §4.5 [UNP3ev1, 104-109] for the details of
concepts of listening socket, complete queue and incomplete queue
function listen()
3.6 accept() Function
This function is called by the server after the function call of listen() to accept
TCP connection requests from any client. If no client has made a connection request, it
blocks until the first connection request arrives. When a connection request arrives,
the kernel of the server communicates with the kernel of the client (which calls
connect()) to complete the three-way handshake of connection establishment.
The interface of this function is
int accept(int sockfd, struct sockaddr *cliaddr,
socklen_t *addrlen).
The first argument sockfd is the socket descriptor of the listening socket. The returned
integer is the socket descriptor of the connected socket to be used by the server for data
communication with read() and write(). The first argument sockfd remains the
listening socket, which the server can use to call accept() again. Note that the
listening socket and the connected socket are two different sockets with two socket
descriptors. This enables the server to spawn a child process to communicate with
the client using the connected socket, while it continues to call accept() using the
listening socket. TCP servers of this kind are called concurrent servers.
The second argument cliaddr points to a socket address structure which will contain the socket address of the connected client after the function is completed. The third argument addrlen is a value-result argument: the caller sets it to the size of that structure, and on return it holds the number of bytes actually stored.
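The following C sketch shows one way to call accept() and use the client address it fills in. The function name accept_and_describe() and the "IP:port" text format are hypothetical choices made for this illustration, and IPv4 is assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: accept one connection on listenfd and write the
 * client's "IP:port" into text (at most textlen bytes).
 * Returns the connected socket, or -1 on error. */
int accept_and_describe(int listenfd, char *text, size_t textlen)
{
    struct sockaddr_in cliaddr;
    socklen_t clilen = sizeof(cliaddr);      /* value-result argument */
    char ip[INET_ADDRSTRLEN];
    int connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &clilen);

    if (connfd < 0)
        return -1;

    /* Convert the binary client address into printable form. */
    if (inet_ntop(AF_INET, &cliaddr.sin_addr, ip, sizeof(ip)) == NULL)
        ip[0] = '\0';
    snprintf(text, textlen, "%s:%d", ip, ntohs(cliaddr.sin_port));
    return connfd;
}
```

The descriptor returned is the connected socket; listenfd itself is unchanged and can be passed to the function again for the next client.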
Reading 3.E: Read §4.6 [UNP3ev1, 109-111] for the details of
the concepts of the listening and connected sockets
function accept()
the code of the daytime server example
3.7 fork() and exec() Functions
Concurrent TCP servers need to create separate child processes to serve the clients.
In UNIX operating systems, function fork() is traditionally the only way to create a new process.
The newly created process (called the child process) duplicates the address space
of the calling process (called the parent) and inherits all of its open resources (open
files, sockets, etc.) from the calling process. A process can clone many child processes,
each of which has a different PID.
To run a different program, the child process can then call one of the exec functions (there
are six of them) to replace its address space with a new address space by loading
the binary executable of a program.
However, the child and parent processes can share the same program (but not the
same address space) as long as they execute different parts, which are normally
separated by an if statement using the return value of fork().
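The if statement on the return value of fork() can be sketched as below. The function fork_and_wait() is a hypothetical example: the child simply exits with a given status (where a real server child would serve the client or call an exec function), and the parent collects that status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical example: fork a child that exits with child_status.
 * The parent waits for it and returns the child's exit status,
 * or -1 on error. */
int fork_and_wait(int child_status)
{
    int stat;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* fork() failed */

    if (pid == 0) {
        /* Child: fork() returned 0. A real server child would serve
         * the client here, or call an exec function. */
        _exit(child_status);
    }

    /* Parent: fork() returned the PID of the child. */
    if (waitpid(pid, &stat, 0) != pid)
        return -1;
    return WIFEXITED(stat) ? WEXITSTATUS(stat) : -1;
}
```

Both branches belong to the same program text, but each process runs only its own branch in its own address space.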
Functions fork() and the various exec functions are very important for programming
concurrent servers.
Reading 3.F: Read §4.7 [UNP3ev1, 111-114] for the details of
concepts of parent and child processes
function fork()
six exec functions: execlp(), execl(), execle(),
execvp(), execv() and execve()
3.8 Concurrent Server
If a server can serve only one client at a time, it is called an iterative server. If the
service takes a long time to complete, an iterative server would turn away many clients that could
otherwise be served if we allowed the server to create child processes to serve
them concurrently. This is the idea behind the concept of a concurrent server.
Recall that function accept() returns a separate socket for the connected socket
and keeps the original listening socket. This arrangement is designed exactly to
support concurrent servers.
After a connection with a client is established through the accept() function call,
the server spawns a separate child process which inherits all the resources including the connected socket. The child process continues to serve the client using this
connected socket, while the parent process loops back to call accept() again to
wait for the next connection request using the listening socket.
Reading 3.G: Read §4.8 [UNP3ev1, 114-6] for the details of
design and implementation of concurrent TCP servers.
3.9 close() Function
Function close() is a system call used to close an open file. It can also be used to
close a TCP connection if its argument is the socket descriptor of the local socket of
the connection.
The interface of this function is
int close(int sockfd).
In order to complete the four-way handshake to terminate a TCP connection, both
the client and the server need to call function close(). In normal cases, the client
starts the process by calling close() first. The server is normally waiting on a
read() when the client calls close(). This read() returns 0 after the server
kernel receives the FIN from the kernel of the client. The server then calls close()
to complete the four-way handshake.
Reading 3.H: Read §4.9 [UNP3ev1, 117] for the details of function close() to terminate TCP connections.
3.10 Functions for Querying Socket Addresses
There are many situations where the client or the server does not know the complete
socket address (IP address plus port number) of its local socket. For example, when
a client calls connect() after it calls socket(), it simply lets the kernel decide the
IP address and port number for the socket. Although the server calls bind() with
a socket address, it can use the wildcard IP address and port number 0 and let the
kernel select the actual IP address and port number.
Function getsockname() can be used in these situations to get the socket address of the local socket after the socket has been bound or the connection is established.
The interface of getsockname() is
int getsockname(int sockfd, struct sockaddr *localaddr,
socklen_t *addrlen)
Through the second argument localaddr returned, we can find the IP address, port
number and even the family type of this local socket.
Another useful function is getpeername(), used to get the socket address of the
remote socket of the connection.
The interface of getpeername() is
int getpeername(int sockfd, struct sockaddr *peeraddr,
socklen_t *addrlen)
The first argument is the local socket and the second argument returns the socket
address of the remote host.
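As a small sketch of how getsockname() is typically used, the hypothetical helper local_port() below returns the port number the kernel assigned to an IPv4 socket. The helper name and the choice to return -1 for non-IPv4 sockets are assumptions of this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the local port (in host byte order) of
 * an IPv4 socket, or -1 on error or for a non-IPv4 socket. */
int local_port(int sockfd)
{
    struct sockaddr_in localaddr;
    socklen_t len = sizeof(localaddr);      /* value-result argument */

    if (getsockname(sockfd, (struct sockaddr *)&localaddr, &len) < 0)
        return -1;
    if (localaddr.sin_family != AF_INET)
        return -1;
    return ntohs(localaddr.sin_port);
}
```

Calling local_port() on a socket bound with port number 0 reveals the ephemeral port chosen by the kernel; getpeername() can be used in the same way for the remote end of a connected socket.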
Reading 3.I: Read §4.10 [UNP3ev1, 117-120] for the details of
functions getsockname() and getpeername()
the reasons that these functions are useful
3.11 Summary
We have covered all the system calls used by clients and servers to establish and
terminate TCP connections. The different sequences of system calls used by the
client and server are discussed. We also covered the design and implementation of
concurrent TCP servers.
Module 4
TCP Client-Server Example
This module introduces a simple and complete TCP client-server example to discuss both normal and abnormal operations of TCP connection establishment and
termination as well as data transmission. Here, a simple but complete
network application means one that has all the
procedures a nontrivial network application has.
In addition to the normal and abnormal TCP operations, this module also introduces the UNIX concept and technique of signal handling and how to avoid generating zombie child processes at the server site.
Module contents
4.1 A Simple TCP Client-Server Example
4.2 Normal Startup and Termination
4.3 Avoiding Zombie Child Server Processes
4.4 Abnormal Termination
4.5 Abnormal Data Transmission
4.6 Summary
The text for this module is Chapter 5 of Stevens’ UNIX Network Programming (3rd
Edition), Vol. 1, referred to as UNP3ev1.
Main Aims:
understand the source code of a simple TCP client-server application;
understand normal startup and termination of TCP connections and how to
examine the established TCP connections;
understand the concepts and techniques of signal handling in UNIX;
understand how to avoid generating zombie child processes at the server site;
and
understand various abnormal terminations of TCP connections and the techniques to make TCP applications more robust.
4.1 A Simple TCP Client-Server Example
The source code of a simple and complete TCP client-server application is provided
in this module.
The server code consists of
1. the main program in tcpcliserv/tcpserv01.c
2. the str_echo() function in lib/str_echo.c
3. the readline() function in lib/readline.c
4. the writen() function in lib/writen.c
The client code consists of
1. the main program in tcpcliserv/tcpcli01.c
2. the str_cli() function in lib/str_cli.c
3. the readline() function in lib/readline.c
4. the writen() function in lib/writen.c
Reading 4.A: Read §5.1-5 [UNP3ev1, 121-5] for the details of
the source code of the example. Make sure that you understand
every line of the code.
4.2 Normal Startup and Termination
The procedure of normal startup is described in §5.6. You need to follow this
procedure and experiment with it. It is important to use netstat -a to verify
connections and help understand how TCP connections are established.
The normal termination of the TCP connection of the example starts with typing
control-D on the input to the client. Again, it is essential to follow this process
step by step until the connection is terminated. You can use netstat -a
and ps to verify the change after the connection is terminated. The details are covered in §5.7.
However, when reading it, you need to trace the source code and make sure that
you understand every step.
Reading 4.B: Read §5.6-7 [UNP3ev1, 125-8] for the details of
the normal processes of TCP startup and termination.
Activity 4.C: Complete the following experiments using two
LINUX machines: one is your own home LINUX machine and
the other is the departmental LINUX machine on which you
have an account. Run the client on the departmental machine and
use your own machine to run the server.
1. Download the source code of the programs of the
textbook from the departmental CDROM set you
have purchased or from the web site of this course
www.sci.usq.edu.au/courses/CSC8415/.
2. Compile both the client and server code of this
example.
3. Run the server on one machine and the client on
the other.
4. Test the client and the server to see if they are working correctly.
5. Use netstat -a and ps to verify the changes in TCP connection status on both machines before and after the connection is established and terminated.
4.3 Avoiding Zombie Child Server Processes
A child process forked by a concurrent server becomes a zombie process when it
terminates if the concurrent server does not execute the wait() or waitpid() function for it.
The purpose of turning the terminated child process into a zombie process is to
keep its exit status in the kernel so that its parent can get this information when
executing wait() or waitpid() later on. Zombie processes take up resources of the
kernel. The echo server of this example, tcpcliserv/tcpserv01.c (Figure 5.2
of the text), can create a lot of zombie processes, because it never executes the wait()
or waitpid() functions for its children. The difficulty is that this concurrent server
spends most of its time blocked in accept() on the listening socket, so it cannot simply block in wait() as well.
One technique to solve this problem is to use signal handling. In particular, when
a child process terminates, the kernel always sends a SIGCHLD signal to its parent process.
§5.8 of the text introduces the concept and technique of signal handling in UNIX.
Signals are sometimes called software interrupts. Processes can send signals to
each other. A signal can be processed by the receiving process in one of the following three ways: (1) it can be ignored, (2) it can be handled by the default signal
handler or (3) it can be handled by a user-defined signal handler. Signal handlers
are simply C functions. A signal is handled by simply executing the signal handler, no matter where the control of the process is. It is analogous
to a hardware interrupt: the normal execution of the process is interrupted and the
signal handler (C function) is executed. After the signal handler is executed,
control goes back to where it was and normal execution resumes. The normal
execution of the process can be interrupted anywhere by a signal. If this happens in
the middle of a slow system call like accept(), the system call usually returns with the
EINTR error code and the process has to handle this error explicitly.
The user-defined signal handler can be set up by using function signal(). §5.8
of the text shows an implementation of the signal() function in lib/signal.c
(Figure 5.6 of the text) using the POSIX sigaction() function. The purpose of
the signal() function is to set up the signal handler for a signal.
The original complicated interface of function signal() is simplified to
Sigfunc *signal(int signo, Sigfunc *func), where type Sigfunc is defined as a function with one integer argument returning void:
typedef void Sigfunc(int);
Reading 4.D: Read §5.8 [UNP3ev1, 129-32] for the details of
1. the concept and technique of signal handling
2. function signal() and its implementation
3. POSIX signal semantics
To avoid generating zombie child processes in a concurrent server, the parent process must catch and handle the SIGCHLD signal with the wait() or waitpid()
function call. §5.9 of the text shows how the server is set up to handle signal
SIGCHLD to prevent zombie child processes.
Reading 4.E: Read §5.9 [UNP3ev1, 132-35] for the details of
the techniques for a concurrent server to
1. handle SIGCHLD signal to prevent zombie child processes
2. handle interrupted system calls
3. the source code of the server of tcpcliserv/tcpserv03.c.
Using wait() to catch and handle SIGCHLD is not ideal. §5.10 of the text shows
how to use waitpid() instead of wait() to handle the situation when many child
processes raise SIGCHLD signals at about the same time. A situation like this
may occur if a client establishes multiple TCP connections to the concurrent server,
each of which is handled by a child process of the server. When the client exits and
closes the multiple local sockets for these connections, the corresponding child
processes of the server terminate at about the same time. Because signals
are not queued in UNIX, some of the SIGCHLD signals may not be caught by the
server if they are delivered while the signal handler is being executed. We could
use a for loop in the signal handler to call wait() the same number of times as
the number of connections. However, this signal handler will not work in the general
situation where child processes do not terminate at the same time, because the
signal handler will block at wait() if any child process has not terminated yet. The
solution is to call function waitpid() in a while loop in the signal handler,
as follows:
void
sig_chld(int signo)
{
    pid_t   pid;
    int     stat;

    while ( (pid = waitpid(-1, &stat, WNOHANG)) > 0)
        printf("child %d terminated\n", pid);
    return;
}
The first argument -1 means to wait for any child process to terminate. The function returns
the process ID (a positive integer) of the terminated child whose status has been collected.
WNOHANG in the third argument avoids blocking if some child processes have not terminated yet. In this case, waitpid() returns immediately and
the return value is 0. This allows the while loop to exit (when no children remain at all, waitpid() returns -1, which also ends the loop). In other words, this signal
handler can handle any number of SIGCHLD signals without being blocked if
any of the child processes has not terminated yet.
Reading 4.F: Read §5.10 [UNP3ev1, 135-9] for the details of
1. the problem of using function wait() in the SIGCHLD handler
2. the solution of the problem using function waitpid()
3. the complete source code of the concurrent server
which can avoid generating zombie child processes in
tcpcliserv/sigchldwaitpid.c and
tcpcliserv/tcpserv04.c.
Activity 4.G: Compile and run the following two versions of the
echo server using the client of tcpcliserv/tcpcli04.c:
1. the server of tcpcliserv/tcpserv03.c
2. the server of tcpcliserv/tcpserv04.c
Experiment with these different servers to
1. show the problem of zombie child processes caused by the
server of tcpcliserv/tcpserv03.c
2. show that the same problem does not exist with the server
of tcpcliserv/tcpserv04.c
4.4 Abnormal Termination
The normal termination of a TCP connection starts with the close() function call
by the client followed by the close() function call by the server, as explained in §5.7 of
the text. However, to make TCP network applications more robust we must
consider various situations of abnormal termination of a TCP connection.
An abnormal TCP termination is usually caused by the premature termination of
the server process or server host. The following situations, in which a TCP connection
may be terminated abnormally, are considered.
1. Termination of the Server Process (§5.12 of the text).
If the server child process is killed during normal operation (say, by using the
kill command from a UNIX shell), it will close all its resources, including
the connected socket of the TCP connection.
The problem in this situation is that the client process will not know that the
server has terminated until it receives an EOF (0) at readline() after it sends
the last line to the server. This last line is sent to the server by the client
even after the TCP of the client host has received the closing FIN from the
server host. The client process still sends the last line to the server
process because it is blocked in fgets() on the standard input and does not know that the server process has terminated. It finds
the abnormality only when it receives an EOF at readline(). The reason
it receives an EOF at readline() is that the TCP of the client has already
received the closing FIN from the TCP of the server, so it knows that the
half of the connection from the server to the client is closed and the server
will not send any more data to the client.
Finding the abnormality at readline() is too late for the client process,
because it has already sent the last line to the server process, which has terminated.
To detect this abnormality earlier, the client process must be able to wait
on both the standard input terminal (with fgets()) and the socket (with
readline()) at the same time. This can be achieved by using function
select() or poll(). The details of functions select() and poll() can be
found in Chapter 6 of the text.
Here are some important facts:
The client can still send data to the server even if it is in the CLOSE_WAIT
state.
The TCP of the server will send an RST back to the client when it receives
the last line in the FIN_WAIT_2 state.
Reading 4.H: Read §5.12 [UNP3ev1, 141-2] for the details of
the problem caused by the abnormal termination of the server process.
Activity 4.I: Repeat the experiment described in §5.12
[UNP3ev1, 141-2]. Use netstat to observe the change of the
states of the client TCP and the server TCP.
2. SIGPIPE signal problem (§5.13 of the text).
Continuing with the situation above, the TCP of the server will send an RST
back to the client when it receives the last line in the FIN_WAIT_2 state.
Can the client process detect the abnormality (the server process has prematurely terminated and the half of the connection from the server to the
client has been closed) earlier, say when it sends the last line to the server
process? One attempt is to change the client code to call writen() twice for
every line sent to the server process, because the first writen() will elicit an RST from the TCP of the server and the client can do something
about it earlier. This idea is coded in the new version of function str_cli()
in tcpcliserv/str_cli11.c in Figure 5.14 of the text. However, this change
not only fails to solve the problem but also creates another one: when
the client process writes to a socket whose TCP has received an RST, the kernel sends a SIGPIPE signal to the
client process. The default action for the SIGPIPE signal is to terminate the client process without any print-out by the shell. To avoid that,
the client must either handle the SIGPIPE signal using a user-defined handler or
simply ignore it by setting its disposition to SIG_IGN. Once the signal has been handled or ignored, the second writen()
returns with error code EPIPE. The client process can catch this error and
do something about the abnormal situation.
Reading 4.J: Read §5.13 [UNP3ev1, 142-4] for the details of
(a) the problem of signal SIGPIPE
(b) the code of tcpcliserv/str_cli11.c
Activity 4.K: Modify tcpcliserv/str_cli11.c and
tcpcliserv/tcpcli11.c
(a) to change the signal handler for SIGPIPE from the default to
SIG_IGN
(b) to catch the EPIPE error at the second writen() to
i. call function close() explicitly so that another half of
the TCP connection from the client TCP to the server
TCP can be closed properly, and then
ii. sleep for 120 seconds, and then
iii. exit with an error message.
Compile and run your modified program and experiment with it
to show that both halves of the TCP connection have been closed
properly during the time when the client process sleeps.
3. Crashing of Server Host (§5.14 of the text).
The crashing of the server host means that the machine stops unexpectedly due to, say, a power failure and is not able to shut itself down
properly through the normal shut-down process.
If the server host crashes, the client is informed of nothing. When the client
sends data to the server afterwards, no response will be sent back to the client.
Because TCP is a reliable protocol, the TCP of the client will re-transmit
the data after time-outs. It will eventually give up after a certain number
of re-transmissions (12 times in BSD UNIX). The client process will then get the
ETIMEDOUT error at readline().
4. Crashing and Rebooting of Server Host (§5.15 of the text).
Rebooting the crashed server host will not restore the previous TCP connection because the server host has lost all the information about it. Sending
data to the non-existent socket on the rebooted host causes an RST to be sent back to
the client. The client will then get an ECONNRESET error at readline().
5. Shutdown of Server Host (§5.16 of the text).
Because the server child process will be killed by the init process (PID
1) during the shut-down, the consequence is the same as the termination of the
server process described in §5.12 of the text.
Reading 4.L: Read §5.14-6 [UNP3ev1, 144-5] for the detailed
consequences of
1. crashing of the server host
2. crashing and rebooting of the server host
3. shut-down of the server host
4.5 Abnormal Data Transmission
The normal data transmission in our TCP example uses ASCII character strings.
Sending binary data of other types, such as integers and floats as well as structured
data types, over a TCP connection can cause all sorts of problems.
This is because
the client and the server host machines may use different byte orders (big-endian or little-endian) for multiple-byte binary data, or
they may have different implementations for the same data type, such as the
case with long in C, or
they may have different ways to pack structured data types.
Sending binary data in host-native format over TCP connections is not recommended.
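If binary integers must nevertheless be exchanged, one common remedy for the byte-order and type-size problems is to use fixed-width types and to convert them to network byte order explicitly. The sketch below illustrates this with the standard htonl() and ntohl() functions; the helper names put_u32() and get_u32() are hypothetical and are not taken from the text.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store value into buf as 4 bytes in network
 * (big-endian) byte order, whatever the byte order of the host. */
void put_u32(unsigned char *buf, uint32_t value)
{
    uint32_t net = htonl(value);    /* host to network byte order */

    memcpy(buf, &net, sizeof(net));
}

/* Hypothetical helper: read back a value written by put_u32(). */
uint32_t get_u32(const unsigned char *buf)
{
    uint32_t net;

    memcpy(&net, buf, sizeof(net));
    return ntohl(net);              /* network to host byte order */
}
```

The sender would call put_u32() before writing the four bytes to the socket and the receiver would call get_u32() after reading them, so that both hosts agree on the value regardless of their own byte order.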
Reading 4.M: Read §5.18 [UNP3ev1, 147-150] for the details
of the problems sending binary data over TCP connections.
Activity 4.N: Compile and run the client and server using the
new str_cli() and str_echo() in tcpcliserv/tcpcli09.c
and tcpcliserv/tcpserv09.c, respectively. Experiment with
the new client and server on the two machines of your environment to see if they use the same byte order.
4.6 Summary
We presented a simple TCP application: a string-echo server and a client interacting with both the standard input terminal and the server.
We presented the technique to avoid zombie child processes in the host of the
concurrent server using UNIX signal handling with functions signal(), wait()
and waitpid().
We have shown the source code and the normal operation of the client and the
server.
To make network applications more robust, we need to consider various abnormal
terminations and understand the behaviors of the TCP in these situations.
Reading 4.O: Read §5.17 [UNP3ev1, 146-7] for a brief summary of TCP connection from both the client’s and server’s perspectives.
Module 5
Elementary UDP Sockets
In Module 3 we introduced the basic functions to establish, use and terminate TCP
connections through TCP sockets in the client and server processes. In Module 4,
we further looked at an example using TCP sockets and discussed both normal and
abnormal operations of TCP applications.
In this module, we go through the same steps as in Modules 3 and 4 for the network applications using the UDP protocol. In particular, we first describe the basic
system call sequences of the client and server processes. We then show a simple
and complete application using UDP sockets. This simple example is then further
refined to reveal some advanced features of UDP sockets such as connected sockets. This module also describes the experiments showing the basic features of UDP
sockets.
Module contents
5.1 Overview of System Call Sequences of Client and Server
5.2 recvfrom() and sendto() Functions
5.3 A UDP Echo Example
5.4 Improvement of Echo Example
5.5 Asynchronous Error and Connected UDP Socket
5.6 Lack of Flow Control and Reliability with UDP
5.7 Outgoing IP Address
5.8 Summary
The text for this module is Chapter 8 of Stevens’ UNIX Network Programming (3rd
Edition), Vol. 1, referred to as UNP3ev1.
Main Aims:
understand the basic system calls used to develop network applications using the
UDP protocol;
understand the source code of a simple UDP application; and
understand the basic and advanced features of UDP sockets.
5.1 Overview of System Call Sequences of Client and Server
UDP is an unreliable and connection-less datagram protocol and, therefore, is
much less complicated than TCP.
The normal system call sequence for the client is as follows:
1. call socket() to create a UDP socket
2. call sendto() and recvfrom() through the UDP socket to send and receive
datagrams from the server
3. call close() to delete the socket after use
The normal system call sequence for the server is as follows:
1. call socket() to create a UDP socket
2. call bind() to bind the UDP socket to a port number for the service
3. call sendto() and recvfrom() through the UDP socket to send and receive
datagrams from the client
These two system call sequences are illustrated in Figure 8.1 of the text [UNP3ev1,
240].
Because UDP is a connection-less datagram protocol, there is no actual connection
between the client and server processes. Each datagram is a self-contained unit of
data with the socket address of its destination. Therefore, a client can use the UDP
socket to send datagrams right after it is created. However, the server has to bind
a port number to the UDP socket it created for the service so that clients can send
datagrams to this port number (plus an IP address of the server host). Recall that a
socket on the Internet is defined by a pair of (IP address, port number).
The datagram received by the server process also contains the IP address and the
port number of the UDP socket of the client process. After receiving the first
datagram from the client, the server can use this pair of (IP address, port number)
to send its own datagrams back to the client.
Reading 5.A: Read §8.1 [UNP3ev1, 239-41] for a brief
overview of basic systems calls for UDP network applications as
well as UDP sockets and datagrams.
5.2 recvfrom() and sendto() Functions
Before we go to the code of the UDP example, we need to study the two functions, recvfrom() and sendto(), used to receive and send datagrams through
UDP sockets.
Reading 5.B: Read §8.2 [UNP3ev1, 240-1] for the details of the
interfaces of functions recvfrom() and sendto().
In particular, the first three arguments and the return value are the same as in functions read() and write(). The data pointed to by the second argument, void *buff,
are the data of the datagram to send or receive. The third argument, the data size
of the datagram, can be 0 and therefore the return value, i.e., the actual data size
of the datagram received or sent, can be 0 as well. This is not an error and simply
means that the data of the datagram is empty (but the datagram packet is not empty
because it contains UDP and IP headers).
The fifth and sixth arguments of sendto() specify the socket address (IP address
plus port number) of the UDP socket in the receiving process. The socket address
(IP address plus port number) of the sending process will be included in the datagram packet automatically by the kernel.
The fifth and sixth arguments of recvfrom() return the socket address (IP address
plus port number) of the UDP socket of the sending process, extracted from the
datagram packet. A NULL for the fifth argument (the sixth must then be NULL as well) means that the receiving process is
not interested in the socket address of the source of the datagram.
5.3 A UDP Echo Example
The simple TCP echo application we have seen in Module 4 is now re-implemented
using UDP.
The source code for the server consists of
udpcliserv/udpserv01.c and lib/dg_echo.c.
Reading 5.C: Read §8.3-4 [UNP3ev1, 241-4] for the details of
the source code for the UDP echo server.
Note that the socket address structure for the client socket, cliaddr, is declared in
the main program and its address is passed to function dg_echo(). Also note
that the actual socket address of the socket of the client is updated on the arrival
of each datagram. It is extracted from the datagram received and used for sending
the echo back to the corresponding client. Therefore, this server can serve multiple
clients at the same time, even though the incoming datagrams from the different
sources are interleaved.
The source code for the client consists of two programs: udpcliserv/udpcli01.c
and lib/dg_cli.c.
Reading 5.D: Read §8.5-6 [UNP3ev1, 244-5] for the details of
the source code for the UDP client.
As shown in function dg_cli(), the client uses fgets() to get a line string from
the standard input and put it in the buffer pointed to by sendline. Then it calls
sendto() to send the line. However, function strlen() returns the length of the line,
which does not include the terminating '\0'. In other words, the terminating null
character '\0' is not sent to the server, and the string echoed back from the server, of
course, does not have the terminating '\0' either. This is the reason why function dg_cli()
needs to append the terminating '\0' to the data echoed back in recvline before
it calls fputs().
Activity 5.E: Compile the source code for this UDP echo example and run the server and the client on two separate machines.
Experiment with the example to show that it works correctly. Note
that UDP is unreliable. The client can hang at recvfrom() if the
datagram sent to the server or the echo sent back to the client is
lost. Read §8.7 [UNP3ev1, 245-6] for the comments on this possibility.
5.4 Improvement of Echo Example
The client of the UDP echo example above can receive datagrams from any source.
Since the fifth argument of the recvfrom() call in the function dg_cli() is NULL,
the client has no record of the source of the datagram received. Therefore,
it cannot check whether the datagram received is from the server with which it communicates.
An improved version of function dg_cli() can be found in udpcliserv/dgcliaddr.c.
Reading 5.F: Read §8.8 [UNP3ev1, 246-8] for the new version
of dg_cli() in udpcliserv/dgcliaddr.c, which enables the client to verify the
source of the datagrams received and ignore those which are not
from the server with which it communicates.
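The check itself is a comparison of two socket addresses. The sketch below is an illustration only and is not the code in udpcliserv/dgcliaddr.c: the names same_endpoint() and make_addr() are hypothetical, and a field-by-field comparison of IPv4 addresses is assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Returns 1 if the two IPv4 socket addresses name the same endpoint
 * (same family, port and IP address), 0 otherwise. */
int same_endpoint(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return a->sin_family == b->sin_family &&
           a->sin_port == b->sin_port &&
           a->sin_addr.s_addr == b->sin_addr.s_addr;
}

/* Hypothetical helper: builds an IPv4 socket address from text and port. */
struct sockaddr_in make_addr(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    inet_pton(AF_INET, ip, &sa.sin_addr);
    return sa;
}
```

A client would pass a sockaddr_in and its length as the fifth and sixth arguments of recvfrom() and drop the datagram when same_endpoint() returns 0 for the address it sent to. A reply that arrives from a different IP address of the same host is rejected by this test as well.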
However, this fix will cause another problem if:
the server has multiple interfaces and its kernel chooses a different interface
for the outgoing datagrams it sends back to the client; or
the client sends the datagrams to a non-primary IP address of the interface.
In both cases, the IP address of the server in the outgoing datagrams sent back to
the client will be different from the one the client uses to send the datagram to the
server. As a result, the client will ignore the echo from the server.
A solution to this new problem is to have the server create multiple sockets and
bind each of its IP addresses to a separate socket explicitly. The server then has to use function
select() or poll() to wait for the datagrams from all the sockets simultaneously.
The details of functions select() and poll() can be found in Chapter 6 of the
text [UNP3ev1, 153-89].
5.5 Asynchronous Error and Connected UDP Socket
The normal operation with the UDP socket in the client as above is to use function
sendto(), which specifies the socket address of the server as the destination for
each datagram to be sent. As far as the UDP socket is concerned, it does not
“connect” to any server: the client can use it to send datagrams to any server.
If the application uses the UDP socket for only one server, it is better to “connect”
it to the socket address of the server. This “connection” is by no means the same as
the connection we have seen in the TCP protocol. Here, connection simply means
that any datagram sent (received) through this “connected” UDP socket is to (from)
the specific socket address of the server.
There are several advantages to using connected UDP sockets. One of them is that
the client will be able to detect asynchronous errors in the network. For example, an
asynchronous error occurs if the client in the UDP echo example above sends a
datagram to a non-existing server (the server has not started yet). The sendto()
function call returns without error, because this function simply puts the datagram
in the output buffer of the UDP of the client. The error (the server not being
reachable) is not known to the UDP of the client until the datagram has actually been
sent to the non-existing server and an ICMP error comes back reporting that the port
of the server is unreachable. This error is called an asynchronous error because it
does not occur at the same time as the sendto() call.
The problem with the unconnected UDP sockets in relation to asynchronous error
is that the client process will not be notified of any asynchronous error. In the
UDP echo example above, the client would wait at
recvfrom() indefinitely, even
though the UDP of the client has received the asynchronous error telling that the
server does not exist.
Reading 5.G: Read §8.9 [UNP3ev1, 248-9] for the details of the
problem of unconnected UDP socket in relation to asynchronous
error.
By using a connected UDP socket, the client will receive the asynchronous error
from its UDP when it waits to receive the echo from the (non-existing) server.
Additional benefits of using connected sockets include:
Performance will be improved in some UDP implementations.
The client will receive the datagrams only from the connected server.
A UDP connection is made by calling function
connect() with the socket address
of the peer after the UDP socket is created. The datagrams can be sent and received
by using
write() and read(), respectively.
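A minimal sketch of this call sequence follows. It is not the textbook client: the function name probe_udp_server() is hypothetical, the server is assumed to be on the loopback address, and a one-second receive time-out (socket option SO_RCVTIMEO) is added so that the sketch cannot block forever.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Connects a UDP socket to 127.0.0.1:port, writes one datagram and tries
 * to read a reply. Returns 0 if a reply arrived, otherwise the errno
 * value reported by the failing call. */
int probe_udp_server(unsigned short port)
{
    struct sockaddr_in srv;
    struct timeval tv = { 1, 0 };       /* give up after one second */
    char buf[64];
    int result = 0;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return errno;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    /* connect() sends nothing: the kernel only records the peer address.
     * After it, plain write() and read() replace sendto() and recvfrom(). */
    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        write(fd, "ping\n", 5) < 0 ||
        read(fd, buf, sizeof(buf)) < 0)
        result = errno;
    close(fd);
    return result;
}
```

When no server is bound to the port, many systems report the ICMP error to the connected socket and the function returns ECONNREFUSED. If the ICMP message is filtered or never generated, the time-out expires instead and the result is EAGAIN or EWOULDBLOCK.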
Reading 5.H: Read §8.11-2 [UNP3ev1, 252-7] for the details of
1. connected UDP sockets
2. a new version of dg_cli() for the client using the connected
UDP socket.
Activity 5.I:
1. Repeat the experiment described in §8.9 to show that the
asynchronous error is not received by the client.
2. Compile the new client using the new version of dg_cli() in
udpcliserv/dgcliconnect.c and repeat the experiment
described in §8.12 to show that the asynchronous error is
received by the client.
5.6 Lack of Flow Control and Reliability with UDP
UDP does not have flow control and datagrams can be lost on the way to the destination host.
The UDP of the receiving host is simply a datagram demultiplexor using the port
number to dispatch datagrams to the UDP sockets. Each socket has a receiving
buffer with limited size. When the buffer is full (e.g., the application may be slow
to take the datagrams from it), the arriving datagrams will be discarded.
§8.13 shows an experiment in which the client sends large datagrams continuously
and the server loses many datagrams.
Reading 5.J: Read §8.13 [UNP3ev1, 257-61] for the details of
1. new version of dg_cli() in udpcliserv/dgcliloop1.c
and new version of dg_echo() in
udpcliserv/dgecholoop1.c.
2. the experiment with the new client
udpcliserv/udpcli06.c and the server
udpcliserv/udpserv06.c.
3. new version of dg_echo() in
udpcliserv/dgecholoop2.c to change the UDP receiving buffer size.
In the experiment shown in Figure 8.21 of the text, we can see that among the total
2000 datagrams sent by the client, only 1994 reached the UDP of the server host.
Among 1994 datagrams received by the server host, only 1912 datagrams were
delivered to the server process and 84 datagrams were discarded.
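The size of the receiving buffer of a socket is controlled with the socket option SO_RCVBUF. The sketch below only shows the calls involved and is not the code of udpcliserv/dgecholoop2.c; the function name udp_rcvbuf_after_request() is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Asks the kernel for a receive buffer of `bytes` on a new UDP socket and
 * returns the size the kernel reports afterwards, or -1 on error. */
int udp_rcvbuf_after_request(int bytes)
{
    int actual = -1;
    socklen_t len = sizeof(actual);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        actual = -1;
    close(fd);
    return actual;
}
```

The value reported by getsockopt() need not equal the value requested: kernels may round it, double it for bookkeeping, or cap it at a system-wide limit. Reading the option back is therefore the only reliable way to know the buffer size actually in force.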
Activity 5.K:
1. Repeat the experiment with the client
udpcliserv/udpcli06.c and the server
udpcliserv/udpserv06.c using two machines in your
environment. Find out how many datagrams are (1) lost
on the way to the server host and (2) discarded due to a full
receiving buffer.
2. Repeat the experiment above with the
server
udpcliserv/udpserv07.c using
udpcliserv/dgecholoop2.c to see the effect of changing
the receiving buffer size.
5.7 Outgoing IP Address
Connecting a UDP socket to a remote socket address using function connect()
also chooses the outgoing local IP address (and the port number) and attaches it to
the socket. This IP address is chosen by the kernel to be the primary IP address
of the interface determined by searching the routing table for the IP address of the
remote socket.
Reading 5.L: Read §8.14 [UNP3ev1, 261-2] for a new client
in
udpcliserv/udpcli09.c to show how to get the outgoing IP
address of a connected UDP socket.
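The local address chosen by the kernel can be read back with getsockname(). The following is a minimal sketch of the idea, not the code of udpcliserv/udpcli09.c; the function name outgoing_ip() is hypothetical and only IPv4 is handled.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connects a UDP socket to ip:port and writes the local IP address chosen
 * by the kernel into `out`. Returns 0 on success, -1 on error. */
int outgoing_ip(const char *ip, unsigned short port,
                char *out, socklen_t outlen)
{
    struct sockaddr_in peer, local;
    socklen_t len = sizeof(local);
    int rc = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    /* connect() on a UDP socket sends nothing; it triggers the route
     * lookup that fixes the local address of the socket. */
    if (inet_pton(AF_INET, ip, &peer.sin_addr) == 1 &&
        connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, outlen) != NULL)
        rc = 0;
    close(fd);
    return rc;
}
```

Because no datagram is sent, the function can be used to ask which local address would be used to reach a given destination, even when nothing is listening there.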
5.8 Summary
We have discussed the basic system call sequences for the client and server for
UDP applications and shown a simple UDP echo example application.
We also discussed some problems associated with the first version of the simple
echo example such as receiving datagrams from processes other than the server
and not being able to receive asynchronous errors. We presented the solutions to
these problems and showed the new versions of the application.
We discussed connected UDP sockets with function
connect() calls.
Lastly, we showed some experiments to illustrate that UDP is not reliable and lacks
flow control.
This module completes our discussion in the first part of this course: Elementary
Sockets. After this module, you should be able to write simple TCP and UDP
network applications.
In the next part of the course, we will get into the details of a real network application: Trivial File Transfer Protocol (TFTP) implemented in both UDP and TCP to
learn the software methodology for developing network applications.
Part II
Advanced Sockets

Module 6
Daemon Processes and inetd
Superserver
This module does two things:
It introduces the concept of daemon process and the code to convert an ordinary process to be a daemon.
It introduces the most widely used daemon for Internet applications called
inetd.
A daemon is a background process which is independent of control from all terminals. Most daemons are created when the system is booted and stay alive as long
as the system is running. The purpose of daemon processes is to provide various
system services including network services. Examples of these services include the
printing service (daemon lpd), the file system mount service (daemon mountd) and, of
course, the Internet services such as ftp, telnet and rlogin (daemons ftpd, telnetd
and rlogind).
Among many commonly used daemons, the superserver daemon inetd is a special
one. It acts like a concurrent server, but its purpose is not to provide a particular
Internet service, but rather to create various Internet servers upon the requests from
the clients. This is why it is called the Internet Superserver. inetd is very important,
because it has traditionally been run on most UNIX hosts on the Internet.
Module contents
6.1 What is a Daemon?
6.2 syslogd Daemon and syslog and openlog System calls
6.3 Converting a Process to a Daemon
6.4 inetd Daemon
6.5 daemon_inetd() function
6.6 Summary
The text for this module is Chapter 13 of Stevens’ UNIX Network Programming
(3rd Edition), Vol. 1, referred to as UNP3ev1.
Main Aims:
understand what daemon processes are;
know how the syslogd daemon works;
know how to use system calls syslog and openlog;
understand what the inetd daemon is and how it works; and
understand files /etc/syslog.conf and /etc/inetd.conf.
6.1 What is a Daemon?
A daemon is a background process which:
is not attached to any terminals; and
uses syslogd for log and error messages.
Because a daemon is not associated with any terminal, the standard file descriptors
0, 1, and 2 have no meaning. For example, it cannot use
printf to print messages to the standard output, stdout. As a consequence, it must use system call
syslog to send messages to daemon syslogd, which forwards the messages to the
appropriate log files.
Reading 6.A: Read §13.1 [UNP3ev1, 363-4] for the concept
of daemon, how a daemon is normally started and why a daemon
should not be attached to any terminal.
[Diagram not reproduced. Its labels are: Daemons, kernel, /var/run/log, port 514,
/dev/klog, syslogd, log files, /dev/console, /var/log/cisco.log.]
Figure 6.1: Function of Daemon syslogd
6.2 syslogd Daemon and syslog and openlog System calls
Daemon syslogd is used to forward log messages from all other daemons to the
appropriate log files. Figure 6.1 illustrates the basic function of syslogd. The
path name /var/run/log is a UNIX domain socket on which syslogd receives
messages from local processes, and /dev/klog is the device from which it reads
messages from the kernel. A UDP socket bound to port 514 can also
be used to receive messages from other hosts. The job of syslogd is to process and
forward the log messages it receives to the appropriate log files according to the
type of the log messages.
Reading 6.B: Read §13.2 [UNP3ev1, 364-5] for the purpose and
function of
syslogd
Each log message is classified by level and facility. File /etc/syslog.conf defines how to handle the messages with a certain level and facility. The first
field of each line specifies a list of facility.level pairs and the second field the log
file name (or other destination) to which the matching log messages are sent.
Activity 6.C: Find and read file /etc/syslog.conf on your
LINUX system. Figure out how
syslogd of your LINUX processes various log messages.
A daemon process sends log messages to syslogd by calling
void syslog(int priority, const char *message, …)
Here argument priority is a combination of a level and a facility. The path to
syslogd is set up in one of two ways:
either the first call to syslog creates a UNIX domain datagram socket and connects
it to path name /var/run/log (alternatively, a process can create a UDP socket
and send its messages to port 514 of the local host, IP = 127.0.0.1),
or the process first calls
void openlog(const char *ident, int options, int facility)
to set the identification string, the options and the default facility (with
option LOG_NDELAY the socket is created by this call rather than by the first
call to syslog).
Reading 6.D: Read §13.3 [UNP3ev1, 365-7] for the details of
system calls
syslog and openlog.
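The two calls can be combined as in the following minimal sketch. The function names make_priority() and log_startup() and the identification string passed by the caller are hypothetical; only openlog(), syslog(), closelog() and the LOG_ constants come from the standard header syslog.h.

```c
#include <syslog.h>

/* Combines a facility and a level into the priority argument of syslog(). */
int make_priority(int facility, int level)
{
    return facility | level;
}

/* Logs one start-up notice under the name `ident`. */
void log_startup(const char *ident)
{
    /* LOG_PID adds the process ID to every message; LOG_DAEMON becomes
     * the default facility for later syslog() calls. */
    openlog(ident, LOG_PID, LOG_DAEMON);
    syslog(make_priority(LOG_DAEMON, LOG_NOTICE), "%s started", ident);
    closelog();
}
```

Where the message ends up depends on the line in /etc/syslog.conf that matches the facility and level used here. If no syslogd is running, the message is normally discarded without any error being reported to the caller.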
6.3 Converting a Process to a Daemon
To become a daemon, the process needs to become a background process and, more
importantly, it has to be dissociated from the controlling terminal of the original
process. You also have to make sure that it will not be able to acquire a controlling
terminal in the future. The steps to accomplish these requirements are encapsulated
in a function called daemon_init() presented in §13.4 [UNP3ev1, 367-71]. Basically, it:
1. calls fork() and lets the parent terminate, so that it continues as the background
child process, which is definitely not a process group leader (thus it can call
setsid()),
2. calls setsid() to become the leader of a new session and dissociate itself from
any controlling terminal,
3. sets the handler to ignore SIGHUP, and
4. calls fork() again to make sure the daemon (the second child) will not acquire any
controlling terminal in the future.
The code of daemon_init() is shown in Figure 13.4 of [UNP3ev1, 368].
Reading 6.E: Read §13.4 [UNP3ev1, 367-71] for the details of
how daemon_init() works and the reason for each step in it.
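The four steps can be sketched as follows. This is a simplified illustration and not the code of Figure 13.4: the name daemon_init_sketch() is hypothetical, and as an assumption of this sketch only descriptors 0, 1 and 2 are redirected to /dev/null instead of closing every open descriptor.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <syslog.h>
#include <unistd.h>

/* Turns the calling process into a daemon. Returns 0 in the daemon (the
 * second child) and -1 on error; the original process and the first
 * child terminate inside the function. */
int daemon_init_sketch(const char *ident)
{
    pid_t pid;
    int fd;

    if ((pid = fork()) < 0)
        return -1;
    if (pid > 0)
        _exit(0);                   /* step 1: the parent terminates */

    if (setsid() < 0)               /* step 2: new session, no terminal */
        return -1;
    signal(SIGHUP, SIG_IGN);        /* step 3: ignore SIGHUP */

    if ((pid = fork()) < 0)
        return -1;
    if (pid > 0)
        _exit(0);                   /* step 4: the session leader terminates */

    /* Simplification: only the three standard descriptors are redirected. */
    if (chdir("/") < 0)
        return -1;
    if ((fd = open("/dev/null", O_RDWR)) >= 0) {
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
        dup2(fd, STDERR_FILENO);
        if (fd > STDERR_FILENO)
            close(fd);
    }
    openlog(ident, LOG_PID, LOG_DAEMON);
    return 0;
}
```

After the function returns, the process is no longer a session leader (its session ID differs from its process ID), which is what prevents it from acquiring a controlling terminal later. From this point on all messages should go through syslog().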
As an example, the daytime server we studied previously was converted to a daemon server. The new version of the code is shown in Figure 13.5 of [UNP3ev1,
371].
Activity 6.F: In the code in Figure 13.5 of [UNP3ev1, 371], we
cannot find any place where the daytime daemon calls syslog to
log any messages. Instead it calls err_quit() or err_msg() to
dump some messages. Trace these two calls to find out
where and how syslog() is called
where these log messages end up
Activity 6.G: Answer the question of Exercise 13.1 at the end of
Chapter 13 [UNP3ev1, 380]
6.4 inetd Daemon
The purpose of the inetd daemon is to reduce the total number of server processes
for Internet services that are not heavily used. It also simplifies the code of these
servers, because they do not need to go through the same procedure to create a socket
and bind it to a port, or to daemonize themselves. These repetitive operations are
factored out and put into the code of inetd. That is, the reasons for inetd are:
to reduce the number of system daemons; and
to simplify writing daemon processes.
The technique used by inetd to start the corresponding servers according to the
requests from the clients is to use function select(). The details of function
select() can be found in Chapter 6 of the text [UNP3ev1, 153-89]. Figure 13.7
in Stevens [UNP3ev1, 374] shows the control flow of inetd.
Basically, what inetd does is:
create a socket (UDP or TCP) for each Internet server listed in file /etc/inetd.conf.
The line format of /etc/inetd.conf is described in Figure 13.6 in Stevens
[UNP3ev1, 372]. The first and third fields of each line define the service
name and the protocol (tcp or udp) of the corresponding server. inetd uses
this information to consult file /etc/services to find the well-known port number for the service.
use function select() to wait on all these sockets. (See Chapter 6 of the
text for the details of how to use function select().) Once a connection
request (tcp) or a datagram (udp) arrives, inetd spawns a child process (using
fork()) to invoke the corresponding server by using function exec().
The 6th field of each line in /etc/inetd.conf is the path name of the binary
executable of the server and the 7th field the arguments for exec().
Of course, there are many details. For example, the spawned child closes all the other
sockets inherited from inetd except the one it uses for the service.
There are two delicate points which you should be clear about:
If the request from the client is a udp datagram, inetd must not wait on the
socket of that service in select() again until the child server exits. The
reason is that the datagram stays on the socket until the child reads it, so
select() would report the socket as readable again and inetd could spawn
another child server to service the same request.
The TCP servers started by inetd upon the requests of clients are all one-off
servers. That is, each of them provides the service for one particular
connection only. The server exists until the client closes the connection.
These servers are not TCP concurrent servers: they cannot accept connection
requests from other clients. For busy tcp services such as http, it
is better for the server to be a concurrent server itself, because it could
overload inetd if every connection had to be started through it.
Reading 6.H: Read §13.5 in Stevens [UNP3ev1, 371-7] for the details
of inetd.
Activity 6.J: Find and read file /etc/inetd.conf of your
LINUX system. Figure out what TCP and UDP Internet services
are available on your system.
Activity 6.K: Answer the questions of Exercise 13.2-4 of Chapter 13 in Stevens [UNP3ev1, 380].
6.5 daemon_inetd() function
§13.6 in Stevens [UNP3ev1, 377-9] shows how to make a daemon server to be
started by inetd. Function daemon_inetd() in Figure 13.11 in Stevens is used
to initialize the log mechanism. Figure 13.12 in Stevens shows the example of
the Daytime server as an Internet server started by inetd.
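What remains of a server once inetd has done the socket work is very little. The following minimal sketch is not the code of Figure 13.12: the function name write_daytime() is hypothetical, and it assumes the usual convention that inetd hands the connected socket to the server as descriptors 0, 1 and 2.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Writes the current time as a 26-byte daytime reply to `fd`.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_daytime(int fd)
{
    char buf[64];
    time_t now = time(NULL);

    /* ctime() yields 24 characters of text followed by '\n'; the reply
     * keeps the 24 characters and ends the line with "\r\n". */
    snprintf(buf, sizeof(buf), "%.24s\r\n", ctime(&now));
    return write(fd, buf, strlen(buf));
}
```

A server started by inetd would, under the assumption above, simply call write_daytime(0) and exit. It needs no socket(), bind(), listen() or accept() calls, and it does not have to daemonize itself.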
Activity 6.L: Study and compare the daytime servers in Figures
13.12 and 13.5 in Stevens. Indicate the differences between them
and explain the reasons for them.
6.6 Summary
This module covers two aspects of network server programming:
how to make a server a daemon process; and
how to configure an Internet daemon server to be invoked by the inetd daemon.
It touches upon
syslogd daemon, because all daemon servers need to log various
messages through it.
inetd daemon is one of the major focuses of this module,
because it is so important for Internet applications.
You should know:
how to write the code for a daemon server;
how to write a server to be invoked by inetd; and
how to configure files /etc/syslog.conf and /etc/inetd.conf.
Activity 6.M: Try to find the source code of inetd of your
LINUX system and study it.
Module 7
Advanced UDP Sockets
This module addresses advanced uses of the UDP protocol. You already
know that UDP is a simple unreliable datagram protocol. It is not as comprehensive as TCP, which is a reliable byte-stream protocol. The operation and implementation of TCP are much more complicated than those of UDP, because TCP uses
the techniques of time-out and retransmission as well as sliding-window flow
control to guarantee that each byte of the byte-stream is delivered to the application at
the other end of the connection. TCP also uses other techniques such as slow
start and exponential back-off to avoid congestion in the network. UDP uses none
of these.
However, there are many applications to justify the use of UDP. The attractiveness
of UDP is its low cost. For the applications with simple request-reply protocols,
UDP is sufficient. But, we often need to add time-out and retransmission to the
UDP for the reliable datagram delivery. The major focus of this module is to show
how to add time-out and retransmission to the applications using UDP. The material
used for this purpose is §22.5 in Stevens [UNP3ev1, pp597-608].
The algorithms and techniques for reliability described in this module are the same
as those implemented in TCP. In essence, we are reinventing part of the
TCP wheel at the UDP application level. A benefit of this is that you are learning some implementation techniques of TCP. Of course, the whole implementation
of TCP is beyond the scope of this course.
In order to understand the programs in §22.5 in Stevens, you need to be familiar
with the advanced I/O operations sendmsg() and recvmsg() as well as readv() and
writev() covered in §14.4-5 in Stevens. You also need to study the technique of
sigsetjmp() and siglongjmp() to handle signals. The topic is covered in §20.5
of Stevens.
Module contents
7.1 UDP versus TCP
7.2 Advanced I/O with sendmsg() and recvmsg()
7.3 Advanced Signal Handling
7.4 Jacobson/Karels' Algorithm for RTO and Retransmission
7.5 Adding Reliability to UDP Applications
7.6 Summary
The material for this module is the following sections in Stevens:
§22.1 and §22.4 for comparison of TCP and UDP
§14.4-5 for I/O operations sendmsg() and recvmsg()
§20.4-5 for advanced signal handling with sigsetjmp() and siglongjmp()
§22.5 for time-out and retransmission
Main Aims:
understand when to use UDP and TCP;
know how sendmsg() and recvmsg() work;
know how to use sigsetjmp() and siglongjmp() for advanced signal handling;
understand the Jacobson/Karels algorithm for retransmission time-out; and
understand the implementation of time-out and retransmission.
7.1 UDP versus TCP
TCP and UDP are two different transport protocols for network applications. It is
important to choose the right one for your application.
To make the right decision, you need to understand the major differences between
them and the advantages of each.
The advantages of UDP lie in its (1) low cost and (2) capability to support broadcast
and multicast. The limitation of UDP is that it is a simple protocol without
reliability. But this can be overcome by adding time-out and retransmission to
your application. For many applications with simple request-reply patterns, UDP
plus application-level reliability is a simple and efficient solution.
Reading 7.A: Read §22.1 and §22.4 of Stevens [UNP3ev1,
pp621-2, pp594-597].
Activity 7.B: Answer the following questions:
1. What kinds of applications should use TCP and UDP, respectively? Why?
2. Why do TFTP and NFS use UDP despite the fact that they
both transfer bulk data?
3. Why do many network applications such as NFS and RPC have
both TCP and UDP implementations?
7.2 Advanced I/O with sendmsg() and recvmsg()
Almost all UDP applications define their own format for the messages or packets
to be exchanged between the clients and the servers. These packets normally have
many fields. Before sending a packet, the sender process can copy the data of the
various fields from their different places into a single buffer and then send the packet
in the buffer using sendto(). Similarly, the receiver has to unpack the packet after
receiving it using recvfrom(). You have seen this strategy in the implementation
of TFTP in Module ??.
However, you can save these operations of packing and unpacking by using the advanced I/O operations
sendmsg() and recvmsg() instead.
§14.4 of Stevens [UNP3ev1, pp389-90] includes the definition of data type iovec
as follows (LINUX version):

/* Structure for scatter/gather I/O. */
struct iovec
{
    void *iov_base;    /* Pointer to data. */
    size_t iov_len;    /* Length of data. */
};

Here iov_base is a pointer to a data buffer and iov_len the size of the buffer.
Functions readv() and writev(), called scatter read and gather write, respectively,
each complete an atomic I/O operation between a set of iovec structures in memory and the
file/socket descriptor.
Reading 7.C: Read §14.4 of Stevens [UNP3ev1, pp389-90] for
the details of the interfaces and semantics of functions
readv() and writev().
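A gather write can be illustrated with a header and a body kept in two separate buffers. This is a minimal sketch with a hypothetical function name, write_two_parts(); it works on any descriptor, including a pipe or a socket.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Writes a header and a body held in separate buffers with one writev()
 * call. Returns the total number of bytes written, or -1 on error. */
ssize_t write_two_parts(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    /* The casts drop const: writev() only reads from these buffers. */
    iov[0].iov_base = (void *)header;
    iov[0].iov_len = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = strlen(body);
    return writev(fd, iov, 2);
}
```

On a UDP socket the two buffers leave the host as a single datagram, so no intermediate copy into one packing buffer is needed.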
More general I/O functions are sendmsg() and recvmsg(), which use an even more
general data structure to organize the data areas for I/O. This data structure is called
msghdr and is as follows (LINUX version):

/* Structure describing messages sent by
   `sendmsg' and received by `recvmsg'. */
struct msghdr
{
    __ptr_t msg_name;         /* Address to send to/receive from. */
    socklen_t msg_namelen;    /* Length of address data. */

    struct iovec *msg_iov;    /* Vector of data to send/receive into. */
    size_t msg_iovlen;        /* Number of elements in the vector. */

    __ptr_t msg_control;      /* Ancillary data (eg BSD filedesc passing). */
    size_t msg_controllen;    /* Ancillary data buffer length. */

    int msg_flags;            /* Flags on received message. */
};

One object of this type bundles the socket address (msg_name and msg_namelen)
and a vector of iovec (msg_iov and msg_iovlen) together to be used by functions
sendmsg() and recvmsg(). msg_name and msg_namelen are used only if the
socket is not connected to the destination.
The interfaces of sendmsg() and recvmsg() have fewer arguments than those of sendto()
and recvfrom(), as shown below:
ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags)
ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags)
The third argument, flags, is used to control how the message is transmitted.
Reading 7.D: Read §14.5 of Stevens [UNP3ev1, pp390-5] for
the detailed description of functions
sendmsg() and recvmsg()
and data type msghdr.
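The following sketch sends a packet made of a sequence number and a text payload without packing them into one buffer, and receives it back into two separate areas. It is an illustration under assumptions and not code from the text: the function name seq_roundtrip() and the 4-byte header format are hypothetical, and two loopback UDP sockets stand in for the client and the server.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends a 4-byte sequence number plus a text payload as ONE datagram with
 * sendmsg(), then splits it again with recvmsg(). Returns the payload
 * length received, or -1 on error. */
ssize_t seq_roundtrip(uint32_t seq, const char *text,
                      uint32_t *seq_out, char *text_out, size_t outsize)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct iovec out[2], in[2];
    struct msghdr smsg, rmsg;
    uint32_t wire = htonl(seq), got = 0;
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    out[0].iov_base = &wire;
    out[0].iov_len = sizeof(wire);
    out[1].iov_base = (void *)text;
    out[1].iov_len = strlen(text);
    memset(&smsg, 0, sizeof(smsg));
    smsg.msg_name = &addr;          /* needed: the socket is unconnected */
    smsg.msg_namelen = sizeof(addr);
    smsg.msg_iov = out;
    smsg.msg_iovlen = 2;

    in[0].iov_base = &got;
    in[0].iov_len = sizeof(got);
    in[1].iov_base = text_out;
    in[1].iov_len = outsize > 0 ? outsize - 1 : 0;
    memset(&rmsg, 0, sizeof(rmsg));
    rmsg.msg_iov = in;
    rmsg.msg_iovlen = 2;

    /* bind() to port 0, then getsockname() fills in the real port in
     * addr, which smsg.msg_name points to. */
    if (rx >= 0 && tx >= 0 && outsize > 0 &&
        bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(rx, (struct sockaddr *)&addr, &alen) == 0 &&
        sendmsg(tx, &smsg, 0) >= 0 &&
        (n = recvmsg(rx, &rmsg, 0)) >= (ssize_t)sizeof(got)) {
        n -= (ssize_t)sizeof(got);
        text_out[n] = '\0';
        *seq_out = ntohl(got);
    } else {
        n = -1;
    }
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

The receiver fills the iovec areas in order, so the first four bytes of the datagram land in the sequence number and the rest in the text buffer.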
7.3 Advanced Signal Handling
In order to understand the code for time-out in the UDP example with reliability
added, you need to know a pair of functions sigsetjmp() and siglongjmp().
Functions sigsetjmp() and siglongjmp() evolved from functions setjmp()
and longjmp(), which are used to realize a non-local goto in C. A non-local goto is a
jump from one function back into another function that is still active, which cannot
be done by using goto in C. A sigsetjmp() (or setjmp()) sets the target for
non-local gotos and a siglongjmp() (or longjmp()) makes a non-local goto.
Why do we need a non-local goto? Well, there are some situations where the non-local
goto is essential. A normal (local) goto can change the control only within
the function. The normal way to transfer control back to the calling function is
to execute a return statement. A non-local goto allows the control to be transferred to
any function which calls the function either directly or indirectly (i.e. any function
in the call sequence leading to the function).
The interfaces of sigsetjmp() and siglongjmp() are as follows:
int sigsetjmp(sigjmp_buf env, int savemask)
void siglongjmp(sigjmp_buf env, int val)
where argument env of type sigjmp_buf is some form of array that is capable of
holding all the information required to restore the state of the stack
when we call siglongjmp(). The return value of sigsetjmp() is 0 when it is
called directly. When siglongjmp() is called, the non-local goto makes
sigsetjmp() return again, and this time its return value is the second argument
val of siglongjmp() (or 1 if val is 0).
siglongjmp() is normally called in signal handlers. By doing so, we can control
the place to return to from the signal handler no matter when the signal is delivered.
If savemask is nonzero, sigsetjmp() also saves the current signal mask in env.
When siglongjmp() is called, the signal mask is restored.
The signal mask is a signal set of type sigset_t, a bitmap type of structure, used to
block certain signals from being delivered (a signal is generated by a hardware or
software event, but can be held by the kernel until it is delivered to the process). The
reason we want to save the current signal mask when sigsetjmp() is called is
that the kernel automatically changes the signal mask by adding the signal
to it while the signal handler runs. The purpose of this is obviously that we do not
want another signal of the same kind to be delivered while we are handling this signal.
But when we call siglongjmp() to get out of the signal handler, we need to restore the
original signal mask so that a second signal of the same kind can be caught again. The
design of siglongjmp() ensures just that.
Therefore, the normal code pattern of using them for signal handling is as follows:

#include <setjmp.h>
#include <signal.h>

static sigjmp_buf jmpbuf;
static void handler(int signo);

/* a function in which we want to respond to the signal no
   matter when the signal is delivered */
void
any_function(…)
{ . .
    /* set signal handler */
    signal(SIGALRM, handler);
    . .
    /* we want to respond to the signal here */
    if (sigsetjmp(jmpbuf, 1) != 0) {
        …. /* code to respond to the signal, e.g. leave the loop */
    }
. . }

/* signal handler */
static void
handler(int signo)
{
    siglongjmp(jmpbuf, 1);    /* jump back to the sigsetjmp() above */
}
An example of advanced signal handling with sigsetjmp() and siglongjmp()
is shown in §20.4 and §20.5 of Stevens. Figure 20.5 of Stevens [UNP3ev1, pp537]
shows the code of bcast/dgclbcast.c, a new version of the dg_cli function of our
elementary UDP example discussed in Module 5, modified to illustrate broadcast sockets.
Reading 7.E: Read §20.4 of Stevens [UNP3ev1, pp535-8].
In the code of Figure 20.5 of Stevens [UNP3ev1, pp537], after setting an alarm
for 5 seconds (line 16) we have a for loop (lines 17-30) to receive all the daytime
messages from the hosts on the local network. This is because the socket is a
broadcast socket as set in line 12. The signal handling mechanism for SIGALRM
in the code (which we have been using so far) is based on the expectation that the
signal will be delivered while waiting on the blocking I/O, recvfrom(), so that
recvfrom() returns with the error code EINTR, which tells us that the signal has been
delivered.
This code has a subtle problem with race conditions, which are common in network programming. The problem is that we cannot guarantee that the signal SIGALRM will be
delivered during the execution of recvfrom(). It can happen at any time. For instance,
it can happen during the printf statement in lines 28-9, and this will cause
the for loop to miss the signal and wait on recvfrom() forever.
§20.5 offers 4 solutions to this problem: one incorrect (first one) and three correct.
You are required to read and understand the first and third solutions.
Reading 7.F: Read §20.5 of Stevens [UNP3ev1, pp538-47],
skipping the second (using pselect()) and the fourth (using IPC)
solutions.
The correct solution using sigsetjmp() and siglongjmp() is shown in Figure 20.9 of Stevens [UNP3ev1, pp544].
Activity 7.G:
Explain why the first solution using signal blocking/unblocking only is incorrect. Show a scenario of events
to demonstrate that this solution is still incorrect.
Explain why the third solution using sigsetjmp() and
siglongjmp() is correct.
7.4 Jacobson/Karels' Algorithm for RTO and Retransmission
The basic technique for reliable transmission of data packets is:
use sequence numbers to identify each data packet and establish the correspondence between the data packet and its reply; and
retransmit the same data packet if its reply is not received within a predetermined time called Retransmission Time-Out (RTO).
The value of RTO is important, because it affects the performance of the application. It should be close to the actual
Round-Trip Time (RTT) between the client and
the server. Since the actual RTT is constantly changing according to the level of
congestion in the Internet, RTO should also change dynamically to follow the
changing RTT.
Activity 7.H: In what way can the excessively small or large
RTO values degrade the performance of the application?
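The mathematical model for calculating RTO from the measured RTT was proposed by Jacobson and Karels and is as follows, where srtt is the smoothed RTT estimate and Dev is the smoothed mean deviation:
$$\mathit{Diff} = \mathit{RTT} - \mathit{srtt}$$
$$\mathit{srtt} = \mathit{srtt} + d \times \mathit{Diff}$$
$$\mathit{Dev} = \mathit{Dev} + h \times (|\mathit{Diff}| - \mathit{Dev})$$
$$\mathit{RTO} = \mu \times \mathit{srtt} + f \times \mathit{Dev}$$
Here, $d$ and $h$ are constants between 0 and 1. Normally we take $\mu = 1$ and $f = 4$; $d$ is normally $1/8$ and $h$ is $1/4$ or $1/8$.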
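As a rough illustration of the update rule above, the following C sketch applies one RTT measurement to the estimator. The structure name rto_state, the function names, and the starting values (srtt = 0 and Dev = 0.75 seconds, which give an initial RTO of 3 seconds) are assumptions made for this example and are not the textbook's code. The constants use the typical values quoted above.

```c
#include <assert.h>

/* Illustrative constants: the typical values quoted in the text. */
#define RTO_D   0.125   /* gain d applied to Diff when updating srtt */
#define RTO_H   0.25    /* gain h applied when updating Dev          */
#define RTO_MU  1.0     /* multiplier mu for srtt                    */
#define RTO_F   4.0     /* multiplier f for Dev                      */

/* Hypothetical state record; all times are in seconds. */
struct rto_state {
    double srtt;   /* smoothed RTT estimate       */
    double dev;    /* smoothed mean deviation     */
    double rto;    /* current retransmission time-out */
};

/* RTO = mu * srtt + f * Dev */
static double rto_compute(const struct rto_state *s)
{
    return RTO_MU * s->srtt + RTO_F * s->dev;
}

/* Assumed starting point: no RTT sample yet, RTO of 3 seconds. */
void rto_init(struct rto_state *s)
{
    s->srtt = 0.0;
    s->dev = 0.75;
    s->rto = rto_compute(s);
}

/* Fold one measured RTT into the estimator and refresh RTO. */
void rto_update(struct rto_state *s, double rtt)
{
    double diff = rtt - s->srtt;
    double absdiff = diff < 0 ? -diff : diff;

    s->srtt += RTO_D * diff;
    s->dev += RTO_H * (absdiff - s->dev);
    s->rto = rto_compute(s);
}
```

Starting from the assumed initial state, a single measured RTT of 1.0 second gives srtt = 0.125, Dev = 0.8125 and RTO = 3.375 seconds.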
RTT is measured from the time stamp of the request packet and the time at which its reply is received. RTO is updated every time the expected reply is received, and the fresh RTO is used for the next request.
The Jacobson/Karels model also includes exponential back-off, which doubles the RTO value every time a time-out occurs. This reduces the pressure on the congested network between the client and the server, since the time-out itself is a sign of that congestion.
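The back-off step can be sketched in a few lines of C. The upper bound on RTO (60 seconds) and the retransmission limit (3) are illustrative assumptions, as are the names backoff_state and backoff_on_timeout; the textbook package has its own limits and names.

```c
#include <assert.h>

#define BACKOFF_MAX_RTO    60.0  /* assumed ceiling for RTO, in seconds   */
#define BACKOFF_MAX_REXMT  3     /* assumed maximum number of retransmits */

/* Hypothetical per-packet state. */
struct backoff_state {
    double rto;     /* current RTO in seconds            */
    int    nrexmt;  /* time-outs seen for this packet    */
};

/*
 * Call on every time-out. Doubles the RTO (clamped to the ceiling) and
 * counts the time-out. Returns 0 if the packet should be retransmitted,
 * or -1 if the limit has been exceeded and the caller should give up.
 */
int backoff_on_timeout(struct backoff_state *b)
{
    b->rto *= 2.0;
    if (b->rto > BACKOFF_MAX_RTO)
        b->rto = BACKOFF_MAX_RTO;
    if (++b->nrexmt > BACKOFF_MAX_REXMT)
        return -1;
    return 0;
}
```

With an initial RTO of 3 seconds, successive time-outs wait 6, 12 and 24 seconds before the fourth time-out reports failure.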
Reading 7.I: Read §22.5 of Stevens [UNP3ev1, pp597-608].
Activity 7.J: Before the time stamp was used to measure RTT, many implementations of time-out and retransmission only measured the RTT of those packets (requests) whose replies were received before time-out. Why can the RTT of other packets not be used to update RTO?
7.5 Adding Reliability to UDP Applications
An example of using RTO and retransmission is shown in §22.5 of Stevens. It is an extension of the function dg_cli() of the UDP echo client discussed in Module 5. The code of the new dg_cli() is shown in Figures 22.6 and 22.8 of Stevens.
Part of the code dealing with time-out and retransmission is encapsulated in a package defined by the file rtt/rtt.c and its header lib/unprtt.h, which are shown in Figures 22.10-14 of Stevens.
Reading 7.K: Read all the source code in Figures 22.6-14 of Stevens [UNP3ev1, pp600-7].
The structure struct rtt_info contains the data fields needed to compute and update RTO using the Jacobson/Karels algorithm. Several functions to use or update an rtt_info are available:
void rtt_init(struct rtt_info *) to initialize the rtt_info before the first packet is sent.
void rtt_newpack(struct rtt_info *) to reset the retransmission counter to start a new packet. The RTO is carried over to the new packet from the previous one through the rtt_info. It is called before the new packet is sent.
int rtt_start(struct rtt_info *) to simply return the RTO stored in the rtt_info. It is called every time a packet is sent or retransmitted, in order to set the alarm.
void rtt_stop(struct rtt_info *, uint32_t) to update the RTO stored in the rtt_info using the RTT provided through the second argument. It is called every time a valid reply for the current packet is received. The RTT is calculated from the current time and the time stamp of the packet echoed back with the reply.
int rtt_timeout(struct rtt_info *) to double the RTO stored in the rtt_info for exponential back-off. It is called every time a time-out occurs. It returns -1 if the total number of time-outs has exceeded a limit.
The code pattern for the reliable UDP transmission is shown in Figure 22.7 of Stevens [UNP3ev1, p600]. Note that:
before a packet is sent or retransmitted, we call alarm() to set the alarm clock using the RTO value returned from rtt_start(). If we do not receive the reply to the packet within this RTO, a SIGALRM signal will be generated and delivered by the kernel.
we use sigsetjmp() and siglongjmp() for the SIGALRM signal handling. This guarantees that we always end up in the following place after the signal is handled (no matter where we are when the signal is delivered):
    if (sigsetjmp(jmpbuf, 1) != 0) {
        /* we arrive here via siglongjmp() from the SIGALRM handler */
        if (rtt_timeout(&rttinfo) < 0) {
            err_msg("dg_send_recv: no response from server, giving up");
            rttinit = 0;        /* reinit in case we're called again */
            errno = ETIMEDOUT;
            return(-1);
        }
    #ifdef RTT_DEBUG
        err_msg("dg_send_recv: timeout, retransmitting");
    #endif
        goto sendagain;         /* RTO has been doubled; send the packet again */
    }
In the function dg_send_recv() used by our new version of dg_cli(), we use sendmsg() and recvmsg() for the I/O with the server. Note that:
each packet and reply has two fields: header and data. The header is required to assist reliable transmission of packets and replies, and includes the sequence number and the time stamp.
two msghdr structures, msgsend and msgrecv, are set up to be used by sendmsg() and recvmsg(). The msg_iov and msg_iovlen members of both are set to refer to a vector of two buffers: one for the header and the other for the data.
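The two-buffer arrangement described above can be sketched as follows. This is a minimal example under stated assumptions: the header layout demo_hdr and the function hdr_data_roundtrip are hypothetical, and a local datagram socket pair stands in for the UDP socket so that the example needs no server.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical header: sequence number and time stamp. */
struct demo_hdr {
    uint32_t seq;
    uint32_t ts;
};

/*
 * Send one header+data datagram over a local socket pair and read it
 * back into separate header and data buffers. Returns the number of
 * data bytes received, or -1 on error.
 */
ssize_t hdr_data_roundtrip(const struct demo_hdr *out_hdr,
                           const void *out_data, size_t out_len,
                           struct demo_hdr *in_hdr,
                           void *in_data, size_t in_cap)
{
    int sv[2];
    struct msghdr msgsend, msgrecv;
    struct iovec iovsend[2], iovrecv[2];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Outgoing message: element 0 is the header, element 1 the data. */
    memset(&msgsend, 0, sizeof(msgsend));
    iovsend[0].iov_base = (void *)out_hdr;
    iovsend[0].iov_len = sizeof(*out_hdr);
    iovsend[1].iov_base = (void *)out_data;
    iovsend[1].iov_len = out_len;
    msgsend.msg_iov = iovsend;
    msgsend.msg_iovlen = 2;

    /* Incoming message: the kernel scatters into header, then data. */
    memset(&msgrecv, 0, sizeof(msgrecv));
    iovrecv[0].iov_base = in_hdr;
    iovrecv[0].iov_len = sizeof(*in_hdr);
    iovrecv[1].iov_base = in_data;
    iovrecv[1].iov_len = in_cap;
    msgrecv.msg_iov = iovrecv;
    msgrecv.msg_iovlen = 2;

    if (sendmsg(sv[0], &msgsend, 0) == (ssize_t)(sizeof(*out_hdr) + out_len)) {
        n = recvmsg(sv[1], &msgrecv, 0);
        if (n >= (ssize_t)sizeof(*in_hdr))
            n -= (ssize_t)sizeof(*in_hdr);   /* report data bytes only */
        else
            n = -1;
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Keeping the header in its own buffer means the application data never has to be copied next to it: the kernel gathers both buffers into one datagram on sending and scatters them again on receiving.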
Activity 7.L: Do the experiment described in Exercise 22.6 of
Stevens [UNP3ev1, pp620].
7.6 Summary
This module has let you experience reinventing the wheel by adding TCP-style reliability to UDP applications. You should now know how time-out and retransmission are implemented in TCP, because the same techniques and algorithms are used here.
You should have extended your knowledge and skills on UNIX with:
advanced I/O with sendmsg() and recvmsg(); and
advanced signal handling with sigsetjmp() and siglongjmp().
Finally, you should now know:
the techniques for adding reliability to the packet transmission of any network protocol; and
how to develop your own UDP applications with reliability added.
Module 8
SCTP Sockets and Programming
This module addresses SCTP programming with SCTP sockets.
SCTP is a newer transport protocol. It was first designed to meet the needs of the growing IP telephony market; in particular, transporting signaling across the Internet. SCTP is a reliable, general-purpose transport layer protocol for use on IP networks. It is message-oriented, providing multiple streams between endpoints and transport-level support for multihoming. The multihoming feature provides increased robustness against network failure.
In this module, head-of-line blocking is discussed. As with a TCP connection, SCTP needs to establish a kind of connection, which is called an association in SCTP. An SCTP association is established by a four-way handshake. Functions used exclusively with SCTP are described, along with shutdown, and the concept of "notification" is covered, which allows an application to be informed of significant protocol events other than the arrival of user data.
Module contents
8.1 Head-of-Line Blocking in TCP
8.2 SCTP Multiple Streams and Multihoming
8.2.1 Why SCTP?
8.2.2 Features of SCTP
8.3 SCTP functions
8.3.1 SCTP sockets
8.3.2 sctp_bindx Function
8.3.3 sctp_connectx Function
8.3.4 sctp_getpaddrs and sctp_getladdrs Functions
8.3.5 sctp_freepaddrs and sctp_freeladdrs
8.3.6 sctp_sendmsg Function
8.3.7 sctp_recvmsg Function
8.4 An Example of SCTP One-to-Many-Style Streaming
8.4.1 SCTP four-way handshake
8.4.2 Controlling the Number of Streams
8.4.3 Controlling Termination
8.5 Summary
Main Aims:
to understand the advantages of SCTP over TCP and UDP;
to appreciate multiple streaming and multihoming;
to be able to program simple applications using SCTP; and
to examine the possible applications where SCTP can be applied.
8.1 Head-of-Line Blocking in TCP
Head-of-line blocking occurs when a TCP segment is lost and a subsequent TCP segment arrives out of order. That subsequent segment is held until the first TCP segment is retransmitted and arrives at the receiver. TCP performance suffers from the effect of head-of-line blocking. For a server that sends three semantically independent messages, the loss of the first message forces the client to hold all the data until the missing piece is retransmitted and arrives successfully. This blocking is not what the application would like to occur.
Figure 8.1: Illustration of head-of-line blocking
Figure 8.1 illustrates the head-of-line blocking problem. Since the 5th packet is dropped, the receiver has to wait for it to be retransmitted while storing the incoming packets. During this period the application cannot get any data, so it suffers from increased delay. Since the receiver buffer is limited, the sender cannot transmit more data once the receiver buffer becomes full.
TCP suffers from this kind of head-of-line blocking in some situations, but head-of-line blocking can be minimized by SCTP's multistream feature.
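The effect can be reproduced with a toy model of an in-order receive buffer. The sketch below is only an illustration with a hypothetical function name; it counts how many buffered segments may be passed to the application when delivery must follow the sequence numbers strictly.

```c
#include <assert.h>

/*
 * Given the sequence numbers that have arrived so far and the next
 * sequence number the application is waiting for, return how many
 * segments can be delivered in order. Anything after a gap stays
 * in the buffer.
 */
int in_order_deliverable(const int *arrived, int n, int next_expected)
{
    int delivered = 0;

    for (;;) {
        int found = 0;
        for (int i = 0; i < n; i++) {
            if (arrived[i] == next_expected) {
                found = 1;
                break;
            }
        }
        if (!found)
            break;              /* gap: everything behind it is blocked */
        next_expected++;
        delivered++;
    }
    return delivered;
}
```

With segments 6, 7 and 8 buffered and segment 5 missing, nothing can be delivered; once the retransmitted segment 5 arrives, all four segments are released at once.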
8.2 SCTP Multiple Streams and Multihoming
SCTP offers such advantages as multihoming and multi-streaming capabilities,
both of which increase availability.
8.2.1 Why SCTP?
Given TCP and UDP, why is SCTP still necessary? To answer this obvious question, we examine the features of TCP and UDP. TCP is a reliable, stream transmission protocol with full control of congestion and transmission speed, while UDP is an unreliable datagram transmission protocol without any congestion control. The most important enhancements in SCTP over traditional transport layer protocols are the end-host multihoming and multi-streaming capabilities.
Multihoming is a novel mechanism that SCTP provides for mission-critical systems which rely on redundancy at multiple levels.
Figure 8.2: SCTP multihoming
Multistreaming is another novel service that SCTP includes at the transport layer. An SCTP stream is a unidirectional logical data flow within an SCTP association. The SCTP end points negotiate the application-requested streams during association setup, and these streams exist for the life of the association.
8.2.2 Features of SCTP
SCTP is a general-purpose transport protocol that provides additional features compared with TCP.
Figure 8.3: SCTP Association
1. SCTP is a protocol that directly supports multihoming. A multihomed host is one that has more than one network interface and therefore more than one IP address by which it can be addressed.
2. Head-of-line blocking can be eliminated.
3. Application layer message boundaries are preserved.
4. An unordered message service is provided.
5. A partially reliable service is available in some SCTP implementations.
6. An easy migration path from TCP is provided by SCTP with its one-to-one
style interface.
7. Many of the features of TCP are included in SCTP.
Figure 8.4: Multistreaming
8.3 SCTP functions
Standard sockets functions defined for TCP are not adequate for SCTP. There are
more than 10 additional elementary socket functions that can be used with SCTP.
8.3.1 SCTP sockets
There are two types of SCTP sockets: a one-to-one socket and a one-to-many
socket. The one-to-one style was developed to ease the porting of existing TCP
applications to SCTP.
The one-to-many style provides an application writer the ability to write a server
without managing a large number of socket descriptors.
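The two socket styles differ only in the socket type passed to socket(): SOCK_STREAM selects the one-to-one style and SOCK_SEQPACKET the one-to-many style, both with IPPROTO_SCTP as the protocol. The sketch below wraps that choice in two hypothetical helpers (the enum and function names are assumptions for this example). Whether the socket can actually be created depends on SCTP support in the running kernel, so callers should check for -1.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical names for the two interface styles. */
enum demo_sctp_style {
    DEMO_ONE_TO_ONE,
    DEMO_ONE_TO_MANY
};

/* Map an interface style to the socket type that selects it. */
int demo_sctp_socktype(enum demo_sctp_style style)
{
    return style == DEMO_ONE_TO_ONE ? SOCK_STREAM : SOCK_SEQPACKET;
}

/*
 * Open an IPv4 SCTP socket of the requested style. Returns the
 * descriptor, or -1 with errno set (for example when the kernel
 * has no SCTP support).
 */
int demo_sctp_open(enum demo_sctp_style style)
{
    return socket(AF_INET, demo_sctp_socktype(style), IPPROTO_SCTP);
}
```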
8.3.2 sctp_bindx Function
The sctp_bindx function provides more flexibility than bind by allowing an SCTP socket to bind a particular subset of addresses.
int sctp_bindx(int sockfd, const struct sockaddr *addrs,
               int addrcnt, int flags);
Note that the second argument, addrs, is a pointer to a packed list of addresses.
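For IPv4-only use, a packed list is simply a contiguous run of sockaddr_in structures with no padding between them. The helper below, whose name pack_ipv4_addrs is an assumption for this example, builds such a list from dotted-decimal strings. It deliberately stops short of calling sctp_bindx, so that it compiles without the SCTP library. Lists that mix IPv4 and IPv6 need more care, because the two structure types have different sizes.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill out[0..count-1] with IPv4 socket addresses built from the given
 * dotted-decimal strings, all using the same port. Returns count on
 * success, or -1 if any string is not a valid IPv4 address.
 */
int pack_ipv4_addrs(const char *const ips[], int count,
                    uint16_t port, struct sockaddr_in *out)
{
    for (int i = 0; i < count; i++) {
        memset(&out[i], 0, sizeof(out[i]));
        out[i].sin_family = AF_INET;
        out[i].sin_port = htons(port);
        if (inet_pton(AF_INET, ips[i], &out[i].sin_addr) != 1)
            return -1;
    }
    return count;
}
```

The filled array can then be cast to struct sockaddr * and passed as the addrs argument, with the return value used as addrcnt.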
8.3.3 sctp_connectx Function
The sctp_connectx function is used to connect to a multihomed peer. It has the following format:
int sctp_connectx(int sockfd, const struct sockaddr *addrs,
                  int addrcnt);
As with sctp_bindx, the addrs parameter is a packed list of addresses. The SCTP stack uses one or more of the given addresses for establishing the association.
8.3.4 sctp_getpaddrs and sctp_getladdrs Functions
When all the addresses are required, the sctp_getpaddrs function provides a mechanism for an application to retrieve all the addresses of a peer.
int sctp_getpaddrs(int sockfd, sctp_assoc_t id,
                   struct sockaddr **addrs);
The id is the association identification for a one-to-many-style socket. Note that addrs is the address of a pointer that sctp_getpaddrs will fill in with a locally allocated, packed list of addresses.
The sctp_getladdrs function is used in much the same way as sctp_getpaddrs, but it retrieves the local addresses that are part of an association.
8.3.5 sctp_freepaddrs and sctp_freeladdrs
The sctp_freepaddrs function frees the resources allocated by the sctp_getpaddrs function, while the sctp_freeladdrs function frees the resources allocated by the sctp_getladdrs function. They have the following format:
void sctp_freepaddrs(struct sockaddr *addrs);
void sctp_freeladdrs(struct sockaddr *addrs);
8.3.6 sctp_sendmsg Function
The user must use the sendto, sendmsg or sctp_sendmsg functions to send data. Various features of SCTP can be controlled by using sendmsg along with ancillary data. The sctp_sendmsg function offers the user a greatly simplified sending method at the cost of more arguments.
The sctp_sendmsg function has the following form:
ssize_t sctp_sendmsg(int sockfd, const void *msg, size_t msgsz,
                     const struct sockaddr *to, socklen_t tolen, uint32_t ppid,
                     uint32_t flags, uint16_t stream, uint32_t timetolive,
                     uint32_t context);
8.3.7 sctp_recvmsg Function
Like sctp_sendmsg, the sctp_recvmsg function provides a more user-friendly interface to the advanced SCTP features. Using this function allows a user to retrieve not only its peer's address, but also the msg_flags field that would normally accompany the recvmsg function.
8.4 An Example of SCTP One-to-Many-Style Streaming
One minor change will turn a TCP connection into an SCTP association: in the call to Socket(), specify IPPROTO_SCTP instead of IPPROTO_TCP as the third argument. However, simply making this change would not take advantage of any of the additional features provided by SCTP except multihoming.
The example is a simple one-to-many SCTP streaming echo client and server.
Reading 8.A: Read §10.2 and §10.3 of Stevens [UNP3ev1, pp288-93].
Activity 8.B: Compile and run Figure 10.2 on page 289 and Figure 10.3 on page 291.
On the client side:
A client reads a line of text from standard input and sends the line to the server. The
line follows the form
[#]text, where the number in brackets is the SCTP stream
number on which the text message should be sent.
The client reads the echoed line and prints it on its standard output, displaying the
stream number, stream sequence number, and the text string.
On the server side:
The server receives the text message from the network, increases the stream number on which the message arrived by one, and sends the text message back to the
client on this new stream number.
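The [#]text convention and the server's stream increment can be sketched as two small helpers. The function names are hypothetical, and wrapping the reply stream back to 0 when it reaches the number of available streams is an assumption made for this sketch; it is not stated in the description above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a line of the form "[#]text". On success, store the stream
 * number in *stream, point *text at the first character after ']',
 * and return 0. Return -1 if the line does not follow the form or
 * the number does not fit in 16 bits (SCTP stream numbers are
 * 16-bit values).
 */
int parse_stream_line(const char *line, unsigned int *stream,
                      const char **text)
{
    char *end;
    unsigned long value;

    if (line[0] != '[' || !isdigit((unsigned char)line[1]))
        return -1;
    value = strtoul(line + 1, &end, 10);
    if (*end != ']' || value > 65535UL)
        return -1;
    *stream = (unsigned int)value;
    *text = end + 1;
    return 0;
}

/* Server side: reply on the next stream, wrapping at nstreams (assumed). */
unsigned int reply_stream(unsigned int stream, unsigned int nstreams)
{
    return (stream + 1) % nstreams;
}
```

For the input line [3]hello, the client would send hello on stream 3 and, with 10 streams available, the echo would come back on stream 4.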
8.4.1 SCTP four-way handshake
When the server is started, it opens a socket, binds to an address, calls listen to enable client associations, and calls sctp_recvmsg, which blocks waiting for the first message to arrive.
Reading 8.C: Read the details of the source code in Figure 10.2.
A client opens a socket and calls sctp_sendto, which implicitly sets up the association and piggybacks the data request to the server on the third packet of the four-way handshake. The server receives the request, processes it and sends back a reply. The client receives the reply and closes the socket, thus closing the association. The server loops back to receive the next message.
Reading 8.D: Read the source code in Figure 10.3.
8.4.2 Controlling the Number of Streams
Controlling the number of streams an endpoint requests at association initialization is done on the socket before an association is created. The default number of streams is set to 10.
Activity 8.E: Find out which line of the code in Figure 10.10 on page 299 sets the number of streams.
8.4.3 Controlling Termination
There are two alternative mechanisms for shutting down an association in the case that the application does not wish to close the socket, even though it is able to do so.
If a server wishes to shut down an association after sending a message
8.5 Summary
This module covers the SCTP APIs and programming. SCTP is a relatively new protocol, considering that it became an RFC in October 2000. Since then it has found its way into all major operating systems.
You should now know:
how to use SCTP to implement a network application;
the advantages of using SCTP over TCP and UDP; and
the additional functions of SCTP sockets.
SCTP has many advanced features, such as an autoclosing one-to-many-style server, partial delivery, notifications and unordered data delivery. For more details, refer to Chapter 22 of Stevens [UNP3ev1, 587-642].