Perl, Sockets and TCP/IP Networking

Saturday, January 5, 2008

ANSOR

A simplified introduction to sockets
Sockets are a mechanism that allows programs to communicate, either on the same machine or across a network. The way it works is pretty simple: each machine on a network is identified by some address. In this tutorial we will talk about TCP/IP networking, so by network address we mean an IP address (like 192.168.4.4). Apart from the IP address that identifies it, each machine has a number of ports, which allow it to handle multiple connections simultaneously.

A program that wishes to receive a connection from another program asks the operating system to create a socket and bind it to some port. Then the program sits and listens on the socket it has created, waiting for incoming connections. The other program also creates a socket for communicating with the receiver. This caller needs to specify the IP address and the port number of the receiving end. If all goes well, as we will see shortly, the two programs establish a connection through the network using their sockets. They may then exchange information, each by writing to and reading from the socket it has created.
Can I do this with Perl?
Sure. Perl supports the socket API natively. The native interface is perfectly usable, but there is also a very convenient module, IO::Socket, that wraps it and provides a simpler way to deal with sockets. We'll use IO::Socket in this tutorial to write two simple programs that communicate through sockets.
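
For comparison, here is a rough sketch of what setting up a listening socket looks like with the native API, using the core Socket module. It is only an illustration under a few assumptions of mine: it binds to the loopback address 127.0.0.1 and asks for port 0, so that the system picks any free port, and it closes the socket right away instead of waiting for a caller.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP SOL_SOCKET SO_REUSEADDR
              INADDR_LOOPBACK pack_sockaddr_in unpack_sockaddr_in inet_ntoa);

# Ask the operating system for a TCP socket.
socket(my $listener, PF_INET, SOCK_STREAM, IPPROTO_TCP)
    or die "socket: $!\n";

# Allow the address to be reused quickly after the program exits.
setsockopt($listener, SOL_SOCKET, SO_REUSEADDR, pack('l', 1))
    or die "setsockopt: $!\n";

# Bind to 127.0.0.1. Port 0 (an assumption for this sketch) lets the
# system choose a free port for us.
bind($listener, pack_sockaddr_in(0, INADDR_LOOPBACK))
    or die "bind: $!\n";

# Mark the socket as listening, with room for one pending connection.
listen($listener, 1) or die "listen: $!\n";

# Find out which address and port we ended up with.
my ($port, $addr) = unpack_sockaddr_in(getsockname($listener));
printf "Listening on %s:%d\n", inet_ntoa($addr), $port;

close($listener);
```

Every step needs its own call and its own error check, which is the bookkeeping that IO::Socket takes off our hands.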

The Receiver
The first thing we need to do is create a socket that we will use to receive connections. The code below shows how. Note that we need to specify the local hostname and the port to which the socket will be bound; if the port is already in use, this call will fail. Also note the 'Listen' parameter: this is the maximum number of connections that can be queued by the socket, waiting for you to accept and process them. For the time being we will queue at most one connection at a time. (This means that a connection attempt arriving while the queue is full may fail with an error like 'connection refused', or be left waiting, depending on the operating system.) Finally, the 'Reuse' option sets SO_REUSEADDR on the socket, which tells the system to allow the port to be reused soon after the program exits. This ensures that if our program exits abnormally and does not properly close the socket, running it again can open a new socket on the same port. (Newer versions of IO::Socket::INET call this option 'ReuseAddr' and keep 'Reuse' for backward compatibility.)
use strict;
use warnings;
use IO::Socket;

# Create a listening TCP socket bound to the local host and port.
my $sock = IO::Socket::INET->new(
    LocalHost => 'thekla',
    LocalPort => '7070',
    Proto     => 'tcp',
    Listen    => 1,
    Reuse     => 1,
);
die "Could not create socket: $!\n" unless $sock;

Now the socket is ready to receive incoming connections. To wait for a connection, we use the accept() method which will return a new socket through which we can communicate with the calling program. Information exchange is achieved by reading/writing on the new socket. The socket can be treated like a regular filehandle.

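# Wait for a caller, then print every line it sends.
my $new_sock = $sock->accept();
while (<$new_sock>) {
    print $_;
}
# Close both the connection and the listening socket.
close($new_sock);
close($sock);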
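
The receiver above handles a single connection and then exits. As a minimal sketch of how it could serve several callers one after another, the helper below wraps accept() in a loop. The name serve_clients and its $max argument are my own invention for illustration; peerhost() and peerport() are IO::Socket::INET methods that report who is calling.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: accept up to $max connections, one at a time,
# print what each caller sends, and return all the lines received.
sub serve_clients {
    my ($listener, $max) = @_;
    my @lines;
    for (1 .. $max) {
        # Blocks until the next caller connects.
        my $conn = $listener->accept() or die "accept failed: $!\n";
        printf "Connection from %s:%d\n", $conn->peerhost(), $conn->peerport();

        # Read until this caller closes its end of the connection.
        while (my $line = <$conn>) {
            print "  received: $line";
            push @lines, $line;
        }
        close($conn);
    }
    return @lines;
}
```

With the listening socket created earlier, calling serve_clients($sock, 3) would handle three callers in turn before returning. While one caller is being served, the next one waits in the queue whose length is set by 'Listen'.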

The Caller

The other side of the communication is even simpler. All we need to do is create a socket, specifying the remote address and port. The constructor returns a socket object once the connection has been established, and we may start sending data right away by writing to the socket like any other filehandle.

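use strict;
use warnings;
use IO::Socket;

# Connect to the receiver at the given host and port.
my $sock = IO::Socket::INET->new(
    PeerAddr => 'asomatos',
    PeerPort => '7070',
    Proto    => 'tcp',
);
die "Could not create socket: $!\n" unless $sock;

# Send one line, then close our end of the connection.
print $sock "Hello there!\n";
close($sock);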

Go ahead and try it!
You can easily try out the example programs above. All you need to do is run the receiver first and then the caller. On the receiver's end, you will see the line "Hello there!" printed on the terminal. If you do not have a network, you can still use 'localhost' as the hostname in both the receiver and the caller, just to test it out.
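
If you would rather try both ends from a single script, something like the sketch below should do. It is not the two programs above, just a compact variation with a few assumptions of mine: it uses fork() to run the caller in a child process, talks over 127.0.0.1, and asks for port 0 so the system picks a free port instead of 7070.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Create the listening socket before forking, so the caller cannot
# try to connect before the receiver is ready.
my $listener = IO::Socket::INET->new(
    LocalHost => '127.0.0.1',
    LocalPort => 0,           # 0 = let the system choose a free port
    Proto     => 'tcp',
    Listen    => 1,
    ReuseAddr => 1,           # current name of the 'Reuse' option
) or die "Could not create socket: $!\n";
my $port = $listener->sockport();

my $pid = fork();
die "fork failed: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child process: plays the caller.
    close($listener);
    my $sock = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "Could not connect: $!\n";
    print $sock "Hello there!\n";
    close($sock);
    exit 0;
}

# Parent process: plays the receiver.
my $conn = $listener->accept() or die "accept failed: $!\n";
my @received = <$conn>;       # read until the caller closes its end
close($conn);
close($listener);
waitpid($pid, 0);

print "Receiver got: $_" for @received;
```

Creating the listening socket before the fork avoids the "run the receiver first" ordering problem, since the port is already open by the time the caller starts.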

Synchronization
An important issue to consider in this style of communication is that the two ends must follow a commonly agreed procedure for exchanging data. Otherwise it is very easy to end up in a deadlock, where both ends wait to read and neither one writes. There is no way to guess whether the other end has finished sending data, unless there is some protocol between them that marks the logical sections of the communication in the contents of the transmitted messages. In the example above, the model is very simplistic: the caller sends a message and closes its end of the connection, while the receiver just reads the data until there is nothing left.
Usually client-server transactions consist of a caller (the client) sending a request, followed by the receiver (the server) sending back a reply. In order for the server to know when the request is finished, there is some agreed marker (such as two consecutive empty lines, or a line saying "END REQUEST") that denotes the last line of the request. The server starts sending its reply only after this line has been received, and afterwards closes the socket. The client, after sending the entire request, switches to reading until the socket is closed, so that it receives the whole reply. More complex schemes can be established in a similar manner, according to your needs and taste. What is important is to make sure that the two ends of the communication have a way of knowing when to speak and when to listen, thus avoiding the possibility of getting locked up.
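
To make the request/reply scheme concrete, here is a minimal sketch that uses a line saying "END REQUEST" as the agreed marker. The helper names serve_one and ask, the wording of the reply, and the use of fork() and 127.0.0.1 to run both ends from one script are all assumptions made for the sake of the example.

```perl
use strict;
use warnings;
use IO::Socket::INET;

my $MARKER = "END REQUEST";   # the agreed end-of-request line

# Server side: read lines up to the marker, then reply and close.
sub serve_one {
    my ($listener) = @_;
    my $conn = $listener->accept() or die "accept failed: $!\n";
    my @request;
    while (my $line = <$conn>) {
        chomp $line;
        last if $line eq $MARKER;     # request finished: stop listening
        push @request, $line;
    }
    print $conn "Received ", scalar(@request), " line(s)\n";
    close($conn);
}

# Client side: send the request and the marker, then read until the
# server closes the connection.
sub ask {
    my ($port, @request) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1', PeerPort => $port, Proto => 'tcp',
    ) or die "Could not connect: $!\n";
    print $sock "$_\n" for @request, $MARKER;
    my @reply = <$sock>;
    close($sock);
    return @reply;
}

my $listener = IO::Socket::INET->new(
    LocalHost => '127.0.0.1', LocalPort => 0, Proto => 'tcp',
    Listen => 1, ReuseAddr => 1,
) or die "Could not create socket: $!\n";

my $pid = fork();
die "fork failed: $!\n" unless defined $pid;
if ($pid == 0) {              # child process: the server
    serve_one($listener);
    exit 0;
}

# Parent process: the client.
my @reply = ask($listener->sockport(), "first line", "second line");
close($listener);
waitpid($pid, 0);
print "Server replied: $_" for @reply;
```

Run as is, this should print "Server replied: Received 2 line(s)". Neither side ever has to guess: the server listens until it sees the marker, and the client listens until the connection is closed.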