CSE 127: Lecture 19

The topics covered in this lecture are networks and payloads, RPC stubs, and data marshalling.

Networks and payloads

We have been discussing the TCP and IP layers of the Internet protocol stack, because RPC is built on top of them. Recall that TCP is a layer on top of IP and that TCP is a stream-oriented protocol: the sender's TCP stack breaks the data up into packets, and the receiver's TCP stack reassembles them into the original byte stream. There are other protocols layered on top of IP, such as UDP, which is datagram-oriented rather than stream-oriented.
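
Because TCP delivers a byte stream rather than discrete messages, one read() on the receiver may return only part of what the sender wrote, so code that expects a fixed-size record typically loops until every byte has arrived. Below is a minimal sketch assuming POSIX read() and write() on a connected stream descriptor; read_full() and write_full() are hypothetical helper names, not part of any RPC package.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly n bytes from a stream descriptor.
 * Returns 0 on success, -1 on error or if the peer closes early. */
int read_full(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, p + got, n - got);
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            return -1;          /* EOF before the record was complete */
        got += (size_t)r;       /* a short read is normal on a stream */
    }
    return 0;
}

/* Hypothetical helper: write exactly n bytes, retrying short writes. */
int write_full(int fd, const void *buf, size_t n)
{
    const unsigned char *p = buf;
    size_t sent = 0;
    while (sent < n) {
        ssize_t w = write(fd, p + sent, n - sent);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        sent += (size_t)w;
    }
    return 0;
}
```

A pipe is also a byte stream, so for a quick local experiment it can stand in for a TCP connection: write a record with write_full() on one end and collect it with read_full() on the other.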

So IP handles getting individual packets between the communicating hosts, and TCP, on top of that, handles splitting the stream into packets and keeping the stream in order and in sync. On top of TCP, you may impose your own record types (as we do in RPC) to fit your application. Thus a data packet is layered as follows, from the top of the stack down:

  application data (for example, an RPC record)
  TCP header
  IP header
Below that there may be further link-layer information, such as the Ethernet header (but we won't go into that).

The application portion of the payload for RPC will look something like this:

RPC # count arguments
Here, the RPC number is a unique identifier for the service being called, count is the number of bytes remaining in the RPC payload, and arguments is a specially formatted array that contains the marshalled arguments for the RPC function.
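
The record layout above could be encoded as in the sketch below. It assumes, purely for illustration, that the RPC # and the count are each 4-byte unsigned integers in network byte order, followed by the argument bytes; this is not the wire format of any particular RPC system, and the function names are hypothetical.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RPC_HDR_LEN 8   /* assumed: 4-byte RPC number + 4-byte count */

/* Build "RPC # | count | arguments" into out.
 * Returns the total record length, or 0 if out is too small. */
size_t encode_rpc_record(uint32_t rpc_num, const void *args, uint32_t args_len,
                         unsigned char *out, size_t out_cap)
{
    uint32_t net;
    if (out_cap < RPC_HDR_LEN || out_cap - RPC_HDR_LEN < args_len)
        return 0;
    net = htonl(rpc_num);
    memcpy(out, &net, 4);
    net = htonl(args_len);          /* count = bytes remaining after header */
    memcpy(out + 4, &net, 4);
    if (args_len > 0)
        memcpy(out + RPC_HDR_LEN, args, args_len);
    return RPC_HDR_LEN + (size_t)args_len;
}

/* Parse the fixed-size header; the caller then reads *count more bytes. */
void decode_rpc_header(const unsigned char hdr[RPC_HDR_LEN],
                       uint32_t *rpc_num, uint32_t *count)
{
    uint32_t net;
    memcpy(&net, hdr, 4);
    *rpc_num = ntohl(net);
    memcpy(&net, hdr + 4, 4);
    *count = ntohl(net);
}
```

A receiver would first read the fixed 8-byte header, decode count, and then read exactly count more bytes for the arguments. The explicit count is what lets the receiver find record boundaries in TCP's undifferentiated byte stream.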

RPC stubs

To create RPC applications easily, we have to let the user write the client and server portions without worrying about the network interface, which includes sending requests, receiving requests, blocking, and handling data translations. Normally, an RPC package can generate stubs: functions that handle all of these details and leave us with only the application code to write.

To generate a stub, we give the RPC stub generator a specification, much like a declaration of a function in C. For example, we could have the following declaration:

int my_function(IN int x, IN char *y, OUT char *z);
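While this looks like a C declaration, it isn't one. The IN and OUT annotations tell the RPC stub generator the direction of each argument: IN arguments are sent from the client to the server, and OUT arguments are sent back from the server to the client. There can also be parameters that are declared IN and OUT at the same time.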

Given such a declaration, the stub generator will generate several things in C code (or another language):

  1. client code to transparently call my_function() across the network
  2. server code to recognize and receive the call to my_function()
  3. an empty function on the server side called my_function() that has the required parameters
So when we are writing the client, we can make transparent calls to the client's version of my_function(). When writing the server, we can fill in the empty function my_function() to do the work we want, and trust that RPC will handle the rest (networking, data handling).
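
To make the three generated pieces concrete, here is a hand-written sketch of what a stub generator might produce for the my_function() declaration. Everything in it is hypothetical: the network is replaced by a direct call from the client stub to the server stub, z is assumed to point at a buffer of at least Z_MAX bytes (the declaration does not say how large z is), the server-side function is renamed my_function_impl() so both halves fit in one program, and its body is made up. Since y is IN-only, it is taken as const char *.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define Z_MAX   64u   /* assumed capacity of the caller's OUT buffer z */
#define MSG_MAX 256u  /* assumed maximum request/reply size in bytes */

/* (3) Server side: the "empty" function the programmer fills in.
 * Hypothetical body: write "x:y" into z and return snprintf's result. */
int my_function_impl(int x, const char *y, char *z)
{
    return snprintf(z, Z_MAX, "%d:%s", x, y);
}

/* (2) Server stub: unmarshal the IN arguments, call the user's function,
 * then marshal the return value and the OUT argument into reply.
 * Returns the reply length, or 0 if the request is malformed. */
size_t my_function_server_stub(const unsigned char *req, unsigned char *reply)
{
    uint32_t net, ylen, zlen;
    char y[MSG_MAX], z[Z_MAX];
    int32_t x, ret;

    memcpy(&net, req, 4);
    x = (int32_t)ntohl(net);
    memcpy(&net, req + 4, 4);
    ylen = ntohl(net);
    if (ylen > MSG_MAX - 8u)          /* would overrun the request buffer */
        return 0;
    memcpy(y, req + 8, ylen);
    y[ylen] = '\0';

    ret = my_function_impl(x, y, z);
    zlen = (uint32_t)strlen(z);

    net = htonl((uint32_t)ret);       /* marshal the return value */
    memcpy(reply, &net, 4);
    net = htonl(zlen);                /* marshal OUT z: length, then bytes */
    memcpy(reply + 4, &net, 4);
    memcpy(reply + 8, z, zlen);
    return 8 + (size_t)zlen;
}

/* (1) Client stub: the version of my_function() the client calls. */
int my_function(int x, const char *y, char *z)
{
    unsigned char req[MSG_MAX], reply[MSG_MAX];
    uint32_t net, zlen;
    uint32_t ylen = (uint32_t)strlen(y);

    if (ylen > MSG_MAX - 8u)
        return -1;
    net = htonl((uint32_t)x);         /* marshal IN argument x */
    memcpy(req, &net, 4);
    net = htonl(ylen);                /* marshal IN argument y: length... */
    memcpy(req + 4, &net, 4);
    memcpy(req + 8, y, ylen);         /* ...then its bytes */

    /* A real stub would send req over the network and block for the
     * reply; this sketch calls the server stub directly instead. */
    if (my_function_server_stub(req, reply) == 0)
        return -1;

    memcpy(&net, reply, 4);           /* unmarshal the return value */
    int32_t ret = (int32_t)ntohl(net);
    memcpy(&net, reply + 4, 4);       /* unmarshal OUT argument z */
    zlen = ntohl(net);
    if (zlen >= Z_MAX)
        return -1;
    memcpy(z, reply + 8, zlen);
    z[zlen] = '\0';
    return ret;
}
```

Calling my_function(42, "hello", z) goes through the same marshal, "send", unmarshal path a networked stub would use, leaving "42:hello" in z and returning 8. The caller never sees the byte-order conversions.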

How does RPC know which function to call on the server? That is what the RPC # in the payload above is for. When the server gets a request, it checks whether it has a function (such as my_function()) registered under that RPC #. If it does, it unpacks the arguments in the way particular to that function and calls the function that the user wrote. It then takes the output from the user function, packages it, and sends it back to the client.
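
The lookup by RPC # can be sketched as a small registration table. This is a minimal illustration under assumed names (rpc_register(), rpc_dispatch(), and the example echo_handler() are all hypothetical), with each handler standing in for a generated server stub that unmarshals its own arguments.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handler signature: unmarshal args, run the service,
 * marshal results into reply. Returns reply length, or -1 on error. */
typedef int (*rpc_handler)(const unsigned char *args, size_t args_len,
                           unsigned char *reply, size_t reply_cap);

struct rpc_entry {
    uint32_t rpc_num;
    rpc_handler handler;
};

#define MAX_SERVICES 16
static struct rpc_entry rpc_table[MAX_SERVICES];
static size_t n_services;

/* Register a handler under an RPC number. Returns -1 if the table is full. */
int rpc_register(uint32_t rpc_num, rpc_handler h)
{
    if (n_services == MAX_SERVICES)
        return -1;
    rpc_table[n_services].rpc_num = rpc_num;
    rpc_table[n_services].handler = h;
    n_services++;
    return 0;
}

/* Find the handler registered for rpc_num and call it.
 * Returns -1 if no service is registered under that number. */
int rpc_dispatch(uint32_t rpc_num, const unsigned char *args, size_t args_len,
                 unsigned char *reply, size_t reply_cap)
{
    for (size_t i = 0; i < n_services; i++)
        if (rpc_table[i].rpc_num == rpc_num)
            return rpc_table[i].handler(args, args_len, reply, reply_cap);
    return -1;   /* unknown service: a real server would send an error reply */
}

/* Example service (hypothetical): echo the argument bytes back. */
int echo_handler(const unsigned char *args, size_t args_len,
                 unsigned char *reply, size_t reply_cap)
{
    if (args_len > reply_cap)
        return -1;
    memcpy(reply, args, args_len);
    return (int)args_len;
}
```

A linear scan is fine for a handful of services; a server with many registered functions might use a hash table or an array indexed by RPC number instead.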

Data marshalling

So far we have glossed over the fact that RPC handles the data for us, but this is an important point. First, we need to recognize that an integer on an Intel machine (little-endian) is not represented the same way as an integer on a Sparc machine (big-endian): the 4-byte value 0x12345678 is stored in memory as the bytes 78 56 34 12 on the Intel machine but as 12 34 56 78 on the Sparc. So what happens if a client on an Intel machine sends an RPC request with an integer parameter to a server on a Sparc machine?

Fortunately, RPC handles a lot of this for us. It uses functions such as htonl(), htons(), ntohl(), and ntohs() to translate long and short integers between host and network byte order.

Here, the function htonl() converts a 32-bit (historically "long") integer from host byte order into the standardized network byte order, which is big-endian, so it can be transported across the network. The function ntohl() does the inverse. Additionally, RPC knows how to handle other primitive types, such as strings (char *). However, if you are going to use data structures that RPC doesn't know about, you have to write your own routines to pack and unpack the data.
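
The sketch below wraps htonl(), ntohl(), htons(), and ntohs(), which POSIX declares in <arpa/inet.h>, to store and load integers in network byte order. The put_/get_ helper names are hypothetical. On a big-endian host the conversion functions return their argument unchanged; on a little-endian host they swap the bytes.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value into a 4-byte buffer in network byte order. */
void put_u32(unsigned char out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}

/* Load a 32-bit network-order value back into host byte order. */
uint32_t get_u32(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}

/* The same idea for 16-bit ("short") values. */
void put_u16(unsigned char out[2], uint16_t host_value)
{
    uint16_t net = htons(host_value);
    memcpy(out, &net, sizeof net);
}

uint16_t get_u16(const unsigned char in[2])
{
    uint16_t net;
    memcpy(&net, in, sizeof net);
    return ntohs(net);
}
```

Whatever the host's endianness, put_u32(buf, 0x12345678) leaves buf holding the bytes 0x12, 0x34, 0x56, 0x78 in that order, which is what lets an Intel client and a Sparc server agree on the value.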

The process of taking data from a host and converting it for transport across the network is called data marshalling. The reverse process of taking data from the network and converting it into the host's representation for processing is called data unmarshalling.
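
When the RPC package doesn't know a data structure, the programmer writes routines like the ones sketched below. The struct record type, the function names, and the wire layout (a 4-byte ID, a 4-byte name length, then the name bytes, with integers in network byte order) are all assumptions chosen for illustration.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RECORD_NAME_MAX 31

struct record {                     /* hypothetical application type */
    uint32_t id;
    char name[RECORD_NAME_MAX + 1]; /* NUL-terminated */
};

/* Marshal r into buf. Returns bytes written, or 0 if buf is too small. */
size_t marshal_record(const struct record *r, unsigned char *buf, size_t cap)
{
    uint32_t len = (uint32_t)strlen(r->name);
    uint32_t net;
    if (cap < 8 + (size_t)len)
        return 0;
    net = htonl(r->id);
    memcpy(buf, &net, 4);
    net = htonl(len);
    memcpy(buf + 4, &net, 4);
    memcpy(buf + 8, r->name, len);  /* no NUL on the wire; the length suffices */
    return 8 + (size_t)len;
}

/* Unmarshal n bytes from buf into r. Returns 0 on success, -1 if the
 * input is truncated or the name would not fit. */
int unmarshal_record(const unsigned char *buf, size_t n, struct record *r)
{
    uint32_t net, len;
    if (n < 8)
        return -1;
    memcpy(&net, buf, 4);
    r->id = ntohl(net);
    memcpy(&net, buf + 4, 4);
    len = ntohl(net);
    if (len > RECORD_NAME_MAX || n - 8 < len)
        return -1;
    memcpy(r->name, buf + 8, len);
    r->name[len] = '\0';            /* restore the host's NUL terminator */
    return 0;
}
```

Sending the length ahead of the string, rather than relying on a terminating NUL, lets the receiver check the size before copying anything into its fixed buffer.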

RPC handles these things for us as follows. For the client, when an RPC function is called, the client stub will:

  1. marshal the arguments to the function
  2. send the RPC request over the network
  3. block until the reply is received
  4. unmarshal the data in the reply
  5. return to the calling function

The server does the following:

  1. blocks, waiting for a request
  2. receives the request and finds the requested service
  3. unmarshals the arguments for the function
  4. calls the user-supplied server function
  5. marshals the return values from the user function
  6. sends a reply to the client with the data to return


bsy+cse127w02@cs.ucsd.edu, last updated Mon Mar 25 15:22:09 PST 2002. Copyright 2002 Bennet Yee.