10 RMI Wire Protocol
- Overview
- RMI Transport Protocol
- RMI's Use of Object Serialization
- RMI's Use of HTTP POST Protocol
- Application-Specific Values for RMI
- RMI's Multiplexing Protocol
10.1 Overview
The RMI protocol makes use of two other protocols for its on-the-wire format: Java Object Serialization and HTTP. The Object Serialization protocol is used to marshal call and return data. The HTTP protocol is used to "POST" a remote method invocation and obtain return data when circumstances warrant. Each protocol is documented as a separate grammar. Nonterminal symbols in production rules may refer to rules governed by another protocol (either Object Serialization or HTTP). When a protocol boundary is crossed, subsequent productions use that embedded protocol.
Notes about Grammar Notation
- We use a similar notation to that used in The Java Language Specification.
- Control codes in the stream are represented by literal values expressed in hexadecimal.
- Some nonterminal symbols in the grammar represent application specific values supplied in a method invocation. The definition of such a nonterminal consists of its Java programming language type. A table mapping each of these nonterminals to its respective type follows the grammar.
10.2 RMI Transport Protocol
The wire format for RMI is represented by a Stream
. The terminology adopted here reflects a client perspective. Out
refers to output messages and In
refers to input messages. The contents of the transport header are not formatted using object serialization.
Stream:
Out
In
The input and output streams used by RMI are paired. Each Out
stream has a corresponding In
stream. An Out
stream in the grammar maps to the output stream of a socket (from the client's perspective). An In
stream (in the grammar) is paired with the corresponding socket's input stream. Since output and input streams are paired, the only header information needed on an input stream is an acknowledgment as to whether the protocol is understood; other header information (such as the magic number and version number) can be implied by the context of stream pairing.
10.2.1 Format of an Output Stream
An output stream in RMI consists of transport Header
information followed by a sequence of Messages
. Alternatively, an output stream can contain an invocation embedded in the HTTP protocol.
Out:
Header Messages
HttpMessage
Header:
0x4a 0x52 0x4d 0x49 Version Protocol
Version:
0x00 0x01
Protocol:
StreamProtocol
SingleOpProtocol
MultiplexProtocol
StreamProtocol:
0x4b
SingleOpProtocol:
0x4c
MultiplexProtocol:
0x4d
Messages:
Message
Messages Message
The Messages
are wrapped within a particular protocol as specified by Protocol
. For the SingleOpProtocol
, there may only be one Message
after the Header
, and there is no additional data that the Message
is wrapped in. The SingleOpProtocol
is used for invocation embedded in HTTP requests, where interaction beyond a single request and response is not possible.
For the StreamProtocol
and the MultiplexProtocol
, the server must respond with a byte 0x4e
acknowledging support for the protocol, and an EndpointIdentifier
that contains the host name and port number that the server can see is being used by the client. The client can use this information to determine its host name if it is otherwise unable to do that for security reasons. The client must then respond with another EndpointIdentifier
that contains the client's default endpoint for accepting connections. This can be used by a server in the MultiplexProtocol
case to identify the client.
For the StreamProtocol
, after this endpoint negotiation, the Messages
are sent over the output stream without any additional wrapping of the data. For the MultiplexProtocol
, the socket connection is used as the concrete connection for a multiplexed connection, as described in Section 10.6, "RMI's Multiplexing Protocol". Virtual connections initiated over this multiplexed connection consist of a series of Messages
as described below.
There are three types of output messages: Call
, Ping
and DgcAck
. A Call
encodes a method invocation. A Ping
is a transport-level message for testing liveness of a remote virtual machine. A DgcAck
is an acknowledgment directed to a server's distributed garbage collector that indicates that remote objects in a return value from a server have been received by the client.
Message:
Call
Ping
DgcAck
Call:
0x50 CallData
Ping:
0x52
DgcAck:
0x54 UniqueIdentifier
10.2.2 Format of an Input Stream
There are currently three types of input messages: ReturnData
, HttpReturn
and PingAck
. ReturnData
is the result of a "normal" RMI call. An HttpReturn
is a return result from an invocation embedded in the HTTP protocol. A PingAck
is the acknowledgment for a Ping
message.
In:
ProtocolAck Returns
ProtocolNotSupported
HttpReturn
ProtocolAck:
0x4e
ProtocolNotSupported:
0x4f
Returns:
Return
Returns Return
Return:
ReturnData
PingAck
ReturnData:
0x51 ReturnValue
PingAck:
0x53
10.3 RMI's Use of Object Serialization
Call and return data in RMI calls are formatted using the Java Object Serialization protocol. Each method invocation's CallData
is written to a Java object output stream that contains the ObjectIdentifier
(the target of the call), an Operation
(a number representing the method to be invoked), a Hash
(a number that verifies that client stub and remote object skeleton use the same stub protocol), followed by a list of zero or more Arguments
for the call.
In the JDK1.1 stub protocol the Operation
represents the method number as assigned by rmic,
and the Hash
was the stub/skeleton hash which is the stub's interface hash. As of the Java 2 stub protocol (Java 2 stubs are generated using the -v1.2
option with rmic
), Operation
has the value -1 and the Hash
is a hash representing the method to call. The hash is described in the section "The RemoteRef
Interface".
CallData:
ObjectIdentifier Operation Hash Arguments[opt]
ObjectIdentifier:
ObjectNumber UniqueIdentifier
UniqueIdentifier:
Number Time Count
Arguments:
Value
Arguments Value
Value:
Object
Primitive
A ReturnValue
of an RMI call consists of a return code to indicate either a normal or exceptional return, a UniqueIdentifier
to tag the return value (used to send a DGCAck
if necessary) followed by the return result: either the Value
returned or the Exception
thrown.
ReturnValue:
0x01 UniqueIdentifier Value[opt]
0x02 UniqueIdentifier Exception
Note: ObjectIdentifier
, UniqueIdentifier,
and EndpointIdentifier
are not written out using default serialization, but each uses its own special write
method (this is not the writeObject
method used by object serialization); the write
method for each type of identifier adds its component data consecutively to the output stream.
10.3.1 Class Annotation and Class Loading
RMI overrides the annotateClass
and resolveClass
methods of ObjectOutputStream
and ObjectInputStream
respectively. Each class is annotated with the codebase URL (the location from which the class can be loaded). In the annotateClass
method, the classloader that loaded the class is queried for its codebase URL. If the classloader is non-null
and the classloader has a non-null
codebase, then the codebase is written to the stream using the ObjectOutputStream.writeObject
method; otherwise a null
is written to the stream using the writeObject
method. Note: as an optimization, classes in the "java
" package are not annotated, since they are always available to the receiver.
The class annotation is resolved during deserialization using the ObjectInputStream.resolveClass
method. The resolveClass
method first reads the annotation via the ObjectInputStream.readObject
method. If the annotation, a codebase URL, is non-null
, then it obtains the classloader for that URL and attempts to load the class. The class is loaded by using a java.net.URLConnection
to fetch the class bytes (the same mechanism used by a web browser's applet classloader).
10.4 RMI's Use of HTTP POST Protocol
The implementation of RMI calls through firewalls via proxies has been removed as of JDK 9.
10.5 Application-Specific Values for RMI
This table lists the nonterminal symbols that represent application-specific values used by RMI. The table maps each symbol to its respective type. Each is formatted using the protocol in which it is embedded.
Symbol | type |
---|---|
Count |
short |
Exception |
java.lang.Exception |
Hash |
long |
Hostname |
UTF |
Number |
int |
Object |
java.lang.Object |
ObjectNumber |
long |
Operation |
int |
PortNumber |
int |
Primitive |
byte , int , short , long ... |
Time |
long |
10.6 RMI's Multiplexing Protocol
The purpose of multiplexing is to provide a model where two endpoints can each open multiple full duplex connections to the other endpoint in an environment where only one of the endpoints is able to open such a bidirectional connection using some other facility (e.g., a TCP connection). RMI use this simple multiplexing protocol to allow a client to connect to an RMI server object in some situations where that is otherwise not possible. For example, some security managers for applet environments disallow the creation of server sockets to listen for incoming connections, thereby preventing such applets to export RMI objects and service remote calls from direct socket connections. If the applet can open a normal socket connection to its codebase host, however, then it can use the multiplexing protocol over that connection to allow the codebase host to invoke methods on RMI objects exported by the applet. This section describes how the format and rules of the multiplexing protocol.
10.6.1 Definitions
This sections defines some terms as they are used in the rest of the description of the protocol.
An endpoint is one of the two users of a connection using the multiplexing protocol.
The multiplexing protocol must layer on top of one existing bidirectional, reliable byte stream, presumably initiated by one of the endpoints to the other. In current RMI usage, this is always a TCP connection, made with a java.net.Socket
object. This connection will be referred to as the concrete connection.
The multiplexing protocol facilitates the use of virtual connections, which are themselves bidirectional, reliable byte streams, representing a particular session between two endpoints. The set of virtual connections between two endpoints over a single concrete connection comprises a multiplexed connection. Using the multiplexing protocol, virtual connections can be opened and closed by either endpoint. The state of an virtual connection with respect to a given endpoint is defined by the elements of the multiplexing protocol that are sent and received over the concrete connection. Such state involves if the connection is open or closed, the actual data that has been transmitted across, and the related flow control mechanisms. If not otherwise qualified, the term connection used in the remainder of this section means virtual connection.
A virtual connections within a given multiplexed connection is identified by a 16 bit integer, known as the connection identifier. Thus, there exist 65,536 possible virtual connections in one multiplexed connection. The implementation may limit the number of these virtual connections that may be used simultaneously.
10.6.2 Connection State and Flow Control
Connections are manipulated using the various operations defined by the multiplexing protocol. The following are the names of the operations defined by the protocol: OPEN, CLOSE, CLOSEACK, REQUEST, and TRANSMIT. The exact format and rules for all the operations are detailed in Section 10.6.3, "Protocol Format".
The OPEN, CLOSE, and CLOSEACK operations control connections becoming opened and closed, while the REQUEST and TRANSMIT operations are used to transmit data across an open connection within the constraints of the flow control mechanism.
Connection States
A virtual connection is open with respect to a particular endpoint if the endpoint has sent an OPEN operation for that connection, or it has received an OPEN operation for that connection (and it had not been subsequently closed). The various protocol operations are described below.
A virtual connection is pending close with respect to a particular endpoint if the endpoint has sent a CLOSE operation for that connection, but it has not yet received a subsequent CLOSE or CLOSEACK operation for that connection.
A virtual connection is closed with respect to a particular endpoint if it has never been opened, or if it has received a CLOSE or a CLOSEACK operation for that connection (and it has not been subsequently opened).
Flow Control
The multiplexing protocol uses a simple packeting flow control mechanism to allow multiple virtual connections to exist in parallel over the same concrete connection. The high level requirement of the flow control mechanism is that the state of all virtual connections is independent; the state of one connection may not affect the behavior of others. For example, if the data buffers handling data coming in from one connection become full, this cannot prevent the transmission and processing of data for any other connection. This is necessary if the continuation of one connection is dependent on the completion of the use of another connection, such as would happen with recursive RMI calls. Therefore, the practical implication is that the implementation must always be able to consume and process all of the multiplexing protocol data ready for input on the concrete connection (assuming that it conforms to this specification).
Each endpoint has two state values associated with each connection: how many bytes of data the endpoint has requested but not received (input request count) and how many bytes the other endpoint has requested but have not been supplied by this endpoint (output request count).
An endpoint's output request count is increased when it receives a REQUEST operation from the other endpoint, and it is decreased when it sends a TRANSMIT operation. An endpoint's input request count is increased when it sends a REQUEST operation, and it is decreased when it receives a TRANSMIT operation. It is a protocol violation if either of these values becomes negative.
It is a protocol violation for an endpoint to send a REQUEST operation that would increase its input request count to more bytes that it can currently handle without blocking. It should, however, make sure that its input request count is greater than zero if the user of the connection is waiting to read data.
It is a protocol violation for an endpoint to send a TRANSMIT operation containing more bytes that its output request count. It may buffer outgoing data until the user of the connection requests that data written to the connection be explicitly flushed. If data must be sent over the connection, however, by either an explicit flush or because the implementation's output buffers are full, then the user of the connection may be blocked until sufficient TRANSMIT operations can proceed.
Beyond the rules outlined above, implementations are free to send REQUEST and TRANSMIT operations as deemed appropriate. For example, an endpoint may request more data for a connection even if its input buffer is not empty.
10.6.3 Protocol Format
The byte stream format of the multiplexing protocol consists of a contiguous series of variable length records. The first byte of the record is an operation code that identifies the operation of the record and determines the format of the rest of its content. The following legal operation codes are defined:
value | name |
---|---|
0xE1 | OPEN |
0xE2 | CLOSE |
0xE3 | CLOSEACK |
0xE4 | REQUEST |
0xE5 | TRANSMIT |
It is a protocol violation if the first byte of a record is not one of the defined operation codes. The following sections describe the format of the records for each operation code.
OPEN operation
This is the format for records of the OPEN operation:
size (bytes) | name | description |
---|---|---|
1 | opcode | operation code (OPEN) |
2 | ID | connection identifier |
An endpoint sends an OPEN operation to open the indicated connection. It is a protocol violation if ID refers to a connection that is currently open or pending close with respect to the sending endpoint. After the connection is opened, both input and request count states for the connection are zero for both endpoints.
Receipt of an OPEN operation indicates that the other endpoint is opening the indicated connection. After the connection is opened, both input and output request count states for the connection are zero for both endpoints.
To prevent identifier collisions between the two endpoints, the space of valid connection identifiers is divided in half, depending on the value of the most significant bit. Each endpoint is only allowed to open connections with a particular value for the high bit. The endpoint that initiated the concrete connection must only open connections with the high bit set in the identifier and the other endpoint must only open connections with a zero in the high bit. For example, if an RMI applet that cannot create a server socket initiates a multiplexed connection to its codebase host, the applet may open virtual connections in the identifier range 0x8000-7FFF, and the server may open virtual connection in the identifier range 0-0x7FFF.
CLOSE operation
This is the format for records of the CLOSE operation:
size (bytes) | name | description |
---|---|---|
1 | opcode | operation code (OPEN) |
2 | ID | connection identifier |
An endpoint sends a CLOSE operation to close the indicated connection. It is a protocol violation if ID refers to a connection that is currently closed or pending close with respect to the sending endpoint (it may be pending close with respect to the receiving endpoint if it has also sent a CLOSE operation for this connection). After sending the CLOSE, the connection becomes pending close for the sending endpoint. Thus, it may not reopen the connection until it has received a CLOSE or a CLOSEACK for it from the other endpoint.
Receipt of a CLOSE operation indicates that the other endpoint has closed the indicated connection, and it thus becomes closed on the receiving endpoint. Although the receiving endpoint may not send any more operations for this connection (until it is opened again), it still should provide data in the implementation's input buffers to readers of the connection. If the connection had previously been open instead of pending close, the receiving endpoint must respond with a CLOSEACK operation for the connection.
CLOSEACK operation
The following is the format for records with the CLOSEACK operation:
size (bytes) | name | description |
---|---|---|
1 | opcode | operation code (OPEN) |
2 | ID | connection identifier |
An endpoint sends a CLOSEACK operation to acknowledge a CLOSE operation from the receiving endpoint. It is a protocol violation if ID refers to a connection that is not pending close for the receiving endpoint when the operation is received.
Receipt of a CLOSEACK operation changes the state of the indicated connection from pending close to closed, and thus the connection may be reopened in the future.
REQUEST operation
This is the format for records of the REQUEST operation:
size (bytes) | name | description |
---|---|---|
1 | opcode | operation code (OPEN) |
2 | ID | connection identifier |
4 | count | number of additional bytes requested |
An endpoint sends a REQUEST operation to increase its input request count for the indicated connection. It is a protocol violation if ID does not refer to a connection that is open with respect to the sending endpoint. The endpoint's input request count is incremented by the value count. The value of count is a signed 32 bit integer, and it is a protocol violation if it is negative or zero.
Receipt of a REQUEST operation causes the output request count for the indicated connection to increase by count. If the connection is pending close by the receiving endpoint, then any REQUEST operations may be ignored.
TRANSMIT operation
This is the format for records of the TRANSMIT operation.
size (bytes) | name | description |
---|---|---|
1 | opcode | operation code (OPEN) |
2 | ID | connection identifier |
4 | count | number of bytes in transmission |
count | data | transmission data |
An endpoint sends a TRANSMIT operation to actually transmit data over the indicated connection. It is a protocol violation if ID does not refer to a connection that is open with respect to the sending endpoint. The endpoint's output request count is decremented by the value count. The value of count is a signed 32 bit integer, and it is a protocol violation if it is negative or zero. It is also a protocol violation if the TRANSMIT operation would cause the sending endpoint's output request count to become negative.
Receipt of a TRANSMIT operation causes the count bytes of data to be added to the queue of bytes available for reading from the connection. The receiving endpoint's input request count is decremented by count. If this causes the input request count to become zero and the user of the connection is trying to read more data, the endpoint should respond with another REQUEST operation. If the connection is pending close by the receiving endpoint, then any TRANSMIT operations may be ignored.
Protocol Violations
If a protocol violation occurs, as defined above or if a communication error is detected in the concrete connection, then the multiplexed connection is shut down. The real connection is terminated, and all virtual connections become closed immediately. Data already available for reading from virtual connections may be read by the users of the connections.