Here is a pictorial view of the Thrift network stack:
+-------------------------------------------+
| Server                                    |
| (single-threaded, event-driven etc)       |
+-------------------------------------------+
| Processor                                 |
| (compiler generated)                      |
+-------------------------------------------+
| Protocol                                  |
| (JSON, compact etc)                       |
+-------------------------------------------+
| Transport                                 |
| (raw TCP, HTTP etc)                       |
+-------------------------------------------+
The Transport layer provides a simple abstraction for reading/writing from/to the network. This enables Thrift to decouple the underlying transport from the rest of the system (serialization/deserialization, for instance).
Here are some of the methods exposed by the Transport interface:
- open
- close
- read
- write
- flush
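As a rough illustration of the methods listed above, here is a minimal in-memory transport in Python. The class name `MemoryTransport` and its buffering behaviour are assumptions made for this sketch; it does not reproduce any class from the Thrift library.

```python
import io


class MemoryTransport:
    """Hypothetical in-memory transport exposing open/close/read/write/flush."""

    def __init__(self, incoming=b""):
        self._incoming = io.BytesIO(incoming)  # bytes available to read()
        self._pending = bytearray()            # written but not yet flushed
        self.sent = bytearray()                # bytes "on the wire" after flush()
        self._is_open = False

    def open(self):
        self._is_open = True

    def close(self):
        self._is_open = False

    def read(self, size):
        self._require_open()
        return self._incoming.read(size)

    def write(self, data):
        self._require_open()
        self._pending.extend(data)

    def flush(self):
        self._require_open()
        self.sent.extend(self._pending)
        self._pending.clear()

    def _require_open(self):
        if not self._is_open:
            raise IOError("transport is not open")


transport = MemoryTransport(incoming=b"ping")
transport.open()
request = transport.read(4)
transport.write(b"pong")
before_flush = bytes(transport.sent)   # nothing is sent until flush()
transport.flush()
after_flush = bytes(transport.sent)
transport.close()
```

Because the rest of the stack only needs these five methods, the in-memory buffer could be swapped for a socket or a file without touching the serialization code.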
In addition to the Transport interface above, Thrift also uses a ServerTransport interface to accept or create primitive transport objects. As the name suggests, ServerTransport is used mainly on the server side to create new Transport objects for incoming connections. It exposes the following methods:
- open
- listen
- accept
- close
Here are some of the transports available for the majority of the Thrift-supported languages:
- file: reads from and writes to a file on disk
- http: sends and receives data over HTTP
The Protocol abstraction defines a mechanism to map in-memory data structures to a wire-format. In other words, a protocol specifies how datatypes use the underlying Transport to encode/decode themselves. Thus the protocol implementation governs the encoding scheme and is responsible for (de)serialization. Some examples of protocols in this sense include JSON, XML, plain text, compact binary etc.
Here is the Protocol interface:
writeMessageBegin(name, type, seq)
writeMessageEnd()
writeStructBegin(name)
writeStructEnd()
writeFieldBegin(name, type, id)
writeFieldEnd()
writeFieldStop()
writeMapBegin(ktype, vtype, size)
writeMapEnd()
writeListBegin(etype, size)
writeListEnd()
writeSetBegin(etype, size)
writeSetEnd()
writeBool(bool)
writeByte(byte)
writeI16(i16)
writeI32(i32)
writeI64(i64)
writeDouble(double)
writeString(string)
name, type, seq = readMessageBegin()
readMessageEnd()
name = readStructBegin()
readStructEnd()
name, type, id = readFieldBegin()
readFieldEnd()
ktype, vtype, size = readMapBegin()
readMapEnd()
etype, size = readListBegin()
readListEnd()
etype, size = readSetBegin()
readSetEnd()
bool = readBool()
byte = readByte()
i16 = readI16()
i32 = readI32()
i64 = readI64()
double = readDouble()
string = readString()
Thrift protocols are stream-oriented by design. There is no need for any explicit framing. For instance, it is not necessary to know the length of a string or the number of items in a list before we start serializing them.
Here are some of the protocols available for the majority of the Thrift-supported languages:
- binary: a fairly simple binary encoding in which the length and type of a field are encoded as bytes, followed by the actual value of the field.
- compact: described in THRIFT-110
- json:
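To make the binary encoding described above more concrete, the following Python sketch writes and reads two fields by hand. The layout used here (a 1-byte type, a 2-byte field id, big-endian integers, strings prefixed with a 4-byte length, and a zero byte as the stop marker) reflects my understanding of Thrift's binary protocol and should be treated as an assumption, as should the field ids chosen for `userId` and `userName`.

```python
import struct

# Type identifiers used in this sketch (assumed to match Thrift's binary protocol).
T_STOP, T_I32, T_STRING = 0, 8, 11


def write_field(buf, ftype, fid, value):
    """Append one field: 1-byte type, 2-byte field id, then the value."""
    buf.extend(struct.pack("!bh", ftype, fid))
    if ftype == T_I32:
        buf.extend(struct.pack("!i", value))
    elif ftype == T_STRING:
        data = value.encode("utf-8")
        buf.extend(struct.pack("!i", len(data)))  # length prefix
        buf.extend(data)                          # then the raw bytes
    else:
        raise ValueError("unsupported type in this sketch")


def read_fields(data):
    """Decode fields until the stop marker; returns {field id: value}."""
    fields, pos = {}, 0
    while True:
        ftype = data[pos]
        pos += 1
        if ftype == T_STOP:
            return fields
        (fid,) = struct.unpack_from("!h", data, pos)
        pos += 2
        if ftype == T_I32:
            (value,) = struct.unpack_from("!i", data, pos)
            pos += 4
        elif ftype == T_STRING:
            (length,) = struct.unpack_from("!i", data, pos)
            pos += 4
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("unsupported type in this sketch")
        fields[fid] = value


buf = bytearray()
write_field(buf, T_I32, 1, 42)          # e.g. a userId field
write_field(buf, T_STRING, 2, "alice")  # e.g. a userName field
buf.append(T_STOP)                      # plays the role of writeFieldStop()
decoded = read_fields(bytes(buf))
```

The reader never needs to know the field names, only the ids and types, which is why the field header carries exactly those two pieces of information.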
A Processor encapsulates the ability to read data from input streams and write to output streams. The input and output streams are represented by Protocol objects. The Processor interface is extremely simple:
interface TProcessor {
bool process(TProtocol in, TProtocol out) throws TException
}
Service-specific processor implementations are generated by the compiler. The Processor essentially reads data from the wire (using the input protocol), delegates processing to the handler (implemented by the user) and writes the response over the wire (using the output protocol).
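The read, delegate, write cycle described above can be sketched in Python as follows. Everything here is hand-written and hypothetical: `QueueProtocol` is a toy stand-in for a Protocol, and `EchoHandler`, `EchoProcessor` and the `shout` method are invented for the example rather than produced by the Thrift compiler.

```python
from collections import deque


class QueueProtocol:
    """Hypothetical stand-in for a Protocol: values travel through a queue."""

    def __init__(self):
        self.items = deque()

    def write(self, value):
        self.items.append(value)

    def read(self):
        return self.items.popleft()


class EchoHandler:
    """User-written handler; the method name is an assumption for this sketch."""

    def shout(self, text):
        return text.upper()


class EchoProcessor:
    """Hand-written imitation of what a generated processor does."""

    def __init__(self, handler):
        self._handler = handler

    def process(self, iprot, oprot):
        name, seqid = iprot.read(), iprot.read()   # similar to readMessageBegin()
        args = iprot.read()                        # the call arguments
        method = getattr(self._handler, name, None)
        if method is None:
            oprot.write(("exception", seqid, "unknown method " + name))
            return False
        oprot.write(("reply", seqid, method(*args)))  # delegate, then respond
        return True


iprot, oprot = QueueProtocol(), QueueProtocol()
for item in ("shout", 7, ("hello",)):
    iprot.write(item)
ok = EchoProcessor(EchoHandler()).process(iprot, oprot)
reply = oprot.read()
```

Note that the handler contains only business logic; all knowledge of message names and sequence ids lives in the processor.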
A Server pulls together all of the various features described above:
- Create a transport
- Create input/output protocols for the transport
- Create a processor based on the input/output protocols
- Wait for incoming connections and hand them off to the processor
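The four steps above can be sketched as a single-threaded loop in Python. All class names below (`SimpleServer`, `FakeConnection`, `FakeServerTransport`, `PassthroughProtocol`, `UppercaseProcessor`) are hypothetical stand-ins written for this example, and returning `None` from `accept()` to stop the loop is a convenience of the sketch, not Thrift behaviour.

```python
class FakeConnection:
    """Stand-in for a per-client Transport."""

    def __init__(self, request):
        self.request, self.response, self.closed = request, None, False

    def close(self):
        self.closed = True


class FakeServerTransport:
    """Stand-in for a ServerTransport that hands out queued connections."""

    def __init__(self, connections):
        self._pending = list(connections)
        self.listening = False

    def listen(self):
        self.listening = True

    def accept(self):
        return self._pending.pop(0) if self._pending else None

    def close(self):
        self.listening = False


class PassthroughProtocol:
    """Stand-in Protocol: no encoding, it just forwards Python objects."""

    def __init__(self, transport):
        self._transport = transport

    def read(self):
        return self._transport.request

    def write(self, value):
        self._transport.response = value


class UppercaseProcessor:
    """Stand-in for a compiler-generated processor."""

    def process(self, iprot, oprot):
        oprot.write(iprot.read().upper())
        return True


class SimpleServer:
    """Single-threaded server following the four steps listed above."""

    def __init__(self, server_transport, protocol_factory, processor):
        self._server_transport = server_transport
        self._protocol_factory = protocol_factory
        self._processor = processor

    def serve(self):
        self._server_transport.listen()
        while True:
            client = self._server_transport.accept()   # wait for a connection
            if client is None:                         # sketch-only stop signal
                break
            iprot = self._protocol_factory(client)     # input protocol
            oprot = self._protocol_factory(client)     # output protocol
            self._processor.process(iprot, oprot)      # hand off to the processor
            client.close()
        self._server_transport.close()


connections = [FakeConnection("hello"), FakeConnection("thrift")]
server = SimpleServer(FakeServerTransport(connections),
                      PassthroughProtocol, UppercaseProcessor())
server.serve()
responses = [c.response for c in connections]
```

Since the server only depends on the three collaborators passed to its constructor, a threaded or event-driven variant would change `serve()` while leaving the transport, protocol and processor untouched.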
Next we discuss the generated code for specific languages. Unless mentioned otherwise, the sections below will assume the following Thrift specification:
link:example.thrift[role=include]
In an earlier section, we saw how Thrift allows structs to contain other structs (no nested definitions yet, though!). In most object-oriented and/or dynamic languages, structs map to objects, so it is instructive to understand how Thrift initializes nested structs. One reasonable approach would be to treat the nested structs as pointers or references and initialize them with NULL until explicitly set by the user.
Unfortunately, for many languages, Thrift uses a 'pass by value' model. As a concrete example, consider the generated C++ code for the Tweet struct in our example above:
...
int32_t userId;
std::string userName;
std::string text;
Location loc;
TweetType::type tweetType;
std::string language;
...
As you can see, the nested Location structure is fully allocated inline. Because Location is optional, the code uses the internal '__isset' flags to determine if the field has actually been "set" by the user.
This can lead to some surprising and unintuitive behavior:
- Since the full size of every sub-structure may be allocated at initialization in some languages, memory usage may be higher than you expect, especially for complicated structures with many unset fields.
- The parameters and return types for service methods may not be "optional" and you can’t assign or return null in any dynamic language. Thus to return a "no value" result from a method, you must declare an envelope structure with an optional field containing the value and then return the envelope with that field unset.
- The transport layer can, however, marshal method calls from older versions of a service definition with missing parameters. Thus, if the original service contained a method postTweet(1: Tweet tweet) and a later version changes it to postTweet(1: Tweet tweet, 2: string group), then an older client invoking the previous method will result in a newer server receiving the call with the new parameter unset. If the new server is in Java, for instance, you may in fact receive a null value for the new parameter. And yet you may not declare a parameter to be nullable within the IDL.
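The envelope pattern and the older-client case from the last two points can be sketched in plain Python. The names `TweetResult`, `find_tweet`, `post_tweet` and the sample data are hypothetical and hand-written for illustration; they are not generated code, and the default value for `group` only imitates how an unset parameter might reach a newer handler.

```python
class TweetResult:
    """Hypothetical envelope struct with a single optional field."""

    def __init__(self, tweet=None):
        self.tweet = tweet            # left as None means "no value"


TWEETS = {1: "hello world"}           # made-up data for the example


def find_tweet(tweet_id):
    """Always returns an envelope, never a bare None."""
    return TweetResult(TWEETS.get(tweet_id))


def post_tweet(tweet, group=None):
    """Newer handler: 'group' was added later, so older clients omit it."""
    return "public" if group is None else group


found = find_tweet(1).tweet           # envelope with the field set
missing = find_tweet(2).tweet         # envelope with the field unset
old_call = post_tweet("hi")           # call shape of an older client
new_call = post_tweet("hi", "friends")
```

A handler written this way has to decide explicitly what an unset `group` means, which is the defensive habit the last point argues for.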