ninjudd edited this page Nov 22, 2011 · 19 revisions

nREPL has been around for a bit over a year now, with moderate success in attaining its objective of providing a tool-agnostic networked Clojure REPL implementation. Current events are such that now may be a good time to apply some of the lessons learned over that period to maximize nREPL’s applicability and reach.

Warning

Apologies

This process will result in breakage. For this, I apologize, especially to those who have implemented nREPL clients for other platforms/languages:

Martin, Meikel, and anyone else out there who has taken it upon themselves to write an nREPL client (get in touch if you have!): I hope you will come to support the changes in nREPL.Next, and I welcome your input in particular as design discussions move along. If it is any consolation, many of the proposed changes are intended to maximize nREPL’s interoperability even further, including making it even easier to implement non-Clojure nREPL clients.

Background

Before jumping in, please take the time to:

  • read and understand the current nREPL implementation as documented in its README,

  • ideally, peruse the original nREPL design notes that drove the considerations leading to that implementation

  • check out this thread from the main Clojure ML where a variety of topics were addressed around the current wire protocol

Problems

Let’s be clear about what doesn’t work about nREPL before making changes to "fix" things.

Its protocol is structurally unsuitable for some tasks, and may be a barrier to client implementation

It has become clear that nREPL’s underlying protocol is no longer suitable.

Recent advancements (in particular, enabling "richer interactions") have exposed some practical shortcomings. Most painfully, the use of strings as the only value type in passed messages has led to unreasonable complexity and overhead when returning (and perhaps, in the future, sending) binary data, which necessitates applying base64-coding on both sides of a connection above and beyond the wire protocol itself.

A solution to this problem should allow for binary data to eventually be effectively streamed to (and from?) the nREPL client.

In addition, the original design process for nREPL largely punted on rigorously soliciting input on protocol design from those who had protocol-sensitive use cases and/or limited potential client implementation languages/environments. That is now clearly a very serious mistake. Specific concerns in this department include:

  • the current nREPL protocol is unnecessarily line-based, making implementations challenging in contexts where processing lines of text is not a well-supported abstraction

  • local APIs sometimes make working with fixed-length or defined-length messages easier and/or more efficient

Sessions are tied to connections

"Session" data (i.e. environment that is bound to key vars for the duration of client evaluations) is currently associated with a socket connection to the REPL server. This:

  • presents problems for various clients that cannot reasonably maintain an open connection (e.g. vimclojure, perhaps others)

  • makes it impossible (or unreasonably difficult, at least) to "clone" or "migrate" a known session to another connection (as with portal's fork command)

  • represents a likely (guaranteed?) point of complexity for building an nREPL endpoint for services that do not have socket-connection semantics (e.g. HTTP, STOMP, JMX)

Implementation is hopelessly tied up with sockets

There’s no reason why the core nREPL implementation and its evaluation semantics couldn’t be used with multiple transports. Sockets are likely the base case (certainly from a dependency standpoint), but that’s just a start. The current implementation provides no point of abstraction for implementing or using "alternative" transports. These might reasonably include:

  • HTTP

  • JMX

  • STOMP and other mq-related formats/protocols

Emacs/SLIME/swank

The majority of Clojure programmers use Emacs with SLIME and swank, and they are currently locked out of any environment that uses nREPL, just as anyone who uses nREPL-based tooling (Eclipse + Counterclockwise, vimclojure, jark…) is locked out of any environment that uses swank. If nREPL is going to fulfill its objective of providing a common protocol for REPL interoperability across tooling environments, we must find some way to bridge these two worlds.

(The practical upshot is significant: perhaps someday we can dispose of the parochialism of e.g. lein-swank, lein-nrepl, and the command-line-only lein repl. What if all of our tools (including a command-line client) could talk to a service started by a unified lein repl command and invocation?)

I’m open to all options for bridging this divide, and look forward to hearing from the swank and SLIME/elisp wizards among us. That said, technical or nontechnical factors surrounding the SLIME/swank codebase and development process may prove to be insurmountable at this time.

Unknowns

ClojureScript

Where does ClojureScript’s browser-repl fit in? The execution model of browser-repl is fundamentally different from anything in Clojure-land AFAICT (using polling of the cljs repl to get forms to execute browser-side, if I’m understanding things properly), so perhaps ne’er the twain shall meet. It is notable that portal does have a ClojureScript client, which we plan to port to use nREPL.Next.

Versioning

The question of versioning was punted when originally designing nREPL and its protocol. Now that we’re considering a revision cycle, and existing clients will likely just break hard with nREPL.Next and with no sensible indication why, providing a sane versioning mechanism is a priority (if only to allow clients to provide some kind of useful guidance to users). Pointers to known-good approaches and recommended practices are most welcome.

*out* / *err*

Given Clojure 1.3+ and its binding conveyance, content sent to *out* and *err* is properly returned to the nREPL client even after a sent expression is evaluated, as long as that content is being sent from an agent or future (as opposed to a bare JVM Thread, which, without intervention, will just dump data to System/out and System/err).

Is this sufficient? Other REPL implementations (Cake) take pains to multiplex System/out and System/err so that clients are delivered content sent to those writers, regardless of its source. Even given binding conveyance, being able to receive that content is useful, especially when the REPL is deployed remotely: without a way to subscribe to these streams, one must log in some other way to view e.g. log files.

Here is a link to how Cake supported multi-outstreams for reference. A proxied BufferedOutputStream is created around the outs and errs vars so that rebinding these will cause System/out and System/err to go to a different location for the current thread. This code could be pulled up into Clojure itself to provide multiplexing for all Clojure programs.
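To make the proxy idea concrete, here is a rough sketch of per-thread output routing. It is written in Python rather than Clojure and is not Cake's code; `MultiplexedStream` and `bound_to` are names invented for this illustration. A single stream object stands in for the process-wide output, and each thread can temporarily point its own writes somewhere else.

```python
import contextlib
import io
import threading


class MultiplexedStream:
    """Sends each write to the calling thread's target, or to a default."""

    def __init__(self, default):
        self._default = default
        self._local = threading.local()  # holds one 'target' per thread

    def write(self, text):
        target = getattr(self._local, "target", None)
        if target is None:
            target = self._default
        return target.write(text)

    @contextlib.contextmanager
    def bound_to(self, target):
        """Redirect this thread's writes to target for the duration of a block."""
        previous = getattr(self._local, "target", None)
        self._local.target = target
        try:
            yield target
        finally:
            self._local.target = previous


process_out = io.StringIO()   # stands in for System/out
client_out = io.StringIO()    # stands in for one client's output channel
stream = MultiplexedStream(process_out)


def evaluation():
    with stream.bound_to(client_out):
        print("from the evaluation thread", file=stream)


worker = threading.Thread(target=evaluation)
worker.start()
worker.join()
print("from an unrelated thread", file=stream)
```

Writes made while a thread holds a binding land in that client's buffer; everything else falls through to the default. Whether threads spawned during an evaluation should inherit the binding is a separate design question that this sketch does not address.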

*in*

nREPL provides very limited means for getting stdin data to a context (messages may provide that data in an :in slot in a request, but there is no way for the remote side to request it). Swank provides a model where any attempt to read from *in* prompts the connected tool to supply data for stdin in a later message. nREPL can certainly duplicate this, but is it sufficient? It may not be a general solution (e.g. can’t read just a single character off of stdin, only lines at a time?).

Portal provides a command for sending stdin separately from the eval command, but it does suffer from the problem of not knowing when the remote code is trying to read from *in*. A mechanism like the one swank uses would solve this problem.
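A minimal sketch of the swank-style model, in Python and under stated assumptions: the `need-input` status name, the `RemoteStdin` class and the `notify` callback are all hypothetical, chosen only to show the shape of the interaction. A read that finds no buffered input first tells the client that input is wanted, then blocks until a later message delivers it.

```python
import queue


class RemoteStdin:
    """Line-oriented stdin whose reads can ask the client for more input."""

    def __init__(self, notify):
        self._notify = notify        # callable that sends a message to the client
        self._lines = queue.Queue()

    def feed(self, text):
        """Called when a client message delivers stdin content."""
        for line in text.splitlines(keepends=True):
            self._lines.put(line)

    def readline(self):
        if self._lines.empty():
            # Nothing buffered: signal the client before blocking.
            self._notify({"status": "need-input"})
        return self._lines.get()


sent = []


def fake_client(message):
    """Stands in for a tool that answers the prompt immediately."""
    sent.append(message)
    stdin.feed("hello\n")


stdin = RemoteStdin(fake_client)
line = stdin.readline()
```

As the text notes, this only hands out whole lines; reading a single character would need a different buffering strategy.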

Unknown unknowns

Take this opportunity to address any further lingering issues that prevent nREPL from being the canonical network REPL implementation for Clojure tooling and applications.

Strawman proposal

The below is an initial proposal that is the result of discussions with users, tool builders, nREPL client implementors, and implementors of other Clojure network REPLs, but it is provided fundamentally to motivate discussion, enhancements, and/or full counter-proposals. Tear it apart.

Retain the "good parts" of nREPL as it sits today

nREPL 0.0.x got a lot right:

  • asynchronous evaluation model

  • message-based protocol

  • generally easy to implement clients for

  • simple model: every message just evaluates code. No privileged "commands"; "meta" operations (impacting the REPL server or session itself) are performed by evaluating code that touches well-defined REPL server APIs.

    • jlb: I agree with the spirit of this point, but only having a single command seems overly limiting to me. For example, how can we send stdin asynchronously with only one command? Or fork and close sessions? The number of commands should be as small as possible, but I don’t think there’s a good reason to keep it to just one. Portal, as an example, has four commands: eval, stdin, fork and close.

  • generally assumes nothing about its usage and context

    • e.g. can be used interactively as well as for supporting tooling

  • functionally dependency-free for the base case of socket-based transport

  • etc.

More of that, please. The remainder of this strawman is essentially a diff with the existing nREPL "spec" and implementation as a baseline.

New wire protocol: switch to bytestrings, retain (most) message semantics

The current nREPL protocol is fundamentally textual, and requires escaping of strings and costly encoding of binary data. An example request message:

2
"id"
"foo"
"code"
"(println 5)"

…which corresponds to this Clojure map:

{:id "foo" :code "(println 5)"}

Using netstrings (originally suggested by James Reeves) but retaining the fundamental structure of nREPL messages would lead to this transliteration (linebreaks added for clarity, and should not be taken to add to the byte count as indicated by the netstring header integers):

33:
2:id,
3:foo,
4:code,
11:(println 5),

Put more formally, each message would be expressed as one netstring that consists of 2n netstrings, where n is the number of key/value pairs that are to be found in that message. Each key and value is provided as a separate netstring. The "outer" netstring provides the cumulative size of the message, allowing one to allocate buffers as necessary in environments where that is helpful.
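The framing rule above is small enough to sketch in a few lines of Python. The function names are made up for this illustration. Lengths are counted in bytes after UTF-8 encoding, and the sketch emits the closing comma of the outer netstring as the netstring format requires, even though the line-broken sample does not show it.

```python
def netstring(payload: bytes) -> bytes:
    """Frame payload as <length>:<payload>, where length counts bytes."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def encode_message(pairs) -> bytes:
    """Encode (key, value) string pairs as one netstring of 2n netstrings."""
    body = b"".join(
        netstring(part.encode("utf-8"))
        for key, value in pairs
        for part in (key, value)
    )
    return netstring(body)


message = encode_message([("id", "foo"), ("code", "(println 5)")])
print(message)
```

Because the outer length is known before any key or value is read, a client can size its read buffer once per message.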

Note

It has been suggested that message-size prefixes be padded to a fixed length (e.g. 0000043 instead of 43). This is in conflict with the specification of netstrings (and bencode, discussed later). Is there any value in adopting such a fixed-length prefix?

Beyond this, the existing semantics specified by nREPL (see the protocol discussion in the README) should be retained. In particular, a recent addition there allows for the description of sequences based on values of repeated keys in messages, so this:

66:
2:id,
3:foo,
4:code,
11:(println 5),
7:accepts,
3:png,
7:accepts,
4:jpeg,

Would correspond to this Clojure map containing a vector for the :accepts entry:

{:id "foo"
 :code "(println 5)"
 :accepts ["png", "jpeg"]}
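A hedged sketch of the receiving side, again in Python with invented function names: it splits the outer netstring into its inner netstrings, pairs them up as keys and values, and turns a repeated key into a list, which is the Python analogue of the vector shown above.

```python
def read_netstrings(data: bytes):
    """Split a byte string into the payloads of consecutive netstrings."""
    payloads = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start, end = colon + 1, colon + 1 + length
        if data[end:end + 1] != b",":
            raise ValueError("netstring payload must be followed by ','")
        payloads.append(data[start:end])
        pos = end + 1
    return payloads


def decode_message(wire: bytes) -> dict:
    """Decode one message; a key seen more than once becomes a list."""
    (body,) = read_netstrings(wire)
    parts = [p.decode("utf-8") for p in read_netstrings(body)]
    message = {}
    for key, value in zip(parts[0::2], parts[1::2]):
        if key not in message:
            message[key] = value
        elif isinstance(message[key], list):
            message[key].append(value)
        else:
            message[key] = [message[key], value]
    return message


wire = b"66:2:id,3:foo,4:code,11:(println 5),7:accepts,3:png,7:accepts,4:jpeg,,"
decoded = decode_message(wire)
```

One thing the sketch makes visible: a key that appears only once decodes to a plain string, so under this convention a one-element sequence cannot be told apart from a scalar.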

The flexibility provided by a message-based protocol that allows for an open set of slots has proven very useful (e.g. when implementing the "rich interactions" previously mentioned). In general, striking a reasonable balance between representing Clojure-idiomatic data structures and a minimum of encoding overhead/gymnastics is desirable.

All message values must be UTF8-encoded byte ranges by default. Unencoded binary values must be indicated by including their keys in an unencoded slot that precedes them in the message so that each message’s content is self-describing outside of any particular environment, e.g. (again, linebreaks added only for clarity):

712:
2:id,
3:foo,
9:unencoded,
10:other-data,
9:unencoded,
13:overtone-data,
13:overtone-data,
588:...unencoded binary data...,
10:other-data,
18:...unencoded binary data...,
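A small Python sketch of how a decoder might honor the unencoded slot, assuming the key/value byte pairs have already been split out of the message (the function name is hypothetical). Because the slot precedes the values it describes, a single forward pass is enough.

```python
def apply_unencoded(pairs):
    """Decode values as UTF-8 unless their key was listed under 'unencoded'."""
    raw_keys = set()
    message = {}
    for key_bytes, value in pairs:
        key = key_bytes.decode("utf-8")
        if key == "unencoded":
            # Remember which later values must stay as raw bytes.
            raw_keys.add(value.decode("utf-8"))
        elif key in raw_keys:
            message[key] = value
        else:
            message[key] = value.decode("utf-8")
    return message


pairs = [
    (b"id", b"foo"),
    (b"unencoded", b"overtone-data"),
    (b"overtone-data", b"\xff\x00\x10"),
]
result = apply_unencoded(pairs)
```

Here `result["overtone-data"]` stays a byte string, including bytes that are not valid UTF-8, while `result["id"]` is decoded text.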

Note

The netstring-based protocol described here, especially given the semantics of sequences, suggests that using Bencode (of which netstrings are essentially a part) would be far preferable insofar as we would be able to express arbitrary compositions of maps and sequences/vectors. Necessary additions to Bencode would be:

  • implication of UTF-8 (we do not want to get into variable character encodings, just not worth it AFAICT),

  • …therefore, a continued requirement for an unencoded slot to specify values that should not be decoded as UTF-8

  • the addition of the prefixed cumulative message length (making the entire message a netstring) as discussed earlier so as to benefit those that need to allocate read buffers efficiently.

    • jlb: I don’t think this field is necessary. A bencode parser does not need to know the size of the entire message, and I can’t think of a scenario where allocating a buffer for the entire message would make parsing more efficient.

Here is the :accepts example from above in bencode format (with newlines added for readability):

d
2:id
3:foo
4:code
11:(println 5)
7:accepts
l3:png4:jpege
e
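For comparison, a minimal bencode encoder sketched in Python (not taken from any particular library). It applies the UTF-8 assumption proposed above to strings, and it sorts dictionary keys as the bencode format specifies, so the key order of its output differs from the hand-written sample.

```python
def bencode(value) -> bytes:
    """Encode bytes, str, int, list/tuple and dict values as bencode."""
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))  # assumption: strings are UTF-8
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = sorted(
            ((key.encode("utf-8"), item) for key, item in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


encoded = bencode({"id": "foo", "code": "(println 5)", "accepts": ["png", "jpeg"]})
print(encoded)
```

Lists nest directly, so the repeated-key convention is no longer needed to express a sequence.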

Note

I just came across tagged netstrings. Maybe something similar would also be possible, though with the type tag itself being a netstring (to allow more than one char). The type tag could correspond to the accepts tags. This would eliminate the :unencoded field, because the type tag would basically define that. Similar to Clojure, "jpeg" would be a global type tag and "my-prefix/datatype" would be a qualified one for custom data.

(Meikel, 20111118T12:18+0100)

The existing nREPL status of done should be eliminated, since it is impossible to know when an agent/future/etc spawned by a particular evaluation will send content to *out* or *err*.

Use agents to retain/manage "sessions" across connections

Currently, when a client disconnects from an nREPL server, its "session" (essentially, all dynamic scope associated with their REPL) disappears. Lifting that state up into an agent that client requests can return to across connections (and therefore on top of transports where connections are generally not persistent) is desirable. See portal.server for a sample implementation of this agent-based strategy.

This will require some implementation details re: holding those agents, identifying them using e.g. an opaque string ID, and allowing for clients to:

  • 1) Specify their "session" ID for any particular message (which defines the context/agent within which an evaluation will occur).

  • 2) Additionally specify a message ID (as nREPL requires now), so that responses can be paired back up with prior requests.

  • 3) (Optionally) "fork" an existing session, which would create a copy of the named existing session’s data for a new context/agent.

Without #3, asynchronous evaluations from a single client would not be possible under this model while maintaining user expectations re: the permanence of the state of the dynamic environment; i.e. using the same session ID for multiple long-lived evaluations will result in those evaluations happening serially through the same agent, and just using a different session ID for each evaluation will result in set! effects not propagating to later / concurrent evaluations.
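The three points above can be sketched roughly as follows. This is a Python analogy, not the portal.server implementation: a single-worker executor stands in for a Clojure agent, a plain dict stands in for the dynamic bindings, and every name here is invented for the example.

```python
import copy
import uuid
from concurrent.futures import ThreadPoolExecutor


class Session:
    def __init__(self, bindings):
        self.bindings = bindings
        # One worker per session: evaluations queue up and run in order,
        # much like actions sent to a single agent.
        self.executor = ThreadPoolExecutor(max_workers=1)


class SessionRegistry:
    def __init__(self):
        self._sessions = {}

    def create(self, bindings=None):
        session_id = uuid.uuid4().hex          # opaque string ID
        self._sessions[session_id] = Session(dict(bindings or {}))
        return session_id

    def fork(self, session_id):
        """Copy an existing session's state into a new, independent session."""
        parent = self._sessions[session_id]
        return self.create(copy.deepcopy(parent.bindings))

    def evaluate(self, session_id, message_id, fn):
        """Queue fn on the session; the message ID pairs the reply with the request."""
        session = self._sessions[session_id]
        return message_id, session.executor.submit(fn, session.bindings)

    def close(self, session_id):
        self._sessions.pop(session_id).executor.shutdown()


def switch_ns(bindings):
    bindings["ns"] = "my.app"   # stands in for a set!-style effect
    return bindings["ns"]


registry = SessionRegistry()
main = registry.create({"ns": "user"})
registry.evaluate(main, "msg-1", switch_ns)[1].result()
forked = registry.fork(main)
registry.evaluate(forked, "msg-2", lambda b: b.update(ns="scratch"))[1].result()
main_ns = registry.evaluate(main, "msg-3", lambda b: b["ns"])[1].result()
forked_ns = registry.evaluate(forked, "msg-4", lambda b: b["ns"])[1].result()
registry.close(main)
registry.close(forked)
```

The fork starts from the parent's state ("my.app") and then diverges, while two evaluations sent to the same session ID always run one after the other. The sketch ignores the question of forking while an evaluation is still mutating the parent.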

Warning

The error conditions associated with agents are significantly more complicated than nREPL’s current 1:1 relationship between connection and session, and deserve some consideration. In particular:

  • How do we handle aging and disposal of environments?

  • What does it look like to a client when they attempt to evaluate with an already-disposed environment?

  • How can we make "environment management" as simple as possible so that desirable interactive semantics (esp. re: asynchronous evaluations) are the default?

    • Is this as easy as carrying around the "same" environment by default throughout a single socket connection?

  • Insofar as the aim here is to enable e.g. nREPL over HTTP/JMX/STOMP, how do environment IDs and their semantics mesh with the various session identifiers and semantics present in each of those protocols and the systems they are usually hooked up to?

Looking at how e.g. web servers manage in-memory sessions would be instructive (timeouts, max-sessions, local policies), as many of the issues are fundamentally the same.
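As one possible starting point, here is a sketch of the web-server-style policy in Python: an idle timeout plus a cap on the number of live sessions. The class, the exception and the eviction rule (drop the least recently used session when full) are assumptions made for the illustration, not a proposal for nREPL's actual behavior.

```python
import time


class UnknownSession(KeyError):
    """Raised when a client names a session that was disposed or never existed."""


class ExpiringSessions:
    def __init__(self, ttl_seconds, max_sessions, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_sessions
        self._clock = clock
        self._entries = {}          # session ID -> (last used, state)

    def _evict_expired(self):
        now = self._clock()
        expired = [k for k, (used, _) in self._entries.items() if now - used > self._ttl]
        for key in expired:
            del self._entries[key]

    def put(self, session_id, state):
        self._evict_expired()
        if session_id not in self._entries and len(self._entries) >= self._max:
            # Full: drop the least recently used session to make room.
            oldest = min(self._entries, key=lambda k: self._entries[k][0])
            del self._entries[oldest]
        self._entries[session_id] = (self._clock(), state)

    def get(self, session_id):
        self._evict_expired()
        try:
            _, state = self._entries[session_id]
        except KeyError:
            raise UnknownSession(session_id) from None
        self._entries[session_id] = (self._clock(), state)  # reset the idle timer
        return state


now = [0.0]                       # fake clock so the example is deterministic
store = ExpiringSessions(ttl_seconds=60, max_sessions=2, clock=lambda: now[0])
store.put("a", {"ns": "user"})
now[0] = 30.0
store.get("a")                    # using the session resets its idle timer
now[0] = 100.0                    # 70 seconds idle, past the 60 second limit
try:
    store.get("a")
    outcome = "still alive"
except UnknownSession:
    outcome = "unknown-session"
```

A distinct error for a disposed environment gives clients something concrete to react to, for example by creating a fresh session and telling the user that earlier state is gone.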
