Add support for interacting with Remote R sessions #109

Closed
dcharbon wants to merge 7 commits

dcharbon
Contributor

Add support for interacting with running remote R sessions, such as RGui or RStudio. This requires the R svSocket package, which is used to make requests and marshal data to and from the remote R session. The PR also provides a remote R type provider.

The remote R type provider requires a different syntax than the original R type provider because of current limitations in the F# compiler. A future version of the F# compiler should support type extensions generated by type providers; that improvement will let the remote R type provider offer a syntax and development experience much closer to the original local R session type provider.

@tpetricek
Member

I'm happy to merge this (especially if @hmansell agrees!) but I'm not really sure if the current version is all that practically usable - mainly because I think that having to write e.g. RR.``base``.foo to access pretty much any function makes the API quite ugly.

I had two thoughts on how to make this a bit more usable in the current version (it'll be much nicer when extension methods are supported - but even then, we might want to support an alternative way for people with older F#).


Using local R for discovery

One option would be to use a local installation of R for discovery of packages and remote R for execution. This should work nicely in the BM use case (when people want to connect to an R session on the local machine running in a GUI). You still need to have R locally, but you could write e.g.:

open RProvider.stats
open RProvider.``base``

let someWork (df) = 
  use session = RemoteR("localhost", 12345) // Using some global mutable state...
  R.as_data_frame(R.cor(df)).Value

Allow opening packages

Another option would be to allow opening packages by specifying their names in static parameters. This is quite similar to what you have but it makes the syntax more compatible:

type RRSession = RemoteR<"localhost", 12345, "stats,base">
let R = new RRSession("localhost",12345)  

let someWork (df) = 
  R.as_data_frame(R.cor(df)).Value

Config file

The other option would be to look for some local config file and then you'd get exactly the same API as currently, but I think nobody was very enthusiastic about this option.


One thing I think would be useful would be to separate the remote R connection at compile-time and at run-time - see my RRSession example above. At compile-time, you might want to use a different remote R port than at run-time (e.g. if you run the compiled code, I guess you might want to figure out the port number dynamically).

I sketched this above - where RRSession has a constructor that takes the runtime port - the static one can be the default (i.e. we could provide an overloaded constructor to override the default port).
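
A minimal sketch of that constructor idea, assuming a hand-written class in place of a provided type. `RRSessionSketch`, `DefaultHost` and `DefaultPort` are hypothetical names for illustration and are not part of RProvider; in a real provider the defaults would presumably come from the static parameters.

```fsharp
// Hypothetical stand-in for a provided session type; not RProvider's API.
type RRSessionSketch(host: string, port: int) =
    // Assumed defaults, standing in for values fixed at compile time.
    static member DefaultHost = "localhost"
    static member DefaultPort = 12345
    // No arguments: reuse the compile-time defaults at run time.
    new() = RRSessionSketch(RRSessionSketch.DefaultHost, RRSessionSketch.DefaultPort)
    // Override only the port, e.g. one discovered dynamically at startup.
    new(port: int) = RRSessionSketch(RRSessionSketch.DefaultHost, port)
    member this.Host = host
    member this.Port = port

let designTimeSession = RRSessionSketch()
let runTimeSession = RRSessionSketch(8888)
```

Here `designTimeSession` uses the default port, while `runTimeSession` keeps the default host and swaps in a different port.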

@dcharbon
Contributor Author

dcharbon commented Jul 7, 2014

@tpetricek I have a branch in my fork that implements exactly the comma-separated package syntax you describe above. It works nicely but is a bit strange.

I like the option listed before it (using local R for discovery) much better. Perhaps the use of a global mutable session could be made more explicit? I can see two useful approaches.

Set global session once, explicitly

// SetRemoteSession() establishes the remote session to use, instead of a
// local R session, in global mutable state.
R.SetRemoteSession("localhost", 12345)

Scoped use of remote session

// InRemoteSession() creates and uses the remote session for the scope of the 
// provided function, disposes of the remote session when completed.
let result = R.InRemoteSession("localhost", 12345, fun () -> ...)
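
A rough sketch of the scoped variant, under the assumption that the session is an `IDisposable`. `RemoteSessionSketch` and `inRemoteSession` are hypothetical names, and unlike the proposal above this version passes the session to the function explicitly instead of keeping it in ambient state.

```fsharp
open System

// Hypothetical stand-in for a remote connection; a real one would hold a socket.
type RemoteSessionSketch(host: string, port: int) =
    let mutable closed = false
    member this.Host = host
    member this.Port = port
    member this.IsClosed = closed
    interface IDisposable with
        member this.Dispose() = closed <- true

// Runs `body` against a fresh session and disposes the session afterwards,
// even if `body` throws.
let inRemoteSession (host: string) (port: int) (body: RemoteSessionSketch -> 'T) : 'T =
    use session = new RemoteSessionSketch(host, port)
    body session

// Keep a reference to the session only to observe that it gets disposed.
let captured : RemoteSessionSketch option ref = ref None

let result =
    inRemoteSession "localhost" 12345 (fun s ->
        captured.Value <- Some s
        s.Port + 1)
```

The `use` binding is what ties the session's lifetime to the function call.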

This should be pretty easy to implement; do you like it?
@hmansell What do you think?

@dcharbon
Contributor Author

dcharbon commented Jul 7, 2014

On deeper reflection, there is a complication with a global mutable session approach: a RemoteSymbolicExpression is not a SymbolicExpression - it's a handle to a SymbolicExpression in the remote session. To make this work, the functions generated for the R type would need to handle both types and behave appropriately. Further, calling code would have to deal with them, too; the return type of all R functions generated by the type provider would have to be a discriminated union, something like:

type RProviderExpression =
    | Local of SymbolicExpression
    | Remote of RemoteSymbolicExpression

Of course, that's a non-starter as it would break all current code.
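
To illustrate why, here is a small sketch of what calling code would look like with such a union. The two record types are hypothetical stand-ins for RDotNet's SymbolicExpression and this PR's RemoteSymbolicExpression, and `describe` is an invented consumer.

```fsharp
// Hypothetical stand-ins; the real types live in RDotNet and in this PR.
type SymbolicExpressionSketch = { Value: float[] }
type RemoteSymbolicExpressionSketch = { Handle: string }

type RProviderExpression =
    | Local of SymbolicExpressionSketch
    | Remote of RemoteSymbolicExpressionSketch

// Every consumer now has to branch on where the value lives.
let describe (expr: RProviderExpression) =
    match expr with
    | Local e -> sprintf "local value with %d elements" e.Value.Length
    | Remote r -> sprintf "remote handle %s" r.Handle
```

Existing code that expects a plain SymbolicExpression back from an R function would need a match like this at every call site.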

Maybe we could establish a ... generic kind of RemoteR type that uses local discovery and represents a remote session.

open RProvider.``base``
  ...
// RR is a type provided similarly to R and must be runtime-configured to
// use a remote session. Discovery is done locally.
RR.UseRemoteSession("localhost", 12345)
RR.do_something_in_remote_session(some_arg, another_arg)

I think that my suggestion to have an InRemoteSession option isn't a good idea. The use of a global session type that can be configured is better as it will be closer, syntactically, to the desired type provider with extension functions. You could adopt it in the future just by replacing the UseRemoteSession calls with type declarations, like

type RR = RemoteR<"localhost", 12345, false>

@hmansell
Contributor

hmansell commented Jul 7, 2014

My thoughts:

  • I don't like the comma-separated list static parameter, or the config file. They don't allow iterative/exploratory development.
  • On second thought, I think doing discovery by reflection against the local R installation is fine.
  • I prefer @tpetricek's use/IDisposable approach for defining sessions to @dcharbon's alternative suggestions.
  • I don't see a big issue with having remote symbolic expressions from multiple sessions. We must have a way of taking a remote symbolic expression and turning it into a local one, and vice-versa. So I should be able to pass a value from one session to another, right?

@dcharbon
Contributor Author

dcharbon commented Jul 7, 2014

I was concerned about the use statement primarily in the threaded case. But, use of thread locals would resolve that concern. Though... nested calls could be a worry?
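
One way the thread-local idea could handle nesting is a per-thread stack of sessions. The sketch below assumes that design; `AmbientSession`, `enter` and `current` are hypothetical names, and sessions are represented by plain strings to keep it self-contained.

```fsharp
open System
open System.Collections.Generic
open System.Threading

// Hypothetical ambient-session registry; not RProvider's API.
module AmbientSession =
    // One stack per thread, so a session opened on one thread is invisible to others.
    let private stack = new ThreadLocal<Stack<string>>(fun () -> Stack<string>())

    // The innermost session opened on the calling thread, if any.
    let current () =
        if stack.Value.Count = 0 then None else Some (stack.Value.Peek())

    // Pushes a session label and returns a handle that pops it on Dispose.
    let enter (label: string) : IDisposable =
        let s = stack.Value
        s.Push label
        { new IDisposable with
            member this.Dispose() = s.Pop() |> ignore }

let nestedDemo () =
    use outer = AmbientSession.enter "localhost:12345"
    let seenInner =
        use inner = AmbientSession.enter "localhost:8888"
        AmbientSession.current ()
    // Once the inner scope ends, the outer session is current again.
    seenInner, AmbientSession.current ()
```

This only works if scopes are disposed in the reverse order they were opened and on the thread that opened them, which `use` gives you for ordinary synchronous code.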

On the remote symbolic expressions being passed around - I wasn't really addressing that. I agree, though, you should be able to pass them around between remote sessions and I need to check if that is handled correctly already (it may not be).

The real concern I have is overloading the R type to be configurable as a local or remote session as seemed to be implied by the code snippets. I think it has to be a different type, like RR, for the remote session since remote sessions return RemoteSymbolicExpression; the alternative is dataframes being marshaled/unmarshaled on every call to the remote session to preserve the correct semantics. For example, if I want to do the following from the snippets:

let dowork(df) =
    use session = RemoteR("localhost", 12345)
    R.as_data_frame(R.cor(df)).Value

The R.cor() call would need to return a SymbolicExpression. That is, it would unmarshal the RemoteSymbolicExpression returned by invoking cor() in the remote session. This is fine if you're only performing a single operation on a large dataframe - but if you're performing several, you'll have a marshal/unmarshal of the RemoteSymbolicExpression's real value for each remote R function call. That's why I'd rewrite this snippet this way:

let dowork(df) =
    use session = RemoteR("localhost", 12345)
    RR.as_data_frame(RR.cor(df)).Value

This keeps the local R session type distinct from the RR session type because they'll have different return types for their functions - one returning RDotNet.SymbolicExpression, and the other returning RProvider.RemoteSymbolicExpression.
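
The cost argument can be modelled with a toy in-memory "remote" that counts how many values cross the wire. Everything in this sketch is hypothetical (`ToyRemote`, `Put`, `Get`, `Apply`); it does not use svSocket or RProvider and only illustrates the difference between returning values and returning handles.

```fsharp
open System.Collections.Generic

// Toy in-memory "remote session" that counts transfers across the wire.
type ToyRemote() =
    let store = Dictionary<int, float[]>()
    let mutable nextId = 0
    let mutable transfers = 0
    member this.Transfers = transfers
    // Upload a local value and get back a handle (one transfer).
    member this.Put(value: float[]) : int =
        transfers <- transfers + 1
        nextId <- nextId + 1
        store.[nextId] <- value
        nextId
    // Download the value behind a handle (one transfer).
    member this.Get(handle: int) : float[] =
        transfers <- transfers + 1
        store.[handle]
    // Apply a function remotely: handle in, handle out, no transfer.
    member this.Apply(f: float[] -> float[], handle: int) : int =
        nextId <- nextId + 1
        store.[nextId] <- f (store.[handle])
        nextId

let twice (xs: float[]) = Array.map (fun x -> x * 2.0) xs

// Style 1: every call returns a local value, so each step uploads and downloads.
let eager (r: ToyRemote) (xs: float[]) =
    let step (v: float[]) = r.Get(r.Apply(twice, r.Put(v)))
    xs |> step |> step |> step

// Style 2: calls return handles; data moves only on the way in and out.
let viaHandles (r: ToyRemote) (xs: float[]) =
    let h = r.Put(xs)
    let h3 = r.Apply(twice, r.Apply(twice, r.Apply(twice, h)))
    r.Get(h3)
```

For three chained operations the value-returning style moves the data six times, while the handle-returning style moves it twice.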

dcharbon added 7 commits July 7, 2014 18:12
RemoteSession provides a way to create a connection to a remote R
process via the svSocket package for R. Some functions in RInterop are
generalized to allow as much reuse as possible for functions that
perform the same behavior whether local or remote.

The RemoteR type provider provides a way to import and export data
between F# and R sessions, as well as make remote calls to a running R
session.

Enable launch of RGui from fsi. Ensure that svSocket is installed and
launch the socket server.

When the R svSocket server isn't running produce a better error message.
Also, set a default timeout so that the type provider doesn't hang
waiting for the R svSocket server if it isn't running.

The addition of the RData parameterized type caused problems resolving the
RemoteR type. To fix this, the RemoteR type was moved to the
RProvider.Remote namespace. There appears to be a limitation in the F#
type provider facility restricting the number of parameterized type
providers to one per namespace.

The new RR type provider works much the same as the R type provider, but
requires that there be an active session in the current thread. To use
it, create an RRSession. For example:

  use session = RRSession("localhost", 8888)
  let remoteSymbolicHandle = RR.c(1,2,3)

@dcharbon
Contributor Author

This is superseded by #115.

@dcharbon dcharbon closed this Jul 21, 2014

3 participants