
pkg asset loading in Jupyter assumes a single global Julia package install #124

Closed
rdeits opened this issue May 22, 2018 · 12 comments

@rdeits
Collaborator

rdeits commented May 22, 2018

This started out as #100 but it's a more general issue, and it's continued to be a problem in other cases. Fundamentally, by creating a single server plugin that tries to load /pkg/ assets by looking them up in the Julia package directory and load path, we are assuming that there's a single, global Julia package directory. This fails in JuliaBox, for example, but it also fails any time a user defines a different JULIA_PKGDIR or modifies their LOAD_PATH after building WebIO. This results in a lot of confusing issues where Julia packages can be loaded but their assets cannot.

I'm starting to wonder whether a global jupyter plugin is the right solution, since the only place where the information about what package should be loaded is available is inside the kernel itself, not the jupyter server.

Is there an alternative? Could we perhaps send the JS files over the Jupyter comms protocol instead?

@shashi
Member

shashi commented May 23, 2018

Yes, we should definitely find an alternative solution to this problem.

Could we perhaps send the JS files over the Jupyter comms protocol instead?

Perhaps, but the problem will be SystemJS loads dependencies by making HTTP requests to the dependency files. This won't work well with the comms protocol. This is the reason I decided to do the /pkg/ solution.

Another awkward problem is that we can't have the server inside WebIO (for Blink, Mux and Atom) since we don't want WebIO to depend on Mux. This is why I put it in Mux.defaults itself.

To add another angle to this problem, @ranjanan has been trying to deploy a statically compiled app written with WebIO and running into this issue.

The best idea for a solution I have for now is:

  1. Write a Mux middleware that serves static files and directories that are explicitly "registered" by other code. This could replace or live alongside the /pkg/ middleware in Mux.defaults.
  2. Registering a path with this global dictionary gives a unique URL (UUID/sha1-based, e.g. register(dict, joinpath(dirname(@__FILE__), "assets")) -> "283d9b30-9356-94f9-eefc-21810e45f1f4"); the client can then access the registered directory through this URL.
  3. Packages such as MeshCat / Interact can register their assets directories with this server and use the UUID to load assets. (It may be good to think of some conveniences for this, e.g. any absolute path given as an import could check whether the file itself or a parent directory is "registered" and replace itself with the unique path that works.)
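
A minimal Python sketch of steps 1 to 3, assuming a hypothetical "/assetserver/<key>" URL scheme and made-up register/resolve helpers (this is not the AssetRegistry.jl API). The key is the SHA-1 of the absolute path, so registering the same directory twice gives the same URL, and resolve() refuses sub-paths that climb out of the registered directory.

```python
import hashlib
import os

# Hypothetical in-memory registry: key -> absolute filesystem path.
_registry = {}

# Illustrative URL prefix (an assumption, not taken from Mux or WebIO).
PREFIX = "/assetserver/"


def register(path):
    """Register a file or directory and return a URL path for it.

    The key is the SHA-1 hex digest of the absolute path: the same path
    always maps to the same URL, and different paths get different URLs
    (barring hash collisions).
    """
    abspath = os.path.abspath(path)
    key = hashlib.sha1(abspath.encode("utf-8")).hexdigest()
    _registry[key] = abspath
    return PREFIX + key


def resolve(url):
    """Map a URL from register(), optionally followed by a sub-path inside
    a registered directory, back to a filesystem path.

    Returns None for unknown keys or for sub-paths that escape the
    registered directory (e.g. "../../etc/passwd").
    """
    if not url.startswith(PREFIX):
        return None
    key, _, rest = url[len(PREFIX):].partition("/")
    base = _registry.get(key)
    if base is None:
        return None
    if not rest:
        return base
    target = os.path.normpath(os.path.join(base, rest))
    # Only serve paths that still live under the registered directory.
    if os.path.commonpath([base, target]) != base:
        return None
    return target
```

Because the URL only depends on the absolute path, each package can compute it from @__FILE__ without agreeing on a global package directory.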

If packages use @__FILE__ to register the assets, one should always be able to find the right files.

To make this work with Jupyter we need to have the "register" function work even without Mux (maybe by updating a json file which stores the registry), the python server should be able to read this file and dynamically route requests to the right file/folders.

What do you think? Are there any obvious problems?
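
To illustrate the JSON-file variant of the registry, here is a Python sketch with hypothetical helper names and an assumed file format (one JSON object mapping SHA-1 keys to absolute paths). The kernel side writes through a temporary file plus os.replace so a reader never sees a half-written file, and the server side re-reads the file on every lookup, so registrations made after the server started are still found.

```python
import hashlib
import json
import os


def register_to_file(registry_file, path):
    """Kernel side: record `path` in a shared JSON registry and return its key.

    Assumed file format: {"<sha1 of abspath>": "<abspath>", ...}
    """
    abspath = os.path.abspath(path)
    key = hashlib.sha1(abspath.encode("utf-8")).hexdigest()
    try:
        with open(registry_file) as f:
            registry = json.load(f)
    except FileNotFoundError:
        registry = {}
    registry[key] = abspath
    # Write a temp file, then atomically rename it over the registry.
    tmp = registry_file + ".tmp"
    with open(tmp, "w") as f:
        json.dump(registry, f)
    os.replace(tmp, registry_file)
    return key


def lookup_from_file(registry_file, key):
    """Server side: re-read the registry on every request and return the
    registered absolute path for `key`, or None if it is unknown."""
    try:
        with open(registry_file) as f:
            return json.load(f).get(key)
    except FileNotFoundError:
        return None
```

Two kernels registering at the same moment can still overwrite each other's entries, since each does a read-modify-write, so some form of file locking is needed to rule that out.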

@shashi
Member

shashi commented May 23, 2018

To make this work with Jupyter we need to have the "register" function work even without Mux (maybe by updating a json file which stores the registry), the python server should be able to read this file and dynamically route requests to the right file/folders.

I guess we circled back to the original issue with this. Where does the .json file get written to given there is no global Julia package directory? How will the jupyter plugin pick it up?

@rdeits
Collaborator Author

rdeits commented May 23, 2018

I see. That makes sense. I wonder if we could coerce SystemJS to load from a string instead? Kind of like systemjs/systemjs#1431

I think your registry idea can also be made to work, but as you say, with a global jupyter install all of the kernels will have to use the same registry somehow. I'll try to think about making that work, though.

Another idea: just use full paths to asset files, so /pkg URLs would look like /pkg/home/rdeits/.julia/v0.6/MeshCat/assets/meshcat.js. This makes me super nervous (I really don't want to be serving /pkg/etc/passwd), but maybe we could do something like:

  • The webio jupyter plugin loads a .json file every time it's asked to serve an asset
  • That .json file contains a list of whitelisted package directories, and we simply append Pkg.dir() to that list any time a user does Pkg.build("WebIO").
  • /pkg URLs use the full path to the file (to disambiguate), but we only serve files living in the whitelisted directories (perhaps only files which live in whitelisted directories and which match the pkg/PackageName/assets/asset_name.js pattern).

That's similar to what we have now, but would allow arbitrarily many package directories to live in harmony, and you'd never get assets for the wrong package directory. One potential issue is that it wouldn't be obvious for users how to remove asset folders from the whitelist, but we kind of have that problem already.
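
A rough Python sketch of the whitelist check, assuming the /pkg URL carries the full absolute path and the whitelist is a plain list of package directories (the helper name is hypothetical). os.path.realpath collapses ".." segments and symlinks before the containment test, which is what keeps something like /pkg/etc/passwd from being served.

```python
import os


def servable_path(request_path, whitelist):
    """Return the file path for a "/pkg/<absolute path>" URL if it is allowed,
    else None.

    Allowed means: the resolved file lives inside one of the whitelisted
    package directories AND matches <pkgdir>/<PackageName>/assets/<file>.
    """
    prefix = "/pkg"
    if not request_path.startswith(prefix + "/"):
        return None
    # "/pkg/home/u/.julia/v0.6/X/assets/x.js" -> "/home/u/.julia/v0.6/X/assets/x.js"
    target = os.path.realpath(request_path[len(prefix):])
    for pkgdir in whitelist:
        pkgdir = os.path.realpath(pkgdir)
        if os.path.commonpath([pkgdir, target]) != pkgdir:
            continue  # not inside this package directory
        parts = os.path.relpath(target, pkgdir).split(os.sep)
        # Only <PackageName>/assets/<file> directly under the package dir.
        if len(parts) == 3 and parts[1] == "assets":
            return target
    return None
```

The pattern check is deliberately strict: even inside a whitelisted directory, package source files and anything outside an assets folder are refused.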

@shashi
Member

shashi commented May 23, 2018

@ranjanan and I wrote some code today to do this.

JuliaWeb/Mux.jl#55
https://github.com/JuliaGizmos/AssetRegistry.jl
#125

I think we just need to get the Jupyter server to work for this to happen now.

That .json file contains a list of whitelisted package directories, and we simply append Pkg.dir() to that list any time a user does Pkg.build("WebIO").

This seems workable (although I think dirname(dirname(@__FILE__)) may be better than Pkg.dir()), but it still doesn't seem to play nicely with Pkg3 "projects". For example, there is talk of embedding a project TOML file with every notebook so that a notebook is perfectly reproducible. I guess we can solve this problem when we get there, though.

@rdeits
Collaborator Author

rdeits commented May 23, 2018

Cool! So is the idea that each asset is registered with the hash of its absolute path? That seems very reasonable.

@shashi
Member

shashi commented May 23, 2018

That's right. You can register either files or directories.

@rdeits
Collaborator Author

rdeits commented May 23, 2018

Do you have any thoughts about how this can work with Jupyter? It looks like the registry lives in memory in the AssetRegistry module, so multiple Julia kernels will each have different registries, and none of those registries will be available to the Jupyter server. Or am I missing something?

@rdeits
Collaborator Author

rdeits commented May 23, 2018

Ah, I see you mentioned that earlier. It seems like AssetRegistry.register() could also append to a json file loaded by the jupyter plugin.

@shashi
Member

shashi commented May 29, 2018

Update about this issue: there are branches ready on Mux and WebIO to fix this, with the right deprecation when older /pkg/ URLs are requested.

Two TODO items:

  • Get the file locking mechanism to work on Windows (it currently doesn't). Libuv has an exclusive-lock flag for opening files, but that doesn't work on Linux; between that method and the current one we should have all 3 platforms covered (on OSX, both methods work).
  • Get the Python plugin to use "homedir()/.jlassetregistry.json" to map requests to files.
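
One cross-platform way to get an exclusive lock is a sibling lock file created with O_CREAT | O_EXCL, which fails atomically if the file already exists. The sketch below uses Python's os.open and a hypothetical registry_lock helper; it is a swapped-in technique for illustration, not necessarily what the WebIO branches do.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def registry_lock(registry_file, timeout=10.0, poll=0.05):
    """Hold an exclusive lock on `registry_file` by creating a sibling
    ".lock" file with O_CREAT | O_EXCL. Creation fails with
    FileExistsError if another process already holds the lock; we retry
    until `timeout` seconds have passed, then raise TimeoutError.
    """
    lock_path = registry_file + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {registry_file}")
            time.sleep(poll)
    try:
        yield
    finally:
        # Release: close our handle and delete the lock file.
        os.close(fd)
        os.remove(lock_path)
```

A kernel would wrap its read-modify-write of ~/.jlassetregistry.json in `with registry_lock(path):`. The usual drawback of lock files is that a process that crashes while holding the lock leaves the .lock file behind, so a real implementation would need some way to detect stale locks.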

@NHDaly
Contributor

NHDaly commented Jun 9, 2018

Hi, I'm sorry to butt in. I think this is a cool idea and I'd love to see something like this catch on!

I am surely missing some context, but would you be able to clarify some of this for me? I'm curious about it, but I am not understanding some details.

Why is it preferable to send the client these hashed URLs rather than simply sending the client the absolute path to the assets the kernel wants to load? That is, what is the benefit of storing these hashed urls and sending the hashed url to the client over simply using the original absolute path?

Thanks! :)

@shashi
Member

shashi commented Jun 18, 2018

This is closed in the new release.

Why is it preferable to send the client these hashed URLs rather than simply sending the client the absolute path to the assets the kernel wants to load?

Great question. It's because the client (browser) usually cannot load local JS/CSS files unless the page itself is a file:// URL (i.e. a local HTML page). Jupyter, Mux, Blink and Atom all work by running a web server in the background, and although these servers may be on the same machine as the client, for all intents and purposes localhost:8000 is no more special than github.com. So we need the files to be available on some URL via that web server. Hashing is just a way to make the URLs not too long, yet unique.

@shashi shashi closed this as completed Jun 18, 2018
@NHDaly
Contributor

NHDaly commented Jun 26, 2018

So we need the files to be available on some URL via that web server. Hashing is just a way to make the URLs not too long, yet unique.

Ah yeah that makes sense. Thanks shashi! :)

3 participants