
Artifacts to proxy&cache CDNs #241

Closed · staticfloat opened this issue Jul 30, 2020 · 19 comments · Fixed by #1561

Labels: frontend (Concerning the HTML editor) · HTTP/WS (The connection between backend and frontend) · online deployment (About deploying to binder, heroku, self-hosted)

@staticfloat commented Jul 30, 2020

It would be nice for startup time (especially on a slow connection) to bundle all web assets locally. If the set of assets is a relatively slow-moving target, this is easy to do with Artifacts: write a script that collects all the files into a single location inside create_artifact(), archive the artifact into a tarball, upload that tarball somewhere, and bind the location of the artifact to a name in your package. You will then be able to serve the files from a tarball that gets downloaded at the same time as your package.

As a reference, these are the assets that my browser requests after a force-refresh on Pluto v0.11.1:

[screenshot: the browser's network log of requested assets]

@fonsp (Owner) commented Jul 30, 2020

All these assets are cached by a service worker, so the second load should even work offline! So the uncompressed files and the HTTP/1 overhead from the large number of files aren't really a problem.

Improvements to the first load time are welcome of course, but they might be overshadowed by Julia compiling Pluto's methods. There is still the use case of deploying Pluto notebooks online - this would be useful there.

@fonsp (Owner) commented Jul 30, 2020

But I am very interested in the artifacts system - do you think we can use it to make our project independent of the CDNs? Ideally you could run an old version of Pluto 20 years from now, when (JS hype has moved on to something else and) some of these CDNs might have gone offline.

This matters because Pluto is scientific software, while these CDNs are commercial services.

@staticfloat (Author) commented:

Improvements to the first load time are welcome of course, but they might be overshadowed by Julia compiling Pluto's methods. There is still the use case of deploying Pluto notebooks online - this would be useful there.

Ah, that makes sense, yes. I was working on a tethered phone, so it took many seconds to fetch those resources the first time. A pathological case for sure.

do you think we can use it to make our project independent of the CDNs?

Yes, definitely! I'd be happy to walk you through it. The short version: you run a script to bundle your content together and upload it as a tarball. Then in Pluto.jl you add an Artifacts.toml that points a name (such as javascript_bundle) to that file. When you want to access, for instance, js/foo.js, you just do something like joinpath(artifact"javascript_bundle", "js", "foo.js") and it will reach inside the artifact. The artifact is installed automatically by Pkg when Pluto.jl itself is installed, it is mirrored by the Pkg server (so the tarball is cached indefinitely on the Julia storage servers), and it is also hosted wherever you chose to upload it (I recommend a GitHub release somewhere). You can then serve those files out yourself. I imagine you could still use a service worker that tries to load from the local server first and falls back to a CDN on a 404; that way you would get the best of both worlds.
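
As a rough illustration (not Pluto's actual code; the artifact name, file layout, and fallback logic are assumptions), the server-side lookup could be as small as this:

using Pkg.Artifacts   # provides the artifact"..." string macro (Julia 1.3+)

# Resolve an asset inside the locally-installed artifact. Pkg downloads and
# unpacks the tarball automatically when the package itself is installed.
local_asset_path(relpath::AbstractString) =
    joinpath(artifact"javascript_bundle", relpath)

# e.g. serve this file from Pluto's own HTTP server, and only fall back to a
# CDN if the artifact is somehow unavailable:
local_asset_path(joinpath("js", "foo.js"))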

@fonsp fonsp added frontend Concerning the HTML editor HTTP/WS The connection between backend and frontend labels Aug 2, 2020
@c42f (Contributor) commented Aug 6, 2020

It would be great to use artifacts for this for the reasons mentioned, but there might be some other surprise benefits as well. For example, it could make Pluto work better in places like China, which likes to block some CDNs (including Google Fonts, at least in the past).

@fonsp (Owner) commented Aug 6, 2020

Thanks everyone! I am definitely interested, but I still have some questions:

  • Does this mean changing Pluto's compatibility from v1.0 to v1.3? How many people do you think would be affected? (Unfortunately, the Julia version is not collected as part of Pluto's telemetry.)
  • My tagbot lags behind the General registry, so if I generate the bundle using the tagbot, it will not exist for users who install during the in-between period. Even if I increase the tagbot interval, creating the bundle will still take time. Should I bundle for each commit instead?

@fonsp fonsp changed the title Bundle Web assets Artifacts to proxy&cache CDNs Aug 6, 2020
@staticfloat (Author) commented Aug 6, 2020

Does this mean changing Pluto's compatibility from v1.0 to v1.3?

Yes. It's possible to build a "hybrid system" that downloads the assets via Artifacts on 1.3 and through some other mechanism on 1.0-1.2, but IMO it's not worth the effort and maintenance burden.

How many people do you think would be affected?

Hard to say, but the volume of users asking for help with binary issues on Julia 1.3 has dropped off significantly, so my guess is that the majority of users are 1.3+ at this point.

My tagbot lags behind the general registry, so if I generate the bundle using the tagbot

How often does the bundle change? I was under the impression that this was a "generate once, forget about it for a few months" kind of thing. If it changes rapidly, then perhaps we'll need a different solution, such as a Travis job that rebuilds the bundle after every commit and automatically updates the Artifacts.toml file when it detects a change.
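
A sketch of that idea (hypothetical, not an existing Pluto script; the artifact name and file paths are placeholders): rebuild the bundle on every commit, but only touch Artifacts.toml when the content hash actually changes.

using Pkg.Artifacts

# Rebuild the bundle from the current frontend assets.
new_hash = create_artifact() do dir
    # collect the current web assets into `dir` here
end

# Compare against whatever is currently bound; `artifact_hash` returns `nothing`
# if the name has never been bound.
old_hash = artifact_hash("WebAssets", "Artifacts.toml")
if old_hash === nothing || old_hash != new_hash
    # archive the artifact, upload the tarball, and re-bind with `force = true`
    # (see the create/archive/bind snippet later in this thread)
end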

@fonsp (Owner) commented Aug 6, 2020

How often does the bundle change?

That's true, it only changes about 1-2 times per month. What would a manual solution look like? A second repository that just contains releases, to which I upload the bundles myself?

@staticfloat (Author) commented:

Precisely; the way I usually do it is to have a script somewhere that, when run, bundles everything together and uploads it to a GitHub release. That could live in Pluto.jl itself or in a totally separate repository. (You just create named releases where the gitsha the release is associated with doesn't matter, like these auto-generated ones for GCC shards on Yggdrasil; the gitsha the "release" is tagged upon has nothing to do with the contents of the release, it's literally just free binary hosting for me.)

I tend to use ghr to upload binaries to GitHub releases; here's an example invocation from BinaryBuilder itself:

function upload_to_github_releases(repo, tag, path; gh_auth=Wizard.github_auth(;allow_anonymous=false),
                                   attempts::Int = 3, verbose::Bool = false)
    for attempt in 1:attempts
        try
            ghr() do ghr_path
                run(`$ghr_path -u $(dirname(repo)) -r $(basename(repo)) -t $(gh_auth.token) $(tag) $(path)`)
            end
            return
        catch
            if verbose
                @info("`ghr` upload step failed, beginning attempt #$(attempt)...")
            end
        end
    end
    error("Unable to upload $(path) to GitHub repo $(repo) on tag $(tag)")
end

And here's how an artifact is created, archived (i.e. turned into a tarball), and bound to an Artifacts.toml file:

using Pkg.Artifacts   # create_artifact, archive_artifact, bind_artifact! (Julia 1.3+)

# Create artifact (which is a folder in `~/.julia/artifacts`)
treehash = create_artifact() do dir
    # write your files into <dir> here
end

# Archive the artifact into a tarball and upload it to a GitHub release via `ghr`.
# (Run this inside a function, with `repo` and `tag` set to the target GitHub
# repository and release tag, so that `tarball_hash` survives the do-block.)
local tarball_hash
filename = "WebAssets-1.0.0.tar.gz"
mktempdir() do dir
    tarball_hash = archive_artifact(treehash, joinpath(dir, filename))
    BinaryBuilder.upload_to_github_releases(repo, tag, dir)
end

# Bind artifact to Artifacts.toml that needs to go into your package:
bind_artifact!(
    "./Artifacts.toml",
    # This is the name of the artifact; you'll get the path of the locally-installed files via `artifact"WebAssets"` in your package
    "WebAssets",
    treehash,
    download_info = Tuple[
        ("https://github.com/$(repo)/releases/download/$(tag)/$(filename)", tarball_hash),
    ],
)
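
For reference, the bind_artifact! call above writes an entry of roughly this shape into Artifacts.toml (the hashes and URL here are placeholders, not real values):

[WebAssets]
git-tree-sha1 = "<tree hash returned by create_artifact>"

    [[WebAssets.download]]
    sha256 = "<SHA256 of the tarball>"
    url = "https://github.com/<repo>/releases/download/<tag>/WebAssets-1.0.0.tar.gz"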

@fonsp (Owner) commented Aug 15, 2020

Thanks so much! Your examples are very helpful. I am motivated to incorporate this to ensure that old versions of Pluto can stay online for "ever".

(In response to @c42f: thank you for bringing this up, I hadn't thought about access in China. I did find that the jsdelivr.com CDN is available inside China (https://www.jsdelivr.com/network). I have already switched all resources to that CDN, except Google Fonts.)

EDIT: Google Fonts are now also served over jsdelivr.

@ArrowRaider commented:
My laptop is giving errors when trying to load these files from the CDN. I have no idea what this means. Please see this screenshot: https://i.imgur.com/k3u64Qc.png

@fonsp (Owner) commented Oct 28, 2020

@ArrowRaider have a look here: #607 (comment)

And you are right that browser extension problems are solved by a proxy.

@usmcamp0811 commented:
I'm going to fork things and try to make this work. I work on offline computers all day long and would really like to be able to use Pluto! This is what I've got going so far... I'm just working through the repo searching for URLs. The one thing I'm not certain about is the best way to serve the static files once they're downloaded into the artifacts dir.

using Pkg.Artifacts
using Pkg.GitTools
using Pkg.PlatformEngines

toml = joinpath(@__DIR__, "Artifacts.toml")

function add_file_to_artifacts!(filename::String, file_url::String, toml::String; lazy=false, force=true)
    file_hash = create_artifact() do artifact_dir
        file = download(file_url)
        @show artifact_dir
        try
            global download_hash = bytes2hex(GitTools.blob_hash(file))
            @show download_hash
            # unpack(file, artifact_dir)
        finally
            rm(file)
        end
    end
    @show file_hash

    bind_artifact!(toml, filename, file_hash;
                   download_info=[(file_url, download_hash)],
                   lazy=false,
                   force=true)
end


add_file_to_artifacts!("600.css", "https://cdn.jsdelivr.net/npm/[email protected]/600.css", toml)
add_file_to_artifacts!("400.css", "https://cdn.jsdelivr.net/npm/[email protected]/400.css", toml)
add_file_to_artifacts!("close-circle.svg", "https://cdn.jsdelivr.net/gh/ionic-team/[email protected]/src/svg/close-circle.svg", toml)
add_file_to_artifacts!("caret-forward-circle-outline.svg", "https://cdn.jsdelivr.net/gh/ionic-team/[email protected]/src/svg/caret-forward-circle-outline.svg", toml)
add_file_to_artifacts!("ellipsis-horizontal-outline.svg","https://cdn.jsdelivr.net/gh/ionic-team/[email protected]/src/svg/ellipsis-horizontal-outline.svg",toml)
add_file_to_artifacts!("JuliaMono-RegularLatin.woff2","https://cdn.jsdelivr.net/gh/cormullion/[email protected]/webfonts/JuliaMono-RegularLatin.woff2",toml)
add_file_to_artifacts!("JuliaMono-BoldLatin.woff2","https://cdn.jsdelivr.net/gh/cormullion/[email protected]/webfonts/JuliaMono-BoldLatin.woff2",toml)
add_file_to_artifacts!("JuliaMono-Regular.woff2","https://cdn.jsdelivr.net/gh/cormullion/[email protected]/webfonts/JuliaMono-Regular.woff2",toml)
add_file_to_artifacts!("JuliaMono-Bold.woff2","https://cdn.jsdelivr.net/gh/cormullion/[email protected]/webfonts/JuliaMono-Bold.woff2",toml)

@staticfloat (Author) commented:

I think you might want to create a single artifact from all those files (instead of N artifacts for N files) and lay out the files within that artifact in such a way that Pluto has an easy time pointing at them (e.g. putting the .css files within a `css` folder, putting the .woff2 files into a `fonts` folder, etc.).

@staticfloat (Author) commented:

Also, I'm not sure why you're removing the file from the artifact with that rm(file) call; that seems counter-productive. ;)

I also think your calculation of download_hash is wrong; you want the SHA256 hash of the tarball, computed after you've archived the artifact. You can't download arbitrary files as artifacts, only tarballs.
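
A sketch of what both suggestions could look like together (illustrative only, not Pluto's actual script; the layout, artifact name, and release URL are placeholders):

using Pkg.Artifacts

# One artifact containing every asset, laid out so Pluto can point at it easily.
treehash = create_artifact() do dir
    mkpath(joinpath(dir, "css")); mkpath(joinpath(dir, "fonts"))
    download("https://cdn.jsdelivr.net/npm/[email protected]/400.css",
             joinpath(dir, "css", "400.css"))
    download("https://cdn.jsdelivr.net/gh/cormullion/[email protected]/webfonts/JuliaMono-Regular.woff2",
             joinpath(dir, "fonts", "JuliaMono-Regular.woff2"))
    # ...and so on for the remaining assets
end

# archive_artifact returns the SHA256 of the tarball -- this, not a git blob hash
# of an individual file, is what belongs in download_info.
tarball_hash = archive_artifact(treehash, "WebAssets.tar.gz")

bind_artifact!("Artifacts.toml", "WebAssets", treehash;
               download_info = [("https://example.com/WebAssets.tar.gz", tarball_hash)],
               force = true)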

@fonsp (Owner) commented Mar 17, 2021

I would say: serve all assets by URL-encoding their URL, e.g. serve

https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css

under

http://localhost:1234/proxied_asset/https%3A%2F%2Fcdn.jsdelivr.net%2Fnpm%2Fcodemirror%405.58.1%2Flib%2Fcodemirror.min.css

so that the artifact has a flat folder structure and the backend stays as uncomplicated as possible. Use the service worker to redirect requests to their proxied versions.
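
A minimal sketch of that mapping (assuming HTTP.jl, which provides a URI-escaping helper; the route name simply mirrors the example above):

using HTTP

# Turn an absolute asset URL into a single URL-encoded path segment
# under /proxied_asset/, so the proxy can use a flat layout.
proxied_path(url::AbstractString) = "/proxied_asset/" * HTTP.URIs.escapeuri(url)

proxied_path("https://cdn.jsdelivr.net/npm/[email protected]/lib/codemirror.min.css")
# => "/proxied_asset/https%3A%2F%2Fcdn.jsdelivr.net%2Fnpm%2Fcodemirror%405.58.1%2Flib%2Fcodemirror.min.css"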

Also note that we want to minimize the maintenance burden for future frontend development. We change the asset dependencies very often, and ideally the artifact system would be fully automated. (One idea is to use Puppeteer to extract the network log of a Pluto session -- that also gives you the indirect dependencies.)

@fonsp (Owner) commented Apr 15, 2021

I found that we can use a pre-release hook to trigger a GitHub Action before the release is created: https://github.com/JuliaRegistries/TagBot#pre-release-hooks

This will happen after our patch is merged into the General registry, but before the release is tagged in this repository. Users only get the "update is available!" message after we tag the release.


However, if we are using the Artifacts system, then we would need to generate the artifact before we register our patch into the general registry, because we need to mention it in our Artifacts.toml file, right?

@fonsp fonsp linked a pull request Apr 26, 2021 that will close this issue
@fonsp (Owner) commented Apr 26, 2021

Almost implemented in #1125, except that right now all assets are in the Pluto repository, which I don't like. Can someone use artifacts to store the assets?

@findmyway commented:
Thanks so much! Your examples are very helpful. I am motivated to incorporate this to ensure that old versions of Pluto can stay online for "ever".

(In response to @c42f: thank you for bringing this up, I hadn't thought about access in China. I did find that the jsdelivr.com CDN is available inside China (https://www.jsdelivr.com/network). I have already switched all resources to that CDN, except Google Fonts.)

EDIT: Google Fonts are now also served over jsdelivr.

I can confirm that it is not blocked in China (Beijing) right now. But unfortunately, it was recently blocked on our corporate network 😢

@fonsp fonsp added the online deployment About deploying to binder, heroku, self-hosted label Jun 11, 2021
@fonsp fonsp mentioned this issue Oct 14, 2021