Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] GIE Golang Proxy and Docker Compose Support #852

Closed
wants to merge 7 commits into from

Conversation

hexylena
Copy link
Member

@hexylena hexylena commented Oct 5, 2015

First, a replacement proxy is provided for the Galaxy Interactive Environment codebase.

  • In lieu of nodejs, we replace it with a golang proxy
    • compiles to a single binary that people install
    • avoids headache of wide spread of NodeJS versions/package versions/OS packages
  • the proxy is now responsible for killing containers. Containers after 15.10 should remove any monitor_traffic.sh logic.
  • I've ripped out code for the NodeJS proxy, file locking, and sqlite/.json IPC. The go proxy uses an API instead. (TODO: change password from "test")

Secondly, Docker Compose support is provided.

  • A new command, launch_multi is introduced which allows launching docker-compose containers.

  • This command reads in a docker-compose.yml.mako template and fills out various portions from the environment. E.g.

    external:  # Must be named external
      image: erasche/gaucproc
      links:
        - guacamole
      ports:
        - "80" # Must listen on 80
      environment:
        % for key in env:
        ${key}: ${env[key]}
        % endfor
  • With that template, the code cds into a temporary directory, writes out a docker-compose.yml, runs docker-compose up -d and leaves.

  • That code is in serious need of a refactor

Other:

  • fixes a bug whereby any environment variables passed in launch were ignored.

xref #372
cc @bgruening @jmchilton

@hexylena hexylena added the wip label Oct 5, 2015
@hexylena
Copy link
Member Author

hexylena commented Oct 6, 2015

Builds of gie-proxy available https://jenkins.galaxyproject.org/view/Meta/job/GIE-proxy/

@jmchilton
Copy link
Member

Overall this looks very promising - thanks @erasche.

TODO: change password from "test" - I might consider some sort of hashing of Galaxy's id_secret default so it is something that changes, is already set, and is consistent without exposing existing secrets directly.

I think the pressure I feel from higher ups is more to get IPython more scale-able and do a better job with tracking provenance - neither of these changes hurt that.

Did you have a specific use case in mind for docker compose? To me what would be more interesting than docker compose is doing this without docker, just starting IPython or RStudio right in the same VM as Galaxy and proxy appropriately - so a subprocess driver. This is the configuration that would make most sense for projects like planemo-machine or VM for data analysis. Certainly though making things more generic such as done here will benefit that goal though.

Lets have dev VMs ready to go for RStudio and IPython before taking this out of WIP.

My last concern is that upgrading cloudman now will break IPython - I guess that is unavoidable though.

@bgruening
Copy link
Member

@erasche I like the new proxy. Never was a friend of the JS dependency and all the trouble we had.
I don't think we should remove the auto killing feature from the containers. Not yet.
For me it is not clear how the server can know if the user drops the connection or is just idle. We should test this extensively as I think the auto killing containers are quite stable and worked great in the past.

I see that this feature is hard to implement in a compose context, but I would rather try to talk to the Docker guys and come up with a solution on their end than to force the user to manually manage IE sessions.
Thinking ahead, if we try to run Notebooks as part of a workflow we should have a route to kill containers from Galaxy, but we should not make this mandatory.

So +1 for the new proxy. Maybe you could split this PR into 2-3 ones?

@hexylena
Copy link
Member Author

hexylena commented Oct 6, 2015

@jmchilton "overall", feel free to tear it to pieces :)

TODO: change password from "test" - I might consider some sort of hashing of Galaxy's id_secret default so it is something that changes, is already set, and is consistent without exposing existing secrets directly.

yeah, the plan here was:

  • since the "API Password" is just used to harden communication between Galaxy and the proxy, we can auto-generate on every start if Galaxy is managing the proxy
  • and provide a configuration option to set it if you're managing the proxy outside of Galaxy.

the /api route of the proxy should never ever ever ever be exposed to the whole world, the api_key really only protects against the case where other users on the command line of that server can poke at that port.

I think the pressure I feel from higher ups is more to get IPython more scale-able and do a better job with tracking provenance - neither of these changes hurt that.

It sure would be nice if some of these higher ups can tell us what they want to see in no uncertain terms :) But yes, agreed. This is one of the last big ticket items I wanted to solve before building out provenance type stuff. I have some plans for whenever moby/moby#13602 gets merged.

Did you have a specific use case in mind for docker compose?

I sure did! So Apache zeppelin is pretty damn cool, and one of the neat things you can do with it is attach an Apache Spark container to that to provide Spark access & computations. You can probably see where I'm going with this. I wanted to stop building monolithic single containers when people have already done a huge amount of groundwork for building nicely, cleanly separated containers. We're getting closer and closer to the point where we can have a single uniform "galaxy-nginx-proxy" container which you just stick at the top of a docker-compose.yml file, and you can attach it to whatever container mess you want in the backend.

To me what would be more interesting than docker compose is doing this without docker, just starting IPython or RStudio right in the same VM as Galaxy and proxy appropriately - so a subprocess driver. This is the configuration that would make most sense for projects like planemo-machine or VM for data analysis. Certainly though making things more generic such as done here will benefit that goal though.

I can somewhat see the desire for that, but it seems like a LOT of work to do to re-accomplish what we're already doing with docker. Docker works in VMs. What do we really gain by running IPython/RStudio on bare metal?

@hexylena
Copy link
Member Author

hexylena commented Oct 6, 2015

@erasche I like the new proxy. Never was a friend of the JS dependency and all the trouble we had. I don't think we should remove the auto killing feature from the containers. Not yet. For me it is not clear how the server can know if the user drops the connection or is just idle.

Ahhh indeed. You raised exactly the point that I needed to be reminded of. I've started killing based on traffic, rather than based on a TCP connection being open. I'll patch this soon! :)

We should test this extensively as I think the auto killing containers are quite stable and worked great in the past.

It did ... however we're getting closer and closer to being able to use containers without modifying them at all, which can really open up what we can do.

I see that this feature is hard to implement in a compose context, but I would rather try to talk to the Docker guys and come up with a solution on their end than to force the user to manually manage IE sessions.

Users don't have to manage IE sessions manually. That was theoretical discussion on a separate issue #372 (comment)

If you had other concerns about this "manual management", please share!

Thinking ahead, if we try to run Notebooks as part of a workflow we should have a route to kill containers from Galaxy, but we should not make this mandatory.

If we run notebooks from a workflow, that's a very different concept than what we've been doing so far. There would be no traffic whatsoever, so we'll have to do everything there a bit differently.

So +1 for the new proxy. Maybe you could split this PR into 2-3 ones?

Could! If @jmchilton has similar objections I'm happy to.

@jmchilton
Copy link
Member

Couple of responses:

  • When have you ever known me to be anything less than entirely blunt :).
  • More PRs would be better, but this is fine the way it is also if everyone is onboard for these specific features.

@hexylena
Copy link
Member Author

hexylena commented Oct 6, 2015

if everyone is onboard

Okay, I'll wait on your/@bgruening's technical replies.

@hexylena hexylena added this to the 15.12 milestone Oct 6, 2015
@hexylena hexylena modified the milestones: 16.04, 16.01 Jan 5, 2016
@hexylena
Copy link
Member Author

Two new PRs coming soon. Golang proxy first, then compose support once Golang proxy gets in.

I've re-written it to be an optional feature, rather than wiping out NodeJS stuff completely (though I think that should be in the roadmap eventually...)

@hexylena hexylena closed this Feb 25, 2016
@hexylena hexylena deleted the gie-go-proxy branch February 25, 2016 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants