Enable openscapes (and re-enable farallon) deployment by the CI #691
Conversation
It is worth mentioning that this is not using the usual "deployer" user; it is using an existing "2i2cAdministrator" user, as discussed here.
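For completeness, here is a hedged way to double-check which IAM identity a given set of AWS credentials actually resolves to (this snippet is illustrative only and not part of the deployer code):

```python
# Hypothetical verification snippet, not part of the deployer: confirm which
# AWS IAM identity the configured credentials resolve to
# (e.g. "2i2cAdministrator" vs the usual "deployer" user).
import boto3

sts = boto3.client("sts")
identity = sts.get_caller_identity()
print(identity["Arn"])  # e.g. arn:aws:iam::<account-id>:user/2i2cAdministrator
```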
Manual deployment seems to work as expected:
but the tests seem to be failing:
When I looked into the deployment service pod:
I first thought this was a timeout because a new node needed to spin up, but when I ran it again I saw the same failure, and the logs say:
This feels to me like a flaky test, unrelated to the auth changes, which seem to be working as expected.
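As a side note, a hedged sketch of how the hub pod logs can be pulled to tell a node-scale-up timeout apart from a genuine failure; the namespace and label selector below are assumptions for illustration, not values taken from this deployment:

```python
# Hedged sketch: tail the hub pod logs to distinguish a scale-up timeout from a
# real error. The namespace is hypothetical; "component=hub" is the standard
# zero-to-jupyterhub label for the hub pod.
import subprocess

namespace = "openscapes-staging"  # hypothetical namespace
cmd = [
    "kubectl", "logs",
    "--namespace", namespace,
    "--selector", "component=hub",
    "--tail", "100",
]
print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```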
@2i2c-org/tech-team, thoughts about the test failures?
Well, it seems that redirecting the test output (to prevent the CI from leaking secrets) is preventing us from getting the actual error... With the following diff:

```diff
diff --git a/deployer/hub.py b/deployer/hub.py
index 5d99f3d..5755d3b 100644
--- a/deployer/hub.py
+++ b/deployer/hub.py
@@ -458,7 +458,20 @@ class Hub:
         # This can contain sensitive info - so we hide stderr
         # FIXME: Don't use pytest - just call a function instead
         print("Running hub health check...")
-        with open(os.devnull, 'w') as dn, redirect_stderr(dn), redirect_stdout(dn):
+        # Show errors locally but redirect on CI
+        gh_ci = os.environ.get('CI', "false")
+        if gh_ci == "true":
+            print("Testing on CI, redirected output")
+            with open(os.devnull, 'w') as dn, redirect_stderr(dn), redirect_stdout(dn):
+                exit_code = pytest.main([
+                    "-q",
+                    "deployer/tests",
+                    "--hub-url", hub_url,
+                    "--api-token", service_api_token,
+                    "--hub-type", self.spec['template']
+                ])
+        else:
+            print("Testing locally, do not redirect output")
```

I was able to get more info:
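For clarity, here is a minimal standalone sketch of that pattern; the `run_health_check` wrapper and its arguments are made up for illustration, and the custom pytest options are assumed to be defined in `deployer/tests` as in the diff above:

```python
# Minimal standalone sketch of the CI-conditional redirect pattern shown above.
# `run_health_check` and its arguments are hypothetical, for illustration only.
import os
from contextlib import redirect_stderr, redirect_stdout

import pytest


def run_health_check(hub_url, api_token, hub_type):
    pytest_args = [
        "-q",
        "deployer/tests",
        "--hub-url", hub_url,
        "--api-token", api_token,
        "--hub-type", hub_type,
    ]
    if os.environ.get("CI", "false") == "true":
        # On CI, hide all output so secrets (tokens, hub URLs) never reach the logs
        print("Testing on CI, redirected output")
        with open(os.devnull, "w") as dn, redirect_stderr(dn), redirect_stdout(dn):
            exit_code = pytest.main(pytest_args)
    else:
        # Locally, keep the full pytest output so real failures are visible
        print("Testing locally, do not redirect output")
        exit_code = pytest.main(pytest_args)
    return exit_code
```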
OK, I will continue on Monday, but the failure seems real...
I like this. Can we commit this in a new PR?
Sure, I was planning to do that 😉!
OK, coming back to this one: I was able to add myself as an admin, start a notebook server, and run the dask test notebook, and the error was:
I was able to track the decision to provide a daskhub here: #363 (comment), but once I looked into the openscapes image repo, I did not see any dask references:
Some questions:
Interestingly, farallon and carbonplan inherited their images from the pangeo one, but the openspace_image repo was just using environment/requirements files. This is why the failure shows up in openscapes while it is not surfaced in the other AWS-based hubs. I guess we have some follow-up work to prevent this from happening again. Btw, we got lucky that the openscapes people did not try to use dask 😜.
But... going back to this PR, I think the openscapes failure should not block merging it (we can fix the failure in another ticket/issue), but happy to hear if you disagree (btw, adding reviewers now 😉).
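As an aside, a quick hedged way to check whether a user image actually ships the pieces the dask test notebook needs; the package list below is the usual daskhub set, assumed for illustration rather than read from the openscapes image:

```python
# Hedged sketch: check which dask-related packages are importable inside the
# user image. The package list is the typical daskhub set, assumed here.
import importlib.util

for pkg in ("dask", "distributed", "dask_gateway"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'MISSING'}")
```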
If I recall, the openscapes folks said that they wanted Dask, but not Dask Gateway. So maybe it's enough to just use a base hub with Dask installed.
I would agree with that idea, although I think that is probably a discussion for another issue (I will create one later today).
Opened #694 to track that one.
Opened #695 to track this one.
thank you @damianavila!
I can't comment on the specific implementation here, but the general decision looks good to me, and I agree with @damianavila's assessment that they just need Dask, not DG (I've commented in #695 as well). I'm +1 on merging this and taking yet another important step towards AUTOMATE ALL THE THINGS :-)
Thanks @sgibson91 and @choldgraf for your approvals!
This is essentially using the previous work from #673 to enable the automatic deployment of the openscapes hubs.
This is also reverting #689 after the key rotation in #688.
Finally, it also cleans up old artifacts, such as the static encrypted kubeconfig files, and removes the function used to authenticate with them.
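For context, one common alternative to shipping static encrypted kubeconfig files on EKS is to generate the kubeconfig from AWS credentials at deploy time; a hedged sketch follows, where the cluster name and region are placeholders and this is not necessarily how this PR implements it:

```python
# Hedged sketch: instead of a static encrypted kubeconfig, generate one at
# deploy time from AWS credentials. Cluster name and region are placeholders.
import subprocess


def update_kubeconfig(cluster_name="openscapeshub", region="us-west-2"):
    subprocess.check_call([
        "aws", "eks", "update-kubeconfig",
        "--name", cluster_name,
        "--region", region,
    ])
```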