Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow tests #187

Merged
merged 19 commits into from
Nov 9, 2023
Merged

Workflow tests #187

merged 19 commits into from
Nov 9, 2023

Conversation

liamhuber
Copy link
Member

@jan-janssen, some of this might be duplicate, e.g. exceptions seem to be tested here, but for now I thought I'd throw all the tests I had at it and we can trim them down as needed -- I figure you'll know better where to look for duplication anyhow #159

Right now I use PyMPISingleTaskExecutor, but if you think there's good cause we can switch this over to PyMPIExecutor.

Purpose (copied from the init notes)

pympipool should be able to handle the case where no elements of the execution can be pickled with the traditional pickle module but rather require cloudpickle.

This is particularly important for compatibility with pyiron_workflow, which dynamically defines (unpickleable) all sorts of objects.

Currently, pyiron_workflow defines its own executor, pyiron_workflow.executors.CloudPickleProcessPool, which can handle these unpickleable things, but is otherwise very primitive compared to pympipool.mpi.executor.PyMPISingleTaskExecutor.

Simply replacing CloudPickleProcessPool with PyMPISingleTaskExecutor in the pyiron_atomistics tests mostly works OK, and work perfectly when the tests are ported to a notebook, but some tests hang indefinitely on CI and running unittests locally.

To debug this, we break the tests up into their individual components (so hanging doesn't stop us from seeing the results of other tests). Once everything is running, these can be re-condensed into a single test file and the entire new tests subdirectory can be deleted.

Unrelated

I finally tracked down why I was unable to push directly to pyiron/atomistics: my remote was set using https, and pycharm wasn't playing nicely with my 2FA. Switching over to ssh (git@...) solves things nicely.

However, here the pyiron admin team still doesn't seem to have admin rights. This confuses me a bit, since over on the new pyiron_workflow that "administration" and "pyiron" teams got their rights automatically --maybe this is a (positive) side effect of using the module template? But I would have expected this behaviour from just creating a repo inside the org... Anyhow, just a heads-up that the repo settings should probably be adjusted.

@liamhuber
Copy link
Member Author

liamhuber commented Oct 12, 2023

The tests all pass, but that's because the new ones aren't actually getting run. I expected something like run: coverage run -m unittest discover tests (this is also similar in non-centralized repos), but the tests here are explicitly only scraping for the first layer of test files matching a pattern: run: for f in $(ls tests/test_*.py); do echo $f; python -m unittest $f; done. I'm going to be super pragmatic and just push the new tests up a level instead of messing with the CI.

@liamhuber
Copy link
Member Author

Splitting things into different test files didn't have the desired effect -- the entire action is timing out on the first test (test_args, which I am indeed expecting to hang in a non-notebook setting). However, @jan-janssen, since you said you were pretty confident you knew what needed to be done to resolve this hanging, I'm just going to leave it as-is -- you can make the necessary modifications in this branch, then we should see all the tests pass, then we can merge the new tests into a single file.

@liamhuber
Copy link
Member Author

A bit more disconcerting to me is that I'm making a PR from a fork, but on: pull_request is triggering and using actions/checkout to get the fork's code. My understanding was that on: pull_request should not be treating forked PRs this way, but rather only checking out the target branch's code, because otherwise it poses a security risk. @jan-janssen, did you come in here and approve the tests super fast or something? Or does GitHub somehow recognize that I'm a pyiron-org member and this is a pyiron-org repo, even though the repo teams don't seem to be set up? I'm really not an expert on the security side of things, but I know just enough to be a bit nervous about what I see here. @niklassiemer, any wisdom?

@liamhuber
Copy link
Member Author

A bit more disconcerting to me is that I'm making a PR from a fork, but on: pull_request is triggering and using actions/checkout to get the fork's code. My understanding was that on: pull_request should not be treating forked PRs this way, but rather only checking out the target branch's code, because otherwise it poses a security risk. @jan-janssen, did you come in here and approve the tests super fast or something? Or does GitHub somehow recognize that I'm a pyiron-org member and this is a pyiron-org repo, even though the repo teams don't seem to be set up? I'm really not an expert on the security side of things, but I know just enough to be a bit nervous about what I see here. @niklassiemer, any wisdom?

Indeed, I'm quite confused why the workflow here is seeing my PR code and not the main branch code. From the github docs:

For pull requests from a forked repository to the base repository, GitHub sends the pull_request, issue_comment, pull_request_review_comment, pull_request_review, and pull_request_target events to the base repository. No pull request events occur on the forked repository.

When a first-time contributor submits a pull request to a public repository, a maintainer with write access may need to approve running workflows on the pull request. For more information, see "Approving workflow runs from public forks."

Maybe I'm not a "first-time contributor" here and I've just forgotten...

@liamhuber
Copy link
Member Author

Maybe I'm not a "first-time contributor" here and I've just forgotten...

Nope, just went over to PR's and filtered by myself and this is my only one. ... @jan-janssen I really hope you just snuck in and approved these tests, otherwise my understanding of on: pull_request is severely deficient 😝

@niklassiemer
Copy link
Member

I think since you happen to be a maintainer with write access yourself this might already allow you to run these actions without a approval. Lets test that with a new github account?
Otherwise, the on: pull_request should (to my understanding) check out the final branch (i.e. the potential result of a merge to main) to run the tests. Otherwise running any tests is useless (the target branch should have been tested already).
One difference is the exposed credentials. If run from a fork, there should not be any of the repos credentials available in the workflow! Therefore, things like codacy are bound to fail due to missing credentials.

@liamhuber
Copy link
Member Author

I think since you happen to be a maintainer with write access yourself this might already allow you to run these actions without a approval. Lets test that with a new github account?

That's what disturbs me; on this particular repo I'm not a maintainer. I also wasn't allow to push a branch directly to the repo, so I don't think I even have write access.

Otherwise, the on: pull_request should (to my understanding) check out the final branch (i.e. the potential result of a merge to main) to run the tests. Otherwise running any tests is useless (the target branch should have been tested already).
One difference is the exposed credentials. If run from a fork, there should not be any of the repos credentials available in the workflow! Therefore, things like codacy are bound to fail due to missing credentials.

Ah, indeed. And none of the run stuff uses credentials so that all tracks. Thanks for the clarification!

The docs state " maintainer with write access may need to approve running workflows on the pull request" (my emphasis); in the settings for another repo I can see there is the option to not require approval, so maybe that's all that's going on here.

@jan-janssen
Copy link
Member

However, here the pyiron admin team still doesn't seem to have admin rights. This confuses me a bit, since over on the new pyiron_workflow that "administration" and "pyiron" teams got their rights automatically --maybe this is a (positive) side effect of using the module template?

The permissions still have to be set manually, I did this for pyiron_workflow and I now fixed the permission for pympipool.

@jan-janssen
Copy link
Member

I feel the issues here are related to the ones we see in #162

@liamhuber
Copy link
Member Author

I feel the issues here are related to the ones we see in #162

That is a closed pull request with no commentary... could you provide a bit of context?

@jan-janssen
Copy link
Member

I feel the issues here are related to the ones we see in #162

That is a closed pull request with no commentary... could you provide a bit of context?

I just merged the latest changes from main to see if it solves the issue but it does not.

@liamhuber
Copy link
Member Author

As discussed in #210, these are now working. I just merged main and they run great on my local machine inside pycharm too.

I'll recombine these into something sensible and polish it up!

@liamhuber
Copy link
Member Author

======================================================================
ERROR: test_timeout (test_with_dynamic_objects.TestDynamicallyDefinedObjects)
Timeouts for dynamically defined callables should be handled ok.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/pympipool/pympipool/tests/test_with_dynamic_objects.py", line 207, in test_timeout
    fs.result(timeout=0.0001)
  File "/usr/share/miniconda3/envs/test/lib/python3.10/concurrent/futures/_base.py", line 460, in result
    raise TimeoutError()
concurrent.futures._base.TimeoutError

----------------------------------------------------------------------

I'm rather confused why this is throwing an error -- line 207 in test_timeout is literally wrapped in a with self.assertRaises( TimeoutError,, this is exactly the behaviour we want. Also it runs fine on my local machine.

I briefly thought I had this nailed when I realized I'm using the builtin TimeoutError, but concurrent.futures._base.TimeoutError is literally just an alias to the builtin guy.

@liamhuber
Copy link
Member Author

Ahhh, but my local interpreter is 3.11 and these are all running for 3.11 here too. Ok, that's a solid lead.

@liamhuber
Copy link
Member Author

I have to go now, but I'll come back to this tomorrow to look at:
a) Why this fails for <3.11
b) Why it won't plug-and-play with pyiron_workflow (somehow the running attribute fails to get updated to False, even though the callback is clearly triggering and the object id is the exact same inside the callback and outside it. This should be an identical case to test_dynamic_callable, which also registers a done callback and is working fine.

@liamhuber
Copy link
Member Author

Huh, it looks like concurrent.futures had its own TimeoutError until 3.11? At any rate my very naive fix actually worked for all but mpich on 3.11. That simply timed out. I'll restart it for now, but if it times out again I'll dig into it tomorrow

@jan-janssen
Copy link
Member

The issue with the MacOS tests failing was still related to #191 I restarted the tests a couple of times, but now they are working fine.

@liamhuber
Copy link
Member Author

Super.

Let me try to track down why it's still not working with the workflows, then I'll merge. Since we test the callback I'm hopeful the problem is purely on the workflow side, but there might be a more complex test case I still need to add here.

@liamhuber
Copy link
Member Author

Super.

Let me try to track down why it's still not working with the workflows, then I'll merge. Since we test the callback I'm hopeful the problem is purely on the workflow side, but there might be a more complex test case I still need to add here.

This was resolved in #214. This guy should be good to go now.

@liamhuber
Copy link
Member Author

Just rerunning now until mac rolls a nat 20

@jan-janssen jan-janssen merged commit 893ec2a into pyiron:main Nov 9, 2023
16 checks passed
@liamhuber liamhuber deleted the workflow_tests branch November 9, 2023 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants