Pin `pip<23.2` and skip one micropkg test to unblock CI from breaking changes #2816
RELEASE.md (Outdated)
@@ -16,6 +16,7 @@

## Bug fixes and other changes
* Consolidated dependencies and optional dependencies in `pyproject.toml`.
* Pin `pip<23.2` due to a breaking change. See https://github.com/kedro-org/kedro/pull/2813
Hopefully we will not have to release Kedro like this 😢
Also, note that we only changed the CI requirements; Kedro's `pyproject.toml` is intact.
Interesting, it seems that our Windows test didn't run the "Install pip setuptools" step at all.
Weird, there's still a stack overflow on Windows. We'll have to look more closely.
I suspect it is still a pip issue, but I can confirm later.
Noted: the Windows tests are failing in different ways. Python 3.7 fails on 1 specific test
Python >3.7 fails with a StackOverflow and pytest just panics.
Good sign that the Windows e2e tests are passing. I bumped pytest-xdist, which was very old and only supports pytest<=3.9
Only win_unit_test 3.7 is failing now. https://app.circleci.com/pipelines/github/kedro-org/kedro/24745/workflows/d07eff5f-f9a7-4537-b812-f47450a7f7ca/jobs/285418 I will try to make it run sequentially now.
Temporarily run the 3.7 tests sequentially.
Reverted 5567507 so it doesn't confuse people. Only Win 3.7 should fail after this. Initially I suspected the parallel execution was acting weird, but I get the same error when I run sequentially.
Good insight @noklam. Does it happen if only the offending tests are run? e.g. doing
I'll open a separate PR as I want to keep this CI job running. It's very tricky to test now because it happens only on CCI (even SSHing in and running the exact command passes).
Waiting for CI now.
CI still failed - #2828
This is going nowhere; I cannot reproduce the issue even over CircleCI SSH. Trying to debug the test itself now. Observing the failure message on CI
It may be related to these lines: are file handlers not being closed properly and blocking other processes?
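A minimal sketch of the file-handle hypothesis above. The lines in question are not shown here, so this uses a hypothetical `logging.FileHandler` (the logger name `handle_demo` and the file name `info.log` are made up for illustration). The only point is that a handle has to be released before the directory holding the file is removed; on Windows, deleting a file that is still open typically fails.

```python
import logging
import tempfile
from pathlib import Path


def write_log_and_cleanup() -> bool:
    """Write to a log file in a temporary directory, then remove the directory."""
    tmp_dir = tempfile.TemporaryDirectory()
    log_path = Path(tmp_dir.name) / "info.log"  # hypothetical file name

    handler = logging.FileHandler(log_path)
    logger = logging.getLogger("handle_demo")  # hypothetical logger name
    logger.addHandler(handler)
    logger.warning("written while the handle is open")

    # Release the handle before deleting anything. On Windows, removing a file
    # that is still open typically fails with "The process cannot access the
    # file because it is being used by another process".
    logger.removeHandler(handler)
    handler.close()

    tmp_dir.cleanup()
    return not log_path.exists()
```

Calling `write_log_and_cleanup()` returns `True` once the directory and the log file are gone.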
@noklam So it's just the Windows unit tests on 3.7 failing? If you feel it's sensible, should we just change this check to not be mandatory?
Title changed from "Pin `pip<23.2` to unblock CI from breaking changes" to "Pin `pip<23.2` and skip one micropkg test to unblock CI from breaking changes"
Suppress the test on Windows Python 3.7
de319ff to d84e39d
d84e39d to e2464b8
Signed-off-by: Nok <[email protected]>
Just a note that the
The issue seems to be somewhat random: Win 3.7 fails consistently, while 3.8 and 3.9 fail occasionally.
This looks like a related issue found by @AhdraMeraliQB, which suggests it could be an unknown interaction with
And as expected, the error on Windows 3.7 is something else. Note that initially there was only 1 failing test
Then I suppressed it, and it became another test
and finally
Looks like all the 3.8 and 3.9 failures are a result of the node crashing and the process timing out, not of any particular tests failing. I would expect that on a re-run, or with a higher time allowance, they would pass more consistently. However, this does not address the node crashing (looks like a seg fault) or 3.7's error.
@AhdraMeraliQB This is probably true. I only say that because when I was working on reading the compressed config
Could be the case. After playing around, the file permissions conflict is arising from the lines
My guess at the moment is that several tests are trying to
Wait, that looks familiar: #2568 (comment) This was my attempt at a diagnosis then: #2568 (comment) Essentially, it seemed like not all the tests were getting their own temporary directory, and under some circumstances this seems to fail. Maybe it's the same problem? I was not able to fix it back then; I just sidestepped it or reverted the changes.
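The isolation idea above can be sketched with pytest's built-in `tmp_path` fixture, which hands each test its own fresh directory. `write_package` is a hypothetical stand-in for the real packaging step, not Kedro's API, and this only illustrates the idea; it is not a confirmed fix for the Windows failures.

```python
from pathlib import Path


def write_package(target_dir: Path, name: str) -> Path:
    """Hypothetical stand-in for a packaging step that writes one archive."""
    archive = target_dir / f"{name}.tar.gz"
    archive.write_bytes(b"placeholder")
    return archive


def test_package_goes_to_private_dir(tmp_path: Path) -> None:
    # pytest injects a unique directory for every test that requests tmp_path,
    # so no two tests (or xdist workers) write to the same location.
    archive = write_package(tmp_path, "demo")
    assert archive.parent == tmp_path
    assert [p.name for p in tmp_path.iterdir()] == ["demo.tar.gz"]
```

Because `tmp_path` is a built-in fixture, the test needs no import of pytest itself; running `pytest` on the file is enough.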
I suspected our custom fixture was causing the problem, so I switched to the pytest fixture, but the problem remains. 51986d5
If I understand correctly, there are two issues:
(1) Incompatibility between latest pip-tools and latest pip on Python 3.7. This is because pip-tools 7 dropped support for Python 3.7, and latest pip-tools 6.* is incompatible with latest pip. I managed to fix that issue with this diff:
diff --git a/features/environment.py b/features/environment.py
index 218ff097..671bee01 100644
--- a/features/environment.py
+++ b/features/environment.py
@@ -103,7 +103,9 @@ def _setup_minimal_env(context):
"pip",
"install",
"-U",
- "pip>=21.2",
+ # pip==23.2 breaks pip-tools<7.0, and pip-tools>=7.0 does not support Python 3.7
+ "pip>=21.2,<23.2; python_version < '3.8'",
+ "pip>=21.2; python_version >= '3.8'",
"setuptools>=65.5.1",
"wheel",
],
diff --git a/pyproject.toml b/pyproject.toml
index ae5fc961..50bf4a84 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -28,7 +28,7 @@ dependencies = [
"more_itertools~=9.0",
"omegaconf~=2.3",
"parse~=1.19.0",
- "pip-tools~=6.5",
+ "pip-tools>=6.5,<8",
"pluggy~=1.0",
"PyYAML>=4.2, <7.0",
"rich>=12.0, <14.0",
(2) Problems with the micropackaging tests, only on Windows and only when run in parallel, origin unknown. As @noklam said above, suppressing 1 test doesn't work; there seems to be a systematic issue with those. Maybe it's related to this observation by @SajidAlamQB in #1328: kedro/tests/framework/cli/conftest.py Lines 147 to 155 in 48d94a6
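For readers less familiar with environment markers, the two `pip` lines in the `features/environment.py` diff above amount to a choice based on the interpreter version. A rough Python equivalent, as a sketch only (the function name `pip_requirement` is hypothetical and not part of Kedro):

```python
import sys


def pip_requirement(version_info=sys.version_info) -> str:
    """Return the pip specifier that the markers in the diff would select."""
    # pip 23.2 breaks pip-tools<7.0, and pip-tools>=7.0 does not support
    # Python 3.7, so only Python 3.7 and older get the upper bound.
    if tuple(version_info[:2]) < (3, 8):
        return "pip>=21.2,<23.2"
    return "pip>=21.2"


print(pip_requirement((3, 7)))   # pip>=21.2,<23.2
print(pip_requirement((3, 10)))  # pip>=21.2
```

In the diff itself pip evaluates the markers, so no such helper is needed; the sketch only spells out the logic.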
@AhdraMeraliQB I agree. Could be something like this: (Edit: micropkg tests might fail even if run sequentially, the solution below is not enough)
diff --git a/.circleci/continue_config.yml b/.circleci/continue_config.yml
index 1632c4bd..bc7cf756 100644
--- a/.circleci/continue_config.yml
+++ b/.circleci/continue_config.yml
@@ -285,8 +285,11 @@ jobs:
equal: [ "3.10", <<parameters.python_version>> ]
steps:
- run:
- name: Run unit tests without spark in parallel
- command: conda activate kedro_builder; make test-no-spark
+ name: Run unit tests without spark and without micropkg in parallel
+ command: conda activate kedro_builder; pytest --no-cov --ignore tests/extras/datasets/spark -k "not micropkg" --numprocesses 4 --dist loadfile
+ - run:
+ name: Run micropkg unit tests sequentially
+ command: conda activate kedro_builder; pytest --no-cov --ignore tests/extras/datasets/spark -k "micropkg" --dist no
- when:
condition:
equal: [ "3.10", <<parameters.python_version>> ]
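The split proposed in the diff above boils down to two pytest invocations that differ only in the `-k` filter and the pytest-xdist options. A small sketch that builds both argument lists, with the flags copied from the diff (the helper name `build_test_commands` is hypothetical, and as noted in the edit above, this split alone was not enough):

```python
import shlex

# Options shared by both runs, copied from the diff.
BASE = ["pytest", "--no-cov", "--ignore", "tests/extras/datasets/spark"]


def build_test_commands(workers: int = 4):
    """Return (parallel, sequential) pytest argument lists."""
    # Everything except micropkg runs on several xdist workers.
    parallel = BASE + ["-k", "not micropkg", "--numprocesses", str(workers), "--dist", "loadfile"]
    # micropkg tests run in a single process.
    sequential = BASE + ["-k", "micropkg", "--dist", "no"]
    return parallel, sequential


parallel_cmd, sequential_cmd = build_test_commands()
print(shlex.join(parallel_cmd))
print(shlex.join(sequential_cmd))
```

Passing the arguments as lists avoids shell quoting problems with the `-k "not micropkg"` expression if the commands are later handed to `subprocess.run`.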
In astrojuanlu#1 I experimented with the two ideas above and I still got a test failure with the dreaded "The process cannot access the file because it is being used by another process". The funniest thing is that the tests were running sequentially, and this test was the first to run.
On the other hand, after my changes in #2614,
Fundamentally, the fact that we are testing the micropkg workflow almost exclusively through the CLI makes debugging more difficult:
Addressing this requires a significant rewrite of the tests. I don't have very good ideas left on how to proceed beyond marking these tests as
@astrojuanlu Agree with most of it. I just want to note that it also causes this flaky stack overflow issue (it does not show up every time). See https://app.circleci.com/pipelines/github/kedro-org/kedro/24923/workflows/99bc8992-4935-44bb-9aa4-75dfc2402e23/jobs/286915
In this PR the stack overflow issue makes 3.8 and 3.9 fail as well; I suspect it is more related to
As for solutions, I think there are 2 options
On refactoring: I agree on this. #2816 (comment) is an attempt to improve how we create and clean up tests. We should leverage
However, we need to decide whether
On the other hand, it may be worth enabling GitHub Actions on
Closed in favor of moving to GitHub Actions, which solves the problem. Still unsure what the root cause is; it could be CircleCI infrastructure, as we didn't make any code change.
Description
Partly fix #2813
Development notes
It could be `psutils` or `pytest-xdist`, root cause unclear.
Checklist
RELEASE.md file