Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: error if empty build selection, Travis 2.7 Win w/ workaround #545

Merged
merged 8 commits into from
Jan 22, 2021

Conversation

henryiii
Copy link
Contributor

@henryiii henryiii commented Jan 18, 2021

This adds an error message and exit code if no build selectors match, catching runs that don't actually build anything at all. I think this came up once before? Can't find it now. Hopefully the conclusion was it was a good idea?

#536 could build on this and print a smart error message if it's the requires-python part causing it to be empty.

This uncovered a few issues in testing, most importantly that test_cpp17_py27_modern_msvc_workaround passes even if not supported, since it's comparing an empty set with an empty set. I've allowed Travis to participate in the MSVC workaround to fix it.

@YannickJadoul
Copy link
Member

No, the conclusion back then was that it wasn't very desirable, I believe: #179

@henryiii
Copy link
Contributor Author

henryiii commented Jan 18, 2021

To me (on quickly looking over the issue, such as this comment #179 (comment)) seems to be not that it was a bad idea, just that the implementation was not ideal.

If you have a matrix and the macOS job doesn't match anything, most of the time, that's a user error, and when it's not, the macOS job should be removed or the step should be if'd out, since it's not doing anything and confusing readers. Or a user could do cibuildwheel || echo "Nothing built" to explicitly "not care" that nothing was built.

For example:

runs-on: ${{ matrix.os }}
strategy:
  matrix:
    os: [ubuntu-latest, windows-latest, macos-latest]
steps:
- uses: actions/checkout@v2
- name: Build wheels
  run: pipx run cibuildwheel==1.8.0
  env:
    CIBW_BUILD: "*_x86_64"

By throwing an error, it forces a user to either remove the windows job (which probably was not what they wanted, but would be the equivalent to not throwing an error!), or to fix the selector to be CIBW_BUILD: "*_x86_64 *_amd64", which is probably what they wanted.

Another example: currently, someone seeing the new arch support might write this:

- uses: joerick/[email protected]
  env:
    CIBW_BUILD: "*_aarch64"

But since they didn't turn on the needed arch, currently this just silently "succeeds". They might not know they don't have aarch64 wheels until after a release! I think it's critical that CI jobs "fail" if something is clearly wrong, so they can be fixed; no one reads CI logs unless there's a failure. Running cibuildwheel and getting nothing is clearly wrong?

Another example is if we drop Python 3.5, this should immediately fail loudly:

- uses: joerick/[email protected]
  env:
    CIBW_BUILD: "cp35-*"

So the user who just updated their CI configs can remove this step instead of thinking (and indicating to readers) that they still are making 3.5 wheels. Or if they want 3.5 wheels, they will know not to merge the PR with the updated cibuildwheel version.

@YannickJadoul
Copy link
Member

No I think it's more fundamental: #179 (comment)

If you have a matrix and the macOS job doesn't match anything, most of the time, that's a user error, and when it's not, the macOS job should be removed or the step should be if'd out, since it's not doing anything and confusing readers. Or a user could do cibuildwheel || echo "Nothing built" to explicitly "not care" that nothing was built.

That's the design/workaround @Czaki suggested (#179 (comment)) and what @joerick didn't like.

Running cibuildwheel and getting nothing is clearly wrong?

I don't know. I felt reasonably neutral in the discussion, and a quick scan tells me there were some good arguments for both.

      - name: Build wheels
        run: pipx run cibuildwheel==1.8.0
        env:
          CIBW_BUILD: "*_aarch64"

This could still silently fail if there's CIBW_BUILD: "*_x86_64 *_aarch64", btw, while there's still a mistake. So this might give a false sense of security?

Also, on an implementation note: the fact that you need to hack around the tests with pytest.mark.allow_empty might suggest that this wasn't really meant to error? Again, I think I could go either way, but this was definitely discussed before.

@henryiii
Copy link
Contributor Author

The allow empty hack is there because these are unit tests. We are setting unrealistic things and trying them out.

It also exposed a bug in our tests. We are faking the platform, but missed faking the arch, so a bunch of them were empty that shouldn't have been. So I'd say that's another argument in favor.

We can't fix all build selectors. But we can error on cases where the configuration is so bad nothing builds.

Unlike the earlier discussion and PR, this is not about build/skip at all. It is about running with no wheels to build. It could be due to build/skip, or arch, or anything in the future, like a Requires.Python. CI build wheel should not report "success" if it does nothing at all.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 18, 2021

We could also check that every build / skip selector term selects at least one thing, but that's a different, orthogonal check and is separate from this. I think if you write cibuildwheel in a config and it builds nothing, that's confusing to readers, you should make your intention clear with the || trick, for example. You wouldn't expect pip install x to silently do nothing if x can't be installed on this platform? Yes, it might save some typing, but if you misspell x or thought it was supported, that's more important.

@henryiii
Copy link
Contributor Author

henryiii commented Jan 18, 2021

If there's a use case for running an empty build, how about an --allow-empty flag? Then it's also explicit and cross platform. I don't see a strong need for it, but if someone else does? (Though I would steal it for the tests ;) )

@YannickJadoul
Copy link
Member

Unlike the earlier discussion and PR, this is not about build/skip at all. It is about running with no wheels to build.

I kind of disagree on this being different, since @Czaki's original PR was doing the same: checking if the configuration that's running is not going to build anything.

But I do agree that the tests that currently weren't/aren't running are quite worrying and an argument. And recent changes have made the situation slightly more complex, making it possible that the situation has changed, so revisiting the idea might be a good plan.
But I'm mainly saying that @joerick was against the previous instantiation of this idea, and you'll have to convince him ;-) To me, it would probably make sense (I had already approved #179, long long time ago ;-) ), but I never felt the strong need to urgently have this, either :-)

If there's a use case for running an empty build, how about an --allow-empty flag? Then it's also explicit and cross platform. I don't see a strong need for it, but if someone else does? (Though I would steal it for the tests ;) )

Not a big fan of this, for the record. Then || should be better, no? (though not cross-platform, but I don't really think it's worth adding an extra flag over; unless someone brings up a good use case, indeed)

@YannickJadoul
Copy link
Member

Actually, the more I think about it, the more it does indeed make sense to me (the idea, that is; this is not a review of the code; I'll do that when I know something about the chances of @joerick being convinced ;-) ).

@henryiii
Copy link
Contributor Author

henryiii commented Jan 18, 2021

Haha, I've been kicking travis since I thought it was broken as usual. Then I realized, this is actually because of this test:

test_cpp17_py27_modern_msvc_workaround

We only have an explicit skip for Appveyor; we don't support the workaround on travis (See docs: Note that building Windows Python 2.7 wheels on Travis is unsupported.), but this test used to "pass" anyway because it's making an empty set and expecting an empty set. 🤦 Another +1 for adding this check!!!

We probably should relax this to allow this workaround on Travis, honestly, since it should support it, it just doesn't support non-workaround Windows 2.7 wheels.

@joerick
Copy link
Contributor

joerick commented Jan 18, 2021

Ah! It comes up again! Maybe my instinct is just wrong about this...

I struggle to disagree with what I wrote last time, from #179:

Also, there might be reasons a user would like no action. E.g., a user that wants to temporarily skip wheels on Windows, for example, might set CIBW_SKIP=*-win* in their travis config. Or maybe somebody is having a tough day fighting the CI, they might set CIBW_SKIP=*. In both cases, the option is fairly clear that the user would like no builds, and cibuildwheel can happily succeed at doing nothing.

Maybe, I'm thinking about this wrong? Here I'm thinking about cibuildwheel as an API, and in an API, you ask for something to be done 0 times, and it will not raise an error. Because it can do what you ask. In an API, an error should happen when you ask the computer to do something and it cannot do it. But here, it can. It can do nothing.

Buuut I suppose the other way to look at cibuildwheel is as a user interface. You can't open 0 items. You can't add 0 songs to a playlist. Most UIs don't let you do these things. The button is disabled, the menu item greyed out.

I guess command line apps fall in the middle of these things. We're both UIs and APIs. So both arguments work.

$ cp *.txt /tmp
zsh: no matches found: *.txt

I think if you write cibuildwheel in a config and it builds nothing, that's confusing to readers, you should make your intention clear with the || trick, for example. You wouldn't expect pip install x to silently do nothing if x can't be installed on this platform?

Yeah. I can't really argue with that either... and added to the archs/requires-python thing, I think I'm coming round to the idea. I don't have time to review now, but I think in principle I'm okay with it :)

@YannickJadoul
Copy link
Member

OK, I'll give a quick review tomorrow (taking into account and comparing to @Czaki's #179).

One way to go would be to include this in a release, and wait for complaints to come in? If they suddenly do, a new patch version is released quickly enough? At the very least, this would be safe: users would only risk getting unwanted errors, but no silently ignored unwanted results :-)

@YannickJadoul YannickJadoul self-requested a review January 18, 2021 23:00
@henryiii
Copy link
Contributor Author

Also keep in mind cibuildwheel is designed for CI. In CI, no one is monitoring the jobs most of the time, they are just looking for green checks. Having cibuildwheel happily claim it did its job when there's a mistake in BUILD, a bad SKIP config, etc. We just had a bug in the tests because running on travis windows silently skips Python 2.7. Someone making a special Windows 2.7 workaround build with CIBW_BUILD=?p27-* on Travis would just get a happy cibuildwheel run, no error (or even warning, currently), and no wheel. When we drop cp27, the jobs that run cp27* are still going to "succeed", just no wheels will be produced.

Azure made a huge mistake in not activating set -e on multiline scripts; they were unable to fix it after shipping, so they fixed it in GitHub Actions when it came out. Passing errors in CI is very problematic! If you do like this idea, there could be --allow-empty flag, I think it's better to comment out cibuildwheel or similar, but it would be better than silently succeeding on bad configurations by default.

Also, there might be reasons a user would like no action

Then skip the job (or comment out the cibuildwheel line if you really have to run everything else). Starting up the CI, running the job, getting a green check, but not building anything is very likely to cause big problems later. You have to remember to un-skip it. And you might have the parts of the pipeline that depend on the wheels produced, or have expensive setup; it's better to turn off the job, I think.

One way to go would be...

I rather expect this might catch some mistakes. :) I'm now suddenly apprehensive for my own projects now. I actually always assumed it wouldn't let me build nothing and still succeed, so if it passes, I don't check to see if nothing was produced.

@henryiii henryiii force-pushed the fix/err-if-empty branch 3 times, most recently from e5af42b to d689f26 Compare January 19, 2021 04:48
@@ -259,6 +262,10 @@ def main() -> None:
else:
assert_never(platform)

if not identifiers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 things:

  1. So, the main difference with Assert that build will fail if there is no configuration to build #179 is that @Czaki put the detection in the build function, and raised an exception there. I think I prefer this approach (checking in __main__), since we don't really care about doing this twice?
    • @Czaki's approach does fix the whole "marker for empty build" issue, since build gets intercepted.
  2. Why do this after you print the preamble and after calling build? If we're catching errors in __main__.py, they should probable be thrown before we give the impression that cibuildwheel started running successfully?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this after you print the preamble

Because that information could potentially be useful in helping debug what went wrong. The preamble is also not build-specific, it doesn't get repeated N times for N builds.

since we don't really care about doing this twice

You mean three times? Once per OS file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it's not as important now that we have the build selector. However, basic info like what it's running on, etc. still seem quite useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, myeah, maybe. We don't do that for the other errors, though; but you could argue that the other errors are a bit "simpler".

since we don't really care about doing this twice

You mean three times? Once per OS file?

That wasn't clear. But I meant determining the selected build identifiers twice. It's now done to check they're not empty, then again in {linux,macos,windows}.build() to actually loop over the different builds.

So I meant we don't really care about the performance of doing this twice, in this case (not like it's going to be measurable), and there's hardly any code duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to print it out before if that's better; including the build specifier helps quite a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to print this after the preamble, printing the options (and configuration warnings) might be useful information. But after calling platform-specific.build is strange to me... as this check is designed to flag a misconfiguration. Also, just a few lines up we create the output_dir ready to receive wheels, we should be doing this check and failing before that I think.

@henryiii
Copy link
Contributor Author

I'm not against the idea of a --allow-empty flag. The problem with cibuildwheel || true is that it also masks build errors, while --allow-empty would just be a safe way to opt back into the old behavior. And it's handy for a few of our unit tests, too. ;)

@henryiii henryiii closed this Jan 20, 2021
@henryiii henryiii reopened this Jan 20, 2021
@henryiii henryiii changed the title fix: error if empty build selection feat: error if empty build selection Jan 20, 2021
@YannickJadoul
Copy link
Member

I'm not against the idea of a --allow-empty flag. The problem with cibuildwheel || true is that it also masks build errors, while --allow-empty would just be a safe way to opt back into the old behavior. And it's handy for a few of our unit tests, too. ;)

Oh, that's true as well, ofc. I don't really mind either way. I don't really think it would be used that often, but it can't really hurt either? (It's a pretty "shallow" flag, anyway, as it's already handled in __main__.py.)

Copy link
Member

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I'm concerned, this makes sense. And the amount of code that was necessary is also reasonably little, which provides evidence it's not a crazy idea/design :-)

@henryiii henryiii changed the title feat: error if empty build selection feat: error if empty build selection, Travis 2.7 Win w/ workaround Jan 21, 2021
Copy link
Contributor

@joerick joerick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good generally, a few minor things, and I think we should move the location of the check in the main function.

Once those are fixed, I'm happy to merge this and get it into 1.8 - it makes sense to land it at the same time as ARCHS.

@@ -259,6 +262,10 @@ def main() -> None:
else:
assert_never(platform)

if not identifiers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to print this after the preamble, printing the options (and configuration warnings) might be useful information. But after calling platform-specific.build is strange to me... as this check is designed to flag a misconfiguration. Also, just a few lines up we create the output_dir ready to receive wheels, we should be doing this check and failing before that I think.

@henryiii henryiii force-pushed the fix/err-if-empty branch 4 times, most recently from 1327f28 to e6d007a Compare January 22, 2021 14:49
@henryiii
Copy link
Contributor Author

Okay, I've moved architecture out into a separate file, and the check now happens inline rather than in each build function, so it can precede the empty selector check. It's also nice to have all possible exits in the same place and have a clearer idea of what's being checked directly in the main function, reading top to bottom. :) (CC @YannickJadoul).

Also test for the SystemExit now, rather than just skipping, even on Windows.

@henryiii henryiii mentioned this pull request Jan 22, 2021
Copy link
Contributor

@joerick joerick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@henryiii
Copy link
Contributor Author

The emulated wheel check is slow...........

@henryiii
Copy link
Contributor Author

Why did this suddenly take so much longer, even timing out on the emulate build?

@henryiii
Copy link
Contributor Author

henryiii commented Jan 22, 2021

Must be an issue with GHA/Azure, the Travis runs were exactly the same length as before. All the GHA/Azure runs took much longer than normal (compared to a recent PR). (Not sure, but I think so).

@joerick
Copy link
Contributor

joerick commented Jan 22, 2021

Well, the emulation test passed. No idea what happened there. I can't see the old log and the results of the previous check because of the force-push (you guys with your force pushing! but i digress). I think the other tests had passed last time around?

@henryiii
Copy link
Contributor Author

Much better, 16 mins instead of 3 hours. :)

@joerick
Copy link
Contributor

joerick commented Jan 22, 2021

Pretty sure the previous failure was just a glitch, and want to get on with this release!

@joerick joerick merged commit d87b589 into pypa:master Jan 22, 2021
@henryiii
Copy link
Contributor Author

The force push put it on latest master to pick up the readme update, and changed the name from err to exit in one spot for consistency. Nothing else changed. All the old logs are in the "actions" tab on the top. But I'm pretty sure this was a weird glitch in scheduling, several jobs were longer than they should have been.

@henryiii henryiii deleted the fix/err-if-empty branch January 22, 2021 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants