Easier cross-compiling for level 4? #5

stuarteberg · 2024-05-29T15:46:24Z

The conda-forge docs for the microarch-optimized builds have an example that uses microarch_level: 4. But the README for this feedstock contains the following caveat:

> When building packages on CI, level=4 will not be guaranteed, so you can only use level<=3 to build.

Indeed, when I tried to use level 4, I saw failures (in my case, it was on osx).

Nonetheless, I'd like to produce optimized builds for machines that support AVX-512 (level 4). This was possible by explicitly adding the necessary build flag in build.sh and then explicitly listing the appropriate run dependency:

# conda_build_config.yaml
microarch_level:
  - 1
  - 3  # [unix and x86_64]
  - 4  # [unix and x86_64]

# build.sh
if [[ "${microarch_level}" == "4" ]]; then
    CXXFLAGS="${CXXFLAGS} -march=x86-64-v4"
fi

# meta.yaml
requirements:
  run:
    - _x86_64-microarch-level 4  # [unix and x86_64 and microarch_level == 4]

Using that workaround, we were able to produce optimized binaries (including -march=x86-64-v4) in the graph-tool feedstock (conda-forge/graph-tool-feedstock#140).

Would it be possible to make that easier for feedstock maintainers, perhaps by having the microarch-level-feedstock produce yet another output?

Right now this feedstock produces two packages for each arch, such as:

1. x86_64-microarch-level
   a. Introduces the -march=x86-64-v${level} flag in CFLAGS etc.
   b. Introduces a run_export to _x86_64-microarch-level
2. _x86_64-microarch-level
   a. Introduces a run dependency to the appropriate __archspec virtual package.

...but it seems like cross-compilation would be easier if we were to split up the functionality from 1.a and 1.b into two separate packages, so we could easily obtain the correct CFLAGS without pulling in the __archspec dependency. Perhaps we could offer two variants of the package: one that provides both 1.a and 1.b, and another variant that only provides 1.a. (I'm just spitballing here...)

Alternatively, we could just drop the run_exports from the {{ family }}-microarch-level recipe. In that case, feedstock maintainers could build level-4 packages without needing to add the compiler flag explicitly, but they would be forced to explicitly list the appropriate runtime dependency in their recipe, which could be annoying:

requirements:
  build:
    - x86_64-microarch-level {{ microarch_level }}  # [unix and x86_64]
    - ppc64le-microarch-level {{ microarch_level }}  # [unix and ppc64le]
  run:
    - _x86_64-microarch-level >={{ microarch_level }} # [unix and x86_64]
    - _ppc64le-microarch-level >={{ microarch_level }} # [unix and ppc64le]

traversaro · 2024-05-29T15:48:32Z

This is probably related to the discussion in conda-forge/conda-forge.github.io#1261 .

isuruf · 2024-05-29T20:30:04Z

This is a deficiency of run_exports: strong run_exports from a package in build apply to both host and run, and we have no way of specifying build -> run only. I suggest using ignore_run_exports_from and manually adding them in run.
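
A minimal sketch of what that suggestion might look like in a downstream recipe. It assumes the conda-build `build/ignore_run_exports_from` key and reuses the package names and pins quoted earlier in this thread; whether this fully keeps the pin out of the host environment in every build tool is an assumption, not something confirmed here.

```yaml
# meta.yaml (illustrative fragment, not taken from any feedstock)
build:
  # Drop the run_exports that x86_64-microarch-level would otherwise
  # propagate from the build environment.
  ignore_run_exports_from:
    - x86_64-microarch-level  # [unix and x86_64]

requirements:
  build:
    # Still provides the -march=x86-64-v${level} compiler flags.
    - x86_64-microarch-level {{ microarch_level }}  # [unix and x86_64]
  run:
    # Re-add the runtime constraint by hand, since it is no longer exported.
    - _x86_64-microarch-level >={{ microarch_level }}  # [unix and x86_64]
```

The trade-off is the one raised in the original post: the maintainer has to list the runtime dependency explicitly for each architecture.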

baszalmstra · 2024-07-16T13:22:35Z

Is this now solved by #6 ?

baszalmstra · 2024-08-06T14:10:55Z

> Is this now solved by #6 ?

To answer my own question: no.

Adding x86_64-microarch-level 4 to the build section still adds a _x86_64-microarch-level >=4 package to the host section.

So continuing with the idea from @stuarteberg, how about we do the following:

1. We introduce a new package in this feedstock whose only purpose is to have a strong run_export on _x86_64-microarch-level. Let's call this package _x86_64-microarch-level-run-export. (I'd love to hear a better name.)
2. The run_export of x86_64-microarch-level is replaced with a weak run_export on this new package _x86_64-microarch-level-run-export. This ensures that the package is added to only the host section of the build.

@isuruf @traversaro @stuarteberg Thoughts?

On another note, currently the x86_64-microarch-level is also created for every microarchitecture, but it seems to me that it is completely unrelated. Should we maybe just remove that?

baszalmstra · 2024-08-06T14:17:28Z

I just went ahead and implemented my note from above: #9

Let me know what you think!

isuruf · 2024-08-06T14:37:26Z

> We introduce a new package in this feedstock whose only purpose is to have a strong run_export on _x86_64-microarch-level. Let's call this package _x86_64-microarch-level-run-export. (I'd love to hear a better name.)

And what's the difference between that and the current x86_64-microarch-level?

baszalmstra · 2024-08-06T15:50:09Z

I'm trying to create a method where a package in the build section adds a package to just the run section (so not the host section). We can do that through 3 different packages:

1. x86_64-microarch-level (should be placed in the build section):
   - Adds the activation scripts to set the proper compiler flags.
   - Adds a weak run_export on _x86_64-microarch-level-run-export.
2. _x86_64-microarch-level-run-export (automatically placed in the host section if x86_64-microarch-level is placed in build):
   - Adds a weak run_export on _x86_64-microarch-level.
3. _x86_64-microarch-level (automatically added to the run section by _x86_64-microarch-level-run-export):
   - Adds a requirement on a specific __archspec.

As you can see, all three packages have a different responsibility.

The result is that one can just add x86_64-microarch-level to the build section, which will allow building for an architecture that the build machine does not support, while also automatically adding a run requirement that enforces the proper archspec.

isuruf · 2024-08-06T15:56:02Z

The host environment getting the virtual packages from the build machine is the root issue, and all the solutions thus far are hacks IMO. Also your suggestion doesn't work in the case where a v4 package needs a dependency v4 package in host. Then we have the same issue.

I think the best solution here is a way to specify virtual packages to be added to host environments.

baszalmstra · 2024-08-11T16:33:31Z

> I think the best solution here is a way to specify virtual packages to be added to host environments.

I have previously been discussing that with @wolfv . That does indeed seem like the best solution.

For the time being, however, maybe we can brainstorm a solution that works in the meantime?

> Also your suggestion doesn't work in the case where a v4 package needs a dependency v4 package in host. Then we have the same issue.

Would adding a constraint from _x86_64-microarch-level-run-export on _x86_64-microarch-level fix that?

baszalmstra · 2024-08-19T11:45:24Z

@isuruf Any thoughts? I can create a PR if that helps?

isuruf · 2024-08-19T13:47:33Z

For now can you use ignore_run_exports? That seems better than the hacky solution you suggested.

baszalmstra · 2024-08-19T14:17:03Z

> For now can you use ignore_run_exports?

But that will require the user to have a deep understanding of what is going on and why this is a problem. The whole idea is to remove friction for the user. My workaround is not that hacky, is it?

isuruf · 2024-08-19T15:51:13Z

> But that will require the user to have a deep understanding of what is going on and why this is a problem.

What you are suggesting needs a deep understanding of what is going on when compiling downstream packages. The user needs to know that the ABI doesn't change when compiling a lvl4 package with a lvl3 downstream package, but running with a downstream lvl4 package.

stuarteberg added the question Further information is requested label May 29, 2024

stuarteberg mentioned this issue May 29, 2024

Attempt microarch build conda-forge/graph-tool-feedstock#140

Merged

5 tasks

jjhelmus mentioned this issue Jun 4, 2024

host microarch level leaks into run requirements #6

Closed

baszalmstra mentioned this issue Sep 6, 2024

feat: Enable SIMD optimizations conda-forge/boost-histogram-feedstock#57

Open

4 tasks

This was referenced Sep 6, 2024

feat: Enable SIMD optimizations conda-forge/awkward-cpp-feedstock#46

Merged

feat: Enable SIMD optimizations conda-forge/iminuit-feedstock#107

Closed

feat: Enable SIMD optimizations conda-forge/pythia8-feedstock#49

Merged

matthewfeickert mentioned this issue Dec 12, 2024

Double builds for linux-aarch64, linux-ppc64le, and osx-arm64 conda-forge/pythia8-feedstock#53

Closed

1 task
