Replace dependencies in WORKSPACE file #5252
Interesting, I wasn't aware of this option. We have a similar requirement in Gerrit to consume JGit from a development tree instead of from Maven Central. Here is the instruction: https://github.com/GerritCodeReview/gerrit/blob/master/Documentation/dev-bazel.txt#L360-L365 And the corresponding code is here: https://github.com/GerritCodeReview/gerrit/blob/master/lib/jgit/jgit.bzl#L11-L13 It would be great if we could simplify this setting and pass that option on the command line instead of editing the source build file in the tree.
@davido Yes, this is exactly the use case. Can you give it a try? :)
@meteorcloudy Yes, I will look into replacing it and comment on how it goes.
I'm seeing that passing the override option on its own doesn't get us there, because the wrapper of the JGit libraries relies on a mapping layer between the Bazel external repositories and the JGit artifacts. So I think it would be really tricky to make it work without this mapping layer and only using --override_repository. Previously I tried to use config_setting for this purpose: #2707.
I may be a bit confused here, as I am not too familiar with Bazel, but as someone who just wants to integrate TensorFlow into a distro, I want to tell the TF build where my existing installation of Eigen (or another library) lives -- I just want to provide the Unix prefix where it is installed. That is essentially the result of installing the package the usual way. I am not clear on what the semantics of --override_repository are. Also, as a user, writing a BUILD file sounds quite tedious, especially if I do not know Bazel. Shouldn't the BUILD file be provided by the project (e.g., TensorFlow), so that whoever runs the build just has to supply a location for the dependency?
@davido I see. If I understand correctly, the problem is that the layouts of the jgit_maven_repos and the local jgit repository are different, therefore we not only need to use --override_repository, but also have to remap the target paths.
@tgamblin The semantics of --override_repository=&lt;repository name&gt;=&lt;local path&gt; is that Bazel uses the given local directory in place of the external repository it would otherwise fetch. I agree that ideally users should not be asked to provide a BUILD file for installations like /opt/local/eigen-3.3.4. And yes, maybe TensorFlow could provide the BUILD file, but if the override repository has a different layout from the original repo, we not only have to provide a BUILD file, but also need to change the target path (or name). I think this is the problem @davido encountered.
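For reference, with the Eigen example from this issue an invocation might look roughly like the following. This is only a sketch: the target pattern is an arbitrary example, and the local directory must already contain a BUILD file defining the expected targets.

```
bazel build //tensorflow/tools/pip_package:build_pip_package \
  --override_repository=eigen_archive=/opt/local/eigen-3.3.4
```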
Yes, somehow we need to implement that mapping. The reason is obvious: a cardinality mismatch between the repositories used in the WORKSPACE for JGit (N) and the local JGit repository (1). The JGit project has multiple Maven artifacts, and each artifact is represented as one external repository in Bazel (and one repository rule). To add to this complexity, we need the source artifact as well, because we patch some JGit code for different code paths; that is why you see both jar and source classifiers in the map.
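To make the cardinality mismatch concrete, here is a purely hypothetical sketch of such a mapping layer (none of these names or versions are the actual Gerrit jgit.bzl contents): a macro that either declares one external repository per Maven artifact, or points everything at a single local checkout.

```
# jgit.bzl -- illustrative only, not the real Gerrit file.
_VERS = "5.0.0-SNAPSHOT"  # placeholder version

# Set to the path of a local JGit checkout to build JGit from source.
LOCAL_JGIT_REPO = ""

def jgit_repos():
    if LOCAL_JGIT_REPO:
        # One local repository (1) has to stand in for all artifacts.
        native.local_repository(
            name = "jgit",
            path = LOCAL_JGIT_REPO,
        )
    else:
        # One external repository per Maven artifact (N).
        native.maven_jar(
            name = "jgit_lib",
            artifact = "org.eclipse.jgit:org.eclipse.jgit:" + _VERS,
        )
        native.maven_jar(
            name = "jgit_servlet",
            artifact = "org.eclipse.jgit:org.eclipse.jgit.http.server:" + _VERS,
        )
```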
I think you either need to a) create a way to auto-generate a BUILD file for an external project (which seems really hard to me), or b) provide a way for project implementors like TensorFlow to supply BUILD files describing what external dependencies should look like. The overhead of creating the BUILD file is going to be way too high for most distro packagers to consider maintaining Bazel-built stuff. Requiring everything to have a BUILD file is also a huge burden for project maintainers, as they have to know how to build all their dependencies with Bazel. I don't see how this is scalable. TF basically owns all of its dependencies and their builds, and it depends on very low-level aspects of them (specific artifacts and internal headers).
I talked to @martinwicke about this, and he seemed to think it would be fine if the TF build switched the way it handles dependencies to be more compatible with external installs. AFAICT, the TF build currently builds each dependency repo in place, and the BUILD files expose artifacts from the dependency to the project. I think you want something more like a layout where each dependency is built and installed into its own prefix (see the sketch after this comment). TF would then build each dependency into its prefix, and you could add a way to point the build at an existing prefix instead. @martinwicke seemed to think it wouldn't be overly cumbersome for TF's Bazel build to first build, then install each external dependency like this. Maybe he can comment. FWIW, we do something similar to this in Spack, which you might be interested in. See the External Packages section of our docs. Basically, users can specify a prefix in a YAML file, and they can tell Spack as much as they know about it (version of the package, compiler, compiler version, etc.). Users can choose whether they want to build certain packages themselves or provide an existing installation. It gets a little messy for system prefixes (which are shared and may contain other libraries that can pollute the build) and for packages where the system layout may differ depending on how they were installed, but it generally works.
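For illustration only, a hypothetical per-dependency prefix layout of the kind being described (not the actual TF build tree) might look like:

```
build/
  deps/
    eigen/                 # each dependency built and "installed" into its own prefix
      include/eigen3/...
    protobuf/
      include/google/protobuf/...
      lib/libprotobuf.so
  tensorflow/              # TF compiles against deps/<name>/include and links against
                           # deps/<name>/lib, so a packager can point a prefix at an
                           # existing system installation instead
```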
@tgamblin This means Bazel doesn't touch your source tree, so there's no way to "install" the build output of a dependency into a prefix like that. Instead, how about we provide two BUILD files for each repo: one for building the dependency from source, and one that wraps a pre-installed copy. Both BUILD files should define the same set of targets, so that when switching the dependencies, all targets are still available. We can then replace the dependencies in the following two ways.
I actually tried to use this approach, but found a few issues with it.
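One concrete way to wire up a pre-installed copy today, sketched with the Eigen example from this issue (the BUILD file label and its location are placeholders):

```
# In WORKSPACE: map @eigen_archive onto the system installation and supply
# a BUILD file for it, since the installed tree does not ship one.
new_local_repository(
    name = "eigen_archive",
    path = "/opt/local/eigen-3.3.4",
    build_file = "//third_party/systemlibs:eigen.BUILD",  # hypothetical label
)
```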
@aehlig and I looked at this for some time and here is a potential general solution for replacing source dependencies with pre-built ones. Let us know what you think. First of all, we will need a repository rule that takes a description of a prebuilt package. This rule will populate the repository directory with symlinks to the prebuilt files and generate suitable BUILD files with the expected targets. Then TF will structure their WORKSPACE file as follows. There will be an almost empty file defining a blank local_overrides() macro. The WORKSPACE file loads and calls local_overrides() before declaring the source dependencies, and each source dependency is only declared if it has not already been overridden. During the normal build, when local_overrides() does nothing, the source dependencies are used. Distro maintainers can replace the overrides file with one that registers the prebuilt dependencies, and the TF build will use those instead of the source dependencies. (Of course, that file should be autogenerated from package descriptions.) WDYT? (None of this requires any changes in Bazel, but we can help with writing the repository rule.)
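A rough sketch of that structure, assuming hypothetical file names (local_overrides.bzl, deps.bzl) and using native.existing_rule to skip already-overridden repositories:

```
# local_overrides.bzl -- shipped by TF as an (almost) empty macro.
def local_overrides():
    pass

# deps.bzl (hypothetical) -- declare a source dependency only when no
# override has already registered a repository under the same name.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def _maybe(repo_rule, name, **kwargs):
    if not native.existing_rule(name):
        repo_rule(name = name, **kwargs)

def tf_source_deps():
    _maybe(
        http_archive,
        name = "eigen_archive",
        urls = ["https://example.org/eigen-3.3.4.tar.gz"],  # placeholder URL
        build_file = "//third_party:eigen.BUILD",            # placeholder label
    )
```

In the WORKSPACE file, TF would load and call local_overrides() before tf_source_deps(); a distro maintainer replaces local_overrides.bzl with a version that registers prebuilt repositories (for example via new_local_repository or the proposed repository rule) under the same well-known names.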
@irengrig fyi
@perfinion FYI, since you are working on this.
Huh, cool, so I've implemented a thing that is similar in some ways to parts of how @meteorcloudy and @dslomov did things. My stuff is here: https://github.com/perfinion/tensorflow/commits/unbundle I used a regular cc_library instead. Ideally I'd like some kind of pkg_config_library that would automatically query pkg-config, but until then we can just hardcode the linker flags manually. I originally tried doing something similar to this below, but realized the headers= part is not necessary: the compiler will always search /usr/include by default, since that's one of the system paths. It's not exactly hermetic, but the downside of hardcoding the paths is a ton of extra maintenance work tracking paths across different distros. Not only that, /usr/include is only correct if you are building for the host; cross compiling frequently has a different dir that the compiler will look in (e.g. /usr/aarch64-unknown-linux-gnu/usr/include/). Also, things should be linking to the .so file, not the static lib.
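Purely as an illustration of the cc_library-with-hardcoded-linkopts approach described above (the library name and flag are made up, not copied from the linked branch):

```
# "System" BUILD file for some bundled C library, here called libfoo for the
# sake of example: no srcs or hdrs, just link against the installed shared
# library. Headers are found through the compiler's default include paths
# (/usr/include on the host, or the sysroot's include dir when cross compiling).
cc_library(
    name = "foo",
    linkopts = ["-lfoo"],
    visibility = ["//visibility:public"],
)
```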
swig was one of the more complicated BUILD files since it needs to execute the binary. I ended up just making a rule to copy it into bazel-bin and run it from there: https://github.com/perfinion/tensorflow/blob/unbundle/third_party/systemlibs/swig.BUILD That is pretty suboptimal; it should just be able to run it from /usr/bin/swig. Copying worked in this case, but some binaries use other resources relative to where they are (e.g. python virtualenv stuff), or SELinux labels may be needed, in which case copying would just fail to run at all. About configuring the overrides, I'd rather not have just a blank local_overrides() function. The tooling to make system-linked builds definitely needs to be up in the main tensorflow repo itself. Otherwise distro maintainers are going to have to maintain a ton of stuff separately from tensorflow, which makes coordination complicated, and new people packaging tensorflow may not find the work from other distros and would end up duplicating all the effort. I quite like the proposed structure; it's definitely very easy to use. One problem I ran into is that I don't know how to make an if_system so I can no-op out the license files. Or is there a way to make just a blank rule so the pip_package BUILD file can still query for it but just gets nothing?
@perfinion Thanks for the feedback! Excluding the license files should be easy; you can consider using a config_setting together with select(). You can also refer to the Configurable Build Attributes documentation for an introduction to how these conditionals work.
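A sketch of one way the config_setting / select() combination could be used for this; the setting name and labels are placeholders, not the exact snippet that was suggested:

```
# Placeholder flag: bazel build --define=with_system_libs=true ...
config_setting(
    name = "with_system_libs",
    define_values = {"with_system_libs": "true"},
)

filegroup(
    name = "third_party_licenses",
    srcs = select({
        # When linking against system libraries, there are no bundled
        # license files to collect.
        ":with_system_libs": [],
        "//conditions:default": ["@some_external_repo//:LICENSE"],  # placeholder
    }),
)
```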
@meteorcloudy Thanks, that intro helped me finally understand the conditionals; I'd seen them scattered around but it didn't quite click. The system libs stuff can't be an all-or-nothing toggle, since there are still a ton more libraries to unbundle on top of the ones I've done so far, and I'm having trouble reconciling the two approaches. I currently use an ENV var like this (perfinion/tensorflow@3a667f7#diff-2115a4210ba6ff522adacf81a857d671R40), and then .tf_configure_bazelrc can contain the corresponding --action_env entry. That worked really easily in _tf_http_archive, where I could just put the downloading inside an if block. But that check needs the repository ctx, so it won't work elsewhere. Is there a way to change my original _use_system_lib to use --define instead of --action_env? Then I wouldn't need the context, I think. I could of course make a million different --define=use_syslib_pcre=true --define=use_syslib_swig=true etc., but that seems tedious. I found a cleaner way to deal with most of the license files: a blank filegroup worked to define a target that just doesn't do anything. This works for most of the targets, since they are just plain file references.
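For completeness, the blank-filegroup trick might look like this in a system BUILD file; the target name here is just a placeholder for whatever the pip_package rules expect to find:

```
# Define the license target the pip_package BUILD file queries for, but make
# it expand to nothing when the system copy of the library is used.
filegroup(
    name = "LICENSE",  # placeholder name
    srcs = [],
    visibility = ["//visibility:public"],
)
```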
I asked about this on https://stackoverflow.com/questions/75127243/how-can-i-optionally-override-repository-dependencies-in-bazel/75127299#75127299 and was directed here. What about the possibility of an "optional load" function that can be used in a WORKSPACE.bazel file? Pseudo code:
if load_if_exists(":local_workspace.bzl", "install_overrides"):
    install_overrides()
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.
@meteorcloudy, what is the status of this with bzlmod? |
This is about switching the build between system dependencies and fetched dependencies; it is still an issue with Bzlmod.
Related doc: TensorFlow Bazel Build Issues
Bazel users sometimes want to replace an external dependency in WORKSPACE with a local one. For example: I want to use Eigen installed at /opt/local/eigen-3.3.4 to replace @eigen_archive. This is achievable in Bazel, but not easy, for two reasons:

1. Lack of documentation of the --override_repository option. We can consider turning this test into a proper example: bazel/src/test/shell/bazel/workspace_test.sh, line 268 at 9d5c323.
2. A BUILD file must be provided. For the Eigen case, we either put the BUILD file under /opt/local/eigen-3.3.4 or symlink the files somewhere else and then add the BUILD file there. Neither works very nicely; we should think about how to improve the process. (A sketch of such a BUILD file is shown below.)
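As a sketch of what such a BUILD file might contain for a header-only library like Eigen (the glob patterns and target name are illustrative):

```
# BUILD file placed under (or symlinked next to) /opt/local/eigen-3.3.4.
cc_library(
    name = "eigen",
    hdrs = glob([
        "Eigen/**",
        "unsupported/Eigen/**",
    ]),
    includes = ["."],
    visibility = ["//visibility:public"],
)
```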
Updates: --override_repository alone is not enough.
/cc @aehlig @dkelmer @dslomov