Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python_source_root not added to syspath #11666

Closed
jyggen opened this issue Mar 10, 2021 · 4 comments · Fixed by #11673
Closed

python_source_root not added to syspath #11666

jyggen opened this issue Mar 10, 2021 · 4 comments · Fixed by #11673
Labels

Comments

@jyggen
Copy link
Member

jyggen commented Mar 10, 2021

I have a project structure along the lines of this:

packages/
    package_a/
        BUILD
        __init__.py
        package_a.py
    package_b/
        BUILD
        __init__.py
        package_b.py
protos/
    package_a/
        BUILD
        foobar.proto
    service_a/
        BUILD
        bazqux.proto
services/
    service_a/
        BUILD
        __init__.py
        main.py
    service_b/
        BUILD
        __init__.py
        main.py

/packages, /protos and /services are all configured as source roots. In each BUILD files residing in protos/* I have either the line protobuf_library(python_source_root='services') or protobuf_library(python_source_root='packages'), which means that the generated code ends up in packages/* or services/* instead.

This is working just fine as long as the running code is in the same source root as the generated protobuf code, but when code in services/ is dependent on protos that has python_source_root set to packages, Python can't find the module unless an actual module from the same source root is also a dependency. I did some digging around, and it seems like the issue is that the source root specified in python_source_root is never explicitly added to Python's syspath, which is why imports fail if no "real" packages from the same source roots are used. So using the same example as earlier I see services and protos, but packages, where the generated code is placed, is missing.

I created a proof-of-concept repository in case my rambling makes little sense. The issue can be seen by running ./pants test services/::.

@Eric-Arellano
Copy link
Contributor

Interesting, this does make sense! This is the relevant code:

extra_env = {
"PYTEST_ADDOPTS": " ".join(add_opts),
"PEX_EXTRA_SYS_PATH": ":".join(prepared_sources.source_roots),
}

Which maps back to:

source_root_objs = await MultiGet(
Get(SourceRoot, SourceRootRequest, SourceRootRequest.for_target(tgt))
for tgt in request.targets
if (
tgt.has_field(PythonSources)
or tgt.has_field(ResourcesSources)
or tgt.get(Sources).can_generate(PythonSources, union_membership)
or tgt.get(Sources).can_generate(ResourcesSources, union_membership)
)
)

Based on this code, we would no matter what be using /protos in the PEX_EXTRA_SYS_PATH (PYTHONPATH) because that is what Get(SourceRoot, SourceRootRequest, SourceRootRequest.for_target(tgt)) evaluates to for the original protobuf_library.

That is, our code is not safe for codegen, which is allowed to place the generated files in any arbitrary location, unlike a normal sources field having to live in the same directory or a subdirectory of the target definition. Similarly tricky, it is legal for codegen to split out its sources into different locations, like generating both dir1/f.ext and dir2/f.ext at the same time.

--

The fix is that rather than using SourceRootRequest.for_target(tgt), which is based on the BUILD file of the original target, we need to use SourceRootRequest.for_file() with codegen:

@classmethod
def for_file(cls, file_path: str) -> SourceRootRequest:
"""Create a request for the source root for the given file."""
# The file itself cannot be a source root, so we may as well start the search
# from its enclosing directory, and save on some superfluous checking.
return cls(PurePath(file_path).parent)
@classmethod
def for_address(cls, address: Address) -> SourceRootRequest:
# Note that we don't use for_file() here because the spec_path is a directory.
return cls(PurePath(address.spec_path))
@classmethod
def for_target(cls, target: Target) -> SourceRootRequest:
return cls.for_address(target.address)

However, we do still need to only be getting source roots for ResourcesSources and PythonSources, or targets that can generate those. We can't simply iterate over every file in sources because it might include things like files() targets, where it would be a bug to set the PYTHONPATH to include those. So, we need to make that list comprehension in python_sources.py fancier..

Right now, this is a convenience that merges all the sources into a single digest. (Refer to https://www.pantsbuild.org/docs/rules-api-and-target-api#the-sources-field and https://www.pantsbuild.org/docs/rules-api-file-system#core-abstractions-digest-and-snapshot):

sources = await Get(
SourceFiles,
SourceFilesRequest(
(tgt.get(Sources) for tgt in request.targets),
for_sources_types=request.valid_sources_types,
enable_codegen=True,
),
)

We don't want that convenience, we need to associate/zip each target with its specific sources. So, we'd replace that above with all_hydrated_sources = await MultiGet(Get(HydratedSources, HydrateSources(..)) for tgt in request.targets), which will give a sequence of HydratedSoures objects. Use the same kwargs that we're using right now with SourceFilesRequest.

Then, replace the below with more complex logic: zip() the request.targets and the all_hydrated_sources. If tgt.has_field(PythonSources) or tgt.has_field(ResourcesSources), use SourceRootRequest.for_target(tgt) like before. Otherwise, if tgt.get(Sources).can_generate(PythonSources, ..) or can generate ResourcesSources, iterate over all of those targets source files in HydratedSources.files and use SourceRootRequest.for_file(f). This probably wouldn't work in a comprehension anymore, but you can build up a list of Get objects using a for loop and then say await MultiGet(source_root_gets).

Finally, updating the test. Add on to this test with a protobuf_library() that sets python_source_root:

def test_python_protobuf(rule_runner: RuleRunner) -> None:
rule_runner.create_file(
"src/protobuf/dir/f.proto",
dedent(
"""\
syntax = "proto2";
package dir;
"""
),
)
rule_runner.add_to_build_file("src/protobuf/dir", "protobuf_library()")
targets = [ProtobufLibrary({}, address=Address("src/protobuf/dir"))]
backend_args = ["--backend-packages=pants.backend.codegen.protobuf.python"]
stripped_result = get_stripped_sources(
rule_runner, targets, source_roots=["src/protobuf"], extra_args=backend_args
)
assert stripped_result.stripped_source_files.snapshot.files == ("dir/f_pb2.py",)
unstripped_result = get_unstripped_sources(
rule_runner, targets, source_roots=["src/protobuf"], extra_args=backend_args
)
assert unstripped_result.source_files.snapshot.files == ("src/protobuf/dir/f_pb2.py",)
assert unstripped_result.source_roots == ("src/protobuf",)

Would you be interested in fixing this @jyggen?

@jyggen
Copy link
Member Author

jyggen commented Mar 10, 2021

Sure, I can give it a try!

@jyggen
Copy link
Member Author

jyggen commented Mar 11, 2021

I gave it a shot in this commit, but something isn't quite right because Pants fills the screen with "No source of dependency..." and refuses to start. Any pointers in the right direction would be appreciated!

@Eric-Arellano
Copy link
Contributor

Huh, I reproduce that extremely confusing rule graph error and I'm not sure why it's happening..sorry for the confusion! Rule graph error messages are a high priority for us to fix.

In the process, though, I realized my suggested approach wasn't fully worked through - I didn't realize that PythonSourceFiles must still have SourceFiles as a field. Instead, I realized a better approach is to keep the original SourceFiles code, but re-hydrate the codegen sources. (The original hydration will have been memoized, so there's little perf impact to this). This diff is working for me:

diff --git a/src/python/pants/backend/python/util_rules/python_sources.py b/src/python/pants/backend/python/util_rules/python_sources.py
index 1d668d7d6..cd891527a 100644
--- a/src/python/pants/backend/python/util_rules/python_sources.py
+++ b/src/python/pants/backend/python/util_rules/python_sources.py
@@ -13,7 +13,7 @@ from pants.core.util_rules.source_files import SourceFiles, SourceFilesRequest
 from pants.core.util_rules.stripped_source_files import StrippedSourceFiles
 from pants.engine.fs import MergeDigests, Snapshot
 from pants.engine.rules import Get, MultiGet, collect_rules, rule
-from pants.engine.target import Sources, Target
+from pants.engine.target import HydratedSources, HydrateSourcesRequest, Sources, Target
 from pants.engine.unions import UnionMembership
 from pants.source.source_root import SourceRoot, SourceRootRequest
 from pants.util.logging import LogLevel
@@ -98,15 +98,39 @@ async def prepare_python_sources(
         MergeDigests((sources.snapshot.digest, missing_init_files.snapshot.digest)),
     )
 
-    source_root_objs = await MultiGet(
-        Get(SourceRoot, SourceRootRequest, SourceRootRequest.for_target(tgt))
-        for tgt in request.targets
-        if (
-            tgt.has_field(PythonSources)
-            or tgt.has_field(ResourcesSources)
-            or tgt.get(Sources).can_generate(PythonSources, union_membership)
-            or tgt.get(Sources).can_generate(ResourcesSources, union_membership)
+    # Codegen is able to generate code in any arbitrary location, unlike sources normally being
+    # rooted under the target definition. To determine source roots for these generated files, we
+    # cannot use the normal `SourceRootRequest.for_target()` and we instead must determine the
+    # source file for every individual generated file. So, we re-resolve the codegen sources here.
+    python_and_resources_targets = []
+    codegen_targets = []
+    for tgt in request.targets:
+        if tgt.has_field(PythonSources) or tgt.has_field(ResourcesSources):
+            python_and_resources_targets.append(tgt)
+        elif tgt.get(Sources).can_generate(PythonSources, union_membership) or tgt.get(
+            Sources
+        ).can_generate(ResourcesSources, union_membership):
+            codegen_targets.append(tgt)
+    codegen_sources = await MultiGet(
+        Get(
+            HydratedSources,
+            HydrateSourcesRequest(
+                tgt.get(Sources), for_sources_types=request.valid_sources_types, enable_codegen=True
+            ),
+        )
+        for tgt in codegen_targets
+    )
+    source_root_requests = [
+        *(SourceRootRequest.for_target(tgt) for tgt in python_and_resources_targets),
+        *(
+            SourceRootRequest.for_file(f)
+            for sources in codegen_sources
+            for f in sources.snapshot.files
         )
+    ]
+
+    source_root_objs = await MultiGet(
+        Get(SourceRoot, SourceRootRequest, req) for req in source_root_requests
     )
     source_root_paths = {source_root_obj.path for source_root_obj in source_root_objs}
     return PythonSourceFiles(

The only thing missing now is expanding the tests in python_sources_test to ensure the edge case you identified works. Would you be interested in applying the above diff and updating that test? I'm sorry about the confusing rule graph error and the wrong instructions!

Eric-Arellano pushed a commit to Eric-Arellano/pants that referenced this issue Mar 12, 2021
…ot` (pantsbuild#11673)

Fixes pantsbuild#11666.
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Eric-Arellano pushed a commit to Eric-Arellano/pants that referenced this issue Mar 12, 2021
…ot` (pantsbuild#11673)

Fixes pantsbuild#11666.
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants