From 752cfaf5446875b3de33cbf9d985027740f7e517 Mon Sep 17 00:00:00 2001
From: Alexey Tereshenkov <50622389+AlexTereshenkov@users.noreply.github.com>
Date: Mon, 12 Feb 2024 00:31:15 +0000
Subject: [PATCH] docs: export dependency graph as adjacency list

---
 .../using-pants/project-introspection.mdx     | 66 +++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/docs/docs/using-pants/project-introspection.mdx b/docs/docs/using-pants/project-introspection.mdx
index a2222714278b..ad9db28f4e40 100644
--- a/docs/docs/using-pants/project-introspection.mdx
+++ b/docs/docs/using-pants/project-introspection.mdx
@@ -127,6 +127,72 @@ To include the original target itself, use `--closed`:
 helloworld/main.py:lib
 ```
 
+## Export dependency graph
+
+Both the `dependencies` and `dependents` goals have a `--format` option that lets you export data in multiple formats.
+Exporting information about the dependencies and dependents in JSON format will produce the
+[adjacency list](https://en.wikipedia.org/wiki/Adjacency_list) of your dependency graph:
+
+```bash
+$ pants dependencies --format=json \
+  helloworld/greet/greeting.py \
+  helloworld/translator/translator_test.py
+
+{
+    "helloworld/greet/greeting.py:lib": [
+        "//:reqs#setuptools",
+        "//:reqs#types-setuptools",
+        "helloworld/greet:translations",
+        "helloworld/translator/translator.py:lib"
+    ],
+    "helloworld/translator/translator_test.py:tests": [
+        "//:reqs#pytest",
+        "helloworld/translator/translator.py:lib"
+    ]
+}
+```
+
+This has various applications: you can analyze, visualize, and process the data further. Sometimes, a fairly
+straightforward `jq` query will suffice, but for anything more complex, it may make sense to write a small program
+to process the exported graph. For instance, you could:
+
+* find tests with the most transitive dependencies
+
+```bash
+$ pants dependencies --transitive --filter-target-type=python_test --format=json :: \
+  | jq -r 'to_entries[] | "\(.key)\t\(.value | length)"' \
+  | sort -k2 -n
+```
+
+* find build targets that nothing depends on
+
+```bash
+$ pants dependents --filter-target-type=resource --format=json :: \
+  | jq -r 'to_entries[] | select(.value | length == 0)'
+```
+
+* find project source files that transitively lead to the most tests
+
+```python
+# depgraph.py
+import json
+
+with open("data.json") as fh:
+    data = json.load(fh)
+
+for source, dependents in data.items():
+    print(source, len([d for d in dependents if d.startswith("tests/")]))
+```
+
+```bash
+$ pants dependents --transitive --format=json cheeseshop:: > data.json
+$ python3 depgraph.py | sort -k2 -n
+```
+
+For more sophisticated graph querying, you may want to look into graph libraries such as [`networkx`](https://networkx.org/).
+In a larger repository, it may make sense to track the health of the dependency graph and use the output
+of the graph export to identify segments that would benefit from refactoring.
+
 ## `filedeps` - find which files a target owns
 
 `filedeps` outputs all of the files belonging to a target, based on its `sources` field.