docs: export dependency graph as adjacency list

pantsbuild · Feb 17, 2024 · 752cfaf
1 parent d782c03
commit 752cfaf
Showing 1 changed file with 66 additions and 0 deletions.
diff --git a/docs/docs/using-pants/project-introspection.mdx b/docs/docs/using-pants/project-introspection.mdx
@@ -127,6 +127,72 @@ To include the original target itself, use `--closed`:
 helloworld/main.py:lib
 ```
 
+## Export dependency graph
+
+Both the `dependencies` and `dependents` goals have a `--format` option that lets you export data in multiple formats.
+Exporting the dependencies or dependents in JSON format produces the
+[adjacency list](https://en.wikipedia.org/wiki/Adjacency_list) of your dependency graph:
+
+```bash
+$ pants dependencies --format=json \
+  helloworld/greet/greeting.py \
+  helloworld/translator/translator_test.py
+
+{
+    "helloworld/greet/greeting.py:lib": [
+        "//:reqs#setuptools",
+        "//:reqs#types-setuptools",
+        "helloworld/greet:translations",
+        "helloworld/translator/translator.py:lib"
+    ],
+    "helloworld/translator/translator_test.py:tests": [
+        "//:reqs#pytest",
+        "helloworld/translator/translator.py:lib"
+    ]
+}
+```
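+
+Because the export is a plain mapping from each target to a list of addresses, it can be loaded and reshaped with a few
+lines of Python. The following is only a minimal sketch, assuming the JSON shape shown above; the `invert` helper is a
+hypothetical name and not part of Pants. It turns the `dependencies` adjacency list into a `dependents` one:
+
+```python
+import json
+from collections import defaultdict
+
+# Sample copied from the `pants dependencies --format=json` output above.
+exported = json.loads("""
+{
+    "helloworld/greet/greeting.py:lib": [
+        "//:reqs#setuptools",
+        "//:reqs#types-setuptools",
+        "helloworld/greet:translations",
+        "helloworld/translator/translator.py:lib"
+    ],
+    "helloworld/translator/translator_test.py:tests": [
+        "//:reqs#pytest",
+        "helloworld/translator/translator.py:lib"
+    ]
+}
+""")
+
+
+def invert(adjacency):
+    """Map each dependency to the sorted list of targets that depend on it."""
+    dependents = defaultdict(set)
+    for target, dependencies in adjacency.items():
+        for dependency in dependencies:
+            dependents[dependency].add(target)
+    return {dep: sorted(users) for dep, users in dependents.items()}
+
+
+# Hypothetical usage: print every dependency with the targets that use it.
+inverted = invert(exported)
+for dependency, users in sorted(inverted.items()):
+    print(dependency, users)
+```
+
+The sketch only sees the targets present in the export, so the result covers the whole repository only if the export does.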
+
+You can analyze, visualize, or otherwise process the exported data. Sometimes a fairly
+straightforward `jq` query is enough, but for anything more complex, it may make sense to write a small program
+to process the exported graph. For instance, you could:
+
+* find tests with the most transitive dependencies
+
+```bash
+$ pants dependencies --transitive --filter-target-type=python_test --format=json :: \
+  | jq -r 'to_entries[] | "\(.key)\t\(.value | length)"' \
+  | sort -k2 -n
+```
+
+* find build targets that no one depends on
+
+```bash
+$ pants dependents --filter-target-type=resource --format=json :: \
+  | jq -r 'to_entries[] | select(.value | length == 0)'
+```
+
+* find project source files that transitively lead to the most tests
+
+```python
+# depgraph.py
+import json
+
+with open("data.json") as fh:
+    data = json.load(fh)
+
+# For each source, count its transitive dependents that live under tests/.
+for source, dependents in data.items():
+    print(source, len([d for d in dependents if d.startswith("tests/")]))
+```
+
+```bash
+$ pants dependents --transitive --format=json cheeseshop:: > data.json
+$ python3 depgraph.py | sort -k2 -n
+```
+
+For more sophisticated graph querying, you may want to look into graph libraries such as [`networkx`](https://networkx.org/).
+In a larger repository, it may make sense to track the health of the dependency graph and use the output
+of the graph export to identify segments that would benefit from refactoring.
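+
+As one possible starting point for such tracking, the sketch below ranks targets by the size of their transitive
+dependency set, computed with a breadth-first traversal over an adjacency list of direct dependencies. It uses only the
+standard library; the target addresses and the `transitive_dependencies` helper are hypothetical and serve only as an
+illustration:
+
+```python
+from collections import deque
+
+# Hypothetical adjacency list of direct dependencies; real data would come
+# from `pants dependencies --format=json`.
+graph = {
+    "app:bin": ["lib:core", "lib:util"],
+    "lib:core": ["lib:util"],
+    "lib:util": [],
+    "tests:core": ["lib:core"],
+}
+
+
+def transitive_dependencies(adjacency, start):
+    """Return every target reachable from `start`, excluding `start` itself."""
+    seen = set()
+    queue = deque(adjacency.get(start, []))
+    while queue:
+        node = queue.popleft()
+        if node in seen:
+            continue
+        seen.add(node)
+        # Targets missing from the export are treated as having no dependencies.
+        queue.extend(adjacency.get(node, []))
+    # Do not count the target itself if it sits on a dependency cycle.
+    seen.discard(start)
+    return seen
+
+
+# Largest transitive dependency sets first; ties are broken by address.
+ranking = sorted(
+    ((target, len(transitive_dependencies(graph, target))) for target in graph),
+    key=lambda item: (-item[1], item[0]),
+)
+for target, count in ranking:
+    print(f"{target}\t{count}")
+```
+
+Targets near the top of such a ranking pull in the largest part of the graph and may be reasonable first candidates for a closer look.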
+
 ## `filedeps` - find which files a target owns
 
 `filedeps` outputs all of the files belonging to a target, based on its `sources` field.