An open-source tree-sitter package manager
Currently, TSPM is a medium-sized collection of grammars hosted in:
- an interactive online playground
- a package-registry
It's unstable for now and entirely focused on preparing packages for use in Helix. See the scope docs for more information.
TSPM publishes artifacts to a CDN-like registry. Each item in the registry takes the shape
https://pkgs.tspm.io/<language>/<owner>/<revision>-<tree-sitter-version>-<abi-version>-<checksum>-<format>
- language: the language the grammar parses, for example: typescript
- owner: the owner of the grammar as declared in grammars.toml
- revision: the git revision of the grammar, for example: b3d4a7f14537ecb1eedc75d5e273dd3ce2887df5
- tree-sitter-version: the version of tree-sitter-cli used to generate the artifact, for example: 0.20.6
- abi-version: the tree-sitter ABI version number used to generate the artifact, for example: 13
- checksum: a base 16 representation of the sha256 sum of the generated artifact, for example: 956eb868f38544dedcd5ef45f0ecc7cab542c68b89fae631e149008ad5cc72e8
- format: the format in which the grammar was generated, for example: src.tar.gz
  - currently only src.tar.gz is published, which is a gzip-compressed GNU tar archive of the src/ directory generated with tree-sitter generate
A listing of available artifacts can be found on https://pkgs.tspm.io (this UI will improve).
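As a rough illustration of the URL shape described above, the following shell sketch assembles an artifact URL from its seven components. The function name tspm_artifact_url is hypothetical and not part of TSPM; the example values are the ones this document uses for the Elixir grammar.

```shell
# Hypothetical helper (not shipped by TSPM): build a registry URL from its parts.
# Arguments, in order: language owner revision tree-sitter-version
#                      abi-version checksum format
tspm_artifact_url() {
  printf 'https://pkgs.tspm.io/%s/%s/%s-%s-%s-%s-%s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Example: the Elixir grammar artifact referenced in this document.
tspm_artifact_url elixir elixir-lang \
  a11a686303355a518b0a45dea7c77c5eebb5ec22 0.20.6 13 \
  956eb868f38544dedcd5ef45f0ecc7cab542c68b89fae631e149008ad5cc72e8 src.tar.gz
```

Keeping the components as separate arguments makes it easy to see which part of the name changes when a grammar is regenerated with a newer tree-sitter version or ABI.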
The package registry is S3 compatible: any HTTP or S3 client is capable of
downloading artifacts. For example, let's download a grammar with curl,
verify its integrity with sha256sum, and open it up with tar.
$ curl -o elixir.tar.gz -sSL https://pkgs.tspm.io/elixir/elixir-lang/a11a686303355a518b0a45dea7c77c5eebb5ec22-0.20.6-13-956eb868f38544dedcd5ef45f0ecc7cab542c68b89fae631e149008ad5cc72e8-src.tar.gz
$ sha256sum elixir.tar.gz
956eb868f38544dedcd5ef45f0ecc7cab542c68b89fae631e149008ad5cc72e8 elixir.tar.gz
$ mkdir src
$ tar xzf elixir.tar.gz -C src
$ tree src/
src
├── grammar.json
├── LICENSE
├── node-types.json
├── NOTICE
├── parser.c
├── scanner.cc
└── tree_sitter
    └── parser.h
1 directory, 7 files
Now in our src directory we have the files generated by tree-sitter generate
and any licensing files. We can build the grammar with a C/C++ compiler like so:
$ CFLAGS="-I src/ -g -O2 -fPIC -fno-exceptions"
$ c++ -c src/scanner.cc -o scanner.o $CFLAGS
$ cc -c src/parser.c -o parser.o $CFLAGS
$ cc -shared -o elixir.so *.o
Now the elixir.so shared object is ready for use!
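Because the checksum is embedded in the artifact name, the manual comparison against the sha256sum output in the download step can be scripted. This is a minimal sketch, assuming GNU sha256sum is available and that no field of the artifact name contains a hyphen; tspm_verify is a hypothetical helper, not something TSPM ships.

```shell
# Hypothetical helper (not shipped by TSPM): check a downloaded file against
# the sha256 checksum embedded in its registry URL.
# Usage: tspm_verify <artifact-url> <downloaded-file>
tspm_verify() {
  url=$1
  file=$2
  name=${url##*/}   # keep only the part after the final slash

  # Split <revision>-<tree-sitter-version>-<abi-version>-<checksum>-<format>
  # on "-". Assumption: none of the individual fields contains a hyphen.
  IFS=- read -r _rev _ts _abi expected _format <<EOF
$name
EOF

  # sha256sum prints "<hex digest>  <file name>"; keep only the digest.
  actual=$(sha256sum "$file" | cut -d' ' -f1)

  # Succeed (exit status 0) only when the digests match.
  [ "$actual" = "$expected" ]
}

# Example (requires network access, so it is left commented out):
#   url=https://pkgs.tspm.io/elixir/elixir-lang/a11a686303355a518b0a45dea7c77c5eebb5ec22-0.20.6-13-956eb868f38544dedcd5ef45f0ecc7cab542c68b89fae631e149008ad5cc72e8-src.tar.gz
#   curl -o elixir.tar.gz -sSL "$url" && tspm_verify "$url" elixir.tar.gz
```

The function only reports success or failure through its exit status, so it can gate the extraction and build steps with `&&`.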
Each new grammar needs an entry in grammars.nix and its versions locked
in grammar-lock.json. Say we're packaging elixir-lang/tree-sitter-iex.
First, we'll add a section to grammars.nix with the license:
{
# ..
iex.elixir-lang = tspm.grammar { meta.license = lib.licenses.asl20; };
}
Then we'll add the package to the lockfile:
$ nix run .#lock -- elixir-lang iex
Use nix flake check to verify that the grammars pass tests.
Currently, tree-sitter grammars are distributed using git repositories, which places the burden of producing a well-formed package on the grammar authors. This is a bit problematic because:
- grammar repositories typically contain items like documentation, queries, screenshots, tests, etc. that are unnecessary in packages but are good for the grammars themselves
- grammar authors do not usually have any reason to update tree-sitter versions, which means generated parser files may fall behind when breaking ABI changes happen in tree-sitter
TSPM focuses on the packaging aspect, reducing the operational burden of maintaining a grammar. If TSPM becomes widely adopted by tree-sitter consumers, there may no longer be a need to commit generated files in grammar repositories at all.
TSPM's current focus is to optimize grammar packaging for Helix.
A minimal goal for TSPM is to act as a package registry for grammars' src/
directories. Hosting compiled parser artifacts (.so and .dll files) is
probably also within scope, but brings its own challenges (particularly around
sourcing compute for less popular architectures). Packaging queries
alongside their grammars is also desired, but there are no concrete
implementation plans at the moment. Depending on how tree-sitter consumers
intend to use TSPM, a CLI client for the registry (probably called tspm)
which downloads, compiles, and cleans grammars may be in scope.
Some goals are out of scope for now:
- semantic versioning of grammars
- grammars tend to make breaking changes very often, so this is actually probably not a good idea
- security guarantees
- this would certainly be nice to have, but ultimately it is difficult to ensure any grammar does not execute arbitrary code - grammars could hide such things in external scanner implementations, and manual review is currently the only tool to protect against such abuses
- the "Native Library, WASM parsers" part of tree-sitter#930 could address this
- package download counts
- I'd be open to this if TSPM becomes well adopted and it's not too expensive to track
Nix is a tool for declarative package management. It is known for its
use in large-scale package registries like nixpkgs, but is
general enough to be used to write new package registries.
Technically, all packaging currently done in TSPM could be accomplished
through Makefiles or shell scripts. There is some variance in how
tree-sitter grammars are structured in the wild, though. Some need
dependencies from local directories, submodules, or NPM. One in particular
had its grammar.js written in TypeScript instead (i.e. grammar.ts) until
recently. These variances are in scope for TSPM. Using Nix allows us to
more easily write custom builders and plug in custom options with reasonable
defaults.
Nix also sets up network and file-system sandboxing during builds, which is
necessary when packaging tree-sitter grammars because a grammar.js may
contain arbitrary code.
Currently, TSPM uses the following infrastructure:
- GitHub Actions - for automations like building grammars and wasm bindings
- GitHub Pages - for hosting the playground
- DigitalOcean Spaces - hosts artifacts and provides edge caching
The current monthly cost of TSPM is estimated at 5 USD (the base cost of a DigitalOcean Space).
The current pricing model for Spaces:
| Storage | Outbound transfer | Additional GB Stored | Additional GB Transferred | USD/month |
| --- | --- | --- | --- | --- |
| 250GB | 1TB | $0.02/GB | $0.01/GB | $5.00 |
Current space usage: 21MB.
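The pricing table can be turned into a back-of-the-envelope estimate. The sketch below assumes that 1TB means 1000GB and that overage is billed linearly per GB; both are assumptions made for illustration, not confirmed billing rules, and spaces_monthly_cost is a hypothetical name.

```shell
# Hypothetical estimator derived from the pricing table in this document.
# Assumptions: 1TB = 1000GB, and overage is billed linearly per GB.
# Usage: spaces_monthly_cost <stored_gb> <transferred_gb>
spaces_monthly_cost() {
  awk -v stored="$1" -v transferred="$2" 'BEGIN {
    # Only usage beyond the included 250GB of storage is charged extra.
    extra_stored = (stored > 250) ? stored - 250 : 0
    # Only usage beyond the included 1TB of outbound transfer is charged extra.
    extra_transfer = (transferred > 1000) ? transferred - 1000 : 0
    printf "%.2f\n", 5.00 + extra_stored * 0.02 + extra_transfer * 0.01
  }'
}

spaces_monthly_cost 0.021 0    # current usage of about 21MB: prints 5.00
spaces_monthly_cost 300 1500   # hypothetical heavier usage: prints 11.00
```

At the current 21MB of storage, the estimate stays at the 5 USD base cost, far from either included limit.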
Looking for more docs?
- Use CI runners with more resources to accommodate more grammars
- Allow locking NPM dependencies via the grammar-lock.json file
- Add an app that writes a JSON index of all packaged grammars
  - Write that index to the gh-pages branch and use jq to accumulate packages as they are generated
- Write a landing page for TSPM which allows one to search through the index and copy pkgs.tspm.io links
TSPM is licensed under the MPL-2.0. See the LICENSE for details.