Thank you for your interest in contributing to Narwhals! Any kind of improvement is welcome!
You can contribute to Narwhals in your local development environment, using python3, git and your editor of choice. You can also contribute using GitHub Codespaces, a development environment hosted in the cloud, which lets you start working from your browser without installing git or cloning the repo. Scroll down for instructions on how to use Codespaces.
Open your terminal and run the following command:
git --version
If the output looks like `git version 2.34.1` and you have a personal account on GitHub, you're good to go to the next step.
If the terminal reports `command not found`, you need to install git.
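The same check can be scripted. The following is a minimal sketch, not part of the Narwhals tooling: `installed_git_version` is a hypothetical helper name, and it assumes that "git is installed" means a `git` executable can be found on your `PATH`.

```python
import shutil
import subprocess
from typing import Optional


def installed_git_version() -> Optional[str]:
    """Return the output of `git --version`, or None if git is unavailable."""
    if shutil.which("git") is None:
        # Equivalent to the terminal reporting "command not found".
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = installed_git_version()
    print(version if version is not None else "git not found: install it first")
```

If the helper returns `None`, install git before continuing with the steps below.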
If you're new to GitHub, you'll need to create an account on GitHub.com and verify your email address.
You should also check for existing SSH keys and generate and add a new SSH key if you don't have one already.
Go to the main project page. Fork the repository by clicking on the fork button. You can find it in the top-right corner of the page.
Go to the forked repository on your GitHub account - you'll find it on your account in the tab Repositories.
Click on the green `Code` button and then click the `Copy url to clipboard` icon.
Open a terminal, choose the directory where you would like to keep the Narwhals repository, and run the following git command:
git clone <url you just copied>
for example:
git clone git@github.com:YOUR-GITHUB-USERNAME/narwhals.git narwhals-dev
You should then navigate to the folder you just created:
cd narwhals-dev
git remote add upstream git@github.com:narwhals-dev/narwhals.git
git fetch upstream
Check that the remote has been added with `git remote -v`. You should see something like this:
git remote -v
origin git@github.com:YOUR-GITHUB-USERNAME/narwhals.git (fetch)
origin git@github.com:YOUR-GITHUB-USERNAME/narwhals.git (push)
upstream git@github.com:narwhals-dev/narwhals.git (fetch)
upstream git@github.com:narwhals-dev/narwhals.git (push)
where `YOUR-GITHUB-USERNAME` will be your GitHub user name.
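If you would rather verify the remotes programmatically, the sketch below parses output of the shape shown above and reports any missing remote. It is illustrative only: `parse_remotes` is a hypothetical helper, the sample text assumes the standard GitHub SSH URL form, and `YOUR-GITHUB-USERNAME` is left as a placeholder.

```python
from typing import Dict


def parse_remotes(remote_output: str) -> Dict[str, Dict[str, str]]:
    """Map each remote name to its fetch and push URLs."""
    remotes: Dict[str, Dict[str, str]] = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        # kind looks like "(fetch)" or "(push)"; drop the parentheses.
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


# Sample text in the shape printed by `git remote -v`.
SAMPLE = """\
origin git@github.com:YOUR-GITHUB-USERNAME/narwhals.git (fetch)
origin git@github.com:YOUR-GITHUB-USERNAME/narwhals.git (push)
upstream git@github.com:narwhals-dev/narwhals.git (fetch)
upstream git@github.com:narwhals-dev/narwhals.git (push)
"""

remotes = parse_remotes(SAMPLE)
missing = {"origin", "upstream"} - set(remotes)
print("missing remotes:", sorted(missing))
```

In a real checkout you could feed the function the text captured from `git remote -v` instead of `SAMPLE`.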
Here's how you can set up your local development environment to contribute.
If you want to run PySpark-related tests, you'll need to have Java installed. Refer to the Spark documentation for more information.
- Make sure you have Python 3.12 installed, create a virtual environment, and activate it. If you're new to this, here's one way that we recommend:
- Install uv: https://github.com/astral-sh/uv?tab=readme-ov-file#getting-started
or make sure it is up-to-date with:
uv self update
- Install Python 3.12:
uv python install 3.12
- Create a virtual environment:
uv venv -p 3.12 --seed
- Activate it. On Linux, this is `. .venv/bin/activate`; on Windows, `.\.venv\Scripts\activate`.
- Install Narwhals: `uv pip install -e ".[dev, core, docs]"`. This will include fast-ish core libraries. If you also want to test other libraries like Dask, PySpark, and Modin, you can install them too with `uv pip install -e ".[dev, core, docs, dask, pyspark, modin]"`.
- Install a fork of griffe:
This is hopefully temporary until mkdocstrings/mkdocstrings#716 is addressed.
uv pip install git+https://github.com/MarcoGorelli/griffe.git@no-overloads
You should also install pre-commit:
uv pip install pre-commit
pre-commit install
This will automatically format and lint your code before each commit, and it will block the commit if any issues are found.
- Make sure you have Python 3.8+ installed. If you don't, see "install Python" to learn how. Then, create and activate a virtual environment.
- Then, follow steps 2-4 from above but using `pip install` instead of `uv pip install`.
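Whichever route you take, it can help to confirm that the environment looks right before you start coding. The snippet below is a rough sketch using only the standard library. The function names are hypothetical, and it assumes that an active virtual environment is one where `sys.prefix` differs from `sys.base_prefix`, which holds for environments created by `uv venv` or `python -m venv`.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def environment_report() -> dict:
    """Collect a few facts that the setup steps are expected to establish."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info >= (3, 8),
        "virtualenv_active": in_virtualenv(),
        # find_spec returns None when the package cannot be imported.
        "narwhals_importable": importlib.util.find_spec("narwhals") is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `virtualenv_active` or `narwhals_importable` is `False`, revisit the activation and installation steps.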
Create a new git branch from the `main` branch in your local repository.
Note that your work cannot be merged if the tests below fail.
If you add code that should be tested, please add tests.
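As a rough illustration of the two kinds of tests the commands below exercise, here is a sketch of a function with a docstring example (picked up by `--doctest-modules`) and a plain `test_*` function (collected by `pytest`). `clip_to_range` is a made-up helper for demonstration and is not part of Narwhals. The project's own tests may follow additional conventions not shown here.

```python
import doctest


def clip_to_range(value, lower, upper):
    """Clamp ``value`` to the closed interval [lower, upper].

    >>> clip_to_range(5, 0, 3)
    3
    >>> clip_to_range(-1, 0, 3)
    0
    """
    return max(lower, min(value, upper))


def test_clip_to_range_inside_interval():
    # pytest collects functions whose names start with "test_".
    assert clip_to_range(2, 0, 3) == 2


if __name__ == "__main__":
    # Runs the docstring examples directly, without pytest.
    print(doctest.testmod())
```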
- To run tests, run `pytest`. To check coverage: `pytest --cov=narwhals`
- To run the doctests, use `pytest narwhals --doctest-modules`
- To run unit tests and doctests at the same time, run `pytest tests narwhals --cov=narwhals --doctest-modules`
- To run tests multiprocessed, you may also want to use pytest-xdist (optional)
- To choose which backends to run tests with, you can use the `--constructors` flag:
    - To only run tests for pandas, Polars, and PyArrow, use `pytest --constructors=pandas,pyarrow,polars`
    - To run tests for all CPU constructors, use `pytest --all-cpu-constructors`
    - By default, tests run for pandas, pandas (PyArrow dtypes), PyArrow, and Polars.
    - To run tests using `cudf.pandas`, run `NARWHALS_DEFAULT_CONSTRUCTORS=pandas python -m cudf.pandas -m pytest`
    - To run tests using `polars[gpu]`, run `NARWHALS_POLARS_GPU=1 pytest --constructors=polars[lazy]`
If you want to have fewer surprises when opening a PR, you can take advantage of nox to run the entire CI/CD test suite locally on your operating system.
To do so, you will first need to install nox and then run the `nox` command in the root of the repository:
python -m pip install nox # python -m pip install "nox[uv]"
nox
Note that nox also requires all the Python versions defined in `noxfile.py` to be installed on your system.
We use Hypothesis to generate some random tests, to check for robustness.
To keep local test suite times down, not all of these run by default - you can run them by passing the `--runslow` flag to pytest.
To keep local development test times down, Dask and Modin are excluded from dev dependencies, and their tests only run in CI. If you install them with
uv pip install -U dask[dataframe] modin
then their tests will run too.
We can't currently test in CI against cuDF, but you can test it manually in Kaggle using GPUs. Please follow this Kaggle notebook to run the tests.
To build the docs, run `mkdocs serve`, and then open the link provided in a browser.
The docs should refresh when you make changes. If they don't, press `ctrl+C`, and then run `mkdocs build` followed by `mkdocs serve`.
When you have resolved your issue, open a pull request in the Narwhals repository.
Please adhere to the following guidelines:
- Start your pull request title with a conventional commit tag. This helps us add your contribution to the right section of the changelog. We use "Type" from the Angular convention.
    TLDR: The PR title should start with any of these abbreviations: `build`, `chore`, `ci`, `depr`, `docs`, `feat`, `fix`, `perf`, `refactor`, `release`, `test`. Add a `!` at the end if it is a breaking change. For example, `refactor!`.
- This text will end up in the changelog.
- Please follow the instructions in the pull request form and submit.
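To make the title convention concrete, here is a small sketch that checks a title against the tags listed above. It is not the check the project runs. `check_pr_title` and `TitleCheck` are hypothetical names, and the `<tag>: <description>` layout with a colon and a space is an assumption borrowed from the usual conventional-commit shape rather than something these guidelines spell out.

```python
import re
from typing import NamedTuple

# Tags listed in the pull request guidelines.
ALLOWED_TYPES = (
    "build", "chore", "ci", "depr", "docs", "feat",
    "fix", "perf", "refactor", "release", "test",
)

# Assumed layout: "<tag>[!]: <description>".
TITLE_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")(?P<breaking>!)?: (?P<description>\S.*)$"
)


class TitleCheck(NamedTuple):
    valid: bool
    breaking: bool


def check_pr_title(title: str) -> TitleCheck:
    """Report whether a title starts with an allowed tag and whether it is breaking."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        return TitleCheck(valid=False, breaking=False)
    return TitleCheck(valid=True, breaking=match.group("breaking") is not None)


if __name__ == "__main__":
    for title in ("feat: add a new method", "refactor!: rename a module", "Add stuff"):
        print(title, "->", check_pr_title(title))
```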
Codespaces is a great way to work on Narwhals without needing to configure your local development environment. Every GitHub.com user has a monthly quota of free use of GitHub Codespaces, and you can start working in a codespace without providing any payment details. You'll be notified by email when you are close to using 100% of the included services. To learn more about it, visit GitHub Docs.
If you're new to GitHub, you'll need to create an account on GitHub.com and verify your email address.
Go to the main project page. Fork the repository by clicking on the fork button. You can find it in the top-right corner of the page.
Go to the forked repository on your GitHub account - you'll find it on your account in the tab Repositories.
Click on the green `Code` button and navigate to the `Codespaces` tab.
Click on the green button `Create codespace on main` - it will open a browser version of VSCode, with the complete repository and git installed.
You can now proceed with steps "4. Setting up your environment" through "8. Pull request", listed above in "Working with local development environment".
If Narwhals looks like underwater unicorn magic to you, then please read how it works.
In Narwhals, we are very particular about imports. When it comes to importing heavy third-party libraries (pandas, NumPy, Polars, etc.) please follow these rules:
- Never import anything to do `isinstance` checks. Instead, just use the functions in `narwhals.dependencies` (such as `is_pandas_dataframe`);
- If you need to import anything, do it in a place where you know that the import is definitely available. For example, NumPy is a required dependency of PyArrow, so it's OK to import NumPy to implement a PyArrow function - however, NumPy should never be imported to implement a Polars function. The only exception is for when there's simply no way around it by definition - for example, `Series.to_numpy` always requires NumPy to be installed.
- Don't place a third-party import at the top of a file. Instead, place it in the function where it's used, so that we minimise the chances of it being imported unnecessarily.
We're trying to be really lightweight and minimal-overhead, and unnecessary imports can slow things down.
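The rules above can be sketched with a standard-library module standing in for a heavy third-party dependency. This is only an illustration of the general pattern, not a copy of what `narwhals.dependencies` does. Here `fractions` plays the role of the heavy library, and `is_fraction` and `to_fraction` are made-up names.

```python
import sys
from typing import Any


def is_fraction(obj: Any) -> bool:
    """Type check that never triggers an import of `fractions`.

    If nobody has imported the module yet, no Fraction instance can exist,
    so the answer is False without paying any import cost.
    """
    module = sys.modules.get("fractions")
    return module is not None and isinstance(obj, module.Fraction)


def to_fraction(text: str) -> Any:
    """Convert text such as "1/2" to a Fraction."""
    # Local import: only callers that actually need the library pay for it.
    from fractions import Fraction

    return Fraction(text)


if __name__ == "__main__":
    print(is_fraction(0.5))
    print(is_fraction(to_fraction("1/2")))
```

The design choice is that a type check should never be the reason a library gets imported, while a function that cannot work without the library imports it at the point of use.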
Please remember to abide by the code of conduct, else you'll be conducted away from this project.
We have a community call every 2 weeks; everyone is welcome to attend.