build(deps): avoid version conflicts (#636)
Addresses #631.

* Uses constraints to keep dependency versions more consistent.
* Moves all dependencies to .in files, which are then read by setup.py.
* Adds script to check consistency of all extras.
* Adds consistency check to CI.
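setup.py itself is not part of this diff, so the helper below is only a sketch of how install_requires and extras_require could be populated from the .in files. The load_requirements() name is hypothetical. The sketch skips pip options such as -c "constraints.in", because setuptools would reject them as requirement strings.

```python
from pathlib import Path
from typing import List


def load_requirements(path: str) -> List[str]:
    """Hypothetical helper: read requirement specifiers from a pip-tools .in file.

    Blank lines, comments and pip options (e.g. `-c "constraints.in"`, `-r base.in`)
    are skipped because they are not valid install_requires entries.
    Note: this simple parser would also cut off a '#' inside a URL requirement.
    """
    requirements = []
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop full-line and inline comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        requirements.append(line)
    return requirements


# Hypothetical usage inside setup.py:
# setup(
#     install_requires=load_requirements("requirements/base.in"),
#     extras_require={"huggingface": load_requirements("requirements/huggingface.in")},
# )
```

A setup.py that reads these files at build time needs them in the source distribution; the MANIFEST.in entries added in this commit include them.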

Note that while a conflict between base.txt and any of the extras shouldn't be possible (base.txt constrains all of the extras), two extras files can still conflict with each other. There are ways to try to prevent that, such as constraining each file by every file compiled before it in the make pip-compile target, but the ones I could think of seemed overwrought and come with problems of their own. If a conflict does arise, it should be flagged by CI, or locally by make check-deps. When that happens, you can resolve the conflict by adding appropriate global constraints to requirements/constraints.in.
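The contents of scripts/consistent-deps.sh are not shown in this diff, so the following is only a Python sketch of one way to spot the situation described above: two compiled extras files that pin the same package to different versions. The function names are hypothetical and this is not the script's actual approach.

```python
import re
from itertools import combinations
from typing import Dict, List, Tuple

# Matches pinned lines such as "requests==2.31.0" or "rfc3986[idna2008]==1.5.0".
PIN_RE = re.compile(r"^([A-Za-z0-9._-]+)(?:\[[^\]]*\])?==([^\s;]+)")


def parse_pins(text: str) -> Dict[str, str]:
    """Map normalized package name -> pinned version for one pip-compile output."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line.strip())  # "# via ..." comment lines never match
        if match:
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()  # PEP 503 normalization
            pins[name] = match.group(2)
    return pins


def find_conflicts(files: Dict[str, str]) -> List[Tuple[str, str, str, str, str]]:
    """Return (package, file_a, version_a, file_b, version_b) for each mismatched pin."""
    parsed = {name: parse_pins(text) for name, text in files.items()}
    conflicts = []
    for (a, pins_a), (b, pins_b) in combinations(sorted(parsed.items()), 2):
        for pkg in sorted(pins_a.keys() & pins_b.keys()):
            if pins_a[pkg] != pins_b[pkg]:
                conflicts.append((pkg, a, pins_a[pkg], b, pins_b[pkg]))
    return conflicts
```

A non-empty result means two lock files disagree, which is the case the paragraph above suggests fixing with a global constraint.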

Also note that if fileA.in is constrained by fileB.txt, then fileB.in should be compiled before fileA.in in the make pip-compile target. Otherwise, fileA.in will be compiled against the stale fileB.txt, which can cause conflicts or keep dependencies from being updated properly.
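The ordering rule above is a topological sort. As a sketch, assuming a hypothetical compile_order() helper run by a developer (graphlib needs Python 3.9+, so it would not run on the 3.8 CI job), the order can be derived from each file's -c *.txt lines:

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Matches `-c base.txt` and `-c "constraints.in"` style directives.
CONSTRAINT_RE = re.compile(r'^-c\s+"?([^"\s]+)"?')


def compile_order(in_files: Dict[str, str]) -> List[str]:
    """Hypothetical helper: order .in files so each one is compiled after the
    .in files that produce the .txt files it is constrained by.

    `in_files` maps a name such as "dev.in" to its contents. Constraints that do not
    point at a compiled .txt from a known .in (e.g. constraints.in) are ignored.
    """
    graph: Dict[str, Set[str]] = {}
    for name, text in in_files.items():
        predecessors = set()
        for line in text.splitlines():
            match = CONSTRAINT_RE.match(line.strip())
            if match and match.group(1).endswith(".txt"):
                source = match.group(1)[: -len(".txt")] + ".in"
                if source in in_files:
                    predecessors.add(source)
        graph[name] = predecessors
    # static_order() raises graphlib.CycleError if two files constrain each other.
    return list(TopologicalSorter(graph).static_order())
```

With dev.in constrained by base.txt and test.txt, as in this diff, the result places base.in and test.in before dev.in, matching the reordering in the Makefile.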
qued authored May 24, 2023
1 parent a1fed6d commit c82bad1
Showing 39 changed files with 557 additions and 2,108 deletions.
30 changes: 30 additions & 0 deletions .github/workflows/ci.yml
@@ -38,6 +38,36 @@ jobs:
source .venv/bin/activate
make install-ci
check-deps:
strategy:
matrix:
python-version: ["3.8","3.9","3.10"]
runs-on: ubuntu-latest
needs: setup
steps:
- uses: actions/checkout@v3
- uses: actions/cache@v3
id: virtualenv-cache
with:
path: .venv
key: unstructured-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('requirements/*.txt') }}
# NOTE(robinson) - This is a fallback in case the lint job does not find the cache.
# We can take this out when we implement the fix in CORE-99
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Setup virtual environment (no cache hit)
if: steps.virtualenv-cache.outputs.cache-hit != 'true'
run: |
python${{ matrix.python-version }} -m venv .venv
source .venv/bin/activate
make install-base-pip-packages
- name: Check for dependency conflicts
run: |
source .venv/bin/activate
make check-deps
lint:
strategy:
matrix:
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -1,8 +1,9 @@
## 0.6.9-dev2
## 0.6.9

### Enhancements

* fast strategy for pdf now keeps element bounding box data
* setup.py refactor

### Features

12 changes: 12 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,12 @@
include requirements/base.in
include requirements/huggingface.in
include requirements/local-inference.in
include requirements/ingest-s3.in
include requirements/ingest-azure.in
include requirements/ingest-discord.in
include requirements/ingest-github.in
include requirements/ingest-gitlab.in
include requirements/ingest-reddit.in
include requirements/ingest-slack.in
include requirements/ingest-wikipedia.in
include requirements/ingest-google-drive.in
31 changes: 18 additions & 13 deletions Makefile
@@ -108,28 +108,28 @@ install-local-inference: install install-unstructured-inference install-detectro
## pip-compile: compiles all base/dev/test requirements
.PHONY: pip-compile
pip-compile:
pip-compile --upgrade -o requirements/base.txt
pip-compile --upgrade requirements/base.in
# Extra requirements for huggingface staging functions
pip-compile --upgrade --extra huggingface -o requirements/huggingface.txt
pip-compile --upgrade requirements/huggingface.in
# NOTE(robinson) - We want the dependencies for detectron2 in the requirements.txt, but not
# the detectron2 repo itself. If detectron2 is in the requirements.txt file, an order of
# operations issue related to the torch library causes the install to fail
pip-compile --upgrade requirements/dev.in
pip-compile --upgrade requirements/test.in
pip-compile --upgrade requirements/dev.in
pip-compile --upgrade requirements/build.in
pip-compile --upgrade --extra local-inference -o requirements/local-inference.txt
pip-compile --upgrade requirements/local-inference.in
# NOTE(robinson) - doc/requirements.txt is where the GitHub action for building
# sphinx docs looks for additional requirements
cp requirements/build.txt docs/requirements.txt
pip-compile --upgrade --extra=s3 --output-file=requirements/ingest-s3.txt requirements/base.txt setup.py
pip-compile --upgrade --extra=azure --output-file=requirements/ingest-azure.txt requirements/base.txt setup.py
pip-compile --upgrade --extra=discord --output-file=requirements/ingest-azure.txt requirements/base.txt setup.py
pip-compile --upgrade --extra=reddit --output-file=requirements/ingest-reddit.txt requirements/base.txt setup.py
pip-compile --upgrade --extra=github --output-file=requirements/ingest-github.txt requirements/base.txt setup.py
pip-compile --upgrade --extra=gitlab --output-file=requirements/ingest-gitlab.txt requirements/base.txt setup.py
pip-compile --upgrade --extra=slack --output-file=requirements/ingest-slack.txt requirements/base.txt setup.py
pip-compile --upgrade --extra=wikipedia --output-file=requirements/ingest-wikipedia.txt requirements/base.txt setup.py
pip-compile --upgrade --extra=google-drive --output-file=requirements/ingest-google-drive.txt requirements/base.txt setup.py
pip-compile --upgrade requirements/ingest-s3.in
pip-compile --upgrade requirements/ingest-azure.in
pip-compile --upgrade requirements/ingest-discord.in
pip-compile --upgrade requirements/ingest-reddit.in
pip-compile --upgrade requirements/ingest-github.in
pip-compile --upgrade requirements/ingest-gitlab.in
pip-compile --upgrade requirements/ingest-slack.in
pip-compile --upgrade requirements/ingest-wikipedia.in
pip-compile --upgrade requirements/ingest-google-drive.in

## install-project-local: install unstructured into your local python environment
.PHONY: install-project-local
@@ -198,6 +198,11 @@ version-sync:
check-coverage:
coverage report --fail-under=95

## check-deps: check consistency of dependencies
.PHONY: check-deps
check-deps:
scripts/consistent-deps.sh

##########
# Docker #
##########
6 changes: 3 additions & 3 deletions docs/requirements.txt
@@ -10,7 +10,7 @@ babel==2.12.1
# via sphinx
beautifulsoup4==4.12.2
# via furo
certifi==2022.12.7
certifi==2023.5.7
# via
# -r requirements/build.in
# requests
@@ -20,7 +20,7 @@ docutils==0.18.1
# via
# sphinx
# sphinx-rtd-theme
furo==2023.3.27
furo==2023.5.20
# via -r requirements/build.in
idna==3.4
# via requests
@@ -40,7 +40,7 @@ pygments==2.15.1
# sphinx
pytz==2023.3
# via babel
requests==2.30.0
requests==2.31.0
# via sphinx
snowballstemmer==2.2.0
# via sphinx
16 changes: 16 additions & 0 deletions requirements/base.in
@@ -0,0 +1,16 @@
-c "constraints.in"
argilla
chardet
lxml
msg_parser
nltk
openpyxl
pandas
pdfminer.six
pillow
pypandoc
python-docx
python-pptx
python-magic
markdown
requests
57 changes: 32 additions & 25 deletions requirements/base.txt
@@ -2,20 +2,20 @@
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
# pip-compile --output-file=requirements/base.txt
# pip-compile requirements/base.in
#
anyio==3.6.2
# via httpcore
argilla==1.6.0
# via unstructured (setup.py)
argilla==1.7.0
# via -r requirements/base.in
backoff==2.2.1
# via argilla
certifi==2022.12.7
certifi==2023.5.7
# via
# -c requirements/constraints.in
# httpcore
# httpx
# requests
# unstructured (setup.py)
cffi==1.15.1
# via cryptography
chardet==5.1.0
Expand All @@ -25,7 +25,9 @@ charset-normalizer==3.1.0
# pdfminer-six
# requests
click==8.1.3
# via nltk
# via
# nltk
# typer
commonmark==0.9.1
# via rich
cryptography==40.0.2
@@ -51,59 +53,59 @@ joblib==1.2.0
# via nltk
lxml==4.9.2
# via
# -r requirements/base.in
# python-docx
# python-pptx
# unstructured (setup.py)
markdown==3.4.3
# via unstructured (setup.py)
# via -r requirements/base.in
monotonic==1.6
# via argilla
msg-parser==1.2.0
# via unstructured (setup.py)
# via -r requirements/base.in
nltk==3.8.1
# via unstructured (setup.py)
# via -r requirements/base.in
numpy==1.23.5
# via
# argilla
# pandas
olefile==0.46
# via msg-parser
openpyxl==3.1.2
# via unstructured (setup.py)
# via -r requirements/base.in
packaging==23.1
# via argilla
pandas==1.5.3
# via
# -r requirements/base.in
# argilla
# unstructured (setup.py)
pdfminer-six==20221105
# via unstructured (setup.py)
# via -r requirements/base.in
pillow==9.5.0
# via
# -r requirements/base.in
# python-pptx
# unstructured (setup.py)
pycparser==2.21
# via cffi
pydantic==1.10.7
pydantic==1.10.8
# via argilla
pygments==2.15.1
# via rich
pypandoc==1.11
# via unstructured (setup.py)
# via -r requirements/base.in
python-dateutil==2.8.2
# via pandas
python-docx==0.8.11
# via unstructured (setup.py)
# via -r requirements/base.in
python-magic==0.4.27
# via unstructured (setup.py)
# via -r requirements/base.in
python-pptx==0.6.21
# via unstructured (setup.py)
# via -r requirements/base.in
pytz==2023.3
# via pandas
regex==2023.5.5
# via nltk
requests==2.30.0
# via unstructured (setup.py)
requests==2.31.0
# via -r requirements/base.in
rfc3986[idna2008]==1.5.0
# via httpx
rich==13.0.1
@@ -119,17 +121,22 @@ tqdm==4.65.0
# via
# argilla
# nltk
typing-extensions==4.5.0
typer==0.9.0
# via argilla
typing-extensions==4.6.0
# via
# pydantic
# rich
urllib3==2.0.2
# via requests
# typer
urllib3==1.26.16
# via
# -c requirements/constraints.in
# requests
wrapt==1.14.1
# via
# argilla
# deprecated
xlsxwriter==3.1.0
xlsxwriter==3.1.1
# via python-pptx
zipp==3.15.0
# via importlib-metadata
6 changes: 3 additions & 3 deletions requirements/build.txt
@@ -10,7 +10,7 @@ babel==2.12.1
# via sphinx
beautifulsoup4==4.12.2
# via furo
certifi==2022.12.7
certifi==2023.5.7
# via
# -r requirements/build.in
# requests
@@ -20,7 +20,7 @@ docutils==0.18.1
# via
# sphinx
# sphinx-rtd-theme
furo==2023.3.27
furo==2023.5.20
# via -r requirements/build.in
idna==3.4
# via requests
@@ -40,7 +40,7 @@ pygments==2.15.1
# sphinx
pytz==2023.3
# via babel
requests==2.30.0
requests==2.31.0
# via sphinx
snowballstemmer==2.2.0
# via sphinx
2 changes: 1 addition & 1 deletion requirements/cache.txt
@@ -1 +1 @@
a
# a
15 changes: 15 additions & 0 deletions requirements/constraints.in
@@ -0,0 +1,15 @@
####################################################################################################
# This file can house global constraints that aren't *direct* requirements of the package or any
# extras. Putting a dependency here will only affect dependency sets that contain them -- in other
# words, if something does not require a constraint, it will not be installed.
####################################################################################################
# NOTE(alan): Pinning to avoid conflicts with downstream ingest-s3
urllib3<1.27, >=1.25.4
# consistency with local-inference-pin
protobuf<3.21
# NOTE(robinson) - Required pins for security scans
jupyter-core>=4.11.2
wheel>=0.38.1
# NOTE(robinson) - The following pins are to address
# vulnerabilities in dependency scans
certifi>=2022.12.07
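The header comment of constraints.in describes pip's constraint semantics: a line there narrows the allowed versions of a package only if something already requires it, and never adds the package by itself. The toy model below illustrates that rule; applicable_constraints() is a hypothetical name, and this is not how pip implements constraints internally.

```python
import re
from typing import Dict, List

# Leading project name of a specifier such as "urllib3<1.27, >=1.25.4".
NAME_RE = re.compile(r"^[A-Za-z0-9._-]+")


def _norm(name: str) -> str:
    """PEP 503-style name normalization (e.g. "Jupyter_Core" -> "jupyter-core")."""
    return re.sub(r"[-_.]+", "-", name).lower()


def applicable_constraints(constraint_lines: List[str], resolved: List[str]) -> Dict[str, str]:
    """Toy model: return only the constraints whose package is already in `resolved`.

    Packages that appear only in the constraints file are dropped, mirroring the rule
    that a constraint never causes a package to be installed.
    """
    wanted = {_norm(name) for name in resolved}
    result = {}
    for raw in constraint_lines:
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        match = NAME_RE.match(line)
        if match and _norm(match.group(0)) in wanted:
            result[_norm(match.group(0))] = line[match.end():].strip()
    return result
```

For a dependency set that contains requests, urllib3 and certifi but not protobuf, only the urllib3 and certifi lines would take effect.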
6 changes: 3 additions & 3 deletions requirements/dev.in
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
-c constraints.in
-c base.txt
-c test.txt
jupyter
ipython
pip-tools
pre-commit
# NOTE(robinson) - Required pins for security scans
jupyter-core>=4.11.2
wheel>=0.38.1