
Add devcontainer environment suitable for development #1071

Merged
merged 13 commits into from
Feb 7, 2023
12 changes: 12 additions & 0 deletions .devcontainer/Dockerfile
@@ -0,0 +1,12 @@
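# Note: You can use any Debian/Ubuntu-based image you want.
FROM mcr.microsoft.com/devcontainers/python:3.7-bullseye

# datajoint is installed only to pull in its dependencies, then uninstalled so the
# package itself comes from the editable install (`pip install -e .`) in devcontainer.json.
RUN \
    apt-get update && \
    apt-get install bash-completion graphviz default-mysql-client -y && \
    pip install flake8 black faker ipykernel nose nose-cov datajoint && \
    pip uninstall datajoint -y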

ENV DJ_HOST fakeservices.datajoint.io
ENV DJ_USER root
ENV DJ_PASS simple
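
The three `ENV` lines give every process in the container default connection settings. DataJoint is expected to read `DJ_HOST`, `DJ_USER`, and `DJ_PASS` itself when it builds `dj.config`; the sketch below only makes that assumed mapping explicit. `connection_settings` and `ENV_TO_CONFIG` are hypothetical names, not part of DataJoint's API.

```python
import os
from typing import Dict, Mapping

# Assumed mapping from the DJ_* variables set in the Dockerfile to dj.config keys;
# check it against the datajoint version you have installed.
ENV_TO_CONFIG = {
    "DJ_HOST": "database.host",
    "DJ_USER": "database.user",
    "DJ_PASS": "database.password",
}


def connection_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return config-style connection settings for whichever DJ_* variables are set."""
    return {key: environ[var] for var, key in ENV_TO_CONFIG.items() if var in environ}
```

Inside the container, `connection_settings()` would return the host, user, and password from the three `ENV` lines above; unrelated variables are ignored.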
27 changes: 27 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,27 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
{
  "name": "Development",
  "dockerComposeFile": "docker-compose.yaml",
  "service": "app",
  "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
  // Use this environment variable if you need to bind mount your local source code into a new container.
  "remoteEnv": {
    "LOCAL_WORKSPACE_FOLDER": "${localWorkspaceFolder}"
  },
  // https://containers.dev/features
  "features": {
    "ghcr.io/devcontainers/features/docker-in-docker:2": {},
    "ghcr.io/devcontainers/features/git:1": {},
    "ghcr.io/eitsupi/devcontainer-features/jq-likes:1": {},
    "ghcr.io/guiyomh/features/vim:0": {}
  },
  "onCreateCommand": "pip install -e .",
  "postStartCommand": "MYSQL_VER=5.7 MINIO_VER=RELEASE.2022-08-11T04-37-28Z docker compose -f local-docker-compose.yml up --build -d",
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-python.python"
      ]
    }
  }
}
10 changes: 10 additions & 0 deletions .devcontainer/docker-compose.yaml
@@ -0,0 +1,10 @@
version: "3"
services:
  app:
    build: .
    extra_hosts:
      - fakeservices.datajoint.io:127.0.0.1
    volumes:
      - ../..:/workspaces:cached
    entrypoint: /usr/local/share/docker-init.sh
    command: tail -f /dev/null
17 changes: 9 additions & 8 deletions .github/workflows/development.yaml
@@ -2,23 +2,23 @@ name: Development
on:
push:
branches:
- '**' # every branch
- '!gh-pages' # exclude gh-pages branch
- '!stage*' # exclude branches beginning with stage
- "**" # every branch
- "!gh-pages" # exclude gh-pages branch
- "!stage*" # exclude branches beginning with stage
tags:
- '\d+\.\d+\.\d+' # only semver tags
pull_request:
branches:
- '**' # every branch
- '!gh-pages' # exclude gh-pages branch
- '!stage*' # exclude branches beginning with stage
- "**" # every branch
- "!gh-pages" # exclude gh-pages branch
- "!stage*" # exclude branches beginning with stage
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
include:
- py_ver: '3.9'
- py_ver: "3.9"
distro: debian
image: djbase
env:
@@ -77,6 +77,7 @@ jobs:
- name: Run primary tests
env:
PY_VER: ${{matrix.py_ver}}
DJ_PASS: simple
MYSQL_VER: ${{matrix.mysql_ver}}
DISTRO: alpine
MINIO_VER: RELEASE.2021-09-03T03-56-13Z
@@ -119,7 +120,7 @@ jobs:
strategy:
matrix:
include:
- py_ver: '3.9'
- py_ver: "3.9"
distro: debian
image: djbase
env:
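
The `branches` filters in this workflow combine an include-all pattern with two `!` exclusions. The sketch below approximates how GitHub Actions evaluates such a list, assuming its documented rules: patterns are checked in order and the last match wins, `*` does not cross `/`, and `**` matches anything. It handles only `*`, `**`, and a leading `!` (not the tag filter or other special characters), and `branch_triggers` is a hypothetical helper, not GitHub tooling.

```python
import re
from typing import Sequence


def _glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a branch glob into an anchored regex; only '*' and '**' are special."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # '**' matches any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # '*' matches any characters except '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def branch_triggers(branch: str, patterns: Sequence[str]) -> bool:
    """Check patterns in order; the last one that matches decides (a '!' match excludes)."""
    triggered = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if _glob_to_regex(glob).match(branch):
            triggered = not negated
    return triggered


# The push/pull_request branch filters from development.yaml.
BRANCH_FILTERS = ["**", "!gh-pages", "!stage*"]
```

Under these rules a branch such as `stage/foo` would still trigger the workflow, because the `*` in `!stage*` stops at the `/`.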
5 changes: 4 additions & 1 deletion .gitignore
@@ -21,9 +21,12 @@ build/
*.env
docker-compose.yml
notebook
.vscode
__main__.py
jupyter_custom.js
.eggs
*.code-workspace
docs/site


!.vscode/settings.json
!.devcontainer/devcontainer.json
11 changes: 11 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,11 @@
{
  "editor.formatOnPaste": false,
  "editor.formatOnSave": true,
  "editor.rulers": [
    94
  ],
  "python.formatting.provider": "black",
  "[python]": {
    "editor.defaultFormatter": null
  }
}
416 changes: 209 additions & 207 deletions CHANGELOG.md

Large diffs are not rendered by default.

22 changes: 11 additions & 11 deletions LNX-docker-compose.yml
@@ -1,14 +1,14 @@
# docker compose -f LNX-docker-compose.yml --env-file LNX.env up --exit-code-from app --build
version: '2.4'
# PY_VER=3.8 MYSQL_VER=5.7 DISTRO=alpine MINIO_VER=RELEASE.2022-08-11T04-37-28Z HOST_UID=$(id -u) docker compose -f LNX-docker-compose.yml up --exit-code-from app --build
version: "2.4"
x-net: &net
networks:
- main
- main
services:
db:
<<: *net
image: datajoint/mysql:${MYSQL_VER}
environment:
- MYSQL_ROOT_PASSWORD=simple
- MYSQL_ROOT_PASSWORD=${DJ_PASS}
# ports:
# - "3306:3306"
# volumes:
@@ -34,12 +34,12 @@ services:
<<: *net
image: datajoint/nginx:v0.2.4
environment:
- ADD_db_TYPE=DATABASE
- ADD_db_ENDPOINT=db:3306
- ADD_minio_TYPE=MINIO
- ADD_minio_ENDPOINT=minio:9000
- ADD_minio_PORT=80 # allow unencrypted connections
- ADD_minio_PREFIX=/datajoint
- ADD_db_TYPE=DATABASE
- ADD_db_ENDPOINT=db:3306
- ADD_minio_TYPE=MINIO
- ADD_minio_ENDPOINT=minio:9000
- ADD_minio_PORT=80 # allow unencrypted connections
- ADD_minio_PREFIX=/datajoint
# ports:
# - "80:80"
# - "443:443"
@@ -58,7 +58,7 @@ services:
environment:
- DJ_HOST=fakeservices.datajoint.io
- DJ_USER=root
- DJ_PASS=simple
- DJ_PASS
- DJ_TEST_HOST=fakeservices.datajoint.io
- DJ_TEST_USER=datajoint
- DJ_TEST_PASSWORD=datajoint
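
In the updated compose file, `MYSQL_ROOT_PASSWORD=${DJ_PASS}` is interpolated from the environment Compose runs in, while the bare `DJ_PASS` entry passes the host's value straight through. The sketch below approximates those two rules with Python's `string.Template`, which uses the same `${NAME}` syntax. `resolve_environment` is hypothetical; it does not cover Compose extras such as `${VAR:-default}`, and where Compose substitutes an empty string (with a warning) for an unset variable, `Template.substitute` raises `KeyError` instead.

```python
from string import Template
from typing import Dict, List, Mapping


def resolve_environment(entries: List[str], host_env: Mapping[str, str]) -> Dict[str, str]:
    """Approximate how Compose turns a service's `environment:` list into container variables."""
    resolved: Dict[str, str] = {}
    for entry in entries:
        if "=" in entry:
            key, value = entry.split("=", 1)
            # "${NAME}" references are filled in from the host environment.
            resolved[key] = Template(value).substitute(host_env)
        elif entry in host_env:
            # A bare name passes the host's value straight through.
            resolved[entry] = host_env[entry]
        # Assumption: a bare name that is unset on the host is left out entirely.
    return resolved
```

For example, with `DJ_PASS=simple` in the calling shell, the `db` service would get `MYSQL_ROOT_PASSWORD=simple` and the `app` service would get `DJ_PASS=simple`.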
161 changes: 14 additions & 147 deletions README.md
@@ -6,161 +6,28 @@
[![Slack](https://img.shields.io/badge/slack-chat-green.svg)](https://datajoint.slack.com/)

# Welcome to DataJoint for Python!

DataJoint for Python is a framework for scientific workflow management based on relational principles. DataJoint is built on the foundation of the relational data model and prescribes a consistent method for organizing, populating, computing, and querying data.

DataJoint was initially developed in 2009 by Dimitri Yatsenko in Andreas Tolias' Lab at Baylor College of Medicine for the distributed processing and management of large volumes of data streaming from regular experiments. Starting in 2011, DataJoint has been available as an open-source project adopted by other labs and improved through contributions from several developers.
Presently, the primary developer of DataJoint open-source software is the company DataJoint (https://datajoint.com). Related resources are listed at https://datajoint.org.

## Installation
```
pip3 install datajoint
```
Presently, the primary developer of DataJoint open-source software is the company DataJoint (https://datajoint.com).

If you already have an older version of DataJoint installed using `pip`, upgrade with
```bash
pip3 install --upgrade datajoint
```
- [Getting Started](https://datajoint.com/docs/core/datajoint-python/latest/getting-started/)
- [DataJoint Elements](https://datajoint.com/docs/elements/) - Catalog of example pipelines
- [DataJoint CodeBook](https://codebook.datajoint.io) - Interactive online tutorials
- Contribute

## Documentation and Tutorials
- [Development Environment](https://datajoint.com/docs/core/datajoint-python/latest/develop/)
- [Guidelines](https://datajoint.com/docs/community/contribute/)

* https://datajoint.org -- start page
* https://docs.datajoint.org -- up-to-date documentation
* https://tutorials.datajoint.io -- step-by-step tutorials
* https://elements.datajoint.org -- catalog of example pipelines
* https://codebook.datajoint.io -- interactive online tutorials
- Legacy Resources (To be replaced by above)
- [Documentation](https://docs.datajoint.org)
- [Tutorials](https://tutorials.datajoint.org)

## Citation
+ If your work uses DataJoint for Python, please cite the following Research Resource Identifier (RRID) and manuscript.

+ DataJoint ([RRID:SCR_014543](https://scicrunch.org/resolver/SCR_014543)) - DataJoint for Python (version `<Enter version number>`)

+ Yatsenko D, Reimer J, Ecker AS, Walker EY, Sinz F, Berens P, Hoenselaar A, Cotton RJ, Siapas AS, Tolias AS. DataJoint: managing big scientific data using MATLAB or Python. bioRxiv. 2015 Jan 1:031658. doi: https://doi.org/10.1101/031658

## Python Native Blobs
<details>
<summary>Click to expand details</summary>

DataJoint 0.12 adds full support for all native python data types in blobs: tuples, lists, sets, dicts, strings, bytes, `None`, and all their recursive combinations.
The new blobs are a superset of the old functionality and are fully backward compatible.
In previous versions, only MATLAB-style numerical arrays were fully supported.
Some Python datatypes such as dicts were coerced into numpy recarrays and then fetched as such.

However, since some Python types were coerced into MATLAB types, old blobs and new blobs may now be fetched as different types of objects even if they were inserted the same way.
For example, new `dict` objects will be returned as `dict` while the same types of objects inserted with `datajoint 0.11` will be recarrays.

Since this is a big change, we chose to temporarily disable this feature by default in DataJoint for Python 0.12.x, allowing users to adjust their code if necessary.
From 0.13.x, the flag will default to True (on), and will ultimately be removed when corresponding decode support for the new format is added to datajoint-matlab (see: datajoint-matlab #222, datajoint-python #765).

The flag is configured by setting the `enable_python_native_blobs` flag in `dj.config`.

```python
import datajoint as dj
dj.config["enable_python_native_blobs"] = True
```

You can safely enable this setting if both of the following are true:

* The only kinds of blobs your pipeline has inserted previously were numerical arrays.
* You do not need to share blob data between Python and MATLAB.

Otherwise, read the following explanation.

DataJoint v0.12 expands DataJoint's blob serialization mechanism with
improved support for complex native python datatypes, such as dictionaries
and lists of strings.

Prior to DataJoint v0.12, certain python native datatypes such as
dictionaries were 'squashed' into numpy structured arrays when saved into
blob attributes. This facilitated easier data sharing between MATLAB
and Python for certain record types. However, this created a discrepancy
between insert and fetch datatypes which could cause problems in other
portions of users' pipelines.

DataJoint v0.12 removes the squashing behavior, instead encoding native python datatypes in blobs directly.
However, this change creates a compatibility problem for pipelines
which previously relied on the type squashing behavior, since records
saved via the old squashing format will continue to fetch
as structured arrays, whereas new records inserted in DataJoint 0.12 with
`enable_python_native_blobs` enabled will be returned as the
appropriate native python type (dict, etc.).
Furthermore, DataJoint for MATLAB does not yet support unpacking native Python datatypes.

With `dj.config["enable_python_native_blobs"]` set to `False`,
any attempt to insert any datatype other than a numpy array will result in an exception.
This is meant to get users to read this message in order to allow proper testing
and migration of pre-0.12 pipelines to 0.12 in a safe manner.

The exact process to update a specific pipeline will vary depending on
the situation, but generally the following strategies may apply:

* Alter code to directly store numpy structured arrays or plain
multidimensional arrays. This strategy is likely the best one for those
tables requiring compatibility with MATLAB.
* Adjust code to deal with both structured array and native fetched data
for those tables that are populated with `dict`s in blobs in pre-0.12 versions.
In this case, insert logic is not adjusted, but downstream consumers
are adjusted to handle records saved under the old and new schemes.
* Migrate data into a fresh schema, fetching the old data, converting blobs to
a uniform data type and re-inserting.
* Drop/Recompute imported/computed tables to ensure they are in the new
format.

As always, be sure that your data is safely backed up before modifying any
important DataJoint schema or records.
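
The second strategy above (handling both fetched forms downstream) can be sketched as a small normalizing helper. This assumes a legacy record comes back as a single-record NumPy structured array (or recarray) whose field names are the original dict keys; check the actual shape of your legacy fetches before relying on it. `blob_as_dict` is a hypothetical helper, not part of DataJoint.

```python
import numpy as np


def blob_as_dict(blob):
    """Return a dict whether `blob` was fetched as a native dict or a legacy structured array."""
    if isinstance(blob, dict):
        return blob  # inserted with enable_python_native_blobs: already a dict
    if isinstance(blob, np.ndarray) and blob.dtype.names:
        if blob.size != 1:
            raise ValueError("expected a single-record structured array")
        record = blob.reshape(-1)[0]  # works for shapes such as (1,) and (1, 1)
        return {name: record[name] for name in blob.dtype.names}
    raise TypeError(f"cannot convert {type(blob).__name__} to dict")
```

Downstream code can then call `blob_as_dict(...)` on a fetched blob value regardless of which DataJoint version inserted the row, leaving the insert logic unchanged.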

</details>

### API docs

The API documentation can be built with mkdocs using the docker compose file in
`docs/` with the following command:

``` bash
MODE="LIVE" PACKAGE=datajoint UPSTREAM_REPO=https://github.com/datajoint/datajoint-python.git HOST_UID=$(id -u) docker compose -f docs/docker-compose.yaml up --build
```

The site will then be available at `http://localhost/`. When finished, be sure to run
the same command as above, but replace `up --build` with `down`.

## Running Tests Locally
<details>
<summary>Click to expand details</summary>

* Create an `.env` with desired development environment values e.g.
``` sh
PY_VER=3.9
MYSQL_VER=5.7
DISTRO=alpine
MINIO_VER=RELEASE.2022-01-03T18-22-58Z
HOST_UID=1000
```
* `cp local-docker-compose.yml docker-compose.yml`
* `docker-compose up -d` (note the configured `JUPYTER_PASSWORD`)
* Select a means of running tests, e.g., Docker Terminal or Local Terminal (see below)
* Add entry in `/etc/hosts` for `127.0.0.1 fakeservices.datajoint.io`
* Run desired tests. Some examples are as follows:

| Use Case | Shell Code |
| ---------------------------- | ------------------------------------------------------------------------------ |
| Run all tests | `nosetests -vsw tests --with-coverage --cover-package=datajoint` |
| Run one specific class test | `nosetests -vs --tests=tests.test_fetch:TestFetch.test_getattribute_for_fetch1` |
| Run one specific basic test | `nosetests -vs --tests=tests.test_external_class:test_insert_and_fetch` |
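
The `.env` file from the first step holds plain `KEY=VALUE` lines. If you also want those values when running tests from a local shell or a Python runner, a minimal parser could look like this. `parse_env_file` is hypothetical and deliberately ignores quoting and variable expansion, which Docker Compose's own `.env` handling supports.

```python
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        values[key.strip()] = value.strip()
    return values
```

The result can be merged over `os.environ` and passed as `env=` to `subprocess.run` when launching `nosetests`.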


### Launch Docker Terminal
* Shell into `datajoint-python_app_1` i.e. `docker exec -it datajoint-python_app_1 sh`


### Launch Local Terminal
* See `datajoint-python_app` environment variables in `local-docker-compose.yml`
* Launch local terminal
* `export` environment variables in shell
* Add entry in `/etc/hosts` for `127.0.0.1 fakeservices.datajoint.io`
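
Both this step and the devcontainer's `extra_hosts` setting make `fakeservices.datajoint.io` resolve to `127.0.0.1`. To check a hosts file before running tests, a small parser like the hypothetical one below can look up a name; it assumes the usual `IP name [aliases...]` layout and ignores `#` comments.

```python
from typing import Dict, Iterable


def parse_hosts(lines: Iterable[str]) -> Dict[str, str]:
    """Map each hostname or alias in hosts-file lines to its IP address."""
    mapping: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry seen for a name
    return mapping
```

For example, `parse_hosts(open("/etc/hosts")).get("fakeservices.datajoint.io")` should return `"127.0.0.1"` once the entry is in place.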

- If your work uses DataJoint for Python, please cite the following Research Resource Identifier (RRID) and manuscript.

### Launch Jupyter Notebook for Interactive Use
* Navigate to `localhost:8888`
* Input Jupyter password
* Launch a notebook, e.g., `New > Python 3`
- DataJoint ([RRID:SCR_014543](https://scicrunch.org/resolver/SCR_014543)) - DataJoint for Python (version `<Enter version number>`)

</details>
- Yatsenko D, Reimer J, Ecker AS, Walker EY, Sinz F, Berens P, Hoenselaar A, Cotton RJ, Siapas AS, Tolias AS. DataJoint: managing big scientific data using MATLAB or Python. bioRxiv. 2015 Jan 1:031658. doi: https://doi.org/10.1101/031658
4 changes: 1 addition & 3 deletions docs/docker-compose.yaml
@@ -1,6 +1,4 @@
# MODE="LIVE|QA|BUILD" PACKAGE=datajoint UPSTREAM_REPO=https://github.com/datajoint/datajoint-python.git HOST_UID=$(id -u) docker compose -f docs/docker-compose.yaml up --build
#
# navigate to http://localhost/
version: "2.4"
services:
docs:
@@ -18,7 +16,7 @@ services:
- ..:/main
user: ${HOST_UID}:anaconda
ports:
- 80:80
- 8080:80
command:
- sh
- -c