# Add support for insert CSV #1067

Merged: 5 commits, Dec 16, 2022
1 change: 1 addition & 0 deletions CHANGELOG.md

```diff
@@ -3,6 +3,7 @@
 ### 0.14.0 -- TBA
 * Bugfix - Activating a schema requires all tables to exist even if `create_tables=False` PR [#1058](https://github.com/datajoint/datajoint-python/pull/1058)
 * Update - Populate call with `reserve_jobs=True` to exclude `error` and `ignore` keys - PR [#1062](https://github.com/datajoint/datajoint-python/pull/1062)
+* Add - Support for inserting data with CSV files - PR [#1067](https://github.com/datajoint/datajoint-python/pull/1067)
 
 ### 0.13.8 -- Sep 21, 2022
 * Add - New documentation structure based on markdown PR [#1052](https://github.com/datajoint/datajoint-python/pull/1052)
```
2 changes: 1 addition & 1 deletion LNX-docker-compose.yml

```diff
@@ -32,7 +32,7 @@ services:
   interval: 1s
 fakeservices.datajoint.io:
   <<: *net
-  image: datajoint/nginx:v0.2.3
+  image: datajoint/nginx:v0.2.4
   environment:
     - ADD_db_TYPE=DATABASE
     - ADD_db_ENDPOINT=db:3306
```
20 changes: 10 additions & 10 deletions README.md

````diff
@@ -112,15 +112,15 @@ important DataJoint schema or records.
 
 ### API docs
 
-The API documentation can be built using sphinx by running
+The API documentation can be built with mkdocs using the docker compose file in
+`docs/` with the following command:
 
 ``` bash
-pip install sphinx sphinx_rtd_theme
-(cd docs-api/sphinx && make html)
+MODE="LIVE" PACKAGE=datajoint UPSTREAM_REPO=https://github.com/datajoint/datajoint-python.git HOST_UID=$(id -u) docker compose -f docs/docker-compose.yaml up --build
 ```
 
-Generated docs are written to `docs-api/docs/html/index.html`.
-More details in [docs-api/README.md](docs-api/README.md).
+The site will then be available at `http://localhost/`. When finished, be sure to run
+the same command as above, but replace `up --build` with `down`.
 
 ## Running Tests Locally
 <details>
@@ -141,11 +141,11 @@ HOST_GID=1000
 * Add entry in `/etc/hosts` for `127.0.0.1 fakeservices.datajoint.io`
 * Run desired tests. Some examples are as follows:
 
 | Use Case                    | Shell Code                                                                       |
 | --------------------------- | -------------------------------------------------------------------------------- |
 | Run all tests               | `nosetests -vsw tests --with-coverage --cover-package=datajoint`                 |
 | Run one specific class test | `nosetests -vs --tests=tests.test_fetch:TestFetch.test_getattribute_for_fetch1` |
 | Run one specific basic test | `nosetests -vs --tests=tests.test_external_class:test_insert_and_fetch`         |
 
 
 ### Launch Docker Terminal
````
16 changes: 12 additions & 4 deletions datajoint/table.py

```diff
@@ -6,6 +6,7 @@
 import pandas
 import logging
 import uuid
+import csv
 import re
 from pathlib import Path
 from .settings import config
@@ -345,13 +346,16 @@ def insert(
         """
         Insert a collection of rows.
 
-        :param rows: An iterable where an element is a numpy record, a dict-like object, a
-            pandas.DataFrame, a sequence, or a query expression with the same heading as self.
+        :param rows: Either (a) an iterable where an element is a numpy record, a
+            dict-like object, a pandas.DataFrame, a sequence, or a query expression with
+            the same heading as self, or (b) a pathlib.Path object specifying a path
+            relative to the current directory with a CSV file, the contents of which
+            will be inserted.
         :param replace: If True, replaces the existing tuple.
         :param skip_duplicates: If True, silently skip duplicate inserts.
         :param ignore_extra_fields: If False, fields that are not in the heading raise error.
-        :param allow_direct_insert: applies only in auto-populated tables. If False (default),
-            insert are allowed only from inside the make callback.
+        :param allow_direct_insert: Only applies in auto-populated tables. If False (default),
+            insert may only be called from inside the make callback.
 
         Example:
 
@@ -366,6 +370,10 @@
                 drop=len(rows.index.names) == 1 and not rows.index.names[0]
             ).to_records(index=False)
 
+        if isinstance(rows, Path):
+            with open(rows, newline="") as data_file:
+                rows = list(csv.DictReader(data_file, delimiter=","))
+
         # prohibit direct inserts into auto-populated tables
         if not allow_direct_insert and not getattr(self, "_allow_insert", True):
             raise DataJointError(
```
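In short, when `rows` is a `pathlib.Path`, `insert` materializes the CSV into a list of
dicts before the normal insert logic runs, so the usual keyword arguments still apply.
A minimal sketch of the equivalent behavior, assuming a hypothetical `mice.csv` whose
header row names the table attributes:

```python
import csv
from pathlib import Path

# Sketch of what insert() now does internally when handed a Path.
path = Path("mice.csv")  # hypothetical file with header: mouse_id,dob,sex
with open(path, newline="") as data_file:
    # DictReader takes field names from the header row, so each CSV row
    # becomes a dict keyed by attribute name, with all values as strings,
    # e.g. {'mouse_id': '1', 'dob': '2016-11-19', 'sex': 'M'}
    rows = list(csv.DictReader(data_file, delimiter=","))

# rows is now an ordinary iterable of dicts, so it flows through the same
# code path as any other dict-based insert, e.g.:
# mouse.insert(rows, skip_duplicates=True)
```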
85 changes: 74 additions & 11 deletions docs/src/query-lang/common-commands.md

<!-- ## Insert is present in the general docs here-->

## Insert

Data entry is as easy as providing the appropriate data structure to a permitted table.
Given the following table definition, we can insert data as tuples, dicts, pandas
DataFrames, or `pathlib.Path` objects pointing to local CSV files.

```text
mouse_id: int # unique mouse id
---
dob: date # mouse date of birth
sex: enum('M', 'F', 'U') # sex of mouse - Male, Female, or Unknown
```
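The examples below assume the definition above belongs to a manual table on an active
schema. A minimal sketch of that setup (the schema name and connection configuration
are placeholders, not part of the original snippet):

```python
import datajoint as dj

schema = dj.schema("tutorial")  # placeholder schema name; assumes dj.config is set up

@schema
class Mouse(dj.Manual):
    definition = """
    mouse_id: int             # unique mouse id
    ---
    dob: date                 # mouse date of birth
    sex: enum('M', 'F', 'U')  # sex of mouse - Male, Female, or Unknown
    """

mouse = Mouse()  # the `mouse` instance used in the tabs below
```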

=== "Tuple"

```python
mouse.insert1( (0, '2017-03-01', 'M') ) # Single entry
data = [
(1, '2016-11-19', 'M'),
(2, '2016-11-20', 'U'),
(5, '2016-12-25', 'F')
]
mouse.insert(data) # Multi-entry
```

=== "Dict"

```python
mouse.insert1( dict(mouse_id=0, dob='2017-03-01', sex='M') ) # Single entry
data = [
{'mouse_id':1, 'dob':'2016-11-19', 'sex':'M'},
{'mouse_id':2, 'dob':'2016-11-20', 'sex':'U'},
{'mouse_id':5, 'dob':'2016-12-25', 'sex':'F'}
]
mouse.insert(data) # Multi-entry
```

=== "Pandas"

```python
import pandas as pd
data = pd.DataFrame(
[[1, "2016-11-19", "M"], [2, "2016-11-20", "U"], [5, "2016-12-25", "F"]],
columns=["mouse_id", "dob", "sex"],
)
mouse.insert(data)
```

=== "CSV"

Given the following CSV file saved in the current working directory as `mice.csv`:

```text
mouse_id,dob,sex
1,2016-11-19,M
2,2016-11-20,U
5,2016-12-25,F
```

We can insert it as follows:

```python
from pathlib import Path
mouse.insert(Path('./mice.csv'))
```
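Under the hood, the CSV is read with Python's `csv.DictReader`, so the header row must
match the table's attribute names. Because the file is converted to a list of dicts
before the normal insert logic runs, the usual keyword arguments still apply. A small
sketch, reusing the `mice.csv` above:

```python
from pathlib import Path

# Re-running an ingestion script on the same file would normally raise a
# duplicate-entry error; skip_duplicates makes the call safe to repeat.
mouse.insert(Path('./mice.csv'), skip_duplicates=True)
```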

## Make

See the article on [`make` methods](../../reproduce/make-method/)
### Separate variables

``` python
mouse_id, dob = query.fetch1('mouse_id', 'dob')  # when query has exactly one entity
mouse_id, dob = query.fetch('mouse_id', 'dob')   # [mouse_id, ...] [dob, ...]
```

### Primary key values
To sort the result, use the `order_by` keyword argument.

``` python
data = query.fetch(order_by='mouse_id')           # ascending order
data = query.fetch(order_by='mouse_id desc')      # descending order
data = query.fetch(order_by=('mouse_id', 'dob'))  # by ID first, dob second
data = query.fetch(order_by='KEY')                # sort by the primary key
```

The `order_by` argument can be a string specifying the attribute to sort by. By default
the sort is in ascending order. Use `'attr desc'` to sort in descending order by
attribute `attr`. The value can also be a sequence of strings, in which case the sort
is performed on all the attributes jointly, in the order specified.

The special attribute name `'KEY'` represents the primary key attributes in the order
in which they appear in the index. Otherwise, this name can be used like any other
attribute.
they appear in the index. Otherwise, this name can be used as any other argument.

If an attribute happens to be a SQL reserved word, it needs to be enclosed in backquotes.
Similar to sorting, the `limit` and `offset` arguments can be used to limit the result
to a subset of entities.

``` python
data = query.fetch(order_by='mouse_id', limit=10, offset=5)
```

Note that an `offset` cannot be used without specifying a `limit` as well.
2 changes: 1 addition & 1 deletion local-docker-compose.yml

```diff
@@ -34,7 +34,7 @@ services:
   interval: 1s
 fakeservices.datajoint.io:
   <<: *net
-  image: datajoint/nginx:v0.2.3
+  image: datajoint/nginx:v0.2.4
   environment:
     - ADD_db_TYPE=DATABASE
     - ADD_db_ENDPOINT=db:3306
```
6 changes: 2 additions & 4 deletions tests/test_university.py

```diff
@@ -33,11 +33,9 @@ def test_activate():
         Enroll,
         Grade,
     ):
-        import csv
+        from pathlib import Path
 
-        with open("./data/" + table.__name__ + ".csv") as f:
-            reader = csv.DictReader(f)
-            table().insert(reader)
+        table().insert(Path("./data/" + table.__name__ + ".csv"))
 
 
 def test_fill():
```
Expand Down