A few refactorizations and transfer-learning notebook
blythed committed Jul 26, 2023
1 parent f06e441 commit e7e672d
Showing 23 changed files with 417 additions and 317 deletions.
2 changes: 1 addition & 1 deletion docs/infrastructure/index.md
@@ -17,8 +17,8 @@ leading to smoother and more robust productionization:
single_host_cluster
architecture
jobs
change_data_capture
client_server
distributed_cluster
deep_dive_on_jobs
```
38 changes: 16 additions & 22 deletions docs/infrastructure/jobs.md
@@ -3,37 +3,31 @@
## Scheduling of training and model outputs

In order to marshal computational resources most efficiently,
SuperDuperDB may be configured to run in asynchronous mode `{"remote": True}`.
SuperDuperDB may be configured to run in asynchronous mode `{"distributed": True}`.
The simplest way to set up a distributed SuperDuperDB deployment is to use a [single-host cluster](singlehost). See [the section on configuration](configuration) for details on setting up SuperDuperDB.
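
As a rough sketch, distributed mode might also be enabled programmatically; this assumes the `{"distributed": True}` setting is exposed as a `distributed` attribute on the `CFG` object imported from `superduperdb`, which may differ between versions:

```python
# Sketch only: assumes the JSON setting {"distributed": True} maps to an
# attribute on the CFG configuration object; check the configuration docs
# for the exact mechanism in your version.
from superduperdb import CFG

CFG.distributed = True  # route model computation to the Dask worker pool
```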

See the section here for an overview
of what this means from an infrastructural point of view.

There are several key functionalities in SuperDuperDB which cause asynchronous jobs to be
spawned in the SuperDuperDB cluster's worker pool:
There are several key functionalities in SuperDuperDB which trigger asynchronous jobs to be
spawned in the configured Dask worker pool:

- Inserting data
- Updating data
- Creating watchers
- Training semantic indexes and imputations
- Applying models to data with `model.predict`
- Training models with `model.fit` (see the sketch after this list)
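
As a hedged sketch of the last item, the call below shows how a training run might be launched as an asynchronous job; the field names `'my-key'` and `'label'` are placeholders, and the exact `fit` signature may vary between versions (it is written to mirror the `predict` call shown further below):

```python
# Hypothetical sketch: parameter names are illustrative, not authoritative.
# When running distributed, the call is expected to return a Job handle
# rather than blocking until training finishes.
job = model.fit(
    X='my-key',                # input field in the documents
    y='label',                 # target field (placeholder name)
    db=db,
    select=collection.find(),
)
job.watch()  # stream the training job's stdout
```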

See [the Dask documentation](https://docs.dask.org/en/stable/) for more information about setting up and managing Dask deployments. The Dask deployment may be configured using
the [configuration system](configuration).

When a command which creates jobs is executed, its output contains the ids of the jobs created.
For example, when inserting data, one job is created per model in the database, and each job
computes the outputs of a single model on the inserted data. The jobs are ordered by the features
each model requires: models whose inputs do not depend on the outputs of other models run first.
The stdout and status of the job may be monitored using the returned `Job` object:

```python
>>> job_ids = docs.insert_many(data)[1]
>>> print(job_ids)
{'resnet': ['5ebf5272-95ac-11ed-9436-1e00f226d551'],
'visual_classifier': ['69d283c8-95ac-11ed-9436-1e00f226d551']}
>>>
>>> job = model.predict(X='my-key', db=db, select=collection.find())
>>> job.watch()
# ... lots of lines of stdout
```

The standard output of these asynchronous jobs is logged to MongoDB. One may watch this
output using:
Jobs may be viewed using `db.show`:

```python
>>> docs.watch_job(job_ids['resnet'])
# ... lots of lines of stdout/ stderr
```

```python
>>> db.show('job')
```
1 change: 1 addition & 0 deletions docs/infrastructure/single_host_cluster.md
@@ -1,3 +1,4 @@
(singlehost)=
# SuperDuperDB single host cluster deployment

The simplest way to create a SuperDuperDB deployment is to use the CLI
8 changes: 2 additions & 6 deletions notebooks/sentiment_analysis_use_case.ipynb
@@ -84,12 +84,8 @@
"metadata": {},
"outputs": [],
"source": [
"from superduperdb import CFG\n",
"from superduperdb.datalayer.base.build import build_datalayer\n",
"db = build_datalayer()\n",
"# db = pymongo.MongoClient().documents\n",
"# db = superduper(db)\n",
"db = build_datalayer(pymongo=pymongo.MongoClient(), name='documents')\n",
"db = pymongo.MongoClient().documents\n",
"db = superduper(db)\n",
"collection = Collection('imdb')"
]
},
