Support for hybrid data #308

blythed · 2023-06-20T11:06:08Z

Why

In training and otherwise, it should be possible to load large items such as images separately on the client side, to make data loading more memory efficient.

How

Add the ability to download data from the DB, then add the larger blobs to it post hoc, loaded from disk.

What

Add configuration allowing images to be downloaded to disk rather than to the DB
Modify Downloader, adding a file-saving option
Modify SuperDuperCursor
Add an "out of memory" option to QueryDataset

This should be facilitated by #422.
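
As a rough illustration of the "out of memory" idea, here is a minimal sketch of a map-style dataset that keeps only file paths in memory and reads each blob from disk on access. `LazyBlobDataset` and its arguments are hypothetical names for this example; this is not the QueryDataset API.

```python
import tempfile
import typing as t
from pathlib import Path


class LazyBlobDataset:
    """Hypothetical dataset: holds paths only, loads blobs on demand."""

    def __init__(self, paths: t.Sequence[Path], decode: t.Callable[[bytes], t.Any]):
        self.paths = list(paths)
        self.decode = decode

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> t.Any:
        # Read and decode on access, so only the requested item is in memory.
        return self.decode(self.paths[index].read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    files = []
    for i, payload in enumerate([b"first", b"second"]):
        path = Path(tmp) / f"{i}.bin"
        path.write_bytes(payload)
        files.append(path)
    dataset = LazyBlobDataset(files, decode=bytes.decode)
    first_item = dataset[0]
    n_items = len(dataset)
```

A real implementation would presumably take its records from a query rather than a list of paths; the point here is only that decoding is deferred until an item is requested.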


blythed · 2023-07-14T17:38:00Z

Currently, data is saved in the DB like this:

{"_content": {
  "bytes": b"...",
  "uri": "...",
  "encoder": "image"
}}

As a configurable option, do:

{"_content": {
  "local_uri": "file://... OR s3://",
  "uri": "https://...",
  "encoder": "image"
}}

Even so, when we perform db.execute(collection.find_one()), we would still get the data loaded.
This is similar to the functionality of EvaDB.

The idea: in either case the user experience is just like performing queries, but the DB may do this in a hybrid way.
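
To make the two record layouts concrete, here is a minimal sketch that converts an inline record into a reference record by writing the blob to disk. `to_hybrid_record`, the hash-based file name, and the example URL are assumptions for illustration; only the `_content` keys come from the layouts above.

```python
import hashlib
import tempfile
from pathlib import Path


def to_hybrid_record(record: dict, root: Path) -> dict:
    """Write the inline blob under `root` and return a reference-style record."""
    content = record["_content"]
    blob = content["bytes"]
    # Assumption: name the file by content hash so identical blobs share a file.
    path = root / hashlib.sha1(blob).hexdigest()
    path.write_bytes(blob)
    return {
        "_content": {
            "local_uri": path.resolve().as_uri(),  # a file:// URI
            "uri": content["uri"],
            "encoder": content["encoder"],
        }
    }


with tempfile.TemporaryDirectory() as tmp:
    inline = {
        "_content": {
            "bytes": b"\x89PNG...",
            "uri": "https://example.com/a.png",
            "encoder": "image",
        }
    }
    hybrid = to_hybrid_record(inline, Path(tmp))
    has_inline_bytes = "bytes" in hybrid["_content"]
```

This sketch handles only the local filesystem; an s3:// target would need a separate writer.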

Key modules:

datalayer/base/downloads.py
datalayer/core/documents.py

Currently, Document decodes from the blob stored in the DB. Alternatively, the content could be a reference.

@classmethod
def _decode(cls, r: t.Dict, encoders: t.Dict):
    # `t` is the standard `typing` module (import typing as t).
    if isinstance(r, dict) and '_content' in r:
        # Look up the encoder named in the record.
        type = encoders[r['_content']['encoder']]
        try:
            return type.decode(r['_content']['bytes'])
        except KeyError:
            # No inline bytes: return the record undecoded.
            return r
    elif isinstance(r, list):
        return [cls._decode(x, encoders) for x in r]
    elif isinstance(r, dict):
        # Decode nested values in place.
        for k in r:
            r[k] = cls._decode(r[k], encoders)
    return r
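
One way the reference alternative could look is sketched below: the decoder uses inline bytes when present and otherwise reads them from `local_uri`. This is a sketch under assumptions, not the project's implementation: `load_bytes`, `decode`, and `Utf8Encoder` are hypothetical names, only file:// URIs are handled, and unlike the method above it returns new dicts instead of mutating in place.

```python
import tempfile
import typing as t
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def load_bytes(content: t.Dict) -> bytes:
    """Return inline bytes if present, else read them from `local_uri`."""
    if "bytes" in content:
        return content["bytes"]
    parsed = urlparse(content["local_uri"])
    if parsed.scheme != "file":
        # s3:// and other schemes are left out of this sketch.
        raise NotImplementedError(f"unsupported scheme: {parsed.scheme!r}")
    return Path(url2pathname(parsed.path)).read_bytes()


def decode(r: t.Any, encoders: t.Dict) -> t.Any:
    if isinstance(r, dict) and "_content" in r:
        content = r["_content"]
        if "bytes" not in content and "local_uri" not in content:
            return r  # nothing to decode, as in the KeyError branch above
        return encoders[content["encoder"]].decode(load_bytes(content))
    if isinstance(r, list):
        return [decode(x, encoders) for x in r]
    if isinstance(r, dict):
        return {k: decode(v, encoders) for k, v in r.items()}
    return r


class Utf8Encoder:
    """Hypothetical stand-in encoder; only `decode` is needed here."""

    @staticmethod
    def decode(data: bytes) -> str:
        return data.decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    blob = Path(tmp) / "blob.txt"
    blob.write_bytes(b"hello")
    doc = {
        "on_disk": {"_content": {"local_uri": blob.resolve().as_uri(),
                                 "uri": "https://example.com/blob",
                                 "encoder": "text"}},
        "inline": {"_content": {"bytes": b"inline", "encoder": "text"}},
    }
    decoded = decode(doc, {"text": Utf8Encoder})
```

Both record layouts decode through the same call, which matches the goal that the user experience stays the same in either case.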

blythed added the story label Jun 20, 2023

blythed added 🛠️ task and removed story labels Jul 3, 2023

This was referenced Aug 7, 2023:

Data setup which loads blobs on the fly from filesystem/s3 #497 (Closed)

Add hybrid data uris + records #641 (Merged)

blythed closed this as completed Nov 20, 2023
