DataEntry should be refactored #520

GearsAD · 2020-07-03T15:18:36Z · edited by Affie

For 0.9 we should refactor the data entries to include more information.

The proposed data entry structure that every data source (e.g. MongoDB) should implement is:

```julia
using UUIDs, Dates

struct AbstractDataEntry
  label::Symbol # Formerly key
  id::UUID
  origin::String # E.g. user|robot|session|label
  description::String
  mimeType::String
  hash::String # Probably https://docs.julialang.org/en/v1/stdlib/SHA
  createdTimestamp::DateTime
end
```

We can't actually implement this as AbstractDataEntry (Julia abstract types cannot declare fields), but we should enforce that these fields are populated.
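Since an abstract type can't hold the fields, one way to enforce that they are populated is to validate them in each concrete type's inner constructor. A minimal sketch, assuming a hypothetical concrete `DataEntry` and assuming that `origin`, `mimeType` and `hash` are the mandatory fields (the thread does not settle which ones are):

```julia
using UUIDs, Dates

# Hypothetical concrete entry type with the fields proposed above. The inner
# constructor refuses to build an entry whose required fields are empty.
struct DataEntry
    label::Symbol
    id::UUID
    origin::String
    description::String
    mimeType::String
    hash::String
    createdTimestamp::DateTime

    function DataEntry(label::Symbol, id::UUID, origin::AbstractString,
                       description::AbstractString, mimeType::AbstractString,
                       hash::AbstractString, createdTimestamp::DateTime)
        # Assumption: origin, mimeType and hash are required; description may be empty.
        isempty(origin)   && throw(ArgumentError("origin must be populated"))
        isempty(mimeType) && throw(ArgumentError("mimeType must be populated"))
        isempty(hash)     && throw(ArgumentError("hash must be populated"))
        return new(label, id, origin, description, mimeType, hash, createdTimestamp)
    end
end

# A fully populated entry constructs normally.
ok = DataEntry(:image_1, uuid4(), "user|robot|session|x0", "camera frame",
               "image/png", "3q2+7w==", now())

# An entry without a hash is rejected at construction time.
rejected = try
    DataEntry(:image_2, uuid4(), "user|robot|session|x0", "", "image/png", "", now())
    false
catch err
    err isa ArgumentError
end
```

Validating at construction means a half-filled entry can never exist, which fits the later point that data (and hence its entry) is immutable.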

JT EDIT: I just swapped label and id for easier deprecation.
JT EDIT: source -> origin from slack comment
JT EDIT: added createdTimestamp


dehann · 2020-07-03T18:42:09Z

I would suggest we don't call this AbstractDataEntry, since Abstract has a specific meaning in our general dispatch use. Why not just DataEntry? It can subtype AbstractDataEntry if that abstraction is required.

dehann · 2020-07-03T18:42:25Z

The rest looks good to me!

GearsAD · 2020-07-03T21:21:21Z

Agreed, that was more of a conceptual design.

In practice these fields would have to be populated in GeneralBigDataEntry, MongodbBigDataEntry, and FileBigDataEntry.
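A sketch of that split, with an abstract type used only for dispatch and concrete entries that each declare the common fields. The type names drop the "Big" prefix (following the reply below), and `FileDataEntry`'s extra `folder` field, `hascommonfields` and `entrysummary` are hypothetical illustrations, not an existing API:

```julia
using UUIDs, Dates

# The abstract type is used only for dispatch; it carries no fields.
abstract type AbstractDataEntry end

# Hypothetical concrete entries; each must declare the common fields itself.
struct GeneralDataEntry <: AbstractDataEntry
    label::Symbol
    id::UUID
    origin::String
    description::String
    mimeType::String
    hash::String
    createdTimestamp::DateTime
end

struct FileDataEntry <: AbstractDataEntry
    label::Symbol
    id::UUID
    origin::String
    description::String
    mimeType::String
    hash::String
    createdTimestamp::DateTime
    folder::String   # store-specific extra field (hypothetical)
end

const COMMON_ENTRY_FIELDS = (:label, :id, :origin, :description,
                             :mimeType, :hash, :createdTimestamp)

# True if concrete type T declares every common field from the proposal.
hascommonfields(T::Type{<:AbstractDataEntry}) =
    all(f -> hasfield(T, f), COMMON_ENTRY_FIELDS)

# Generic code dispatches on the abstract type and works for any concrete entry.
entrysummary(e::AbstractDataEntry) = string(e.label, " (", e.mimeType, ")")

fe = FileDataEntry(:scan_1, uuid4(), "user|robot|session|x1", "lidar scan",
                   "application/octet-stream", "3q2+7w==", now(), "/tmp/blobs")
```

A check like `hascommonfields` could run in a test suite over every concrete entry type, so a new store's entry cannot silently miss one of the agreed fields.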

dehann · 2020-07-05T14:22:29Z

Guessing you mean “Data” (i.e. GeneralDataEntry, MongodbDataEntry, FileDataEntry) :-)

Affie · 2020-07-07T17:31:49Z

xref #97 (comment)

Affie · 2020-07-09T10:17:36Z

What hashing function do we want to use, and will it be fixed to a single one?
Julia is currently faster on SHA2.
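Assuming SHA-256 from the SHA2 family (via Julia's SHA standard library) and the base64 string encoding that the Slack notes further down settle on, hashing and verifying a blob could look like this; `blobhash` and `verifyblob` are hypothetical helper names:

```julia
using SHA, Base64

# Hash a blob with SHA-256 (SHA2 family) and base64-encode the 32-byte digest
# so it can be stored in the entry's String `hash` field.
blobhash(blob::AbstractVector{UInt8}) = base64encode(sha256(blob))

# Check that a blob matches the hash recorded in its entry.
verifyblob(blob::AbstractVector{UInt8}, expected::AbstractString) =
    blobhash(blob) == expected

blob = collect(codeunits("hello"))
h = blobhash(blob)   # 44-character base64 string
```

The SHA stdlib also provides SHA3 variants such as `sha3_256`, so if the choice is not fixed to one function, only `blobhash` (and perhaps a field recording which function was used) would need to change.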

Affie · 2020-07-09T18:58:02Z

Do we want a timestamp?
The variable has a timestamp that should technically be the same as (or close to) the data timestamp.
Perhaps it can help with data integrity.

GearsAD · 2020-07-13T02:17:36Z

SHA2 is good I think?
The timestamp is more of a database-level entry, just to keep track of when the data was actually created; the timestamp in the variable is still the authority on the variable.

GearsAD · 2020-07-13T02:18:35Z

Notes from Slack:

- If we key the data stores (or give them IDs), we can take the store-specific info out of FileDataEntry and the connection information out of MongodbDataEntry. We can keep that info inside the store itself and relate it to entries by the datastore key.
- Can we rename source to origin or graphOrigin? "source" is a very loaded term.
- The hash is a base64-encoded string.
- We'll (optionally) use datasource ID to identify the different stores, which we'll save into the entries
- Users are responsible for managing the different stores.
- Data is immutable, so we don't have to worry about updateTimestamp or updateData().
- Line 115 with addData!(dfg::AbstractDFG, dataStore::AbstractDataStore, variableLabel::Symbol, dataLabel::Symbol, blob::Vector{UInt8}, [default parameters like description etc.]) will return the entry when you create the data (I really like that way of working with the API as you have it there).
- SmallData is a special data entry, as you've got there, of type Dict{Symbol, Union{Int, Float64, AbstractString, Vector{Int}, Vector{Float64}, Vector{AbstractString}}} or something similar.
- SmallData is a special DataEntry, but it's only really used in CGDFG for saving/loading. A user won't see it; the DFGVariable will contain the Dict above.
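The addData! flow in these notes (keyed store, base64 hash, immutable data, entry returned on creation) can be sketched with toy stand-ins. Everything here is an assumption for illustration: `InMemoryDataStore`, the `VariableEntries` dict standing in for the factor graph, the optional `datastore` field and the keyword defaults are hypothetical and not the actual DFG API:

```julia
using UUIDs, Dates, SHA, Base64

# Hypothetical stand-ins, not the DistributedFactorGraphs API.
abstract type AbstractDataStore end

# A keyed store: entries record only the key; connection details stay in the store.
struct InMemoryDataStore <: AbstractDataStore
    key::Symbol
    blobs::Dict{UUID, Vector{UInt8}}
end
InMemoryDataStore(key::Symbol) = InMemoryDataStore(key, Dict{UUID, Vector{UInt8}}())

struct DataEntry
    label::Symbol
    id::UUID
    origin::String
    description::String
    mimeType::String
    hash::String                # base64-encoded SHA-256 of the blob
    createdTimestamp::DateTime
    datastore::Symbol           # key of the store the blob was added to
end

# Stand-in for the graph: variable label => entries attached to that variable.
const VariableEntries = Dict{Symbol, Vector{DataEntry}}

# Store the blob, build its entry, attach the entry to the variable, and return it.
# Data is immutable, so reusing a data label on a variable throws instead of updating.
function addData!(entries::VariableEntries, store::InMemoryDataStore,
                  variableLabel::Symbol, dataLabel::Symbol, blob::Vector{UInt8};
                  description::String = "", mimeType::String = "application/octet-stream",
                  origin::String = "")
    varentries = get!(entries, variableLabel, DataEntry[])
    if any(e -> e.label == dataLabel, varentries)
        throw(ArgumentError("data $dataLabel already exists on $variableLabel"))
    end
    id = uuid4()
    store.blobs[id] = copy(blob)
    entry = DataEntry(dataLabel, id, origin, description, mimeType,
                      base64encode(sha256(blob)), now(), store.key)
    push!(varentries, entry)
    return entry
end

store = InMemoryDataStore(:local_store)
entries = VariableEntries()
entry = addData!(entries, store, :x0, :camera_image, collect(codeunits("pixels"));
                 mimeType = "image/png")

# Adding the same data label again is rejected (immutability).
duplicate_rejected = try
    addData!(entries, store, :x0, :camera_image, collect(codeunits("other")))
    false
catch err
    err isa ArgumentError
end
```

Because the entry only carries the store key, swapping the `Dict` for a file folder or a MongoDB connection would not change the entry layout.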

dehann · 2020-07-13T02:30:36Z

I have a suggestion on this:

> We'll (optionally) use datasource ID to identify the different stores, which we'll save into the entries

The idea behind immutable id => blob pairs is that the location of the blob does not matter. In fact, it's highly likely that there will be multiple copies of the blob flying all over the place (hence the immutability requirement), so it seems a little awkward to store a single blob-store location in the Entry?

Perhaps I'm misunderstanding, and we should just make that bullet point a bit more explicit.
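Under this reading an entry only needs the blob's id and hash, and any store holding a copy can serve it. A minimal sketch of such a lookup, with hypothetical names (`BlobStore`, `fetchblob`) and plain `Dict`s standing in for real stores, using the recorded hash to skip a damaged copy:

```julia
using UUIDs, SHA, Base64

# Hypothetical: a blob store is just an id => bytes map; copies may live in many stores.
const BlobStore = Dict{UUID, Vector{UInt8}}

blobhash(blob::Vector{UInt8}) = base64encode(sha256(blob))

# Return the first copy of blob `id` whose hash matches the entry's hash, or `nothing`.
# The entry does not need to know which particular store holds the blob.
function fetchblob(stores::Vector{BlobStore}, id::UUID, expectedhash::AbstractString)
    for store in stores
        blob = get(store, id, nothing)
        blob === nothing && continue
        blobhash(blob) == expectedhash && return blob
    end
    return nothing
end

id = uuid4()
blob = collect(codeunits("odometry"))
h = blobhash(blob)

local_cache = BlobStore()                                      # no copy here
corrupted   = BlobStore(id => collect(codeunits("odometrx")))  # damaged copy
archive     = BlobStore(id => copy(blob))                      # intact copy

found = fetchblob([local_cache, corrupted, archive], id, h)
```

Here a store ID in the entry would at most be a hint for where to look first, not a requirement for finding the data.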

dehann · 2020-07-19T09:22:12Z

Targeting v0.9 or v0.10, now that we are pushing for nullhypo stuff in v0.9?

GearsAD added this to the v0.9.0 milestone Jul 3, 2020

Affie modified the milestones: v0.9.0, v0.10.0 Jul 19, 2020

Affie linked a pull request Jul 28, 2020 that will close this issue: DataEntry and BlobStore Options #550 (Merged)

Affie closed this as completed Jul 28, 2020

Affie mentioned this issue Aug 5, 2020: smallData as a Dict{Symbol, Any} #573 (Closed)

dehann mentioned this issue Aug 9, 2020: Compulsory MIMEType or description in addData!????? #579 (Closed)
