Datasets in Renku #594

Closed
rokroskar opened this issue Jul 10, 2019 · 5 comments

Comments

@rokroskar
Member

rokroskar commented Jul 10, 2019

User story

Users should be able to use the UI to:

  • search for a Dataset
  • view Dataset information in a renku Project
  • view information of all Datasets in the renku instance
  • create/manage Datasets
  • add data to Datasets via upload or remote URL
  • modify some metadata, e.g. the Dataset description
  • delete datasets

Semantics

A Dataset in Renku:

  • represents a collection of data in a repository
  • contains metadata about the user that created it
  • comprises zero or more DatasetFiles

A DatasetFile in Renku:

  • contains additional metadata about the creator, which may be different from the Dataset creator
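
The semantics above could be modelled roughly as follows. This is only a minimal sketch, not Renku's actual schema: the `Creator` class and every field name (`name`, `email`, `path`, `description`, `files`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Creator:
    # Hypothetical minimal creator record; the issue does not list its fields.
    name: str
    email: str = ""


@dataclass
class DatasetFile:
    path: str
    creator: Creator  # may differ from the creator of the enclosing Dataset


@dataclass
class Dataset:
    name: str
    creator: Creator
    description: str = ""  # one of the metadata fields users should be able to edit
    files: List[DatasetFile] = field(default_factory=list)  # zero or more files


# Usage: a dataset created by one user that holds a file from another.
alice = Creator("Alice")
bob = Creator("Bob")
ds = Dataset(name="weather", creator=alice)
ds.files.append(DatasetFile(path="data/2019.csv", creator=bob))
```

An empty `files` list is valid here, matching "zero or more DatasetFiles".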
@erbou
Contributor

erbou commented Jul 10, 2019

User stories: a user should be able to search existing datasets in Renku, view information about a dataset, and eventually navigate to the project where the dataset was created or into which it was imported. The user should be able to cherry-pick datasets, possibly from several projects, and add them to a basket. In addition, a user should be able to add data from external sources to this basket by giving a DOI or a URL, which can be the path name of a local file or folder. If the user adds external data, they must organize it into datasets in the basket. Finally, the user should be able to check out the basket. Depending on the context, at checkout time either a new project is created and the content of the basket is added to it, or the content of the basket is merged into an existing project. There are several possible scenarios: (1) the basket function was initiated during the creation of a new project, in which case the content is added to this new project; (2) the basket function was initiated from the context of an existing project (e.g. the dataset management view in the project), in which case the content is added to that project; (3) optionally, the basket function was invoked from the home page, outside any context, in which case we should probably take the user through the steps of creating a new project.
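
The three checkout scenarios could be dispatched along these lines. This is a rough sketch only; `BasketContext`, `checkout_action`, and the returned strings are hypothetical names for illustration and do not correspond to any existing Renku API.

```python
from enum import Enum
from typing import Optional


class BasketContext(Enum):
    # Hypothetical labels for where the basket function was initiated.
    NEW_PROJECT = "new_project"            # scenario (1)
    EXISTING_PROJECT = "existing_project"  # scenario (2)
    HOME_PAGE = "home_page"                # scenario (3)


def checkout_action(context: BasketContext, project: Optional[str] = None) -> str:
    """Describe what checkout should do for a given starting context."""
    if context is BasketContext.NEW_PROJECT:
        return "add basket to the project being created"
    if context is BasketContext.EXISTING_PROJECT:
        if project is None:
            raise ValueError("an existing-project checkout needs a project")
        return f"merge basket into {project}"
    # No context: walk the user through project creation first.
    return "start new-project creation, then add basket"
```

For example, `checkout_action(BasketContext.EXISTING_PROJECT, "demo")` describes a merge into the project `demo`.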

Note: a best practice (to promote reusability) would be to keep datasets imported from external sources in a project of their own, without data transformations or any kind of analytics pipeline, unless such a step is necessary to use the data in any other application. Derived data, i.e. data that is created inside Renku, should be kept together with the method that was used to derive it.

@rokroskar
Member Author

Can the "basket" be considered post-MVP?

@ableuler
Contributor

I suggest adding that datasets are versioned.

@rokroskar changed the title from "Dataset integration in the UI" to "Datasets in Renku" on Jul 11, 2019
@erbou
Contributor

erbou commented Jul 11, 2019

Can the "basket" be considered post-MVP?

The "basket" can be considered post-MVP, and we may even discover a more elegant approach by the time we get there.

In the meantime we need a way to import datasets that does not require using the CLI.

@rokroskar
Member Author

This epic has been completed - we should open new, finer-grained epics as needed. 🎉


3 participants