Generic dataset module and specific s3_datasets module - part 5 (Move DatasetServiceInterface to datasets_base, add property, create first list API for datasets_base) #1281

Merged
merged 36 commits into from
May 21, 2024

Conversation

dlpzx
Contributor

@dlpzx dlpzx commented May 17, 2024

Feature or Bugfix

  • Feature
  • Refactoring

Detail

As explained in the design for #1123, we are implementing a generic datasets_base module that any type of dataset can use.

In this PR we:

  • Move DatasetServiceInterface to datasets_base. This interface is used by dataset_sharing to "inject" logic into s3_datasets.
  • Add the property dataset_type to the DatasetServiceInterface interface to identify which type of dataset an implementation applies to.
  • Create the first list API for datasets_base. 👀 This is the most important part. With multiple types of datasets, users will still list all datasets together in several places in the UI (e.g. listDatasets in the DatasetList view, listDatasetsEnvironment in the Environment view). These API calls are not specific to s3_datasets but generic to any type of dataset, so they should be part of datasets_base. This PR introduces the datasets_list_service and datasetListRepository and moves only one API to datasets_base as an example. The remaining APIs will move in upcoming PRs.
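
To make the idea concrete, here is a minimal, in-memory sketch of how a generic list service could combine results from registered interfaces. It is not the code in this PR: only DatasetServiceInterface, its dataset_type property, and the `_interfaces` registration pattern come from the PR. The names `DatasetListService`, `register`, `list_user_datasets`, `ExampleSharingExtension`, the dictionary fields, and the value `'S3'` are assumptions made for illustration, and the real implementation works with database sessions and queries rather than plain lists.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DatasetServiceInterface(ABC):
    """Simplified stand-in for the interface moved to datasets_base."""

    @property
    @abstractmethod
    def dataset_type(self) -> str:
        """Type of dataset this implementation applies to."""

    @abstractmethod
    def list_user_datasets(self, username: str, groups: List[str]) -> List[Dict[str, str]]:
        """Hypothetical hook: extra datasets this module makes visible to the user."""


class DatasetListService:
    """Hypothetical generic list service, independent of any dataset type."""

    _interfaces: List[DatasetServiceInterface] = []

    @classmethod
    def register(cls, interface: DatasetServiceInterface) -> None:
        cls._interfaces.append(interface)

    @classmethod
    def list_all_user_datasets(cls, username: str, groups: List[str]) -> List[Dict[str, str]]:
        # Collect datasets from every registered interface and tag each one
        # with the dataset type that the interface declares.
        datasets = []
        for interface in cls._interfaces:
            for dataset in interface.list_user_datasets(username, groups):
                datasets.append({**dataset, 'datasetType': interface.dataset_type})
        return datasets


class ExampleSharingExtension(DatasetServiceInterface):
    """Hypothetical implementation, e.g. one a sharing module could register."""

    _SHARED = [
        {'label': 'sales', 'sharedWith': 'team-a'},
        {'label': 'hr', 'sharedWith': 'team-b'},
    ]

    @property
    def dataset_type(self) -> str:
        return 'S3'  # assumed value, for illustration only

    def list_user_datasets(self, username: str, groups: List[str]) -> List[Dict[str, str]]:
        # Only datasets shared with one of the user's groups are returned.
        return [d for d in self._SHARED if d['sharedWith'] in groups]


DatasetListService.register(ExampleSharingExtension())
datasets = DatasetListService.list_all_user_datasets('alice', ['team-a'])
```

The point of the design is that the list service never imports a specific dataset module: each module registers its own interface, and the generic code only relies on the shared contract.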

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use standard, proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

dlpzx added 30 commits May 6, 2024 14:33
…t-model-refactoring-2' into feat/generic-dataset-model-refactoring-3
…eric-dataset-model-refactoring-3

# Conflicts:
#	backend/dataall/modules/dataset_sharing/services/dataset_sharing_service.py
#	backend/dataall/modules/s3_datasets/api/dataset/resolvers.py
#	backend/dataall/modules/s3_datasets/db/dataset_models.py
#	backend/dataall/modules/s3_datasets/services/dataset_service.py
#	backend/dataall/modules/s3_datasets/services/dataset_table_service.py
@dlpzx
Contributor Author

dlpzx commented May 17, 2024

Testing locally:

  • list Datasets view shows list of Datasets - for owned datasets
  • list Datasets view shows list of Datasets - for dataset stewards
  • list Datasets view shows list of Datasets - for shared datasets

@dlpzx dlpzx marked this pull request as ready for review May 17, 2024 06:47
…oring-5

# Conflicts:
#	backend/dataall/modules/dataset_sharing/db/share_object_repositories.py
#	backend/dataall/modules/dataset_sharing/services/dataset_sharing_service.py
#	backend/dataall/modules/datasets_base/db/dataset_repositories.py
#	backend/dataall/modules/s3_datasets/db/dataset_repositories.py
@dlpzx dlpzx requested a review from petrkalos May 21, 2024 05:32
Contributor

@petrkalos petrkalos left a comment

Approving, but I'd appreciate it if you could add some type hints on the non-trivial arguments.

cls._interfaces.append(interface)

@classmethod
def _list_all_user_interface_datasets(cls, session, username, groups) -> List:

Does this return List[Query]?

).to_dict()

@staticmethod
def _query_all_user_datasets(session, username, groups, all_subqueries, filter) -> Query:

I assume all_subqueries is of type List[Query]?
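
If the answer to both review questions is yes, the requested type hints could look roughly like the sketch below. This is an illustration, not the merged code: the class name `DatasetListRepository`, the `List[str]` type for `groups`, and the `Optional[Dict[str, Any]]` type for `filter` are assumptions, and the method bodies are omitted. The SQLAlchemy types are imported only for type checkers, so the snippet runs even where SQLAlchemy is not installed.

```python
from typing import TYPE_CHECKING, Any, Dict, List, Optional

if TYPE_CHECKING:
    # Seen only by static type checkers; never imported at runtime.
    from sqlalchemy.orm import Query, Session


class DatasetListRepository:
    """Hypothetical container for the two methods quoted in the review."""

    @classmethod
    def _list_all_user_interface_datasets(
        cls, session: 'Session', username: str, groups: List[str]
    ) -> List['Query']:
        ...  # body omitted; assumed to return one query per registered interface

    @staticmethod
    def _query_all_user_datasets(
        session: 'Session',
        username: str,
        groups: List[str],
        all_subqueries: List['Query'],
        filter: Optional[Dict[str, Any]],  # assumed to be a dict of filter options
    ) -> 'Query':
        ...  # body omitted
```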

@dlpzx dlpzx merged commit 4acd904 into main May 21, 2024
9 checks passed
@dlpzx dlpzx deleted the feat/generic-dataset-model-refactoring-5 branch May 22, 2024 06:56
2 participants