
Migration between deployments/Export project functionality #1780

Open
matusdrobuliak66 opened this issue May 14, 2024 · 2 comments

Labels
PO issue Created by Product owners


matusdrobuliak66 commented May 14, 2024

Based on the working group ITISFoundation/osparc-ops-environments#672, we decided to investigate these three options:


(1) Importing from target deployment

Using an ad-hoc GUI, the user can import their projects from another deployment.

Prerequisites:

  • user must have an account in both source and destination deployments
  • user must authenticate with their source credentials inside the destination deployment (this generates tokens for the purpose of importing projects)

Changes to oSPARC:

  • create endpoint for authenticating the user in another deployment
  • create endpoint for listing projects available to the user (maybe we can reuse something?)
  • create endpoint to start a copy (lock project): provides "project data" + "tokens to copy data from s3"
  • submit a job that "imports" the project: first sync the data, then insert the project in the DB; if it fails, remove the data
  • create endpoint to signal that the copy operation is done (unlocks project); a rough sketch of these endpoints follows below
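
To make the list above more concrete, here is a rough sketch of what these endpoints could look like, assuming a FastAPI-style router (as used in osparc-simcore); the route paths, models and helpers are hypothetical placeholders, not the actual API.

```python
# Hypothetical sketch of the option (1) endpoints; route paths, models and helpers
# are illustrative placeholders, not the actual osparc-simcore API.
from uuid import UUID

from fastapi import APIRouter, BackgroundTasks, Header
from pydantic import BaseModel

router = APIRouter(prefix="/v0/deployment-import")


class SourceCredentials(BaseModel):
    source_deployment_url: str
    email: str
    password: str


class ImportTokens(BaseModel):
    api_token: str  # short-lived token to call the source deployment's API
    s3_token: str   # scoped token to copy the project data from the source S3


async def _run_import_job(project_uuid: UUID, api_token: str) -> None:
    # 1. lock the project in the source deployment, 2. sync the S3 data,
    # 3. insert the project row in the DB, 4. on failure remove the copied data,
    # 5. signal the source that the copy is done (unlock)
    raise NotImplementedError


@router.post("/auth", response_model=ImportTokens)
async def authenticate_in_source(credentials: SourceCredentials) -> ImportTokens:
    """Authenticate the user against the *source* deployment and return import tokens."""
    raise NotImplementedError  # would call the source deployment's login endpoint


@router.get("/projects")
async def list_remote_projects(x_import_token: str = Header(...)) -> list[dict]:
    """List the projects the user could import from the source deployment."""
    raise NotImplementedError  # would proxy the source deployment's project listing


@router.post("/projects/{project_uuid}:import", status_code=202)
async def start_import(
    project_uuid: UUID, background: BackgroundTasks, x_import_token: str = Header(...)
) -> dict:
    """Start the copy: lock the source project and submit the background import job."""
    background.add_task(_run_import_job, project_uuid, x_import_token)
    return {"job_id": str(project_uuid)}
```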

PROS:

  • not very complex: we rely on already existing tools and just add a few new API endpoints
  • potentially can be used internally to make a copy of an existing project (target the same deployment)
  • avoids creating a "data model" for exporting and importing user data by using rclone to copy S3 to S3

CONS:

  • user does not get access to their data (they can only move it from deployment A to deployment B)

(2) Archiving

Generate an archive containing project data and data stored in all nodes.

Prerequisites:

  • user must have an account in both source and destination deployments
  • user must have enough disk space to download the archive to their computer

Changes to oSPARC:

  • create endpoint for starting the export procedure
  • background job that creates the archive:
    • download the files and put them in an archive, eventually compressing them, + package the data model for the project
    • upload the archive to S3 (with an expiration)
    • notify the user (via email?) that the archive is available for download
  • solid upload process that is able to resume (requires backend/FE coordination); see the chunked-upload sketch below
    • split the file into chunks
    • retry if a chunk fails to upload
    • put the chunks together into a single file
  • import process (once the file is available, start the import)
    • check archive validity (nobody tampered with it)
    • extract the data from the archive and upload it to S3 (rollback on error)
    • insert the project in the DB
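
As a rough illustration of the "solid upload process" point, a resumable chunked upload could look like the sketch below; the presigned-URL-per-chunk scheme and the chunk size are assumptions, not existing osparc code.

```python
# Sketch only: resumable chunked upload of the exported archive.
# The presigned-URL helpers and the chunk size are illustrative assumptions.
from pathlib import Path

import requests

CHUNK_SIZE = 50 * 1024 * 1024  # 50 MiB per chunk (assumption)
MAX_RETRIES = 3


def upload_archive(archive: Path, presigned_urls: list[str], completed: set[int]) -> set[int]:
    """Upload `archive` chunk by chunk; `completed` holds indices already confirmed
    by the backend, so a restarted client resumes instead of re-uploading everything."""
    with archive.open("rb") as fh:
        for index, url in enumerate(presigned_urls):
            if index in completed:
                continue  # resume: skip chunks the backend already has
            fh.seek(index * CHUNK_SIZE)
            chunk = fh.read(CHUNK_SIZE)
            for attempt in range(1, MAX_RETRIES + 1):
                try:
                    response = requests.put(url, data=chunk, timeout=60)
                    response.raise_for_status()
                    completed.add(index)
                    break  # chunk done, move on to the next one
                except requests.RequestException:
                    if attempt == MAX_RETRIES:
                        raise  # give up; calling upload_archive again resumes
    # the backend would now be asked to stitch the chunks into a single file
    return completed
```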

PROS:

  • user has a physical copy of the data; by opening the archive they could extract a single file

CONS:

  • requires a third-party computer (the user's) to download and re-upload the archive
  • adds two extra steps compared to solution (1): archive creation and archive extraction
  • requires more moving parts:
    • links that expire
    • archive management: import + export
    • there is one extra job queue (for exporting)

(3) Migration

The idea here is to migrate one deployment to another.

  • migrate S3 data
  • Database migration (issues with autogenerated integer primary/foreign keys) - potential solutions (a sketch follows after the table list below):
    • Change the primary keys to randomly generated string IDs.
    • Retain integer keys but artificially increase the integers by a large number (an offset).
    • Change int to string and add a prefix (a different prefix per deployment).
    • Almost all tables:
      • clusters, cluster_to_groups
      • comp_runs
      • comp_tasks
      • folders, workspaces
      • groups + all resources access rights
      • payments
      • resource tracker
      • pricing plans / units / costs
      • users
      • ...
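
For illustration, the two key-rewriting options could look roughly like this, assuming SQLAlchemy against PostgreSQL; table and column names are placeholders, and every foreign key needs the offset of the table it references.

```python
# Illustrative only: two ways to avoid primary-key collisions when copying rows
# from the source DB into the destination DB. Table/column names are placeholders.
import secrets

import sqlalchemy as sa


def offset_for(dest_engine: sa.engine.Engine, table: str, pk: str = "id") -> int:
    """Offset option: shift source integer IDs by the destination's current maximum."""
    with dest_engine.connect() as conn:
        max_id = conn.execute(sa.text(f"SELECT COALESCE(MAX({pk}), 0) FROM {table}")).scalar_one()
    return int(max_id)


def remap_row(row: dict, pk_offset: int, fk_offsets: dict[str, int]) -> dict:
    """Apply the table's offset to the PK, and each referenced table's offset to its FK column."""
    remapped = dict(row)
    remapped["id"] = row["id"] + pk_offset
    for fk_column, fk_offset in fk_offsets.items():
        if remapped.get(fk_column) is not None:
            remapped[fk_column] = remapped[fk_column] + fk_offset
    return remapped


def prefixed_id(prefix: str) -> str:
    """Prefix option: Stripe-like string IDs, e.g. 'prj_osparc_9f3c2a...'; the prefix
    encodes the source deployment so IDs can never collide across deployments."""
    return f"{prefix}_{secrets.token_hex(12)}"
```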

PROS:

  • We will not face issues with migration between deployments in the future.

CONS:

  • It is a one-time effort for a full migration between deployments (not a user-facing feature, as in the previous cases)

@matusdrobuliak66 matusdrobuliak66 added this to the Leeroy Jenkins milestone May 14, 2024
@sanderegg sanderegg removed this from the South Island Iced Tea milestone Jul 8, 2024
@sanderegg sanderegg added this to the Eisbock milestone Aug 13, 2024
@sanderegg sanderegg removed this from the Eisbock milestone Sep 13, 2024

pcrespov commented Sep 30, 2024

Brainstorming on Sep.27, 2024

There was no consensus on a clear preference for any of the proposed solutions above. Below are some notes from the discussion.

Data Migration from Source to Destination Database

When migrating data between databases, especially PostgreSQL tables with identifiers and relationships, it’s important to go beyond just viewing it as a transfer of data rows. The semantics of the data (i.e., the meaning of the entities and their relationships) must also be considered. Still, some of the key challenges can already be identified, particularly around merging data that exists in both the source and destination databases:

Key Challenges:

  1. Integer Identifiers:

    • Apply an offset to the source table IDs by adding the maximum ID value from the destination table to avoid conflicts.
    • While it’s not mandatory, switching to more unique, descriptive identifiers (similar to Stripe-like IDs such as name_1456123456asdfa45) would be preferable.
  2. Merging Existing Resources (e.g., Users, Products):

    • Users: Handle records where users have the same email address in both source and destination databases (see the sketch after this list).
    • Products: Manage cases where products share the same product name across both databases.
    • Group 1: Identify and handle additional resource overlaps.
  3. Maintaining Dependencies (e.g., Groups):

    • To preserve data integrity, ensure that related records (e.g., groups) are inserted in the correct order during migration. This guarantees that dependencies are maintained.
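
As an illustration of challenge 2, a user merge keyed on email could look roughly like the sketch below; the users table layout is an assumption, not the real schema.

```python
# Sketch: build a mapping source_user_id -> destination_user_id so that rows
# referencing users can be rewritten during migration. Schema names are assumed.
import sqlalchemy as sa


def build_user_id_map(src_engine: sa.engine.Engine, dst_engine: sa.engine.Engine) -> dict[int, int]:
    with src_engine.connect() as src, dst_engine.connect() as dst:
        src_users = src.execute(sa.text("SELECT id, email FROM users")).all()
        dst_by_email = {
            email: user_id
            for user_id, email in dst.execute(sa.text("SELECT id, email FROM users")).all()
        }

    id_map: dict[int, int] = {}
    for src_id, email in src_users:
        if email in dst_by_email:
            # same email on both sides: merge onto the existing destination user
            id_map[src_id] = dst_by_email[email]
        else:
            # no counterpart: the user (and its primary group) must be inserted first,
            # in dependency order, before any row that references it
            id_map[src_id] = -1  # placeholder until the new destination id is known
    return id_map
```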

A Semantic Approach to Migration

Considering the database's structure and meaning, a more strategic approach is to break the migration into stages based on different contexts. This allows for grouping related tables and migrating them together, either manually or automatically; one possible staged ordering is sketched after the list of contexts below.

Identified Contexts:

  1. Platform Configurations:

    • Clusters
    • Products
    • Product Prices
    • (...)
  2. Users:

    • Users
    • Wallets
    • User Preferences (Frontend)
    • (Additional user-related tables)
  3. Services:

    • Service Metadata
    • Service Access Rights
    • (Additional service-related tables)
  4. Studies (Projects + Data):

    • Projects
    • Folders
    • File Metadata
    • (...)
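
One possible way to encode this staged, context-based ordering is a plain ordered configuration that the migration tooling walks through; the table names below only approximate the groups listed above and are not exhaustive.

```python
# Illustrative ordering of migration stages; each stage is migrated (and validated)
# before the next one starts, so dependencies are always present in the destination.
MIGRATION_STAGES: list[tuple[str, list[str]]] = [
    ("platform_configuration", ["clusters", "products", "products_prices"]),
    ("users", ["users", "groups", "user_to_groups", "wallets", "user_preferences_frontend"]),
    ("services", ["services_meta_data", "services_access_rights"]),
    ("studies", ["projects", "folders", "file_meta_data"]),
]


def run_migration(migrate_table, validate_stage) -> None:
    """`migrate_table` and `validate_stage` are callables supplied by the migration tooling."""
    for stage_name, tables in MIGRATION_STAGES:
        for table in tables:
            migrate_table(table)
        validate_stage(stage_name)  # data-integrity check before moving on
```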

Migration Process Requirements

  1. Data Integrity Checks:

    • Every step of the migration process must include validation checks to ensure data integrity, preventing corruption or data loss.
  2. Checkpoints for Rollback:

    • Implement checkpoints at various stages of the migration to allow for reversion in case a data integrity check fails, ensuring a safe fallback (see the sketch below).
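
A minimal sketch of requirement 2, assuming the migration writes through SQLAlchemy: each stage runs inside a nested transaction (SAVEPOINT) and is rolled back if its integrity check fails. The check itself is a placeholder.

```python
# Sketch: per-stage checkpoint with rollback on a failed integrity check.
# `check_fn` returns True if the destination data is consistent after the stage.
from contextlib import contextmanager

import sqlalchemy as sa


class IntegrityCheckFailed(RuntimeError):
    pass


@contextmanager
def checkpoint(connection: sa.engine.Connection, check_fn):
    """Run one migration stage inside a nested transaction (SAVEPOINT) and revert it
    if the integrity check fails, so earlier stages remain a safe fallback."""
    nested = connection.begin_nested()
    try:
        yield
        if not check_fn(connection):
            raise IntegrityCheckFailed("stage integrity check failed")
        nested.commit()
    except Exception:
        nested.rollback()
        raise
```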

Features

Even though this process will mostly be carried out once and in the backend, it could provide significant value if the ability to import/export studies were also made available as a standalone feature for users.

@odeimaiz odeimaiz transferred this issue from ITISFoundation/osparc-simcore Nov 25, 2024
@odeimaiz odeimaiz added the PO issue Created by Product owners label Nov 25, 2024

matusdrobuliak66 commented Jan 8, 2025

Discussion on Jan.08, 2025

Dustin and Sylvain R. had a discussion, during which it was concluded that an export/import project functionality might be needed to provide users with the option to migrate from TIP in-house to TIP in the cloud. This functionality is a prerequisite for shutting down the TIP in-house deployment.

My Takeaways from Discussion and Proposed Action Plan

  • We create export and import endpoints
    • For user resource
    • For project resource
  • Export will produce a JSON-type metadata file "artifact" containing all the important information. For example, for a project:
    • Project info -> Services info (including potentially AWS presigned download links for S3 data download); a sketch of such an artifact follows below
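
For concreteness, such an artifact could be shaped roughly as below (written as a Python structure); all field names, the service keys and the presigned links are assumptions, not a finalized format.

```python
# Hypothetical shape of the exported project "artifact"; field names are illustrative.
EXAMPLE_PROJECT_ARTIFACT = {
    "artifact_version": "1.0",
    "exported_from": "source-deployment.example.com",  # assumption
    "project": {
        "uuid": "00000000-0000-0000-0000-000000000001",  # re-generated on import
        "name": "My study",
        "description": "example study exported for migration",
        "workbench": {  # one entry per node/service in the pipeline
            "node-uuid-1": {
                "key": "simcore/services/dynamic/jupyter-math",  # example service key
                "version": "2.0.8",
                "inputs": {},
                "outputs": {},
                "data": [
                    {
                        "path": "node-uuid-1/output.dat",
                        "size": 12345,
                        "presigned_download_link": "https://s3.example.com/...",  # optional, expiring
                    }
                ],
            }
        },
    },
}
```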

Project import

  • The import endpoint will receive the metadata information
  • An asynchronous task will start, which might return some job ID (we might create a special table for this use case)
  • The import implementation will reuse as much existing functionality as possible, e.g.:
    • creation/update of the project
    • creation/update of the project node
  • It will create new resource IDs on the fly during the import (see the sketch below)
  • It will also trigger some AWS service (outside of the simcore docker stack) that will take care of downloading or moving the S3 data to the right location
  • When the import is finished, the user will be notified, for example by email.
  • To keep the process as simple and minimalistic as possible, we will store only the essential data required to upload the project. This will be clearly and transparently explained to the user. For example: all tags will be lost, all sharing settings will be removed, and even workspace and folder paths may be discarded. Essentially, the project will always appear in the root folder within the private workspace. History of service runs will be lost.
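
A small sketch of the "new resource IDs on the fly" point: the project and every node get fresh UUIDs on import, and links inside the workbench are rewritten accordingly. The workbench layout mirrors the artifact sketch above and is an assumption.

```python
# Sketch: regenerate project/node UUIDs during import and rewrite references.
import copy
import uuid


def remap_project_ids(exported_project: dict) -> dict:
    project = copy.deepcopy(exported_project)
    project["uuid"] = str(uuid.uuid4())  # fresh project UUID in the destination

    old_to_new = {old: str(uuid.uuid4()) for old in project.get("workbench", {})}

    new_workbench = {}
    for old_id, node in project.get("workbench", {}).items():
        # node inputs may point at other nodes by their old UUIDs; rewrite those links
        for _port, value in list(node.get("inputs", {}).items()):
            if isinstance(value, dict) and value.get("nodeUuid") in old_to_new:
                value["nodeUuid"] = old_to_new[value["nodeUuid"]]
        new_workbench[old_to_new[old_id]] = node
    project["workbench"] = new_workbench
    return project
```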

User import/export

  • Similar logic can be used to build a user import endpoint, which will be utilized exclusively by admins. For example, it can be executed in a loop to migrate users from one deployment to another (a sketch follows below).
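
Such an admin-driven loop could look roughly like this, assuming hypothetical /v0/admin/users:export and /v0/admin/users:import endpoints, example deployment URLs, and API keys for both deployments:

```python
# Sketch only: admin script looping over exported users and importing them into
# the destination deployment. Endpoints, URLs and headers are hypothetical.
import httpx

SOURCE_URL = "https://tip-inhouse.example.com"   # assumption
DESTINATION_URL = "https://tip-cloud.example.com"  # assumption


def migrate_users(source_api_key: str, destination_api_key: str) -> None:
    with httpx.Client(timeout=30) as client:
        resp = client.get(
            f"{SOURCE_URL}/v0/admin/users:export",
            headers={"X-Api-Key": source_api_key},
        )
        resp.raise_for_status()

        for user_artifact in resp.json():
            imported = client.post(
                f"{DESTINATION_URL}/v0/admin/users:import",
                headers={"X-Api-Key": destination_api_key},
                json=user_artifact,
            )
            imported.raise_for_status()
```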

Services

  • We will conduct a manual analysis of the differences between the services in the two deployments. Based on this analysis, we will decide which services to support. These services will then be migrated manually or with the help of custom scripts.

Product / Platform

  • These will be manually set up in the new deployment.

Action

One of the first steps we can take is to start writing unit tests that utilize the existing creation and update functionality in the code. These tests will create a project with multiple services using newly generated IDs. The input to the tests will resemble the metadata JSON, which aligns with the export functionality we aim to implement (a sketch follows below).
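
A sketch of such a test, with a stand-in for the creation/update functionality it would actually exercise; the metadata fixture mirrors the export artifact discussed above, and the service keys are only examples.

```python
# Sketch of the proposed test: build a project with several services from
# export-like metadata JSON and check that all IDs are newly generated.
# `create_project_from_metadata` is a stand-in for the existing creation/update code.
import uuid

import pytest


def create_project_from_metadata(metadata: dict) -> dict:
    """Stand-in: the real test would call the existing project creation/update functions."""
    return {
        "uuid": str(uuid.uuid4()),
        "workbench": {str(uuid.uuid4()): node for node in metadata["workbench"].values()},
    }


@pytest.fixture
def project_metadata() -> dict:
    # shaped like the export "artifact" metadata discussed above (field names assumed)
    return {
        "name": "exported study",
        "workbench": {
            str(uuid.uuid4()): {"key": "simcore/services/comp/itis/sleeper", "version": "2.0.2"},
            str(uuid.uuid4()): {"key": "simcore/services/dynamic/jupyter-math", "version": "2.0.8"},
        },
    }


def test_import_creates_project_with_new_ids(project_metadata: dict):
    imported = create_project_from_metadata(project_metadata)

    assert imported["uuid"]  # a fresh project UUID was generated
    # none of the original node IDs survive; their count does
    assert set(imported["workbench"]).isdisjoint(project_metadata["workbench"])
    assert len(imported["workbench"]) == len(project_metadata["workbench"])
```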
