This repository has been archived by the owner on Nov 18, 2020. It is now read-only.

Sufia 6.x to Sufia 7.2 migration PCDM

cam156 edited this page Jun 21, 2017 · 1 revision

Data migration plan

  1. Collection Becomes Collection

  2. Make sure Encoding in metadata works during the migration

  3. GenericFile Becomes GenericWork & FileSet

  4. GenericWork.id = GenericFile.id

  5. Versions: Hector's code handles this (project_hydra/sufia/import_s6), but the order may be incorrect

    • Make sure to keep the updated and created dates
    • import_current_version in the import_s6 branch should use the actor stack
  6. Permissions: Hector's code handles this (project_hydra/sufia/import_s6), but the order may be incorrect

  7. Thumbnails: we plan to throw away the existing thumbnails and let Sufia 7 regenerate them

  8. Migrate related URLs if they point back into ScholarSphere. Do we even need this?

    1. Yes, migrate the URLs forward.
  9. Batch Becomes the hasRelatedWork predicate on GenericWork (needs project_hydra/sufia/#1711)

  10. Should use DCE:relation, not hasRelatedWork

  11. Activity (in Redis): need to migrate from the work ID to the new FileSet ID

  12. Deposit messages can change from GenericFile to GenericWork in the Redis key

  13. Attached messages need to move from the GenericFile ID to the FileSet ID

    • Should we create the correct FileSet message, or should we just put these on the work?
    • The actor stack might create additional Redis activity, but with incorrect dates
  14. Work created: at the work level

  15. Upload: at the FileSet level

  16. Update of new files would appear as activity on both the work and the new FileSet

  17. Audit logs (in MySQL): need to migrate from the work ID to the new FileSet ID

  18. Featured works (in MySQL): database migration

  19. Analytics (in MySQL): account for views of a work, and views and downloads of a FileSet

    1. May need to start with the work/FileSet views being a copy of the GenericFile views/downloads

    2. The work's downloads should be the sum of the downloads of its files.

    3. The file would have both downloads and views.
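The analytics rules above can be sketched as a small data transformation. This is a minimal, hypothetical sketch, not Sufia code: it assumes each GenericFile becomes exactly one GenericWork (keeping the GenericFile ID) with one new FileSet, and the names GenericFileStats, MigratedStats, work_downloads, and migrate_stats are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class GenericFileStats:
    """Hypothetical Sufia 6 usage counts for one GenericFile."""
    generic_file_id: str
    views: int
    downloads: int


@dataclass
class MigratedStats:
    """Hypothetical Sufia 7 usage counts for the resulting work and FileSet."""
    work_id: str
    fileset_id: str
    work_views: int
    work_downloads: int
    fileset_views: int
    fileset_downloads: int


def work_downloads(fileset_downloads: list[int]) -> int:
    """A work's downloads are the sum of the downloads of its file sets."""
    return sum(fileset_downloads)


def migrate_stats(old: GenericFileStats, fileset_id: str) -> MigratedStats:
    """Seed work and FileSet stats as copies of the GenericFile stats.

    The work keeps the GenericFile ID (GenericWork.id = GenericFile.id);
    the new FileSet ID is supplied by the caller.
    """
    return MigratedStats(
        work_id=old.generic_file_id,
        fileset_id=fileset_id,
        work_views=old.views,
        # Only one FileSet exists right after migration, so the sum is its count.
        work_downloads=work_downloads([old.downloads]),
        fileset_views=old.views,
        fileset_downloads=old.downloads,
    )
```

Right after migration the work and its single FileSet carry the same counts; keeping work_downloads as a separate sum means the same rule still holds if more file sets are added to the work later.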

Code change

  1. Keep existing URLs valid: alias /files to /concern/generic_works
  2. Resource type is not a base term
  3. A shared file directory between the web application machines is needed for thumbnails
  4. Existing code from Hector needs to be reworked for the latest versions of S6 and S7:
  5. https://github.com/projecthydra/sufia/tree/import_s6
  6. https://github.com/psu-stewardship/scholarsphere/tree/export_s6
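In the application this alias would be a Rails route. The Python sketch below only illustrates the path mapping, assuming legacy pages live at /files/<id> and that the ID carries over unchanged because GenericWork.id = GenericFile.id. The function name alias_legacy_path is hypothetical.

```python
import re

# Assumption: legacy Sufia 6 paths look like /files/<id>, optionally followed
# by a sub-path, query string, or fragment, which are carried over as-is.
_LEGACY = re.compile(r"^/files/([^/?#]+)(.*)$")


def alias_legacy_path(path: str) -> str:
    """Rewrite /files/<id>... to /concern/generic_works/<id>...

    Paths that do not match the legacy pattern are returned unchanged.
    """
    match = _LEGACY.match(path)
    if match is None:
        return path
    work_id, rest = match.groups()
    return f"/concern/generic_works/{work_id}{rest}"
```

For example, alias_legacy_path("/files/abc123") returns "/concern/generic_works/abc123", while "/collections/xyz" passes through untouched. Whether sub-paths such as /edit should also be forwarded is a decision for the real route.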

Upgrades to server

  1. Need Fedora 4.5.0
  2. Need Solr 5.5.1 or ETDa version
  3. Version of Redis? 2.6?

Plan for migration and verification

Below are the steps to audit the Fedora 4 to Sufia 7 (PCDM) migration. The basic workflow is as follows:

  1. Gather the list of all objects (Collections, GenericFiles, and the Batches to be mapped) in the Fedora 4 repo and store them in a MySQL table.
  2. Migrate the data, following the data migration plan.
  3. Loop through all the Fedora 4 objects in the MySQL table and make sure that (a) they exist in the migrated Fedora 4 repo and (b) their models match the mapped models (Collection = Collection; GenericFile = GenericWork, with related works, and FileSet).

You can perform steps 1 and 2 in either order. The only requirement for step 1 is access (i.e., the URL) to a running Fedora 4 repo with the original data.
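Step 3 of the audit can be sketched as a loop over the stored object list. This is a minimal sketch under stated assumptions: sqlite3 stands in for the MySQL table, the table name original_objects is hypothetical, and fetch is a hypothetical lookup into the migrated repository (in practice it would query Fedora or Solr). It checks existence and the model mapping only; verifying the related-work links is left out.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MigratedObject:
    """Hypothetical summary of an object found in the migrated repository."""
    model: str
    member_models: list[str] = field(default_factory=list)


# Expected Sufia 7 model (and required member model, if any) per Sufia 6 model.
# Batches are omitted: they become relationships on works, not objects.
EXPECTED: dict[str, tuple[str, Optional[str]]] = {
    "Collection": ("Collection", None),
    "GenericFile": ("GenericWork", "FileSet"),
}


def audit(
    conn: sqlite3.Connection,
    fetch: Callable[[str], Optional[MigratedObject]],
) -> list[tuple[str, str]]:
    """Return (id, problem) pairs for every original object that fails a check."""
    problems: list[tuple[str, str]] = []
    rows = conn.execute("SELECT id, model FROM original_objects ORDER BY id")
    for object_id, old_model in rows:
        migrated = fetch(object_id)
        # (a) the object must exist in the migrated repository
        if migrated is None:
            problems.append((object_id, "missing"))
            continue
        # (b) its model must match the mapping
        if old_model not in EXPECTED:
            problems.append((object_id, f"unmapped model {old_model}"))
            continue
        expected_model, expected_member = EXPECTED[old_model]
        if migrated.model != expected_model:
            problems.append((object_id, f"expected {expected_model}, found {migrated.model}"))
        elif expected_member is not None and expected_member not in migrated.member_models:
            problems.append((object_id, f"no {expected_member} member"))
    return problems


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE original_objects (id TEXT PRIMARY KEY, model TEXT)")
    conn.executemany(
        "INSERT INTO original_objects VALUES (?, ?)",
        [("c1", "Collection"), ("f1", "GenericFile"), ("f2", "GenericFile")],
    )
    migrated = {
        "c1": MigratedObject("Collection"),
        "f1": MigratedObject("GenericWork", ["FileSet"]),
    }
    print(audit(conn, migrated.get))  # [('f2', 'missing')]
```

Passing the lookup in as a function keeps the audit logic independent of how the migrated repository is queried, so the same loop can run against QA, staging, or production.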

Migration Commands

Tasks prior to production deployment (TODO: these are copied from last time)

  1. Sync data to QA and staging
  2. Run Migration on QA
  3. Regression Testing QA
  4. Bug Fixes
  5. Release Notification on Production
  6. Create user email list
  7. Staging migration and process documentation
  8. ITS alert
  9. Open the firewall so QA is available to staging and production (repos)
  10. Add the migration banner to the release branch of master
  11. Ensure all linked files are present and configured correctly