Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure export import module #3052

Merged

Conversation

CasperWA
Copy link
Contributor

@CasperWA CasperWA commented Jun 19, 2019

Fixes #2702

Reasoning behind and overview of the split ups

In general, this PR is only "chopping up" the big import/export files, not editing them (except to make the linter happy).

Everything (except tests) has been moved to aiida/tools/importexport/*.
Tests have been moved to aiida/backends/tests/tools/importexport/*.

Import/export module (aiida/orm/importexport.py)

Changes: A separate commit in this PR removes LINK_ENTITY_NAME and ATTRIBUTE_ENTITY_NAME, as well as their entries in metadata.json's dictionary all_fields_info that describes the schema for each entity.
This is because these were never actually used (now) and with the the JSONB fields there will be no reason for having attributes as a separate entity, at least.
For links, a separate dictionary exists in data.json that defines all links. Its creation is not reliant on LINK_ENTITY_NAME and the all_fields_info entry.

This file (aiida/orm/importexport.py) has been split up according to four regimes:

  • Export
  • Import
  • Utility
  • Config

Export

The function export and export_tree (where export_tree performs the actual export) are backend independent, hence they are placed in the "root" of sub-module dbexport.

Another function export_zip has been moved to a separate file zip.py under dbexport together with some special classes that should be moved elsewhere in aiida_core at a later stage.

Import

There is one "goto"-function import_data, which has the main purpose of determining the backend and rerouting to either import_data_dj or import_data_sqla for the django and sqlalchemy backends, respectively.

In order to underline this backend dependency, the functions import_data_[dj/sqla] have been placed under the sub-module dbimport/backends/[django/sqla], respectively.
import_data has been placed under dbimport.

Utility

The many utility functions have been shifted into several utility.py files and placed under their respective sub-modules, depending on where the functions are used.

Config

Most of the preparation/initialization in the beginning of the original import/export file has been put into a config.py file under the root of this module.
Most importantly, this is where one may find the definition of EXPORT_VERSION.

Tests for import/export (aiida/backends/tests/test_export_and_import.py)

All tests have been moved to aiida/backends/tests/tools/importexport/*.

This file has been prepped for this split up over the last months, by creating several test classes encompassing tests for a specific subject.
More specifically, several classes concern themselves with testing import/export of a single ORM entity. Others have instead focused on more abstract things, like creating complex provenance graphs and/or testing specific export/import options.

The main idea here was to simply split up the file by creating one file per class.

For the classes that focus on ORM entities, they have been placed the sub-module orm.

Addition

An additional test file (containing only one test) has been added: /orm/test_attributes.py.

This file is based on /orm/test_extras.py, but only has a single test, because attributes cannot be changed after storing a node. This means that there is no need to have special merging flags for attributes upon export/import as is the case for extras.

Export archive migration (aiida/cmdline/utils/migration/*)

The newly added export archive migration module is moved to this more appropriate place (aiida/tools/importexport/migration/*).

For the migration tests that were recently moved over from the aiida-export-migration-tests repo, the __init__.py file in aiida.backends.tests has been updated to reflect the migration files' new location, i.e., changing export_migration.* to tools.importexport.migration.*.
This means that when testing, one can hit all import/export tests (not including those related to verdi commands) by testing db.tools.importexport.

For reviewers

The first three commits (see below under Technical PR notes) do not alter any code, they only split it up according to the reasoning explained above and in the commit messages. Doing only minor changes to make pylint happy (the original files were skipped for checks in .pre-commit-config.yaml).

This means one can successfully review this PR by going through the changes introduced from the last two commits, concerning specifically the removal of LINK_ENTITY_NAME and ATTRIBUTE_ENTITY_NAME and the addition of the test file test_attributes.py under aiida/backends/tests/tools/importexport/orm/.
Lastly, a reviewer should check the setup of the new module aiida.tools.importexport., i.e., the construct of the __init__.py files, the choice of where to place import statement, and finally reflecting on the choices of the API, preferably backing it up with responses to the original reasoning behind the module structure.

Technical PR notes

Made it so that the commits should not be squashed.

First three commits relate to the actual move/restructuring (they could perhaps be squashed, but by keeping them separate it should be easier to go back and make changes, if needed):

  1. Move and restructure the actual export import module (from aiida.orm.importexport)
  2. Move and restructure the tests related to export and import (from aiida.backends.tests.test_export_and_import
  3. Move the export archive migration module (from aiida.cmdline.utils.migration)

The last two commits are attributed to:

  1. Adding a test for import of Node Attributes. Created from the tests for export/import of Node Extras
  2. "Cleaning" import/export by removing LINK_ENTITY_NAME and ATTRIBUTE_ENTITY_NAME, since they were never used

NB! LINK_ENTITY_NAME was used once during import to provide the user with information on what links have been created. Here 'Link' has been hardcoded instead.

@CasperWA
Copy link
Contributor Author

Hmm, my commits got jumbled up when making the PR - weird.

Will edit the top post to match commits.

@coveralls
Copy link

coveralls commented Jun 19, 2019

Coverage Status

Coverage increased (+0.04%) to 74.126% when pulling 3c1d52e on CasperWA:restructure_export_import_module into eebef39 on aiidateam:develop.

@CasperWA CasperWA force-pushed the restructure_export_import_module branch 2 times, most recently from e0516bf to 35ee803 Compare June 20, 2019 09:59
@CasperWA
Copy link
Contributor Author

Hmm, my commits got jumbled up when making the PR - weird.

Yay! It now seems the commits match what I have (always) had in my rebase edit overview:

Screenshot from 2019-06-20 12-01-14

@CasperWA CasperWA changed the title Restructure export import module [WIP] Restructure export import module Jun 20, 2019
@CasperWA
Copy link
Contributor Author

CasperWA commented Jun 20, 2019

This PR is currently on hold until:

@CasperWA CasperWA force-pushed the restructure_export_import_module branch from 35ee803 to 7b25ca8 Compare June 21, 2019 10:44
@CasperWA
Copy link
Contributor Author

Just a note concerning the commit jumblings - now it switched the last two commits, for some reason. For these it is irrelevant, but it is still a weird phenomenon (difference between list of how the commits are rebased and how they are listed on GitHub).

@CasperWA CasperWA force-pushed the restructure_export_import_module branch from 7b25ca8 to f478290 Compare June 21, 2019 11:17
@ltalirz
Copy link
Member

ltalirz commented Jun 21, 2019

@CasperWA First of all a HUGE thank you for hacking this beast of a file to bits. Can't wait to see this merged.

Second: phew, 52 files changed (as to be expected). Since this is a significant effort to review, may I suggest you adapt the description of the PR with

  • some more detail of how you've restructured files
  • a section on how to proceed with the review (potentially pointing out bits that can be reviewed separately, which would also allow to spread the work over multiple people)

@CasperWA
Copy link
Contributor Author

  • some more detail of how you've restructured files

Most of the relevant reasoning of the restructuring can be found in the respective commit messages.
However, I will edit the PR description to highlight the major things.

@CasperWA
Copy link
Contributor Author

@ltalirz, there you go 😃

@CasperWA
Copy link
Contributor Author

Consider moving imports to functions in order to optimize loading times, see PR #3072.

@CasperWA CasperWA force-pushed the restructure_export_import_module branch from f478290 to 33dffd8 Compare June 28, 2019 12:49
@CasperWA CasperWA changed the title [WIP] Restructure export import module Restructure export import module Jun 28, 2019
@CasperWA
Copy link
Contributor Author

With the JSONB project finished, and the changes addressed also here, this is now ready to be reviewed and merged :)

@CasperWA CasperWA force-pushed the restructure_export_import_module branch 3 times, most recently from 9354cf8 to 066a8d3 Compare June 28, 2019 14:53
@CasperWA CasperWA force-pushed the restructure_export_import_module branch from 066a8d3 to 02b4e5b Compare July 1, 2019 13:39
@CasperWA
Copy link
Contributor Author

CasperWA commented Jul 1, 2019

After discussion with @ltalirz, I have removed the dedicated sqla.py and django.py files, and simply put the import functions under the respective __init__.py files in each folder.
Furthermore, the content of .importexport.dbimport.backends's __init__.py file has been removed.

Most significantly, the import of the backend-specific import function is done within the general import_data proxy-function after the backend has been identified.

The commit messages have been changed accordingly.

@CasperWA CasperWA force-pushed the restructure_export_import_module branch from 02b4e5b to 3ec7b9a Compare July 1, 2019 14:14
@CasperWA CasperWA force-pushed the restructure_export_import_module branch from 3ec7b9a to 05091b1 Compare July 1, 2019 14:31
CasperWA added 5 commits July 1, 2019 16:38
Split up and then remove `aiida.orm.importexport`.

Moved to `aiida.tools.importexport`.
Split up into structure (relative to `aiida.tools.importexport`:
./dbexport/__init__.py
./dbexport/utils.py
./dbexport/zip.py
./dbimport/__init__.py
./dbimport/backends/utils.py
./dbimport/backends/django/__init__.py
./dbimport/backends/sqla/__init__.py
./dbimport/backends/sqla/utils.py
./config.py
./utils.py

Import/export is now considered a tool, and has nothing to do with the
specifics of AiiDA.
It should be in a state, where it would be easy to remove it and save
it in a separate repo.

Import with a django or sqlalchemy backend has been split up.
For export, this is not needed, since it is already backend-independent.
To further reflect the split of backends, the actual import functions
have been placed
under `aiida.tools.importexport.dbimport.backends.[django/sqla].`.
In the future, `dbimport` should reflect the situation from `dbexport`
and be backend independent, creating the imported nodes through the
new backend-independent ORM entities.

Export to a zip-folder/file has been split up in a separate file.
This should later be implemented in another place, since a ZipFolder
class exists here.
For example `aiida.common.folder`. See also issue aiidateam#3100.

Note: Be careful with wildcard imports in `__init__` files for backends
specific functionality.
Original file: `aiida.backends.tests.test_export_and_import.py`

Set up tests in `aiida.backends.tests.tools.importexport.`
matching the newly moved import/export module, which has been moved to
`aiida.tools.importexport.`.

Tests are split up into separate files according to the test classes in
the original file.

All tests related to ORM entities have been put under
`aiida.backends.tests.tools.importexport.orm.`.
While `aiida.tools.importexport.orm.` does not exist, this simplifies
the overview of the tests.

The tests were moved under `aiida.backends.tests.` and not
`aiida.tools.importexport.tests.` because we are currently looking into
changing to a pytest API, and it is therefore easier to do this later
change having all tests in a single place.
Furthermore, other tests, which are not backend dependent, are also
found here.
This is more-or-less for the same reason.
Moving module at `aiida.cmdline.utils.migration` to
`aiida.tools.importexport.migration`.
Update `verdi export migrate` to point at new location.
These are removed from the importexport module,
since they were/are never used.
@CasperWA CasperWA force-pushed the restructure_export_import_module branch from 05091b1 to 3c1d52e Compare July 1, 2019 14:38
Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @CasperWA

@giovannipizzi
Copy link
Member

@CasperWA before merging, maybe just check the order of the commits, they seem to be reverted again??

@CasperWA
Copy link
Contributor Author

CasperWA commented Jul 1, 2019

@CasperWA before merging, maybe just check the order of the commits, they seem to be reverted again??

Yeah, it's really strange. However, I will leave it up to the person merging to check that the commits are in the correct order, which is:

  1. 9dbb85f - Refactoring importexport module.
  2. 8b36757 - Split up and move importexport tests
  3. f8d79c3 - Move export archive migration
  4. 707bf6d - Add import/export test of Attributes
  5. 3c1d52e - Remove LINK_ENTITY_NAME and ATTRIBUTE_ENTITY_NAME

In other words, I don't know how to make them be in the correct order, since what I do is always the same (rebaseing and the force pushing). Without changing the order.

@giovannipizzi
Copy link
Member

If you click on a commit, you will see that the parent is the correct one, i.e. the one you intend (your last message).
The order on GitHub is "wrong" because your commits happen in a strange date order (you have a commit done in April that, from a point of view of the git graph, comes after one done in June).
Anyway, I will rebase and merge from GitHub and accept whatever will happen ;-)
Maybe in the future in the rebase you might want to decide what to do with the dates of commits you are reshuffling in order.

@giovannipizzi giovannipizzi merged commit 67bf881 into aiidateam:develop Jul 1, 2019
@CasperWA CasperWA deleted the restructure_export_import_module branch July 1, 2019 16:16
@CasperWA
Copy link
Contributor Author

CasperWA commented Jul 1, 2019

Merci @giovannipizzi ! 😄

@sphuber
Copy link
Contributor

sphuber commented Jul 1, 2019

Thanks a million @CasperWA for the restructuring and for reformatting the commits! 👍 much appreciated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restructure/-factor import/export module
5 participants