Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add unit tests around schema evolution #225

Merged
merged 8 commits into from
Jun 8, 2023

Conversation

rgraber
Copy link
Contributor

@rgraber rgraber commented May 24, 2023

Description:
Add unit tests to confirm that any schema updates are backward and forward compatible with stored versions. Also adds a management command to generate and store the current Avro schema for each signal.

Manual testing:
Test 1: Add required field "thing" to CourseEnrollmentData
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_backward_compatible - fastavro._read_common.SchemaResolutionError: No default value for thing

Test 2: Remove required field "creation_date" from CourseEnrollmentData
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_forward_compatible - fastavro._read_common.SchemaResolutionError: No default value for creation_date

Test 3: Change type of user_data from UserData to str in CourseEnrollmentData
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_backward_compatible - fastavro._read_common.SchemaResolutionError: Schema mismatch: {'type': 'record', 'name': 'org.openedx.learning.course.enrollment.created.v1.UserData', 'fields': [{'name': 'id', 'ty...
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_forward_compatible - fastavro._read_common.SchemaResolutionError: Schema mismatch: string is not {'type': 'record', 'name': 'org.openedx.learning.course.enrollment.created.v1.UserData', 'fields': [{'na...

Test 4: Add optional field thing to CourseEnrollmentData
All tests pass

Test 4: Remove optional field created_by from CourseEnrollmentData
All tests pass

ISSUE: edx/edx-arch-experiments#271
Reviewers:

  • tag reviewer
  • tag reviewer

Merge checklist:

  • All reviewers approved
  • CI build is green
  • Version bumped
  • Changelog record added
  • Documentation updated (not only docstrings)
  • Commits are squashed

Post merge:

  • Create a tag
  • Check new version is pushed to PyPI after tag-triggered build is
    finished.
  • Delete working branch (if not needed anymore)

Author concerns: List any concerns about this PR - inelegant
solutions, hacks, quick-and-dirty implementations, concerns about
migrations, etc.

@rgraber rgraber requested a review from a team as a code owner May 24, 2023 19:19
@rgraber rgraber marked this pull request as draft May 24, 2023 19:24
@rgraber rgraber marked this pull request as ready for review May 25, 2023 15:38
Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I wonder if we need a doc that explains the process for updating a schema, even if it is just what we know today? Could be docstring, as long as someone would know where to look, so we might just have a README.rst that points to the right docstring.
  • Can you do a couple of manual tests and list what you tested? E.g. making an incompatible change to the signal before and after updating the schemas. Making a compatible change, etc.
  • Also, I wonder if the management command should do some of this testing and complain if you are trying to make an incompatible change, unless you provide some "force" argument, in case you are first working on the event. Or I guess you could just delete the schema.

Thoughts to be discussed.

current_event_bytes = current_out.read()

# get stored schema
stored_schema = load_schema(f"{os.path.dirname(os.path.abspath(__file__))}/schemas/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the schema hasn't been generated, can we detect and give a simple message about how to resolve? Could we point to a doc (or docstring) that explains the process of updating schemas)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a full process for updating schemas so I'm hesitant to make much documentation about it. In any case, if there's no file, this would be an issue with how schemas are added as opposed to updated. I guess we can document that somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you even just add a comment above this line that if the schema is missing, they probably need to run the management command to generate a file for a new event?

@rgraber
Copy link
Contributor Author

rgraber commented May 25, 2023

  • I wonder if we need a doc that explains the process for updating a schema, even if it is just what we know today? Could be docstring, as long as someone would know where to look, so we might just have a README.rst that points to the right docstring.

I would put that in a separate documentation-only PR. Among other things, there will probably be some debate on where to put it.

* Can you do a couple of manual tests and list what you tested? E.g. making an incompatible change to the signal before and after updating the schemas. Making a compatible change, etc.

Done

* Also, I wonder if the management command should do some of this testing and complain if you are trying to make an incompatible change, unless you provide some "force" argument, in case you are first working on the event. Or I guess you could just delete the schema.

I thought we didn't really want people overwriting existing schemas regardless of whether or not the change is compatible, so we're always testing against the original version of the schema. That's also why we always have an "are you sure" prompt when overwriting.

Thoughts to be discussed.

@rgraber rgraber force-pushed the rsgraber/20230523-unit-test-schema-evolution branch from 1be4630 to 7abc233 Compare May 25, 2023 18:32
Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great. I'm excited about it. Note that there is a group considering consuming messages using node, and this Avro schemas will come in handy. :)

@robrap
Copy link
Contributor

robrap commented May 25, 2023

I thought we didn't really want people overwriting existing schemas regardless of whether or not the change is compatible, so we're always testing against the original version of the schema. That's also why we always have an "are you sure" prompt when overwriting.

Sounds reasonable. I made other PR comments to see if we could add this context to your code comments, but otherwise consider this thread resolved. Thanks.

@rgraber rgraber force-pushed the rsgraber/20230523-unit-test-schema-evolution branch from 5acde7f to b2301b3 Compare June 6, 2023 14:58
@rgraber rgraber requested a review from robrap June 7, 2023 15:55
Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor questions.

"org.openedx.content_authoring.course.certificate_config.deleted.v1",
]

def generate_test_data_for_schema(schema): # pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why # pragma: no cover is needed for this and the other method, when it seems like they are used in tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is code that is not called and is only meant as a safety hatch against weird data types in future events. It didn't seem worth it to add test coverage to a method that itself is only used for tests, but not adding the pragma did lower the coverage in a way that would cause warnings on future PRs (since the total coverage would be considered low).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You could add the pragma more specifically on the code that isn't actually covered, but this is non-blocking. It just raised questions for me. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it worthwhile to future-proof. Coverage shouldn't really care about this method at all, and if someone changes it they shouldn't have to go through the annoyance of figuring out which bits are and aren't covered again.

current_event_bytes = current_out.read()

# get stored schema
stored_schema = load_schema(f"{os.path.dirname(os.path.abspath(__file__))}/schemas/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts?

@rgraber rgraber requested a review from robrap June 8, 2023 15:16
Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

try:
old_schema = load_schema(schema_filename)
except SchemaRepositoryError: # pragma: no cover
self.fail(f"Missing file {schema_filename}. If a new signal has been added, you may need to run the"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. We know this is an expected situation, and it just makes the experience much nicer for adding new events.

@rgraber rgraber merged commit 3965caa into main Jun 8, 2023
@rgraber rgraber deleted the rsgraber/20230523-unit-test-schema-evolution branch June 8, 2023 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants