feat: add unit tests around schema evolution #225
Conversation
- I wonder if we need a doc that explains the process for updating a schema, even if it is just what we know today? Could be docstring, as long as someone would know where to look, so we might just have a README.rst that points to the right docstring.
- Can you do a couple of manual tests and list what you tested? E.g. making an incompatible change to the signal before and after updating the schemas. Making a compatible change, etc.
- Also, I wonder if the management command should do some of this testing and complain if you are trying to make an incompatible change, unless you provide some "force" argument, in case you are first working on the event. Or I guess you could just delete the schema.
Thoughts to be discussed.
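For discussion, here is a minimal sketch of the "force" guard idea: refuse to overwrite an existing stored schema unless the caller opts in. The function and parameter names (`write_schema_file`, `force`) are hypothetical, not from this repo.

```python
# Hypothetical sketch: a management command helper that treats an existing
# schema file as the committed contract and refuses to clobber it unless
# force=True (e.g. wired up to a --force CLI flag).
import os


def write_schema_file(path, schema_json, force=False):
    """Write schema_json to path; return True if written, False if skipped."""
    if os.path.exists(path) and not force:
        # Overwriting an existing schema is where incompatible changes can
        # sneak in, so make it an explicit opt-in.
        print(f"Refusing to overwrite {path}; pass force=True to override.")
        return False
    with open(path, "w") as f:
        f.write(schema_json)
    return True
```

The same spot would be a natural place to run the compatibility checks before allowing the overwrite.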
current_event_bytes = current_out.read()

# get stored schema
stored_schema = load_schema(f"{os.path.dirname(os.path.abspath(__file__))}/schemas/"
If the schema hasn't been generated, can we detect that and give a simple message about how to resolve it? Could we point to a doc (or docstring) that explains the process of updating schemas?
Thoughts?
We don't have a full process for updating schemas so I'm hesitant to make much documentation about it. In any case, if there's no file, this would be an issue with how schemas are added as opposed to updated. I guess we can document that somewhere.
Could you even just add a comment above this line that if the schema is missing, they probably need to run the management command to generate a file for a new event?
I would put that in a separate documentation-only PR. Among other things, there will probably be some debate on where to put it.
Done
I thought we didn't really want people overwriting existing schemas regardless of whether or not the change is compatible, so we're always testing against the original version of the schema. That's also why we always have an "are you sure" prompt when overwriting.
Force-pushed from 1be4630 to 7abc233
This is really great. I'm excited about it. Note that there is a group considering consuming messages using node, and these Avro schemas will come in handy. :)
Sounds reasonable. I made other PR comments to see if we could add this context to your code comments, but otherwise consider this thread resolved. Thanks.
Force-pushed from 5acde7f to b2301b3
Minor questions.
    "org.openedx.content_authoring.course.certificate_config.deleted.v1",
]

def generate_test_data_for_schema(schema):  # pragma: no cover
Curious why # pragma: no cover is needed for this and the other method, when it seems like they are used in tests?
There is code that is not called and is only meant as a safety hatch against weird data types in future events. It didn't seem worth it to add test coverage to a method that itself is only used for tests, but not adding the pragma did lower the coverage in a way that would cause warnings on future PRs (since the total coverage would be considered low).
I see. You could add the pragma more specifically on the code that isn't actually covered, but this is non-blocking. It just raised questions for me. Thanks.
I thought it worthwhile to future-proof. Coverage shouldn't really care about this method at all, and if someone changes it they shouldn't have to go through the annoyance of figuring out which bits are and aren't covered again.
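To illustrate the two options being weighed here, a contrived sketch of function-level versus branch-level pragma placement (the helper names and KNOWN_DEFAULTS mapping are made up for the example):

```python
# Two ways to exclude a defensive fallback from coverage measurement.
# KNOWN_DEFAULTS and both helpers are illustrative, not from this repo.

KNOWN_DEFAULTS = {int: 0, str: "", bool: False}


def default_for_type(data_type):  # pragma: no cover
    """Whole function excluded: coverage ignores every line inside it."""
    return KNOWN_DEFAULTS.get(data_type)


def default_for_type_narrow(data_type):
    """Only the defensive branch is excluded; the rest is still measured."""
    if data_type in KNOWN_DEFAULTS:
        return KNOWN_DEFAULTS[data_type]
    # Safety hatch for future, unanticipated types; never hit today.
    raise TypeError(f"No default for {data_type}")  # pragma: no cover
```

The narrow placement keeps coverage honest about the exercised lines; the function-level placement avoids re-auditing which lines are covered whenever the helper changes, which is the trade-off described above.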
current_event_bytes = current_out.read()

# get stored schema
stored_schema = load_schema(f"{os.path.dirname(os.path.abspath(__file__))}/schemas/"
Thoughts?
Thanks!
try:
    old_schema = load_schema(schema_filename)
except SchemaRepositoryError:  # pragma: no cover
    self.fail(f"Missing file {schema_filename}. If a new signal has been added, you may need to run the"
Thanks for this. We know this is an expected situation, and it just makes the experience much nicer for adding new events.
Description:
Add unit tests to confirm that any schema updates are backward and forward compatible with stored versions. Also adds a management command to generate and store the current Avro schema for each signal.
Manual testing:
Test 1: Add required field "thing" to CourseEnrollmentData
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_backward_compatible - fastavro._read_common.SchemaResolutionError: No default value for thing
Test 2: Remove required field "creation_date" from CourseEnrollmentData
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_forward_compatible - fastavro._read_common.SchemaResolutionError: No default value for creation_date
Test 3: Change type of user_data from UserData to str in CourseEnrollmentData
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_backward_compatible - fastavro._read_common.SchemaResolutionError: Schema mismatch: {'type': 'record', 'name': 'org.openedx.learning.course.enrollment.created.v1.UserData', 'fields': [{'name': 'id', 'ty...
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_forward_compatible - fastavro._read_common.SchemaResolutionError: Schema mismatch: string is not {'type': 'record', 'name': 'org.openedx.learning.course.enrollment.created.v1.UserData', 'fields': [{'na...
Test 4: Add optional field "thing" to CourseEnrollmentData
All tests pass
Test 5: Remove optional field "created_by" from CourseEnrollmentData
All tests pass
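The manual tests above exercise Avro's schema resolution rules. A much-simplified sketch of the two compatibility directions, looking only at record field names and defaults (ignoring type changes, which the real fastavro resolution also catches):

```python
# Simplified model of the compatibility rules the tests above exercise:
# a field ADDED to the new schema needs a default for backward compatibility
# (a new reader can decode old data), and a field REMOVED from the old schema
# needs a default there for forward compatibility (an old reader can decode
# new data). Schemas here are plain dicts in Avro's record shape.

def _fields(schema):
    return {f["name"]: f for f in schema["fields"]}


def is_backward_compatible(old, new):
    """Can a reader using `new` decode data written with `old`?"""
    old_f, new_f = _fields(old), _fields(new)
    added = set(new_f) - set(old_f)
    return all("default" in new_f[name] for name in added)


def is_forward_compatible(old, new):
    """Can a reader using `old` decode data written with `new`?"""
    old_f, new_f = _fields(old), _fields(new)
    removed = set(old_f) - set(new_f)
    return all("default" in old_f[name] for name in removed)


# Mirrors Tests 1 and 4: adding "thing" without a default breaks backward
# compatibility; adding it with a default is fine.
old = {"fields": [{"name": "creation_date"}, {"name": "user"}]}
with_required = {"fields": old["fields"] + [{"name": "thing"}]}
with_optional = {"fields": old["fields"] + [{"name": "thing", "default": None}]}
```

This is only a mental model; the actual tests serialize with one schema and deserialize with the other, so fastavro applies the full resolution rules.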
ISSUE: edx/edx-arch-experiments#271