feat: add unit tests around schema evolution #225
Conversation
- I wonder if we need a doc that explains the process for updating a schema, even if it is just what we know today? Could be docstring, as long as someone would know where to look, so we might just have a README.rst that points to the right docstring.
- Can you do a couple of manual tests and list what you tested? E.g. making an incompatible change to the signal before and after updating the schemas. Making a compatible change, etc.
- Also, I wonder if the management command should do some of this testing and complain if you are trying to make an incompatible change, unless you provide some "force" argument, in case you are first working on the event. Or I guess you could just delete the schema.
Thoughts to be discussed.
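For discussion, here is a minimal sketch of the "force" guard idea: refuse to overwrite an existing stored schema unless the caller opts in. The function and parameter names (`write_schema_file`, `force`) are hypothetical, not from this repo.

```python
# Hypothetical sketch: a management command helper that treats an existing
# schema file as the committed contract and refuses to clobber it unless
# force=True (e.g. wired up to a --force CLI flag).
import os


def write_schema_file(path, schema_json, force=False):
    """Write schema_json to path; return True if written, False if skipped."""
    if os.path.exists(path) and not force:
        # Overwriting an existing schema is where incompatible changes can
        # sneak in, so make it an explicit opt-in.
        print(f"Refusing to overwrite {path}; pass force=True to override.")
        return False
    with open(path, "w") as f:
        f.write(schema_json)
    return True
```

The same spot would be a natural place to run the compatibility checks before allowing the overwrite.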
current_event_bytes = current_out.read()

# get stored schema
stored_schema = load_schema(f"{os.path.dirname(os.path.abspath(__file__))}/schemas/"
If the schema hasn't been generated, can we detect that and give a simple message about how to resolve it? Could we point to a doc (or docstring) that explains the process of updating schemas?
Thoughts?
We don't have a full process for updating schemas so I'm hesitant to make much documentation about it. In any case, if there's no file, this would be an issue with how schemas are added as opposed to updated. I guess we can document that somewhere.
Could you even just add a comment above this line that if the schema is missing, they probably need to run the management command to generate a file for a new event?
I would put that in a separate documentation-only PR. Among other things, there will probably be some debate on where to put it.
Done
I thought we didn't really want people overwriting existing schemas regardless of whether or not the change is compatible, so we're always testing against the original version of the schema. That's also why we always have an "are you sure" prompt when overwriting.
Force-pushed from 1be4630 to 7abc233
This is really great. I'm excited about it. Note that there is a group considering consuming messages using node, and these Avro schemas will come in handy. :)
Sounds reasonable. I made other PR comments to see if we could add this context to your code comments, but otherwise consider this thread resolved. Thanks.
Force-pushed from 5acde7f to b2301b3
Minor questions.
    "org.openedx.content_authoring.course.certificate_config.deleted.v1",
]

def generate_test_data_for_schema(schema):  # pragma: no cover
Curious why # pragma: no cover is needed for this and the other method, when it seems like they are used in tests?
There is code that is not called and is only meant as a safety hatch against weird data types in future events. It didn't seem worth it to add test coverage to a method that itself is only used for tests, but not adding the pragma did lower the coverage in a way that would cause warnings on future PRs (since the total coverage would be considered low).
I see. You could add the pragma more specifically on the code that isn't actually covered, but this is non-blocking. It just raised questions for me. Thanks.
I thought it worthwhile to future-proof. Coverage shouldn't really care about this method at all, and if someone changes it they shouldn't have to go through the annoyance of figuring out which bits are and aren't covered again.
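To illustrate the two options being weighed here, a contrived sketch of function-level versus branch-level pragma placement (the helper names and KNOWN_DEFAULTS mapping are made up for the example):

```python
# Two ways to exclude a defensive fallback from coverage measurement.
# KNOWN_DEFAULTS and both helpers are illustrative, not from this repo.

KNOWN_DEFAULTS = {int: 0, str: "", bool: False}


def default_for_type(data_type):  # pragma: no cover
    """Whole function excluded: coverage ignores every line inside it."""
    return KNOWN_DEFAULTS.get(data_type)


def default_for_type_narrow(data_type):
    """Only the defensive branch is excluded; the rest is still measured."""
    if data_type in KNOWN_DEFAULTS:
        return KNOWN_DEFAULTS[data_type]
    # Safety hatch for future, unanticipated types; never hit today.
    raise TypeError(f"No default for {data_type}")  # pragma: no cover
```

The narrow placement keeps coverage honest about the exercised lines; the function-level placement avoids re-auditing which lines are covered whenever the helper changes, which is the trade-off described above.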
current_event_bytes = current_out.read()

# get stored schema
stored_schema = load_schema(f"{os.path.dirname(os.path.abspath(__file__))}/schemas/"
Thoughts?
Thanks!
try:
    old_schema = load_schema(schema_filename)
except SchemaRepositoryError:  # pragma: no cover
    self.fail(f"Missing file {schema_filename}. If a new signal has been added, you may need to run the"
Thanks for this. We know this is an expected situation, and it just makes the experience much nicer for adding new events.
Description:
Add unit tests to confirm that any schema updates are backward and forward compatible with stored versions. Also adds a management command to generate and store the current Avro schema for each signal.
Manual testing:
Test 1: Add required field "thing" to CourseEnrollmentData
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_backward_compatible - fastavro._read_common.SchemaResolutionError: No default value for thing
Test 2: Remove required field "creation_date" from CourseEnrollmentData
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_forward_compatible - fastavro._read_common.SchemaResolutionError: No default value for creation_date
Test 3: Change type of user_data from UserData to str in CourseEnrollmentData
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_backward_compatible - fastavro._read_common.SchemaResolutionError: Schema mismatch: {'type': 'record', 'name': 'org.openedx.learning.course.enrollment.created.v1.UserData', 'fields': [{'name': 'id', 'ty...
FAILED openedx_events/event_bus/avro/tests/test_avro.py::TestAvro::test_evolution_is_forward_compatible - fastavro._read_common.SchemaResolutionError: Schema mismatch: string is not {'type': 'record', 'name': 'org.openedx.learning.course.enrollment.created.v1.UserData', 'fields': [{'na...
Test 4: Add optional field "thing" to CourseEnrollmentData
All tests pass
Test 5: Remove optional field "created_by" from CourseEnrollmentData
All tests pass
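The manual tests above exercise Avro's schema resolution rules. A much-simplified sketch of the two compatibility directions, looking only at record field names and defaults (ignoring type changes, which the real fastavro resolution also catches):

```python
# Simplified model of the compatibility rules the tests above exercise:
# a field ADDED to the new schema needs a default for backward compatibility
# (a new reader can decode old data), and a field REMOVED from the old schema
# needs a default there for forward compatibility (an old reader can decode
# new data). Schemas here are plain dicts in Avro's record shape.

def _fields(schema):
    return {f["name"]: f for f in schema["fields"]}


def is_backward_compatible(old, new):
    """Can a reader using `new` decode data written with `old`?"""
    old_f, new_f = _fields(old), _fields(new)
    added = set(new_f) - set(old_f)
    return all("default" in new_f[name] for name in added)


def is_forward_compatible(old, new):
    """Can a reader using `old` decode data written with `new`?"""
    old_f, new_f = _fields(old), _fields(new)
    removed = set(old_f) - set(new_f)
    return all("default" in old_f[name] for name in removed)


# Mirrors Tests 1 and 4: adding "thing" without a default breaks backward
# compatibility; adding it with a default is fine.
old = {"fields": [{"name": "creation_date"}, {"name": "user"}]}
with_required = {"fields": old["fields"] + [{"name": "thing"}]}
with_optional = {"fields": old["fields"] + [{"name": "thing", "default": None}]}
```

This is only a mental model; the actual tests serialize with one schema and deserialize with the other, so fastavro applies the full resolution rules.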
ISSUE: edx/edx-arch-experiments#271