Skip to content
This repository has been archived by the owner on Jan 30, 2024. It is now read-only.

Rewrite Migration parser using conventional FromJSON #43

Merged
merged 6 commits into from
Jan 11, 2023

Conversation

pbrisbin
Copy link
Contributor

@pbrisbin pbrisbin commented Jan 3, 2023

The code as it was was decoding the Yaml as a type-less Object, then
digging around manually in the key-values to "parse" it further, to the actual
Migration type. This resulted in a brittle situation (relying on read) with
poor error messages ("unrecognized field" was used for a lot of distinct
errors). It also required Aeson internals such that we'd need CPP to build
against 1.x and 2.x.

This series of commits moves to a more idiomatic approach of a record type
with a FromJSON that is decoded directly. The decoding then handles
required-vs-optional, parsing, etc.

I recommend by-commit review, as the final result is somewhat complex, but
this was due to some landmines encountered along the way, and there may
exist a simpler approach depending on external constraints. Personally, I'd
say it's worth keeping the custom UTCTime en/decoder, but I would not
preserve the quirky "Depends must be present but can be Null" behavior.

At this point, we are behavior-neutral except for the following changes in error
message:

### Failure in: 0:Filesystem Parsing:9
test/FilesystemParseTest.hs:116
expected: Left "Could not parse migration /Users/patrick/code/pbrisbin/dbmigrations/test/migration_parsing/invalid_missing_required_fields:Error in \"/Users/patrick/code/pbrisbin/dbmigrations/test/migration_parsing/invalid_missing_required_fields\": missing required field(s): [\"Depends\"]"
 but got: Left "Could not parse migration /Users/patrick/code/pbrisbin/dbmigrations/test/migration_parsing/invalid_missing_required_fields:AesonException \"Error in $: parsing Database.Schema.Migrations.Filesystem.MigrationYaml(MigrationYaml) failed, key \\\"Depends\\\" not found\""

### Failure in: 0:Filesystem Parsing:10
test/FilesystemParseTest.hs:116
expected: Left "Could not parse migration /Users/patrick/code/pbrisbin/dbmigrations/test/migration_parsing/invalid_field_name:Error in \"/Users/patrick/code/pbrisbin/dbmigrations/test/migration_parsing/invalid_field_name\": unrecognized field found"
 but got: Left "Could not parse migration /Users/patrick/code/pbrisbin/dbmigrations/test/migration_parsing/invalid_field_name:AesonException \"Error in $: parsing Database.Schema.Migrations.Filesystem.MigrationYaml(MigrationYaml) failed, unknown fields: [\\\"InvalidField\\\"]\""

### Failure in: 0:Filesystem Parsing:12
test/FilesystemParseTest.hs:116
expected: Left "Could not parse migration /Users/patrick/code/pbrisbin/dbmigrations/test/migration_parsing/invalid_timestamp:Error in \"/Users/patrick/code/pbrisbin/dbmigrations/test/migration_parsing/invalid_timestamp\": unrecognized field found"
 but got: Left "Could not parse migration /Users/patrick/code/pbrisbin/dbmigrations/test/migration_parsing/invalid_timestamp:AesonException \"Error in $.Created: Unable to parse UTCTime\""

Some of this is our intended improvements:

  • Actually separating the different reasons of unrecognized field
  • Displaying more keys/values that are the cause of the failure

But they are a bit noisy without any extra polishing. Therefore, if you'd like to
see the messages tweaked in some way, do let me know. Otherwise, I can just
update the test expectations and we'll be green.

This patch defines a `MigrationYaml` record that represents the shape of
a migration's Yaml file directly, then decodes to it. This obviates the
manual "parsing" from a decoded `Object`. It also produces better error
messages.

The tests don't pass because we have actually been using Show/Read for
the `UTCTime` fields instead of a JSON representation. I'll have to
introduce a new type to account for that without breaking compatibility.
There's no functional need for this, using the JSON representation
would've been readable enough, but there are going to be tons of
migration files in the wild with these simplified values.

We could write encoding at JSON and a decoder that accepts either, but
I'm not sure it's worth it vs just using a completely custom newtype.
It was listed in that `requiredFields` array, but was being defaulted to
the empty string elsewhere. There is a test that confirms its optional,
which makes sense anyway.
It seems the `Depends` _key_ is indeed required, but a `Null` as the
_value_ is acceptable (and defaulted to no dependencies). I don't know
if we need to preserve this too, but the tests fail if we don't.

It was super fun to see one test assert it was optional and fail, only
to have the test that asserts it as required then fail (:
@pbrisbin pbrisbin changed the title pb/from json Rewrite Migration parser using conventional FromJSON Jan 3, 2023
@pbrisbin pbrisbin marked this pull request as ready for review January 3, 2023 19:05
@jtdaugherty
Copy link
Owner

I just remembered that this was still open. Let me know when it's ready for me to review.

@pbrisbin
Copy link
Contributor Author

It is (was always) ready for review 😄

@jtdaugherty
Copy link
Owner

Okay, great! For what it's worth, an explicit declaration that it's ready would be helpful for a couple of reasons. First, you created the pull request before you were done clarifying its purpose and pushing patches, which makes it a bit ambiguous as to whether it's ready. Second, you made changes to the PR description and I got the impression that the purpose of the PR had not settled (so, in other words, don't review yet) and PR description edits don't generate email notifications, so I had no idea that a new message had been edited into the description.

I'll see if I can take a look at the changes in the next few days.

@jtdaugherty
Copy link
Owner

I took a look and the changes look okay to me. As far as the test expectations are concerned, I think the new errors are more descriptive than the old ones but the error message text containing the offending file path means that to check for those errors requires doing a bit of string manipulation in the tests to check only the part of the error that comes after the file path. That works for me, so if you are willing to make that change, too, then I'd welcome it.

Thanks for your work on this!

@jtdaugherty jtdaugherty merged commit 2cfdf43 into jtdaugherty:master Jan 11, 2023
@pbrisbin
Copy link
Contributor Author

Awesome thanks!

an explicit declaration that it's ready would be helpful

I will usually open as Draft, then clicking Ready for Review is my "explicit declaration". I guess I should've also actually formally requested you as reviewer -- I'm so used to my team's auto-assign feature that I didn't think to.

you created the pull request before you were done clarifying its purpose and pushing patches.

Yeah, I can see how that's confusing. I mis-fired my tooling (gh pr create --fill) and the PR title/description didn't auto-fill from the commit message as usual. As I mentioned, I mean to include --draft, but didn't because I thought it would. And the subsequent commits were sort of self-review fixes. I was hoping I did all that quickly enough, perhaps before you even reacted to the notification of the PR being opened, that it would be OK. I didn't account for the fact that the (clearly WIP) title/description is what you'd see in the (only) notification, regardless of how quickly I corrected my mistake.

Going forward, if I'm careful to correctly use --draft and formally request review from you on PRs (when ready) -- would that work? Or do you specifically prefer a "ready for review" comment instead?

@pbrisbin pbrisbin deleted the pb/from-json branch January 11, 2023 12:59
@jtdaugherty
Copy link
Owner

I mean to include --draft

I see, makes sense! I also see now that you did mark this as ready for review above, but not only is it a bit hard hard to see in github's UI but that doesn't generate an email notification. I really wish it did, since that's what I'm depending on when PRs are evolving. That's why I was thinking an explicit comment would be helpful. Perhaps if I used some kind of app, maybe I'd get a notification when the ready state changed?

perhaps before you even reacted to the notification of the PR being opened

I usually react right away and come look to see what the PR is all about, even if I can't get to a deeper review of the code until later.

Going forward, if I'm careful to correctly use --draft and formally request review from you on PRs (when ready) -- would that work?

I'm happy to just try this out and see. However, FWIW, my ideal preference is not to have draft PRs at all, i.e., if discussion is warranted, I'd much prefer to do it in an open ticket that captures the problem statement. That way, we can discuss it and it may turn out that the solution is not a PR. And if it turns out code changes are warranted, then having discussion in advance will really help ensure that the PR contains no big surprises. I find code changes much easier to review if they're the end of a conversation, not the beginning. :)

@pbrisbin
Copy link
Contributor Author

pbrisbin commented Jan 11, 2023

All sounds good, but for just a little bit about how I use Draft PRs: A Draft PR is purely for me. It's just that I'd like to get my code pushed (in the backup sense) and start working on the non-code details. In the case of a single-commit PR, I can certainly just do gh pr create --fill and there is no non-code details, as my commit message should be complete at that point. But in a multi-commit PR, I need the Draft state to craft the overall PR description; and I actually prefer to do it in the GitHub UI (vs gh pr create --edit) for reasons I won't get into.

I'll also often do a final self-review in the GitHub UI, where I find the change in head-space will often surface something I'd not otherwise see. Only when I flip from Draft to non-draft do I expect to involve the upstream / reviewer at all. Until then, the Draft should not be on their radar at all. So I'm definitely not using them as a conversation tool. I'd open an Issue before that, then a PR at the end. Do you get notified of Draft PRs opening, or only when they flip to non-draft? I assumed the latter, and if that's not the case that would certainly invalidate this approach a bit 😬

@pbrisbin
Copy link
Contributor Author

Oh and Draft PRs are usually a good time to see if CI passes and address it, before involving any reviewers.

@jtdaugherty
Copy link
Owner

Do you get notified of Draft PRs opening, or only when they flip to non-draft?

I don't know for sure. If you want, we could test it. Looking at github's material on this topic, I assume that I'd get notified since Draft status does not imply privacy/visibility, just non-readiness for merging.

@pbrisbin
Copy link
Contributor Author

pbrisbin commented Jan 11, 2023

I wouldn't go to any trouble testing. I think the behavior just comes down to your own watching settings. A 1-dev OSS project is likely watching "all activity", so naturally you'll get notified of the draft PR opening. If the contributor explicitly requests you as a reviewer, you'd definitely get another notification then.

In such a setup, the Draft state could still be useful as an indicator of something not yet ready -- but (assuming there is no notification for the draft-to-ready state-change) you would need your contributors to use explicit reviewer requests to then see when it has become ready. This isn't historically how OSS projects tend to work, I think.

The workflow I've internalized comes from projects with a larger team. In such a case, you want to watch "participating and mentioned" only. And you always want to request reviewers (either explicitly or via CODEOWNERS). With such a system in place, the "quiet" draft state is useful in the ways I've started using it, since it notifies nobody.

I think I just need to remember all this and adjust my own behavior accordingly. Thanks for engaging on this -- it's been interesting!

toJSON = toJSON . formatTime defaultTimeLocale utcTimeYamlFormat . unUTCTimeYaml
toEncoding = toEncoding . formatTime defaultTimeLocale utcTimeYamlFormat . unUTCTimeYaml

-- Keeps things as the old Show/Read-based format, e.g "2009-04-15 10:02:06 UTC"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 @jtdaugherty I just wanted to mention: I finally updated my own app to this version of this library and I found it failed because my migrations all had fractional seconds, such as

Created: 2009-04-15 10:02:06.23345 UTC

I'm not sure where that came from, or why the parse won't accept it, because I swear I tested things when I wrote this PR.

In any event, I ran the following to truncate them before my migrations would run with the new version:

sed -i 's/\.[0-9]\+ UTC$/ UTC/' db/migrations/**/*  

I'm not sure if we should make an update to make the parser more flexible?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes! Sorry to hear about this. It seems to me that a relaxation to the parser ought to be okay. If you want to make that change, let me know - otherwise I'll see to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that a relaxation to the parser ought to be okay

Yeah, agreed. If I'm reading the docs correctly, it seems like using ...%T%Q UTC could do it:

%Q

decimal point and fraction of second, up to 12 second decimals, without trailing zeros. For a whole number of seconds, %Q omits the decimal point unless padding is specified

(Emphasis mine)

I'm not quite sure my availability, so I'll comment again if I get a chance to work on it; otherwise, feel free.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to do something, because new migrations still get created with fractional seconds, so the problem will keep coming back. @pbrisbin any chance you could create a PR with this fix, as you seem quite familiar with the problem and relevant code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because new migrations still get created with fractional seconds

I didn't realize this. You're saying this version creates fractional seconds files that it can't parse? That's much worse.

you seem quite familiar with the problem and relevant code?

I may have been at one time, but that was a long time ago. Still, it seems like a relatively straightforward thing. The decoder needs to accept with or without fractional seconds. Adding a unit test and using the format-string I mentioned above should do it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants