absent-vs-null caltrops #191
Comments
I really do not appreciate that github thinks that a …
can we not attempt to see if there is a …
That's basically option 2, except …
I think I lean towards preferring "say no" (with an error message that tries to provide a helpful suggestion, like "did you mean to call `Representation` first, and encode that?") vs quietly calling a method, if we pursue option 2. The quietly-call-the-method route would result in wider changes of behavior from smaller/less-direct changes in arguments if one did pass additional config later, and my current thought is that this would be more magical looking and more confusing than just nudging people to make sure not to forget the `Representation` call.
i think this will remain a caltrop. i feel like the default thing when i encode a node is that i want the representation encoded as i expect. the debug functionality is something i should opt into.
I fully agree - that's what I've argued before and in #190 :) the Go APIs should do what most users expect by default.
I think maybe we'll want to look at how the code would turn out before deciding on this detail. My concern with making all codecs do... anything, really... is mostly in terms of the odds of defect rates in third party codecs, and how odd the results will be for any defects that are likely. We don't have one good single point to wrap all codecs.

Resolving to consider it a bug (and definitely not a feature) when a codec coerces an `Absent` into a null seems relatively tractable, since handling each kind of token is already central to a codec's job.

If we resolved to make "check for schema.TypedNode or ipld.ADL and then transform your arguments before proceeding" be a feature of every codec, we'll have to copy and paste that (or make a mixin function; details) into the start of every codec function. And count on third party codecs to do the same. The odds of defect here seem slightly higher, because the "correct" thing would now require additional action (whereas handling tokens, in the above story, is already key to the job). And then it requires even more action to add back in the ability to override this, so that it's possible to encode the type-level representation if the user knows what they're asking for.

Then we get to the even bigger concern: if a third party codec doesn't do this... how odd are the results? I'm afraid very odd: the user will be able to "patch" their code to do the right thing by calling `Representation()` themselves, and that workaround then becomes a problem of its own.

I think we might get considerably better consistency by the chiding approach, because if a third party codec doesn't implement that, it doesn't produce odd results wherein the path of least resistance which a user will take to compensate for the issue then creates other problems.
I've revisited part of my last comment...
What I was worrying about is that if a third party codec didn't do this, an end user would add a call to `Representation()` themselves to compensate, and that workaround would then go on to cause other problems. Having thought it through again: there's a lot less of a problem there than I had been imagining previously.
There's a lot going on in this issue, and some of it pretty heady -- but, I believe the state of play here has at least partially, tangentially, become improved: #232 will be introducing new helper functions for encoding and decoding which do indeed automatically shift gears into using representation nodes and nodebuilders whenever typed nodes are seen in play. While that addresses none of the deeper theoretical soundness issues here, it does have the potential to drastically decrease the frequency of painful end-user encounters with this subtlety, and it does it immediately and universally for users of the new API, rather than requiring all codecs to adopt it individually. The linked PR is for newly added APIs. I have not drafted a similar change to the existing methods on LinkSystem yet, but that would also seem like a reasonable step to take in the near future.
We hit this again in the indexer wire format. The times one is supposed to call `Representation()` …
We have some errata about how `Absent` vs `Null` are handled. This is a distinction that is sometimes useful, but also very subtle. (`Absent` is somewhat similar to javascript's `undefined`. I hope we pull it off better than javascript did, frankly -- but javascript's history with this concept is certainly a valid cautionary tale about the degree of subtlety, too.)

At least one piece of code that touches on these corner cases resulted in pain today: a codec coerced an `Absent` to a `Null` and serialized data that way. The resulting serial data could then not be parsed back: because the schema declared "optional" rather than "nullable" for the affected fields, a null token was (correctly!) rejected in that position as data that was not matching the schema.

It would be better to not be able to end up in this position (or, at minimum, make it much harder to end up in this position accidentally).
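To make the failure mode concrete, here is a hedged sketch of how that coercion can happen by accident. Only `ipld.Node`, `Kind()`, `ipld.Kind_Null`, and `IsAbsent()` are library names already referenced in this issue; the function, the simplified switch, and the comments are illustrative assumptions, not the actual codec involved in the incident.

```go
package example

import (
	"fmt"
	"io"

	ipld "github.com/ipld/go-ipld-prime"
)

// emitScalar is a stand-in for a real codec's marshal loop.  Absent nodes
// answer Kind() with ipld.Kind_Null, so a codec that switches on Kind alone
// will write a null token for a value that was never meant to be serialized
// at all.  For a schema field declared "optional" (not "nullable"), the
// resulting `"field": null` then fails to decode, and the data no longer
// round-trips.
func emitScalar(w io.Writer, n ipld.Node) error {
	switch n.Kind() {
	case ipld.Kind_Null:
		// BUG: this branch is also reached for ipld.Absent, because absent
		// nodes report their kind as ipld.Kind_Null; an IsAbsent() check is
		// needed to tell the two apart.
		_, err := io.WriteString(w, "null")
		return err
	// ... other kinds elided for brevity ...
	default:
		return fmt.Errorf("kind %v not handled in this sketch", n.Kind())
	}
}
```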
I have three sets of ideas, in increasing levels of invasiveness, which could be relevant:
The first change is the most direct: declare that it is an error to ask for an `ipld.Absent` node to be encoded as a `null` token; a codec simply shouldn't emit anything for `Absent`. Erroring is much better than silent coercion of values.

We can make a norm of having codec implementations do this check, and flunk that data.
(Unfortunately, we can only make it a strongly recommended feature for codec implementations to do this check. I can't think of a way to make the type system help us enforce this rule for any third party codecs. But maybe that's fine.)
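Concretely, the norm could amount to a small guard at the top of every codec's marshal routine. This is only a sketch of that idea: the helper name and error text are made up here, while `ipld.Node` and `IsAbsent()` are the library names this issue already uses.

```go
package example

import (
	"fmt"

	ipld "github.com/ipld/go-ipld-prime"
)

// checkEncodable flunks data that has no serial representation.  Absent means
// "no entry here at all", so writing a null token instead would silently
// change the meaning of the data (and break round-tripping for schemas that
// say "optional" rather than "nullable").
func checkEncodable(n ipld.Node) error {
	if n.IsAbsent() {
		return fmt.Errorf("cannot encode ipld.Absent: absent values have no serial form; did you mean to encode the representation-level view of this data?")
	}
	return nil
}
```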
(It's also still concerningly easy for a codec to forget to check this, because absent nodes report their Kind as `ipld.Kind_Null`, and you have to check `IsAbsent` or `== ipld.Absent` to disambiguate. But again, maybe that's fine (for now), and we just try to make a norm of it anyway.)

This will detect this particular situation of accidental coercion of absent values into nulls. It should stop people from reaching a situation where they have serial data which they believe is correct but was actually serialized from a level they didn't intend. That's the most important goal. (However, also remember that this only kicks in if there are `Absent` values in the data -- it's not a general detector for accidentally serializing type-level data when one meant to serialize representation-level data... so, depending on where you're putting your goalposts, this may be seen as either a full solution to the specific problem, or a "partial" solution to a bigger problem.)

There's also one downside, which is that the ability to use a codec as a "debug" view is lost if the data contains any `Absent` values. However, that's a relatively minor downside. (And honestly, that was probably a "buggy" "feature" anyway -- such as being ambiguous for a field which is optional and nullable.)

The second possible change area is a little more complicated.
The goal would be to increase clarity for encode functions, preventing accidental misapplication of them on type-level data (when instead the encode function should've been applied on representation-level data, which is... the vast majority of the time):

- Check for the `schema.TypedNode` interface on the `ipld.Node` value, and just say "no" in that case (sketched in code below).
- Maybe: add an `options` parameter of some kind to encoders which lets you opt into this.
- Or, alternatively: quietly shift to the representation-level view when handed a `schema.TypedNode`.

Like above, in the first change concept, I can't actually think of a good way to use the type system to enforce that all codecs do this. It's intended that the typed view of data (and ADLs, similarly) provide at least all the methods of `ipld.Node`. The consequence is that there's no interface that says "this is 'just' a node": we're stuck detecting the fancier nodes and denylisting them, rather than acceptlisting the things that are known to be sensible. But maybe this is fine, and we should make norms out of this regardless.

I feel a bit weird about aiming in this direction, because it's kind of intentionally crippling a couple of things that are composable, just because of the encounter rate with some footguns that composability enables. However, the footguns are pretty big, so it's still worth considering.
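For illustration, here is a hedged sketch of what the "say no" check (and the quieter auto-transform alternative) might look like at the mouth of an encode function. `schema.TypedNode` and `Representation()` are the library names already discussed in this issue; the function names, the option flag, and the error wording are invented for this sketch.

```go
package example

import (
	"fmt"

	ipld "github.com/ipld/go-ipld-prime"
	"github.com/ipld/go-ipld-prime/schema"
)

// guardTypedNode is the "say no" flavor: refuse type-level nodes outright,
// with an error that nudges the caller toward the representation view.
func guardTypedNode(n ipld.Node) error {
	if _, ok := n.(schema.TypedNode); ok {
		return fmt.Errorf("refusing to encode a type-level node: did you mean to call Representation() first, and encode that?")
	}
	return nil
}

// chooseView is the quieter flavor: automatically shift to the representation
// view, unless the caller explicitly opts into encoding the type-level view
// (encodeTypeLevel stands in for the hypothetical options parameter).
func chooseView(n ipld.Node, encodeTypeLevel bool) ipld.Node {
	if tn, ok := n.(schema.TypedNode); ok && !encodeTypeLevel {
		return tn.Representation()
	}
	return n
}
```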
A possible even more aggressive further pursuit, focused on the null-vs-absent situation:
- Remove `IsAbsent` from `ipld.Node` entirely.
- Move `IsAbsent` to `schema.TypedNode`, instead.
- Should absent nodes then report `ipld.Kind_Invalid`?
- Or maybe a new `ipld.Kind_Absent`. Clearer. (Leaves `ipld.Kind_Invalid` only as an extremely clear indicator that You Goofed somewhere and have uninitialized or seriously bonkers data.)
- Or keep the `IsAbsent` predicate where it is, if that turns out easier for some reason.

This would decrease the odds of someone accidentally writing a codec in the future which emits null tokens when encountering absent values.
(It would fix the current problem of how concerningly easy it is to make this coercion by accident because absent nodes report their Kind as `ipld.Kind_Null`, and you have to check `IsAbsent` or `== ipld.Absent` to disambiguate.)

Moving the `IsAbsent` predicate to `schema.TypedNode` would also make it clearer in general that "absent" should not occur in the regular data model, which could be a bonus win.

As you can see by how the bullet points started bifurcating into a choice forest there, I'm less sure about this. There are a couple different options here. It's not clear which is best. They seem to vary in terms of how error-prone they are in golang vs how well they match the data model spec, which is pretty disconcerting.
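To make one branch of that choice forest more concrete, here is a speculative sketch of how the interfaces could be carved up under the combination "move `IsAbsent` to `schema.TypedNode` and add a `Kind_Absent`". This is invented API design for discussion only, not what the library actually defines; only the identifier names echo the ones used above.

```go
package sketch

// Kind is the data-model kind enumeration.  Giving "absent" its own member
// means a codec switching on Kind can no longer mistake it for Null.
type Kind uint8

const (
	Kind_Invalid Kind = iota // uninitialized or seriously bonkers data
	Kind_Null
	Kind_Absent
	// ... other kinds elided ...
)

// Node is the plain data-model interface.  Note: no IsAbsent here.
type Node interface {
	Kind() Kind
	IsNull() bool
	// ... the rest of the data-model methods ...
}

// TypedNode is the schema-level view; "absent" only has meaning here, for
// optional fields that have no value.
type TypedNode interface {
	Node
	Representation() Node
	IsAbsent() bool
}
```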
Overall I think change #1 is the most reasonable. It fixes the biggest and most painful of the footguns, and is clear enough to execute on.

Maybe with #1 implemented, any need for #2 will actually fade. The other consequences of accidentally encoding something from the type-level view instead of the representation-level view aren't very bad, as long as they don't hit this coercion stuff. (Usually, you can rehydrate that data again: just use the type-level prototypes when loading it, and you're matched up.) (The absent->null coercion is especially nasty just because it's (potentially) lossy, and because it doesn't roundtrip at all anytime there are any optional fields with absent values in play.)

At any rate, all these are now documented for consideration.