Replies: 3 comments 2 replies
-
🤔 doing so requires that we answer a related question: how do you compare key components of BUT this runs counter to 30+ years of SQL semantics which consider I can certainly see the rationale here -- we used to have this behavior -- but it does get into some weird territory with SQL. At minimum, we'd probably want to update materializations to prohibit collections with nullable key components, forcing the user to create a derivation which picks some other equivalent modeling. An argument in favor of this change is that the user at least has a pathway to get a SQL materialization working again (by creating a derivation and recasting the offending key components), whereas they don't have a way to even capture in this use case without getting pretty fancy. |
Beta Was this translation helpful? Give feedback.
-
Another related issue is that some of their tables explicitly say there is no key. We'd talked in the past about allowing the use of |
Beta Was this translation helpful? Give feedback.
-
Following up belatedly after discussing this on VC: We're agreed that allowing nullable key components seems worthwhile, and that the semantics of nulls should match that of most programming languages ( collections:
acmeCo/with-null-key:
schema:
type: object
properties:
key: {type: [string, 'null']}
value: {type: string}
required: [key, value]
key: [/key] An open follow-up question is whether we should treat collections:
acmeCo/with-undefined-key:
schema:
type: object
properties:
key: {type: string}
value: {type: string}
# key is not in `required` fields
key: [/key] Given the above spec, they key of a document that doesn't have a There's one objection that I'd like to address up front. Allowing key properties to be missing could turn typos into a massive foot-gun, since you could accidentally type The benefit is that it allows us to handle union data types by using a key with the union of keys from each type. For example, given the schema: type: object
oneOf:
- properties:
purchase_id: {type: string}
required: [purchase_id]
- properties:
refund_id: {type: string}
required: [refund_id]
properties:
amount: {type: number} You could use |
Beta Was this translation helpful? Give feedback.
-
Working with GTFS data has made me once again re-consider
null
values in primary keys. The GTFS format is basically just a zip archive with a bunch of CSVs that seem directly exported out of a database schema. Their docs include the primary keys for each file, and many of them are compound keys. Several of the tables have key components that are nullable. The transfers table is one example, and the nullable values seem to be intentionally a part of their modeling. The agencies table has a regular (not compound) primary key which is nullable, and their rationale is that many datasets will include only a single agency, and thusnull
is a reasonable id for it. As a user, I feel like I should be able to directly use the keys specified in their docs without needing to do any elaborate workarounds. So maybe we should re-consider our stance on nullable keys?Another good example is the
alerts
feed. Each alert relates to either a route, a stop, or to both. At least one ofrouteId
orstopId
must be non-null, but Flow would not allow using either field as part of a compound key. Flow would allow using a generic non-nullableentityId
andentityType
pair, whereentityType
is one ofROUTE
orSTOP
, so there is a potential workaround for this case. But IMO it still feels like a workaround, and I think user expectations would generally be that they can take whatever schema they have and model it directly within Flow. This point is of course open to debate, but that's how I'm seeing it at the moment.From where I sit right now, I think it makes sense to remove the restriction that keys be non-nullable. AFAIK this restriction was only ever in place because it seemed like a potential foot-gun. But having seen a real-world use case, I now think it's better to allow users to model the data as they wish.
Beta Was this translation helpful? Give feedback.
All reactions