Broadcasting dictionaries with unmatched keys gives confusing result #19
Ahh, yes: the bounds checks aren't done yet, partly because I've been fighting with myself about the semantics of … My current thinking (#13) is roughly that … The results in #13 shouldn't be spurious anymore, but the more flexible broadcasting is not implemented yet.
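The "bounds check" for dictionaries would presumably be the analogue of the shape check for arrays: refuse to combine two containers whose key sets differ. Here is a minimal sketch of that strict semantics using plain Base `Dict`s. `strict_map` is a hypothetical helper for illustration, not Dictionaries.jl API.

```julia
# Hypothetical helper (not part of Dictionaries.jl): combine two dictionaries
# elementwise, but only if they have exactly the same set of keys.
function strict_map(f, d1::AbstractDict, d2::AbstractDict)
    # Dictionary analogue of a broadcast shape check: key sets must match.
    if !issetequal(keys(d1), keys(d2))
        throw(ArgumentError("dictionary keys do not match"))
    end
    return Dict(k => f(v, d2[k]) for (k, v) in d1)
end

d1 = Dict(:a => 1, :b => 2)
d2 = Dict(:a => 10, :b => 20)
strict_map(+, d1, d2)  # contains :a => 11 and :b => 22
```

With this rule, unmatched keys raise an error instead of giving a confusing result. The more flexible options discussed below relax it.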
Yeah, I've been thinking about this too. I think the tension is … But, going back to the original topic here, isn't requiring … Also, the semantics of broadcasting is that it not only maps the elements but also stretches the shape of the container to fit the larger container. So maybe an outer join makes sense? Furthermore, if we use …

    f.(d1, d2)                       # => AbstractDictionary{I,Union{R,Nothing}}  outer join with `nothing`
    something.(f.(d1, d2))           # => AbstractDictionary{I,R}                 inner join
    something.(f.(d1, d2), missing)  # => AbstractDictionary{I,Union{R,Missing}}  outer join with `missing`

where … Here, … It'd be useful for flexible default value handling:

    something.(f.(something.(d1, default1), d2))

This would fill the value of …
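The proposed outer/inner semantics can be sketched with Base `Dict`s. `outer_map` and `inner_map` are hypothetical names. In Base, `something(nothing)` throws, so the sketch uses `filter` to play the role the proposal gives to `something.` (dropping `nothing` entries). One caveat: if `f` itself returns `nothing`, that entry is dropped as well.

```julia
# Hypothetical helpers (not Dictionaries.jl API) sketching the proposal.

# outer_map: every key from either side; f is applied where both sides have
# the key, and every other key maps to `default` (`nothing` unless given).
function outer_map(f, d1::AbstractDict, d2::AbstractDict; default = nothing)
    out = Dict{Any,Any}()
    for k in union(keys(d1), keys(d2))
        out[k] = (haskey(d1, k) && haskey(d2, k)) ? f(d1[k], d2[k]) : default
    end
    return out
end

# inner_map: stands in for `something.(f.(d1, d2))` by keeping only the
# entries that are not `nothing`.
inner_map(f, d1, d2) = filter(p -> p.second !== nothing, outer_map(f, d1, d2))

d1 = Dict(:a => 1, :b => 2)
d2 = Dict(:b => 20, :c => 30)
outer_map(+, d1, d2)                     # :a => nothing, :b => 22, :c => nothing
inner_map(+, d1, d2)                     # :b => 22
outer_map(+, d1, d2; default = missing)  # :a => missing, :b => 22, :c => missing
```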
Actually, maybe it doesn't make sense for …
I think join is rather different and should be modelled by something else. Also, what you're discussing are really only primary-key-to-primary-key joins. Doing various things with two dictionaries with the … It seems the people in …
I think we could have dotcall do really useful things, but I'm thinking of the kinds of things people do with …
Probably a
So you have keys only on the left (…) … PS: I don't really want to call it … EDIT: PPS: that …
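An outer join on dictionary keys splits the keys into three regions: left-only, shared, and right-only. Each join flavour then decides what to emit for each region. A small sketch with Base set operations follows. `key_regions` is a hypothetical name.

```julia
# Split the keys of two dictionaries into the three regions of an outer join.
function key_regions(d1::AbstractDict, d2::AbstractDict)
    k1, k2 = Set(keys(d1)), Set(keys(d2))
    return (left = setdiff(k1, k2),    # keys only on the left
            both = intersect(k1, k2),  # keys on both sides
            right = setdiff(k2, k1))   # keys only on the right
end

key_regions(Dict(:a => 1, :b => 2), Dict(:b => 3, :c => 4))
# (left = Set([:a]), both = Set([:b]), right = Set([:c]))
```

An inner join uses only `both`, a left join uses `left` and `both`, and a full outer join uses all three.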
I think
I believe there is some room for improvement and
This sounds very interesting! If you have this in mind already, it makes total sense to avoid attaching additional semantics to dotcalls on dictionaries.
BTW, since you mentioned nested dictionaries, did you have a look at awkward-array (https://github.com/scikit-hep/awkward-array)? It's a Python library for nested arrays. I think it's exploring an interesting direction for nested data structures. I only know it from this Strange Loop talk (https://www.youtube.com/watch?v=2NxWpU7NArk), though. I meant to play with it and see how it could be relevant to APIs in Julia, but I haven't had a chance to do this yet. Well, I'm mentioning it here anyway since there is some chance that it interests you, or that you already know it and have some thoughts on it.
Yes, I did! I just recently watched the same talk; very interesting. The Python-esque way of treating matrices as nested vectors is quite interesting for ragged arrays. Together with "dot-getindex", I can imagine using something like …
If your goal is to implement relational algebra, there are a few things to keep in mind.
SQL has copped a lot of flak for distorting relational algebra with confusing semantics around NULL and the results of many operations, which are necessary because it doesn't allow nested containers. Take, for example, C# …
So instead of a null sentinel, you get an empty set of matches. Additionally, the functional programming crowd has noted an interesting analogy between left foreign-key joins and optional types, and between outer foreign-key joins and variant types, which you see being exploited in the wild in SQL databases all the time (I can't find my original reference for this, sorry). If you like internet rants that lie somewhere between amusing and dead serious, I also suggest googling something called "The Third Manifesto" and an interesting implementation in Python called "Dee". I particularly like the way they encourage thinking in terms of nested tables, for example in grouping. (Sorry, clearly I've been thinking about this stuff too much...)
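The "empty set of matches instead of a null sentinel" idea can be sketched as a group join over plain Julia iterables. Every left row is kept and paired with the vector of right rows that share its key. That vector is empty when nothing matches. `group_join` is a hypothetical name, and this is only an illustration of the semantics, not C#'s or any Julia package's actual API.

```julia
# Hypothetical group join: keep every left row, paired with the (possibly
# empty) vector of right rows whose key matches.
function group_join(lkey, rkey, left, right)
    T = eltype(right)
    # Index the right side by key once (hash-based grouping).
    groups = Dict{Any,Vector{T}}()
    for r in right
        push!(get!(() -> T[], groups, rkey(r)), r)
    end
    # No null sentinel: a left row with no match gets an empty vector.
    return [(l, get(groups, lkey(l), T[])) for l in left]
end

customers = [(1, "Ann"), (2, "Bob")]
orders = [(1, "book"), (1, "pen")]
group_join(first, first, customers, orders)
# [((1, "Ann"), [(1, "book"), (1, "pen")]), ((2, "Bob"), Tuple{Int,String}[])]
```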
I appreciate your comments!
Yeah, I agree that NumPy's use of C-contiguous arrays, and its existing abstraction of multi-dimensional arrays as nested vectors, give awkward-array a big advantage.
There was a bit of discussion in JuliaLang/julia#35681 about problems with the current lowering of …
I was also thinking it might be possible to extend this to outer joins on the primary key (since dictionary keys are unique). I thought it would be possible to do this without going to …

    something.(tuple.(d1, something.(d2, Some(nothing))))  # left join
    something.(tuple.(something.(d1, Some(nothing)), d2))  # right join

I guess I need to try implementing this to see if I can make a coherent system out of this idea, though. I'm hoping that I can try this after JuliaLang/julia#36085 with arbitrary indexable objects, including …
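The role of `Some(nothing)` above is to separate "key absent" from "value is `nothing`". Below is a sketch of the same left/right join semantics written directly with Base `Dict`s rather than through broadcasting. Present values on the optional side are wrapped in `Some`, and a bare `nothing` means the key was missing. `left_join` and `right_join` are hypothetical names.

```julia
# Hypothetical left join on dictionary keys: every key of d1 is kept. The
# right value is wrapped in `Some` when present, so a stored `nothing` in d2
# is not confused with "key absent" (a bare `nothing`).
left_join(d1::AbstractDict, d2::AbstractDict) =
    Dict(k => (v, haskey(d2, k) ? Some(d2[k]) : nothing) for (k, v) in d1)

# Right join is the mirror image: every key of d2 is kept.
right_join(d1::AbstractDict, d2::AbstractDict) =
    Dict(k => (haskey(d1, k) ? Some(d1[k]) : nothing, v) for (k, v) in d2)

d1 = Dict(:a => 1, :b => 2)
d2 = Dict(:b => nothing, :c => 3)
left_join(d1, d2)   # :a => (1, nothing), :b => (2, Some(nothing))
right_join(d1, d2)  # :b => (Some(2), nothing), :c => (nothing, 3)
```

`something(x)` unwraps a `Some`, so `something(Some(nothing))` gives back the stored `nothing`.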
Yeah, I was only thinking of joins on primary keys (dictionary keys) when I talked about broadcasting. For a generic join API we need a more flexible entry point.
Why not have some generic entry point that treats everything as just iterable (at least semantically) and use, e.g., …
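A generic entry point that treats both sides as plain iterables could look roughly like a hash join with key functions. The right side is indexed once, then the left side is streamed through the index. `hash_innerjoin` and its argument order are hypothetical, chosen for illustration. SplitApplyCombine.jl's actual join functions may differ.

```julia
# Hypothetical generic inner join over arbitrary iterables: rows match when
# their key functions agree, and f combines each matching pair.
function hash_innerjoin(f, lkey, rkey, left, right)
    index = Dict{Any,Vector{Any}}()
    for r in right
        push!(get!(Vector{Any}, index, rkey(r)), r)  # group right rows by key
    end
    out = Any[]
    for l in left, r in get(index, lkey(l), ())      # unmatched left rows are skipped
        push!(out, f(l, r))
    end
    return out
end

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, 20), (3, 30), (3, 31)]
hash_innerjoin((l, r) -> (l[2], r[2]), first, first, left, right)
# Any[("b", 20), ("c", 30), ("c", 31)]
```

Keys come from functions rather than from a container's own indices, so this works for dictionaries, vectors, or any iterable of rows.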
OK, I used SQL a lot for inspiration, mostly because that's what I understand (although only barely) and the internet is full of information about it. But maybe I should've avoided that 😆 I actually started implementing a sorting-based lazy inner/left/outer join framework that works on bare Julia data (kind of in the spirit of SplitApplyCombine.jl). The idea was to make it work with Transducers.jl and SplittablesBase.jl (so that it's parallelizable). It works well for inner joins, but for outer joins it wasn't clear what the best API was. I can make a "flat" iterator using NULL, but it may not be so useful if you want to consume it on the fly (this is also true when using a foreign key). Until now it didn't click why SplitApplyCombine.jl has … (Rather a note to self, but I think it'd be better to split the work between inner and outer reducing functions and re-launch the inner reducing function for each group. This is what I do for …)
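The core of a sorting-based inner join is a merge over two key-sorted inputs. Runs of equal keys on both sides produce their cross product. The sketch below is eager and sequential, unlike the lazy, Transducers.jl-based framework described above. `sortmerge_innerjoin` is a hypothetical name, and it assumes both inputs are already sorted by key under `isless`.

```julia
# Hypothetical sort-merge inner join. Assumes `left` and `right` are already
# sorted by their keys; equal-key runs on both sides yield all pairings.
function sortmerge_innerjoin(f, lkey, rkey, left::AbstractVector, right::AbstractVector)
    out = Any[]
    i, j = firstindex(left), firstindex(right)
    while i <= lastindex(left) && j <= lastindex(right)
        kl, kr = lkey(left[i]), rkey(right[j])
        if isless(kl, kr)
            i += 1                      # left key too small: advance left
        elseif isless(kr, kl)
            j += 1                      # right key too small: advance right
        else
            # Find the end of the run of equal keys on each side.
            i2 = i
            while i2 <= lastindex(left) && isequal(lkey(left[i2]), kl)
                i2 += 1
            end
            j2 = j
            while j2 <= lastindex(right) && isequal(rkey(right[j2]), kr)
                j2 += 1
            end
            for a in i:i2-1, b in j:j2-1
                push!(out, f(left[a], right[b]))
            end
            i, j = i2, j2
        end
    end
    return out
end

left = [(1, "a"), (2, "b"), (2, "b2"), (4, "d")]
right = [(2, 20), (2, 21), (3, 30), (4, 40)]
sortmerge_innerjoin((l, r) -> (l[2], r[2]), first, first, left, right)
# Any[("b", 20), ("b", 21), ("b2", 20), ("b2", 21), ("d", 40)]
```

In the inner-join case every step emits complete pairs. Outer joins are where the output shape becomes a design question: unmatched rows must be emitted either with a sentinel or as a nested, possibly empty group.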
Maybe it should do what `mergewith` does? However, each kind of join also makes sense. I think a slightly more manual API for specifying the join would be nice (JuliaLang/julia#25904 (comment)).
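For reference, Base's `mergewith` (Julia 1.5 or later) already behaves like an outer join on keys. It keeps every key from both sides and applies the combining function only where a key appears in both. The inner-join variant below is an illustrative one-liner, not a Base function.

```julia
d1 = Dict(:a => 1, :b => 2)
d2 = Dict(:b => 20, :c => 30)

# Outer join on keys: all keys kept, `+` applied only to shared keys.
outer = mergewith(+, d1, d2)   # contains :a => 1, :b => 22, :c => 30

# For comparison, an inner join keeps only the shared keys.
inner = Dict(k => d1[k] + d2[k] for k in intersect(keys(d1), keys(d2)))  # contains :b => 22
```

Unlike the `nothing`/`missing` proposals earlier in the thread, `mergewith` passes through the unmatched values unchanged, which only type-checks when both dictionaries hold compatible value types.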