General first impressions after reading the tutorial #2034
Replies: 2 comments 7 replies
-
Hey hey! Thank you for having a careful look at vaex and the tutorials/docs. Much appreciated. Let me try to reply to all the points in the order in which you've raised them. A general comment: we are a small team, and we know there are many low-hanging fruits to fix, such as inconsistencies in the docs or other things that can be improved there, so we are hoping that someone in the community will want to contribute (perhaps an easy way to get into vaex). Otherwise we are happy to do it, but at the expense of features; it's a matter of priorities. Anyway:
Thanks, this was very informative. Please feel free to offer suggestions, PRs, or general feedback, especially for easy fixes like bad or outdated docstrings. We have been meaning to update/rework the tutorial for a while now, but have yet to get to it. Thanks! @maartenbreddels please check in case I got something wrong.
-
Hi Juan, thanks for the feedback, it is appreciated. Jovan answered most of it; let me add a bit to that.
Yes, this must be one of the most legacy things in Vaex; @JovanVeljanoski and I joke about it regularly. I think it's time to remove it.
"*" only works for count; the fact that mean accepts it is actually a bug. It's there because Vaex used to be a GUI tool (built on Qt) that needed a string input, and it was easier to support "*" as an argument than to make the argument optional. But, like in SQL, it only works with count. Regarding the crash: that is really annoying, and it has to do with using memory mapping for writing to a file. We cannot have an exception handler deal with that (it leads to a SIGBUS signal). For that reason alone, I've been working on refactoring that part to use regular file IO. Regards, Maarten
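For a transition period before removal, the `DeprecationWarning` suggested in Juan's post could look roughly like the sketch below. The `DataFrame` class here is a hypothetical toy, not vaex's class, and the warning message is made up; only `warnings.warn` with `DeprecationWarning` and `stacklevel` is standard Python.

```python
import warnings


class DataFrame:
    """Toy stand-in for a DataFrame (hypothetical, not vaex's class)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        return self._columns[name]

    def __call__(self, *expressions):
        # Legacy entry point in the spirit of df("x"): it still works,
        # but every call warns that it is going away.
        warnings.warn(
            "df(...) is a legacy API and will be removed; "
            "use df[...] and the DataFrame methods instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return [self[name] for name in expressions]


df = DataFrame({"x": [1, 2, 3]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    legacy_result = df("x")
```

`stacklevel=2` matters because Python's default filters (PEP 565) show `DeprecationWarning` only when the reported location is in `__main__`; pointing at the caller rather than the library makes the warning visible in scripts.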
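To make the SQL analogy concrete, here is a toy model in plain Python (hypothetical functions, not vaex's implementation): `"*"` means "count rows", an expression means "count non-missing values", and `binby` turns a count into a histogram. Because the binning step just evaluates an expression per row, binning on a derived value such as `sin(x)` works the same way. Expressions are plain callables here, and dropping values outside `limits` is an assumption of the toy. The toy `mean` rejects `"*"` with an explicit message, the kind of clearer error Juan asked for.

```python
import math


def _bin_index(value, limits, shape):
    """Bin number for value, or None if it is missing or outside limits."""
    lo, hi = limits
    if value is None or not (lo <= value < hi):
        return None  # assumption of this toy: out-of-range values are dropped
    return min(int((value - lo) / (hi - lo) * shape), shape - 1)


def count(rows, expression="*", binby=None, limits=(0.0, 1.0), shape=4):
    """Toy count over a list of dict rows (hypothetical, not vaex's API)."""

    def counted(row):
        if expression == "*":
            return True  # like SQL COUNT(*): every row counts
        return expression(row) is not None  # like COUNT(expr): skip missing

    if binby is None:
        return sum(1 for row in rows if counted(row))
    # binby evaluates an arbitrary expression per row, then histograms it
    hist = [0] * shape
    for row in rows:
        i = _bin_index(binby(row), limits, shape)
        if i is not None and counted(row):
            hist[i] += 1
    return hist


def mean(rows, expression):
    """Toy mean that rejects "*" with an explicit message."""
    if expression == "*":
        raise ValueError("mean() needs a column or expression; '*' only makes sense for count()")
    values = [v for v in map(expression, rows) if v is not None]
    return sum(values) / len(values) if values else math.nan


rows = [{"x": 0.1}, {"x": 0.4}, {"x": None}, {"x": 2.0}]
n_rows = count(rows)                 # 4: every row
n_x = count(rows, lambda r: r["x"])  # 3: the None is skipped
hist_sin = count(
    [{"x": v} for v in (0.0, 0.5, 1.0, 1.5, 3.0)],
    binby=lambda r: math.sin(r["x"]),
    limits=(0.0, 1.0),
    shape=2,
)                                    # [3, 2]
```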
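The difference Maarten describes can be sketched in plain Python with a hypothetical helper `write_chunks` (not vaex code). With ordinary `write()` calls, a full disk comes back from the OS as an error code that Python raises as `OSError`, so an export can clean up and fail with a readable message. An I/O error while touching a memory-mapped page is instead delivered as a signal (SIGBUS), which never becomes a Python exception. The disk-full condition is simulated below.

```python
import errno
import os
import tempfile


def write_chunks(path, chunks):
    """Write byte chunks with ordinary file IO (hypothetical helper).

    Failures such as a full disk arrive as OSError, so they can be caught,
    the partial file removed, and a readable error raised instead.
    """
    written = 0
    f = open(path, "wb")  # a failure to open propagates as a plain OSError
    try:
        with f:  # closing (and flushing) happens inside the try as well
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)  # bytes handed to write(), not fsynced
    except OSError as exc:
        os.remove(path)  # don't leave a truncated export behind
        raise RuntimeError(f"export to {path!r} failed after {written} bytes: {exc}") from exc
    return written


def disk_fills_up():
    """Simulated input: after one chunk, raise the error a full disk gives."""
    yield b"abc"
    raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))


with tempfile.TemporaryDirectory() as tmp:
    ok_path = os.path.join(tmp, "ok.bin")
    ok_bytes = write_chunks(ok_path, [b"abc", b"defg"])
    with open(ok_path, "rb") as fh:
        ok_content = fh.read()

    bad_path = os.path.join(tmp, "bad.bin")
    try:
        write_chunks(bad_path, disk_fills_up())
        error_message = None
    except RuntimeError as err:
        error_message = str(err)
    bad_file_left = os.path.exists(bad_path)
```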
-
Hi! Juan Luis from Orchest here 👋🏽 I'm writing a blog post on Vaex, and I'm collecting notes about what I find. Here are some intriguing things and errors I found while following the 11-minute intro:
- What is the difference between `df.x` and `df.col.x`? I force myself to type `df["x"]` for consistency, but `df.col.{colname}` looks like a nice way of namespacing columns as properties. Unsure if that's the original intent, though.
- `Expressions` are lazy, and vaex is so fast 🔥 that `helmi-dezeeuw-2000-FeH-v2-10percent.hdf5` becomes just too small to notice! (Not a criticism, quite the opposite.)
- `df.evaluate` is mentioned right after `df.select` and used further down in the visualization section, but its docstring says "Note that this is not how vaex should be used, since it means a copy of the data needs to fit in memory". Maybe including it in the quickstart should be reconsidered?
- `df.evaluate(..., selection=True)` also appears in the tutorial, but when trying to understand which values `selection` accepts, the docstring is not very helpful (`:param selection: selection to apply`). The visualization section gives a hint with `selection=[None, df.x < df.y, df.x < -10]`, but it's still unclear how it works.
- I called `df("x")` and got a `vaex.legacy.SubspaceLocal` object. If it's `legacy`, maybe this could emit a `DeprecationWarning` or similar? I then tried `df("x").mean()` and got `TypeError: __init__() got multiple values for argument 'info'`, so I stopped pursuing the matter.
- `df.count(binby="x")` also seems to accept arbitrary expressions (`df.count(binby="sin(x)")` ❓).
- Calling `df.mean("*", binby="x")` gave `TypeError: mean() missing 1 required positional argument: 'expression'`, which is not very informative.
- `nyctaxi` was a bit rough, though: I tried `nyctaxi["trip_distance"].mean()` and after some time the kernel just died. I missed the kind of progress feedback one gets with Dask, although I acknowledge that this might be out of scope.

Some of these questions I'll probably figure out as soon as I dive deeper into the documentation, but I thought it would be useful to gather them here since they were a bit surprising or unexpected. Hope it helps!
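On the `df.x` vs `df.col.x` question, the toy below (hypothetical `ToyFrame` and `ColumnNamespace`, not vaex's code) shows one reason a separate column namespace is handy. Plain attribute access has to share names with methods, so a column called `count` is shadowed by the `count()` method, while `col.count` and `df["count"]` always mean the column. The `__dir__` hook restricts tab completion on `col` to column names; whether that was vaex's original intent is an assumption here.

```python
class ColumnNamespace:
    """Columns only, as attributes: frame.col.<name> (toy, not vaex's class)."""

    def __init__(self, frame):
        self._frame = frame

    def __getattr__(self, name):
        # Only reached for names that aren't real attributes of this object.
        frame = self.__dict__.get("_frame")
        if frame is None or name not in frame.column_names:
            raise AttributeError(name)
        return frame[name]

    def __dir__(self):
        # Tab completion on frame.col.<TAB> offers column names only.
        return self._frame.column_names


class ToyFrame:
    def __init__(self, data):
        self._data = dict(data)
        self.col = ColumnNamespace(self)

    @property
    def column_names(self):
        return list(self._data)

    def __getitem__(self, name):
        return self._data[name]

    def __getattr__(self, name):
        # Fallback only: real methods/attributes such as count or col are
        # found first, so they shadow a column with the same name.
        data = self.__dict__.get("_data", {})
        if name not in data:
            raise AttributeError(name)
        return data[name]

    def count(self):
        return len(next(iter(self._data.values()), []))


df = ToyFrame({"x": [1, 2, 3], "count": [7, 8, 9]})
```

Here `df.x` works, but `df.count` is the method; `df.col.count` and `df["count"]` reach the column.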
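For the `selection` question, one plausible reading of the accepted values, to be checked against the vaex docstrings, is: `None`/`False` means all rows, `True` means the current selection made with `select()`, an expression means that filter, and a list gives one result per entry. The toy below encodes exactly that reading, with predicates as Python callables instead of vaex expressions; `ToyFrame` is hypothetical and not vaex's API.

```python
class ToyFrame:
    """Tiny row store illustrating one reading of `selection` (hypothetical)."""

    def __init__(self, **columns):
        self.columns = columns
        self._current = None  # predicate stored by select()

    def select(self, predicate):
        self._current = predicate

    def _rows(self):
        names = list(self.columns)
        return [dict(zip(names, values)) for values in zip(*self.columns.values())]

    def _mask(self, selection):
        rows = self._rows()
        if selection is None or selection is False:
            return [True] * len(rows)  # no selection: all rows
        if selection is True:
            if self._current is None:
                raise ValueError("selection=True, but nothing was selected")
            selection = self._current  # the current selection
        return [bool(selection(row)) for row in rows]  # an explicit filter

    def count(self, selection=None):
        if isinstance(selection, list):
            # one result per selection, in order
            return [self.count(s) for s in selection]
        return sum(self._mask(selection))


df = ToyFrame(x=[-20, -5, 1, 3], y=[0, 0, 0, 10])
df.select(lambda r: r["x"] < 0)
counts = df.count(selection=[None, True, lambda r: r["x"] < r["y"], lambda r: r["x"] < -10])
```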
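On progress feedback: vaex's aggregation methods appear to list a `progress` argument in their signatures, which may already cover part of this. Independently of vaex, the general pattern is a chunked computation that reports the fraction done to a callback and lets the callback cancel the work, as in this sketch with a hypothetical `chunked_mean`.

```python
import math


def chunked_mean(values, chunk_size=1000, progress=None):
    """Mean computed chunk by chunk (hypothetical helper, not vaex's API).

    After each chunk, progress(fraction_done) is called with a value in
    (0, 1]; if the callback returns False, the computation is cancelled
    and None is returned.
    """
    n = len(values)
    total = 0.0
    for start in range(0, n, chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        done = min(start + chunk_size, n)
        # Only an explicit False cancels, so callbacks that return None
        # (print, list.append, ...) just observe.
        if progress is not None and progress(done / n) is False:
            return None
    return total / n if n else math.nan


reports = []
result = chunked_mean(list(range(10)), chunk_size=4, progress=reports.append)
cancelled = chunked_mean(list(range(10)), chunk_size=4, progress=lambda f: f < 0.5)
```

Checking for `is False` rather than falsiness is a deliberate choice: it lets any logging-style callback be passed without accidentally stopping the computation.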