To pipe or not to pipe? #16
Comments
Maybe this is a silly question, but without some kind of piping operator, how would a text editor know when to stop indenting a query expression? For that matter, how would the interpreter know when the query expression is over?
Actually, that's not a silly question at all. That's an excellent observation. So, it'd instead be

    qry = @query tbl begin
        filter(name == "Niamh")
        select(age)
    end

Hmm. Well, now I don't know. Still not bad.
An 8-character fixed cost (begin/end) vs. a 2-character variable cost (|> per verb). How many verbs are in the typical query? The begin/end syntax matches what's in Lazy.jl and the @byrow! macro in DataFramesMeta.jl.
I think I'd prefer the
I think the big question is whether we ever intend to support functions that have more than one argument that needs to come from the previous step in the computation. If so, we might need something more complicated than line breaks. If not, I agree with @nalimilan that minimizing typing is nice.
@johnmyleswhite But in that case we'd need something more complicated than pipes too, right? Can you give an example of such a situation?
I don't have one offhand, but I imagine we'll need to be careful with things like:

    SELECT
        x
    FROM
        table1
    WHERE
        y IN (SELECT y FROM table2 WHERE y > 0)
As a sad old-fashioned Unix type, I am extremely fond of pipes, and I was delighted when The question (for me, and maybe not for others) is whether we want to introduce new syntax into the language in general. Is

    > max(1, 2)
    [1] 2
    > runif(1)
    [1] 0.1256291

becomes

    > library(magrittr)
    > 1 %>% max(2)
    [1] 2
    > 1 %>% runif
    [1] 0.1256291

I think this would be great, and I would be wholeheartedly behind it, and it would enhance the language. If not, then having this weird pipe operator that only worked in the context of Just my 2¢.
Yes.

    julia> 5 |> x -> x + 1
    6
We might also want to consider reserving block statements for the analog of common table expressions (CTEs), e.g.

    result = @query source(s) begin
        x = source(s) |> ...      # x is an alias for a subquery/CTE
        y = x/source(s) |> ...    # y is an alias for a subquery/CTE
        x/y/source(s) |> ...      # the last expression is the result
    end
^ In which case we want both the pipe operator and block expressions.

EDIT: John's example could possibly be expressed as

    qry = @query begin
        subq = table2 |>
            filter(y > 0) |>
            select(y)
        table1 |>
            filter(y in subq) |>
            select(x)
    end

The pipes are kind of noisy.

EDIT^2: Actually, one could analyze the data flow through

    qry = @query begin
        subq = table2
            filter(y > 0)
            select(y)
        table1
            filter(y in subq)
            select(x)
    end

based purely on the contents of the block expression. An expression consisting solely of a symbol (e.g.
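Here is a rough illustration of the EDIT^2 idea. This sketch splits a begin ... end block into separate chains using only the block's syntax. It assumes, as an illustration and not as anything settled in this thread, that an assignment or a bare symbol opens a new chain and that every other line is a verb applied to the current chain. `split_chains` is a hypothetical helper.

```julia
# Hypothetical helper: split a :block Expr into (alias => steps) chains using syntax alone.
# Assumed rules: `name = source` opens a named chain (a subquery/CTE alias),
# a bare Symbol opens an unnamed chain, and any other line is a verb on the current chain.
function split_chains(block::Expr)
    block.head === :block || error("expected a begin ... end block")
    chains = Vector{Pair{Union{Symbol,Nothing},Vector{Any}}}()
    for line in block.args
        line isa LineNumberNode && continue          # skip parser line info
        if line isa Expr && line.head === :(=)
            push!(chains, line.args[1] => Any[line.args[2]])
        elseif line isa Symbol
            push!(chains, nothing => Any[line])
        else
            isempty(chains) && error("block must start with a data source")
            push!(last(chains).second, line)         # append verb to the open chain
        end
    end
    return chains
end

chains = split_chains(quote
    subq = table2
    filter(y > 0)
    select(y)
    table1
    filter(y in subq)
    select(x)
end)
```

Because `subq` appears as an alias in the first chain and inside a verb of the second, a later pass could detect the dependency and emit a single statement. That pass is not shown here.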
A counter-argument to my proposal is the possibility of doing the assignments etc. outside of the macro rather than inside it, i.e.

    subq = @query table2 |>
        filter(y > 0) |>
        select(y)
    qry = @query table1 |>
        filter(y in subq) |>
        select(x)

or correspondingly,

    subq = @query table2 begin
        filter(y > 0)
        select(y)
    end
    qry = @query table1 begin
        filter(y in subq)
        select(x)
    end
@davidagold Good point. I had never noticed that

    julia> max(5, 6)
    6

    julia> 5 |> max(6)
    ERROR: MethodError: objects of type Int64 are not callable
     in |>(::Int64, ::Int64) at ./operators.jl:345

and had not thought through what the error message meant. Given that, and looking at your follow-up comment, I would definitely favor @davidagold's EDIT over EDIT^2, because the latter only makes sense to me in the context of the whitespace. However, the problem with this version of pipes (not an R-like macro and without unix-like
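Julia's `x |> f` simply evaluates `f(x)`, so its right-hand side must already be a function. In `5 |> max(6)`, the call `max(6)` is evaluated first and yields the integer 6. That is why the error complains that an `Int64` is not callable. The common workaround is to wrap the multi-argument call in an anonymous function:

```julia
# `x |> f` is defined as `f(x)`, so the right-hand side must evaluate to a function.
a = 5 |> (x -> x + 1)        # calls (x -> x + 1)(5)
b = 5 |> (x -> max(x, 6))    # wrap a multi-argument call in an anonymous function
c = max(5, 6)                # the plain call that `b` is equivalent to
println((a, b, c))
```

This explains why an R-style `1 %>% max(2)`, which inserts the left-hand side as the first argument, has no direct equivalent with Julia's built-in `|>`.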
I understand your concern, but I do rather like the look of the EDIT^2 version.
No, you're right -- it's not ideal.
@yeesian True, though I think this may be less flexible. It's not clear to me how you'd be able to generate the single SQL statement from John's example if you make two separate macro invocations. Part of the difficulty is that there's no type information for
Which matches the intuition that there might be things we want to express in a block statement that are otherwise hard or impossible to express with the piping proposal. It remains unclear exactly what those are, though.
Yeah, that's true. What about the case when there's interpolation syntax for it?
This is how these latest examples look in Query:

    q = @from i in table1 begin
        @where i.y in @from j in table2 begin
            @where j.y>0
            @select j.y
        end
        @select i.x
    end

Or you can split this into two queries:

    subq = @from i in table2 begin
        @where i.y>0
        @select i.y
    end

    q = @from i in table1 begin
        @where i.y in subq
        @select i.x
    end

I get around the type issue for the split case by having a two-pass system. The macros only syntactically transform the queries into a series of function calls. Those function calls generate an object graph that carries the type information. The end result is that the object graphs created by the first and second examples are identical (modulo one unimportant small difference). Not sure whether something similar could be done here, but maybe that would also work for jplyr.
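To make the two-pass idea concrete, here is a minimal sketch. It assumes the query verbs are lowered to constructors of immutable graph nodes rather than executed. The node types (`SourceNode`, `WhereNode`, `SelectNode`) are hypothetical and are not Query.jl's actual API. Julia's default equality for immutable structs compares contents, so the nested form and the split form of John's query build equal graphs:

```julia
# Hypothetical node types: each verb becomes an immutable node instead of running.
abstract type QueryNode end

struct SourceNode <: QueryNode
    table::Symbol
end

struct WhereNode <: QueryNode
    input::QueryNode
    column::Symbol
    op::Symbol        # comparison operator, e.g. :> or :in
    rhs::Any          # a constant, or another QueryNode for an `in` subquery
end

struct SelectNode <: QueryNode
    input::QueryNode
    column::Symbol
end

# Pass 1 (in a real design, produced by the macros): build the graph.
# Written inline, as a nested subquery:
nested = SelectNode(
    WhereNode(SourceNode(:table1), :y, :in,
              SelectNode(WhereNode(SourceNode(:table2), :y, :>, 0), :y)),
    :x)

# Written as two separate queries:
subq = SelectNode(WhereNode(SourceNode(:table2), :y, :>, 0), :y)
separate = SelectNode(WhereNode(SourceNode(:table1), :y, :in, subq), :x)

# Immutable structs compare by contents, so both forms yield the same graph,
# which a second pass could translate into a single SQL statement.
println(nested == separate)
```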
This issue does not concern the "one-off" macros (@select, etc.), which continue to be defunct. Rather, it concerns syntax within the @query macro. Currently, one conveys the intention to pipe the result of one query command to the next with the pipe operator:

But strictly speaking this is unnecessary. The pipe operator is seen by the macro, which could just as easily see the separation of Exprs within a :block Expr. Indeed, I can see three (EDIT: four) reasons to remove the use of pipes within the @query macro:

- @query but which are never actually run.
- Minimize keystrokes for the user (see @tcovert's comment below).
- |> is uncertain anyway, so perhaps it is best not to rely on it to convey any one particular thing.

The one good reason I can see for keeping |> is that it makes the intention of piping data explicit. But the newline could serve this just as well once it is established as a convention. So, would people prefer?? I'm leaning that way myself.
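Here is a minimal sketch of the newline-as-pipe proposal. A macro receives a begin ... end block as an `Expr` with head `:block`. It can therefore treat each line as the next step and thread the previous result in as the first argument, much like magrittr's `%>%`. The names `thread_block` and `@thread` are hypothetical and are not jplyr's API. Keyword arguments are not handled.

```julia
# Hypothetical sketch: treat each line of a begin ... end block as a pipeline step,
# inserting the previous result as the first argument of the next call.
function thread_block(ex)
    ex isa Expr && ex.head === :block || error("expected a begin ... end block")
    steps = filter(s -> !(s isa LineNumberNode), ex.args)   # drop parser line info
    isempty(steps) && error("empty block")
    acc = steps[1]                                          # the data source
    for step in steps[2:end]
        if step isa Expr && step.head === :call
            # f(a, b) becomes f(acc, a, b); keyword arguments are not handled
            acc = Expr(:call, step.args[1], acc, step.args[2:end]...)
        else
            # a bare function name or lambda f becomes f(acc)
            acc = Expr(:call, step, acc)
        end
    end
    return acc
end

macro thread(block)
    esc(thread_block(block))    # resolve names in the caller's scope
end

r1 = @thread begin
    5
    max(6)          # expands to max(5, 6) == 6
end

r2 = @thread begin
    [3, 1, 2]
    sort
    push!(10)
    sum             # sum(push!(sort([3, 1, 2]), 10)) == 16
end
```

Unlike `5 |> max(6)`, the second step here expands to `max(5, 6)`, because the macro rewrites the call instead of applying a function value.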
EDIT: This could also remove the need to have both @query and @qcollect. We could make one-line @query invocations collect automatically, whereas multiline invocations return a graph. That is,

would automatically collect, whereas

would return a graph.
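One way a single macro could tell the two forms apart is to inspect the `Expr` it receives: a begin ... end block arrives with head `:block`. This sketch assumes the one-line form is written with `|>`, as in the earlier examples. `query_mode` and `@querymode` are hypothetical names, and the symbols `:collect` and `:graph` stand in for the two behaviors.

```julia
# Hypothetical sketch: pick eager vs. lazy behavior from the shape of the macro's input.
function query_mode(args...)
    last_arg = args[end]
    if last_arg isa Expr && last_arg.head === :block
        return :graph      # multiline begin ... end form: return a query graph
    else
        return :collect    # one-line form: run and collect immediately
    end
end

macro querymode(args...)
    # Return the decision as a quoted Symbol so the caller can inspect it.
    QuoteNode(query_mode(args...))
end

one_line = @querymode tbl |> filter(age > 30)

multi = @querymode tbl begin
    filter(age > 30)
    select(name)
end
```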
EDIT EDIT: I suppose the above suggestion could be carried out with |> as well. I just happened to think of it while writing this issue.