To pipe or not to pipe? #16

davidagold · 2016-08-10T23:07:18Z

This issue does not concern the "one-off" macros (@select, etc.), which continue to be defunct. Rather, it concerns syntax within the @query macro. Currently, one conveys the intention to pipe the result of one query command to the next with the use of the pipe operator:

qry = @query tbl |>
    filter(name == "Niamh") |>
    select(age)

But strictly speaking this is unnecessary. The pipe operator is seen by the macro, which could just as easily see the separation of Exprs within a :block Expr. Indeed, I can see three (EDIT: four) reasons to remove the use of pipes within the @query macro:

1. Minimize the number of function calls that the user expresses within @query but which are never actually run.
2. ~~Minimize keystrokes for the user.~~ (see @tcovert's comment below)
3. The future of |> is uncertain anyway, so perhaps it is best not to rely on it to convey any one particular thing.
4. Less clutter.
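
A minimal sketch of what "seen by the macro" means here, assuming nothing beyond Base Julia: both spellings reach a macro as plain Expr trees, and the same list of verbs can be recovered from either one. The helper names `pipe_steps` and `block_steps` are hypothetical and are not part of this package.

```julia
# Hypothetical helpers for illustration only; not this package's implementation.

# Flatten a left-associative chain `a |> b |> c` into Any[a, b, c].
function pipe_steps(ex)
    if ex isa Expr && ex.head == :call && ex.args[1] == Symbol("|>")
        return vcat(pipe_steps(ex.args[2]), Any[ex.args[3]])
    end
    return Any[ex]
end

# Collect the statements of a :block Expr, skipping line-number metadata.
block_steps(blk::Expr) = Any[ex for ex in blk.args if !(ex isa LineNumberNode)]

piped = :(tbl |> filter(name == "Niamh") |> select(age))
blk = quote
    filter(name == "Niamh")
    select(age)
end

# Apart from the data source `tbl`, both forms carry the same verbs.
same_verbs = pipe_steps(piped)[2:end] == block_steps(blk)
```

In this sketch the pipe calls are never evaluated; they only mark where one step ends and the next begins, which is the same job the statement boundaries do in the block form.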

The one good reason I can see for keeping |> is that it makes the intention of piping data explicit. But this could be served just as well by the newline once that is established as a convention. So, would people prefer

qry = @query tbl
    filter(name == "Niamh")
    select(age)

?? I'm leaning that way myself.

EDIT: This could also remove the need to have @query and @qcollect. We could make it that one-line @query invocations collect automatically, whereas multiline invocations return a graph. That is,

@query filter(tbl, name == "Niamh")

would automatically collect, whereas

@query tbl
    filter(name == "Niamh")

would return a graph.

EDIT EDIT: I suppose the above suggestion could be carried out with |> as well. I just happened to think of it while writing this issue.


tcovert · 2016-08-11T03:08:20Z

Maybe this is a silly question but without some kind of piping operator, how would a text editor know when to stop indenting a query expression? For that matter, how would the interpreter know when the query expression is over?

davidagold · 2016-08-11T03:17:01Z

Actually, that's not a silly question at all. That's an excellent observation. So, it'd be instead

qry = @query tbl begin
    filter(name == "Niamh")
    select(age)
end

Hmm. Well, now I don't know. Still not bad.

tcovert · 2016-08-11T03:20:34Z

An 8-character fixed cost vs. a 2-character variable cost. How many verbs are in the typical query?

The begin/end syntax matches what's in Lazy.jl and the @byrow! macro in DataFramesMeta.jl.
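
The trade-off can be sketched roughly as follows, counting only operator and keyword characters (an assumption: whitespace and newlines are ignored). The function names are made up for this illustration.

```julia
# Characters spent on syntax markers for a query with n verbs, ignoring whitespace.
# One `|>` precedes each verb; `begin` and `end` each appear once.
pipe_cost(n) = length("|>") * n
block_cost(n) = length("begin") + length("end")

# Smallest number of verbs at which pipes cost at least as much as begin/end.
break_even = findfirst(n -> pipe_cost(n) >= block_cost(n), 1:100)
```

Under that count the two spellings cost the same at four verbs, and begin/end is cheaper for anything longer.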

nalimilan · 2016-08-31T12:41:16Z

I think I'd prefer the begin ... end version, which is more standard in Julia than repeating |> at the end of each line. Also, if you count the number of syntax markers instead of the number of characters (which is another interesting metric of cognitive load), begin ... end is a fixed cost of 2 words, while |> is a variable cost of at least 2 "words".

johnmyleswhite · 2016-09-01T19:55:31Z

I think the big question is whether we ever intend to support functions that have more than one argument that needs to come from the previous step in the computation. If so, we might need something more complicated than line breaks. If not, I agree with @nalimilan that minimizing typing is nice.

davidagold · 2016-09-01T20:04:41Z

@johnmyleswhite But in that case we'd also need something more complicated than pipes, too, right? Can you give an example of such a situation?

johnmyleswhite · 2016-09-01T20:51:33Z

I don't have one offhand, but I imagine we'll need to be careful with things like:

SELECT
    x
FROM
    table1
WHERE
    y IN (SELECT y FROM table2 WHERE y > 0)

richardreeve · 2016-09-01T21:02:44Z

As a sad old-fashioned unix type, I am extremely fond of pipes, and I was delighted when magrittr introduced them into R. I would be equally delighted if they were introduced into julia. However, they are simple, unix tools - "do one job, do it well" - and do not obviously improve the situation when there are multiple inputs (although I still use them then!). As such @davidagold is right that it doesn't help in @johnmyleswhite's example. I also think that minimising typing is a distraction.

The question (for me, and maybe not for others) is whether we want to introduce new syntax into the language in general. Is |> an infix operator that means "take the LHS and insert it as the first argument of the RHS of this operator"? That is what magrittr does in R:

> max(1, 2)
[1] 2
> runif(1)
[1] 0.1256291

becomes

> library(magrittr)
> 1 %>% max(2)
[1] 2
> 1 %>% runif
[1] 0.1256291

I think this would be great, and I would be wholeheartedly behind it, and it would enhance the language. If not, then having this weird pipe operator that only worked in the context of @query calls would be a terrible mistake, and even as a pipe enthusiast I would discourage it...

Just my 2¢.

davidagold · 2016-09-01T21:14:23Z

> Is |> an infix operator that means "take the LHS and insert it as the first argument of the RHS of this operator"?

Yes.

julia> 5 |> x -> x + 1
6

yeesian · 2016-09-01T21:17:20Z

> @johnmyleswhite But in that case we'd also need something more complicated than pipes, too, right? Can you give an example of such a situation?

We might also want to consider reserving block statements for the analog of common-table-expressions (CTEs), e.g.

result = @query source(s) begin
    x = source(s) |> ... # x is an alias for a subquery/CTE
    y = x/source(s) |> ... # y is an alias for a subquery/CTE
    x/y/source(s) |> ... # the last expression is the result
end

davidagold · 2016-09-01T21:21:31Z

^ In which case we want both the pipe operator and block expressions.

EDIT: John's example could possibly be expressed as

qry = @query begin
    subq = table2 |>
        filter(y > 0) |>
        select(y)
    table1 |>
        filter(y in subq) |>
        select(x)
end

The pipes are kind of noisy.

EDIT^2: Actually, one could analyze the data flow through

qry = @query begin
    subq = table2
        filter(y > 0)
        select(y)
    table1
        filter(y in subq)
        select(x)
end

based purely on the contents of the block expressions alone. An expression consisting solely of a symbol (e.g. table1) or of an assignment (e.g. subq = table2) -- call these data source expressions -- could signal the start of data flow through subsequent manipulation verbs until the next data source expression or the end of the block is reached.
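
A rough sketch of that rule, assuming only Base Julia. The names `is_source` and `split_flows` are hypothetical, and a real implementation would also have to decide what to do with other kinds of expressions.

```julia
# Hypothetical sketch of the grouping rule; not this package's implementation.

# A data source expression is a bare symbol or an assignment.
is_source(ex) = ex isa Symbol || (ex isa Expr && ex.head == :(=))

# Split a :block Expr into flows, each starting at a data source expression
# and running until the next data source expression or the end of the block.
function split_flows(blk::Expr)
    flows = Vector{Vector{Any}}()
    for ex in blk.args
        ex isa LineNumberNode && continue   # skip line-number metadata
        if is_source(ex)
            push!(flows, Any[ex])
        else
            isempty(flows) && error("manipulation verb before any data source")
            push!(last(flows), ex)
        end
    end
    return flows
end

blk = quote
    subq = table2
        filter(y > 0)
        select(y)
    table1
        filter(y in subq)
        select(x)
end

flows = split_flows(blk)
```

Here `flows` holds two groups, one headed by `subq = table2` and one headed by `table1`. The indentation plays no role: the parser already treats each line as a separate statement.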

yeesian · 2016-09-01T21:56:32Z

A counter-argument to my proposal would be the possibility of doing the assignments, etc. outside of the macro rather than inside it, i.e.

subq = @query table2 |>
        filter(y > 0) |>
        select(y)
qry = @query table1 |>
        filter(y in subq) |>
        select(x)

or correspondingly,

subq = @query table2 begin
        filter(y > 0)
        select(y)
end
qry = @query table1 begin
        filter(y in subq)
        select(x)
end

richardreeve · 2016-09-01T22:19:40Z

@davidagold Good point. I had never noticed that |> was in core julia, and now I'm feeling a bit stupid. I had only tried it as:

julia> max(5, 6)
6

julia> 5 |> max(6)
ERROR: MethodError: objects of type Int64 are not callable
 in |>(::Int64, ::Int64) at ./operators.jl:345

and had not thought through what the error message meant. Given that is true, and looking at your followup comment, I would definitely advocate @davidagold's EDIT to EDIT^2, because the latter only makes sense in the context of the whitespace to me.

However, the problem with this version of pipes (not an R-like macro and without unix-like -X option syntax) is that presumably it becomes very hard to write functions that use pipe inputs because a separate method has to be written for every non-pipe method that lacks the first argument and returns a function instead of a result. Or am I misunderstanding this?
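
To illustrate the concern with a made-up verb (`atleast` is hypothetical and exists only for this sketch): for a call to work on the right-hand side of Base's |>, the function needs an extra method that drops the first argument and returns a function.

```julia
# Hypothetical verb used only for this sketch.
atleast(x, y) = max(x, y)          # ordinary two-argument method
atleast(y) = x -> atleast(x, y)    # extra one-argument method that returns a function

direct = atleast(5, 6)
piped = 5 |> atleast(6)            # |> calls the returned function on 5
```

Every verb that should be pipeable this way would need a companion method of the second kind.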

davidagold · 2016-09-01T23:48:52Z

> I would definitely advocate @davidagold's EDIT to EDIT^2, because the latter only makes sense in the context of the whitespace to me.

I understand your concern, but I do rather like the look of the EDIT^2 version.

> Or am I misunderstanding this?

No, you're right -- it's not ideal.

> a counter-argument to my proposal will be the possibility of having the assignments/etc done outside of the macro, rather than inside the macro,

@yeesian True, though I think this may be less flexible. It's not clear to me how you'd be able to generate the single SQL statement from John's example if you make two separate macro invocations. Part of the difficulty is that there's no type information for subq in the second macro (indeed, without interpolation syntax it would be interpreted as a column name). On the other hand, if subq = table2 ... is seen by the same @query call that sees table1 ..., then it can register the LHS of subq = table2 as an alias for a subquery and act on each subsequent instance of the symbol subq accordingly. This may make a difference if the result of collecting on subq as you've defined it above doesn't fit in memory.

yeesian · 2016-09-02T01:20:36Z

> True, though I think this may be less flexible.

Which matches the intuition that there might be things we want to express in a block statement that are otherwise hard or impossible to express with the piping proposal. It remains unclear exactly what those are, though.

> Part of the difficulty is that there's no type information for subq in the second macro (indeed, without interpolation syntax it would be interpreted as a column name).

Yeah, that's true. What about the case when there's interpolation syntax for it?

davidanthoff · 2016-09-02T04:45:00Z

This is how these latest examples look in Query:

q = @from i in table1 begin
    @where i.y in @from j in table2 begin
                      @where j.y>0
                      @select j.y
                  end
    @select i.x
end

Or you can also split this into two queries:

subq = @from i in table2 begin
    @where i.y>0
    @select i.y
end

q = @from i in table1 begin
    @where i.y in subq
    @select i.x
end

I get around the type issue for the split case by having a two-pass system: the macros only syntactically transform the queries into a series of function calls, those function calls generate an object graph that has type information in it, and the end result is that the object graphs that get created by the first and second example are identical (modulo one unimportant small difference). Not sure whether something similar could be done here, but maybe that would also work for jplyr.

This was referenced Sep 4, 2016

Roadmap to 0.1.0 #19

Open

Query syntax issue #21

Open
