Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macros that take only symbols for arguments #2

Open
davidagold opened this issue Jul 20, 2016 · 9 comments
Open

Macros that take only symbols for arguments #2

davidagold opened this issue Jul 20, 2016 · 9 comments

Comments

@davidagold
Copy link
Owner

davidagold commented Jul 20, 2016

Now that one can dispatch on macro arguments, it is easy to write two distinct @filter "methods" -- one that expects a data source as a first argument,

macro filter(input::Symbol, conds::Expr...)
    ...
end

and one that expects only filtering expressions

macro filter(conds::Expr...)
    ...
end

However, with verbs like select, the situation is more complicated, because the arguments to the macro are all Symbols. That is, macro dispatch cannot tell the difference between being passed a data source and columns and just being passed columns: @select(iris, SepalLength, Species) and @select(SepalLength, Species) see the same argument signature. The upshot is that we cannot distinguish between the case in which a data source, say iris is piped to @select(SepalLength, Species) and the case in which it is provided as an argument. So it's not clear to me how to provide support for both cases. Some options:

  1. Try to do analysis of the first symbol passed to @select(args...), e.g. check isdefined(args[1]) && isa(eval(args[1]), DataFrame). This is fairly inflexible - in particular, it will be incorrect if the data source is not defined in global scope.
  2. Require data sources passed to @select to be somehow wrapped in order to turn them into Expr objects as registered at macroexpand-time: e.g. @select(Data.iris, SepalLength, Species).
  3. Defer creation of the data manipulation DAG to run-time and have the DAG generating function (_select) dispatch on the type of the first argument. If the type of the first argument is not a data source, return a curried function. EDIT: The main difficulty here is that the first argument may or may not be a valid name.
  4. Only support either @select(iris, SepalLength, Species) or iris |> @select(SepalLength, Species), but not both.

1 seems like a non-starter. 3 may be tricky, because (as explained in #1), for @filter we create the DAG at macrocompile time. While @select doesn't require this, it might make DAG creation from chaining within an @query call difficult. I will defer choosing between 2 and 4 until I am convinced that 3 cannot be made to work.

davidagold added a commit that referenced this issue Jul 21, 2016
@davidagold
Copy link
Owner Author

Option 5: Use a try-catch block to run assuming the first argument is a datasource. This strategy is described in more detail in #3. I have adopted it for the time being, but I'm leaving this issue open because I don't think the last word has been said.

@bramtayl
Copy link

bramtayl commented Aug 3, 2016

Maybe only support @select(iris, SepalLength, Species) and use a chaining macro (there are several existing ones to choose from, Lazy.jl, Functional.jl, ChainMap.jl, Pipe.jl) to take care of the rest? Note that some of these, like ChainMap.jl, support the creation of anonymous functions.

@davidagold
Copy link
Owner Author

The try-catch solution seems to be working well for now. A chaining macro would really just give the functionality already provided in @query. I want users to be able to write iris [pipe] @select(Species), and a macro, unless it's infix, doesn't help there because it can't see iris.

@davidagold
Copy link
Owner Author

Actually, I just realized that the present solution still runs into trouble if somebody does something like

a = [1,2,3]
tbl = Table(a = a)
tbl |> @select(a)

It's still possible to handle this case with the try-catch solution, but it remains to be seen how many other such cases I'll have to throw in...

@davidagold
Copy link
Owner Author

Hmm, I'm now much less convinced that the try-catch solution is appropriate going forward. The following

tbl = Table(a = [1,2,3])
@select(ttbl, a)

won't throw an error complaining about there being no such object ttbl, but rather it'll return an anonymous prepared to select the columns named ttbl and a from a piped input. Seems like this is a great way to introduce sneaky bugs.

I'm starting to despair at the possibility of providing piped versions of commands whose arguments apart from the data source can be symbols. I'm definitely not a fan of providing some pipable commands and some not pipable ones. Perhaps @bramtayl is right in that only non-pipable one-off commands should be offered, and people can rely on Lazy.jl to chain them. Of course, if people don't mind calling collect on a query object, they can just use @query for piping.

@johnmyleswhite
Copy link
Collaborator

I would suggest the opposite: require that data be explicitly fed into a computational graph. See the strategy in TensorFlow for an example: https://www.tensorflow.org/versions/r0.10/how_tos/reading_data/index.html#feeding

@davidagold
Copy link
Owner Author

So, simply have no "one-off" macros at all?

@johnmyleswhite
Copy link
Collaborator

Possibly.

@davidagold
Copy link
Owner Author

I'm not opposed to that. It would bring us closer to a "one way to do things" set up, which I like. Though it might be nice to have a variant of @query that automatically collects the resultant graph. @querycollect or something.

@davidagold davidagold mentioned this issue Aug 8, 2016
Merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants