Parametrized queries/interpolation #22
Comments
Not sure this can also work for jplyr, but in Query I'm using closures for this kind of scenario:
data = collect(1:10)
c = 3
q = @from i in data begin
    @where i>c
    @select i
end
for j in 1:5
    c = j
    println(collect(q))
end
does what you would want it to do, and it does so by creating a closure that captures `c`.
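The capture behaviour described above can be reproduced in plain Julia without Query.jl. This is only a minimal sketch: `run_queries` and `pred` are illustrative names, and `filter` stands in for collecting the query.

```julia
# Plain-Julia analogue of the Query.jl example: the predicate closes over the
# variable `c` itself, not over the value `c` had when the closure was created.
function run_queries()
    data = collect(1:10)
    c = 3
    pred = i -> i > c            # closure capturing the variable `c`
    results = Vector{Vector{Int}}()
    for j in 1:5
        c = j                    # re-assign the captured variable
        push!(results, filter(pred, data))
    end
    return results
end

results = run_queries()
```

Each pass through the loop sees the current value of `c`, so the first result keeps the elements greater than 1 and the last keeps those greater than 5.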
I can see one-off queries that use interpolation being a sufficiently common use case, one that combines the steps of preparing statements and binding parameters. The third of my proposals is whether we should have
qry = @query tbl |>
    where(A > $c)
    summarize(avg_B = mean(B))
as a stand-in for
qry = @query tbl |>
    where(A > :c)
    summarize(avg_B = mean(B))
bind!(qry, c = c)
Responding to the issues that have surfaced:
parsing the first argument
My preference is for us to not make the parsing of the first argument ("primal data source"?) too special or different from the parsing of the query arguments. Because conceptually we could view
as
in which case
specifying data type for parameterized queries
If we want to provide type information, can we do using
@davidanthoff Yeesian and I have experimented with using closures, but we had some difficulty getting type inference to correctly predict the return type of a lambda that produces the captured value in cases where that value is re-assigned, even inside a function. Compare the following:
# this is fine
function test1()
    c = 1
    λ = () -> c
    @code_warntype λ()
end
# this is not fine
function test2()
    c = 1
    λ = () -> c
    for _c in [1, 2, 3]
        c = _c
        @code_warntype λ()
    end
end
The only way I've thought of to fix this is to wrap the lambda in some
The question then is, how should users declare the type of
@davidagold I think these are all instances of JuliaLang/julia#15276, so I'm kind of hoping that at some point that gets fixed and then the closure approach should always work :) I'll add your latest example to that issue as well, if you don't mind.
@davidanthoff Please do.
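One way the wrapper idea mentioned above might look is a callable struct that carries the parameter in a typed `Ref`, so the parameter's type is part of the wrapper's type even though its value changes. This is a sketch under assumptions, written in current Julia syntax: `ParamKernel`, `setparam!` and `count_matching` are hypothetical names, not anything from jplyr or Query.jl.

```julia
# Hypothetical wrapper: the parameter type T is a type parameter of the wrapper,
# so calling it does not depend on inferring a re-assigned captured variable.
struct ParamKernel{F,T}
    f::F
    param::Base.RefValue{T}
end

# Convenience constructor: wrap the initial value in a Ref.
ParamKernel(f, x) = ParamKernel(f, Ref(x))

# Calling the wrapper passes the current parameter value as a second argument.
(k::ParamKernel)(row) = k.f(row, k.param[])

# Re-bind the parameter in place; the new value is converted to T.
setparam!(k::ParamKernel, x) = (k.param[] = x; k)

# Function barrier: inside here the concrete type of `k` is known.
function count_matching(k, data)
    n = 0
    for a in data
        k(a) && (n += 1)
    end
    return n
end

k = ParamKernel((a, c) -> a > c, 1)
counts = [count_matching(setparam!(k, c), 1:10) for c in [1, 2, 3]]
```

The trade-off is that the parameter's type has to be fixed when the wrapper is built, which is exactly the question of how users would declare it.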
@yeesian I should be clearer. By a "query argument" I just mean an expression (that may consist solely of a variable name) in which unadorned names are assumed to refer to attributes (column names) of a pre-specified data source. So, in
The same is true for joins. In the following,
@query join(tbl1, tbl2, A == B)
the first two arguments of
I'm realizing now that the present discussion needs to be part of a larger discussion about how names are resolved inside of a "query context" in an
@davidagold I know you previously wrote that jplyr would work with any data source, not just things that have a column like structure, but I don't understand how that works with the syntax here. It seems that e.g. the
How would one for example query an array of
Those are good questions. It really comes down to defining a collection machinery that implements the semantics you're looking for. Part of this is understanding what is stored in a
julia> using jplyr
julia> A = collect(1:10);
julia> qry = @query filter(A, isodd(a))
Query with Array{Int64,1} source
If I look at the
julia> dump(qry.graph)
jplyr.FilterNode
  input: jplyr.DataNode
    input: #undef
  args: Array{Union{Expr,Symbol}}((1,))
    1: Expr
      head: Symbol call
      args: Array{Any}((2,))
        1: Symbol isodd
        2: Symbol a
      typ: Any
  helpers: Array{jplyr.FilterHelper{F}}((1,))
    1: jplyr.FilterHelper{##3#4}
      f: #3 (function of type ##3#4)
      arg_fields: Array{Symbol}((1,))
        1: Symbol a
The
julia> function jplyr._collect(A::Array{Int}, q::jplyr.FilterNode)
           res = Array{Int}(0)
           f = q.helpers[1].f
           for a in A
               f((a,)) && push!(res, a)
           end
           return res
       end
julia> collect(qry)
5-element Array{Int64,1}:
 1
 3
 5
 7
 9
In these semantics, the name
It would probably be useful to add some sort of syntax akin to what you have in Query.jl, e.g.
since without as much there's no way to enforce coherence of placeholder names from verb to verb. But it's not strictly necessary, as evidenced above. Does this help?
EDIT: This also perhaps isn't the most on-topic discussion.
EDIT2: But it is making me think about some interesting syntax issues/enhancements, so I do appreciate it.
Thanks for the explanation!
My pleasure.
Also, I suppose I was saying things like "assumed to be an attribute" above =p So really, I should generalize the vocabulary that deals with names in query arguments. Rather than saying that names in the context of query arguments are assumed to be attributes, I should say that they are assumed to refer to some aspect of the data source, where this aspect is (generally) an attribute when the data source is a tabular data structure. So, David's comment is really more relevant to the present discussion than I initially acknowledged.
This issue is a proper home for the discussion that ended up emerging in #20 re: syntax and semantics for parametrized queries (i.e. (?) prepared statements) and interpolation (the original issue concerned the functionality of extending existing `Query`s via `@query`).

By interpolation we mean the ability for users to designate a name within an `@query` invocation as referring to a value in the scope surrounding the `@query` context. For instance, letting `$` denote an interpolated name (value?), the following query

would be equivalent to

This sort of functionality is (I think) relatively straightforward. However, it is not the most efficient means of satisfying what I expect will be the dominant use patterns involving this functionality. These use patterns involve collecting a `Query` over an array of values for query parameters, e.g.

The above is suboptimal because it generates a new `Query` object, with a new `graph` and new `QueryHelper`s for each value of `c`, where the structure of each of the foregoing objects is actually invariant. What would be most efficient would be to generate a single `Query` with a parameter `c` that can be assigned values from `consts` without having to regenerate the `Query` each time. This requires syntaxes that (i) allow users to designate parameters within their queries and (ii) allow users to bind parameters to specific values before `collect`ing the `Query`s. I envision something like the following in place of the interpolation solution above:

My plan is that parameter information will be stored in some mutable aspect of the `Query` object, say in an object of type `Parameters`, and that users will use `bind!` to bind each parameter to a specific value via keyword arguments, where each key specifies the name of the parameter to bind.

It will be important, at least for `collect` machinery for in-memory Julia tabular data structures, that pulling a query parameter from a `Query` object and passing it, say, to a filtering kernel (lambda) be type-inferable in order to avoid boxing the result of applying the lambda to the row and parameter arguments.

In the normal case of applying such a lambda just to the values of an iterator over `Table` rows (i.e. without any query parameters), type information about such row values (which are tuples) is propagated through the type of the iterator, which is just the result of `zip`ping together the individual columns that are arguments to the filtering lambda. That is to say, the type parameters of the zipped iterator convey the element types of the relevant columns of the `Table` data source, since those element types are themselves represented as type parameters of the columns (which are `NullableArray`s). As long as this iterator is passed through a function barrier before being iterated over, type inference can identify the type of the argument (a tuple) passed to the filtering kernel.

In the case of a parametrized query, an obvious implementation of this functionality involves passing the parameters from the appropriate field of the `Query` object (after they've been bound) to, in this case, the filtering kernel as a second argument tuple. However, if the mutable structure in which the parameters are stored does not convey the type information of the parameters, then this information may not propagate to the point at which the kernel is actually applied to the row-tuple and the parameter-tuple, which may result in the compiler boxing the result of applying the kernel. Of course, we may assume that a filtering kernel always returns a `Nullable{Bool}`, but in the case of, say, a select kernel we can't make any such assumptions. I think (but could be wrong about this) that the only sure-fire way to avoid such boxing is to convey the type information of the parameters in the type of the `Parameters` object that wraps them.

Thus `Parameters` should be a parametric type, and in order for us to generate useful type parameters, we need to know the types of the parameters before they are ever bound in the `Query`. This suggests a syntax for declaring query parameters and their types. A natural candidate is type assertion:

This syntax choice would preclude use of the `::` type assertion syntax in query arguments. I would be okay paying this price.

It may turn out, however, that we can find a way to communicate the type information of parameters to the compiler without storing it in the `Parameters` wrapper, in which case `$` syntax for designating parameters would be sufficient.

Note that query parametrization is solely a matter of designating parameters within "query arguments", that is, non-data source arguments to manipulation verbs within an `@query` invocation. The present package achieves an analogue of parametrization for data sources by means of dummy sources. As mentioned in comments in #20, this syntactic distinction reflects a conceptual distinction: one collects the same query against multiple backends by means of dummy sources, and one collects a query with constant structure but varying values against a fixed backend by using parametrization.

Thoughts?
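
To make the proposal concrete, here is a rough mock-up in plain Julia (current syntax) of a single query object whose parameters live in a parametric `Parameters` wrapper and are re-bound with `bind!` before each collection. It is only a sketch under assumptions: `MockQuery`, `collect_query` and `_collect` are hypothetical names, the `bind!` shown is a stand-in rather than jplyr's implementation, and a named tuple is assumed as the storage for parameter values.

```julia
# Hypothetical mock-up; none of this is jplyr's actual implementation.

# The parameter names and types are part of the wrapper's type (T),
# while the values themselves can be re-bound.
mutable struct Parameters{T<:NamedTuple}
    vals::T
end

# A query with fixed structure: one filtering kernel plus its parameters.
struct MockQuery{F,P<:Parameters}
    kernel::F          # (row, params) -> Bool
    params::P
end

# Bind parameters by keyword. Assigning to the typed field converts the new
# values to the declared types, so the type of `q` never changes.
function bind!(q::MockQuery; kwargs...)
    q.params.vals = merge(q.params.vals, values(kwargs))
    return q
end

# Pull the parameters out once, then cross a function barrier so that the
# inner loop is compiled for their concrete types.
collect_query(q::MockQuery, data) = _collect(q.kernel, q.params.vals, data)
_collect(kernel, params, data) = [row for row in data if kernel(row, params)]

# One query object, declared with an Int parameter `c`, collected repeatedly.
q = MockQuery((a, p) -> a > p.c, Parameters((c = 0,)))
results = [collect_query(bind!(q, c = c), 1:5) for c in [1, 3]]
```

Here the initial value `(c = 0,)` plays the role of the type declaration: it fixes `c` as an `Int` before anything is bound, which is the information the type-assertion syntax would otherwise supply.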