Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Tables.jl interface and ag-grid #6

Merged
merged 12 commits into from
Feb 19, 2019
Merged

Use Tables.jl interface and ag-grid #6

merged 12 commits into from
Feb 19, 2019

Conversation

pfitzseb
Copy link
Member

@pfitzseb pfitzseb commented Oct 19, 2018

This works fine for small(ish) tables already (< 100k rows). For everything bigger than that we should use the Infinite View Model instead, but that requires more communication between Julia and the browser. We'd also need to implement sorting and filtering on the Julia side in that case (or just disable it).
image

@davidanthoff
Copy link
Contributor

Could we make sure that TableTraits.jl sources keep working? The old version supports any TableTraits.jl source, so it would be great to not remove that functionality.

@pfitzseb
Copy link
Member Author

Yeah, definitely. My main goals for this PR are to

  • support all table interfaces
  • be able to display arbitrarily large tables easily

Editing and Julia-side filtering/sorting can come after that.

@piever
Copy link
Contributor

piever commented Oct 21, 2018

For inspiration for filtering you can look at TableWidgets (see for example here for the demo at juliacon https://youtu.be/wFzkNSoKo0U?t=640), in particular the 3 selectors (discrete, categorical and arbitrary).

I think it'd be nice to do this using the Widgets interface (see here), which means that showtable would return a Widget object like:

Widget{:table}([:input => t], output = o, scope = w, layout = scope)

where the :input is the table the user is passing, the scope is the WebIO scope you are creating and output is the result of filtering and sorting. In this way the user can "save" the result of manipulations on the table. They could even feed the widget as observable input to another widget to create a pipeline.

@pfitzseb
Copy link
Member Author

image

@pfitzseb
Copy link
Member Author

Works remarkably well with "big data":
image

Copy link
Contributor

@davidanthoff davidanthoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I've added some comments, there are a whole bunch of queries this won't handle as is.

Also, I've tried this from the REPL, but nothing happened, same for Jupyter Lab. Things did show up in Jupyter notebook.

src/TableView.jl Outdated
function showtable(x; dark = false)
if Tables.istable(typeof(x))
return _showtable(x, dark)
elseif TableTraits.isiterabletable(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to handle the case where TableTraits.isiterabletable returns missing, like the DataFrame constructor here.

Those cases are a bit tricky to generate right now [*] but they will become much more common with hopefully the next release of Query.jl. These are essentially cases where Query.jl returns a lazy iterator but wasn't able to use type inference to determine the eltype of the iterator, and so a source needs to be able to handle that case. The general strategy in those cases is to look at the type of the first iterated element to figure out the schema. For iterable tables, there is a rule that they must iterate only elements of the same type, so looking at the first element should be enough to handle all iterable table sources.

[*] The shortest example right now is TableView.showtable(rand()>0.00000000001 ? i : true for i in df). I think there is a plan to handle to subsume generators under Tables.istable (that I disagree with), at which point this particular example would be moot. But once I've made the change in Query.jl that I mentioned, this will show up independently of this example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I saw that case in the DataFrame constructor but couldn't figure out what it meant.

Isn't isiterabletable a bit of a misnomer it can return missing for something that most definitely is an iterable table, albeit with undetermined eltype?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in general there are cases where isiterabletable just can't figure out whether something is a table or not. A generator that returns a stream of Ints is not a table, but it doesn't have an element type, so you can't tell until you start to iterate. So, of course a source either is or is not a table, but we don't know that when we call isiterabletable, so missing seemed appropriate?

What is missing is the information whether something is a table or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I guess -- I definitely haven't put as much thought into it as you :)

src/TableView.jl Outdated
function showna(xs::Columns)
rows(map(showna, columns(xs)))
function _showtable(table, dark)
length(Tables.rows(table)) > 10_000 ? _showtable_async(table, dark) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check IteratorSize before calling length. Any Query.jl query that includes a @filter will have IteratorSize(q)==SizeUnknown(), and so a sink needs to be able to handle sources with unknown length.

src/TableView.jl Outdated
headers = colnames(subt)
cols = [merge(Dict(:data=>n), get(colopts, n, Dict())) for n in headers]
function _showtable_sync(table, dark)
schema = Tables.schema(table)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be some code path that handles situation where no schema is available. I think this function returns nothing in that case, but I'm not 100% sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does. I guess we should just not print a header in that case? Or is there any other mechanism to handle schemaless tables?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, they are not really schema less, it is just that you don't know the schema until you start iterating. So the strategy would be to iterate the first element, look at the type of that and that should tell you what the column names and types are.

src/TableView.jl Outdated
ks = collect(ks)
vs = collect(vs)
function _showtable_async(table, dark)
schema = Tables.schema(table)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here, needs to handle schemaless sources.

@davidanthoff
Copy link
Contributor

Just one other general pointer: I've started adding support for the application/vnd.dataresource+json MIME type to Queryverse. nteract has a native table viewer built in for that MIME type that is pretty nice. I think the Jupyter Lab folks are also considering adding native support, although it is unclear to me how active that effort is.

Currently the only package that exports this support on my end is VegaDatasets.jl. If you open a notebook in nteract and then run:

using VegaDatasets

dataset("cars")

you can see how that looks.

@pfitzseb
Copy link
Member Author

Thanks for the comments, @davidanthoff! I think I've addressed most of those.

Also, I've tried this from the REPL, but nothing happened, same for Jupyter Lab. Things did show up in Jupyter notebook.

That's a WebIO problem, I think, but I hope that'll be solved soon.

@shashi
Copy link
Contributor

shashi commented Oct 23, 2018

Awesome stuff. cc @andreasnoack @joshday could you check your tables?

@piever
Copy link
Contributor

piever commented Oct 24, 2018

Really nice work!

I like the performance a lot for normal datasets, but I'm having issues to get it to work efficiently for a Observable of tables. Meaning something like:

using RDatasets, Blink, TableView, Observables
school = RDatasets.dataset("mlmRev","Hsb82")
w = Window()
body!(w, map(TableView.showtable, Observable(school)))

Gives me an unresponsive window for quite some time (whereas using it directly on school is pretty fast). I wonder where the performance issue comes from and whether it's easy enough to add support for Observables in TableView.showtable directly (so that TableView.showtable(Observable(school)) would work already).

EDIT: actually checking out Simon's branch JuliaGizmos/WebIO.jl#208 in WebIO seems to mitigate the performance issue. Still, if adding support for Observables is straightforward, it'd definitely be nice to have.


end # module
String(take!(io))
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unfortunate thing is that I can't figure out how to transfer the data with WebIO without an additional JSON.parse on the javascript side. Any ideas on how to circumvent that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a type in JSON.jl called JSONString which was introduced for this purpose

deps/build.log Outdated
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 84 100 84 0 0 497 0 --:--:-- --:--:-- --:--:-- 497
100 91448 100 91448 0 0 493k 0 --:--:-- --:--:-- --:--:-- 493k
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit pick but maybe this should be omitted and in .gitignore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/TableView.jl Outdated
empty!(ag_grid_imports)
for f in ["ag-grid.js", "ag-grid.css", "ag-grid-light.css", "ag-grid-dark.css"]
path = normpath(joinpath(@__DIR__, "..", "deps", "ag-grid", f))
AssetRegistry.register(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you omit this line, I think we can drop the explicit dependency on AssetRegistry. WebIO will automatically make this translation if it sees a file path. (I don't know if that's a 100% good idea, but it seemed like it before; let me know if you have any opinions about this!).

Also, if you just register joinpath(@__DIR__, "..", "deps", "ag-grid") then you can serve any file under that path (this is useful if your files import other files under this directory) this does require AssetRegistry dependency though.

However, I'm fine with merging this as it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I've removed the dependency on AssetRegistry now. It's probably safer to have it, but dunno.

names = schema.names
types = schema.types
end
w = Scope(imports = ag_grid_imports)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shashi Is there a way to directly style the div introduced by the Scope? This results in something like

<div class="webio-mountpoint interactbulma" data-webio-mountpoint="13462147745702835718">
  <div class="webio-scope" data-webio-scope-id="scope-4acc84b9-6adc-4a65-8534-e785737a7c2d">
    <div id="grid" class="ag-theme-balham" style="height: 100%; min-height: 200px; width: 100%;">

right now, but I'd also like to give the Scope (and the mountpoint, I guess) a style to allow piping through the parent's width and height.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have that right now, but the way to do it would be allow s(style=Dict(:foo => bar)) to work where s is a Scope (just like it works for Node)...

@shashi
Copy link
Contributor

shashi commented Nov 6, 2018

Nice! Can we merge this? (you can do it when you're ready)

@pfitzseb
Copy link
Member Author

pfitzseb commented Nov 7, 2018

I can't figure out how to properly style this such that it looks good in various frontends (i.e. without having to actually specify a width or height that's not relative to the parent container).

@pfitzseb pfitzseb merged commit 3179e08 into master Feb 19, 2019
@pfitzseb pfitzseb deleted the sp/tables branch February 19, 2019 16:25
@pfitzseb
Copy link
Member Author

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants