WIP: Tiling #636
Conversation
Can you add a notebook showing how to use it?
This looks great, but I'd also love to see a notebook so I can play around with it. The super-tiling approach seems very sensible and should reduce a major bottleneck that I encountered in my implementation.

In the absence of complex spatial indexing, I've added another optimization to my implementation, which uses datashader's 'any' aggregation, aligned with the tile grid, to find tiles that can safely be skipped because they are empty; i.e., I generate an aggregate where each pixel corresponds to one tile and indicates whether that tile is empty or not. For higher zoom levels this has resulted in major speedups, especially on sparse data like the census dataset.
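The empty-tile idea above can be sketched without datashader itself: a coarse 2D histogram where each bin corresponds to one tile plays the role of the one-pixel-per-tile 'any' aggregate. The dataset, extent, and grid size below are made up for illustration; the actual implementation uses datashader's Canvas and ds.any().

```python
import numpy as np

# Toy sparse dataset in projected (meter-like) coordinates.
x = np.array([10.0, 12.0, 900.0])
y = np.array([5.0, 7.0, 950.0])

# One histogram bin per tile: a 4x4 tile grid covering the extent.
# (np.histogram2d's first axis follows x, second follows y.)
n_tiles = 4
H, _, _ = np.histogram2d(x, y, bins=n_tiles, range=[[0, 1000], [0, 1000]])

# A tile is worth rendering only if its bin saw at least one point.
occupancy = H > 0
nonempty = np.argwhere(occupancy)
print(len(nonempty))  # number of tiles that actually need rendering
```

With sparse data like the census dataset, most bins come back empty, so the render loop can skip the bulk of the tile grid at high zoom levels.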
Just for comparison, here's Philipp's independent version in a notebook: https://anaconda.org/philippjfr/tile_renderer
I've updated the notebook linked above, and here's an example of the census tiles generated using it (http://assets.holoviews.org/tiles/census.html). I'm about half-way through rendering zoom level 12.
@philippjfr, looks like there are normalization issues between tiles in your version?
Yes, that seems unavoidable unless you render everything as one gigantic tile (which is impossible) or you allow passing extraneous information into the normalization methods (e.g. eq_hist).
It's definitely avoidable; we just have to decide how we want to handle avoiding it.
That's why I mentioned the two alternatives, which both seem incredibly expensive at high zoom levels. At minimum you'd have to perform two-stage rendering: save out the raw aggregate data in the first pass, then compute the eq_hist across tiles, and finally shade the raw tiles and save out the PNGs.
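The two-stage idea can be sketched with plain numpy, independent of datashader's actual shading API: pass one pools every tile's aggregate values into a single global histogram, and pass two shades each tile through the same equalization transfer function so adjacent tiles get consistent colors. The random aggregates and bin count are stand-ins.

```python
import numpy as np

# Stage 1: aggregate each tile, then pool all values into one global
# histogram so equalization is computed across tiles, not per tile.
rng = np.random.default_rng(0)
tiles = [rng.exponential(size=(8, 8)) for _ in range(4)]  # stand-in aggregates
all_vals = np.concatenate([t.ravel() for t in tiles])
bins = np.histogram_bin_edges(all_vals, bins=256)
hist, _ = np.histogram(all_vals, bins=bins)
cdf = hist.cumsum() / hist.sum()  # shared equalization transfer function

# Stage 2: map every tile through the *same* global CDF, so equal data
# values produce equal output values regardless of which tile they're in.
def eq_hist_shade(tile):
    idx = np.clip(np.digitize(tile, bins) - 1, 0, len(cdf) - 1)
    return cdf[idx]  # values in [0, 1], ready for a colormap

shaded = [eq_hist_shade(t) for t in tiles]
```

The cost is exactly what the comment describes: the raw aggregates for every tile must be kept around (or saved to disk) between the two passes.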
@philippjfr @jbednar normalization and fixed color mapping is an important feature for datashader, which sounds like it was partially implemented with the …
Right, at least for the linear and log mappings.
Agreed.
I'm happy to hop on a call now or later in the week.
@philippjfr @jbednar hey, got time for a general code review today? I was thinking I could walk you guys through the code and help strategize further dev.
@brendancol: @jbednar put a meeting on my calendar, did you get it?
@jbednar regarding swapping out … Probably good at this point to keep the …
@philippjfr the …
Ok, using boto3 is fine then, as it's got to be working for s3fs to work. Good eye! You can see the single-pixel-per-tile optimization in Philipp's code here: https://anaconda.org/philippjfr/tile_renderer/notebook
Review comment on the docstring parameters:

    A minimum zoom level for the tile layer. This is the most zoomed-out level.

    max_zoom : int
        A maximum zoom level for the tile layer. This is the most zoomed-in level.

max_zoom is documented twice.
You will get a smaller saving because of the supertiling, but at high zoom levels and for sparse data it will still provide major speedups. The other issue is that for very high zoom levels even the single-pixel overviews will become larger than memory will allow. Nonetheless, I think we can add the optimization at an early stage, basically filtering the list of supertiles to be generated before they are dispatched. I think that could be added in a separate PR, though.
Seems like that won't be a big issue with supertiles? Maybe my math is wrong, though. If a regular tile is 256x256 and a supertile is 4096x4096, there are 256 regular tiles per supertile. At zoom level 20, there are …

Of course, you could also do it as a search tree. E.g. once you have a supertile, you could regrid it to a 4x4 grid with .any() to determine whether you generate each of the next 4x4 supertiles subdividing this one.
That's true; in my implementation I can run it up to zoom level 16, which is equivalent to 4294967296 tiles. If the maximum zoom level we allow is 20, then that's fine, but I noticed the default max_zoom is set to 31.
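The tile counts being discussed follow from the standard slippy-map layout: zoom level z has a 2**z by 2**z grid of 256x256 tiles, i.e. 4**z tiles total, and 4096x4096 supertiles pack 256 regular tiles each. A quick sanity check of the figures quoted in this thread:

```python
# Slippy-map tile counts: zoom z has 2**z x 2**z = 4**z regular tiles.
def n_tiles(zoom):
    return 4 ** zoom

# Each 4096x4096 supertile covers 16x16 = 256 regular 256x256 tiles.
def n_supertiles(zoom):
    return max(1, n_tiles(zoom) // 256)

print(n_tiles(16))       # 4294967296, matching the figure quoted above
print(n_supertiles(16))  # 16777216 supertile dispatch units
print(n_tiles(20))       # 1099511627776, i.e. ~1.1e12 regular tiles
```

Even with the 256x reduction from supertiling, exhaustively enumerating zoom 20 is enormous, which is why the any-based pruning (or a search-tree subdivision) matters for sparse data.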
I don't think we'd enforce a maximum zoom level; e.g. if someone has local data, it could make complete sense to do zoom level 31 for some portion of the earth. It's only when one has global data that the total number of tiles matters, and for global data it's less likely that people will need to zoom that far, because at that point the zoomed-in data will be such a tiny fraction of the total worldwide dataset that it probably doesn't matter much. Anyway, we should just set it up in some reasonable way, and if people end up pushing up against some limit, we can worry about it at that point.
Sounds reasonable; I'm happy to add the logic to filter the supertiles to dispatch based on that approach in a subsequent PR.
… / shader_func); updated tests; small pep8 fixes
@brendancol, those changes look good. Let me know if you want it reviewed and merged, or if you are still working on the to-do items above.
Is it otherwise ready for review, then?
@jbednar very close. I spun my wheels a bit on the nbsmoke tests before realizing most notebooks are explicitly skipped in the … There are some Python 2 failures which I'm resolving today, and then this will be ready for review / merge.
@jbednar @philippjfr I'm having trouble figuring out what the issue is here on the py27 appveyor build fail: https://ci.appveyor.com/project/bokeh-integrations/datashader/build/1.0.781/job/2klbyjsapckvwdlj If y'all could take a quick peek, I would be appreciative.
Looks like the errors are related to a missing … Example:

    ERROR: _________ ERROR at setup of test_bundle[random_layout-connect_edges] __________
    file C:\projects\datashader\datashader\tests\benchmarks\test_bundling.py, line 33
      @pytest.mark.parametrize('bundle', [directly_connect_edges, hammer_bundle])
      @pytest.mark.parametrize('layout', [random_layout, circular_layout, forceatlas2_layout])
      @pytest.mark.benchmark(group="bundling")
      def test_bundle(benchmark, nodes, edges, layout, bundle):
    E       fixture 'benchmark' not found
    >       available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, edges, monkeypatch, nodes, pytestconfig, record_property, record_xml_attribute, record_xml_property, recwarn, tmpdir, tmpdir_factory
    >       use 'pytest --fixtures [testpath]' for help on them.
    C:\projects\datashader\datashader\tests\benchmarks\test_bundling.py:33
I don't know offhand; maybe @ceball would know.
It sounds like there is a missing test dependency. @ceball, can you address that?
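For context on the missing dependency: the `benchmark` fixture is provided by the pytest-benchmark plugin, so a likely (but unconfirmed in this thread) fix is adding it to the test environment:

```shell
# Hypothetical fix: install the plugin that provides the `benchmark`
# fixture, then re-run just the failing benchmark module.
pip install pytest-benchmark
pytest datashader/tests/benchmarks/test_bundling.py
```

If the CI environment is conda-based, the equivalent `conda install pytest-benchmark` in the build config would be the cleaner route.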
I have not attempted to investigate this problem yet, and I can't access the build logs above. However, I guess this is the same py27+windows-specific problem that's now been fixed on master. Please could you update this PR to the latest master to see what happens now? Thanks.
Is this ready for review and merge?
@jbednar @philippjfr This PR is close, but there are a couple of things that need verification: …
Things work, and the goal of this was only a first pass at tiling, so it may be viable to merge, but I think we need more testing, and probably some more eyes too.
@brendancol, can you test out what is proposed in #368 and see if it fixes span? @Raphyyy, apart from span (for which you can also test out the fix if you like), the PR should be ready to use. I'd recommend trying it at a low zoom level and being sure of your parameters before doing zoom level 20, as that will take a lot of computation.
This PR provides tiling for datashader.
Tiling currently assumes that data is provided in meters.
The interface for tile
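Since the tiler assumes meter-based coordinates, lon/lat data needs projecting first. A minimal sketch of the standard Web Mercator (EPSG:3857) conversion, independent of whatever helper this PR ultimately exposes:

```python
import math

# Convert lon/lat degrees to Web Mercator (EPSG:3857) meters, the
# coordinate system commonly assumed by slippy-map tile generators.
R = 6378137.0  # WGS84 equatorial radius in meters

def lonlat_to_meters(lon, lat):
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# Example: project a point before handing the data to the tiler.
x, y = lonlat_to_meters(-74.0060, 40.7128)
```

Datasets already in a projected CRS can skip this step; the key point is just that the tile extents and the data must be in the same meter-based coordinates.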