Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Tiling #636

Merged
merged 20 commits into from
Oct 25, 2018
Merged

WIP: Tiling #636

merged 20 commits into from
Oct 25, 2018

Conversation

brendancol
Copy link
Collaborator

@brendancol brendancol commented Aug 16, 2018

This PR provides tiling for datashader.

Tiling current assumes that data is provided in meters.

The interface for tile

@jbednar
Copy link
Member

jbednar commented Aug 16, 2018

Can you add a notebook showing how to use it?

@philippjfr
Copy link
Member

This looks great but I'd also love to see a notebook so I can play around with it.

The super-tiling approach seems very sensible and should reduce a major bottleneck that I encountered in my implementation. In the absence of complex spatial indexing I've added another optimization to my implementation which uses datashader 'any' aggregation, aligned with the tile grid to find tiles which can safely be skipped because they are empty, i.e. I generate an aggregate where each pixel corresponds to one tile and indicates whether that tile is empty or not. For higher zoom levels this has resulted in major speedups especially on sparse data like the census dataset.

@jbednar
Copy link
Member

jbednar commented Aug 16, 2018

Just for comparison, here's Philipp's independent version in a notebook: https://anaconda.org/philippjfr/tile_renderer

@philippjfr
Copy link
Member

I've updated the notebook linked above and here's an example of the census tiles generated using it (http://assets.holoviews.org/tiles/census.html). I'm about half-way through rendering zoom-level 12.

@jbednar
Copy link
Member

jbednar commented Aug 20, 2018

@philippjfr, looks like there are normalization issues between tiles in your version?

image

@philippjfr
Copy link
Member

@philippjfr, looks like there are normalization issues between tiles in your version?

Yes, that seems unavoidable unless you render everything as one gigantic tile (which is impossible) or you allow passing extraneous information into the normalization methods (e.g. eq_hist).

@jbednar
Copy link
Member

jbednar commented Aug 20, 2018

It's definitely avoidable; we just have to decide how we want to handle avoiding it.

@philippjfr
Copy link
Member

That's why I mentioned the two alternatives, which both seem incredibly expensive at high zoom levels. At minimum you'd have to perform two stage rendering, saving out the raw image data in the first pass, then compute the eq_hist across tiles and then shade the raw image tiles and save out the pngs.

@brendancol
Copy link
Collaborator Author

brendancol commented Aug 20, 2018

@philippjfr @jbednar normalization and fixed color mapping is an important feature for datashader which sounds like was partially implemented with the span argument on .shade. Overall though this is tangential to tile creation provided that tile creation can inject the correct span when rendering tiles. I think we should have a quick call about this which could be also part of a general code review of the tiling strategy here.

@philippjfr
Copy link
Member

normalization and fixed color mapping is an important feature for datashader which sounds like was partially implemented with the span argument on .shade

Right, at least for the linear and log mapping span should be sufficient but eq_hist is a lot more complex I imagine.

Overall although this is tangential to tile creation provided that tile creation can inject the correct span which rendering tiles.

Agreed.

I think we should have a quick call about this which could be also part of a general code review of the tiling strategy here.

I'm happy to hop on a call now or later in the week.

@brendancol
Copy link
Collaborator Author

@philippjfr @jbednar hey got time for a general code review today? I was thinking I could walk you guys through the code and help strategize further dev.

@philippjfr
Copy link
Member

@brendancol: @jbednar put a meeting on my calendar, did you get it?

@brendancol
Copy link
Collaborator Author

brendancol commented Aug 21, 2018

@jbednar regarding swapping out boto3 for s3fs to prevent an additional dependency, it looks like boto3 is already a dependency of s3fs.

Probably good at this point to keep the S3TileRenderer as-is with boto3.

@brendancol
Copy link
Collaborator Author

@philippjfr the single-pixel data lookup was an interesting optimization? how do you think we can add it into the render_tiles function?

@jbednar
Copy link
Member

jbednar commented Aug 21, 2018

Ok, using boto3 is fine then, as it's got to be working for s3fs to work. Good eye! You can see the single-pixel-per-tile optimization in Philipp's code here: https://anaconda.org/philippjfr/tile_renderer/notebook

A minimum zoom level for the tile layer. This is the most zoomed-out level.

max_zoom : int
A maximum zoom level for the tile layer. This is the most zoomed-in level.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_zoom documented twice.

@philippjfr
Copy link
Member

@philippjfr the single-pixel data lookup was an interesting optimization? how do you think we can add it into the render_tiles function?

You will get a smaller saving because of the supertiling but at high zoom levels and for sparse data it will still provide major speedups. The other issue is that for very high zoom levels even the single pixel overviews will become larger than memory will allow. Nonetheless I think we can add the optimization at an early stage, basically filtering the list of supertiles that are to be generated before they are dispatched. I think that could be added in a separate PR though.

@jbednar
Copy link
Member

jbednar commented Aug 22, 2018

The other issue is that for very high zoom levels even the single pixel overviews will become larger than memory will allow.

Seems like that won't be a big issue with supertiles? Maybe my math is wrong, though. If a regular tile is 256x256 and a supertile is 4096x4096, there are 256 regular tiles per supertile. At zoom level 20, there are 4**20 tiles and thus 4**20/256=4294967296 supertiles. Seems like that would take 4GB if you use any() with int8s?

Of course, you could also do it as a search tree. E.g. once you have a supertile, you could regrid it to a 4x4 grid with .any() to determine whether you generate each of the next 4x4 supertiles subdividing this one.

@philippjfr
Copy link
Member

At zoom level 20, there are 420 tiles and thus 420/256=4294967296 supertiles. Seems like that would take 4GB if you use any() with int8s?

That's true, in my implementation I can run it up to zoom level 16, which is equivalent to 4294967296 tiles. If the maximum zoom level we allow is 20 then that's fine but I noticed the default max_zoom is set to 31.

@jbednar
Copy link
Member

jbednar commented Aug 22, 2018

If the maximum zoom level we allow is 20 then that's fine

I don't think we'd enforce a maximum zoom level; e.g. if someone has local data, it could make complete sense to do zoom level 31 for some portion of the earth. It's only when one has global data that the total number of tiles matters, and for global data it's less likely that people will need to zoom that far, because that that point the zoomed-in data will be such a tiny fraction of the total worldwide dataset that it probably doesn't matter much. Anyway, we should just set it up in some reasonable way, and if people end up pushing up against some limit, we can worry about it at that point.

@philippjfr
Copy link
Member

Sounds reasonable, I'm happy to add the logic to filter the supertiles to dispatch based on that approach in a subsequent PR.

… / shader_func); updated tests; small pep8 fixes
@jbednar
Copy link
Member

jbednar commented Aug 22, 2018

@brendancol , those changes look good. Let me know if you want it reviewed and merged, or if you are still working on the to-do items above.

@brendancol
Copy link
Collaborator Author

@jbednar I didn't implement the esri / ogc metadata files. But detailed what needs to happen in these issues for later:

#639
#640

@jbednar
Copy link
Member

jbednar commented Aug 28, 2018

Is it otherwise ready for review, then?

@brendancol
Copy link
Collaborator Author

brendancol commented Aug 29, 2018

@jbednar very close, I spun my wheels a bit on the nbsmoke tests before realizing most notebooks are explicitly skipped in the tox.ini config.

There is some python2 fails which I'm resolving today and then this will be ready for review / merge

@brendancol
Copy link
Collaborator Author

@jbednar @philippjfr I'm having trouble figuring out what this issue is here on the py27 appveyor build fail: https://ci.appveyor.com/project/bokeh-integrations/datashader/build/1.0.781/job/2klbyjsapckvwdlj

If yall could a take a quick peek, I would be appreciative.

@brendancol
Copy link
Collaborator Author

brendancol commented Aug 29, 2018

Looks like the errors are related to a missing pytest-benchmark dependency on the CI server. Not sure if this is a known thing, but looks like the errors are unrelated to the tiling code:

Example ERROR:

_________ ERROR at setup of test_bundle[random_layout-connect_edges] __________
file C:\projects\datashader\datashader\tests\benchmarks\test_bundling.py, line 33
  @pytest.mark.parametrize('bundle', [directly_connect_edges, hammer_bundle])
  @pytest.mark.parametrize('layout', [random_layout, circular_layout, forceatlas2_layout])
  @pytest.mark.benchmark(group="bundling")
  def test_bundle(benchmark, nodes, edges, layout, bundle):
** E       fixture 'benchmark' not found **
>       available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, edges, monkeypatch, nodes, pytestconfig, record_property, record_xml_attribute, record_xml_property, recwarn, tmpdir, tmpdir_factory
>       use 'pytest --fixtures [testpath]' for help on them.
C:\projects\datashader\datashader\tests\benchmarks\test_bundling.py:33

@philippjfr
Copy link
Member

I don't know offhand, maybe @ceball would know.

@jbednar
Copy link
Member

jbednar commented Aug 31, 2018

It sounds like there is a missing test dependency. @ceball, can you address that?

@ceball
Copy link
Member

ceball commented Sep 11, 2018

I have not attempted to investigate this problem yet, and I can't access the build logs above. However, I guess this is the same py27+windows-specific problem that's now been fixed on master. Please could you update this PR to the latest master to see what happens now? Thanks

@jbednar
Copy link
Member

jbednar commented Oct 22, 2018

Is this ready for review and merge?

@brendancol
Copy link
Collaborator Author

brendancol commented Oct 23, 2018

@jbednar @philippjfr This PR is close, but there are a couple of things that need verification:

  • I'm not convinced that the span argument of shade is working at all. At first I thought it was, but now I am not sure. We all know it doesnt work for categoricals, but I thought it was supposed to work for numeric aggregates. Without span, supertiles won't be rendered seamlessly.
  • I looked at tiling the generate_terrain perlin noise function and wasn't sure orientation of supertiles was correct. I spent a couple hours last night trying to figure out was was going on, but wasn't sure. It could be though that I'm just being fooled by the span argument not functioning properly.

Things work, and the goal of this was only a "first-pass" of a tiling, so it may be viable to merge, but I think we need more testing, and probably some more eyes too.

@Raphyyy
Copy link

Raphyyy commented Oct 24, 2018

Hello there
I work on an air quality map (professional project) and I would like to serve this through tile rendering.
I have a data warehouse of geo time series and I am interested with high resolution rendering (zoom 20).
I would be pleased to test datashader to do this.
Do you think I could already setup a demo by giving it a try at this point or should I come back later ?

image
I didn't saw any update on this, is this problem solved ?

Thank you guys for your work though, datashader looks amazing

@jbednar
Copy link
Member

jbednar commented Oct 24, 2018

@brendancol, can you test out what is proposed in #368 and see if it fixes span?

@Raphyyy, apart from span (for which you can also test out the fix if you like), the PR should be ready to use. I'd recommend trying it on a low zoom level and be sure of your parameters before doing zoom level 20, as that will take a lot of computation.

@jbednar jbednar merged commit 7682ef5 into holoviz:master Oct 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants