Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix raster buffer indexing #550

Merged
merged 5 commits into from
Sep 10, 2024
Merged

Fix raster buffer indexing #550

merged 5 commits into from
Sep 10, 2024

Conversation

CyclotomicDev
Copy link
Contributor

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Changed to the correct indexing when converting from a row-major 2D array to vector index.
Previous test only really tested if std::ops::Index works, and not indexing the buffer; in the example the index of b[(3,5)] = 3 * 5 + 5 would be the same as b[(4,0)] = 4 * 5 + 0 . New test ensures all buffer indices covered.

@lnicola
Copy link
Member

lnicola commented Aug 4, 2024

I think Buffer is supposed to be row-major, so the computation is correct. How did you come across this?

@CyclotomicDev
Copy link
Contributor Author

To try to understand the format, I tried to read a .tiff file with size (5100, 2400), however, I could only directly read in the range (0,0) to (2399, 2399), when trying to read (2400, 0), I got a "index out of bounds error".

@CyclotomicDev
Copy link
Contributor Author

I think Buffer is supposed to be row-major, so the computation is correct. How did you come across this?

If you mean the previous computation, that was x * N_x + y rather than x + N_x * y if convention with this Wikipedia article: https://en.wikipedia.org/wiki/Row-_and_column-major_order

@lnicola
Copy link
Member

lnicola commented Aug 4, 2024

That's because band sizes are (width, height), but that doesn't mean the data is structured the same way. Compare to GDAL's RasterIO:

The buffer into which the data should be read, or from which it should be written. This buffer must contain at least nBufXSize * nBufYSize words of type eBufType. It is organized in left to right, top to bottom pixel order. Spacing is controlled by the nPixelSpace, and nLineSpace parameters.

@CyclotomicDev
Copy link
Contributor Author

That's because band sizes are (width, height), but that doesn't mean the data is structured the same way. Compare to GDAL's RasterIO:

The buffer into which the data should be read, or from which it should be written. This buffer must contain at least nBufXSize * nBufYSize words of type eBufType. It is organized in left to right, top to bottom pixel order. Spacing is controlled by the nPixelSpace, and nLineSpace parameters.

As far as I understand, this doesn't contradict what I am saying.
The problem is that, with the current formula, different indices will return the same point in the buffer, if width < height, and some indices will be out of bound if width > height.
I don't understand why coord.0 * self.shape.0 + coord.1 is used as it is not the formula for dealing with row-major (or column-major) index representation, and the test ,index, would fail if some of the numbers were changed.

@lnicola
Copy link
Member

lnicola commented Aug 4, 2024

I don't understand why coord.0 * self.shape.0 + coord.1 is used as it is not the formula for dealing with row-major (or column-major) index representation, and the test ,index, would fail if some of the numbers were changed.

Because coord is (y, x) and shape is (width, height), as documented on fn shape().

@CyclotomicDev
Copy link
Contributor Author

I don't understand why coord.0 * self.shape.0 + coord.1 is used as it is not the formula for dealing with row-major (or column-major) index representation, and the test ,index, would fail if some of the numbers were changed.

Because coord is (y, x) and shape is (width, height), as documented on fn shape().

I don't understand what you are saying here. That it is in column-major representation? (In which case the formula would still be incorrect)? I have tried to work with reversed coordinates, but that doesn't stop the problem of accessing elements outside of a square.

So we have the width = number of columns, height = number of rows. Say we have a buffer, b with shape (width, height). What is b[(x, y)] meant to do? Currently it returns data at (y, x), but has bounds for (x, y), making it impossible to access some data in a buffer, or directing different indices to the same place in the buffer, depending on whether width > length or width < length.

@lnicola
Copy link
Member

lnicola commented Aug 4, 2024

Currently it returns data at (y, x), but has bounds for (x, y),

Oh, I didn't spot the bounds check. Yeah, I think the formula is correct, but the bounds check is wrong.

@lnicola
Copy link
Member

lnicola commented Aug 4, 2024

CC @metasim my intention was to index by (row, col), but I don't remember if that matches ndarray or other libraries.

@jdroenner
Copy link
Member

I think ndarray can use both (https://docs.rs/ndarray/latest/ndarray/enum.Order.html) and there are good arguments for both options. I prefer [y,x] which is what numpy (python) use. And if i remember correct this is also the default in ndarray.

@lnicola
Copy link
Member

lnicola commented Aug 4, 2024

@CyclotomicDev can you update this PR to switch around the bounds check?

@CyclotomicDev
Copy link
Contributor Author

Apologies, its my first time using a PR. I seem to have somehow reverted the Cargo.toml file, will that be fine to ignore in the merge?

@lnicola lnicola force-pushed the master branch 2 times, most recently from e3a2aa4 to 5d1e1c4 Compare August 5, 2024 07:46
@lnicola
Copy link
Member

lnicola commented Aug 5, 2024

I've dropped the dependency version change and added a warning to the docs:

image

@lnicola
Copy link
Member

lnicola commented Aug 5, 2024

r? @jdroenner

@metasim
Copy link
Contributor

metasim commented Aug 5, 2024

CC @metasim my intention was to index by (row, col), but I don't remember if that matches ndarray or other libraries.

There's a lot of historical legacy here, but my memory is fuzzy on why it is the way it is. Every time I would iterate over cells, I'd double check the docs, for both correctness and cache coherence (an iterator can take care of that part).

In my own personal libraries I never return tuples, but structs with labeled fields because it's too easy for me to mess up. But gdal doesn't do that (e.g. window coordinates to read), so I don't think there's any way around having to manually check the docs.

As you note, ndarray supports either (as does numpy) due to Fortran vs C traditions, so that part of the API could abstract this out.

@lnicola
Copy link
Member

lnicola commented Aug 7, 2024

I was asking more about what buf[(y, x)] should do, but good point, I've added a pair of getters to make things easier for the users.

@lnicola lnicola mentioned this pull request Aug 13, 2024
2 tasks
@jdroenner
Copy link
Member

jdroenner commented Aug 14, 2024

I have been thinking about the Buffer. To be honest i think we should deprecate all methods with tuple params as well as the shape method. We should simply add new methods with row and column as parameters. Who ever needs ndarray style access can transform the Buffer into an Ndarray...

@lnicola lnicola mentioned this pull request Aug 28, 2024
@jdroenner jdroenner merged commit 819950f into georust:master Sep 10, 2024
10 checks passed
@lnicola
Copy link
Member

lnicola commented Sep 18, 2024

@jdroenner we should probably discuss the deprecation of Index separately, but maybe let's publish a patch version now, to unblock the other PRs.

@jdroenner
Copy link
Member

good idea. some lint broke the CI. I guess we should fix that first.

@lnicola
Copy link
Member

lnicola commented Sep 18, 2024

I have an open PR for that.

@jdroenner
Copy link
Member

merged! thx!

@lnicola
Copy link
Member

lnicola commented Sep 20, 2024

Gentle nudge about publishing a patch update, @jdroenner.

@jdroenner
Copy link
Member

jdroenner commented Sep 20, 2024

i will do it later today or Sunday (edit: will do it Sunday morning)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants