Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async flatgeobuf reading via object-store #494

Merged
merged 8 commits into from
Feb 26, 2024

Conversation

kylebarron
Copy link
Member

@kylebarron kylebarron commented Feb 3, 2024

This is a draft for now because HTTPFgbReader is not generic over backends. See flatgeobuf/flatgeobuf#345

@weiji14 weiji14 mentioned this pull request Feb 3, 2024
6 tasks
@kylebarron kylebarron changed the title WIP: async flatgeobuf reading via object-store Blocked: async flatgeobuf reading via object-store Feb 19, 2024
@kylebarron kylebarron added the blocked Blocked on external progress label Feb 19, 2024
@Libbum
Copy link

Libbum commented Feb 24, 2024

I've been watching your progress on this here and in the flatgeobuf crate. This capability would be super helpful not just for geo-arrow, but for broader applications also.

  1. Is there anything in particular I can help you with to finish off this feature?
  2. Do you think it would be practical / possible to release this as a separate crate? If not: would it be possible to at least expose these methods for use elsewhere?

@kylebarron
Copy link
Member Author

This is waiting on a new release of flatgeobuf. Hopefully this will just work once that's released.

If you're interested, you could test this branch with the latest main of flatgeobuf, and then verify that it actually works.

  1. Do you think it would be practical / possible to release this as a separate crate? If not: would it be possible to at least expose these methods for use elsewhere?

Release what as a separate crate? The only non-geoarrow-specific part of this is object_store_reader.rs which is much too small to release as its own crate IMO. More likely would be to see if http_range_client is interested in a PR to integrate with object-store

@Libbum
Copy link

Libbum commented Feb 24, 2024

Thanks :) I thought there was still something to finish up due to the todo! in reader_async, but that was a total skim on my part—it's clearly just a return type to be plugged in.

see if http_range_client is interested in a PR to integrate with object-store

Valid point, that sound like a cleaner plan regardless.

Cool, I'll play around with flatgeobuf master and let you know what I find out 🚀

@kylebarron
Copy link
Member Author

it looks like 4.1 was just published, so you should test with that

Comment on lines 16 to 18
let split_range = range.split('-').collect::<Vec<_>>();
let start_range = split_range[0].parse::<usize>().unwrap();
let end_range = split_range[1].parse::<usize>().unwrap();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

range (at least in my testing) looks like bytes=0-12943 - so your split is insufficient.

I've done let split_range = range.split(&['=', '-']).collect::<Vec<_>>(); and then incremented the indexes on line 17, 18.

Still looking, but I'm getting a response with this change at least 🚀

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A refactor including errors:

        let bytes_range = range
            .split(&['=', '-'])
            .filter_map(|n| n.parse::<usize>().ok());
        if let Some((start_range, end_range)) = bytes_range.collect_tuple() {
            // Add one to the range because HTTP range strings are end-inclusive
            let bytes = self
                .reader
                .get_range(&self.location, start_range..end_range + 1)
                .await
                .map_err(|_| HttpError::HttpStatus(500))?;
            return Ok(bytes);
        }
        Err(HttpError::HttpStatus(500))

Copy link

@Libbum Libbum Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use itertools:Itertools; //collect_tuple
use http_range_client::HttpError;

@Libbum
Copy link

Libbum commented Feb 25, 2024

With that small fix above, this seems to be doing exactly what I'd expect!

This is so great—I've been managing PSKs to get access to encrypted flatgeobufs on s3 up until now. Super clunky. Seamless with this wrapper. Thanks Kyle :)

@kylebarron
Copy link
Member Author

🚀

image

@kylebarron kylebarron marked this pull request as ready for review February 26, 2024 03:56
@kylebarron kylebarron changed the title Blocked: async flatgeobuf reading via object-store Async flatgeobuf reading via object-store Feb 26, 2024
@kylebarron
Copy link
Member Author

Merging with follow ups tracked in #534. Not yet working for s3 files

@kylebarron kylebarron merged commit 0ec47c3 into main Feb 26, 2024
6 checks passed
@kylebarron kylebarron deleted the kyle/wip-object-store-flatgeobuf branch February 26, 2024 04:15
@kylebarron kylebarron removed the blocked Blocked on external progress label Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants