Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(bootstraper): fix version inconsistencies and #19 regressions #34

Merged
merged 8 commits into from
Apr 14, 2023

Conversation

emileten
Copy link
Contributor

@emileten emileten commented Apr 6, 2023

  • fix inconsistent pypgstac versions in bootstrapper and stac-ingestor
  • fix regression introduced by Fix ingestor regressions introduced by developmentseed/cdk-pgstac/pull/18. #19 : pypgstac.load does not have a load_collection but a load_collections method. I am unsure about my fix : it looks like load_collections excepts a json containing a list of collection metadata. How is it going to behave if we have a single collection in it ?
  • make sure the value passed to the file parameter of the pypgstac collection loader is a list of dict.

@emileten emileten changed the title Draft : Bug fix bugfix : fix inconsistent API and database pypgstac versions and fix regression introduced by #19 Apr 6, 2023
@emileten emileten changed the title bugfix : fix inconsistent API and database pypgstac versions and fix regression introduced by #19 bugfix: fix inconsistent API and database pypgstac versions and fix regression introduced by #19 Apr 6, 2023
@emileten emileten changed the title bugfix: fix inconsistent API and database pypgstac versions and fix regression introduced by #19 fix(bootstraper): fix inconsistent API and database pypgstac versions and fix regression introduced by #19 Apr 6, 2023
@emileten emileten changed the title fix(bootstraper): fix inconsistent API and database pypgstac versions and fix regression introduced by #19 fix(bootstraper): fix version inconsistencies and #19 regressions Apr 6, 2023
@emileten emileten self-assigned this Apr 6, 2023
@emileten
Copy link
Contributor Author

emileten commented Apr 7, 2023

I tested this in an AWS deployment. I am posting this to the collections endpoint :

{
    "id": "icesat2-boreal",
    "links": [
        {
          "rel": "items",
          "type": "application/geo+json",
          "href": "https://stac.maap-project.org/collections/icesat2-boreal/items"
        },
        {
          "rel": "parent",
          "type": "application/json",
          "href": "https://stac.maap-project.org/"
        },
        {
          "rel": "root",
          "type": "application/json",
          "href": "https://stac.maap-project.org/"
        },
        {
          "rel": "self",
          "type": "application/json",
          "href": "https://stac.maap-project.org/collections/icesat2-boreal"
        }
    ],
    "type": "Collection",
    "title": "Gridded Boreal Aboveground Biomass Density c.2020 at 30m resolution",
    "extent": {
        "spatial": {
            "bbox": [
                [
                    -180, 51.6, 180, 78
                ]
            ]
        },
        "temporal": {
            "interval": [
              [
                "2019-01-01T00:00:00.000Z",
                "2021-01-01T00:00:00.000Z"
              ]
            ]
          }        
    },
    "license": "CC-BY",
    "description": "Aboveground biomass density c.2020 gridded to 30m derived from ICESat-2, Harmonized Landsat-Sentinel2 and Copernicus DEM. Band 1 is the data band. Band 2 is the standard deviation.",
    "item_assets": {
        "cog_default": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": [
                "data",
                "layer"
            ],
            "title": "Default COG Layer",
            "description": "Cloud optimized default layer to display on map"
        },
        "documentation": {
            "type": "text/html",
            "roles": [
                "documentation"
            ],
            "title": "Find and open data with STAC",
            "description": "Find and open data with STAC",
            "href": "https://docs.maap-project.org/en/stac/search/searching_the_stac_catalog.html"
        }        
    },
    "stac_version": "1.0.0"
}

And I am getting an error in the database side. Here is the log :

2023-04-07 07:10:37 UTC:10.0.2.183(22885):pgstac_user@pgstac:[4632]:ERROR: COPY from stdin failed: error from Python: JSONDecodeError - Input must be bytes, bytearray, memoryview, or str: line 1 column 1 (char 0)

@vincentsarago @alukach any idea ? This PR is bumping the database pypgstac version so that it matches that of the ingestor -- so it bumps from 0.6.8 to 0.6.13. Could this update be causing the error ?

Also @alukach -- do you have a way to test TypeScript changes in a deployment before publishing these changes ? I have been trying to point my package.json file to a github branch or local cdk-pgstac directory, without success so far. I keep encountering Typescript compiling errors. The way I did this test above is to reference the latest published version of cdk-pgstac in the package.json file, and then manually add the few modifications I made in this PR directly in the cdk-pgstac code in mynode_modules directory.

In case that helps understanding what I want to do : if it was python I would pip install git+https://github.com/developmentseed/cdk-pgstac.git@bug-fix in my deployment. I am wondering how you typically do this with TypeScript dependencies, without resorting to the kind of manual work above. Thanks !

@sharkinsspatial
Copy link
Member

sharkinsspatial commented Apr 7, 2023

@emileten Given some oddities with npm I have found this is the only reliable approach for testing the compiled construct in another stack.

  1. Update the package version (this is a temporary hack to avoid potential issues with npm caching, make sure not to commit this change) https://github.com/developmentseed/cdk-pgstac/blob/main/package.json#L3
    2. npm run build and npm run package
  2. Update the pacakge.json in the target CDK project where you are using the cdk-pgstac construct to reference your new version package "cdk-pgstac": "file:../cdk-pgstac/dist/js/cdk-pgstac@your_update_version.jsii.tgz". Note that this should be the relative path to the local location of your compiled cdk-pgstac package.
  3. rm -rf node_modules in the target CDK project.
  4. npm install in the target CDK project.

I'm not certain all this is fully necessary, but I ran into numerous cases where my target CDK project was using a cached version of the locally compiled package so I've been using this approach to ensure I'm always using the compiled code I expect.

@sharkinsspatial
Copy link
Member

cc @bitner An ideas on why pgstac would be throwing this #34 (comment)?

@bitner
Copy link

bitner commented Apr 10, 2023

cc @bitner An ideas on why pgstac would be throwing this #34 (comment)?

You say "posting this to the endpoint". Is this being posted to a stac-fastapi endpoint or to something that is using pypgstac? For pypgstac, the read_json function (https://github.com/stac-utils/pgstac/blob/64221059916d4e182c938c86799bf77d9c403965/pypgstac/pypgstac/load.py#L145) expects either a json string or an iterable of python dicts [{"id": .....}]

@emileten
Copy link
Contributor Author

@bitner something that is using pypgstac.

I think the problem is indeed in the type of the object we pass to the collections loader. However, I am having a hard time verifying this locally, and I could get some help from you on that.

Here is an idea of what I am trying to do : https://gist.github.com/emileten/bd7ef3dbdd1cf642b79fa0a0efab10f8. This code is attempting to load a collection to a pgstac database running locally (version 0.6.13). It reproduces the bug I posted. The collection json that is referenced can be found here : https://gist.github.com/emileten/90e87db0af340f3542726d6146f95f87.

I feel that in this line https://gist.github.com/emileten/bd7ef3dbdd1cf642b79fa0a0efab10f8#file-use-pypgstac-py-L35 coll_data should be an Iterable (like you said), but it isn't. If I change it to a list though the error still occurs.

Now, I have this locally running pgstac instance, thanks to the helpful pgstac docs. The way I got it up and running is by running the update script and then the server script. However, I would like to be able to iterate quickly on local changes to pypgstac so that I can investigate the problem. With my setup, however, I need to rebuild the docker images everytime for my changes to pypgstac to be taken in account.

So my question is -- what is your typical workflow to get that kind of local setup ?

Thanks !

@emileten
Copy link
Contributor Author

@sharkinsspatial could you review the new commit I am mentioning in the first bullet point below. With this fix I can now push collections. Except if I get other bugs with items ingestions, we might then be good to merge.

  • Solved the problem. pypgstac expects either a str or an Iterable of dict, like david said. In our code (1) the FastAPI endpoint + pydantic forces the collection json to the type indicated in the type hint -- that is, StacCollection, so, not a dict, and (2) we weren't passing an Iterable but the StacCollection object itself. I solved this by adding this line right before the call to the pypgstac loader collection = [collection.to_dict()]. Verified it work both locally and in my deployment. Commit.

  • @bitner I would still love having your advice about how to efficiently look locally for the cause of an exception raised in pypgstac (see my message above). Thank you !

@sharkinsspatial
Copy link
Member

@emileten This looks good to me and nice catch. This is a great example of why we need #13 (which would have caught this regression which actually got introduced across 2 PRs) as well as configuring type checking in our tox configuration.

Can you add a unit test for https://github.com/developmentseed/cdk-pgstac/blob/main/lib/ingestor-api/runtime/src/collection.py which validates that load_collections is being called with the wrapped iterable. You can use https://github.com/developmentseed/cdk-pgstac/blob/main/lib/ingestor-api/runtime/tests/test_ingestor.py as an example of the mocking.

@emileten emileten merged commit ebeac2a into main Apr 14, 2023
@emileten emileten deleted the bug-fix branch April 14, 2023 03:14
github-actions bot pushed a commit that referenced this pull request Apr 14, 2023
## [3.0.1](v3.0.0...v3.0.1) (2023-04-14)

### Bug Fixes

* **bootstraper:** fix version inconsistencies and [#19](#19) regressions ([#34](#34)) ([ebeac2a](ebeac2a))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants