Questions on the durability of the current handling of BDTOPO #185

Closed
ainar opened this issue Apr 26, 2023 · 2 comments · Fixed by #186

Comments

@ainar
Contributor

ainar commented Apr 26, 2023

Hey,
I noticed that the 2023 BDTOPO files have "3-3" in their names: BDTOPO_3-3_TOUSTHEMES_GPKG_LAMB93_D[DEPARTEMENT]_[DATE].7z
I suppose this is a version number. What exactly does it correspond to? Could a version change alter the GeoPackage format and require a pipeline update?

The code expects "3-0", and this is not configurable:

if path.endswith("BDTOPO_3-0_TOUSTHEMES_GPKG_LAMB93_D{}_{}.7z".format(

I found that version 3-0 is for 2021 and 2022. Years 2017 and 2018 used version 2-2. 2012 to 2016 used 2-1, and 2009 to 2011 used 2-0.
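To illustrate what a version-agnostic check could look like, here is a minimal sketch. It is not the pipeline's actual code: the pattern, the function name `parse_archive_name`, and the assumption that the department code is always three alphanumeric characters are mine.

```python
import re

# Hypothetical pattern: accepts any "major-minor" version (2-0, 3-0, 3-3, ...)
# instead of the hard-coded "3-0". The three-character department code is an
# assumption based on the "D075" example.
BDTOPO_ARCHIVE = re.compile(
    r"BDTOPO_(?P<version>\d+-\d+)_TOUSTHEMES_GPKG_LAMB93_"
    r"D(?P<department>[0-9A-Z]{3})_(?P<date>\d{4}-\d{2}-\d{2})\.7z$"
)


def parse_archive_name(path):
    """Return version, department and date found in the file name, or None."""
    match = BDTOPO_ARCHIVE.search(path)
    return match.groupdict() if match else None
```

With something like this, the version becomes a value that can be logged or checked rather than a fixed part of the expected name.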

Example of GeoPackage archive for Paris (75) 2023:
https://wxs.ign.fr/859x8t863h6a09o9o6fy4v60/telechargement/prepackage/BDTOPOV3-TOUSTHEMES-DEPARTEMENT_GPKG_PACK_231$BDTOPO_3-3_TOUSTHEMES_GPKG_LAMB93_D075_2023-03-15/file/BDTOPO_3-3_TOUSTHEMES_GPKG_LAMB93_D075_2023-03-15.7z

Also, since the date is already part of the file name, could we go back to a folder name that is not tied to a specific year, as before? Separate folders were useful when the Shapefile names were identical across years, but that is no longer the case with the archives. Currently, if I want to use BDTOPO from another year, I have to change two parameters when one should be enough. The current date-specific default folder name is defined as follows:

context.config("bdtopo_path", "bdtopo22")

Aina

@sebhoerl
Contributor

These are valid points. I think the easiest approach would simply be to load every 7z archive that has been dumped in the folder, without considering any file name pattern, and read everything in it. One just needs to be careful not to put files for five different areas in the same folder (although that wouldn't hurt either; it would only cost runtime).
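As a rough sketch of the "take whatever is in the folder" idea, assuming a plain directory scan is enough (the helper name `find_archives` is hypothetical and this is not the code from the PR):

```python
from pathlib import Path


def find_archives(folder):
    """Return every .7z file directly inside `folder`, sorted by path.

    No file name pattern is applied; only the extension is checked.
    """
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".7z"
    )
```

Each returned path would then be extracted and read, whatever version or date its name contains.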

@sebhoerl
Contributor

I updated the PR above. I think the best approach is to just dump the files into a folder and have the pipeline read everything in there. The only tricky case is files from multiple years for the same area. However, the pipeline will then automatically remove duplicates (we do that anyway when merging the departments), and the user will end up with the latest buildings. So now, if you want to switch between two data sets, you basically just need to change the bdtopo_path option, which brings us back to one config option :)
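To make the duplicate handling concrete, here is a simplified sketch of "keep the newest record per building". It is only an illustration under my own assumptions: the real pipeline works on the GeoPackage data, and the tuple layout, the identifiers, and the function name `keep_latest` are hypothetical.

```python
def keep_latest(records):
    """Keep one payload per building id, taken from the most recent archive.

    `records` is an iterable of (building_id, archive_date, payload) tuples,
    where archive_date is an ISO "YYYY-MM-DD" string, so plain string
    comparison orders the dates correctly.
    """
    latest = {}
    for building_id, archive_date, payload in records:
        current = latest.get(building_id)
        if current is None or archive_date > current[0]:
            latest[building_id] = (archive_date, payload)
    return {bid: payload for bid, (_, payload) in latest.items()}
```

Under these assumptions, mixing archives from several years in the same folder costs extra reading time but yields a single, most recent entry per building.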
