-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cm/fix requirements #1
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clmarmy you'll find some comments, regarding in particular the handling of the CRS.
Thanks in advance,
Alessandro
@@ -61,12 +62,15 @@ def file_loader(files): | |||
|
|||
acc_gdf = gpd.GeoDataFrame() # accumulator | |||
|
|||
crs = None # init | |||
#crs = None # init | |||
crs = epsg_dft |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend to pass epsg_dft
as a parameter instead of using a global variable (cf. https://stackoverflow.com/questions/19158339/why-are-global-variables-evil).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I thought about it before and prefer to use a global variable because of the importance to have a unique CRS (by ex. for the clip function) over the all project.
But having it as a parameter is also good for me.
@@ -18,3 +18,4 @@ output_files: # GeoPackage is used as output format | |||
settings: | |||
gt_sectors_buffer_size_in_meters: <ex. 1.0> # GT sectors are "augmented" by a buffer having this size (in meters) | |||
tolerance_in_meters: <ex. 1.0> # GT trees and detected are considered to match if their distance is <= this tolerance (in meters) | |||
crs_dft: 'epsg:2056' # CRS that will be added if inputs files have no CRS (by eg. the detections outputs of the DFT have no CRS). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this file is a template config file, we could write crs_dft: <ex. 'epsg:2056'>
in order to force the end-user to replace with the desired value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point !
crs = tmp_gdf.crs | ||
|
||
if tmp_gdf.crs is None: | ||
crs = epsg_dft |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 66 already sets crs = epsg_dft
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous implementation, we used to read the CRS from the first GeoDataFrame to be concatenated. Then, we wanted to make sure that all the subsequent ones had the same CRS. Such a CRS was lost upon pd.concat([...]), as Pandas is not CRS-aware. It was then restored by line 81 gdf = gdf.set_crs(crs)
.
- What happens with the new implementation?
- How could the previous implementation work with the DFT output, despite the lack of CRS?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About l.66: it was meant for humans "if there is no CRS, the CRS of the project will be set"; but yes, it is otherwise useless.
What happens with the new implementation ? It has two purposes :
- It fixes a CRS for the DFT outputs which have none.
- It assures good practice by checking that all layers (detection, GT, sectors) are in the same CRS. This is mandatory for the clip function by example.
For that reason, I think the fix is improving the previous implementation. Here a cleaner version (with the CRS as argument and no more global variable) :
` def file_loader(files, crs):
acc_gdf = gpd.GeoDataFrame() # accumulator
for _file in files:
# TODO: check schema
tmp_gdf = gpd.read_file(_file)
if tmp_gdf.crs != None:
if tmp_gdf.crs != crs:
logger.critical("Input datasets have mismatching CRS. Exiting.")
sys.exit(1)
acc_gdf = pd.concat([acc_gdf, tmp_gdf])
gdf = gpd.GeoDataFrame(acc_gdf)
gdf = gdf.set_crs(crs)
return gdf.reset_index(drop=True)
`
No description provided.