Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cm/fix requirements #1

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Cm/fix requirements #1

wants to merge 4 commits into from

Conversation

clmarmy
Copy link

@clmarmy clmarmy commented Nov 16, 2022

No description provided.

Copy link
Member

@acerioni acerioni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clmarmy you'll find some comments, regarding in particular the handling of the CRS.

Thanks in advance,
Alessandro

@@ -61,12 +62,15 @@ def file_loader(files):

acc_gdf = gpd.GeoDataFrame() # accumulator

crs = None # init
#crs = None # init
crs = epsg_dft
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend to pass epsg_dft as a parameter instead of using a global variable (cf. https://stackoverflow.com/questions/19158339/why-are-global-variables-evil).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about it before and prefer to use a global variable because of the importance to have a unique CRS (by ex. for the clip function) over the all project.

But having it as a parameter is also good for me.

@@ -18,3 +18,4 @@ output_files: # GeoPackage is used as output format
settings:
gt_sectors_buffer_size_in_meters: <ex. 1.0> # GT sectors are "augmented" by a buffer having this size (in meters)
tolerance_in_meters: <ex. 1.0> # GT trees and detected are considered to match if their distance is <= this tolerance (in meters)
crs_dft: 'epsg:2056' # CRS that will be added if inputs files have no CRS (by eg. the detections outputs of the DFT have no CRS).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this file is a template config file, we could write crs_dft: <ex. 'epsg:2056'> in order to force the end-user to replace with the desired value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point !

crs = tmp_gdf.crs

if tmp_gdf.crs is None:
crs = epsg_dft
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 66 already sets crs = epsg_dft.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous implementation, we used to read the CRS from the first GeoDataFrame to be concatenated. Then, we wanted to make sure that all the subsequent ones had the same CRS. Such a CRS was lost upon pd.concat([...]), as Pandas is not CRS-aware. It was then restored by line 81 gdf = gdf.set_crs(crs).

  • What happens with the new implementation?
  • How could the previous implementation work with the DFT output, despite the lack of CRS?!

Copy link
Author

@clmarmy clmarmy Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About l.66: it was meant for humans "if there is no CRS, the CRS of the project will be set"; but yes, it is otherwise useless.

What happens with the new implementation ? It has two purposes :

  1. It fixes a CRS for the DFT outputs which have none.
  2. It assures good practice by checking that all layers (detection, GT, sectors) are in the same CRS. This is mandatory for the clip function by example.

For that reason, I think the fix is improving the previous implementation. Here a cleaner version (with the CRS as argument and no more global variable) :

` def file_loader(files, crs):

  acc_gdf = gpd.GeoDataFrame() # accumulator

  for _file in files:
      # TODO: check schema  
      tmp_gdf = gpd.read_file(_file) 
      
      if tmp_gdf.crs != None:
          if tmp_gdf.crs != crs:
              logger.critical("Input datasets have mismatching CRS. Exiting.")
              sys.exit(1)
      acc_gdf = pd.concat([acc_gdf, tmp_gdf])

  gdf = gpd.GeoDataFrame(acc_gdf)
  gdf = gdf.set_crs(crs)

  return gdf.reset_index(drop=True) 

`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants