Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling data processing functions that output to a grid or table #1536

Open
2 of 4 tasks
weiji14 opened this issue Sep 23, 2021 · 6 comments
Open
2 of 4 tasks

Handling data processing functions that output to a grid or table #1536

weiji14 opened this issue Sep 23, 2021 · 6 comments
Labels
question Further information is requested

Comments

@weiji14
Copy link
Member

weiji14 commented Sep 23, 2021

Description of the issue

In the GMT command-line world, there are some data processing functions that can output to either a NetCDF grid or ASCII table. Translating to Python/PyGMT, do we want to 1) have a single function that can output to both (depending on some flag), or 2) have two functions/methods, one which outputs to a grid, and one which outputs to a table.

This is a list of functions that need to be handled:

Originally posted by @weiji14 in #1433 (comment)

I changed the implementation a bit relative to #731 to support ASCII or pandas.DataFrame output for writing out the equalized histogram.

Still, the code is a bit clunky in order to support four different output types (pandas.DataFrame, xarray.DataArray, netCDF, or ASCII). What would you think about having two PyGMT functions for GMT's grdhisteq module rather than just one? One function could write out the data ranges of histogram equalization to a pd.DataFrame or ASCII table and the other could write out the cumulative distribution statistics to a netCDF file or xarray.DataArray. I guess coming up with the names for these would be harder than the current implementation, but I think it would be more user friendly long-term.

Yeah I've debated a bit on whether to have 2 functions too, something like a pygmt.grdhisteq.to_table() and pygmt.grdhisteq.to_grid() (implemented using Python classmethods), or maybe with an underscore like pygmt.grdhisteq_to_table() and pygmt.grdhisteq_to_grid() (implemented purely using Python functions). Tying this to #1318 (comment), I think the split into 2 may have to happen eventually, especially if we want to support more table-like outputs (ascii/numpy/pandas/geopandas/etc) like what Will is doing at grd2xyz #1284.

Possible implementation styles

These are how the implementation would look like, using triangulate as an example.

Single function

def triangulate(data, outgrid=None, outfile=None):
    pass

Two Python functions

Have a common _triangulate function that handles grid or table outputs, some similarities to the _blockm.

def _triangulate(data, outgrid=None, outfile=None):
    pass

def triangulate_to_grid(data, outgrid=None):
    pass

def triangulate_to_table(data, outfile=None):
    pass

Two methods in a single Python class ✔️

class triangulate:
    def _triangulate():
        pass

    @staticmethod
    def to_grid(data, outgrid=None):
        pass

    @staticmethod
    def to_table(data, outfile=None):
        pass

Are you willing to help implement and maintain this feature? Vote for which API style you prefer!

  • A. 👍 Single function to do both grid/table output, i.e. pygmt.triangulate(outgrid=True) or pygmt.triangulate(outfile=True)
  • B. 🎉 The 'functional' style, i.e. pygmt.triangulate_to_grid() or pygmt.triangulate_to_table()
  • C. 🚀 The 'class' method style, i.e. pygmt.triangulate.to_grid() or pygmt.triangulate.to_table()
  • D. 👀 Other suggestions on the names, or API design, please comment below!

P.S. Also xref #896 where there is a similar API design discussion on wrapping GMT functions that do either plotting or data processing.

@weiji14 weiji14 added the question Further information is requested label Sep 23, 2021
@weiji14 weiji14 mentioned this issue Oct 3, 2021
10 tasks
@maxrjones
Copy link
Member

I like the syntax of the class method style, but dislike using classes in a functional programming style with the staticmethod decorator. I would also prefer for the function/method names to be more descriptive regarding the output than 'to_table' or 'to_grid'. For example, triangulate.find_voronoi_edges or triangulate.find_delauney_edges or triangulate.grid_data.

In https://github.com/GenericMappingTools/pygmt/tree/grdhisteq-functions, I tried to implement a syntax similar to the class based option and the pygmt.datasets.load_* functions while still keeping the design functional. The functions work and I think the syntax is actually quite user-friendly, however, I could not get the import statements working for autodoc. Any advice here would be appreciated. I mixed up merge commits and needed to close #1571 due to divergence with the grdhisteq branch. I could either discard the class-based design, discard the functional design, or open a different PR from grdhisteq-functions with main as the target branch to compare the two options.

@maxrjones maxrjones mentioned this issue Oct 23, 2021
5 tasks
@maxrjones
Copy link
Member

The implementation of grdhisteq in #1433 uses the "Two methods in a single Python class" style and is currently on a final review call. If that PR gets merged, I think we should stick with that style for the other functions that output to a grid or table for consistency. So, please comment either here or in that PR if anyone does not like that design choice.

@weiji14
Copy link
Member Author

weiji14 commented Mar 12, 2022

Thanks Meghan for getting the grdhisteq function done. I'll refactor the triangulate implementation in #731 to use a similar "Two methods in a single Python class" style to be consistent.

@maxrjones
Copy link
Member

Just a note that this issue can be closed after the recommended structure (as used in grdhisteq and triangulate) is added to the contributing guide. The guidance could be added as a follow-up to #1687.

@seisman
Copy link
Member

seisman commented Mar 17, 2022

Just a note that this issue can be closed after the recommended structure (as used in grdhisteq and triangulate) is added to the contributing guide.

The contributing guide is already too long, do you think we should add a code style guide instead? Here is an example from ObsPy https://docs.obspy.org/coding_style.html.

@maxrjones
Copy link
Member

Just a note that this issue can be closed after the recommended structure (as used in grdhisteq and triangulate) is added to the contributing guide.

The contributing guide is already too long, do you think we should add a code style guide instead? Here is an example from ObsPy https://docs.obspy.org/coding_style.html.

Yes, a code style guide would be a good alternative. We could also move a bunch of the other information into a docs style guide if it's important to have the base contributing guide shorter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants