Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap surface #245

Merged
merged 14 commits into from
Jan 16, 2019
Merged

Wrap surface #245

merged 14 commits into from
Jan 16, 2019

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Nov 25, 2018

Description of proposed changes

Wrapping the surface gridding function. A new load_tut_ship() dataset function was also added as an example for the tests.

Question: Do I put the API docs for surface under Data Processing (along with gmt.info and gmt.grdinfo)? Or have it under another section?

Fixes #243

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Initial commit for GenericMappingTools#243. Implement GMT surface function under gridding.py. Test cases checking file/matrix/vectors type inputs stored in test_surface.py. Sample dataset for tests uses newly created load_tut_ship function in datasets/tutorial.py.

Note that original GMT surface module https://gmt.soest.hawaii.edu/doc/latest/surface.html requires a -Goutputfile.nc parameter. Here, the implementation of surface does not respect this -G parameter properly. Instead, it stores the output file to a GMTTempFile and returns an xarray.Dataset in Python.
Checks that the loaded pandas.DataFrame's shape and minmax summary on each column is correct.
Getting code coverage up to 100% for gridding.py.
Allow `surface` to store the output grid to a netcdf file. Will still return xarray.Dataset in python bindings.
@welcome
Copy link

welcome bot commented Nov 25, 2018

💖 Thanks for opening this pull request! 💖

Please make sure you read our contributing guidelines and abide by our code of conduct.

A few things to keep in mind:

  • If you need help writing tests, take a look at the existing ones for inspiration. If you don't know where to start, let us know and we'll walk you through it.
  • All new features should be documented. It helps to write the docstrings for your functions/classes before writing the code. This will help you think about your code design and results in better code.
  • No matter what, we are really grateful that you put in the effort to do this! 🎉

@weiji14 weiji14 changed the title Wrap surface WIP Wrap surface Nov 25, 2018
@leouieda
Copy link
Member

@weiji14 thanks for submitting this PR! I have a few comments that need to addressed first.

I wasn't sure if we should merge this yet because I'm making some big changes in the C lib wrapper and how we build the wrapper modules. But I'll merge this in as a test case for some of the changes I'm planning.

Let me know if you need any help.

Do I put the API docs for surface under Data Processing (along with gmt.info and gmt.grdinfo)?

Yep, you can put it under the "tabular data" section.

Renaming the function to load the @tut_ship.xyz tutorial dataset.
Using long aliases 'spacing' and 'region' instead of -I and -R respectively. Change from x, y, z to longitude, latitude, bathymetry for load_sample_bathymetry tutorial dataset. Note that the sample dataset's longitude is in 0-360 format instead of -180 to 180.
The xarray.Dataset may sometimes be lazily loaded, so we use `.load()` to force loading of the data. See e.g. https://github.com/pydata/xarray/blame/9f4474d657193f1c7c9aac25bb2edf94755a8593/doc/io.rst#L169-L170. This is important because we want to persist the data even after the tempfile is deleted.
Cosmetic changes mainly. Try to reduce cognitive complexity of the surface docstring. Also fix pylint C0411: third party import "import xarray as xr" should be placed before "import pytest" (wrong-import-order) in test_surface.py
@weiji14
Copy link
Member Author

weiji14 commented Nov 30, 2018

@leouieda thanks for taking the time to review this! I've made some commits to address the issues you've highlighted, and have now added surface to the docs.

You've definitely spotted the problematic part in this comment #245 (review). I do think there needs to be a bit more discussion there on how we handle the output. I.e. do we want the gridding function to create a file only by default, or have it output an xarray.Dataset, or have some gmt-python specific flag (e.g. return_grid = True/False) that allows the user to decide.

Another question is, what's a suitable alias for -G? 'outputfile' seems a bit long, 'grdfile' or 'out_grdfile' seems quite technical/specific. My preference is for 'outfile', but I'm open to suggestions.

Using 'outfile' as long alias for the 'G' parameter in gmt surface. Created some if-then logic to return no output (None) if this outfile parameter is set, otherwise return an xarray.Dataset if outfile parameter is unset (the default result from previous code).
@weiji14 weiji14 changed the title WIP Wrap surface Wrap surface Dec 28, 2018
@weiji14
Copy link
Member Author

weiji14 commented Dec 28, 2018

Ok, it's been almost a month already so I'm just going ahead with the changes suggested by @leouieda and me:

  • The -G parameter now has an alias called outfile
  • Setting this outfile (G) parameter means the grid won't be returned in python (i.e. None is returned)

Also, I do have a jupyter notebook called process_grid_data.ipynb showcasing this new functionality and that of gmt.grdcontour (See #225) but I'll put that in a separate pull request once this one is merged in!

P.S. Let me know if I should merge in the latest commits from master. Also, not sure what I can do about the codeclimate cognitive complexity problem.

@leouieda
Copy link
Member

Hi @weiji14, sorry for the absence. I'm glad you went ahead with the changes. The code is fine and the tests seem to work well. The issue with the complexity is because of the many nested levels of the code. I don't see a good way around this for this function so we can leave it as is.

The output handling is something I'm still undecided. I find it very unnatural to have all these Python functions that operate with file names for input and output. In any other library in the ecosystem, the computations happen in memory and there are dedicated functions for reading and writing files. This also minimizes the complexity of the input arguments making the code much easier to test (which is a major issue with GMT).

Regarding the example, please don't commit the notebook yet. I'm trying to get sphinx-gallery working which will greatly facilitate making examples and tutorials. After that is done, then I'd welcome some help with writing gallery plots or tutorials.

I'll merge this in now and we can see think of a better way to handle IO later on.

Thanks for your work on this! I hope you'll consider making more contributions in the future 🥇 🎉

@leouieda leouieda merged commit 1079460 into GenericMappingTools:master Jan 16, 2019
@welcome
Copy link

welcome bot commented Jan 16, 2019

🎉🎉🎉 Congrats on merging your first pull request and welcome to the team! 🎉🎉🎉

Please open a new pull request to add yourself to the AUTHORS.md file. We hope that this was a good experience for you. Let us know if there is any way that the contributing process could be improved.

@weiji14
Copy link
Member Author

weiji14 commented Jan 17, 2019

In terms of working on filenames, I think a good example would be the pandas library, and in fact, I did look at their I/O implementation when wrapping surface. But really, this should be in a separate topic. Anyways, thanks for merging this in, I'll definitely contribute some more in the future!

@leouieda
Copy link
Member

leouieda commented Jan 17, 2019

I think a good example would be the pandas library

Yep. And they have no function that computes things and saves them directly to a file. You always do pd.read_csv then compute something and DataFrame.to_csv to save to a file. So it feels weird doing gmt.suface(data.longitude, data.latitude, data.bathymetry, outlfile="grid.nc").

But yes, we'll discuss this more in another issue :)

I'll definitely contribute some more in the future!

🚀 🎉

weiji14 added a commit to weiji14/deepbedmap that referenced this pull request Feb 8, 2019
Couple of breaking changes. Since GenericMappingTools/pygmt#261, the conda-forge binary package is broken (hence all the recent CI build failures...) and we have to build the GMT binary from source, sigh. The new Dockerfile code for handling this isn't nice but it'll do for now.

Also, the Python `gmt-python` package has been renamed to `pygmt` in GenericMappingTools/pygmt#265, so the Pipfile is updated accordingly here. Also pinning to a newer git version on the upstream branch instead of my personal fork, since GenericMappingTools/pygmt#245 has been merged in!
@weiji14 weiji14 mentioned this pull request May 22, 2019
@weiji14 weiji14 mentioned this pull request Jun 24, 2020
9 tasks
@weiji14 weiji14 mentioned this pull request Sep 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrapper for surface
2 participants