Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance update_platform to support more use cases and produce more consistent results #740

Closed
emiliom opened this issue Jun 25, 2022 · 8 comments
Assignees
Labels
bug Something isn't working enhancement This makes echopype better
Milestone

Comments

@emiliom
Copy link
Collaborator

emiliom commented Jun 25, 2022

The use case that drove the development and testing of update_platform was a single set of Saildrone data files from the Pacific Hake survey, with one EK80 raw file with no Platform data at all and an associated external data file.

I illustrate and document some of the issues in this notebook.

General comments about the current implementation of update_platform

  • It fails with the use case of a single fixed location created as an xarray Dataset on the fly with no other variables
  • It handles latitude, longitude, pitch, roll, vertical_offset and water_level
  • It currently adds a history attribute only to time1.history. Here's an example: "2022-06-25 21:32:53.788634 +00:00. Added from external platform data". I think this same attribute string should also be added to all variables inserted by update_platform
  • time1 is used for all inserted variables. It is the only dimension assigned to these variables. That pattern predates echopype 0.6.0, and for variables other than latitude and longitude the resulting Platform group ends up being inconsistent with a Platform group generated using open_raw where the platform data are found in the raw file. Specifically, echopype 0.6.0 specifies the following dimensionality:
    pitch: (channel, time2)
    roll: (channel, time2),
    vertical_offset: (channel, time2)
    water_level: (channel, time3)
    
  • It sets all missing target variables to nan except water_level, which is set to 0. I didn't find any valid reason for the water_level assignment of 0, and I believe the only reason was to prevent the failure of tests/echodata/test_echodata.py::test_update_platform as currently implemented.

Some open questions about the fixed-location use case, such as from a mooring

  • What should the timestamp on the location be? eg, the first or last ping_time value? When using a single timestamp (time1 will have a length of 1), the assumption is that it applies to the entire dataset (all ping times).
  • Would it be clearer to replicate the fixed location across more than one timestamp? For example, the first and last ping_time?

Issues unrelated to update_platform per se

Comparing the outcome of update_platform for AZFP and EK60 raw files that don't have the variables handled by update_platform but do have other Platform variables exposed some remaining differences in how variables with empty values are being handled by the different set_groups_* classes. These differences are illustrated in my sample notebook, and they involve issues such as:

  • Whether a variable is created when empty (eg, latitude and longitude are not created in set_groups_azfp, but they are in set_groups_ek60 and set_groups_ek80)
  • Default values of 0 or nan
  • Assigned dimensions

Also, set_groups_ek60 does not add the 3 empty global platform_* attributes, whereas set_groups_ek80 and set_groups_azfp do.

Finally, set_groups_azfp is not assigning any attributes to the AZFP-specific variables tilt_x and tilt_y .

@emiliom emiliom added bug Something isn't working enhancement This makes echopype better labels Jun 25, 2022
@emiliom emiliom self-assigned this Jun 25, 2022
@emiliom emiliom added this to the 0.6.1 milestone Jun 25, 2022
@emiliom emiliom added this to Echopype Jun 25, 2022
@emiliom emiliom moved this to Todo in Echopype Jun 25, 2022
@emiliom emiliom moved this from Todo to In Progress in Echopype Jun 25, 2022
@emiliom
Copy link
Collaborator Author

emiliom commented Jun 26, 2022

See PR #741, which addressed a few of the tasks from this issue. In particular, the simple fixed-location use case where a single lat-lon coordinate is passed to update_platform. The choice of which timestamp to associate with it was left to the user.

@leewujung
Copy link
Member

Some open questions about the fixed-location use case, such as from a mooring

  • What should the timestamp on the location be? eg, the first or last ping_time value? When using a single timestamp (time1 will have a length of 1), the assumption is that it applies to the entire dataset (all ping times).
  • Would it be clearer to replicate the fixed location across more than one timestamp? For example, the first and last ping_time?

I think it will be clearer to use just the first ping_time for this. I am thinking of the case when we combine multiple files, say 10, seeing 10 identical lat/lon with the time corresponding to the first ping_time of each file seems more intuitive. We can perhaps add in the comment that the lat/lon are added through update_platform and use the first ping_time as the timestamp.

@emiliom
Copy link
Collaborator Author

emiliom commented Jun 29, 2022

I think it will be clearer to use just the first ping_time for this. I am thinking of the case when we combine multiple files, say 10, seeing 10 identical lat/lon with the time corresponding to the first ping_time of each file seems more intuitive.

I hadn't thought of that case, and it's a good point. Right now this decision (whether to use a single point with the first ping_time or two points) is external to update_platform, but I'm thinking that we could build in -- now or down the road -- a special handling of the fixed-location case such that the user could pass a single lat,lon coordinate; then update_platform would implement the assumption (eg, single point timestamped to the first ping_time).

We can perhaps add in the comment that the lat/lon are added through update_platform and use the first ping_time as the timestamp.

In this issue I've included a suggestion to add the history attribute to every variable that's updated (inserted/rewritten) by update_platform. That would take care of the first part of your suggestion. The second part, the use of the first ping_time as the timestamp, could be added there too (for simplicity); or as a comment attribute.

Overall, I like your suggestions and reasoning.

@emiliom
Copy link
Collaborator Author

emiliom commented Jun 29, 2022

For version 0.6.1 (short time frame), I think we can focus largely on the fixed-location use case and the issues and clean ups I've already addressed in PR #741. We can leave the other comments I've raised here to 0.6.2.

@leewujung leewujung modified the milestones: 0.6.1, 0.6.2 Jul 1, 2022
@emiliom emiliom modified the milestones: 0.6.2, 0.6.3 Aug 12, 2022
@emiliom emiliom modified the milestones: 0.6.3, 0.6.4 Oct 6, 2022
@leewujung leewujung modified the milestones: 0.6.4, 0.7.0 Dec 1, 2022
@leewujung
Copy link
Member

Just wanted to note that this is related to #769.

@emiliom emiliom modified the milestones: 0.7.0, 0.7.1 Mar 24, 2023
@leewujung
Copy link
Member

Now #1009 supersede #769.

@emiliom
Copy link
Collaborator Author

emiliom commented Aug 3, 2023

I've gone over this issue, related PR's, and the outcome of update_platform in an example. I think everything here has been addressed. The only exception is the issue now tracked in #1009; we'll discuss and track its progress there and can close this issue.

There is one usability improvement mentioned here that is not implemented:

I think it will be clearer to use just the first ping_time for this. I am thinking of the case when we combine multiple files, say 10, seeing 10 identical lat/lon with the time corresponding to the first ping_time of each file seems more intuitive.

I hadn't thought of that case, and it's a good point. Right now this decision (whether to use a single point with the first ping_time or two points) is external to update_platform, but I'm thinking that we could build in -- now or down the road -- a special handling of the fixed-location case such that the user could pass a single lat,lon coordinate; then update_platform would implement the assumption (eg, single point timestamped to the first ping_time).

This would be nice, but let's create a new issue solely for it. We can then address it in a later release.

@emiliom
Copy link
Collaborator Author

emiliom commented Aug 16, 2023

Now that I've opened #1126, we close this issue.

@emiliom emiliom closed this as completed Aug 16, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Echopype Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement This makes echopype better
Projects
Status: Done
Development

No branches or pull requests

2 participants