Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent time coordinate naming within echopype #656

Closed
Tracked by #566
leewujung opened this issue Apr 28, 2022 · 20 comments
Closed
Tracked by #566

Consistent time coordinate naming within echopype #656

leewujung opened this issue Apr 28, 2022 · 20 comments
Assignees
Labels
data conversion data format Anything about data format
Milestone

Comments

@leewujung
Copy link
Member

leewujung commented Apr 28, 2022

To be compliant with the convention but also recognizing different sonar instruments produce different datagrams that contain the same types of information, we (@b-reyes @imranmaj @lsetiawan @leewujung) spent some time discussing what we should do in each of the instruments echopype supports at the moment. This discussion was spurred by #649.

Below are our conclusions:

Overarching

  • we need to be explicit that the same time base can be called differently across different groups, in docs and other relevant places
  • the source of the time base will be noted in the comment attribute of these time coordinate variables.

Time coordinates in Platform group

  • time1: will be used for data variables related to GPS lat/lon
    • this will be the same for all sensors
  • time2: will be used for data related to platform motion and positions, such as pitch/roll/vertical_offset
  • time3: will be used for anything from environment type of datagrams, such as water_depth
    • EK60: ping_time (not yet implemented)
    • EK80: environment_time (already implemented)
    • AZFP: X (not yet implemented)

Time coordinates in Environment group

  • we will call the time base time1 across the board, and we don't currently see a use case for time2 yet
  • sound_speed_indicative, dimension: (time1)
    • EK60: source ping_time (not yet implemented)
    • EK80: source environment_time (not yet implemented)
    • AZFP: X (do not need)
  • absorption_indicative, dimension: (channel, time1)
    • EK60: source ping_time (not yet implemented)
    • EK80: source environment_time (not yet implemented) environment_time
      • end result: time1 in Environment group is the same as time3 in Platform group. This will be noted in the time1 comment attribute and in the corresponding Platform.time3.comment attribute (those attributes already exist but will need to be updated)
    • AZFP: X (do not need)
  • other variables, such as temperature, acidity, etc, dimension: (time1)
    • EK60: X
    • EK80: source environment_time (not yet implemented)
    • AZFP: variable temperature, source ping_time (not yet implemented)
@leewujung leewujung added data conversion data format Anything about data format labels Apr 28, 2022
@leewujung leewujung added this to the 0.6.0 milestone Apr 28, 2022
@leewujung leewujung moved this to Todo in Echopype Apr 28, 2022
@leewujung
Copy link
Member Author

@emiliom : assigned you to this for first round of check/feedback and let's see who handles the implementation.

@emiliom
Copy link
Collaborator

emiliom commented Apr 28, 2022

This looks good

@b-reyes b-reyes self-assigned this May 4, 2022
@b-reyes
Copy link
Contributor

b-reyes commented May 6, 2022

While looking at some tests, I noticed that in echopype/tests/convert/test_convert_source_target_locs.py::test_convert_time_encodings we are testing the encodings for the time. After further review I noticed that the dictionary DEFAULT_ENCODINGS in echopype/utils/coding.py does not have a key for time3. Since we are talking about time3 in this issue, should we add "time3": DEFAULT_TIME_ENCODING to this dictionary?

@leewujung
Copy link
Member Author

leewujung commented May 7, 2022

Since we are talking about time3 in this issue, should we add "time3": DEFAULT_TIME_ENCODING to this dictionary?

Yes please add it.

Also I think for things like this you could take the same approach as what @emiliom mentioned for pushing to others' PR: now you have a sense of how things should be, you could just do it and mention it in your PR comments to call others' attention. 🤓

@b-reyes
Copy link
Contributor

b-reyes commented May 9, 2022

the source of the time base will be noted in the comment attribute of these time coordinate variables.

@leewujung or @emiliom did you have a generic statement that you have in mind for the comment attribute for each of these times? Below is my initial attempt:

  • Platform group
    • time1.comment = This time-coordinate variable corresponds to GPS location.
      • Note: AZFP does not have a time1
    • time2.comment = This time-coordinate variable corresponds to platform sensors.
    • time3.comment = This time-coordinate variable corresponds to environmental variables.
      • For EK80, we will have the following comment: time3.comment = This time-coordinate variable corresponds to environmental variables. Note that Platform.time3 is the same as Environment.time1
  • Environment group
    • time1.comment = This time-coordinate variable corresponds to environmental variables.
      • For EK80, we will have the following comment: time1.comment = This time-coordinate variable corresponds to environmental variables. Note that Platform.time3 is the same as Environment.time1

@leewujung
Copy link
Member Author

Maybe we can iterate on this at PR review stage?
The only comment I have now is to shorten the first (or the only) sentence above to a form like:
"Time coordinate corresponding to XYZ."

@b-reyes
Copy link
Contributor

b-reyes commented May 9, 2022

Maybe we can iterate on this at PR review stage? The only comment I have now is to shorten the first (or the only) sentence above to a form like: "Time coordinate corresponding to XYZ."

Ok, sounds good. I will use your suggested form and we can discuss this further at the PR stage.

@b-reyes
Copy link
Contributor

b-reyes commented May 9, 2022

For EK60/EK80 Platform.time1 and Platform/NMEA.time1 seem to correspond to the same thing, but it looks like they can be different lengths. Should we also add a comment for Platform/NMEA.time1 and if so, what should it be?

@b-reyes
Copy link
Contributor

b-reyes commented May 9, 2022

Time coordinates in Environment group

  • we will call the time base time1 across the board, and we don't currently see a use case for time2 yet

After reviewing the Environment group for EK80, I see that we have the time variables ping_time and environment_time. Do we want to leave ping_time alone or do we want to change it to time2? It seems like maybe we incorrectly characterized the times here (please see my comment directly below).

  • other variables, such as temperature, acidity, etc, dimension: (time1)
    • EK80: source environment_time (not yet implemented)

From the code, this does not seem to be correct. These variables have the dimension ping_time.

@leewujung
Copy link
Member Author

leewujung commented May 10, 2022

@b-reyes : for EK80 under set_env, if you read from the top, you will see that what's called ping_time is actually identical as the later part called environment_time -- always check where things come from in the parser is helpful. I think this is an error from way back and then became a redundancy when other variables were added recently to this group (#616), so please give it a fix. The designation we had above is current.

Note that we do not support the env_only flag anymore. I think removing this and other similar things, such as the one about NMEA (see #580) should be on the TODO list in v0.6.1.

@leewujung
Copy link
Member Author

leewujung commented May 10, 2022

For EK60/EK80 Platform.time1 and Platform/NMEA.time1 seem to correspond to the same thing, but it looks like they can be different lengths. Should we also add a comment for Platform/NMEA.time1 and if so, what should it be?

I think tracing the code back to the upstream and try a few test files will help resolve this. Please let us know what you find in terms of what operations make them different (if they are) and we can decide on the comment from there!

@b-reyes
Copy link
Contributor

b-reyes commented May 10, 2022

@b-reyes : for EK80 under set_env, if you read from the top, you will see that what's called ping_time is actually identical as the later part called environment_time -- always check where things come from in the parser is helpful. I think this is an error from way back and then became a redundancy when other variables were added recently to this group (#616), so please give it a fix. The designation we had above is current.

Note that we do not support the env_only flag anymore. I think removing this and other similar things, such as the one about NMEA (see #580) should be on the TODO list in v0.6.1.

@leewujung Now that you point to it, I do see that in the parser ping_time and environment_time do use the same values. However, my concern was when we actually set the variable in set_env. In this function, the lines

"environment_time": (
["environment_time"],
[self.parser_obj.environment["timestamp"]]
if "timestamp" in self.parser_obj.environment
else np.datetime64("NaT"),

made it seem like there was a scenario where we could have ping_time = some value and environment_time = NaT. Based on your comments, it seems like you are very confident that this will not happen. So, I will happily remove it.

To be absolutely clear, you are suggesting that I remove environment_time as a coordinate and replace it with ping_time wherever it occurs and we will remove the env_only flag later (not in v0.6.0), correct?

@leewujung
Copy link
Member Author

leewujung commented May 10, 2022

@b-reyes : what I meant was that the ping_time you saw in there is from self.parser_obj.environment["timestamp"], which is what environment_time is set to in the code section you highlighted. Line 125-127 handles the case when this timestamp does not exist (I think this is when there's no environment XML datagram in the file -- @imranmaj could you confirm? I don't immediately see a test file corresponding to this, but maybe I missed something).

As I mentioned, I think the ping_time section in set_env was buggy. I recommend completely ignoring it and implementing what needs to be there: time1 that contains the timestamp(s) from the environment datagram(s) -- there can be one or multiple such datagrams.

And yes, removing env_only does not need to implemented now.

@b-reyes
Copy link
Contributor

b-reyes commented May 10, 2022

For EK60/EK80 Platform.time1 and Platform/NMEA.time1 seem to correspond to the same thing, but it looks like they can be different lengths. Should we also add a comment for Platform/NMEA.time1 and if so, what should it be?

I think tracing the code back to the upstream and try a few test files will help resolve this. Please let us know what you find in terms of what operations make them different (if they are) and we can decide on the comment from there!

@leewujung looking into this, it seems like Platform.time1 and Platform/NMEA.time1 use the same parser object (self.parser_obj.nmea["timestamp"]). However, Platform/NMEA.time1 seems to be a subset of Platform.time1:

idx_loc = np.argwhere(np.isin(messages, self.ui_param["nmea_gps_sentence"])).squeeze()

time1 = (
(
np.array(self.parser_obj.nmea["timestamp"])[idx_loc]

It seems like this is a subset because we only want to select times that correspond to certain messages (or NMEA sentences). These allowable sentences are set here

out_params["nmea_gps_sentence"] = param_dict.get("nmea_gps_sentence", NMEA_SENTENCE_DEFAULT)

From the multiple files that I have tested, it seems like Platform/NMEA.time1 is often a subset of Platform.time1. For this reason, I think we need to add a slightly different comment for Platform/NMEA.time1. Perhaps Platform/NMEA.time1.comment = "Time coordinate corresponding to GPS location and allowable NMEA sentences."

Edit:

changed "Time coordinate corresponding to environmental variables and allowable NMEA sentences." to "Time coordinate corresponding to GPS location and allowable NMEA sentences."

@leewujung
Copy link
Member Author

@b-reyes : thanks for the investigative work! I think this messages selection thing is what should be removed (#580 linked above). What do you think if we remove this messages selection here, and in a post-v0.6.0 PR completely remove all related function calls and mechanism? This way we would have identical NMEA and Platform GPS time coordinates, seems cleaner this way (as level 0 product)? and users or our downstream functions can have that selection for level>1 product.

@b-reyes
Copy link
Contributor

b-reyes commented May 10, 2022

@b-reyes : what I meant was that the ping_time you saw in there is from self.parser_obj.environment["timestamp"], which is what environment_time is set to in the code section you highlighted. Line 125-127 handles the case when this timestamp does not exist (I think this is when there's no environment XML datagram in the file -- @imranmaj could you confirm? I don't immediately see a test file corresponding to this, but maybe I missed something).

Thinking out loud ... If no environment XML datagram exists in the file, then we will not have the variables sound_velocity_profile, sound_velocity_source, transducer_name, transducer_sound_speed (for EK80). In the case where an environment XML datagram does exist, then ping_time and environment_time will be the same. Since we do an if statement to check whether or not those variables can be set, it seems like in either case (we don't have an environment XML or we do have an environment XML) it is completely safe to replace environment_time with ping_time. @leewujung is this how you are thinking about it?

As I mentioned, I think the ping_time section in set_env was buggy. I recommend completely ignoring it and implementing what needs to be there: time1 that contains the timestamp(s) from the environment datagram(s) -- there can be one or multiple such datagrams.

Since it is possible that no environment XML datagram exists in the file, how do we want to set time1 in this case? I was thinking the following:

        if "timestamp" in self.parser_obj.environment:
            time1 = self.parser_obj.environment["timestamp"]
        else:
            time1 = np.datetime64("NaT")

@b-reyes
Copy link
Contributor

b-reyes commented May 10, 2022

@b-reyes : thanks for the investigative work! I think this messages selection thing is what should be removed (#580 linked above). What do you think if we remove this messages selection here, and in a post-v0.6.0 PR completely remove all related function calls and mechanism? This way we would have identical NMEA and Platform GPS time coordinates, seems cleaner this way (as level 0 product)? and users or our downstream functions can have that selection for level>1 product.

@leewujung reading up on #580, it seems like we definitely plan to remove this messages section. I think we should create a new issue for this and handle it in a different PR (as I am not sure how many downstream impacts there will be). For this issue, I suggest that we just add Platform/NMEA.time1.comment = "Time coordinate corresponding to GPS location."

@leewujung
Copy link
Member Author

@b-reyes I agree with your suggestions!

@emiliom
Copy link
Collaborator

emiliom commented May 11, 2022

Finally catching up here ... I think I agree with everything I'm seeing, including removing the messages filtering in the NMEA data. That data should be as close to the raw, unfiltered datagram as possible.

@leewujung
Copy link
Member Author

I think we can close this now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data conversion data format Anything about data format
Projects
Status: Done
Development

No branches or pull requests

3 participants