Allow clean functions to handle _avg variables #377

Merged
jmcvey3 merged 5 commits into MHKiT-Software:develop from clean_avg
Feb 25, 2025

Conversation

Contributor

@jmcvey3 jmcvey3 commented Feb 6, 2025

Makes the ADCP cleaning functions more robust; updated to match the latest reader changes for dual-profiling instruments. Solution for issue #373.

Contributor

@ssolson ssolson left a comment

@jmcvey3 thanks for submitting this. I think most of these questions are for me but maybe I found something helpful.

Comment on lines +160 to +169
     # Use "avg" velocity if standard isn't available.
     # Should not matter which is used.
     tag = []
     if hasattr(ds, "vel"):
         tag += [""]
     if hasattr(ds, "vel_avg"):
         tag += ["_avg"]

     # This finds the maximum of the echo profile:
-    inds = np.argmax(ds["amp"].values, axis=1)
+    inds = np.argmax(ds["amp" + tag[0]].values, axis=1)
Contributor

tag[0] could raise an IndexError, since tag is initialized as an empty list and stays empty if neither variable is present.

I think this needs an else branch that sets tag = [""] (or tag could be initialized as an empty string, with "_avg" assigned only when needed), or better error handling:
raise ValueError("Neither 'vel' nor 'vel_avg' found in dataset")
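
A minimal sketch of the error-handling variant suggested above. The helper name `select_tag` is hypothetical and not part of the PR, and a plain dict stands in for the dataset, so the check uses `in` rather than `hasattr`:

```python
def select_tag(variables):
    """Return "" if "vel" is present, else "_avg" if "vel_avg" is present.

    `variables` is any container that supports `in` on variable names
    (a plain dict is used here as a stand-in for the dataset).
    """
    for tag in ("", "_avg"):
        if "vel" + tag in variables:
            return tag
    # Fail with a clear message instead of an IndexError on an empty list.
    raise ValueError("Neither 'vel' nor 'vel_avg' found in dataset")


# Stand-in for a dataset from an instrument that only recorded averaged data.
ds = {"vel_avg": None, "amp_avg": None, "time_avg": None}
tag = select_tag(ds)
amp_name = "amp" + tag  # name of the amplitude variable to read
```

Returning a single string instead of building a list avoids indexing with `tag[0]` altogether.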

Contributor Author

Good thoughts; however, if "vel" doesn't exist, "amp" and "corr" will also not exist. The signal amplitude and correlation are the quality analysis of the velocity ping's signal.

@@ -199,7 +207,7 @@ def water_depth_from_amplitude(ds, thresh=10, nfilt=None) -> None:

     ds["depth"] = xr.DataArray(
         d.astype("float32"),
-        dims=["time"],
+        dims=["time" + tag[0]],
Contributor

Are we guaranteed to have a "time_avg" coordinate if we have a vel_avg?

Contributor Author

@jmcvey3 jmcvey3 Feb 14, 2025

Yes, I hardcoded it. The data stored under the "averaging" ID have their own timestamps, and I log "time" from that data ID with the "_avg" tag.

Comment on lines 188 to 194
     d = np.median(D, axis=0)

     # Throw out values that do not increase near the surface by *thresh*
-    for ip in range(ds["vel"].shape[1]):
+    for ip in range(ds["vel" + tag[0]].shape[1]):
         itmp = np.min(inds[:, ip])
         if (edf[itmp:, :, ip] < thresh).all():
             d[ip] = np.nan
Contributor

On line 188 you take the median of d1 and d2, and on line 194 you write NaN into the array.

Will d1 or d2 ever contain NaN? If so, np.median will return NaN for any profile where either value is NaN. Is that the behavior you want, or would np.nanmedian be preferred?

Contributor Author

Hmmm, this is a Levi function... I'll switch all of those median, min, and max calls to their NaN-aware versions, because calling this after another QC function would otherwise be a problem.
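
To illustrate the difference discussed in this thread, here is a small sketch with made-up depth values (the array `D` below is illustrative, not data from the PR). It assumes one entry was already masked to NaN by an earlier QC step:

```python
import numpy as np

# Two hypothetical depth estimates (rows) over four profiles (columns);
# one value was set to NaN by an earlier quality-control step.
D = np.array([
    [10.0, 10.2, np.nan, 10.1],
    [10.4, 10.0, 10.3, 10.1],
])

plain = np.median(D, axis=0)      # NaN propagates into the third profile
robust = np.nanmedian(D, axis=0)  # NaN is ignored; third profile keeps 10.3
```

`np.nanmin` and `np.nanmax` behave analogously for the min and max calls mentioned above.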

raise NameError("The variable 'temp' does not exist.")

     # Density calculation
-    P = ds["pressure"].values
     T = ds["temp"].values  # temperature, degC
+    P = ds[pressure[0]].values  # pressure, dbar
Contributor

Do all instruments report pressure in dbar (rather than Pa)? Or is the unit well described in the examples? Should it be added to the docstring?

Contributor Author

Yes, they all use dbar because it translates nearly 1:1 to meters below the surface (assuming the pressure sensor was zeroed before deploying, of course).
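
The near 1:1 relationship follows from the hydrostatic relation depth = P / (rho * g). A quick worked check, assuming a nominal seawater density of 1025 kg/m^3 and g = 9.81 m/s^2 (both values are assumptions for illustration, not taken from the PR, and `depth_from_dbar` is a hypothetical helper):

```python
RHO_SEAWATER = 1025.0  # kg/m^3, assumed nominal seawater density
G = 9.81               # m/s^2, assumed gravitational acceleration
PA_PER_DBAR = 1.0e4    # 1 dbar = 0.1 bar = 10,000 Pa


def depth_from_dbar(p_dbar, rho=RHO_SEAWATER, g=G):
    """Hydrostatic depth in meters for a gauge pressure given in dbar."""
    return p_dbar * PA_PER_DBAR / (rho * g)


# Depth corresponding to 1 dbar of gauge pressure: just under one meter.
depth_per_dbar = depth_from_dbar(1.0)
```

With these assumed constants, one dbar corresponds to slightly less than one meter of water, which is why dbar readings are commonly treated as approximate depth.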

Comment on lines 393 to 404
    # Fetch cell size
    cs = [
        a
        for a in ds.attrs
        if (
            ("cell_size" in a)
            and ("_bt" not in a)
            and ("_alt" not in a)
            and ("wave" not in a)
        )
    ]

Contributor

Should this raise an error when nothing matches, for example:
raise KeyError("No valid 'cell_size' attribute found in dataset.")

Or will there always be a "cell_size"?

Contributor Author

@jmcvey3 jmcvey3 Feb 14, 2025

There will always be a "cell_size" if the user doesn't remove it. I'll add a code block for user input if need be.
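
One way the guard raised in this thread could look, as a sketch only. The function name `find_cell_size_key` is hypothetical, and a plain dict stands in for `ds.attrs`:

```python
def find_cell_size_key(attrs):
    """Return the first attribute name holding the profile cell size.

    Mirrors the filter in the snippet above: the name must contain
    "cell_size" and must not refer to bottom-track ("_bt"),
    altimeter ("_alt"), or wave data.
    """
    excluded = ("_bt", "_alt", "wave")
    matches = [
        name
        for name in attrs
        if "cell_size" in name and not any(x in name for x in excluded)
    ]
    if not matches:
        # Covers the case where the user removed the attribute.
        raise KeyError("No valid 'cell_size' attribute found in dataset.")
    return matches[0]


# Stand-in for ds.attrs on a dual-profiling instrument.
attrs = {"cell_size_bt": 1.0, "cell_size_avg": 0.5, "blank_dist_avg": 0.1}
key = find_cell_size_key(attrs)
```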

Contributor

akeeste commented Feb 20, 2025

With Sterling's commits to develop pulled in, tests are now passing

Contributor

akeeste commented Feb 25, 2025

@jmcvey3 are there any other outstanding changes in this PR from @ssolson's review?

Contributor Author

jmcvey3 commented Feb 25, 2025

> @jmcvey3 are there any other outstanding changes in this PR from @ssolson's review?

I was waiting on a response in issue #373, but looks like we're good to go.

@jmcvey3 jmcvey3 merged commit 7e91cea into MHKiT-Software:develop Feb 25, 2025
43 checks passed
@jmcvey3 jmcvey3 deleted the clean_avg branch February 25, 2025 17:11