Move upper air data capability into Siphon #147

Merged (4 commits) on Sep 12, 2017

jrleeman
Contributor

Addresses #130 and moves upper air data support into Siphon. This is still a sketch, since we have some open questions about where exactly this should live. I've used the in-memory capability of netCDF4-python here, but without wrapping the Variable like we do in MetPy, units are a no-go. Then again, we'd be depending on MetPy's unit registry at that point, so maybe we return a plain netCDF Dataset? I'm thinking a tool in MetPy that goes through and appends units from metadata might be useful anyway.

-------
:class:`metpy.io.cdm.Dataset` containing the data

.. deprecated:: 0.6.0
Collaborator

Do we need this warning mirrored in siphon?

Contributor Author

Nope, we can get rid of it.

@lesserwhirls
Collaborator

Do we have tests in MetPy that we can move over to Siphon?

@jrleeman
Contributor Author

There are some tests that will need to be re-mocked, especially my test on server return codes.

Contributor Author

@jrleeman left a comment

A few places I need to update or would like input on, @dopplershift.

from io import BytesIO
import numpy as np
import pandas as pd
from metpy.calc import get_wind_components
Contributor Author

Planning on porting get_wind_components from MetPy into a _tools module.
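For context, the kind of helper being discussed could look roughly like the sketch below. It uses the usual meteorological convention (direction is where the wind blows *from*, in degrees clockwise from north). The function name `wind_components` and its plain-float signature are assumptions for illustration; this is not MetPy's actual implementation and it does not handle units.

```python
import math


def wind_components(speed, direction_deg):
    """Return (u, v) for a wind of the given speed and direction.

    Hypothetical helper: direction_deg is the direction the wind is
    coming from, measured clockwise from north, in degrees.
    """
    direction_rad = math.radians(direction_deg)
    u = -speed * math.sin(direction_rad)  # eastward component
    v = -speed * math.cos(direction_rad)  # northward component
    return u, v


# A wind from the west (270 degrees) blows toward the east, so u is positive.
u, v = wind_components(10.0, 270.0)
```

A unit-free helper like this would avoid importing MetPy from Siphon, at the cost of leaving unit handling to the caller.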

:class:`pandas.DataFrame` containing the data

"""
endpoint = WyomingUpperAir('http://weather.uwyo.edu/')
Contributor Author

I don't like the URL here; I feel like the object setup isn't ideal.

Member

Put it in the constructor for the class...see below.

('td', 'dewpoint')]:
key_data, key_units = info[key]

insert_with_units(data, name, key_data, key_units)
Contributor Author

Need to add u, v, speed, direction still.

Member

@dopplershift left a comment

Here are my thoughts. You might also look at ncss.py to see how I did things there.


class WyomingUpperAir(HTTPEndPoint):
"""Download and parse data from the University of Wyoming's upper air archive."""

Member

I think the way to go is to add an __init__ that takes no arguments and passes the url to:

super(WyomingUpperAir, self).__init__('myurlhere')

Should also include 'cgi-bin/sounding' in that url.
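Spelled out, that suggestion might look something like the following sketch. The `HTTPEndPoint` class here is a minimal stand-in written for this example (it only stores the URL) and is not Siphon's real base class; the URL is the one from the diff with 'cgi-bin/sounding' appended, as suggested.

```python
class HTTPEndPoint(object):
    """Stand-in for Siphon's base class (assumption: it accepts a URL)."""

    def __init__(self, url):
        self.url = url


class WyomingUpperAir(HTTPEndPoint):
    """Download and parse data from the University of Wyoming's upper air archive."""

    def __init__(self):
        # The endpoint URL is fixed, so callers never have to supply it.
        super(WyomingUpperAir, self).__init__(
            'http://weather.uwyo.edu/cgi-bin/sounding')


endpoint = WyomingUpperAir()
```

With the URL baked into the constructor, `WyomingUpperAir('http://weather.uwyo.edu/')` in the calling code reduces to `WyomingUpperAir()`.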

a file-like object from which to read the data

"""
path = ('cgi-bin/sounding?region={region}&TYPE=TEXT%3ALIST'
Member

Let's encompass this in a Query object for Wyoming. Might need to refactor DataQuery to allow a query object for this to inherit useful behavior.
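As a rough illustration of the idea, a query object could collect the parameters that the diff currently formats into the path by hand. Everything below is hypothetical: `WyomingQuery` and its method names are invented for this sketch and do not reflect Siphon's DataQuery API, and the values 'naconf' and 'OUN' are only example inputs.

```python
from datetime import datetime
from urllib.parse import urlencode


class WyomingQuery(object):
    """Hypothetical query object for the Wyoming sounding endpoint."""

    def __init__(self):
        # The output type is constant for this endpoint.
        self.params = {'TYPE': 'TEXT:LIST'}

    def region(self, region):
        self.params['region'] = region
        return self

    def time(self, time):
        # Request a single time by using it as both FROM and TO.
        self.params['YEAR'] = time.strftime('%Y')
        self.params['MONTH'] = time.strftime('%m')
        self.params['FROM'] = time.strftime('%d%H')
        self.params['TO'] = time.strftime('%d%H')
        return self

    def station(self, site_id):
        self.params['STNM'] = site_id
        return self

    def __str__(self):
        # urlencode percent-escapes the ':' in TEXT:LIST as %3A.
        return urlencode(self.params)


query = WyomingQuery().region('naconf').time(datetime(2017, 9, 12, 12)).station('OUN')
path = 'cgi-bin/sounding?' + str(query)
```

Letting `urlencode` do the escaping removes the hand-written `%3A` from the path string, and the chained setters keep each parameter's formatting in one place.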

@@ -0,0 +1,279 @@
# Copyright (c) 2013-2015 University Corporation for Atmospheric Research/Unidata.
Member

I assume this file is going away.

'&YEAR={time:%Y}&MONTH={time:%m}&FROM={time:%d%H}&TO={time:%d%H}'
'&STNM={stid}').format(region=region, time=time, stid=site_id)

resp = self.get_path(path)
Member

With everything above, this becomes self.get_query(query).


# Grab the stuff *between* the <PRE> tags -- 6 below is len('<PRE>\n')
buf = resp.text[data_start + 6:data_end]
return BytesIO(buf.encode(encoding='UTF-8'))
Member

Can probably just move the contents of parse here since we're not trying to have the same common structure as we did in MetPy. Might be able to leverage pandas' own parsing code to do this work too.
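One way pandas' own parsing could be used here is `pandas.read_fwf`, which reads fixed-width text. The sketch below is only an illustration: the HTML snippet is synthetic sample data in the general shape of a sounding listing, and the column widths, number of header rows, and column names are assumptions, not the values used in this PR.

```python
from io import StringIO

import pandas as pd

# Synthetic response text; every column is 7 characters wide.
html = (
    "<HTML><H2>Sample Observations</H2>\n"
    "<PRE>\n"
    "----------------------------\n"
    "   PRES   HGHT   TEMP   DWPT\n"
    "    hPa      m      C      C\n"
    "----------------------------\n"
    " 1000.0    112              \n"
    "  925.0    798   21.4   16.2\n"
    "  850.0   1523   17.0   12.5\n"
    "</PRE></HTML>"
)

# Grab the text between the <PRE> tags; 6 is len('<PRE>\n').
data_start = html.find('<PRE>')
data_end = html.find('</PRE>')
body = html[data_start + 6:data_end]

# Skip the four header lines and let pandas split the fixed-width columns.
# Blank fields come back as NaN.
df = pd.read_fwf(StringIO(body), skiprows=4, widths=[7, 7, 7, 7],
                 names=['pressure', 'height', 'temperature', 'dewpoint'])
```

Because the text is passed through `StringIO`, this also avoids encoding the buffer to bytes and decoding it again before parsing.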

:class:`pandas.DataFrame` containing the data

"""
endpoint = WyomingUpperAir('http://weather.uwyo.edu/')
Member

Put it in the constructor for the class...see below.

fobj = endpoint.get_data(time, site_id)
info = endpoint.parse(fobj)

direction, spd, spd_units = info['wind']
Member

I think all of the pandas manipulation should be in get_data in the WyomingUpperAir class.

from metpy.calc import get_wind_components


def get_upper_air_data(time, site_id):
Member

I think:

  1. This function should be a staticmethod on WyomingUpperAir
  2. It should just wrap the process of building a query and calling get_data.
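A minimal sketch of those two points, under stated assumptions: the class below is a simplified stand-in (no base class, no network access), `get_data` just echoes what it would request, the query is a plain dict rather than a real query object, and the `region` default of 'naconf' is an assumed example value.

```python
from datetime import datetime


class WyomingUpperAir(object):
    """Simplified stand-in for the endpoint class discussed in this review."""

    def __init__(self):
        self.url = 'http://weather.uwyo.edu/cgi-bin/sounding'

    def get_data(self, query):
        # Placeholder: the real method would issue the request and parse
        # the response into a pandas DataFrame.
        return {'url': self.url, 'query': query}

    @staticmethod
    def get_upper_air_data(time, site_id, region='naconf'):
        """Build a query for one time and site, then fetch the data."""
        endpoint = WyomingUpperAir()
        query = {'region': region,
                 'TYPE': 'TEXT:LIST',
                 'YEAR': time.strftime('%Y'),
                 'MONTH': time.strftime('%m'),
                 'FROM': time.strftime('%d%H'),
                 'TO': time.strftime('%d%H'),
                 'STNM': site_id}
        return endpoint.get_data(query)


result = WyomingUpperAir.get_upper_air_data(datetime(2017, 9, 12, 12), 'OUN')
```

The static method keeps the one-call convenience of the module-level function while leaving all request and parsing logic on the class.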

@dopplershift
Member

Happy to discuss more...probably easier to get into details now that you have a concrete picture.

@jrleeman
Contributor Author

Will need to have a discussion about using query tomorrow, but I have reworked the parsing completely and it's now pretty robust as far as I can tell.

@dopplershift modified the milestone: 0.6 (Sep 6, 2017)
@CLAassistant

CLAassistant commented Sep 12, 2017

CLA assistant check
All committers have signed the CLA.

@jrleeman force-pushed the Upperair_Data branch 2 times, most recently from 81f0815 to 058b6ac (September 12, 2017 13:46)
@jrleeman
Contributor Author

Ready for re-review @dopplershift

@jrleeman
Contributor Author

Also notice that some of the indexes in the test changed from MetPy: we're returning the full data frame, even if the first rows (lowest altitudes) are all NaNs. I think that's fine, but am up for disagreement.

@dopplershift
Member

Is there an easy way with pandas to remove empty rows?

@jrleeman
Contributor Author

Technically not an empty row, as they still populate height... I'll see if there is an easy way to cut rows if all other fields are NaN.

@jrleeman
Contributor Author

See what you think about this. If we like it, I'll fixup. Not the nicest solution, but the end result is nice. You must drop the row only when all the columns specified are NaN, then reindex so the index column starts at zero and is continuous.
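For readers without the diff at hand, the approach described (drop a row only when every listed column is NaN, then renumber the index) can be sketched with `DataFrame.dropna` and `DataFrame.reset_index`. The data and the column subset below are made up for illustration and are not necessarily the ones used in this PR.

```python
import numpy as np
import pandas as pd

# Made-up sounding rows: the lowest level has height but no measurements.
df = pd.DataFrame({
    'pressure': [1000.0, 925.0, 850.0],
    'height': [112, 798, 1523],
    'temperature': [np.nan, 21.4, 17.0],
    'dewpoint': [np.nan, 16.2, np.nan],
})

# how='all' drops a row only if *every* column in subset is NaN, so the
# 850 hPa row (missing dewpoint only) is kept. drop=True stops the old
# index from being kept as a new column.
cleaned = df.dropna(subset=['temperature', 'dewpoint'],
                    how='all').reset_index(drop=True)
```

Here the first row is removed and the remaining two rows are re-indexed as 0 and 1.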

@dopplershift
Member

I'm curious why you don't think it's a nice solution? It looks like a perfectly straightforward one-liner to me.

@jrleeman
Contributor Author

Seems verbose for a simple task, i.e. manually specifying the index reset and needing to give a kwarg to prevent maintaining the old index as a column. It is a one-liner; it just seemed overly long for a simple task.

@dopplershift
Member

Methinks someone's standards are a tad high. 😈 Lest I remind you, this is how we did this before:

            if any(np.invert(np.isnan(values[1:]))):
                arr_data.append((level,) + values)

Not exactly the bastion of simple code either.

@jrleeman
Contributor Author

Don't burst my idealistic bubble 😄 - just fixed up, pending CI, we're good to go.

@dopplershift
Member

I would have figured that bubble had been burst many times over the last 6 months. 😁

@dopplershift merged commit 259ee92 into Unidata:master (Sep 12, 2017)
4 participants