-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Features/get bufr restructure #223
Conversation
* Updated get_buffer to become more testable by changing global variables to parameters * Added csv file with l3 data for testing * Implemented unit tests covering the current get_buffer imlemented * Fixed issue where find_positions were incorrectly expected to return a single value instead of a tuple. It could have been solved by adding parenthesis but is better to use a consistent signature of the return values.
c65fe66
to
7a80dcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Mads for this rework!
I've always struggled to understand this part of the code and I have made comments about structure and readability that will hopefully make the code clearer.
I did not see anything wrong and I expect the code to work for 80% of the stations. I have listed some weird station geometries and weird cases where SR50s are misbehaving. It is fine for now to use the standard geometries everywhere and to add more flexibility to the representation of the station design in the code at a later point.
I have a problem with eccodes on my machine and therefore cannot test the code myself.
* Made variables more descriptive * Applied auto clean (black) * Updated logging
* Added earliest date as parameter * Extracted functionalities for determine bufr variables * Restructured get current data * Extracted last_data_row functionality * Updated selecting current_timestamp
* Added csv-table to support station specific dimensions * Created unit tests that explicitly checks bufr variables
* Reorder processing steps * Fix spelling * Spell correction anometer -> anemometer * Removed dev argument
* renamed csv2bufr -> bufr_utilities.py * Various cleanup and renaming * Moved realtime helper functions to real_time_utilities.py * Renamed and updated getBUFR -> write_bufr_message * Changed signature to take only variable Series and a writable fileobject as input * Add __all__ list to bufr_utilities.py * Implemented bufr_reader Configuration handling * Eliminate pypromice.postprocess.wmo_config * Add wmo id to BUFRVariables to remove station specific configurations * Moved BUFR template configurations to bufr_utilities.py * Added BUFRVariables attrs class to manage configurations instead of pd.Series Unittests * Added BUFR export tests * Test exception handling * test_land_station_export, WEB_B: e87f17d029d76cda45ee23172933a4dcca554fb3d7235116eb8ecc68105902ec * Renaemd test_bufr_export.py -> test_get_bufr
* Add process_station(..) to isolate functionalities for single stations for readability and testability * use Path as inputs * Use position seeds as input StationConfiguration * Added attrs class for handling station configurations * Implemented functions for reading and writing toml files * Renamed station_dimensions.csv -> station_configurations.toml * Move configuration from wmo_config to station_configuration * Added skip stid to station_configuration * Added skipped_variables to station_configuration Compile data checks and position export from real_time_utilities.py * Removed time filtering from get_latest * Removed position storage from get_latest. Other * Moved tx_l3_test1.csv
The previous version incorrectly used older variables for fill nan.
526fac2
to
6b3391f
Compare
* mouting type is not relevant for BUFR export
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished refactoring and addressed change requests
FutureWarning: last is deprecated and will be removed in a future version. Please create a mask and filter using `.loc` instead df_limited = df.last(time_limit).copy()
Handle case where station dimension attributes were None: station_configuration.barometer_from_gps Removed default path in ArgumentParser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work.
It is indeed clearer and will definitely be easier to test, adapt and debug.
Unfortunately, I do not have the time to go through the main scripts (bufr_utilities.py
and get_bufr.py
) until mid-next week. If you are ready to make it operational, and if we are still waiting for DMI to fix technical issues, then please go ahead. I'd personally switch to this new version and debug it now rather than after the initiation of data flow to GTS.
The only things I'd be careful with are potential wrong settings for export_bufr
and skipped_variables
attributes in the configuration file. By "wrong" I mean different from the previous version but also potential new station/sensors failures. But I guess you went through that file and through the data a couple of times already!
So I approve the PR for now and will have a second look whenever I have the time.
I let you decide how you want to proceed.
I have made a slight update to aws-operational-processing to provide explicit cli parameters since I removed the default paths. https://github.com/GEUS-Glaciology-and-Climate/aws-operational-processing/pull/23 |
I have updated thet BUFR export functionalities to
In order to achieve that I have
bufr_utilities
contains functionalities related to BUFR serialisation including both a read and write functions.real_time_utilities
contains functionality for filtering and selecting instantaneous data from time series.get_bufr
contains main scripts for generating bufr files AWS L3 csv data - aligning with out current operational pipeline.get_bufr
and restructured the use of global variables (WMO configs)datetime.utcnow()