Enhance MPR Python embedding to support all MET STAT line types #2539

Open
6 of 21 tasks
DanielAdriaansen opened this issue May 11, 2023 · 1 comment
Assignees
Labels
alert: NEED ACCOUNT KEY Need to assign an account key to this issue alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle MET: Python Embedding priority: medium Medium Priority requestor: METplus Team METplus Development Team type: enhancement Improve something that it is currently doing

Comments

@DanielAdriaansen
Contributor

DanielAdriaansen commented May 11, 2023

Describe the Enhancement

The current Python embedding support for MPR line type data is specific to the MPR line type, and it deviates from the standard MPR line type format because it assumes the VERSION column is excluded. For users who generate their own MPR line type (or other line type) data without a MET tool, this is confusing and conflicts with the format of the MPR line type data that MET tools write. Users may also wish to:

  1. Use other line types with MET using Python embedding
  2. Include additional data as "extra columns" that they may wish to use in StatAnalysis, for example

The enhancement request is to modify the current implementation of the MPR Python embedding to support all other line types, as well as additional columns of data not typically found in those line types, if the user wishes. One suggested moniker for the generalized capability is Python embedding for STAT line types (e.g. read_ascii_stat_line.py, etc.).

The work can be broken down into the following generalized pieces:

  1. We will assume that the user has a single line type of data, specified in the LINE_TYPE column of the "common STAT output" columns.
  2. We will use this LINE_TYPE, and also the MET version number (as determined by MET, not by the user's VERSION column in their "common STAT output") to determine what specific columns to look for in the user's data.
  3. We will have a pre-processing layer to reorganize the columns of data from the user's order to the order MET expects for the LINE_TYPE that was identified. For example, a user may have a DataFrame with 100 columns, and one of those columns is FCST but it is not in the correct column position for the MPR line type. The pre-processing layer will extract the FCST column from the user's DataFrame and insert it at the expected column position for the version of MET being used.
  4. Additional columns will be appended after the correctly ordered columns for the LINE_TYPE being used by the user. These columns will retain their names inside of MET, so that they can be referenced just like other STAT columns in tools like StatAnalysis for filtering.
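
The four pieces above could be sketched roughly as follows. This is only an illustration under stated assumptions, not MET's implementation: `EXPECTED_COLUMNS` and `reorder_for_met` are hypothetical names, the column layout is a deliberately abbreviated placeholder rather than the real MPR layout, and rows are plain dicts (all sharing the same keys) instead of a DataFrame so that the sketch has no third-party dependencies. It also assumes VERSION is filled in from MET only when the user did not supply it.

```python
# Hypothetical, abbreviated layouts keyed by (MET version, line type).
# Real MET layouts contain many more columns; these are placeholders.
EXPECTED_COLUMNS = {
    ("V12.0", "MPR"): ["VERSION", "MODEL", "LINE_TYPE", "TOTAL", "INDEX", "FCST", "OBS"],
}


def reorder_for_met(rows, met_version):
    """Reorder user rows (a list of dicts) into the order expected for their line type.

    Returns (column_names, reordered_rows). User-supplied extra columns are
    appended after the expected ones and keep their original names.
    """
    if not rows:
        raise ValueError("no rows supplied")

    # Piece 1: the data must contain exactly one line type.
    line_types = {row["LINE_TYPE"] for row in rows}
    if len(line_types) != 1:
        raise ValueError(f"expected a single LINE_TYPE, found {sorted(line_types)}")
    line_type = line_types.pop()

    # Piece 2: choose the layout from MET's version, not the user's VERSION column.
    try:
        expected = EXPECTED_COLUMNS[(met_version, line_type)]
    except KeyError:
        raise ValueError(f"no layout known for {line_type} in {met_version}") from None

    # Piece 3: every expected column except VERSION must be present in the user data.
    user_columns = list(rows[0])
    missing = [c for c in expected if c != "VERSION" and c not in user_columns]
    if missing:
        raise ValueError(f"missing columns for {line_type}: {missing}")

    # Piece 4: extra columns follow the expected ones, in the user's order.
    extras = [c for c in user_columns if c not in expected]
    ordered = expected + extras

    reordered = []
    for row in rows:
        # Assumption: supply MET's version only when the user gave no VERSION.
        filled = {"VERSION": met_version, **row}
        reordered.append([filled[c] for c in ordered])
    return ordered, reordered


# Example: FCST and OBS arrive out of order, and MY_FLAG is an extra column.
user_rows = [
    {"OBS": 1.5, "MY_FLAG": "a", "FCST": 2.0, "LINE_TYPE": "MPR",
     "MODEL": "WRF", "TOTAL": 1, "INDEX": 1},
]
columns, data = reorder_for_met(user_rows, "V12.0")
```

Keying the layout table on both version and line type mirrors piece 2: when a line type gains columns in a new release, only the table would need a new entry, not the user's script.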

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the enhancement down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED CYCLE ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

We should consider the impact of any use cases in the METplus wrappers repository.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Development issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@DanielAdriaansen DanielAdriaansen added type: enhancement Improve something that it is currently doing alert: NEED MORE DEFINITION Not yet actionable, additional definition required alert: NEED ACCOUNT KEY Need to assign an account key to this issue alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle labels May 11, 2023
@DanielAdriaansen DanielAdriaansen self-assigned this May 11, 2023
@DanielAdriaansen DanielAdriaansen removed the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label May 11, 2023
@DanielAdriaansen DanielAdriaansen added priority: medium Medium Priority requestor: NCAR National Center for Atmospheric Research MET: Python Embedding and removed alert: NEED MORE DEFINITION Not yet actionable, additional definition required labels May 11, 2023
@JohnHalleyGotway
Collaborator

@JohnHalleyGotway and @DanielAdriaansen discussed additional Python embedding details based on #2924, which adds new MPR and ORANK columns.

Consider including the following changes:

  1. Right now, Python embedding of MPR data for Stat-Analysis DOES NOT include the MET version number in the first column. We assume that the format of the input .stat data from Python MATCHES the format of the version of Stat-Analysis being run.
  • Recommend allowing the version number to be provided via Python embedding so that the Python embedding scripts DO NOT need to be updated every time a MET .stat line type format changes.
  • MET should inspect the input data from Python and use a regular expression to check the contents of the first column. If it matches "V*..", assume it is a version number and use it. If not, substitute the version number of the Stat-Analysis tool being run. It is unclear whether this logic should live in MET's Python embedding code or the C++ code.
  2. Recommend enhancing Stat-Analysis to check for a MINIMUM number of expected header and data columns when parsing its input, and to print a warning or error message if not enough columns are provided. Note that we need to keep supporting the "extra" columns of data supplied by the GSI tools.
  3. Make sure the Python embedding warning and debug messages are as informative as possible to help users solve their own problems.
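
A minimal Python sketch of the version check and the minimum-column check suggested above follows. Everything in it is an assumption for illustration: `resolve_version` and `check_min_columns` are hypothetical helpers, and the regular expression assumes version strings shaped like "V12.0" or "V11.1.0". Whether this logic belongs in the Python embedding code or in C++ remains open, as noted above.

```python
import re

# Assumed shape of a MET version string, e.g. "V12.0" or "V11.1.0".
VERSION_PATTERN = re.compile(r"^V\d+(\.\d+)*$")


def resolve_version(fields, tool_version):
    """Return a copy of fields with a version string in the first column.

    If the first field already looks like a version, it is kept; otherwise
    the version of the running tool is inserted in front.
    """
    if fields and VERSION_PATTERN.match(str(fields[0])):
        return list(fields)
    return [tool_version] + list(fields)


def check_min_columns(fields, n_header, n_data):
    """Require at least n_header + n_data columns; return the count of extras.

    Trailing extra columns are allowed, so only a shortfall is an error.
    """
    minimum = n_header + n_data
    if len(fields) < minimum:
        raise ValueError(
            f"expected at least {minimum} columns "
            f"({n_header} header + {n_data} data) but found {len(fields)}"
        )
    return len(fields) - minimum


# Example: the first line already has a version, the second does not.
with_version = resolve_version(["V11.1.0", "WRF", "MPR"], "V12.0")
without_version = resolve_version(["WRF", "MPR"], "V12.0")
```

One limitation of pattern matching on the first column: a first field such as a MODEL name that happens to look like a version (for example "V2") would be read as one, so the pattern may need to be stricter in practice.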

@hsoh-u hsoh-u moved this from 🟢 Ready to 🎯 Up Next in MET-12.0.0 Development Aug 22, 2024
@hsoh-u hsoh-u removed their assignment Sep 9, 2024
@hsoh-u hsoh-u moved this from 🎯 Up Next to 🟢 Ready in MET-12.0.0 Development Sep 9, 2024
@JohnHalleyGotway JohnHalleyGotway added the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Oct 3, 2024
@JohnHalleyGotway JohnHalleyGotway moved this from 📖 Backlog to 🟢 Ready in MET-12.1.0 Development Oct 3, 2024
@JohnHalleyGotway JohnHalleyGotway added the requestor: METplus Team METplus Development Team label Nov 5, 2024
@JohnHalleyGotway JohnHalleyGotway removed the requestor: NCAR National Center for Atmospheric Research label Nov 5, 2024
Projects
None yet
Development

No branches or pull requests

3 participants