Additional entry of length or number of octets in GRIB2_Template_ csv-files #219

Closed
SibylleK opened this issue Sep 18, 2023 · 17 comments · Fixed by #290 or #307

@SibylleK
Contributor

Details

In GRIB2 Template definition files the octet number (from - to) for each entry in the GRIB section is given.
Most of the GRIB processing software packages need the length or number of octets for each entry, which has to be calculated from the specification of "OctetNo".

But with variables and repetition within some templates, an automated calculation is not always easy (e.g. 37 + (ND-1)*4 + (NF-1)*4 -40 +(ND-1)*4 + (NF-1)*4).

Therefore, this is a proposal to add a column with the length of each entry in the GRIB2_Template files.

Requestor

Sibylle Krebber, @SibylleK
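
To illustrate why the variable case is awkward, here is a minimal sketch that evaluates both ends of the example range above and derives its length. It assumes the OctetNo has already been split by hand into a start and an end expression, because the "-" separator cannot be told apart from a minus sign automatically. The function names `eval_octet_expr` and `octet_length` and the values chosen for ND and NF are hypothetical, for illustration only.

```python
import ast
import operator

# Only the arithmetic that appears in the OctetNo expressions is allowed.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def eval_octet_expr(expr, variables):
    """Evaluate an expression such as '37 + (ND-1)*4' without using eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            # Raises KeyError if the variable (e.g. ND, NF) was not supplied.
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    return _eval(ast.parse(expr.strip(), mode="eval"))


def octet_length(start_expr, end_expr, variables):
    """Length of an entry = last octet - first octet + 1."""
    start = eval_octet_expr(start_expr, variables)
    end = eval_octet_expr(end_expr, variables)
    return end - start + 1


# ND and NF are arbitrary illustrative values, not taken from a real message.
length = octet_length(
    "37 + (ND-1)*4 + (NF-1)*4",
    "40 + (ND-1)*4 + (NF-1)*4",
    {"ND": 2, "NF": 3},
)
print(length)
```

Because the variable terms are identical at both ends of this range, the length is 4 octets whatever ND and NF are, which is the kind of value the proposed column would state directly.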

@shahramn

A very good suggestion

@amilan17
Member

amilan17 commented Sep 22, 2023

https://github.com/wmo-im/CCT/wiki/20.to.22.September.2023 notes:

  • Title_en, OctetNo, octetCount, Contents_en, Note_en, noteIDs, codeTable, flagTable, Status
  • there could be an impact on the software that is ingesting the machine-readable codes; 
  • the script to generate the TXT,CSV will need to be updated
  • template with a sample and the software can be updated for testing

@antoinemerle
Contributor

For roadmap and planning purposes on the EUM side:

  • We need to check what the impact would be in the vocabulary manager: whether we take the data from the XML or not, and whether the way we parse the XML would break our interfaces.
  • Estimate the cost of the impact on the EUM side and of implementing this new column.

e.g :
image

@sebvi
Contributor

sebvi commented Nov 20, 2023

Are we implementing this? I don't see a branch created for it

@marijanacrepulja
Contributor

I believe we agreed to address this after finalising FT2024-1.

@amilan17 amilan17 moved this to In discussion in GRIB2 Amendments Aug 22, 2024
@amilan17 amilan17 added this to the noTargetMilestone milestone Aug 22, 2024
@amilan17
Member

https://github.com/wmo-im/tt-tdcf/wiki/2024.10.15.tt.tdcf notes:
@amilan17 will update master after merging in FT2024-2 branch

@amilan17
Member

Do the machine-readable files also need to include OctetCount?

@sebvi
Contributor

sebvi commented Nov 11, 2024

Octet counts are not needed on our side

EDIT: actually it is a useful validation if the count is present

@amilan17
Member

amilan17 commented Nov 12, 2024

https://github.com/wmo-im/tt-tdcf/wiki/2024.11.12.tt.tdcf notes:

  • empty columns are created for all templates, needs checking
  • @antoinemerle can help with scripts to ensure that the TXT and XML output files do NOT include the OctetCount column
  • Sibylle: it would be even better (eventually) if the octet numbers could be calculated from the OctetCount, but this is not necessary today
  • @antoinemerle new script to retroactively populate all octetCount columns
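
As a rough illustration of the idea of calculating the octet numbers from the OctetCount, the sketch below accumulates fixed lengths into OctetNo ranges. The helper name `octet_ranges`, the starting octet and the counts are made up for the example, and it only covers fixed-length entries, not the variable ones.

```python
def octet_ranges(first_octet, counts):
    """Turn a list of entry lengths into OctetNo strings ('24' or '24-25')."""
    ranges = []
    start = first_octet
    for count in counts:
        end = start + count - 1
        # A single octet is written without a range, as in the template files.
        ranges.append(str(start) if count == 1 else f"{start}-{end}")
        start = end + 1
    return ranges


# Hypothetical section starting at octet 15 with entries of 1, 1, 4 and 2 octets.
print(octet_ranges(15, [1, 1, 4, 2]))
```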

@github-project-automation github-project-automation bot moved this from In progress to Ready for FT approval procedure in GRIB2 Amendments Nov 28, 2024
@amilan17 amilan17 reopened this Nov 28, 2024
@github-project-automation github-project-automation bot moved this from Ready for FT approval procedure to In progress in GRIB2 Amendments Nov 28, 2024
antoinemerle added a commit that referenced this issue Jan 14, 2025
as requested in :

- #219

>In GRIB2 Template definition files the octet number (from - to) for each entry in the GRIB section is given.
Most of the GRIB processing software packages need the length or number of octets for each entry, which has to be calculated from the specification of "OctetNo".
But with variables and repetition within some templates an automated calculation is sometimes not easy. (e.g. 37 + (ND-1)*4 + (NF-1)*4 -40 +(ND-1)*4 + (NF-1)*4)
Therefore, this is a proposal to add a column with the length of each entry in the GRIB2_Template files.
@antoinemerle
Contributor

Hi @amilan17,

I've committed the changes (you can see them in 5646f0d).

I am now going to review the output and verify that it works as expected.

At the moment this is the correct behavior; I will use the same script to populate all existing columns.

@antoinemerle
Contributor

Hi @SibylleK ,

I made the changes.

The only thing I don't know how to handle is the following.

What should the length be when OctetNo is:

  • 73-nn? At the moment I put the same value for the length.

@amilan17
Member

https://github.com/wmo-im/et-data/wiki/2025.01.14.et.data notes:

  • Sibylle: entries like "73-nn" are probably in the wrong column

@amilan17 amilan17 modified the milestones: noTargetMilestone, FT2025-1 Jan 15, 2025
@amilan17
Member

@SibylleK @antoinemerle
This is an example of an entry from template 3.40. Does every octetCount cell need to be populated? Is it ok to leave it as is?
Image

antoinemerle added a commit that referenced this issue Jan 29, 2025
…or all templates. the script will be added in the issue #219
@antoinemerle
Contributor

Dear @amilan17 and @SibylleK

I applied the changes as discussed in b88e9a1.

I also ran a script to go through all the existing template files.

Here is the script, for tracing and consistency.

#!/usr/bin/env python3

import os
import csv
import re

def parse_octet_count(octet_str):
    """
    Return a string with the numeric length if OctetNo is simple, else "".
    """
    if any(sym in octet_str for sym in ['(', ')', '+', '*', 'ND', 'NF', 'n']):
        return ""

    total_length = 0
    parts = [p.strip() for p in octet_str.split(',')]
    for p in parts:
        range_match = re.match(r'^(\d+)-(\d+)$', p)
        single_match = re.match(r'^(\d+)$', p)
        if range_match:
            start = int(range_match.group(1))
            end = int(range_match.group(2))
            total_length += (end - start + 1)
        elif single_match:
            total_length += 1
        else:
            return ""
    return str(total_length)

def update_octet_count_inplace():
    """
    For each GRIB2_Template*.csv file in the current directory,
    ensure it has an 'OctetCount' column (inserted right after 'OctetNo'),
    and fill that column (if empty) based on 'OctetNo'.
    """
    all_files = [f for f in os.listdir('.')
                 if f.startswith("GRIB2_Template") and f.endswith(".csv")]

    for fname in all_files:
        print(f"Processing: {fname}")
        tmp_name = fname + ".tmp"

        with open(fname, mode="r", encoding="utf-8", newline='') as inf, \
             open(tmp_name, mode="w", encoding="utf-8", newline='') as outf:

            reader = csv.DictReader(inf)
            original_fieldnames = reader.fieldnames[:]  # make a copy

            # If "OctetCount" not in columns, insert it right after "OctetNo"
            if "OctetCount" not in original_fieldnames:
                if "OctetNo" in original_fieldnames:
                    idx = original_fieldnames.index("OctetNo")
                    original_fieldnames.insert(idx+1, "OctetCount")
                else:
                    # fallback: if no "OctetNo" either, just add it at the end
                    original_fieldnames.append("OctetCount")

            writer = csv.DictWriter(
                outf,
                fieldnames=original_fieldnames,
                delimiter=',',
                quotechar='"'
            )
            writer.writeheader()

            # For each row, fill 'OctetCount' if missing or empty
            for row in reader:
                for fn in original_fieldnames:
                    if fn not in row:
                        row[fn] = ""

                if not row["OctetCount"]:  # empty or missing
                    octet_no = row.get("OctetNo", "")
                    row["OctetCount"] = parse_octet_count(octet_no)

                writer.writerow(row)

        # Replace the old file with the updated one
        os.replace(tmp_name, fname)

if __name__ == "__main__":
    update_octet_count_inplace()
    print("Done updating OctetCount in GRIB2_Template*.csv files.")

@antoinemerle
Contributor

antoinemerle commented Jan 30, 2025

Dear @SibylleK and @marijanacrepulja,

Could you tell me whether this is what we wanted to achieve?

Current implementation in the current branch

  • all GRIB2 CSV templates now have the OctetCount filled in (where possible)
  • after each commit made by anyone, this column will also be filled in / added in ./txt/template.txt (not the XML)

Examples of the current files:

Example of ./txt/template.txt

Title_en,OctetNo,Contents_en,Note_en,noteIDs,codeTable,flagTable,OctetCount,Status
Identification template 1.0 - calendar definition,24,Type of calendar,(see Code table 1.6),,1.6,,1,Operational
Identification template 1.1 - paleontological offset,24-25,Number of tens of thousands of years of offset,,,,,2,Operational
Identification template 1.2 - calendar definition and paleontological offset,24,Type of calendar,(see Code table 1.6),,1.6,,1,Operational

Example of the csv

Title_en,OctetNo,Contents_en,Note_en,noteIDs,codeTable,flagTable,OctetCount,Status
Identification template 1.0 - calendar definition,24,Type of calendar,(see Code table 1.6),,1.6,,1,Operational
Identification template 1.1 - paleontological offset,24-25,Number of tens of thousands of years of offset,,,,,2,Operational
Identification template 1.2 - calendar definition and paleontological offset,24,Type of calendar,(see Code table 1.6),,1.6,,1,Operational

Question :

The pending question on my side is:

@SibylleK and @marijanacrepulja: do we need or want to populate this OctetCount in the txt file that is generated after each commit, or not?

PS: I want to be sure I am not impacting any other operational software by editing the txt file.

Thanks a lot

@amilan17
Member

@antoinemerle I think we should be consistent across the .txt and the .xml files -- and neither should include the octetCount at this time.
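
One way to keep the column out of the generated files could look like the sketch below. It is only an illustration, not the actual `create_master_lists.py` logic: the function name `drop_column` and the sample row are made up, and it relies on `csv.DictWriter` with `extrasaction="ignore"` to silently skip keys that are not in the output field names.

```python
import csv
import io


def drop_column(csv_text, column="OctetCount"):
    """Return the CSV text with one column removed (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = [name for name in (reader.fieldnames or []) if name != column]

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        extrasaction="ignore",  # skip keys that are not in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Illustrative input with a reduced set of columns.
sample = (
    "Title_en,OctetNo,OctetCount,Status\n"
    "Identification template 1.1 - paleontological offset,24-25,2,Operational\n"
)
print(drop_column(sample))
```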

@antoinemerle
Contributor

Hi @amilan17, I have finally updated the branch and scripts accordingly.

Here is a summary of the changes:

  • updated the CI/CD script so that it does not fail when generating the XML and TXT (given that we are not populating the OctetCount there)
  • all the GRIB2_Template* files have been updated: the OctetCount column has been filled in where needed

The new behavior to be adopted by the team should be:

  • when committing a new template, the right value should be entered manually in the template
  • the OctetCount will not be populated in the ./txt/ and ./xml/ templates

In the future, we may want to run a batch script that verifies the OctetCount values pushed by any team member.
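
Such a check might look roughly like the sketch below. It is not an existing script in the repository: the names `expected_count` and `find_mismatches` are hypothetical, it only checks rows whose OctetNo is a plain number or a numeric range, and it assumes one CSV record per line when reporting line numbers.

```python
import csv
import io
import re

# Matches "24" or "24-25"; anything else (e.g. "73-nn") is skipped.
_SIMPLE_OCTET = re.compile(r"^(\d+)(?:-(\d+))?$")


def expected_count(octet_no):
    """Return the length implied by a simple OctetNo, or None if not simple."""
    match = _SIMPLE_OCTET.match(octet_no.strip())
    if not match:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return end - start + 1


def find_mismatches(csv_text):
    """List (line, OctetNo, OctetCount found, count expected) for bad rows."""
    mismatches = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        expected = expected_count(row.get("OctetNo") or "")
        actual = (row.get("OctetCount") or "").strip()
        if expected is not None and actual != str(expected):
            mismatches.append((line_no, row["OctetNo"], actual, expected))
    return mismatches


# Made-up rows: the second data row has a wrong count on purpose.
sample_rows = (
    "Title_en,OctetNo,Contents_en,OctetCount,Status\n"
    "T,24,Type of calendar,1,Operational\n"
    "T,24-25,Offset,3,Operational\n"
    "T,73-nn,List,,Operational\n"
)
print(find_mismatches(sample_rows))
```

Rows with variable octet numbers are deliberately left unchecked here, since their counts have to be entered and reviewed manually.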

Thanks again @amilan17 for your quick answers to all my questions.

@amilan17 amilan17 mentioned this issue Jan 31, 2025
@amilan17 amilan17 linked a pull request Jan 31, 2025 that will close this issue
amilan17 added a commit that referenced this issue Jan 31, 2025
* Update create_master_lists.py

as requested in :

- #219

>In GRIB2 Template definition files the octet number (from - to) for each entry in the GRIB section is given.
Most of the GRIB processing software packages need the length or number of octets for each entry, which has to be calculated from the specification of "OctetNo".
But with variables and repetition within some templates an automated calculation is sometimes not easy. (e.g. 37 + (ND-1)*4 + (NF-1)*4 -40 +(ND-1)*4 + (NF-1)*4)
Therefore, this is a proposal to add a column with the length of each entry in the GRIB2_Template files.

* Update create_master_lists.py

remove any keys not in 'fieldnames' to avoid ValueError

* xml,txt files

* Update create_master_lists.py

make the count to be computed when the limit is not fixed but variable

* xml,txt files

* run the script update_octetcount.py to update the OctetCount number for all templates. the script will be added in the issue #219

* Replace Length per octetCount and remove it from the xml

* xml,txt files

* remove the octetCount in the txt and from the CI/CD

* update the create_master to avoid any issue while populating fields in the xml and txt

* update the create_master to avoid any issue while populating fields in the xml and txt

* Apply suggestions from code review

* xml,txt files

---------

Co-authored-by: antoineMerleEUM <[email protected]>
Co-authored-by: Enrico Fucile <[email protected]>
Co-authored-by: antoinemerle <[email protected]>
@github-project-automation github-project-automation bot moved this from In progress to Ready for FT approval procedure in GRIB2 Amendments Jan 31, 2025
Projects
Status: Ready for FT approval procedure
6 participants