Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Seismic Data Ingestion Without Dimension-Specific Indexing #248

Merged
merged 47 commits into from
Aug 25, 2023

Conversation

markspec
Copy link
Contributor

Resolves #241 also fixes typing in a number of places that might also resolve #242. A new "AutoIndex" grid override has been implemented that can be used in conjunction with the "trace" index name to automatically generate a trace index and cn be used to ingest non-regularized data.

@codecov-commenter
Copy link

codecov-commenter commented Jul 26, 2023

Codecov Report

Merging #248 (3e401e2) into main (cafef57) will increase coverage by 0.21%.
Report is 16 commits behind head on main.
The diff coverage is 85.77%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##             main     #248      +/-   ##
==========================================
+ Coverage   84.23%   84.44%   +0.21%     
==========================================
  Files          46       46              
  Lines        2017     2186     +169     
  Branches      325      348      +23     
==========================================
+ Hits         1699     1846     +147     
- Misses        282      297      +15     
- Partials       36       43       +7     
Files Changed Coverage Δ
src/mdio/commands/segy.py 0.00% <ø> (ø)
src/mdio/converters/segy.py 73.78% <50.00%> (-2.47%) ⬇️
src/mdio/segy/utilities.py 83.72% <66.66%> (-3.78%) ⬇️
src/mdio/segy/geometry.py 91.30% <84.81%> (-4.09%) ⬇️
src/mdio/api/accessor.py 90.42% <100.00%> (ø)
src/mdio/segy/parsers.py 98.03% <100.00%> (+0.03%) ⬆️
tests/integration/conftest.py 100.00% <100.00%> (ø)
tests/integration/test_segy_import_export.py 100.00% <100.00%> (ø)
tests/unit/test_segy_grid_overrides.py 100.00% <100.00%> (ø)

@tasansal tasansal changed the title 241 nonreg Support for Seismic Data Ingestion Without Dimension-Specific Indexing Aug 15, 2023
@tasansal
Copy link
Collaborator

@markspec we need to change the logic here. When we ingest this way, we are losing self-describing information from the MDIO file. i.e. it is unclear what indices the trace key flattened. This information gets lost.

We need to add dimensions without coordinates; i.e. just dimensions. This way they won't mean anything but we should preserve index keys names. Some examples:

("inline", "crossline", "offset", "azimuth")

becomes

("inline", "crossline", "trace")

and there is no record of what went into trace. Instead we should still ingest it as:

("inline", "crossline", "offset", "azimuth")

However, it should know that offset and azimuth as not indexed. We should implement a simpler version of non-index and index dimensions concept like Xarray has. See here.

@tasansal tasansal added the enhancement New feature or request label Aug 15, 2023
@markspec
Copy link
Contributor Author

@tasansal, your proposal is highly confusing to the user. I feel your proposal would confuse the user and only obfuscates what the dataset is. If the user wants or needs a ("inline", "crossline", "offset", "azimuth") from a ("inline", "crossline", "trace") dataset they need to explicitly handle this in a downstream application e.g. through 5D reg. "trace" is in no way related to offset or azimuth and should not be thought of in that way.

@tasansal
Copy link
Collaborator

Ok, I see what you're saying. However, still, I wouldn't say I like the API. Will come back to it later :)

@tasansal
Copy link
Collaborator

tasansal commented Aug 17, 2023

@markspec after speaking w/ Santh, Cable, and @srib we decided on the following. The NonBinned is explicit; and we preserve the names and bytes the user intended to use if it was gridded. This way during data exploration; users know the lineage of the data. We can save "names" and "bytes" into the coordinate's attributes with another key/value pair as "non_binned" = True.

Non-Binned 3D Seismic Offset Example

segy_to_mdio(
    ...,
    index_bytes=(189, 193),
    index_names=("inline", "crossline"),
    chunksize=(4, 4, 1024),
    grid_overrides={"NonBinned": True, "chunksize": 16}
)

Non-Binned 3D Seismic Offset / Azimuth Example

segy_to_mdio(
    ...,
    index_bytes=(189, 193),
    index_names=("inline", "crossline"),
    chunksize=(4, 4, 1024),
    grid_overrides={"NonBinned": True, "chunksize": 16}
)

Non-Binned 2D Seismic Offset Example

segy_to_mdio(
    ...,
    index_bytes=(21,),
    index_names=("cdp"),
    chunksize=(4, 1024),
    grid_overrides={"NonBinned": True, "chunksize": 16}
)

Binned 2D Seismic Offset with Duplicate Traces Example

segy_to_mdio(
    ...,
    index_bytes=(21, 37),
    index_names=("cdp", "offset"),
    chunksize=(4, 16, 1024),
    grid_overrides={"HasDuplicates": True}  # chunksize would be set to 1
)

There may be some things we are all missing; but I'm sure they'll come out while being implemented.

@markspec
Copy link
Contributor Author

@tasansal just a note on implications. Currently GridOverrides is setup to modify and return index_headers. To implement your suggested API change would require a refactor of GridOveride to return and modify index_names and chunksize.

@markspec
Copy link
Contributor Author

@tasansal , get_grid_plan() would need to modify the chunk_size which is very ugly!

@tasansal
Copy link
Collaborator

made some changes to the examples per our discussion

@tasansal
Copy link
Collaborator

NonBinned and HasDuplicates can use the same AutoIndex grid override and be specified in OverrideHandler; and chunksize defaults to 1.

@tasansal
Copy link
Collaborator

@tasansal , please can you take a look at the safety test.

looks like the poetry lock fixed the issue. I am working on some minor fixes, will merge the PR today.

@tasansal
Copy link
Collaborator

tasansal commented Aug 24, 2023

@markspec

Please check this diff:
https://github.com/TGSAI/mdio-python/compare/f6b64f2fa08435ba50f01ae84f07bcda7d10a895..markspec:mdio-python:3e401e2713871e363948fe67cec1e8f3b582c089?diff=split

It is from where you left off f6b64f2 to my final version 3e401e2).

PR is ready to merge unless you see something I messed up. All tests pass.

@markspec
Copy link
Contributor Author

@tasansal , looks good to me.

@tasansal tasansal self-requested a review August 25, 2023 14:23
@tasansal tasansal merged commit 5694c18 into TGSAI:main Aug 25, 2023
@markspec markspec deleted the 241-nonreg branch August 30, 2023 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

parse_trace_headers bug Support for segy ingestion of non-regularized data
3 participants