Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Feature: SNOMED::ICD10CM Mapping Support #207

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

joeflack4
Copy link
Collaborator

@joeflack4 joeflack4 commented Feb 24, 2022

Fixes #208

New Feature: SNOMED::ICD10CM Mapping Support

  • Added feature to allow for conversion of these premade mappings provided by SNOMED into SSSOM format.

General updates

  • utils.py: Reorganized SSSOM_READ_FORMATS: Top half are plain data formats, and bottom half are special-case formats. Both halves of the list are alphabetically sorted.

Docs

  • Added a page parsers.rst, and populated with 'SNOMED complex map' info.

@joeflack4 joeflack4 marked this pull request as draft February 24, 2022 23:57
@joeflack4 joeflack4 requested a review from matentzn February 24, 2022 23:58
@joeflack4 joeflack4 self-assigned this Feb 24, 2022
@joeflack4 joeflack4 added enhancement New feature or request new-command labels Feb 24, 2022
@joeflack4 joeflack4 linked an issue Feb 25, 2022 that may be closed by this pull request
2 tasks
@joeflack4 joeflack4 force-pushed the snomed-icd-mappings-work branch 3 times, most recently from 95c96e2 to 7d90e91 Compare February 25, 2022 00:36
@matentzn
Copy link
Collaborator

The actual extension is looking great! @hrshdhgd can take it for a ride when the time comes.

Sorry we have not introduced you to our build system: we are using tox - please do not commit any requirements.txts here. To build sssom-py, just use tox. If you have questions on that @hrshdhgd will be happy to help!

@joeflack4 joeflack4 force-pushed the snomed-icd-mappings-work branch 4 times, most recently from 7398ac4 to bae870a Compare March 2, 2022 22:24
@joeflack4
Copy link
Collaborator Author

joeflack4 commented Mar 2, 2022

Current output (updated 2022/03/03 4:25pm EST):
sssom_map.tsv.zip

Screen Shot 2022-03-03 at 4 26 55 PM

@joeflack4 joeflack4 marked this pull request as ready for review March 3, 2022 21:21
@joeflack4 joeflack4 force-pushed the snomed-icd-mappings-work branch 3 times, most recently from 9e1fd42 to 2810cba Compare March 3, 2022 21:24
sssom/parsers.py Outdated Show resolved Hide resolved
sssom/parsers.py Show resolved Hide resolved
sssom/parsers.py Outdated Show resolved Hide resolved
sssom/parsers.py Show resolved Hide resolved
sssom/parsers.py Show resolved Hide resolved
sssom/parsers.py Outdated Show resolved Hide resolved
sssom/parsers.py Show resolved Hide resolved
sssom/parsers.py Show resolved Hide resolved
# or is there a SKOS predicate I can map to in case where predicate is unknown? I think most of these
# mappings are attempts at exact matches, but I can't be sure (at least not without using these fields
# to determine: mapGroup, mapPriority, mapRule, mapAdvice).
# mapCategoryId,mapCategoryName: Only these in set: 447637006 "MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapCategoryId and mapCategoryName

I'm not sure if there's any useful information here that could be put anywhere than the other field. There are only 3 values that show up in the entire mapping dataset for both mapCategoryId and mapCategoryName, and I listed them out int he comment here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sssom/parsers.py Outdated
# "IF LISSENCEPHALY TYPE 3 FAMILIAL FETAL AKINESIA SEQUENCE SYNDROME CHOOSE Q04.3 | MAP OF SOURCE CONCEPT
# IS CONTEXT DEPENDENT"
# "MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA"
'predicate_id': f'SNOMED:{row["mapCategoryId"]}',
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned elsewhere, I'm not 100% sure if predicate_* is best taken from mapCategory*. It's possible that mapRule and mapAdvice could be better. Would like to defer to others for judgement on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a much deeper understanding of these is needed.. Ask in the call later.

Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am super excited about this, and I am very happy about how much care you are taking sorting out the mapping here. Most of the things in here we need to discuss in person, too much to unpack to type up now, but two things would be probably good:

  1. Start a new "parser" documentation as described in comments above
  2. Rebrand the method to "snomed_map" rather than "snomed_icd10_map". In fact, there are two very different ones: complex and simple map. So lets figure out which of the two you are dealing with at the moment, see also: Review and potentially align SSSOM & SNOMED complex map sssom#42 and rebrand your methods accordingly. So read_snomed_simple_map() or read_snomed_complex_map() depending on what we are dealing with here.

sssom/parsers.py Outdated Show resolved Hide resolved
sssom/parsers.py Outdated Show resolved Hide resolved
sssom/parsers.py Show resolved Hide resolved
sssom/parsers.py Show resolved Hide resolved
sssom/parsers.py Show resolved Hide resolved
sssom/parsers.py Outdated Show resolved Hide resolved
sssom/parsers.py Outdated Show resolved Hide resolved
sssom/parsers.py Show resolved Hide resolved
@@ -0,0 +1,126 @@
Parsers
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty terrible at .rst, but here you go. Based on what you said, this should be suitable for right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sssom/parsers.py Outdated
file_path: str,
prefix_map: Dict[str, str] = None,
meta: Dict[str, str] = None,
filter_by_confident_mappings=True
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your request to filter only mapping rows that contain 'ALWAYS <code>', and possible future flexibility to fine tune the mappings, I added a param here and just set it to True.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sssom/parsers.py Outdated
# - Note: joeflack4: I used this info as a reference for this pattern.
# https://www.medicalbillingandcoding.org/icd-10-cm/#:~:text=ICD%2D10%2DCM%20is%20a,decimal%20point%20and%20the%20subcategory.
always_confidence_pattern = 'ALWAYS [A-Z]{1}[0-9]{1,2}\.[0-9A-Z]{1,4}'
always_confidence_antipattern = always_confidence_pattern + '\?'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found edge cases where 'ALWAYS <code>' appears, but has a ? mark after, indicating that it is not 100% confident. Example:

ALWAYS T50.905? | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE | EPISODE OF CARE INFORMATION NEEDED

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ms = _init_mapping_set(meta)

# Filtering
if filter_by_confident_mappings:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the filtering you requested.

It turns out that I was wrong earlier when I thought that if mapAdvice contained 'ALWAYS <code>', mapRule would always be TRUE. If this was the case, filtering by mapRule would've been much easier, but unfortunately it did not turn out to be the case.

I was thinking about a simple .str.contains('ALWAYS'), but it is possible that the word 'ALWAYS' could appear in cases other than 'ALWAYS <code>', so I decided to go with the stricter approach of a regex.

I created my own regex. But now I just realized that ICD10 is such a big thing that I should've just been googled a pre-baked regex. Here's one I found: https://www.johndcook.com/blog/2019/05/05/regex_icd_codes/. If you'd like me to use that or another regex instead, let me know.

I checked to see how many rows were removed by this filter, and it was about ~50%.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeflack4 joeflack4 force-pushed the snomed-icd-mappings-work branch 2 times, most recently from c6a7610 to dd52b99 Compare April 20, 2022 00:57
- Added feature to allow for conversion of these premade mappings provided by SNOMED into SSSOM format.

General updates
- cli.py: Reorganized SSSOM_READ_FORMATS: Top half are plain data formats, and bottom half are special-case formats. Both halves of the list are alphabetically sorted.
…o 'SNOMED complex map'.

- Docs: Added a page 'parsers.rst', and populated with 'SNOMED complex map' info.
@joeflack4 joeflack4 force-pushed the snomed-icd-mappings-work branch from dd52b99 to ea18e6b Compare July 13, 2022 19:47
sssom/parsers.py Outdated Show resolved Hide resolved
@joeflack4 joeflack4 force-pushed the snomed-icd-mappings-work branch from ea18e6b to 2a29806 Compare July 13, 2022 20:27
# SNOMED mappings, all of them had correlationId of 447561005, which also happens to be 'unspecified'.
# If correlationId is indeed more appropriate for predicate_id, then I don't think there is a representative
# field for 'mapping_justification'.
# TODO: How to properly get mapping_justification?
Copy link
Collaborator Author

@joeflack4 joeflack4 Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 3: solved

Not sure how to use sssom_schema.slots.mapping_justification here.
edit: Harshad recommended I use SEMAPV

Issue 4

It might take a significant amount of time (maybe) to map all the SNOMED mapping fields to SSSOM mapping fields. But I should probably look into that again.


@matentzn I recommend we merge this now while this is rebased and functional, and we can make improvements to this feature later.

@hrshdhgd
Copy link
Contributor

I did some formatting Joe, I hope that was fine. The pending errors from tox are as follows:

sssom/parsers.py:273:1: DAR003 Incorrect indentation: ~<
sssom/parsers.py:704:1: DAR003 Incorrect indentation: ~<
sssom/parsers.py:745:59: W605 invalid escape sequence '\.'
sssom/parsers.py:746:66: W605 invalid escape sequence '\?'

@hrshdhgd
Copy link
Contributor

hrshdhgd commented Jul 14, 2022

Fixed the errors:

sssom/parsers.py:745:59: W605 invalid escape sequence '\.'
sssom/parsers.py:746:66: W605 invalid escape sequence '\?'

by adding a r before the regex expression. Still the other 2 are pending. Source for solution.

I think the pending ones must have risen from using spaces instead of tabs in your IDE. Not sure, tried fixing it on my end, but it still persists.

Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot review this with these spurious diffs.. Can we somehow avoid these?

# IS CONTEXT DEPENDENT"
# "MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA"
'predicate_id': f'SNOMED:{row["mapCategoryId"]}',
'predicate_label': row['mapCategoryName'],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share an example of how the resulting sssom looks like, maybe, 10 rows in markdown here in this pull request?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sure thing.

Copy link
Collaborator Author

@joeflack4 joeflack4 Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table

I don't know if this table appears very strange / unreadable to others. It wasn't the case before, but as I am editing this today (2023/11/14) it is strange now.

subject_id predicate_id object_id match_type subject_label subject_category predicate_label predicate_modifier object_label object_category author_id author_label reviewer_id reviewer_label creator_id creator_label license subject_source subject_source_version object_source object_source_version mapping_provider mapping_cardinality mapping_tool mapping_tool_version mapping_date confidence subject_match_field object_match_field match_string subject_preprocessing object_preprocessing match_term_type semantic_similarity_score semantic_similarity_measure see_also other comment
SNOMED:10000006 SNOMED:447637006 ICD10CM:R07.9 Unspecified Radiating chest pain MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED Chest pain, unspecified 2021-09-01 id=4c4d2755-e65f-52dc-8353-ecf437495eb8|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS R07.9
SNOMED:10001005 SNOMED:447639009 ICD10CM:P36.9 Unspecified Bacterial sepsis MAP OF SOURCE CONCEPT IS CONTEXT DEPENDENT Bacterial sepsis of newborn, unspecified 2021-09-01 id=f7e1ed02-3107-5a6c-9439-21e9879fb746|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=IFA 445518008 | Age at onset of clinical finding (observable entity) | < 29.0 days|mapAdvice=IF AGE AT ONSET OF CLINICAL FINDING BEFORE 29.0 DAYS CHOOSE P36.9 | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE | DESCENDANTS NOT EXHAUSTIVELY MAPPED | MAP OF SOURCE CONCEPT IS CONTEXT DEPENDENT
SNOMED:10001005 SNOMED:447637006 ICD10CM:A41.9 Unspecified Bacterial sepsis MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED Sepsis, unspecified organism 2021-09-01 id=907aa2b0-a832-5254-acdf-6f366e406d2b|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=2|mapRule=OTHERWISE TRUE|mapAdvice=ALWAYS A41.9 | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE | DESCENDANTS NOT EXHAUSTIVELY MAPPED
SNOMED:10007009 SNOMED:447637006 ICD10CM:Q04.8 Unspecified Coffin-Siris syndrome MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED Other specified congenital malformations of brain 2021-09-01 id=0b918513-be22-5059-beb6-e8348dfc631f|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS Q04.8
SNOMED:1001000119102 SNOMED:447637006 ICD10CM:I26.99 Unspecified Pulmonary embolism with pulmonary infarction MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED Other pulmonary embolism without acute cor pulmonale 2021-09-01 id=7f3f0a95-0441-57ba-a58f-03d2f2736c60|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS I26.99
SNOMED:10017004 SNOMED:447637006 ICD10CM:K03.0 Unspecified Occlusal wear of teeth MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED Excessive attrition of teeth 2021-09-01 id=3fa74c85-56d0-5819-97e8-80c8d4c053c0|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS K03.0
SNOMED:100191000119105 SNOMED:447637006 ICD10CM:N42.89 Unspecified Asymmetry of prostate MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED Other specified disorders of prostate 2021-09-01 id=51537076-a3bf-59f8-aa5f-f571dfcf32cb|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS N42.89
SNOMED:100211000119106 SNOMED:447637006 ICD10CM:M62.830 Unspecified Muscle spasm of thoracic back MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED Muscle spasm of back 2021-09-01 id=a0263a65-4056-5cf9-8caa-669afce20189|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS M62.830
SNOMED:1002229008 SNOMED:447637006 ICD10CM:L75.2 Unspecified Apocrine miliaria of areola MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED Apocrine miliaria 2021-09-01 id=65fa44e5-272d-54e5-b787-cb977fc24720|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS L75.2
SNOMED:1002253002 SNOMED:447637006 ICD10CM:Z28.3 Unspecified Immunization series incomplete MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED Underimmunization status 2021-09-01 id=c2ab86f4-29c5-5ea3-8818-1e389d9c59a6|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS Z28.3 | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • match_type does not exist anymore
  • Why all these empty columns?
  • predicate_id is not legal - I looked this SNOMED rel up but could not understand what it means at all. Can you explain how "Map source concept is properly classified" is a mapping relation? how does it map to skos?
  • you should add a test that runs validate_file on the a small sample of the output I think,

Copy link
Collaborator Author

@joeflack4 joeflack4 Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • match_type: Oh this is an old copy. I haven't re-ran it in awhile. The code has since been updated to mapping_justification.
  • Empty columns: I haven't determined how to populate these SSSOM fields. For many of them, there may not be any information available. I have enumerated the SSSOM Mapping data model as of 2022/30 in my code comments (link). If you click that link and check it out, you'll see that I have some thoughts/suggestions/questions as to what/how to populate. Any feedback/suggestions to those questions/comments are appreciated.
  • predicate_id: I opened a review comment in the first commit about populating this field from SNOMED's mapCategoryId and mapCategoryName fields. They don't use too many mapping predicates, unfortunately. In that linked review comment, I stated that there were only 3 values in the field, but the ICD10CM mapping dataset only includes 1: MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED (447637006). You might be able to find more information in the "codebook" that SNOMED has published on these mappings. That codebook is named doc_Icd10cmMapReleaseNotes_Current-en-US_US1000124_20210901.pdf which I have zipped up for you (w/ some other related files) here:
    SNOMED ICD10CM map technical guide etc.zip
  • test: Sure thing, I'll add one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeflack4 TODO: Note to self: Add a test that does validate_file on the output.

@joeflack4
Copy link
Collaborator Author

joeflack4 commented Jul 14, 2022

@matentzn wrote:

I cannot review this with these spurious diffs.. Can we somehow avoid these?

Just to clarify, by spurious, do you mean the review comments that are linked to code snippets that are marked with the Outdated label?

I don't think that's really something that can easily be changed. The way to change that would be for me to write code once and for it to be perfect and then thus not have the need to change it. But I don't really think that's realistic. I think it's probably better to hack away at things fast, get something pushed, and then make future edits as needed; agile style.

@joeflack4
Copy link
Collaborator Author

@hrshdhgd Thanks so much for addressing the style / tox related changes. @matentzn @hrshdhgd honestly I totally forgot to run tox and make sure my code was compliant before re-pushing. Will eventually remember to do that every time, just need to make a habit.

@joeflack4
Copy link
Collaborator Author

joeflack4 commented Jul 14, 2022

@matentzn wrote:

I am super excited about this, and I am very happy about how much care you are taking sorting out the mapping here. Most of the things in here we need to discuss in person, too much to unpack to type up now, but two things would be probably good:

  1. Start a new "parser" documentation as described in comments above
  2. Rebrand the method to "snomed_map" rather than "snomed_icd10_map". In fact, there are two very different ones: complex and simple map. So lets figure out which of the two you are dealing with at the moment, see also: Review and potentially align SSSOM & SNOMED complex map sssom#42 and rebrand your methods accordingly. So read_snomed_simple_map() or read_snomed_complex_map() depending on what we are dealing with here.
  1. Yep, will do.

2.a. Simple vs Complex
You must be looking at some old code. I remember we discussed this back in March I think, and I investigated and determined that I am using the complex map. I updated my code a few months ago; all of the methods and CLI related to this feature are named "complex". I have not utilized the simple map yet (not sure where to download it either), and that is not on my radar at the moment. I think we should probably merge this feature with only the complex map implementation, and we can always add simple later.

2.b. SNOMED map vs SNOMED::ICD map
Ok, I can rename this. But just to clarify: Are you sure that SNOMED publishes other mappings that are in the same format as the SNOMED::ICD mappings? If not, then we should continue calling this "snomed_icd_map". But if they are in the same format, then yeah I agree renaming to just "snomed_map". I can look into this for you, but I don't want it to necessarily slow down this PR. The longer we wait to merge, the more time I'll have to spend rebasing my code.

@cthoyt
Copy link
Member

cthoyt commented Nov 14, 2023

Is there interest to finish this?

@joeflack4
Copy link
Collaborator Author

@cthoyt I hate to see unfinished work. I think I had basically finished "complex map" or "simple map". Other priorities took hold and this was forgotten. I leave it to @matentzn to decide, but perhaps the best thing to do is (i) close this PR, (ii) open an issue about finishing this, linking to this PR.

@matentzn
Copy link
Collaborator

@joeflack4 ok! Feel free to do
As you see fit :)

@joeflack4
Copy link
Collaborator Author

@nico OK then. The issue for this is still open:

I just updated it a little bit.

@cthoyt I could see this PR staying open for years / never getting done. So if you'd like to close it, feel free, and we can keep the above issue open.

@matentzn matentzn marked this pull request as draft November 16, 2023 12:33
@matentzn
Copy link
Collaborator

Let leave in draft stage, I can see this becoming important in mid 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request new-command
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ingest for SNOMED-ICD10CM mappings
4 participants