Create a mapping table to be used for handling ambiguous ICDO site/histology combinations present in multiple schemas. #113

Closed
mgurley opened this issue Oct 6, 2019 · 9 comments

Comments

@mgurley
Collaborator

mgurley commented Oct 6, 2019

No description provided.

@sratwani
Collaborator

Based on the 10/17 vocabulary call, Christian, Rimma and Dima need to scope the requirements out for this task. Awaiting their discussion before this task can be assigned.

@cgreich
Collaborator

cgreich commented Oct 20, 2019

@mgurley or @dimshitc : Can you provide an example? I can't remember what this was.

@dimshitc
Collaborator

It was decided that we need this kind of table:

icdo_site_histology | naaccr_item | naaccr_item_code | sex | csfactor25 | naaccr_item_omop_concept_id | naaccr_item_code_omop_concept_id
see #59

So, I should make this for all schemas to be consistent, right?

I see 'csfactor25' here. What about NAACCR items 3926 (Schema Discriminator 1) and 3927 (Schema Discriminator 2)? They can be used as well; see:
http://datadictionary.naaccr.org/default.aspx?c=10#3927
http://datadictionary.naaccr.org/default.aspx?c=10#3926
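
For illustration, here is a minimal sketch of what rows of such a table and a lookup against them might look like. The column names follow the list above, plus the two schema discriminators. The `MappingRow` type, the `lookup` helper, and every row value (site/histology key, item, code, and concept IDs) are made-up placeholders, not real vocabulary content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MappingRow:
    icdo_site_histology: str
    naaccr_item: str
    naaccr_item_code: str
    # Disambiguating columns; None means "this row does not depend on it".
    sex: Optional[str]
    csfactor25: Optional[str]
    schema_discriminator_1: Optional[str]  # NAACCR item 3926
    schema_discriminator_2: Optional[str]  # NAACCR item 3927
    naaccr_item_omop_concept_id: int
    naaccr_item_code_omop_concept_id: int


DISAMBIGUATORS = (
    "sex",
    "csfactor25",
    "schema_discriminator_1",
    "schema_discriminator_2",
)


def lookup(rows, site_histology, item, code, **patient_values):
    """Return rows matching the key whose non-null disambiguators agree
    with the values supplied for the patient record."""
    key = (site_histology, item, code)
    matches = []
    for row in rows:
        if (row.icdo_site_histology, row.naaccr_item, row.naaccr_item_code) != key:
            continue
        if all(
            getattr(row, col) is None or getattr(row, col) == patient_values.get(col)
            for col in DISAMBIGUATORS
        ):
            matches.append(row)
    return matches


# Placeholder rows: same key, split by sex. All values are invented.
ROWS = [
    MappingRow("C48.1-8000/3", "2880", "010", "1", None, None, None, 111, 222),
    MappingRow("C48.1-8000/3", "2880", "010", "2", None, None, None, 111, 333),
]
```

With these placeholder rows, `lookup(ROWS, "C48.1-8000/3", "2880", "010", sex="2")` returns only the second row. A real table would probably need one row per allowed value of each disambiguating item, or a range encoding.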

@mgurley
Collaborator Author

mgurley commented Oct 21, 2019

@dimshitc @cgreich
Yes, we need a column for every additional NAACCR item that is needed to disambiguate a site/histology/NAACCR item/NAACCR item code combination. Here is a list of all the ambiguities at the NAACCR item level:

https://drive.google.com/file/d/1MeZi95W9_PCeXEoI2J8SzppeqfbkI6rm/view?usp=sharing

This list was created based on this SQL:

https://github.com/OHDSI/OncologyWG/blob/master/etl/support/debug_naaccr_item_ambiguites.sql

If you look at the first two rows of this spreadsheet you will see the following two schemas for the same ICDO site/histology/naaccr item combinations.

  • Peritoneum (excluding Gastrointestinal Stromal Tumors and Peritoneum Female Genital M-8000-8576, 8590-8671, 8930-8934, 8940-9110 for females)
  • Peritoneum for Females Only

So if you look and compare the following:
https://staging.seer.cancer.gov/tnm/schema/1.9/peritoneum/?breadcrumbs=(~schema_list~)

https://staging.seer.cancer.gov/tnm/schema/1.9/peritoneum/?breadcrumbs=(~schema_list~)

You will see that sex = 1, 3-5, 9 is needed. There may be a more automated way to do this via the SEER API, but I have not looked into it. All in all, there are only 13 schemas with ambiguous NAACCR items, so it should not be that much work. Someone needs to curate this list to discover which other columns need to be in the table. Only the unique list of schemas from this file needs to be looked at.
Here they are:

  • Peritoneum (excluding Gastrointestinal Stromal Tumors and Peritoneum Female Genital M-8000-8576, 8590-8671, 8930-8934, 8940-9110 for females)
  • Peritoneum for Females Only
  • Retroperitoneum
  • Malignant Melanoma of Skin, Vulva, Penis, Scrotum
  • Malignant Melanoma of Upper Lip
  • Malignant Melanoma of Lower Lip
  • Malignant Melanoma of Other Lip
  • Malignant Melanoma of Ciliary Body (excluding Iris)
  • Malignant Melanoma of Iris (excluding Ciliary Body)
  • Brain and Cerebral Meninges
  • Hodgkin and Non-Hodgkin Lymphomas of All Sites (excluding Mycosis Fungoides and Sezary Disease)
  • Other Parts of Central Nervous System
  • Hematopoietic, Reticuloendothelial, Immunoproliferative, and Myeloproliferative Neoplasms
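
As a rough sketch of the curation step (not the linked SQL, just the same idea in Python): group by ICDO site/histology and NAACCR item, then flag every key that appears under more than one schema. The input tuples below are hypothetical, and `find_ambiguous` and `parse_code_spec` are names invented for this example. `parse_code_spec` only expands a value list such as "1, 3-5, 9" into individual codes; which schema each code selects still has to come from the SEER schema definitions.

```python
from collections import defaultdict


def find_ambiguous(records):
    """records: iterable of (icdo_site_histology, naaccr_item, schema_name).

    Returns {(site_histology, item): sorted schema names} for keys that
    appear under more than one schema.
    """
    schemas_by_key = defaultdict(set)
    for site_histology, item, schema in records:
        schemas_by_key[(site_histology, item)].add(schema)
    return {
        key: sorted(schemas)
        for key, schemas in schemas_by_key.items()
        if len(schemas) > 1
    }


def parse_code_spec(spec):
    """Expand a spec such as "1, 3-5, 9" into the set {1, 3, 4, 5, 9}."""
    codes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            codes.update(range(int(low), int(high) + 1))
        else:
            codes.add(int(part))
    return codes


# Hypothetical input rows, for illustration only.
records = [
    ("C48.1-8000/3", "2880", "Peritoneum"),
    ("C48.1-8000/3", "2880", "Peritoneum for Females Only"),
    ("C48.0-8000/3", "2880", "Retroperitoneum"),
]
ambiguous = find_ambiguous(records)
sex_codes = parse_code_spec("1, 3-5, 9")
```

The unique schema names collected across `ambiguous` are the ones that would need to be reviewed by hand for disambiguating items.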

@sratwani
Collaborator

Per the 10/31 call, Dima's testing of the query revealed that we end up with several Value concepts for the same ICDO/naaccr_item/code combination because ICDO codes map to different schemas. To avoid this, Dima will take all ICDO-to-schema mappings from the CS algorithm and take the ICDO codes missing from the CS algorithm from EOD. This way there won't be any more ambiguous codes besides the 13. Next steps will be to (1) replace the vocabulary, (2) change the NAACCR ETL code to incorporate the table (Robert), and (3) have one NAACCR ETL code base and run it through SQLRender to generate the different dialects.
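
The "CS first, EOD only for what CS lacks" rule can be sketched as a simple merge. This is a minimal illustration assuming each source is reduced to one schema per ICDO code; the function name and the sample mappings are hypothetical, and the 13 known ambiguous schemas would still need the extra disambiguating columns.

```python
def merge_schema_sources(cs_mapping, eod_mapping):
    """Prefer CS for every ICDO code it covers; use EOD only for codes
    that CS does not contain."""
    merged = dict(eod_mapping)
    merged.update(cs_mapping)  # CS entries overwrite EOD entries for shared codes
    return merged


# Hypothetical sample mappings (ICDO site/histology -> schema name).
cs = {"C48.0-8000/3": "Retroperitoneum"}
eod = {
    "C48.0-8000/3": "Some EOD schema",    # ignored, because CS covers this code
    "C71.0-9380/3": "Another EOD schema", # kept, because CS lacks this code
}
merged = merge_schema_sources(cs, eod)
```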

@dimshitc
Collaborator

dimshitc commented Nov 4, 2019

  1. vocabulary is updated.

@sratwani
Collaborator

sratwani commented Nov 5, 2019

@mgurley @rtmill Since the vocabulary is now updated with the corrections, the next step is to change the NAACCR ETL code to incorporate the changes. The development effort for this is recorded in Issue 166.

@sratwani
Collaborator

sratwani commented Nov 6, 2019

@dimshitc Can this issue be closed? I have another issue, Issue 166, to track the development work.

@sratwani
Collaborator

sratwani commented Nov 8, 2019

Vocabulary work is complete. Development effort is being tracked in Issue 166.

@sratwani sratwani closed this as completed Nov 8, 2019