Create a mapping table to be used for handling ambiguous ICDO site/histology combinations present in multiple schemas. #113
Comments
Based on the 10/17 vocabulary call, Christian, Rimma and Dima need to scope out the requirements for this task. We are awaiting their discussion before this task can be assigned.
It was decided that we need this kind of table:
icdo_site_histology | naaccr_item | naaccr_item_code | sex | csfactor25 | naaccr_item_omop_concept_id | naaccr_item_code_omop_concept_id
So, I should make this for all schemas to be consistent, right? I see 'csfactor25' here. What about '3926' (Schema Discriminator 1) and '3927' (Schema Discriminator 2)?
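A minimal sketch of how a table with the columns listed above might be consulted, assuming an empty `sex` or `csfactor25` cell means "no restriction". The class name, the `resolve` helper, and all sample values (site/histology key, item number, concept IDs) are placeholders invented for illustration, not real vocabulary content.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AmbiguousSchemaRow:
    """One row of the proposed mapping table (column names from the thread)."""
    icdo_site_histology: str
    naaccr_item: str
    naaccr_item_code: str
    sex: Optional[str]          # None = no restriction (assumption)
    csfactor25: Optional[str]   # None = no restriction (assumption)
    naaccr_item_omop_concept_id: int
    naaccr_item_code_omop_concept_id: int


def resolve(rows: List[AmbiguousSchemaRow], icdo: str, item: str, code: str,
            sex: Optional[str] = None,
            csfactor25: Optional[str] = None) -> List[AmbiguousSchemaRow]:
    """Return the rows whose key and discriminator columns match the record."""
    return [
        r for r in rows
        if r.icdo_site_histology == icdo
        and r.naaccr_item == item
        and r.naaccr_item_code == code
        and r.sex in (None, sex)
        and r.csfactor25 in (None, csfactor25)
    ]


# Placeholder rows: same key, two schemas distinguished only by sex.
rows = [
    AmbiguousSchemaRow("C000-0000/0", "0000", "010", "1", None, 1, 11),
    AmbiguousSchemaRow("C000-0000/0", "0000", "010", "2", None, 2, 22),
]
matches = resolve(rows, "C000-0000/0", "0000", "010", sex="2")
```

If more than one row comes back, the combination is still ambiguous and another discriminator column is needed.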
@dimshitc @cgreich https://drive.google.com/file/d/1MeZi95W9_PCeXEoI2J8SzppeqfbkI6rm/view?usp=sharing This list was created based on this SQL: https://github.com/OHDSI/OncologyWG/blob/master/etl/support/debug_naaccr_item_ambiguites.sql If you look at the first two rows of this spreadsheet you will see the following two schemas for the same ICDO site/histology/naaccr item combinations.
So if you look at and compare the following: https://staging.seer.cancer.gov/tnm/schema/1.9/peritoneum/?breadcrumbs=(~schema_list~) you will see that sex = 1, 3-5, 9 is needed. Maybe there is a more automated way to do this via the SEER API, but I have not looked into that. All in all, there are only 13 schemas with ambiguous NAACCR items, so it should not be that much work. Someone needs to curate this list to discover which other columns need to be in the table. Only the unique list of schemas from this file needs to be looked at.
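A discriminator such as "sex = 1, 3-5, 9" could be stored as text and expanded at lookup time. The sketch below assumes codes are plain integers without zero padding; `parse_code_spec` and `sex_matches` are hypothetical helper names, not part of the existing ETL.

```python
from typing import Set


def parse_code_spec(spec: str) -> Set[str]:
    """Expand a spec such as '1, 3-5, 9' into {'1', '3', '4', '5', '9'}."""
    codes: Set[str] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive numeric range, e.g. '3-5' -> 3, 4, 5.
            low, high = part.split("-", 1)
            codes.update(str(n) for n in range(int(low), int(high) + 1))
        else:
            codes.add(part)
    return codes


def sex_matches(spec: str, sex_code: str) -> bool:
    """True if the record's sex code falls within the schema's allowed values."""
    return sex_code.strip() in parse_code_spec(spec)
```

Zero-padded codes (for example a three-character site-specific factor) would need the padding preserved, which this sketch does not do.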
Per the 10/31 call, Dima's testing of the query revealed that we end up with several Value concepts for the same ICDO-naaccr_item-code combination because ICDO codes map to different schemas. To avoid this, Dima will take all ICDO-to-schema mappings from the CS algorithm and take the ICDO codes missing from the CS algorithm from EOD. This way there won't be any more ambiguous codes besides the 13. Next steps will be to (1) replace the vocabulary, (2) change the NAACCR ETL code to incorporate the table (Robert), and (3) have one NAACCR ETL code base and run it through SQLRender to generate the different dialects.
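The CS-first, EOD-fallback rule described above can be sketched as a simple dictionary merge. This is only an illustration under the assumption that each source yields one schema per ICDO code; the function name, keys, and schema names below are made-up placeholders.

```python
from typing import Dict


def merge_schema_maps(cs_map: Dict[str, str],
                      eod_map: Dict[str, str]) -> Dict[str, str]:
    """Prefer the CS schema for every ICDO code; use EOD only where CS has none."""
    merged = dict(cs_map)
    for icdo_code, schema in eod_map.items():
        # setdefault leaves existing CS entries untouched.
        merged.setdefault(icdo_code, schema)
    return merged


# Placeholder data, not real ICDO codes or schema names.
cs = {"code_a": "cs_schema_1", "code_b": "cs_schema_2"}
eod = {"code_b": "eod_schema_9", "code_c": "eod_schema_3"}
merged = merge_schema_maps(cs, eod)
```

Here `code_b` keeps its CS schema and only `code_c`, which is absent from CS, is taken from EOD.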
Vocabulary work is complete. Development effort is being tracked in Issue 166.