CDM Vocabulary Subgroup Meeting
a. Description - To have a definitive categorization of cancer diagnoses under which we can define the modifiers. Connect every ICDO or SNOMED concept to the modifiers that we get from CAP, NAACCR and AJCC. Take the actual Tumor Modifiers and link them to each other and to the one we pick as the standard.
b. Status/Next Steps –
i. The effort to create the Attributes/Modifiers is in progress. It’s 50% complete per Christian.
ii. AJCC codes - Pull the AJCC codes from NCIt. Christian is working on extracting this information and should be done soon. Open questions: what kind of codes are these, and what is the source of the coding? Joe asked about the provenance of the AJCC codes (where they are coming from).
iv. CCDH – NCIt - Set up a call with terminologists to discuss OHDSI challenges/questions/issues.
vii. Disambiguate in NAACCR schema -> NAACCR schemas are disconnected from the AJCC staging within NAACCR. This precludes us from saying "here's the model, and you have to use this model to describe any type of cancer diagnosis." Currently there is some ambiguity in the NAACCR schemas: there is no real connection between the schema and the AJCC staging. Need to discuss this further with Michael.
2. Vocabulary support for representation of other 'Disease Episodes' beyond 'Disease First Occurrence' like Recurrence, Progression, Stable Disease, Remission, End of life event (#123) - Description/what we need:
i. Methodology/Rule for identification/abstraction of subsequent disease episodes
i. Reach out to Deborah Schrag/Jeremy to check whether the methodology is open source.
ii. PRISSMM is guidance on chart abstraction. mCODE and NCIt are options.
iii. Asieh has identified some methodology based on a literature review.
ii. Algorithm for abstraction of Disease Episodes for situations where this information is not clearly represented in the data
iii. List of subsequent Disease Episodes
i. Asieh has folks helping her get all the literature on general codes for Occurrence and Metastasis. Michael also has extrapolated the episodes
ii. Rimma/Michael/Asieh to work on coming up with the list
iv. Vocabulary support for representation of other 'Disease Episodes' beyond 'Disease First Occurrence' like Recurrence, Progression, Stable Disease, Remission, End of life event
v. Modeling for subsequent Disease Episodes - WG is currently putting a proposal together for final review.
vi. NAACCR ETL work to incorporate the new model
3. Fix high-level treatment-like domain assignments. #49 (Chemo, Surgery, Radiation, Targeted Therapy)
a. This work is already in progress. Rimma’s team (Tatyana) has done a good deal of work, and we are getting closer to completion.
To have a definitive categorization of cancer diagnosis under which we can define the modifiers.
Taking the actual Tumor Modifiers and linking them up, between each other and the one we pick as a standard.
Key discussion points:
The plan is that we will connect every ICDO or SNOMED to the Modifiers that we get from CAP, NAACCR, AJCC. The effort to create the Attributes/Modifiers is in progress. It’s 50% complete per Christian.
Rimma is going to give us the OncoTree to ICDO mapping.
Christian will also send Melissa examples of the cancer work and of our terminology questions/challenges, since she has offered mapping services to solve the problems we have identified.
4. Relationships between CAP Protocols and Organs -> We wanted to know for a CAP protocol what sites/histology it applied to.
The plan is to ingest AJCC schemas and link them to their SNOMED equivalents to build a hierarchy that combines NAACCR schemas, SNOMED and attributes from AJCC.
Before we get into mapping, Vlad will produce a model drawing, like the one we had for the NAACCR schema showing how it mapped to each precoordinated histology concept.
Joe talked about the need for authorization. AJCC does not tell us what has changed in its staging, which makes it difficult for cancer registries to know what changed from one edition to the next. The best bet is to create separate concept codes and create attributes and linkages. Dima will discuss the above with Christian.
6. https://github.com/OHDSI/OncologyWG/issues/123 -> Vocabulary support for representation of other 'Disease Episodes' beyond 'Disease First Occurrence' like Recurrence, Progression, Stable Disease, Remission, End of life event
Rimma and Michael will propose something based on their assessment of NCI. For phase 1, we’ll just have Recurrence, Progression, Remission, Metastasis, etc. This gives a standardized list of disease episodes.
Asieh has folks helping her gather all the literature on general codes for Occurrence and Metastasis; from that we can reach a consensus, converge on it and create our own. Asieh plans to have this very soon.
Andrew will reach out to Jeremy about their chart abstraction methodology for validation of the automation.
Version 1 -> pull the AJCC codes from NCIt, extract them and reformat them. The vocabulary team can start by ingesting AJCC instead of the whole NCIt. Christian is working on extracting this information and should be done soon.
8. Fix high-level treatment-like domain assignments. #49 (Chemo, Surgery, Radiation, Targeted Therapy)
This work is already in progress. Rimma’s team (Tatyana) has done a good deal of work, and we are getting closer to completion.
Georgina has raised an issue with the preservation of referential integrity based on the polymorphic identity. To solve the referential integrity problem the recommendation is to have an interface table that’s an event interface that handles the polymorphism. Per Georgina's recommendation, the interface table would allow any clinical table to be conceptualized as an 'event' and used flexibly in a polymorphic fashion for episodes, and also potentially other contexts.
```sql
-- event_id is BIGINT so it matches episode_event.event_id; concept_id columns are integers in OMOP
CREATE TABLE event (event_id BIGINT NOT NULL, episode_event_field_concept_id INTEGER, PRIMARY KEY (event_id));
CREATE TABLE episode (episode_id BIGINT NOT NULL, episode_source_value VARCHAR(50), PRIMARY KEY (episode_id));
CREATE TABLE episode_event (
    episode_id BIGINT NOT NULL,
    event_id BIGINT NOT NULL,
    PRIMARY KEY (episode_id, event_id),
    FOREIGN KEY (episode_id) REFERENCES episode (episode_id),
    FOREIGN KEY (event_id) REFERENCES event (event_id)
);
```
The Oncology team wants more details on the use case, plus counterarguments: what performance hits or ETL impact would result from not structuring it as suggested above.
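To make the event-interface proposal concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It mirrors the three tables proposed above, uses made-up field concept ids (`CONDITION_FIELD`, `PROCEDURE_FIELD`) in place of real OMOP field concepts, and shows the referential-integrity benefit Georgina raised: with foreign keys enabled, linking an episode to an `event_id` that was never registered is rejected.

```python
import sqlite3

# Hypothetical field concept ids standing in for "which clinical table/field this event came from".
CONDITION_FIELD = 1001
PROCEDURE_FIELD = 1002


def build_schema(conn: sqlite3.Connection) -> None:
    """Create the event / episode / episode_event tables from the proposal."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
    conn.executescript(
        """
        CREATE TABLE event (
            event_id INTEGER NOT NULL PRIMARY KEY,
            episode_event_field_concept_id INTEGER
        );
        CREATE TABLE episode (
            episode_id INTEGER NOT NULL PRIMARY KEY,
            episode_source_value VARCHAR(50)
        );
        CREATE TABLE episode_event (
            episode_id INTEGER NOT NULL REFERENCES episode (episode_id),
            event_id INTEGER NOT NULL REFERENCES event (event_id),
            PRIMARY KEY (episode_id, event_id)
        );
        """
    )


def events_for_episode(conn: sqlite3.Connection, episode_id: int) -> list:
    """Return (event_id, field_concept_id) pairs linked to an episode."""
    return conn.execute(
        """
        SELECT e.event_id, e.episode_event_field_concept_id
        FROM episode_event AS ee
        JOIN event AS e ON e.event_id = ee.event_id
        WHERE ee.episode_id = ?
        ORDER BY e.event_id
        """,
        (episode_id,),
    ).fetchall()


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.execute("INSERT INTO episode VALUES (1, 'breast primary')")
    conn.executemany(
        "INSERT INTO event VALUES (?, ?)",
        [(10, CONDITION_FIELD), (20, PROCEDURE_FIELD)],
    )
    conn.executemany("INSERT INTO episode_event VALUES (?, ?)", [(1, 10), (1, 20)])
    rows = events_for_episode(conn, 1)
    try:
        # event 99 was never registered in the event table, so the FK must reject it
        conn.execute("INSERT INTO episode_event VALUES (1, 99)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    conn.close()
    return rows, rejected
```

In a real CDM the ETL would have to write one `event` row alongside every clinical row it wants to reference; whether that extra insert is an acceptable cost is exactly the performance/ETL question the Oncology team raised.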
An ICDO-defined cancer differs from a SNOMED-defined cohort, which in turn differs from CAP. We need to figure out where things fall, which vocabulary we use as the authority and which others we link into it. We want to build the direct links and the hierarchy, which will take care of the fact that these things can change as we go down the hierarchy.
Behind the scenes, we create for each detailed condition a link to the appropriate attribute we pick from the different sources (CAP, NAACCR, AJCC, etc.). The alternative is to create category concepts, in which case people must navigate from the individual diagnosis down to the modifiers.
The plan as summarized by Christian is that we won’t have explicit schemas they will all be behind the scenes. We will connect every ICDO or SNOMED to the Modifiers that we get from CAP, NAACCR, AJCC. The effort to create the Attributes/Modifiers is in progress. It’s 50% complete.
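A toy sketch of the "behind the scenes" option in Python: each detailed diagnosis links directly to the modifiers drawn from each source, so no intermediate category or schema concept is needed to find them. The ICD-O-3 histology/topography codes and the modifier names below are illustrative placeholders, not the links actually being built; in OMOP these links would live in `concept_relationship` rather than a Python list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (diagnosis concept_code, source, modifier) -- illustrative links only
LINKS = [
    ("8500/3-C50.9", "CAP", "Tumor size"),
    ("8500/3-C50.9", "NAACCR", "Tumor size"),
    ("8500/3-C50.9", "AJCC", "Estrogen receptor status"),
    ("8140/3-C61.9", "CAP", "Gleason score"),
]


def build_index(links: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Group (source, modifier) pairs by the diagnosis they are linked to."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for diagnosis, source, modifier in links:
        index[diagnosis].add((source, modifier))
    return index


def modifiers_for(index: Dict[str, Set[Tuple[str, str]]], diagnosis: str) -> List[Tuple[str, str]]:
    """Sorted (source, modifier) pairs for one diagnosis; empty if none are linked."""
    return sorted(index.get(diagnosis, set()))
```

Keeping the source on each pair lets the same modifier from CAP and NAACCR coexist until one is picked as the standard.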
To contribute to the effort, Scott can get us the question component, for example, what is the histological type (primary neoplasm) of organ x? He can give us the concepts for every cancer that CAP covers. To create a cleaned-up CAP representation that can be used for generic concepts in the OMOP data repository, we give Scott our Lego bricks. Christian is working on parsing CAP and NAACCR into the Lego bricks and is going to give Scott his Legos.
Rimma is going to map OncoTree to ICDO and do the same thing we’ve done with SNOMED. Still determining the scope and the timing, Rimma will let us know.
For CCDH to help us, Melissa asked that, since they already have an inventory of cancer terminologies, we send her an actual technical assessment of what overlaps exist, where the actual connections are and where the touch points are.
Christian is working on putting together a list of Tumor Modifiers. We use NAACCR and the CAP checklist as a starting point, assuming they are comprehensive, and will run our comprehensiveness/consolidation/relevance approach by Melissa.
Christian will also send Melissa examples of the cancer work and of our terminology questions/challenges, since she has offered mapping services to solve the problems we have identified.
We have not looked at NCIt yet; Melissa can help us connect with them.
Joe shared a similar effort he’s been working on with consolidating terminologies. Joe will share his table and Ontology framework with Christian (presentation to Alison Van Dyke).
Ingest AJCC schemas
Link them to SNOMED equivalence to build a hierarchy that combines NAACCR schemas, SNOMED and attributes from AJCC.
6. https://github.com/OHDSI/OncologyWG/issues/123 -> Vocabulary support for representation of other 'Disease Episodes' beyond 'Disease First Occurrence' like Recurrence, Progression, Stable Disease, Remission, End of life event
All PRISSMM concepts are covered by NCI. Rimma and Michael will propose something based on their assessment of NCI. For phase 1, we’ll just have Recurrence, Progression, Remission, Metastasis, etc.
Once we have the list of ‘Disease Episodes’, we then need to map NAACCR to whatever we choose. -> Rimma will talk to her investigators, look at PRISSMM and solidify her thoughts on what needs to be instantiated in the vocabulary.
Michael and Asieh are going to regroup to discuss the first version of the vocabulary. Asieh has folks helping her gather all the literature on general codes for Occurrence and Metastasis; from that we can reach a consensus, converge on it and create our own. Asieh plans to have this very soon.
We will pull the AJCC codes from NCIt, extract them and reformat them. The vocabulary team can start by ingesting AJCC instead of the whole NCIt. Christian is working on extracting this information and should be done soon.
8. Fix high-level treatment-like domain assignments. #49 (Chemo, Surgery, Radiation, Targeted Therapy)
Rimma and Michael are meeting the week of 7/20 to discuss further and put a proposal together.
Georgina (Australia) is using Python to automate the ETL process from MOSAIQ into the Oncology CDM. She wants the Episode_Event table to have a primary key; currently it is a join table without one. She needs it because she is using an object-relational mapper, which requires every table it handles to have a primary key. Christian and Andrew recommend that she go ahead and add it to their OMOP model as a one-off. This may pose a challenge, since she is going to provide this ETL to multiple locations in AU. Michael will reach out and discuss further with her. How does this play out when organizations in the US want to use the MOSAIQ-to-OMOP ETL?
Protocol mapping work. https://forums.ohdsi.org/t/limits-of-useful-precoordination/11246 -> We want chemotherapy represented fully normalized, in a computable fashion. Georgina wants to go further and have a treatment event table in which she would define not only the drugs that belong to a given regimen, but also their dosage and schedule.
Michael has the treatment plans in his data, along with the schedule and dosage; however, they will have to NLP it since it is unstructured.
The discussion showed that this strategy potentially scales beyond cancer: this is bigger than cancer, and we should think about treatment guidelines holistically. Andrew will respond to the forum post and propose a systematic conversation about Georgina’s efforts going forward.
-
Relationships between CAP Protocols and Organs -> We wanted to know for a CAP protocol what sites/histology it applied to.
a. Vlad found that the Protocol does not relate to organ/body site but to a disorder. Vlad’s recommendation is to build the relationships between a Protocol and a Disorder.
b. Vlad incorporated AJCC into the list and compared CAP Protocols, NAACCR Schemas, ICDO names, SNOMED names and AJCC schemas.
c. AJCC schema looks similar to NAACCR. CAP has some gaps.
d. SNOMED/ICDO ends up being more detailed than AJCC and NAACCR.
e. Per Dima, we can take the AJCC schema, look for equivalents in SNOMED, make specific concept classes for the pure SNOMED concepts, assign them AJCC equivalence, and finally create a pre-coordination from them based on AJCC. Try this out in the Prostate example.
f. As next steps:
> Rimma and Dima will work on a model drawing before we get into mapping.
> Ingest AJCC schemas and link them to SNOMED equivalents to build a hierarchy that combines schemas, SNOMED and attributes from AJCC. Once we have v8, the API can give us their schemas and their associations to ICDO sites and histology. This might only be a problem with the hematologic, lip and neuroendocrine schemas; for the others we should be good with v7. For the POC, we can't release AJCC 8 but can release 7, so we'll work with 7. A few things will be missing.
-
Fix high-level treatment-like domain assignments. #49 (Chemo, Surgery, Radiation, Targeted Therapy)
a. Rimma and Michael are meeting the week of 6/22 to discuss further and put a proposal together.
-
https://github.com/OHDSI/OncologyWG/issues/123 -> Vocabulary support for representation of other 'Disease Episodes' beyond 'Disease First Occurrence' like Recurrence, Progression, Stable Disease, Remission, End of life event
a. Next Steps
i. Come up with a complete list of ‘Disease Episodes’ and normalize it to a standard vocabulary (possibly SNOMED).
ii. Once we have the list of ‘Disease Episodes’, we then need to map NAACCR to whatever we choose.
iii. Need to define the difference between 'Disease Recurrence' and 'Disease Progression'.
iv. Do we need 'Disease Metastasis'? -> Could be a child of progression. But local recurrence is not always a child of progression, based on clinical practice. Metastasis is always progression.
v. 'Disease Relapse'?? -> Is this the same as Recurrence?
vi. We need something that says ‘Unknown’. There are a lot of cases where we don’t know whether the disease has progressed. There is also a term for when metastasis is detected without knowing the origin of the primary.
vii. Determine modeling for Disease Episodes beyond Disease First Episodes.
CAP is in Athena; an evaluation license is needed to access CAP. Christian is working on the license with CAP.
Next Steps:
PIONEER needs help, so we could consider doing Prostate mapping next. Per Christian, no funding is available -> They need the standards for their upcoming hackathon. We could do a provisional version just for Prostate by creating and reserving the concepts and handing them over. We only care about the standards, so they can just take the UK-specific codes and map them over. We don’t need to care about the source ingestion.
Rimma will check on what the next priority is so we can decide which organ to map next
Andrew has connections with the Allen Institute for Brain Science. They may be interested in cancer mapping and terminology work; we can ask whether they are willing to fund the effort.
-
https://github.com/OHDSI/OncologyWG/issues/11-> Add 'Registry' type concept to all domains - Complete - In the future if we need subtypes, we can add NAACCR, Norway etc.
-
NAACCR Issues -> Dima will review his approach with Michael
https://github.com/OHDSI/OncologyWG/issues/234 -> fix missing leading zeros in the concept_codes for NAACCR values. Dima wants to investigate further with Michael and compare the concept_codes
https://github.com/OHDSI/OncologyWG/issues/195 -> Add leading zeros for race.
https://github.com/OHDSI/OncologyWG/issues/138 -> Publish NAACCR vocabulary ingestion code
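A sketch of the leading-zero repair for issues #234 and #195, assuming the zeros were lost because fixed-width numeric NAACCR codes were read as numbers somewhere in the pipeline. The correct width has to come from the NAACCR data dictionary for each item; the 2-digit width used for race below (codes such as `01`) is an assumption, and `pad_naaccr_code` is a hypothetical helper, not the published ingestion code.

```python
RACE_CODE_WIDTH = 2  # assumed width of NAACCR race codes (e.g. "01")


def pad_naaccr_code(raw: str, width: int) -> str:
    """Restore leading zeros dropped from a fixed-width numeric NAACCR code."""
    value = raw.strip()
    if not value.isdigit():
        # blanks and codes containing letters are returned unchanged (after stripping)
        return value
    if len(value) > width:
        raise ValueError(f"code {value!r} is longer than the expected width {width}")
    return value.zfill(width)
```

Raising on over-long codes, rather than silently truncating, makes a wrong width assumption visible during ingestion.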
- CAP Issues
Variable names -> Dima put short names with ‘..’ at the end to indicate the name truncation, the full name will be in the concept_synonym
Relationships between CAP Protocols and Organs -> We wanted to know for a CAP protocol what sites/histology it applied to.
i. Vlad found that the Protocol does not relate to organ/body site but to a disorder.
ii. Example
<Skin.Melanoma.Res.259_2.004.001.REL_sdcFDF> - Merkel Carcinoma, <Breast.DCIS.Res.211_3.001.011.REL_sdcFDF> - DCIS.
iii. We need to figure out a layer that is not anatomical, but something else (NAACCR calls it "schema"). We should compare how NAACCR, AJCC, SNOMED and ICDO (and maybe others like NCIt) do it.
iv. Vlad’s recommendation is to build the relationships between a Protocol and a Disorder.
v. Dima’s suggestion: map it to SNOMED, or to ICDO if there is no such SNOMED concept.
vi. Christian’s suggestion: classification used in clinical settings may be a mixture of histology and anatomy (which is how NAACCR schemas and AJCC chapters are organized). We have the combined ICDO histology/anatomy concepts. We should see how it all fits.
vii. Vlad shared his spreadsheet, which captures the equivalent NAACCR and SNOMED representation for each CAP protocol. The ‘concept_code’ fields hold the protocols created during CAP ingestion, linked to existing NAACCR Schemas. The goal is to pick a standard to link the CAP Protocol with the disorder. In general, NAACCR schemas can cover most CAP protocols. The leftovers (22) are not covered because multiple schemas exist in NAACCR, e.g., Skin Merkel. CAP is more general in some cases, while in others NAACCR is more general.
viii. If we identify SNOMED as the standard, then we are good. But in SNOMED we cannot identify a level in the hierarchy that we want to call a schema. We hope that, with some fixing, we can get SNOMED to be the master hierarchy and put CAP under it.
ix. We could either ask SNOMED to fix this or add a SNOMED extension. Michael’s suggestion is to not consider NAACCR schemas and just use the SNOMED hierarchy.
x. Next step -> Vlad to add AJCC to the list and highlight the problematic ones, and then we can look at the whole picture.
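The variable-name rule noted under CAP Issues (short names ending in '..', with the full name kept in `concept_synonym`) can be sketched as follows. It assumes the OMOP CDM limit of 255 characters for `concept.concept_name`; `split_concept_name` is a hypothetical helper, and the actual ingestion code may choose the cut point differently.

```python
from typing import Optional, Tuple

MAX_CONCEPT_NAME = 255  # assumed: length of concept.concept_name in the OMOP CDM


def split_concept_name(full_name: str, limit: int = MAX_CONCEPT_NAME) -> Tuple[str, Optional[str]]:
    """Return (concept_name, concept_synonym_name).

    Names that fit are used as-is with no synonym. Longer names are cut so that
    the result, including the trailing '..', is at most `limit` characters, and the
    untruncated name is returned to be stored in concept_synonym.
    """
    if len(full_name) <= limit:
        return full_name, None
    short = full_name[: limit - 2].rstrip() + ".."
    return short, full_name
```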
- https://github.com/OHDSI/OncologyWG/issues/123 -> Vocabulary support for representation of other 'Disease Episodes' beyond 'Disease First Occurrence' like Recurrence, Progression, Stable Disease, Remission, End of life event
Below is the current standard ‘Disease Episodes’: • Disease First Occurrence • Disease Recurrence (local vs metastatic) • Disease Remission • Disease Progression
Next Steps
i. Rimma will come up with a list of ‘Disease Episodes’ that are complete and normalize it to a standard vocabulary (possibly SNOMED).
ii. Once we have the list of ‘Disease Episodes’, we then need to map NAACCR to the 'Disease Episodes' -> Rimma will talk to her investigators, look at PRISSMM/mCODE and solidify her thoughts on what needs to be instantiated in the vocabulary.
iii. Other tasks/questions:
* 'Disease Metastasis' -> Could be a child of progression. But local recurrence is not always a child of progression based on clinical practice. Metastasis is always progression.
* 'Disease Relapse'?? -> Is this the same as Recurrence?
* Need to define what the difference is between 'Disease Recurrence' and 'Disease Progression'
* We need something that says ‘Unknown’. There are a lot of cases where we don’t know if it’s something that’s progressed. There is a term where they detect metastasis w/o knowing the origin of primary.
- CAP Breast Cancer (CAP Header) mapping issues identified by Michael -> Resolved.
Per Dima, there are links between CAP headers and closest attributes and attributes are related to each other like they are in CAP hierarchy. 'CAP parent item of' relationship can be used to get the lower level attributes.
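A sketch of walking the 'CAP parent item of' relationship from a CAP header down to all lower-level attributes, using a recursive CTE over a minimal stand-in for the OMOP `concept_relationship` table in SQLite. It assumes `concept_id_1` is the parent and `concept_id_2` the child for this relationship; the concept ids in the demo are made up.

```python
import sqlite3


def lower_level_items(conn: sqlite3.Connection, header_id: int) -> list:
    """All concepts reachable from header_id via 'CAP parent item of', at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE descendants(concept_id) AS (
            SELECT concept_id_2 FROM concept_relationship
            WHERE concept_id_1 = ? AND relationship_id = 'CAP parent item of'
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles cannot loop forever
            SELECT cr.concept_id_2
            FROM concept_relationship AS cr
            JOIN descendants AS d ON cr.concept_id_1 = d.concept_id
            WHERE cr.relationship_id = 'CAP parent item of'
        )
        SELECT concept_id FROM descendants ORDER BY concept_id
        """,
        (header_id,),
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept_relationship "
        "(concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO concept_relationship VALUES (?, ?, ?)",
        [
            (1, 2, "CAP parent item of"),  # header -> question
            (2, 3, "CAP parent item of"),  # question -> answer
            (1, 4, "CAP parent item of"),
            (4, 5, "Maps to"),             # other relationships are ignored
        ],
    )
    result = lower_level_items(conn, 1)
    conn.close()
    return result
```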
-
https://github.com/OHDSI/OncologyWG/issues/268 -> Full CAP vocabulary ingestion.
a. CAP will be in Athena tomorrow.
b. PIONEER needs help, so we could consider doing Prostate mapping next. Christian will discuss funding.
c. Rimma will check on the next priority so we can decide which organ to map next.
d. Andrew has connections with the Allen Institute for Brain Science. They may be interested in cancer mapping and terminology work; we can ask whether they are willing to fund the effort.
-
CAP Issues
a. Variable names -> Dima put short names. This is related to the issue discussed with Christian that the name does not represent the meaning. When Dima started to concatenate the hierarchy items, the length issue arose, so the names had to be cut somewhere in the middle.
b. Will we have a binding from the protocol to the anatomic sites it is for, or will there always be one question in the CAP checklist that we use to find the anatomic-site bindings? For example, the prostate checklist is only for ICDO3 sites that are prostate. Do we have protocols linked to organs? We need to know programmatically, for each CAP protocol, which list of tumor attributes is most appropriate for it -> The Vocabulary team built some type of substitution, so we can find the direct protocol to which a variable's value belongs.
c. For each protocol, sites are listed under the site question; per Michael, sometimes they are and sometimes they aren't. In Breast, there are 2 parallel protocols: (1) Pathology and (2) Biomarkers. Are these 2 identical for all protocols, or are they completely disconnected? Dima confirmed that they are disconnected. Rimma asked about the Breast Invasive Carcinoma mappings the Odysseus team did for MSK: in these Breast Cancer mappings, are there any indications of sites, how were those mapped, and what were they mapped to?
d. Manually map each protocol to its organ encoded as an ICDO3 site, since there are only 100 protocols. Reach out to Richard Morgan to check whether they have these bindings and just have not given them to us. For protocols that have a question called tumor site, we can take the sites from there; for those that don't, we can map them manually. If we have ICDO3 sites on hand, this will tell us which tumor attributes to apply.
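If protocols are bound by hand to ICDO3 topography as proposed in item d, finding the CAP protocols (and therefore the tumor attributes) that apply to a tumor becomes a table lookup. A sketch in Python; the protocol names and the binding to three-character topography prefixes (C50 breast, C61 prostate) are hypothetical, chosen only to show Breast's two parallel protocols both resolving from one site.

```python
from typing import Dict, List, Set

# Hypothetical manual binding: CAP protocol -> ICDO3 topography prefixes it applies to
PROTOCOL_SITES: Dict[str, Set[str]] = {
    "Prostate.Res": {"C61"},
    "Breast.Invasive.Res": {"C50"},
    "Breast.Biomarker": {"C50"},  # parallel breast protocol, bound to the same site
}


def protocols_for_site(site_code: str) -> List[str]:
    """CAP protocols whose bound topography covers an ICDO3 site such as 'C50.4'."""
    prefix = site_code.split(".")[0]
    return sorted(p for p, sites in PROTOCOL_SITES.items() if prefix in sites)
```

Binding at the prefix level is a simplification; some protocols would likely need exact subsite codes.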
-
NAACCR Issues (see recording at 2:35)
a. https://github.com/OHDSI/OncologyWG/issues/234 -> fix missing leading zeros in the concept_codes for NAACCR values
b. https://github.com/OHDSI/OncologyWG/issues/195 -> Add leading zeros for race
c. https://github.com/OHDSI/OncologyWG/issues/138 -> Publish NAACCR vocabulary ingestion code
-
https://github.com/OHDSI/OncologyWG/issues/287 NCI assessment to bridge tumor attributes gap
a. Took the NAACCR breast schema and assessed its representation in NCI. NAACCR has better representation. Will review with the team once the full assessment is complete.
b. CCDH NCIt terminology discussion
c. Melissa from the CD2H program is willing to help and will set up a meeting to discuss the terminologies further.
-
https://github.com/OHDSI/OncologyWG/issues/123 -> Vocabulary support for representation of other 'Disease Episodes' beyond 'Disease First Occurrence' like Recurrence, Progression, Stable Disease, Remission, End of life event
a. http://datadictionary.naaccr.org/?c=10#1880 (NAACCR variables for recurrence) are already in the vocabulary; NAACCR has a long list of ways to encode recurrence. We need to map the NAACCR list to the high-level list. Do we have all the right disease episodes? Can we find them in a standardized vocabulary, so they are not OMOP-generated? Rimma will provide guidance on how we should model disease episodes. We need a complete list of disease episodes normalized to a standard vocabulary (possibly SNOMED). Once we have the list, we need to map NAACCR to whatever we choose. Rimma will talk to her investigators and look at PRISSMM (not allowed to share this yet). We can also look at mCODE. Rimma will take it on and solidify her thoughts on what needs to be instantiated in the vocabulary. http://datadictionary.naaccr.org/?c=10#1861 (Recurrence Date)
b. Ingest NAACCR recurrence values into OMOP – Done
c. Create mappings from these values to our standard 'Disease Episode' concepts. The current standard disease episodes are:
• Disease First Occurrence
• Disease Recurrence (local vs metastatic)
• Disease Remission
• Disease Progression
We are missing the following:
• 'Disease Metastasis'
• 'Disease Relapse'
Further questions:
o Need to define the difference between 'Disease Recurrence' and 'Disease Progression' -> Recurrence is a relapse; you may have stable disease or progressive disease, so there is a distinction. It's going to be a hierarchy, so we have to give this some thought.
o Disease metastasis should be one of the disease episode concepts. We have a primary in some location, and then the primary metastasizes somewhere else. We then create child disease episodes of the primary episode and use the episode object concept id to capture the site to which it travelled.
o We need something that says 'Unknown'. There are a lot of cases where we don't know whether the disease has progressed. There is also a term for when metastasis is detected without knowing the origin of the primary.
There may be cases where NAACCR has variables like ‘metastasis of lung’, where NAACCR tracks metastasis present at the time the primary is found. That’s not always how it happens. NAACCR variables are put in the Measurement table. When NAACCR tells us that a metastasis was identified at the time of the primary, we modify the primary with a metastasis diagnosis modifier; in this case it’s not a disease episode but a modifier of the primary. But when the metastasis is found years after the primary, it should be a subsequent disease episode with its own entry in the Episode table as a child of the primary. Should we do this, and how?
• Rimma will investigate the 'Disease Episode' and see if we should source them from an authoritative source.
• Do the normal routine of looking in the SNOMED forest to map NAACCR values to standard concepts.
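The rule discussed above (metastasis at the time of the primary is a modifier of the primary; metastasis found later is a child disease episode) can be sketched in Python. The notes do not define how close to the primary counts as "at the time of primary", so `window_days` is an assumed, caller-supplied cut-off; the `Episode` class, the modifier strings and the 'Disease Metastasis' label (still a proposed concept) are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Episode:
    episode_id: int
    episode_concept: str              # illustrative label, e.g. "Disease First Occurrence"
    start_date: date
    parent_id: Optional[int] = None   # parent episode for subsequent disease episodes
    modifiers: List[str] = field(default_factory=list)


def record_metastasis(primary: Episode, site: str, found_on: date,
                      window_days: int, next_id: int) -> Optional[Episode]:
    """Apply the modifier-vs-child-episode rule.

    Metastasis found within `window_days` of the primary (assumed cut-off) becomes
    a modifier of the primary episode, and None is returned. Later metastasis
    becomes a new child episode of the primary, which is returned.
    """
    if (found_on - primary.start_date).days <= window_days:
        primary.modifiers.append(f"metastasis:{site}")
        return None
    return Episode(
        episode_id=next_id,
        episode_concept="Disease Metastasis",  # proposed, not yet a standard concept
        start_date=found_on,
        parent_id=primary.episode_id,
        modifiers=[f"site:{site}"],
    )
```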
-
Vocabulary support for ‘Response to Treatment’ ??
-
https://github.com/OHDSI/OncologyWG/issues/49 -> High level treatment concepts (Surgery, Radiation, Targeted Therapy) -> Michael and Rimma are writing a proposal with explicit references for the group to review.
-
CAP Breast Cancer mapping issues identified by Michael -> Michael found that CAP Headers are not fully bound. Per Mik, CAP Headers were meant to be structural information. If we are not going to map them properly, then we should not map them at all. It’s not important for the CAP Headers to be there; Michael uses them in joins to find the actual variables. The use case is that the sections will be very useful for people doing NLP, so we should try to get them right. Example -> this matters for ETL, i.e., how people harvest data. Michael will write up an example. No action needed on the Vocabulary side.
-
Proposal for handling Tumor Registry w/o dates -> Out for review
Create high level treatment concepts (Chemo, Radiation, Surgery, Immunotherapy, Targeted Therapy etc.) in order to produce episodes about the treatment.
- A simple compare across SNOMED, HemOnc.org, NAACCR and NCI concludes there is no one available solution in any of the terminologies we have explored.
- There is a need to identify the treatment concepts so we can start using them. It’s a simple task: there are a lot of sources but not a lot of concepts.
- Michael and Rimma are writing a proposal, with explicit references, to be ready by next week for the group's consideration. Domain experts have looked at the codes, taken the high-level concepts used in multiple registries and mapped them together. This will be a hierarchy maintained by us and a group of domain experts, so we will not depend on changes that may happen in ATC or SNOMED, which are not domain focused. This will also include targeted therapy. ASCO will be helpful. These concepts will be propagated throughout, so we should pay attention.
- The proposal will include the creation of new domain for Treatment Regimens (Jeremy’s). Drug vs Regimen
- Once the high-level concepts are added to the vocabulary:
> a. Concepts from a Cancer Registry will map directly to the top-level concepts defined above.
> b. Where there is no registry information and only Claims/EMR information is available, drugs need to be linked to the high-level concepts we are creating and placed in a hierarchy whose highest level is the concepts identified above.
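The two paths for the high-level treatment concepts can be sketched as a roll-up: a registry value that already is a high-level concept resolves to itself, while a drug from claims/EMR data is walked up its ancestors until high-level concepts are reached. The tiny hierarchy below is a toy stand-in for what would come from 'Is a' relationships / `concept_ancestor` in OMOP, and the intermediate class names are illustrative.

```python
from typing import Dict, List, Set

# High-level treatment concepts (labels only; real concept_ids would be used in OMOP)
HIGH_LEVEL: Set[str] = {"Chemotherapy", "Immunotherapy", "Endocrine therapy"}

# Toy child -> parents hierarchy, standing in for 'Is a' relationships
PARENTS: Dict[str, List[str]] = {
    "cisplatin": ["platinum compound"],
    "platinum compound": ["Chemotherapy"],
    "pembrolizumab": ["PD-1 inhibitor"],
    "PD-1 inhibitor": ["Immunotherapy"],
}


def high_level_categories(concept: str) -> Set[str]:
    """Return the high-level treatment concepts a concept rolls up to."""
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in HIGH_LEVEL:
            found.add(current)  # registry values land here immediately
            continue
        stack.extend(PARENTS.get(current, []))
    return found
```

A drug with no path to a high-level concept returns an empty set, which flags it for review rather than guessing a category.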
> RadLex, a specialized radiology terminology recommended by MSK Radiology Informatics, is being evaluated. It is very comprehensive and covers every nuanced aspect of breast radiology. RadLex procedures are linked to LOINC (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6016707/). Other aspects of RadLex, like anatomic site or finding, are not linked to any standard terminology. The MSK team is exploring it now and will make a proposal for incorporating it into the OMOP vocabulary.
> RadLex-originated codes exist in OMOP concept_ids for Measurement. The radiology procedures are stored in the Procedure_Occurrence table. We need to standardize the concept_ids for radiology procedures, too.
> Looking at RadLex vs SNOMED, we are running into high-level vs low-level granularity issues and questions about how it projects to real-world data. Not ready to give a proposal; can give an overview around mid-July.
https://forums.ohdsi.org/t/proposal-to-store-radiology-diagnosis-report-in-omop-cdm/8797
Evaluation of NCI to close the Tumor Attributes gap -> We need to figure out what is in NCI, where is it coming from, is it good and fill the attributes. Dima will do an assessment of NCI and present to the team.
CAP issues with breast cancer -> Within breast, if we count the number of questions in the breast biomarker reporting template, there should be one variable present for every item in the PDF, but the database contained more or slightly different things. Michael will provide examples of variables that are in the PDF but not in the database for Dima to investigate.
We need to get back to CAP and summarize our evaluation, giving a rough overview of how it will work for our purposes. Mic is working on this evaluation, to be reviewed by Christian.
-
To create high level treatment concepts (Chemo, Radiation, Surgery, Immunotherapy, Targeted Therapy, etc.), below are the mappings:
a. Chemotherapy -> concept_id = 35807188 -> HemOnc chemotherapy
b. Immunotherapy -> concept_id = 35807189 -> HemOnc Immunotherapeutic
c. Endocrine -> concept_id = 21603812 -> ATC L02 ENDOCRINE THERAPY 35807188 -> HemOnc chemotherapy 35807189 -> HemOnc
d. Immunotherapeutic should be linked via 'Is a' to 21601387 (ATC 'Antineoplastic'). Details on the comparison can be found at https://github.com/OHDSI/OncologyWG/blob/master/documentation/hemonc_vs_atc.xlsx
e. For Surgery, there is a SNOMED concept.
f. For Radiation Therapy, Dima investigated SNOMED, HemOnc.org and NCI. The Radiation concepts come from the NCIt list https://ncim.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCI%20Metathesaurus&code=C0877346
(1) There are no overlaps between SNOMED and HemOnc.org. HemOnc.org has regimens as descendants. SNOMED represents a vastly different list of concepts in terms of semantic types/attributes, such as Oral Therapy, Gamma Therapy, etc.
(2) Essentially, HemOnc.org and SNOMED cannot be compared. For the high-level Radiation concepts, most were found in SNOMED. SNOMED is larger but lacks the granularity we need in oncology, and certain concepts are not represented well. Most of the granular Radiation Therapy procedures are represented by CPT codes, and there is no formal link between CPT and SNOMED; building that linkage would be a huge undertaking.
(3) In conclusion, for Radiation Therapy the hierarchies in SNOMED and HemOnc.org are vastly different.
(4) NCI does not have a hierarchy. NCIt (OROT and Thoracic Society) has CPT-to-SEER classifications that are used in the cancer domain and widely accepted. Using NCI will allow us to work with a well-established classification system instead of creating our own.
(5) For diagnostic representation, SNOMED is a very good foundation. For Radiation Therapy procedures, we need a well-defined classification system, which NCI offers.
g. For Targeted Therapy (does not exist in OMOP) -> Targeted Therapy drugs are spread across HemOnc.org Chemo and ATC Antineoplastic. Ask Jeremy why he does not have a component class called Targeted Therapy; it seems like he's misclassifying things. Dima will review with Christian the suggestion to bring NCIt into OMOP and add an OMOP Extension.
-
For tumor diagnostic records that don’t have dates (MSK COVID patients), Rimma’s investigation concluded that the dates cannot be inferred from EHR data. Rimma will write up a proposal on how to model this and send it out for internal review before bringing it to the Modeling team.
-
The HemOnc 'Chemotherapy' category is relatively clean and should be connected via an 'Is a' relationship to ATC L01 'Antineoplastic'. The HemOnc 'Immunotherapy' category is also relatively clean, containing exclusively cancer-targeted medications, and should likewise be connected via 'Is a' to ATC L01 'Antineoplastic'.
-
'HemOnc Endocrine' contains many drugs that can be used in non-cancer treatment; for example, 'Fluoxymesterone' is used to treat low testosterone levels in men, delayed puberty in boys, breast cancer in women, and anemia. The recommendation is to ignore this category and instead use ATC L02 'ENDOCRINE THERAPY' to define ENDOCRINE CANCER THERAPY. The ATC L03 IMMUNOSTIMULANTS and L04 IMMUNOSUPPRESSANTS categories also contain drugs that can be used for conditions other than cancer.
-
Based on the above findings, below are the recommendations:
i. Chemotherapy -> concept_id =35807188 -> Hemonc chemotherapy
ii. Immunotherapy -> concept_id = 35807189 -> Hemonc Immunotherapeutic
iii. Endocrine -> concept_id = 21603812 -> ATC L02 ENDOCRINE THERAPY
iv. 35807188 (Hemonc chemotherapy) and 35807189 (Hemonc Immunotherapeutic) should be linked via 'Is a' to 21601387 (ATC 'Antineoplastic').
v. Details on the comparison can be found in the link below -> https://github.com/OHDSI/OncologyWG/blob/master/documentation/hemonc_vs_atc.xlsx
To create high-level treatment concepts, so that we can produce treatment episodes both from detailed drug information and where we only have high-level drug information from cancer registry sources, we need to:
-
Define concepts that will go to the episode table that will represent the high-level information for Chemo, Radiation, Surgery, Immunotherapy, Targeted Therapy etc.
a. Chemo, Immuno and Endocrine are discussed above.
b. For Surgery, there is a SNOMED concept.
c. For Radiation Therapy, SNOMED and HemOnc.org need to be investigated and compared (similar to the comparison between HemOnc.org and ATC) to choose the best concepts for Radiation Therapy. The Radiation Concepts can also be obtained from the NCIT list below.
d. For Targeted Therapy (does not exist in OMOP) -> Targeted Therapy drugs are spread across HemOnc.org Chemo and ATC Antineoplastic. Ask Jeremy why he does not have a component class called Targeted Therapy; it seems he may be misclassifying things. One suggestion is to bring NCIT into OMOP and add OMOP Extension concepts, as was done for the COVID concepts. Dima will run this by Christian to get his agreement on the approach of bringing in NCIT.
Concept | Semantic Type |
--- | --- |
Targeted Therapy | Therapeutic or Preventive Procedure |
Targeted Molecular Therapy | Therapeutic or Preventive Procedure |
Targeted Fusion Protein Therapy | Therapeutic or Preventive Procedure Targeted Therapy Agent |
Radiotherapies, Targeted | Therapeutic or Preventive Procedure Targeted cancer therapy |
Targeted Protein Toxin Therapy | Therapeutic or Preventive Procedure |
Targeted radionuclide therapy | Therapeutic or Preventive Procedure |
Once the high-level concepts are added to the vocabulary, concepts from the Cancer Registry will map directly to the top-level concepts defined above. Where there is no registry information and only Claims/EMR information is available, the task is to link existing drugs to the high-level concepts we are creating and put them in a hierarchy whose highest level is what we identified in #1 above.
Discuss the creation of a new domain for Regimens (Jeremy’s). Dima suggested getting rid of the ‘Regimen’ domain and assigning the Regimens to the ‘Drug’ domain. Rimma does not think this will work, because whatever is placed in the ‘Drug’ domain has a specific set of drug attributes that regimens don’t have; regimens don’t belong in the ‘Drug’ domain. This needs to be discussed further.
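The roll-up described above, where drugs from Claims/EMR data are placed in a hierarchy whose top level is one of the high-level treatment concepts, can be sketched as an upward walk over 'Is a' edges. This is a minimal sketch, assuming the hierarchy is available as a dictionary of child -> parents; the function name, the intermediate class names and the drug assignments are hypothetical illustrations, not actual HemOnc or ATC classifications.

```python
from collections import deque


def high_level_categories(concept, parents, top_level):
    """Walk 'Is a' edges upward from `concept` and return every
    top-level treatment category that can be reached."""
    found, seen, queue = set(), {concept}, deque([concept])
    while queue:
        current = queue.popleft()
        if current in top_level:
            # Stop climbing once a top-level category is reached.
            found.add(current)
            continue
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(found)


# Hypothetical toy hierarchy (child -> list of 'Is a' parents).
PARENTS = {
    "drug A": ["class X"],
    "class X": ["Targeted Therapy"],
    "drug B": ["class Y", "class Z"],
    "class Y": ["Chemotherapy"],
    "class Z": ["Immunotherapy"],
}
TOP_LEVEL = {"Chemotherapy", "Immunotherapy", "Targeted Therapy"}

print(high_level_categories("drug A", PARENTS, TOP_LEVEL))  # ['Targeted Therapy']
print(high_level_categories("drug B", PARENTS, TOP_LEVEL))  # ['Chemotherapy', 'Immunotherapy']
```

Returning a list rather than a single category leaves room for drugs that end up under more than one classification, which the HemOnc/ATC comparison suggests can happen.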
a. Below are the Hemonc modalities shared by Rimma:
i. Antibiotic therapy
ii. Anticoagulation
iii. Chemoimmunotherapy
iv. Chemoradio immunotherapy
v. Chemoradiotherapy
vi. Chemotherapy
vii. Growth factor therapy
viii. Hormonotherapy
ix. Immunosuppressive therapy
x. Immunotherapy
xi. Radioimmunotherapy
xii. Radiotherapy
xiii. Supportive therapy
b. Some of these above can be rolled up to be parents of others
c. To compare HemOnc.org to ATC, Michael took all Regimens from HemOnc.org and joined them to their components/ingredients
via concept_relationship, excluding supportive medications (cr1.relationship_id not in ('Has supportive med')), which
left "Has antineoplastic", "Has immunosuppressor" and "Has local therapy".
https://drive.google.com/drive/folders/19r_5uLezurJUSZJGArPaorDReyvARYPH?ths=true
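The regimen-to-component join described in item c can be sketched as a query against a tiny in-memory stand-in for the OMOP concept tables. This is a minimal sketch, not Michael's actual query: it assumes regimens are concepts with vocabulary_id 'HemOnc' and concept_class_id 'Regimen', reduces the tables to the columns used, and the rows, names and helper functions are hypothetical.

```python
import sqlite3

# Keep every regimen -> component relationship except supportive medications.
QUERY = """
    SELECT c1.concept_name, cr1.relationship_id, c2.concept_name
    FROM concept AS c1
    JOIN concept_relationship AS cr1 ON cr1.concept_id_1 = c1.concept_id
    JOIN concept AS c2 ON c2.concept_id = cr1.concept_id_2
    WHERE c1.vocabulary_id = 'HemOnc'
      AND c1.concept_class_id = 'Regimen'
      AND cr1.relationship_id NOT IN ('Has supportive med')
    ORDER BY c1.concept_name, c2.concept_name
"""


def build_demo_db():
    """Create a minimal in-memory copy of the two tables with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT,
                              vocabulary_id TEXT, concept_class_id TEXT);
        CREATE TABLE concept_relationship (concept_id_1 INTEGER, concept_id_2 INTEGER,
                                           relationship_id TEXT);
    """)
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
        (1, "Regimen R", "HemOnc", "Regimen"),
        (2, "drug A", "HemOnc", "Component"),
        (3, "drug B", "HemOnc", "Component"),
        (4, "antiemetic C", "HemOnc", "Component"),
    ])
    conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?)", [
        (1, 2, "Has antineoplastic"),
        (1, 3, "Has immunosuppressor"),
        (1, 4, "Has supportive med"),  # filtered out by the query
    ])
    return conn


def regimen_components(conn):
    """Return (regimen, relationship_id, component) rows."""
    return conn.execute(QUERY).fetchall()


for row in regimen_components(build_demo_db()):
    print(row)
```

Against a real vocabulary the same SELECT would run unchanged, provided the concept class and relationship IDs match what is actually loaded.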
a. Dima will create a Venn Diagram comparing ATC and Hemonc by Drug -> For the ones that don’t overlap, analysis
needs to be done from a clinical review perspective on whether those concepts need to be included from Hemonc or ATC
and why. There is a slight nuance to how the classification is done within Hemonc. Rimma will send the 2 files
(Regimen by modalities and ingredient classification) to Dima.
b. Talk to Jeremy and show him the hierarchy (combine the modalities so it becomes a common hierarchy). Both HemOnc
and ATC have classifications; analyze how they differ. We can go to HemOnc and say: this is what ATC has done, and
it’s good, so why don’t you do it this way?
c. Queries to compare the classification can be put on GitHub
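The Venn diagram comparison in item a comes down to three set operations. This is a minimal sketch, assuming each classification has been exported as a list of drug names; matching on normalized names is an assumption made for illustration (matching on shared ingredient concept IDs may be more reliable), and the function name and drug names are hypothetical.

```python
def _normalize(name):
    """Case- and whitespace-insensitive key for a drug name."""
    return name.strip().lower()


def compare_by_drug(atc_drugs, hemonc_drugs):
    """Split drugs into the three regions of a two-set Venn diagram."""
    atc = {_normalize(d) for d in atc_drugs}
    hemonc = {_normalize(d) for d in hemonc_drugs}
    return {
        "both": sorted(atc & hemonc),
        "atc_only": sorted(atc - hemonc),     # needs clinical review
        "hemonc_only": sorted(hemonc - atc),  # needs clinical review
    }


result = compare_by_drug(["Drug A", "drug B", "Drug C"], ["drug b", "Drug C ", "Drug D"])
print(result)
```

The two "only" regions are the non-overlapping drugs that, per the notes, need clinical review to decide whether they should come from HemOnc or ATC.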
a. The Vocabulary team mapped CAP to standard concepts in OMOP (LOINC, NAACCR etc.). Used Nebraska Lexicon and their
mapping to SNOMED Ext as target concepts. Other OMOP standard vocabularies were used to represent clinically relevant
CAP entities.
b. Link to the CAP mapping issues and approaches with examples can be found in the link below:
https://docs.google.com/presentation/d/13H5aneGgeoJkIHRFkwJsGOw-Iz7SZk0gqkKDq7Grjis/edit#slide=id.g7535a59aa9_0_93
c. Dima reviewed the issues encountered with the mapping with specific examples.
i. Some concepts were not mapped. Some concepts were mapped to more generic concepts.
ii. For issues related to loss of context -> Looked at prostate and all other cancers and the decision on
tumor size still needs to be made.
iii. Loss of hierarchy -> Mapping to a more generic concept is not the right approach, because we lose the
context. Joe stated that margin is not well modeled. Reports within the same institution differ and are not
standardized. ER/PR is also not very standardized. CAP provides templates that are very hard to put into practice.
The resolution is to create a new mapping type called ‘postponed’.
a. Team to review the mapping presentation (link above) and come prepared to ask questions in the next meeting
b. Dima to quantify how often we see these problems, broken down by category.
-
Standard concepts for highest levels of treatment episodes (Systemic therapies (e.g. chemo, immunotherapy), Radiotherapy, and Surgery)
Episode Concept will remain the same, either ‘Disease Episodes’ (Disease First Occurrence, Disease Recurrence, Disease Progression, Disease Remission, Treatment Regimen, Treatment Cycle, Episode of Care) or ‘Treatment Regimen’.
We want to create standard concepts for Surgery and Radiation Therapy (List below) and put them in Episode_Object_Concept_ID.
For standard concepts for the highest levels of treatment episode objects, we might consider using HemOnc.org (see below). These are already being used for Immunotherapy, Endocrine/Hormonotherapy and Chemotherapy.
Below is a version from the CCC19 Registry (Hemonc.org). These high-level concepts need to be included. These high-level concepts can be parents of the NAACCR concepts (For now, Surgery and Radiation).
685, Cytotoxic chemotherapy (Hemonc.org for Chemotherapy)
694, Immunotherapy
58229, Targeted therapy
691, Endocrine therapy
695, Radiotherapy
14051, Surgery (Already in Athena as non-standard concept so we want these to become Standard Concept and then make it parent of all NAACCR Surgery concepts)
45186, Transplant/Cellular therapy
45215, Intravesicular therapy (e.g., BCG)
Dima found that some of the above concepts do exist in SNOMED. Dima will evaluate them to make sure there are no duplicates: if SNOMED concepts already exist, we'll use SNOMED; otherwise we'll use HemOnc. The focus should be on the high level, and if there are good SNOMED values, we can make those the parents of the NAACCR concepts. We don’t already have a high-level Targeted Therapy concept, so we will bring it in and map it to HemOnc (58229, Targeted therapy).
Dima will present the final assessment and proposal to the team in the next meeting along with suggestion on the domains these high-level concepts will belong to.
-
Treatment Code list for the Cancer COVID study was finalized including Drugs, Surgery and Radiation.
-
Mik presented CAP mapping borrowed from Nebraska Lexicon for Breast Cancer. Link to the high-level presentation can be found on the wiki page under
https://github.com/OHDSI/OncologyWG/wiki/CAP
Direct mapping only exists for invasive carcinoma
Dima and team are working on including the new version of HemOnc to OMOP. Current version is from last year. Link to HemOnc documentation has been provided to Donna for her review and feedback.
The current version either lacks temporality or represents it poorly. Dima and team are looking at the source more carefully; Dima plans to bring back their findings and discuss them further with Jeremy.
Status on vocabulary projects so far -> The Odysseus team is working on processing what they have from Nebraska Lexicon. They are still waiting for C-Keys/permission to use the C-Keys. Mik is working on the licensing with CAP. Dima is identifying a few issues with Nebraska Lexicon. One such issue is with the description that Scott has provided. This description is different from CAP so it’s not a 1-1 mapping.
Question for Nebraska Lexicon:
- Review the new pre-coordinated concepts that Dima found in the Nebraska Lexicon, see how Scott is handling them, and discuss our approach.
- Nebraska is consistent within itself, but it does not conform to the SNOMED guidelines. They use relationships that are present in SNOMED but apply values that the SNOMED model does not allow for those relationships. For example, per SNOMED rules, concepts of class ‘Observable Entity’ cannot have a relationship to Specimen.
Next Steps with Donna: Donna has an internal meeting to understand SEER’s challenges in adopting the OMOP model. She plans to discuss these challenges with the Oncology WG.
- Data standardization project
->Execution and plan for data quality checks: we need to establish quality measures such as consistency of relationships, absence of duplicate names, etc.
->To address the quality issues in the short term, Rimma will post HemOnc issues on Github; Mik will review all issues that have been reported by Rimma, work with Rimma to prioritize and propose a plan to fix them.
->In the long term, Mik, Dima and CR will discuss a systematic approach, establish the rules and execute them on a regular basis. Every release of the vocabulary will need QA and documentation.
- ICDO and Nebraska
-> Eduardo shared the proposal around current states and what pathways (2 pathways; 2nd pathway has SNOMED extensions; Easier to maintain) we could take to improve them. Get C-Keys from CAP directly. C-Keys map to SNOMED and ICDO. Nebraska mapped it directly and not using C-Keys. Nebraska is also going to work with CAP to release C-Keys to us.
->Is there a difference between how Nebraska maps ICDO and how SNOMED does? CAP has an ICDO -> SNOMED mapping, while the Lexicon maps site/histology to SNOMED. We don’t know whether they are reusing the mappings that CAP releases.
->Need to verify with the Nebraska Lexicon team whether they are validating CAP’s mapping. Mik will follow up with Scott on whether they have been correcting any existing SNOMED -> ICDO mappings. Do they correct attributes?
->Meeting with SNOMED on the 2nd of April-> Discuss our problems/issues with their content team and figure out how they can help us. We want to know from them if they can take on SNOMED extension and add it to SNOMED. Essentially, we are looking for answers to questions like -> How do you do it? how can you scale it up? how can we help?
SNOMED Meeting in April
(1) Issues related to mappings and extensions of SNOMED based on gaps Rimma has found in the coverage of cancer diagnosis during the mapping of ICDO to SNOMED.
(2) Christian has requested a meeting in April to discuss further.
SNOMED Issues #240
(1) Rimma discovered some additional issues with ICDO and duplicate concepts. This is of high importance given that this is the foundational diagnostic vocabulary and we are mapping data to it (https://github.com/OHDSI/OncologyWG/issues/240). There are 500 unique names that have duplicates. Some of them are deprecated concepts. Some are cancers with different behaviors that are still named as the same type of cancer. Even if their naming does not differentiate them, our vocabulary must.
(2) Need to check in real data whether the 3-character codes are present. The NU tumor registry does not use 3-character ICDO sites. If they show up in real data, then we should know how to represent them and understand the nature of the deprecation.
(3) Dima and Eduard will investigate each concept separately and decide how to handle the duplicates.
(4) Either build mappings (‘Maps to’) from old/deprecated codes to new codes, or, as another suggested approach, create SNOMED extension concepts for these deprecated ICDO codes so that ICDO maps to the SNOMED extension.
(5) Rimma’s suggestion -> Identify the duplicates, if they are describing class then we move to classes. If there are changes in behavior or change to completely new concept, then we should deprecate the old and map the new. For deleted ones, we can deprecate them.
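Identifying the duplicates in step (5) can start from a simple grouping by normalized name. This is a minimal sketch, assuming ICDO histology codes in the "morphology/behavior" form (e.g., 8140/3) and OMOP-style invalid_reason values (None for valid concepts); the function name and the placeholder codes and names are hypothetical. It flags groups that differ only in the behavior digit, which are the cases where the vocabulary must differentiate even though the names do not.

```python
from collections import defaultdict


def find_duplicate_names(concepts):
    """Group ICDO histology concepts by normalized name and describe each duplicate group.

    `concepts` is an iterable of (concept_code, concept_name, invalid_reason)
    tuples, where invalid_reason is None for valid concepts.
    """
    groups = defaultdict(list)
    for code, name, invalid_reason in concepts:
        groups[name.strip().lower()].append((code, invalid_reason))

    report = {}
    for name, members in groups.items():
        if len(members) < 2:
            continue  # unique name, nothing to resolve
        codes = sorted(code for code, _ in members)
        morphologies = {code.split("/")[0] for code in codes}
        behaviors = {code.split("/")[1] for code in codes if "/" in code}
        report[name] = {
            "codes": codes,
            "has_deprecated": any(reason is not None for _, reason in members),
            # Same morphology but different behavior digit: the name alone hides the difference.
            "behavior_only_difference": len(morphologies) == 1 and len(behaviors) > 1,
        }
    return report


# Hypothetical placeholder rows, not real ICDO content.
SAMPLE = [
    ("X001/0", "Example neoplasm", None),
    ("X001/1", "example neoplasm ", None),
    ("X002/3", "Other tumor", "D"),
    ("X003/3", "Other tumor", None),
    ("X004/3", "Unique tumor", None),
]
for name, info in find_duplicate_names(SAMPLE).items():
    print(name, info)
```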
** Meeting Notes - 1/23/2020 **
- Finalize AJCC agenda
Michael, Rimma and Christian will be there in person. Michael Kallfelz (Odysseus) will dial into the conversation with IMO depending on when the meeting is scheduled. A Doodle poll has been sent out for the AJCC meeting.
Agenda for the meeting:
• The approach with AJCC will be to request codes and descriptions.
• The reason for needing the codes is that, even though there are people who know the AJCC codes from other environments, per Christian, the descriptions will be used for research and to understand the provenance of the staging.
• If we don’t get the codes from AJCC, then we can map them to something else. They can’t be standard concepts.
• There is a possibility of getting the codes and basic descriptions in a non-proprietary format for internal vocabulary construction.
• They are only allowing us to see the API results in the context of this meeting, so we want to spend time going through the API, seeing how the staging variables display with code, description and guidance, and then determine what we need. Try to determine how to map them to other staging systems.
• Ahead of time, we can review the AJCC documentation so we know what the API generally looks like. Michael will send the AJCC documentation to the participants of this discussion.
• Michael will provide an outline of the issues with the AJCC variables, with examples, to the vocabulary team to get started on their review.
• Michael will create a list of missing codes and a description of how AJCC is used in NAACCR and in clinical and pathologist checklists. We need to support it either as a standard vocabulary or as a source vocabulary.
• Our goal with this list is to get to a cancer-specific set of attributes and values that makes sense to experts and allows us to conduct our use cases with enough precision without boiling the ocean. We are considering the various systems to steal from. What is the minimal set of attributes we need for each cancer?
• Discuss the licensing. We pretty much know what to put on the table.
- IMO agenda
• Discuss the relationship with OHDSI in terms of licensing.
• With the MSKCC-funded project to standardize NAACCR and CAP, the visit to IMO needs to determine whether they are willing and have the resources to enter this collaboration.
• From a technical standpoint, we need to look at the NAACCR BC data and the CAP protocols. Andy has all the documentation he needs to do the initial assessment on NAACCR and should be able to ask questions and provide feedback.
• If everything goes well and they are willing and ready, we will discuss how to move forward with the project.
• Through the discussion with IMO, we may find that they are not willing to share their mapping with OHDSI but will do the work for MSKCC. This is an acceptable outcome of this discussion.
• OR they won’t want to do the work for MSKCC and we walk out empty-handed.
• IMO provides terminology content for their clients and performs mappings to standards. They are collecting content from multiple clients and building their content system. In this respect, there is a lot in this for them, because they have not done much in the Oncology domain.
• We need to sell this to IMO’s management, so we need to get them into our F2F conversation as well.
-
AACR presentation by Joe Sirintrapun (MSKCC) -> Compare and contrast ICD-O-3, NCIT, SNOMED-CT, and OncoTree strengths and weaknesses – NEXT MEETING
• For the meeting with Joe next week, the plan is to share the questions we have in preparation for our AJCC and IMO discussions.
• We can ask Joe his opinion on our ambitious task of mapping between the variety of international staging systems. Joe is familiar with the Campbells, so it would be helpful to introduce him to OHDSI.
-
Align on PIONEER agenda:
- Starting with the raw list, walk the PIONEER team through:
(1) The attributes we collected for supporting research use cases in a network
(2) Attribute assignment
(3) Shifting things between variables and values
(4) Deduplicating the values
(5) Questions for the PIONEER experts
We want to reach out because they are building an OMOP network with cancer data and want to use standardized analytics. They have good use cases that they plan to execute, for which they are in search of data. The potential of this collaboration is that we help them run studies in our network and they help us run studies on their network. They also offered to help us with our tumor-specific attributes. The plan is to present what we did, ask them to assess our approach, and evaluate what we are missing in the attributes.
-
Feedback on the proposal submitted by Rimma for the representation of numeric concepts (https://github.com/OHDSI/CommonDataModel/issues/321).
Currently, we have the relationship to numeric solution as a part of the Oncology module. The problem with this approach is that it makes recommending the extension harder as it sounds like an offshoot thing.
For someone to adopt the extension, they have to go to a different location to get additional DDLs and vocabulary loads through non-standard routes, which makes the effort look less solid and harder to embrace.
We could propose 2 solutions to the bigger CDM group: (1) Relationship to numeric. (2) Do what was done with the Drug Strength table, which does something similar. We can submit the relationship to numeric, but the bigger group may ask how many ways of doing the same thing we are proposing. We could suggest an additional table that maps pieces of a reference table to non-traditional/non-conceptual information. What would this table look like? The plan is to go to the CDM group with both proposals and seek their guidance on picking one. One problem with solution 2 is that, to make it more generic, the folks who created the Drug Strength table would need to come together to find a generic solution. Since our proposal (solution 1) has been tested and we have outlined different use cases, it can be presented to the larger group.
** Meeting Notes - 1/16/2020 **
Issues to discuss
-
AJCC visit -> discussion, planning and prep
a. Michael will provide an outline of the issues with AJCC, with an overview and examples, to the Vocabulary group, including a list of values that are not present in NAACCR.
b. Do we need Descriptions of the AJCC codes? We need to understand this better. If we only take the codes, then we may not need the license.
-
NAACCR next steps
a. We need to continue to group variables. Any questions that come up in the process, we can take to NAACCR.
b. We should focus on the most prevalent staging system and then tackle the others.
c. Michael is going to provide locations where we could possibly find the SEER cross-walk.
d. Go through variables that we are sure we need, like size, laterality, metastases etc. We’ve seen them used in NAACCR and CAP, we know these variables are a part of the cancer model, and we know that we are duplicating values across 150 cancer stages. We need to remove the duplication and reconcile the versioning system.
e. Need to figure out how to resolve variables (date/period specific) that have duplication problems due to time spans, and how to map them.
f. Engage with Donna and perhaps use her as a facilitator/resource to engage with NAACCR.
-
Issue # 84 -> Michael wanted to discuss this with the leaders of the Oncology WG. We need Christian to have this conversation.
-
ICDO vocabulary issues have been documented. We standardize ICDO to SNOMED. Overall quality will be much better until SNOMED 3.2 release which is scheduled for April after which the plan is to release the fixes to the ICDO vocabulary issues.
-
There is an inventory of mappings between ICDO and SNOMED. We have identified a lot of gaps in granularity. SNOMED was willing to add the pre-coordinated concepts as well as any missing histologies. Suzy Roy will investigate further with the SNOMED team.
-
Issue # 49 -> Impact of moving this to the backlog. Need to discuss this with Christian and create a proposal for presentation to the bigger Vocabulary group. This issue was not discussed.
-
Issue # 11 -> Christian has confirmed the type concept change is complete. Discuss next steps. -> Christian is still in the process of completing disease status.
-
Issue # 40 -> Rimma has submitted a proposal on GitHub, where all CDM proposals reside. This should be the #1 priority for the next meeting. This task needs to happen sooner rather than later, because not being part of the official extension means a lot of one-off work has to happen.
-
Issue # 118 -> Michael has provided the CSV format per Claire's requirement to document the OMOP Oncology Extension. Rimma is going to reach out to Claire for a data dictionary for meta-data fields. Since the CSV format is provided to Claire, our part is done.
-
Progress on Vocabulary documentation
** Meeting Notes - 1/2/2020 **
Completed issues:
-
Issue # 139 – This is complete.
-
Issue # 217 – This is complete. For now, all the relationships that we are aware of are in. If we discover more, we’ll create a new task.
Issues discussed:
- Issue # 220 – Further assessment of this issue to evaluate how much data gets skipped at NU and Tufts revealed minimal impact. The plan is for MSK to run the query against their OMOP data to further assess the impact. If the impact remains low, any modeling changes can be delayed for a while.
- Issue # 240 – Eduard has completed documentation for the mapping of ICDO to SNOMED and shared it with other members for their review. Once reviewed, the changes can be released.
- Issue # 234 – Dima is also working on resource allocation for this task. These are duplications we artificially created. This task needs to be reviewed before the call with NAACCR to ensure any questions coming out of this can be addressed with NAACCR.
- Issue # 200 – This was a modeling effort so need to assign it to Rimma or someone from the Development team.
- Issue # 40 – A proposal for the support of numeric vocabulary concepts needs to be submitted/presented to the CDM/Vocabulary team for OMOP, as this is bigger than just Oncology. Christian has recommended we put this in the backlog, as the CDM/Vocabulary group is focusing on V6.
- Issue # 118 – Need to provide documentation with description of the tables (column names, datatypes) based on Claire’s requirements. Once these are provided to Claire in the csv format this task will be completed.
- Issue # 11 – Christian has completed the consolidation of the type concepts and is ready to release it. Dima can start working on consolidating the ‘Registry’ type concept.
- Issue # 49 – The recommendation is to change the ‘Drug’ domain definition, so that concepts for broad categories like chemotherapy and immunotherapy have the domain ‘Drug’. We need approval from the broader community that the ‘Drug’ domain should hold categories and not just pure drug information. We are reaching out to Christian and Rimma to see whether this is something they can bring up with the OMOP Modeling team.
** Meeting Notes - 12/26/2019 **
Issues discussed:
-
Issue # 220 – Need technical documentation on the issue and the approach to resolve it. Direction 1 (Christian’s suggestion): create every ICDO combination and pre-coordinate it with viable values. Direction 2: instead of #1, do what was done in the previous version, where a combination related to both schemas should end up with the same variable. Dima will try out both solutions to evaluate the results and the approach.
-
ICDO and NAACCR documentation for ingestion. Data quality queries for ICDO will confirm that the approach taken has been followed throughout. Rimma will help create the queries. They have been completed for topography, and the next step is to work on histology. Rimma has sent Dima the ICDO issues to evaluate. NAACCR data quality checks will come after the NAACCR ingestion.
Examples of data quality checks include: verifying that each pre-coordinated concept has both its topography and histology relationships; making sure there are no topography codes without any pre-coordinated terms; and checking mappings between topography and SNOMED to understand how many have been mapped. For NAACCR, data quality checks would cover the connection of each schema to the appropriate ICDO code, the connection between schemas and variables, and validation that the vocabularies are internally consistent.
Besides documentation for the ingestion of NAACCR, we also need the code that ingests NAACCR, and it should be made publicly available. Related to data quality checks, Michael emailed a list of staging variables that are missing from the NAACCR ingestion because the values were not available in the source we pulled NAACCR from. Every variable of type ‘categorical list’ should have a ‘has answer’ relationship; Dima will take a look at this and come up with a solution. Dima and Christian will evaluate the duplicates/overlaps between CAP and NAACCR for breast cancer and bladder cancer. This task is on the calendar for the week of 12/30.
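Several of the checks listed above reduce to set tests over relationship triples. This is a minimal sketch, assuming relationships are available as (concept_id_1, relationship_id, concept_id_2) tuples; the relationship IDs 'Has Answer', 'Has Topography ICDO' and 'Has Histology ICDO' are assumed to match what is loaded, and the function names and sample identifiers are hypothetical.

```python
from collections import defaultdict


def relationships_by_concept(relationships):
    """Index (concept_id_1, relationship_id, concept_id_2) triples by source concept."""
    index = defaultdict(set)
    for concept_id_1, relationship_id, _ in relationships:
        index[concept_id_1].add(relationship_id)
    return index


def categorical_without_answers(variables, relationships, answer_rel="Has Answer"):
    """Variables of type 'categorical list' that have no answer relationship."""
    index = relationships_by_concept(relationships)
    return sorted(var for var, var_type in variables.items()
                  if var_type == "categorical list" and answer_rel not in index.get(var, set()))


def precoordinated_missing_parts(precoordinated, relationships,
                                 required=("Has Topography ICDO", "Has Histology ICDO")):
    """Pre-coordinated concepts lacking a topography or a histology relationship."""
    index = relationships_by_concept(relationships)
    return sorted(c for c in precoordinated if not set(required) <= index.get(c, set()))


def topographies_without_precoordination(topographies, relationships,
                                         topo_rel="Has Topography ICDO"):
    """Topography codes that no pre-coordinated concept points to."""
    used = {target for _, rel, target in relationships if rel == topo_rel}
    return sorted(set(topographies) - used)


# Hypothetical sample data.
VARIABLES = {"var_size": "categorical list", "var_laterality": "categorical list", "var_date": "date"}
RELATIONSHIPS = [
    ("var_size", "Has Answer", "answer_1"),
    ("combo_1", "Has Topography ICDO", "topo_A"),
    ("combo_1", "Has Histology ICDO", "hist_A"),
    ("combo_2", "Has Topography ICDO", "topo_A"),
]
print(categorical_without_answers(VARIABLES, RELATIONSHIPS))                       # ['var_laterality']
print(precoordinated_missing_parts(["combo_1", "combo_2"], RELATIONSHIPS))         # ['combo_2']
print(topographies_without_precoordination(["topo_A", "topo_B"], RELATIONSHIPS))  # ['topo_B']
```

Each check returns the offending identifiers rather than a pass/fail flag, so the output can be pasted directly into an issue for follow-up.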
-
Issue # 224 – For the issue related to building hierarchies in NAACCR, we do not know what to do with these use cases until they are clarified with NAACCR. We need to find out clearly which variables are replacement variables in the conversion from collaborative staging to current staging, and which variables are part of collaborative staging but may be redundant, so we may not need them.
-
Issue # 221 – Nothing new to report on this issue. The Odysseus team is working on it as part of the project with MSK.
-
Issue # 222 – These are duplications we artificially created. This task needs to be reviewed before the call with NAACCR to ensure any questions coming out of this can be addressed with NAACCR.
-
Issue # 176 – Complete
-
CAP is publishing a new version in Feb. They have mappings from ICDO sites to SNOMED codes, which might be useful to compare with what we have done. The mappings are part of the evaluation-version checklist we got from CAP with their mapping release.
-
Issue # 234 - There are still leading-0 issues in a lot of NAACCR values. It would be better to do a comprehensive evaluation: download the NAACCR DD again from the SEER API in an automated fashion and compare it to see whether we are missing any values (treating each value as a string, not a number). The risk of not doing this is that we will end up with missing data.
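The comparison suggested above can be sketched as follows. This is a minimal sketch, assuming both the freshly downloaded NAACCR data dictionary values and the loaded vocabulary values are available as strings (for example, read with pandas `read_csv(..., dtype=str)` so that codes such as "01" are not parsed as numbers); the function name and sample values are hypothetical.

```python
def find_lost_leading_zeros(reference_codes, loaded_codes):
    """Return reference codes whose leading zeros were dropped in the loaded data.

    A reference code counts as lost when it is absent from the loaded values
    but its integer form (e.g. "01" -> "1") is present, which is what happens
    when a code is parsed as a number somewhere in the pipeline.
    """
    loaded = set(loaded_codes)
    lost = []
    for code in reference_codes:
        if code in loaded or not code.isdigit():
            continue  # present as-is, or not a purely numeric code
        if str(int(code)) in loaded:
            lost.append(code)
    return sorted(lost)


print(find_lost_leading_zeros(["01", "001", "10", "000"], ["1", "10", "0"]))
# ['000', '001', '01']
```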
** Meeting Notes - 12/20/2019 **
Issues discussed:
-
Next steps for CAP and NAACCR outreach efforts: Assessment of NAACCR and CAP will be completed incrementally per cancer type. Based on Rimma’s initial assessment of breast cancer, it is evident that we need to reach out to NAACCR and ask about their rationale before completing an assessment of the entire NAACCR Data Dictionary. In talking to NAACCR, we might be able to identify patterns that would help answer our questions during the full assessment. We want to understand the difference between collaborative staging and its replacement. We want a technical contact who can tell us what belongs to collaborative staging, which is the new way of doing it. We need a single contact at NAACCR who would be willing to help us navigate their space. Andrew has reached out to NAACCR asking for their help, hoping that if anything well documented is available online, they will point us to it. The plan is for Dima and Christian to assess CAP and find duplicates with NAACCR for breast cancer and prostate cancer. We have the new AJCC codes (Northwestern), which Michael will share with Christian and Dima so we can assess what we’re up against with AJCC.
-
Handling staging variables in the NAACCR ETL based on closed Issue # 51 – NAACCR has duplicate staging variables: AJCC variables are for version 8 and above, and TNM variables are for version 7 and below. Because of licensing issues with the SEER API, we don’t have values for the AJCC staging variables, so Dima mapped variables from one to the other, and we use the variables of the old version. This makes the ETL unusual: instead of getting a value and mapping it, we map variables to variables and pull the values from ‘maps to’. This was done before the symposium. This problem will persist until we have the version 8 staging variables. Michael can find out the pricing for a single license of the AJCC API for the new staging variables so we can see what’s in it. For the time being, the above solution was approved. Michael can get the API information and figure out the deltas.
-
Issue # 220 – The way the schema was collapsed, we may have made it impossible to ingest data with full coverage. For example, if we have data for 772 ICDO combinations, the way we have structured the vocabulary we won’t be able to import it, because the mapping will not find anything and will skip it. Christian suggested doing the whole mapping and collapsing it down, even for the combinations that don’t exist. Rimma would like to take some time to look into this issue and understand what’s being done with the collapsing.
- Issue # 234: Leading zeros are still a problem in many NAACCR values. It would be better to do a comprehensive evaluation: download the NAACCR Data Dictionary (DD) again from the SEER API in an automated fashion and compare it against our vocabulary to see whether we are missing any values, treating codes as strings rather than numbers. The risk of not doing this is that we will end up with missing data.
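The comparison step could look like the sketch below. It assumes the freshly downloaded Data Dictionary codes and the codes currently in the vocabulary have already been loaded into Python lists; the download itself is not shown, and the sample codes are illustrative. Keeping every code as a string is what preserves the leading zeros.

```python
from typing import Iterable, List


def normalize_code(raw) -> str:
    """Treat a NAACCR code as a string; strip only surrounding whitespace."""
    return str(raw).strip()


def find_missing_codes(reference_codes: Iterable, loaded_codes: Iterable) -> List[str]:
    """Return reference codes that are absent from the loaded vocabulary, compared as strings."""
    reference = {normalize_code(c) for c in reference_codes}
    loaded = {normalize_code(c) for c in loaded_codes}
    return sorted(reference - loaded)


# A code read as a number loses its leading zero: int("00") == 0, and str(0) == "0".
reference = ["00", "01", "82"]  # e.g., codes from a fresh Data Dictionary download
loaded = [0, 1, "82"]           # e.g., codes that were parsed as numbers somewhere upstream
print(find_missing_codes(reference, loaded))  # ['00', '01']
```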
Meeting Notes - 12/12/2019
Issues discussed:
- Issue # 73: The group decided to insert the treatment 'did not happen' concepts into the Observation table. If no date is present in the NAACCR data, the diagnosis date will be mapped to the observation_date field. Adam from MSK is going to check whether their NAACCR data has any dates related to treatment 'did not happen'.
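A minimal sketch of the date rule, assuming the NAACCR record has already been parsed into Python values. The dictionary keys follow OMOP CDM Observation table column names; the concept ID and source value in the example are hypothetical placeholders.

```python
from datetime import date
from typing import Optional


def did_not_happen_observation(
    person_id: int,
    observation_concept_id: int,
    source_value: str,
    treatment_date: Optional[date],
    diagnosis_date: date,
) -> dict:
    """Build an Observation row for a treatment 'did not happen' record.

    Falls back to the diagnosis date when the NAACCR data has no treatment date.
    """
    observation_date = treatment_date if treatment_date is not None else diagnosis_date
    return {
        "person_id": person_id,
        "observation_concept_id": observation_concept_id,
        "observation_date": observation_date,
        "observation_source_value": source_value,
    }


# Hypothetical concept ID and source value; no treatment date, so the diagnosis date is used.
row = did_not_happen_observation(1, 123456, "hypothetical_code", None, date(2019, 3, 1))
print(row["observation_date"])  # 2019-03-01
```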
- Issue # 221: Discuss the approach to handling duplicate concepts with the Campbell Brothers. Schedule a meeting with the Campbell Brothers after Christmas.
- Issue # 224: The task for this group is to come up with an approach for the two issues that we currently know exist (below):
(a) For duplicate entries across different schemas that mean the same thing (the same semantic concept) but are repeated for each diagnostic schema, should we pick one, map all the others to it, and make them non-standard, OR create a new standard concept and map all the existing concepts to it?
(b) For entries that are similar but have different degrees of granularity: in preparation for mapping to standards, we need to identify which are more granular and which are less granular and arrange them in a hierarchy accordingly.
For (a) above (example question: 'which organ did the tumor metastasis'), the recommendation is to create a new concept instead of arbitrarily picking one of the existing ones. The reason is that we ultimately expect to replace it with either a SNOMED or a LOINC concept, based on consensus building with the Campbell Brothers and possibly others, so the new concept is a placeholder for the future consensus standard concept. This approach also means no change to the ETL, and a newly created concept will be easy to replace. Set vocabulary_id to 'NAACCR extension' and concept_code to 'OMOP_surrogate' (OMOP followed by some random number).
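Option (a) can be sketched as below, assuming simplified stage-table rows (plain dictionaries) rather than the full vocabulary staging schema. The caller supplies the number appended to 'OMOP' (the notes only say "some random number"); passing it in keeps the example deterministic. The concept name and duplicate codes in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def make_surrogate_concept(
    concept_name: str,
    duplicate_codes: List[str],
    code_number: int,
    vocabulary_id: str = "NAACCR extension",
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Create a placeholder standard concept plus 'Maps to' links from each duplicate.

    The surrogate concept_code is 'OMOP' followed by code_number. The duplicates
    are expected to become non-standard once they map to the new concept.
    """
    concept_code = f"OMOP{code_number}"
    concept = {
        "concept_name": concept_name,
        "vocabulary_id": vocabulary_id,
        "concept_code": concept_code,
        "standard_concept": "S",
    }
    maps_to = [
        {"concept_code_1": dup, "relationship_id": "Maps to", "concept_code_2": concept_code}
        for dup in duplicate_codes
    ]
    return concept, maps_to


# Hypothetical duplicates of the same question in two diagnostic schemas.
concept, links = make_surrogate_concept("Site of metastasis", ["schema_a_code", "schema_b_code"], 1001)
print(concept["concept_code"], len(links))  # OMOP1001 2
```

Because only the surrogate concept is standard, replacing it later with a consensus SNOMED or LOINC concept means re-pointing the 'Maps to' links rather than changing the ETL.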
There is a follow-up to discuss with Parsa: we don't have a vocabulary for the relationship.
For (b) above (example question: 'did the tumor metastasis'), the question is whether we want to do anything to connect the two observations ('which organ did the tumor metastasis' and 'did the tumor metastasis'). Rimma has sent the list of pairs to Dima. The plan is to review these offline, understand the scope of the problem, and follow up in the next Vocabulary call, as this might affect the ETL.
Meeting Notes - 12/5/2019
Attendees: Dmytry Dymshyts; Rimma Belenkaya; Christian Reich; Williams, Andrew E; Reich, Christian; Smith, Daniel G.; Rasmus Peuliche Vogelsang
Key Points:
- Closed issues: Issue # 71, Issue # 67, Issue # 146, Issue # 193. Dima will close Issue # 193 after adding comments: based on the counts, the decision was made not to define a schema for NULL histology concepts and to populate only the non-specific schema.
- In-progress issues: Issue # 176: since a decision has been made on Issue # 193, Dima will release this change. Issue # 125: subtasks need to be added based on the Odysseus project. Dima and Rimma plan to discuss with the Vocabulary Subgroup next week some issues identified in the NAACCR de-duplication effort.
- The 'procedure end' relationship does not exist for the Procedure table (radiation and other high-level treatment entities in NAACCR); we only have it for Episode. The question is whether we should add this relationship so that therapies ingested into episodes can have an end date. Dima and Robert confirmed that this relationship already exists for radiation. Added to the backlog as Issue # 217 in case we make other discoveries in the future.
- Issue # 197: Besides genomics, we are interested in mCODE's choice of vocabulary. We may plug into those vocabularies when we map NAACCR to our chosen standard. Need to reach out to Andre.
- Issue # 179: IMO is reviewing the Nebraska Lexicon and will start by looking at breast cancer. Rimma will send Andy the NAACCR Data Dictionary for breast cancer.
- Issue # 207 this is in progress. This change does not affect ETL.
- Issue # 73: Treatment 'did not happen'. Rimma mentioned that the reason a treatment was stopped is of great interest to investigators, so these reasons need to be delineated very clearly. We need to document that whatever is not part of the Measurement domain will not be loaded by the ETL into the Observation table. Christian suggested we put the values as-is in Observation (00, 82, 85, 86, 87, 88, 99, etc. for RX HOSP-CHEMO), with the NAACCR value code as the source code and the NAACCR value as the observation concept; the variable name is not necessary. The short-term solution is to put these only in the Measurement table. In the long term (as part of NAACCR de-duplication), these variables will be taken out completely.
Meeting Notes - 11/21/2019
Attendees: Dmytry Dymshyts; Rimma Belenkaya; Christian Reich; Williams, Andrew E; Michael J Gurley; Reich, Christian; Smith, Daniel G.; Jiang, Renjian; Blacketer, Clair [JRDUS]; Etzioni, Ruth B
Key Points:
- Claire walked us through her approach to making the documentation user-friendly and professional-looking. Claire has placed the DDLs for the Episode and Episode Event tables, along with updates to the Modifier fields, in the branch below (SQL Server version): https://github.com/OHDSI/CommonDataModel/tree/Dev The information at this link needs to be reviewed so participants can follow it.
- The Oncology WG needs to document the overall OMOP Oncology Module, including all of its vocabulary components and their linkages to the vocabulary. Some documentation has been started but needs to be refined.
- The wiki linked below contains the documentation that was used for the Book of OHDSI. The plan is to add the Oncology CDM information under 'CDM Versions' in this wiki: https://github.com/OHDSI/CommonDataModel/wiki
- RMarkdown files (built with the render site function) are used to generate the actual documentation. The idea is to make a single change that feeds the DQ Dashboard, the DDL (https://github.com/ohdsi/DDLgeneratr), the Book of OHDSI, and the actual documentation. Next steps are to (a) set up a meeting with Claire so she can walk us through the process for making documentation updates using RMarkdown, YAML, etc., and (b) discuss resource allocation in the Leadership meeting on 11/26: (i) designating a resource to work on populating the documentation and (ii) scoping a resource specializing in content development.
- Ruth Etzioni presented the CISNET Prostate Cancer work. The Oncology WG will discuss Ruth's work and a possible collaboration in the Leadership meeting. The idea is that in the coming year, this could be one of the rapid response projects that receives funding from Ruth's organization.
Meeting Notes - 11/14/2019
Attendees: Rimma Belenkaya; Michael Gurley; Robert Miller; Yang, Qi; Smith, Daniel G.; Jiang, Renjian
- Closed issues Issue # 110 -> This discussion has been closed
- Dima is working on completing issues below: Issue # 67 - will have a development task. Michael and Robert will discuss what the Development decision will be based on the changes to vocabulary so the ETL can work with the changes. Issue # 146 Issue # 176 Issue # 71
- Issues in progress: Issue # 193 is in progress. Issue # 179 -> continue to follow up on this task, as it's needed to develop the national standards. Issue # 197 -> May from mCODE will join our Genomics meeting on 11/22 to present the Genomics part of the model. Rimma will reach out to Mark Cramer about the clinical model.
- Shilpa will schedule a kick-off call the week of 11/17 for the CAP and NAACCR standardization projects, which are being led by Olga Ganina. Issues related to the creation of a standard vocabulary, along with the outreach effort, will fall under this project. Michael has provided the limited mapping between CAP and NAACCR that CAP supplied.
- AJCC is not covered under the above project. In assessing how critical the AJCC staging system is to completing the milestone, Michael pointed out that a lot of codes are missing. As we continue to pursue AJCC, one option may be to obtain just the codes from them and hold off on the descriptions. Obtaining the descriptions might also mean a significant amount of vocabulary maintenance work, which we are not sure we are willing to take on at this point. Michael is going to compile all the missing values and distribute them so everyone can query their data and get a sense of how much data will be unusable. This will be discussed further with Christian in the Outreach call on 11/19.
- Issue # 200: All documentation-related tasks are critical and need to be published for users. In terms of ETL preparedness, Robert has pushed a version that addresses ambiguities and improves performance. Other specific tasks (e.g., dates) still need to be fixed, unit tests need to be written, etc. It was decided that documentation can start once the ETL is finalized. The team's recommendation to use documentation experts from the OHDSI community is currently being evaluated. Christian does not think that Mui and Meghan have the knowledge in this area or the availability to help with documentation. We will reach out to Claire to see if she can help, as she has experience turning the CDM documentation into a presentable, public-facing format. Rimma will help with the content.
- For the milestone 'running symposium plots with expanded participation', we need to decide what constitutes completion, including whether the CAP/NAACCR standardization work needs to be completed before this milestone can be considered complete.
- For the NAACCR de-duplication effort, the recommendation is to start with a use case and identify which variables need to be de-duplicated for that use case to run. The next step is to review the use cases in the Vocabulary Subgroup meeting.
Meeting Notes - 11/7/2019
Attendees: Rimma Belenkaya; Michael Gurley; Robert Miller; Yang, Qi; Smith, Daniel G.; Jiang, Renjian
- Closed issues: Issue # 113
- All ICDO-related issues will be completed on Monday: Issue # 67 - will have a development task. Michael and Robert will discuss what the development decision will be, based on the vocabulary changes, so that the ETL can work with them. Issue # 146 Issue # 176
- New task Issue # 200 has been added for Rimma and Dima to document the ICDO concept decisions and processing of the vital stats.
- Issue # 71 will be completed by Monday.
- Issue # 193 is in-progress
- Issue # 197 -> Andrea from MITRE is one of the contributors to mCODE development. Rimma suggested he be invited to the next Subgroup meeting to share an overview of the mCODE model. Rimma will outline questions for him, specifically about domains like outcomes, genomics, and their choice of vocabulary.
- Issue # 110 -> This discussion was complete and it can be closed. No vocabulary task is needed. A Development Issue # 111 is open to handle the ETL changes. A separate script will be created to handle this change.
- Issue # 179 -> continue to follow-up on this task as it's needed to develop the national standards.
Meeting Notes - 10/31/2019
Attendees: Rimma Belenkaya; Michael Gurley; Robert Miller; Yang, Qi; Reich, Christian
Key Points:
- Closed issues: Issue # 151 Issue # 105 Issue # 81
- Issue # 113 -> Dima's testing of the query revealed that we end up with several Value concepts for the same ICDO-naaccr_item-code combination, because the ICDO codes lead to different schemas. To avoid this, Dima will take all ICDO-to-schema assignments from the CS algorithm and take the ICDO codes missing from the CS algorithm from EOD. This way there won't be any more ambiguous codes besides the 13. Next steps will be to (1) replace and release a new version of the vocabulary with the duplicate issues resolved, (2) change the NAACCR ETL code to incorporate the table (Robert), and (3) have one NAACCR ETL code base and run it through SQLRender to generate the different dialects.
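The CS-first, EOD-fallback rule can be sketched as a dictionary merge, assuming both sources have already been loaded as mappings from an ICDO code to a set of schema names. The codes and schema names in the example are hypothetical. Reporting the codes that still map to several schemas is one way to check that only the expected ambiguous codes remain.

```python
from typing import Dict, List, Set, Tuple


def build_icdo_schema_map(
    cs_map: Dict[str, Set[str]], eod_map: Dict[str, Set[str]]
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Merge ICDO-to-schema assignments, preferring the CS algorithm.

    Every ICDO code present in cs_map keeps its CS schemas; codes missing from
    cs_map are taken from eod_map. Returns the merged map plus the codes that
    still point to more than one schema (i.e., remain ambiguous).
    """
    merged = {code: set(schemas) for code, schemas in cs_map.items()}
    for code, schemas in eod_map.items():
        if code not in merged:
            merged[code] = set(schemas)
    ambiguous = sorted(code for code, schemas in merged.items() if len(schemas) > 1)
    return merged, ambiguous


# Hypothetical inputs: icdo_1 is in both sources (CS wins); icdo_3 exists only in EOD.
cs = {"icdo_1": {"schema_breast"}, "icdo_2": {"schema_prostate"}}
eod = {"icdo_1": {"schema_other"}, "icdo_3": {"schema_colon", "schema_appendix"}}
merged, ambiguous = build_icdo_schema_map(cs, eod)
print(sorted(merged["icdo_1"]), ambiguous)  # ['schema_breast'] ['icdo_3']
```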
- Issue # 176 -> As Dima works through Issue # 113, he is discovering missing ICDO-to-schema relationships. He plans to use this issue to add the additional codes.
- Issue # 71 -> The team agreed that this task (one relationship) will be completed by Dima. We are essentially adding relationships that should have been there but were not explicitly expressed. The goal of this effort is to simplify the ETL and the understanding of the data. This is where we will draw the line; any other work related to relationships will be handled by the modeling team. Rimma is working on a poster on what we have done so far and will work on a paper introducing this work to the community, so we can engage them and receive input on how to approach it.
- Dima identified some known issues with relationship names being used out of context (e.g., 'has answer LOINC' is used for NAACCR codes). We should use relationships in context. This issue is being added to a parking-lot list of modeling issues.
- Following the completion of Issue # 71, the Development subgroup will work on Issue # 186 to implement the use of this new relationship in the ETL.
- Issue # 67 -> This will be fixed and will be released with the factored ICDO changes.
- Issue # 179 -> Rimma and Dima will meet to discuss the next steps for this.
- Issue # 110 -> Conversion of the NAACCR vital status variable into the proper place. Rimma will work on this as a modeling task. There was some discussion about whether we want to use NAACCR data (death, survival date, etc.) to enhance our OMOP data (observation period end, etc.). The group was leaning towards doing this as a secondary ETL if the enhancement is needed to obtain this additional information.
Meeting Notes - 10/24/2019
Attendees: Rimma Belenkaya; Michael Gurley; Robert Miller; Yang, Qi; Reich, Christian
Key Points:
- Closed issues: Issue # 94 Issue # 87 Issue # 85 Issue # 79 Issue # 78
- Issue # 113 Table for handling ambiguous ICDO site/histology combinations present in multiple schemas is done. Robert will send it to Dima and also point him to the code on the git. Robert and Dima's testing will determine if there are other schemas that are missed by the code. Dima can test against
- Issue # 177 has been assigned to Rimma and Dima to investigate how ambiguous ICDO site/histology combinations can be incorporated into the vocabulary instead of stand-alone tables.
- Issue # 151 is in progress. Dima will check if this is done.
- Issue # 105: the decision was made to close this task and create a backlog item for the second question (subtask) that was reported. For the backlog item, it was decided to make it non-standard in the Observation domain and without values.
- Issue # 81: the decision was made to make these non-standard in the Observation table.
- Issue # 71: a decision has not yet been made. Options under consideration include (1) letting Dima complete this task or (2) going through a proper modeling exercise. Rimma and Christian will talk offline.
Meeting Notes - 10/17/2019
Attendees: Michael Gurley; Robert Miller; Anastasios Siapos; Yang, Qi; Reich, Christian
Key Points:
- Andrew briefly mentioned the CD2H opportunity and a decision was made to address it in the Outreach meeting.
- Prioritized tasks related to Milestone #1 'Rerun symposium plots with expanded participation'.
- #11 is being put on-hold as this is low priority. This task is a pre-requisite to a development task to remove hard coded values.
- #49 is being put on-hold as this is low priority (Pending the ATC solution). This task is a pre-requisite to a development task to remove hard coded values.
- #78, #79, #82, #85, #87, #94 these tasks are prioritized higher and will be tackled next by Dima. Dima and Christian will work off-line on additional resources needed to accomplish these tasks.
- #113 is a high priority task assigned to Dima and will need additional resources. This task needs further clarification and discussion. Scope of this task is not yet completely defined. This task also has an associated Development task that needs to be completed after the vocabulary task.
- #110 will be next in priority after above tasks have been completed.
- #67 is ongoing.
Oncology Working Group Publications/Presentation
Data Model
- Cancer Models Representation
- EPISODE
- EPISODE_EVENT
- MEASUREMENT
- CONCEPT_NUMERIC
- Disease Episode Model
Vocabularies
OMOP Model
- Populating the OMOP Oncology Extension
- NAACCR Tumor Registry
- EHR and Claims