Store death causes in condition_occurrence table #210
Comments
So, what's your proposal?
That way, we know a person died, and the causes we can get by filtering for death related Condition Type Concepts. Sounds clean, with one exception: You need to read the Condition Type Concepts in order to decide they are causes of death. There is no modeling mechanism to make this clear. |
This sounds great, I have added it to the agenda for the December meeting |
@pavgra / @cgreich - great proposal, it will really make things much cleaner. Except for #2: why do we need to "move death_date and death_datetime to PERSON" if we can just use condition_start_date and condition_start_datetime in the CONDITION_OCCURRENCE table to specify when death occurred? Never mind that there can be multiple valid death records. With this approach, no modification to PERSON is needed at all. |
True. That would be even cleaner. Except that we really hate different death dates, because then the analytic has to pick the right one. Since this is an artifact of the data (unless you are a cat you can't die more than once) it should really be dealt with by the ETL. Let's see what the community has to say. |
I think this is a great discussion and I would like to bring it up at our 9/4 CDM meeting as it is a perfect candidate for CDM v6.0. Can you flesh this out into a full proposal? I have a link here for a proposal template - not all of it may apply since you are requesting to remove a table but it can at least get you started. |
from 9/4/ meeting, proposal:
The proper vocabulary needs to be specified on how to store cause of death, this would be: |
How would you store the death with no cause then? |
I think the idea is that for each record of death, there would be a record in CONDITION_OCCURRENCE for the death itself and then a record for the cause of death. @cgreich is that right? |
@clairblacketer, why would you need a separate record for the fact of death? can a death happen w/o a cause? |
Yes. No cause means you only have a record with SNOMED 4306655 Death. |
@cgreich, can a death happen w/o a cause? Why would you need records for both the fact of death and the cause of death, when the cause-of-death record already indicates the fact of death and does not require a duplicate record? |
I like how the proposal better incorporates cause of death, but I'm not sure I'm a fan of putting the death date in the condition occurrence as opposed to the person table. To me, this is the same as saying why don't we move the date of birth into the condition table and use the condition concept id 4083587 (DOB). |
A cause of death is a Condition. Usually one so bad that as a consequence the patient dies. But from a philosophical perspective such a cause of death is no different than any other condition. The death itself is also a condition, even though you are unlikely to recover from it (unless you are Jesus). So, putting both into the Condition table is legitimate. The problem now is how to connect the death to its cause. We could have used FACT_RELATIONSHIP (ugly), or the Condition Type (formerly Death Type). That's not 100% clean from a modelling perspective, but practicable. Why do we need the death record at all? Because we don't always have a cause. Actually, most of the time we don't. |
You are rehashing. We debated that up and down. You are correct, a birth date and a death date is a symmetrical feature of each person. But from our observational data these are very different. Here is why: the birth date, like the gender and race, are modeled as fixed entities. They don't change or happen over time. While death is like any other event table: It happens at some point in time. You could argue that:
|
For each death with no cause we need to put something like |
@cgreich, Anna brought up the point which I tried to formulate: there cannot be a death w/o a cause. At the least, the cause is unknown, but there is one. And then a separate, redundant record for death is not required; it's ambiguous. |
David, no death in the Person table. Christian wrote a whole book above explaining why :) I wouldn't say that we should make life harder for people and make them search across 2 tables. We'd rather change the domain, and everything will be in condition_occurrence |
I'm sorry I missed the call yesterday as I had a conflicting meeting. In PEDSnet, we adapted the death table to accommodate multiple causes of death. We added a "death_cause_id" as the primary key for the table. At the time, it was the least disruptive way to capture this data for the network (adding a column versus removing a table). We also use the "concept_class_id = 'Death Type'" convention for the death_type_concept_id, and our causes are SNOMED codes. It may be worth mentioning that we also added a "death_impute_concept_id" to the table, as we have cases where the date of death is estimated for various reasons. |
You are getting philosophical. You are saying there is no death without a cause? Meaning, life is the default, and no life is an exception, that only happens if there is a reason? I would claim the opposite, actually. :) Three reasons why we needn't enforce the cause of death:
|
Funny you mention that. The whole debate started with the proposal of adding another cause_of_death table, because in contrast to death itself there can be more than one cause. And then folks came up with the brilliant (in my mind) idea that we don't need any of these tables: Death is just another Condition event (or, some might argue, demographic fact), and cause of death is definitely a Condition event. So, why have this table at all? There are only two use cases you need the death table for: (i) mortality of an intervention and (ii) cause of death characterization. Either one can be done in the Condition table, no problem. Or do you have another use case? |
As you formalize the final guidance, please design Achilles Heel rules (a SQL query) and what you expect to see. One rule is - 'enforce single date of death' in CDM. (current rule is (if we expect one row only) |
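A minimal sketch of what an 'enforce single date of death' check could look like if death records live in CONDITION_OCCURRENCE. This is not an actual Achilles Heel rule: the cut-down schema, the sample rows, and the choice to flag on distinct dates are assumptions made for illustration, and Python's sqlite3 is used only so the query can be run as-is. The concept ID 4306655 is the SNOMED 'Death' concept cited in this thread.

```python
import sqlite3

# Hypothetical, cut-down CONDITION_OCCURRENCE with only the columns the check needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL,
    condition_concept_id INTEGER NOT NULL,
    condition_start_date TEXT NOT NULL
);
INSERT INTO condition_occurrence VALUES
    (1, 100, 4306655, '2018-03-01'),
    (2, 100, 4306655, '2018-03-04'),  -- conflicting second death date
    (3, 200, 4306655, '2018-05-10');
""")

DEATH_CONCEPT_ID = 4306655  # SNOMED 'Death', as cited in the thread

# Heel-style rule: list persons that have more than one distinct death date.
violations = conn.execute(
    """
    SELECT person_id, COUNT(DISTINCT condition_start_date) AS n_dates
    FROM condition_occurrence
    WHERE condition_concept_id = ?
    GROUP BY person_id
    HAVING COUNT(DISTINCT condition_start_date) > 1
    ORDER BY person_id
    """,
    (DEATH_CONCEPT_ID,),
).fetchall()

print(violations)  # an empty list would mean the rule passes
```

Counting distinct dates rather than rows means that several records agreeing on one date (for example, from different sources) would not trip the rule; whether that is the desired behavior is a convention the community would have to settle.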
@cgreich, no philosophy, just trying to turn things to a cleaner way. So why is a dedicated record for death in the conditions table going to be better than the solution proposed by @aostropolets?
If you use that approach, there remains one and only one way to describe deaths, and no need to repeat any data. |
I have decided to vote to support this proposal. I admit it wasn't something I was originally warm to, but as I mulled it over, I think I have convinced myself that this has strong conceptual and pragmatic arguments. I thought I would share my logic for those who still may also be on the fence:
So, the CDM has a few basic principles in its organization. One is that we only create domain tables if 1) the domain has a justified analytic use case, and 2) the domain has domain-specific attributes that are needed to support the analytic use case. When we initially designed DEATH it was because 1) we definitely want to do mortality studies, and 2) cause-of-death was a recognized domain-specific attribute that could support work. This proposal really starts with the experience of some data partners that there is not an explicit one-to-one relationship between death and cause-of-death, and we don't want 'information loss' by not preserving whatever cause-of-death source records exist. Given that we must move cause-of-death out of the DEATH table to somewhere else, that leaves the DEATH table without any domain-specific attributes, so that it now violates our basic principle. So, that leaves us with 2 decisions to make: where to put the 'cause-of-death' information, and where to put 'death' information.
I like the proposal of 'cause-of-death' being a CONDITION_OCCURRENCE. I think we do need to add a CONDITION_TYPE_CONCEPT_ID for 'cause of death', in case that's the explicit provenance. But inherently, causes of death are diseases. Recall that while ICD is now 'International Classification of Diseases', it started as 'International List of Causes of Death'. The fields currently in CONDITION_OCCURRENCE should nicely capture the information for 'cause-of-death', and the standard vocabulary concepts for the condition domain will give us complete coverage of what we need.
I also like the proposal for 'death' being a CONDITION_OCCURRENCE. This allows us to maintain provenance (CONDITION_TYPE_CONCEPT_ID can distinguish if it's a death record from hospital discharge vs. death registry record vs. patient status indicator). It also allows for using the same condition vocabulary concepts, which offer the possibility of providing greater granularity to the death, if it's available. One thing that I would argue we'd need to do as a community is make sure that we have the appropriate domains assigned to all relevant 'death' concepts in the vocabulary (e.g. currently, the SNOMED concept of 'death' has the domain of 'observation'). It may also be desirable to review the CONCEPT_ANCESTOR records around death, so that there is appropriate hierarchical coverage between the top parent of 'Death' and all the associated more specific flavors of death, but I don't see these relationships being complete at this point. Putting 'death' records in CONDITION_OCCURRENCE relaxes the criterion that there be only one death date, but provides extra flexibility in case an analyst wants to select death based on provenance or some other logic.
I initially thought of putting 'death date' in the PERSON table, but I think there are several good arguments against this counter-proposal: 1) this would mean we'd 'lose' the provenance of where the death record came from, 2) this would not provide the flexibility for multiple records, 3) revising a dataset with new information would require UPDATE TO PERSON rather than INSERT INTO CONDITION, which is slightly kludgier, and 4) finding death would be based on SELECT * FROM PERSON WHERE DEATH_DATE IS NOT NULL instead of SELECT * FROM CONDITION_OCCURRENCE WHERE CONDITION_CONCEPT_ID IN <concepts for 'death'> ...... (basically we'd be looking for the absence of death date, instead of the presence of death records).
If this proposal is ratified, I'd propose we create a conceptset for 'death' and cohort definition for 'persons with death' that can be readily re-used across the community of CDM v6-compliant datasets. That'll go a long way to operationalizing this proposal into a workable solution for the analytic use cases of interest. |
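The two lookups contrasted in the comment above can be tried side by side. The sketch below is illustrative only: the tables are cut down to a few columns, the rows are invented, and the condition type concept IDs 9001 and 9002 are hypothetical stand-ins for provenance values such as 'death registry' and 'hospital discharge'. Only 4306655 (SNOMED 'Death') comes from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, death_date TEXT);
CREATE TABLE condition_occurrence (
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_type_concept_id INTEGER,  -- provenance of the record
    condition_start_date TEXT
);
INSERT INTO person VALUES (1, '2018-01-05'), (2, NULL), (3, '2018-02-11');
INSERT INTO condition_occurrence VALUES
    (1, 4306655, 9001, '2018-01-05'),
    (3, 4306655, 9001, '2018-02-11'),
    (3, 4306655, 9002, '2018-02-12');  -- second source, different date
""")

DEATH_CONCEPTS = (4306655,)  # a one-element 'death' concept set
placeholders = ",".join("?" for _ in DEATH_CONCEPTS)

# Option A: death as a PERSON attribute.
via_person = [r[0] for r in conn.execute(
    "SELECT person_id FROM person WHERE death_date IS NOT NULL ORDER BY person_id"
)]

# Option B: death as an event record, selected through a concept set.
via_events = [r[0] for r in conn.execute(
    f"SELECT DISTINCT person_id FROM condition_occurrence "
    f"WHERE condition_concept_id IN ({placeholders}) ORDER BY person_id",
    DEATH_CONCEPTS,
)]

# Only option B keeps every source record and its provenance.
provenance = conn.execute(
    f"SELECT condition_type_concept_id, condition_start_date "
    f"FROM condition_occurrence "
    f"WHERE person_id = ? AND condition_concept_id IN ({placeholders}) "
    f"ORDER BY condition_start_date",
    (3, *DEATH_CONCEPTS),
).fetchall()

print(via_person, via_events, provenance)
```

Both options find the same persons here; the difference is that the event-table route still exposes the two conflicting records for person 3, which is the flexibility (and the burden on the analyst) that the comment describes.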
I can support the logic behind the proposal pretty readily. There are a couple places where I'm uncertain, and maybe just need clarification:
|
It seems to me that we started with the desire for multiple records in the death table with the undesirable consequence that the dates could conflict. And we ended up with multiple records in the condition table with the undesirable consequence that the dates could conflict. So we just got rid of a table with no other progress. When we have conflicting information about the birth date, we pick one and put it in the person table, and have the option to put the conflicting information into the observation table. That is, we usually force you to make it easier for the researcher, and come up with a work around to avoid losing information. Semantically, death is not a condition. E.g., what if the death condition is given an end date? Were they resuscitated, or is it an error, or do we ignore it. And what goes into the condition_era table for death? If anything, "life" would be the condition, with death as the end date, but that is not very usable. If cause of death is always a condition, then, yes, I would put it in the condition table with a death-cause type. (This makes sense if you ask the question, did the patient ever have such-and-such a condition, you would go to the condition table and the answer would be yes even if it killed them.) I would pick a death date and put it in the death table or in the person table. And if there is conflicting death information, like different dates, I would put that in the observation table. Then you can see the death table (or person table) as a derived table like condition_era. But you don't force every researcher to learn our death hierarchy and make value judgments, and infer death from the observation or condition table. So the proposal becomes keep the death table, pull cause of death out of the death table into the condition table, and set up the vocabulary so that conflicting death dates can go into the observation table. |
Wonderful. I call that "After Hours Stock Trading". Which means, the debate doesn't happen on the floor of the house (CDM WG regular meetings), but outside. I think we all agree that cause of death is just another Condition and close the discussion. Now the death itself: I think we also all agree that we don't need a separate death table to just hold the death date. Now the question is where should it go. From a philosophical perspective, you could claim that death is not a Condition, but a demographic fact, symmetrical to birth. Except, as discussed above, in our data birth is static (we don't have Persons that have yet to be born) and death is dynamic. You could also claim it is a Condition, the ultimate one. The fact that it has no end date (unless you want to apply Judgement Day when we all rise from the dead and meet our creator) it shares with all other acute or rapidly fatal Conditions. Think cardiac arrest or asphyxiation. I don't feel that strongly either way. Except if we put it into PERSON we have to make a change, and if we put it into Condition all we have to do is to add a convention. So, the proposal would be to kill the DEATH table, and merge the information into the Condition table, and do a good job in the documentation. |
hmmm... |
i feel somewhat strongly that death shouldn't be in PERSON, for the reasons
I outlined before: losing provenance, inconvenient updating, inconvenient
to find all deaths (need to search for date is not null).
george proposed that death may be more appropriate as OBSERVATION rather
than CONDITION, and Charlie made an argument for OBSERVATION also. these
both seem roughly equivalent to me, and since the vocabulary is supposed to
drive the domain choice of any concept, it seems 'death' concepts simply
need to be reconciled to go to the same domain. currently it looks to me
that most 'death' concepts are listed as observations, so purely on that
basis, i would be happy to keep that designation and go with george and
charlie's counterproposal. cause of death records would all still be
conditions. person would not change.
|
If we are going to put death in the condition or observation table, then we should prepare for it. New users will be looking for where death is stored, so we need to be able to point them to an explanation. We will see slightly different death algorithms in different studies: use the first or last date when there is more than one death date, which sources to trust the most, what to do when other data follow the death. Death, I guess, becomes a phenotype that needs to be programmed and evaluated. Does Atlas treat it like any other condition or observation, where you supply the concept set? Or does it supply a preferred phenotype? (I also wonder, if it is important for the user to pick his or her own algorithm to decide death, why not the same for birth and race and ethnicity, which also come with conflicting information and also change over time as we gather new information. I guess for those, we can store the conflicting information in the observation table.) Therefore, if death is stored in condition or observation, perhaps we should enforce a single preferred death row chosen by the implementers, with conflicting information stored with codes that indicate that the information is considered secondary. If there is only a secondary row and no preferred row, then the user knows that the implementers don't actually believe it; they are just revealing all they know but it is coming from an unreliable source (and perhaps other clinical data followed the supposed death). The code could be called, "the report of my death was an exaggeration," to quote Twain. |
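One way an implementer might choose the single preferred death row described above is sketched here. Everything in it is an assumption for illustration: the death_record table, the type concept IDs 9001 and 9002, the trust ranking, and the tie-break on earliest date are all hypothetical, and a real ETL would take its rules from the agreed conventions rather than from this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical staging table of every death record found in the source data.
CREATE TABLE death_record (
    person_id INTEGER,
    death_date TEXT,
    type_concept_id INTEGER
);
INSERT INTO death_record VALUES
    (1, '2018-03-04', 9002),
    (1, '2018-03-01', 9001),
    (2, '2018-05-10', 9002),
    (3, '2018-07-01', 9002),
    (3, '2018-06-30', 9002);
""")

# Hypothetical trust order: lower rank means a more trusted source.
TRUST_RANK = {9001: 0, 9002: 1}

rows = conn.execute(
    "SELECT person_id, death_date, type_concept_id FROM death_record"
).fetchall()

# Sort by person, then most trusted source, then earliest date
# (ISO date strings sort chronologically).
rows.sort(key=lambda r: (r[0], TRUST_RANK[r[2]], r[1]))

preferred = {}
for person_id, death_date, _type_id in rows:
    # The first row seen per person is the preferred one; later rows
    # are the 'secondary' records in the sense of the comment above.
    preferred.setdefault(person_id, death_date)

print(preferred)
```

Swapping the tie-break to the latest date, or changing the ranking, is a one-line change, which is the point of the comment: different studies would otherwise each make this choice on their own.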
It makes sense and I agree with that.
These should become new THEMIS business rules and be added to ACHILLES Heel and to our future "OMOP CDM Validator".
and I think George is bringing up another great point - either of the changes proposed above will have an impact on CDM and thus queries and will render previous versions incompatible (removing DEATH table, changing business logic...). Maybe I am stating the obvious, but incompatible changes like that should be released as a part of the major version release e.g. CDM 6.0. We had already said that ATLAS 3.0 will target CDM 6.0 - this will give us a chance to implement these in ATLAS 3.0 (and other tools and methods) right from the start while leaving CDM 5.x and ATLAS 2.x users unaffected. Actually, we should be careful with ensuring backward compatibility across not just OMOP CDM but all of our tools, but this is for a separate post... |
There is the right way and then the best way which seems to be the stage of decision making that we are at.
To thicken the plot I don't see why once we have a primary death we can't store that in the person table. Atlas and Achilles can always use observations but primary death does become static no? We can make a case for that. I am sure we can also make case that it would only complicate documentation and potentially result in invalid implementations.
While this is a large scale analysis stack, we do work to bring others on board, and sometimes making a seemingly simple concept like death more complex can cause resistance in the on-boarding process. Food for thought. Observations will clearly be where we put it :)
|
Friends: I am lost. What are we still debating, actually? If I understand correctly, we agreed on:
The only remaining discussion is about where to put the death date, and what to do if there are many.
After death the CDM will allow another 60 days of data to come in. That is also a THEMIS decision. |
At the risk of infuriating you Christian :) -
Dropping the death table is a change to the CDM I believe. Condition or observation - let the appropriate concept type drive the decision for consistency - seems logical and palatable and easy to explain. It seems unnatural to not have the option of adding death date to what is probably the person dimension? It's a fact and an attribute and how you model this depends on how you use it. Again, you are changing the CDM so a change should not be the reason. I am done pestering you.
Happy New Years to the community!
|
Ok to sum this up as I have been following along, @cgreich stated: We are agreed on:
As @gklebanov mentioned this is a huge change that breaks some backwards compatibility, which is why we are pushing it now so we can get it into CDM v6.0. Others around the community have mentioned their need to store multiple causes of death, which is where this idea originated. To @hripcsa's point, we do need to make the documentation clear on how this should be implemented moving forward. There is a THEMIS rule currently that allows for more than one row in the DEATH table to capture multiple causes of death but this proposal would negate that. The additional THEMIS rule about multiple deaths on different days states
Since we are already changing the concepts in the DEATH_TYPE vocabulary to fall under CONDITION_TYPE, I say we keep things (relatively) easy and use the SNOMED concept for death as it is currently - 4306655. This would put death dates in the OBSERVATION table and allow for multiple dates per person. Each study investigating death would then need to determine how they would choose a death date, which would negate part of the above THEMIS rule, though we can keep the convention to throw out death dates if patients have data 60+ days after death. Finally, to make it somewhat easier on new adopters of the CDM, I can keep the wiki page on DEATH active but detail what we have discussed here and what the new conventions are moving forward. |
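The convention mentioned above, discarding a death date when the person has data more than 60 days after it, could be checked along these lines. This is a sketch under assumptions: the death_obs and event tables are hypothetical simplifications (in a real CDM the later events would come from several clinical tables), and the comparison is written as strictly more than 60 days, which is one possible reading of "60+".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical simplifications: one table of death dates, one of other events.
CREATE TABLE death_obs (person_id INTEGER, death_date TEXT);
CREATE TABLE event (person_id INTEGER, event_date TEXT);
INSERT INTO death_obs VALUES
    (1, '2018-01-01'),
    (2, '2018-01-01'),
    (3, '2018-06-01');
INSERT INTO event VALUES
    (1, '2018-02-15'),  -- 45 days after death: tolerated
    (2, '2018-04-01'),  -- 90 days after death: death date is discarded
    (3, '2018-05-20');  -- before death: irrelevant
""")

GRACE_DAYS = 60  # from the THEMIS convention discussed in the thread

# Keep a death date only if no event falls more than GRACE_DAYS after it.
kept = conn.execute(
    """
    SELECT d.person_id, d.death_date
    FROM death_obs d
    WHERE NOT EXISTS (
        SELECT 1
        FROM event e
        WHERE e.person_id = d.person_id
          AND julianday(e.event_date) - julianday(d.death_date) > ?
    )
    ORDER BY d.person_id
    """,
    (GRACE_DAYS,),
).fetchall()

print(kept)
```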
It seems like a good opportunity to change the PERSON table to include the date of death - the most accurate one if more than one are available. The CDM is changing anyway, and awkwardly fumbling around in an OBSERVATION table for the primary outcome of most health research seems silly. |
Removal of DEATH Table
Proposal
Relevant tables:
After much discussion on the DEATH table, it seems like we are at a consensus that the DEATH table should be removed and causes of death should be stored in the CONDITION_OCCURRENCE table. We have been somewhat divided on where to store the date of death. Both the PERSON table and some event table (either CONDITION_OCCURRENCE or OBSERVATION) have been suggested. As a compromise, here is a possible solution:
Conventions
|
Thank you @clairblacketer for offering an explicit proposal to consider that provides a nice compromise from amongst the various perspectives that we've heard from across the community. We can maintain provenance of 'death observations' in the OBSERVATION table (which could allow for storage of multiple records that may exist), but the designation of the one death date in the PERSON table, so it seems we satisfy everyone's concerns. I support this proposal. I do recognize that we'll want to have some recommendations for analysts for the most effective way to create a 'cohort of persons who are dead', since this proposal will make for a couple different alternatives. But that seems like a problem that can be readily solved with conventions that the community can evolve. |
I agree. Thank you @clairblacketer. This is a nice compromise and I think makes the data model approachable for new users of OMOP. |
#210 DEATH table removed and cause of death now stored in CONDITION_OCCURRENCE Hi, @clairblacketer... I thought the death date will be added into the PERSON table as below... But I don't see this change made for CDM V6. Is this change coming soon? Thanks! *** A singular death date should be chosen by the ETL and stored in the PERSON table.
|
Hi @juh7007 I see that it was somehow missed in BigQuery but present in the other dialects. This is fixed now. Thanks! |
Thanks, @clairblacketer !!! |
Today OMOP's death table contains information about a person's death_date and a cause of death. Discussion in http://forums.ohdsi.org/t/condition-occurrence-death-diagnoses/2609/52 suggests that there may be a need to store several causes of death per person. This means a need to have multiple records. It's clear that just storing multiple records in the death table using its current structure is not the best idea: that would bring redundancy and possible ambiguity, because of multiple death dates per person. That's why it is reasonable to move the death date, since it's a one-time thing per person, into the person table, and leave the death table for just storing the causes of death. But if we go further, we can see that the resulting death table basically stores all the info present in a condition_occurrence table and, what is more, logically, a cause of death is a condition. That's why the second step of the proposal is to replace the death table by storing causes of death in the condition_occurrence table.