- ConceptQL Specification
- Motivation for ConceptQL
- ConceptQL Overview
- Selection Operators
- All Other Operators i.e. Mutation Operators
- Set Operators
- Time-oriented Operators
- Inline-Filter Operators
- Comparison Operators
label
Option and Its Features- Appendix A - Experimental Operators
- Appendix B - Algorithm Showcase
- Appendix C - History of ConceptQL and Its Evolution
ConceptQL (pronounced concept-Q-L) is a high-level language that allows researchers to unambiguously define their research algorithms. Over the last 7 years, it has been running in production as the heart of our study-building software, Jigsaw
In that time its implementation has evolved considerably, but it has remained consistent with its original goals.
Outcomes Insights has built a vast library of research algorithms and applied those algorithms to large databases of claims data. Early into building the library, we realized we had to overcome two major issues:
- Methods sections of research papers commonly use natural language to specify the criteria used to build cohorts from a claims database.
- Algorithms defined in natural language are often imprecise, open to multiple interpretations, and generally difficult to reproduce.
- Researchers could benefit from a language that removes the ambiguity of natural language while increasing the reproducibility of their research algorithms.
- Querying against claims databases is often difficult.
- Hand-coding algorithms to extract cohorts from datasets is time-consuming, error-prone, and opaque.
- Researchers could benefit from a language that allows algorithms to be defined at a high-level and then gets translated into the appropriate queries against a database.
We developed ConceptQL to address these two issues.
We have written a tool that can read research algorithms defined in ConceptQL. The tool can create a diagram for the algorithm which makes it easy to visualize and understand. The tool can also translate the algorithm into a SQL query which runs against data structured in the Generalized Data Model (GDM). The purpose of the GDM is to standardize the format and content of observational data, so standardized applications, tools and methods can be applied to them.
For instance, using ConceptQL we can take a statement that looks like this:
[ "icd9", "412" ]
And generate a diagram that looks like this:
Example 1
["icd9","412"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
230 | 523 | condition_occurrence | 2008-03-14 | 2008-03-21 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-09 | 412 |
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
220 | 20660 | condition_occurrence | 2009-10-31 | 2009-10-31 | 412 |
And generate SQL that looks like this:
SELECT *
FROM gdm_data.clinical_codes AS cc
WHERE cc.clinical_code_concept_id IN (
SELECT id
FROM concepts
WHERE vocabulary_id = 'ICD9CM'
AND concept_code = '412'
)
As stated above, one of the goals of ConceptQL is to make it easy to assemble fairly complex queries without having to roll up our sleeves and write raw SQL. To accommodate this complexity, ConceptQL itself has some complexities of its own. That said, we believe ConceptQL will help researchers define, hone, and share their research algorithms.
Perhaps seeing an example will be the quickest way to get a sense of a language. Here is a trivial example to whet your appetite. The example is in YAML, but could just as easily be in JSON or any other markup language capable of representing nested sets of heterogeneous arrays and hashes. In fact, the ConceptQL "language" is a just set of nested arrays and hashes. If you squint, it might even look a bit like Lisp. Indeed, the nested arrays are S-expressions.
ConceptQL is a domain-specific, declarative "language" that represents search criteria with some set operations and temporal operations to glue those criteria together.
# Example 1: A simple example in YAML
# This is just a simple array with the first element being 'icd9cm' and the second being '412'
# This example will search the clinical_codes table for all conditions that match the ICD-9 diagnosis code 412.
---
- icd9cm
- 412
Reading ConceptQL in YAML or JSON can be difficult. It is often easier to explore ConceptQL using directed graphs. For instance, the diagram for the simple example listed in YAML above is:
Example 2 - All Conditions Matching MI
["icd9","412"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
230 | 523 | condition_occurrence | 2008-03-14 | 2008-03-21 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-09 | 412 |
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
220 | 20660 | condition_occurrence | 2009-10-31 | 2009-10-31 | 412 |
Each oval depicts an "operator", or rather, a ConceptQL expression. An arrow between a pair of operators indicates that the records from the operator on the tail of the arrow pass on to the operator at the head of the arrow. A simple example should help here:
Example 3 - First Office Visit Per Patient
["first",["cpt","99214"]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
2 | 58127 | procedure_occurrence | 2009-07-03 | 2009-07-03 | 99214 |
5 | 38313 | procedure_occurrence | 2008-12-14 | 2008-12-14 | 99214 |
6 | 48992 | procedure_occurrence | 2009-09-12 | 2009-09-12 | 99214 |
7 | 63797 | procedure_occurrence | 2008-08-13 | 2008-08-13 | 99214 |
8 | 55871 | procedure_occurrence | 2008-02-27 | 2008-02-27 | 99214 |
9 | 45498 | procedure_occurrence | 2009-09-11 | 2009-09-11 | 99214 |
10 | 48752 | procedure_occurrence | 2009-05-09 | 2009-05-09 | 99214 |
11 | 73107 | procedure_occurrence | 2008-06-17 | 2008-06-17 | 99214 |
12 | 71635 | procedure_occurrence | 2008-02-11 | 2008-02-11 | 99214 |
13 | 74458 | procedure_occurrence | 2008-01-24 | 2008-01-24 | 99214 |
The diagram above reads "get all procedures that match the CPT 99214 (Office Visit) and then filter them down to the first occurrence (by date) for each person". More succinctly, "get the first office visit for each patient". The diagram is much more terse than that and to accurately read the diagram, you need a lot of implicit knowledge about how each operator operates. Fortunately, this document will impart that knowledge to you.
Please note that all of the diagrams end with an arrow pointing at nothing. We'll explain why soon.
ConceptQL diagrams have the "leaf" operators at the top and "trunk" operators at the bottom. You can think of the records of a ConceptQL statement as a flowing stream of data. The leaf operators, or operators that gather records out of the database, act like tributaries. The records flow downwards and either join with other records, or filter out other records until the streams emerge at the bottom of the diagram. Think of each arrow as a stream of records, flowing down through one operator to the next.
The trailing arrow in the diagrams serves as a reminder that every ConceptQL statement yields a stream of records.
You might have noticed that the operators and edges in the diagrams often have a color. That color represents what "type" of stream the operator or edge represents. There are several types in ConceptQL, and you'll notice they are somewhat correlated with the tables found in OMOP's CDM v4.0:
- condition_occurrence
- red
- death
- brown
- drug_exposure
- purple
- person
- blue
- procedure_occurrence
- green
Each stream has a vocabulary of origin (essentially, the vocabulary used to pull records from the GDM database). Based on that origin, each stream will have a particular type. The stream carries this type information as it moves through each operator. When certain operators, particularly set and temporal operators, need to perform filtering, they can use this type information to determine how to best filter a stream. There will be much more discussion about types woven throughout this document. For now, it is sufficient to know that each stream has a type.
You'll also notice that the trailing arrow(s) at the end of the diagrams indicate which types of streams are ultimately passed on at the end of a ConceptQL statement.
At its inception, ConceptQL was developed to query OMOP's CDM v4.0 and its associated tables. Each table represented a certain "type" of data (e.g. condition_occurrence, procedure_occurrence, visit_occurrence, etc) and ConceptQL was designed to treat each type of data as distinct from the other types.
Now that ConceptQL runs against the GDM, types are far less important, but are still maintained to help users distinguish between the kinds of data a ConceptQL statement is querying.
Though thinking in "streams" is helpful, on a few occasions we need to know what's going on under the hood.
Every table in the GDM structure has a surrogate key column (an ID column). When we execute a ConceptQL statement, the "streams" that are generated by the statement are just sets of these IDs for rows that matched the ConceptQL criteria. So each stream is just a set of IDs that point back to some rows in one of the GDM tables. When a stream has a "type" it is really just that the stream contains records associated with its vocabulary of origin.
So when we execute this ConceptQL statement, the resulting "stream" is all IDs from the patients
table where the patient is male:
Example 4 - All Male Patients
["gender","Male"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
1 | 1 | person | 1923-05-01 | 1923-05-01 | 00013D2EFD8E45D1 |
2 | 2 | person | 1943-01-01 | 1943-01-01 | 00016F745862898F |
4 | 4 | person | 1941-06-01 | 1941-06-01 | 00021CA6FF03E670 |
5 | 5 | person | 1936-08-01 | 1936-08-01 | 00024B3D2352D2D0 |
6 | 6 | person | 1943-10-01 | 1943-10-01 | 0002DAE1C81CC70D |
7 | 7 | person | 1922-07-01 | 1922-07-01 | 0002F28CE057345B |
8 | 8 | person | 1935-09-01 | 1935-09-01 | 000308435E3E5B76 |
12 | 12 | person | 1929-06-01 | 1929-06-01 | 00048EF1F4791C68 |
14 | 14 | person | 1934-05-01 | 1934-05-01 | 00052705243EA128 |
16 | 16 | person | 1934-01-01 | 1934-01-01 | 0007E57CC13CE880 |
When we execute this ConceptQL statement, the resulting "stream" is all clinical_codes
IDs that match ICD-9CM 250.01:
Example 5 - All Condition Occurrences that match ICD-9CM 250.01
["icd9","250.01"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
136 | 15609 | condition_occurrence | 2009-04-06 | 2009-04-06 | 25001 |
131 | 17030 | condition_occurrence | 2010-10-05 | 2010-10-05 | 25001 |
131 | 18099 | condition_occurrence | 2010-02-05 | 2010-02-05 | 25001 |
82 | 22080 | condition_occurrence | 2008-02-13 | 2008-02-13 | 25001 |
137 | 23953 | condition_occurrence | 2008-10-02 | 2008-10-02 | 25001 |
249 | 25525 | condition_occurrence | 2009-05-27 | 2009-05-27 | 25001 |
83 | 27582 | condition_occurrence | 2009-04-29 | 2009-04-29 | 25001 |
14 | 28741 | condition_occurrence | 2009-06-27 | 2009-06-27 | 25001 |
17 | 30214 | condition_occurrence | 2010-03-21 | 2010-03-21 | 25001 |
187 | 31594 | condition_occurrence | 2009-11-16 | 2009-11-16 | 25001 |
Generally, it is helpful to just think of those queries generating a "stream of people" or a "stream of conditions" and not worry about the table of origin or the fact that they are just IDs.
When a ConceptQL statement is executed, it yields a final set of streams that are just all the IDs that passed through all the criteria. What is done with that set of IDs is up to the user who assembled the ConceptQL statement. If a user gathers all 250.01 Conditions, they will end up with a set of clinical_codes
IDs. They could take those IDs and do all sorts of things like:
- Gather the first and last date of occurrence per person
- Count the number of occurrences per person
- Count number of persons with the condition
- Count the total number of occurrences for the entire population
This kind of aggregation and analysis is beyond the scope of ConceptQL. ConceptQL will get you the IDs of the rows you're interested in, it's up to other parts of the calling system to determine what you do with them.
Selection operators are search for specific records within the data, e.g. searching for a diagnosis of an old myocardial infarction (ICD-9-CM 412) is a selection. Selection operators are always leaf operators, meaning no operators "feed" into a selection operator.
There are several basic types of selection operators.
A Vocabulary
operator is any selection operator that selects records based on codes from a specific vocabulary. Below is a list of the most common vocabulary operators available in ConceptQL:
Operator Name | Stream Type | Arguments | Returns |
---|---|---|---|
cpt4 | procedure_occurrence | 1 or more CPT codes | All records whose source_value match any of the CPT codes |
icd9cm | condition_occurrence | 1 or more ICD-9CM codes | All records whose source_value match any of the ICD-9 codes |
icd9_procedure | procedure_occurrence | 1 or more ICD-9 procedure codes | All records whose source_value match any of the ICD-9 procedure codes |
icd10cm | condition_occurrence | 1 or more ICD-10 | All records whose source_value match any of the ICD-10 codes |
hcpcs | procedure_occurrence | 1 or more HCPCS codes | All records whose source_value match any of the HCPCS codes |
loinc | observation | 1 or more LOINC codes | All records whose source_value match any of the LOINC codes |
rxnorm | drug_exposure | 1 or more RxNorm IDs | All records whose drug_concept_id match any of the RxNorm IDs |
snomed | condition_occurrence | 1 or more SNOMED codes | All records whose source_value match any of the SNOMED codes |
Person operators generate person records, or records that are derived from the table containing patient demographics. The start_date and end_date for a person-based record is the patient's birth date, as explained in more detail in temporal operators and person streams.
This person operator selects people by gender. Currently, available genders are Male, Female, or Unknown.
Example 6 - Gathers all female person records
["gender","Female"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
3 | 3 | person | 1936-09-01 | 1936-09-01 | 0001FDD721E223DC |
9 | 9 | person | 1976-09-01 | 1976-09-01 | 000345A39D4157C9 |
10 | 10 | person | 1938-10-01 | 1938-10-01 | 00036A21B65B0206 |
11 | 11 | person | 1934-02-01 | 1934-02-01 | 000489E7EAAD463F |
13 | 13 | person | 1936-07-01 | 1936-07-01 | 0004F0ABD505251D |
15 | 15 | person | 1936-03-01 | 1936-03-01 | 00070B63745BE497 |
17 | 17 | person | 1919-09-01 | 1919-09-01 | 0007F12A492FD25D |
18 | 18 | person | 1919-10-01 | 1919-10-01 | 000A005BA0BED3EA |
19 | 19 | person | 1942-07-01 | 1942-07-01 | 000B4662348C35B4 |
26 | 26 | person | 1939-12-01 | 1939-12-01 | 0010D6F80D245D62 |
This person operator selects people by race. Available races are defined in the Race vocabulary.
Example 7 - Gathers all persons indicated as white
["race","White"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
1 | 1 | person | 1923-05-01 | 1923-05-01 | 00013D2EFD8E45D1 |
2 | 2 | person | 1943-01-01 | 1943-01-01 | 00016F745862898F |
3 | 3 | person | 1936-09-01 | 1936-09-01 | 0001FDD721E223DC |
5 | 5 | person | 1936-08-01 | 1936-08-01 | 00024B3D2352D2D0 |
7 | 7 | person | 1922-07-01 | 1922-07-01 | 0002F28CE057345B |
8 | 8 | person | 1935-09-01 | 1935-09-01 | 000308435E3E5B76 |
9 | 9 | person | 1976-09-01 | 1976-09-01 | 000345A39D4157C9 |
11 | 11 | person | 1934-02-01 | 1934-02-01 | 000489E7EAAD463F |
12 | 12 | person | 1929-06-01 | 1929-06-01 | 00048EF1F4791C68 |
13 | 13 | person | 1936-07-01 | 1936-07-01 | 0004F0ABD505251D |
This person operator selects people by ethnicity. Available ethnicities are defined in the Ethnicity vocabulary.
Example 8 - Gathers all persons indicated as Hispanic
["ethnicity","Hispanic"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
4 | 4 | person | 1941-06-01 | 1941-06-01 | 00021CA6FF03E670 |
22 | 22 | person | 1942-08-01 | 1942-08-01 | 000D6D88463D8A76 |
72 | 72 | person | 1933-04-01 | 1933-04-01 | 002354398A00234E |
76 | 76 | person | 1938-03-01 | 1938-03-01 | 00242FF5E7A3F2D9 |
181 | 181 | person | 1939-11-01 | 1939-11-01 | 005E0AB5172E715F |
This person operator selects all patient records.
Example 9 - Gathers all persons indicated as Hispanic
["person"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
1 | 1 | person | 1923-05-01 | 1923-05-01 | 00013D2EFD8E45D1 |
2 | 2 | person | 1943-01-01 | 1943-01-01 | 00016F745862898F |
3 | 3 | person | 1936-09-01 | 1936-09-01 | 0001FDD721E223DC |
4 | 4 | person | 1941-06-01 | 1941-06-01 | 00021CA6FF03E670 |
5 | 5 | person | 1936-08-01 | 1936-08-01 | 00024B3D2352D2D0 |
6 | 6 | person | 1943-10-01 | 1943-10-01 | 0002DAE1C81CC70D |
7 | 7 | person | 1922-07-01 | 1922-07-01 | 0002F28CE057345B |
8 | 8 | person | 1935-09-01 | 1935-09-01 | 000308435E3E5B76 |
9 | 9 | person | 1976-09-01 | 1976-09-01 | 000345A39D4157C9 |
10 | 10 | person | 1938-10-01 | 1938-10-01 | 00036A21B65B0206 |
This operator pulls all death records from the death
table.
Example 10
["death"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
16 | 1 | death | 2010-12-01 | 2010-12-01 | |
26 | 2 | death | 2009-07-01 | 2009-07-01 | |
65 | 3 | death | 2009-01-01 | 2009-01-01 | |
119 | 4 | death | 2008-09-01 | 2008-09-01 | |
166 | 5 | death | 2008-07-01 | 2008-07-01 | |
181 | 6 | death | 2010-07-01 | 2010-07-01 | |
190 | 7 | death | 2009-09-01 | 2009-09-01 | |
191 | 8 | death | 2009-09-01 | 2009-09-01 | |
201 | 9 | death | 2008-05-01 | 2008-05-01 | |
215 | 10 | death | 2008-01-01 | 2008-01-01 |
This operator pulls all information period records from the information_periods
table.
Example 11
["information_periods"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
1 | 1 | observation_period | 2008-01-01 | 2010-12-31 | |
1 | 2 | observation_period | 2008-01-01 | 2010-12-31 | |
1 | 3 | observation_period | 2008-01-01 | 2010-12-31 | |
1 | 4 | observation_period | 2008-01-01 | 2010-12-31 | |
2 | 5 | observation_period | 2008-01-01 | 2010-12-31 | |
2 | 6 | observation_period | 2008-01-01 | 2010-12-31 | |
2 | 7 | observation_period | 2009-01-01 | 2010-12-31 | |
2 | 793 | observation_period | 2008-01-01 | 2010-12-31 | |
3 | 8 | observation_period | 2008-01-01 | 2010-12-31 | |
3 | 9 | observation_period | 2008-01-01 | 2010-12-31 |
Utilization operators are selection operators that generate records related to admissions.
This utilization operator generates records that are associated with Skilled Nursing Facilities (SNF).
Example 12
["snf"]
No Results found.
This utilization operator generates records that are associated with Hospice Facilities.
Example 13
["hospice"]
No Results found.
This utilization operator generates records that are associated with Hospitalizations.
Example 14
["hospitalization"]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
134 | 1 | condition_occurrence | 2009-11-23 | 2009-11-30 | |
247 | 2 | condition_occurrence | 2008-02-28 | 2008-03-04 | |
68 | 3 | condition_occurrence | 2008-04-19 | 2008-04-23 | |
247 | 4 | condition_occurrence | 2008-03-25 | 2008-03-29 | |
66 | 5 | condition_occurrence | 2009-08-04 | 2009-08-06 | |
95 | 6 | condition_occurrence | 2008-09-03 | 2008-09-12 | |
247 | 7 | condition_occurrence | 2008-08-14 | 2008-08-21 | |
202 | 8 | condition_occurrence | 2008-08-18 | 2008-08-20 | |
247 | 9 | condition_occurrence | 2008-07-13 | 2008-07-18 | |
153 | 10 | condition_occurrence | 2010-01-29 | 2010-02-08 |
Virtually all other operators add, remove, filter, or otherwise alter streams of records. They are discussed in this section.
Because streams represent sets of records, it makes sense to include operators that operate on sets
- Takes any number of upstream operators and aggregates their streams
- Unions together streams with identical types
- Think of streams with the same type flowing together into a single stream
- We're really just gathering the union of all IDs for identically-typed streams
- Streams with the different types flow along together concurrently without interacting
- It does not make sense to union, say, condition_occurrences with procedure_occurrences, so streams with different types won't mingle together, but will continue to flow downstream in parallel
- Unions together streams with identical types
Example 15 - Two streams of the same type (condition_occurrence) joined into a single stream
["union",["icd9","412"],["icd9","250.01"]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
230 | 523 | condition_occurrence | 2008-03-14 | 2008-03-21 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-09 | 412 |
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
136 | 15609 | condition_occurrence | 2009-04-06 | 2009-04-06 | 25001 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
Example 16 - Two streams of the same type (condition_occurrence) joined into a single stream, then a different stream (procedure_occurrence) flows concurrently
["union",["union",["icd9","412"],["icd9","250.01"]],["cpt","99214"]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
230 | 523 | condition_occurrence | 2008-03-14 | 2008-03-21 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-09 | 412 |
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
136 | 15609 | condition_occurrence | 2009-04-06 | 2009-04-06 | 25001 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
Example 17 - Two streams of the same type (condition_occurrence) joined into a single stream, along with a different stream (procedure_occurrence) flows concurrently (same as above example)
["union",["icd9","412"],["icd9","250.01"],["cpt","99214"]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
230 | 523 | condition_occurrence | 2008-03-14 | 2008-03-21 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-09 | 412 |
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
136 | 15609 | condition_occurrence | 2009-04-06 | 2009-04-06 | 25001 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
This operator takes two sets of incoming streams, a left-hand stream and a right-hand stream. The operator matches like-type streams between the left-hand and right-hand streams. The operator removes any records in the left-hand stream if they appear in the right-hand stream. The operator passes only records for the left-hand stream downstream. The operator discards all records in the right-hand stream. For example:
Example 18 - All males who are not white
["except",{"left":["gender","male"],"right":["race","white"]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
4 | 4 | person | 1941-06-01 | 1941-06-01 | 00021CA6FF03E670 |
6 | 6 | person | 1943-10-01 | 1943-10-01 | 0002DAE1C81CC70D |
21 | 21 | person | 1932-08-01 | 1932-08-01 | 000C7486B11E7030 |
22 | 22 | person | 1942-08-01 | 1942-08-01 | 000D6D88463D8A76 |
50 | 50 | person | 1941-08-01 | 1941-08-01 | 001AEDD510C92C87 |
68 | 68 | person | 1954-11-01 | 1954-11-01 | 0021D4CDAFC0609F |
69 | 69 | person | 1925-05-01 | 1925-05-01 | 00225409819CF5F6 |
71 | 71 | person | 1931-01-01 | 1931-01-01 | 0022CC1943D038E2 |
72 | 72 | person | 1933-04-01 | 1933-04-01 | 002354398A00234E |
75 | 75 | person | 1932-08-01 | 1932-08-01 | 00241223F034F97B |
If the left-hand stream has no types that match the right-hand stream, the left-hand stream passes through unaffected:
Example 19 - All Conditions that are MI
["except",{"left":["icd9","412"],"right":["cpt","99214"]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
230 | 523 | condition_occurrence | 2008-03-14 | 2008-03-21 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-09 | 412 |
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
220 | 20660 | condition_occurrence | 2009-10-31 | 2009-10-31 | 412 |
And just to show how multiple streams behave:
Example 20 - Passes two streams downstream: a stream of Conditions that are MI (this stream is completely unaffected by the right hand stream) and a stream of People that are Male but not White
["except",{"left":["union",["icd9","412"],["gender","Male"]],"right":["race","White"]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
4 | 4 | person | 1941-06-01 | 1941-06-01 | 00021CA6FF03E670 |
6 | 6 | person | 1943-10-01 | 1943-10-01 | 0002DAE1C81CC70D |
21 | 21 | person | 1932-08-01 | 1932-08-01 | 000C7486B11E7030 |
22 | 22 | person | 1942-08-01 | 1942-08-01 | 000D6D88463D8A76 |
50 | 50 | person | 1941-08-01 | 1941-08-01 | 001AEDD510C92C87 |
68 | 68 | person | 1954-11-01 | 1954-11-01 | 0021D4CDAFC0609F |
69 | 69 | person | 1925-05-01 | 1925-05-01 | 00225409819CF5F6 |
71 | 71 | person | 1931-01-01 | 1931-01-01 | 0022CC1943D038E2 |
72 | 72 | person | 1933-04-01 | 1933-04-01 | 002354398A00234E |
75 | 75 | person | 1932-08-01 | 1932-08-01 | 00241223F034F97B |
Why should we allow two different types of streams to continue downstream concurrently?
This feature lets us do interesting things, like find the first occurrence of either an MI or Death as in the example below. Throw in a few more criteria and you could find the first occurrence of all censoring events for each patient
Example 21 - First occurrence of either MI or Death for each patient
["first",["union",["icd9","412"],["death"]]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
11 | 82614 | condition_occurrence | 2008-10-29 | 2008-10-29 | 412 |
16 | 1 | death | 2010-12-01 | 2010-12-01 | |
17 | 27269 | condition_occurrence | 2008-08-25 | 2008-08-25 | 412 |
26 | 2 | death | 2009-07-01 | 2009-07-01 | |
38 | 40556 | condition_occurrence | 2010-02-12 | 2010-02-12 | 412 |
54 | 74606 | condition_occurrence | 2008-06-05 | 2008-06-05 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
65 | 3 | death | 2009-01-01 | 2009-01-01 | |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
73 | 52843 | condition_occurrence | 2008-11-16 | 2008-11-16 | 412 |
All records in a stream carry a start_date and end_date with them. All temporal comparisons of streams use these two date columns. Each record in a stream derives its start and end date from its corresponding row in its table of origin.
If a record comes from a table that only has a single date value, the record derives both its start_date and end_date from that single date, e.g. a death record derives both its start_date and end_date from its corresponding row's observation_date.
The person stream is a special case. Person records use the person's date of birth as the start_date and end_date. This may sound strange, but we will explain below why this can be useful.
When looking at a set of records for a person, perhaps we want to select just the chronologically first or last record. Or maybe we want to select the 2nd record or 2nd to last record. Relative temporal operators provide this type of filtering. Relative temporal operators use a record's start_date to determine chronological order.
- Takes a two arguments: the stream to select from and an integer argument
- For the integer argument
- Positive numbers mean 1st, 2nd, 3rd occurrence in chronological order
- e.g. 1 => first
- e.g. 4 => fourth
- Negative numbers mean 1st, 2nd, 3rd occurrence in reverse chronological order
- e.g. -1 => last
- e.g. -4 => fourth from last
- 0 is undefined?
- Positive numbers mean 1st, 2nd, 3rd occurrence in chronological order
- Has an optional parameter:
unique
- Setting the
unique
parameter totrue
will de-duplicate all upstream records bypatient_id
,domain
, andsource_code
, creating a set of unique codes per patient. Then it will find the Nth occurrence from that set of unique codes.
- Setting the
Example 22 - For each patient, select the Condition that represents the second occurrence of an MI
["occurrence",2,["icd9","412"]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
17 | 46165 | condition_occurrence | 2009-04-30 | 2009-04-30 | 412 |
88 | 73803 | condition_occurrence | 2009-05-30 | 2009-05-30 | 412 |
152 | 84298 | condition_occurrence | 2010-11-22 | 2010-11-22 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
187 | 87583 | condition_occurrence | 2010-12-26 | 2010-12-26 | 412 |
212 | 34052 | condition_occurrence | 2009-10-07 | 2009-10-07 | 412 |
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
214 | 83257 | condition_occurrence | 2009-11-12 | 2009-11-12 | 412 |
220 | 20660 | condition_occurrence | 2009-10-31 | 2009-10-31 | 412 |
223 | 39893 | condition_occurrence | 2008-10-31 | 2008-10-31 | 412 |
In the example shown below, the unique
parameter under the Nth Occurrence
operator in the algorithm will de-duplicate all upstream records and create a new set of records limited to the first 410.00
, the first 410.01
, the first 250.00
, and the first 250.01
ICD-9 codes for a patient. The occurrence
parameter will find the nth of those unique codes. In the example below, the algorithm will find the second of three unique codes for each patient. This algorithm ensures that this patient has at least two of the ICD-9 codes listed in the ICD-9 operator. As a reminder, the First
, Last
, and Nth Occurrence
operators return only one record per patient. If there are multiple records with the same date that meet the requirements of the First
, Last
, and Nth Occurrence
, an arbitrary row is returned.
Example 23
["occurrence",2,["icd9","410.00","410.01","250.00","250.01"],{"unique":true}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
2 | 57897 | condition_occurrence | 2009-05-16 | 2009-05-16 | 25000 |
7 | 69706 | condition_occurrence | 2008-05-12 | 2008-05-12 | 25000 |
8 | 90541 | condition_occurrence | 2009-03-01 | 2009-03-03 | 25000 |
9 | 84726 | condition_occurrence | 2010-09-12 | 2010-09-12 | 25000 |
11 | 35227 | condition_occurrence | 2008-12-08 | 2008-12-08 | 25000 |
12 | 43800 | condition_occurrence | 2009-05-21 | 2009-05-21 | 25000 |
13 | 57738 | condition_occurrence | 2008-11-22 | 2008-11-22 | 25000 |
14 | 28741 | condition_occurrence | 2009-06-27 | 2009-06-27 | 25001 |
15 | 68873 | condition_occurrence | 2010-11-18 | 2010-11-18 | 25000 |
17 | 73012 | condition_occurrence | 2008-04-26 | 2008-04-26 | 25000 |
- Nth Occurrence Operator that is shorthand for writing
[ "occurrence", 1 ]
Example 24 - For each patient, select the Condition that represents the first occurrence of an MI
["first",["icd9","412"]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
11 | 82614 | condition_occurrence | 2008-10-29 | 2008-10-29 | 412 |
17 | 27269 | condition_occurrence | 2008-08-25 | 2008-08-25 | 412 |
38 | 40556 | condition_occurrence | 2010-02-12 | 2010-02-12 | 412 |
54 | 74606 | condition_occurrence | 2008-06-05 | 2008-06-05 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
73 | 52843 | condition_occurrence | 2008-11-16 | 2008-11-16 | 412 |
77 | 58165 | condition_occurrence | 2010-10-06 | 2010-10-06 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-09 | 412 |
- Nth Occurrence Operator that is shorthand for writing
[ "occurrence", -1 ]
Example 25 - For each patient, select the Condition that represents the last occurrence of an MI
["last",["icd9","412"]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
11 | 82614 | condition_occurrence | 2008-10-29 | 2008-10-29 | 412 |
17 | 46165 | condition_occurrence | 2009-04-30 | 2009-04-30 | 412 |
38 | 40556 | condition_occurrence | 2010-02-12 | 2010-02-12 | 412 |
54 | 74606 | condition_occurrence | 2008-06-05 | 2008-06-05 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
73 | 52843 | condition_occurrence | 2008-11-16 | 2008-11-16 | 412 |
77 | 58165 | condition_occurrence | 2010-10-06 | 2010-10-06 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 73803 | condition_occurrence | 2009-05-30 | 2009-05-30 | 412 |
For situations where we need to represent pre-defined date ranges, we can use "date literal" operators.
- Takes a hash with two elements: { start: <date-format>, end: <date-format> }
- Creates an inclusive, continuous range of dates defined by a start and end date
- Takes a single argument: <date-format>
- Represents a single day
- Shorthand for creating a date range that starts and ends on the same date
- Not yet implemented
Dates follow these formats:
- "YYYY-MM-DD"
- Four-digit year, two-digit month with leading 0s, two-digit day with leading 0s
- "START"
- Represents the first date of information available from the data source
- "END"
- Represents the last date of information available from the data source.
As described above, each record carries a start and end date, defining its own date range. It is through these date ranges that we are able to do temporal filtering of streams via temporal operators.
Temporal operators take two incoming sets of records, a set of left-hand records (LHR) and a set of right-hand records (RHR). A temporal operator groups records by person, then compares dates each person's LHR against dates in their RHR. Only LHR are passed along. RHR are discarded after the comparison.
If a person only has LHR or RHR but not both, the Temporal Operator does not pass any LHR records along.
Each operator performs a different type of comparison between LHR and RHR dates. All operators have some options which are best described later.
This is a temporal operator. It passes along any LHR that have a start_date and end_date completely contained within a RHR start_date and end_date.
Example 26 - All MIs for the year 2010
["during",{"left":["icd9","412"],"right":["date_range",{"start":"2010-01-01","end":"2010-12-31"}]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
90 | 33940 | condition_occurrence | 2010-05-26 | 2010-05-26 | 412 |
38 | 40556 | condition_occurrence | 2010-02-12 | 2010-02-12 | 412 |
239 | 49718 | condition_occurrence | 2010-02-12 | 2010-02-12 | 412 |
217 | 56467 | condition_occurrence | 2010-06-22 | 2010-06-23 | 412 |
77 | 58165 | condition_occurrence | 2010-10-06 | 2010-10-06 | 412 |
110 | 62592 | condition_occurrence | 2010-06-27 | 2010-06-27 | 412 |
152 | 84298 | condition_occurrence | 2010-11-22 | 2010-11-22 | 412 |
187 | 87583 | condition_occurrence | 2010-12-26 | 2010-12-26 | 412 |
109 | 89334 | condition_occurrence | 2010-07-06 | 2010-07-06 | 412 |
This is a temporal operator. It passes along any LHR that have a start_date and end_date which completely contain an RHR start_date and end_date.
Example 27 - All MIs occurring on 2010-06-22
["contains",{"left":["icd9","412"],"right":["date_range",{"start":"2010-06-22","end":"2010-06-22"}]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
217 | 56467 | condition_occurrence | 2010-06-22 | 2010-06-23 | 412 |
This is a temporal operator. It passes along any LHR whose date range overlaps in any way with a RHR's date range. This diagram attempts to demonstrate all L records that would qualify as having "any_overlap" with an R record.
Example 28 - All MIs that overlap with the month of 2010-06
["any_overlap",{"left":["icd9","412"],"right":["date_range",{"start":"2010-06-01","end":"2010-06-30"}]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
217 | 56467 | condition_occurrence | 2010-06-22 | 2010-06-23 | 412 |
110 | 62592 | condition_occurrence | 2010-06-27 | 2010-06-27 | 412 |
This is a temporal operator. Its behavior warrants a bit of explanation.
Imagine events A-A-B-A-B-A. How many A's come before a B? One could argue that only two A's come before the first B, so only the first two A's qualify. However, there are three A's that ultimately come before a B.
The Before
operator uses the latter approach, so all LHR records are compared against the start_date of the last RHR record.
Example 29 - All MIs that occurred before a patient's last case of diabetes (250.01)
["before",{"left":["icd9","412"],"right":["icd9","250.01"]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
184 | 24281 | condition_occurrence | 2008-01-07 | 2008-01-07 | 412 |
17 | 27269 | condition_occurrence | 2008-08-25 | 2008-08-25 | 412 |
17 | 46165 | condition_occurrence | 2009-04-30 | 2009-04-30 | 412 |
152 | 82988 | condition_occurrence | 2008-08-16 | 2008-08-16 | 412 |
214 | 83257 | condition_occurrence | 2009-11-12 | 2009-11-12 | 412 |
109 | 89334 | condition_occurrence | 2010-07-06 | 2010-07-06 | 412 |
214 | 89463 | condition_occurrence | 2008-02-17 | 2008-02-17 | 412 |
This is a temporal operator. Its behavior warrants a bit of explanation.
Imagine events A-A-B-A-B-A. How many A's come after a B? One could argue that only one A come before the last B, so only the last A qualifies. However, there are two A's that ultimately come after a B.
The After
operator uses the latter approach, so all LHR records are compared against the end_date of the first RHR record.
Example 30 - All MIs that occurred after a patient's first case of diabetes (250.01)
["after",{"left":["icd9","412"],"right":["icd9","250.01"]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
96 | 20748 | condition_occurrence | 2008-09-21 | 2008-09-24 | 412 |
223 | 38682 | condition_occurrence | 2008-07-28 | 2008-07-28 | 412 |
223 | 39893 | condition_occurrence | 2008-10-31 | 2008-10-31 | 412 |
77 | 58165 | condition_occurrence | 2010-10-06 | 2010-10-06 | 412 |
190 | 61423 | condition_occurrence | 2009-04-09 | 2009-04-09 | 412 |
110 | 62592 | condition_occurrence | 2010-06-27 | 2010-06-27 | 412 |
214 | 83257 | condition_occurrence | 2009-11-12 | 2009-11-12 | 412 |
152 | 84298 | condition_occurrence | 2010-11-22 | 2010-11-22 | 412 |
187 | 87583 | condition_occurrence | 2010-12-26 | 2010-12-26 | 412 |
If this is not the behavior you desire, use one of the relative time operators to select which RHR should be the one used to do comparison
Example 31 - All MIs that occurred before a patient's first case of diabetes (250.01)
["before",{"left":["icd9","412"],"right":["first",["icd9","250.01"]]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
184 | 24281 | condition_occurrence | 2008-01-07 | 2008-01-07 | 412 |
17 | 27269 | condition_occurrence | 2008-08-25 | 2008-08-25 | 412 |
17 | 46165 | condition_occurrence | 2009-04-30 | 2009-04-30 | 412 |
152 | 82988 | condition_occurrence | 2008-08-16 | 2008-08-16 | 412 |
109 | 89334 | condition_occurrence | 2010-07-06 | 2010-07-06 | 412 |
214 | 89463 | condition_occurrence | 2008-02-17 | 2008-02-17 | 412 |
187 | 90638 | condition_occurrence | 2009-01-31 | 2009-01-31 | 412 |
There are situations when the date columns associated with a record should have their values shifted forward or backward in time to make a comparison with another set of dates. This is where the Time Window
operator is used. It has the following properties:
- Takes 2 arguments
- First argument is the stream on which to operate
- Second argument is a hash with two keys: [:start, :end] each with a value in the following format: "(-?\d+[dmy])+"
- Both start and end must be defined, even if you are only adjusting one of the dates
- Some examples
- 30d => 30 days
- 20 => 20 days
- d => 1 day
- 1y => 1 year
- -1m => -1 month
- 10d3m => 3 months and 10 days
- -2y10m-3d => -2 years, +10 months, -3 days
- The start or end value can also be '', '0', or nil
- This will leave the date unaffected
- The start or end value can also be the string 'start' or 'end'
- 'start' represents the start_date for each record
- 'end' represents the end_date for each record
- See the example below
Example 32 - All Diagnoses of Diabetes (ICD-9 250.01) within 30 days of an MI
["during",{"left":["icd9","250.01"],"right":["time_window",["icd9","412"],{"start":"-30d","end":"30d"}]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
96 | 46366 | condition_occurrence | 2008-09-13 | 2008-09-13 | 25001 |
223 | 81480 | condition_occurrence | 2008-07-03 | 2008-07-03 | 25001 |
Example 33 - Shift the window for all MIs back by 200 years
["time_window",["icd9","412"],{"start":"-200y","end":"-200y"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 1808-03-22 | 1808-03-23 | 412 |
177 | 507 | condition_occurrence | 1809-06-13 | 1809-06-16 | 412 |
230 | 523 | condition_occurrence | 1808-03-14 | 1808-03-21 | 412 |
161 | 963 | condition_occurrence | 1809-10-25 | 1809-10-29 | 412 |
60 | 986 | condition_occurrence | 1809-07-19 | 1809-07-22 | 412 |
81 | 1405 | condition_occurrence | 1809-01-28 | 1809-01-30 | 412 |
88 | 1572 | condition_occurrence | 1809-01-03 | 1809-01-09 | 412 |
213 | 15005 | condition_occurrence | 1810-02-07 | 1810-02-07 | 412 |
66 | 16171 | condition_occurrence | 1809-07-25 | 1809-07-25 | 412 |
220 | 20660 | condition_occurrence | 1809-10-31 | 1809-10-31 | 412 |
Example 34 - Expand the dates for all MIs to a window ranging from 2 months and 2 days prior to 1 year and 3 days after the MI
["time_window",["icd9","412"],{"start":"-2m-2d","end":"3d1y"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-01-20 | 2009-03-26 | 412 |
177 | 507 | condition_occurrence | 2009-04-11 | 2010-06-19 | 412 |
230 | 523 | condition_occurrence | 2008-01-12 | 2009-03-24 | 412 |
161 | 963 | condition_occurrence | 2009-08-23 | 2010-11-01 | 412 |
60 | 986 | condition_occurrence | 2009-05-17 | 2010-07-25 | 412 |
81 | 1405 | condition_occurrence | 2008-11-26 | 2010-02-02 | 412 |
88 | 1572 | condition_occurrence | 2008-11-01 | 2010-01-12 | 412 |
213 | 15005 | condition_occurrence | 2009-12-05 | 2011-02-10 | 412 |
66 | 16171 | condition_occurrence | 2009-05-23 | 2010-07-28 | 412 |
220 | 20660 | condition_occurrence | 2009-08-29 | 2010-11-03 | 412 |
Example 35 - Collapse all 412 date ranges down to just the date of admission by leaving start_date unaffected and setting end_date to start_date
["time_window",["icd9","412"],{"start":"","end":"start"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-22 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-13 | 412 |
230 | 523 | condition_occurrence | 2008-03-14 | 2008-03-14 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-25 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-19 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-28 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-03 | 412 |
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
220 | 20660 | condition_occurrence | 2009-10-31 | 2009-10-31 | 412 |
Example 36 - Nonsensical, but allowed: swap the start_date and end_date for a range
["time_window",["icd9","412"],{"start":"end","end":"start"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-23 | 2008-03-22 | 412 |
177 | 507 | condition_occurrence | 2009-06-16 | 2009-06-13 | 412 |
230 | 523 | condition_occurrence | 2008-03-21 | 2008-03-14 | 412 |
161 | 963 | condition_occurrence | 2009-10-29 | 2009-10-25 | 412 |
60 | 986 | condition_occurrence | 2009-07-22 | 2009-07-19 | 412 |
81 | 1405 | condition_occurrence | 2009-01-30 | 2009-01-28 | 412 |
88 | 1572 | condition_occurrence | 2009-01-09 | 2009-01-03 | 412 |
213 | 15005 | condition_occurrence | 2010-02-07 | 2010-02-07 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
220 | 20660 | condition_occurrence | 2009-10-31 | 2009-10-31 | 412 |
Like temporal comparison operators, this operator takes left and right hand streams of records.
The operator finds the most recent end_date in the RHR and uses that date to "trim" the start_dates of each LHR record.
There are four possible scenarios:
- There is no corresponding RHR for the LHR
- LHR row is passed through unaffected
- RHR end_date is before LHR start_date
- LHR row is passed through unaffected
- RHR end_date is after LHR end_date
- LHR row is completely discarded
- RHR end_date falls between LHR start_date and end_date
- LHR's start_date is set to RHR's end_date
Example 37 - Create time windows starting at 1980 and ending when patient is 50
["trim_date_start",{"left":["time_window",["person"],{"end":"+50y"}],"right":["date_range",{"start":"1980-01-01","end":"1980-01-01"}]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
2 | 2 | person | 1980-01-01 | 1993-01-01 | 00016F745862898F |
3 | 3 | person | 1980-01-01 | 1986-09-01 | 0001FDD721E223DC |
4 | 4 | person | 1980-01-01 | 1991-06-01 | 00021CA6FF03E670 |
5 | 5 | person | 1980-01-01 | 1986-08-01 | 00024B3D2352D2D0 |
6 | 6 | person | 1980-01-01 | 1993-10-01 | 0002DAE1C81CC70D |
8 | 8 | person | 1980-01-01 | 1985-09-01 | 000308435E3E5B76 |
9 | 9 | person | 1980-01-01 | 2026-09-01 | 000345A39D4157C9 |
10 | 10 | person | 1980-01-01 | 1988-10-01 | 00036A21B65B0206 |
11 | 11 | person | 1980-01-01 | 1984-02-01 | 000489E7EAAD463F |
13 | 13 | person | 1980-01-01 | 1986-07-01 | 0004F0ABD505251D |
Like temporal comparison operators, this operator takes left and right hand streams of records.
The operator finds the earliest start_date in the RHR and uses that date to "trim" the end_dates of each LHR record.
There are four possible scenarios:
- There is no corresponding RHR for the LHR
- LHR row is passed through unaffected
- RHR start_date is after LHR end_date
- LHR row is passed through unaffected
- RHR start_date is before LHR start_date
- LHR row is completely discarded
- RHR start_date falls between LHR start_date and end_date
- LHR's end_date is set to RHR's start_date
Example 38 - Create time windows starting at patient's birth and ending when patient is 50 or 1980, what ever is earlier
["trim_date_end",{"left":["time_window",["person"],{"end":"+50y"}],"right":["date_range",{"start":"1980-01-01","end":"1980-01-01"}]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
1 | 1 | person | 1923-05-01 | 1973-05-01 | 00013D2EFD8E45D1 |
2 | 2 | person | 1943-01-01 | 1980-01-01 | 00016F745862898F |
3 | 3 | person | 1936-09-01 | 1980-01-01 | 0001FDD721E223DC |
4 | 4 | person | 1941-06-01 | 1980-01-01 | 00021CA6FF03E670 |
5 | 5 | person | 1936-08-01 | 1980-01-01 | 00024B3D2352D2D0 |
6 | 6 | person | 1943-10-01 | 1980-01-01 | 0002DAE1C81CC70D |
7 | 7 | person | 1922-07-01 | 1972-07-01 | 0002F28CE057345B |
8 | 8 | person | 1935-09-01 | 1980-01-01 | 000308435E3E5B76 |
9 | 9 | person | 1976-09-01 | 1980-01-01 | 000345A39D4157C9 |
10 | 10 | person | 1938-10-01 | 1980-01-01 | 00036A21B65B0206 |
Sometimes it is difficult to reason through Time Window
when working with temporal comparison operators. It would be nice if a more intuitive language could be used to describe some common temporal relationships.
We've added a few parameters, primarily for the Before
and After
operators, that will help with temporal comparisons.
within
- Takes same date adjustment format as
time_window
, e.g. 30d or 2m or 1y-3d - The start_date and end_date of the RHR are adjusted out in each direction by the amount specified and the LHR must pass the original temporal comparison and then fall within the window created by the adjustment
- Takes same date adjustment format as
at_least
- Takes same date adjustment format as
time_window
, e.g. 30d or 2m or 1y-3d - The start_date and end_date of the RHR are adjusted out in each direction by the amount specified and the LHR must pass the original temporal comparison and then fall outside the window created by the adjustment
- Takes same date adjustment format as
Let's see them in action:
Prescriptions After a Office Visit - Find all prescriptions of interest occurring within three days after an office visit
Example 39
["after",{"left":["ndc","61392070054","65084025214","65726040125",{"label":"Prescriptions of Interest"}],"right":["cpt","99214",{"label":"Office Visit"}],"within":"3d"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
207 | 11312 | drug_exposure | 2010-08-13 | 2010-08-13 | 61392070054 |
Walk through of example above:
- Pull some prescriptions of interest into LHR
- Pull some office visits into RHR
- Compare LHR against RHR, enforcing that the LHR' start_date falls after the RHR' end_date
- Enforce that any remaining LHR' start_date falls within an RHR's (start_date - 3 days) and (end_date + 3 days)
Find all Heart Attacks Probably Resulting in Death -- Does anyone die within a year of a heart attack?
Example 40
["before",{"left":["icd9","410.00","410.01","410.10","410.11"],"right":["death"],"within":"1y"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
191 | 42977 | condition_occurrence | 2009-05-21 | 2009-05-21 | 41010 |
Walk through of example above:
- Pull hospitalizations into LHR
- Pull death records into RHR
- Compare LHR against RHR, enforcing that the LHR' end_date falls before the RHR' start_date
- Enforce that any remaining LHR row's end_date falls within an RHR row's (start_date - 1 year) and (end_date + 1 year)
Example 41 - Find all diagnosis of heart attack at least 1 week after a diagnosis of diabetes
["after",{"left":["icd9","410.00","410.01","410.10","410.11",{"label":"Heart Attack Dx"}],"right":["icd9","250.01",{"label":"Diabetes Dx"}],"at_least":"1w"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
58 | 79480 | condition_occurrence | 2009-02-24 | 2009-02-24 | 41011 |
This example illustrates how at_least
works.
Currently, temporal comparisons are done with an inner join between the LHR relation and the RHR relation. This has some interesting effects:
- If more than one RHR row matches with an LHR row, multiple copies of the LHR row will end up in the downstream records
- Should we limit the LHR to only unique rows, essentially de-duping the downstream records?
- If the same row appears in both the LHR and RHR relation, it is likely the row will match itself (e.g. a row occurs during itself and contains itself etc.)
- This is a bit awkward and perhaps we should skip joining rows against each other if they are identical (i.e. have the same
criterion_id
andcriterion_type
)?
- This is a bit awkward and perhaps we should skip joining rows against each other if they are identical (i.e. have the same
Person streams carry a patient's date of birth in their start and end date columns. This makes them almost useless when they are part of the L stream of a temporal operator. But person streams are useful as the R stream. By Time Window
ing the patient's date of birth, we can filter based on the patient's age like so:
Example 42 - All MIs that occurred after a male patient's 50th birthday
["after",{"left":["icd9","412"],"right":["time_window",["gender","Male"],{"start":"50y","end":"50y"}]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
190 | 61423 | condition_occurrence | 2009-04-09 | 2009-04-09 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
197 | 21914 | condition_occurrence | 2009-05-07 | 2009-05-07 | 412 |
149 | 28518 | condition_occurrence | 2008-04-07 | 2008-04-07 | 412 |
239 | 49718 | condition_occurrence | 2010-02-12 | 2010-02-12 | 412 |
239 | 47130 | condition_occurrence | 2009-08-04 | 2009-08-04 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
214 | 89463 | condition_occurrence | 2008-02-17 | 2008-02-17 | 412 |
214 | 83257 | condition_occurrence | 2009-11-12 | 2009-11-12 | 412 |
187 | 90638 | condition_occurrence | 2009-01-31 | 2009-01-31 | 412 |
There are rare occasions when we'd like to stitch a set of events together into a span of time, such as a set of prescriptions events into a span of time we'd consider a patient to be taking a certain medication. ConceptQL provides the Episode
operator for such an occasion.
Example 43 - Episodes of diabetes diagnoses
["episode",["icd9","250.00"],{"gap_of":"90"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
2 | 0 | episode | 2009-05-11 | 2009-07-31 | |
3 | 0 | episode | 2009-11-17 | 2009-11-17 | |
5 | 0 | episode | 2009-06-28 | 2009-06-28 | |
7 | 0 | episode | 2008-04-23 | 2008-06-09 | |
7 | 0 | episode | 2008-10-02 | 2008-11-30 | |
7 | 0 | episode | 2009-07-12 | 2009-09-13 | |
7 | 0 | episode | 2010-04-22 | 2010-07-18 | |
8 | 0 | episode | 2008-10-17 | 2008-10-17 | |
8 | 0 | episode | 2009-03-01 | 2009-03-03 | |
8 | 0 | episode | 2009-07-16 | 2009-07-16 |
One side-effect of this operator is that the criterion_domain
is set to "episode" and each row no longer contains a reference back to a criterion_id
or criterion_table
because one or more records are folded into a single row after passing through the episode operator. Often, the episode operator is best-suited to act as the RHR of a temporal operator.
This operator compares a set of incoming streams. A record is only passed along if there is a corresponding record in each of the other streams that occurs within the specified time frame.
For instance, say the operator is passed two streams, A and B, and the time frame is 30 days before or after. If a record in A occurs within 30 days of a record in B, both the record from A and the record from B get passed along.
The impetus for this operator was to be able to find procedures that occur within a certain period of time from a diagnosis and to be able to get both the procedure record and the diagnosis record as output.
Example 44 - Diabetes diagnosis within 30 days of an office visit
["concurrent_within",["cpt_or_hcpcs","99214"],["icd9","250.00"],{"start":"-30d","end":"30d"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
19 | 62873 | procedure_occurrence | 2008-05-10 | 2008-05-10 | 99214 |
77 | 52511 | condition_occurrence | 2008-10-26 | 2008-10-26 | 25000 |
83 | 52455 | condition_occurrence | 2010-10-26 | 2010-10-26 | 25000 |
86 | 68057 | procedure_occurrence | 2009-04-21 | 2009-04-21 | 99214 |
95 | 29773 | procedure_occurrence | 2008-12-07 | 2008-12-07 | 99214 |
98 | 56125 | procedure_occurrence | 2009-11-16 | 2009-11-18 | 99214 |
99 | 94263 | condition_occurrence | 2009-06-27 | 2009-06-27 | 25000 |
105 | 85026 | condition_occurrence | 2008-03-07 | 2008-03-07 | 25000 |
108 | 91221 | procedure_occurrence | 2009-06-05 | 2009-06-05 | 99214 |
110 | 92651 | procedure_occurrence | 2010-05-08 | 2010-05-08 | 99214 |
There are a several operators that filter an incoming stream before passing it along.
Often in claims data, an event is reported with a place of service via the CMS place of service codes, such as "inpatient hospital" (21) or "outpatient hospital" (22). For these kinds of queries, ConceptQL has the Place of Service Filter
Example 45
["place_of_service_filter",22,["icd9","250.00"]]
No Results found.
Our sample data for the examples in this document does not contain any place of service information so there are no records after applying this filter.
For better or worse, provenance, or rather information about the origin of claims data, is some times an important factor in research algorithms. Often, paper authors will describe using "inpatient records", by which they mean using claims that come from "the inpatient file" from a set of claims data. GDM tracks the provenance of data as part of its ETL process, and ConceptQL includes an operator to select rows that have a particular provenance, aptly called Provenance
.
Example 46
["provenance","inpatient",["icd9","250.00"]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
68 | 19 | condition_occurrence | 2008-04-19 | 2008-04-23 | 25000 |
207 | 106 | condition_occurrence | 2010-04-28 | 2010-05-02 | 25000 |
131 | 170 | condition_occurrence | 2008-03-22 | 2008-03-23 | 25000 |
243 | 205 | condition_occurrence | 2009-08-20 | 2009-08-23 | 25000 |
230 | 231 | condition_occurrence | 2008-03-30 | 2008-04-02 | 25000 |
32 | 257 | condition_occurrence | 2010-08-30 | 2010-09-07 | 25000 |
79 | 267 | condition_occurrence | 2009-08-09 | 2009-08-24 | 25000 |
230 | 284 | condition_occurrence | 2008-03-01 | 2008-03-06 | 25000 |
168 | 297 | condition_occurrence | 2010-02-14 | 2010-02-18 | 25000 |
37 | 310 | condition_occurrence | 2009-01-19 | 2009-01-20 | 25000 |
There are times when we'd like to captures events only when a particular provider specialty was involved. ConceptQL provides the Provider Filter
operator for this. This operator is a bit bare-bones at the moment in that it requires users to enter in the concept ID of the specialty they are seeking, rather than providing a list of known specialties.
Example 47
["provider_filter",["icd9","250.00"],{"specialties":"12301023"}]
No Results found.
Our sample data for the examples in this document does not contain any provider specialty information so there are no records after applying this filter.
A very common pattern in algorithms is to consider a condition to be valid if:
- the condition is seen just once in the inpatient file, or
- the condition is present twice in the outpatient file, with
- a minimum separation, normally of a 30 days, between the two occurrences
We have created an operator that handles this pattern: One In Two Out
Example 48 - First of diabetes diagnoses, either found once in the inpatient file, or twice in the outpatient file with a minimum of 30 days between outpatient diagnoses
["one_in_two_out",["icd9","250.00"],{"outpatient_minimum_gap":"30d"}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
2 | 76985 | condition_occurrence | 2009-07-31 | 2009-07-31 | 25000 |
7 | 38732 | condition_occurrence | 2008-06-09 | 2008-06-09 | 25000 |
8 | 90541 | condition_occurrence | 2009-03-01 | 2009-03-03 | 25000 |
9 | 84726 | condition_occurrence | 2010-09-12 | 2010-09-12 | 25000 |
11 | 35227 | condition_occurrence | 2008-12-08 | 2008-12-08 | 25000 |
12 | 43800 | condition_occurrence | 2009-05-21 | 2009-05-21 | 25000 |
13 | 57738 | condition_occurrence | 2008-11-22 | 2008-11-22 | 25000 |
14 | 24380 | condition_occurrence | 2009-09-13 | 2009-09-13 | 25000 |
15 | 68873 | condition_occurrence | 2010-11-18 | 2010-11-18 | 25000 |
17 | 72739 | condition_occurrence | 2008-05-16 | 2008-05-16 | 25000 |
The One In Two Out
operator yields a single row per patient. The row is the earliest confirmed event from the operator's upstream set of events.
The One In Two Out
operator uses the file from which a row came from to determine if the row represents an "inpatient" record or an "outpatient" record. The place of service associated with a record is not taken into account. Nothing about the associated visit is considered either.
condition_occurrence
rows with provenance related to of "inpatient" files are considered inpatient records. All other records are considered outpatient.
Non-condition_occurrence
rows are removed from the stream.
In order to simplify how sets of inpatient and outpatient records are compared to each other temporally, we collapse each incoming row's date range to a single date. Users can choose which date to use for both inpatient and outpatient records.
- Inpatient Length of Stay
- Enforces a minimum length of stay for inpatient records
- If given a whole number greater than 0, requires that the patient had been hospitalized for at least that many days
- Optional
- Default: empty
- If left empty, length of stay is ignored
- Length of Stay (LOS) is calculated as:
- If (end_date - start_date) < 2, then LOS is 1
- else LOS is (end_date - start_date) + 1
- Inpatient Return Date
- Determines which date from an inpatient record should be used as the date in which the associated condition "occurred"
- Options
- Admit Date
- Discharge Date
- Optional
- Defaults to discharge date (end_date of the given condition_occurrence)
- The discharge date is preferred over the admit date because only at discharge can we say for certain a patient was diagnosed with a given condition. We cannot be certain a patient had the condition at the time of admission
- Outpatient Minimum Gap
- Requires that the second, confirming outpatient diagnosis must occur at least this many days after the initial diagnosis
- Uses same time adjustment syntax as
time_window
, e.g. 1d3m10y - Required
- Default: 30d
- Outpatient Maximum Gap
- Requires that the second, confirming outpatient diagnosis must occur at within this many days after the initial diagnosis
- Uses same time adjustment syntax as
time_window
, e.g. 1d3m10y - Optional
- Default: empty
- When empty, the confirming diagnosis may occur at any time after the minimum_gap to be considered valid
- This option is useful when attempting to limit confirming diagnoses for acute diseases, e.g. if attempting to find confirmation of the flu, it would be advisable to set this option in order to limit confirmation to distinct periods when a patient had the flu
- Outpatient Event to Return
- Determines which event return should two outpatient records meet all criteria
- In most situations, using the initial outpatient event is desired, particularly when the algorithm is used to record an exposure of interest or outcome
- However, returning the initial outpatient event when the algorithm is used to determine an index event can introduce immortal time bias. In these cases, using the confirming event will avoid immortal time bias
- Options
- Initial Event
- Confirming Event
- Optional
- Defaults to Initial Event
- Determines which event return should two outpatient records meet all criteria
Often we want to filter out a set of records by people. For instance, say we wanted to find all MIs for all males. We'd use the Person Filter
operator for that. Like the Except
operator, it takes a left-hand stream and a right-hand stream.
Unlike the Except
operator, the Person Filter
operator will use all types of all streams in the right-hand side to filter out records in all types of all streams on the left hand side.
Example 49 - All MI Conditions for people who are male
["person_filter",{"left":["icd9","412"],"right":["gender","Male"]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
190 | 61423 | condition_occurrence | 2009-04-09 | 2009-04-09 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
197 | 21914 | condition_occurrence | 2009-05-07 | 2009-05-07 | 412 |
149 | 28518 | condition_occurrence | 2008-04-07 | 2008-04-07 | 412 |
239 | 49718 | condition_occurrence | 2010-02-12 | 2010-02-12 | 412 |
239 | 47130 | condition_occurrence | 2009-08-04 | 2009-08-04 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
214 | 89463 | condition_occurrence | 2008-02-17 | 2008-02-17 | 412 |
214 | 83257 | condition_occurrence | 2009-11-12 | 2009-11-12 | 412 |
187 | 90638 | condition_occurrence | 2009-01-31 | 2009-01-31 | 412 |
But we can get crazier. The right-hand side doesn't have to be a person stream. If a non-person stream is used in the right-hand side, the person_filter will cast all right-hand streams to person first and use the union of those streams:
Example 50 - All MI Conditions for people who had an office visit at some point in the data
["person_filter",{"left":["icd9","412"],"right":["cpt","99214"]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
230 | 523 | condition_occurrence | 2008-03-14 | 2008-03-21 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-09 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
220 | 20660 | condition_occurrence | 2009-10-31 | 2009-10-31 | 412 |
96 | 20748 | condition_occurrence | 2008-09-21 | 2008-09-24 | 412 |
Example 51 - All MI Conditions for people who had an office visit at some point in the data (an explicit representation of what's happening in the diagram above)
["person_filter",{"left":["icd9","412"],"right":["person",["cpt","99214"]]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
131 | 172 | condition_occurrence | 2008-03-22 | 2008-03-23 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
230 | 523 | condition_occurrence | 2008-03-14 | 2008-03-21 | 412 |
161 | 963 | condition_occurrence | 2009-10-25 | 2009-10-29 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
81 | 1405 | condition_occurrence | 2009-01-28 | 2009-01-30 | 412 |
88 | 1572 | condition_occurrence | 2009-01-03 | 2009-01-09 | 412 |
66 | 16171 | condition_occurrence | 2009-07-25 | 2009-07-25 | 412 |
220 | 20660 | condition_occurrence | 2009-10-31 | 2009-10-31 | 412 |
96 | 20748 | condition_occurrence | 2008-09-21 | 2008-09-24 | 412 |
Example 52 - All MI Conditions for people who are Male OR had an office visit at some point in the data
["person_filter",{"left":["icd9","412"],"right":["union",["cpt","99214"],["gender","Male"]]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
234 | 22299 | condition_occurrence | 2008-09-01 | 2008-09-01 | 412 |
177 | 48526 | condition_occurrence | 2008-09-23 | 2008-09-23 | 412 |
177 | 507 | condition_occurrence | 2009-06-13 | 2009-06-16 | 412 |
106 | 29318 | condition_occurrence | 2009-02-27 | 2009-02-27 | 412 |
110 | 62592 | condition_occurrence | 2010-06-27 | 2010-06-27 | 412 |
184 | 24281 | condition_occurrence | 2008-01-07 | 2008-01-07 | 412 |
190 | 61423 | condition_occurrence | 2009-04-09 | 2009-04-09 | 412 |
60 | 986 | condition_occurrence | 2009-07-19 | 2009-07-22 | 412 |
197 | 21914 | condition_occurrence | 2009-05-07 | 2009-05-07 | 412 |
149 | 28518 | condition_occurrence | 2008-04-07 | 2008-04-07 | 412 |
And don't forget the left-hand side can have multiple types of streams:
Example 53 - Yields two streams: a stream of all MI Conditions for people who are Male and a stream of all office visit Procedures for people who are Male
["person_filter",{"left":["union",["icd9","412"],["cpt","99214"]],"right":["gender","Male"]}]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
71 | 31789 | procedure_occurrence | 2009-07-12 | 2009-07-12 | 99214 |
68 | 35465 | procedure_occurrence | 2008-11-08 | 2008-11-08 | 99214 |
68 | 35191 | procedure_occurrence | 2008-06-30 | 2008-06-30 | 99214 |
68 | 35154 | procedure_occurrence | 2009-03-26 | 2009-03-26 | 99214 |
68 | 33110 | procedure_occurrence | 2010-08-13 | 2010-08-13 | 99214 |
68 | 32826 | procedure_occurrence | 2009-07-22 | 2009-07-22 | 99214 |
68 | 49092 | procedure_occurrence | 2008-10-28 | 2008-10-28 | 99214 |
68 | 88881 | procedure_occurrence | 2010-03-17 | 2010-03-17 | 99214 |
68 | 80436 | procedure_occurrence | 2009-11-13 | 2009-11-13 | 99214 |
146 | 37311 | procedure_occurrence | 2009-08-17 | 2009-08-17 | 99214 |
These operators compare incoming streams of records in special ways to produce a set of records.
Much like temporal operators, this operator takes left and right hand streams. It passes along LHR that have a RHR with a matching person_id, criterion_id, and criterion_domain. Essentially, it checks to see if a LHR is also present in the right hand stream.
This operator is nearly identical to the Filter
operator but primarily is used internally. There are additional options available to this operator that are not exposed in the Jigsaw UI. Users are advised to use the Filter
operator as opposed to this one.
Claims data often reports diagnoses on the same "line" as a procedure. This is done for billing reasons, essentially the provider is saying "I preformed this procedure due to these conditions present in the patient". ConceptQL has an operator that specifically targets this kind of relationship: Co-Reported
The operator takes two or more streams. Events in each stream must be reported at the same time in order to be passed through the operator. In the case of GDM this means that events that share the same context ID.
Example 54 - Diabetes diagnosis reported as part the reason for an office visit
["co_reported",["icd9","250.00"],["cpt","99214"]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
25 | 93217 | condition_occurrence | 2008-09-18 | 2008-09-18 | 25000 |
25 | 93220 | procedure_occurrence | 2008-09-18 | 2008-09-18 | 99214 |
31 | 87695 | condition_occurrence | 2008-06-17 | 2008-06-17 | 25000 |
31 | 87696 | procedure_occurrence | 2008-06-17 | 2008-06-17 | 99214 |
40 | 81973 | condition_occurrence | 2009-05-01 | 2009-05-01 | 25000 |
40 | 81993 | procedure_occurrence | 2009-05-01 | 2009-05-01 | 99214 |
44 | 44702 | condition_occurrence | 2008-01-10 | 2008-01-10 | 25000 |
44 | 44715 | procedure_occurrence | 2008-01-10 | 2008-01-10 | 99214 |
58 | 35093 | condition_occurrence | 2009-06-15 | 2009-06-15 | 25000 |
58 | 35095 | procedure_occurrence | 2009-06-15 | 2009-06-15 | 99214 |
Note that both the diagnosis and procedure are pass on through the operator.
Any ConceptQL operator can be assigned a label. The label simply provides a way to apply a brief description to an operator, generally, what kind of records the operator is producing.
Any operator with a label
inserts its label
into the set of records.
For instance, here are some diabetes diagnoses with a label applied:
Example 55
["icd9","250.00",{"label":"diabetes"}]
person_id | criterion_id | criterion_table | criterion_domain | start_date | end_date | source_value | source_vocabulary_id | label |
---|---|---|---|---|---|---|---|---|
68 | 19 | clinical_codes | condition_occurrence | 2008-04-19 | 2008-04-23 | 25000 | ICD9CM | diabetes |
207 | 106 | clinical_codes | condition_occurrence | 2010-04-28 | 2010-05-02 | 25000 | ICD9CM | diabetes |
131 | 170 | clinical_codes | condition_occurrence | 2008-03-22 | 2008-03-23 | 25000 | ICD9CM | diabetes |
243 | 205 | clinical_codes | condition_occurrence | 2009-08-20 | 2009-08-23 | 25000 | ICD9CM | diabetes |
230 | 231 | clinical_codes | condition_occurrence | 2008-03-30 | 2008-04-02 | 25000 | ICD9CM | diabetes |
32 | 257 | clinical_codes | condition_occurrence | 2010-08-30 | 2010-09-07 | 25000 | ICD9CM | diabetes |
79 | 267 | clinical_codes | condition_occurrence | 2009-08-09 | 2009-08-24 | 25000 | ICD9CM | diabetes |
230 | 284 | clinical_codes | condition_occurrence | 2008-03-01 | 2008-03-06 | 25000 | ICD9CM | diabetes |
168 | 297 | clinical_codes | condition_occurrence | 2010-02-14 | 2010-02-18 | 25000 | ICD9CM | diabetes |
37 | 310 | clinical_codes | condition_occurrence | 2009-01-19 | 2009-01-20 | 25000 | ICD9CM | diabetes |
The most recent upstream label
is the one that is output:
Example 56
["first",["icd9","250.00",{"label":"won't show"}],{"label":"1st diabetes"}]
person_id | criterion_id | criterion_table | criterion_domain | start_date | end_date | source_value | source_vocabulary_id | label |
---|---|---|---|---|---|---|---|---|
2 | 89752 | clinical_codes | condition_occurrence | 2009-05-11 | 2009-05-11 | 25000 | ICD9CM | 1st diabetes |
3 | 93400 | clinical_codes | condition_occurrence | 2009-11-17 | 2009-11-17 | 25000 | ICD9CM | 1st diabetes |
5 | 89717 | clinical_codes | condition_occurrence | 2009-06-28 | 2009-06-28 | 25000 | ICD9CM | 1st diabetes |
7 | 15942 | clinical_codes | condition_occurrence | 2008-04-23 | 2008-04-23 | 25000 | ICD9CM | 1st diabetes |
8 | 95135 | clinical_codes | condition_occurrence | 2008-10-17 | 2008-10-17 | 25000 | ICD9CM | 1st diabetes |
9 | 24929 | clinical_codes | condition_occurrence | 2010-08-01 | 2010-08-01 | 25000 | ICD9CM | 1st diabetes |
11 | 73103 | clinical_codes | condition_occurrence | 2008-06-17 | 2008-06-17 | 25000 | ICD9CM | 1st diabetes |
12 | 15443 | clinical_codes | condition_occurrence | 2009-04-20 | 2009-04-20 | 25000 | ICD9CM | 1st diabetes |
13 | 19968 | clinical_codes | condition_occurrence | 2008-03-05 | 2008-03-05 | 25000 | ICD9CM | 1st diabetes |
14 | 77991 | clinical_codes | condition_occurrence | 2009-06-05 | 2009-06-05 | 25000 | ICD9CM | 1st diabetes |
The idea behind this is to create hints about what is being output:
Example 57
["first",["union",["icd9","250.00",{"label":"diabetes"}],["icd9","401.9",{"label":"hypertension"}]]]
person_id | criterion_id | criterion_table | criterion_domain | start_date | end_date | source_value | source_vocabulary_id | label |
---|---|---|---|---|---|---|---|---|
1 | 1065 | clinical_codes | condition_occurrence | 2010-03-12 | 2010-03-13 | 4019 | ICD9CM | hypertension |
2 | 57517 | clinical_codes | condition_occurrence | 2009-03-30 | 2009-03-30 | 4019 | ICD9CM | hypertension |
3 | 93400 | clinical_codes | condition_occurrence | 2009-11-17 | 2009-11-17 | 25000 | ICD9CM | diabetes |
5 | 38310 | clinical_codes | condition_occurrence | 2008-12-14 | 2008-12-14 | 4019 | ICD9CM | hypertension |
6 | 76645 | clinical_codes | condition_occurrence | 2010-02-23 | 2010-02-23 | 4019 | ICD9CM | hypertension |
7 | 15942 | clinical_codes | condition_occurrence | 2008-04-23 | 2008-04-23 | 25000 | ICD9CM | diabetes |
8 | 55867 | clinical_codes | condition_occurrence | 2008-02-27 | 2008-02-27 | 4019 | ICD9CM | hypertension |
9 | 27621 | clinical_codes | condition_occurrence | 2009-10-12 | 2009-10-19 | 4019 | ICD9CM | hypertension |
11 | 73103 | clinical_codes | condition_occurrence | 2008-06-17 | 2008-06-17 | 25000 | ICD9CM | diabetes |
12 | 34937 | clinical_codes | condition_occurrence | 2008-09-17 | 2008-09-17 | 4019 | ICD9CM | hypertension |
If a algorithm is particularly complex, or has a stream of records that are used more than once, it can be helpful to break the algorithm into a set of sub-algorithms. This can be done using the label
options and the Recall
operator. Any operator that has a label can be accessed via the Recall
operator.
- Takes 1 argument
- The "label" of an operator from which you'd like to pull the exact same set of records
A stream must be have a label applied to it before Recall
can use it.
Example 58 - Save away a stream of records to build the 1 inpatient, 2 outpatient pattern used in claims data algorithms
[["one_in_two_out",["except",{"left":["union",["icd9","330.0","330.1","330.2","330.3","330.8","330.9","331.0","331.11","331.19","331.2","331.3","331.4","331.5","331.6","331.7","331.81","331.82","331.83","331.89","331.9","332.0","333.4","333.5","333.7","333.71","333.72","333.79","333.85","333.94","334.0","334.1","334.2","334.3","334.4","334.8","334.9","335.0","335.10","335.11","335.19","335.20","335.21","335.22","335.23","335.24","335.29","335.8","335.9","338.0","340","341.0","341.1","341.20","341.21","341.22","341.8","341.9","345.00","345.01","345.10","345.11","345.2","345.3","345.40","345.41","345.50","345.51","345.60","345.61","345.70","345.71","345.80","345.81","345.90","345.91","347.00","347.01","347.10","347.11","649.40","649.41","649.42","649.43","649.44","768.70","768.71","768.72","768.73","780.31","780.32","780.33","780.39","784.3"],["icd10cm","E75.00","E75.01","E75.02","E75.09","E75.10","E75.11","E75.19","E75.23","E75.25","E75.26","E75.29","E75.4","F84.2","G10","G11.0","G11.1","G11.2","G11.3","G11.4","G11.8","G11.9","G12.0","G12.1","G12.20","G12.21","G12.22","G12.23","G12.24","G12.25","G12.29","G12.8","G12.9","G13.2","G13.8","G20","G21.4","G24.01","G24.02","G24.09","G24.2","G24.8","G25.4","G25.5","G25.81","G30.0","G30.1","G30.8","G30.9","G31.01","G31.09","G31.1","G31.2","G31.81","G31.82","G31.83","G31.84","G31.85","G31.89","G31.9","G32.81","G35","G36.1","G36.8","G36.9","G37.0","G37.1","G37.2","G37.3","G37.4","G37.5","G37.8","G37.9","G40.001","G40.009","G40.011","G40.019","G40.101","G40.109","G40.111","G40.119","G40.201","G40.209","G40.211","G40.219","G40.301","G40.309","G40.311","G40.319","G40.401","G40.409","G40.411","G40.419","G40.501","G40.509","G40.801","G40.802","G40.803","G40.804","G40.811","G40.812","G40.813","G40.814","G40.821","G40.822","G40.823","G40.824","G40.89","G40.901","G40.909","G40.911","G40.919","G40.A01","G40.A09","G40.A11","G40.A19","G40.B01","G40.B09","G40.B11","G40.B19","G47.411","G47.419","G47.421","G47.429","G80.3","G89.0","G91.0","G91.1","G91.2","G91.3","G91.4","G91.8","G91.9","G93.7","G93.89","G93.9","G94","O99.350","O99.351","O99.352","O99.353","O99.354","O99.355","P91.60","P91.61","P91.62","P91.63","R41.0","R41.82","R47.01","R56.00","R56.01","R56.1","R56.9"],{"label":"neuro dxs"}],"right":["co_reported",["recall","neuro dxs"],["drg","020","021","022","023","024","025","026","027","028","029","030","031","032","033","034","035","036","037","038","039","040","041","042","052","053","054","055","056","057","058","059","060","061","062","063","064","065","066","067","068","069","070","071","072","073","074","075","076","077","078","079","080","081","082","083","084","085","086","087","088","089","090","091","092","093","094","095","096","097","098","099","100","101","102","103"]]}],{"outpatient_event_to_return":"Confirming Event","outpatient_minimum_gap":"30d","inpatient_return_date":"Discharge Date","outpatient_maximum_gap":"365d"}]]
person_id | criterion_id | criterion_domain | start_date | end_date | source_value |
---|---|---|---|---|---|
2 | 397 | condition_occurrence | 2009-04-18 | 2009-04-18 | 7843 |
7 | 27644 | condition_occurrence | 2008-09-21 | 2008-09-30 | 3320 |
14 | 35530 | condition_occurrence | 2009-07-04 | 2009-07-04 | 3310 |
15 | 60361 | condition_occurrence | 2010-09-01 | 2010-09-01 | 78039 |
19 | 63713 | condition_occurrence | 2009-01-09 | 2009-01-09 | 3310 |
28 | 131 | condition_occurrence | 2008-04-30 | 2008-04-30 | 3310 |
35 | 34819 | condition_occurrence | 2009-05-02 | 2009-05-02 | 340 |
40 | 33044 | condition_occurrence | 2010-05-31 | 2010-05-31 | 3319 |
42 | 72000 | condition_occurrence | 2008-08-28 | 2008-08-28 | 3310 |
44 | 39145 | condition_occurrence | 2008-08-31 | 2008-08-31 | 3310 |
These are operators that are currently implemented but are subject to change or removal in the future.
This is an inline filter operator. Passes along any record that contains a value_as_number
that falls between the given options. Records with NULL
for value_as_number
are discarded.
Like a temporal comparison operator, takes left and right hand streams and compares the value in the value_as_number
column. Passes along LHR where value_as_number
matches a RHR record.
LHR with NULL
for value_as_number
are discarded.
Here are some algorithms from OMOP's Health Outcomes of Interest turned into ConceptQL statements to give more examples. I truncated some of the sets of codes to help ensure the diagrams didn't get too large.
- ICD-9CM of 584
- AND
- ICD-9 procedure codes of 39.95 or 54.98 within 60 days after diagnosis
- AND NOT
- A diagnostic code of chronic dialysis any time before initial diagnosis
- V45.1, V56.0, V56.31, V56.32, V56.8
- A diagnostic code of chronic dialysis any time before initial diagnosis
Example 59
["during",{"left":["except",{"left":["icd9","584"],"right":["after",{"left":["icd9","584"],"right":["icd9","V45.1","V56.0","V56.31","V56.32","V56.8"]}]}],"right":["time_window",["icd9_procedure","39.95","54.98"],{"start":"0","end":"60d"}]}]
No Results found.
- Person Died
- And Occurrence of 410* prior to death
- And either
- MI diagnosis within 30 days prior to 410
- MI therapy within 60 days after 410
Example 60
["during",{"left":["before",{"left":["icd9","410*"],"right":["death"]}],"right":["union",["time_window",["union",["cpt","0146T","75898","82554","92980","93010","93233","93508","93540","93545"],["icd9_procedure","00.24","36.02","89.53","89.57","89.69"],["loinc","10839-9","13969-1","18843-3","2154-3","33204-9","48425-3","49259-5","6597-9","8634-8"]],{"start":"-30d","end":"0"}],["time_window",["union",["cpt","0146T","75898","82554","92980","93010","93233"],["icd9_procedure","00.24","36.02","89.53","89.57","89.69"]],{"start":"","end":"60d"}]]}]
No Results found.
ConceptQL was originally developed to query data from the OMOP Common Data Model (CDM) v4.0. This meant that ConceptQL was able to take simple statements like [ "icd9", "412" ]
and determine the proper table to query (in this case condition_occurrence
) along with the correct source_code and vocabulary_id to search for within that table.
As the OMOP CDM continued to evolve and as its ETL requirements continued to spiral, Outcomes Insights, Inc developed the GDM and ConceptQL was adapted to support querying both data models. Eventually support for OMOP's CDM was removed from ConceptQL because it was no longer used in any production applications of ConceptQL. However, this experience has demonstrated that ConceptQL can be adapted to support other data models should the need arise.
ConceptQL was originally implemented to work against PostgreSQL, but was adapted to work against MSSQL, Oracle, and Impala. We have since refocused ConceptQL to work against PostgreSQL, but we have shown that, with a modest amount of effort, ConceptQL can be made to support most RDBMS systems that support most features found in SQL:2003, such as window functions. Most SQL generated by ConceptQL is fairly "vanilla" in that it does not rely on special quirks or query hints specific to a particular RDBMS in order to function.