Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mimic-iv/concepts: fix postgres-make-concepts and minor updates #1363

Merged
2 changes: 1 addition & 1 deletion mimic-iv/concepts/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
This folder contains scripts to generate useful abstractions of raw MIMIC-IV data ("concepts").
The scripts are written using the **BigQuery Standard SQL Dialect**. Concepts are categorized into folders if possible, otherwise they remain in the top-level directory. The [postgres](/mimic-iv/concepts/postgres) subfolder contains automatically generated PostgreSQL versions of these scripts; [see below for how these were generated](#postgresql-concepts). Concepts are categorized into folders if possible, otherwise they remain in the top-level directory.

The concepts are organized into individual SQL scripts, with each script generating a table. The BigQuery `mimic_derived` dataset under `physionet-data` contains the concepts pregenerated. Access to this dataset is available to MIMIC-IV approved users: see the [cloud instructions](https://mimic.mit.edu/docs/gettingstarted/cloud/) on how to access MIMIC-IV on BigQuery (which includes the derived concepts).
The concepts are organized into individual SQL scripts, with each script generating a table. The BigQuery `mimiciv_derived` dataset under `physionet-data` contains the concepts pregenerated. Access to this dataset is available to MIMIC-IV approved users: see the [cloud instructions](https://mimic.mit.edu/docs/gettingstarted/cloud/) on how to access MIMIC-IV on BigQuery (which includes the derived concepts).

* [List of the concept folders and their content](#concept-index)
* [Generating the concept tables on BigQuery](#generating-the-concepts-on-bigquery)
Expand Down
4 changes: 2 additions & 2 deletions mimic-iv/concepts/convert_bigquery_to_postgres.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,12 @@
# String replacements are necessary for some queries.
export REGEX_SCHEMA='s/`physionet-data.(mimiciv_hosp|mimiciv_icu|mimiciv_derived).([A-Za-z0-9_-]+)`/\1.\2/g'
# Note that these queries are very senstive to changes, e.g. adding whitespaces after comma can already change the behavior.
export REGEX_DATETIME_DIFF="s/DATETIME_DIFF\(([^,]+), ?([^,]+), ?(DAY|MINUTE|SECOND|HOUR|YEAR)\)/DATETIME_DIFF(\1, \2, '\3')/g"
export REGEX_DATETIME_DIFF="s/DATETIME_DIFF\(([^,]+), ?(.*), ?(DAY|MINUTE|SECOND|HOUR|YEAR)\)/DATETIME_DIFF(\1, \2, '\3')/g"
export REGEX_DATETIME_TRUNC="s/DATETIME_TRUNC\(([^,]+), ?(DAY|MINUTE|SECOND|HOUR|YEAR)\)/DATE_TRUNC('\2', \1)/g"
# Add necessary quotes to INTERVAL, e.g. "INTERVAL 5 hour" to "INTERVAL '5' hour"
export REGEX_INTERVAL="s/interval ([[:digit:]]+) (hour|day|month|year)/INTERVAL '\1' \2/gI"
# Add numeric cast to ROUND(), e.g. "ROUND(1.234, 2)" to "ROUND( CAST(1.234 as numeric), 2)".
export PERL_REGEX_ROUND='s/ROUND\(((.|\n)*?)\, /ROUND\( CAST\( \1 as numeric\)\,/g'
export PERL_REGEX_ROUND='s/ROUND\(((.|\n)*?)(\, [24]\))/ROUND\( CAST\( \1 as numeric\)\3/g'
Copy link
Contributor Author

@schu schu Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex is rather fragile.

Would it be possible to always ROUND(CAST(... as numeric), ...) in the BigQuery SQL code to make the transformation for postgres unnecessary? For example

, CASE WHEN uo_tm_6hr >= 6 THEN ROUND(CAST((ur.urineoutput_6hr/wd.weight/uo_tm_6hr) AS NUMERIC), 4) END AS uo_mlkghr_6hr
does CAST. In the postgres code this leads to ROUND(CAST(CAST(... as numeric) as numeric), 4).

Ideas welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah these regexes are in general terrifying.... since we're starting to come up with non-standard rules for the SQL, we should probably put them somewhere. First thought is the style guide might be a reasonable place (albeit not perfect).

I think the change you're suggesting makes sense, and I'd be happy to get rid of the perl regex, so I'll investigate by bringing this into a branch (tests won't run successfully on external PRs).

# Specific queries for some problems that arose with some files.
export REGEX_INT="s/CAST\(hr AS INT64\)/CAST\(hr AS bigint\)/g"
export REGEX_ARRAY="s/GENERATE_ARRAY\(-24, CEIL\(DATETIME\_DIFF\(it\.outtime_hr, it\.intime_hr, HOUR\)\)\)/ARRAY\(SELECT \* FROM generate\_series\(-24, CEIL\(DATETIME\_DIFF\(it\.outtime_hr, it\.intime_hr, HOUR\)\)\)\)/g"
Expand Down
2 changes: 1 addition & 1 deletion mimic-iv/concepts/make_concepts.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/bin/bash
# This script generates the concepts in the BigQuery table mimic_derived.
# This script generates the concepts in the BigQuery table mimiciv_derived.
export TARGET_DATASET=mimiciv_derived

# specify bigquery query command options
Expand Down
4 changes: 2 additions & 2 deletions mimic-iv/concepts/measurement/oxygen_delivery.sql
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ with ce_stg1 as
, itemid
, value AS o2_device
, ROW_NUMBER() OVER (PARTITION BY subject_id, charttime, itemid ORDER BY value) as rn
FROM mimic_icu.chartevents
FROM mimiciv_icu.chartevents
WHERE itemid = 226732 -- oxygen delivery device(s)
)
, stg AS
Expand Down Expand Up @@ -95,4 +95,4 @@ SELECT
, MAX(CASE WHEN rn = 4 THEN o2_device ELSE NULL END) AS o2_delivery_device_4
FROM stg
GROUP BY subject_id, charttime
;
;
6 changes: 3 additions & 3 deletions mimic-iv/concepts/postgres/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,11 @@ scripts are intended to be run against the MIMIC-IV data in a PostgreSQL databas
To generate concepts, change to this directory and run `psql`. Then within psql, run:

```sql
-- NOTE: many scripts *require* you to use mimic_derived as the schema for outputting concepts
-- NOTE: many scripts *require* you to use mimiciv_derived as the schema for outputting concepts
-- change the search path at your peril!
set search_path to mimic_derived, mimic_core, mimic_hosp, mimic_icu, mimic_ed;
set search_path to mimiciv_derived, mimiciv_hosp, mimiciv_icu, mimiciv_ed;
\i postgres-functions.sql -- only needs to be run once
\i postgres-make-concepts.sql
```

... or, execute the SQL files in your GUI of choice.
... or, execute the SQL files in your GUI of choice.
24 changes: 12 additions & 12 deletions mimic-iv/concepts/postgres/comorbidity/first_day_lab.sql
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,8 @@ WITH cbc AS
, MAX(platelet) as platelets_max
, MIN(wbc) as wbc_min
, MAX(wbc) as wbc_max
FROM mimic_icu.icustays ie
LEFT JOIN mimic_derived.complete_blood_count le
FROM mimiciv_icu.icustays ie
LEFT JOIN mimiciv_derived.complete_blood_count le
ON le.subject_id = ie.subject_id
AND le.charttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND le.charttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
Expand All @@ -34,8 +34,8 @@ WITH cbc AS
, MIN(glucose) AS glucose_min, MAX(glucose) AS glucose_max
, MIN(sodium) AS sodium_min, MAX(sodium) AS sodium_max
, MIN(potassium) AS potassium_min, MAX(potassium) AS potassium_max
FROM mimic_icu.icustays ie
LEFT JOIN mimic_derived.chemistry le
FROM mimiciv_icu.icustays ie
LEFT JOIN mimiciv_derived.chemistry le
ON le.subject_id = ie.subject_id
AND le.charttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND le.charttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
Expand All @@ -55,8 +55,8 @@ WITH cbc AS
, MIN(immature_granulocytes) AS immature_granulocytes_min, MAX(immature_granulocytes) AS immature_granulocytes_max
, MIN(metamyelocytes) AS metamyelocytes_min, MAX(metamyelocytes) AS metamyelocytes_max
, MIN(nrbc) AS nrbc_min, MAX(nrbc) AS nrbc_max
FROM mimic_icu.icustays ie
LEFT JOIN mimic_derived.blood_differential le
FROM mimiciv_icu.icustays ie
LEFT JOIN mimiciv_derived.blood_differential le
ON le.subject_id = ie.subject_id
AND le.charttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND le.charttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
Expand All @@ -72,8 +72,8 @@ WITH cbc AS
, MIN(inr) AS inr_min, MAX(inr) AS inr_max
, MIN(pt) AS pt_min, MAX(pt) AS pt_max
, MIN(ptt) AS ptt_min, MAX(ptt) AS ptt_max
FROM mimic_icu.icustays ie
LEFT JOIN mimic_derived.coagulation le
FROM mimiciv_icu.icustays ie
LEFT JOIN mimiciv_derived.coagulation le
ON le.subject_id = ie.subject_id
AND le.charttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND le.charttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
Expand All @@ -95,8 +95,8 @@ WITH cbc AS
, MIN(ck_mb) AS ck_mb_min, MAX(ck_mb) AS ck_mb_max
, MIN(ggt) AS ggt_min, MAX(ggt) AS ggt_max
, MIN(ld_ldh) AS ld_ldh_min, MAX(ld_ldh) AS ld_ldh_max
FROM mimic_icu.icustays ie
LEFT JOIN mimic_derived.enzyme le
FROM mimiciv_icu.icustays ie
LEFT JOIN mimiciv_derived.enzyme le
ON le.subject_id = ie.subject_id
AND le.charttime >= DATETIME_SUB(ie.intime, INTERVAL '6' HOUR)
AND le.charttime <= DATETIME_ADD(ie.intime, INTERVAL '1' DAY)
Expand Down Expand Up @@ -153,7 +153,7 @@ ie.subject_id
, ck_mb_min, ck_mb_max
, ggt_min, ggt_max
, ld_ldh_min, ld_ldh_max
FROM mimic_icu.icustays ie
FROM mimiciv_icu.icustays ie
LEFT JOIN cbc
ON ie.stay_id = cbc.stay_id
LEFT JOIN chem
Expand All @@ -164,4 +164,4 @@ LEFT JOIN coag
ON ie.stay_id = coag.stay_id
LEFT JOIN enz
ON ie.stay_id = enz.stay_id
;
;
2 changes: 1 addition & 1 deletion mimic-iv/concepts/postgres/demographics/age.sql
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ SELECT
, ad.admittime
, pa.anchor_age
, pa.anchor_year
, DATETIME_DIFF(ad.admittime, DATETIME(pa.anchor_year, 1, 1, 0, 0, 0), YEAR) + pa.anchor_age AS age
, DATETIME_DIFF(ad.admittime, DATETIME(pa.anchor_year, 1, 1, 0, 0, 0), 'YEAR') + pa.anchor_age AS age
FROM mimiciv_hosp.admissions ad
INNER JOIN mimiciv_hosp.patients pa
ON ad.subject_id = pa.subject_id
Expand Down
4 changes: 2 additions & 2 deletions mimic-iv/concepts/postgres/demographics/icustay_detail.sql
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ SELECT ie.subject_id, ie.hadm_id, ie.stay_id
-- hospital level factors
, adm.admittime, adm.dischtime
, DATETIME_DIFF(adm.dischtime, adm.admittime, 'DAY') as los_hospital
, DATETIME_DIFF(adm.admittime, DATETIME(pat.anchor_year, 1, 1, 0, 0, 0), YEAR) + pat.anchor_age as admission_age
, DATETIME_DIFF(adm.admittime, DATETIME(pat.anchor_year, 1, 1, 0, 0, 0), 'YEAR') + pat.anchor_age as admission_age
, adm.race
, adm.hospital_expire_flag
, DENSE_RANK() OVER (PARTITION BY adm.subject_id ORDER BY adm.admittime) AS hospstay_seq
Expand All @@ -18,7 +18,7 @@ SELECT ie.subject_id, ie.hadm_id, ie.stay_id

-- icu level factors
, ie.intime as icu_intime, ie.outtime as icu_outtime
, ROUND( CAST( DATETIME_DIFF(ie.outtime as numeric),ie.intime, 'HOUR')/24.0, 2) as los_icu
, ROUND( CAST( DATETIME_DIFF(ie.outtime, ie.intime, 'HOUR')/24.0 as numeric), 2) as los_icu
, DENSE_RANK() OVER (PARTITION BY ie.hadm_id ORDER BY ie.intime) AS icustay_seq

-- first ICU stay *for the current hospitalization*
Expand Down
2 changes: 1 addition & 1 deletion mimic-iv/concepts/postgres/firstday/first_day_height.sql
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ WITH ce AS
SELECT
ie.subject_id
, ie.stay_id
, ROUND( CAST( AVG(height) as numeric),2) AS height
, ROUND( CAST( AVG(height) as numeric), 2) AS height
FROM mimiciv_icu.icustays ie
LEFT JOIN mimiciv_derived.height ht
ON ie.stay_id = ht.stay_id
Expand Down
10 changes: 5 additions & 5 deletions mimic-iv/concepts/postgres/measurement/blood_differential.sql
Original file line number Diff line number Diff line change
Expand Up @@ -100,27 +100,27 @@ subject_id, hadm_id, charttime, specimen_id
WHEN basophils_abs IS NULL AND basophils IS NOT NULL AND impute_abs = 1
THEN basophils * wbc / 100
ELSE basophils_abs
END as numeric),4) AS basophils_abs
END as numeric), 4) AS basophils_abs
, ROUND( CAST( CASE
WHEN eosinophils_abs IS NULL AND eosinophils IS NOT NULL AND impute_abs = 1
THEN eosinophils * wbc / 100
ELSE eosinophils_abs
END as numeric),4) AS eosinophils_abs
END as numeric), 4) AS eosinophils_abs
, ROUND( CAST( CASE
WHEN lymphocytes_abs IS NULL AND lymphocytes IS NOT NULL AND impute_abs = 1
THEN lymphocytes * wbc / 100
ELSE lymphocytes_abs
END as numeric),4) AS lymphocytes_abs
END as numeric), 4) AS lymphocytes_abs
, ROUND( CAST( CASE
WHEN monocytes_abs IS NULL AND monocytes IS NOT NULL AND impute_abs = 1
THEN monocytes * wbc / 100
ELSE monocytes_abs
END as numeric),4) AS monocytes_abs
END as numeric), 4) AS monocytes_abs
, ROUND( CAST( CASE
WHEN neutrophils_abs IS NULL AND neutrophils IS NOT NULL AND impute_abs = 1
THEN neutrophils * wbc / 100
ELSE neutrophils_abs
END as numeric),4) AS neutrophils_abs
END as numeric), 4) AS neutrophils_abs

, basophils
, eosinophils
Expand Down
4 changes: 2 additions & 2 deletions mimic-iv/concepts/postgres/measurement/height.sql
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ WITH ht_in AS
SELECT
c.subject_id, c.stay_id, c.charttime
-- Ensure that all heights are in centimeters
, ROUND( CAST( c.valuenum * 2.54 as numeric),2) AS height
, ROUND( CAST( c.valuenum * 2.54 as numeric), 2) AS height
, c.valuenum as height_orig
FROM mimiciv_icu.chartevents c
WHERE c.valuenum IS NOT NULL
Expand All @@ -18,7 +18,7 @@ WITH ht_in AS
SELECT
c.subject_id, c.stay_id, c.charttime
-- Ensure that all heights are in centimeters
, ROUND( CAST( c.valuenum as numeric),2) AS height
, ROUND( CAST( c.valuenum as numeric), 2) AS height
FROM mimiciv_icu.chartevents c
WHERE c.valuenum IS NOT NULL
-- Height cm
Expand Down
4 changes: 2 additions & 2 deletions mimic-iv/concepts/postgres/measurement/oxygen_delivery.sql
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ with ce_stg1 as
, itemid
, value AS o2_device
, ROW_NUMBER() OVER (PARTITION BY subject_id, charttime, itemid ORDER BY value) as rn
FROM mimic_icu.chartevents
FROM mimiciv_icu.chartevents
WHERE itemid = 226732 -- oxygen delivery device(s)
)
, stg AS
Expand Down Expand Up @@ -97,4 +97,4 @@ SELECT
, MAX(CASE WHEN rn = 4 THEN o2_device ELSE NULL END) AS o2_delivery_device_4
FROM stg
GROUP BY subject_id, charttime
;
;
12 changes: 6 additions & 6 deletions mimic-iv/concepts/postgres/measurement/urine_output_rate.sql
Original file line number Diff line number Diff line change
Expand Up @@ -79,13 +79,13 @@ select
, ur.urineoutput_6hr
, ur.urineoutput_12hr
, ur.urineoutput_24hr
, CASE WHEN uo_tm_6hr >= 6 THEN ROUND( CAST( CAST((ur.urineoutput_6hr/wd.weight/uo_tm_6hr) AS NUMERIC) as numeric),4) END AS uo_mlkghr_6hr
, CASE WHEN uo_tm_12hr >= 12 THEN ROUND( CAST( CAST((ur.urineoutput_12hr/wd.weight/uo_tm_12hr) AS NUMERIC) as numeric),4) END AS uo_mlkghr_12hr
, CASE WHEN uo_tm_24hr >= 24 THEN ROUND( CAST( CAST((ur.urineoutput_24hr/wd.weight/uo_tm_24hr) AS NUMERIC) as numeric),4) END AS uo_mlkghr_24hr
, CASE WHEN uo_tm_6hr >= 6 THEN ROUND( CAST( CAST((ur.urineoutput_6hr/wd.weight/uo_tm_6hr) AS NUMERIC) as numeric), 4) END AS uo_mlkghr_6hr
, CASE WHEN uo_tm_12hr >= 12 THEN ROUND( CAST( CAST((ur.urineoutput_12hr/wd.weight/uo_tm_12hr) AS NUMERIC) as numeric), 4) END AS uo_mlkghr_12hr
, CASE WHEN uo_tm_24hr >= 24 THEN ROUND( CAST( CAST((ur.urineoutput_24hr/wd.weight/uo_tm_24hr) AS NUMERIC) as numeric), 4) END AS uo_mlkghr_24hr
-- time of earliest UO measurement that was used to calculate the rate
, ROUND( CAST( uo_tm_6hr as numeric),2) AS uo_tm_6hr
, ROUND( CAST( uo_tm_12hr as numeric),2) AS uo_tm_12hr
, ROUND( CAST( uo_tm_24hr as numeric),2) AS uo_tm_24hr
, ROUND( CAST( uo_tm_6hr as numeric), 2) AS uo_tm_6hr
, ROUND( CAST( uo_tm_12hr as numeric), 2) AS uo_tm_12hr
, ROUND( CAST( uo_tm_24hr as numeric), 2) AS uo_tm_24hr
from ur_stg ur
LEFT JOIN mimiciv_derived.weight_durations wd
ON ur.stay_id = wd.stay_id
Expand Down
2 changes: 1 addition & 1 deletion mimic-iv/concepts/postgres/measurement/vitalsign.sql
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ select
, ROUND( CAST(
AVG(case when itemid in (223761) and valuenum > 70 and valuenum < 120 then (valuenum-32)/1.8 -- converted to degC in valuenum call
when itemid in (223762) and valuenum > 10 and valuenum < 50 then valuenum else null end)
as numeric),2) as temperature
as numeric), 2) as temperature
, MAX(CASE WHEN itemid = 224642 THEN value ELSE NULL END) AS temperature_site
, AVG(case when itemid in (220277) and valuenum > 0 and valuenum <= 100 then valuenum else null end) as spo2
, AVG(case when itemid in (225664,220621,226537) and valuenum > 0 then valuenum else null end) as glucose
Expand Down
12 changes: 12 additions & 0 deletions mimic-iv/concepts/postgres/medication/milrinone.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
-- THIS SCRIPT IS AUTOMATICALLY GENERATED. DO NOT EDIT IT DIRECTLY.
DROP TABLE IF EXISTS milrinone; CREATE TABLE milrinone AS
-- This query extracts dose+durations of milrinone administration
select
stay_id, linkorderid
-- all rows in mcg/kg/min
, rate as vaso_rate
, amount as vaso_amount
, starttime
, endtime
from mimiciv_icu.inputevents
where itemid = 221986 -- milrinone
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
-- THIS SCRIPT IS AUTOMATICALLY GENERATED. DO NOT EDIT IT DIRECTLY.
DROP TABLE IF EXISTS norepinephrine_equivalent_dose; CREATE TABLE norepinephrine_equivalent_dose AS
-- This query calculates norepinephrine equivalent dose for vasopressors.
-- Based on "Vasopressor dose equivalence: A scoping review and suggested formula"
-- by Goradia et al. 2020.
SELECT stay_id, starttime, endtime
-- calculate the dose
, ROUND( CAST( COALESCE(norepinephrine, 0)
+ COALESCE(epinephrine, 0)
+ COALESCE(phenylephrine/10, 0)
+ COALESCE(dopamine/100, 0)
-- + metaraminol/8 -- metaraminol not used in BIDMC
+ COALESCE(vasopressin*2.5, 0)
-- angotensin_ii*10 -- angitensin ii rarely used, currently not incorporated
-- (it could be included due to norepinephrine sparing effects)
as numeric), 4) AS norepinephrine_equivalent_dose
-- angotensin_ii*10 -- angitensin ii rarely used, currently not incorporated
-- (it could be included due to norepinephrine sparing effects)
FROM mimiciv_derived.vasoactive_agent
WHERE norepinephrine IS NOT NULL
OR epinephrine IS NOT NULL
OR phenylephrine IS NOT NULL
OR dopamine IS NOT NULL
OR vasopressin IS NOT NULL;
Loading