Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mimic-iv/concepts: fix postgres-make-concepts and minor updates #1363

Merged

Conversation

schu
Copy link
Contributor

@schu schu commented Aug 11, 2022

Please see the individual commits for context.

Currently, `REGEX_DATETIME_DIFF` doesn't match for all occurrences of
`DATETIME_DIFF` and `postgres-make-concepts.sql` errors out. For example:

```
psql:demographics/age.sql:30: ERROR:  column "year" does not exist
LINE 22: ...mittime, DATETIME(pa.anchor_year, 1, 1, 0, 0, 0), YEAR) + pa..
```

Also, `PERL_REGEX_ROUND` doesn't correctly match `ROUND(...)` with
nested functions, for example with a nested `DATETIME_DIFF`:

```
psql:demographics/icustay_detail.sql:33: ERROR:  syntax error at or near "as"
LINE 17: , ROUND( CAST( DATETIME_DIFF(ie.outtime as numeric),ie.intim...
```

Update both regular expressions accordingly.
Since 8ed4060 the relation name prefix
is `mimiciv_`.
The following auto generated files for PostgreSQL were not added to the
repo yet:

postgres/medication/milrinone.sql
postgres/medication/norepinephrine_equivalent_dose.sql
postgres/medication/vasoactive_agent.sql
Some concepts have a dependency on others, for example
`norepinephrine_equivalent_dose.sql` expects
`mimiciv_derived.vasoactive_agent` to exist.

Fixup for 43a4245
export REGEX_DATETIME_TRUNC="s/DATETIME_TRUNC\(([^,]+), ?(DAY|MINUTE|SECOND|HOUR|YEAR)\)/DATE_TRUNC('\2', \1)/g"
# Add necessary quotes to INTERVAL, e.g. "INTERVAL 5 hour" to "INTERVAL '5' hour"
export REGEX_INTERVAL="s/interval ([[:digit:]]+) (hour|day|month|year)/INTERVAL '\1' \2/gI"
# Add numeric cast to ROUND(), e.g. "ROUND(1.234, 2)" to "ROUND( CAST(1.234 as numeric), 2)".
export PERL_REGEX_ROUND='s/ROUND\(((.|\n)*?)\, /ROUND\( CAST\( \1 as numeric\)\,/g'
export PERL_REGEX_ROUND='s/ROUND\(((.|\n)*?)(\, [24]\))/ROUND\( CAST\( \1 as numeric\)\3/g'
Copy link
Contributor Author

@schu schu Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex is rather fragile.

Would it be possible to always ROUND(CAST(... as numeric), ...) in the BigQuery SQL code to make the transformation for postgres unnecessary? For example

, CASE WHEN uo_tm_6hr >= 6 THEN ROUND(CAST((ur.urineoutput_6hr/wd.weight/uo_tm_6hr) AS NUMERIC), 4) END AS uo_mlkghr_6hr
does CAST. In the postgres code this leads to ROUND(CAST(CAST(... as numeric) as numeric), 4).

Ideas welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah these regexes are in general terrifying.... since we're starting to come up with non-standard rules for the SQL, we should probably put them somewhere. First thought is the style guide might be a reasonable place (albeit not perfect).

I think the change you're suggesting makes sense, and I'd be happy to get rid of the perl regex, so I'll investigate by bringing this into a branch (tests won't run successfully on external PRs).

@alistairewj
Copy link
Member

Thanks for the changes (and the other ones as well!). Which OS are you running this on? For the regexes, they are quite brittle as I'd like to support two common flavours of shell:

  • BSD sed and bash 3.x (shipped on Mac OS X)
  • GNU sed and bash 4.x+ (shipped on Ubuntu)

Trying to allow for this compatibility eliminates the possibility of using some common syntax.

@schu
Copy link
Contributor Author

schu commented Aug 15, 2022

Which OS are you running this on?

Fedora 36 with Postgres 14.3 (the latest version in dnf).

@schu
Copy link
Contributor Author

schu commented Sep 29, 2022

Hey, any news? :)

I'd be happy to get rid of the perl regex

If you want, I can test and push an alternative patch and drop the regex.

@alistairewj
Copy link
Member

I got way too distracted by other things and the MySQL container build - but yes especially with people running into issues (e.g. #1393) this should definitely be fixed... making some changes now

@alistairewj
Copy link
Member

OK this looks to work well - the checks will fail since they're running on a fork which is unprivileged.. it's a TODO of mine to make PRs trigger a privileged workflow rather than trying to run the tests themselves.

I tested it locally and built all the tables successfully (BSD sed over here). @schu can you give convert_bigquery_to_postgres.sh and postgres-make-concepts.sql a run with the latest commits pulled down? If it works for you, then we should be good to merge.

@alistairewj
Copy link
Member

I'll merge this as I'll test the functionality with gh actions

@alistairewj alistairewj merged commit 5d0e11d into MIT-LCP:main Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants