Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake numeric type issue with Ares Exporter - Achilles::exportToAres - for CDM version 5.4 #366

Open
heschmidt04 opened this issue Mar 15, 2024 · 1 comment

Comments

@heschmidt04
Copy link

Describe the bug
Ares Exporter fails on numeric type because temp table in memory has autosensing data type conversion issue. A search on github Ares issues didn't yield anything for me if this has happened before in export.

To Reproduce
Steps to reproduce the behavior:

  1. Run Achilles generated SQL in Snowflake with sqlonly = T to create Achilles tables
  2. Run DQD in RStudio - authentication via external browser to Snowflake
  3. Run Ares Exporter in RStudio - authentication via external browser to Snowflake
  4. Run modified SQL from RStudio Ares Exporter error errorReportSql.txt in Snowflake

Expected behavior
Ares Exporter to write files to output folder.
Is there a possible work around for this part?

Screenshots
DBMS: snowflake

Error: net.snowflake.client.jdbc.SnowflakeSQLException: Numeric value '' is not recognized

Stackoverflow: https://stackoverflow.com/questions/70176093/numeric-value-is-not-recognized

-- SQL modified to debug 
CREATE TABLE AO_export_error AS 
select    t1.table_name as SERIES_NAME
        , t1.stratum_1 as X_CALENDAR_MONTH
        , round(1.0*t1.count_value/denom.count_value,5) as Y_RECORD_COUNT
from
(
        select 'Visit occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 220 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 420 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Death' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 502 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Procedure occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 620 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug exposure' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 720 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 820 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 920 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1020 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation period' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 111 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Measurement' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1820 GROUP BY analysis_id, stratum_1, count_value
) t1
inner join
(select CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 117 GROUP BY analysis_id, stratum_1, count_value) denom -- WORKS  
on t1.stratum_1 = denom.stratum_1
ORDER BY SERIES_NAME, t1.stratum_1

-- Debugging work 
CREATE OR REPLACE TABLE AO_export_denom AS
SELECT 
    CAST(stratum_1 as bigint) stratum_1
    , count_value 
FROM achilles_results 
WHERE analysis_id = 117 
GROUP BY analysis_id
        , stratum_1
        , count_value

-- Debugging work 
CREATE OR REPLACE TABLE AO_export_t1 AS 
        select 'Visit occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 220 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 420 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Death' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 502 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Procedure occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 620 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug exposure' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 720 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 820 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 920 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1020 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation period' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 111 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Measurement' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1820 GROUP BY analysis_id, stratum_1, count_value
-- Putting debug together 

SELECT   
      t1.table_name as SERIES_NAME
    , t1.stratum_1 as X_CALENDAR_MONTH
    , round(1.0*t1.count_value/denom.count_value,5) as Y_RECORD_COUNT
FROM AO_export_t1 AS t1 
JOIN AO_export_denom AS denom 
  ON t1.stratum_1 = denom.stratum_1
ORDER BY SERIES_NAME, t1.stratum_1

The work around works because the datatype is auto sensed

-- AO_EXPORT_DENOM

name	type	kind
STRATUM_1	NUMBER(38,0)	COLUMN
COUNT_VALUE	NUMBER(38,0)	COLUMN

-- AO_EXPORT_T1

name	type	kind
TABLE_NAME	VARCHAR(20)	COLUMN
STRATUM_1	NUMBER(38,0)	COLUMN
COUNT_VALUE	NUMBER(38,0)	COLUMN

Desktop (please complete the following information):
R version: R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20

Attached base packages:

  • stats
  • graphics
  • grDevices
  • utils
  • datasets
  • methods
  • base

Other attached packages:

  • AresIndexer (0.2.0)
  • DataQualityDashboard (2.5.0)
  • Achilles (1.7.2)
  • DatabaseConnector (6.3.2)

Additional context
Files that were created from the process are these. No error.txt file in the errors directory.

Mar 15 17:06 dq-result_camel.json
Mar 15 17:06 log_DqDashboard_Snowflake-MGB-OMOP.txt -- binary file??? Not sure why.
Mar 15 17:06 dq-result.json
Mar 15 17:06 datadensity-total.csv
Mar 15 17:06 records-by-domain.csv

@thejasbh
Copy link

I encountered the same issue when exporting data to Ares. After some investigation, I discovered that using the TRY_CAST function in Snowflake was an effective solution. By implementing TRY_CAST across all relevant scripts in the Achilles::exportToAres function, I was able to resolve the issue and ensure smooth data exports to Ares.

However, it's important to note that while this solution works well for Snowflake, it may not be suitable for other databases that do not support the TRY_CAST function. This can lead to compatibility issues across different database systems. Therefore, to make the scripts work universally across all databases, a more complex fix might be necessary.

Example of the modification:

Original script :

CREATE TABLE AO_export_error AS 
select    t1.table_name as SERIES_NAME
        , t1.stratum_1 as X_CALENDAR_MONTH
        , round(1.0*t1.count_value/denom.count_value,5) as Y_RECORD_COUNT
from
(
        select 'Visit occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 220 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 420 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Death' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 502 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Procedure occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 620 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug exposure' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 720 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 820 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 920 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1020 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation period' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 111 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Measurement' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1820 GROUP BY analysis_id, stratum_1, count_value
) t1
inner join
(select CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 117 GROUP BY analysis_id, stratum_1, count_value) denom
on t1.stratum_1 = denom.stratum_1
ORDER BY SERIES_NAME, t1.stratum_1

Script with TRY_CAST:

CREATE TABLE AO_export_error AS 
select    t1.table_name as SERIES_NAME
        , t1.stratum_1 as X_CALENDAR_MONTH
        , round(1.0*t1.count_value/denom.count_value,5) as Y_RECORD_COUNT
from
(
        select 'Visit occurrence' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 220 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition occurrence' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 420 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Death' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 502 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Procedure occurrence' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 620 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug exposure' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 720 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 820 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug era' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 920 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition era' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1020 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation period' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 111 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Measurement' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1820 GROUP BY analysis_id, stratum_1, count_value
) t1
inner join
(select CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 117 GROUP BY analysis_id, stratum_1, count_value) denom   
on t1.stratum_1 = denom.stratum_1
ORDER BY SERIES_NAME, t1.stratum_1

Note: The reason TRY_CAST works in this context is because, as mentioned by @heschmidt04, Snowflake attempts to auto-detect data types in tables. The TRY_CAST function includes built-in error handling, allowing the script to continue executing even if type conversion fails, unlike CAST which may halt execution on errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants