test/test.sh
ICEES API allows define custom schema and identifiers. The schema is stored at config/features.yml
. The identifiers are stored at config/identifiers.yml
. Edit them to fit your dataset.
ICEES API has the following assumptions:
- Each table named
<table>
should have a column named<Table>Id
as the identifier where<Table>
is<table>
capitalized. For example, for tablepatient
, the id column isPatientId
. - Each table has a column named
year
.
These columns do not need to be specified in features.yml
.
To insert data into the database when it is first created, for each table in the schema, create a directory with its table name under <data path>
, which is set by DATA_PATH
in .env
, and put csv files under that directory. The csv files should have the same headers as the table. For example, put patient.csv
under <data path>/patient
and put visit.csv
under <data path>/visit
.
To generate random samples, run
python samples.py patient 2010 1000 db/data/patient/patient.csv
python samples.py visit 2010 1000 db/data/visit/visit.csv
The .env
file contains environmental variables that control the services. Edit it to fit your application.
ICEES_PORT
: the database port in the container
ICEES_HOST
: the database host in the container
ICEES_API_LOG_PATH
: the path where logs are stored on the host
ICEES_API_HOST_PORT
: the port where icees api is listening to on the host
OPENAPI_TITLE
: the title for the OpenAPI schema (default "ICEES API")
OPENAPI_HOST
: the host where icees api is deployed
OPENAPI_SCHEME
: the protocol where icees api is deployed
OPENAPI_SERVER_MATURITY
: The server maturity (ie 'development' or 'production')
DB_PATH
: the path to the SQLite database file on the host
CONFIG_PATH
: the directory where schema and identifiers are stored
ICEES_API_INSTANCE_NAME
: icees api instance name
ICEES_INFORES_CURIE
: ICEES instance identifier (see https://docs.google.com/spreadsheets/d/1Ak1hRqlTLr1qa-7O0s5bqeTHukj9gSLQML1-lg6xIHM)
run
docker-compose up --build -d
docker build . -t icees-api:0.4.0
A feature qualifier limits values of a feature
<operator> ::= <
| >
| <=
| >=
| =
| <>`
<feature_qualifier> ::= {"operator":<operator>, "value":<value>}
| {"operator":"in", "values":[<value>, ..., <value>]}
| {"operator":"between", "value_a":<value>, "value_b":<value>}
There are two ways to specify a feature or a set of features, using a list or a dict. We show the schema for the former first, then show the schema for the latter.
<feature> ::= {
"feature_name": "<feature name>",
"feature_qualifier": <feature_qualifier>
[,"year": <year>]
}
where
feature name
: see config/features.yml
year
is optional. When year
is specified, it uses features from that year, otherwise it gets the year from context
Example:
{
"feature_name": "AgeStudyStart",
"feature_qualifier": {
"operator": "=",
"value": "0-2"
}
}
<features> ::= [<feature>, ..., <feature>]
Example:
[{
"feature_name": "AgeStudyStart",
"feature_qualifier": {
"operator": "=",
"value": "0-2"
}
}, {
"feature_name": "ObesityBMI",
"feature_qualifier": {
"operator": "=",
"value": 0
}
}]
In the apis that allow aggregation of bins, we can specify multiple feature qualifiers for each feature.
<feature2> ::= {
"feature_name": "<feature name>",
"feature_qualifiers": [<feature_qualifiere>, ..., <feature_qualifier>]
[,"year": <year>]
}
Example:
{
"feature_name": "AgeStudyStart",
"feature_qualifiers": [
{
"operator":"=",
"value":"0-2"
}, {
"operator":"between",
"value_a":"3-17",
"value_b":"18-34"
}, {
"operator":"in",
"values":["35-50","51-69"]
}, {
"operator":"=",
"value":"70+"
}
]
}
Similarly for a set of features
<features2> ::= [<feature2>, ..., <feature2>]
Example:
[{
"feature_name": "AgeStudyStart",
"feature_qualifiers": [
{
"operator":"=",
"value":"0-2"
}, {
"operator":"between",
"value_a":"3-17",
"value_b":"18-34"
}, {
"operator":"in",
"values":["35-50","51-69"]
},{
"operator":"=",
"value":"70+"
}
]
}, {
"feature_name": "EstResidentialDensity",
"feature_qualifiers": [
{
"operator": "<",
"value": 1
}
]
}]
in
and between
are currently only supported in <feature2>
.
Now, we turn to define a feature or a feature set using a dict.
<feature> ::= {"<feature name>": <feature_qualifier>}
<features> ::= {"<feature name>": <feature_qualifier>, ..., "<feature name>": <feature_qualifier>}
<feature2> ::= {"<feature name>": [<feature_qualifier>, ..., <feature_qualifier>]}
<features2> ::= {"<feature name>": [<feature_qualifier>, ..., <feature_qualifier>], ..., "<feature name>": [<feature_qualifier>, ..., <feature_qualifier>]}
method
POST
route
/(patient|visit)/(2010|2011|2012|2013|2014|2015|2016)/cohort
schema
<features>
method
GET
route
/(patient|visit)/(2010|2011|2012|2013|2014|2015|2016)/cohort/<cohort id>
method
GET
route
/(patient|visit)/(2010|2011|2012|2013|2014|2015|2016)/cohort/<cohort id>/features
method
GET
route
/(patient|visit)/(2010|2011|2012|2013|2014|2015|2016)/cohort/dictionary
method
POST
route
/(patient|visit)/(2010|2011|2012|2013|2014|2015|2016)/cohort/<cohort id>/feature_association
schema
{"feature_a":<feature>,"feauture_b":<feature>}
method
POST
route
/(patient|visit)/(2010|2011|2012|2013|2014|2015|2016)/cohort/<cohort id>/feature_association2
schema
{"feature_a":<feature2>,"feature_b":<feature2>[,"check_coverage_is_full":<boolean>]}
example
{
"feature_a": {
"feature_name": "AgeStudyStart",
"feature_qualifiers": [
{
"operator":"=",
"value":"0-2"
}, {
"operator":"between",
"value_a":"3-17",
"value_b":"18-34"
}, {
"operator":"in",
"values":["35-50","51-69"]
},{
"operator":"=",
"value":"70+"
}
]
},
"feature_b": {
"feature_name": "ObesityBMI",
"feature_qualifiers": [
{
"operator":"=",
"value":0
}, {
"operator":"<>",
"value":0
}
]
}
}
method
POST
route
/(patient|visit)/(2010|2011|2012|2013|2014|2015|2016)/cohort/<cohort id>/associations_to_all_features
schema
{
"feature": <feature>,
"maximum_p_value": <maximum p value>,
"correction": {
"method": <correction method>
[,"alpha": <correction alpha>]
}
}
where correction
is optional, alpha
is optional. method
and alpha
are specified here: https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html
method
POST
route
/(patient|visit)/(2010|2011|2012|2013|2014|2015|2016)/cohort/<cohort id>/associations_to_all_features2
schema
{
"feature": <feature>,
"maximum_p_value": <maximum p value>
[,"check_coverage_is_full": <boolean>],
"correction": {
"method": <correction method>
[,"alpha": <correction alpha>]
}
}
where correction
is optional, alpha
is optional. method
and alpha
are specified here: https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html
example
{
"feature":{
"AgeStudyStart":[
{
"operator":"=",
"value":"0-2"
}, {
"operator":"between",
"value_a":"3-17",
"value_b":"18-34"
}, {
"operator":"in",
"values":["35-50","51-69"]
},{
"operator":"=",
"value":"70+"
}
]
},
"maximum_p_value": 0.1
}
method
POST
route
/knowledge_graph?reasoner=&verbose=
input parameters:
query_options
table
: ICEES tableyear
: ICEES yearcohort_features
: features for defining the cohortfeature
: a feature and operator and value for spliting the cohort to two subcohortsmaximum_p_value
: ICEES maximum p value. The p value is calculated for each ICEES feature intable
, using 2 * n contingency table where the rows are subcohorts and the columns are individual values of that feature. Any feature with p value greater than maximum p value is filtered out.regex
: filter target node name by regex.
if reasoner
is specified, then it returns a Reason API response.
example
{
"query_options": {
"table": "patient",
"year": 2010,
"cohort_features": {
"AgeStudyStart": {
"operator": "=",
"value": "0-2"
}
},
"feature": {
"EstResidentialDensity": {
"operator": "<",
"value": 1
}
},
"maximum_p_value":1
},
"message": {
"query_graph": {
"nodes": {
"n00": {
"categories": ["biolink:PopulationOfIndividualOrganisms"]
},
"n01": {
"categories": ["biolink:ChemicalSubstance"]
}
},
"edges": {
"e00": {
"predicates": ["biolink:correlated_with"],
"subject": "n00",
"object": "n01"
}
}
}
}
}
method
POST
route
/knowledge_graph_overlay?reasoner=&verbose=
input parameters:
<query_options> ::= {
"table": <string>,
"year": <integer>,
"cohort_features": <features>
}
| {
"cohort_id": <string>
}
{
"query_options": <query_options>,
"message": {
"knowledge_graph": <knowledge_graph>
}
}
method
POST
route
/query?reasoner=&verbose=
if reasoner
is specified, then it returns a Reason API response.
input parameters:
{
"query_options": <query_options>,
"message": {
"query_graph": <query_graph>
}
}
get cohort of all patients
curl -k -XPOST https://localhost:8080/patient/2010/cohort -H "Content-Type: application/json" -H "Accept: application/json" -d '{}'
get cohort of all patients active in a year
curl -k -XPOST https://localhost:8080/patient/2010/cohort -H "Content-Type: application/json" -H "Accept: application/json" -d '[{
"feature_name": "Active_In_Year",
"feature_qualifier": {
"operator": "=",
"value": 1
}
}]'
get cohort of patients with AgeStudyStart = 0-2
curl -k -XPOST https://localhost:8080/patient/2010/cohort -H "Content-Type: application/json" -H "Accept: application/json" -d '[{
"feature_name": "AgeStudyStart",
"feature_qualifier": {
"operator":"=",
"value":"0-2"
}
}]'
Assuming we have cohort id COHORT:10
get definition of cohort
curl -k -XGET https://localhost:8080/patient/2010/cohort/COHORT:10 -H "Accept: application/json"
get features of cohort
curl -k -XGET https://localhost:8080/patient/2010/cohort/COHORT:10/features -H "Accept: application/json"
get cohort dictionary
curl -k -XGET https://localhost:8080/patient/2010/cohort/COHORT:10/features -H "Accept: application/json"
get feature association
curl -k -XPOST https://localhost:8080/patient/2010/cohort/COHORT:10/feature_association -H "Content-Type: application/json" -d '{
"feature_a": {
"feature_name": "AgeStudyStart",
"feature_qualifier: {"operator":"=", "value":"0-2"}
},
"feature_b": {
"feature_name": "ObesityBMI",
"feature_qualifier": {"operator":"=", "value":0}
}
}'
get association to all features
curl -k -XPOST https://localhost:8080/patient/2010/cohort/COHORT:10/associations_to_all_features -H "Content-Type: application/json" -d '{
"feature": {
"feature_name": "AgeStudyStart",
"feature_qualifier": {"operator":"=", "value":"0-2"}
},
"maximum_p_value":0.1
}' -H "Accept: application/json"
knowledge graph
curl -X POST -k "http://localhost:5000/knowledge_graph" -H "accept: application/json" -H "Content-Type: application/json" -d '
{
"query_options": {
"table": "patient",
"year": 2010,
"cohort_features": {
"AgeStudyStart": {
"operator": "=",
"value": "0-2"
}
},
"feature": {
"EstResidentialDensity": {
"operator": "<",
"value": 1
}
},
"maximum_p_value":1
},
"message": {
"query_graph": {
"nodes": {
"n00": {
"categories": ["biolink:PopulationOfIndividualOrganisms"]
},
"n01": {
"categories": ["biolink:ChemicalSubstance"]
}
},
"edges": {
"e00": {
"predicates": ["biolink:correlated_with"],
"subject": "n00",
"object": "n01"
}
}
}
}
}
'
knowledge graph schema
curl -X GET -k "http://localhost:5000/knowledge_graph/schema" -H "accept: application/json"
The qc tool is under the qctool
directory. The following commands are run in the qctool
directory
pip install -r requirements.txt
Example:
python src/qc.py \
--a_type features \
--a ../config/all_features.yaml \
--b_type mapping \
--b ../config/FHIR_mappings.yml \
--update_a ../config/all_features_update.yaml \
--update_b ../config/FHIR_mappings.yml \
--number_entries 10 \
--similarity_threshold 0.5 \
--table patient visit \
--ignore_suffix Table _flag_first _flag_last
Usage:
python src/qc.py --help