v0.3.0 deployment #57

Merged: 16 commits, Jan 31, 2024
2 changes: 1 addition & 1 deletion .github/workflows/vlmd_validation.yaml
@@ -26,7 +26,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install pytest jsonschema frictionless
pip install -r requirements.txt
- name: Test with pytest
run: |
pytest
2 changes: 1 addition & 1 deletion VERSIONS.json
@@ -1,4 +1,4 @@
{
"slmd":"1.0.0",
"vlmd":"0.2.0"
"vlmd":"0.3.0"
}
5 changes: 2 additions & 3 deletions requirements.txt
@@ -1,5 +1,4 @@
git+https://github.com/norc-heal/json-schema-for-humans.git@develop
frictionless
jsonschema
pytest
jinja2
jinja2
pandas
62 changes: 32 additions & 30 deletions variable-level-metadata-schema/README.md
@@ -7,28 +7,29 @@ This metadata directory contains the specifications for variable level metadata
❗ Look here for schema specifications.

### json data dictionary format specification
1. `schemas/jsonschema/data-dictionary.json`: The "json" json data dictionary schema (ie json template schema)

1. `schemas/data-dictionary.json`: The "json" json data dictionary schema (ie json template schema)
- Intended to specify the data dictionary representation of json objects available in the HEAL platform metadata-service.
- See here for the markdown rendered version --> [`docs/md-rendered-schemas/jsonschema-jsontemplate-data-dictionary.md`](docs/md-rendered-schemas/jsonschema-jsontemplate-data-dictionary.md)
- See here for the markdown rendered version --> [`docs/jsontemplate-data-dictionary.md`](docs/jsontemplate-data-dictionary.md)

### csv field format specifications
- See here for the markdown rendered version --> [`docs/md-rendered-schemas/jsonschema-jsontemplate-data-dictionary.md`](docs/md-rendered-schemas/jsonschema-csvtemplate-fields.md)

2. `schemas/csvtemplate/fields.json`: The "csv" json schema (ie csv template schema)

- See here for the markdown rendered version --> [`docs/csvtemplate-fields.md`](docs/csvtemplate-fields.md)


2. `schemas/frictionless/fields.json` Table schema (previously known as "frictionless") standard specification
- This json file is intended to represent csv data dictionary documents following the [Table Schema specification](https://specs.frictionlessdata.io/table-schema/).
- Csv version is intended to make data dictionary creation and discovery available in a more familiar/human readable format,
- The representation of data dictionary field values in a csv file. It's used to facilitate documentation of data dictionary csv
files in addition to input validation.
3. `schemas/jsontemplate/fields.json`: The "csv" json schema (ie csv template schema)
- :warning: The "csv" json schema is intended to be an intermediate specification used for documentation and in translation workflows to the json schema template. As fully specifying a tabular file (for example missing value specification) is out of scope here (see the table schema representation in (2))
- The csv version is intended to make data dictionary creation and discovery available in a more familiar, human readable format.
- It represents data dictionary field values in a csv file and is used to facilitate documentation of data dictionary csv files in addition to input validation.

- :warning: The "csv" json schema is intended to be an intermediate specification used for documentation and in translation workflows to the json data dictionary. Fully specifying a tabular file (for example, missing value specification) is out of scope here; see the table schema representation in (2).


## Document flow chart

```mermaid

%%{init: {"flowchart": {"defaultRenderer": "elk","htmlLabels": false}} }%%

flowchart TD

subgraph dictionary[Dictionary YAML files]
@@ -40,22 +41,21 @@ This metadata directory contains the specifications for variable level metadata

subgraph Schema specifications

jsonspec["schema/jsontemplate/data-dictionary.json"]
csvspec["schema/jsontemplate/csvtemplate/fields.json"]
csvtblspec["schema/frictionless/csvtemplate/fields.json"]
jsonspec["schema/data-dictionary.json"]
csvspec["schema/csvtemplate/fields.json"]
end

subgraph "Rendered schema documentation \n(html also available)"
subgraph "Rendered schema documentation"

csvmd["/docs/\nmd-rendered-schemas/\njsonschema-csvtemplate-fields.md"]
jsonmd["/docs/\nmd-rendered-schemas/\njsonschema-jsontemplate-data-dictionary.md"]
csvmd["/docs/csvtemplate-fields.md"]
jsonmd["/docs/jsontemplate-data-dictionary.md"]

end

defs --> fields --> dd
defs --> dd

fields --> csvspec --> csvtblspec
fields --> csvspec
dd --> jsonspec

csvspec --> csvmd
@@ -68,9 +68,8 @@ This metadata directory contains the specifications for variable level metadata
- `docs`:
See the rendered human readable schemas
in a markdown format and an interactive html format.
- `schemas/jsonschema`: contains the final and full specification for schemas following json schema.
- `schemas/frictionless`: contains schemas following the frictionless table schema specifications. See [here](https://specs.frictionlessdata.io/table-schema/) for the specification.
- `schemas/dictionary`: the yaml files used to generate json schemas and documentation with build.py.
- `schemas/*.json`: contains the final and full specification for schemas following json schema.
- `schemas/dictionary/*.yaml`: the yaml files used to generate json schemas and documentation with build.py.
- `templates`: empty templates in csv spreadsheet format and JSON format.
- `examples`: examples of filled out templates in csv spreadsheet format and JSON format.
- `build.py`: This script compiles the yaml files and generates associated schemas in addition to the human rendered schema
@@ -104,8 +103,8 @@ Given csv field values can only be scalar values with records separated by a new
- if type `object` in `items`: flattened to the children property or properties
- if type is a scalar (`string`,`integer`,`number`) in `items`,
translated to type `string` with pattern `^(?:[^|]+\||[^|]*)(?:[^|]*\|)*[^|]*$` to indicate a string containing a pipe delimiter (i.e., a stringified array with a pipe delimiter)
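For example, under the second rule above, a json-template property declared as an array of strings (the property name `categories` is purely illustrative, not an excerpt from the actual schemas):

```json
{ "..more props..":"...",
  "categories": {
    "type": "array",
    "items": {"type": "string"}
  }
}
```

becomes, in the csv template, a string constrained to the pipe-delimited pattern:

```json
{ "..more props..":"...",
  "categories": {
    "type": "string",
    "pattern": "^(?:[^|]+\\||[^|]*)(?:[^|]*\\|)*[^|]*$"
  }
}
```

so a csv cell value such as `red|green|blue` stands in for the json array `["red","green","blue"]`.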
### `property` name conversion rules
To facilitate the mapping of json spec property names to csv property names, the resulting flattened `property` names from the flattened properties should correspond to the [jsonpath](https://datatracker.ietf.org/doc/id/draft-goessner-dispatch-jsonpath-00.html) representation where:
### `property` name conversion rules (ie Representing nested arrays and objects in csv documents)
To facilitate the mapping of json spec property names to csv property names, the flattened `property` names should correspond to the [jsonpath](https://datatracker.ietf.org/doc/id/draft-goessner-dispatch-jsonpath-00.html) representation, expressed as a `patternProperty`:

1. type `object`

@@ -152,13 +151,15 @@ To facilitate the mapping of json spec property names to csv property names, th
}}}

```
translates to the csv stringified type array property:
translates to the csv stringified type array `patternProperty`:

```json
{ "..more props..":"...",
"standardsMappings[0].instrument.url": {
"type": "string",
"format": "uri"
"patternProperties":{
"^standardsMappings[\\d+].instrument.url$": {
"type": "string",
"format": "uri"
}
}
}
```
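In other words, the single `patternProperty` above is intended to cover however many indexed columns appear in a given csv: `standardsMappings[0].instrument.url`, `standardsMappings[1].instrument.url`, and so on, with one column per array entry.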
@@ -167,7 +168,7 @@

1. Currently, no complex types (`anyOf`,`oneOf`) are supported and the `type` MUST be specified. This is to ensure coverage for all csv to json translation use cases.
- Each json specification schema property type must be a scalar (e.g., `boolean`,`string`,`integer`,`number`), an `array`, or an `object`
- Each csv specification schema property type must be a scalar (e.g., `boolean`,`string`,`integer`,`number`)
- Each csv specification schema property type must be a scalar (e.g., `boolean`,`string`,`integer`,`number`) but see note on stringified arrays and objects.
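As an illustrative sketch of the first rule (property names are hypothetical, not from the HEAL schemas): a property such as `age` below, with an explicit scalar type, is supported, while a property such as `value`, declared only through `anyOf` and without a `type`, is not:

```json
{
  "age": { "type": "integer" },
  "value": { "anyOf": [ { "type": "integer" }, { "type": "string" } ] }
}
```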

### csv to json and json to csv translations

@@ -207,6 +208,7 @@ a core HEAL property. To allow these properties to be included, we list these pr

One consideration, however, is that `propertyNames` was introduced in json schema draft-6.
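As a minimal sketch of the idea (illustrative names only, not the actual HEAL schema), the subschema under `propertyNames` is applied to every property *name*, so additional, non-core columns can be admitted as long as their names match an allowed pattern:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" }
  },
  "propertyNames": {
    "pattern": "^(name|custom_.*)$"
  }
}
```

With this sketch, an instance containing only `name` and, say, `custom_notes` validates, while any other property name fails validation.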


## Considerations

Please use github issues for any additional considerations. See additional comments above.
Please use github issues for any additional considerations. See additional comments above.
142 changes: 43 additions & 99 deletions variable-level-metadata-schema/build.py
@@ -11,7 +11,6 @@
from collections.abc import MutableMapping, MutableSequence, MutableSet,Sequence
from functools import reduce
import jsonschema
from json_schema_for_humans.generate import generate_from_filename
import jinja2
import json

@@ -109,7 +108,7 @@ def to_csv_properties(schema,**additional_props):

return csv_schema

def flatten_properties(properties, parentkey="", sep=".",itemsep="[0]"):
def flatten_properties(properties, parentkey="", sep=".",itemsep="\[\d+\]"):
"""
flatten schema properties
"""
@@ -141,75 +140,32 @@ def flatten_properties(properties, parentkey="", sep=".",itemsep="\[\d+\]"):

def flatten_schema(schema):
schema_flattened = dict(schema)
properties = schema.get("properties")
if properties:
schema_flattened["properties"] = flatten_properties(properties)
return schema_flattened

def _to_frictionless_field(propname, prop, schema):
get_anyof = lambda propname: [
_prop.get(propname) for _prop in prop.get("oneOf", [])
]

# anyof is convenient way to reference multiple enum lists of same type
anyof = {
"type": [t for t in get_anyof("type") if t],
"enum": [val for enumlist in get_anyof("enum") for val in enumlist],
}
jsonfields = {
"name": propname,
"description": prop.get("description"),
"title": prop.get("title"),
"examples": prop.get("examples"),
"type": list(set(anyof.get("type", []) + [p for p in [prop.get("type")] if p])),
"enum": list(set(anyof.get("enum", []) + prop.get("enum", []))),
"pattern": prop.get("pattern"),
}
# add required
if propname in schema.get("required", []):
jsonfields["required"] = True

constraintfields = ["enum", "pattern", "required"]
targetfield = {}

for propname, prop in jsonfields.items():
if propname == "type":
targetfield[propname] = prop[0] if len(prop) == 1 else "any"
elif propname in constraintfields and prop:
if targetfield.get("constraints"):
targetfield["constraints"][propname] = prop
else:
targetfield["constraints"] = {propname: prop}
elif prop:
targetfield[propname] = prop

return targetfield


def to_frictionless(schema):
assert schema["type"] == "object"
assert "properties" in schema

frictionless_schema = {}

# schema level annotations
for propname in ["description", "title", "name", "examples"]:
if schema.get(propname):
frictionless_schema[propname] = schema[propname]

# get fields subschema
fields = schema["properties"]
frictionless_fields = []
for name, field in fields.items():
assert isinstance(field, MutableMapping), "all field properties must be jsons"
frictionless_fields.append(_to_frictionless_field(name, field, schema))

frictionless_schema["fields"] = frictionless_fields
frictionless_schema["missingValues"] = [
""
] # TODO: have a way to specify if anyOf is a missing val
return frictionless_schema
if "properties" in schema:
properties = schema_flattened.pop("properties")
item_sep = "\[\d+\]"
schema_flattened["properties"] = flatten_properties(properties,itemsep=item_sep)
schema_flattened["patternProperties"] = {}
for propname in list(schema_flattened["properties"].keys()):
if item_sep in propname:
var0 = propname.replace(item_sep,"[0]")
var1 = propname.replace(item_sep,"[1]")
var2 = propname.replace(item_sep,"[2]")
pattern_property_note = (
"\n\n"
"Specifying field names:\n\n"
"This field can have 1 or more columns using the digit index number in brackets (`[0]` --> `[1]` --> `[n]`)\n\n"
"For 1 value, you will have the field (column) names:\n"
"`{0}`\n\n"
# "\tFor 2 values, you will have the columns: "
# "`{0},`{1}`\n"
"For 3 values, you will have the field (column) names:\n"
"`{0}`\t`{1}`\t`{2}`\n\n"
).format(var0,var1,var2)
pattern_prop = schema_flattened["properties"].pop(propname)
pattern_prop["description"] = pattern_prop.get("description","") + pattern_property_note
schema_flattened["patternProperties"]["^"+propname+"$"] = pattern_prop

return schema_flattened

def run_pipeline_step(input, step):
"""function for input into the reduce functool
@@ -229,6 +185,7 @@ def run_pipeline_step(input, step):
raise Exception("Step must be at least of length 1")

def render_markdown(item,schema,templatefile):

env = jinja2.Environment(
loader=jinja2.FileSystemLoader("docs/assets/templates"),
trim_blocks=True,
@@ -242,6 +199,17 @@ def render_markdown(item,schema,templatefile):

def generate_template(schema):
template = {}
schema = dict(schema)
if 'patternProperties' in schema:
schema["properties"] = schema.get("properties",{})
for patternname,prop in schema["patternProperties"].items():
propname = (
patternname
.replace("^","")
.replace("$","")
.replace("\[\d+\]","[0]")
)
schema["properties"][propname] = prop
if 'properties' in schema:
for prop, prop_schema in schema['properties'].items():
if 'type' in prop_schema:
@@ -258,7 +226,7 @@
ref_schema = get_referenced_schema(prop_schema['$ref'])
template[prop] = generate_template(ref_schema)
return template

if __name__ == "__main__":
# compile frictionless schema fields
dictionary = load_all_yamls()
@@ -272,25 +240,9 @@ def generate_template(schema):
(lambda _schema: {"version":versions["vlmd"],**_schema},None)
]
json_data_dictionary = reduce(run_pipeline_step, json_pipeline, dictionary)
Path("schemas/jsonschema/data-dictionary.json").write_text(json.dumps(json_data_dictionary, indent=4))
Path("schemas/data-dictionary.json").write_text(json.dumps(json_data_dictionary, indent=4))

schema_version_prop = {"schemaVersion":json_data_dictionary["properties"]["schemaVersion"]}
csv_pipeline = [
# recursive fxn so need to grab items from overall dictionary for json paths
(resolve_refs, {"schema": dictionary}),
# no longer need the definitons as they have been resolved
(lambda _schema: _schema["fields"], None),
(flatten_schema, None),
(to_csv_properties,schema_version_prop),
(to_frictionless, None),
(lambda _schema: {"version":versions["vlmd"],**_schema},None)
]
frictionlessfields = reduce(run_pipeline_step, csv_pipeline, dictionary)
Path("schemas/frictionless/csvtemplate/fields.json").write_text(
json.dumps(frictionlessfields, indent=2)
)


# compile json schema fields
csv_pipeline = [
# recursive fxn so need to grab items from overall dictionary for json paths
@@ -302,15 +254,7 @@ def generate_template(schema):
(lambda _schema: {"version":versions["vlmd"],**_schema},None)
]
csvfields = reduce(run_pipeline_step, csv_pipeline, dictionary)
Path("schemas/jsonschema/csvtemplate/fields.json").write_text(json.dumps(csvfields, indent=4))

# generate json schema versions of field schemas for documentation

# generate html using the json-schema for human library
generate_from_filename("schemas/jsonschema/csvtemplate/fields.json",
"docs/html-rendered-schemas/jsonschema-csvtemplate-fields.html")
generate_from_filename("schemas/jsonschema/data-dictionary.json",
"docs/html-rendered-schemas/jsonschema-jsontemplate-data-dictionary.html")
Path("schemas/csvtemplate/fields.json").write_text(json.dumps(csvfields, indent=4))

# render and write markdown versions
csvfields_md = render_markdown(
@@ -322,8 +266,8 @@ def generate_template(schema):
schema=json_data_dictionary,
templatefile="jsontemplate.md"
)
Path("docs/md-rendered-schemas/jsonschema-csvtemplate-fields.md").write_text(csvfields_md)
Path("docs/md-rendered-schemas/jsonschema-jsontemplate-data-dictionary.md").write_text(json_dd_md)
Path("docs/csvtemplate-fields.md").write_text(csvfields_md)
Path("docs/jsontemplate-data-dictionary.md").write_text(json_dd_md)

# generate templates
Path("templates/template_submission.json").write_text(json.dumps([generate_template(json_data_dictionary)],indent=4))
@@ -16,8 +16,17 @@ The aim of this HEAL metadata piece is to track and provide basic information ab

{% for itemname,item in schema.properties.items() %}
{% include 'properties.md' %}

------

{% endfor %}

{% for itemname,item in schema.patternProperties.items() %}
{% set itemname = itemname.replace("^","").replace("$","").replace("\[\d+\]","[`number`]") %}
{% include 'properties.md' %}

------
{% endfor %}

## End of schema - Additional Property information

@@ -12,7 +12,11 @@ _version {{ schema.version }}_
### Properties for each `fields` record
{% set schema = item['items'] %}
{% for itemname,item in item['items']['properties'].items() %}

{% include 'properties.md' %}

------

{% endfor %}
{% endif %}
{% endfor %}