
Add new JSON Schema to Support v0.19 #3621

Merged · 8 commits · Feb 20, 2024
Conversation

@lordsoffallen (Contributor) commented on Feb 14, 2024

Description

Fixes #3590

Development notes

I replaced DataSet with Dataset throughout and added the Hugging Face datasets, which were missing. Now that the Kedro datasets have moved out into their own repository, I wonder whether these JSON schema configs should also move into the plugin repo.
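
For illustration, here is a minimal catalog entry under the 0.19 naming. This is a sketch: the entry name and filepath are placeholders, but the type value matches the renamed enum in the new schema.

# conf/base/catalog.yml (hypothetical example)
companies:
  type: pandas.CSVDataset   # was pandas.CSVDataSet under the 0.18 schema
  filepath: data/01_raw/companies.csv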

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message, for example by committing with git commit -s. See our wiki for guidance.

If your PR is blocked due to unsigned commits, follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This retroactively adds the sign-off to all unsigned commits and allows the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: ftopal <[email protected]>
@astrojuanlu (Member) commented:

Thanks for this PR @lordsoffallen! Approved the CI.

@astrojuanlu (Member) commented:

Diff between 0.18 and 0.19 schemas:

--- static/jsonschema/kedro-catalog-0.18.json	2023-05-17 13:21:56
+++ static/jsonschema/kedro-catalog-0.19.json	2024-02-14 19:25:40
@@ -9,42 +9,44 @@
         "type": {
           "type": "string",
           "enum": [
-            "CachedDataSet",
-            "IncrementalDataSet",
-            "MemoryDataSet",
-            "LambdaDataSet",
-            "PartitionedDataSet",
-            "api.APIDataSet",
-            "biosequence.BioSequenceDataSet",
-            "dask.ParquetDataSet",
-            "email.EmailMessageDataSet",
-            "geopandas.GeoJSONDataSet",
+            "CachedDataset",
+            "IncrementalDataset",
+            "MemoryDataset",
+            "LambdaDataset",
+            "PartitionedDataset",
+            "api.APIDataset",
+            "biosequence.BioSequenceDataset",
+            "dask.ParquetDataset",
+            "email.EmailMessageDataset",
+            "geopandas.GeoJSONDataset",
             "holoviews.HoloviewsWriter",
-            "json.JSONDataSet",
+            "huggingface.HFDataset",
+            "huggingface.HFTransformerPipelineDataset",
+            "json.JSONDataset",
             "matplotlib.MatplotlibWriter",
-            "networkx.NetworkXDataSet",
-            "pandas.CSVDataSet",
-            "pandas.ExcelDataSet",
-            "pandas.FeatherDataSet",
-            "pandas.GBQTableDataSet",
-            "pandas.HDFDataSet",
-            "pandas.JSONDataSet",
-            "pandas.ParquetDataSet",
-            "pandas.SQLTableDataSet",
-            "pandas.SQLQueryDataSet",
-            "pandas.XMLDataSet",
-            "pillow.ImageDataSet",
-            "pickle.PickleDataSet",
-            "plotly.PlotlyDataSet",
-            "redis.PickleDataSet",
-            "spark.SparkDataSet",
-            "spark.SparkHiveDataSet",
-            "spark.SparkJDBCDataSet",
+            "networkx.NetworkXDataset",
+            "pandas.CSVDataset",
+            "pandas.ExcelDataset",
+            "pandas.FeatherDataset",
+            "pandas.GBQTableDataset",
+            "pandas.HDFDataset",
+            "pandas.JSONDataset",
+            "pandas.ParquetDataset",
+            "pandas.SQLTableDataset",
+            "pandas.SQLQueryDataset",
+            "pandas.XMLDataset",
+            "pillow.ImageDataset",
+            "pickle.PickleDataset",
+            "plotly.PlotlyDataset",
+            "redis.PickleDataset",
+            "spark.SparkDataset",
+            "spark.SparkHiveDataset",
+            "spark.SparkJDBCDataset",
             "tensorflow.TensorFlowModelDataset",
-            "text.TextDataSet",
-            "tracking.JSONDataSet",
-            "tracking.MetricsDataSet",
-            "yaml.YAMLDataSet"
+            "text.TextDataset",
+            "tracking.JSONDataset",
+            "tracking.MetricsDataset",
+            "yaml.YAMLDataset"
           ]
         }
       },
@@ -53,7 +55,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "CachedDataSet"
+                "const": "CachedDataset"
               }
             }
           },
@@ -64,7 +66,7 @@
             "properties": {
               "dataset": {
                 "pattern": ".*",
-                "description": "A Kedro DataSet object or a dictionary to cache."
+                "description": "A Kedro Dataset object or a dictionary to cache."
               },
               "copy_mode": {
                 "type": "string",
@@ -77,7 +79,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "IncrementalDataSet"
+                "const": "IncrementalDataset"
               }
             }
           },
@@ -89,11 +91,11 @@
             "properties": {
               "path": {
                 "type": "string",
-                "description": "Path to the folder containing partitioned data.\nIf path starts with the protocol (e.g., ``s3://``) then the\ncorresponding ``fsspec`` concrete filesystem implementation will\nbe used. If protocol is not specified,\n``fsspec.implementations.local.LocalFileSystem`` will be used.\n**Note:** Some concrete implementations are bundled with ``fsspec``,\nwhile others (like ``s3`` or ``gcs``) must be installed separately\nprior to usage of the ``PartitionedDataSet``."
+                "description": "Path to the folder containing partitioned data.\nIf path starts with the protocol (e.g., ``s3://``) then the\ncorresponding ``fsspec`` concrete filesystem implementation will\nbe used. If protocol is not specified,\n``fsspec.implementations.local.LocalFileSystem`` will be used.\n**Note:** Some concrete implementations are bundled with ``fsspec``,\nwhile others (like ``s3`` or ``gcs``) must be installed separately\nprior to usage of the ``PartitionedDataset``."
               },
               "dataset": {
                 "pattern": ".*",
-                "description": "Underlying dataset definition. This is used to instantiate\nthe dataset for each file located inside the ``path``.\nAccepted formats are:\na) object of a class that inherits from ``AbstractDataSet``\nb) a string representing a fully qualified class name to such class\nc) a dictionary with ``type`` key pointing to a string from b),\nother keys are passed to the Dataset initializer.\nCredentials for the dataset can be explicitly specified in\nthis configuration."
+                "description": "Underlying dataset definition. This is used to instantiate\nthe dataset for each file located inside the ``path``.\nAccepted formats are:\na) object of a class that inherits from ``AbstractDataset``\nb) a string representing a fully qualified class name to such class\nc) a dictionary with ``type`` key pointing to a string from b),\nother keys are passed to the Dataset initializer.\nCredentials for the dataset can be explicitly specified in\nthis configuration."
               },
               "checkpoint": {
                 "pattern": "object",
@@ -129,7 +131,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "MemoryDataSet"
+                "const": "MemoryDataset"
               }
             }
           },
@@ -151,7 +153,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "LambdaDataSet"
+                "const": "LambdaDataset"
               }
             }
           },
@@ -184,7 +186,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "PartitionedDataSet"
+                "const": "PartitionedDataset"
               }
             }
           },
@@ -196,11 +198,11 @@
             "properties": {
               "path": {
                 "type": "string",
-                "description": "Path to the folder containing partitioned data.\nIf path starts with the protocol (e.g., ``s3://``) then the\ncorresponding ``fsspec`` concrete filesystem implementation will\nbe used. If protocol is not specified,\n``fsspec.implementations.local.LocalFileSystem`` will be used.\n**Note:** Some concrete implementations are bundled with ``fsspec``,\nwhile others (like ``s3`` or ``gcs``) must be installed separately\nprior to usage of the ``PartitionedDataSet``."
+                "description": "Path to the folder containing partitioned data.\nIf path starts with the protocol (e.g., ``s3://``) then the\ncorresponding ``fsspec`` concrete filesystem implementation will\nbe used. If protocol is not specified,\n``fsspec.implementations.local.LocalFileSystem`` will be used.\n**Note:** Some concrete implementations are bundled with ``fsspec``,\nwhile others (like ``s3`` or ``gcs``) must be installed separately\nprior to usage of the ``PartitionedDataset``."
               },
               "dataset": {
                 "pattern": ".*",
-                "description": "Underlying dataset definition. This is used to instantiate\nthe dataset for each file located inside the ``path``.\nAccepted formats are:\na) object of a class that inherits from ``AbstractDataSet``\nb) a string representing a fully qualified class name to such class\nc) a dictionary with ``type`` key pointing to a string from b),\nother keys are passed to the Dataset initializer.\nCredentials for the dataset can be explicitly specified in\nthis configuration."
+                "description": "Underlying dataset definition. This is used to instantiate\nthe dataset for each file located inside the ``path``.\nAccepted formats are:\na) object of a class that inherits from ``AbstractDataset``\nb) a string representing a fully qualified class name to such class\nc) a dictionary with ``type`` key pointing to a string from b),\nother keys are passed to the Dataset initializer.\nCredentials for the dataset can be explicitly specified in\nthis configuration."
               },
               "filepath_arg": {
                 "type": "string",
@@ -232,7 +234,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "api.APIDataSet"
+                "const": "api.APIDataset"
               }
             }
           },
@@ -280,7 +282,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "biosequence.BioSequenceDataSet"
+                "const": "biosequence.BioSequenceDataset"
               }
             }
           },
@@ -319,7 +321,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "dask.ParquetDataSet"
+                "const": "dask.ParquetDataset"
               }
             }
           },
@@ -358,7 +360,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "email.EmailMessageDataSet"
+                "const": "email.EmailMessageDataset"
               }
             }
           },
@@ -397,7 +399,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "geopandas.GeoJSONDataSet"
+                "const": "geopandas.GeoJSONDataset"
               }
             }
           },
@@ -471,12 +473,57 @@
           "if": {
             "properties": {
               "type": {
-                "const": "json.JSONDataSet"
+                "const": "huggingface.HFDataset"
               }
             }
           },
           "then": {
             "required": [
+              "dataset_name"
+            ],
+            "properties": {
+              "dataset_name": {
+                "type": "string",
+                "description": "Huggingface dataset name"
+              }
+            }
+          }
+        },
+        {
+          "if": {
+            "properties": {
+              "type": {
+                "const": "huggingface.HFTransformerPipelineDataset"
+              }
+            }
+          },
+          "then": {
+            "properties": {
+              "task": {
+                "type": "string",
+                "description": "Huggingface pipeline task name"
+              },
+              "model_name": {
+                "type": "string",
+                "description": "Huggingface model name"
+              },
+              "pipeline_kwargs": {
+                "type": "object",
+                "description": "Additional kwargs to be passed into the pipeline"
+              }
+            }
+          }
+        },
+        {
+          "if": {
+            "properties": {
+              "type": {
+                "const": "json.JSONDataset"
+              }
+            }
+          },
+          "then": {
+            "required": [
               "filepath"
             ],
             "properties": {
@@ -541,7 +588,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "networkx.NetworkXDataSet"
+                "const": "networkx.NetworkXDataset"
               }
             }
           },
@@ -580,7 +627,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.CSVDataSet"
+                "const": "pandas.CSVDataset"
               }
             }
           },
@@ -619,7 +666,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.ExcelDataSet"
+                "const": "pandas.ExcelDataset"
               }
             }
           },
@@ -662,7 +709,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.FeatherDataSet"
+                "const": "pandas.FeatherDataset"
               }
             }
           },
@@ -697,7 +744,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.GBQTableDataSet"
+                "const": "pandas.GBQTableDataset"
               }
             }
           },
@@ -738,7 +785,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.HDFDataSet"
+                "const": "pandas.HDFDataset"
               }
             }
           },
@@ -782,7 +829,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.JSONDataSet"
+                "const": "pandas.JSONDataset"
               }
             }
           },
@@ -821,7 +868,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.ParquetDataSet"
+                "const": "pandas.ParquetDataset"
               }
             }
           },
@@ -860,7 +907,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.SQLTableDataSet"
+                "const": "pandas.SQLTableDataset"
               }
             }
           },
@@ -896,7 +943,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.SQLQueryDataSet"
+                "const": "pandas.SQLQueryDataset"
               }
             }
           },
@@ -932,7 +979,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pandas.XMLDataSet"
+                "const": "pandas.XMLDataset"
               }
             }
           },
@@ -971,7 +1018,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pickle.PickleDataSet"
+                "const": "pickle.PickleDataset"
               }
             }
           },
@@ -1014,7 +1061,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "pillow.ImageDataSet"
+                "const": "pillow.ImageDataset"
               }
             }
           },
@@ -1049,7 +1096,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "plotly.PlotlyDataSet"
+                "const": "plotly.PlotlyDataset"
               }
             }
           },
@@ -1093,7 +1140,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "redis.PickleDataSet"
+                "const": "redis.PickleDataset"
               }
             }
           },
@@ -1133,7 +1180,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "spark.SparkDataSet"
+                "const": "spark.SparkDataset"
               }
             }
           },
@@ -1144,7 +1191,7 @@
             "properties": {
               "filepath": {
                 "type": "string",
-                "description": "Filepath in POSIX format to a Spark dataframe. When using Databricks\nand working with data written to mount path points,\nspecify ``filepath``s for (versioned) ``SparkDataSet``s\nstarting with ``/dbfs/mnt``."
+                "description": "Filepath in POSIX format to a Spark dataframe. When using Databricks\nand working with data written to mount path points,\nspecify ``filepath``s for (versioned) ``SparkDataset``s\nstarting with ``/dbfs/mnt``."
               },
               "file_format": {
                 "type": "string",
@@ -1172,7 +1219,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "spark.SparkHiveDataSet"
+                "const": "spark.SparkHiveDataset"
               }
             }
           },
@@ -1206,7 +1253,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "spark.SparkJDBCDataSet"
+                "const": "spark.SparkJDBCDataset"
               }
             }
           },
@@ -1285,7 +1332,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "text.TextDataSet"
+                "const": "text.TextDataset"
               }
             }
           },
@@ -1316,7 +1363,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "tracking.JSONDataSet"
+                "const": "tracking.JSONDataset"
               }
             }
           },
@@ -1351,7 +1398,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "tracking.MetricsDataSet"
+                "const": "tracking.MetricsDataset"
               }
             }
           },
@@ -1386,7 +1433,7 @@
           "if": {
             "properties": {
               "type": {
-                "const": "yaml.YAMLDataSet"
+                "const": "yaml.YAMLDataset"
               }
             }
           },
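
As a sketch of what the new schema accepts, the catalog entries below use the two Hugging Face dataset types added above. The dataset and model names are illustrative, and the yaml-language-server modeline assumes the schema is published at the repository path shown in the diff header.

# yaml-language-server: $schema=https://raw.githubusercontent.com/kedro-org/kedro/main/static/jsonschema/kedro-catalog-0.19.json
reviews:
  type: huggingface.HFDataset
  dataset_name: imdb   # required by the schema

sentiment_model:
  type: huggingface.HFTransformerPipelineDataset
  task: sentiment-analysis   # optional; Hugging Face pipeline task name
  model_name: distilbert-base-uncased-finetuned-sst-2-english   # illustrative
  pipeline_kwargs:
    device: -1   # extra kwargs passed to the pipeline; -1 = CPU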

Review comment on RELEASE.md (outdated, resolved)
@stichbury (Contributor) left a comment:

Approved with one minor change. Thanks!

Co-authored-by: Jo Stichbury <[email protected]>
Signed-off-by: Fazil <[email protected]>
@astrojuanlu (Member) commented:
Doc failures seem unrelated

@lordsoffallen (Contributor, Author) replied:

Re "Doc failures seem unrelated": how do we fix it? I am not sure where the problem lies 😅

@merelcht (Member) left a comment:

Thanks so much for this contribution @lordsoffallen! ⭐

I left some suggestions, but otherwise all good to merge!

Review suggestions on static/jsonschema/kedro-catalog-0.19.json (outdated, resolved)
@merelcht enabled auto-merge (squash) on February 20, 2024 at 17:13
@merelcht merged commit 30ae2c7 into kedro-org:main on Feb 20, 2024
33 checks passed
@astrojuanlu mentioned this pull request on Feb 27, 2024
Merging this pull request closes: JSON Schema needs update (#3590)