Add new JSON Schema to Support v0.19 #3621
Conversation
Signed-off-by: ftopal <[email protected]>
Thanks for this PR @lordsoffallen! Approved the CI.
Diff between the 0.18 and 0.19 schemas:
--- static/jsonschema/kedro-catalog-0.18.json 2023-05-17 13:21:56
+++ static/jsonschema/kedro-catalog-0.19.json 2024-02-14 19:25:40
@@ -9,42 +9,44 @@
"type": {
"type": "string",
"enum": [
- "CachedDataSet",
- "IncrementalDataSet",
- "MemoryDataSet",
- "LambdaDataSet",
- "PartitionedDataSet",
- "api.APIDataSet",
- "biosequence.BioSequenceDataSet",
- "dask.ParquetDataSet",
- "email.EmailMessageDataSet",
- "geopandas.GeoJSONDataSet",
+ "CachedDataset",
+ "IncrementalDataset",
+ "MemoryDataset",
+ "LambdaDataset",
+ "PartitionedDataset",
+ "api.APIDataset",
+ "biosequence.BioSequenceDataset",
+ "dask.ParquetDataset",
+ "email.EmailMessageDataset",
+ "geopandas.GeoJSONDataset",
"holoviews.HoloviewsWriter",
- "json.JSONDataSet",
+ "huggingface.HFDataset",
+ "huggingface.HFTransformerPipelineDataset",
+ "json.JSONDataset",
"matplotlib.MatplotlibWriter",
- "networkx.NetworkXDataSet",
- "pandas.CSVDataSet",
- "pandas.ExcelDataSet",
- "pandas.FeatherDataSet",
- "pandas.GBQTableDataSet",
- "pandas.HDFDataSet",
- "pandas.JSONDataSet",
- "pandas.ParquetDataSet",
- "pandas.SQLTableDataSet",
- "pandas.SQLQueryDataSet",
- "pandas.XMLDataSet",
- "pillow.ImageDataSet",
- "pickle.PickleDataSet",
- "plotly.PlotlyDataSet",
- "redis.PickleDataSet",
- "spark.SparkDataSet",
- "spark.SparkHiveDataSet",
- "spark.SparkJDBCDataSet",
+ "networkx.NetworkXDataset",
+ "pandas.CSVDataset",
+ "pandas.ExcelDataset",
+ "pandas.FeatherDataset",
+ "pandas.GBQTableDataset",
+ "pandas.HDFDataset",
+ "pandas.JSONDataset",
+ "pandas.ParquetDataset",
+ "pandas.SQLTableDataset",
+ "pandas.SQLQueryDataset",
+ "pandas.XMLDataset",
+ "pillow.ImageDataset",
+ "pickle.PickleDataset",
+ "plotly.PlotlyDataset",
+ "redis.PickleDataset",
+ "spark.SparkDataset",
+ "spark.SparkHiveDataset",
+ "spark.SparkJDBCDataset",
"tensorflow.TensorFlowModelDataset",
- "text.TextDataSet",
- "tracking.JSONDataSet",
- "tracking.MetricsDataSet",
- "yaml.YAMLDataSet"
+ "text.TextDataset",
+ "tracking.JSONDataset",
+ "tracking.MetricsDataset",
+ "yaml.YAMLDataset"
]
}
},
@@ -53,7 +55,7 @@
"if": {
"properties": {
"type": {
- "const": "CachedDataSet"
+ "const": "CachedDataset"
}
}
},
@@ -64,7 +66,7 @@
"properties": {
"dataset": {
"pattern": ".*",
- "description": "A Kedro DataSet object or a dictionary to cache."
+ "description": "A Kedro Dataset object or a dictionary to cache."
},
"copy_mode": {
"type": "string",
@@ -77,7 +79,7 @@
"if": {
"properties": {
"type": {
- "const": "IncrementalDataSet"
+ "const": "IncrementalDataset"
}
}
},
@@ -89,11 +91,11 @@
"properties": {
"path": {
"type": "string",
- "description": "Path to the folder containing partitioned data.\nIf path starts with the protocol (e.g., ``s3://``) then the\ncorresponding ``fsspec`` concrete filesystem implementation will\nbe used. If protocol is not specified,\n``fsspec.implementations.local.LocalFileSystem`` will be used.\n**Note:** Some concrete implementations are bundled with ``fsspec``,\nwhile others (like ``s3`` or ``gcs``) must be installed separately\nprior to usage of the ``PartitionedDataSet``."
+ "description": "Path to the folder containing partitioned data.\nIf path starts with the protocol (e.g., ``s3://``) then the\ncorresponding ``fsspec`` concrete filesystem implementation will\nbe used. If protocol is not specified,\n``fsspec.implementations.local.LocalFileSystem`` will be used.\n**Note:** Some concrete implementations are bundled with ``fsspec``,\nwhile others (like ``s3`` or ``gcs``) must be installed separately\nprior to usage of the ``PartitionedDataset``."
},
"dataset": {
"pattern": ".*",
- "description": "Underlying dataset definition. This is used to instantiate\nthe dataset for each file located inside the ``path``.\nAccepted formats are:\na) object of a class that inherits from ``AbstractDataSet``\nb) a string representing a fully qualified class name to such class\nc) a dictionary with ``type`` key pointing to a string from b),\nother keys are passed to the Dataset initializer.\nCredentials for the dataset can be explicitly specified in\nthis configuration."
+ "description": "Underlying dataset definition. This is used to instantiate\nthe dataset for each file located inside the ``path``.\nAccepted formats are:\na) object of a class that inherits from ``AbstractDataset``\nb) a string representing a fully qualified class name to such class\nc) a dictionary with ``type`` key pointing to a string from b),\nother keys are passed to the Dataset initializer.\nCredentials for the dataset can be explicitly specified in\nthis configuration."
},
"checkpoint": {
"pattern": "object",
@@ -129,7 +131,7 @@
"if": {
"properties": {
"type": {
- "const": "MemoryDataSet"
+ "const": "MemoryDataset"
}
}
},
@@ -151,7 +153,7 @@
"if": {
"properties": {
"type": {
- "const": "LambdaDataSet"
+ "const": "LambdaDataset"
}
}
},
@@ -184,7 +186,7 @@
"if": {
"properties": {
"type": {
- "const": "PartitionedDataSet"
+ "const": "PartitionedDataset"
}
}
},
@@ -196,11 +198,11 @@
"properties": {
"path": {
"type": "string",
- "description": "Path to the folder containing partitioned data.\nIf path starts with the protocol (e.g., ``s3://``) then the\ncorresponding ``fsspec`` concrete filesystem implementation will\nbe used. If protocol is not specified,\n``fsspec.implementations.local.LocalFileSystem`` will be used.\n**Note:** Some concrete implementations are bundled with ``fsspec``,\nwhile others (like ``s3`` or ``gcs``) must be installed separately\nprior to usage of the ``PartitionedDataSet``."
+ "description": "Path to the folder containing partitioned data.\nIf path starts with the protocol (e.g., ``s3://``) then the\ncorresponding ``fsspec`` concrete filesystem implementation will\nbe used. If protocol is not specified,\n``fsspec.implementations.local.LocalFileSystem`` will be used.\n**Note:** Some concrete implementations are bundled with ``fsspec``,\nwhile others (like ``s3`` or ``gcs``) must be installed separately\nprior to usage of the ``PartitionedDataset``."
},
"dataset": {
"pattern": ".*",
- "description": "Underlying dataset definition. This is used to instantiate\nthe dataset for each file located inside the ``path``.\nAccepted formats are:\na) object of a class that inherits from ``AbstractDataSet``\nb) a string representing a fully qualified class name to such class\nc) a dictionary with ``type`` key pointing to a string from b),\nother keys are passed to the Dataset initializer.\nCredentials for the dataset can be explicitly specified in\nthis configuration."
+ "description": "Underlying dataset definition. This is used to instantiate\nthe dataset for each file located inside the ``path``.\nAccepted formats are:\na) object of a class that inherits from ``AbstractDataset``\nb) a string representing a fully qualified class name to such class\nc) a dictionary with ``type`` key pointing to a string from b),\nother keys are passed to the Dataset initializer.\nCredentials for the dataset can be explicitly specified in\nthis configuration."
},
"filepath_arg": {
"type": "string",
@@ -232,7 +234,7 @@
"if": {
"properties": {
"type": {
- "const": "api.APIDataSet"
+ "const": "api.APIDataset"
}
}
},
@@ -280,7 +282,7 @@
"if": {
"properties": {
"type": {
- "const": "biosequence.BioSequenceDataSet"
+ "const": "biosequence.BioSequenceDataset"
}
}
},
@@ -319,7 +321,7 @@
"if": {
"properties": {
"type": {
- "const": "dask.ParquetDataSet"
+ "const": "dask.ParquetDataset"
}
}
},
@@ -358,7 +360,7 @@
"if": {
"properties": {
"type": {
- "const": "email.EmailMessageDataSet"
+ "const": "email.EmailMessageDataset"
}
}
},
@@ -397,7 +399,7 @@
"if": {
"properties": {
"type": {
- "const": "geopandas.GeoJSONDataSet"
+ "const": "geopandas.GeoJSONDataset"
}
}
},
@@ -471,12 +473,57 @@
"if": {
"properties": {
"type": {
- "const": "json.JSONDataSet"
+ "const": "huggingface.HFDataset"
}
}
},
"then": {
"required": [
+ "dataset_name"
+ ],
+ "properties": {
+ "dataset_name": {
+ "type": "string",
+ "description": "Huggingface dataset name"
+ }
+ }
+ }
+ },
+ {
+ "if": {
+ "properties": {
+ "type": {
+ "const": "huggingface.HFTransformerPipelineDataset"
+ }
+ }
+ },
+ "then": {
+ "properties": {
+ "task": {
+ "type": "string",
+ "description": "Huggingface pipeline task name"
+ },
+ "model_name": {
+ "type": "string",
+ "description": "Huggingface model name"
+ },
+ "pipeline_kwargs": {
+ "type": "object",
+ "description": "Additional kwargs to be passed into the pipeline"
+ }
+ }
+ }
+ },
+ {
+ "if": {
+ "properties": {
+ "type": {
+ "const": "json.JSONDataset"
+ }
+ }
+ },
+ "then": {
+ "required": [
"filepath"
],
"properties": {
@@ -541,7 +588,7 @@
"if": {
"properties": {
"type": {
- "const": "networkx.NetworkXDataSet"
+ "const": "networkx.NetworkXDataset"
}
}
},
@@ -580,7 +627,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.CSVDataSet"
+ "const": "pandas.CSVDataset"
}
}
},
@@ -619,7 +666,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.ExcelDataSet"
+ "const": "pandas.ExcelDataset"
}
}
},
@@ -662,7 +709,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.FeatherDataSet"
+ "const": "pandas.FeatherDataset"
}
}
},
@@ -697,7 +744,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.GBQTableDataSet"
+ "const": "pandas.GBQTableDataset"
}
}
},
@@ -738,7 +785,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.HDFDataSet"
+ "const": "pandas.HDFDataset"
}
}
},
@@ -782,7 +829,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.JSONDataSet"
+ "const": "pandas.JSONDataset"
}
}
},
@@ -821,7 +868,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.ParquetDataSet"
+ "const": "pandas.ParquetDataset"
}
}
},
@@ -860,7 +907,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.SQLTableDataSet"
+ "const": "pandas.SQLTableDataset"
}
}
},
@@ -896,7 +943,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.SQLQueryDataSet"
+ "const": "pandas.SQLQueryDataset"
}
}
},
@@ -932,7 +979,7 @@
"if": {
"properties": {
"type": {
- "const": "pandas.XMLDataSet"
+ "const": "pandas.XMLDataset"
}
}
},
@@ -971,7 +1018,7 @@
"if": {
"properties": {
"type": {
- "const": "pickle.PickleDataSet"
+ "const": "pickle.PickleDataset"
}
}
},
@@ -1014,7 +1061,7 @@
"if": {
"properties": {
"type": {
- "const": "pillow.ImageDataSet"
+ "const": "pillow.ImageDataset"
}
}
},
@@ -1049,7 +1096,7 @@
"if": {
"properties": {
"type": {
- "const": "plotly.PlotlyDataSet"
+ "const": "plotly.PlotlyDataset"
}
}
},
@@ -1093,7 +1140,7 @@
"if": {
"properties": {
"type": {
- "const": "redis.PickleDataSet"
+ "const": "redis.PickleDataset"
}
}
},
@@ -1133,7 +1180,7 @@
"if": {
"properties": {
"type": {
- "const": "spark.SparkDataSet"
+ "const": "spark.SparkDataset"
}
}
},
@@ -1144,7 +1191,7 @@
"properties": {
"filepath": {
"type": "string",
- "description": "Filepath in POSIX format to a Spark dataframe. When using Databricks\nand working with data written to mount path points,\nspecify ``filepath``s for (versioned) ``SparkDataSet``s\nstarting with ``/dbfs/mnt``."
+ "description": "Filepath in POSIX format to a Spark dataframe. When using Databricks\nand working with data written to mount path points,\nspecify ``filepath``s for (versioned) ``SparkDataset``s\nstarting with ``/dbfs/mnt``."
},
"file_format": {
"type": "string",
@@ -1172,7 +1219,7 @@
"if": {
"properties": {
"type": {
- "const": "spark.SparkHiveDataSet"
+ "const": "spark.SparkHiveDataset"
}
}
},
@@ -1206,7 +1253,7 @@
"if": {
"properties": {
"type": {
- "const": "spark.SparkJDBCDataSet"
+ "const": "spark.SparkJDBCDataset"
}
}
},
@@ -1285,7 +1332,7 @@
"if": {
"properties": {
"type": {
- "const": "text.TextDataSet"
+ "const": "text.TextDataset"
}
}
},
@@ -1316,7 +1363,7 @@
"if": {
"properties": {
"type": {
- "const": "tracking.JSONDataSet"
+ "const": "tracking.JSONDataset"
}
}
},
@@ -1351,7 +1398,7 @@
"if": {
"properties": {
"type": {
- "const": "tracking.MetricsDataSet"
+ "const": "tracking.MetricsDataset"
}
}
},
@@ -1386,7 +1433,7 @@
"if": {
"properties": {
"type": {
- "const": "yaml.YAMLDataSet"
+ "const": "yaml.YAMLDataset"
}
}
},
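For context, a minimal catalog entry like the following is what the renamed 0.19 classes look like against the updated schema (the entry name and filepath here are only illustrative, not taken from this PR):

```yaml
# catalog.yml — hypothetical entry using the renamed v0.19 class name
companies:
  type: pandas.CSVDataset        # was pandas.CSVDataSet under the 0.18 schema
  filepath: data/01_raw/companies.csv
  load_args:
    sep: ","
```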
Approved with one minor change. Thanks!
Co-authored-by: Jo Stichbury <[email protected]> Signed-off-by: Fazil <[email protected]>
Doc failures seem unrelated.
How do we fix it? I am not sure where the problem lies 😅
Thanks so much for this contribution @lordsoffallen! ⭐
I left some suggestions, but otherwise all good to merge!
Co-authored-by: Merel Theisen <[email protected]> Signed-off-by: Fazil <[email protected]>
Description
Fixes #3590.
Development notes
I simply replaced DataSet with Dataset, and added the Hugging Face datasets, which were missing. With kedro-datasets now moved out of the core repo, I wonder whether these JSON schema configs should also move into the plugin repo.
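As a rough sanity check for the new Hugging Face blocks, the added schema rules are meant to describe entries along these lines (the dataset, model, and entry names below are just illustrative):

```yaml
# huggingface.HFDataset — the schema marks `dataset_name` as required
imdb_reviews:
  type: huggingface.HFDataset
  dataset_name: imdb

# huggingface.HFTransformerPipelineDataset — `task`, `model_name` and
# `pipeline_kwargs` are optional properties in the schema
sentiment_model:
  type: huggingface.HFTransformerPipelineDataset
  task: sentiment-analysis
  model_name: distilbert-base-uncased-finetuned-sst-2-english
```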
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance. If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
Added a description of this change in the RELEASE.md file