-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add validators for bboxes annotation files #32
base: main
Are you sure you want to change the base?
Changes from all commits
cb8962a
9aaa6ab
e1e889d
49fc060
9bdcffd
289e6d1
20b72f2
599fe87
02e714c
91addb1
f448c57
e01b993
534d32b
272d617
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
{ | ||
"$schema": "https://json-schema.org/draft/2020-12/schema", | ||
"type": "object", | ||
"properties": { | ||
"info": { | ||
"type": "object" | ||
}, | ||
"licenses": { | ||
"type": "array" | ||
}, | ||
"images": { | ||
"type": "array", | ||
"items": { | ||
"type": "object", | ||
"properties": { | ||
"file_name": { | ||
"type": "string" | ||
}, | ||
"id": { | ||
"type": "integer" | ||
}, | ||
"width": { | ||
"type": "integer" | ||
}, | ||
"height": { | ||
"type": "integer" | ||
} | ||
} | ||
} | ||
}, | ||
"annotations": { | ||
"type": "array", | ||
"items": { | ||
"type": "object", | ||
"properties": { | ||
"id": { | ||
"type": "integer" | ||
}, | ||
"image_id": { | ||
"type": "integer" | ||
}, | ||
"bbox": { | ||
"type": "array", | ||
"items": { | ||
"type": "integer" | ||
} | ||
}, | ||
"category_id": { | ||
"type": "integer" | ||
}, | ||
"area": { | ||
"type": "number" | ||
}, | ||
"iscrowd": { | ||
"type": "integer" | ||
} | ||
} | ||
} | ||
}, | ||
"categories": { | ||
"type": "array", | ||
"items": { | ||
"type": "object", | ||
"properties": { | ||
"id": { | ||
"type": "integer" | ||
}, | ||
"name": { | ||
"type": "string" | ||
}, | ||
"supercategory": { | ||
"type": "string" | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
## JSON schemas for manual annotations files. | ||
|
||
We use JSON schemas to validate the types of a supported annotation file. | ||
|
||
Note that the schema validation only checks the type of a key if that key is present. It does not check for the presence of the keys. | ||
|
||
If the meta-schema (under $schema) is not provided, the jsonschema validator uses the the latest released draft of the JSON schema specification. | ||
|
||
## VIA schema | ||
|
||
The VIA schema corresponds to the format exported by VGG Image Annotator 2.x.y (VIA) for object detection annotations. | ||
|
||
Each image under `_via_img_metadata` is indexed using a unique key: FILENAME-FILESIZE. We use "additionalProperties" to allow for any key name, see https://stackoverflow.com/a/69811612/24834957. | ||
|
||
The section `_via_image_id_list` contains an ordered list of image keys using a unique key: `FILENAME-FILESIZE`, the position in the list defines the image ID. | ||
|
||
The section `_via_attributes` region attributes and file attributes, to display in VIA's UI and to classify the data. | ||
|
||
The section `_via_data_format_version` contains the version of the VIA tool used. | ||
|
||
|
||
## COCO schema | ||
The COCO schema follows the COCO dataset format for object detection, see https://cocodataset.org/#format-data. | ||
|
||
Box coordinates are measured from the top left corner of the image, and are 0-indexed. | ||
### References | ||
---------- | ||
- https://github.com/python-jsonschema/jsonschema | ||
- https://json-schema.org/understanding-json-schema/ | ||
- https://cocodataset.org/#format-data | ||
- https://gitlab.com/vgg/via/-/blob/master/via-2.x.y/CodeDoc.md?ref_type=heads#description-of-via-project-json-file | ||
- https://python-jsonschema.readthedocs.io/en/stable/api/#jsonschema.validate |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
{ | ||
"$schema": "https://json-schema.org/draft/2020-12/schema", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any particular reason for using this specific schema version? I guess it what VIA and COCO use (if they bother to specify)? |
||
"type": "object", | ||
"properties": { | ||
"_via_settings": { | ||
"type": "object", | ||
"properties": { | ||
"ui": { | ||
"type": "object" | ||
}, | ||
"core": { | ||
"type": "object" | ||
}, | ||
"project": { | ||
"type": "object" | ||
} | ||
} | ||
}, | ||
"_via_img_metadata": { | ||
"type": "object", | ||
"additionalProperties": { | ||
"type": "object", | ||
"properties": { | ||
"filename": { | ||
"type": "string" | ||
}, | ||
"size": { | ||
"type": "integer" | ||
}, | ||
"regions": { | ||
"type": "array", | ||
"items": { | ||
"type": "object", | ||
"properties": { | ||
"shape_attributes": { | ||
"type": "object", | ||
"properties": { | ||
"name": { | ||
"type": "string" | ||
}, | ||
"x": { | ||
"type": "integer" | ||
}, | ||
"y": { | ||
"type": "integer" | ||
}, | ||
"width": { | ||
"type": "integer" | ||
}, | ||
"height": { | ||
"type": "integer" | ||
} | ||
} | ||
}, | ||
"region_attributes": { | ||
"type": "object" | ||
} | ||
} | ||
} | ||
}, | ||
"file_attributes": { | ||
"type": "object" | ||
} | ||
} | ||
} | ||
}, | ||
"_via_image_id_list": { | ||
"type": "array", | ||
"items": { | ||
"type": "string" | ||
} | ||
}, | ||
"_via_attributes": { | ||
"type": "object", | ||
"properties": { | ||
"region": { | ||
"type": "object" | ||
}, | ||
"file": { | ||
"type": "object" | ||
} | ||
} | ||
}, | ||
"_via_data_format_version": { | ||
"type": "string" | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,159 @@ | ||||||||||||||||
"""Utility functions for JSON schema files.""" | ||||||||||||||||
|
||||||||||||||||
import json | ||||||||||||||||
from pathlib import Path | ||||||||||||||||
|
||||||||||||||||
import jsonschema | ||||||||||||||||
import jsonschema.exceptions | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's purely a matter of taste, but I would do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or if you take my next suggestion, no need to import these at all. |
||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
def _get_default_VIA_schema() -> dict: | ||||||||||||||||
"""Get the VIA schema as a dictionary.""" | ||||||||||||||||
via_schema_path = Path(__file__).parent / "schemas" / "VIA_schema.json" | ||||||||||||||||
with open(via_schema_path) as file: | ||||||||||||||||
via_schema_dict = json.load(file) | ||||||||||||||||
return via_schema_dict | ||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
def _get_default_COCO_schema() -> dict: | ||||||||||||||||
"""Get the COCO schema as a dictionary.""" | ||||||||||||||||
coco_schema_path = Path(__file__).parent / "schemas" / "COCO_schema.json" | ||||||||||||||||
with open(coco_schema_path) as file: | ||||||||||||||||
coco_schema_dict = json.load(file) | ||||||||||||||||
return coco_schema_dict | ||||||||||||||||
Comment on lines
+10
to
+23
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not important, but you could also merge these two, and then call |
||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
def _check_file_is_json(filepath: Path): | ||||||||||||||||
"""Check the input file can be read as a JSON.""" | ||||||||||||||||
try: | ||||||||||||||||
with open(filepath) as file: | ||||||||||||||||
json.load(file) | ||||||||||||||||
except json.JSONDecodeError as decode_error: | ||||||||||||||||
# We override the error message for clarity | ||||||||||||||||
raise ValueError( | ||||||||||||||||
f"Error decoding JSON data from file: {filepath}. " | ||||||||||||||||
"The data being deserialized is not a valid JSON. " | ||||||||||||||||
Comment on lines
+34
to
+35
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I personally find the original error message (the one you override) very cryptic, so imo it's worth doing the override. |
||||||||||||||||
) from decode_error | ||||||||||||||||
except Exception as error: | ||||||||||||||||
raise error | ||||||||||||||||
Comment on lines
+37
to
+38
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you can simply remove these lines, because there is no additional logic added to catching the generic
Suggested change
|
||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
def _check_file_matches_schema(filepath: Path, schema: dict | None): | ||||||||||||||||
"""Check the input JSON file matches the given schema. | ||||||||||||||||
|
||||||||||||||||
The schema validation only checks the type for each specified | ||||||||||||||||
key if the key exists. It does not check that the keys in the | ||||||||||||||||
schema are present in the JSON file. | ||||||||||||||||
""" | ||||||||||||||||
# Read json file | ||||||||||||||||
with open(filepath) as file: | ||||||||||||||||
data = json.load(file) | ||||||||||||||||
|
||||||||||||||||
# Check against schema if provided | ||||||||||||||||
if schema: | ||||||||||||||||
try: | ||||||||||||||||
jsonschema.validate(instance=data, schema=schema) | ||||||||||||||||
except jsonschema.exceptions.ValidationError as val_err: | ||||||||||||||||
raise val_err | ||||||||||||||||
except jsonschema.exceptions.SchemaError as schema_err: | ||||||||||||||||
raise schema_err | ||||||||||||||||
Comment on lines
+54
to
+59
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since you are just re-raising the same exceptions you are catching, is there any point to this try-expect block?
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My point being, there is no point in doing error handling here if you are not going to change the error type, add a message, or "absorb" the error instead of raising it. In this particular case here, I would completely do away with it, as per my suggestion. |
||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
def _check_required_properties_keys( | ||||||||||||||||
required_properties_keys: list, schema: dict | ||||||||||||||||
): | ||||||||||||||||
"""Check the input schema includes the required "properties" keys.""" | ||||||||||||||||
# Get keys of "properties" dictionaries in schema | ||||||||||||||||
properties_keys_in_schema = _extract_properties_keys(schema) | ||||||||||||||||
|
||||||||||||||||
# Get list of "properties" keys that are required but not in schema | ||||||||||||||||
missing_keys = set(required_properties_keys) - set( | ||||||||||||||||
properties_keys_in_schema | ||||||||||||||||
) | ||||||||||||||||
|
||||||||||||||||
# Raise error if there are missing keys in the schema | ||||||||||||||||
if missing_keys: | ||||||||||||||||
raise ValueError( | ||||||||||||||||
f"Required key(s) {sorted(missing_keys)} not found " | ||||||||||||||||
"in schema. Note that " | ||||||||||||||||
"a key may not be found correctly if the schema keywords " | ||||||||||||||||
"(such as 'properties', 'type' or 'items') are not spelt " | ||||||||||||||||
"correctly." | ||||||||||||||||
Comment on lines
+79
to
+81
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this a common occurrence? I wonder if we can do without the second sentence. |
||||||||||||||||
) | ||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
def _check_required_keys_in_dict( | ||||||||||||||||
list_required_keys: list[str], | ||||||||||||||||
data: dict, | ||||||||||||||||
additional_message: str = "", | ||||||||||||||||
): | ||||||||||||||||
"""Check if the required keys are present in the input dictionary.""" | ||||||||||||||||
missing_keys = set(list_required_keys) - set(data.keys()) | ||||||||||||||||
if missing_keys: | ||||||||||||||||
raise ValueError( | ||||||||||||||||
f"Required key(s) {sorted(missing_keys)} not " | ||||||||||||||||
f"found{additional_message}." | ||||||||||||||||
) | ||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
def _extract_properties_keys(schema: dict, parent_key="") -> list: | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This function is quite complex to read and understand, but I think that's just a reflection of the complexity of the task you are doing. I tried rewriting it in a different way. My version ended up shorter but arguably harder to understand (it includes recursive calls to the same function). Anyway, I will leave my version below in case you are interested, but I don't think it's worth changing. Note that existing tests still pass with my version, but haven't verified 100% that it behaves the same way as yours in all cases. My attemptdef _extract_properties_keys(schema: dict) -> list:
"""Extract keys from all "properties" dictionaries in a JSON schema.
Traverses a JSON schema and collects all property keys, including nested
ones. Returns them as a sorted list of strings with full paths
(e.g. 'parent/child').
"""
def _collect_keys(current_schema: dict, prefix: str = "") -> list:
result: list[str] = []
# Skip if not an object schema
if (
not isinstance(current_schema, dict)
or "type" not in current_schema
):
return result
# Handle properties
if "properties" in current_schema:
for key, value in current_schema["properties"].items():
full_key = f"{prefix}/{key}" if prefix else key
result.append(full_key)
# Recurse into nested properties
result.extend(_collect_keys(value, full_key))
# Handle additionalProperties
if "additionalProperties" in current_schema:
props = current_schema["additionalProperties"]
result.extend(_collect_keys(props, prefix))
# Handle array items
if "items" in current_schema:
result.extend(_collect_keys(current_schema["items"], prefix))
return result
return sorted(_collect_keys(schema)) |
||||||||||||||||
"""Recursively extract the keys of all "properties" subdictionaries. | ||||||||||||||||
|
||||||||||||||||
Recursively extract the keys of all subdictionaries in the input | ||||||||||||||||
dictionary that are values to a "properties" key. The input dictionary | ||||||||||||||||
represents a JSON schema dictionary | ||||||||||||||||
(see https://json-schema.org/understanding-json-schema/about). | ||||||||||||||||
|
||||||||||||||||
The "properties" key always appears as part of a dictionary with at least | ||||||||||||||||
another key, that is "type" or "item". | ||||||||||||||||
""" | ||||||||||||||||
# The "property keys" are either "properties" or "additionalProperties" | ||||||||||||||||
# as they are the keys with the relevant data | ||||||||||||||||
property_keys = ["properties", "additionalProperties"] | ||||||||||||||||
|
||||||||||||||||
def _contains_properties_key(input: dict): | ||||||||||||||||
"""Return True if the input dictionary contains a property key.""" | ||||||||||||||||
return any(x in input for x in property_keys) | ||||||||||||||||
|
||||||||||||||||
def _get_properties_subdict(input: dict): | ||||||||||||||||
"""Get the subdictionary under the property key.""" | ||||||||||||||||
return input[next(k for k in property_keys if k in input)] | ||||||||||||||||
|
||||||||||||||||
keys_of_properties_dicts = [] | ||||||||||||||||
if "type" in schema: | ||||||||||||||||
if _contains_properties_key(schema): | ||||||||||||||||
# Get the subdictionary under the properties key | ||||||||||||||||
properties_subdict = _get_properties_subdict(schema) | ||||||||||||||||
|
||||||||||||||||
# Check if there is a nested "properties" dict inside the current | ||||||||||||||||
# one. If so, go down one level. | ||||||||||||||||
if _contains_properties_key(properties_subdict): | ||||||||||||||||
properties_subdict = _get_properties_subdict( | ||||||||||||||||
properties_subdict | ||||||||||||||||
) | ||||||||||||||||
|
||||||||||||||||
# Add keys of deepest "properties dict" to list | ||||||||||||||||
keys_of_properties_dicts.extend( | ||||||||||||||||
[ | ||||||||||||||||
f"{parent_key}/{ky}" if parent_key else ky | ||||||||||||||||
for ky in properties_subdict | ||||||||||||||||
] | ||||||||||||||||
) | ||||||||||||||||
|
||||||||||||||||
# Inspect non-properties dictionaries under this properties subdict | ||||||||||||||||
for ky, val in properties_subdict.items(): | ||||||||||||||||
full_key = f"{parent_key}/{ky}" if parent_key else ky | ||||||||||||||||
keys_of_properties_dicts.extend( | ||||||||||||||||
_extract_properties_keys(val, full_key) | ||||||||||||||||
) | ||||||||||||||||
|
||||||||||||||||
elif "items" in schema: | ||||||||||||||||
# Analyse the dictionary under the "items" key | ||||||||||||||||
properties_subdict = schema["items"] | ||||||||||||||||
keys_of_properties_dicts.extend( | ||||||||||||||||
_extract_properties_keys( | ||||||||||||||||
properties_subdict, parent_key=parent_key | ||||||||||||||||
) | ||||||||||||||||
) | ||||||||||||||||
|
||||||||||||||||
return sorted(keys_of_properties_dicts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like a verb is missing here, "contains"?