not_exist_ok for dict_set only allows adding a non-existing key to a dictionary, or a non-existing entry to a list, not a whole nested structure implied by the query #1402
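To make the title concrete, here is a toy sketch of the stricter semantics under stated assumptions: the query is a slash-separated path, a "non-existing entry" in a list means the index equal to the list's current length (so the value is appended), and every container before the last component must already exist. `toy_dict_set` is a hypothetical name, not unitxt's `dict_set`, whose actual query syntax and signature may differ.

```python
from typing import Any


def toy_dict_set(dic: dict, query: str, value: Any) -> None:
    """Hypothetical stand-in for dict_set with not_exist_ok=True as described
    in the PR title: only the LAST component of the query may be missing."""
    components = query.split("/")
    current: Any = dic
    # Walk every component except the last; each one must already exist.
    for comp in components[:-1]:
        if isinstance(current, list):
            current = current[int(comp)]  # IndexError if the entry is missing
        else:
            current = current[comp]  # KeyError if the key is missing
    last = components[-1]
    if isinstance(current, list):
        index = int(last)
        if index == len(current):
            current.append(value)  # assumed meaning of "non-existing list entry"
        else:
            current[index] = value  # IndexError if index > len(current)
    else:
        current[last] = value  # add a new key or overwrite an existing one


instance = {"task_data": {"options": ["a"]}}
toy_dict_set(instance, "task_data/answer", "a")     # new key: allowed
toy_dict_set(instance, "task_data/options/1", "b")  # new list entry: allowed
try:
    toy_dict_set(instance, "media/images/0", "img")  # "media" missing: rejected
except KeyError as err:
    print("rejected:", err)
```

Under these assumptions, the only way to get a missing intermediate container is to create it explicitly first.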
Updated from acdc7ac to 17ce7d3.
Hi @yoavkatz, I think the needed patches are rather ugly, and I barely managed to cover all the cases (with axes and pry-bars, to translate Shai K Ophir, at minute 4:32 here). Users will have to take similar measures in the future, or perhaps change their existing code right now, if we pull the rug out from under them (the rug that allowed dict_set to create a whole construct through the query, although that rug looked odd to you).
@@ -47,6 +47,7 @@
    ),
    preprocess_steps=[
        RenameSplits(mapper={"dev": "train", "validation": "test"}),
        SetEmptyDictIfDoesNotExist(field="media"),
I think the 'media' field is used by generic code, so it should be added by that generic code rather than by the cards.
See for example:
Line 268 in fc91a7e:
"media": instance.get("media", {}),
Hence, this field should be added here:
https://github.com/IBM/unitxt/blob/reduce-error-message-clutter/src/unitxt/standard.py#L297
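A minimal sketch of the reviewer's suggestion, assuming instances are plain dicts: the shared code that consumes `media` supplies the default itself, so no card needs a `SetEmptyDictIfDoesNotExist(field="media")` step. `ensure_generic_fields` is a hypothetical helper name; where exactly unitxt would do this in `standard.py` is not shown here.

```python
from typing import Any, Dict


def ensure_generic_fields(instance: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: add generic fields once, in shared code."""
    # setdefault leaves an existing value untouched and inserts {} otherwise.
    instance.setdefault("media", {})
    return instance


card_output = {"source": "What is 2+2?"}  # a card that never mentions media
with_media = {"source": "x", "media": {"images": ["i.png"]}}

print(ensure_generic_fields(card_output))
print(ensure_generic_fields(with_media)["media"])
```

Because the default lives in one place, cards stay unaware of `media`, and an existing value is never overwritten.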
@@ -277,6 +278,20 @@ def process(
        return instance


class SetEmptyDictIfDoesNotExist(InstanceOperator):
The behavior of this method goes beyond what its name suggests. It not only creates an empty dict if the field does not exist, but also converts the field from a string to a dict if the string is JSON. I think we should avoid such complex behavior, which more likely indicates a bug somewhere else.
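The distinction drawn here can be sketched as two plain functions (hypothetical names, not unitxt operators, assuming instances are plain dicts): one does only what the name `SetEmptyDictIfDoesNotExist` promises, the other also parses a JSON string, which is the extra behavior being questioned.

```python
import json
from typing import Any, Dict


def set_empty_dict_if_missing(instance: Dict[str, Any], field: str) -> Dict[str, Any]:
    """Does exactly what the name says: add {} only when the field is absent."""
    instance.setdefault(field, {})
    return instance


def set_empty_dict_or_parse_json(instance: Dict[str, Any], field: str) -> Dict[str, Any]:
    """The broader behavior under review: also turns a JSON string into a dict."""
    if field not in instance:
        instance[field] = {}
    elif isinstance(instance[field], str):
        # Silently repairs the field, hiding whatever produced the string.
        instance[field] = json.loads(instance[field])
    return instance


narrow = set_empty_dict_if_missing({"task_data": '{"a": 1}'}, "task_data")
broad = set_empty_dict_or_parse_json({"task_data": '{"a": 1}'}, "task_data")
print(type(narrow["task_data"]).__name__, type(broad["task_data"]).__name__)
```

The narrow version leaves a string-valued field as-is, so the mismatch surfaces later and points back to whatever produced the string.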
For example, on Friday I found a bug in the post-processing pipeline:
(Still not in main)
Hi @yoavkatz, in case it helps: I found what triggered my need to add this json.loads to SetEmptyDictIfDoesNotExist. Again, it is just a patch, made with axes and a pry-bar, for the need to manually generate container elements (lists and dicts; so far I have written a pry-bar only for dicts, but will need one for lists too), so that users of dict_set are not surprised:
This method shamelessly turns task_data into a string:
def serialize_instance_fields(self, instance, task_data):
    if settings.task_data_as_text:
        instance["task_data"] = json.dumps(task_data)
    if not isinstance(instance["source"], str):
        instance["source"] = json.dumps(instance["source"])
    return instance
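Why this forces a `json.loads` later can be shown with the standard library alone: after `json.dumps`, `task_data` is a `str`, so any step that indexes into it as a dict fails until it is parsed back. The sketch mimics the `settings.task_data_as_text` branch with a plain boolean argument; `serialize` is a simplified, hypothetical stand-in, not the unitxt method.

```python
import json
from typing import Any, Dict


def serialize(instance: Dict[str, Any], task_data: Dict[str, Any], as_text: bool) -> Dict[str, Any]:
    """Simplified stand-in for serialize_instance_fields (task_data part only)."""
    if as_text:
        instance["task_data"] = json.dumps(task_data)
    return instance


instance = serialize({}, {"answer": "b", "options": ["a", "b"]}, as_text=True)
print(type(instance["task_data"]).__name__)  # the field is now a JSON string

try:
    instance["task_data"]["answer"]  # indexing a str with a str key raises TypeError
except TypeError as err:
    print("cannot index:", err)

# A later step has to parse it back before treating it as a dict again.
restored = json.loads(instance["task_data"])
print(restored["answer"])
```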
Updated from 5019948 to 3c1ab49.
Hi @yoavkatz, I suspect that the new … Other than these explanations and apologies, the computed results will remain the same, and users will not have to change their existing code or write tedious invocations of dict_set in the future.
Updated from 3c1ab49 to e5e8be4.
Hi. I guess what I'm missing is how prevalent this problem is, and whether it is actually caused by another issue. I mean, media and task data are generic fields that should always be present and should be added by generic code. If they are not, we need to understand why. In general, we should strongly avoid complex flags, which increase the learning curve of the library.
I agree (although I like flags). Does this issue hint at what you are looking for? I bumped into it when I needed to push this LoadJson into the patch.
…dictionary, or non existing entry to a list. Not a nested structure as implied by the query (Signed-off-by: dafnapension)
…w fails if dict_set only adds to an existing dictionary
…t ensure task_data exists, because default value is used
…dexing into a list element
…mpatibility
Updated from e5e8be4 to 1c8546f.
Too much existing code trusts dict_set to fill in all missing components, not just the last one.
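The behavior that existing code relies on can be sketched as a toy that creates every missing intermediate dict along the query, not only the last component. `toy_dict_set_fill_all` is a hypothetical name; it assumes slash-separated queries and handles dicts only, whereas the real `dict_set` also deals with lists.

```python
from typing import Any


def toy_dict_set_fill_all(dic: dict, query: str, value: Any) -> None:
    """Hypothetical: builds the missing intermediate dicts implied by the query."""
    *parents, last = query.split("/")
    current = dic
    for comp in parents:
        # Create an empty dict for any missing intermediate component.
        current = current.setdefault(comp, {})
    current[last] = value


instance = {}
toy_dict_set_fill_all(instance, "media/images/main", "img.png")
print(instance)
```

Code written against this behavior breaks once only the last component may be missing, because `media` and `images` would first have to exist.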
closes: #1400
A whole nested structure can still be set, but only as the value being set, not by having dict_set build it from the query.
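A minimal sketch of that route under toy assumptions (slash-separated query, only the last component may be missing; `toy_dict_set` is hypothetical and handles dicts only): instead of querying `media/images/main`, set `media` itself to the nested value.

```python
from typing import Any


def toy_dict_set(dic: dict, query: str, value: Any) -> None:
    """Hypothetical: every component except the last must already exist."""
    *parents, last = query.split("/")
    current = dic
    for comp in parents:
        current = current[comp]  # KeyError if an intermediate is missing
    current[last] = value


instance = {}
try:
    toy_dict_set(instance, "media/images/main", "img.png")  # structure via the query
except KeyError:
    print("query path rejected")

# The same structure, supplied as the value of a single missing key.
toy_dict_set(instance, "media", {"images": {"main": "img.png"}})
print(instance)
```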