error: answer_choices and target keys #26

Open
meixide opened this issue Sep 17, 2024 · 0 comments

meixide commented Sep 17, 2024

  1. The value of answer_choices is a string representation of a list, which is not the format the tokenizer expects. The tokenizer expects a list of strings or a list of lists of strings.

To fix this, convert the string representation into an actual list. The ast.literal_eval function from the ast module safely evaluates the string as a Python literal. In encoder_decoder_functions.py:

if key == "answer_choices":
    # Convert the string representation of a list into an actual list, then flatten it
    # (requires `import ast` at the top of the module)
    text = [item for sublist in ast.literal_eval(batch[key][0]) for item in sublist]
else:
    text = batch[key]
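
As a standalone illustration of the conversion above, here is a minimal sketch. The helper name parse_answer_choices is hypothetical and not part of the repository, and the sample strings are assumptions about what the dataset field might look like. Unlike the one-line comprehension, it also tolerates a flat list, where iterating over each "sublist" would otherwise split the strings into characters.

```python
import ast


def parse_answer_choices(raw):
    """Return a flat list of strings from a (possibly stringified) list.

    Hypothetical helper for illustration; handles both "['a', 'b']"
    and "[['a', 'b']]" style values.
    """
    # Only evaluate when the value is still a string; literal_eval accepts
    # Python literals only, so arbitrary code in the string is not executed.
    parsed = ast.literal_eval(raw) if isinstance(raw, str) else raw
    flat = []
    for entry in parsed:
        if isinstance(entry, (list, tuple)):
            # Nested case: extend with the inner items.
            flat.extend(str(item) for item in entry)
        else:
            # Flat case: keep the entry whole instead of iterating its characters.
            flat.append(str(entry))
    return flat


nested = parse_answer_choices("[['yes', 'no']]")
flat = parse_answer_choices("['yes', 'no']")
```

Both calls should yield the same two-element list, which is a form the tokenizer can take as a list of strings.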
  2. The target key contains a list with a single integer element, [1], which is not a valid input type for the tokenizer. The tokenizer expects a string, a list of strings, or a list of lists of strings.
    To fix this, handle target separately. Since target is most likely a label, it probably does not need to be tokenized the way input and answer_choices are. Either skip tokenizing target or handle it differently in encoder_decoder_functions.py:

keys_to_tokenize = ["input", "answer_choices"]
...

# Handle target separately if needed
if "target" in batch:
    tokenized_batch["target"] = batch["target"]
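
To show how the pieces above could fit together, here is a minimal sketch, assuming the batch is a plain dict. The function tokenize_batch and the stand-in toy_tokenizer are hypothetical names used only for illustration; the real code in encoder_decoder_functions.py and the real tokenizer may differ.

```python
KEYS_TO_TOKENIZE = ["input", "answer_choices"]


def tokenize_batch(batch, tokenizer):
    """Tokenize only the text keys and pass the label through unchanged.

    Hypothetical sketch; `tokenizer` is any callable that accepts a list of strings.
    """
    tokenized_batch = {}
    for key in KEYS_TO_TOKENIZE:
        if key in batch:
            tokenized_batch[key] = tokenizer(batch[key])
    # target holds integer labels, so it is copied rather than tokenized.
    if "target" in batch:
        tokenized_batch["target"] = batch["target"]
    return tokenized_batch


def toy_tokenizer(texts):
    """Whitespace splitter standing in for a real tokenizer (assumption)."""
    return [text.split() for text in texts]


batch = {
    "input": ["is the sky blue"],
    "answer_choices": ["yes", "no"],
    "target": [1],
}
result = tokenize_batch(batch, toy_tokenizer)
```

With this layout, the integer label never reaches the tokenizer, which avoids the invalid input type described in the second point.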
