BC running issues #5
Comments
Thank you for reaching out. We will address your concerns in full shortly. Here are some answers to your questions.
Maze
I ran the maze evaluation and got an average reward of -75.36, so this seems to work. You are right that the policy used in the training script differs from the default in the evaluation script. In our evaluation for the paper we used GPT2PPOPolicy, so we will update this. In the meantime, if you are evaluating a checkpoint other than the one linked in the repository, I recommend running for more epochs. For the maze task we trained each algorithm for 50-100 epochs, since the dataset is relatively small. We will add this hyperparameter to an updated version of the paper.
Chess
20 Questions
Apologies for the confusion. Yes, that link is for the simulator. Please download it and then point 'oracle_model_path' at the path to the model. We will get to the remaining issues shortly. Thank you for your patience!
@icwhite Hello, are there any updates on the remaining issues?
Hi, I have resolved the issues for 20 Questions and the fully observed maze. We are still working on the Text-Nav, Guess My City, and Car Dealer issues.
We have resolved the Car Dealer issue. Please see the recent merge.
BC running issues
Hello! I have tried to evaluate BC on all of the benchmarks and ran into the bugs and errors described below. Because of them, I either cannot run the code at all or cannot reproduce the results reported in the paper.
Maze:
For the evaluation I use the following command:
python -m llm_rl_scripts.maze.bc.eval_bc PARAMS my_path
During evaluation of the "fully observed" version, it seems that GPT2PPOPolicy is used (use_reranker_for_reward_eval: bool = False). However, when I evaluate the model with this policy, it only ever acts with Text("\n", is_action=True), and I get a reward of -4.0 for every move. Moreover, fully_observed_bc.py uses a different policy (ReRankerSamplePolicy).
Chess:
To train on full chess games I use the following command:
After running this command I get the following warning:
After the warning, the code crashes with:
The crash happened after the checkpoint had already been created, so I then tried to evaluate BC with the command:
It crashes when evaluating the environment, at the call
interactions, results = text_env_eval(
    env=env,
    policy=policy,
    n_rollouts=policy_n_rollouts,
    verbose=True,
    env_options={"init_position": position},
    bsize=policy_bsize,
)
with the error
NameError: name 'position' is not defined
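My guess at what is missing, as a minimal sketch only: the call presumably needs `position` to be bound beforehand, e.g. by looping over a list of starting positions. The `positions` variable and the example FEN below are my own, not from the repo.

# Sketch of a possible fix (hypothetical names): bind `position` before the call.
positions = ["rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"]  # example FEN
for position in positions:
    interactions, results = text_env_eval(
        env=env,
        policy=policy,
        n_rollouts=policy_n_rollouts,
        verbose=True,
        env_options={"init_position": position},
        bsize=policy_bsize,
    )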
Guess My City
For training, I have not found where the 'vocab_file' is located or how it can be obtained. Is it one of the files in
llm_rl_scripts/wordle/vocab/
? Please clarify which command I should use to train BC on this task.
Wordle
To train BC on Wordle, I have tried running the command with every vocab file:
python -m llm_rl_scripts.wordle.bc.train_bc_gpt2 HF gpt2 datasets/wordle/train_data.jsonl datasets/wordle/eval_data.jsonl llm_rl_scripts/wordle/vocab/tweet_words.txt
For every one of these runs I received the following error:
I also could not figure out the correct run command for llm_rl_scripts.wordle.bc.train_bc.
Trying
python -m llm_rl_scripts.wordle.bc.train_bc HF gpt2 datasets/wordle/train_data.jsonl datasets/wordle/eval_data.jsonl llm_rl_scripts/wordle/vocab/tweet_words.txt
leads to the following warning and error:
Car dealer
I could not run the script llm_rl_scripts/car_dealer/bc/train_bc.py due to the following error:
It seems that other users have the same issue (Issue #2).
Text_Nav
I have tried to train BC with the following command:
python -m llm_rl_scripts.text_nav.bc.train_bc HF gpt2 datasets/text_nav/train_full_info.json datasets/text_nav/eval_full_info.json
It fails with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File ".../llm_rl_scripts/text_nav/bc/train_bc.py", line 291, in <module>
    tyro.cli(main)
  File ".../venv1/lib/python3.9/site-packages/tyro/_cli.py", line 114, in cli
    _cli_impl(
  File ".../venv1/lib/python3.9/site-packages/tyro/_cli.py", line 293, in _cli_impl
    out, consumed_keywords = _calling.call_from_args(
  File ".../venv1/lib/python3.9/site-packages/tyro/_calling.py", line 192, in call_from_args
    return unwrapped_f(*args, **kwargs), consumed_keywords  # type: ignore
  File ".../llm_rl_scripts/text_nav/bc/train_bc.py", line 259, in main
    trainer, inference = train_loop(
  File ".../JAXSeq/JaxSeq/train.py", line 218, in train_loop
    for batch in tqdm(d, total=steps_per_epoch):
  File ".../venv1/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/kariakinaleksandr/PycharmProjects/lmlr-gym/JAXSeq/JaxSeq/utils.py", line 437, in _iterable_data_to_batch_iterator
    for item in dataset:
  File ".../lmlr-gym/JAXSeq/JaxSeq/data.py", line 210, in __next__
    in_tokens, in_training_mask = next(self.in_mask_tokens)
  File ".../JAXSeq/JaxSeq/data.py", line 248, in _tokens_generator
    in_training_mask = block_sequences(
  File ".../JAXSeq/JaxSeq/utils.py", line 240, in block_sequences
    return np.asarray(full_sequences, dtype=dtype)
ValueError: could not convert string to float: '|'
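For what it is worth, the failure mode can be reproduced in isolation. The snippet below is my own minimal sketch of what I assume is happening (the example token list is made up): np.asarray with a float dtype fails exactly this way when the sequences still contain raw string tokens rather than integer token ids.

import numpy as np

# block_sequences ends in np.asarray(full_sequences, dtype=dtype); if the
# sequences still contain string tokens such as '|' instead of integer ids,
# the conversion fails with the same error as in the traceback above.
try:
    np.asarray([["|", "go", "north"]], dtype=np.float32)
except ValueError as e:
    print(e)  # could not convert string to float: '|'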
20 Questions
In the
llm_rl_scripts/twenty_questions/bc/train_bc.py
script, there is an unresolved reference 'train_text_histories'. Should it be train_text_trajectories?
I also have not found a model suitable to serve as the oracle. I tried gpt2, and it fails with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File ".../llm_rl_scripts/twenty_questions/bc/train_bc.py", line 316, in <module>
    tyro.cli(main)
  File ".../venv1/lib/python3.9/site-packages/tyro/_cli.py", line 114, in cli
    _cli_impl(
  File ".../venv1/lib/python3.9/site-packages/tyro/_cli.py", line 293, in _cli_impl
    out, consumed_keywords = _calling.call_from_args(
  File ".../venv1/lib/python3.9/site-packages/tyro/_calling.py", line 192, in call_from_args
    return unwrapped_f(*args, **kwargs), consumed_keywords  # type: ignore
  File ".../llm_rl_scripts/twenty_questions/bc/train_bc.py", line 152, in main
    oracle=T5Oracle.load_oracle(
  File ".../llm_rl_scripts/twenty_questions/env/oracle.py", line 107, in load_oracle
    params, model = t5_load_params(
  File ".../JAXSeq/JaxSeq/models/T5/load.py", line 229, in load_params
    with open(os.path.join(model_load_path, 'config.json'), 'r') as f:
  File ".../JAXSeq/JaxSeq/bucket_manager.py", line 24, in open_with_bucket
    f = open(path, mode=mode, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: 'gpt2/config.json'
Is this model supposed to come from the dataset link?
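From the traceback, the loader seems to expect a local checkpoint directory containing config.json rather than a Hugging Face model name like "gpt2". A minimal sketch of the check I have in mind (the path below is hypothetical, not from the repo):

import os

# Hypothetical path to the downloaded T5 simulator checkpoint; the loader opens
# os.path.join(model_load_path, 'config.json'), so that file must exist locally
# at whatever path is passed as the oracle model.
oracle_model_path = "/path/to/twenty_questions_oracle"
assert os.path.isfile(os.path.join(oracle_model_path, "config.json")), \
    "oracle_model_path should be a directory containing config.json"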
Please resolve this whole issue as soon as possible.