RL Implementation #3

andreasbinder · 2023-08-02T13:09:02Z

Hi, thank you for the paper and the interesting concept!

We want to build on your idea, using RL methods. However, in your code I did not find the policy implementations.
I assume they are supposed to be here.

It would be great if you could share your code so that I can also experiment on my own :)
Keep up the good work!

Sheerkay · 2024-09-24T13:02:38Z

In your paper, you mentioned using an improved version of the REINFORCE algorithm [32] to directly train the ToT (Tree of Thought) controller and the prompt agent. However, in the code of your GitHub project, I did not find the corresponding reinforcement learning method. It seems that your strategy still relies on natural language prompts.

phuleratribhuwan · 2024-11-13T12:28:31Z

@Sheerkay I came across the same thing while exploring. It appears to be basic code with minimal effort, relying mainly on well-crafted text prompts for the problem. The true power seems to lie in the NLP or GPT capabilities behind it.
