
[Question] Differences between forward and generate methods #1819

Open
victorcaquilpan opened this issue Jan 6, 2025 · 1 comment

@victorcaquilpan

Question

I have been struggling to understand the differences between these two methods. I hope someone can help clarify the following questions:

  1. According to the LLaVA documentation, we use forward for training and generate for inference; however, in some cases I have seen people use forward for validation. Is this right?
  2. The documentation says the generate method is for autoregressive generation. Doesn't forward follow autoregressive generation as well? If not, what is the practical difference between the two methods in this sense? In theory, if I take a fine-tuned model and run inference on an input prompt containing just the user question, can I get different results from forward and from generate? Why?
  3. During training with the forward method, the input contains both the question and the answer. Does the model use part of the answer when predicting tokens, or does it only use the answer to compute the loss? (See the sketch below.)
  4. In my case, I am trying to use the hidden states of the last layer as input to a subsequent process. However, I have noticed that even when forward and generate give me the same output, the hidden states are not necessarily the same. Is that right?

Thanks
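
For reference, here is a minimal sketch of the contrast as I understand it. It uses gpt2 as a stand-in for LLaVA and made-up question/answer strings (the model name and prompts are assumptions, not LLaVA-specific code), but the forward vs. generate behaviour should be the same for any Hugging Face causal LM:

```python
# Minimal sketch, not LLaVA-specific: "gpt2" and the prompt strings below are
# placeholders; the forward/generate contrast is the same for any causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

question = "Question: What is in the image? Answer:"
answer = " A cat sitting on a sofa."

# --- Training-style forward pass (teacher forcing) ---
# The whole sequence (question + answer) goes in at once; each position is
# predicted from the *ground-truth* previous tokens, not from the model's
# own earlier predictions.
full = tokenizer(question + answer, return_tensors="pt")
labels = full["input_ids"].clone()
q_len = tokenizer(question, return_tensors="pt")["input_ids"].shape[1]
labels[:, :q_len] = -100  # mask the question so the loss covers only the answer
with torch.no_grad():
    out = model(**full, labels=labels)
print("loss over answer tokens:", out.loss.item())

# --- Inference-style generate (autoregressive) ---
# Only the question goes in; tokens are produced one at a time, each
# conditioned on the tokens the model itself generated before it.
prompt = tokenizer(question, return_tensors="pt")
gen_ids = model.generate(
    **prompt,
    max_new_tokens=20,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(gen_ids[0], skip_special_tokens=True))
```

My reading of this is: with forward, the loss is computed only on the answer positions (the -100 labels), but each of those positions is conditioned on the ground-truth tokens before it; with generate, each token is conditioned on the model's own previous predictions, which is why the two can diverge once an early token differs.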

@chenyingquanyulbs

Hello:
First, let me give my understanding of some of the questions you raised above.
1. I have also seen people use forward for validation/evaluation, as you mention. By the original logic, training and validation should be handled separately, but I am still unsure whether that rule is absolute. The reason is that I have run into a problem when validating with generate: the number of hidden states it returns is sometimes wrong. If your next step needs the 4096-dimensional hidden-state information, the run breaks; if you only need the final, complete output over the vocabulary (the 30k-plus entries), there is no problem.
2. Following on from that: because the hidden states produced by generate sometimes come back with the wrong count and the run breaks, validating with generate does not work for me. This is probably an issue with the generate function in the transformers library, which would make it very hard to fix, and is probably why other people use forward for validation. That is my guess; there may be other reasons as well.
3. Finally, regarding the difference between forward and generate, I mostly agree with your understanding. The open questions are whether forward actually uses part of the provided answer when producing its output, and whether evaluating that way is a serious mistake.
4. If you have any further doubts, questions, or even answers, I hope you will share them with me.
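
For what it's worth, here is a minimal sketch of why the hidden states from generate can look inconsistent: with return_dict_in_generate=True and output_hidden_states=True, they come back as one tuple per decoding step, where the first step covers the whole prompt and every later step covers a single new token, so the shapes differ across steps. (Again, gpt2 is just an assumed stand-in for LLaVA here.)

```python
# Sketch with "gpt2" as a placeholder model; the per-step structure of
# generate(..., output_hidden_states=True) is the same for other causal LMs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = tokenizer("Question: What is in the image? Answer:", return_tensors="pt")
with torch.no_grad():
    gen = model.generate(
        **prompt,
        max_new_tokens=5,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
        return_dict_in_generate=True,
        output_hidden_states=True,
    )

# gen.hidden_states: one entry per decoding step, each a tuple over layers.
last_layer = [step[-1] for step in gen.hidden_states]
print(last_layer[0].shape)  # (batch, prompt_len, hidden) -- step 0 covers the prompt
print(last_layer[1].shape)  # (batch, 1, hidden)           -- later steps: one token each

# A forward pass over the same final sequence returns one tensor per layer
# covering every position at once, so its shape differs from the step-wise
# tensors above.
with torch.no_grad():
    fwd = model(gen.sequences, output_hidden_states=True)
print(fwd.hidden_states[-1].shape)  # (batch, prompt_len + new_tokens, hidden)
```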
