I recently read your paper "GPT Understands, Too", and there is a passage I don't fully understand. I hope you can help explain it: "1) Discreteness: the original word embedding e of M has already become highly discrete after pre-training. If h is initialized with random distribution and then optimized with stochastic gradient descent (SGD), which has been proved to only change the parameters in a small neighborhood (Allen-Zhu et al., 2019), the optimizer would easily fall into local minima." As I understand it, this passage first states that the pre-trained model's word embeddings are highly discrete from one another. But the trainable parameters h are themselves randomly initialized and do not come from the word embeddings, so how does the discreteness of the word embeddings affect the optimization of h?
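For concreteness, here is a minimal sketch of the setup I am asking about (PyTorch assumed; the sizes and names such as PROMPT_LEN are illustrative, not taken from the paper or its released code). It only shows that the pretrained embedding table e is a fixed, finite set of points, while the continuous prompt h starts from a random distribution that can lie far from those points:

```python
import torch

vocab_size, emb_dim, PROMPT_LEN = 30522, 768, 8  # illustrative sizes

# e: pretrained word embeddings of M (random stand-ins here for the learned table).
e = torch.randn(vocab_size, emb_dim)

# h: trainable continuous prompt, randomly initialized rather than copied from e.
h = torch.nn.Parameter(torch.randn(PROMPT_LEN, emb_dim) * 0.02)

# Distance from each prompt vector to its nearest pretrained word embedding.
# If SGD only moves h within a small neighborhood of this random starting point,
# that is the "local minima" concern quoted from the paper.
dists = torch.cdist(h.detach(), e)   # shape (PROMPT_LEN, vocab_size)
print(dists.min(dim=1).values)       # how far h starts from the discrete set e
```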