[ALBERT] albert-xlarge V2 seems to have a different behavior than the other models #89
Comments
Hi @LysandreJik, I have tried to add [...]. Could you tell me how you get [...]? Thank you,
Hi @insop, to add the scope I wrap the model definition in `with tf.variable_scope("module"):`, which results in the following (abridged):

```python
config = copy.deepcopy(config)
if not is_training:
  config.hidden_dropout_prob = 0.0
  config.attention_probs_dropout_prob = 0.0

input_shape = get_shape_list(input_ids, expected_rank=2)
batch_size = input_shape[0]
seq_length = input_shape[1]

if input_mask is None:
  input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)

if token_type_ids is None:
  token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)

with tf.variable_scope("module"):  # added line: variable names now start with "module/", matching the HUB module
  with tf.variable_scope(scope, default_name="bert"):
    with tf.variable_scope("embeddings"):
      # Perform embedding lookup on the word ids.
      (self.word_embedding_output,
[...]
```
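With that extra scope in place, restoring the checkpoint saved from the HUB module is a standard `Saver` restore. A minimal sketch of what that can look like (the config path, checkpoint path, and placeholder tensors here are assumptions, not the exact script):

```python
import tensorflow as tf  # TF 1.x
import modeling  # modeling.py from the ALBERT repository, with the added scope

# Placeholder paths; substitute the real config and checkpoint locations.
albert_config = modeling.AlbertConfig.from_json_file("albert_config.json")

input_ids = tf.placeholder(tf.int32, [None, None])
input_mask = tf.placeholder(tf.int32, [None, None])
segment_ids = tf.placeholder(tf.int32, [None, None])

model = modeling.AlbertModel(
    config=albert_config,
    is_training=False,
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids)

saver = tf.train.Saver()
with tf.Session() as sess:
    # Works because the model's variable names now start with "module/",
    # matching the names stored in the HUB-derived checkpoint.
    saver.restore(sess, "/tmp/albert_xlarge_v2/model.ckpt")
```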
Hi @LysandreJik, [...]
Great to hear, please let me know if you manage to convert the v2 models/reproduce the results!
Hi @LysandreJik, I have run your script ([...]). I thought I saw you on another post, but I was mistaken. Thank you,
Hi, this issue is related to ALBERT, and especially the V2 models, specifically the `xlarge` version 2.

TL;DR: the ALBERT-xlarge V2 model seems to behave differently from the other V1/V2 models.

The models are accessible through the HUB; in order to inspect them, I save the checkpoints, which I then load into the `modeling.AlbertModel` available in the `modeling.py` file. I use this script to save the checkpoint to a file.
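The linked script is not preserved in this thread; as a rough idea, the export step can look like the sketch below (assuming TF 1.x, `tensorflow_hub`, and the standard ALBERT `tokens` signature; the HUB handle and output path are placeholders):

```python
import tensorflow as tf  # TF 1.x
import tensorflow_hub as hub

# Placeholder handle; the ALBERT modules live under tfhub.dev/google/.
module = hub.Module("https://tfhub.dev/google/albert_xlarge/2")

input_ids = tf.placeholder(tf.int32, [None, None], name="input_ids")
input_mask = tf.placeholder(tf.int32, [None, None], name="input_mask")
segment_ids = tf.placeholder(tf.int32, [None, None], name="segment_ids")

# The ALBERT HUB modules expose a "tokens" signature that returns a dict
# containing "pooled_output" and "sequence_output".
outputs = module(
    inputs=dict(input_ids=input_ids, input_mask=input_mask,
                segment_ids=segment_ids),
    signature="tokens",
    as_dict=True)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    # All module variables are created under the "module/" scope, which is
    # why the same scope has to be added to modeling.py before restoring.
    tf.train.Saver().save(sess, "/tmp/albert_xlarge_v2/model.ckpt")
```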
In a different script, I load the checkpoint into a model from `modeling.py` (a line has to be added so that the modeling scope begins with `module`, the same as the HUB module). I load the checkpoint in this script. In that same script I also load a HUB module, and I compare the outputs of both models given the same input values.

For every model, I check that the difference is near zero by taking the maximum absolute difference between tensor values (the check is included at the bottom of the second script; a sketch of it follows the results below). Here are the results:
| Model | Pooled output (max diff) | Full transformer output (max diff) |
| --- | --- | --- |
| ALBERT-BASE-V1 | 8.009374e-06 | 2.3543835e-06 |
| ALBERT-LARGE-V1 | 2.5719404e-05 | 1.8417835e-05 |
| ALBERT-XLARGE-V1 | 0.0006218478 | 0.0 |
| ALBERT-XXLARGE-V1 | 0.0 | 1.0311604e-05 |
| ALBERT-BASE-V2 | 2.3335218e-05 | 4.9591064e-05 |
| ALBERT-LARGE-V2 | 0.00015488267 | 0.00010347366 |
| **ALBERT-XLARGE-V2** | **1.9535216** | **5.152705** |
| ALBERT-XXLARGE-V2 | 1.7762184e-05 | 2.592802e-06 |
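The comparison itself boils down to a max-absolute-difference check along these lines (a minimal sketch, not the exact script; the random arrays stand in for the pooled/sequence outputs fetched from the two graphs so the snippet runs standalone):

```python
import numpy as np

def max_abs_diff(a, b):
    """Maximum element-wise absolute difference between two outputs."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))

# In the real scripts these are the pooled / sequence outputs produced by
# the HUB module and by modeling.AlbertModel on identical inputs.
hub_pooled = np.random.rand(2, 2048).astype(np.float32)  # placeholder data
ckpt_pooled = hub_pooled.copy()                          # placeholder data

print("pooled max difference:", max_abs_diff(hub_pooled, ckpt_pooled))
```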
Is there an issue with this model in particular, or does it have an architectural change that sets it apart from the others?

I have had no problems replicating the SQuAD results with all of the V1 models, but I could not do so with the V2 models apart from the base one. Is this related? Thank you for your time.