[ALBERT] albert-xlarge V2 seems to have a different behavior than the other models #89
Comments
Hi @LysandreJik, I have tried to add [...]. Could you tell me how you get [...]? Thank you,
Hi @insop, to add the scope I wrap the model definition in `with tf.variable_scope("module"):`, which results in the following (abridged):

```python
config = copy.deepcopy(config)
if not is_training:
  config.hidden_dropout_prob = 0.0
  config.attention_probs_dropout_prob = 0.0

input_shape = get_shape_list(input_ids, expected_rank=2)
batch_size = input_shape[0]
seq_length = input_shape[1]

if input_mask is None:
  input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)

if token_type_ids is None:
  token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)

with tf.variable_scope("module"):  # added line: variable names now start with "module/", matching the HUB module
  with tf.variable_scope(scope, default_name="bert"):
    with tf.variable_scope("embeddings"):
      # Perform embedding lookup on the word ids.
      (self.word_embedding_output,
[...]
```
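With that extra scope in place, restoring the checkpoint saved from the HUB module is a standard `Saver` restore. A minimal sketch of what that can look like (the config path, checkpoint path, and placeholder tensors here are assumptions, not the exact script):

```python
import tensorflow as tf  # TF 1.x
import modeling  # modeling.py from the ALBERT repository, with the added scope

# Placeholder paths; substitute the real config and checkpoint locations.
albert_config = modeling.AlbertConfig.from_json_file("albert_config.json")

input_ids = tf.placeholder(tf.int32, [None, None])
input_mask = tf.placeholder(tf.int32, [None, None])
segment_ids = tf.placeholder(tf.int32, [None, None])

model = modeling.AlbertModel(
    config=albert_config,
    is_training=False,
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids)

saver = tf.train.Saver()
with tf.Session() as sess:
    # Works because the model's variable names now start with "module/",
    # matching the names stored in the HUB-derived checkpoint.
    saver.restore(sess, "/tmp/albert_xlarge_v2/model.ckpt")
```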
Hi @LysandreJik, [...]
Great to hear, please let me know if you manage to convert the v2 models/reproduce the results!
Hi @LysandreJik, I have run your script ([...]). I thought I saw you on another post, but I was mistaken. Thank you,
Hi, this issue is related to ALBERT, and especially the V2 models, specifically the `xlarge` version 2.

TL;DR: the ALBERT-xlarge V2 model seems to behave differently from the other V1/V2 models.

The models are accessible through the HUB; in order to inspect them, I save the checkpoints, which I then load into the `modeling.AlbertModel` available in the `modeling.py` file. I use this script to save the checkpoint to a file.
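The linked script is not preserved in this thread; as a rough idea, the export step can look like the sketch below (assuming TF 1.x, `tensorflow_hub`, and the standard ALBERT `tokens` signature; the HUB handle and output path are placeholders):

```python
import tensorflow as tf  # TF 1.x
import tensorflow_hub as hub

# Placeholder handle; the ALBERT modules live under tfhub.dev/google/.
module = hub.Module("https://tfhub.dev/google/albert_xlarge/2")

input_ids = tf.placeholder(tf.int32, [None, None], name="input_ids")
input_mask = tf.placeholder(tf.int32, [None, None], name="input_mask")
segment_ids = tf.placeholder(tf.int32, [None, None], name="segment_ids")

# The ALBERT HUB modules expose a "tokens" signature that returns a dict
# containing "pooled_output" and "sequence_output".
outputs = module(
    inputs=dict(input_ids=input_ids, input_mask=input_mask,
                segment_ids=segment_ids),
    signature="tokens",
    as_dict=True)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    # All module variables are created under the "module/" scope, which is
    # why the same scope has to be added to modeling.py before restoring.
    tf.train.Saver().save(sess, "/tmp/albert_xlarge_v2/model.ckpt")
```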
In a different script, I load the checkpoint into a model from `modeling.py` (a line has to be added so that the modeling scope begins with `module`, the same as the HUB module). I load the checkpoint in this script. In that same script I also load a HUB module, and I compare the outputs of both models given the same input values.

For every model, I check that the difference is near zero by taking the maximum absolute difference between tensor values (the check is included at the bottom of the second script; a sketch of it follows the results below). Here are the results:
| Model | Pooled output (max diff) | Full transformer output (max diff) |
| --- | --- | --- |
| ALBERT-BASE-V1 | 8.009374e-06 | 2.3543835e-06 |
| ALBERT-LARGE-V1 | 2.5719404e-05 | 1.8417835e-05 |
| ALBERT-XLARGE-V1 | 0.0006218478 | 0.0 |
| ALBERT-XXLARGE-V1 | 0.0 | 1.0311604e-05 |
| ALBERT-BASE-V2 | 2.3335218e-05 | 4.9591064e-05 |
| ALBERT-LARGE-V2 | 0.00015488267 | 0.00010347366 |
| **ALBERT-XLARGE-V2** | **1.9535216** | **5.152705** |
| ALBERT-XXLARGE-V2 | 1.7762184e-05 | 2.592802e-06 |
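The comparison itself boils down to a max-absolute-difference check along these lines (a minimal sketch, not the exact script; the random arrays stand in for the pooled/sequence outputs fetched from the two graphs so the snippet runs standalone):

```python
import numpy as np

def max_abs_diff(a, b):
    """Maximum element-wise absolute difference between two outputs."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))

# In the real scripts these are the pooled / sequence outputs produced by
# the HUB module and by modeling.AlbertModel on identical inputs.
hub_pooled = np.random.rand(2, 2048).astype(np.float32)  # placeholder data
ckpt_pooled = hub_pooled.copy()                          # placeholder data

print("pooled max difference:", max_abs_diff(hub_pooled, ckpt_pooled))
```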
Is there an issue with this model in particular, or does it have an architectural change that sets it apart from the others?

I have had no problems replicating the SQuAD results with all of the V1 models, but I could not do so with the V2 models apart from the base one. Is this related? Thank you for your time.