This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Deserialization problem with gluon ValueError: There are multiple outputs with name ... #12795

Closed
vafl opened this issue Oct 11, 2018 · 10 comments

Comments

@vafl
Contributor

vafl commented Oct 11, 2018

Description

For a simple HybridBlock, saving and then deserializing the symbol fails with MXNet 1.3 when an embedding layer is used multiple times. This used to work with MXNet 1.2.

It may or may not be related to this issue: #12783

Environment info (Required)

----------Python Info----------
Version      : 3.6.3
Compiler     : GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)
Build        : ('default', 'Mar 20 2018 21:25:13')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.1
Directory    : /Users/.../.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.0
Directory    : /Users/.../.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet
Commit Hash   : b3be92f4a48bce62a5a8424271871c2f81c8f7f1
----------System Info----------
Platform     : Darwin-16.7.0-x86_64-i386-64bit
system       : Darwin
node         : ...
release      : 16.7.0
version      : Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0440 sec, LOAD: 0.9637 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0922 sec, LOAD: 1.0902 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0733 sec, LOAD: 0.8020 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0545 sec, LOAD: 0.6035 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0400 sec, LOAD: 1.1772 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0443 sec, LOAD: 0.2266 sec.

Package used (Python/R/Scala/Julia):
I'm using Python.

Error Message:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-628e9346e9c4> in <module>()
     21 test_op.export('/tmp/bla')
     22 
---> 23 mx.gluon.SymbolBlock.imports('/tmp/bla-symbol.json', param_file='/tmp/bla-0000.params', input_names=['data0', 'data1'])

~/.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in imports(symbol_file, input_names, param_file, ctx)
   1021             input_names = [input_names]
   1022         inputs = [symbol.var(i) for i in input_names]
-> 1023         ret = SymbolBlock(sym, inputs)
   1024         if param_file is not None:
   1025             ret.collect_params().load(param_file, ctx=ctx)

~/.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in __init__(self, outputs, inputs, params)
   1049         row_sparse_storage = ndarray.ndarray._STORAGE_TYPE_STR_TO_ID['row_sparse']
   1050         for i in out:
-> 1051             for j in i.get_internals():
   1052                 assert(j.attr("__storage_type__") != str(row_sparse_storage)), \
   1053                     "SymbolBlock doesn't support Parameter '%s' because its storage " \

~/.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/symbol/symbol.py in <genexpr>(.0)
     91         <Symbol _plus0>
     92         """
---> 93         return (self[i] for i in self.list_outputs())
     94 
     95     def __add__(self, other):

~/.pyenv/versions/3.6.3/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/symbol/symbol.py in __getitem__(self, index)
    515                 if name == index:
    516                     if idx is not None:
--> 517                         raise ValueError('There are multiple outputs with name \"%s\"' % index)
    518                     idx = i
    519             if idx is None:

ValueError: There are multiple outputs with name "testop1_embedding0_fwd_output"
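
The last frame of the traceback shows a name-based lookup that refuses to pick between two outputs with the same name. The following is a minimal pure-Python sketch of that behaviour, assuming a flat list of output names. `find_output_index` and the `names` list are hypothetical illustrations, not MXNet's actual implementation or real export output.

```python
def find_output_index(output_names, index):
    """Return the position of `index` in `output_names`, rejecting duplicates.

    Simplified stand-in for the lookup seen in the traceback; this is a
    sketch, not the code from mxnet/symbol/symbol.py.
    """
    idx = None
    for i, name in enumerate(output_names):
        if name == index:
            if idx is not None:
                # A second match means the name is ambiguous.
                raise ValueError('There are multiple outputs with name "%s"' % index)
            idx = i
    if idx is None:
        raise ValueError('Cannot find output that matches name "%s"' % index)
    return idx


# Hypothetical output list for a graph where one embedding block is applied twice.
names = [
    "data0",
    "testop1_embedding0_weight",
    "testop1_embedding0_fwd_output",
    "data1",
    "testop1_embedding0_fwd_output",
]

print(find_output_index(names, "data1"))  # unique name: lookup succeeds
try:
    find_output_index(names, "testop1_embedding0_fwd_output")
except ValueError as err:
    print(err)  # ambiguous name: lookup is rejected
```

Any code that walks every output by name, as the generator in the traceback does, would hit the ambiguous entry even if the caller never asks for it directly.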

Minimum reproducible example

import mxnet as mx
from mxnet import gluon

class TestOp(gluon.HybridBlock):
    def __init__(self, n_in, n_out):
        super().__init__()
        with self.name_scope():
            self.embed = mx.gluon.nn.Embedding(n_in, n_out)
        
    def hybrid_forward(self, F, x, y):
        a = self.embed(x)
        b = self.embed(y)
        return a + b
    
test_op = TestOp(n_in=5, n_out=2)
test_op.initialize()
test_op.hybridize()

test_op(mx.nd.array([0,1,2]), mx.nd.array([1,2,3]))

test_op.export('/tmp/bla')

gluon.SymbolBlock.imports(
    '/tmp/bla-symbol.json',
    param_file='/tmp/bla-0000.params', 
    input_names=['data0', 'data1'])

Steps to reproduce

Run the code.

@marcoabreu
Contributor

marcoabreu commented Oct 11, 2018

@srochel @lupesko

@piyushghai
Contributor

@mxnet-label-bot [Bug, Gluon]

@piyushghai
Contributor

This seems similar to #12783

@lostella
Contributor

Looks like the problem occurs with Dense as well, so the issue probably lies in the +:

import mxnet as mx

class MyBlock(mx.gluon.HybridBlock):
    def __init__(self):
        super().__init__()
        with self.name_scope():
            self.model = mx.gluon.nn.Dense(units=5)

    def hybrid_forward(self, F, x, y):
        return self.model(x) + self.model(y)

block = MyBlock()
block.initialize()
block.hybridize()

output = block(mx.nd.random_normal(shape=(100,)), mx.nd.random_normal(shape=(100,)))

block.export(path="./model", epoch=0)
symbol = mx.gluon.SymbolBlock.imports(
    symbol_file="./model-symbol.json",
    input_names=["data0", "data1"],
    param_file="./model-0000.params",
    ctx=mx.Context.default_ctx
)

gives

Traceback (most recent call last):
  File "2018-10-11-very-weird-issue.py", line 23, in <module>
    ctx=mx.Context.default_ctx
  File "[...]/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1023, in imports
    ret = SymbolBlock(sym, inputs)
  File "[...]/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1051, in __init__
    for j in i.get_internals():
  File "[...]/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 93, in <genexpr>
    return (self[i] for i in self.list_outputs())
  File "[...]/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 517, in __getitem__
    raise ValueError('There are multiple outputs with name \"%s\"' % index)
ValueError: There are multiple outputs with name "myblock0_dense0_fwd_output"

@sandeep-krishnamurthy
Contributor

I observed that the issue occurs when 2 inputs pass through the same block. I am trying to understand the root cause, since this works fine with MXNet 1.2.1.

@sandeep-krishnamurthy
Contributor

sandeep-krishnamurthy commented Oct 11, 2018

https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L1050
This check was added after 1.2.1. It iterates over the output symbol's get_internals(), and that iteration fails as soon as it finds duplicate names, which exposes the duplicate-name issue from 1.3 onwards.
Duplicate output names have always been produced when 2 inputs pass through the same block; I am not sure why this has not caused issues for our users before.

Trying to find the root cause and a solution with the help of @safrooze and @zhreshold.
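
To check whether an exported model is affected without loading it through SymbolBlock, one option is to scan the exported JSON for operator nodes that share a name. This is a rough sketch, assuming the `-symbol.json` file has a top-level "nodes" list whose entries carry "op" and "name" keys, with "null" marking inputs and parameters. `duplicate_op_names` is a hypothetical helper and the `example` graph is hand-written, not real export output.

```python
import json
from collections import Counter


def duplicate_op_names(symbol_json):
    """Return names used by more than one operator node in an exported symbol.

    Assumes a top-level "nodes" list with "op" and "name" keys per node,
    where op == "null" marks inputs and parameters (which are skipped).
    """
    graph = json.loads(symbol_json)
    counts = Counter(
        node["name"] for node in graph["nodes"] if node["op"] != "null"
    )
    return sorted(name for name, n in counts.items() if n > 1)


# Hand-written stand-in for a "-symbol.json" file; not real export output.
example = json.dumps({
    "nodes": [
        {"op": "null", "name": "data0", "inputs": []},
        {"op": "null", "name": "myblock0_dense0_weight", "inputs": []},
        {"op": "null", "name": "myblock0_dense0_bias", "inputs": []},
        {"op": "FullyConnected", "name": "myblock0_dense0_fwd",
         "inputs": [[0, 0, 0], [1, 0, 0], [2, 0, 0]]},
        {"op": "null", "name": "data1", "inputs": []},
        {"op": "FullyConnected", "name": "myblock0_dense0_fwd",
         "inputs": [[4, 0, 0], [1, 0, 0], [2, 0, 0]]},
        {"op": "elemwise_add", "name": "myblock0__plus0",
         "inputs": [[3, 0, 0], [5, 0, 0]]},
    ]
})

print(duplicate_op_names(example))  # ['myblock0_dense0_fwd']
```

For a real export, the file contents can be passed in the same way, for example by reading "./model-symbol.json" into a string first.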

@szha

@safrooze
Contributor

@lostella while the root cause is being investigated, one workaround in this case is to use different blocks with shared parameters:

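import mxnet as mx

class MyBlock(mx.gluon.HybridBlock):
    def __init__(self):
        super().__init__()
        with self.name_scope():
            self.model0 = mx.gluon.nn.Dense(units=5)
            # Second block reuses model0's parameters but gets its own name prefix.
            self.model1 = mx.gluon.nn.Dense(units=5, params=self.model0.collect_params())

    def hybrid_forward(self, F, x, y):
        # Each input goes through a differently named block, so output names differ.
        return self.model0(x) + self.model1(y)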
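
Why this avoids the collision can be sketched without MXNet. The toy model below assumes that each call to a block records an operator named "<prefix>fwd", which matches names such as "myblock0_dense0_fwd_output" in the error above. `NamedBlock`, `has_duplicates`, and the "myblock0_dense1_" prefix for the second block are illustrative assumptions, not Gluon APIs.

```python
class NamedBlock:
    """Toy stand-in for a Gluon block: it only tracks a prefix and op names."""

    def __init__(self, prefix):
        self.prefix = prefix

    def __call__(self, graph_ops):
        # Every call records an op named "<prefix>fwd" (assumed naming scheme).
        graph_ops.append(self.prefix + "fwd")


def has_duplicates(names):
    """Return True if any name appears more than once."""
    return len(names) != len(set(names))


# Original pattern: one block applied to both inputs.
shared_block_ops = []
dense0 = NamedBlock("myblock0_dense0_")
dense0(shared_block_ops)
dense0(shared_block_ops)

# Workaround pattern: two blocks (which would share parameters in Gluon).
two_block_ops = []
NamedBlock("myblock0_dense0_")(two_block_ops)
NamedBlock("myblock0_dense1_")(two_block_ops)

print(has_duplicates(shared_block_ops))  # True
print(has_duplicates(two_block_ops))     # False
```

The weights are still shared in the workaround; only the operator names differ, which is what the name-based lookup cares about.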

@lichun-wang

I use MXNet 1.3.0 and ran into this problem as well. In my code it was caused by the call 'sym.get_internals()'; after I deleted it, the code runs.

@samskalicky
Contributor

Need to check and see if issue is resolved in #14619 (comment)

@leezu
Contributor

leezu commented Aug 20, 2019

The "Minimum reproducible example" works for me on 1.5 and current master. This can probably be closed?

@leezu closed this as completed Nov 14, 2019