Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] OmegaConf.structured() extremely slow #622

Closed
Jasha10 opened this issue Mar 18, 2021 · 4 comments · Fixed by #625
Closed

[Bug] OmegaConf.structured() extremely slow #622

Jasha10 opened this issue Mar 18, 2021 · 4 comments · Fixed by #625
Assignees
Labels
bug Something isn't working
Milestone

Comments

@Jasha10
Copy link
Collaborator

Jasha10 commented Mar 18, 2021

Calling OmegaConf.structured(MyDataclass) is extremely slow with certain dataclasses.

jbss@rig1:~/mwe$ cat mwe.py
import time
from dataclasses import dataclass
from omegaconf import OmegaConf

@dataclass
class Config:
    iterp: str = "${long_string1xxx}_${long_string2xxx:${a_key}}"

t0 = time.time()
cfg = OmegaConf.structured(Config)
t1 = time.time()
print(f"Config Elapsed time = {t1-t0}")
jbss@rig1:~/mwe$ python mwe.py
Config Elapsed time = 27.60597038269043

That's 27 seconds for the call to OmegaConf.structured!
I did an ablation study to find that the length of long_string1xxx and long_string2xxx do matter; longer strings are slower.

@dataclass
class Config:
    iterp: str = "${long_string1xxx}_${long_string2xxx:${a_key}}"       # takes ~27 seconds

    # ablation:

    # iterp: str = "${short1}_${short2:${a_key}}"                            # ~0.0002s
    # iterp: str = "${long_string1}_${long_string2:${a_key}}"                # ~0.5s
    # iterp: str = "${long_string1x}_${long_string2x:${a_key}}"              # ~1.9s
    # iterp: str = "${long_string1xx}_${long_string2xx:${a_key}}"            # ~6.9s

    # iterp: str = "${long_string1xxx}_${short:${a_key}}"                    # ~0.07s
    # iterp: str = "${short}_${long_string2xxx:${a_key}}"                    # ~0.06s
    # iterp: str = "${short1}_${short2:${a_very_very_very_very_long_key}}"   # ~0.01s

    # iterp: str = "${long_string2xxx:${a_key}}"                             # ~0.01s
    # iterp: str = "_${long_string2xxx:${a_key}}"                            # ~0.01s
    # iterp: str = "${long_string1xxx}_"                                     # ~0.0003s
    # iterp: str = "${long_string1xxx}"                                      # ~0.0003s
    # iterp: str = "${long_string1xxx}_${long_string2xxx}"                   # ~0.0003s
    # iterp: str = "${long_string1xxx}_${long_string2xxx:a_key}"             # ~0.003s

System information

Using the latest omegaconf installed from master branch.

Python 3.9.1 (default, Dec 11 2020, 14:32:07)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import omegaconf
>>> omegaconf.__version__
'2.1.0dev23'
  • Operating system : Ubuntu 18.04

I was not able to reproduce this with a release version of omegaconf downloaded from pip.

@Jasha10 Jasha10 added the bug Something isn't working label Mar 18, 2021
@omry omry added this to the OmegaConf 2.1 milestone Mar 18, 2021
@omry
Copy link
Owner

omry commented Mar 18, 2021

@odelalleau , possibly related to the SIMPLE_INTERPOLATION_PATTERN.

@omry
Copy link
Owner

omry commented Mar 18, 2021

Minimal repro:

from omegaconf._utils import get_value_kind

print(
    get_value_kind(
        "${long_string1xxx}_${long_string2xxx:${a_key}}",
        strict_interpolation_validation=True,
    )
)

Instrumentation (pip install pyinstrument):

$ python -m pyinstrument 1.py 
ValueKind.INTERPOLATION

  _     ._   __/__   _ _  _  _ _/_   Recorded: 15:54:42  Samples:  65
 /_//_/// /_\ / //_// / //_'/ //     Duration: 17.890    CPU time: 17.891
/   _/                      v3.4.1

Program: 1.py

17.890 <module>  1.py:1
└─ 17.829 get_value_kind  omegaconf/_utils.py:380
   └─ 17.822 Pattern.match  <built-in>:0

To view this report with different options, run:
    pyinstrument --load-prev 2021-03-18T15-54-42 [options]

@omry
Copy link
Owner

omry commented Mar 18, 2021

Limiting the regex to smaller strings and falling back to antlr for bigger one seems like it's fixing it:

$ git diff
diff --git a/omegaconf/_utils.py b/omegaconf/_utils.py
index 5a6bcb9..e64a0fd 100644
--- a/omegaconf/_utils.py
+++ b/omegaconf/_utils.py
@@ -408,7 +408,7 @@ def get_value_kind(
     if isinstance(value, str) and "${" in value:
         if strict_interpolation_validation:
             # First try the cheap regex matching that detects common interpolations.
-            if SIMPLE_INTERPOLATION_PATTERN.match(value) is None:
+            if len(value) > 20 or SIMPLE_INTERPOLATION_PATTERN.match(value) is None:
                 # If no match, do the more expensive grammar parsing to detect errors.
                 parse(value)
         return ValueKind.INTERPOLATION
$ python -m pyinstrument 1.py 
ValueKind.INTERPOLATION

  _     ._   __/__   _ _  _  _ _/_   Recorded: 15:58:53  Samples:  62
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.068     CPU time: 0.068
/   _/                      v3.4.1

Program: 1.py

0.067 <module>  1.py:1
├─ 0.061 <module>  omegaconf/__init__.py:1
│  ├─ 0.056 <module>  omegaconf/base.py:1
│  │  ├─ 0.026 <module>  antlr4/__init__.py:1
│  │  │     [59 frames hidden]  antlr4, unittest, asyncio, ssl, base6...
│  │  │        0.019 <module>  antlr4/InputStream.py:6
│  │  │        ├─ 0.018 <module>  unittest/__init__.py:1
│  │  │        │  ├─ 0.016 <module>  unittest/async_case.py:1
│  │  │        │  │  ├─ 0.014 <module>  asyncio/__init__.py:1
│  │  │        │  │  │  ├─ 0.012 <module>  asyncio/base_events.py:1
│  │  │        │  │  │  │  ├─ 0.004 <module>  ssl.py:4
│  │  │        │  │  │  │  │  ├─ 0.002 create_dynamic  <built-in>:0
│  │  │        │  │  │  │  │  │  0.001 <module>  struct.py:3
│  │  │        │  │  │  │  │  │  └─ 0.001 stat  <built-in>:0
│  │  │        │  │  │  │  ├─ 0.002 loads  <built-in>:0
│  │  │        │  │  │  │  │  0.001 <module>  asyncio/transports.py:1
│  │  │        │  │  │  │  │  └─ 0.001 __build_class__  <built-in>:0
│  │  │        │  │  │  │  ├─ 0.001 <module>  asyncio/events.py:1
│  │  │        │  │  │  │  │  └─ 0.001 stat  <built-in>:0
│  │  │        │  │  │  │  ├─ 0.001 <module>  asyncio/coroutines.py:1
│  │  │        │  │  │  │  │  └─ 0.001 str.rpartition  <built-in>:0
│  │  │        │  │  │  │     0.001 <lambda>  socket.py:86
│  │  │        │  │  │  │     └─ 0.001 str.startswith  <built-in>:0
│  │  │        │  │  │  └─ 0.002 loads  <built-in>:0
│  │  │        │  │  ├─ 0.001 loads  <built-in>:0
│  │  │        │  │  └─ 0.001 <module>  unittest/case.py:1
│  │  │        │  │     └─ 0.001 __build_class__  <built-in>:0
│  │  │        └─ 0.001 open_code  <built-in>:0
│  │  ├─ 0.026 <module>  omegaconf/_utils.py:1
│  │  │  ├─ 0.011 <module>  yaml/__init__.py:2
│  │  │  │     [44 frames hidden]  yaml, re, sre_compile, sre_parse
│  │  │  │        0.002 <module>  yaml/constructor.py:3
│  │  │  │        ├─ 0.001 loads  <built-in>:0
│  │  │  │        0.001 stat  <built-in>:0
│  │  │  │        0.001 <module>  yaml/dumper.py:2
│  │  │  │        └─ 0.001 hasattr  <built-in>:0
│  │  │  ├─ 0.008 <module>  omegaconf/grammar_parser.py:1
│  │  │  │  └─ 0.008 <module>  omegaconf/grammar_visitor.py:1
│  │  │  │     ├─ 0.004 <module>  omegaconf/grammar/gen/OmegaConfGrammarLexer.py:2
│  │  │  │     │  └─ 0.004 OmegaConfGrammarLexer  omegaconf/grammar/gen/OmegaConfGrammarLexer.py:179
│  │  │  │     │     └─ 0.004 deserialize  antlr4/atn/ATNDeserializer.py:59
│  │  │  │     │           [15 frames hidden]  antlr4
│  │  │  │     ├─ 0.003 <module>  omegaconf/grammar/gen/OmegaConfGrammarParser.py:3
│  │  │  │     │  └─ 0.003 OmegaConfGrammarParser  omegaconf/grammar/gen/OmegaConfGrammarParser.py:90
│  │  │  │     │     └─ 0.003 deserialize  antlr4/atn/ATNDeserializer.py:59
│  │  │  │     │           [12 frames hidden]  antlr4
│  │  │  │     └─ 0.001 GrammarVisitor  omegaconf/grammar_visitor.py:38
│  │  │  │        └─ 0.001 __getitem__  typing.py:802
│  │  │  │              [3 frames hidden]  typing
│  │  │  │                 0.001 __getitem_inner__  typing.py:818
│  │  │  │                 └─ 0.001 len  <built-in>:0
│  │  │  ├─ 0.005 <module>  attr/__init__.py:1
│  │  │  │     [7 frames hidden]  attr
│  │  │  │        0.002 _make_init  attr/_make.py:1829
│  │  │  │        └─ 0.002 compile  <built-in>:0
│  │  │  │        0.001 <module>  attr/_make.py:1
│  │  │  │        └─ 0.001 __build_class__  <built-in>:0
│  │  │  │        0.001 getattr  <built-in>:0
│  │  │  └─ 0.001 <module>  omegaconf/errors.py:1
│  │  │     └─ 0.001 __build_class__  <built-in>:0
│  │  ├─ 0.001 dataclass  dataclasses.py:996
│  │  │     [6 frames hidden]  dataclasses
│  │  ├─ 0.001 <module>  typing.py:1
│  │  │     [5 frames hidden]  typing
│  │  └─ 0.001 <module>  dataclasses.py:1
│  │        [7 frames hidden]  dataclasses, re, sre_compile
│  ├─ 0.003 <module>  omegaconf/omegaconf.py:1
│  │  ├─ 0.001 loads  <built-in>:0
│  │  ├─ 0.001 <module>  pathlib.py:1
│  │  │     [4 frames hidden]  pathlib, urllib, collections
│  │  └─ 0.001 inner  typing.py:255
│  │        [4 frames hidden]  typing
│  ├─ 0.001 loads  <built-in>:0
│  └─ 0.001 <module>  omegaconf/dictconfig.py:1
│     └─ 0.001 DictConfig  omegaconf/dictconfig.py:54
│        └─ 0.001 inner  typing.py:255
│              [3 frames hidden]  typing
└─ 0.006 get_value_kind  omegaconf/_utils.py:380
   └─ 0.006 parse  omegaconf/grammar_parser.py:89
      └─ 0.006 configValue  omegaconf/grammar/gen/OmegaConfGrammarParser.py:210
         ├─ 0.005 interpolation  omegaconf/grammar/gen/OmegaConfGrammarParser.py:823
         │  ├─ 0.003 adaptivePredict  antlr4/atn/ParserATNSimulator.py:290
         │  │     [36 frames hidden]  antlr4
         │  │        0.001 closure  antlr4/atn/LexerATNSimulator.py:317
         │  │        └─ 0.001 isinstance  <built-in>:0
         │  └─ 0.002 interpolationResolver  omegaconf/grammar/gen/OmegaConfGrammarParser.py:991
         │     ├─ 0.001 sequence  omegaconf/grammar/gen/OmegaConfGrammarParser.py:722
         │     │  └─ 0.001 element  omegaconf/grammar/gen/OmegaConfGrammarParser.py:441
         │     │     └─ 0.001 primitive  omegaconf/grammar/gen/OmegaConfGrammarParser.py:1283
         │     │        └─ 0.001 interpolation  omegaconf/grammar/gen/OmegaConfGrammarParser.py:823
         │     │           └─ 0.001 adaptivePredict  antlr4/atn/ParserATNSimulator.py:290
         │     │                 [20 frames hidden]  antlr4
         │     └─ 0.001 match  antlr4/Parser.py:112
         │           [16 frames hidden]  antlr4
         └─ 0.001 adaptivePredict  antlr4/atn/ParserATNSimulator.py:290
               [3 frames hidden]  antlr4

To view this report with different options, run:
    pyinstrument --load-prev 2021-03-18T15-58-53 [options]

@odelalleau
Copy link
Collaborator

Thanks for the report! This bug was introduced with #600 (bad regex design). Fix on its way!

odelalleau added a commit to odelalleau/omegaconf that referenced this issue Mar 19, 2021
odelalleau added a commit to odelalleau/omegaconf that referenced this issue Mar 19, 2021
It is more "interesting" (i.e., much slower before the fix)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants