Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unstable parser tests #29

Closed
erikogan opened this issue Aug 24, 2016 · 9 comments
Closed

Unstable parser tests #29

erikogan opened this issue Aug 24, 2016 · 9 comments
Labels

Comments

@erikogan
Copy link
Contributor

In my environment, the parser tests (specifically for parse_yaml & parse_json) are unstable. They fail most of the time, but on occasion will pass.

I thought this might be due to the deliberate randomization of hash keys in 5.18, and it seems (anecdotally) that setting PERL_PERTURB_KEYS=0 does appear to help some of the tests, it does not guarantee a successful run.

Am I right in assuming these tests are consistent for you, and there is something else in my environment causing this issue?

@erikogan
Copy link
Contributor Author

Output from a failed test:

% perl t/engine.t parse_json/00
# Subtest: Test config: /Users/erik/work/serge/t/data/engine/parse_json/00/translate.serge
    ok 1 - Config file read
WARNING: 'email_from' is not defined. Will skip sending any reports.
WARNING: 'email_to' is not defined. Will skip sending any reports.


*** [test_job] "Test Job" ***

Initializing the database...
Preloading properties...
*** OPTIMIZATIONS DISABLED: job definition has changed (or job is ran for the first time) ***
Source dir: [/Users/erik/work/serge/t/data/engine/parse_json/00/resources]
DB source: [DBI:SQLite:dbname=:memory:]
TS file path: [/Users/erik/work/serge/t/data/engine/parse_json/00/test-output/po/%LOCALE%/%FILE%.po]
Output path: [/Users/erik/work/serge/t/data/engine/parse_json/00/test-output/localized-resources/%LOCALE%/%FILE%]
Destination languages: [test]
Modified languages: [test]
Preloading cache for job 'test_job' in namespace 'test_namespace'...

Updating database from source files...

Scanning directory structure...
Scanned in 0.000226 sec, 2 files match the criteria
  relaxed.json
  strings.json
2 files are new, 0 were orphaned and 0 are no longer orphaned since last run
update_database_from_source_files() took 0.003282 seconds
Updating database from translation files for [test] language...
update_database_from_ts_files() took 0.000207 seconds

Generating translation files...

  relaxed.json (forced mode)
    [1/test] ţēšţ 1
    [2/test] ţēšţ 2
    [3/test] ţēšţ 3
    Saving /Users/erik/work/serge/t/data/engine/parse_json/00/test-output/po/test/relaxed.json.po (479 bytes)
  strings.json (forced mode)
    [4/test] ţēšţ 2
    [5/test] ţēšţ 1
    Saving /Users/erik/work/serge/t/data/engine/parse_json/00/test-output/po/test/strings.json.po (372 bytes)
generate_ts_files() took 0.016646 seconds

Generating localized files...

  relaxed.json (forced mode)
    [1/test] ţēšţ 1
    [2/test] ţēšţ 2
    [3/test] ţēšţ 3
    Saving /Users/erik/work/serge/t/data/engine/parse_json/00/test-output/localized-resources/test/relaxed.json, because forced mode is on (123 bytes)
  strings.json (forced mode)
    [4/test] ţēšţ 2
    [5/test] ţēšţ 1
    Saving /Users/erik/work/serge/t/data/engine/parse_json/00/test-output/localized-resources/test/strings.json, because forced mode is on (270 bytes)

generate_localized_files() took 0.017815 seconds
*** [end] ***
    ok 2 - Processing all jobs in config file
TEST: dumping the database to /Users/erik/work/serge/t/data/engine/parse_json/00/test-output/database...
    # Subtest: Compare directory 'test-output/database' with 'reference-output/database'
        ok 1 - Files 'test-output/database/files' and 'reference-output/database/files' should be equal
        not ok 2 - Files 'test-output/database/items' and 'reference-output/database/items' should be equal
        #   Failed test 'Files 'test-output/database/items' and 'reference-output/database/items' should be equal'
        #   at /Users/erik/work/serge/t/lib/Test/Diff.pm line 43.
        not ok 3 - Files 'test-output/database/properties' and 'reference-output/database/properties' should be equal
        #   Failed test 'Files 'test-output/database/properties' and 'reference-output/database/properties' should be equal'
        #   at /Users/erik/work/serge/t/lib/Test/Diff.pm line 43.
        ok 4 - Files 'test-output/database/strings' and 'reference-output/database/strings' should be equal
        not ok 5 - Files 'test-output/database/translations' and 'reference-output/database/translations' should be equal
        #   Failed test 'Files 'test-output/database/translations' and 'reference-output/database/translations' should be equal'
        #   at /Users/erik/work/serge/t/lib/Test/Diff.pm line 43.
        1..5
        # Looks like you failed 3 tests of 5.
    not ok 3 - Compare directory 'test-output/database' with 'reference-output/database'
    #   Failed test 'Compare directory 'test-output/database' with 'reference-output/database''
    #   at /Users/erik/work/serge/t/lib/Test/Diff.pm line 62.
    # Subtest: Compare directory 'test-output/po' with 'reference-output/po'
        ok 1 - Files 'test-output/po/test/relaxed.json.po' and 'reference-output/po/test/relaxed.json.po' should be equal
        not ok 2 - Files 'test-output/po/test/strings.json.po' and 'reference-output/po/test/strings.json.po' should be equal
        #   Failed test 'Files 'test-output/po/test/strings.json.po' and 'reference-output/po/test/strings.json.po' should be equal'
        #   at /Users/erik/work/serge/t/lib/Test/Diff.pm line 43.
        1..2
        # Looks like you failed 1 test of 2.
    not ok 4 - Compare directory 'test-output/po' with 'reference-output/po'
    #   Failed test 'Compare directory 'test-output/po' with 'reference-output/po''
    #   at /Users/erik/work/serge/t/lib/Test/Diff.pm line 62.
    1..4
    # Looks like you failed 2 tests of 4.
not ok 1 - Test config: /Users/erik/work/serge/t/data/engine/parse_json/00/translate.serge
#   Failed test 'Test config: /Users/erik/work/serge/t/data/engine/parse_json/00/translate.serge'
#   at t/engine.t line 132.
1..1

@erikogan
Copy link
Contributor Author

Oh, and:

% perl -V
Summary of my perl5 (revision 5 version 18 subversion 2) configuration:

  Platform:
    osname=darwin, osvers=15.0, archname=darwin-thread-multi-2level
    uname='darwin osx219.apple.com 15.0 darwin kernel version 15.0.0: fri may 22 22:03:51 pdt 2015; root:xnu-3216.0.0.1.11~1development_x86_64 x86_64 '
    config_args='-ds -e -Dprefix=/usr -Dccflags=-g  -pipe  -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=cc'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-arch i386 -arch x86_64 -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -fstack-protector',
    optimize='-Os',
    cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -fstack-protector'
    ccversion='', gccversion='4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc -mmacosx-version-min=10.11.3', ldflags ='-arch i386 -arch x86_64 -fstack-protector'
    libpth=/usr/lib /usr/local/lib
    libs=
    perllibs=
    libc=, so=dylib, useshrplib=true, libperl=libperl.dylib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-arch i386 -arch x86_64 -bundle -undefined dynamic_lookup -fstack-protector'


Characteristics of this binary (from libperl):
  Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
                        PERL_DONT_CREATE_GVSV
                        PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
                        PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
                        PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_ALL
                        USE_64_BIT_INT USE_ITHREADS USE_LARGE_FILES
                        USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE
                        USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
                        USE_REENTRANT_API
  Locally applied patches:
    /Library/Perl/Updates/<version> comes before system perl directories
    installprivlib and installarchlib points to the Updates directory
  Built under darwin
  Compiled at Aug 11 2015 04:22:26
  @INC:
    /Library/Perl/5.18/darwin-thread-multi-2level
    /Library/Perl/5.18
    /Network/Library/Perl/5.18/darwin-thread-multi-2level
    /Network/Library/Perl/5.18
    /Library/Perl/Updates/5.18.2/darwin-thread-multi-2level
    /Library/Perl/Updates/5.18.2
    /System/Library/Perl/5.18/darwin-thread-multi-2level
    /System/Library/Perl/5.18
    /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level
    /System/Library/Perl/Extras/5.18
    .

@iafan
Copy link
Contributor

iafan commented Aug 24, 2016

I admit I sit on Perl 5.16, and yes, tests run fine for me (no differences between runs).

Try running perl engine.t --init — it will create "reference" data snapshots of these tests, and then you can use git diff to see what are the differences between the committed reference versions of the data files, which might give a better idea of the cause.

I'll also upgrade to the newer version of Perl and see if I can reproduce this.

@iafan iafan added the bug label Aug 24, 2016
@erikogan
Copy link
Contributor Author

erikogan commented Aug 24, 2016

I ran PERL_PERTURB_KEYS=0 perl t/engine.t --init and committed the results here.

The differences appear to mostly be in the ordering of items in the DB, however some of the changes appear to be ids (or other checksum values) which I would expect to be stable (but I have not looked at the inputs to generate the checksum).

After running this the tests continue to fail when run with PERL_PERTURB_KEYS=0 perl t/engine.t, but the number of failures is not stable, it’s either 2, 4, or 6. So I think there is some other issues that are not fixed by setting that particular environment variable. (However, I still strongly suspect the new hashing algorithm in 5.18 is the most likely cause)

@erikogan
Copy link
Contributor Author

Oh, and I just ran PERL_PERTURB_KEYS=0 perl t/engine.t --init again on the branch referenced above and there are differences between what I checked in and what was just output. So yes, the data is no longer stable in 5.18.

@erikogan
Copy link
Contributor Author

It just occurred to me that it’s probably pretty easy for me to spin up a Docker container with a pre 5.18 version of Perl and see if I can get stability on the original tests and the ones I hope to add.

@erikogan
Copy link
Contributor Author

erikogan commented Aug 24, 2016

Ok, I just confirmed that running the tests in a container with 5.16.3 causes the tests to be stable. I will submit my PR with data it generated (once I fix the bug I discovered now that I can trust the tests I did not write)

But this further implicates the changes in 5.18, and the most likely culprit is the hashing algorithm overhaul.

@iafan
Copy link
Contributor

iafan commented Aug 25, 2016

Just a quick update: this is reproducible in Perl 5.22. This is a quick test:

Go to <serge_root>/t/data/engine/parse_json/00/resources folder and run:

$ serge test-parser parse_json strings.json

Every time you run it, the dumped data structure (the order of the strings if which they are reported by the parser callback) will be different.

@iafan
Copy link
Contributor

iafan commented Aug 25, 2016

I looked at the JSON/YAML parser documentation and the only easy way to solve this is to sort object keys at each level; this is guaranteed to stabilize the output. Unfortunately, there's no way to tell the parsers to use e.g. Tie::IxHash as a replacement class to preserve the order.

A more involving way to solve this would be to find (or write) streaming callback-based parsers that will report the structure as it is parsed, allowing us ultimately to preserve the original order.

Given the fact that both JSON and YAML specs state that the order of the keys in the object/dictionary is not important and is not guaranteed, I think it's safe to go with the first (easy) option.

@iafan iafan closed this as completed in a6c4d3f Aug 25, 2016
iafan pushed a commit that referenced this issue Aug 25, 2016
Also: minor wording/formatting changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants