differing results between runs #8

Open
travisstaloch opened this issue Aug 15, 2022 · 5 comments

@travisstaloch

First of all, thank you for sharing this nice implementation. I've been experimenting with porting it to Zig in hopes of using it in Zig's universal headers project. The aim of that project is to compress the distributed C headers, which is how Quine-McCluskey has come up.

I've been randomly generating ones and checking the results of q.simplify(random_ones) against my port. While doing so, I think I've found some kind of problem with the results. The following script shows either 28 or 29 for the length of the simplified implicants. When run in a loop 100 times (sketched after the output below), the length is the same all 100 times but varies between runs: once again, either 100 results of 28 or 100 results of 29.

# tmp.py
import qm
q = qm.QuineMcCluskey()
simple = q.simplify([1,2,3,5,6,7,8,9,10,11,12,13,14,15,142,399,20,21,22,279,24,25,23,29,31,38,39,44,45,46,179,51,53,54,185,60,62,196,211,87,89,227,101,235,238,495,239,242,243,244,118,120,508,125])
print(len(simple))
$ python --version
Python 3.9.2
$ python tmp.py
29
$ python tmp.py
28
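
The 100-iteration loop check mentioned above looks roughly like this (a sketch; the minterm list is the same one as in tmp.py):

# loop_check.py
import qm

ones = [1,2,3,5,6,7,8,9,10,11,12,13,14,15,142,399,20,21,22,279,24,25,23,29,31,38,39,44,45,46,179,51,53,54,185,60,62,196,211,87,89,227,101,235,238,495,239,242,243,244,118,120,508,125]

# 100 calls within a single interpreter run always agree on the length;
# the value only changes when the script is started again
lengths = {len(qm.QuineMcCluskey().simplify(ones)) for _ in range(100)}
print(lengths)  # e.g. {28} in one run, {29} in another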

So I am not sure what to make of this. I'm not sure if it's a bug or just an inherent limitation. I've been doing some print debugging and noticed that the length of essential_implicants seems to change between 33 and 34. This is as far as I've looked so far, and I thought I would ask for advice before digging any further. Any idea why this might be happening?

@travisstaloch
Author

Here is a smaller example with fewer ones that also shows different results between runs, alternating between 5 and 6 (usually 6).

# tmp.py
import qm
q = qm.QuineMcCluskey()
simple = q.simplify([1,2,5,6,7,8,9,10,11,12,13,14,17])
print(len(simple),  simple)
$ python tmp.py
6 {'01-0-', '0011-', '0--10', '-0001', '010--', '0--01'}

$ python tmp.py
5 {'001-1', '-0001', '01-0-', '010--', '0--10'}
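
For what it's worth, expanding the implicants back into minterms (assuming the leftmost character is the most significant bit and '-' is a don't-care) shows that both results cover exactly the input set, so the two answers differ only in size, not in correctness. A quick check:

# cover_check.py (a throwaway helper, not part of the qm package)
from itertools import product

ones = {1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17}

def expand(implicant):
    # all minterms matched by a pattern such as '01-0-'
    choices = [('0', '1') if c == '-' else (c,) for c in implicant]
    return {int(''.join(bits), 2) for bits in product(*choices)}

run_a = {'001-1', '-0001', '01-0-', '010--', '0--10'}           # 5-term result
run_b = {'01-0-', '0011-', '0--10', '-0001', '010--', '0--01'}  # 6-term result

for result in (run_a, run_b):
    covered = set().union(*(expand(imp) for imp in result))
    print(len(result), covered == ones)  # prints "5 True" and "6 True"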

@maehw

maehw commented Sep 5, 2022

Hi. I just wanted to add that I see similar effects and that I can reproduce them with @travisstaloch's minimal example.

quine_mccluskey % python3 ./test.py 
5 {'0--10', '01-0-', '001-1', '010--', '-0001'}
quine_mccluskey % python3 ./test.py
6 {'-0001', '0011-', '010--', '0--01', '01--0', '0--10'}

Edit:

  • __version__ = "0.3"
  • python3 --version: Python 3.10.6

@tomas789

tomas789 commented Dec 2, 2022

I'm experiencing the same behaviour with the included test suite. I created a small script which runs the tests/test.py file many times over and logs a histogram of the return codes. These are the results:

  • Python 3.7.15: {0: 46, 1: 54}
  • Python 3.8.15: {0: 44, 1: 56}
  • Python 3.9.15: {0: 57, 1: 43}
  • Python 3.10.8: {0: 46, 1: 54}
  • Python 3.11.0: {0: 46, 1: 54}
  • Python 3.9.12 [PyPy 7.3.9]: {0: 0, 1: 100}

The script itself is just

import subprocess

res = {0: 0, 1: 0}
for _ in range(100):
    ret = subprocess.run(["python", "tests/test.py"], check=False)
    res[ret.returncode] += 1

print(res)

@tomas789

tomas789 commented Dec 2, 2022

I'm doing some investigation into what the problem might be. I'm working with @travisstaloch's example.

The first thing I noticed is that the result of the call to __get_prime_implicants comes back in a different order between runs. This suggests there is some inherent non-determinism, but I'm not sure yet where it is coming from.

I then sorted the results: in the runs returning 28 and 29, the prime implicants are the same, but this is not the case for essential_implicants. This means the non-determinism is likely somewhere in the function __get_essential_implicants. Here are the differences between the two sets of essential_implicants:

>>> for x in a.union(b): print(x, x in a, x in b)
... 
001111000 True True
110001111 True True
011110100 True True
0-0001110 True True
01110-011 True False
0000-100- True True
011000100 True True
000--0110 False True
000-10101 True True
00000--1- True True
010111001 True True
-11101111 True True
000001--- True True
001100101 True True
0001-11-0 True True
111111100 True True
-00010111 True True
011-10011 True True
01-110011 True True
00-011001 True True
0000-1-01 True True
0000101-- True True
0000--1-1 True True
011101-11 True True
0000-011- True False
001111101 True True
000-0110- True True
0111-0011 False True
0001--110 True True
00000---1 True True
01111001- True True
000-0011- True True
0-0110011 True True
01110111- True True
00-110110 True True
00-010111 True True

I have also tried hard-coding a single version of prime_implicants, and the result was still non-deterministic. When I instead fixed the value of essential_implicants, the non-determinism disappeared.

After drilling down into __get_essential_implicants I discovered that the variables groups and ei_range are the same in both runs, but this is not the case for ei. Here again is a comparison of ei from both runs:

>>> sorted(a)
['-00010111', '-11101111', '0-0001110', '0-0110011', '00-010111', '00-011001', '00-110110', '000--0110', '000-0011-', '000-0110-', '000-10101', '0000--1-1', '0000-1-01', '0000-100-', '00000---1', '00000--1-', '000001---', '0000101--', '0001--110', '0001-11-0', '001100101', '001111000', '001111101', '01-110011', '010111001', '011-10011', '011000100', '01110-011', '011101-11', '01110111-', '01111001-', '011110100', '110001111', '111111100']
>>> sorted(b)
['-00010111', '-11101111', '0-0001110', '0-0110011', '00-010111', '00-011001', '00-110110', '000-0011-', '000-0110-', '000-10101', '0000--1-1', '0000-1-01', '0000-100-', '00000---1', '00000--1-', '000001---', '0000101--', '0001--110', '0001-11-0', '001100101', '001111000', '001111101', '01-110011', '010111001', '011-10011', '011000100', '0111-0011', '011101-11', '01110111-', '01111001-', '011110100', '110001111', '111111100']

I added the print statement print(f"{t: >2} {g}, {not perms[g] <= ei_range}") to the second for-loop, and it shows that the iteration order of the inner loop differs between runs. Adding sorted() to the inner loop fixed the non-determinism.

Original code: non-deterministic

for t in sorted(list(groups.keys()), reverse=True):
    for g in groups[t]:
        if not perms[g] <= ei_range:
            ei.add(g)
            ei_range |= perms[g]

Code with sort: deterministic

for t in sorted(list(groups.keys()), reverse=True):
    for g in sorted(groups[t]):
        if not perms[g] <= ei_range:
            ei.add(g)
            ei_range |= perms[g]

Code with reverse sort: deterministic

for t in sorted(list(groups.keys()), reverse=True):
    for g in sorted(groups[t], reverse=True):
        if not perms[g] <= ei_range:
            ei.add(g)
            ei_range |= perms[g]

This is not a surprise, as Python sets have no defined iteration order. I don't understand the algorithm well enough to judge which order is the correct one, but when sorting in reverse order the tests passed in 100% of the runs I did, and when sorting in ascending order they failed in 100% of runs.

With the benefit of hindsight one can guess that PyPy stores sets in a deterministic way (at least across multiple runs of the same program), while CPython uses a different per-process seed for the string hash function that backs the sets, which changes their iteration order. Both implementations appear to use hash tables under the hood for sets.
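
One way to double-check that guess (a sketch reusing the histogram script from above): pin PYTHONHASHSEED for the child processes. If the non-determinism comes from CPython's per-process hash randomization, every run should then return the same code.

# hashseed_check.py
import os
import subprocess

res = {0: 0, 1: 0}
for _ in range(100):
    # PYTHONHASHSEED=0 disables per-process hash randomization, so string
    # hashes (and therefore set iteration order) are identical in every run
    ret = subprocess.run(["python", "tests/test.py"], check=False,
                         env=dict(os.environ, PYTHONHASHSEED="0"))
    res[ret.returncode] += 1

print(res)  # expectation: all 100 runs land on the same return code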

It would be great if somebody could double-check the order is actually supposed to be like this.

@maehw

maehw commented Dec 2, 2022

Wow. Thanks @tomas789 for your in-depth investigation.

Would be great if @tpircher-zz could take a look at your investigation and the PR.
