
Intermittent failing test: testFDSite #107

Closed
mindhells opened this issue May 15, 2020 · 4 comments

Comments

@mindhells
Member

This error appears from time to time. I've been looking for the conditions to reproduce it with no luck so far.

    ______________________________ Test.testFDSite ________________________________
    self = <fract4d.tests.test_fract4d.Test testMethod=testFDSite>
        def testFDSite(self):
            xsize = 64
            ysize = int(xsize * 3.0 / 4.0)
            im = image.T(xsize, ysize)
            (rfd, wfd) = os.pipe()
            site = fract4dc.fdsite_create(wfd)
        
            file = self.compileColorMandel()
        
            for x in range(2):
                handle = fract4dc.pf_load(file)
                pfunc = fract4dc.pf_create(handle)
                fract4dc.pf_init(pfunc, pos_params, self.color_mandel_params)
                cmap = fract4dc.cmap_create(
                    [(0.0, 0, 0, 0, 255),
                     (1 / 256.0, 255, 255, 255, 255),
                     (1.0, 255, 255, 255, 255)])
        
                fract4dc.calc(
                    params=[0.0, 0.0, 0.0, 0.0,
                            4.0,
                            0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                    antialias=0,
                    maxiter=100,
                    yflip=0,
                    nthreads=1,
                    pfo=pfunc,
                    cmap=cmap,
                    auto_deepen=0,
                    periodicity=1,
                    render_type=0,
                    image=im._img,
                    site=site,
                    asynchronous=True)
        
                nrecved = 0
                while True:
                    if nrecved == x:
                        # print "hit message count"
                        fract4dc.interrupt(site)
        
                    nb = 2 * 4
                    bytes = os.read(rfd, nb)
                    if len(bytes) < nb:
                        self.fail(
                            "bad message with length %s, value %s" %
    >                       (len(bytes), bytes))
    E                   AssertionError: bad message with length 4, value b'\x03\x00\x00\x00'
    fract4d/tests/test_fract4d.py:579: AssertionError

In gtkfractal.py we have the `onData` callback, which watches the file descriptor associated with the site. The first thing this method does is read from the pipe. If it doesn't get the full message (which appears to be the problem in this test), it stores the partial data in a buffer member and waits until the next call.

In this test we call `fract4dc.interrupt(site)`, which sets a flag (`interrupted = true`) in the underlying C++ object. This doesn't stop the calculation right away. Instead, it tells the worker thread (note that the `calc` arguments include `asynchronous=True`) not to proceed to the next "iteration". So we have no guarantee that the message has been written by the time we try to read it in the very next instruction, `bytes = os.read(rfd, nb)`.
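For illustration, here is a minimal sketch of the kind of buffering described above for `onData`: partial reads are kept in a buffer and only complete messages are handed on. It assumes (hypothetically) fixed 8-byte messages made of two little-endian 32-bit integers, matching the `2 * 4` byte reads in the test and the `b'\x03\x00\x00\x00'` prefix in the failure. `MessageBuffer` and `FRAME` are illustrative names, not gnofract4d's actual code.

```python
import struct
from typing import List, Tuple

# Hypothetical frame layout: two little-endian 32-bit ints (8 bytes total),
# mirroring the `nb = 2 * 4` reads in the test above.
FRAME = struct.Struct("<ii")


class MessageBuffer:
    """Accumulates partial reads and returns only complete frames,
    in the spirit of the buffering done by the onData callback."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, data: bytes) -> List[Tuple[int, int]]:
        self._buf += data
        messages = []
        # Consume as many whole frames as are available; keep the remainder.
        while len(self._buf) >= FRAME.size:
            messages.append(FRAME.unpack_from(self._buf))
            self._buf = self._buf[FRAME.size:]
        return messages


buf = MessageBuffer()
print(buf.feed(b"\x03\x00\x00\x00"))  # [] : only half a frame so far
print(buf.feed(b"\x07\x00\x00\x00"))  # [(3, 7)]
```

With this approach a 4-byte short read like the one in the traceback is just an incomplete frame to be completed later, whereas the test treats it as an immediate failure.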

@dragonmux
Member

dragonmux commented May 16, 2020

Most likely this is caused by some delay in the kernel, combined with the fact that the kernel is allowed to return from read() early with fewer bytes than requested. It might be worth seeing if this can be made to guarantee a read of the right length more safely, since read() only returns an empty buffer at EOS/EOF.

It should be telling if the problem goes away with a short (10us-ish) sleep between the interrupt call and calling os.read().
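Following the suggestion above to guarantee a read of the right length, one option is to keep calling `os.read()` until all `nb` bytes have arrived, treating an empty result as EOF. This is a sketch; `read_exact` is a hypothetical helper, not part of fract4d. On a blocking pipe it will wait indefinitely if the writer never writes, which is one reason to prefer a version with a timeout.

```python
import os


def read_exact(fd: int, n: int) -> bytes:
    """Keep calling os.read() until n bytes have arrived.

    os.read() may return fewer bytes than requested (a short read);
    an empty result means EOF, i.e. the write end has been closed.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF before the full message arrived
            raise EOFError(f"expected {n} bytes, got {n - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In the test, `bytes = os.read(rfd, nb)` would then become `bytes = read_exact(rfd, nb)`, so a message split across two writes no longer looks like a bad message.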

@mindhells
Member Author

mindhells commented May 16, 2020

I've tried a sleep between interrupt and read. For this specific example it appears to work, although I can't tell for sure since the error is intermittent. Anyway, would the test still be worthwhile if we do that?
As far as I know, since this example is an "asynchronous calculation", the message is written into the pipe by a thread other than the main one. That means you can reach the read instruction in Python before the write operation in C++ completes. I don't know about the internals of os.read (or its write counterpart in C++); is that what you mean by kernel delay? Could this actually happen?

@mindhells
Member Author

mindhells commented May 16, 2020

@DX-MON please have a look at #108

@dragonmux
Member

The fact that the sleep worked says it's a mix of inter-thread scheduling timing and the time the kernel needs to synchronise data from the write side to the read side of the pipe, as this doesn't happen instantly.

You are correct that this can actually happen, and it's exactly what I was driving at. The rest of the test is fine: all we're doing by adding a sleep or select() is backing off to let the kernel catch up, as we'd naturally get from the code normally found around the loop that we're short-circuiting with the interrupt call.

substrate has the same problem with its console tests that involve a PTMX and PTS pair (pipes, but for TTYs). As far as I have been able to work out, a lot of the issue comes down to the kernel expecting the read and write halves to be used by different processes, where the scheduling delay between them and the time to copy data in and out of userspace take care of synchronisation.
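A minimal sketch of the select() back-off mentioned above: wait, with a timeout, until the read end of the pipe is readable, then read until the full message has arrived, failing rather than hanging. `read_message` and its default timeout are hypothetical, and this is not necessarily how #108 implements it. `select.select()` works on pipe file descriptors on POSIX systems, but not on Windows.

```python
import os
import select


def read_message(fd: int, n: int, timeout: float = 1.0) -> bytes:
    """Wait up to `timeout` seconds (per chunk) for fd to become readable,
    then read until n bytes have arrived; raise instead of blocking forever."""
    data = b""
    while len(data) < n:
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            raise TimeoutError(f"no data after {timeout}s; got {len(data)} of {n} bytes")
        chunk = os.read(fd, n - len(data))
        if not chunk:  # readable but empty: the write end was closed (EOF)
            raise EOFError(f"pipe closed after {len(data)} of {n} bytes")
        data += chunk
    return data
```

In the test, `bytes = os.read(rfd, nb)` could become `bytes = read_message(rfd, nb)`: the test still fails if the C++ side never writes, but it no longer fails just because the write has not finished yet.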

edyoung added a commit that referenced this issue Jun 7, 2020
* work in progress: create markdown-based manual

* finished first pass through manual.md

* exclude parsetab and lextab from pylint (#102)

* Remove buildtools package (#104)

No executable files ending in get.py are being installed.

* Add pylint singleton-comparison check (#105)

* Remove unused traceback import (#106)

* fix #107: wait until fd has something to read or fail (#108)

* Arbitrary precision formula experiment (#103)

* examples: allow custom formula filename in cmake

* [#7] example experiment: formula with arbitrary precision support

* MP example: add swap idiom and other improvements

* add debug information to cpp examples compilation

* release prep for 4.2 (#111)

* Update http links in README.md and setup.py to https (#115)

* Starting point for a benchmark to compare multiple-precision libs (#112)

* typo

* experiment to benchmark arbitrary precision math

* was running 3 benchmarks

* update #112 (#113)

* install google benchmark globally in exmaples infrastructure

* update exmaples readme

* disable frequency scaling during benchmark

* remove i/o from benchmark loop

* benchmark several different bit lengths

* Basis for discussion on available math libraries

* notes

Co-authored-by: Alberto Gonzalez <[email protected]>

* tweak doc generation process

* Check gnofract4d executable with pylint (#117)

* Install icons into the hicolor theme (#121)

gnofract4d-logo.png is size 640x640.

Individual icons created using ImageMagick, e.g.:
magick gnofract4d-logo.png -strip -resize 256x256 logo/256x256/gnofract4d.png

Install the 48x48 icon into pixmaps.

* Run pylint as a separate job (#118)

Report linting and testing results separately. Only run pylint once.

* C++ Engine refactor (#116)

* tidy worker initialization

* remove unused typedef

* worker: improve AA comparison

* fix docker test script

* remove duplicate extern declaration

* workers: refactor common members into base

* remove some completed todo's

* organize fractfunc initialization and members

* unravel coupling: pointfunc - site

* refactor: extract calculation options into a struct to ease initializers

* remove duplicate

* move rgba type basic operations to its type definition

* remove some temporaries in stworker->work

* review worker member names

* remove unneeded worker alloc members

* remove pointFunc factory

* update some old-fashion idioms on fractfunc and stats

* prefix fractfunc private members

* avoid clearing fates twice when autoupdating iters and period tolerance

* reorganize and narrow down public/private worker interface

* remove unused, mark experimental and review old fashioned code on stfractworker

* calcoptions: add some comments and move asynchronous back to where it belongs

* prevent uninitialized members on fractFunc

* remove initialization success flag on workers

* fix comment

* reorganize fractfunct members and remove unused

* work in progress: create markdown-based manual

* finished first pass through manual.md

* tweak doc generation process

* generate manual with hugo

new directory 'manual' generates standalone HTML for manual

* rest of hugo-based manual

* repoint to our copy of theme

* repoint to our version of theme

* install hugo-generated manual

* update submodule ref

* ignorance

* delete docbook version of manual. so long!

* fix test for doc version

* fix submodule commit

* install hugo in travis

* better errors on doc generation

* more debug output

* jfc

* apt version of hugo is too old, try this

* maybe this

* fold createdocs into setup

* delete old files

* work in progress: create markdown-based manual

* finished first pass through manual.md

* tweak doc generation process

* work in progress: create markdown-based manual

* generate manual with hugo

new directory 'manual' generates standalone HTML for manual

* rest of hugo-based manual

* repoint to our copy of theme

* repoint to our version of theme

* install hugo-generated manual

* update submodule ref

* ignorance

* delete docbook version of manual. so long!

* fix test for doc version

* fix submodule commit

* install hugo in travis

* better errors on doc generation

* more debug output

* jfc

* apt version of hugo is too old, try this

* maybe this

* fold createdocs into setup

* delete old files

* delete gui createdocs

It's more trouble than it is worth.
Getting access violations when calling it from setup.py.
Since this doesn't change much anyway,
easier to maintain commands.html manually

* setup.py updates

distutils to setuptools
make doc generation a custom build step

* merge madness

* build manual first

* custom build command doesn't work :-(

* back to generating docs separately

* checkin generated doc files for consistency

* restore css file

* move css file inside fract4dgui

* fix finding resources after install

* guess which pylint to use

* typo

* disable that darn test

* fix pylint whining

Co-authored-by: Alberto Gonzalez <[email protected]>
Co-authored-by: Chris Mayo <[email protected]>