Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

netserver on OSv doesn't work in TCP_MAERTS mode with test duration limited by time #44

Closed
dmitryfleytman opened this issue Sep 25, 2013 · 11 comments
Assignees

Comments

@dmitryfleytman
Copy link
Contributor

Netserver relies on alarm() to set test completion timer and this part is disabled by osv.patch.
As a workaround test duration may be limited by amount of data sent instead of amount of time passed. Then netserver is working.

@ghost ghost assigned dmitryfleytman Sep 25, 2013
@dmitryfleytman
Copy link
Contributor Author

Explanation by Nadav Har'El:

alarm() isn't supported in OSv because it uses signals. Signals (other than SIGSEGV and SIGFPE which happen synchronously) aren't supported on OSv, because they can arrive at any time, including during non-reentrant kernel function calls. The signal handler can then call the same or other kernel functions, and cause a disaster. Unix has a solution for this, with a list of signal-safe systemcalls, and system call restart, but this isn't really something we can do in OSv where there is no concept of system call, and they are just functions like any function.

It's not like OSv is losing any functionality this way - alarm() and signal handlers is a really archaic interface, and most modern programs would use threads that go to sleep instead of signal handlers. The problem is that by not supporting alarm() we're making it hard to port programs.

Maybe we should implement alarm() (and also signal()) in a naive way that ignores the potential reentrancy disasters, and it will end up working in 90% of the cases. But it will be really safer if we don't... :(

@dmitryfleytman
Copy link
Contributor Author

Warning clause to be added to osv/tools/netperf.txt

@penberg
Copy link
Contributor

penberg commented Sep 25, 2013

@dmitryfleytman @nyh I think we should keep this issue open for further discussion whether or not we should fix it. I certainly think we should.

@penberg
Copy link
Contributor

penberg commented Sep 25, 2013

@nyh what do you think?

@nyh
Copy link
Contributor

nyh commented Sep 25, 2013

On Wed, Sep 25, 2013 at 11:33 AM, Pekka Enberg [email protected]:

@nyh https://github.com/nyh what do you think?

I'll try to explain again what I think.

We will not be able to fully support kill(2)/signal(2) (and similarly,
alarm(2)), because their full support means that the signal handler may
do a longjmp(), even if the thread was originally in the middle of running
some kernel code (which might have, say, taken a mutex), a sure recipe for
disaster.

But we can write a working kill(2) that will work under certain
assumptions, such as:

  1. The interrupted thread is doing nothing but sleeping, waiting for the
    signal
    OR
  2. The signal handler doesn't do longjmp, and doesn't call any OSv
    functions. It just sets a memory variable and returns.
    OR
  3. The program doesn't care which thread gets the signal, so we can "cheat"
    by creating a new thread to run the signal handler!

I think we can pretty easily implement kill(2) which will work in these
conditions, and make ^\ work in the JVM, and SIGALARM work for netperf.
There will be cases where it won't work, but probably for most interesting
use of the dying Unix signals feature, it will work.

After we have kill(2) working, we'll need to implement alarm(2). An easy
implementation is to create a new thread (if it doesn't exist yet) which
sends these SIGALRMs.

This should be a new issue - it's not a netperf-specific issue.

@nyh
Copy link
Contributor

nyh commented Sep 25, 2013

On Wed, Sep 25, 2013 at 10:04 PM, Nadav Har'El [email protected]:

But we can write a working kill(2) that will work under certain
assumptions, such as:

  1. The interrupted thread is doing nothing but sleeping, waiting for the
    signal
    OR
  2. The signal handler doesn't do longjmp, and doesn't call any OSv
    functions. It just sets a memory variable and returns.
    OR
  3. The program doesn't care which thread gets the signal, so we can
    "cheat" by creating a new thread to run the signal handler!

I started writing a kill() and alarm() which will work using trick #3 above
(we can later expand it to work under different assumptions).
I'll send a patch later.

We'll have to see if/how it helps in concrete cases we previously
considered like netperf and Java's SIGQUIT (^).

Nadav Har'El
[email protected]

@nyh
Copy link
Contributor

nyh commented Sep 25, 2013

On Wed, Sep 25, 2013 at 10:56 PM, Nadav Har'El [email protected]:

I started writing a kill() and alarm() which will work using trick #3
above (we can later expand it to work under different assumptions).
I'll send a patch later.

We'll have to see if/how it helps in concrete cases we previously
considered like netperf and Java's SIGQUIT (^).

Ok, I just sent two patches, for implementing kill() and alarm()
(obviously, the alarm() implementation requires kill()).
They work in test cases, but the question is whether they help in real
applications.

I would be grateful if you could check if this implementation of alarm()
helps you run the Netperf mode you were interested in.

If it doesn't work, maybe you can tell me how/why it doesn't work.

Nadav Har'El
[email protected]

Nadav Har'El
[email protected]

@dmitryfleytman
Copy link
Contributor Author

@nyh, I've tested your patches, netserver issue resolved. Thanks.

@ghost ghost assigned nyh Sep 28, 2013
@nyh
Copy link
Contributor

nyh commented Sep 30, 2013

Added alarm(), making this netperf mode work. Closing this issue.

@nyh nyh closed this as completed Sep 30, 2013
@dmitryfleytman
Copy link
Contributor Author

It turned out that netperf relies on send_/receive_ syscalls being interrupted by SIG_ALRM in some cases. The issue needs to be reopen.

@nyh nyh reopened this Dec 25, 2013
@nyh
Copy link
Contributor

nyh commented Feb 5, 2014

I believe Dmitry already solved this issue. Dmitry, or someone with super-powers - can you please close this issue? Github doesn't allow a mere mortal like me to do it :(

avikivity added a commit that referenced this issue Oct 12, 2015
* apps 922711c...8ccc734 (18):
  > cassandra: update JNA version and download site
  > cassandra: remove redundant line from manifest
  > Merge pull request #48 from davedoesdev/master
  > Merge pull request #47 from glauciom/patch-1
  > Merge pull request #46 from bhuztez/otp-18.0
  > Upgraded ONOS to 1.2.1
  > perl: fix build
  > Merge pull request #42 from russt/OSV-APPS-39
  > cado: fix module.py
  > cado: some Makefile fixes
  > Merge pull request #44 from russt/OSV-APPS-43
  > Merge pull request #41 from cosmo0920/update-jruby-sinatra-example
  > perl-hello: support for scripts/build
  > perl: add default command line for scripts/build
  > perl: support for scripts/build
  > perl: fix Makefile for parallel make
  > perl: use tabs in Makefile
  > Merge pull request #40 from russt/OSV-APPS-39

Signed-off-by: Avi Kivity <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants