Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TSAN warnings related to dfa::DFA::s0 #3311

Merged
merged 1 commit into from
Nov 14, 2021

Conversation

jcking
Copy link
Collaborator

@jcking jcking commented Oct 18, 2021

In Java, dfa::DFA::s0 is marked as volatile and in Go it uses a mutex. The access is raw in C++ leading to TSAN warnings and undefined behavior if two threads update the pointer simultaneously. This is resolved by using std::atomic and acquire/release semantics.

@jcking jcking changed the title Fix TSAN warnings related to dfa::DFAState::s0 Fix TSAN warnings related to dfa::DFA::s0 Oct 18, 2021
@jcking jcking force-pushed the cpp-go-runtime-thread-safety branch 2 times, most recently from e016eee to def8cc7 Compare October 19, 2021 15:17
@jcking
Copy link
Collaborator Author

jcking commented Oct 19, 2021

@mike-lischke

@mike-lischke
Copy link
Member

mike-lischke commented Oct 22, 2021

Hmm, what exactly is TSAN complaining about? You added atomics in a section which is already protected by a lock (see _stateLock.writeLock();). And that's not an average lock, but one which allows multiple reads at the same time to improve performance. What does your patch improve?

Could this be a false positive reported by TSAN?

@jcking
Copy link
Collaborator Author

jcking commented Oct 22, 2021

Hmm, what exactly is TSAN complaining about? You added atomics in a section which is already protected by a lock (see _stateLock.writeLock();). And that's not an average lock, but one which allows multiple reads at the same time to improve performance. What does your patch improve?

Could this be a false positive reported by TSAN?

The race is in LexerATNSimulator::matchATN which I do not believe has mutation exclusion when updating the pointer.

@mike-lischke
Copy link
Member

mike-lischke commented Oct 31, 2021

I'm still not convinced. Both lexer and parser ATN simulators have state and edge locks to prevent concurrent access. They protect not only the DFA s0 states but also the process of adding new DFA states. Adding std::atomic to the mix is not going to help here, but complicates things. So either std::atomic should go or the state/edge locks.

However I saw one case where no protection is in place: LexerATNSimulator::matchATN(). There's a write operation:

  if (!suppressEdge) {
    _decisionToDFA[_mode].s0 = next;
  }

which is not protected by the lock. Would be good to handle that properly by somehow extending the lock that was acquired in addDFAState to include this set operation. Maybe addDFAState can do the check for suppressing the edge and also set the s0 closure, which is currently done after addDFAState?

@jcking
Copy link
Collaborator Author

jcking commented Nov 1, 2021

I'm still not convinced. Both lexer and parser ATN simulators have state and edge locks to prevent concurrent access. They protect not only the DFA s0 states but also the process of adding new DFA states. Adding std::atomic to the mix is not going to help here, but complicates things. So either std::atomic should go or the state/edge locks.

However I saw one case where no protection is in place: LexerATNSimulator::matchATN(). There's a write operation:

  if (!suppressEdge) {
    _decisionToDFA[_mode].s0 = next;
  }

which is not protected by the lock. Would be good to handle that properly by somehow extending the lock that was acquired in addDFAState to include this set operation. Maybe addDFAState can do the check for suppressing the edge and also set the s0 closure, which is currently done after addDFAState?

Here is the sanitized TSAN output.

WARNING: ThreadSanitizer: data race (pid=8618)
  Read of size 8 at 0x7b8400002838 by thread T73:
    #0 antlr4::atn::LexerATNSimulator::match(antlr4::CharStream*, unsigned long) atn/LexerATNSimulator.cpp:84:11
    #1 antlr4::Lexer::nextToken() Lexer.cpp:81:59
    #2 non-virtual thunk to antlr4::Lexer::nextToken() Lexer.cpp
    #3 antlr4::BufferedTokenStream::fetch(unsigned long) BufferedTokenStream.cpp:96:44
    #4 antlr4::BufferedTokenStream::sync(unsigned long) BufferedTokenStream.cpp:82:22
    #5 antlr4::BufferedTokenStream::setup() BufferedTokenStream.cpp:188:3
    #6 antlr4::BufferedTokenStream::lazyInit() BufferedTokenStream.cpp:182:5
    #7 antlr4::CommonTokenStream::LT(long) CommonTokenStream.cpp:44:3
    #8 antlr4::Parser::enterRule(antlr4::ParserRuleContext*, unsigned long, unsigned long) Parser.cpp:338:25

Note that a frame or two may be missing due to optimizations. We definitely need to have that write guarded by a mutex or std::atomic. Extending addDFAState to just do the work makes sense to me.

@jcking jcking force-pushed the cpp-go-runtime-thread-safety branch 3 times, most recently from fdf57d0 to a9e43d8 Compare November 2, 2021 16:46
@jcking
Copy link
Collaborator Author

jcking commented Nov 2, 2021

I think I have covered all the bases. I need to revisit the whole lock setup, as those _stateLock and _edgeLock should really not be static if possible. The state should be per LexerATNSimulator and ParserATNSimulator as they use separate data. So maybe the locks should be on the ATN.

@jcking jcking force-pushed the cpp-go-runtime-thread-safety branch from a9e43d8 to f711e32 Compare November 9, 2021 18:34
@mike-lischke
Copy link
Member

No, the locks must be static because they protect static data. The DFA is shared among all parser + lexer instances.

@mike-lischke
Copy link
Member

@parrt Another pure C++ patch, ready for merge.

@parrt parrt added this to the 4.9.4 milestone Nov 14, 2021
@parrt parrt merged commit 0129cd4 into antlr:master Nov 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants