Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve a class of ConcurrentModificationException from during bulk requests #3094

Merged

Conversation

cwperks
Copy link
Member

@cwperks cwperks commented Aug 3, 2023

Description

This PR changes the order of setUserInfoInThreadContext and addSecurityRoles to ensure that addSecurityRoles is called before setUserInfoInThreadContext. The ConcurrentModificationException would arise in scenarios where the first thread was done with setUserInfoInThreadContext and had moved onto addSecurityRoles to add the mapped roles into the user's set of security roles. On the first call to addSecurityRoles it may be populating the set with data, any subsequent call would be a noop in other threads.

If simultaneously there is another thread executing setUserInfoInThreadContext while the first thread is in addSecurityRoles then a ConcurrentModificationException is thrown inside the Sets.union(...) call.

By calling addSecurityRoles before setUserInfoInThreadContext, it can be guaranteed that no ConcurrentModificationException could be thrown because the user's security roles will already be set and any thread that attempts another call will be a noop.

  • Category (Enhancement, New feature, Bug fix, Test fix, Refactoring, Maintenance, Documentation)

Bug fix

Issues Resolved

#2263

Testing

Tested by running a python script to bulk insert 100k documents and using tmux to run the script in multiple shells at once. Before the change the bug is reproducible regularly, but after the change the bug cannot be reproduced.

Check List

  • New functionality includes testing
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…esolve ConcurrentModificationException on bulk request

Signed-off-by: Craig Perkins <[email protected]>
@codecov
Copy link

codecov bot commented Aug 3, 2023

Codecov Report

Merging #3094 (30095c8) into main (6cc90e6) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main    #3094      +/-   ##
============================================
+ Coverage     62.40%   62.42%   +0.01%     
+ Complexity     3352     3350       -2     
============================================
  Files           254      254              
  Lines         19749    19749              
  Branches       3334     3334              
============================================
+ Hits          12325    12328       +3     
+ Misses         5792     5788       -4     
- Partials       1632     1633       +1     
Files Changed Coverage Δ
...earch/security/privileges/PrivilegesEvaluator.java 73.12% <100.00%> (ø)

... and 3 files with indirect coverage changes

willyborankin
willyborankin previously approved these changes Aug 3, 2023
RyanL1997
RyanL1997 previously approved these changes Aug 3, 2023
@cwperks cwperks added the backport 2.x backport to 2.x branch label Aug 4, 2023
@cwperks
Copy link
Member Author

cwperks commented Aug 4, 2023

FYI here's the python script I've been using for testing:

from opensearchpy import OpenSearch
from opensearchpy import helpers
import random as r
import requests
import json
import base64

proto = 'https'
host = 'localhost'
port = 9200
auth = ('admin', 'admin') # For testing only. Don't store credentials in code.

# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
    hosts = [{'host': host, 'port': port}],
    http_compress = True, # enables gzip compression for request bodies
    http_auth = auth,
    use_ssl = True,
    verify_certs = False,
    ssl_assert_hostname = False,
    ssl_show_warn = False
)

# Create an index with non-default settings.
index_name = 'movies'

try:
	response = client.indices.create(f'movies')
	print('\nCreating index:')
	print(response)
except:
	pass


title_choices = ['Titanic', 'Jurassic Park', 'Star Wars']
year_choices = ['2013', '1992', '2023', '2001', '1985']

lorem = [
	'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
	'Quisque vitae varius ex, eu volutpat orci.',
	'Aenean ullamcorper orci et vulputate fermentum.',
	'Cras erat dui, finibus vel lectus ac, pharetra dictum odio.',
	'Nullam tempus scelerisque purus, sed mattis elit condimentum nec.',
	'Etiam risus sapien, auctor eu volutpat sit amet, porta in nunc.',
	'Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.',
	'Proin ipsum purus, laoreet quis dictum a, laoreet sed ligula.',
	'Integer ultricies malesuada quam.',
	'Cras vel elit sed mi placerat pharetra eget vel odio.',
	'Duis ac nulla varius diam ultrices rutrum.'
]

basic_auth_token = base64.b64encode('admin:admin'.encode()).decode()

headers = {
	'Content-Type': 'application/json',
	'Authorization': f'Basic {basic_auth_token}'
}


bulk = []
for i in range(100000):
	document = {
	  'title': r.choice(title_choices),
	  'year': r.choice(year_choices),
	  'description': r.choice(lorem)
	}	

	bulk.append({"_index": index_name, "_id": f"{i}", "_source": document})

	if (i+1) % 100 == 0:
		print(f'\nAdding documents {i - 99} - {i}')
		helpers.bulk(client, bulk)
		bulk = []

@cwperks
Copy link
Member Author

cwperks commented Aug 4, 2023

Steps for reproducing the issue:

  1. Run a node with the security plugin installed before this change
  2. Run the python script above - it helps to reproduce by running it in multiple terminals simultaneously
  3. See ConcurrentModificationException

Repeat the steps after this change and its stable. I'm not sure if a change like this is easily unit testable.

@cwperks
Copy link
Member Author

cwperks commented Aug 4, 2023

This will definitely resolve the issue in PrivilegesEvaluator, but looking again at the linked issue shows this thrown at another place which I haven't been able to reproduce. The OP in the linked issue shows it being thrown from here: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/securityconf/ConfigModelV7.java#L1302

peternied
peternied previously approved these changes Aug 4, 2023
Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is good, cleaner interfaces are better, but I don't think this fixes the underlying issue. Lets merge this, but I don't think we should say the root cause has been address - what do you think about addressing that issue seperately?

@cwperks cwperks dismissed stale reviews from peternied, RyanL1997, and willyborankin via 3cac3e4 August 5, 2023 11:05
Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we start using more advanced concurrency changes we should have an automated test that can verify the issue.

Can we write a unit/component test that makes an User and then manipulates it on many threads at once to reproduce? I don't think we need the full service reproduction of the issue to know its fixed or not.

Comment on lines 275 to 281
public synchronized final void addSecurityRoles(final Collection<String> securityRoles) {
if (securityRoles != null && this.securityRoles != null) {
this.securityRoles.addAll(securityRoles);
}
}

public final Set<String> getSecurityRoles() {
public synchronized final Set<String> getSecurityRoles() {
Copy link
Member

@peternied peternied Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to protect concurrent access to these methods you need a single locking mechanism, these synchronized keywords work at the method level, not the class level, its at the class level.

I prefer to make a field object to handle the lock for each log explicitly, e.g. private final Object securityRolesLock = new Object(); then do lock(securityRolesLock) { ... } inside of all places where the securityRoles collection is interacted with.

IMO it isn't clear if this will address the underlying issue because concurrent modification can happen outside these methods by iterating over the collection outside of these methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now I'm looking into splitting getSecurityRoles and getMappedRoles because the currently logic in place intermingles the 2 which causes the issue. getSecurityRoles is called within mapRoles which can be problematic because after mapRoles is finished executing then addSecurityRoles is called within PrivilegesEvaluator.

What I'd like to see is one thread doing the mapping of the roles and other threads can read from its result if its already been computed. I'll try different ways to write an automated test for this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR would make this issue occur with much less frequency, but obviously it would be best to ensure a situation like this can never happen

@cwperks
Copy link
Member Author

cwperks commented Aug 7, 2023

@peternied @willyborankin @RyanL1997 I updated this PR to remove the synchronized changes.

@willyborankin willyborankin merged commit cd699bb into opensearch-project:main Aug 7, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 7, 2023
…esolve ConcurrentModificationException on bulk request (#3094)

### Description

This PR changes the order of `setUserInfoInThreadContext` and
`addSecurityRoles` to ensure that `addSecurityRoles` is called before
`setUserInfoInThreadContext`. The ConcurrentModificationException would
arise in scenarios where the first thread was done with
`setUserInfoInThreadContext` and had moved onto `addSecurityRoles` to
add the mapped roles into the user's set of security roles. On the first
call to `addSecurityRoles` it may be populating the set with data, any
subsequent call would be a noop in other threads.

If simultaneously there is another thread executing
`setUserInfoInThreadContext` while the first thread is in
`addSecurityRoles` then a ConcurrentModificationException is thrown
inside the `Sets.union(...)` call.

By calling `addSecurityRoles` before `setUserInfoInThreadContext`, it
can be guaranteed that no ConcurrentModificationException could be
thrown because the user's security roles will already be set and any
thread that attempts another call will be a noop.

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Bug fix

### Issues Resolved

#2263

### Testing

Tested by running a python script to bulk insert 100k documents and
using tmux to run the script in multiple shells at once. Before the
change the bug is reproducible regularly, but after the change the bug
cannot be reproduced.

### Check List
- [ ] New functionality includes testing
- [ ] New functionality has been documented
- [ ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
(cherry picked from commit cd699bb)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
cwperks added a commit that referenced this pull request Aug 8, 2023
…urityRoles to resolve ConcurrentModificationException on bulk request (#3113)

Backport cd699bb from #3094.

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Craig Perkins <[email protected]>
stephen-crawford pushed a commit to stephen-crawford/security that referenced this pull request Aug 11, 2023
…esolve ConcurrentModificationException on bulk request (opensearch-project#3094)

### Description

This PR changes the order of `setUserInfoInThreadContext` and
`addSecurityRoles` to ensure that `addSecurityRoles` is called before
`setUserInfoInThreadContext`. The ConcurrentModificationException would
arise in scenarios where the first thread was done with
`setUserInfoInThreadContext` and had moved onto `addSecurityRoles` to
add the mapped roles into the user's set of security roles. On the first
call to `addSecurityRoles` it may be populating the set with data, any
subsequent call would be a noop in other threads.

If simultaneously there is another thread executing
`setUserInfoInThreadContext` while the first thread is in
`addSecurityRoles` then a ConcurrentModificationException is thrown
inside the `Sets.union(...)` call.

By calling `addSecurityRoles` before `setUserInfoInThreadContext`, it
can be guaranteed that no ConcurrentModificationException could be
thrown because the user's security roles will already be set and any
thread that attempts another call will be a noop.

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Bug fix

### Issues Resolved

opensearch-project#2263

### Testing

Tested by running a python script to bulk insert 100k documents and
using tmux to run the script in multiple shells at once. Before the
change the bug is reproducible regularly, but after the change the bug
cannot be reproduced.

### Check List
- [ ] New functionality includes testing
- [ ] New functionality has been documented
- [ ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
@cwperks cwperks added the backport 2.9 Backport to 2.9 branch label Aug 14, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 14, 2023
…esolve ConcurrentModificationException on bulk request (#3094)

### Description

This PR changes the order of `setUserInfoInThreadContext` and
`addSecurityRoles` to ensure that `addSecurityRoles` is called before
`setUserInfoInThreadContext`. The ConcurrentModificationException would
arise in scenarios where the first thread was done with
`setUserInfoInThreadContext` and had moved onto `addSecurityRoles` to
add the mapped roles into the user's set of security roles. On the first
call to `addSecurityRoles` it may be populating the set with data, any
subsequent call would be a noop in other threads.

If simultaneously there is another thread executing
`setUserInfoInThreadContext` while the first thread is in
`addSecurityRoles` then a ConcurrentModificationException is thrown
inside the `Sets.union(...)` call.

By calling `addSecurityRoles` before `setUserInfoInThreadContext`, it
can be guaranteed that no ConcurrentModificationException could be
thrown because the user's security roles will already be set and any
thread that attempts another call will be a noop.

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Bug fix

### Issues Resolved

#2263

### Testing

Tested by running a python script to bulk insert 100k documents and
using tmux to run the script in multiple shells at once. Before the
change the bug is reproducible regularly, but after the change the bug
cannot be reproduced.

### Check List
- [ ] New functionality includes testing
- [ ] New functionality has been documented
- [ ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
(cherry picked from commit cd699bb)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
peternied pushed a commit that referenced this pull request Aug 14, 2023
…urityRoles to resolve ConcurrentModificationException on bulk request (#3173)

Backport cd699bb from #3094.

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@cwperks cwperks added the backport 1.3 backport to 1.3 branch label Aug 16, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 1.3 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/security/backport-1.3 1.3
# Navigate to the new working tree
pushd ../.worktrees/security/backport-1.3
# Create a new branch
git switch --create backport/backport-3094-to-1.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 cd699bb7d3a07b8919ef2fb5e8fb4ccd2e622acb
# Push it to GitHub
git push --set-upstream origin backport/backport-3094-to-1.3
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/security/backport-1.3

Then, create a pull request where the base branch is 1.3 and the compare/head branch is backport/backport-3094-to-1.3.

@peternied
Copy link
Member

@cwperks please make sure to backport this change to 1.x as well

cwperks added a commit to cwperks/security that referenced this pull request Aug 16, 2023
…esolve ConcurrentModificationException on bulk request (opensearch-project#3094)

This PR changes the order of `setUserInfoInThreadContext` and
`addSecurityRoles` to ensure that `addSecurityRoles` is called before
`setUserInfoInThreadContext`. The ConcurrentModificationException would
arise in scenarios where the first thread was done with
`setUserInfoInThreadContext` and had moved onto `addSecurityRoles` to
add the mapped roles into the user's set of security roles. On the first
call to `addSecurityRoles` it may be populating the set with data, any
subsequent call would be a noop in other threads.

If simultaneously there is another thread executing
`setUserInfoInThreadContext` while the first thread is in
`addSecurityRoles` then a ConcurrentModificationException is thrown
inside the `Sets.union(...)` call.

By calling `addSecurityRoles` before `setUserInfoInThreadContext`, it
can be guaranteed that no ConcurrentModificationException could be
thrown because the user's security roles will already be set and any
thread that attempts another call will be a noop.

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Bug fix

opensearch-project#2263

Tested by running a python script to bulk insert 100k documents and
using tmux to run the script in multiple shells at once. Before the
change the bug is reproducible regularly, but after the change the bug
cannot be reproduced.

- [ ] New functionality includes testing
- [ ] New functionality has been documented
- [ ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
(cherry picked from commit cd699bb)
cwperks added a commit to cwperks/security that referenced this pull request Aug 16, 2023
…esolve ConcurrentModificationException on bulk request (opensearch-project#3094)

This PR changes the order of `setUserInfoInThreadContext` and
`addSecurityRoles` to ensure that `addSecurityRoles` is called before
`setUserInfoInThreadContext`. The ConcurrentModificationException would
arise in scenarios where the first thread was done with
`setUserInfoInThreadContext` and had moved onto `addSecurityRoles` to
add the mapped roles into the user's set of security roles. On the first
call to `addSecurityRoles` it may be populating the set with data, any
subsequent call would be a noop in other threads.

If simultaneously there is another thread executing
`setUserInfoInThreadContext` while the first thread is in
`addSecurityRoles` then a ConcurrentModificationException is thrown
inside the `Sets.union(...)` call.

By calling `addSecurityRoles` before `setUserInfoInThreadContext`, it
can be guaranteed that no ConcurrentModificationException could be
thrown because the user's security roles will already be set and any
thread that attempts another call will be a noop.

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Bug fix

opensearch-project#2263

Tested by running a python script to bulk insert 100k documents and
using tmux to run the script in multiple shells at once. Before the
change the bug is reproducible regularly, but after the change the bug
cannot be reproduced.

- [ ] New functionality includes testing
- [ ] New functionality has been documented
- [ ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
(cherry picked from commit cd699bb)
willyborankin pushed a commit that referenced this pull request Aug 16, 2023
…urityRoles to resolve ConcurrentModificationException on bulk request (#3094) (#3194)

Backport #3094 to 1.x
willyborankin pushed a commit that referenced this pull request Aug 16, 2023
…urityRoles to resolve ConcurrentModificationException on bulk request (#3094) (#3193)

Backport #3094 to 1.3

---------

Signed-off-by: Craig Perkins <[email protected]>
@peternied peternied changed the title Reverse order of setUserInfoInThreadContext and addSecurityRoles to resolve ConcurrentModificationException on bulk request Resolve a class of ConcurrentModificationException from during bulk requests Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.3 backport to 1.3 branch backport 2.x backport to 2.x branch backport 2.9 Backport to 2.9 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants