Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic due to concurrent map writes #4582

Closed
jonsabados opened this issue May 16, 2018 · 6 comments
Closed

Panic due to concurrent map writes #4582

jonsabados opened this issue May 16, 2018 · 6 comments
Milestone

Comments

@jonsabados
Copy link

jonsabados commented May 16, 2018

Environment:

  • Vault Version: v0.10.0
  • Operating System/Architecture: Ubuntu 16.04.3 LTS

Vault Config File:

ui = true

backend "consul" {
  address = "127.0.0.1:8500"
  path = "vault"
  redirect_addr = "http://10.101.22.237:8200"
  token = "xxxxxx"
}

ha_backend "consul" {
  address = "127.0.0.1:8500"
  path = "vault"
  redirect_addr = "http://10.101.22.237:8200"
  token = "xxxxx"
}

listener "tcp" {
  address = "127.0.0.1:8200"
  tls_key_file = "/etc/vault/vault.key"
  tls_cert_file = "/etc/vault/vault.cer"
}

listener "tcp" {
 address = "10.101.22.237:8200"
 tls_key_file = "/etc/vault/vault.key"
 tls_cert_file = "/etc/vault/vault.cer"
}

listener "tcp" {
 address = "10.101.22.237:1867"
 tls_key_file = "/etc/vault/vault.service.key"
 tls_cert_file = "/etc/vault/vault.service.cer"
}

telemetry {
 statsd_address = "127.0.0.1:8125"
}

Startup Log Output:

May  2 21:28:35 ip-10-101-22-237 vault[5440]: ==> Vault server configuration:
May  2 21:28:35 ip-10-101-22-237 vault[5440]:               HA Storage: consul
May  2 21:28:35 ip-10-101-22-237 vault[5440]:              Api Address: http://10.101.22.237:8200
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                      Cgo: disabled
May  2 21:28:35 ip-10-101-22-237 vault[5440]:          Cluster Address: https://10.101.22.237:8201
May  2 21:28:35 ip-10-101-22-237 vault[5440]:               Listener 1: tcp (addr: "127.0.0.1:8200", cluster address: "127.0.0.1:8201", tls: "enabled")
May  2 21:28:35 ip-10-101-22-237 vault[5440]:               Listener 2: tcp (addr: "10.101.22.237:8200", cluster address: "10.101.22.237:8201", tls: "enabled")
May  2 21:28:35 ip-10-101-22-237 vault[5440]:               Listener 3: tcp (addr: "10.101.22.237:1867", cluster address: "10.101.22.237:1868", tls: "enabled")
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                Log Level: info
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                    Mlock: supported: true, enabled: true
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                  Storage: consul
May  2 21:28:35 ip-10-101-22-237 vault[5440]:                  Version: Vault v0.10.0
May  2 21:28:35 ip-10-101-22-237 vault[5440]:              Version Sha: 5dd7f25f5c4b541f2da62d70075b6f82771a650d
May  2 21:28:35 ip-10-101-22-237 vault[5440]: ==> Vault server started! Log data will stream in below:

We have only encountered this once, and have not been able to reproduce but we experienced a crash of vault due to a panic around concurrent writes to a map & figured it was worth reporting - link to crash log gist provided in references.

Important Factoids:
The instance that crashed was the active instance in an HA cluster. Since it looks to be around ACL related code it is worth noting that we are using AWS IAM auth to retrieve tokens for lambdas. Probably also worth mentioning that we are also using vault to manage credentials for RabbitMQ through the RabbitMQ secrets engine, credentials for MySQL through the database secrets engine and IAM credentials through the AWS secrets engine.

References:
Crash log: https://gist.github.com/jonsabados/712c915c992c925a66f4d4f66e7fbd68

@jefferai
Copy link
Member

Many thanks for reporting this -- any panic is a bug that should be fixed, and providing the log is super duper helpful. We'll update when we know more.

@jefferai jefferai added this to the 0.10.2 milestone May 17, 2018
@jefferai
Copy link
Member

Hi there,

I think I've figured out the issue but it involves a specific set of circumstances once all policies given to a token are being evaluated together, namely:

  • Three or more policy path statements with the same path
  • Two or more of those path statements containing allowed_parameters (again, for the same path)

Can you confirm/deny?

jefferai added a commit that referenced this issue May 17, 2018
Fixes #4582 -- and even if it doesn't, it's the right thing to do
@jonsabados
Copy link
Author

would a wildcard + two exact matches count for Three or more policy path statements with the same path? If so then yup, we have a policy roughly along the lines of

path "sys/*" {
  policy = "deny"
}

path "sys/leases/lookup" {
  capabilities = ["update"],
  allowed_parameters = {
    "lease_id" = ["database/creds/someservice*"],
  }
}

path "sys/leases/lookup" {
  capabilities = ["update"],
  allowed_parameters = {
    "lease_id" = ["aws/sts/someservice*"],
  }
}

given to tokens that would have been in use right around the time of the crash

@jefferai
Copy link
Member

Are you using the default policy? The current version of that policy contains:

# Allow looking up lease properties. This requires knowing the lease ID ahead
# of time and does not divulge any sensitive information.
path "sys/leases/lookup" {
    capabilities = ["update"]
}

If you are, it would indeed mean that you have three exact matches.

@jonsabados
Copy link
Author

Indeed we are, so looks like were a match for the circumstances you described

@jefferai
Copy link
Member

Great, fix will be in 0.10.2! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants