Sessions are not closing/terminated [ICU-1257] #894

Closed
Akisaji opened this issue Jan 29, 2021 · 22 comments · Fixed by #1793
Labels: question (Further information is requested)

@Akisaji

Akisaji commented Jan 29, 2021

Describe the bug
Sessions are displayed as active in the webui and cli and stay active after cancelling the connection.

To Reproduce

  1. Create a connection
  2. Shut down Boundary with the connection still active
  3. Start Boundary (usually the next day)
  4. The connection is still displayed in the view way beyond its TTL

Expected behavior
For the connection to be closed when Boundary is shut down and no longer be displayed as active, and to be able to cancel the connection without it popping back up.

Additional context
The way we have Boundary set up currently, it's shut down at the end of the day, so my guess is that this is what is causing the issues.

So I'm currently having this issue with multiple sessions and have seen it come back several times. The only way I've found to "fix" it is to reinitialize the DB, which I would like to avoid.

boundary sessions read -id s_0P4gIy6IG6

Session information:
  Auth Token ID:        at_7c30sq3trl
  Created Time:         Thu, 21 Jan 2021 16:03:07 CET
  Endpoint:             tcp://url
  Expiration Time:      Fri, 22 Jan 2021 00:03:07 CET
  Host ID:              hst_GojaoqThFC
  Host Set ID:          hsst_3snDcq1TsV
  ID:                   s_0P4gIy6IG6
  Status:               terminated
  Target ID:            ttcp_p2xUpl4ZHU
  Termination Reason:   canceled
  Type:                 tcp
  Updated Time:         Fri, 29 Jan 2021 12:14:40 CET
  User ID:              u_TWZ6052pWZ
  Version:              31

  Scope:
    ID:                 p_j3z93Fl705
    Name:               test
    Parent Scope ID:    o_c4HqJ42HzW
    Type:               project

  States:
    Start Time:         Thu, 21 Jan 2021 16:04:17 CET
    Status:             terminated

    End Time:           Thu, 21 Jan 2021 16:04:17 CET
    Start Time:         Thu, 21 Jan 2021 16:03:11 CET
    Status:             canceling

    End Time:           Thu, 21 Jan 2021 16:03:11 CET
    Start Time:         Thu, 21 Jan 2021 16:03:08 CET
    Status:             active

    End Time:           Thu, 21 Jan 2021 16:03:08 CET
    Start Time:         Thu, 21 Jan 2021 16:03:07 CET
    Status:             pending

And for a list in the scope I get the following output:

boundary sessions list -scope-id p_j3z93Fl705 | grep -A 6 s_0P4gIy6IG6
  ID:                 s_0P4gIy6IG6
    Status:           canceling
    Created Time:     Thu, 21 Jan 2021 16:03:07 CET
    Expiration Time:  Fri, 22 Jan 2021 00:03:07 CET
    Updated Time:     Fri, 29 Jan 2021 12:14:40 CET
    User ID:          u_TWZ6052pWZ
    Target ID:        ttcp_p2xUpl4ZHU
--
  ID:                 s_0P4gIy6IG6
    Status:           terminated
    Created Time:     Thu, 21 Jan 2021 16:03:07 CET
    Expiration Time:  Fri, 22 Jan 2021 00:03:07 CET
    Updated Time:     Fri, 29 Jan 2021 12:14:40 CET
    User ID:          u_TWZ6052pWZ
    Target ID:        ttcp_p2xUpl4ZHU
--
  ID:                 s_0P4gIy6IG6
    Status:           pending
    Created Time:     Thu, 21 Jan 2021 16:03:07 CET
    Expiration Time:  Fri, 22 Jan 2021 00:03:07 CET
    Updated Time:     Fri, 29 Jan 2021 12:14:40 CET
    User ID:          u_TWZ6052pWZ
    Target ID:        ttcp_p2xUpl4ZHU
--
  ID:                 s_0P4gIy6IG6
    Status:           active
    Created Time:     Thu, 21 Jan 2021 16:03:07 CET
    Expiration Time:  Fri, 22 Jan 2021 00:03:07 CET
    Updated Time:     Fri, 29 Jan 2021 12:14:40 CET
    User ID:          u_TWZ6052pWZ
    Target ID:        ttcp_p2xUpl4ZHU

Cancelling the session has no effect in the CLI or through the web UI, and this is how it looks in the web UI. When I press cancel, all 4 close for a split second and pop right back up. A username and target are still displayed in the image below.

image

So I also checked the DB and saw the following, while at most 1 or 2 sessions should be active.

select count(*) from session_state where state = 'active';
 count 
-------
    98
(1 row)
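
A count like this can overstate the number of live sessions if session_state keeps one row per state a session has passed through, as the States history above suggests. The following is a minimal sketch of that effect using an in-memory SQLite table, not Boundary's real schema: only the table name and the state column come from this thread, and the session_id, start_time and end_time columns are assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified stand-in for the session_state table.
# Only the table name and the `state` column appear in this thread;
# session_id, start_time and end_time are assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session_state ("
    " session_id TEXT, state TEXT, start_time TEXT, end_time TEXT)"
)

# One session that went pending -> active -> canceling -> terminated,
# using the timestamps from the `sessions read` output above.
rows = [
    ("s_0P4gIy6IG6", "pending", "2021-01-21 16:03:07", "2021-01-21 16:03:08"),
    ("s_0P4gIy6IG6", "active", "2021-01-21 16:03:08", "2021-01-21 16:03:11"),
    ("s_0P4gIy6IG6", "canceling", "2021-01-21 16:03:11", "2021-01-21 16:04:17"),
    ("s_0P4gIy6IG6", "terminated", "2021-01-21 16:04:17", None),
]
conn.executemany("INSERT INTO session_state VALUES (?, ?, ?, ?)", rows)

# Counts every row that was ever 'active', including finished sessions.
ever_active = conn.execute(
    "SELECT count(*) FROM session_state WHERE state = 'active'"
).fetchone()[0]

# Counts only rows where 'active' is the current (open-ended) state.
currently_active = conn.execute(
    "SELECT count(*) FROM session_state"
    " WHERE state = 'active' AND end_time IS NULL"
).fetchone()[0]

print(ever_active, currently_active)
```

Under these assumptions the first query counts the terminated session as well, while the second does not. Whether the real table has an equivalent of end_time would need to be checked against the deployed schema.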

Also saw this issue in version 0.1.2

boundary -version

Version information:
  Git Revision:        b5d84495a33b72a3139bd224d3cfcd4cbaad7b98
  Version Number:      0.1.3

@malnick
Collaborator

malnick commented Feb 2, 2021

Thanks for raising this @Akisaji

What is your architecture? Are you running Boundary workers and controllers in one process on a single box or are they broken out and independent?

I did attempt to repro this locally, and after starting a session to a redis instance, then killing the boundary server (v0.1.5), the redis session was terminated. After bringing Boundary back up, the session was shown as terminated in the UI and CLI.

Can you try upgrading to 0.1.5?

@malnick malnick self-assigned this Feb 2, 2021
@Akisaji
Author

Akisaji commented Feb 16, 2021

The worker and controller both run on the same instance.
Since the upgrade to 0.1.5 I haven't encountered the issue yet, so it might have been solved.
All the older sessions that are still "open" still refuse to be closed, though. Is there any way I can force these to stop? Or do I need to manually drop all the occurrences of the session in the DB?

@malnick
Collaborator

malnick commented Feb 18, 2021

I think this is actually confusion about the output of list sessions. That command will list all the states that the session has ever been in, with timestamps to indicate when it was in that state. If you look at the output you posted above, it looks like your session ended on Created Time: Thu, 21 Jan 2021 16:03:07 CET.
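
As a rough illustration of reading the list output this way, here is a small Python sketch that groups the Status lines by session ID and treats a session as finished once "terminated" appears among its states. The helper names (statuses_by_session, looks_finished) are hypothetical, and the parsing assumes the plain-text layout pasted earlier in this thread rather than any documented output format.

```python
from collections import defaultdict
from typing import Dict, List


def statuses_by_session(list_output: str) -> Dict[str, List[str]]:
    """Group Status values from `boundary sessions list` text output by session ID."""
    grouped = defaultdict(list)
    current_id = None
    for raw in list_output.splitlines():
        line = raw.strip()
        if line.startswith("ID:"):
            # "User ID:" and "Target ID:" do not start with "ID:", so they are skipped.
            current_id = line.split(":", 1)[1].strip()
        elif line.startswith("Status:") and current_id is not None:
            grouped[current_id].append(line.split(":", 1)[1].strip())
    return dict(grouped)


def looks_finished(statuses: List[str]) -> bool:
    """A session that has ever reached 'terminated' is treated as finished."""
    return "terminated" in statuses


# Abbreviated copy of the grep output posted above.
sample = """\
  ID:                 s_0P4gIy6IG6
    Status:           canceling
    User ID:          u_TWZ6052pWZ
--
  ID:                 s_0P4gIy6IG6
    Status:           terminated
    User ID:          u_TWZ6052pWZ
--
  ID:                 s_0P4gIy6IG6
    Status:           pending
    User ID:          u_TWZ6052pWZ
--
  ID:                 s_0P4gIy6IG6
    Status:           active
    User ID:          u_TWZ6052pWZ
"""

grouped = statuses_by_session(sample)
for session_id, statuses in grouped.items():
    print(session_id, statuses, "finished" if looks_finished(statuses) else "open")
```

With the sample above, the single session ID collects all four states and is reported as finished, which matches the "terminated" status shown by sessions read.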

@malnick malnick added the question Further information is requested label Feb 18, 2021
@Akisaji
Author

Akisaji commented Feb 24, 2021

Even after the upgrade the "open" session came back. I understand that the CLI lists all the states, and the final state is also shown as terminated.

Session information:
  Auth Token ID:        at_wqXvKRT7sw
  Created Time:         Fri, 29 Jan 2021 16:54:10 CET
  Endpoint:             tcp://ip:22
  Expiration Time:      Sat, 30 Jan 2021 00:54:10 CET
  Host ID:              hst_XVFRltHKnN
  Host Set ID:          hsst_ud1InqRiQR
  ID:                   s_tIXBBviQc5
  Status:               terminated
  Target ID:            ttcp_7HlT7IqIz4
  Termination Reason:   canceled
  Type:                 tcp
  Updated Time:         Wed, 24 Feb 2021 12:57:15 CET
  User ID:              u_TWZ6052pWZ
  Version:              9

  Scope:
    ID:                 p_j3z93Fl705
    Name:               terminator
    Parent Scope ID:    o_c4HqJ42HzW
    Type:               project

  States:
    Start Time:         Fri, 29 Jan 2021 17:29:23 CET
    Status:             terminated

    End Time:           Fri, 29 Jan 2021 17:29:23 CET
    Start Time:         Fri, 29 Jan 2021 17:28:10 CET
    Status:             canceling

    End Time:           Fri, 29 Jan 2021 17:28:10 CET
    Start Time:         Fri, 29 Jan 2021 16:54:10 CET
    Status:             active

    End Time:           Fri, 29 Jan 2021 16:54:10 CET
    Start Time:         Fri, 29 Jan 2021 16:54:10 CET
    Status:             pending

But in the web UI this session isn't shown as closed/terminated.
After further research I've noticed that this isn't caused by the restarts mentioned above. I think it is caused by using the -exec flag, as it seems only the targets that -exec has been used for have these "open" sessions. Most of the time it works correctly; just occasionally a session stays open, and only in the web UI, while the CLI actually says it's closed.

The -exec flag is mostly used to start sshuttle.

Screenshot from 2021-02-24 13-04-19

@ghost

ghost commented Mar 18, 2021

Hi there! Thanks for your interest in Boundary. I've made a number of attempts to replicate this issue, but I haven't been able to yet. Could you provide additional details about your environment? What version of Boundary are you running now? What command(s) did you use to initiate the session?

It looks like the latest screenshot shows two sessions with the same ID. This is an exceptionally rare occurrence in the UI. It might be resolved by refreshing the browser. If not, I wonder if you could screenshot or copy/paste the browser's developer console, in case there are any useful frontend error messages.

@Akisaji
Author

Akisaji commented Mar 23, 2021

@randallmorey
The current Boundary version is:

Version information:
  Git Revision:        ba6c0df8ca56eff0f01d9717da1b1435898408d3
  Version Number:      0.1.5

Our setup uses an EC2 instance that runs both the controller and the worker. In front of it is Route 53, which leads to an ALB for the controller; the worker port isn't behind a load balancer.
Controller config

disable_mlock = true

controller {
  name = "controller"
  description = "Boundary controller."

  database {
      url = "postgresql://..."
  }
}

# API listener configuration block
listener "tcp" {
  # Should be the address of the NIC that the controller server will be reached on
  address = "private_ip:9200"
  # The purpose of this listener block
  purpose = "api"

  tls_disable = true

}

# Data-plane listener configuration block (used for worker coordination)
listener "tcp" {
  # Should be the IP of the NIC that the worker will connect on
  address = "127.0.0.1:9201"
  # The purpose of this listener
  purpose = "cluster"

  tls_disable = true
}

kms "transit" {
  purpose    = "root"
  address            =
  token              = 

  key_name           = 
  mount_path         = "transit/"

  tls_skip_verify    = "true"
}

kms "transit" {
  purpose    = "worker-auth"
  address            = 
  token              = 

  key_name           = 
  mount_path         = "transit/"

  tls_skip_verify    = "true"
}

kms "transit" {
  purpose    = "recovery"
  address            = 
  token              = 

  key_name           = 
  mount_path         = "transit/"

  tls_skip_verify    = "true"
}

Worker config

listener "tcp" {
  address = "private_ip:9202"
  purpose = "proxy"
  tls_disable = true
}

worker {
  name = "boundary-worker"
  description = "Boundary worker."

  controllers = [
    "127.0.0.1",
  ]
  public_addr = "public_ip"
}

kms "transit" {
  purpose    = "worker-auth"
  address            = 
  token              = 

  key_name           = 
  mount_path         = "transit/"

  tls_skip_verify    = "true"
}

The issue doesn't always occur, and I'm not the only user it occurs for; some instances even have 3 sessions with the same ID open. There are also 3 different Boundary setups, and the same issue comes up on all 3.
All targets are configured with a session limit of -1.

Commands mostly used:

  • boundary connect -exec sshuttle -addr $addr -token $token -target-name $1 -target-scope-name $team -- -e 'ssh -q -o StrictHostKeyChecking=accept-new' --dns -vvr {{boundary.ip}}:{{boundary.port}} $2
  • boundary connect ssh -addr $addr -token $token -target-name $1 -target-scope-name $team -- -L $2:$3:$4
  • boundary connect ssh -addr $addr -token $token -target-name $1 -target-scope-name $team
  • boundary connect postgres -addr $addr -token $token -target-name $1 -target-scope-name $team -username $2 -- --dbname $3

It isn't one specific command that triggers this; it happens with all kinds of backends that are reached through different commands.

> This is an exceptionally rare occurrence in the UI. It might be resolved by refreshing the browser

If this were the case, I doubt I'd have the same issue on 3 instances that all run on different nodes, etc.

I didn't see any errors in the frontend of the web UI.

Currently I also have some sessions that are considered open from as far back as January.

If there are any more specific things you would like to see, feel free to ask.

@ghost

ghost commented Mar 23, 2021

Thanks for the additional information. The latest version of Boundary is 0.1.8. Do you experience this issue on the latest version?

@jsp-exhashi

jsp-exhashi commented Mar 29, 2021

I'm using 0.1.8 but still have an issue with session cancel.

❯ boundary sessions list -scope-id=p_bBZfEkUor6 | grep -A 6 s_Su72GEjZxN
  ID:                 s_Su72GEjZxN
    Status:           terminated
    Created Time:     Mon, 29 Mar 2021 21:10:50 KST
    Expiration Time:  Mon, 29 Mar 2021 21:11:50 KST
    Updated Time:     Mon, 29 Mar 2021 22:20:19 KST
    User ID:          u_XUObIl8QBr
    Target ID:        ttcp_BtT22pA3xC

  ID:                 s_Su72GEjZxN
    Status:           canceling
    Created Time:     Mon, 29 Mar 2021 21:10:50 KST
    Expiration Time:  Mon, 29 Mar 2021 21:11:50 KST
    Updated Time:     Mon, 29 Mar 2021 22:20:19 KST
    User ID:          u_XUObIl8QBr
    Target ID:        ttcp_BtT22pA3xC

  ID:                 s_Su72GEjZxN
    Status:           active
    Created Time:     Mon, 29 Mar 2021 21:10:50 KST
    Expiration Time:  Mon, 29 Mar 2021 21:11:50 KST
    Updated Time:     Mon, 29 Mar 2021 22:20:19 KST
    User ID:          u_XUObIl8QBr
    Target ID:        ttcp_BtT22pA3xC

@ghost

ghost commented Mar 29, 2021

@jsp-hashicorp what is the Boundary CLI output without piping through an additional command? And what is the output of boundary version?

@ghost

ghost commented Mar 29, 2021

@Akisaji I want to confirm, in addition to the previous request, that you upgraded your Boundary cluster to latest (not just the local CLI). Once you have everything upgraded to 0.1.8 and the migration is complete, let us know if you're able to reproduce this issue and which exec command(s) can still reproduce it.

I have tried replicating this behaviour in 0.1.8 unsuccessfully using various connect commands.

@jsp-exhashi

jsp-exhashi commented Mar 30, 2021

@randallmorey
Here's the output of boundary version

❯ boundary version

Version information:
  Git Revision:        c0f33f982c87c0eb4127cb16cf06b03a37b91dbd
  Version Number:      0.1.8

Without the pipe, the output of boundary sessions list -scope-id=p_bBZfEkUor displays 307 IDs like the following:

  ID:                    s_PRoVPuE4jO
    Status:              canceling
    Created Time:        Tue, 30 Mar 2021 09:12:55 KST
    Expiration Time:     Tue, 30 Mar 2021 09:13:55 KST
    Updated Time:        Tue, 30 Mar 2021 09:13:22 KST
    User ID:             u_XUObIl8QBr
    Target ID:           ttcp_BtT22pA3xC
    Authorized Actions:
      read
      read:self
      cancel
      cancel:self

  ID:                    s_criMqnwVZt
    Status:              active
    Created Time:        Tue, 30 Mar 2021 09:25:51 KST
    Expiration Time:     Tue, 30 Mar 2021 09:26:51 KST
    Updated Time:        Tue, 30 Mar 2021 09:26:14 KST
    User ID:             u_XUObIl8QBr
    Target ID:           ttcp_BtT22pA3xC
    Authorized Actions:
      read
      read:self
      cancel
      cancel:self

  ID:                    s_aR901r6OlH
    Status:              canceling
    Created Time:        Tue, 30 Mar 2021 09:26:05 KST
    Expiration Time:     Tue, 30 Mar 2021 09:27:05 KST
    Updated Time:        Tue, 30 Mar 2021 09:26:14 KST
    User ID:             u_XUObIl8QBr
    Target ID:           ttcp_BtT22pA3xC
    Authorized Actions:
      read
      read:self
      cancel
      cancel:self

  ID:                    s_lmUR5KVf2i
    Status:              canceling
    Created Time:        Tue, 30 Mar 2021 09:30:30 KST
    Expiration Time:     Tue, 30 Mar 2021 09:31:30 KST
    Updated Time:        Tue, 30 Mar 2021 09:31:19 KST
    User ID:             u_XUObIl8QBr
    Target ID:           ttcp_BtT22pA3xC
    Authorized Actions:
      read
      read:self
      cancel
      cancel:self

  ID:                    s_bXtV1HsLbp
    Status:              canceling
    Created Time:        Tue, 30 Mar 2021 09:32:23 KST
    Expiration Time:     Tue, 30 Mar 2021 09:33:23 KST
    Updated Time:        Tue, 30 Mar 2021 09:32:43 KST
    User ID:             u_XUObIl8QBr
    Target ID:           ttcp_BtT22pA3xC
    Authorized Actions:
      read
      read:self
      cancel
      cancel:self

Most of them are terminated or canceling, but a few active or pending sessions cannot be canceled.
Is there a way to forcefully clear those sessions?

@jsp-exhashi

I got this error from the Boundary UI:

unable to update session: session.(Repository).CancelSession: session.(Repository).updateState: error creating new state: db.DoTx: session.(Repository).updateState: updated session and 0 rows updated: search issue: error #1101

@jefferai
Member

jefferai commented Mar 30, 2021

We've identified that there is a bug where a session active on a worker when the worker goes down (permanently, or temporarily via e.g. a reboot) will never be marked as terminated. It doesn't mean any traffic can flow -- if the session is expired or is connection-count-limited and has no connections left, it will not continue to function -- but it does keep the state from transitioning.

We currently plan to address this in the 0.2.x series.
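
To make the described condition concrete, the sketch below flags sessions that are past their expiration time but have not reached the terminated state, which is how a session stranded by a worker outage would look from the outside. It is an illustration only: the Session class and stuck_sessions helper are hypothetical, the sample rows reuse IDs and expiration times posted earlier in this thread, and the value of now is an assumed point in time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# KST is UTC+9; the sample timestamps in this thread are given in KST.
KST = timezone(timedelta(hours=9))


@dataclass
class Session:
    id: str
    status: str
    expiration: datetime


def stuck_sessions(sessions: List[Session], now: datetime) -> List[Session]:
    """Return sessions that have expired but never transitioned to 'terminated'."""
    return [s for s in sessions if s.status != "terminated" and s.expiration < now]


sessions = [
    Session("s_Su72GEjZxN", "terminated", datetime(2021, 3, 29, 21, 11, 50, tzinfo=KST)),
    Session("s_PRoVPuE4jO", "canceling", datetime(2021, 3, 30, 9, 13, 55, tzinfo=KST)),
    Session("s_criMqnwVZt", "active", datetime(2021, 3, 30, 9, 26, 51, tzinfo=KST)),
]

# Hypothetical "current time", chosen to be after every expiration above.
now = datetime(2021, 3, 30, 10, 0, 0, tzinfo=KST)

stuck = stuck_sessions(sessions, now)
print([s.id for s in stuck])
```

A check like this only identifies candidates; it does not change any state, and an expired session in this list should already be unable to carry traffic according to the explanation above.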

@jefferai jefferai changed the title Sessions are not closing/terminated Sessions are not closing/terminated [ICU-1257] Mar 30, 2021
@jsp-exhashi

@jefferai Thanks for the update.

@Theragus
Contributor

Hi,

just noticed this issue open while also having kinda the same issue with #1055

Seems to be related?

@jefferai
Member

We've identified a number of tasks to try to ensure we properly detect and clean up stale sessions. Likely one of these patches will be most impactful to this situation. Would anyone be interested in testing out a potential patch once it's ready? It wouldn't completely fix all possible issues, but I think it will help here.

@Theragus
Contributor

Hi @jefferai ,

Yes, if there were a patch for this, we would roll it out to our current deployment to see if something changes in the environment. We currently have a couple of controllers and workers deployed in several public clouds to see how Boundary works out.

We're currently running Boundary 0.2.0 on all Clients and Servers.

@jefferai
Member

Hi @Theragus ,

I can send you a current build which is going to be very similar to the upcoming 0.2.2 release (so we don't anticipate any changes), but it requires database migrations so you'd either have to treat it as a one-way upgrade or upgrade, test, then revert back to a pre-upgrade database snapshot. If you want the build, let me know which platform(s) and I can send it along.

@Theragus
Contributor

Hi @jefferai,

All our workers and controllers are running on Ubuntu 18.04 LTS.
For our Boundary clients we have a mix of macOS and Arch Linux.

@jefferai
Member

Sounds good. I think we're going to stage 0.2.2 today, which means I could give you possibly-final binaries. Then you could test it out, and if all goes well, we'll release in a couple of days and it'll be the binary you already have.

@jefferai
Member

We didn't end up staging today, but here are some binaries for you to test. You'll need to run boundary database migrate. Be sure to back up beforehand :-D
darwin_amd64.zip
linux_amd64.zip

@Theragus
Contributor

Hi, as far as I can see, this has already been fixed in an older version of Boundary, correct?
