Wrong desired status in periodic tasks. #11017

Closed
TheSpbra1n opened this issue Aug 9, 2021 · 3 comments

TheSpbra1n commented Aug 9, 2021

Nomad version

Nomad v1.1.2 (60638a0)

Operating system and Environment details

Codename: xenial

Issue

We have a lot of old periodic tasks that are not cleaned up by GC.
All of them have the wrong desired status: "run".

Job file (if appropriate)

# job name
job "correct_suspended_tickets" {
  region      = "global"
  datacenters = ["dc1", "dc2"]

  type = "batch"

  periodic {
    cron             = "00,30 * * * * *"
    time_zone        = "Europe/Moscow"
    prohibit_overlap = true
  }

  constraint {
    attribute = "${node.class}"
    value     = "nomad1"
  }

  meta {
    # Application version
    APP_VERSION  = "0.0.1"
    START_SCRIPT = "xxx"
  }

  group "correct_suspended_tickets" {
    count = 1

    task "correct_suspended_tickets" {
      kill_signal = "SIGTERM"
      driver      = "docker"

      config {
        image                  = "docker-registry/container:${NOMAD_META_APP_VERSION}"
        advertise_ipv6_address = true
        network_mode           = "bridge"
        command                = "${NOMAD_META_START_SCRIPT}"
      }

      resources {
        memory = 512 # MB
      }
    }
  }
}

Example status of old periodic task:

# nomad status correct_suspended_tickets/periodic-1627907400
ID            = correct_suspended_tickets/periodic-1627907400
Name          = correct_suspended_tickets/periodic-1627907400
Submit Date   = 2021-08-02T15:30:00+03:00
Type          = batch
Priority      = 50
Datacenters   = dc1,dc2
Namespace     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group                 Queued  Starting  Running  Failed  Complete  Lost
correct_suspended_tickets  0       0         0        0       1         0

Allocations
ID        Node ID   Task Group                 Version  Desired  Status    Created   Modified
ff069489  92048b59  correct_suspended_tickets  0        run      complete  7d1h ago  19h50m ago
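
For anyone checking how old these child jobs are: the numeric suffix in the job ID appears to be the launch time in Unix seconds (1627907400 lines up with the Submit Date shown above). A minimal Go sketch for decoding it, assuming that `<parent>/periodic-<unix seconds>` layout; the helper name `launchTime` is made up for illustration and is not part of Nomad:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// launchTime is a hypothetical helper: it parses the trailing number of a
// periodic child job ID as Unix seconds and returns it as a UTC time.
// Assumption: the ID has the form "<parent>/periodic-<unix seconds>".
func launchTime(childID string) (time.Time, error) {
	const marker = "/periodic-"
	i := strings.LastIndex(childID, marker)
	if i < 0 {
		return time.Time{}, fmt.Errorf("%q does not look like a periodic child job ID", childID)
	}
	secs, err := strconv.ParseInt(childID[i+len(marker):], 10, 64)
	if err != nil {
		return time.Time{}, err
	}
	return time.Unix(secs, 0).UTC(), nil
}

func main() {
	ids := []string{
		"correct_suspended_tickets/periodic-1627907400",
		"correct_suspended_tickets/periodic-1627905600",
	}
	for _, id := range ids {
		launched, err := launchTime(id)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(id, "->", launched.Format(time.RFC3339))
	}
}
```

The first ID decodes to 2021-08-02T12:30:00Z, which is 15:30 in Europe/Moscow, the same instant as the Submit Date in the status output.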

Alloc status:

# nomad status ff069489
ID                  = ff069489-20c4-aba3-650e-845e90d03c25
Eval ID             = 79b5982f
Name                = correct_suspended_tickets/periodic-1627907400.correct_suspended_tickets[0]
Node ID             = 92048b59
Node Name           = docker-18
Job ID              = correct_suspended_tickets/periodic-1627907400
Job Version         = 0
Client Status       = complete
Client Description  = All tasks have completed
Desired Status      = run
Desired Description = <none>
Created             = 7d1h ago
Modified            = 19h51m ago

Task "correct_suspended_tickets" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
100 MHz  512 MiB  300 MiB

Task Events:
Started At     = 2021-08-02T12:31:39Z
Finished At    = 2021-08-02T12:31:41Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type      Description
2021-08-08T21:35:10+03:00  Killing   Sent interrupt. Waiting 5s before force killing
2021-08-08T21:35:08+03:00  Received  Task received by client
2021-08-08T03:51:17+03:00  Killing   Sent interrupt. Waiting 5s before force killing
2021-08-07T22:01:11+03:00  Received  Task received by client
2021-08-06T23:15:00+03:00  Killing   Sent interrupt. Waiting 5s before force killing
2021-08-06T22:20:19+03:00  Received  Task received by client
2021-08-06T00:50:11+03:00  Killing   Sent interrupt. Waiting 5s before force killing
2021-08-06T00:50:10+03:00  Received  Task received by client
2021-08-04T17:22:56+03:00  Killing   Sent interrupt. Waiting 5s before force killing
2021-08-04T03:50:46+03:00  Received  Task received by client
@TheSpbra1n
Author

Another task and alloc status:

# nomad status correct_suspended_tickets/periodic-1627905600
ID            = correct_suspended_tickets/periodic-1627905600
Name          = correct_suspended_tickets/periodic-1627905600
Submit Date   = 2021-08-02T15:00:00+03:00
Type          = batch
Priority      = 50
Datacenters   = dc1,dc2
Namespace     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group                 Queued  Starting  Running  Failed  Complete  Lost
correct_suspended_tickets  0       0         0        0       1         0

Allocations
ID        Node ID   Task Group                 Version  Desired  Status    Created   Modified
7a4493e2  ade44cb7  correct_suspended_tickets  0        run      complete  7d2h ago  18h46m ago

# nomad status 7a4493e2
ID                  = 7a4493e2-a447-648a-af6a-97a8538cfb57
Eval ID             = 983ae5ca
Name                = correct_suspended_tickets/periodic-1627905600.correct_suspended_tickets[0]
Node ID             = ade44cb7
Node Name           = docker-47
Job ID              = correct_suspended_tickets/periodic-1627905600
Job Version         = 0
Client Status       = complete
Client Description  = All tasks have completed
Desired Status      = run
Desired Description = <none>
Created             = 7d2h ago
Modified            = 18h46m ago

Task "correct_suspended_tickets" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
100 MHz  512 MiB  300 MiB

Task Events:
Started At     = 2021-08-02T12:01:21Z
Finished At    = 2021-08-02T12:01:22Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-08-08T22:50:06+03:00  Killing     Sent interrupt. Waiting 5s before force killing
2021-08-08T22:50:04+03:00  Received    Task received by client
2021-08-07T01:54:58+03:00  Killing     Sent interrupt. Waiting 5s before force killing
2021-08-07T01:54:56+03:00  Received    Task received by client
2021-08-06T03:20:09+03:00  Killing     Sent interrupt. Waiting 5s before force killing
2021-08-06T03:20:08+03:00  Received    Task received by client
2021-08-05T05:05:08+03:00  Killing     Sent interrupt. Waiting 5s before force killing
2021-08-05T05:05:05+03:00  Received    Task received by client
2021-08-04T16:42:49+03:00  Killing     Sent interrupt. Waiting 5s before force killing
2021-08-02T15:01:22+03:00  Terminated  Exit Code: 0

Contributor

lgfa29 commented Aug 19, 2021

Thanks for the report @TheSpbra1n. I think the desired status is not impacting GC here since it uses the allocation's TerminalStatus(), which would be true when the client status is complete, as in your case.
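
To make that reasoning concrete, here is a simplified Go model of a terminal-status check. It is a sketch, not a copy of Nomad's source: the `alloc` type and its method names are hypothetical, and the exact set of statuses treated as terminal (other than `complete`, mentioned above) is an assumption.

```go
package main

import "fmt"

// alloc is a hypothetical, stripped-down stand-in for an allocation.
// The status strings mirror the values printed by `nomad status`.
type alloc struct {
	DesiredStatus string // what the servers want, e.g. "run" or "stop"
	ClientStatus  string // what the client reports, e.g. "running" or "complete"
}

// serverTerminal reports whether the desired status alone marks the
// allocation as finished (assumed set: "stop", "evict").
func (a alloc) serverTerminal() bool {
	switch a.DesiredStatus {
	case "stop", "evict":
		return true
	}
	return false
}

// clientTerminal reports whether the client-side status alone marks the
// allocation as finished (assumed set: "complete", "failed", "lost").
func (a alloc) clientTerminal() bool {
	switch a.ClientStatus {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// terminal is true if either side considers the allocation finished, so a
// desired status of "run" does not block it once the client reports "complete".
func (a alloc) terminal() bool {
	return a.serverTerminal() || a.clientTerminal()
}

func main() {
	reported := alloc{DesiredStatus: "run", ClientStatus: "complete"}
	fmt.Println("terminal:", reported.terminal())
}
```

Under this model the allocations in the report (desired `run`, client `complete`) count as terminal, which is why the desired status by itself would not be expected to hold back GC.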

I'm going to close this issue and re-open #10456 since it has more context and information.

@lgfa29 lgfa29 closed this as completed Aug 19, 2021
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 16, 2022