
Periodic jobs duplicate pending on children job summary #8692

Closed
Brinuz opened this issue Aug 19, 2020 · 5 comments · Fixed by #9768
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.); theme/batch (Issues related to batch jobs and scheduling); type/bug
Milestone
1.0.4

Comments


Brinuz commented Aug 19, 2020

Nomad version

Nomad v0.12.3
(I could replicate it in all versions after v0.12.1, but not on v0.12.0)

Operating system and Environment details

OSX v10.15.5
I'm simply running the agent in -dev to reproduce it.

Issue

This seems to happen only on periodic jobs.

After the job runs normally or is forced, the children job summary ends up with 1 pending and 1 dead.
It seems the pending count initially increases to 2, then changes to 1 pending and 1 running, and eventually to 1 pending and 1 dead.

Keep in mind: I'm only running the job once, and in fact only one child job is launched.

If I let it run again, the same thing happens, except I end up with 2 pending and 2 dead (forever).

ID                   = test
Name                 = test
Submit Date          = 2020-08-19T15:27:10+01:00
Type                 = batch
Priority             = 50
Datacenters          = dc1
Namespace            = default
Status               = running
Periodic             = true
Parameterized        = false
Next Periodic Launch = 2020-08-19T15:07:00Z (34m45s from now)

Children Job Summary
Pending  Running  Dead
2        0        2

Previously Launched Jobs
ID                        Status
test/periodic-1597847482  dead
test/periodic-1597847530  dead

Of course, I would expect each launch to go from 1 pending to 1 running and end up as 1 dead, and so on.

Instead, with each new job launch there's always one extra pending left in the summary.

Reproduction steps

  1. Start the agent: nomad agent -dev
  2. Run the periodic job: nomad job run test.nomad (this schedules it to eventually run).
  3. Optional: force the job to run: nomad job periodic force test
  4. Wait a moment to let the job finish (or check on it to see it running).
  5. Run nomad job status test to check for output like the one below, with X pending and X dead.
ID                   = test
Name                 = test
Submit Date          = 2020-08-19T15:41:15+01:00
Type                 = batch
Priority             = 50
Datacenters          = dc1
Namespace            = default
Status               = running
Periodic             = true
Parameterized        = false
Next Periodic Launch = 2020-08-19T14:42:00Z (31s from now)

Children Job Summary
Pending  Running  Dead
1        0        1

Previously Launched Jobs
ID                        Status
test/periodic-1597848080  dead

Job file (if appropriate)

Job file from the example above (test.nomad):

job "test" {
  datacenters = ["dc1"]
  type = "batch"

  periodic {
    cron = "* * * * *"
    prohibit_overlap = true
  }

  group "group" {
    task "task" {
      driver = "docker"

      config {
        image = "alpine:latest"
        command = "ls"
      }
    }
  }
}

Nomad agent logs from start to finish.

==> No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
            Bind Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
                Client: true
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: true
               Version: 0.12.3

==> Nomad agent started! Log data will stream in below:

    2020-08-19T15:40:55.729+0100 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2020-08-19T15:40:55.729+0100 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2020-08-19T15:40:55.729+0100 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2020-08-19T15:40:55.729+0100 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2020-08-19T15:40:55.729+0100 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2020-08-19T15:40:55.729+0100 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2020-08-19T15:40:55.729+0100 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2020-08-19T15:40:55.729+0100 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2020-08-19T15:40:55.730+0100 [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:4647 Address:127.0.0.1:4647}]"
    2020-08-19T15:40:55.730+0100 [INFO]  nomad.raft: entering follower state: follower="Node at 127.0.0.1:4647 [Follower]" leader=
    2020-08-19T15:40:55.730+0100 [INFO]  nomad: serf: EventMemberJoin: Brunos-MacBook-Pro.local.global 127.0.0.1
    2020-08-19T15:40:55.730+0100 [INFO]  nomad: starting scheduling worker(s): num_workers=16 schedulers=[system, service, batch, _core]
    2020-08-19T15:40:55.730+0100 [INFO]  nomad: adding server: server="Brunos-MacBook-Pro.local.global (Addr: 127.0.0.1:4647) (DC: dc1)"
    2020-08-19T15:40:55.730+0100 [INFO]  client: using state directory: state_dir=/private/var/folders/6y/sbdf048s6tj5bb4l6swdqcm80000gn/T/NomadClient168750967
    2020-08-19T15:40:55.730+0100 [INFO]  client: using alloc directory: alloc_dir=/private/var/folders/6y/sbdf048s6tj5bb4l6swdqcm80000gn/T/NomadClient025092714
    2020-08-19T15:40:55.758+0100 [DEBUG] client.fingerprint_mgr: built-in fingerprints: fingerprinters=[arch, cni, consul, cpu, host, memory, network, nomad, signal, storage, vault, env_aws, env_gce]
    2020-08-19T15:40:55.758+0100 [DEBUG] client.fingerprint_mgr: CNI config dir is not set or does not exist, skipping: cni_config_dir=
    2020-08-19T15:40:55.758+0100 [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=consul period=15s
    2020-08-19T15:40:55.758+0100 [DEBUG] client.fingerprint_mgr.cpu: detected cpu frequency: MHz=2300
    2020-08-19T15:40:55.758+0100 [DEBUG] client.fingerprint_mgr.cpu: detected core count: cores=16
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected and no speed specified by user, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=lo0 IP=127.0.0.1
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=lo0 IP=::1
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:41:01.850+0100 [DEBUG] client: updated allocations: index=1 total=0 pulled=0 filtered=0
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.785+0100 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2020-08-19T15:40:55.788+0100 [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=vault period=15s
    2020-08-19T15:40:57.134+0100 [WARN]  nomad.raft: heartbeat timeout reached, starting election: last-leader=
    2020-08-19T15:40:57.134+0100 [INFO]  nomad.raft: entering candidate state: node="Node at 127.0.0.1:4647 [Candidate]" term=2
    2020-08-19T15:40:57.134+0100 [DEBUG] nomad.raft: votes: needed=1
    2020-08-19T15:40:57.134+0100 [DEBUG] nomad.raft: vote granted: from=127.0.0.1:4647 term=2 tally=1
    2020-08-19T15:40:57.134+0100 [INFO]  nomad.raft: election won: tally=1
    2020-08-19T15:40:57.134+0100 [INFO]  nomad.raft: entering leader state: leader="Node at 127.0.0.1:4647 [Leader]"
    2020-08-19T15:40:57.134+0100 [INFO]  nomad: cluster leadership acquired
    2020-08-19T15:40:57.136+0100 [INFO]  nomad.core: established cluster id: cluster_id=47606d41-60db-9b7a-f198-fc05a7b07d70 create_time=1597848057135864000
    2020-08-19T15:41:01.850+0100 [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=0
    2020-08-19T15:41:01.789+0100 [DEBUG] client.fingerprint_mgr.env_gce: could not read value for attribute: attribute=machine-type error="Get "http://169.254.169.254/computeMetadata/v1/instance/machine-type": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
    2020-08-19T15:41:01.789+0100 [DEBUG] client.fingerprint_mgr.env_gce: error querying GCE Metadata URL, skipping
    2020-08-19T15:41:01.789+0100 [DEBUG] client.fingerprint_mgr: detected fingerprints: node_attrs=[arch, cpu, host, network, nomad, signal, storage]
    2020-08-19T15:41:01.789+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    2020-08-19T15:41:01.789+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    2020-08-19T15:41:01.790+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=device
    2020-08-19T15:41:01.790+0100 [DEBUG] client.device_mgr: exiting since there are no device plugins
    2020-08-19T15:41:01.790+0100 [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=device
    2020-08-19T15:41:01.790+0100 [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=device
    2020-08-19T15:41:01.790+0100 [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=driver
    2020-08-19T15:41:01.790+0100 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=exec health=undetected description="exec driver unsupported on client OS"
    2020-08-19T15:41:01.790+0100 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=raw_exec health=healthy description=Healthy
    2020-08-19T15:41:01.790+0100 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=mock_driver health=healthy description=Healthy
    2020-08-19T15:41:01.850+0100 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=0 errors=0
    2020-08-19T15:41:01.790+0100 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=qemu health=undetected description=
    2020-08-19T15:41:01.793+0100 [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:4647] old_servers=[]
    2020-08-19T15:41:01.816+0100 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=healthy description=Healthy
    2020-08-19T15:41:01.849+0100 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=java health=healthy description=Healthy
    2020-08-19T15:41:01.849+0100 [DEBUG] client.driver_mgr: detected drivers: drivers="map[healthy:[raw_exec mock_driver docker java] undetected:[exec qemu]]"
    2020-08-19T15:41:01.849+0100 [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=driver
    2020-08-19T15:41:01.849+0100 [INFO]  client: started client: node_id=b464aded-c53f-ad31-26fd-ae2019dae6b7
    2020-08-19T15:41:01.851+0100 [INFO]  client: node registration complete
    2020-08-19T15:41:01.852+0100 [DEBUG] client: state updated: node_status=ready
    2020-08-19T15:41:02.851+0100 [DEBUG] client: state changed, updating node and re-registering
    2020-08-19T15:41:02.853+0100 [INFO]  client: node registration complete
    2020-08-19T15:41:15.440+0100 [DEBUG] nomad.periodic: registered periodic job: job="<ns: "default", id: "test">"
    2020-08-19T15:41:15.440+0100 [DEBUG] nomad.periodic: scheduled periodic job launch: launch_delay=44.55994s job="<ns: "default", id: "test">"
    2020-08-19T15:41:15.440+0100 [DEBUG] http: request complete: method=PUT path=/v1/jobs duration=1.62014ms
    2020-08-19T15:41:20.559+0100 [DEBUG] http: request complete: method=GET path=/v1/jobs?prefix=test duration=222.283µs
    2020-08-19T15:41:20.560+0100 [DEBUG] worker: dequeued evaluation: eval_id=0c9cb28c-0b82-14fb-0d7f-cd7750792e86
    2020-08-19T15:41:20.560+0100 [DEBUG] http: request complete: method=PUT path=/v1/job/test/periodic/force?namespace=default duration=570.576µs
    2020-08-19T15:41:20.560+0100 [DEBUG] worker.batch_sched: reconciled current state with desired state: eval_id=0c9cb28c-0b82-14fb-0d7f-cd7750792e86 job_id=test/periodic-1597848080 namespace=default results="Total changes: (place 1) (destructive 0) (inplace 0) (stop 0)
Desired Changes for "group": (place 1) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 0) (canary 0)"
    2020-08-19T15:41:20.561+0100 [DEBUG] http: request complete: method=GET path=/v1/evaluation/0c9cb28c-0b82-14fb-0d7f-cd7750792e86 duration=452.598µs
    2020-08-19T15:41:20.561+0100 [DEBUG] worker: submitted plan for evaluation: eval_id=0c9cb28c-0b82-14fb-0d7f-cd7750792e86
    2020-08-19T15:41:20.562+0100 [DEBUG] worker.batch_sched: setting eval status: eval_id=0c9cb28c-0b82-14fb-0d7f-cd7750792e86 job_id=test/periodic-1597848080 namespace=default status=complete
    2020-08-19T15:41:20.562+0100 [DEBUG] worker: updated evaluation: eval="<Eval "0c9cb28c-0b82-14fb-0d7f-cd7750792e86" JobID: "test/periodic-1597848080" Namespace: "default">"
    2020-08-19T15:41:20.562+0100 [DEBUG] client: updated allocations: index=12 total=1 pulled=1 filtered=0
    2020-08-19T15:41:20.562+0100 [DEBUG] client: allocation updates: added=1 removed=0 updated=0 ignored=0
    2020-08-19T15:41:20.563+0100 [DEBUG] worker: ack evaluation: eval_id=0c9cb28c-0b82-14fb-0d7f-cd7750792e86
    2020-08-19T15:41:20.563+0100 [DEBUG] http: request complete: method=GET path=/v1/evaluation/0c9cb28c-0b82-14fb-0d7f-cd7750792e86/allocations duration=1.100824ms
    2020-08-19T15:41:20.564+0100 [DEBUG] client: allocation updates applied: added=1 removed=0 updated=0 ignored=0 errors=0
    2020-08-19T15:41:20.565+0100 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: starting plugin: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task path=/usr/local/bin/nomad args=[/usr/local/bin/nomad, logmon]
    2020-08-19T15:41:20.567+0100 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin started: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task path=/usr/local/bin/nomad pid=8392
    2020-08-19T15:41:20.567+0100 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: waiting for RPC address: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task path=/usr/local/bin/nomad
    2020-08-19T15:41:20.575+0100 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: using plugin: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task version=2
    2020-08-19T15:41:20.575+0100 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: plugin address: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task @module=logmon address=/var/folders/6y/sbdf048s6tj5bb4l6swdqcm80000gn/T/plugin734508847 network=unix timestamp=2020-08-19T15:41:20.575+0100
    2020-08-19T15:41:20.577+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task @module=logmon path=/private/var/folders/6y/sbdf048s6tj5bb4l6swdqcm80000gn/T/NomadClient025092714/5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6/alloc/logs/.task.stdout.fifo timestamp=2020-08-19T15:41:20.577+0100
    2020-08-19T15:41:20.577+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task @module=logmon path=/private/var/folders/6y/sbdf048s6tj5bb4l6swdqcm80000gn/T/NomadClient025092714/5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6/alloc/logs/.task.stderr.fifo timestamp=2020-08-19T15:41:20.577+0100
    2020-08-19T15:41:20.706+0100 [DEBUG] client: updated allocations: index=14 total=1 pulled=0 filtered=1
    2020-08-19T15:41:20.706+0100 [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=1
    2020-08-19T15:41:20.706+0100 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=1 errors=0
    2020-08-19T15:41:21.567+0100 [DEBUG] http: request complete: method=GET path=/v1/evaluation/0c9cb28c-0b82-14fb-0d7f-cd7750792e86 duration=758.527µs
    2020-08-19T15:41:21.568+0100 [DEBUG] http: request complete: method=GET path=/v1/evaluation/0c9cb28c-0b82-14fb-0d7f-cd7750792e86/allocations duration=244.94µs
    2020-08-19T15:41:22.748+0100 [DEBUG] client.driver_mgr.docker: docker pull succeeded: driver=docker image_ref=alpine:latest
    2020-08-19T15:41:22.751+0100 [DEBUG] client.driver_mgr.docker: image reference count incremented: driver=docker image_name=alpine:latest image_id=sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e references=1
    2020-08-19T15:41:22.751+0100 [DEBUG] client.driver_mgr.docker: configured resources: driver=docker task_name=task memory=314572800 memory_reservation=0 cpu_shares=100 cpu_quota=0 cpu_period=0
    2020-08-19T15:41:22.751+0100 [DEBUG] client.driver_mgr.docker: binding directories: driver=docker task_name=task binds="[]string{"/private/var/folders/6y/sbdf048s6tj5bb4l6swdqcm80000gn/T/NomadClient025092714/5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6/alloc:/alloc", "/private/var/folders/6y/sbdf048s6tj5bb4l6swdqcm80000gn/T/NomadClient025092714/5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6/task/local:/local", "/private/var/folders/6y/sbdf048s6tj5bb4l6swdqcm80000gn/T/NomadClient025092714/5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6/task/secrets:/secrets"}"
    2020-08-19T15:41:22.751+0100 [DEBUG] client.driver_mgr.docker: networking mode not specified; using default: driver=docker task_name=task
    2020-08-19T15:41:22.751+0100 [DEBUG] client.driver_mgr.docker: setting container startup command: driver=docker task_name=task command=ls
    2020-08-19T15:41:22.751+0100 [DEBUG] client.driver_mgr.docker: applied labels on the container: driver=docker task_name=task labels=map[com.hashicorp.nomad.alloc_id:5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6]
    2020-08-19T15:41:22.751+0100 [DEBUG] client.driver_mgr.docker: setting container name: driver=docker task_name=task container_name=task-5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6
    2020-08-19T15:41:22.800+0100 [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=2da5b76896672d90e22321fa8ca295f9efa9a519205706ce506856f8e331f697
    2020-08-19T15:41:23.024+0100 [INFO]  client.driver_mgr.docker: started container: driver=docker container_id=2da5b76896672d90e22321fa8ca295f9efa9a519205706ce506856f8e331f697
    2020-08-19T15:41:23.024+0100 [DEBUG] client.driver_mgr.docker.docker_logger: starting plugin: driver=docker path=/usr/local/bin/nomad args=[/usr/local/bin/nomad, docker_logger]
    2020-08-19T15:41:23.027+0100 [DEBUG] client.driver_mgr.docker.docker_logger: plugin started: driver=docker path=/usr/local/bin/nomad pid=8417
    2020-08-19T15:41:23.027+0100 [DEBUG] client.driver_mgr.docker.docker_logger: waiting for RPC address: driver=docker path=/usr/local/bin/nomad
    2020-08-19T15:41:23.035+0100 [DEBUG] client.driver_mgr.docker.docker_logger.nomad: plugin address: driver=docker network=unix @module=docker_logger address=/var/folders/6y/sbdf048s6tj5bb4l6swdqcm80000gn/T/plugin615273124 timestamp=2020-08-19T15:41:23.035+0100
    2020-08-19T15:41:23.035+0100 [DEBUG] client.driver_mgr.docker.docker_logger: using plugin: driver=docker version=2
    2020-08-19T15:41:23.036+0100 [DEBUG] client.driver_mgr.docker.docker_logger.nomad: using client connection initialized from environment: driver=docker @module=docker_logger timestamp=2020-08-19T15:41:23.036+0100
    2020-08-19T15:41:23.101+0100 [DEBUG] client: updated allocations: index=15 total=1 pulled=0 filtered=1
    2020-08-19T15:41:23.101+0100 [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=1
    2020-08-19T15:41:23.101+0100 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=1 errors=0
    2020-08-19T15:41:23.178+0100 [DEBUG] client.driver_mgr.docker.docker_logger: plugin process exited: driver=docker path=/usr/local/bin/nomad pid=8417
    2020-08-19T15:41:23.178+0100 [DEBUG] client.driver_mgr.docker.docker_logger: plugin exited: driver=docker
    2020-08-19T15:41:23.179+0100 [DEBUG] client.alloc_runner.task_runner.task_hook.stats_hook: failed to start stats collection for task with unrecoverable error: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task error="container stopped"
    2020-08-19T15:41:23.183+0100 [DEBUG] client.driver_mgr.docker: image id reference count decremented: driver=docker image_id=sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e references=0
    2020-08-19T15:41:23.183+0100 [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task reason="Restart unnecessary as task terminated successfully"
    2020-08-19T15:41:23.184+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6
    2020-08-19T15:41:23.185+0100 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task path=/usr/local/bin/nomad pid=8392
    2020-08-19T15:41:23.185+0100 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin exited: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task
    2020-08-19T15:41:23.185+0100 [DEBUG] client.alloc_runner.task_runner: task run loop exiting: alloc_id=5c9ba6fc-3031-7ee1-e233-2f0aee40c9c6 task=task
    2020-08-19T15:41:23.302+0100 [DEBUG] client: updated allocations: index=16 total=1 pulled=0 filtered=1
    2020-08-19T15:41:23.302+0100 [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=1
    2020-08-19T15:41:23.302+0100 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=1 errors=0
    2020-08-19T15:41:29.807+0100 [DEBUG] http: request complete: method=GET path=/v1/jobs?prefix=test duration=265.894µs
    2020-08-19T15:41:29.809+0100 [DEBUG] http: request complete: method=GET path=/v1/job/test?namespace=default duration=1.267357ms
    2020-08-19T15:41:29.811+0100 [DEBUG] http: request complete: method=GET path=/v1/job/test/summary?namespace=default duration=217.174µs
    2020-08-19T15:41:29.812+0100 [DEBUG] http: request complete: method=GET path=/v1/jobs?prefix=test%2Fperiodic- duration=111.132µs
tgross (Member) commented Aug 24, 2020

Hi @Brinuz. Thanks for providing an easy reproduction on this. We'll take a look.

jorgemarey (Contributor) commented

Hi @tgross, I just encountered this too. Could this have something to do with the changes in #8435?

drewbailey (Contributor) commented Jan 8, 2021

Hey @jorgemarey, this is related to the changes in #8435. The team is discussing a few approaches to fix the issue.

The index == job.CreateIndex check was valid prior to #8435, but now setJobStatus runs twice, causing the count to increase inaccurately:

func (s *StateStore) setJobStatus(index uint64, txn *txn,
    job *structs.Job, evalDelete bool, forceStatus string) error {
    // Capture the current status so we can check if there is a change
    oldStatus := job.Status
    if index == job.CreateIndex {
        oldStatus = ""
    }
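To make the failure mode concrete, here is a minimal, self-contained Go sketch (not Nomad's actual code; the childrenSummary type and the simplified setJobStatus signature are hypothetical) of how blanking oldStatus at the create index double-counts a single child job when the function runs twice:

package main

import "fmt"

// childrenSummary is a hypothetical, simplified stand-in for Nomad's
// children job summary; the field names are illustrative only.
type childrenSummary struct {
    Pending, Running, Dead int
}

// setJobStatus mimics the flawed logic quoted above: when
// index == createIndex, oldStatus is blanked, so the "" -> "pending"
// transition is counted on every call, not once per child job.
func setJobStatus(s *childrenSummary, index, createIndex uint64, curStatus, newStatus string) {
    oldStatus := curStatus
    if index == createIndex {
        oldStatus = "" // every call at the create index looks like a fresh insert
    }
    if oldStatus != newStatus && newStatus == "pending" {
        s.Pending++
    }
}

func main() {
    var s childrenSummary
    // After #8435 the same child job is processed twice at its create index.
    setJobStatus(&s, 10, 10, "pending", "pending")
    setJobStatus(&s, 10, 10, "pending", "pending")
    fmt.Printf("pending=%d (expected 1)\n", s.Pending) // prints pending=2
}

Running the sketch prints pending=2 for a single child job; without the reset, the second call would see oldStatus == newStatus and leave the count at 1, matching the expected behavior described in the issue.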

I am working on further debugging and testing and will hopefully have another update soon.

tgross (Member) commented Jan 12, 2021

Possibly related: #9775

@tgross tgross added stage/accepted and removed stage/needs-investigation labels Jan 12, 2021
@tgross tgross added this to the 1.0.4 milestone Feb 5, 2021
github-actions (bot) commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2022