
Poor parameterized job scheduling performance #4736

Closed
plaisted opened this issue Sep 29, 2018 · 9 comments

@plaisted

Nomad version

Nomad v0.8.6 (ab54ebc+CHANGES)

Operating system and Environment details

Windows Server 2012 R2 Standard
"Development" nomad setup. Single server, shared nomad exe for client and server.

Issue

I'm researching replacing our current job scheduling setup with Nomad. Nomad is already used for some of our services, so it would make sense to use the same platform and a scheduler that takes resource allocation across both the services and the batch jobs into account. Reading https://www.hashicorp.com/blog/replacing-queues-with-nomad-dispatch suggests this is a viable option.

The blog does include a disclaimer,

Nomad Dispatch is not as suitable for high volumes of requests which take very short time to process. Nomad can schedule thousands of tasks per second, but the overhead of scheduling is very high. For most long-running processes, this time is amortized over the life of the job, but for processes that complete very quickly, the scheduling component may increase total processing time, especially in large quantities.

but the performance I'm observing seems like it would be prohibitive in many batch-processing scenarios where jobs would otherwise be queued.

Some results below:
[screenshot of timing results]

Scheduling 100 jobs to run without any parallelism results in jobs that take 3 seconds of work taking ~6.5 seconds per execution, meaning scheduling/startup alone takes about 3.5 seconds on average. This seems slow but not prohibitive. Once the level of parallelism is raised (10+ jobs in parallel, by adjusting the job's required resources), scheduling slows down by nearly an order of magnitude, taking 20+ seconds per job. Increasing the level of parallelism much beyond this caused Nomad to stop functioning after queuing a small portion of the jobs.

Our current queued approach adds on the order of 50-100ms per job for scheduling/setup. Obviously Nomad brings a lot more to the table than just queuing/scheduling, so there would be trade-offs, but at 20-30 seconds per job it seems like something is either going wrong with the system or there's a lot of room for improvement. Is there any way to reduce the scheduling time, or is this just normal for Nomad?
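For reference, a rough way to estimate the per-job overhead from the files test.bat writes (a sketch of one possible approach, not necessarily how the numbers above were produced; it assumes each file's LastWriteTime marks that job's completion and that the jobs ran with no parallelism):

# Estimate the average time between job completions from the output files,
# then subtract the ~3s the task itself takes to get the scheduling/startup overhead.
$files  = Get-ChildItem c:\temp\*.txt | Sort-Object LastWriteTime
$span   = ($files[-1].LastWriteTime - $files[0].LastWriteTime).TotalSeconds
$perJob = $span / ($files.Count - 1)
"{0} jobs, {1:N1}s per job, ~{2:N1}s overhead per job" -f $files.Count, $perJob, ($perJob - 3)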

Reproduction steps

  1. Add the job file below. The job/bat simply writes its meta argument to a txt file so the completion time of each run can be tracked.
  2. Burst-schedule a large number of jobs (e.g. 1000). I'm working in a Windows environment for this, so I'm using PowerShell: 1..1000 | % { Start-Process -FilePath .\nomad.exe -ArgumentList "job","dispatch","-meta","TEXT=$_","test-param" -WindowStyle Hidden}
  3. Run different numbers of jobs at different levels of parallelism (by adjusting the resources required); see the worked example after this list.
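To make step 3 concrete: the degree of parallelism is roughly the node's advertised resources divided by the per-task ask. The node numbers below are illustrative, not measured:

node CPU ~2,500 MHz, task cpu = 250   ->  ~10 allocations can run in parallel
task cpu raised to ~2,500 MHz         ->  1 allocation at a time (no parallelism)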

Job file (if appropriate)


job "test-param" {
  type = "batch"
  
  datacenters = ["dc1"]

  parameterized {
    meta_required = ["TEXT"]
  }

  group "text-sayer" {
    # ...

    task "say-text" {

      driver = "raw_exec"

        resources {
            cpu = 250
            memory = 10
        }
      config {
        command = "c:/temp/test.bat"
        args = ["${NOMAD_META_TEXT}"]
      }
    }
  }
}

c:/temp/test.bat

REM ping localhost sends four pings at roughly one-second intervals, acting as the ~3-second task
ping localhost
REM write the meta argument to a file so the completion time can be observed
echo %1 > c:\temp\%1.txt
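As a side note on the reproduction itself: spawning nomad.exe once per dispatch adds process-startup time that has nothing to do with Nomad's scheduler. A lighter-weight way to burst-dispatch, if you want to rule that out, is the agent's HTTP dispatch endpoint (POST /v1/job/:job_id/dispatch); a minimal sketch against the default local agent address:

# Dispatch 1000 parameterized jobs through the local agent's HTTP API
# instead of starting nomad.exe once per job.
1..1000 | ForEach-Object {
    $body = @{ Meta = @{ TEXT = "$_" } } | ConvertTo-Json
    Invoke-RestMethod -Method Post -ContentType "application/json" `
        -Uri "http://127.0.0.1:4646/v1/job/test-param/dispatch" -Body $body | Out-Null
}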
@Miserlou

Miserlou commented Oct 1, 2018

Related: #4697

Some other reasons you probably don't want to use parameterized Nomad jobs for production workloads:

#4639
#4323

(If you're still evaluating this software, my recommendation is that it is not production-ready yet.)

@plaisted
Author

plaisted commented Oct 1, 2018

@Miserlou I agree with your overall assessment of the current state of batch scheduling in Nomad.

I noticed issues similar to #4323 when batching large numbers of jobs; however, I was just using a single-server setup at our DC, not a large cluster. A portion of the jobs would be scheduled at a consistent rate, then CPU usage would spike and scheduling would essentially stop (a couple of jobs, 1-3ish, would schedule a minute after that). Hard-restarting the Nomad services would allow a short period of normal scheduling, but then the problem would recur. I had to wipe the Nomad state to get it functioning again.

@Miserlou

Miserlou commented Oct 1, 2018

Yeah, that problem gets worse as you add more jobs. For instance, there is no ability to remove jobs in batches, so if you need to make a change, it has to be done one by one, which can take many hours. You'll find that sort of issue at every possible turn.
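For illustration, that one-by-one removal looks something like the sketch below. Dispatched children share the parent job ID as a prefix, so they can at least be enumerated, but each one still needs its own stop call:

# Enumerate the dispatched children of test-param and stop/purge them one at a time.
# There is no bulk delete, so this is still one call per job.
$children = Invoke-RestMethod "http://127.0.0.1:4646/v1/jobs?prefix=test-param/"
foreach ($job in $children) {
    nomad.exe job stop -detach -purge $job.ID | Out-Null
}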

We went with Nomad because of the claims made in the blog post you linked and it has been a total disaster. A lot of my life is now spent fighting with this software.

Poor performance:
#4697

Poor stability:
#4323

Poor telemetry:
#4422

Basic features missing and poor responsiveness from maintainers:
#4639

I get the impression that HC has never used Nomad dispatch jobs for production workloads, and they do not have the resources invested in this product to make it competitive with other offerings like Kubernetes.

My two cents.

@dadgar
Contributor

dadgar commented Oct 1, 2018

@plaisted I believe your issue is that you are running all of this on a single machine, and the measurement method is odd since placement time depends on other jobs finishing on that machine. You are essentially creating a head-of-line blocking scenario.

On my laptop I did the following:

  1. nomad agent -dev
  2. Modify the job's resource ask to memory = 100000 so my local machine wouldn't be able to run it. This takes running the tasks out of the equation and lets us test scheduler performance.
  3. I ran this script: https://gist.github.com/dadgar/a8da80b1a1d7943d1522ca570b21a40f

The end result: 2018/10/01 11:17:58 Submitting jobs took: 412.640895ms
I dispatched 1000 jobs in less than half a second, so scheduling speed is not the issue here; it is the setup of the test.

Hope this helps.

@dadgar dadgar closed this as completed Oct 1, 2018
@plaisted
Author

plaisted commented Oct 1, 2018

@dadgar Maybe I'm using the term scheduling wrong. The jobs are accepted by Nomad nearly instantly, as you have noted. Based on https://www.nomadproject.io/docs/internals/scheduling.html it appears Nomad considers allocation to be part of "scheduling", which your example does not include. Regardless, I'm referring to the time it takes for a job to be allocated and run. None of the jobs in your test get allocated, which is the slow part in my example.

My test takes into account how long the tasks themselves take to run. Running 1000 jobs that take 3 seconds each with no parallelism would take 3,000 seconds in a perfect world. Running that scenario in Nomad actually takes 6,000 seconds, meaning the system spends half the time either idle or working to allocate/schedule the tasks. If we run jobs with a high degree of parallelism it gets much worse: instead of half the time, Nomad spends closer to 90% of the time idle or determining where to schedule jobs.
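One way to measure that allocation delay directly is to compare when a dispatched job was accepted with when its allocation was created. A sketch using the local agent's HTTP API (SubmitTime on the job and CreateTime on the allocation are both Unix nanoseconds):

# Dispatch one job, wait for it to be placed, then compute the placement lag.
$resp  = Invoke-RestMethod -Method Post -ContentType "application/json" `
         -Uri "http://127.0.0.1:4646/v1/job/test-param/dispatch" `
         -Body (@{ Meta = @{ TEXT = "lag-check" } } | ConvertTo-Json)
Start-Sleep -Seconds 10
$jobId = $resp.DispatchedJobID
$job   = Invoke-RestMethod "http://127.0.0.1:4646/v1/job/$jobId"
$alloc = (Invoke-RestMethod "http://127.0.0.1:4646/v1/job/$jobId/allocations")[0]
"Placement lag: {0:N1}s" -f (($alloc.CreateTime - $job.SubmitTime) / 1e9)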

@dadgar
Contributor

dadgar commented Oct 1, 2018

@plaisted Can you run your test again with the following and report back:
$ nomad agent -dev -config config.hcl

config.hcl:

client {
    gc_max_allocs = 1000
    gc_parallel_destroys = 16
}
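For context, my understanding of these settings (worth checking against the client docs for your version): gc_max_allocs raises the number of terminal allocations a client keeps before it starts garbage-collecting them, and gc_parallel_destroys controls how many allocation directories it destroys concurrently. If GC is the bottleneck, a rough way to watch the backlog while the test runs:

# Count completed allocations the agent is still tracking.
$allocs = Invoke-RestMethod "http://127.0.0.1:4646/v1/allocations"
($allocs | Where-Object { $_.ClientStatus -eq "complete" }).Count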

@plaisted
Author

plaisted commented Oct 1, 2018

This helped. For a degree of parallelism of 1, the scheduling time per task went from ~3.5s to ~2s. For a high degree of parallelism (17), it dropped from ~25s to ~2s.

I'll go ahead and close this issue, as 2 seconds seems to fall within the expectations set in the documentation/blog. It would be nice if this could be further optimized, as there are a lot of use cases where 2s of added latency would be considered too high.

@plaisted plaisted closed this as completed Oct 1, 2018
@dadgar
Contributor

dadgar commented Oct 1, 2018

@plaisted Agreed. We will be looking into this code path to optimize it, hopefully for the next release.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 28, 2022