From `man 1 ps`:
D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced
Z defunct ("zombie") process, terminated but not reaped by its parent
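
As a rough illustration of the state codes above, the following sketch counts how many processes are currently in each state by reading `/proc/<pid>/stat`. The helper names `state_from_stat` and `count_states` are made up for this example and do not belong to any existing tool. It looks only at each process's main thread, while the kernel accounts for every task (thread), so the numbers are an approximation.

```python
import os
from collections import Counter


def state_from_stat(stat_line: str) -> str:
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state field.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(proc_root: str = "/proc") -> Counter:
    """Count processes per state letter (hypothetical helper for this sketch)."""
    counts = Counter()
    if not os.path.isdir(proc_root):
        return counts  # not a Linux system, or /proc is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[state_from_stat(f.read())] += 1
        except (OSError, IndexError):
            continue  # the process exited while we were scanning
    return counts


if __name__ == "__main__":
    counts = count_states()
    for state, n in sorted(counts.items()):
        print(state, n)
    # Processes that are runnable or in uninterruptible sleep right now
    print("R + D:", counts["R"] + counts["D"])
```

Running it a few times while the stress scripts from the tasks below are active should show the `R` and `D` counts change.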
- What is the Load Average metric? Use the Linux Process States above and `man 5 proc` (search for loadavg).
- Start the disk stress script (NOTE: avoid running it on your own SSD): `scripts/disk/writer.sh`
- Run the following command and watch the load values for about a minute, until `ldavg-1` stabilizes: `sar -q 1 100`
  - What is the write speed of our script (ignore the first value, which reflects the EBS General Purpose IOPS burst)?
  - What is the current Load Average? Why? Which processes contribute to this number?
  - What are the CPU %user, %iowait and %idle?
- While the previous script is running, start a single CPU stress: `stress -c 1 -t 3600`
  Wait another minute, and answer the questions above again.
- Stop all the scripts.
- Why are processes waiting for IO included in the Load Average?
- Assuming we have 1 CPU core and a load of 5, is our CPU core at 100% utilization?
- How can we know if load is going up or down?
- Does a load average of 70 indicate a problem?
- Most tools use `/proc/loadavg` to fetch Load Average and run queue information.
- Have a look at `/proc/pressure/cpu` for the new PSI metrics.
- To sample the load at a fixed interval, you can use:
  - `sar -q <interval> <count>` (`-q`: queue length and load averages)
  - or simply `uptime`
- Use eBPF-based tools to observe the run queue: `runqlat` and `runqlen`.
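
To make the two `/proc` files mentioned above more concrete, here is a minimal sketch that parses their text format. The function names `parse_loadavg` and `parse_psi` are illustrative only. It assumes the layout described in `man 5 proc` for `/proc/loadavg` (three averages, `runnable/total` scheduling entities, last PID) and the `some`/`full` lines with `key=value` pairs used by the PSI files. `/proc/pressure/cpu` may be absent on older kernels or when PSI is disabled, so the script skips files it cannot read.

```python
def parse_loadavg(text: str) -> dict:
    """Parse the single line of /proc/loadavg into named fields."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "runnable": int(runnable),  # currently runnable scheduling entities
        "total": int(total),        # scheduling entities that exist
        "last_pid": int(last_pid),  # PID of the most recently created process
    }


def parse_psi(text: str) -> dict:
    """Parse a PSI file such as /proc/pressure/cpu into nested dicts."""
    result = {}
    for line in text.strip().splitlines():
        kind, *pairs = line.split()  # kind is "some" or "full"
        result[kind] = {}
        for pair in pairs:
            key, value = pair.split("=")
            result[kind][key] = float(value)
    return result


if __name__ == "__main__":
    for path, parser in (("/proc/loadavg", parse_loadavg),
                         ("/proc/pressure/cpu", parse_psi)):
        try:
            with open(path) as f:
                print(path, parser(f.read()))
        except OSError:
            print(path, "not available on this system")
```

In the PSI output, the `avg10`, `avg60` and `avg300` fields are percentages of time over the last 10, 60 and 300 seconds, and `total` is a cumulative stall time in microseconds.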