What is ‘Load Average’?

Load average is a metric that has existed since the 1970s. It indicates whether a system is under heavy, average, or low load, and whether that load is on an increasing or a decreasing trend. In this article, let’s learn more about ‘Load Average’.

How to understand ‘Load Average’?

In most cases, ‘Load Average’ is reported over three intervals: the last 1 minute, 5 minutes, and 15 minutes. The example below shows one such report:

Fig: Load Average

1-minute load average is 6.00

5-minute load average is 5.48

15-minute load average is 3.25

It’s hard to say whether ‘Load Average’ is high, normal, or low without knowing the number of CPUs present in the system. You can find out the number of CPUs in the system through one of the approaches given here.
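As one possible approach (a minimal sketch in Python, which this article does not itself prescribe), the standard library's `os.cpu_count()` returns the number of logical CPUs, or `None` when it cannot be determined. The helper name `logical_cpu_count` is hypothetical, and the fallback to 1 is an assumption made only so that later divisions stay safe.

```python
import os


def logical_cpu_count() -> int:
    """Return the number of logical CPUs, falling back to 1 if unknown."""
    # os.cpu_count() may return None on platforms where the count
    # cannot be determined; treating that as 1 is an assumption.
    return os.cpu_count() or 1


print(f"Logical CPUs: {logical_cpu_count()}")
```

Note that this counts logical CPUs (hardware threads), which may be more than the number of physical cores.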

 1 CPU: 

In the above example, the 1-minute ‘Load Average’ is ‘6.00’. Suppose the system has just 1 CPU. Then the ‘Load Average’ is quite high on this machine: over the last minute, demand was 600% of what the system can handle. We derive 600% because

= (Load Average / Number of CPUs) x 100

= (6.00 / 1) x 100

= 600%

Thus, demand exceeds the capacity the system can handle by 500%.

 12 CPUs:

Suppose the above system has 12 CPUs. Then the ‘Load Average’ is normal. Since the 1-minute ‘Load Average’ is ‘6.00’, only 50% of the compute capacity is utilized. We derive 50% because

= (Load Average / Number of CPUs) x 100

= (6.00 / 12) x 100

= 50%

 24 CPUs:

Suppose the above system has 24 CPUs. Then the ‘Load Average’ is quite low. Since the 1-minute ‘Load Average’ is ‘6.00’, only 25% of the compute capacity is utilized. We derive 25% because

= (Load Average / Number of CPUs) x 100

= (6.00 / 24) x 100

= 25%

In this scenario, we can fairly conclude that this system is underutilized.
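The formula used in the three scenarios above can be sketched in a few lines of Python. The function name `load_percent` is hypothetical; the arithmetic is simply (Load Average / Number of CPUs) x 100.

```python
def load_percent(load_average: float, cpus: int) -> float:
    """Express a load average as a percentage of available CPU capacity."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load_average / cpus * 100


# The same 1-minute load average evaluated against the three CPU counts
# discussed above.
for cpus in (1, 12, 24):
    print(f"{cpus:>2} CPUs: {load_percent(6.00, cpus):.2f}%")
```

Running the loop makes it easy to see how the same load average can mean an overloaded, a normally loaded, or an underutilized system depending on the CPU count.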

What is the use of 1 minute, 5 minutes, 15 minutes ‘Load Average’?

Comparing the three ‘Load Average’ intervals helps determine whether the system’s demand for compute is increasing or decreasing over time.

 Increasing Load average

 Fig: Increasing Load Average

Look at the ‘Load Average’ report above. The load on this system has been increasing: the 15-minute ‘Load Average’ is ‘3.25’, the 5-minute ‘Load Average’ is ‘5.48’, and the 1-minute ‘Load Average’ is ‘6.00’. Because the average over the most recent minute is higher than the averages over the longer windows, the demand for compute is increasing on this system.

 Decreasing Load average

 Fig: Decreasing Load Average

Look at the ‘Load Average’ report above. The load on this system has been decreasing: the 15-minute ‘Load Average’ is ‘5.05’, the 5-minute ‘Load Average’ is ‘3.53’, and the 1-minute ‘Load Average’ is ‘0.42’. Because the average over the most recent minute is lower than the averages over the longer windows, the load on this system has been on a decreasing trend.
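The comparison described in the two examples above can be sketched as a small Python function. The name `load_trend` and the "mixed" label for readings that are neither strictly rising nor strictly falling are assumptions made for illustration.

```python
def load_trend(one_min: float, five_min: float, fifteen_min: float) -> str:
    """Classify the trend by comparing shorter windows with longer ones."""
    if one_min > five_min > fifteen_min:
        return "increasing"
    if one_min < five_min < fifteen_min:
        return "decreasing"
    # Anything else (equal or non-monotonic values) is left unclassified.
    return "mixed"


# Arguments are in the order the tools report them: 1, 5, 15 minutes.
print(load_trend(6.00, 5.48, 3.25))
print(load_trend(0.42, 3.53, 5.05))
```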

Does ‘Load Average’ measure only CPU demand?

It is often said that ‘Load Average’ only indicates the CPU demand on the system, but that’s not true. ‘Load Average’ indicates not only CPU demand but also file I/O demand, network I/O demand, disk I/O demand, and cycles waiting for locks. Here is an interesting case study we conducted to prove this theory.

What is a good or bad ‘Load Average’?

Load average is a fairly relative measure. What might be a good ‘Load Average’ for one application can be a bad one for another. As a rule of thumb, if the ‘Load Average’ percentage (‘Load Average’ divided by the number of CPUs, times 100) goes beyond 80%, you might want to investigate it.
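The 80% rule of thumb could be expressed as a simple check. This is only a sketch: the function name `needs_investigation` and its `threshold_percent` parameter are hypothetical, and the right threshold depends on the application.

```python
def needs_investigation(
    load_average: float, cpus: int, threshold_percent: float = 80.0
) -> bool:
    """Return True when load, as a percentage of CPUs, exceeds the threshold."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load_average / cpus * 100 > threshold_percent


# A load average of 10.0 on 12 CPUs is about 83%, just over the threshold.
print(needs_investigation(10.0, cpus=12))
# A load average of 6.00 on 12 CPUs is 50%, comfortably under it.
print(needs_investigation(6.00, cpus=12))
```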

How to find ‘Load Average’?

Load Average can be found from various sources:

a. The Unix/Linux command-line tool ‘top’ reports ‘Load Average’ in the field highlighted in the image below.

Fig: Load average reported in ‘top’ command

b. The Unix/Linux command-line tool ‘uptime’ reports ‘Load Average’ in the field highlighted in the image below.

Fig: Load average reported in ‘uptime’ command

 c. ‘Load average’ is also printed in the ‘/proc/loadavg’ file.

Fig: Load average reported in ‘/proc/loadavg’

 d. You can use web-based root cause analysis tools like yCrash to report load average.
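For programmatic access, the ‘/proc/loadavg’ file mentioned above can be read directly. The sketch below assumes Linux, where the first three whitespace-separated fields of that file are the 1-, 5-, and 15-minute load averages. The names `LoadAverage` and `parse_loadavg` are hypothetical, and the sample line used as a fallback is made up for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class LoadAverage(NamedTuple):
    one_min: float
    five_min: float
    fifteen_min: float


def parse_loadavg(text: str) -> LoadAverage:
    """Parse the first three fields of a /proc/loadavg-style line."""
    fields = text.split()
    return LoadAverage(*(float(f) for f in fields[:3]))


proc_file = Path("/proc/loadavg")
if proc_file.exists():
    # Linux: read the live values.
    print(parse_loadavg(proc_file.read_text()))
else:
    # Elsewhere: fall back to an illustrative, made-up sample line.
    print(parse_loadavg("6.00 5.48 3.25 2/345 6789"))
```

On Unix-like systems, Python's standard library also offers `os.getloadavg()`, which returns the same three values as a tuple without parsing a file.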

Conclusion

Load Average is a good metric that has been in existence since the early 1970s. It gives a high-level pulse of the system. There must be good reasons why this metric has survived for more than half a century. But if you want to do a detailed root cause analysis of where performance degradation is happening, ‘Load Average’ alone wouldn’t be sufficient. You will want to use other tools like top, vmstat, iostat, yCrash,…

We would also like to conclude this article the same way Brendan Gregg concluded his ‘Load Average’ blog post, with a quote from a comment in the Linux source file kernel/sched/loadavg.c written by scheduler maintainer Peter Zijlstra: