CPU consumption in Unix/Linux operating systems is broken down into 8 metrics: user CPU time, system CPU time, nice CPU time, idle CPU time, waiting (I/O wait) CPU time, hardware interrupt CPU time, software interrupt CPU time, and stolen CPU time. In this article, let us study ‘steal’ (or ‘stolen’) CPU time.
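On Linux, these eight metrics correspond to cumulative counters (in clock ticks since boot) on the aggregate `cpu` line of `/proc/stat`, in the order user, nice, system, idle, iowait, irq, softirq, steal, as documented in proc(5). The minimal Python sketch below, assuming that layout, maps the columns to names; the function name `parse_cpu_line` is hypothetical and not part of any tool mentioned in this article.

```python
from typing import Dict

# Column order of the "cpu" line in /proc/stat (see proc(5)). Newer kernels
# append guest and guest_nice after steal; those are ignored here.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Map the first eight counters of a /proc/stat 'cpu' line to metric names."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError(f"not a cpu line: {line!r}")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels omit trailing columns (e.g. steal); treat them as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))
```

On a Linux machine, `parse_cpu_line(open("/proc/stat").readline())["steal"]` returns the cumulative steal counter for all CPUs combined.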
What is ‘steal’ CPU time?
‘Steal time’ (also known as ‘stolen’ time) is relevant only in virtualized environments, such as cloud platforms (like AWS) or VMware, where multiple virtual machines run on one underlying physical host. In such environments, CPU resources are shared among the virtual machines. The hypervisor is the software layer that distributes the physical host’s CPU and other resources among the virtual machines.
Fig: Hypervisor – virtual machine
Steal time (or stolen time) is the *percentage of time a virtual machine CPU waits for a real CPU while the hypervisor is servicing other virtual machines*. If steal time is high on a particular virtual machine, it indicates that the virtual machine is running on an overloaded physical host. Companies like Netflix monitor stolen CPU time closely: if it goes beyond a threshold, the virtual machine is shut down on that physical host and relaunched on another one.
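Because the kernel counters are cumulative, the steal percentage for a time window is derived from two samples: the change in the steal counter divided by the change in all eight counters, times 100. This is roughly how `top` arrives at its `st` figure. The sketch below shows only that arithmetic; `steal_percent` is a hypothetical helper and the sample numbers in the test are made up.

```python
from typing import Sequence


def steal_percent(before: Sequence[int], after: Sequence[int]) -> float:
    """Steal time as a percentage of all CPU time elapsed between two samples.

    Each sample holds the eight cumulative counters in /proc/stat order:
    user, nice, system, idle, iowait, irq, softirq, steal.
    """
    if len(before) != 8 or len(after) != 8:
        raise ValueError("expected eight counters per sample")
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no CPU time elapsed between the two samples
    return 100.0 * deltas[7] / total  # index 7 is the steal counter
```

For example, if 100 ticks elapse across all states and 58 of them are steal ticks, the function returns 58.0.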
How to find ‘steal’ CPU time?
Steal CPU time can be found from the following sources:
a. You can use web-based root cause analysis tools like yCrash to report ‘stolen’ CPU time. The tool can also generate alerts when ‘stolen’ CPU time exceeds a threshold.
b. ‘Stolen’ CPU time is also reported by the Unix/Linux command-line tool ‘top’, in the ‘st’ field, as highlighted in the image below.
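To watch steal time continuously and flag spikes, one option is to sample `/proc/stat` at a fixed interval and compare each interval’s steal percentage with a threshold. This is a sketch under stated assumptions, not how yCrash or `top` is implemented: `read_cpu_counters`, `steal_alert`, `monitor`, and the 10% threshold are all hypothetical choices, and the code assumes a Linux `/proc/stat` whose first line is the aggregate `cpu` line.

```python
import time
from typing import List, Tuple

STEAL_INDEX = 7          # 8th counter on the "cpu" line of /proc/stat
STEAL_THRESHOLD = 10.0   # hypothetical alert threshold, in percent


def read_cpu_counters(path: str = "/proc/stat") -> List[int]:
    """Return the first eight cumulative counters from the aggregate cpu line."""
    with open(path) as f:
        fields = f.readline().split()
    values = [int(v) for v in fields[1:9]]
    return values + [0] * (8 - len(values))


def steal_alert(before: List[int], after: List[int],
                threshold: float = STEAL_THRESHOLD) -> Tuple[float, bool]:
    """Compute steal % over the interval and report whether it exceeds threshold."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    pct = 100.0 * deltas[STEAL_INDEX] / total if total > 0 else 0.0
    return pct, pct > threshold


def monitor(interval: float = 3.0, samples: int = 10,
            threshold: float = STEAL_THRESHOLD) -> None:
    """Print steal % for `samples` intervals, flagging values over threshold."""
    prev = read_cpu_counters()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu_counters()
        pct, alert = steal_alert(prev, cur, threshold)
        print(f"st: {pct:5.1f}%" + ("  ALERT: steal above threshold" if alert else ""))
        prev = cur
```

Calling `monitor()` on a Linux guest prints one line every three seconds; in a real setup the alert branch would notify someone or trigger a relaunch rather than print.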
Fig: ‘stolen’ CPU time in top
How to simulate high ‘stolen’ CPU time?
In the figure above, stolen CPU time is at 57.6%. This ‘top’ output was captured from an EC2 instance running on the AWS cloud. AWS EC2 instances sometimes run on overloaded hosts, leaving processes starved for CPU cycles.
If you are running in a virtualized environment, try running more virtual machines on the same physical host and launching CPU-intensive processes in each virtual machine. You will then see ‘stolen’ CPU time spike.
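One way to generate the CPU-intensive load described above is to run a busy loop on every vCPU of each guest. The Python sketch below is a hypothetical load generator; the function names and the 60-second duration are illustrative assumptions. It uses separate processes rather than threads so that each loop can occupy its own CPU despite Python’s global interpreter lock.

```python
import multiprocessing as mp
import os
import time


def burn(seconds: float) -> int:
    """Spin on the CPU for roughly `seconds`; return the number of iterations."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


def burn_all_cpus(seconds: float) -> None:
    """Start one busy-loop process per visible CPU and wait for all of them."""
    workers = [mp.Process(target=burn, args=(seconds,))
               for _ in range(os.cpu_count() or 1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

To use it, call `burn_all_cpus(60)` from under an `if __name__ == "__main__":` guard in each virtual machine (the guard is required on platforms where multiprocessing starts fresh interpreters), then watch the ‘st’ field in ‘top’ on those guests.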
How to resolve high ‘stolen’ CPU time?
- If you are in a cloud environment, try upgrading to a higher-capacity compute instance. (For example, if you are running an ‘m4.large’ instance in AWS, you can upgrade to an ‘m4.xlarge’ instance.)
- You may consider running fewer virtual machine instances on the physical host.
- You may consider running fewer processes in the particular virtual machine that is suffering from high steal time.
- You can try optimizing application performance using tools like yCrash so that the application consumes less CPU.
- The CPU share allocated to the virtual machine may be too low. Try increasing the share.