Understanding CPU Steal Time – when should you be worried?

If we applied a metric like CPU steal time to the movie theater's ticketing process, it would look like this:

  • 0% Steal Time – It’s a Wednesday matinee: the ticket booth serves a moviegoer from line 1, then line 2, then line 1 again, and so on. No one is waiting.
  • 50% Steal Time – It’s Friday night: half of the time, a moviegoer who is ready to buy a ticket has to wait while someone from the other line finishes their purchase. Things are taking longer.
  • 100% Steal Time – It’s Friday night and the cash register is broken: no one is moving.
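On a Linux guest, steal time is the time a virtual CPU was ready to run but the hypervisor was serving another guest, much like a moviegoer waiting while the booth serves the other line. The kernel exposes it as a cumulative tick counter in `/proc/stat`, so the percentage is the change in steal ticks divided by the change in all CPU ticks over an interval. The following is a minimal sketch, assuming the standard `/proc/stat` field order; the function names are hypothetical.

```python
from __future__ import annotations


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate "cpu" line of /proc/stat into its first 8 tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError('expected the aggregate "cpu" line from /proc/stat')
    values = [int(x) for x in fields[1:]]
    # Order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Pad with zeros in case an old kernel omits trailing fields.
    values += [0] * (8 - len(values))
    # guest/guest_nice are already included in user/nice, so drop them
    # to avoid double counting.
    return values[:8]


def steal_percent(before: str, after: str) -> float:
    """Share of CPU time (in %) stolen by the hypervisor between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [new - old for new, old in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no ticks elapsed between the samples
    return 100.0 * deltas[7] / total  # index 7 is the steal counter


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first (aggregate "cpu") line of /proc/stat. Linux only."""
    with open(path) as f:
        return f.readline()
```

To sample a live guest, call `read_cpu_line()`, wait a few seconds (for example with `time.sleep(5)`), call it again, and pass both lines to `steal_percent`. Standard tools report the same counter: it is the `st` value in `top` and `vmstat`.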

If steal time stays above 10% for 20 minutes, the VM is likely running slower than it should.
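This rule of thumb can be checked with a sliding window over periodic steal samples, for example percentages computed from `/proc/stat` deltas or reported by a monitoring agent. The following is a minimal sketch, assuming one sample per minute and reading "for 20 minutes" as "every sample in the last 20 minutes exceeded 10%"; the class name is hypothetical.

```python
from collections import deque


class SustainedStealAlarm:
    """Hypothetical helper: fires when steal stays above a threshold for a whole window.

    Assumes one steal-percentage sample arrives every `interval_minutes`.
    """

    def __init__(self, threshold_pct: float = 10.0, window_minutes: int = 20,
                 interval_minutes: int = 1) -> None:
        self.threshold_pct = threshold_pct
        # Number of consecutive samples that together span the window.
        self.needed = max(1, window_minutes // interval_minutes)
        # Only the most recent `needed` samples are kept.
        self.samples = deque(maxlen=self.needed)

    def add(self, steal_pct: float) -> bool:
        """Record a sample; return True if every sample in the window exceeds the threshold."""
        self.samples.append(steal_pct)
        window_full = len(self.samples) == self.needed
        return window_full and all(s > self.threshold_pct for s in self.samples)
```

Requiring every sample in the window to exceed the threshold means a single brief spike never triggers the alarm, and a single dip below 10% resets it. Averaging the samples over the window would be a looser alternative.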

When this happens:

  1. Shut down the instance and move it to another physical server.
  2. If steal time remains high, increase the instance's CPU resources.
  3. If steal time remains high, contact your hosting provider. Your host may be overselling its physical servers.