10 Ways to Troubleshoot Poor vSphere Performance
A poorly performing virtual machine is probably one of the topmost ailments you’ll bump into as a virtualization admin. The issue is also one of the hardest nuts to crack due to its multifaceted nature. Regardless, there are a number of things you can do to read the symptoms, narrow down the cause and apply a fix. Taking hints from this VMware KB, today we explore 10 ways to troubleshoot poor vSphere performance. Have a look at this Altaro webinar for a more holistic approach to boosting vSphere performance.
Before we move on, let’s revisit the performance monitoring function embedded in the vSphere clients. This is one tool that will help you examine performance-related issues. Figure 1 shows a performance chart for datastore read and write latencies for a VM using the vSphere Web client. The 3ms peaks observed are well within the acceptable range. Sustained levels exceeding 10ms, however, are indicative of a looming storage issue or perhaps network congestion.
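To make the difference between a brief peak and a sustained level concrete, here is a minimal Python sketch. It assumes you have already exported latency samples (in milliseconds) from a performance chart. The function name, the sample values and the "three samples in a row" rule are illustrative assumptions; only the 10ms threshold comes from the text above.

```python
from typing import Sequence


def has_sustained_latency(samples_ms: Sequence[float],
                          threshold_ms: float = 10.0,
                          min_consecutive: int = 3) -> bool:
    """Return True if at least `min_consecutive` samples in a row exceed the threshold.

    `min_consecutive` is an assumed definition of "sustained"; tune it to
    the sampling interval of your own charts.
    """
    run = 0
    for value in samples_ms:
        if value > threshold_ms:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # a sample at or below the threshold breaks the streak
    return False


# Hypothetical samples: short 3ms peaks versus a stretch above 10ms.
brief_peaks = [1.0, 3.0, 1.5, 3.0, 2.0]
sustained = [2.0, 12.0, 14.5, 11.0, 13.0, 4.0]

print(has_sustained_latency(brief_peaks))  # False
print(has_sustained_latency(sustained))    # True
```

Requiring several consecutive samples above the threshold keeps a single spike from being flagged, which matches the distinction drawn above between occasional peaks and sustained levels.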
Use alarms wherever possible so you’re always on top of any performance issue. Alternatively, consider deploying vRealize Operations Manager for a more in-depth assessment of your environment.
How to troubleshoot poor vSphere performance
The steps, or rather, questions you should ask yourself, are listed in an orderly fashion starting with the most trivial. Re-evaluate the performance of the affected VM after each step if you decide to try out a relevant fix. You can then choose to skip to the next step depending on the level of observed improvement if any. If you come across something as glaringly obvious as a failed disk on a host, it goes without saying that you’d want to fix this first before moving on!
1 – Is it really unexpected behavior?
A VM subjected to a heavy workload can sometimes be perceived as performing poorly. Some examples are virtualized SQL servers running processor-intensive or badly written queries, or mail servers with large user bases. The performance monitoring charts in vSphere Web client will help you gauge resource utilization across a given period of time, so you can assess whether the anomaly was a one-off or ongoing and whether the behavior is expected or not. Products such as MS SQL and Exchange Server will, by design, take up any RAM thrown at them, bleeding memory from the VM’s guest OS unless otherwise configured. With that in mind, it’s always a good idea to refer to the product’s documentation.
2 – Are you running the latest product?
Updates and new releases may address performance issues in the form of ironed out bugs or improved drivers and code. Sometimes, however, the latest release could, in fact, make the problem even worse. So test, test and test again before taking the leap or at least wait until there’s a sufficient uptake of the new release or update, so you can make an informed decision.
3 – Are your VMs running VMware Tools?
Make sure that vmtools are installed, updated and running on every VM that supports them. The VMware Tools package, above all, provides a set of optimized virtual device drivers that directly affect VM performance (for the better usually). Again, using the vSphere Web client, you can easily check your overall vmtools health as shown in Fig. 2. Remember to add the vmtools fields by right-clicking on the fields header and selecting them accordingly.
Alternatively, you could cook yourself a PowerCLI script that checks for the vmtools package and its current state. The bulk of the properties related to vmtools is found under <vm>.guest.extensiondata.
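The article suggests PowerCLI for this. As a language-neutral illustration of the same check, here is a small Python sketch that works on an already exported inventory rather than a live connection. The VM names are hypothetical, and the status strings are assumed to mirror the toolsStatus values reported by the vSphere API (toolsOk, toolsOld, toolsNotRunning, toolsNotInstalled); verify them against your own export before relying on this.

```python
from collections import defaultdict

# Assumed "healthy" value; anything else is treated as needing a look.
HEALTHY_STATUS = "toolsOk"


def group_by_tools_status(inventory):
    """Group VM names by their reported tools status."""
    groups = defaultdict(list)
    for vm_name, status in inventory.items():
        groups[status].append(vm_name)
    return dict(groups)


def needs_attention(inventory):
    """Return a sorted list of VMs whose tools are not reported as healthy."""
    return sorted(name for name, status in inventory.items()
                  if status != HEALTHY_STATUS)


# Hypothetical inventory, e.g. exported to CSV and loaded into a dict.
inventory = {
    "web01": "toolsOk",
    "db01": "toolsOld",
    "mail01": "toolsNotRunning",
    "test01": "toolsNotInstalled",
}

print(group_by_tools_status(inventory))
print(needs_attention(inventory))  # ['db01', 'mail01', 'test01']
```

Treating every status other than the healthy one as actionable keeps the report short: outdated, stopped and missing tools all end up on the same to-do list.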
4 – Is your VM adequately powered in terms of resources?
Though seemingly obvious, you’d be surprised as to how many VMs are not assigned sufficient resources as per the requirements of the guest OS and the applications running under it. Remember that, regardless of the benefits virtualization brings about, there are always overheads to contend with. For instance, if the VM runs out of RAM, it will start swapping to disk on a more frequent basis. If the underlying storage is flaky, performance will be badly hit. Whenever possible, use reservations, resource pools, and DRS to ensure that the correct amount of resources is assigned to a VM for maximum operational efficiency.
5 – Is antivirus software or similar running on ESXi?
Yes, even though rare in practice, you can, in fact, find antivirus software – think vShield – running on ESXi. This can adversely affect VM performance on a number of counts if it is not configured properly. One should also keep in mind that there is little justification for running AV software on ESXi given its small footprint and inbuilt security features. Best practice, in fact, calls for anti-malware software to be relegated to the VM’s guest OS. If you must install AV on ESXi, do make it a point to exclude VM files such as VMDKs from scanning schedules especially during peak utilization hours.
AV software aside, there’s also Backup and other I/O intensive software that may adversely impact VM performance.
- Investigating busy hosted virtual machine files
- Using Antivirus and Malware Detection software in VMware ESX/ESXi
6 – Is your underlying storage healthy?
Whether you’re using local or SAN-based datastores, it all boils down to the performance and health of your disks and the underlying sub-systems housing them. Simply put, if VMs do not get their fair share of IOPS, performance will start degrading. Here are a few things you can check and do:
Bad disks: Run regular health checks on your disk / networked storage and replace aging or failing disks immediately.
ESXi OS: Use separate disk(s) for the ESXi host’s OS, the swap partition, and VMs residing on local datastores. Also, consider using RAID to improve read and write performance.
Snapshots: Delete any unused or redundant snapshots. The more snapshots you have, the greater the disk overheads will be vis-à-vis I/O activity.
Encryption: Use disk encryption only when necessary. Encryption = overheads = decreased performance.
- Troubleshooting hosted disk I/O performance problems
- Storing a virtual machine swap file in a location other than the default in ESX/ESXi
- Best practices for using snapshots in the vSphere environment
7 – Do your ESXi hosts have enough resources?
Running a dozen or so VMs configured with 16GB of RAM concurrently on a single ESXi host that has only 96GB of RAM is simply asking for trouble. Consider adding RAM to the host or use DRS – if you have multiple ESXi hosts and proper licensing – for better load distribution.
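The arithmetic behind that example is easy to script. The sketch below is a rough back-of-the-envelope check only: the function name is made up for illustration, and it deliberately ignores per-VM overhead, the hypervisor’s own memory usage and any memory reclamation ESXi may perform.

```python
def memory_overcommit_ratio(vm_count: int, vm_ram_gb: float, host_ram_gb: float) -> float:
    """Ratio of RAM configured across identical VMs to the host's physical RAM."""
    if host_ram_gb <= 0:
        raise ValueError("host_ram_gb must be positive")
    return (vm_count * vm_ram_gb) / host_ram_gb


# The scenario from the text: a dozen 16GB VMs on a 96GB host.
configured_gb = 12 * 16
ratio = memory_overcommit_ratio(vm_count=12, vm_ram_gb=16, host_ram_gb=96)

print(f"Configured: {configured_gb}GB on a 96GB host, ratio {ratio:.1f}:1")
# Configured: 192GB on a 96GB host, ratio 2.0:1
```

A ratio above 1 means the VMs are configured with more RAM than the host physically has. Whether that actually hurts depends on how much of the configured memory the VMs use at the same time.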
- Investigating hosted virtual machine resources
- Verifying sufficient free disk space for a hosted virtual machine
8 – Do you have CPU power management enabled?
CPU power management, when enabled on ESXi servers, may introduce latency that can be picked up by applications or workloads, resulting in slower performance. If you suspect this to be the case, do consult the vendor documentation on how to disable CPU power management. If disabling it has no effect, re-enable it in the spirit of running energy-friendly data centers.
- Virtual machine application runs slower than expected in ESXi
- Virtual Machine Clock Reports Time Unpredictably on Multiprocessor Systems
- BIOS Power Policies Affect Performance
9 – Is everything good on the networking front?
Make sure that your ESXi host networking does not become a bottleneck preventing VMs from running and operating optimally. Symptoms may range from a laggy response when connecting to VMs via remote clients or management consoles to overly lengthy vMotion transfers. Make sure that the network cards on your hosts are correctly configured. If your infrastructure permits it, partition or segregate network traffic. Run services such as management, vMotion and storage on their own dedicated networks. Use optimized TCP/IP stacks and things like jumbo frames where applicable. Make sure that the firmware on any networking hardware thrown in the mix is up to date. Finally, do not exclude issues with the virtual switches. Check your port groups, VLAN assignments and so on.
- Verifying host networking speed
- Troubleshooting virtual machine network connection issues
- Troubleshooting network performance issues in a vSphere environment
10 – Have you checked your ESXi OS and hardware health lately?
Just like any other system, ESXi needs regular maintenance for it to operate at full throttle both from a hardware and operating system perspective. Purple screens aside, you cannot immediately tell if there’s some issue brewing just waiting to rear its ugly head on a weekend night. Make sure to monitor disk usage and use the health monitoring software that generally comes bundled with your server(s) or products like PRTG.
This pretty much sums up today’s post on how to improve VM performance. That said, there are probably more factors affecting VM performance and ways and means to tackle them. I suggest you read the material referenced by the links provided throughout this post and other posts, such as this one on DRS, for more information. Also, get yourself into the habit of visiting sites such as the VMware Technology Network where you’ll find like-minded people sharing similar queries, problems, and potential solutions.
Finally, be sure to check out our dedicated ebook: vSphere Troubleshooting Guide by vExpert Ryan Birk