Posts

ESXi Maintenance mode improvements | VMware

Normally, when an ESXi host is placed into maintenance mode, all the Powered On VMs are migrated first to the other hosts in the cluster using DRS, then come the Powered Off VMs and finally the VM templates.
Starting from vSphere 6.0 U2, VMware has changed this order. Now when a host is placed into maintenance mode, the Powered Off VMs are moved first, then the VM templates and finally the Powered On VMs.
Reason for change:

Prior to 6.0 U2, when users initiated maintenance mode on an ESXi host, they lost the ability to deploy VMs from its templates until the migration completed. Even though the templates were migrated last, the ESXi host would have already queued the VM templates and Powered Off VMs in its migration process. On hosts with 40-50 VMs, users might need to wait for some time to work with the templates or to Power On a VM. Therefore VMware decided to make this small change, which might turn out to be useful to administrators at least at some point.
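For reference, maintenance mode can also be requested from the ESXi shell with esxcli (a minimal example; note the DRS-driven evacuation described above happens when the request is made through vCenter, while from the local shell the host simply waits for its VMs to be evacuated or powered off):
[root@ESXi01:~] esxcli system maintenanceMode set --enable true
[root@ESXi01:~] esxcli system maintenanceMode get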

VMware ESXi Memory State

The memory state of an ESXi host shows how memory constrained the host is.
There are 5 memory states, and depending on the current state, ESXi will engage its memory reclamation techniques (TPS, Ballooning, Compression & Swapping):
High state: No reclamation
Clear state: TPS
Soft state: TPS + Ballooning
Hard state: TPS + Compression + Swapping
Low state: TPS + Compression + Swapping + Blocking
How does ESXi calculate its memory state?
The memory state is not calculated from overall utilization, but by comparing the current amount of free memory with a dynamic minFree value. Based on this comparison, ESXi changes its state:
High state: enough free memory available
Clear state: < 100% of minFree
Soft state: < 64% of minFree
Hard state: < 32% of minFree
Low state: < 16% of minFree
What is the minFree value?
The minFree value is calculated from the configured memory of the ESXi host. An ESXi host configured with 28 GB RAM will have 899 MB as its minFree value. Any host configured wit…
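To illustrate the comparison, here is a minimal shell sketch using the thresholds listed above; the minFree and free-memory values are hypothetical examples, and this only mirrors the logic, not how ESXi implements it internally:
minfree_mb=899       # hypothetical minFree, e.g. a host configured with 28 GB RAM
free_mb=250          # hypothetical current free memory in MB
state="high"                                                       # enough free memory
[ "$free_mb" -lt "$minfree_mb" ]                && state="clear"   # < 100% of minFree
[ "$free_mb" -lt $(( minfree_mb * 64 / 100 )) ] && state="soft"    # < 64% of minFree
[ "$free_mb" -lt $(( minfree_mb * 32 / 100 )) ] && state="hard"    # < 32% of minFree
[ "$free_mb" -lt $(( minfree_mb * 16 / 100 )) ] && state="low"     # < 16% of minFree
echo "memory state: $state"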

ALUA & VMware multipathing

What is the appropriate Path Selection Plugin (PSP) for your ESXi?

Do not wait... check with your storage vendor. As a VMware administrator, in the past I had my own reasons to choose Round Robin ahead of MRU and Fixed. But that may not always be a good idea, for the reason below.

Each storage vendor has its own method of handling I/O. For all recent Active/Active storage processors (SPs), you will see two paths to a given LUN in VMware, but only one processor actually owns the LUN. The path to this owning storage processor is called the Optimized path and the path to the other SP is the Unoptimized path. The VMware PSPs (MRU and Fixed) will always send requests down the Optimized path; any request that reaches the non-owning SP is transferred internally to the owning SP. In short, ALUA is what helps the array service I/O requests using the interconnects between the SPs.

In scenarios where the Optimized path fails, vSphere chooses the failover path and the failback behaviour based on the PSP in use. I…
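To check which SATP and PSP a device is currently using, and to change the PSP for a device once the vendor recommendation is confirmed, the standard esxcli nmp commands can be used; naa.xxxxxxxx below is a placeholder device identifier:
[root@ESXi01:~] esxcli storage nmp satp list
[root@ESXi01:~] esxcli storage nmp device list
[root@ESXi01:~] esxcli storage nmp device set --device naa.xxxxxxxx --psp VMW_PSP_RR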

VMFS locking mechanisms | VMware

In a multi-host storage access environment like vSphere, a locking mechanism is required for the datastore/LUN to ensure data integrity. For VMFS there are two types of locking mechanisms:
SCSI reservations
Atomic Test and Set (ATS)
SCSI reservations
This method is used by storage that does not support Hardware Acceleration. In this method, the host locks the datastore when it executes operations that require metadata protection and releases the lock once it completes the activity. A SCSI reservation does not lock the LUN; it reserves the LUN in order to acquire the on-disk lock. Whenever a host takes a lock, it must keep renewing the lease on the lock to indicate that it still holds it and has not crashed. When another host needs to access the file, it checks whether the lease has been renewed. If it hasn't been renewed, the other host can break the lock and access the file. In a multi-host environment, excessive SCSI reservations can degrade storage performance.
The operations that require reservation are: Creating, resign…
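To verify whether the backing device supports Hardware Acceleration (and therefore ATS), the following standard commands can be used; naa.xxxxxxxx and datastore1 are placeholders:
[root@ESXi01:~] esxcli storage core device vaai status get -d naa.xxxxxxxx
[root@ESXi01:~] vmkfstools -Ph /vmfs/volumes/datastore1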

Virtual NUMA | The VMware story

NUMA - Non Uniform Memory Access
NUMA is a term that comes along with Symmetric Multi-Processing, or SMP. In the traditional processor architecture, all memory accesses go over the same shared memory bus. This works fine with a small number of CPUs, but when the number of CPUs increases (beyond 8 or 12 CPUs), the CPUs compete for control over the shared bus and this creates serious performance issues.
NUMA was introduced to overcome this problem. In the NUMA architecture, nodes use advanced memory controllers and high-speed buses for memory access. A node is basically a group of CPUs that owns its own memory and I/O, much like a small SMP system. If a process runs in one node and uses memory from the same node, that memory is referred to as Local Memory. In some scenarios, a node may not be able to cater to the needs of all its processes; at that point the node makes use of memory from other nodes. This is referred to as Remote Memory. Remote access is slower, and the latency depends on the location of the remote memory…
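On an ESXi host, the physical NUMA layout can be confirmed from the shell; esxcli hardware memory get reports the NUMA node count and esxcli hardware cpu global get shows the CPU package and core counts (shown here only as a quick pointer):
[root@ESXi01:~] esxcli hardware memory get
[root@ESXi01:~] esxcli hardware cpu global get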

Virtual machine is inaccessible after vMotion | VMware

Issue:
The VM fails to power on, power off or be modified. If it is migrated to another ESXi host, the VM gets disconnected.
Error from vCenter:
The VM is shown in vCenter as inaccessible.
Snippets from the logs on the ESXi host:
[root@ESXi01:/vmfs/volumes/570248ff-86524429-7f05-848f691451f9/VMname] cat VMname.vmx | grep vmdk
cat: can't open 'VMname.vmx': Device or resource busy
Troubleshooting:
The logs show that the VM is busy or locked for some reason. Check the lock status of the .vmx file using vmkfstools -D:
[root@ESXi01:/vmfs/volumes/570248ff-86524429-7f05-848f691451f9/VMname] vmkfstools -D VMname.vmx
Lock [type 10c00001 offset 151009280 v 326, hb offset 3198976
gen 61, mode 1, owner 58109c0d-10ae7368-fd7b-848f69156ba8 mtime 15841905
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 326, 119>, gen 129, links 1, type reg, flags 0, uid 0, gid 0, mode 100755
len 5230, nb 1 tbz 0, cow 0, newSinceEpoch 1, zla 2, bs 8192
The last part of the lock owner field in the output, 848f69156ba8, refers to the MAC address of the…
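One way to map that MAC address to the host holding the lock is to compare it with the physical NIC and vmkernel interface MAC addresses of each host in the cluster, using standard commands such as the ones below (a suggested next step, run on each candidate host):
[root@ESXi01:~] esxcli network nic list
[root@ESXi01:~] esxcfg-vmknic -l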

VMware ESXi host disconnects from vCenter

Issue :
The ESXi host disconnects from vCenter and may not even connect directly using the vSphere Client. The VMs continue to run, and SSH to the host works. Execution of the esxcfg-scsidevs -m command hangs. A LUN disappears from one or more hosts in the ESXi cluster.
Errors from vCenter:
A general system error occurred: Invalid response code: 503 Service Unavailable.
Unable to communicate with the remote host, since it is disconnected.
Cannot contact the specified host. The host may not be available on the network, a network configuration problem may exist, or the management services on this host may not be responding.
Snippets from the vmkernel logs on the ESXi host:
Check for non-responsive LUNs:
[root@ESXi01:~] cat /var/log/vmkernel.log  | grep -i responsive
cpuxx:yyyyyyy ALERT: hostd detected to be non-responsive
Check vmkernel.log for entries where the device status is 0x18, which corresponds to a SCSI reservation conflict:
[root@ESXi01:~] cat /var/log/vmkernel.log | grep 0x18 | head
cpuxx:yyyyy)NMP: nmp_ResetDeviceLogThrot…
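A commonly documented follow-up for a persistent SCSI reservation conflict (not necessarily the next step taken in this post) is to release the reservation on the affected device with vmkfstools, where naa.xxxxxxxx is a placeholder for the device identified in the log entries:
[root@ESXi01:~] vmkfstools -L lunreset /vmfs/devices/disks/naa.xxxxxxxx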