Posts

VMware ESXi host disconnects from vCenter

Issue :
ESXi hosts disconnect from vCenter and may not accept direct connections from the vSphere Client. The VMs continue to run and SSH to the host works, but the esxcfg-scsidevs -m command hangs. A LUN disappears from one or more hosts in the ESXi cluster.
Errors from vCenter : A general system error occurred: Invalid response code: 503 Service Unavailable.
Unable to communicate with the remote host, since it is disconnected.
Cannot contact the specified host. The host may not be available on the network, a network configuration problem may exist, or the management services on this host may not be responding.
Snippets from the vmkernel logs on the ESXi host:
Check for non-responsive LUNs:
[root@ESXi01:~] cat /var/log/vmkernel.log  | grep -i responsive
cpuxx:yyyyyyy ALERT: hostd detected to be non-responsive
Check vmkernel.log for entries with device status 0x18, which corresponds to a SCSI reservation conflict:
[root@ESXi01:~] cat /var/log/vmkernel.log | grep 0x18 | head
cpuxx:yyyyy)NMP: nmp_ResetDeviceLogThrot…

VMware ESXi host disconnects and does not connect back to vCenter

Issue:
The ESXi host disconnects from vCenter, and any attempt to reconnect it fails at 89%.
Error from vCenter :
A general system error occurred: internal error Processing data from vCenter agent on ESXi01
Snippets from the vpxd logs from vCenter:
===============================================================
2016-03-29T12:28:37.644+05:30 error vpxd[14460] [Originator@6876 sub=HttpConnectionPool-000001] [ConnectComplete] Connect failed to <cs p:000000001099ab00, TCP:IP Address:443>; cnx: (null), error: class Vmacore::Ssl::SSLVerifyException(SSL Exception: Verification parameters:
--> PeerThumbprint: Thumbprint
--> ExpectedThumbprint:
--> ExpectedPeerName: IP/Hostname
--> The remote host certificate has these problems:
-->
--> * The host certificate chain is incomplete.
-->
--> * Host name does not match the subject name(s) in certificate.
-->
--> * unable to get local issuer certificate)
============================================================…

What happens to vSphere Distributed Switch if vCenter fails ? | VMware

What's the biggest question, confusion, or concern about the VMware vSphere Distributed Switch (vDS)? Yes, you guessed it right: will the vDS work if my vCenter goes down? The answer is yes, it will work (*conditions apply). But how will the vDS work, since it depends entirely on vCenter? There is no better time than this to learn about the vDS backend operation.
vDS consists of two parts: the Control Plane and the Data Plane.

VMware vSphere Update Manager 5 remediation fails

Issue

When VMware vSphere Update Manager 5 tries to remediate a host, the remediation fails at 25% with the following error:

fault.com.vmware.vcIntegrity.VcIntegrityFault.summary

VM options greyed out | VMware

Issue

We faced an issue this morning with one of our VMs hosted in VMware. The VM-related options were greyed out.
Root cause
There was a snapshot job running in the background (not visible from vCenter), which blocked every administration task on the VM. The task was stuck at 0%. It could not be cancelled from vCenter or from the console because it was initiated by a system user called vpxuser.
Workaround
Log in to the SSH console of the ESXi host holding the VM using PuTTY.
Identify the vmid of the affected VM (in our case the vmid was 391) using the command
vim-cmd vmsvc/getallvms
Check the tasks running in background for this particular VM using the command
vim-cmd vmsvc/get.tasklist 391
See if you can cancel the task using the command
vim-cmd vimsvc/task_cancel <taskname>
(The task name will be something like hatask-391-vim.virtualmachine.createsnapshot-1234567.)
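
To make the task-identification step a little more concrete, here is a minimal Python sketch that pulls task names out of captured get.tasklist output and prints the matching cancel commands. The sample output layout and the helper name extract_task_names are assumptions for illustration only; the real output on your host may look different, so treat the pattern as a starting point rather than a definitive parser.

```python
import re
from typing import List

# Hypothetical sample of `vim-cmd vmsvc/get.tasklist 391` output.
# The exact layout on a real host is an assumption made for this sketch.
SAMPLE_OUTPUT = """(ManagedObjectReference) [
   'vim.Task:hatask-391-vim.virtualmachine.createsnapshot-1234567'
]"""


def extract_task_names(tasklist_output: str) -> List[str]:
    """Return every task name that follows a 'vim.Task:' prefix."""
    return re.findall(r"vim\.Task:([\w.\-]+)", tasklist_output)


if __name__ == "__main__":
    # Print one cancel command per task found in the captured output.
    for name in extract_task_names(SAMPLE_OUTPUT):
        print(f"vim-cmd vimsvc/task_cancel {name}")
```

The script only builds the command text; running the cancel itself still happens on the ESXi shell.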

In our case this did not work because the task was initiated by a system user. But in scenarios where the snapshot or any VM…

DCLocator | Active Directory Client logon

The Netlogon service on a Domain Controller is responsible for registering SRV records in the DNS server under _tcp.dc._msdcs.domain.com. It then registers the Domain Controller's SRV records under _sites.dc._msdcs.domain.com, based on its site location.
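
As a small illustration of the two DNS locations mentioned above, the following Python sketch builds the LDAP SRV record names a client would look up for a domain and for a specific site. The function name dc_locator_srv_names and the example values are hypothetical; only the record name patterns come from the standard DC locator layout.

```python
from typing import Dict


def dc_locator_srv_names(dns_domain: str, site: str) -> Dict[str, str]:
    """Build the domain-wide and site-specific LDAP SRV record names."""
    domain = dns_domain.strip(".").lower()
    return {
        # Lists every Domain Controller in the domain.
        "domain_wide": f"_ldap._tcp.dc._msdcs.{domain}",
        # Lists only the Domain Controllers registered for the given site.
        "site_specific": f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
    }


if __name__ == "__main__":
    # Example values are placeholders, not taken from a real environment.
    names = dc_locator_srv_names("domain.com", "Default-First-Site-Name")
    for label, record in names.items():
        print(f"{label}: {record}")
```

The resulting names can be queried with any DNS tool that supports SRV lookups.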

Automatic Site Coverage | Active Directory

In an Active Directory site that has at least one Domain Controller, the clients in that site contact this Domain Controller to handle their service requests. But suppose you have a site without a Domain Controller (yes, it is possible). In that scenario, which Domain Controller does the client contact to handle its service requests? This is where Automatic Site Coverage comes into play!

Using Automatic Site Coverage, each Domain Controller checks all sites in the domain and calculates a replication cost matrix. The Domain Controller from the site that appears closest (using the site link cost calculation) to the site without a Domain Controller then advertises itself as the authoritative one. If multiple sites have the same link cost to the site without a Domain Controller, the site with the largest number of Domain Controllers is chosen. If the tie appears here as well, the site which comes in first alphabet…
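
The selection order described above (lowest link cost, then most Domain Controllers, then alphabetical order) can be sketched in a few lines of Python. This is only an illustration of the tie-breaking logic, not how Netlogon is implemented; the Site class, the pick_covering_site function, and the sample sites are hypothetical, and the alphabetical tie-break is read from the truncated sentence above, so treat it as an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Site:
    name: str
    dc_count: int   # number of Domain Controllers in this site
    link_cost: int  # site link cost to the site that has no Domain Controller


def pick_covering_site(candidates: Iterable[Site]) -> Site:
    """Pick the site whose Domain Controllers cover the DC-less site."""
    with_dcs = [s for s in candidates if s.dc_count > 0]
    if not with_dcs:
        raise ValueError("no candidate site has a Domain Controller")
    # Lowest cost first, then most DCs, then alphabetical site name.
    return min(with_dcs, key=lambda s: (s.link_cost, -s.dc_count, s.name.lower()))


if __name__ == "__main__":
    sites = [
        Site("Alpha", dc_count=2, link_cost=100),
        Site("Bravo", dc_count=3, link_cost=100),
        Site("Charlie", dc_count=5, link_cost=200),
    ]
    print(pick_covering_site(sites).name)
```

Encoding the three rules as a single sort key keeps the priority order explicit and easy to adjust.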