Posts

Showing posts from November, 2017

How to fix | vSAN CLOMD Liveness - Part I

The CLOMD liveness check reports a problem on ESXi hosts in the following scenarios. If any ESXi host is disconnected, the CLOMD liveness state of the disconnected host is shown as unknown. If the Health service is not installed on a particular ESXi host, the CLOMD liveness state of all the ESXi hosts is reported as unknown. If the CLOMD service is not running on a particular ESXi host, the CLOMD liveness state of that host is abnormal.

This post looks at the Cluster Health – CLOMD liveness check in the vSAN Health Service and provides details on why it might report an error. The check verifies whether the Cluster Level Object Manager daemon (CLOMD) is alive. It does so by first checking that the service is running on all ESXi hosts, and then contacting the service to retrieve run-time statistics to verify that CLOMD can respond to inquiries.

CLOMD plays a key role in the operation of a vSAN cluster. It runs on every ESXi host
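The three scenarios in the excerpt above can be sketched as a small decision function. This is an illustration only, not how the vSAN Health Service is implemented: the `Host` class, its field names, the "healthy" label for the normal case, and the order in which the rules are applied are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical model of one ESXi host; field names are illustrative.
    name: str
    connected: bool = True
    health_service_installed: bool = True
    clomd_running: bool = True


def clomd_liveness(hosts):
    """Return {host name: state} following the three scenarios in the post."""
    # Health service missing on any one host: every host reports unknown.
    if any(not h.health_service_installed for h in hosts):
        return {h.name: "unknown" for h in hosts}

    states = {}
    for h in hosts:
        if not h.connected:
            # A disconnected host cannot be queried.
            states[h.name] = "unknown"
        elif not h.clomd_running:
            # The service is down on this host only.
            states[h.name] = "abnormal"
        else:
            # "healthy" is an assumed label for the normal case.
            states[h.name] = "healthy"
    return states


cluster = [
    Host("esx1"),
    Host("esx2", connected=False),
    Host("esx3", clomd_running=False),
]
print(clomd_liveness(cluster))
```

Checking the cluster-wide rule first is a design choice for the sketch: it mirrors the statement that a single host without the Health service affects the reported state of all hosts.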

How to fix | ESXi Virtual SAN Health service installation

I encountered an issue with the ESXi Virtual SAN Health service installation in one of our vSAN clusters.

Step 1: I checked whether all the ESXi hosts are running the same version:

VMware ESXi 6.0.0 build-5224934 VMware ESXi 6.0.0 Update 3 on ESX1
VMware ESXi 6.0.0 build-5224934 VMware ESXi 6.0.0 Update 3 on ESX2
VMware ESXi 6.0.0 build-5224934 VMware ESXi 6.0.0 Update 3 on ESX3
VMware ESXi 6.0.0 build-5224934 VMware ESXi 6.0.0 Update 3 on ESX5
VMware ESXi 6.0.0 build-5224934 VMware ESXi 6.0.0 Update 3 on ESX4

They are all on the same version, so we can go on to check whether the vSAN health VIB is installed. According to the KB https://kb.vmware.com/s/article/2109874, on the vSphere 6.0 Update 2 release, none of the other health checks are conducted until all the hosts are upgraded to 6.0 Update 2 (when running the latest version, vSAN 6.2), to avoid false alarms. But we have all the ESXi hosts on ESXi 6.0 Update 3. "Install the vSAN Health Service V
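The version comparison in Step 1 of the excerpt above can be automated once the version strings have been collected from each host. The sketch below assumes the strings are already gathered into a dictionary keyed by host name; how they are collected (for example over SSH) is out of scope, and the function names are made up for this illustration.

```python
import re

# Matches the version and build number in strings such as
# "VMware ESXi 6.0.0 build-5224934".
BUILD_RE = re.compile(r"VMware ESXi (\d+\.\d+\.\d+) build-(\d+)")


def builds_by_host(reports):
    """Map host name -> (version, build) from {host: version string}."""
    result = {}
    for host, text in reports.items():
        match = BUILD_RE.search(text)
        if match is None:
            raise ValueError(f"no ESXi version string found for {host}")
        result[host] = (match.group(1), match.group(2))
    return result


def all_same_build(reports):
    """True when every host reports the same version and build."""
    return len(set(builds_by_host(reports).values())) == 1


# The strings quoted in the post, keyed by host name.
reports = {
    name: "VMware ESXi 6.0.0 build-5224934 VMware ESXi 6.0.0 Update 3"
    for name in ("ESX1", "ESX2", "ESX3", "ESX4", "ESX5")
}
print(all_same_build(reports))
```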

How to Fix | Virtual SAN Health - Physical Disk Health Retrieval Issues

Physical Disk Health – Physical Disk Health Retrieval Issues

Another common issue in a Virtual SAN cluster is the Virtual SAN health test failing to retrieve the physical disk health from an ESXi host. The check is informing the administrator that it cannot get physical disk-related information from the ESXi host in question, so it cannot check the health of the physical disks. You will encounter this issue if the Virtual SAN management service, vsanmgmtd, on the ESXi host is unresponsive. In vsanmgmt.log you will see the following snippets:

++++++++++++++++++++++++++++++++++++++++++++
[root@esxihost-1:/var/log] cat vsanmgmt.log
2017-11-15T03:08:46Z VSANMGMTSVC: INFO vsanperfsvc[Thread-1] [VsanLsomHealth::getHealthStats] Get issued comps = {}
2017-11-15T03:08:46Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-1] [VsanHealthUtil::InvokeMethod] Invoke: mo=ServiceInstance, info=RetrieveContent
2017-11-15T03:08:46Z VSANMGMTSVC: ERROR vsanperfsvc[Thread-1] [Vs
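When scanning vsanmgmt.log for the entries quoted above, it can help to split each line into its fields and count entries by severity. The sketch below infers the line layout only from the two complete lines shown in the excerpt; the regular expression and the field names are assumptions, and lines in any other layout are simply skipped.

```python
import re
from collections import Counter

# Layout inferred from the sample lines in the post:
# <timestamp> VSANMGMTSVC: <LEVEL> <source> [<Class::method>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+VSANMGMTSVC:\s+(?P<level>[A-Z]+)\s+"
    r"(?P<source>\S+)\s+\[(?P<method>[^\]]+)\]\s*(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def level_counts(lines):
    """Count parsed log lines per severity level."""
    parsed = (parse_line(line) for line in lines)
    return Counter(entry["level"] for entry in parsed if entry)


# The two complete lines quoted in the post.
SAMPLE = [
    "2017-11-15T03:08:46Z VSANMGMTSVC: INFO vsanperfsvc[Thread-1] "
    "[VsanLsomHealth::getHealthStats] Get issued comps = {}",
    "2017-11-15T03:08:46Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-1] "
    "[VsanHealthUtil::InvokeMethod] Invoke: mo=ServiceInstance, info=RetrieveContent",
]
print(level_counts(SAMPLE))
```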

vSAN Component Failure State - Degraded vs Absent - Part II

Cause and Recovery of the degraded components

Scenario 1

Cause: A capacity tier SSD or magnetic disk drive fails in a Virtual SAN cluster. Because the failure is permanent, the disk and all the components stored on it are marked as DEGRADED.

Behind the scenes: If the VM has a policy that includes NumberOfFailuresToTolerate=1 or greater, the VM’s objects will still be accessible. The disk state is marked as DEGRADED and can be verified in the vSphere Web Client UI. At this point, all in-flight I/O is halted while Virtual SAN reevaluates the availability of the object without the failed component as part of the active set of components. If Virtual SAN concludes that the object is still available (based on an available full mirror copy and witness), all in-flight I/O is restarted. The typical time from physical removal of the drive, through Virtual SAN processing this event and marking the component DEGRADED, to halting and restoring I/O flow is approximately 5-7 seconds. Virtual SAN

vSAN Component Failure State - Degraded vs Absent - Part I

Failure States of Virtual SAN Components

Virtual SAN handles failures of the hosts, network, and storage devices in the cluster based on the severity of the failure. When these fail, they directly affect the components in the vSAN cluster. Virtual SAN has two failure states for components: ABSENT and DEGRADED. Depending on the component state, it uses different approaches to recover the affected components.

Degraded: "A component is in degraded state if Virtual SAN detects a permanent component failure and assumes that the component is not going to recover to working state."

Absent: "A component is in absent state if Virtual SAN detects a temporary component failure where the component might recover and restore its working state."

An ABSENT state may or may not resolve itself over time, but a DEGRADED state is permanent. In the image above, on the left side, a disk that has been unplugged or taken offline may be reinserted or brought online, Virtual

vSAN Disk group is in "Unhealthy State"

If you are running VMware vSAN 6.0, 6.1, or 6.2, there is a high chance that you will see this issue with the following RAID controllers:

Cisco 12G SAS Modular Raid Controller
DELL FD332-PERC (Dual ROC)
DELL FD332-PERC (Single ROC)
DELL PERC H730 Adapter
DELL PERC H730 Mini ==> We are using Dell R620/630 servers with this RAID controller
DELL PERC H730P Adapter
DELL PERC H730P Mini
Huawei Technologies Co. Ltd. SR 430C
Lenovo ThinkServer RAID 720i AnyRAID Adapter
Lenovo ThinkServer RAID 720ix AnyRAID Adapter
Lenovo ServeRAID 5210e SAS/SATA Controller
Lenovo ServeRAID M5210 SAS/SATA Controller
LSI MegaRAID SAS 9361-8i
LSI MegaRAID SAS 9362-8i
Supermicro SMC3108

This can happen due to a physical disk drive failure and the RAID controllers from the above list resetting the disk drives. In some scenarios only one disk group goes into an unhealthy state; in others, all the disk groups on the ESXi host in the vSAN cluster do. Th

How to Fix | Controller utility is installed on host "Warning"

The controller utilities enable additional health checks based on controller settings. The yellow check status indicates that the vSAN Health Service is not able to find the appropriate controller utility for the storage controller on the host. Typically, the controller utility is used to configure and view configuration data. When the vSAN Health Service can retrieve controller configuration data, it can further analyze configuration issues for the current vSAN setup.

Host with PERCCLI installed:

[root@esx26:~] esxcli software vib list | grep perccli
vmware-esx-perccli-1.05.08     1.05.08-01                             LSI     PartnerSupported  2017-08-03

Host without PERCCLI:

[root@esx7:~] esxcli software vib list | grep perccli
********NO OUTPUT********

Based on the KB https://kb.vmware.com/s/article/2148867:

1) Download the PERCCLI for ESXi from the link below:
http://www.dell.com/support/home/in/en/inbsdt1/Drivers/DriversDetails?driverId=XY978

2) Put the host
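The installed/not-installed check in the excerpt above boils down to looking for a matching VIB name in the output of esxcli software vib list. The sketch below works on output that has already been captured as text; the helper name `find_vibs` is hypothetical, and it assumes only that the first two whitespace-separated columns are the VIB name and version, as in the sample line from the post.

```python
def find_vibs(vib_list_output, keyword):
    """Return (name, version) pairs for VIBs whose name contains keyword.

    vib_list_output is the captured text of `esxcli software vib list`.
    """
    matches = []
    for line in vib_list_output.splitlines():
        fields = line.split()
        # Skip blank or malformed lines; match on the name column only.
        if len(fields) >= 2 and keyword.lower() in fields[0].lower():
            matches.append((fields[0], fields[1]))
    return matches


# Sample line quoted in the post for a host with PERCCLI installed.
with_perccli = (
    "vmware-esx-perccli-1.05.08     1.05.08-01"
    "                             LSI     PartnerSupported  2017-08-03"
)
print(find_vibs(with_perccli, "perccli"))
print(find_vibs("", "perccli"))
```

An empty result corresponds to the "NO OUTPUT" case shown for the host without PERCCLI.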

How to Fix | Virtual SAN Health Alarm 'Performance data collection' status is Red

vSAN Cluster ==> Monitor ==> Virtual SAN ==> Health ==> Performance Service ==> Performance Data Collection

Stats Gathering ==> Failed
Stats persistence ==> Failed

The cause of this error is unknown, but there are two fixes available for this issue:

1) Restart the vsanmgmtd and vsanvpd services on all the ESXi hosts in the vSAN cluster. Restarting these two services has no impact on the ESXi host.

/etc/init.d/vsanmgmtd restart
/etc/init.d/vsanvpd restart

Make sure each service is in the running state after the restart:

/etc/init.d/vsanmgmtd status
/etc/init.d/vsanvpd status

After restarting the services, retest the vSAN health (vSAN Cluster ==> Monitor ==> Virtual SAN ==> Health ==> Retest); Performance Data Collection should be green.

2) To resolve this issue, re-enable the performance service from the cluster level a.