
How to fix | vSAN CLOMD Liveness - Part II (Virtual machine creation failed)


vSAN CLOMD daemon may fail when trying to repair objects with 0 byte components

Cloning a VM from a template through vRA and a vCenter vMotion both failed with the errors below, and the vApp deployment failed because the clomd service had crashed on the host.
Read about the importance of clomd in Part I: https://virtuawisdom.blogspot.in/2017/11/how-to-fix-vsan-clomd-liveness-part-i.html

Task Details:
Name: clone
Status: Cannot complete file creation operation.
Start Time: May 30, 2017 5:34:13 AM
Completed Time: May 30, 2017 5:35:13 AM
State: error
Error Stack:  A CLOM is not attached. This could indicate that the clomd daemon is not running.Failed to create the object.
Additional Task Details:
Error Type: CannotCreateFile
Task Id: Task
Cancelable: true
Canceled: false
Description Id: VirtualMachine.clone
Event Chain Id: 291778
/var/log/clomd.log
+++++++++++++++++++++++++++++++++++++++++++++
2017-05-25T18:20:12.755Z 26738391 (111018916128)(opID:0)main: Clomd is starting 
2017-05-25T18:20:19.223Z 26739614 (553336654112)(opID:0)main: Clomd is starting 
2017-05-25T18:20:24.430Z 26739633 (778447268128)(opID:0)main: Clomd is starting ==> The lengthy log below also appeared after the two start attempts above 
2017-05-25T18:20:24.430Z 26739633 (778447268128)(opID:0)CLOM_ReadParamsFromVsi: Turning enhanced rebalancing ON 
2017-05-25T18:20:24.430Z 26739633 (778447268128)(opID:0)CLOM_ReadParamsFromVsi: Setting repairCompsDiskThreshold to 95 percent 
2017-05-25T18:20:24.430Z 26739633 (778447268128)(opID:0)main: Is in stretched cluster mode? No 
2017-05-25T18:20:24.430Z 26739633 (778447268128)(opID:0)CLOMSetOptions: Setting forground to TRUE 
2017-05-25T18:20:24.430Z 26739633 (778447268128)(opID:0)CLOMSetOptions: No default configuration specified. 
2017-05-25T18:20:24.431Z 26739633 (778447268128)(opID:0)CLOM_AttachDOM: Speaking to DOM using protocol version 1.6 
2017-05-25T18:20:24.446Z 26739633 (778447268128)(opID:0)main: Allocated space for 9000 simultaneous work items 
2017-05-25T18:20:24.446Z 26739633 (778447268128)(opID:0)CLOM_AttachUDS: Initializing CLOM Unix domain socket (UDS) Server: /var/run/clomd.uds. 
2017-05-25T18:20:24.456Z 26739633 (778447268128)(opID:0)main: CLOM successfully attached to CMMDS. 
2017-05-25T18:20:24.456Z 26739633 (778447268128)(opID:0)CLOM_InitDecomContext: localUuid 58925786-4497-f5aa-483e-ecf4bbdb9ca0 status Success 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 58925a2a-0894-9615-eaeb-ecf4bbdba560 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 58936cbb-69cc-fa1f-9136-ecf4bbdba6e8 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 58924e9a-74b3-a433-b476-ecf4bbdb9c20 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 58936c95-8b90-b650-5957-ecf4bbdbbc40 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 5890ec61-6c43-db5f-4f25-ecf4bbde55d0 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.457Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 58936b20-277d-2468-f58b-ecf4bbd0d500 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 5890f44f-4e65-af7a-9bb9-246e960fec38 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 589258ba-e52e-8ba4-8bbb-ecf4bbd0dae0 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 58925786-4497-f5aa-483e-ecf4bbdb9ca0 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOMDecomCleanupInMemState: Cleaning in-memory state 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOMDecomCleanupObjLists: Cleaning up decommissioning lists. 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 5890fb84-1de1-c8d4-a91d-246e960fc0b8 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Failed to thunk event-based crawler 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_ProcessDecomUpdate: Node 58910347-0398-7edb-3f7a-246e960ff670 state change. Old:DECOM_STATE_INVALID New:DECOM_STATE_NONE Mode:0 JobUuid:00000000-0000-0000-0000-000000000000 
2017-05-25T18:20:24.458Z 26739633 (778447268128)(opID:0)CLOM_CdbEventCallback: Disable processing of all incoming entries, using V2 protocol now 
2017-05-25T18:20:24.930Z 26739633 (778447268128)(opID:0)CLOM_ProcessObject: Object f46f1c59-2ec3-3e30-d551-ecf4bbdb9ca0 is inaccessible, Skipping compliance verification @CSN 6, SCSN 8. ConfigState 13 
2017-05-25T18:20:25.438Z 26739633 (778447268128)(opID:0)CLOM_ProcessObject: Object dc711c59-5008-5367-b76f-ecf4bbdba280 is inaccessible, Skipping compliance verification @CSN 9, SCSN 11. ConfigState 13 
2017-05-25T18:20:25.443Z 26739633 (778447268128)(opID:0)CLOM_ProcessObject: Object 4c701c59-aec4-7b67-389e-ecf4bbdb9ca0 is inaccessible, Skipping compliance verification @CSN 15, SCSN 17. ConfigState 12 
2017-05-25T18:20:25.668Z 26739633 (778447268128)(opID:0)CLOM_ProcessObject: Object f36f1c59-f818-de81-306d-ecf4bbdb9ca0 is inaccessible, Skipping compliance verification @CSN 6, SCSN 8. ConfigState 12 
2017-05-25T18:20:26.007Z 26739633 (778447268128)(opID:0)CLOM_ProcessObject: Object 4d701c59-9a41-8e9f-4d3c-ecf4bbdb9ca0 is inaccessible, Skipping compliance verification @CSN 9, SCSN 11. ConfigState 13 
2017-05-25T18:20:26.973Z 26739633 (778447268128)(opID:0)CLOM_ProcessObject: Object 4d701c59-aec8-6fe8-3f35-ecf4bbdb9ca0 is inaccessible, Skipping compliance verification @CSN 23, SCSN 25. ConfigState 12 
2017-05-25T18:20:29.312Z 26739633 (778447268128)(opID:1804289383)CLOMProcessWorkItem: Op CLEANUP starts:1804289383 
2017-05-25T18:20:29.319Z 26739633 (778447268128)(opID:1804289383)CLOMReconfigure: Reconfiguring cd4a1d59-1447-6ed6-a567-ecf4bbde4bd8 workItem type CLEANUP 
2017-05-25T18:20:29.326Z 26739633 (778447268128)(opID:1804289383)CLOMCleanupObject: Cleaning up transient components from object cd4a1d59-1447-6ed6-a567-ecf4bbde4bd8 needComplCleanup: 1 needPtFixCleanup: 1 
2017-05-25T18:20:29.326Z 26739633 (778447268128)(opID:1804289383)CLOM_SetQuorumVotes: Counted votes good:2, absent:0, bad:0; upperFDs:2, minLowerFDs:1, nTotalReplicas:2, nUpperReplicas:2, nLowerReplicas:1 
2017-05-25T18:20:29.326Z 26739633 (778447268128)(opID:1804289383)CLOM_SetQuorumVotes: 1 upper primary witnesses are required 
2017-05-25T18:20:29.341Z 26739633 (778447268128)(opID:1804289383)CLOM_SetQuorumVotes: Need at least 1 Lower FDs Per Upper, current: 1 
2017-05-25T18:20:29.341Z 26739633 (778447268128)(opID:1804289383)SetVotes: Balance votes - twoLevelVoteAssignment:FALSE minimalVotesPerUpper:1 targetVotesPerUpperFD:1 numUpperFDs:3 
2017-05-25T18:20:29.341Z 26739633 (778447268128)(opID:1804289383)CLOMMarshalConfiguration: Marshaling config for UUID cd4a1d59-1447-6ed6-a567-ecf4bbde4bd8 
2017-05-25T18:20:29.342Z 26739633 (778447268128)(opID:1804289383)CLOM_LogDomMessage: referent cd4a1d59-1447-6ed6-a567-ecf4bbde4bd8 length 3056 object size: 53687091200 type: 2 
2017-05-25T18:20:29.342Z 26739633 (778447268128)(opID:1804289383)CLOM_LogDomMessage: policy (("stripeWidth" i1) ("cacheReservation" i0) ("proportionalCapacity" i0) ("hostFailuresToTolerate" i1) ("forceProvisioning" i1) ("spbmProfileId" "aa6d5a82-1c88-45da-85d3-3d74b91a5bad") ("spbmProfileGenerationNumber" l+1) ("CSN" l23) ("SCSN" l22) ("spbmProfileName" "Virtual SAN Default Storage Policy")) 
2017-05-25T18:20:29.342Z 26739633 (778447268128)(opID:1804289383)CLOM_LogDomMessage: config ("Configuration" (("CSN" l23) ("SCSN" l22) ("addressSpace" l53687091200) ("scrubStartTime" l+1495091917039201) ("objectVersion" i4) ("highestDiskVersion" i4) ("muxGroup" l434773959603123436) ("groupUuid" 804a1d59-c562-08e0-74b2-ecf4bbde4bd8) ("compositeUuid" cd4a1d59-1447-6ed6-a567-ecf4bbde4bd8)) ("RAID_1" (("scope" i3)) ("Component" (("addressSpace" l53687091200) ("componentState" l5) ("componentStateTS" l1495091917) ("faultDomainId" 58910347-0398-7edb-3f7a-246e960ff670) ("nVotes" i0) ("lastScrubbedOffset" l1075904512) ("subFaultDomainId" 58910347-0398-7edb-3f7a-246e960ff670)) cd4a1d59-3f10-b9d8-4980-ecf4bbde4bd8 52deaf90-fa2c-3d7a-bcd3-98183f650c0e) ("Component" (("capacity" (l2969567232 l53687091200)) ("addressSpace" l53687091200) ("componentState" l5) ("componen 
2017-05-25T18:20:29.342Z 26739633 (778447268128)(opID:1804289383)CLOMReconfigure: exit: obj cd4a1d59-1447-6ed6-a567-ecf4bbde4bd8 configDelay 0 newConfigGenerated 1 status Success 
2017-05-25T18:20:29.342Z 26739633 (778447268128)(opID:1804289383)CLOMProcessWorkItem: Op ends:1804289383 
2017-05-25T18:20:30.650Z 26739633 (778447268128)(opID:1804289384)CLOMProcessWorkItem: Op REPAIR starts:1804289384 
2017-05-25T18:20:30.655Z 26739633 (778447268128)(opID:1804289384)CLOMReconfigure: Reconfiguring 65f97358-86ea-ade2-bf53-ecf4bbdba560 workItem type REPAIR 

2017-05-25T18:20:30.659Z 26739633 (778447268128)(opID:1804289384)CLOMReplacementPreWorkRepair: Repair needed. 1 absent/degraded data components for 65f97358-86ea-ade2-bf53-ecf4bbdba560 found 
++++++++++++++++++++++++++++++++++++++++++++++++++++
The clomd service crashed while trying to repair the object 65f97358-86ea-ade2-bf53-ecf4bbdba560. This problem is described in KB https://kb.vmware.com/s/article/2149968
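A burst of "Clomd is starting" entries like the three at the top of the excerpt is the telltale sign of a crash loop: the watchdog keeps relaunching the daemon, and it dies again on the same REPAIR work item. A minimal sketch for spotting this from the log (the file name and sample lines are stand-ins for the real /var/log/clomd.log):

```shell
# Count "Clomd is starting" events; one per boot is normal, several within
# seconds means clomd is crashing and being relaunched by its watchdog.
log=clomd.log.sample    # stand-in; on the host this would be /var/log/clomd.log
printf '%s\n' \
  '2017-05-25T18:20:12.755Z ... main: Clomd is starting' \
  '2017-05-25T18:20:19.223Z ... main: Clomd is starting' \
  '2017-05-25T18:20:24.430Z ... main: Clomd is starting' > "$log"
starts=$(grep -c 'Clomd is starting' "$log")
echo "clomd start events: $starts"
```

If the count climbs, the last REPAIR or CLEANUP opID logged before each restart usually points at the object triggering the crash, as it did here.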
At the time of virtual machine creation, you will see the following error messages in the vmkernel, vmkwarning, osfsd, hostd, and vpxa logs on the ESXi host:

2017-05-30T05:35:13.539Z esx8 vmkernel: cpu25:67041)DOM: DOMCharDevSendToClom:1116: A CLOM is not attached. This could indicate that the clomd daemon is not running.

2017-05-30T05:35:13.539Z esx8 vmkwarning: cpu13:31035365 opID=4c4918f3)WARNING: VSAN: VsanIoctlCtrlNode:2071: 91042d59-b3df-7f52-8ae3-ecf4bbdb9ca0: RPC to DOM returned: Transient storage condition, suggest retry

2017-05-30T05:35:13Z esx8 osfsd: 2017-05-30T05:35:13.540Z 31035365:VSANEA_ObjectCreate:111: object creation ioctl failed: Resource temporarily unavailable
2017-05-30T05:35:13Z esx8 osfsd: 2017-05-30T05:35:13.540Z 31035365:VsanObj_Create:154: [opID=4c4918f3] Error creating object: Failure
2017-05-30T05:35:13Z esx8 osfsd: 2017-05-30T05:35:13.540Z 31035365:VsanCreateObjectInt:1258: [opID=4c4918f3] Error creating VSAN object (vobCtxHandle: 96): Failure
2017-05-30T05:35:13Z esx8 osfsd: 2017-05-30T05:35:13.540Z 31035365:VsanCreateObjectInt:1261: [opID=4c4918f3] VOB added: @&!*@*@(vob.vsanprovider.object.creation.failed)
2017-05-30T05:35:13Z esx8 osfsd: 2017-05-30T05:35:13.540Z 31035365:VsanFinishOp:502: [opID=4c4918f3] Operation completed with status: Failure
2017-05-30T05:35:13Z esx8 osfsd: 2017-05-30T05:35:13.540Z 31035365:IPCCompletionFn:1056: [opID=4c4918f3] IPC completed: Failure

2017-05-30T05:35:13Z esx8 osfsd: 2017-05-30T05:35:13.540Z 69854:Event_Pump:367: PumpEvents: Interrupted system call, continuing
/var/log/hostd.log
++++++++++++++++++++++++++++++++++++++++++++++++
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Libs opID=7a381129-01-b-833f user=vpxuser:vpxuser] OSFSIpcCall: IPC finished for opID 7a381129-01-b-833f. OpCode: 195887105 Response: Failure
2017-05-30T05:35:13.540Z esx8 Hostd: error hostd[52CC6B70] [Originator@6876 sub=Libs opID=7a381129-01-b-833f user=vpxuser:vpxuser] OSFSIpcCall: IPC failed for opID 7a381129-01-b-833f. OpCode: 195887105 Error: IPC returned error : Failure
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Libs opID=7a381129-01-b-833f user=vpxuser:vpxuser] OSFSIpcCall: END (opIDS = 7a381129-01-b-833f) IPC returned error : Failure
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Libs opID=7a381129-01-b-833f user=vpxuser:vpxuser] Vob Stack:
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Libs opID=7a381129-01-b-833f user=vpxuser:vpxuser] [vob.vsan.dom.noclomattached]: A CLOM is not attached. This could indicate that the clomd daemon is not running.
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Libs opID=7a381129-01-b-833f user=vpxuser:vpxuser] [vob.vsanprovider.object.creation.failed]: Failed to create object.
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Libs opID=7a381129-01-b-833f user=vpxuser:vpxuser] ObjectStoreFileSystemImpl::CreateNamespace: END (opIDS = 7a381129-01-b-833f) CreateNamespace IPC failed for name (prod-000001) uuid () vmUuid () policy (<?xml version="1.0" encoding="UTF-8"?>
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Default opID=7a381129-01-b-833f user=vpxuser:vpxuser] AdapterServer caught exception: vim.fault.CannotCreateFile
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Vimsvc.TaskManager opID=7a381129-01-b-833f user=vpxuser:vpxuser] Task Completed : haTask--vim.DatastoreNamespaceManager.CreateDirectory-160136913 Status error

2017-05-30T05:35:13.540Z esx8 Hostd: rageProfile>"
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Solo.Vmomi opID=7a381129-01-b-833f user=vpxuser:vpxuser] Throw vim.fault.CannotCreateFile
2017-05-30T05:35:13.540Z esx8 Hostd: info hostd[52CC6B70] [Originator@6876 sub=Solo.Vmomi opID=7a381129-01-b-833f user=vpxuser:vpxuser] Result:
2017-05-30T05:35:13.540Z esx8 Hostd: --> (vim.fault.CannotCreateFile) {
2017-05-30T05:35:13.540Z esx8 Hostd: --> faultCause = (vmodl.MethodFault) null,
2017-05-30T05:35:13.540Z esx8 Hostd: --> faultMessage = (vmodl.LocalizableMessage) [
2017-05-30T05:35:13.540Z esx8 Hostd: --> (vmodl.LocalizableMessage) {
2017-05-30T05:35:13.540Z esx8 Hostd: --> key = "vob.vsanprovider.object.creation.failed",
2017-05-30T05:35:13.540Z esx8 Hostd: --> arg = <unset>,
2017-05-30T05:35:13.540Z esx8 Hostd: --> message = "Failed to create object.
2017-05-30T05:35:13.540Z esx8 Hostd: --> "
2017-05-30T05:35:13.540Z esx8 Hostd: --> },
2017-05-30T05:35:13.540Z esx8 Hostd: --> (vmodl.LocalizableMessage) {
2017-05-30T05:35:13.540Z esx8 Hostd: --> key = "vob.vsan.dom.noclomattached",
2017-05-30T05:35:13.540Z esx8 Hostd: --> arg = <unset>,
2017-05-30T05:35:13.540Z esx8 Hostd: --> message = "A CLOM is not attached. This could indicate that the clomd daemon is not running.
2017-05-30T05:35:13.540Z esx8 Hostd: --> "
2017-05-30T05:35:13.540Z esx8 Hostd: --> }
2017-05-30T05:35:13.540Z esx8 Hostd: --> ],
2017-05-30T05:35:13.540Z esx8 Hostd: --> file = "Failed to create directory prod-000001 (Cannot Create File)"
2017-05-30T05:35:13.540Z esx8 Hostd: --> msg = ""

2017-05-30T05:35:13.540Z esx8 Hostd: --> }
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/var/log/vpxa.log
+++++++++++++++++++++++++++++++++++++++++++++
2017-05-30T05:35:13.541Z esx8 Vpxa: verbose vpxa[A4C5B70] [Originator@6876 sub=VpxaHalCnxHostagent opID=WFU-3e794bb8] Received WaitForUpdatesDone callback
2017-05-30T05:35:13.541Z esx8 Vpxa: verbose vpxa[A4C5B70] [Originator@6876 sub=VpxaHalCnxHostagent opID=WFU-3e794bb8] Applying updates from 2434898 to 2434899 (at 2434898)
2017-05-30T05:35:13.541Z esx8 Vpxa: verbose vpxa[A4C5B70] [Originator@6876 sub=vpxaTaskInfo opID=WFU-3e794bb8] [VpxaTaskInfoPublisher::PropertyChanged] Number of deferred task updates: 0
2017-05-30T05:35:13.541Z esx8 Vpxa: verbose vpxa[A4C5B70] [Originator@6876 sub=VpxaHalCnxHostagent opID=WFU-3e794bb8] Starting next WaitForUpdates() call to hostd
2017-05-30T05:35:13.542Z esx8 Vpxa: verbose vpxa[A4C5B70] [Originator@6876 sub=VpxaHalCnxHostagent opID=WFU-3e794bb8] Completed WaitForUpdatesDone callback
2017-05-30T05:35:13.541Z esx8 Vpxa: error vpxa[A4A4B70] [Originator@6876 sub=vpxaVmomi opID=7a381129-01-b] [VpxaClientAdapter::InvokeCommon] Re-throwing method-fault 'N3Vim5Fault16CannotCreateFile9ExceptionE(vim.fault.CannotCreateFile)
++++++++++++++++++++++++++++++++++++++++++++++++
I restarted the clomd service on host esx8, and the vSAN CLOMD liveness health changed from Abnormal to Alive. I was then able to create the VM on the host.
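The restart itself is a couple of commands from an SSH session on the affected host (esx8 here). Restarting clomd does not touch the I/O path of running VMs; it only affects new object placement and repair decisions:

```
/etc/init.d/clomd status     # confirm it is stopped or crash-looping
/etc/init.d/clomd restart
/etc/init.d/clomd status     # verify it stays running this time
```

Afterwards, retest the health check (vSAN Cluster ==> Monitor ==> Virtual SAN ==> Health ==> Retest) and confirm the CLOMD liveness state shows Alive.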


Per the KB, the problem can be prevented by making sure the VM memory allocation is always higher than the VM memory reservation, so that the swap objects created are always non-zero in size.
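As a quick preventive audit, you can compare a VM's configured memory against its reservation straight from its .vmx file; a fully reserved VM is the one that gets a zero-byte swap object. A sketch, where the file name and values are stand-ins for a real VM's .vmx:

```shell
# Flag a VM whose memory reservation equals its configured memory.
# memSize and sched.mem.min are the .vmx keys for configured memory and
# reservation (both in MB); sample.vmx and 4096 are hypothetical test values.
vmx=sample.vmx
printf '%s\n' 'memSize = "4096"' 'sched.mem.min = "4096"' > "$vmx"
mem=$(sed -n 's/^memSize = "\(.*\)"$/\1/p' "$vmx")
resv=$(sed -n 's/^sched.mem.min = "\(.*\)"$/\1/p' "$vmx")
if [ "$mem" = "$resv" ]; then
  echo "WARNING: full memory reservation -> zero-byte swap object possible"
fi
```

Any VM flagged this way is a candidate to hit the clomd crash again the next time its swap object needs a repair.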

Read: https://kb.vmware.com/s/article/2149968
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vsan-troubleshooting-reference-manual.pdf (from page 152)
