I have 5 ESX servers with HA and DRS enabled (DRS fully automated). Last night, ESX05 (ESX 4.0.0, build 208167) suddenly went offline. When I checked the vmkernel log, I found that the following messages were repeated several times per second just before it went down:
Aug 8 03:38:27 ESX05 vmkernel: 13:16:28:50.288 cpu13:4324)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x410007155140) to NMP device "naa.6005076b074fb47f4b2f626100000005" failed on physical path "vmhba0:C0:T1:L3" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
Aug 8 03:38:27 ESX05 vmkernel: 13:16:28:50.288 cpu13:4324)ScsiDeviceIO: 747: Command 0x28 to device "naa.6005076b074fb47f4b2f626100000005" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
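To read these lines, it helps to know that `Command 0x28` is the SCSI READ(10) opcode, and that `H:`, `D:` and `P:` are the host, device and plugin status codes. `D:0x8` is the standard SCSI BUSY status. The following is a minimal Python sketch that pulls those fields out of a vmkernel line and names them. The function `decode` and the lookup tables are hypothetical helpers, not part of any VMware tool, and the tables only cover a handful of common values.

```python
import re

# A few well-known SCSI values from the SCSI standards (not exhaustive).
SCSI_OPCODES = {0x00: "TEST UNIT READY", 0x28: "READ(10)", 0x2A: "WRITE(10)"}
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

# Captures the opcode, device ID and the H:/D:/P: status fields of a vmkernel line.
LINE_RE = re.compile(
    r'Command (?P<op>0x[0-9a-fA-F]+).*?device "(?P<dev>[^"]+)"'
    r'.*?H:(?P<h>0x[0-9a-fA-F]+) D:(?P<d>0x[0-9a-fA-F]+) P:(?P<p>0x[0-9a-fA-F]+)'
)
# The physical path only appears in the NMP variant of the message.
PATH_RE = re.compile(r'physical path "([^"]+)"')


def decode(line):
    """Return the decoded fields of an NMP/ScsiDeviceIO failure line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op = int(m.group("op"), 16)
    dev_status = int(m.group("d"), 16)
    path = PATH_RE.search(line)
    return {
        "command": SCSI_OPCODES.get(op, hex(op)),
        "device": m.group("dev"),
        "path": path.group(1) if path else None,
        "host_status": int(m.group("h"), 16),
        "device_status": SCSI_STATUS.get(dev_status, hex(dev_status)),
        "plugin_status": int(m.group("p"), 16),
    }


if __name__ == "__main__":
    sample = ('Aug 8 03:38:27 ESX05 vmkernel: 13:16:28:50.288 cpu13:4324)NMP: '
              'nmp_CompleteCommandForPath: Command 0x28 (0x410007155140) to NMP device '
              '"naa.6005076b074fb47f4b2f626100000005" failed on physical path '
              '"vmhba0:C0:T1:L3" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.')
    print(decode(sample))
```

Applied to the lines above, this reports a READ(10) that the device answered with BUSY, while the host and plugin status are both 0 (no error on those layers).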
When I retrieve the information for device "naa.6005076b074fb47f4b2f626100000005", I get the following:
Device Display Name: IBM Serial Attached SCSI Disk
Storage Array Type: VMW_SATP_DEFAULT_AA
Storage Array Type Device Config:
Path Selection Policy: VMW_PSP_FIXED
Path Selection Policy Device Config: {preferred=vmhba0:C0:T1:L3;current=vmhba0:C0:T1:L3}
Working Paths: vmhba0:C0:T1:L3
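The device listing above is a simple `Key: Value` format, with the path selection policy config packed into `{key=value;...}`. To check the same fields on several hosts or LUNs, a minimal Python sketch like the following could pull them apart. The function names are hypothetical, and the comma separator for multiple working paths is an assumption.

```python
def parse_device_info(text):
    """Turn 'Key: Value' lines into a dict, splitting on the first colon only."""
    info = {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def parse_psp_config(value):
    """Parse a '{k=v;k=v}' string such as the Path Selection Policy Device Config."""
    inner = value.strip().strip("{}")
    return dict(item.split("=", 1) for item in inner.split(";") if "=" in item)


def working_paths(info):
    """Split the Working Paths field (ASSUMPTION: multiple paths are comma-separated)."""
    field = info.get("Working Paths", "")
    return [p.strip() for p in field.split(",") if p.strip()]


if __name__ == "__main__":
    listing = """Device Display Name: IBM Serial Attached SCSI Disk
Storage Array Type: VMW_SATP_DEFAULT_AA
Storage Array Type Device Config:
Path Selection Policy: VMW_PSP_FIXED
Path Selection Policy Device Config: {preferred=vmhba0:C0:T1:L3;current=vmhba0:C0:T1:L3}
Working Paths: vmhba0:C0:T1:L3"""
    info = parse_device_info(listing)
    psp = parse_psp_config(info["Path Selection Policy Device Config"])
    print(info["Path Selection Policy"], psp, working_paths(info))
```

For this LUN the sketch would report a single working path, vmhba0:C0:T1:L3, which is also both the preferred and the current path.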
After restarting the blade, everything worked again.
I don't know what caused this problem. Can someone help me prevent it from happening again?
Thanks!