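A failed air conditioner in the machine room drove the room temperature too high. Oracle servers have a built-in self-protection feature, so the server shut itself down cleanly. After the problem was dealt with and the room temperature had dropped, the customer reported that the server would not start. In this situation the warning has to be cleared manually. The procedure below finds the UUID of the fault and then repairs it. (An over-temperature alert like this is expected behavior and only needs to be cleared; a genuine hardware fault may not be fixable this way.)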
-> show /system/open_problems
(The earlier fault had already been cleared and its details were not recorded, so no problem is listed here.)

Open Problems (0)
Date/Time                Subsystems         Component
------------------------ ------------------ ------------

->
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y

faultmgmtsp> help
Built-in commands:
  echo - Display information to user. Typical use: echo $?
  help - Produces this help. Use 'help <command>' for more information about an external command.
  exit - Exit this shell.
External commands:
  fmadm - Administers the fault management service
  fmdump - Displays contents of the fault and ereport/error logs
  fmstat - Displays statistics on fault management operations

faultmgmtsp> fmadm
Usage: fmadm <subcommand>
where <subcommand> is one of the following:
  faulty [-asv] [-u <uuid>] : display list of problems
  faulty -f [-a] [<FRU>] : display FRUs with problems
  faulty -r [-a] : display ASRUs with problems
  list [-asv] [-u <uuid>] : display list of problems
  list -f [-a] [<FRU>] : display FRUs with problems
  list -r [-a] : display ASRUs with problems
  list-fault [-asv] [-u <uuid>] : display list of faults
  list-defect [-asv] [-u <uuid>] : display list of defects
  list-alert [-asv] [-u <uuid>] : display list of alerts
  acquit <FRU> : acquit problems on a FRU
  acquit <UUID> : acquit problems associated with UUID
  acquit <FRU> <UUID> : acquit problems specified by (FRU, UUID) combination
  clear class@path|<UUID> : clear an event or UUID
  replaced <FRU> : fixed problems via FRU replacement
  repaired <FRU> : repaired a FRU
  repair <FRU> : repaired a FRU
  reset -s all|FRU1[:FRU2...] : reset SERD counters for all or some FRUs
  rotate errlog : rotate error log
  rotate infolog : rotate ireport log
  rotate fltlog : rotate fault log

faultmgmtsp> fmadm repair 5e401d78-63a9-4cf1-b3e3-e38396e7770a
faultmgmtsp> fmadm list
No faults found
faultmgmtsp>
faultmgmtsp> exit
->
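If the session output has been saved as text, the UUID to pass to fmadm can be pulled out mechanically. The following is a minimal Python sketch and not part of ILOM: it only scans captured text for UUID-shaped strings. The function name extract_fault_uuids is an assumption made for illustration.

```python
import re

# Matches the 8-4-4-4-12 hexadecimal UUID layout, like the one used above.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def extract_fault_uuids(text):
    """Return the unique UUID-shaped strings in text, in order of first appearance."""
    seen = []
    for match in UUID_RE.finditer(text):
        uuid = match.group(0).lower()
        if uuid not in seen:
            seen.append(uuid)
    return seen
```

Each returned value can then be reviewed by hand and, if appropriate, used in a command such as fmadm repair <uuid> inside the fault management shell.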
Other common commands

View error information

-> show /system/open_problems
Open Problems (0)
Date/Time                Subsystems         Component
------------------------ ------------------ ------------

->
-> show /SP/faultmgmt

Change the root password
-> set /SP/users/root password
Enter new password: ********
Enter new password again: ********

Sometimes a warning light or a stale alert needs to be reset. The following command does this:
-> set component_path clear_fault_action=true
Are you sure you want to clear component_path (y/n)? y
Set 'clear_fault_action' to 'true'

component_path can be one of:
  Host CPU (/SYS/MB/P#)
  Memory Riser (/SYS/MB/P0/MR#)
  DIMM (/SYS/MB/P0/MR0/D#)
  Motherboard (/SYS/MB)
  Fan module (/SYS/FM#)
  PCI card (/SYS/MB/PCIE#)

For example:

-> set /SYS/MB/P0 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0 (y/n)? y
Set 'clear_fault_action' to 'true'

Set the ILOM address
set /SP/network pendingipdiscovery=static pendingipaddress=192.168.10.250 pendingipgateway=192.168.10.254 pendingipnetmask=255.255.255.0
set /SP/network commitpending=true

The same settings can also be staged one property at a time and then committed:

set /SP/network pendingipdiscovery=static
set /SP/network pendingipaddress=<IP Address>
set /SP/network pendingipgateway=<gateway-IPaddr>
set /SP/network pendingipnetmask=<netmask>
set /SP/network commitpending=true
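A mistyped gateway or netmask can leave the service processor unreachable over the network, so it may be worth checking the values before typing them in. The sketch below is a hypothetical helper, not an Oracle tool: it validates the three values with Python's standard ipaddress module and returns the same set /SP/network lines shown above. The function name build_ilom_network_commands and the rule that the gateway must lie in the SP's subnet are assumptions of this example.

```python
import ipaddress


def build_ilom_network_commands(ip, gateway, netmask):
    """Return the ILOM CLI lines that stage and commit a static SP address."""
    # Parsing raises ValueError on a malformed address or netmask.
    network = ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
    address = ipaddress.IPv4Address(ip)
    gw = ipaddress.IPv4Address(gateway)

    # Assumption of this sketch: the gateway is on the same subnet as the SP.
    if gw not in network:
        raise ValueError(f"gateway {gw} is outside {network}")
    if gw == address:
        raise ValueError("gateway and SP address must differ")

    return [
        "set /SP/network pendingipdiscovery=static",
        f"set /SP/network pendingipaddress={address}",
        f"set /SP/network pendingipgateway={gw}",
        f"set /SP/network pendingipnetmask={network.netmask}",
        "set /SP/network commitpending=true",
    ]
```

Calling build_ilom_network_commands("192.168.10.250", "192.168.10.254", "255.255.255.0") yields the five lines for the example address used above; they still have to be entered at the ILOM prompt by hand.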
From the host OS, the same settings can also be viewed with the command ipmitool sunoem cli "show /SP/network".
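For scripted checks from the host OS, the ipmitool call can be wrapped in a few lines of Python. This is only a sketch under stated assumptions: ipmitool is installed on the host and the script runs with enough privileges to reach the service processor. The function names and the 60-second timeout are hypothetical choices for this example.

```python
import subprocess


def sunoem_cli_argv(ilom_command):
    """Build the argument list for: ipmitool sunoem cli "<ilom_command>"."""
    if not ilom_command.strip():
        raise ValueError("ILOM command must not be empty")
    # Passing a list avoids shell quoting problems with the embedded spaces.
    return ["ipmitool", "sunoem", "cli", ilom_command]


def run_sunoem_cli(ilom_command, timeout=60):
    """Run the command on the host and return its standard output as text."""
    result = subprocess.run(
        sunoem_cli_argv(ilom_command),
        capture_output=True,
        text=True,
        timeout=timeout,  # arbitrary limit chosen for this sketch
        check=True,       # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For instance, run_sunoem_cli("show /SP/network") would return whatever ipmitool prints for that ILOM command, which can then be logged or compared against the intended address.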