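An air-conditioning failure in the machine room caused the room temperature to rise too high. Oracle servers have a self-protection feature, so the server shut itself down cleanly. After the problem was dealt with and the temperature came back down, the customer reported that the server would not start. In this situation we have to clear the warning manually. The UUID of the fault can be found as shown below and the fault then marked as repaired. (This over-temperature alert is a normal protective event, so clearing it is enough. If it were a real hardware fault, clearing it would not necessarily fix anything.)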
(The earlier fault had already been cleared when this output was captured and its details were not recorded, so no problem is listed here.)
-> show /system/open_problems
Open Problems (0)
Date/Time                 Subsystems          Component
------------------------  ------------------  ------------
->
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> help
Built-in commands:
echo - Display information to user.
Typical use: echo $?
help - Produces this help.
Use 'help <command>' for more information about an external command.
exit - Exit this shell.
External commands:
fmadm - Administers the fault management service
fmdump - Displays contents of the fault and ereport/error logs
fmstat - Displays statistics on fault management operations
faultmgmtsp> fmadm
Usage: fmadm <subcommand>
where <subcommand> is one of the following:
faulty [-asv] [-u <uuid>] : display list of problems
faulty -f [-a] [<FRU>] : display FRUs with problems
faulty -r [-a] : display ASRUs with problems
list [-asv] [-u <uuid>] : display list of problems
list -f [-a] [<FRU>] : display FRUs with problems
list -r [-a] : display ASRUs with problems
list-fault [-asv] [-u <uuid>] : display list of faults
list-defect [-asv] [-u <uuid>] : display list of defects
list-alert [-asv] [-u <uuid>] : display list of alerts
acquit <FRU> : acquit problems on a FRU
acquit <UUID> : acquit problems associated with UUID
acquit <FRU> <UUID> : acquit problems specified by
(FRU, UUID) combination
clear class@path|<UUID> : clear an event or UUID
replaced <FRU> : fixed problems via FRU replacement
repaired <FRU> : repaired a FRU
repair <FRU> : repaired a FRU
reset -s all|FRU1[:FRU2...] : reset SERD counters for all
or some FRUs
rotate errlog : rotate error log
rotate infolog : rotate ireport log
rotate fltlog : rotate fault log
faultmgmtsp> fmadm repair 5e401d78-63a9-4cf1-b3e3-e38396e7770a
faultmgmtsp> fmadm list
No faults found
faultmgmtsp>
faultmgmtsp> exit
->
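When several faults are open, copying each UUID by hand is error-prone. The following is a minimal Python sketch, not part of ILOM, that pulls UUID-shaped strings out of text saved from a fault-shell session and prints one `fmadm repair` line per UUID. The function names are hypothetical, and the only assumption about the captured text is that UUIDs appear in the standard 8-4-4-4-12 hexadecimal form, as in the repair command above.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID, the form used by `fmadm repair` above.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def find_fault_uuids(captured_text):
    """Return the unique UUIDs in the text, lowercased, in first-seen order."""
    seen = []
    for match in UUID_RE.findall(captured_text):
        uuid = match.lower()
        if uuid not in seen:
            seen.append(uuid)
    return seen


def repair_commands(captured_text):
    """Build one `fmadm repair <uuid>` line per UUID found in the text."""
    return ["fmadm repair " + uuid for uuid in find_fault_uuids(captured_text)]


if __name__ == "__main__":
    # Text pasted from a saved session (here just the command shown above).
    session = "faultmgmtsp> fmadm repair 5e401d78-63a9-4cf1-b3e3-e38396e7770a\n"
    for line in repair_commands(session):
        print(line)
```

The script only prepares command text. Each generated line still has to be reviewed and typed or pasted into the faultmgmtsp> prompt, which is appropriate only when the fault is a transient one such as the over-temperature event described here.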
Other commonly used commands
Viewing fault information
-> show /system/open_problems
Open Problems (0)
Date/Time                 Subsystems          Component
------------------------  ------------------  ------------
->
-> show /SP/faultmgmt
Changing the root password
-> set /SP/users/root password
Enter new password: ********
Enter new password again: ********
Sometimes we may need to reset a warning indicator or clear an old alert. This can be done with the following command.
-> set component_path clear_fault_action=true
Are you sure you want to clear component_path (y/n)? y
Set 'clear_fault_action' to 'true'
component_path can be one of the following:
Host CPU (/SYS/MB/P#)
Memory Riser (/SYS/MB/P0/MR#)
DIMM (/SYS/MB/P0/MR0/D#)
Motherboard (/SYS/MB)
Fan module (/SYS/FM#)
PCI card (/SYS/MB/PCIE#)
-> set /SYS/MB/P1 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P1 (y/n)? y
Set 'clear_fault_action' to 'true'
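The `#` in each component path stands for the unit number. As an illustration of how the paths expand, here is a small Python sketch that turns a component type and number into the matching `set ... clear_fault_action=true` line. The dictionary keys and the function name are hypothetical labels chosen for this example. The path templates are the ones listed above, and the actual paths available depend on the server model.

```python
# Path templates based on the component list above; '#' is the unit number.
# The keys are hypothetical labels used only by this sketch.
COMPONENT_TEMPLATES = {
    "cpu": "/SYS/MB/P#",
    "memory_riser": "/SYS/MB/P0/MR#",
    "dimm": "/SYS/MB/P0/MR0/D#",
    "motherboard": "/SYS/MB",
    "fan_module": "/SYS/FM#",
    "pci_card": "/SYS/MB/PCIE#",
}


def clear_fault_command(kind, index=None):
    """Return the ILOM command text that clears the fault on one component."""
    template = COMPONENT_TEMPLATES[kind]
    if "#" in template:
        if index is None:
            raise ValueError("component type %r needs a unit number" % kind)
        path = template.replace("#", str(index))
    else:
        # Paths without '#' (the motherboard) take no unit number.
        path = template
    return "set " + path + " clear_fault_action=true"


if __name__ == "__main__":
    print(clear_fault_command("cpu", 1))
    print(clear_fault_command("motherboard"))
```

The templates for the memory riser and DIMM keep the fixed P0 and MR0 segments exactly as listed, so a DIMM on another CPU or riser would need its path adjusted by hand.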
Setting the ILOM address
set /SP/network pendingipdiscovery=static pendingipaddress=192.168.10.250 pendingipgateway=192.168.10.254 pendingipnetmask=255.255.255.0
set /SP/network commitpending=true
The same settings can also be entered one property at a time:
set /SP/network pendingipdiscovery=static
set /SP/network pendingipaddress=<IP Address>
set /SP/network pendingipgateway=<gateway-IPaddr>
set /SP/network pendingipnetmask=<netmask>
set /SP/network commitpending=true
From the OS, the current settings can also be viewed with the command ipmitool sunoem cli "show /SP/network".
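A typo in the address or gateway can leave the service processor unreachable after the pending settings are committed. As a precaution, the values can be checked before they are typed in. The sketch below uses Python's standard `ipaddress` module to validate them and then prints the `set /SP/network` lines in the one-property-per-command form. The function name is hypothetical, and the rule that the gateway must lie inside the address's subnet is a sanity check assumed for this example, not something ILOM is stated to enforce.

```python
import ipaddress


def ilom_static_network_commands(address, netmask, gateway):
    """Validate the values and return the ILOM command lines as strings."""
    # Raises ValueError if the address or netmask is malformed.
    interface = ipaddress.IPv4Interface(address + "/" + netmask)
    gateway_ip = ipaddress.IPv4Address(gateway)
    # Assumed sanity check: the gateway should be on the same subnet.
    if gateway_ip not in interface.network:
        raise ValueError(
            "gateway %s is outside %s" % (gateway_ip, interface.network)
        )
    return [
        "set /SP/network pendingipdiscovery=static",
        "set /SP/network pendingipaddress=" + str(interface.ip),
        "set /SP/network pendingipgateway=" + str(gateway_ip),
        "set /SP/network pendingipnetmask=" + str(interface.netmask),
        "set /SP/network commitpending=true",
    ]


if __name__ == "__main__":
    # Example values taken from the commands in this section.
    for line in ilom_static_network_commands(
        "192.168.10.250", "255.255.255.0", "192.168.10.254"
    ):
        print(line)
```

The function only produces text. The lines are still entered at the ILOM -> prompt, with the commitpending line last so that all pending values take effect together.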
