Host power-on issue on an Oracle T4-1 server at the ILOM prompt: "start: System faults or hardware configuration prevents power on."
This is the first time I have faced such an issue with Oracle T-series servers. Starting from the T1000, I have worked on many T-series SPARC machines, such as the T2000, T3-1, T3-2 and T4-1, and they are all like my pets. Working on T-series and M-series SPARC servers gives me more than ordinary pleasure. Unfortunately, M-series servers are no longer on the market; only their support remains, and even that has almost reached end of service life (EOSL).
While working on these servers, I learned that if there is even a small hardware fault, ILOM will not let the host power on and run POST; it simply stays at the ILOM prompt.
There is an easy way around this, which I have used successfully. It is a bit like "cheating the server". Every step is described in detail below, with the commands and their output.
When you try to start the server, the error looks like this:
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
start: System faults or hardware configuration prevents power on.
What you need to do is:
1) Start the fault management shell so you can use the fmadm utility to find out which hardware faults are recorded:
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
*********************************************************************************
2) Run fmadm faulty to list the hardware faults. The output is the same as what you would see from the OS at run level 3.
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2012-09-26/07:24:59 dc873a7b-f661-ca2f-ae23-f59753bff70c SPT-8000-DH    Critical
Fault class : fault.chassis.voltage.fail
FRU         : /SYS/MB
              (Part Number: 7015924)
              (Serial Number: 465769T+1220BW09L3)
Description : A chassis voltage supply is operating outside of the
              allowable range.
Response    : The system will be powered off. The chassis-wide service
              required LED will be illuminated.
Impact      : The system is not usable until repaired. ILOM will not allow
              the system to be powered on until repaired.
Action      : The administrator should review the ILOM event log for
              additional information pertaining to this diagnosis. Please
              refer to the Details section of the Knowledge Article for
              additional information.
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2012-10-03/04:47:56 47f0af46-19ce-c28d-ae6e-a01e19522e79 SPT-8000-5X    Major
Fault class : fault.chassis.env.power.loss
FRU         : /SYS/PS0
              (Part Number: 300-2235)
              (Serial Number: B70386)
Description : A power supply AC input voltage failure has occurred.
Response    : The service-required LED on the affected power supply and
              chassis will be illuminated.
Impact      : Server will be powered down when there are insufficient
              operational power supplies.
Action      : The administrator should review the ILOM event log for
              additional information pertaining to this diagnosis. Please
              refer to the Details section of the Knowledge Article for
              additional information.
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2012-07-17/10:03:18 01877670-70dc-667e-928b-c13be3cac7da SPT-8000-MJ    Critical
Fault class : fault.chassis.power.fail
FRU         : /SYS/PS1
              (Part Number: 300-2235)
              (Serial Number: B70387)
Description : A Power Supply has failed and is not providing power to the
              server.
Response    : The service required LED on the chassis and on the affected
              Power Supply may be illuminated.
Impact      : Server will be powered down when there are insufficient
              operational power supplies
Action      : The administrator should review the ILOM event log for
              additional information pertaining to this diagnosis. Please
              refer to the Details section of the Knowledge Article for
              additional information.
faultmgmtsp>
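If there are many faults, or you want to keep a record of them, you can save the fmadm faulty output to a text file and pull the FRU names out with a small script. The Python below is a minimal sketch, not an Oracle tool: it assumes each fault record starts with the timestamp/UUID/msgid/severity row shown above and names a single FRU on its "FRU :" line, which holds for this output but may not for every ILOM version. The names Fault and parse_fmadm_faulty are hypothetical, chosen only for this sketch.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Fault:
    uuid: str
    severity: str
    fault_class: str
    fru: str


# Header row of one fault record: timestamp, UUID, msgid, severity.
HEADER_RE = re.compile(
    r"^\s*\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2}\s+([0-9a-f-]{36})\s+\S+\s+(\S+)\s*$"
)


def parse_fmadm_faulty(text: str) -> list[Fault]:
    """Turn captured 'fmadm faulty' output into one Fault per record."""
    faults: list[Fault] = []
    current: dict[str, str] | None = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            if current is not None:
                faults.append(Fault(**current))
            current = {"uuid": header.group(1), "severity": header.group(2),
                       "fault_class": "", "fru": ""}
            continue
        if current is None:
            continue  # still above the first fault record
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator or continuation line without a label
        key = key.strip()
        if key == "Fault class":
            current["fault_class"] = value.strip()
        elif key == "FRU":
            current["fru"] = value.strip()  # assumption: one FRU per record
    if current is not None:
        faults.append(Fault(**current))
    return faults
```

Run on the full output from step 2, it should return three records, for /SYS/MB, /SYS/PS0 and /SYS/PS1, which are the FRUs handled in the next steps.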
*********************************************************************************
3) From this output, note the faulty FRUs. Then exit to the ILOM prompt and set the property clear_fault_action=true on each of them:
faultmgmtsp> exit
-> set /SYS/MB clear_fault_action=true
Are you sure you want to clear /SYS/MB (y/n)? y
Set 'clear_fault_action' to 'true'
-> set /SYS/PS0 clear_fault_action=true
Are you sure you want to clear /SYS/PS0 (y/n)? y
Set 'clear_fault_action' to 'true'
-> set /SYS/PS1 clear_fault_action=true
Are you sure you want to clear /SYS/PS1 (y/n)? y
Set 'clear_fault_action' to 'true'
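With the FRU list in hand, the set lines for this step can be generated instead of typed by hand. This is a hedged sketch: it only prints command lines for you to paste at the -> prompt, it does not connect to ILOM, and ILOM will still ask for the y/n confirmation shown above. The function name clear_fault_commands is hypothetical, and the /SYS/ prefix check is an assumption based on the FRU paths in this output.

```python
from __future__ import annotations


def clear_fault_commands(frus: list[str]) -> list[str]:
    """Return one 'set <FRU> clear_fault_action=true' line per distinct FRU.

    Order is preserved and duplicates are dropped, in case the same FRU
    appears in more than one fault record.
    """
    seen: set[str] = set()
    commands: list[str] = []
    for fru in frus:
        fru = fru.strip()
        # Assumption: every FRU path starts with /SYS/, as in this output.
        if not fru.startswith("/SYS/"):
            raise ValueError(f"unexpected FRU path: {fru!r}")
        if fru in seen:
            continue
        seen.add(fru)
        commands.append(f"set {fru} clear_fault_action=true")
    return commands


if __name__ == "__main__":
    # FRUs reported by 'fmadm faulty' in step 2.
    for cmd in clear_fault_commands(["/SYS/MB", "/SYS/PS0", "/SYS/PS1"]):
        print(cmd)
```

Run as a script, it prints the same three set lines used in this step.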
*********************************************************************************
4) Once the property is set to true on every faulty FRU, go back to the fault management shell and repair each FRU with fmadm repair:
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> fmadm repair /SYS/MB
faultmgmtsp> fmadm repair /SYS/PS0
faultmgmtsp> fmadm repair /SYS/PS1
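The fault-shell session for this step, together with the re-check in step 5, can be written out as a list of lines to paste. This is only a sketch: repair_session is a hypothetical name, the script does not talk to the service processor, and start /SP/faultmgmt/shell will still prompt for confirmation.

```python
from __future__ import annotations


def repair_session(frus: list[str]) -> list[str]:
    """Return the lines for steps 4 and 5: enter the shell, repair, re-check, exit."""
    lines = ["start /SP/faultmgmt/shell"]
    # dict.fromkeys drops duplicate FRUs while keeping their original order.
    for fru in dict.fromkeys(f.strip() for f in frus):
        lines.append(f"fmadm repair {fru}")
    lines.append("fmadm faulty")  # should print nothing once every fault is repaired
    lines.append("exit")
    return lines


if __name__ == "__main__":
    for line in repair_session(["/SYS/MB", "/SYS/PS0", "/SYS/PS1"]):
        print(line)
```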
*********************************************************************************
5) Check whether any faults remain. If fmadm faulty prints nothing, no faults are left:
faultmgmtsp> fmadm faulty
faultmgmtsp>
faultmgmtsp> exit
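If you capture this session to a file, a simple check for leftover faults is that nothing but prompt lines appears. The sketch below assumes the fault-shell prompt is faultmgmtsp> as shown here; faults_remain is a hypothetical helper, not part of ILOM.

```python
def faults_remain(session_text: str, prompt: str = "faultmgmtsp>") -> bool:
    """Return True if a captured 'fmadm faulty' session shows anything besides prompts.

    An empty listing (only prompt lines, as in step 5) means no faults remain.
    """
    for line in session_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines that begin with the fault-shell prompt,
        # e.g. "faultmgmtsp> fmadm faulty" or a bare "faultmgmtsp>".
        if not stripped or stripped.startswith(prompt):
            continue
        return True  # treat any other line as fault output
    return False


if __name__ == "__main__":
    clean = "faultmgmtsp> fmadm faulty\nfaultmgmtsp>\nfaultmgmtsp> exit\n"
    print(faults_remain(clean))  # False
```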
*********************************************************************************
6) Start the system again. This time the error is gone, and the server powers on and boots:
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS
-> start /HOST/console
Are you sure you want to start /HOST/console (y/n)? y
Serial console started. To stop, type #.
[CPU 0:0:0] NOTICE: Initializing TOD: 2012/10/03 05:44:10
[CPU 0:0:0] NOTICE: Loaded ASR status DB data. Ver. 3.
[CPU 0:0:0] NOTICE: Initializing TPM with:
tpm_enable = false
tpm_activate = false
tpm_forceclear = false
[CPU 0:0:0] NOTICE: TPM found: Ver 1.2, Rev 1.2, SpecLevel 2, errataRev 0, VendorId 'IFX'
[CPU 0:0:0] NOTICE: TPM initialized successfully. Current state is: disabled
[CPU 0:0:0] NOTICE: Serial#: 000000000000002a.015948c07cda22a6
[CPU 0:0:0] NOTICE: Version: 003e003012030607
[CPU 0:0:0] NOTICE: T4 Revision: 1.2
..............................................................
bash-3.2# prtdiag -v |more
In my case, the problem keeps recurring even after clearing the faults, and the server still fails to boot.