Blog

ORA-00603: ORACLE server session terminated by fatal error

Problem Description:

The following errors appear in alert.log after the database goes down.

ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
kgefec: fatal error 0

After the ODA upgrade from 12.1.2.10.0 to 12.1.2.11.0, the OS changes significantly, including the kernel version, which moves from 2.6.39-400 to 4.1.12-61.

The issue is caused mainly by OS memory fragmentation. Clearing the OS cache or rebooting the server removes the fragmentation, and the error no longer appears.

This is not an RDBMS problem. It is an OS-level issue with how memory pages are allocated to processes.
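One way to get a rough picture of fragmentation on Linux is /proc/buddyinfo, which lists, per memory zone, how many free blocks exist at each allocation order. The sketch below is only an illustration: the function name summarize_buddyinfo and the split at order 4 are assumptions made here, not something taken from the MOS note. A zone with plenty of "low" blocks but almost no "high" blocks suggests that free memory exists but is fragmented.

```shell
# Summarize free-page fragmentation from buddyinfo-formatted input.
# For each zone, print the number of free blocks of order 0-3 ("low")
# and of order 4 and above ("high"). The order-4 split point is an
# arbitrary choice for illustration.
summarize_buddyinfo() {
    awk '$1 == "Node" {
        node = $2
        sub(/,/, "", node)          # "0," -> "0"
        low = 0; high = 0
        # Fields 5..NF hold the free-block counts for order 0, 1, 2, ...
        for (i = 5; i <= NF; i++) {
            if (i - 5 < 4) low += $i; else high += $i
        }
        printf "node %s zone %s low=%d high=%d\n", node, $4, low, high
    }'
}

# On a live Linux host (reads the kernel's current free lists):
#   summarize_buddyinfo < /proc/buddyinfo
```

Running it before and after clearing the OS cache gives a simple before/after comparison.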

 

Solution

Refer to the following MOS document to fix this issue:

Oracle Linux: ORA-27301:OS Failure Message: No Buffer Space Available (Doc ID 2041723.1)

Oracle Database Appliance was getting rebooted with reboot errors in system OS logs

[root@nasodacxt0101d1 ~]# last reboot | head -3
reboot system boot 4.1.12-94.3.9.el Sun Feb 18 20:34 - 03:00 (1+06:25)    <<< time of the issue
reboot system boot 4.1.12-94.3.9.el Sun Feb 18 02:13 - 21:36 (19:22)
reboot system boot 4.1.12-94.3.9.el Fri Jan 12 12:27 - 21:36 (37+09:08)

$ cat messages_nasodacxt0101d1 | grep -i signal
Feb 18 21:36:27 nasodacxt0101d1 init: initoak main process (3599) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: dcliagent main process (3601) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: odaBaseAgent main process (3602) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: oracle-ohasd main process (3604) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: oracle-tfa main process (5643) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty1) main process (5647) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty2) main process (5649) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty3) main process (5655) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty4) main process (5661) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty5) main process (5667) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty6) main process (5669) killed by TERM signal
Feb 18 21:39:31 nasodacxt0101d1 snmpd[5211]: Received TERM or STOP signal... shutting down...
Feb 18 21:39:31 nasodacxt0101d1 rpc.mountd[5017]: Caught signal 15, un-registering and exiting.
Feb 18 21:39:32 nasodacxt0101d1 ntpd[5265]: ntpd exiting on signal 15
Feb 18 21:39:34 nasodacxt0101d1 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
Feb 18 21:39:35 nasodacxt0101d1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="4568" x-info="http://www.rsyslog.com"] exiting on signal 15.

Cause

The Ctrl-Alt-Del key sequence was pressed in the ILOM remote console, which rebooted the server.

Solution

Take a backup of the /etc/init/control-alt-delete.conf file.

Override the settings in /etc/init/control-alt-delete.conf as follows:

  1. Create the file /etc/init/control-alt-delete.override
  2. Add the line "exec /bin/true" to that file
  3. Run "initctl reload-configuration control-alt-delete" to apply the change

Then press the Ctrl-Alt-Del key sequence in the ILOM remote console again to confirm that the server no longer reboots.
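The backup and override steps can be scripted roughly as follows. This is a minimal sketch: the function name write_cad_override, the optional directory argument and the .bak suffix are assumptions made here for illustration. Upstart only reads files ending in .conf, so the backup copy is not picked up as a job.

```shell
# Back up control-alt-delete.conf and write the override file.
# $1 is the Upstart job directory; it defaults to /etc/init and is only
# a parameter so the function can be tried against a scratch directory.
write_cad_override() {
    init_dir=${1:-/etc/init}
    # Keep a copy of the original job definition (suffix is an assumption).
    cp -p "$init_dir/control-alt-delete.conf" \
          "$init_dir/control-alt-delete.conf.bak" || return 1
    # The override replaces the job's action with a no-op.
    echo "exec /bin/true" > "$init_dir/control-alt-delete.override"
}

# As root on the affected node:
#   write_cad_override
#   initctl reload-configuration control-alt-delete
```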

ORA-00068: invalid value 0 for parameter _query_execution_time_limit, must be between 1952541791 and 6252643

Problem Description:

Users were seeing the following error on the application servers:

03/05/2018 06:01:43 AM : 0 : ClaimVersionDao : 0 : 2 : Generic TPS Error. Check system application log for additional details. ADOException exception (could not execute query

[ SELECT distinct IC_CLAIM_ID
FROM claim_version
WHERE 1=1
AND CLAIM_ID = :p0 ]
Name:claimId – Value:20149913944300
[SQL: SELECT distinct IC_CLAIM_ID
FROM claim_version
WHERE 1=1
AND CLAIM_ID = :p0] : ORA-00068: invalid value 0 for parameter _query_execution_time_limit, must be between 1952541791 and 6252643)
03/05/2018 06:01:43 AM : 0 :  :  :  : An error occurred -- attempting to run the last workflow step.

Troubleshooting and Solution:

A review of the alert log and the patch history for the 12.1.0.2 database involved showed that the patches had been applied while the database was up and running. This appears to be the cause of the issue, which is why a restart fixed it. The "_disable_image_check" messages also started as soon as patching finished with the databases live. Instances need to be down when one-off patches are applied. If a one-off patch is applied while an instance is up, a restart should fix the resulting problem. In this case the restart has already been done, so no further action appears to be needed. The root cause is that patch 27589110 was accidentally applied while the instances were running.
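A simple guard against repeating this is to check for running instances before applying a one-off patch. The sketch below relies on the standard background process names ora_pmon_<SID> and asm_pmon_<SID>. The function name running_instances is an assumption made here, and the check is a convenience, not a replacement for the patch README.

```shell
# Print the SID of every running instance found in `ps -eo args` style
# input, based on the PMON background process name.
running_instances() {
    grep -E '^(ora|asm)_pmon_' | sed 's/^[a-z]*_pmon_//'
}

# Before applying a one-off patch on the host:
#   sids=$(ps -eo args | running_instances)
#   if [ -n "$sids" ]; then
#       echo "Instances still running, stop them first: $sids"
#   fi
```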

Cheatsheet for EBS 12.2 post upgrade tips

1. Problem: fs_clone was not copying files from the Run to the Patch filesystem during patching cycles

 

Solution:

Specify the synchronization steps in the custom sync-up driver $APPL_TOP_NE/EBSapps/appl/ad/custom/adop_sync.drv.
Add the actions between the #Begin Customization and #End Customization markers. They are then performed during the ADOP synchronization phase.
# Sample For Unix Platform
rsync -zr %s_current_base%/EBSapps/appl/mbs/12.0.0 %s_other_base%/EBSapps/appl/mbs
rsync -zr %s_current_base%/EBSapps/comn/java/classes/atlhdc %s_other_base%/EBSapps/comn/java/classes
rsync -zr %s_current_base%/EBSapps/comn/java/classes/mbs %s_other_base%/EBSapps/comn/java/classes

 

2. Problem: Lower-case custom file names were not recognized on Release 12.2.5

 

Solution:

The lower-case file names came from the 11i Windows source server and were extracted as-is onto the target OEL6 server during custom top creation.
As part of retrofitting, the Oracle development team created new files with upper-case names, as R12 requires.

 

3. Recommended EBS Profile change

 

Solution:
RRA: Enabled --> Yes
Concurrent:Report Copies --> from 1 to 0
SLA: Enable SRS Log/Output --> NO

 

4. Problem: Users' passwords were getting locked upon login

 

Solution:
In Release 12.2.5, a password becomes case-sensitive once it is changed if the profile Signon Password Case is set to SENSITIVE.
See: Sign on Password Case Profile Does Not Work (Doc ID 1087519.1)

 

5. Use the NFS mount options below, recommended for NFS v3 on Linux (Doc ID 1375769.1):
rw,nointr,bg,hard,timeo=600,wsize=65536,rsize=65536,nfsvers=3,tcp

6. Use the commands below to install, in one go, all the RPMs needed for EBS 12.2 and RDBMS 12.1:

 

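# Download the Oracle public yum repo definition into yum's repo directory
cd /etc/yum.repos.d
wget http://public-yum.oracle.com/public-yum-ol6.repo
# Install the preinstall packages, which pull in the required RPMs
yum install oracle-rdbms-server-12cR1-preinstall-1.0-14.el6.x86_64
yum install oracle-ebs-server-R12-preinstall-1.0-7.el6.x86_64
# Rebuild the shared library cache
ldconfig -v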

Control file backups for the oldest 3 full backups were getting deleted, so not all control file backups were retained for the backups on disk

RMAN keeps backup metadata in the reusable section of the control file. How long those records are kept depends on the CONTROL_FILE_RECORD_KEEP_TIME parameter, which specifies the minimum number of days before a reusable record in the control file can be reused. If a new record needs to be added to a reusable section and there is not enough space, the oldest record is overwritten, provided it is old enough.

The backup retention policy is the rule that defines which backups must be retained (whether on disk or on other backup media) to meet recovery and other requirements.

If CONTROL_FILE_RECORD_KEEP_TIME is less than the retention policy, reusable records may be overwritten before the corresponding backups are marked obsolete in the RMAN metadata. It is therefore recommended to set CONTROL_FILE_RECORD_KEEP_TIME to a higher value than the retention policy.

 

NOTE: Best practice is to NOT set control_file_record_keep_time to a value greater than 10. If you need retention greater than this in the controlfile, you should use an RMAN catalog.

 

Formula

CONTROL_FILE_RECORD_KEEP_TIME = retention period + level 0 backup interval + 1

 

For example:

With a level 0 backup once a week and a retention policy with a recovery window of 14 days, CONTROL_FILE_RECORD_KEEP_TIME should be 14+7+1=22.
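The same arithmetic as a small shell helper, in case the value needs to be derived in a script. The function name keep_time and its argument order are assumptions made here for illustration. The ALTER SYSTEM line is only echoed, not executed.

```shell
# CONTROL_FILE_RECORD_KEEP_TIME = retention period + level 0 interval + 1
# $1 = retention period in days, $2 = days between level 0 backups
keep_time() {
    retention_days=$1
    level0_interval_days=$2
    echo $(( retention_days + level0_interval_days + 1 ))
}

# Weekly level 0 backup with a 14-day recovery window:
days=$(keep_time 14 7)
echo "ALTER SYSTEM SET control_file_record_keep_time=$days SCOPE=BOTH;"
```

The echoed statement can be reviewed and then run in SQL*Plus as a privileged user.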

 

Note that a retention policy of 14 days must be set with the RMAN command "CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 14 days;" and NOT with "CONFIGURE RETENTION POLICY TO REDUNDANCY 14;" (REDUNDANCY specifies how many backups to keep, NOT a number of days).