Blog

To determine the time for the until time clause for RMAN duplicate/recovery

Connect to production server and source the DB home.

 

1. Export the NLS date format in PROD

 

export NLS_DATE_FORMAT=’YYYY-MM-DD HH24:MI:SS’

 

2. Connect to production database(PROD)

 

rman target /

list backup of controlfile;

 

Pick the time of the latest controlfile time.

 

Sample Output:

 

RMAN> list backup of controlfile;

 

From the above output it helps to determine the control file which need to be copied over to DEV/TEST server.

 

3. To determine the point in time recover run the archive log backup command.

 

Please use the below command on PROD to retrieve that and the latest time stamp from Next Time column of the last archive log present in the backup being used.

 

RMAN> list backup of archivelog all;

 

 

 

Step 4: connect to RMAN (Target Environment)

 

Sample clone:

 

rman auxiliary / log=/u01/backups/RMAN/rman_instance_rec.log

 

DUPLICATE DATABASE TO DEV

UNTIL TIME “TO_DATE(’31-JAN-2014 00:00:28′,’DD-MON-YYYY HH24:MI:SS’)”

BACKUP LOCATION ‘/u01/backups/RMAN/PROD’

nofilenamecheck

;

 

Note: Following command automatically restore, recover, open database and also recreate the TEMP tablespace.

 

 

Error Message:

 

channel ORA_AUX_DISK_4: SID=626 device type=DISK

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of Duplicate Db command at 09/06/2012 17:50:45

 

RMAN-05501: aborting duplication of target database

RMAN-06617: UNTIL TIME (06-SEP-12) is ahead of last NEXT TIME in archived logs (06-SEP-12)

 

RMAN> exit

 

If we run into issue in providing the point in time recovery, try to readjust the value to next lower sequence number from the archive log list and match it to that respective timing.

 

Error Fix: How to find the exact time to mention.

 

Description:

The below command will simulate the restore process from available backup. It won’t touch the data file

 

RMAN> Restore database PREVIEW;

 

PREVIEW tells about

1. At which SCN media recovery will start

2. AT which SCN recovery required to clear the datafile fuzziness.

3. What all archivelogs are required to clear this fuzziness.

4. Also it reports the missing archivelogs under backup.

 

Based on scn you can find until which archive log is required and based on SCN you can find the time of recover

Tracing to Capture any Database Timeouts on Client(s) and Server Side

Consider below tracing on Client and Server Side to capture any database errors, especially if you donot see any errors on DB alert logs and application team is complaining of database related timeouts in their application logs

 

1. Remove all entrees for TRACE_ from the client SQLNET.ORA

2. Place the below in the Client SQLNET.ORA making sure you change the file path for TRACE_DIRECTORY_CLIENT= to a location your user can write too.

TRACE_UNIQUE_CLIENT = ON
TRACE_LEVEL_CLIENT=16
TRACE_TIMESTAMP_CLIENT=TRUE
TRACE_DIRECTORY_CLIENT=/home/oracle/client_trace
TRACE_FILE_CLIENT=CL.trc
DIAG_ADR_ENABLED=OFF #<===Use this only for 11g and newer clients

3. Make sure to turn on SERVER side TRACE here on the Database side, Making sure you change the file path for TRACE_DIRECTORY_SERVER= to a location your user can write too.
Place the below in the Server side sqlnet.ora

TRACE_UNIQUE_SERVER = ON
TRACE_LEVEL_SERVER=16
TRACE_TIMESTAMP_SERVER=TRUE
TRACE_DIRECTORY_SERVER=/Path here/
TRACE_FILE_SERVER=SR.trc
DIAG_ADR_ENABLED=OFF #<===Use this only for 11g and newer clients

4. Review the DOC how to ID matching client to server trace and extract the trace files outlined (Only for concerned clients having the issue).

How to Match Oracle Net Client and Server Trace Files ( Doc ID 374116.1)

Network Tuning for Dom0 and ODAbase for high speed transfer across WAN links

From OS perspective following is the procedure.

1) Find the Bandwidth delay product (BDP) value and set the network buffer size accordingly. It is calculated as the product of the link bandwidth, and the Round Trip Time.

For example :

For a 1Gb/s network and Rount Trip Time of 0.1s, the BDP=(0.1 * 10^9)/8. On such a network, set the following parameter values under the file : /etc/sysctl.conf
net.core.rmem_max = 12500000
net.core.wmem_max = 12500000
net.ipv4.tcp_rmem = 4096 87380 12500000
net.ipv4.tcp_wmem = 4096 65536 12500000

And increase the following parameters as well :
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_max_syn_backlog = 4096

And then execute the command:
# sysctl -p

 

a) For both the changes there is no need for a system reboot.

b) Post this change, it is required to monitor the output of the command : “# netstat -s” and check if the following counters are still seen to be increasing :
packets pruned from receive queue because of socket buffer overrun
times the listen queue of a socket overflowed
SYNs to LISTEN sockets ignored
packets collapsed in receive queue due to low socket buffer
TCPBacklogDrop

 

Configure on Guest VM and dom0.

Using Snapshot Standby for DR drill(s)

Snapshot Standby Database functionality (that comes with EE licenses) can be used for yearly DR drills on 11g/12c databases. Here are the technical steps used for the same. 

Convert the Physical Standby Database to Snapshot database.

Please differ the dest_2 parameter in standby server.

SQL> alter system set log_archive_dest_state_2=DEFER scope=both sid=’*’;

$sqlplus / as sysdba

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;  (make sure MRP is stopped)

SQL> select flashback_on from v$database;

SQL> alter database flashback on;

Database altered.

SQL> select flashback_on from v$database;

FLASHBACK_ON
——————
YES

Activate the Physical Standby Database

$ sqlplus / as sysdba

SQL> select database_role,name,controlfile_type from v$database;

DATABASE_ROLE    NAME      CONTROL
—————- ——— ——-
PHYSICAL STANDBY NASCXTHZ   STANDBY

SQL> alter database convert to snapshot standby; (converting the physical standby database to snapshot database)

SQL> alter database open;
Database altered.

SQL> select database_role,name,controlfile_type,flashback_on from v$database;

DATABASE_ROLE    NAME      CONTROL FLASHBACK_ON
—————- ——— ——- ——————
SNAPSHOT STANDBY NASCXTHZ   CURRENT RESTORE POINT ONLY

SQL> select open_mode from v$database;

OPEN_MODE
——————–
READ WRITE

SQL> set pages 200 lines 200
SQL> select * from v$restore_point;

Below two commands will show how much flashback data has been generated.

SQL> select * from v$flashback_database_log;

OLDEST_FLASHBACK_SCN OLDEST_FL RETENTION_TARGET FLASHBACK_SIZE ESTIMATED_FLASHBACK_SIZE
——————– ——— —————- ————– ————————
6.7014E+12 23-APR-15             1440       63750144                        0

SQL> select * from v$flashback_database_stat;

BEGIN_TIM END_TIME  FLASHBACK_DATA    DB_DATA  REDO_DATA ESTIMATED_FLASHBACK_SIZE
——— ——— ————– ———- ———- ————————
23-APR-17 23-APR-17          49152     622592          0                        0

SQL> show parameter db_rec

SQL> select database_role,name,controlfile_type,flashback_on from v$database;

DATABASE_ROLE    NAME      CONTROL FLASHBACK_ON
—————- ——— ——- ——————
SNAPSHOT STANDBY NASCXTHZ   CURRENT RESTORE POINT ONLY

SQL> select open_mode from v$database;

OPEN_MODE
——————–
READ WRITE

Convert the Snapshot Database to Physical Standby database.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount
ORACLE instance started.
Total System Global Area  830930944 bytes
Fixed Size                  2217912 bytes
Variable Size             603981896 bytes
Database Buffers          222298112 bytes
Redo Buffers                2433024 bytes
Database mounted.
SQL>

Next convert the snapshot to a physical standby.

SQL> alter database convert to physical standby;
Database altered.

SQL> select * from v$restore_point;

no rows selected

SQL> select * from v$flashback_database_log;

no rows selected

SQL> select * from v$flashback_database_stat;

no rows selected

SQL> alter database recover managed standby database disconnect from session;

Post DG Sync Validation

Run the following query on Primary DB

SQL> select thread#, max(sequence#) “Last Primary Seq Generated”
from v$archived_log val, v$database vdb
where val.resetlogs_change# = vdb.resetlogs_change#
group by thread# order by 1;

Run the following query on Standby DB

SQL> select al.thrd “Thread”, almax “Last Seq Received”, lhmax “Last Seq Applied”
from (select thread# thrd, max(sequence#) almax
from v$archived_log
where resetlogs_change#=(select resetlogs_change# from v$database)
group by thread#) al,
(select thread# thrd, max(sequence#) lhmax
from v$log_history
where resetlogs_change#=(select resetlogs_change# from v$database)
group by thread#) lh
where al.thrd = lh.thrd;

Ensure “Last Primary Seq Generated” on Primary matches “Last Seq Applied” on each Standby

ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn]

Problem Description:

We observed below ORA-00600 in alert.log of 11.2.0.4 database, but fortunately DB was up and running

Dump file /u01/app/oracle/diag/rdbms/nascxthzprod/NASCXTHZPROD/incident/incdir_1517411/NASCXTHZPROD_ora_94908_i1517411.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/dbhome_1
System name: Linux
Node name: nasodaprd051
Release: 2.6.39-400.290.1.el6uek.x86_64
Version: #1 SMP Mon Nov 7 12:59:33 PST 2016
Machine: x86_64
Instance name: NASCXTHZPROD
Redo thread mounted by this instance: 1
Oracle process number: 669
Unix process pid: 94908, image: oracle@nasodaprd051


*** 2017-07-29 12:05:19.586

Dump continued from file: /u01/app/oracle/diag/rdbms/nascxthzprod/NASCXTHZPROD/trace/NASCXTHZPROD_ora_94908.trc
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 1517411 (ORA 600 [ktbdchk1: bad dscn]) ========
—– Beginning of Customized Incident Dump(s) —–

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Error/Stack/SQL/Version from file “NASCXTHZPROD_ora_94908_i1517411.trc”
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Error: ORA-00600 [ktbdchk1: bad dscn]

Error Stack: ORA-600[ktbdchk1: bad dscn]
Main Stack:
ktbValidateDependentScn <- ktbgcl1 <- ktbgtl0 <- kdiins0 <- kdiinsp <- kauxsin
<- qesltcLoadIndexList <- qesltcLoadIndexes <- qerltcNoKdtBufferedInsRowCBK
<- qerltcSingleRowLoad <- qerltcFetch <- insexe <- opiexe <- kpoal8 <- opiodr <- ttcpip
<- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main

Current SQL: INSERT INTO CASES (IC_MEMBER_ID, DATE_CREATED, IC_RULE_ID, ACTION, REPORTING_CATEGORY, CERTAINTY, BRIEF_JUSTIFICATION_ID, COMPR_JUSTIFICATION_ID, JUSTIFICATION_ID, MEMBER_EMPLOYEE_FLAG, STATE, DENY_ADJUSTMENT_CODE, ADD_ADJUSTMENT_CODE, SUPPORT_ADJUSTMENT_CODE, SAVINGS_PERCENTAGE, MODULE_NAME, MSG, H…

Cause:

Bug 22241601 : ORA-600 [KDSGRP1] IN ADG AFTER FAILOVER

Solution:

Install Patch for Bug 22241601  from note 22241601.8

Workaround 

Install the fix to prevent this issue from being introduced; installing the fix
also by default tries to repair existent invalid ITL commit scn (healing).
There is not need to set any parameter for it.

Databases already having init.ora parameter _ktb_debug_flags=8 can
remove the parameter after the fix is installed as _ktb_debug_flags=8 is now
the default so the healing is enabled by default. Note that this fix is not disabled
if _ktb_debug_flags is set to 0; a value not including 8 will only disable the healing
for already affected ITL SCN but the fix still solves the problem of the invalid ITL SCN
in the INDEX to be introduced as it does not depend of any parameter.

Sometimes the fix may not repair the block for an already existent invalid SCN on disk;
then repair this issue with:

1. Recreate the affected Index in the Primary Database
Create Index may also fail with the same ORA-600 [ktbdchk1: bad dscn] if Oracle
builds the index based on another index with the same corruption for which the
solution is to drop all the affected indexes for the table and recreate them.

(OR)

2. If the issue is in the standby database: refresh the affected file from primary to standby.
If the error is in the primary database, refresh the file from the standby to the primary database.

ORA-00603: ORACLE server session terminated by fatal error

Problem Description:

Below errors in alert.log after database goes down.

ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
kgefec: fatal error 0

As you have upgraded ODA from 12.1.2.10.0 to 12.1.2.11.0, There is a major change in the OS as well as kernel version which is 2.6.39-400 to 4.1.12-61.

This issue faced mainly due to OS memory fragmentation. If you clear the OS cache or reboot the server – as fragmentation is gone – You are not going to see this issue.

This is nothing related to RDBMS. This is related to OS level Memory page allocation to processes.

 

Solution

Please refer to the below MOS doc to fix this issue,

Oracle Linux: ORA-27301:OS Failure Message: No Buffer Space Available ( Doc ID 2041723)

Oracle Database Appliance was getting rebooted with reboot errors in system OS logs

Oracle Database Appliance was getting rebooted with reboot errors in system OS logs

[root@nasodacxt0101d1 ~]# last reboot | head -3
reboot system boot 4.1.12-94.3.9.el Sun Feb 18 20:34 – 03:00 (1+06:25) >>>>>>>>> this is the issue time
reboot system boot 4.1.12-94.3.9.el Sun Feb 18 02:13 – 21:36 (19:22)
reboot system boot 4.1.12-94.3.9.el Fri Jan 12 12:27 – 21:36 (37+09:08)

$ cat messages_nasodacxt0101d1 | grep -i signal
Feb 18 21:36:27 nasodacxt0101d1 init: initoak main process (3599) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: dcliagent main process (3601) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: odaBaseAgent main process (3602) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: oracle-ohasd main process (3604) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: oracle-tfa main process (5643) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty1) main process (5647) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty2) main process (5649) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty3) main process (5655) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty4) main process (5661) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty5) main process (5667) killed by TERM signal
Feb 18 21:36:27 nasodacxt0101d1 init: tty (/dev/tty6) main process (5669) killed by TERM signal
Feb 18 21:39:31 nasodacxt0101d1 snmpd[5211]: Received TERM or STOP signal… shutting down…
Feb 18 21:39:31 nasodacxt0101d1 rpc.mountd[5017]: Caught signal 15, un-registering and exiting.
Feb 18 21:39:32 nasodacxt0101d1 ntpd[5265]: ntpd exiting on signal 15
Feb 18 21:39:34 nasodacxt0101d1 rpcbind: rpcbind terminating on signal. Restart with “rpcbind -w”
Feb 18 21:39:35 nasodacxt0101d1 rsyslogd: [origin software=”rsyslogd” swVersion=”5.8.10″ x-pid=”4568″ x-info=”http://www.rsyslog.com”%5D exiting on signal 15.

Cause

Press ctl-alt-del key sequence via ILOM remote console and server is rebooted.

Solution

Take a backup of /etc/init/control-alt-delete.conf file

Disable the settings in the /etc/init/control-alt-delete.conf file by completing the following:

  1. Create /etc/init/control-alt-delete.override file and
  2. Add the line “exec /bin/true” in the above file
  3. Run “initctl reload-configuration control-alt-delete” to reflect the changes

Try to reproduce the reboot by pressing ctl-alt-del key sequence via ILOM remote console