ERROR OGG-01224 TCP/IP error 110 (Connection timed out); retries exceeded.

Cause

This is generally caused by basic configuration issues such as:
* The remote Manager process is not running, or is running on an incorrect port number.
* The RMTHOST parameter in the pump Extract is not configured correctly.

In other cases, a firewall may be blocking the connection by filtering certain ports or processes.
This is typically seen when there is a firewall between the source and target machines and
either the required ports are not open or only the Manager port is open.

Solution

The Extract, Replicat and GGSCI processes use ports that normally start at 7840 and ascend sequentially.
The GGSCI command 'send manager getportinfo detail' retrieves the current list of ports that Manager has
allocated, along with their corresponding process IDs.
If you have port restrictions, you can use DYNAMICPORTLIST with a specific range so that the
Collector process allocates its ports from that range.
In general, to overcome this issue, you can do something like the following:

1. Change the target Manager parameter file to use something like the following:

port 7809 
dynamicportlist 7810-7820 

2. Stop and start the Manager.
3. Open ports 7809 through 7820 in the firewall.
4. Restart the source pump.

The dynamicportlist range (7810-7820) and the Manager port 7809 are just examples.
You can define your own ports there and make sure they are open.
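To check the firewall from the source side before restarting the pump, it can help to probe the Manager port and every port in the dynamic range. The following is a minimal Python sketch, assuming it runs on the source host; the function names are hypothetical helpers, not part of GoldenGate. It expands a DYNAMICPORTLIST-style value (single ports and ranges separated by commas) and classifies each TCP connect attempt. On Linux, error 110 is ETIMEDOUT: a "timeout" usually means packets are being silently dropped (often by a firewall), while "refused" means the host answered but nothing is listening on that port yet, which is typical for dynamic ports that no Collector is currently using.

```python
import socket
from typing import List


def expand_port_spec(spec: str) -> List[int]:
    """Expand a DYNAMICPORTLIST-style value such as "7810-7820" or "7830, 7835-7837"."""
    ports: List[int] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            # A range such as 7810-7820 includes both end points.
            low, high = (int(part) for part in item.split("-", 1))
            ports.extend(range(low, high + 1))
        else:
            ports.append(int(item))
    return ports


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # something is listening and the path is allowed
    except ConnectionRefusedError:
        return "refused"           # host reachable, but no listener on this port
    except (socket.timeout, TimeoutError):
        return "timeout"           # no answer at all: often a firewall dropping packets
    except OSError as exc:
        return f"error: {exc}"     # e.g. name resolution or routing failures
```

For example, loop over [7809] + expand_port_spec("7810-7820") and print probe(target, port) for each, where target is your (hypothetical) target host name. Any "timeout" result points at the firewall rather than at the Manager configuration.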

OGG Extract Pump abended with ERROR OGG-01224 TCP/IP error 110 (Connection timed out); retries exceeded

Fixing "unknown" in the Time Since Chkpt column in OGG

1. Symptom

[oracle@localhost ~]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x64, 64bit (optimized), Oracle 10 on Jul 28 2010 13:21:11

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.



GGSCI (localhost.localdomain) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      unknown     
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     
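When many groups are configured, the affected ones can be flagged automatically. The following is a minimal Python sketch, assuming the 'info all' output has already been captured as text (for example by redirecting a command file into ggsci) and that the rows follow the five-column layout shown above; the function name is a hypothetical helper.

```python
from typing import List


def groups_with_unknown_checkpoint(info_all: str) -> List[str]:
    """Return Extract/Replicat groups whose Time Since Chkpt column reads 'unknown'."""
    flagged: List[str] = []
    for line in info_all.splitlines():
        fields = line.split()
        # Group rows have five columns: Program, Status, Group, Lag, Time Since Chkpt.
        # The MANAGER row and the header line are skipped by these checks.
        if len(fields) == 5 and fields[0] in ("EXTRACT", "REPLICAT"):
            if fields[4] == "unknown":
                flagged.append(fields[2])
    return flagged
```

Only the last column is checked, so a stopped group that shows "unknown" under Lag but a real checkpoint time is not flagged.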

2. Try stopping the abnormal processes and restarting them

GGSCI (localhost.localdomain) 2> stop *

Sending STOP request to EXTRACT EXT-ECP ...

ERROR: sending message to EXTRACT EXT-ECP (Timeout waiting for message).

Sending STOP request to EXTRACT EXT-EDS ...

ERROR: sending message to EXTRACT EXT-EDS (Timeout waiting for message).

Sending STOP request to EXTRACT EXT-XZ ...

ERROR: sending message to EXTRACT EXT-XZ (Timeout waiting for message).

Sending STOP request to EXTRACT P-EDS ...

ERROR: sending message to EXTRACT P-EDS (Timeout waiting for message).

Sending STOP request to EXTRACT P-XZ ...

ERROR: sending message to EXTRACT P-XZ (Timeout waiting for message).

Sending STOP request to REPLICAT REP-BOS ...

ERROR: sending message to REPLICAT REP-BOS (Timeout waiting for message).

GGSCI (localhost.localdomain) 3> stop mgr!

Sending STOP request to MANAGER ...
Request processed.
Manager stopped.

GGSCI (localhost.localdomain) 4> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      unknown     
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     

GGSCI (localhost.localdomain) 5> kill EXT-ECP 

ERROR: Manager not currently running.

GGSCI (localhost.localdomain) 6> kill EXT-EDS 

ERROR: Manager not currently running.


GGSCI (localhost.localdomain) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      unknown     
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown   

GGSCI (localhost.localdomain) 8> exit  
--None of 'stop <process>', 'stop mgr' or 'kill <process>' could shut these processes down properly

3. Kill the related OGG processes at the OS level

[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle    7479     1  0 Nov10 ?        00:03:31 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/ext-ecp.prm REPORTFILE /opt/OGG/dirrpt/EXT-ECP.rpt PROCESSID EXT-ECP USESUBDIRS
oracle    7480     1  0 Nov10 ?        00:02:30 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/ext-eds.prm REPORTFILE /opt/OGG/dirrpt/EXT-EDS.rpt PROCESSID EXT-EDS USESUBDIRS
oracle    7482     1  0 Nov10 ?        00:03:07 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/ext-xz.prm REPORTFILE /opt/OGG/dirrpt/EXT-XZ.rpt PROCESSID EXT-XZ USESUBDIRS
oracle    7483     1  0 Nov10 ?        00:00:01 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/p-eds.prm REPORTFILE /opt/OGG/dirrpt/P-EDS.rpt PROCESSID P-EDS USESUBDIRS
oracle    7485     1  0 Nov10 ?        00:00:03 /opt/OGG/replicat PARAMFILE /opt/OGG/dirprm/rep-bos.prm REPORTFILE /opt/OGG/dirrpt/REP-BOS.rpt PROCESSID REP-BOS USESUBDIRS
oracle    7518     1  0 Nov10 ?        00:00:01 ./server -p 7847 -k -l /opt/OGG/ggserr.log
oracle    7677     1  0 Nov10 ?        00:00:15 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/p-xz.prm REPORTFILE /opt/OGG/dirrpt/P-XZ.rpt PROCESSID P-XZ USESUBDIRS
oracle   25261 25112  0 12:48 pts/1    00:00:00 grep /opt/OGG
[oracle@localhost OGG]$ kill -9 7479 7480 7482 7483 7485  7518 7677
[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle   25264 25112  0 12:48 pts/1    00:00:00 grep /opt/OGG
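Before running kill -9, it is worth double-checking which PIDs belong to this installation. The following is a minimal Python sketch, assuming the OGG home is /opt/OGG as above and that the ps -ef output has been captured as text; the function name is a hypothetical helper. It also picks up the ./server Collector, whose command line only mentions /opt/OGG in its log path, and skips the grep line itself. It only lists PIDs; review the list before killing anything.

```python
from typing import List


def ogg_pids(ps_output: str, ogg_home: str = "/opt/OGG") -> List[int]:
    """Collect PIDs from `ps -ef` output whose command line mentions the OGG home."""
    pids: List[int] = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        cmd = fields[7]
        if cmd.startswith("grep"):
            continue  # skip the grep command itself
        if ogg_home in cmd:
            pids.append(int(fields[1]))
    return pids
```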

4. Restart all OGG processes

[oracle@localhost OGG]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x64, 64bit (optimized), Oracle 10 on Jul 28 2010 13:21:11

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.



GGSCI (localhost.localdomain) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED                                           
EXTRACT     ABENDED     EXT-ECP     00:00:00      unknown     
EXTRACT     ABENDED     EXT-EDS     00:00:00      unknown     
EXTRACT     ABENDED     EXT-XZ      00:00:00      unknown     
EXTRACT     ABENDED     P-EDS       00:00:00      unknown     
EXTRACT     ABENDED     P-XZ        00:00:00      unknown     
REPLICAT    ABENDED     REP-BOS     00:00:00      unknown     
--The process status is still abnormal

GGSCI (localhost.localdomain) 2> start mgr

Manager started.


GGSCI (localhost.localdomain) 3> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      unknown     
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     
--The processes are up, but Time Since Chkpt is still wrong

GGSCI (localhost.localdomain) 4> stop ext-ecp

Sending STOP request to EXTRACT EXT-ECP ...
Request processed.


GGSCI (localhost.localdomain) 5> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     STOPPED     EXT-ECP     unknown       00:00:02    
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     
--Stopped EXT-ECP as a test; its status is now normal

GGSCI (localhost.localdomain) 6> start ext-ecp

Sending START request to MANAGER ...
EXTRACT EXT-ECP starting


GGSCI (localhost.localdomain) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     unknown       00:00:14    
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     
--Lag is abnormal; wait for it to recover

GGSCI (localhost.localdomain) 8> stop ext-eds

Sending STOP request to EXTRACT EXT-EDS ...

Recovery is not complete.  This normal stop will wait and checkpoint recovery's 
work when recovery has finished. To force Extract to stop now, 
use the SEND EXTRACT EXT-EDS, FORCESTOP command.
--This message appears because recovery has not finished yet; it can be ignored, just wait

GGSCI (localhost.localdomain) 9> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     unknown       00:00:02    
EXTRACT     STOPPED     EXT-EDS     01:51:12      00:00:01    
EXTRACT     RUNNING     EXT-IM      00:00:00      1059:44:26  
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     


GGSCI (localhost.localdomain) 10> start ext-eds

Sending START request to MANAGER ...
EXTRACT EXT-EDS starting


GGSCI (localhost.localdomain) 11> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     99:53:02      00:00:01    
EXTRACT     RUNNING     EXT-EDS     01:51:12      00:00:10    
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      00:00:00    


GGSCI (localhost.localdomain) 12> stop ext-xz

Sending STOP request to EXTRACT EXT-XZ ...
Request processed.


GGSCI (localhost.localdomain) 13> start ext-xz

Sending START request to MANAGER ...
EXTRACT EXT-XZ starting

GGSCI (localhost.localdomain) 15> stop p-eds

Sending STOP request to EXTRACT P-EDS ...
Request processed.


GGSCI (localhost.localdomain) 16> start p-eds

Sending START request to MANAGER ...
EXTRACT P-EDS starting


GGSCI (localhost.localdomain) 17> stop p-xz

Sending STOP request to EXTRACT P-XZ ...
Request processed.


GGSCI (localhost.localdomain) 18> start p-xz

Sending START request to MANAGER ...
EXTRACT P-XZ starting


GGSCI (localhost.localdomain) 19> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      00:00:01    
EXTRACT     RUNNING     EXT-EDS     00:00:00      00:00:10    
EXTRACT     RUNNING     EXT-IM      00:00:00      1059:45:28  
EXTRACT     RUNNING     EXT-XZ      00:00:00      00:00:07    
EXTRACT     RUNNING     P-EDS       00:00:00      00:00:04    
EXTRACT     RUNNING     P-XZ        00:00:00      00:00:05    
REPLICAT    RUNNING     REP-BOS     00:00:00      00:00:05    
--After restarting all abnormal processes, OGG works normally

GGSCI (localhost.localdomain) 20> 

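5. Summary of the steps
Force-stop the Manager, kill the related OGG processes at the OS level, start the Manager again, then restart the affected processes.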

ggsci: error while loading shared libraries

While deploying GoldenGate, the following error appeared:
[oracle@localhost OGG]$ ggsci
ggsci: error while loading shared libraries: /opt/oracle/product/10.2.0/db_1/lib/libclntsh.so.10.1: cannot restore segment prot after reloc: Permission denied

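I could not find any documentation that specifically attributes this error during an OGG installation to SELinux being enabled, but many other programs report a similar error for that reason, so I decided to try it:
1. Check whether SELinux is disabled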
[oracle@localhost tmp]$ more /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - SELinux is fully disabled.
SELINUX=disabled
# SELINUXTYPE= type of policy in use. Possible values are:
# targeted - Only targeted network daemons are protected.
# strict - Full SELinux protection.
SELINUXTYPE=targeted
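This shows that SELinux is configured as disabled. That is strange: if it is already disabled, why does the error still occur? I suspected that someone had only changed the file to SELINUX=disabled without rebooting the system or running a command to apply the change.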

2. Check whether the SELinux change has taken effect
[root@localhost ~]# getenforce
Enforcing
As suspected, the change has not taken effect.
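The check in steps 1 and 2 boils down to comparing the configured mode with the running mode. The following is a minimal Python sketch, assuming the config file text and the getenforce output have already been captured (for example with subprocess.run(["getenforce"], capture_output=True, text=True).stdout); the function names are hypothetical helpers.

```python
from typing import Optional


def configured_selinux_mode(config_text: str) -> Optional[str]:
    """Return the SELINUX= value from /etc/selinux/config text, lowercased."""
    for line in config_text.splitlines():
        line = line.strip()
        # Comment lines start with "#", and "SELINUXTYPE=" does not match
        # because the prefix checked here includes the "=".
        if line.startswith("SELINUX="):
            return line.split("=", 1)[1].strip().lower()
    return None


def selinux_pending_change(config_text: str, getenforce_output: str) -> bool:
    """True when the configured mode differs from the mode getenforce reports."""
    configured = configured_selinux_mode(config_text)
    running = getenforce_output.strip().lower()
    return configured is not None and configured != running
```

With the file above (SELINUX=disabled) and getenforce printing Enforcing, the sketch reports a pending change, which is exactly the situation found here.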

3. Take SELinux out of enforcing mode at runtime
[root@localhost ~]# setenforce 0
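Query again; SELinux is now in permissive mode (setenforce 0 only lasts until the next reboot, after which the SELINUX=disabled setting in the config file applies):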
[root@localhost ~]# getenforce
Permissive

4. Then start ggsci
[oracle@localhost ~]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.1 OGGCORE_11.1.1.1.1_PLATFORMS_110729.1700
Linux, x64, 64bit (optimized), Oracle 10g on Jul 29 2011 19:43:29

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.

Total insert collisions (ogg)

1. Symptom
Replicating from ECP.TAB_UUM_PACKAGE to RWGL.TAB_UUM_USER:
*** Total statistics since 2011-08-05 10:34:10 ***

Total inserts                               20.00
Total updates                                0.00
Total deletes                                0.00
Total discards                               0.00
Total operations                            20.00
Total insert collisions                     20.00
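A quick way to spot this pattern across many mappings is to compare the counters in the statistics output. The following is a minimal Python sketch, assuming the statistics text shown above has been captured (for example from the output of the GGSCI STATS REPLICAT command); the function names are hypothetical helpers.

```python
import re
from typing import Dict

# Matches lines such as "Total insert collisions                     20.00".
_TOTAL_LINE = re.compile(r"^\s*Total (.+?)\s+([\d.]+)\s*$")


def parse_totals(report: str) -> Dict[str, float]:
    """Map each 'Total ...' counter in a Replicat statistics block to its value."""
    totals: Dict[str, float] = {}
    for line in report.splitlines():
        match = _TOTAL_LINE.match(line)
        if match:
            totals[match.group(1)] = float(match.group(2))
    return totals


def every_insert_collided(totals: Dict[str, float]) -> bool:
    """True when there were inserts and each one was counted as a collision."""
    inserts = totals.get("inserts", 0.0)
    return inserts > 0 and totals.get("insert collisions", 0.0) == inserts
```

The "*** Total statistics since ... ***" banner does not start with "Total", so it is ignored by the pattern.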

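2. Cause
The RWGL.TAB_UUM_USER table has an insert trigger, which causes the failures. The trigger binds the insert and the trigger's own operations into a single unit of work, so when the trigger fails, the insert fails as well. The inserted record is then lost, and tracking down which record it was is difficult.

3. Solution
Use an autonomous transaction combined with exception handling.
The autonomous transaction separates the trigger's work from the insert itself, and the exception handler records why the trigger failed by inserting a row into a log table. From that table you can find the failed records and then fix them manually. Example trigger: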

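CREATE OR REPLACE TRIGGER ogg_t
  BEFORE INSERT ON t_1
  FOR EACH ROW
DECLARE
  tid NUMBER;
  err VARCHAR2(100);
  -- Run the trigger body in its own transaction, independent of the replicated insert
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  -- Look up the related id and write the derived row
  SELECT t.id2 INTO tid FROM t_2 t WHERE t.name = :new.name;
  INSERT INTO t_3 VALUES (tid, :new.name);
  -- An autonomous transaction must commit (or roll back) before it returns
  COMMIT;
EXCEPTION
  -- Log the failure instead of re-raising it, so the triggering insert still succeeds
  WHEN TOO_MANY_ROWS THEN
    INSERT INTO t_error VALUES (:new.id, 'TOO_MANY_ROWS');
    COMMIT;
  WHEN NO_DATA_FOUND THEN
    INSERT INTO t_error VALUES (:new.id, 'NO_DATA_FOUND');
    COMMIT;
  WHEN OTHERS THEN
    -- SQLERRM cannot be used directly inside a SQL statement, so copy it first
    err := SUBSTR(SQLERRM(SQLCODE), 1, 100);
    INSERT INTO t_error VALUES (:new.id, err);
    COMMIT;
END ogg_t;
/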

1)PRAGMA AUTONOMOUS_TRANSACTION;
Autonomous transaction: whether the trigger succeeds or fails, the replication process can still apply the data to the target successfully.

2)COMMIT;
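Because the trigger runs as an autonomous transaction, the operations between BEGIN and END are independent of the main transaction and must be committed separately.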

3)EXCEPTION
Add exception handling.

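4)INSERT INTO t_error VALUES(:new.id,'TOO_MANY_ROWS'); (and similar statements; remember the COMMIT)
Create an error log table (designed to suit your situation). If the trigger fails, record the error in this table so that problems are easy to track down later. Requirement: the table should let you identify which statement's trigger failed, the failure reason, the failure time, and any extra columns needed to identify the corresponding record.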

Common GoldenGate errors

ERROR OGG-01031 There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/opt/OGG/dirdat/AIR/EXTTRAIL/U9000005" (error 11, Resource temporarily unavailable)).
Restart the process once.

WARNING OGG-00769 mysql_refresh() failed, falling back to default key. SQL error (1227). Access denied; you need the RELOAD privilege for this operation.
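Cause: a MySQL user privilege problem. Grant the RELOAD privilege (a global privilege, granted ON *.*) to the MySQL user that GoldenGate connects as.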

ERROR OGG-01033 There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Remote file used is /opt/OGG/dirdat/rl000003, reply received is Unable to lock file "/opt/OGG/dirdat/rl000003" (error 13, Permission denied). Lock currently held by process id (PID) 14409)
Cause: the network or the path on the target is not working correctly, so accessing the target directory fails.
Fix: run kill -9 14409 on the target,
or wait 2 hours for the system to restart the target-side process automatically.
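When this error recurs, the PID to investigate on the target can be pulled straight out of the message. The following is a minimal Python sketch, assuming the message text is copied from ggserr.log or the report file; the function name is a hypothetical helper. Check what that PID actually is on the target (for example with ps -p) before killing it.

```python
import re
from typing import Optional

_LOCK_HOLDER = re.compile(r"Lock currently held by process id \(PID\) (\d+)")


def lock_holder_pid(message: str) -> Optional[int]:
    """Return the PID named in an OGG-01033 'Unable to lock file' message, if any."""
    match = _LOCK_HOLDER.search(message)
    return int(match.group(1)) if match else None
```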

ERROR OGG-01033 Oracle GoldenGate Capture for Oracle, p-xz.prm: There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Remote file used is /opt/OGG/dirdat/XunZhi/EXTFILE/U1000000, reply received is Could not create /opt/OGG/dirdat/XunZhi/EXTFILE/U1000000).
Check whether the directory on the remote host matches the remote trail directory configured in the data pump.
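The following is a minimal Python sketch, meant to be run on the target host, that extracts the trail path from the "Could not create" message and reports its directory if that directory does not exist there; the function name is a hypothetical helper. The directory it reports can then be compared with the remote trail path (RMTTRAIL) in the pump parameter file.

```python
import os
import re
from typing import Optional

# The path ends at whitespace or at the closing parenthesis of the message.
_COULD_NOT_CREATE = re.compile(r"Could not create (/[^\s)]+)")


def missing_trail_dir(message: str) -> Optional[str]:
    """Return the trail directory named in a 'Could not create' message
    if it does not exist on this host, otherwise None."""
    match = _COULD_NOT_CREATE.search(message)
    if not match:
        return None
    directory = os.path.dirname(match.group(1))
    return None if os.path.isdir(directory) else directory
```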