pg_control丢失/损坏处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:pg_control丢失/损坏处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

-l :设置下一个wal日志名
可以通过 $PGDATA/pg_wal 目录中查找数值最新的WAL段的文件名+1

[postgres@localhost pg_wal]$ ls
00000001000000000000008E  archive_status

-l 00000001000000000000008F

-m:设置下一个和最旧的事务ID
mxid1:下一个多事务ID的安全值可以通过在 $PGDATA/pg_multixact/offsets目录中查找数值最大的文件名+1,然后乘以65536(0×10000)来确定。
mxid2:最旧的事务ID的安全值可以通过$PGDATA/pg_multixact/offsets目录中数字最小的文件名+1,乘以65536(0×10000)来确定。文件名是十六进制

[postgres@localhost pg_wal]$ cd ../pg_multixact
[postgres@localhost pg_multixact]$ ls
members  offsets
[postgres@localhost pg_multixact]$ cd offsets/
[postgres@localhost offsets]$ ls
0000

-m 0x10000,0x10000

-O:设置下一个多事务处理偏移量
安全值可以通过在 $PGDATA/pg_multixact/members 目录中查找数值最大的文件名,+1,然后乘以52352(0xCC80)来确定。文件名为十六进制

[postgres@localhost members]$ ls
0000

-O 0xCC80

-x:设置下一个事务ID
安全值可以通过在$PGDATA/pg_xact目录中查找数值最大的文件名,+1,然后乘以1048576(0×100000)来确定。请注意,文件名是十六进制

[postgres@localhost pg_xact]$ ls -ltr
total 504
-rw------- 1 postgres postgres 262144 Mar  8 01:53 0000
-rw------- 1 postgres postgres 253952 Mar  9 10:54 0001

-x 0x200000

删除postmaster.pid文件
如果正常关闭库,该文件会被自动删除,异常关闭的才需要处理

[postgres@localhost pg_wal]$ pg_resetwal -l 00000001000000000000008F -m 0x10000,0x10000 -O 0xCC80 -x 0x200000 -f $PGDATA
pg_resetwal: error: lock file "postmaster.pid" exists
pg_resetwal: hint: Is a server running?  If not, delete the lock file and try again.
[postgres@localhost pg_wal]$ cd ../
[postgres@localhost data]$ ls -l postmaster.pid
-rw------- 1 postgres postgres 75 Mar  9 11:02 postmaster.pid
[postgres@localhost data]$ rm -rf postmaster.pid

touch pg_control文件

[postgres@localhost data]$ pg_resetwal -l 00000001000000000000008F -m 0x10000,0x10000 -O 0xCC80 -x 0x200000 -f $PGDATA
pg_resetwal: error: could not open file "global/pg_control" for reading: No such file or directory
pg_resetwal: hint: If you are sure the data directory path is correct, execute
  touch global/pg_control
and try again.
[postgres@localhost data]$ touch global/pg_control

重建pg_control

[postgres@localhost data]$ pg_resetwal -l 00000001000000000000008F -m 0x10000,0x10000 -O 0xCC80 -x 0x200000 -f $PGDATA
pg_resetwal: warning: pg_control exists but is broken or wrong version; ignoring it
Write-ahead log reset
[postgres@localhost data]$ pg_ctl start
waiting for server to start....
 done
server started

pg启动报invalid checkpoint record处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:pg启动报invalid checkpoint record处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

pg库启动报PANIC: could not locate a valid checkpoint record错误

2025-03-09 10:59:10.365 EDT [73013] LOG:  starting PostgreSQL 16.8 on x86_64-pc-linux-gnu, 
                                          compiled by gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), 64-bit
2025-03-09 10:59:10.365 EDT [73013] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2025-03-09 10:59:10.365 EDT [73013] LOG:  listening on IPv6 address "::", port 5432
2025-03-09 10:59:10.367 EDT [73013] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-03-09 10:59:10.401 EDT [73018] LOG:  database system was interrupted; last known up at 2025-03-09 10:54:43 EDT
2025-03-09 10:59:11.506 EDT [73018] LOG:  invalid checkpoint record
2025-03-09 10:59:11.508 EDT [73018] PANIC:  could not locate a valid checkpoint record
2025-03-09 10:59:12.004 EDT [73013] LOG:  startup process (PID 73018) was terminated by signal 6: Aborted
2025-03-09 10:59:12.004 EDT [73013] LOG:  aborting startup due to startup process failure
2025-03-09 10:59:12.006 EDT [73013] LOG:  database system is shut down

从报错信息中看,是有无法读取到有效的checkpoint记录导致,初步怀疑是wal异常,检查pg_wal目录,发现wal日志为空

[root@localhost pg_wal]# ls -ltr
total 16388
drwx------  2 postgres postgres        6 Mar  6 08:10 archive_status
[root@localhost pg_wal]# 

进一步检查系统操作命令rm删除wal日志命令

cd pg_wal
rm -rf 0000000*

初步确认是由于wal日志被意外删除导致pg库无法启动,尝试重置wal,由于库不是干净关闭,无法直接重置成功

[postgres@localhost log]$ pg_resetwal $PGDATA
The database server was not shut down cleanly.
Resetting the write-ahead log might cause data to be lost.
If you want to proceed anyway, use -f to force reset.

使用pg_resetwal -f强制重置

[postgres@localhost log]$ pg_resetwal -f $PGDATA
Write-ahead log reset

尝试启动pg库成功

[postgres@localhost log]$ pg_ctl start 
waiting for server to start....2025-03-09 11:02:02.609 EDT [73088] LOG:  
redirecting log output to logging collector process
2025-03-09 11:02:02.609 EDT [73088] HINT:  Future log output will appear in directory "log".
 done
server started
2025-03-09 11:02:02.609 EDT [73088] LOG:  starting PostgreSQL 16.8 on x86_64-pc-linux-gnu, 
                                          compiled by gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), 64-bit
2025-03-09 11:02:02.609 EDT [73088] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2025-03-09 11:02:02.609 EDT [73088] LOG:  listening on IPv6 address "::", port 5432
2025-03-09 11:02:02.610 EDT [73088] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-03-09 11:02:02.645 EDT [73092] LOG:  database system was shut down at 2025-03-09 11:01:53 EDT
2025-03-09 11:02:02.650 EDT [73088] LOG:  database system is ready to accept connections

pg误删除数据恢复(PostgreSQL delete数据恢复)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:pg误删除数据恢复(PostgreSQL delete数据恢复)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

应用维护人员通过应用程序误删除了一些数据,希望对其进行恢复.通过咨询确认是PostgreSQL数据,wal和arch文件都在,取客户删除数据时间点相关wal文件
QQ20250306-231208


然后让客户提供具体涉及误删除的相关的库和表信息
QQ20250306-230846

通过工具解析wal日志,比较顺利的获取到了undo sql语句

Switch wal to 0000000300000525000000B0 on time 2025-03-05 22:23:14.263661+08
Switch wal to 0000000300000525000000B1 on time 2025-03-05 22:23:14.423082+08
Switch wal to 0000000300000525000000B2 on time 2025-03-05 22:23:15.983833+08
Switch wal to 0000000300000525000000B3 on time 2025-03-05 22:23:17.802107+08
Switch wal to 0000000300000525000000B4 on time 2025-03-05 22:23:18.942125+08
Switch wal to 0000000300000525000000B5 on time 2025-03-05 22:23:20.293585+08
Switch wal to 0000000300000525000000B6 on time 2025-03-05 22:23:21.531484+08
Switch wal to 0000000300000525000000B7 on time 2025-03-05 22:23:24.217501+08
Switch wal to 0000000300000525000000B8 on time 2025-03-05 22:23:27.504164+08
Switch wal to 0000000300000525000000B9 on time 2025-03-05 22:23:29.260754+08
Switch wal to 0000000300000525000000BA on time 2025-03-05 22:23:33.544486+08
Switch wal to 0000000300000525000000BB on time 2025-03-05 22:23:35.568116+08
Switch wal to 0000000300000525000000BC on time 2025-03-05 22:23:37.329659+08
Switch wal to 0000000300000525000000BD on time 2025-03-05 22:23:38.971834+08
Switch wal to 0000000300000525000000BE on time 2025-03-05 22:23:39.922353+08
Switch wal to 0000000300000525000000BF on time 2025-03-05 22:23:40.897294+08

QQ20250306-231657


然后把这些sql语句反向插入到生产库,完成这些误操作数据的恢复
QQ20250306-231935

如果此类的数据库(oracle,mysql,sql server,PostgreSQL)等恢复请求,需要专业恢复技术支持,请联系我们:
电话/微信:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

PostgreSQL表文件损坏恢复—pdu恢复损坏的表文件

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:PostgreSQL表文件损坏恢复—pdu恢复损坏的表文件

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在某些情况下,由于PostgreSQL表文件损坏导致无法正常访问,可以通过pdu把好的block中的数据恢复出来
准备一张测试表,里面有97条记录

his5_dms=# \d hiscrm.t_sys_oper_log;
                                            Table "hiscrm.t_sys_oper_log"
       Column       |          Type          | Collation | Nullable |Default                     
--------------------+------------------------+-----------+----------+------------
 id                 | bigint                 |           | not null | 
 module             | character varying(50)  |           |          | 
 title              | character varying(50)  |           |          | 
 alias              | character varying(50)  |           |          | 
 business_type      | integer                |           |          | 0
 method             | character varying(200) |           |          | 
 request_method     | character varying(10)  |           |          | 
 operator_type      | integer                |           |          | 0
 oper_name          | character varying(50)  |           |          | 
 dept_name          | character varying(50)  |           |          | 
 oper_url           | character varying(255) |           |          | 
 oper_ip            | character varying(50)  |           |          | 
 oper_location      | character varying(255) |           |          | 
 oper_param         | text                   |           |          | 
 json_result        | text                   |           |          | 
 status             | integer                |           |          | 0
 error_msg          | text                   |           |          | 
 oper_time          | date                   |           |          | 
 create_id          | bigint                 |           |          | 
 create_time        | bigint                 |           |          | 
 clinic_id          | bigint                 |           |          | 
 group_id           | bigint                 |           |          | 
 patient_id         | bigint                 |           |          | 
 is_patient_related | integer                |           |          | 
 business_content   | json                   |           |          | 

his5_dms=# select count(1) from hiscrm.t_sys_oper_log;
 count 
-------
    97
(1 row)

查询表对应的具体文件

his5_dms=# SELECT oid,relfilenode FROM pg_class WHERE relname='t_sys_oper_log';
  oid  | relfilenode 
-------+-------------
 16850 |       16850
(1 row)

his5_dms=# SELECT pg_relation_filepath('hiscrm.t_sys_oper_log');
 pg_relation_filepath 
----------------------
 base/16386/16850
(1 row)

his5_dms=# SHOW data_directory;
     data_directory     
------------------------
 /var/lib/pgsql/12/data
(1 row)

使用dd对文件进行破坏

[postgres@xifenfeidg ~]$ ls -l  /var/lib/pgsql/12/data/base/16386/16850
-rw-------. 1 postgres postgres 90112 Sep  5 20:26 /var/lib/pgsql/12/data/base/16386/16850
[postgres@xifenfeidg ~]$ dd if=/dev/zero of=/var/lib/pgsql/12/data/base/16386/16850 bs=512 count=1 conv=notrunc
1+0 records in
1+0 records out
512 bytes copied, 0.000158756 s, 3.2 MB/s

重启pg库

[postgres@xifenfeidg bin]$ ./pg_ctl -m fast -D /var/lib/pgsql/12/data/ stop
waiting for server to shut down.... done
server stopped
[postgres@xifenfeidg bin]$ ./pg_ctl -D /var/lib/pgsql/12/data/ start
waiting for server to start....2025-03-02 19:02:11.395 HKT [64515] LOG:  
  starting PostgreSQL 12.20 on x86_64-pc-linux-gnu, 
  compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit
2025-03-02 19:02:11.396 HKT [64515] LOG:  listening on IPv6 address "::1", port 5432
2025-03-02 19:02:11.396 HKT [64515] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-03-02 19:02:11.396 HKT [64515] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-03-02 19:02:11.397 HKT [64515] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-03-02 19:02:11.403 HKT [64515] LOG:  redirecting log output to logging collector process
2025-03-02 19:02:11.403 HKT [64515] HINT:  Future log output will appear in directory "log".
 done
server started

查询数据报错

[postgres@xifenfeidg bin]$ psql
psql (16.8, server 12.20)
Type "help" for help.

postgres=# \c his5_dms;
psql (16.8, server 12.20)
You are now connected to database "his5_dms" as user "postgres".
his5_dms=#  select count(1) from hiscrm.t_sys_oper_log;
ERROR:  invalid page in block 0 of relation base/16386/16850

通过pdu进行恢复
跳过了坏块,把好的block中数据均恢复出来

his5_dms.hiscrm=# unload tab t_sys_oper_log;
正在解析表 <t_sys_oper_log>. 已解析数据页: 0, 已解析数据: 0 条
        |-块号0 空页面或页面已损坏,已跳过
正在解析表 <t_sys_oper_log>. 已解析数据页: 11, 已解析数据: 86 条
表名<t_sys_oper_log>-</var/lib/pgsql/12/data/base/16386/16850> 
    解析完成, 11 个数据页 ,共计 86 条数据. 成功 86 条; 失败【0】条 
COPY文件路径为:<his5_dms/hiscrm/t_sys_oper_log.csv>

导入pg库中

his5_dms=# truncate table hiscrm.t_sys_oper_log;
TRUNCATE TABLE
his5_dms=# \i his5_dms/COPY/hiscrm_copy.sql
SET
COPY 86
his5_dms=# 

PostgreSQL恢复工具—pdu恢复单个表文件

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:PostgreSQL恢复工具—pdu恢复单个表文件

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在某些情况下,比如我们需要对单个的PostgreSQL库的表文件进行恢复(比如文件系统损坏,drop库/表,truncate表等原因,然后找到了部分oid文件),可以使用pdu对其进行完美恢复(相比pg_filedump也方便很多),具体操作步骤:
1. 由于只有单个表文件,无法获取字典信息,因此需要应用厂商/客户提供具体表创建语句

his5_dms=#    CREATE TABLE t_xff (
his5_dms(#         id                       bigint,
his5_dms(#         hospital_id              bigint,
his5_dms(#         parent_id                bigint,
his5_dms(#         disease_code             varchar(60),
his5_dms(#         disease_name             varchar(60),
his5_dms(#         type                     smallint,
his5_dms(#         py                       varchar(60),
his5_dms(#         wb                       varchar(60),
his5_dms(#         sc                       varchar(20),
his5_dms(#         order_no                 int,
his5_dms(#         state                    smallint,
his5_dms(#         create_datetime          timestamp(6),
his5_dms(#         create_id                bigint,
his5_dms(#         edit_datetime            timestamp(6),
his5_dms(#         edit_id                  bigint,
his5_dms(#         search_path              varchar(300),
his5_dms(#         diagnosis_sort           int,
his5_dms(#         category_name            varchar(40),
his5_dms(#         input_option             varchar(40),
his5_dms(#         category_class           smallint,
his5_dms(#         memo1                    varchar(300),
his5_dms(#         memo2                    varchar(300),
his5_dms(#         other_code               varchar(60),
his5_dms(#         other_name               varchar(60),
his5_dms(#         special_disease_flag     smallint
his5_dms(#    );
CREATE TABLE

2. 把oid文件pdu放到restore库中

[root@xifenfeidg public]# pwd
/tmp/pdu/restore/public
[root@xifenfeidg public]# ls -l
total 7144
-rw-r--r--. 1 root root 7315456 Mar  2 21:04 123456
[root@xifenfeidg public]# 

3. 使用add语句在pdu加载数据类型

restore.public=# add 123456 t_xff bigint,bigint,bigint,varchar,varchar,smallint,varchar,varchar,varchar,
int,smallint,timestamp,bigint,timestamp,bigint,varchar,int,varchar,varchar,
smallint,varchar,varchar,varchar,varchar,smallint;
添加完成,请用\dt;查看可unload的表
restore.public=# \dt;
|--------------------------------------------------|
|               表名                  |  表大小    |
|--------------------------------------------------|
|    t_xff                            |  6.98 MB   |
|--------------------------------------------------|

        仅显示表大小排名前 1 的表名

4.使用pdu恢复表数据

restore.public=# unload t_xff;
正在解析表 <t_xff>. 已解析数据页: 893, 已解析数据: 46998 条
<t_xff>-<restore/public/123456> 解析完成, 894 个数据页 ,共计 46998 条数据. 成功 46998 条; 失败【0】条 
 COPY文件路径为:<restore/public/t_xff.csv>
restore.public=# unload COPY;

COPY命令导出完成, 文件路径: restore/COPY/public_copy.sql

5.导入数据到pg库中

his5_dms=# \i restore/COPY/public_copy.sql
SET
COPY 46998
his5_dms=# select count(1) from t_xff;
 count 
-------
 46998
(1 row)
his5_dms=# \x
Expanded display is on.
his5_dms=# select * from t_xff limit 1;
-[ RECORD 1 ]--------+---------------------------
id                   | 323839
hospital_id          | 0
parent_id            | 301
disease_code         | 57.8900x003
disease_name         | 腹腔镜下膀胱颈悬吊术
type                 | 2
py                   | fqjxpgjxds
wb                   | eeqgeeceks
sc                   | 
order_no             | 0
state                | 1
create_datetime      | 2022-09-29 15:22:58.588492
create_id            | 
edit_datetime        | 
edit_id              | 
search_path          | 301,
diagnosis_sort       | 
category_name        | 
input_option         | 
category_class       | 3
memo1                | 
memo2                | 
other_code           | 
other_name           | 
special_disease_flag | 0