PostgreSQL: How do I gracefully kill a stale postgres server process?

Occasionally in our lab, our postgres 8.3 database gets orphaned from its pid file, and we get this message when trying to shut down the database:

Error: pid file is invalid, please manually kill the stale server process postgres

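When this happens, we immediately do a pg_dump so that we can restore the database later. But if we just kill -9 the orphaned postgres process and then start the server again, the database comes up with only the data from the last successful shutdown. If you connect with psql before killing it, however, the data is all available, which is why the pg_dump works.

Is there a way to gracefully shut down the orphaned postgres process so that we don't have to go through the pg_dump and restore? Or is there a way to have the database recover after the orphaned process has been killed?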

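According to the documentation, you can send either SIGTERM or SIGQUIT. SIGTERM is preferred. Either way, never use SIGKILL (as you know from personal experience).

Edit: on the other hand, what you are experiencing is not normal and may indicate a misconfiguration or a bug. Please ask for help on the pgsql-admin mailing list.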
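The SIGTERM-first advice above can be sketched as a small helper. This is only an illustration: the function name `stop_gracefully` and the timeout are assumptions, not something from the PostgreSQL tooling. It sends SIGTERM, waits, and escalates to SIGQUIT if the process is still there. It never sends SIGKILL.

```shell
#!/bin/bash
# Sketch: ask a process to exit with SIGTERM, wait up to a timeout, then
# escalate to SIGQUIT. SIGKILL is deliberately never sent.
# stop_gracefully and its default timeout are hypothetical, for illustration.
stop_gracefully() {
    local pid="$1"
    local timeout="${2:-30}"
    local waited=0

    # If the signal cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still running after ${timeout}s, sending SIGQUIT" >&2
            kill -QUIT "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For a postmaster, SIGTERM requests a smart shutdown, which waits for clients to disconnect, so a generous timeout is probably appropriate. With a stale pid file, the PID would have to come from `ps` rather than from `postmaster.pid`.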

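Never use kill -9.

I would strongly advise you to figure out exactly how this happens. Where exactly does the error message come from? It is not a PostgreSQL error message. Are you by any chance mixing different ways of starting and stopping the server (initscripts sometimes and pg_ctl at other times)? That could cause things to get out of sync.

But to answer the direct question: use a regular kill (no -9) on the process to shut it down. If more than one postgres process is running, make sure you kill all of them.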
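To see whether more than one server is running, it can help to separate the top-level postgres processes (the postmasters) from their child backends. The following is a rough sketch under that assumption. `top_level_pids` is a hypothetical helper that reads "PID PPID" pairs and prints the PIDs whose parent is not itself in the list.

```shell
#!/bin/bash
# Sketch: read "PID PPID" pairs on stdin and print every PID whose parent
# is not one of the listed PIDs. For postgres these should be the
# postmaster processes; backends have a postmaster as their parent.
# top_level_pids is a hypothetical name used for illustration.
top_level_pids() {
    awk '
        { pid[NR] = $1; ppid[NR] = $2; seen[$1] = 1 }
        END {
            for (i = 1; i <= NR; i++)
                if (!(ppid[i] in seen))
                    print pid[i]
        }
    '
}
```

With procps `ps`, the input could come from `ps -o pid=,ppid= -C postgres | top_level_pids`. More than one line of output would suggest more than one server instance, each of which needs its own plain `kill`.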

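The database always performs automatic recovery when it starts up after an unclean shutdown. This should happen after kill -9 as well: any data that was committed should be there. It almost sounds as if you have two different data directories mounted on top of each other or something similar. That has been a known issue with NFS before, at least.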


I use a script like the following, run from cron every minute.

#!/bin/bash

DB="YOUR_DB"

# Here's a snippet to watch how long each connection to the db has been open:
# watch -n 1 'ps -o pid,cmd,etime -C postgres | grep $DB'

# This program kills any postgres workers/connections to the specified database
# which have been running for 2 or 3 minutes. It actually kills workers which
# have an elapsed time including "02:" or "03:". That'll be anything running
# for at least 2 minutes and less than 4. It'll also cover anything that
# managed to stay around until an hour and 2 or 3 minutes, etc.
#
# Run this once a minute via cron and it should catch any connection open
# between 2 and 3 minutes. You can temporarily disable it if you need to run
# a long connection once in a while.
#
# The check for "03:" is in case there's a little lag starting the cron job and
# the timing is really bad and it never sees a worker in the 1 minute window
# when it's got "02:".
old=$(ps -o pid,cmd,etime -C postgres | grep "$DB" | grep -E '0[23]:')
if [ -n "$old" ]; then
    echo "Killing:"
    echo "$old"
    # Plain kill sends SIGTERM, never SIGKILL.
    echo "$old" | awk '{print $1}' | xargs -I {} kill {}
fi
fi
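The pattern match on "02:" and "03:" above also fires on other parts of the line, for example a worker that has been up for 2 hours ("02:15:30"). One possible refinement, offered only as a sketch, is to convert the `etime` value to seconds and compare numerically. `etime_to_seconds` is a hypothetical helper name. The input format `[[DD-]hh:]mm:ss` is the one documented for `ps`.

```shell
#!/bin/bash
# Sketch: convert a ps etime value ([[DD-]hh:]mm:ss) to whole seconds.
# etime_to_seconds is a hypothetical helper, not part of the script above.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        n = NF
        s = $n                        # seconds are always the last field
        m = $(n - 1)                  # minutes are always second to last
        h = (n >= 3) ? $(n - 2) : 0   # hours only if present
        d = (n >= 4) ? $(n - 3) : 0   # days only if present
        print d * 86400 + h * 3600 + m * 60 + s
    }'
}
```

A caller could then kill only the workers whose elapsed time falls in a numeric range, say at least 120 and under 240 seconds, instead of relying on the text pattern.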