InterfaceError: connection already closed (using django + celery + Scrapy)

I get this error when running a Scrapy parsing function (which can sometimes take up to 10 minutes) inside a Celery task.

I'm using:
- Django == 1.6.5
- django-celery == 3.1.16
- celery == 3.1.16
- psycopg2 == 2.5.5 (I have also tried psycopg2 == 2.5.4)

[2015-07-19 11:27:49,488: CRITICAL/MainProcess] Task myapp.parse_items[63fc40eb-c0d6-46f4-a64e-acce8301d29a] INTERNAL ERROR: InterfaceError('connection already closed',)
Traceback (most recent call last):
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/celery/app/trace.py", line 284, in trace_task
    uuid, retval, SUCCESS, request=task_request,
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/celery/backends/base.py", line 248, in store_result
    request=request, **kwargs)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/backends/database.py", line 29, in _store_result
    traceback=traceback, children=self.current_task_children(request),
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 42, in _inner
    return fun(*args, **kwargs)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 181, in store_result
    'meta': {'children': children}})
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 87, in update_or_create
    return get_queryset(self).update_or_create(**kwargs)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/djcelery/managers.py", line 70, in update_or_create
    obj, created = self.get_or_create(**kwargs)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 376, in get_or_create
    return self.get(**lookup), False
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 304, in get
    num = len(clone)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__
    self._fetch_all()
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator
    for row in compiler.results_iter():
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 785, in execute_sql
    cursor = self.connection.cursor()
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 160, in cursor
    cursor = self.make_debug_cursor(self._cursor())
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 134, in _cursor
    return self.create_cursor()
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 134, in _cursor
    return self.create_cursor()
  File "/home/mo/Work/python/pb-env/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 137, in create_cursor
    cursor = self.connection.cursor()
InterfaceError: connection already closed


Unfortunately this is a problem with the django + psycopg2 + celery combo.
It's an old and unsolved problem.

Take a look at this thread to understand:
https://github.com/celery/django-celery/issues/121

Basically, when celery starts a worker, it forks a database connection
from the django.db framework. If this connection drops for some reason,
it doesn't create a new one. Celery can do nothing about this problem,
since there is no way to detect when the database connection is dropped
using the django.db libraries. Django doesn't notify you when it happens,
because it just starts a connection when it receives a wsgi call (there
is no connection pool). I had the same problem in a huge production
environment with a lot of worker machines, and sometimes these machines
lost connectivity with the postgres server.
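
To make the failure mode concrete, here is a minimal sketch of what a long-lived worker experiences once the underlying psycopg2 connection dies (the Item model and myapp module are hypothetical): django.db keeps handing out the dead connection wrapper instead of reconnecting.

from django.db import connection
from myapp.models import Item  # hypothetical model

def demonstrate_dropped_connection():
    list(Item.objects.all()[:1])   # works: django opens a connection lazily
    connection.connection.close()  # simulate the server dropping the link
    # The wrapper still holds the dead psycopg2 connection, so the next
    # query fails with: InterfaceError: connection already closed
    list(Item.objects.all()[:1])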

I solved it by putting each celery master process under a linux
supervisord handler and a watcher, and by implementing a decorator that
handles the psycopg2.InterfaceError: when it happens, the function
dispatches a syscall to force supervisord to gracefully restart the
celery process with SIGINT.
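
For reference, such a decorator could look roughly like this. It is only a sketch of the approach described above: the decorator name is made up, and it assumes the process receiving the signal is the one supervisord is watching.

import os
import signal
from functools import wraps

import psycopg2

def restart_on_interface_error(fun):
    """Hypothetical decorator: on a dead connection, ask supervisord to
    restart us by sending SIGINT to our own celery process."""
    @wraps(fun)
    def wrapper(*args, **kwargs):
        try:
            return fun(*args, **kwargs)
        except psycopg2.InterfaceError:
            # SIGINT triggers celery's warm shutdown; supervisord sees the
            # exit and respawns the worker with a fresh database connection.
            os.kill(os.getpid(), signal.SIGINT)
            raise
    return wrapper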

EDIT:

Found a better solution. I implemented a Celery task base class like this:

from django.db import connection
import celery

class FaultTolerantTask(celery.Task):
    """Implements the after_return hook to close the invalid connection.
    This way, django is forced to serve a new connection for the next
    task.
    """

    abstract = True

    def after_return(self, *args, **kwargs):
        # Close the (possibly dead) connection; django.db will lazily
        # open a fresh one on the next query.
        connection.close()

@celery.task(base=FaultTolerantTask)
def my_task():
    # my database dependent code here
    pass

I believe it will solve your problem as well.
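
A gentler variant is possible on Django >= 1.6, sketched here under the assumption that you only want to drop connections that are actually broken or past CONN_MAX_AGE, rather than unconditionally closing a healthy one:

from django.db import connections
import celery

class FaultTolerantTask(celery.Task):
    abstract = True

    def after_return(self, *args, **kwargs):
        # Only discard connections that are unusable or expired; a healthy
        # connection is kept and reused by the next task.
        for conn in connections.all():
            conn.close_if_unusable_or_obsolete()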


Guys and emanuelcds,

I had the same problem; I have now updated my code and created a new loader for celery:

from djcelery.loaders import DjangoLoader
from django import db

class CustomDjangoLoader(DjangoLoader):
    def on_task_init(self, task_id, task):
        """Called before every task."""
        # Drop any connection that is broken or has outlived CONN_MAX_AGE,
        # so the task starts with a usable one.
        for conn in db.connections.all():
            conn.close_if_unusable_or_obsolete()
        super(CustomDjangoLoader, self).on_task_init(task_id, task)

This of course only applies if you are using djcelery; it also requires something like this in your settings:

import os

CELERY_LOADER = 'myproject.loaders.CustomDjangoLoader'
os.environ['CELERY_LOADER'] = CELERY_LOADER

I still have to test it; I will update the answer with results.