How to get all messages in Amazon SQS queue using boto library in Python?

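I'm developing an application whose workflow is managed by passing messages through SQS using boto.

My SQS queue is growing gradually, and I have no way to check how many elements it is supposed to contain.

I have a daemon that periodically polls the queue and checks whether a fixed-size set of elements is present. For example, consider the following "queue":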

q = ["msg1_comp1","msg2_comp1","msg1_comp2","msg3_comp1","msg2_comp2"]

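Now I want to check whether "msg1_comp1", "msg2_comp1" and "msg3_comp1" are all present in the queue at some point in time, but I don't know the size of the queue.

After looking through the API, it seems you can only get either 1 element or a fixed number of elements from the queue, but not all of them: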

>>> rs = q.get_messages()
>>> len(rs)
1
>>> rs = q.get_messages(10)
>>> len(rs)
10

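One suggestion proposed in an answer is to fetch, say, 10 messages in a loop until nothing comes back. However, messages in SQS have a visibility timeout: polling elements from the queue does not really remove them, it only makes them invisible for a short period of time.

Is there a simple way to get all messages in the queue without knowing how many there are?

Call q.get_messages(n) in a while loop: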

all_messages = []
rs = q.get_messages(10)
while len(rs) > 0:
    all_messages.extend(rs)
    rs = q.get_messages(10)

Additionally, dump() won't support more than 10 messages per page either:

def dump(self, file_name, page_size=10, vtimeout=10, sep='\n'):
    """Utility function to dump the messages in a queue to a file
    NOTE: Page size must be < 10 else SQS errors"""

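Related discussion

I can't do that, because messages in SQS have a visibility timeout: if I fetch 10 messages and then loop a few more times, I might get the same 10 messages again because the timeout has already elapsed. I'm considering using dump(), but then I'd have to read the file back afterwards, which seems silly. Am I missing something? (I could set the visibility timeout to a very long time, but that seems ugly.)
@linker - You said you need to check for "n" specific messages. Does that mean there is some matching criterion each message is compared against?
Sorry if that was confusing, I've updated my post.
@linker - According to the reference, the visibility timeout can be as long as 12 hours. Unless you're kicking off a massive EC2 job, I'd guess that would suit your needs? docs.amazonwebservices.com/awssimplequeueservice/2011-10-01/…
@linker - By the way, the number of messages should only be between 1 and 10. If you use anything else, the SQS service should return a ReadCountOutOfRange error.
Relying on time is a bit difficult in this case, because I'm dealing with events I can't control (they come from an external source; if it has problems and stops sending data for a few hours, that could cause trouble). I don't understand why there is a dump method but no get_all_messages; it makes no sense to me.
Good point, I didn't know about ReadCountOutOfRange. Will I still get this error when running dump() if there are more elements?
@linker - Take a look at the boto source for the dump method. See my updated answer.
I just tested this with boto by putting 12 messages in my SQS queue and calling dump(), and it worked fine. I'm using boto 2.1.1.
Even 2.1.1 has these comments, which is odd. I suspect it's a leftover comment, since otherwise it wouldn't work fine when there are more messages.
@linker - Interesting, since according to the public API documentation this should return an error. docs.amazonwebservices.com/awssimplequeueservice/2011-10-01/… I'd suggest this may be undocumented behavior that could go away (they could start enforcing the limit at any time), so you shouldn't rely on it working long term.
Interesting indeed. I'll go with this code to solve the problem, which looks very similar to what dump() does. Thanks!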

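I've been using AWS SQS queues to deliver instant notifications, so I need to process all messages in real time. The following code will help you dequeue (all) messages efficiently and handle any errors when deleting them.

Note: to remove messages from the queue, you have to delete them explicitly. I'm using the newer boto3 AWS Python SDK, the json library, and the following defaults: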

import boto3
import json

region_name = 'us-east-1'
queue_name = 'example-queue-12345'
max_queue_messages = 10
message_bodies = []
aws_access_key_id = '<YOUR AWS ACCESS KEY ID>'
aws_secret_access_key = '<YOUR AWS SECRET ACCESS KEY>'
sqs = boto3.resource('sqs', region_name=region_name,
                     aws_access_key_id=aws_access_key_id,
                     aws_secret_access_key=aws_secret_access_key)
queue = sqs.get_queue_by_name(QueueName=queue_name)
while True:
    messages_to_delete = []
    for message in queue.receive_messages(
            MaxNumberOfMessages=max_queue_messages):
        # process message body
        body = json.loads(message.body)
        message_bodies.append(body)
        # add message to delete
        messages_to_delete.append({
            'Id': message.message_id,
            'ReceiptHandle': message.receipt_handle
        })

    # if you don't receive any notifications the
    # messages_to_delete list will be empty
    if len(messages_to_delete) == 0:
        break
    # delete messages to remove them from SQS queue
    # handle any errors
    else:
        delete_response = queue.delete_messages(
            Entries=messages_to_delete)

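Related discussion

My understanding is that the distributed nature of the SQS service pretty much makes your design unworkable. Every time you call get_messages you are talking to a different set of servers, which will hold some but not all of your messages. So it is not possible to "check in from time to time" to determine whether a particular group of messages is ready, and then accept only those.

What you need to do is poll continuously, take all messages as they arrive, and store them locally in your own data structures. After each successful fetch you can check your data structures to see whether a complete set of messages has been collected.

Keep in mind that messages will arrive out of order, and that some messages will be delivered twice, because deletes have to propagate to all of the SQS servers and a subsequent get request can sometimes beat the delete.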
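
The local-store approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the answerer's code: the MessageCollector class and its method names are hypothetical, and the arrivals are simulated rather than fetched from SQS. Stored messages are keyed by message id so that a message delivered twice is only counted once, and the required set is checked after every arrival.

```python
class MessageCollector:
    """Accumulate messages locally and report when a required set is complete.

    Hypothetical helper for illustration; not part of boto or boto3.
    """

    def __init__(self, required_bodies):
        self.required = set(required_bodies)
        self.seen = {}  # message id -> body, so duplicate deliveries collapse

    def add(self, message_id, body):
        # A redelivered message has the same id and overwrites its own entry.
        self.seen[message_id] = body

    def is_complete(self):
        # True once every required body has been seen at least once.
        return self.required.issubset(self.seen.values())


collector = MessageCollector(["msg1_comp1", "msg2_comp1", "msg3_comp1"])

# Simulated arrivals: out of order, with one duplicate delivery (id "b").
arrivals = [("b", "msg2_comp1"), ("d", "msg1_comp2"), ("b", "msg2_comp1"),
            ("a", "msg1_comp1"), ("c", "msg3_comp1")]

for message_id, body in arrivals:
    collector.add(message_id, body)
    if collector.is_complete():
        break

print(collector.is_complete())  # prints True
```

With a real queue, each message returned by a receive call would be passed to add() using its id and body.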

I run this in a cron job:

from django.core.mail import EmailMessage
from django.conf import settings
import boto3
import json

sqs = boto3.resource('sqs', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                     aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
                     region_name=settings.AWS_REGION)

queue = sqs.get_queue_by_name(QueueName='email')
messages = queue.receive_messages(MaxNumberOfMessages=10, WaitTimeSeconds=1)

while len(messages) > 0:
    for message in messages:
        mail_body = json.loads(message.body)
        print("E-mail sent to: %s" % mail_body['to'])
        email = EmailMessage(mail_body['subject'], mail_body['message'], to=[mail_body['to']])
        email.send()
        message.delete()

    messages = queue.receive_messages(MaxNumberOfMessages=10, WaitTimeSeconds=1)

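Note: this is not intended as a direct answer to the question. Rather, it is an addition to @timothyliu's answer, assuming the end user is using the boto package (i.e. boto 2) rather than boto3. This code is a "boto-2-ization" of the delete_messages call mentioned in his answer. A boto (2) call to delete_message_batch(messages_to_delete), where messages_to_delete is a dict object whose key: value pairs correspond to id: receipt_handle, returns:

AttributeError: 'dict' object has no attribute 'id'.

It appears that delete_message_batch expects Message class objects. Copying the boto source for delete_message_batch and allowing it to use non-Message objects (à la boto3) also fails if you delete more than 10 "messages" at a time. So I had to use the following workaround.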

from __future__ import print_function
import sys
from itertools import islice

from boto.sqs.batchresults import BatchResults


def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)


def static_vars(**kwargs):
    # Decorator used below but not shown in the original answer; this is the
    # usual recipe that attaches the keyword arguments as function attributes.
    def decorate(func):
        for name, value in kwargs.items():
            setattr(func, name, value)
        return func
    return decorate


@static_vars(counter=0)
def take(n, iterable, reset=False):
    "Return next n items of the iterable as same type"
    if reset: take.counter = 0
    take.counter += n
    bob = islice(iterable, take.counter-n, take.counter)
    if isinstance(iterable, dict): return dict(bob)
    elif isinstance(iterable, list): return list(bob)
    elif isinstance(iterable, tuple): return tuple(bob)
    elif isinstance(iterable, set): return set(bob)
    else: return bob


def delete_message_batch2(cx, queue, messages):
    """
    Deletes a list of messages from a queue, at most 10 per request.
    Returns a string reflecting the level of success rather than throwing
    an exception or returning True/False.
    :param cx: A boto connection object.
    :param queue: The :class:`boto.sqs.queue.Queue` from which the messages will be deleted
    :param messages: List of dicts with 'id' and 'receipt_handle' keys.
    """
    listof10s = []
    asSuc, asErr, acS, acE = "", "", 0, 0
    it = tuple(enumerate(messages))
    params = {}
    # split the messages into groups of at most 10
    tenmsg = take(10, it, True)
    while len(tenmsg) > 0:
        listof10s.append(tenmsg)
        tenmsg = take(10, it)
    # send one DeleteMessageBatch request per group
    while len(listof10s) > 0:
        tenmsg = listof10s.pop()
        params.clear()
        for i, msg in tenmsg:
            prefix = 'DeleteMessageBatchRequestEntry'
            numb = (i % 10) + 1
            p_name = '%s.%i.Id' % (prefix, numb)
            params[p_name] = msg.get('id')
            p_name = '%s.%i.ReceiptHandle' % (prefix, numb)
            params[p_name] = msg.get('receipt_handle')
        try:
            go = cx.get_object('DeleteMessageBatch', params, BatchResults, queue.id, verb='POST')
            # tup_result_messages is a helper that the original answer does not show
            (sSuc, cS), (sErr, cE) = tup_result_messages(go)
            if cS:
                asSuc += "," + sSuc
                acS += cS
            if cE:
                asErr += "," + sErr
                acE += cE
        except cx.ResponseError:
            eprint("Error in batch delete for queue {}({})\nParams ({}) list: {}".format(queue.name, queue.id, len(params), params))
        except Exception:
            eprint("Error of unknown type in batch delete for queue {}({})\nParams ({}) list: {}".format(queue.name, queue.id, len(params), params))
    return stringify_final_tup(asSuc, asErr, acS, acE, expect=len(messages))


def stringify_final_tup(sSuc="", sErr="", cS=0, cE=0, expect=0):
    if sSuc == "": sSuc = "None"
    if sErr == "": sErr = "None"
    if cS == expect: sSuc = "All"
    if cE == expect: sErr = "All"
    return "Up to {} messages removed [{}]\t\tMessages remaining ({}) [{}]".format(cS, sSuc, cE, sErr)
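
The core of the workaround above is splitting the messages into groups of at most 10 before each delete request. A simpler way to express just that batching step might look like the sketch below. The helper names chunked and build_delete_entries are hypothetical, the sample data is made up, and the entry layout ('Id' and 'ReceiptHandle' keys) follows the boto3 example earlier in this thread rather than boto 2.

```python
def chunked(items, size=10):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_delete_entries(messages):
    """Group (id, receipt_handle) pairs into batches of delete entries.

    Hypothetical helper: returns a list of batches, each a list of dicts
    with 'Id' and 'ReceiptHandle' keys and at most 10 entries.
    """
    return [
        [{"Id": mid, "ReceiptHandle": handle} for mid, handle in batch]
        for batch in chunked(messages, 10)
    ]


# Made-up sample data: 23 fake (id, receipt_handle) pairs.
sample = [("id%d" % i, "handle%d" % i) for i in range(23)]
batches = build_delete_entries(sample)
print([len(b) for b in batches])  # prints [10, 10, 3]
```

Each batch could then be handed to a single batch-delete call, for example queue.delete_messages(Entries=batch) with boto3.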

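Something like the code below should do the trick. Sorry it's in C#, but it shouldn't be hard to convert to Python. The dictionary is used to weed out duplicates.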

public Dictionary<string, Message> GetAllMessages(int pollSeconds)
{
    var msgs = new Dictionary<string, Message>();
    var end = DateTime.Now.AddSeconds(pollSeconds);

    while (DateTime.Now <= end)
    {
        var request = new ReceiveMessageRequest(Url);
        request.MaxNumberOfMessages = 10;

        var response = GetClient().ReceiveMessage(request);

        foreach (var msg in response.Messages)
        {
            if (!msgs.ContainsKey(msg.MessageId))
            {
                msgs.Add(msg.MessageId, msg);
            }
        }
    }

    return msgs;
}
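
A rough Python equivalent of the C# method above might look like this. It is a sketch under assumptions rather than a tested SQS client: receive is a hypothetical zero-argument callable that returns a list of objects with a message_id attribute, which with boto3 could be something like lambda: queue.receive_messages(MaxNumberOfMessages=10).

```python
import time


def get_all_messages(receive, poll_seconds):
    """Poll `receive` until the deadline, keeping one entry per message id.

    `receive` is an assumed callable (not a boto API) that returns a list of
    message objects exposing a `message_id` attribute.
    """
    messages = {}
    deadline = time.monotonic() + poll_seconds
    while time.monotonic() <= deadline:
        for msg in receive():
            # setdefault keeps the first copy seen, like the ContainsKey check.
            messages.setdefault(msg.message_id, msg)
    return messages
```

As in the C# version, the function keeps polling for the whole interval even when the queue appears empty, so poll_seconds bounds how long the call takes.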