Parse location, person name, and date from a string with NLTK

I have a lot of strings like the following:

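ISLAMABAD: Chief Justice Iftikhar Muhammad Chaudhry said that National Accountab

KARACHI, July 24 -- Police claimed to have arrested several suspects in separate

ALUM KULAM, Sri Lanka -- As gray-bellied clouds started to blot out the scorchin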

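I am trying to use NLTK to strip the dateline part and to recognize the date, location, and person name.

With POS tagging I can find the parts of speech, but I need to identify the location, the date, and the person name. How can I do that?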

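Update:

Note: I don't want to make another HTTP request. I need to parse the text with my own code. If there is a library for this, using it is fine.

Update:

I tried ne_chunk, but no luck.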

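import nltk

# Requires the NLTK tokenizer, tagger, and NE chunker models (install them with nltk.download()).

def pchunk(t):
    """Tokenize, POS-tag, and NE-chunk one sentence, then print the resulting tree."""
    w_tokens = nltk.word_tokenize(t)
    pt = nltk.pos_tag(w_tokens)
    ne = nltk.ne_chunk(pt)
    print(ne)

# txts is a list of those 3 sentences.
txts = [
    "ISLAMABAD: Chief Justice Iftikhar Muhammad Chaudhry said that National Accountab",
    "KARACHI, July 24 -- Police claimed to have arrested several suspects in separate",
    "ALUM KULAM, Sri Lanka -- As gray-bellied clouds started to blot out the scorchin",
]
for t in txts:
    print(t)
    pchunk(t)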

The output is as follows:

ISLAMABAD: Chief Justice Iftikhar Muhammad Chaudhry said that National Accountab

(S
  ISLAMABAD/NNP
  :/:
  Chief/NNP
  Justice/NNP
  (PERSON Iftikhar/NNP Muhammad/NNP Chaudhry/NNP)
  said/VBD
  that/IN
  (ORGANIZATION National/NNP Accountab/NNP))

KARACHI, July 24 -- Police claimed to have arrested several suspects in separate

(S
  (GPE KARACHI/NNP)
  ,/,
  July/NNP
  24/CD
  --/:
  Police/NNP
  claimed/VBD
  to/TO
  have/VB
  arrested/VBN
  several/JJ
  suspects/NNS
  in/IN
  separate/JJ)

ALUM KULAM, Sri Lanka -- As gray-bellied clouds started to blot out the scorchin

(S
  (GPE ALUM/NN)
  (ORGANIZATION KULAM/NN)
  ,/,
  (PERSON Sri/NNP Lanka/NNP)
  --/:
  As/IN
  gray-bellied/JJ
  clouds/NNS
  started/VBN
  to/TO
  blot/VB
  out/RP
  the/DT
  scorchin/NN)
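
Rather than printing the tree, the chunked result can be walked to collect the labelled entities as plain pairs. The following is only a minimal sketch: collect_entities is a hypothetical helper name, and it assumes the NLTK 3 Tree API, where a subtree's label is read with label() (older releases used the .node attribute). The sample tree is built by hand from the KARACHI output so that no models are needed to try it. It does not make the chunker any more accurate; it only exposes what the chunker decided.

```python
from nltk import Tree

def collect_entities(tree):
    """Return (label, text) pairs for every named-entity chunk directly under the sentence."""
    entities = []
    for node in tree:
        # Plain tokens are (word, tag) tuples; entity chunks are nested Tree objects.
        if isinstance(node, Tree):
            words = [word for word, _tag in node.leaves()]
            entities.append((node.label(), " ".join(words)))
    return entities

# Hand-built stand-in for the start of the ne_chunk result of the KARACHI sentence.
sample = Tree("S", [
    Tree("GPE", [("KARACHI", "NNP")]),
    (",", ","),
    ("July", "NNP"),
    ("24", "CD"),
])
print(collect_entities(sample))  # [('GPE', 'KARACHI')]
```

In pchunk, the same helper could be called on the value returned by nltk.ne_chunk(pt) in place of the print.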

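Look at the chunker output. KARACHI is recognized correctly as a GPE, but Sri Lanka is recognized as a PERSON, and ISLAMABAD is only tagged NNP instead of being recognized as a GPE.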
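
Since the chunker leaves "July 24" untagged and mislabels the dateline places, one local fallback (no HTTP request) is to split the dateline off with a regular expression before running any entity recognition. This is a plain regex heuristic, not an NLTK feature. The name split_dateline and both patterns are assumptions made for illustration, and they only cover the three shapes shown in the question: a place in capitals, an optional second part after a comma, then ":" or "--".

```python
import re

# Assumed dateline shape: UPPERCASE PLACE[, second part] followed by ":" or "--".
DATELINE = re.compile(
    r"^(?P<place>[A-Z][A-Z ]+?)"    # place in capitals, e.g. "ALUM KULAM"
    r"(?:,\s*(?P<rest>[^:]+?))?"    # optional part after a comma: a date or a country
    r"\s*(?::|--)\s*"               # separator between the dateline and the story
    r"(?P<body>.*)$"
)
# Assumed date shape: month name (full or abbreviated) followed by a day number.
MONTH_DAY = re.compile(
    r"^(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+\d{1,2}$"
)

def split_dateline(text):
    """Split a wire-style sentence into location, date, and remaining body."""
    match = DATELINE.match(text)
    if match is None:
        # No recognizable dateline: return the text unchanged as the body.
        return {"location": None, "date": None, "body": text}
    location, date = match.group("place"), None
    rest = match.group("rest")
    if rest:
        if MONTH_DAY.match(rest):
            date = rest
        else:
            # Anything that is not a month/day pair is treated as part of the place.
            location = location + ", " + rest
    return {"location": location, "date": date, "body": match.group("body")}

print(split_dateline("KARACHI, July 24 -- Police claimed to have arrested several suspects"))
# {'location': 'KARACHI', 'date': 'July 24', 'body': 'Police claimed to have arrested several suspects'}
```

The body that remains can then be passed to the tokenizer, tagger, and ne_chunk to look for person names, which is the part the chunker handled correctly in the ISLAMABAD sentence.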