Simple Digit Recognition OCR in OpenCV-Python

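I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both the KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

There is a sample, letter_recog.py, that comes with the OpenCV samples, but I still couldn't figure out how to use it. I don't understand what the samples, responses, etc. are. Also, it loads a txt file first, which I didn't understand at first.

After searching a little, I found a letter_recognition.data file in the cpp samples. I used it and wrote code for cv2.KNearest modeled on letter_recog.py (just for testing):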

import numpy as np
import cv2

fn = 'letter-recognition.data'
# Column 0 is a letter: map 'A'..'Z' to 0..25 so the whole file loads as float32
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
samples, responses = a[:,1:], a[:,0]   # 16 features per row, label per row

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print(results.ravel())

It gave me an array of size 20000, and I don't understand what it is.
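For context on the question about `results.ravel()` below: in the OpenCV 2.4 API used here, `find_nearest` returns one predicted response per input row as an N×1 float32 array, so `results.ravel()` is just that column flattened to N values (20,000 here, because the script classifies every row of the file it trained on). Since the converter mapped 'A'..'Z' to 0..25, each prediction can be mapped back to a letter. The plain-Python sketch below does that on made-up values; `decode` and `accuracy` are hypothetical helpers, not OpenCV functions.

```python
# Hypothetical helpers: turn the float class codes produced by
# find_nearest (0.0 for 'A' ... 25.0 for 'Z') back into letters.
def decode(codes):
    return [chr(int(c) + ord('A')) for c in codes]


def accuracy(predicted, expected):
    """Fraction of positions where the predicted code equals the true code."""
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)


# Made-up stand-ins for results.ravel() and the true labels a[:, 0].
predicted = [19.0, 8.0, 3.0, 13.0]
expected = [19.0, 8.0, 3.0, 2.0]
print(decode(predicted))               # ['T', 'I', 'D', 'N']
print(accuracy(predicted, expected))   # 0.75
```

Comparing the predictions against `a[:,0]` in this way is a quick sanity check of how well k = 10 does on the training data itself.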

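Questions:

1) What is the letter_recognition.data file? How do I build that file from my own data set?

2) What does results.ravel() denote?

3) How can we write a simple digit recognition tool using the letter_recognition.data file (with either KNearest or SVM)?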

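Well, I decided to work on my own question and solve the problems above. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how I did it. (It is just for learning how to use KNearest for simple OCR purposes.)

1) My first question was about the letter_recognition.data file that comes with the OpenCV samples. I wanted to know what is inside that file.

It contains a letter, along with 16 features of that letter.

This helped me find it. These 16 features are explained in the paper (although I didn't understand some of the features at the end).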
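Concretely, each line of letter-recognition.data is a capital letter followed by 16 comma-separated integer features, which is why the loading code earlier splits the array into `a[:,0]` (label) and `a[:,1:]` (features). Building such a file from your own data set amounts to writing one line per sample in that format. A minimal plain-Python sketch, assuming that layout; `parse_line` and `format_line` are hypothetical helpers and the sample line is illustrative, not quoted from the file:

```python
# Hypothetical helpers for the "letter, then 16 features" line format.
def parse_line(line):
    """Return (label, features); label is 0 for 'A' ... 25 for 'Z'."""
    fields = line.strip().split(',')
    label = ord(fields[0]) - ord('A')      # same mapping as the np.loadtxt converter
    features = [float(v) for v in fields[1:]]
    return label, features


def format_line(letter, features):
    """Write one of your own samples back out in the same format."""
    return ','.join([letter] + [str(int(v)) for v in features])


line = "T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8"   # illustrative line
label, features = parse_line(line)
print(label, len(features))                   # 19 16
print(format_line('T', features) == line)     # True
```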

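2) Since I knew it is difficult to do this without understanding all of those features, I tried some other papers, but all of them were a little difficult for a beginner.

So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance; I just wanted it to work, even with the lowest accuracy.)

I took the image below for my training data: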

[Image: training digits]

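(I know the amount of training data is small. But since all the digits are in the same font and size, I decided to try this.)

To prepare the training data, I wrote a small piece of OpenCV code. It does the following:

It loads the image.

It selects the digits (by finding contours and applying constraints on each contour's area and height to avoid false detections).

It draws a bounding rectangle around one digit and waits for a key press. We then manually press the digit key that corresponds to the digit in the box.

Once the corresponding digit key is pressed, it resizes the box to 10x10 and saves its 100 pixel values in one array (here, samples) and the manually entered digit in another array (here, responses).

Then it saves both arrays in separate txt files.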
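The per-contour logic of these steps can be sketched without OpenCV, on plain nested lists. This is a sketch under assumptions: the thresholds (area > 50, height > 28) match the ones in the labeling script, but `keep_box`, `resize_nearest` and `add_sample` are hypothetical helpers, and `resize_nearest` uses nearest-neighbour sampling for simplicity, whereas `cv2.resize` defaults to bilinear interpolation.

```python
def keep_box(area, h, min_area=50, min_height=28):
    """Reject small specks, like the area and height checks in the script."""
    return area > min_area and h > min_height


def resize_nearest(roi, out_w=10, out_h=10):
    """Nearest-neighbour resize of a 2-D list (cv2.resize defaults to bilinear)."""
    in_h, in_w = len(roi), len(roi[0])
    return [[roi[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def add_sample(samples, responses, roi, key):
    """Store one labelled digit: its 100 pixel values plus the typed digit."""
    if ord('0') <= key <= ord('9'):          # same keys as range(48, 58)
        small = resize_nearest(roi)
        samples.append([float(p) for row in small for p in row])  # row-major flatten
        responses.append(float(key - ord('0')))


# Made-up 30x20 binary crop of a vertical stroke (a crude "1").
roi = [[255 if 5 <= x < 15 else 0 for x in range(20)] for _ in range(30)]
samples, responses = [], []
if keep_box(area=300, h=30):
    add_sample(samples, responses, roi, key=ord('1'))
print(len(samples[0]), responses)   # 100 [1.0]
```

Flattening row by row is what `reshape((1,100))` does to the 10x10 NumPy ROI, so each labeled digit becomes one 100-value feature row.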

At the end of the manual digit classification, all the digits in the training data (train.png) have been labeled by hand, as shown below:

[Image: training digits after manual labeling]

Below is the code I used for the above purpose (of course, not very clean):

import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
# 1, 1 = ADAPTIVE_THRESH_GAUSSIAN_C, THRESH_BINARY_INV (inverted binary image)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

#################      Now finding Contours         ###################

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples = np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]   # ASCII codes of the keys '0'..'9'

for cnt in contours:
    if cv2.contourArea(cnt)>50:              # skip tiny specks
        [x,y,w,h] = cv2.boundingRect(cnt)

        if h>28:                             # keep only digit-sized boxes
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            # keep only the low byte: some builds set extra high bits in the key code
            key = cv2.waitKey(0) & 0xFF

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))     # the digit that was typed
                sample = roismall.reshape((1,100))  # 10x10 ROI -> one 100-value row
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print("training complete")

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)

Now we go to the training and testing part.

For testing, I used the image below, which has the same kind of digits that I used for training.

[Image: test digits]

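For training, we do the following:

Load the txt files we saved earlier.

Create an instance of the classifier we are using (here, KNearest).

Then use the KNearest.train function to train on the data.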
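The training code reloads the saved files as float32, one 100-value row per sample, and reshapes the responses into a column; the fix suggested in the comments for the "train data must be floating-point matrix" error is likewise to make sure both arrays are float. A plain-Python sketch of those preparation checks, assuming the 100-feature layout; `check_training_data` is a hypothetical helper, not an OpenCV function, and Python floats stand in for the `np.float32` the real script uses:

```python
def check_training_data(samples, responses, n_features=100):
    """Convert to float and verify the layout used by the training script:
    one row of n_features values per sample and exactly one response per row.
    (Hypothetical helper; the real script relies on np.loadtxt and reshape.)"""
    samples = [[float(v) for v in row] for row in samples]
    responses = [float(r) for r in responses]
    if any(len(row) != n_features for row in samples):
        raise ValueError("every sample must have %d features" % n_features)
    if len(samples) != len(responses):
        raise ValueError("need exactly one response per sample")
    # Column layout, like responses.reshape((responses.size, 1)).
    return samples, [[r] for r in responses]


s, r = check_training_data([[0] * 100, [255] * 100], [3, 7])
print(r)   # [[3.0], [7.0]]
```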

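For testing, we do the following:

Load the image used for testing.

Process the image as before and extract each digit using contours.

Draw a bounding box around each digit, resize it to 10x10, and store its pixel values in an array as before.

Then use the KNearest.find_nearest() function to find the training item nearest to the one we gave. (If we are lucky, it recognizes the correct digit.)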
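Conceptually, `find_nearest` with a given `k` compares the query row against every stored training row, takes the k closest, and returns the response that wins the vote among them; with k = 1, as in the testing code, that is simply the label of the single closest sample. The plain-Python sketch below illustrates that idea only: it is not OpenCV's implementation, it assumes Euclidean distance, `knn_predict` is a hypothetical name, and its tie-breaking (prefer the nearer neighbour) may differ from OpenCV's.

```python
from collections import Counter


def knn_predict(train_samples, train_responses, query, k=1):
    """Return the majority response among the k training rows closest to query."""
    # Squared Euclidean distance gives the same ordering as Euclidean distance.
    dists = [
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_samples, train_responses)
    ]
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    votes = Counter(label for _, label in nearest)
    best = max(votes.values())
    # Tie-break: the tied label that appears first among the nearest rows.
    for _, label in nearest:
        if votes[label] == best:
            return label


train = [[0, 0], [0, 1], [5, 5], [6, 5]]
labels = [0.0, 0.0, 1.0, 1.0]
print(knn_predict(train, labels, [0.2, 0.3], k=1))   # 0.0
print(knn_predict(train, labels, [5.5, 4.8], k=3))   # 1.0
```

Because "training" a k-nearest-neighbour model only stores the rows, all the real work happens at query time, which is why every query scans the whole training set.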

I included the last two steps (training and testing) in the single script below:

import cv2
import numpy as np

#######   training part    ###############
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()
model.train(samples,responses)

############################# testing part  #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)

It worked. Below is the result I got:

[Image: recognition result]

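Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.

But anyway, this is a good start for beginners (I hope so).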

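Comments

+1 for a long but very educational post. This should go to the opencv tag info.
In case anyone is interested, I made a proper OO engine from this code, along with some bells and whistles: github.com/goncalopp/simple-ocr-opencv
Hi, the Google Docs link referenced in this post doesn't work for me.
@Ricardo: I edited the link. Check whether it works, or search for the name of the paper; it comes up as the first Google result.
When extracting the features, is it possible to do it in order? Or is the boundary detection always random? Thanks in advance.
I didn't get you. Are you asking about the order of the contours found in OpenCV? Then that's not good. I don't know whether it is random, but we know the order is not something like left to right and top to bottom (or vice versa), and in some cases that causes difficulties. We have to sort them by hand according to our own criteria. (If this is not what you wanted to ask, please clarify.)
Note that you don't need SVM or KNN when you have well-defined, perfect fonts. For example, the digits 0, 4, 6, 9 form one group, the digits 1, 2, 3, 5, 7 form another, and the digit 8 forms a third; the group is given by the Euler number. Then "0" has no endpoints, "4" has two endpoints, and "6" and "9" are distinguished by the centroid position. "3" is the only one in the other group with 3 endpoints. "1" and "7" are distinguished by skeleton length. When considering the convex hull together with the digit, "5" and "2" have two holes, and they can be distinguished by the centroid of the largest hole.
Well, thanks for the information about the Euler number; I didn't know that. Anyway, I know this case is a very straightforward problem. But my aim was to learn how to use the KNearest function and how to develop a simple OCR at the most basic level. That's what I wanted.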
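@abidrahmank, the first code runs perfectly, but unfortunately I get an error running the second code: OpenCV Error: Bad argument (train data must be floating-point matrix) in cvCheckTrainData, file /build/buildd/opencv-2.3.1/modules/ml/src/inner_functions.cpp, line 857 Traceback (most recent call last): File "num1.py", line 10, in <module> model.train(samples,responses) cv2.error: /build/buildd/opencv-2.3.1/modules/ml/src/inner_functions.cpp:857: error: (-5) train data must be floating-point matrix in function cvCheckTrainData. How do I fix it?
@rash: Check whether samples and responses are float. If they are not, convert them to float.
@abidrahmank: The generalresponses.data and generalsamples.data files I get are empty. In my case, the first program leaves responses as an empty list and reports 1048586 as the key.
@rash: Are you using the same image I used? The first part of the code is set up only for this image.
@abidrahmank: Yes, bro, exactly as you said. But the files are empty :(
@rash: Did your problem get solved? I'm facing the same problem.
Sorry, man. I didn't.
Found the problem... thank you, this is a great tutorial. I had made a small mistake. If anyone faces the same problem that @rash and I had, it's because you are pressing the wrong keys. For each digit in the box you have to type that digit so that it gets trained. Hope that helps.
For those installing OpenCV (version 2.4+), several OpenCV APIs have changed. I just made a repo that works fine for me. Hope it helps.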
Top-notch tutorial, thank you! A few changes are needed to make it work with the latest (3.1) version of OpenCV: contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE) => _,contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE), model = cv2.KNearest() => model = cv2.ml.KNearest_create(), model.train(samples,responses) => model.train(samples,cv2.ml.ROW_SAMPLE,responses), retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1) => retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
Thanks. Yes, this one is quite old, and I haven't explored the 3.x versions recently. I think I should look at it again.
Thanks for the update. Quick note: your last correction is slightly off; it should read: retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1) => retval, results, neigh_resp, dists = model.findNearest(roismall, k = 1)
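Great explanation... thanks :)
This is a good explanation, but how do we find the digits in the first place? In the image above every contour is a digit, but in the real world there are contours that are not digits. @abidrahmank
I'm curious how fast this process is compared to Tesseract OCR, for digits only. Has anyone tested both approaches?
I tested it myself, so in case anyone wants to know how fast this approach is, here are my results. I used a small (38x24 px) image containing the characters 123 and timed from loading the image to be analyzed to getting the string variable, using OpenCV 3 on Windows 10 x64 with an Intel i7-3540M CPU, and Tesseract 4.0 alpha (untrained). AbidRahmanK's code took 0.0510 seconds; Tesseract with the "-psm 6" attribute took 0.8970 seconds. So @abidrahmank's approach wins by far on speed. Accuracy has not been compared beyond this very simple test, in which it gave 100% accuracy.
I tried your code, and it does recognize the digits perfectly. But how do you recognize the dot?
Can we do the same process in Java?
I tried your second code and got the following error message at the line model.train(samples,responses): TypeError: only size-1 arrays can be converted to Python scalars. I don't know what I'm doing wrong. Any ideas?
Did you solve the problem? I have the same error.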
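On the contour-order question raised in the comments: since the contours do not come back in reading order, one common approach is to group the bounding boxes into text lines by their top y coordinate and then sort each line by x. This sketch assumes roughly horizontal lines of digits; `reading_order` and its `line_tol` tolerance are hypothetical choices, not part of the original code.

```python
def reading_order(boxes, line_tol=10):
    """Sort (x, y, w, h) boxes top-to-bottom, then left-to-right within a line.

    Boxes whose top edge is within line_tol pixels of the first box of the
    current line are treated as being on the same text line.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):      # by top edge
        if lines and abs(box[1] - lines[-1][0][1]) <= line_tol:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


boxes = [(60, 2, 20, 30), (10, 0, 20, 30), (35, 50, 20, 30), (5, 52, 20, 30)]
print(reading_order(boxes))
# [(10, 0, 20, 30), (60, 2, 20, 30), (5, 52, 20, 30), (35, 50, 20, 30)]
```

The boxes would come from `cv2.boundingRect` on each kept contour; the tolerance has to be smaller than the spacing between text lines.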

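For those interested in C++ code, you can refer to the code below. Thanks to Abid Rahman for the nice explanation.

The procedure is the same as above, but the contour finding uses only the contours of the first hierarchy level, so the algorithm uses only the outer contour of each digit.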
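With `CV_RETR_CCOMP`, each entry of `hierarchy` is `[next, previous, first_child, parent]`, with `-1` meaning "none". The loop `i = hierarchy[i][0]` therefore follows the `next` links across the top level and never descends into children, such as the holes inside 0, 6, 8 and 9. A plain-Python sketch of that traversal on a made-up hierarchy; `top_level_indices` is a hypothetical helper, and like the C++ loop it assumes contour 0 starts the top-level list.

```python
def top_level_indices(hierarchy):
    """Follow the 'next' links from contour 0, like the C++ loop does.

    Each hierarchy entry is [next, previous, first_child, parent]; -1 = none.
    Assumes contour 0 starts the top-level list, as that loop also assumes.
    """
    indices = []
    i = 0 if hierarchy else -1
    while i != -1:
        indices.append(i)
        i = hierarchy[i][0]
    return indices


# Made-up hierarchy: outer contours 0, 2, 3; contour 1 is a hole inside 0.
hierarchy = [
    [2, -1, 1, -1],   # 0: outer, next is 2, child is 1
    [-1, -1, -1, 0],  # 1: hole inside 0
    [3, 0, -1, -1],   # 2: outer
    [-1, 2, -1, -1],  # 3: outer, last in the list
]
print(top_level_indices(hierarchy))   # [0, 2, 3]
```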

Code for creating the sample and label data

#include <opencv2/opencv.hpp>
#include <iostream>

using namespace cv;
using namespace std;

int main()
{
    //Process image to extract contour
    Mat thr,gray,con;
    Mat src=imread("digit.png",1);
    cvtColor(src,gray,CV_BGR2GRAY);
    threshold(gray,thr,200,255,THRESH_BINARY_INV); //Threshold to find contour
    thr.copyTo(con);

    // Create sample and label data
    vector< vector <Point> > contours; // Vector for storing contour
    vector< Vec4i > hierarchy;
    Mat sample;
    Mat response_array;
    findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE ); //Find contour

    // Iterate through first hierarchy level contours (hierarchy[i][0] is the next one, -1 at the end)
    for( int i = 0; i >= 0 && i < (int)contours.size(); i = hierarchy[i][0] )
    {
        Rect r= boundingRect(contours[i]); //Find bounding rect for each contour
        rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,0,255),2,8,0);
        Mat ROI = thr(r); //Crop the image
        Mat tmp1, tmp2;
        resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR ); //resize to 10X10
        tmp1.convertTo(tmp2,CV_32FC1); //convert to float
        sample.push_back(tmp2.reshape(1,1)); // Store sample data
        imshow("src",src);
        int c=waitKey(0); // Read corresponding label for contour from keyboard
        c-=0x30;          // Convert ASCII to integer value
        response_array.push_back(c); // Store label to a mat
        rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,255,0),2,8,0);
    }

    // Store the data to file
    Mat response,tmp;
    tmp=response_array.reshape(1,1); //make continuous
    tmp.convertTo(response,CV_32FC1); // Convert to float

    FileStorage Data("TrainingData.yml",FileStorage::WRITE); // Store the sample data in a file
    Data << "data" << sample;
    Data.release();

    FileStorage Label("LabelData.yml",FileStorage::WRITE); // Store the label data in a file
    Label << "label" << response;
    Label.release();
    cout<<"Training and Label data created successfully....!!"<<endl;

    imshow("src",src);
    waitKey();
    return 0;
}

Code for training and testing

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <iostream>

using namespace cv;
using namespace std;

int main()
{
    Mat thr,gray,con;
    Mat src=imread("dig.png",1);
    cvtColor(src,gray,CV_BGR2GRAY);
    threshold(gray,thr,200,255,THRESH_BINARY_INV); // Threshold to create input
    thr.copyTo(con);

    // Read stored sample and label for training
    Mat sample;
    Mat response,tmp;
    FileStorage Data("TrainingData.yml",FileStorage::READ); // Read training data to a Mat
    Data["data"] >> sample;
    Data.release();

    FileStorage Label("LabelData.yml",FileStorage::READ); // Read label data to a Mat
    Label["label"] >> response;
    Label.release();

    KNearest knn;                  // OpenCV 2.x API (cv::KNearest is CvKNearest)
    knn.train(sample,response);    // Train with sample and responses
    cout<<"Training completed.....!!"<<endl;

    vector< vector <Point> > contours; // Vector for storing contour
    vector< Vec4i > hierarchy;

    //Create input sample by contour finding and cropping
    findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE );
    Mat dst(src.rows,src.cols,CV_8UC3,Scalar::all(0));

    // Iterate through each contour of the first hierarchy level
    for( int i = 0; i >= 0 && i < (int)contours.size(); i = hierarchy[i][0] )
    {
        Rect r= boundingRect(contours[i]);
        Mat ROI = thr(r);
        Mat tmp1, tmp2;
        resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR );
        tmp1.convertTo(tmp2,CV_32FC1);
        float p=knn.find_nearest(tmp2.reshape(1,1), 1);   // k = 1: label of the nearest sample
        char name[4];
        sprintf(name,"%d",(int)p);
        putText( dst,name,Point(r.x,r.y+r.height) ,0,1, Scalar(0, 255, 0), 2, 8 );
    }

    imshow("src",src);
    imshow("dst",dst);
    imwrite("dest.jpg",dst);
    waitKey();
    return 0;
}

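Result

In the result, the dot in the first line is detected as 8, because we haven't trained on the dot. Also, I am treating every contour in the first hierarchy level as a sample input; you can avoid this by computing the area.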

[Image: results]
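As noted in the result above, the stray dot could be skipped by checking each contour's size before classifying it, in the same spirit as the area and height checks in the Python version. A sketch on made-up bounding boxes; `looks_like_digit` and its thresholds are assumptions that would need tuning for a real image, and it uses the bounding-box area `w * h` rather than `contourArea`.

```python
def looks_like_digit(w, h, min_area=50, min_height=20):
    """Hypothetical size filter: drop specks such as a decimal point."""
    return w * h > min_area and h > min_height


boxes = [(10, 5, 18, 30), (32, 30, 5, 5), (45, 5, 17, 30)]   # (x, y, w, h)
digits = [b for b in boxes if looks_like_digit(b[2], b[3])]
print(digits)   # [(10, 5, 18, 30), (45, 5, 17, 30)]
```

In the C++ loop, the same test would go right after `boundingRect`, skipping the contour when it fails; alternatively, the dot could be labeled as its own class during training.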