OpenCV: What algorithms or approaches apart from Haar cascades could be used for custom object detection?

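I need to do a computer vision task to detect water bottles or soda cans. I will obtain "frontal" images of bottles, soda cans, or any other random objects (one at a time), and my algorithm should determine whether the object is a bottle, a can, or neither of them.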

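Some details about the object detection scenario:

As mentioned, I will test a single object per image/video frame.
Not all water bottles are the same. The plastic, cap, or label may vary in color. Some may have no label or cap.
Soda cans vary in the same ways. Crumpled soda cans will not be tested, though.
There may be small size differences between objects.
I can have a green (or any custom color) background.
I will apply any needed filters to the image.
This will run on a Raspberry Pi.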

Just in case, here is an example of each:

[image]

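I have tested the OpenCV face detection algorithms a few times and I know they work quite well, but with that approach I would need a special Haar cascade features XML file for each custom object I want to detect.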

So, the different alternatives I have in mind are:

Creating a custom Haar classifier.
Considering shapes.
Considering outlines (contours).

I would like a simple algorithm, and I think creating a custom Haar classifier may not even be needed. What would you suggest?

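Update

I strongly considered the shape/aspect ratio approach.

However, I think I am facing some problems, since bottles come in different sizes and even shapes. But this led me to settle on the following considerations: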

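I am applying a threshold with the THRESH_BINARY method (thanks to the answers).
I will use a white background for detection.
Soda cans are all the same size.
So, a high-accuracy bounding box for soda cans may be enough to identify a can.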
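
The bounding-box idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the thresholded frame has been inverted so that object pixels are nonzero, and `bounding_box_aspect_ratio` is a hypothetical helper name, not an OpenCV function (OpenCV's own `cv2.boundingRect` gives the same rectangle for a contour).

```python
import numpy as np

def bounding_box_aspect_ratio(mask):
    """Return (x, y, w, h, ratio) for the nonzero pixels of a binary mask.

    ratio is height / width, so a tall object gives a value above 1.
    Returns None when the mask has no nonzero pixels.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    x, y = int(xs.min()), int(ys.min())
    w = int(xs.max()) - x + 1
    h = int(ys.max()) - y + 1
    return x, y, w, h, h / w

# Synthetic frame: a 40 px wide, 100 px tall "can" on an empty background.
demo = np.zeros((200, 200), dtype=np.uint8)
demo[50:150, 80:120] = 255
print(bounding_box_aspect_ratio(demo))  # (80, 50, 40, 100, 2.5)
```

Since the cans are all the same size, the ratio for a real can would have to be measured once with the actual camera setup and then compared with a small tolerance.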

What I have achieved:

Thresholding really helped me. In tests on a white background, this is what I get for cans:

[image]

And this is what I get for bottles:

[image]

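So, the dominance of the darker areas is noticeable. With cans there are some cases in which this might turn into false negatives. With bottles, light and angle may lead to inconsistent results, but I really think this could be a shorter approach.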

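So now I am quite confused about how I should evaluate that dark dominance. I have read that findContours leads to it, but I am rather lost on how to make use of such a function. For example, in the case of soda cans it may find several contours, so I get lost on what to evaluate.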
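
One way to put a number on that dark dominance without choosing among contours is to count dark pixels directly. The following is a minimal NumPy sketch, assuming the thresholded frame is a 0/255 array (which is what `cv2.threshold` produces in Python); `dark_fraction` and the `(x, y, width, height)` ROI layout are illustrative names, not part of OpenCV.

```python
import numpy as np

def dark_fraction(binary, roi=None):
    """Fraction of pixels equal to 0 in a thresholded (0/255) image.

    roi is an optional (x, y, width, height) tuple restricting the count.
    """
    if roi is not None:
        x, y, w, h = roi
        binary = binary[y:y + h, x:x + w]
    return float(np.count_nonzero(binary == 0)) / binary.size

demo = np.full((100, 100), 255, dtype=np.uint8)  # white background
demo[20:80, 30:70] = 0                            # dark, opaque object

print(dark_fraction(demo))                    # 0.24
print(dark_fraction(demo, (30, 20, 40, 60)))  # 1.0
```

A value close to 1 inside the object's region would suggest an opaque can, while a transparent bottle should leave a noticeably lower fraction. The cut-off between the two is something to calibrate on real frames.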

Note: I am open to testing any other algorithms or libraries besides OpenCV.


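As Cyril suggested, the aspect ratio (width/height) may be a useful measure. Here is some OpenCV Python code that finds contours (hopefully including the outline of the bottle or can) and gives you the aspect ratio and some other measurements: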

    import cv2

    # src image should have already had some contrast enhancement (such as
    # cv2.threshold) and edge finding (such as cv2.Canny)
    # findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x;
    # taking the last two works with all of them.
    contours, hierarchy = cv2.findContours(
        src, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
    for contour in contours:
        num_points = len(contour)
        if num_points < 5:
            # The contour has too few points to fit an ellipse. Skip it.
            continue

        # We could use area to help determine the type of object.
        # Small contours are probably false detections (not really a whole object).
        area = cv2.contourArea(contour)

        bounding_ellipse = cv2.fitEllipse(contour)
        # Note: fitEllipse reports full axis lengths; the ratio is unaffected.
        center, radii, angle_degrees = bounding_ellipse

        # Let's define an ellipse's normal orientation to be landscape (width > height).
        # We must ensure that the ellipse's measurements match this orientation.
        if radii[0] < radii[1]:
            radii = (radii[1], radii[0])
            angle_degrees -= 90.0

        # We could use the angle to help determine the type of object.
        # A bottle or can's angle is probably approximately a multiple of 90 degrees,
        # assuming that it is at rest and not falling.

        # Calculate the aspect ratio (long axis / short axis after the swap above).
        # For example, 2.0 means the object's long side is 2 times its short side.
        # A bottle is probably more elongated than a can.
        aspect_ratio = radii[0] / radii[1]

To check for transparency, you could compare the picture with the known background using histogram analysis or background subtraction.
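
As a rough illustration of the background-subtraction variant, the NumPy sketch below measures how much of the object's region still looks like the known background. The assumptions are loud ones: both images are grayscale arrays of the same size taken under the same lighting, the object's region is already known, and `background_like_fraction` and its `max_diff` tolerance are made-up names and values for this example.

```python
import numpy as np

def background_like_fraction(frame, background, roi, max_diff=30):
    """Fraction of ROI pixels within max_diff gray levels of the background.

    roi is an (x, y, width, height) tuple around the object.
    """
    x, y, w, h = roi
    f = frame[y:y + h, x:x + w].astype(np.int16)
    b = background[y:y + h, x:x + w].astype(np.int16)
    return float(np.count_nonzero(np.abs(f - b) <= max_diff)) / f.size

background = np.full((100, 100), 200, dtype=np.uint8)
roi = (30, 20, 40, 60)

can = background.copy()
can[20:80, 30:70] = 50          # opaque object hides the background

bottle = background.copy()
bottle[20:80, 30:70] = 50       # dark outline ...
bottle[25:75, 35:65] = 190      # ... around a see-through interior

print(background_like_fraction(can, background, roi))     # 0.0
print(background_like_fraction(bottle, background, roi))  # 0.625
```

A high fraction hints at a transparent object. Labels, caps, and liquid inside a real bottle will lower it, so the decision threshold needs tuning on real samples.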

The contour's moments can be used to determine its centroid (center of gravity):

    moments = cv2.moments(contour)
    m00 = moments['m00']
    m01 = moments['m01']
    m10 = moments['m10']
    # m00 is the contour area; it is zero for degenerate contours.
    centroid = (m10 / m00, m01 / m00) if m00 != 0 else None

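You can compare this with the center. If the object is larger ("heavier") at one end, the centroid will be closer to that end than the center is.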
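
The comparison can be sketched with NumPy on a filled object mask. Two assumptions are made here: the "center" is taken to be the center of the bounding box (the ellipse center from the earlier code would serve equally well), and `centroid_offset` is a hypothetical helper name.

```python
import numpy as np

def centroid_offset(mask):
    """Return (dx, dy): centroid minus bounding-box center, in pixels.

    In image coordinates y grows downward, so dy > 0 means the object is
    "heavier" toward the bottom. Returns None for an empty mask.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    centroid_x, centroid_y = xs.mean(), ys.mean()  # same as m10/m00, m01/m00
    center_x = (xs.min() + xs.max()) / 2.0
    center_y = (ys.min() + ys.max()) / 2.0
    return float(centroid_x - center_x), float(centroid_y - center_y)

# Bottle-like silhouette: narrow neck on top of a wide body.
demo = np.zeros((100, 60), dtype=np.uint8)
demo[10:40, 25:35] = 255   # neck
demo[40:90, 10:50] = 255   # body
dx, dy = centroid_offset(demo)
print(dx, dy)  # dx is about 0, dy is positive (centroid sits in the body)
```

For a can, which is roughly symmetric top to bottom, both offsets should stay near zero.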


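Since you want to recognize cans versus bottles rather than Pepsi versus Coke, shape matching is probably a better choice than Haar or the features2d matchers such as SIFT/SURF/ORB.

A unique background color will make things easier.

First, create a histogram from an image of the background: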

    int channels[] = {0, 1, 2}; // use all the channels
    int rgb_bins = 32; // quantize to 32 colors per channel
    int histSize[] = {rgb_bins, rgb_bins, rgb_bins};
    float _range[] = {0, 256}; // the upper bound is exclusive
    const float* ranges[] = {_range, _range, _range};

    cv::SparseMat bghist;
    cv::calcHist(&bg_image, 1, channels, cv::noArray(), bghist, 3, histSize, ranges);

Then use calcBackProject to create a mask of what is background and what is not:

    cv::MatND temp_ND;
    cv::calcBackProject(&bottle_image, 1, channels, bghist, temp_ND, ranges);

    cv::Mat bottle_mask, bottle_backproj;
    if (feeling_lazy) {
        cv::normalize(temp_ND, bottle_backproj, 0, 255, cv::NORM_MINMAX, CV_8U);
        // a small blur here could work nicely
        cv::threshold(bottle_backproj, bottle_mask, 0, 255, cv::THRESH_OTSU);
        bottle_mask = cv::Scalar(255) - bottle_mask; // invert the mask
    } else {
        // finding just the right value here might be better than the above method
        int magic_threshold = 64;
        temp_ND.convertTo(bottle_backproj, CV_8U, 255.);
        // I expect temp_ND to be CV_32F ranging from 0-1, but I might be wrong.
        cv::threshold(bottle_backproj, bottle_mask, magic_threshold, 255, cv::THRESH_BINARY_INV);
    }

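Then either:

Compare bottle_mask or bottle_backproj with several sample bottle masks/back projections, using matchTemplate with a confidence threshold to decide whether it is a match.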

    matchTemplate(bottle_mask, bottle_template, result, CV_TM_CCORR_NORMED);
    double confidence;
    minMaxLoc(result, NULL, &confidence);
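
To show what that confidence value measures, here is a small NumPy sketch of the normalized cross-correlation score for the special case where the mask and the template have the same size, so there is only one alignment to evaluate. `ccorr_normed` is a hypothetical helper written for this illustration; in practice `cv2.matchTemplate` does this at every offset.

```python
import numpy as np

def ccorr_normed(image, template):
    """Normalized cross-correlation of two same-size arrays, in [0, 1]
    for non-negative inputs. Returns 0.0 if either array is all zeros."""
    a = image.astype(np.float64).ravel()
    b = template.astype(np.float64).ravel()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

can_mask = np.zeros((100, 100), dtype=np.uint8)
can_mask[20:80, 30:70] = 255

bottle_mask = np.zeros((100, 100), dtype=np.uint8)
bottle_mask[20:40, 45:55] = 255   # neck
bottle_mask[40:80, 30:70] = 255   # body

same = ccorr_normed(can_mask, can_mask)
different = ccorr_normed(bottle_mask, can_mask)
print(same, different)
```

Identical masks score 1, and the score drops as the silhouettes diverge. Where to put the confidence threshold depends on how similar the two object classes look in the real masks.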

Or use matchShapes, though I have never gotten it to work properly.

    double confidence = matchShapes(bottle_mask, bottle_template, CV_CONTOURS_MATCH_I3);

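Or use linemod, which is difficult to set up but works well for images like these, where the shape is not very complex. Aside from the linked file, I have not found any working examples of this method, so here is what I did.

First, create/train the detector with some sample images: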

    // some magic numbers
    std::vector<int> T_at_level;
    T_at_level.push_back(4);
    T_at_level.push_back(8);

    // add some padding so linemod doesn't scream at you
    const int T = 32;
    int width = bottle_mask.cols;
    if (width % T != 0)
        width += T - width % T;

    int height = bottle_mask.rows;
    if (height % T != 0)
        height += T - height % T;

    // in this case template_backproj is created specifically from a sample bottle_backproj
    cv::Rect padded_roi((width - template_backproj.cols) / 2, (height - template_backproj.rows) / 2,
                        template_backproj.cols, template_backproj.rows);
    // Mat::zeros takes (rows, cols), i.e. (height, width)
    cv::Mat padded_backproj = cv::Mat::zeros(height, width, template_backproj.type());
    // copyTo writes the pixels into the ROI; plain assignment would only rebind a temporary header
    template_backproj.copyTo(padded_backproj(padded_roi));

    cv::Mat padded_mask = cv::Mat::zeros(height, width, template_mask.type());
    template_mask.copyTo(padded_mask(padded_roi));
    // you might need to erode padded_mask by a few pixels.

    // initialize detector
    std::vector< cv::Ptr<cv::linemod::Modality> > modalities;
    modalities.push_back(cv::makePtr<cv::linemod::ColorGradient>()); // for those that don't have a kinect
    cv::Ptr<cv::linemod::Detector> new_detector = cv::makePtr<cv::linemod::Detector>(modalities, T_at_level);

    // add sample images to the detector
    std::vector<cv::Mat> template_images;
    template_images.push_back(padded_backproj);
    cv::Rect ignore_me;
    const std::string class_id = "bottle";
    int template_id = new_detector->addTemplate(template_images, class_id, padded_mask, &ignore_me);

Then do some matching:

    std::vector<cv::Mat> sources_vec;
    sources_vec.push_back(padded_backproj);
    // padded_backproj doesn't need to be the same size as the trained template images, but it does need to be padded the same way.
    // linemod expects the similarity threshold as a percentage (0-100); a higher number makes the algorithm faster
    float matching_threshold = 80.0f;
    std::vector<cv::linemod::Match> matches;
    std::vector<cv::String> class_ids;

    new_detector->match(sources_vec, matching_threshold, matches, class_ids);
    float confidence = matches.size() > 0 ? matches[0].similarity : 0;


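I see a few basic ideas here:

Check the width/height ratio of the object (to be precise, of the object's bounding rectangle). For a can it will be about 2-2.5, and for a bottle I think it will be around 3. It is a very simple idea, it should be quick and easy to test, and I think it should give quite good accuracy. For in-between values such as 2.75 (assuming the values I gave are correct, which they most likely are not) you can fall back on a different algorithm.

Check whether your object contains glass/transparent regions; if it does, it is definitely a bottle. You can read more about it here.

Use the GrabCut algorithm to obtain the object mask/a more precise shape, and check whether the width of the shape at the top is similar to the width at the bottom; if it is, the object is a can, otherwise a bottle (bottles have a screw cap at the top).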
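
The top-versus-bottom width check from the last idea could look roughly like the NumPy sketch below. It assumes a filled object mask is already available (from GrabCut or from thresholding) and that the object stands upright; `top_bottom_width_ratio` and the 20% band size are illustrative choices, not values from any library.

```python
import numpy as np

def top_bottom_width_ratio(mask, band=0.2):
    """Mean object width in the top band divided by the mean width in the
    bottom band, each band covering a `band` fraction of the object height.
    Returns None for an empty mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    top, bottom = int(ys.min()), int(ys.max())
    height = bottom - top + 1
    rows = max(1, int(height * band))
    # Number of object pixels in each row of the object's vertical extent.
    widths = np.count_nonzero(mask[top:bottom + 1], axis=1)
    return float(widths[:rows].mean()) / float(widths[-rows:].mean())

can = np.zeros((100, 60), dtype=np.uint8)
can[10:90, 10:50] = 255

bottle = np.zeros((100, 60), dtype=np.uint8)
bottle[10:40, 25:35] = 255   # neck
bottle[40:90, 10:50] = 255   # body

print(top_bottom_width_ratio(can))     # 1.0
print(top_bottom_width_ratio(bottle))  # 0.25
```

A ratio near 1 points to a can, and a clearly smaller ratio points to a bottle with a narrow neck.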


So, my main detection approach is based on:

Bottles are transparent and cans are opaque

The general algorithm consists of:

Take a grayscale picture.

Apply a binary threshold.

Select a convenient ROI from it.

Obtain its color mean and even the standard deviation.

Distinguish.

This basically comes down to implementing the following function (CAN and BOTTLE were defined previously):

    int detector(int x, int y, int width, int height, int thresholdValue, CvCapture* capture) {

        Mat img;
        Rect r;
        vector<Mat> channels;
        r = Rect(x, y, width, height);

        if (!capture) {
            fprintf(stderr, "ERROR: capture is NULL\n");
            getchar();
            return -1;
        }

        img = Mat(cvQueryFrame(capture));
        // Captured frames are in BGR channel order.
        cvtColor(img, img, CV_BGR2GRAY);
        threshold(img, img, 127, 255, THRESH_BINARY);

        // ROI
        Mat roiImage = img(r);
        split(roiImage, channels);
        // Mean of the thresholded ROI: low when dark (opaque) pixels dominate.
        Scalar m = mean(channels[0]);
        float media = m[0];
        printf("Media: %f\n", media);

        if (media < thresholdValue) {
            return CAN;
        }
        else {
            return BOTTLE;
        }
    }

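As can be seen, a THRESH_BINARY threshold is applied and a plain white background is used. However, the main and critical issue I faced with this whole approach and algorithm was changes in the ambient brightness, even minor ones.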

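Sometimes I noticed that THRESH_BINARY_INV might help more, but I wonder whether I could use some specific threshold parameters, or whether applying other filters might remove the ambient lighting as a problem.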
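
One option that may be worth trying against global brightness changes is Otsu's method, which picks the threshold from each frame's own histogram instead of using a fixed 127. This is a swapped-in technique rather than what the function above does. The NumPy sketch below illustrates the idea on synthetic frames; `otsu_threshold` is a helper written for this example, and OpenCV offers the same method through the `THRESH_OTSU` flag of `cv2.threshold`.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold t for a uint8 image: pixels <= t form one
    class and pixels > t the other, maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    w0 = np.cumsum(hist)              # pixel count at or below each level
    w1 = hist.sum() - w0              # pixel count above each level
    sum0 = np.cumsum(hist * levels)   # intensity sum at or below each level
    valid = (w0 > 0) & (w1 > 0)
    mu0 = np.where(valid, sum0 / np.maximum(w0, 1), 0.0)
    mu1 = np.where(valid, (sum0[-1] - sum0) / np.maximum(w1, 1), 0.0)
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, 0.0)
    return int(np.argmax(between))

# The same dark object on a light background, under bright and dim lighting.
truth = np.zeros((60, 60), dtype=bool)
truth[15:45, 20:40] = True
bright = np.where(truth, 90, 220).astype(np.uint8)
dim = np.where(truth, 40, 110).astype(np.uint8)

for name, frame in (("bright", bright), ("dim", dim)):
    t = otsu_threshold(frame)
    fixed_ok = np.array_equal(frame <= 127, truth)   # fixed threshold of 127
    otsu_ok = np.array_equal(frame <= t, truth)      # per-frame Otsu threshold
    print(name, t, fixed_ok, otsu_ok)
```

In the dim frame the fixed threshold marks the whole image as dark, while the per-frame threshold still separates the object. Uneven lighting across the frame (shadows, reflections on the plastic) is a different problem that a single global threshold, Otsu or not, would not be expected to solve.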

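I really appreciate the approaches based on calculating the aspect ratio from the bounding box or on finding contours, but I found this one to be straightforward and simple once the conditions were adjusted.

I would use deep learning, based on transfer learning.

The idea is this: given a highly complex, well-trained neural network that was trained on a similar classification task (on a large public dataset such as ImageNet), you can freeze most of its weights and train only the last layers. There are lots of tutorials out there. You do not need a background in deep learning.

There is a tutorial that works almost out of the box with TensorFlow here, and here is another tutorial based on Keras.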