CNN Object Image Classification (MATLAB-Based)

- Choosing the STL-10 Dataset
  - Dataset Overview
  - Dataset Preparation
- CNN Network Design
  - Modifying VggNet
  - Modifying ResNet
- Training
  - Training with the Modified VggNet
    - Training Results
  - Training with the Modified ResNet
    - Training Results
Choosing the STL-10 Dataset
Dataset Overview
The STL-10 dataset is used here. STL-10 is adapted from the CIFAR-10 dataset; compared with CIFAR-10, it provides fewer labeled training images per class and higher-resolution images (96×96).
The dataset contains 10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck. The training set provides 500 images per class and the test set 800 images per class.
Official STL-10 dataset link: STL-10
Dataset Preparation
The downloaded dataset is stored as binary .bin files, so it first has to be converted into image files.
Create a binConvert directory in PyCharm and copy the downloaded stl10_binary folder into it.
Create convert.py.
Because the dataset has already been downloaded, the download code is commented out; if you have not downloaded it yet, uncomment that part to fetch the dataset.
Generating the training-set images:
```python
from __future__ import print_function
import os, sys, tarfile, errno
import numpy as np
import matplotlib.pyplot as plt

if sys.version_info >= (3, 0, 0):
    import urllib.request as urllib  # ugly but works
else:
    import urllib

try:
    from imageio import imsave
except ImportError:
    from scipy.misc import imsave

print(sys.version_info)

# image shape
HEIGHT = 96
WIDTH = 96
DEPTH = 3

# size of a single image in bytes
SIZE = HEIGHT * WIDTH * DEPTH

# path to the directory with the data
DATA_DIR = './stl10_binary'

# url of the binary data
DATA_URL = 'http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'

# path to the binary train file with image data
DATA_PATH = './stl10_binary/train_X.bin'

# path to the binary train file with labels
LABEL_PATH = './stl10_binary/train_y.bin'


def read_labels(path_to_labels):
    """
    :param path_to_labels: path to the binary file containing labels from the STL-10 dataset
    :return: an array containing the labels
    """
    with open(path_to_labels, 'rb') as f:
        labels = np.fromfile(f, dtype=np.uint8)
        return labels


def read_all_images(path_to_data):
    """
    :param path_to_data: the file containing the binary images from the STL-10 dataset
    :return: an array containing all the images
    """
    with open(path_to_data, 'rb') as f:
        # read whole file in uint8 chunks
        everything = np.fromfile(f, dtype=np.uint8)

        # We force the data into 3x96x96 chunks, since the
        # images are stored in "column-major order", meaning
        # that "the first 96*96 values are the red channel,
        # the next 96*96 are green, and the last are blue."
        # The -1 is since the size of the pictures depends
        # on the input file, and this way numpy determines
        # the size on its own.
        images = np.reshape(everything, (-1, 3, 96, 96))

        # Now transpose the images into a standard image format
        # readable by, for example, matplotlib.imshow
        # You might want to comment this line or reverse the shuffle
        # if you will use a learning algorithm like CNN, since they like
        # their channels separated.
        images = np.transpose(images, (0, 3, 2, 1))
        return images


def read_single_image(image_file):
    """
    CAREFUL! - this method uses a file as input instead of the path - so the
    position of the reader will be remembered outside of the context of this method.
    :param image_file: the open file containing the images
    :return: a single image
    """
    # read a single image, count determines the number of uint8's to read
    image = np.fromfile(image_file, dtype=np.uint8, count=SIZE)
    # force into image matrix
    image = np.reshape(image, (3, 96, 96))
    # transpose to standard format
    # You might want to comment this line or reverse the shuffle
    # if you will use a learning algorithm like CNN, since they like
    # their channels separated.
    image = np.transpose(image, (2, 1, 0))
    return image


def plot_image(image):
    """
    :param image: the image to be plotted in a 3-D matrix format
    :return: None
    """
    plt.imshow(image)
    plt.show()


def save_image(image, name):
    imsave("%s.png" % name, image, format="png")


# def download_and_extract():
#     """
#     Download and extract the STL-10 dataset
#     :return: None
#     """
#     dest_directory = DATA_DIR
#     if not os.path.exists(dest_directory):
#         os.makedirs(dest_directory)
#     filename = DATA_URL.split('/')[-1]
#     filepath = os.path.join(dest_directory, filename)
#     if not os.path.exists(filepath):
#         def _progress(count, block_size, total_size):
#             sys.stdout.write('\rDownloading %s %.2f%%' % (filename,
#                 float(count * block_size) / float(total_size) * 100.0))
#             sys.stdout.flush()
#
#         filepath, _ = urllib.urlretrieve(DATA_URL, filepath, reporthook=_progress)
#         print('Downloaded', filename)
#     tarfile.open(filepath, 'r:gz').extractall(dest_directory)


def save_images(images, labels):
    print("Saving images to disk")
    i = 0
    for image in images:
        label = labels[i]
        directory = './img/' + str(label) + '/'
        try:
            os.makedirs(directory, exist_ok=True)
        except OSError as exc:
            if exc.errno == errno.EEXIST:
                pass
        filename = directory + str(i)
        print(filename)
        save_image(image, filename)
        i = i + 1


if __name__ == "__main__":
    # download data if needed
    # download_and_extract()

    # test to check if a single image is read correctly
    # (open in binary mode so np.fromfile reads raw bytes)
    with open(DATA_PATH, 'rb') as f:
        image = read_single_image(f)
        plot_image(image)

    # test to check if the whole dataset is read correctly
    images = read_all_images(DATA_PATH)
    print(images.shape)

    labels = read_labels(LABEL_PATH)
    print(labels.shape)

    # save images to disk
    save_images(images, labels)
```
Run the script; it creates an img folder containing 10 subfolders, one per class, each holding 500 training images.
Delete the generated img folder.
Next, generate the test-set images.
Change the DATA_PATH and LABEL_PATH definitions near the top of convert.py so that they point at the test files:

```python
# path to the binary test file with image data
DATA_PATH = './stl10_binary/test_X.bin'

# path to the binary test file with labels
LABEL_PATH = './stl10_binary/test_y.bin'
```
Run the script again; it creates an img folder with 10 class subfolders, each containing 800 test images.
Manually rename the generated training-set and test-set class folders to the corresponding class names.
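Renaming the ten numbered folders by hand is tedious; the following MATLAB sketch could automate it. It assumes convert.py wrote the folders as ./img/1 through ./img/10 in STL-10's label order (airplane through truck); adjust the paths to your own layout.

```matlab
% Hypothetical helper: rename the numeric label folders to STL-10 class names.
% Assumes convert.py produced ./img/1 ... ./img/10.
classNames = ["airplane","bird","car","cat","deer", ...
              "dog","horse","monkey","ship","truck"];
rootDir = "./img";
for k = 1:numel(classNames)
    src = fullfile(rootDir, num2str(k));
    dst = fullfile(rootDir, classNames(k));
    if isfolder(src)
        movefile(src, dst);   % e.g. ./img/1 -> ./img/airplane
    end
end
```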
With the dataset prepared, we can move on to designing the networks.
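Before the network design, the prepared image folders also have to be read into MATLAB. The training options later in this post reference handles.augimdsValidation, whose construction is not shown in the original, so the following is only a hedged sketch of how the datastores might be built; the train/test folder names and the 90/10 validation split are assumptions, not taken from the source.

```matlab
% Hypothetical data-loading sketch; folder layout and split ratio are assumptions.
imdsTrain = imageDatastore('./img/train', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
imdsTest  = imageDatastore('./img/test', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% Hold out part of the training images for validation.
[imdsTrain, imdsValidation] = splitEachLabel(imdsTrain, 0.9, 'randomized');

% Resize on the fly so every image matches the 96x96x3 input layer.
augimdsTrain      = augmentedImageDatastore([96 96 3], imdsTrain);
augimdsValidation = augmentedImageDatastore([96 96 3], imdsValidation);
```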
CNN Network Design
Modifying VggNet
Using the original VggNet as-is does not work well here: with so many layers the network is too complex for the small training set, it overfits, and accuracy stays low.
The network is therefore redesigned and simplified: the number of layers is reduced and several dropout layers are added, so the resulting model fits this dataset better and reaches higher accuracy.
In addition, the input layer is changed to accept 96×96 images and data augmentation is applied.
```matlab
function [layers, lgraph] = get_vggnet()

layers = [
    imageInputLayer([96 96 3],'Name','imageinput','DataAugmentation','randfliplr')

    convolution2dLayer([5 5],64,'Name','conv_1','Padding','same')
    batchNormalizationLayer('Name','bn_1')
    reluLayer('Name','relu_1')
    maxPooling2dLayer([2 2],'Name','maxpool_1','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],128,'Name','conv_2','Padding','same')
    batchNormalizationLayer('Name','bn_2')
    reluLayer('Name','relu_2')
    maxPooling2dLayer([2 2],'Name','maxpool_2','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],128,'Name','conv_3','Padding','same')
    batchNormalizationLayer('Name','bn_3')
    reluLayer('Name','relu_3')
    dropoutLayer(0.4,'Name','dp_1')
    maxPooling2dLayer([2 2],'Name','maxpool_3','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],256,'Name','conv_4','Padding','same')
    batchNormalizationLayer('Name','bn_4')
    reluLayer('Name','relu_4')
    dropoutLayer(0.4,'Name','dp_2')
    maxPooling2dLayer([2 2],'Name','maxpool_4','Padding','same','Stride',[2 2])

    convolution2dLayer([5 5],256,'Name','conv_5','Padding','same')
    batchNormalizationLayer('Name','bn_5')
    reluLayer('Name','relu_5')
    dropoutLayer(0.4,'Name','dp_3')
    maxPooling2dLayer([2 2],'Name','maxpool_5','Padding','same','Stride',[2 2])

    dropoutLayer(0.5,'Name','dp_4')
    fullyConnectedLayer(512,'Name','fc_1')
    reluLayer('Name','relu_6')
    fullyConnectedLayer(512,'Name','fc_2')
    reluLayer('Name','relu_7')
    dropoutLayer(0.5,'Name','dp_5')
    fullyConnectedLayer(10,'Name','fc_3')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')];

lgraph = layerGraph(layers);
```
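A quick way to sanity-check the resulting network (just a usage sketch, not part of the original code) is to build it and open it in the Network Analyzer:

```matlab
% Build the simplified VGG-style network and inspect layer sizes.
[layers, lgraph] = get_vggnet();
analyzeNetwork(lgraph);
```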
Modifying ResNet
Starting from a basic ResNet, several dropout layers are added; this alleviates overfitting to some extent, lets the model fit the data better, and yields higher accuracy.
As with VggNet, the input layer is changed to accept 96×96 images and data augmentation is applied.
```matlab
function [layers, lgraph] = get_resnet()

netWidth = 16;

layers = [
    imageInputLayer([96 96 3],'Name','input','DataAugmentation','randfliplr')
    convolution2dLayer(3,netWidth,'Padding','same','Name','convInp')
    batchNormalizationLayer('Name','bn_res')
    reluLayer('Name','relu_sp')

    convolutionalUnit(netWidth,1,'conv_sa1')
    additionLayer(2,'Name','add_11')
    reluLayer('Name','relu_11')
    convolutionalUnit(netWidth,1,'conv_sa2')
    additionLayer(2,'Name','add_12')
    reluLayer('Name','relu_12')
    dropoutLayer(0.4,'Name','dp_1')

    convolutionalUnit(2*netWidth,2,'conv_sc1')
    additionLayer(2,'Name','add_21')
    reluLayer('Name','relu_21')
    convolutionalUnit(2*netWidth,1,'conv_sc2')
    additionLayer(2,'Name','add_22')
    reluLayer('Name','relu_22')
    dropoutLayer(0.4,'Name','dp_2')

    convolutionalUnit(4*netWidth,2,'conv_se1')
    additionLayer(2,'Name','add_31')
    reluLayer('Name','relu_31')
    convolutionalUnit(4*netWidth,1,'conv_se2')
    additionLayer(2,'Name','add_32')
    reluLayer('Name','relu_32')
    dropoutLayer(0.4,'Name','dp_3')

    averagePooling2dLayer(8,'Name','globalPool')
    dropoutLayer(0.5,'Name','dp_4')
    fullyConnectedLayer(10,'Name','fcFinal')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')
    ];

lgraph = layerGraph(layers);

lgraph = connectLayers(lgraph,'relu_sp','add_11/in2');
lgraph = connectLayers(lgraph,'relu_11','add_12/in2');

skip1 = [
    convolution2dLayer(1,2*netWidth,'Stride',2,'Name','skipConv1')
    batchNormalizationLayer('Name','skipBN1')];
lgraph = addLayers(lgraph,skip1);
lgraph = connectLayers(lgraph,'relu_12','skipConv1');
lgraph = connectLayers(lgraph,'skipBN1','add_21/in2');

lgraph = connectLayers(lgraph,'relu_21','add_22/in2');

skip2 = [
    convolution2dLayer(1,4*netWidth,'Stride',2,'Name','skipConv2')
    batchNormalizationLayer('Name','skipBN2')];
lgraph = addLayers(lgraph,skip2);
lgraph = connectLayers(lgraph,'relu_22','skipConv2');
lgraph = connectLayers(lgraph,'skipBN2','add_31/in2');

lgraph = connectLayers(lgraph,'relu_31','add_32/in2');

layers = lgraph.Layers;

function layers = convolutionalUnit(numF,stride,tag)
layers = [
    convolution2dLayer(3,numF,'Padding','same','Stride',stride,'Name',[tag,'conv1'])
    batchNormalizationLayer('Name',[tag,'BN1'])
    reluLayer('Name',[tag,'relu1'])
    convolution2dLayer(3,numF,'Padding','same','Name',[tag,'conv2'])
    batchNormalizationLayer('Name',[tag,'BN2'])];
```
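Because the shortcut branches are wired up with connectLayers, plotting the layer graph is a useful check that every addition layer has both of its inputs connected (again only a usage sketch):

```matlab
% Build the residual network and visualize its skip connections.
[~, lgraph] = get_resnet();
figure;
plot(lgraph);
```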
Training
Training with the Modified VggNet
Training options (options_train):
```matlab
options_train = trainingOptions('sgdm', ...
    'MaxEpochs', MaxEpochs, ...
    'InitialLearnRate', 0.01, ...
    'L2Regularization', 0.01, ...
    'Verbose', true, ...
    'MiniBatchSize', 128, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress', ...
    'ValidationData', handles.augimdsValidation, ...
    'ValidationFrequency', 10, ...
    'ExecutionEnvironment', ExecutionEnvironment);
```
Parameter | Value |
---|---|
Number of epochs (MaxEpochs) | 100 |
Initial learning rate | 0.01 |
Mini-batch size | 128 |
L2 regularization | 0.01 |
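With the network, the data, and these options in place, training could be launched roughly as follows. This is a hedged sketch: the original training code is not shown in the source, and augimdsTrain comes from the data-loading sketch earlier in this post.

```matlab
% Hypothetical training call; variable names follow the options block above.
MaxEpochs = 100;                 % matches the table above
ExecutionEnvironment = 'auto';   % let MATLAB pick a GPU if one is available
% ... build options_train as shown above ...
[~, lgraph] = get_vggnet();
net = trainNetwork(augimdsTrain, lgraph, options_train);
```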
Training Results
Accuracy: 72.55%
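The accuracy is presumably measured on the held-out test set; a minimal sketch of that evaluation, assuming the imdsTest datastore from the earlier data-loading sketch, might look like this:

```matlab
% Evaluate the trained network on the test images.
augimdsTest = augmentedImageDatastore([96 96 3], imdsTest);
YPred = classify(net, augimdsTest);
accuracy = mean(YPred == imdsTest.Labels)   % fraction classified correctly
```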
Training with the Modified ResNet
Training options (options_train):
```matlab
options_train = trainingOptions('sgdm', ...
    'MaxEpochs', MaxEpochs, ...
    'InitialLearnRate', 0.001, ...
    'L2Regularization', 0.01, ...
    'Verbose', true, ...
    'MiniBatchSize', 128, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress', ...
    'ValidationData', handles.augimdsValidation, ...
    'ValidationFrequency', 10, ...
    'ExecutionEnvironment', ExecutionEnvironment);
```
Parameter | Value |
---|---|
Number of epochs (MaxEpochs) | 100 |
Initial learning rate | 0.001 |
Mini-batch size | 128 |
L2 regularization | 0.01 |
Training Results
Accuracy: 65.36%