1. Residual Networks
I first learned about residual networks in Andrew Ng's deep learning course, but it only really sank in after working through a few blog posts.
Let's first look at why residual networks came about. (The text below is adapted from those posts — they explain it better than I could — and the links are given at the end.)
Bottlenecks when training deep networks: vanishing gradients and network degradation
When deeper networks are used, vanishing and exploding gradients appear. This is largely handled by standard initialization and normalization layers, which ensure that networks of a few dozen layers can converge; but as the number of layers keeps increasing, training problems remain.
The remaining problem is network degradation. For example, suppose the optimal architecture happens to be 18 layers deep. When we design a network we do not know in advance how many layers would be optimal, so suppose we build a 34-layer network instead. The extra 16 layers are then redundant, and we would hope that during training the model learns to turn these 16 layers into identity mappings, i.e. layers whose output exactly equals their input. In practice, however, the model rarely learns the parameters of these 16 identity mappings correctly, so the deeper network ends up performing no better than the optimal 18-layer one. This is the degradation phenomenon that appears as depth increases: it is not caused by overfitting, but by redundant layers learning parameters that are not identity mappings.
1.1 Why is it called a residual network?
As the figure shows, the data flows along two paths: the regular path, and a shortcut that passes the input straight through as an identity mapping, somewhat like a "short circuit" in an electrical circuit. Experiments show that this shortcut structure copes with the degradation problem very well. If we write the input–output relation of one module as y = H(x), then fitting H(x) directly with gradient methods runs into the degradation problem described above. With the shortcut structure, the learnable part no longer has to fit H(x): if F(x) denotes the part that is actually optimized, then H(x) = F(x) + x, i.e. F(x) = H(x) − x. In the identity-mapping view, y = x is the observed value, so F(x) corresponds to the residual between the prediction H(x) and that observation — hence the name residual network. Why do this? Because the authors argue that learning the residual F(x) is easier than learning H(x) directly: we only need to learn the difference between input and output, turning an absolute quantity into a relative one (H(x) − x is how much the output changes relative to the input), which is much easier to optimize.
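As a tiny conceptual sketch (not the actual implementation, which follows later), a residual block just adds the learnable branch back onto its input:

```python
# Conceptual sketch only: F is the learnable branch of the block.
def residual_block(x, F):
    return F(x) + x   # H(x) = F(x) + x, so F only has to model the change H(x) - x
```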
Since the dimensions of x and F(x) may not match, the two branches have to be brought to the same shape. The paper proposes two ways to handle this (there is actually a third, but experiments showed it hurts performance sharply, so it is not used):
1. zero_padding: pad the identity branch with zeros to fill out the missing dimensions. This adds no extra parameters (see the sketch after this list).
2. projection: apply a 1×1 convolution on the identity branch to increase the dimension. This adds extra parameters.
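As a rough illustration of option 1, a parameter-free shortcut could look like the sketch below; `zero_pad_shortcut` is a hypothetical helper written for this post, not part of the code that follows:

```python
import tensorflow as tf

# Hypothetical helper: identity shortcut with zero-padded channels (option 1),
# assuming NHWC tensors and out_channels >= the input channel count.
def zero_pad_shortcut(x, out_channels, stride=1):
    if stride > 1:
        x = x[:, ::stride, ::stride, :]                         # subsample spatially, no parameters
    extra = out_channels - x.shape[-1]                          # channels to add
    return tf.pad(x, [[0, 0], [0, 0], [0, 0], [0, extra]])     # zero-pad the channel axis
```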
The figure below shows two forms of residual module. The left one is the regular residual module, composed of two 3×3 convolutions; as the network grows deeper, this structure becomes less effective in practice. The "bottleneck residual block" on the right works better in that case: it stacks 1×1, 3×3, and 1×1 convolutions, where the 1×1 convolutions reduce and then restore the dimensionality, so the 3×3 convolution operates on a relatively low-dimensional input, improving computational efficiency.
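A minimal sketch of such a bottleneck block in tf.keras might look like this; the filter counts (64 reduced, ×4 expansion) are illustrative and are not taken from the code below:

```python
from tensorflow.keras import layers

def bottleneck_block(x, filters=64, expansion=4):
    shortcut = x
    out = layers.Conv2D(filters, 1, activation='relu')(x)                    # 1x1: reduce channels
    out = layers.Conv2D(filters, 3, padding='same', activation='relu')(out)  # 3x3 on the reduced dims
    out = layers.Conv2D(filters * expansion, 1)(out)                         # 1x1: restore channels
    if shortcut.shape[-1] != filters * expansion:                            # projection shortcut if needed
        shortcut = layers.Conv2D(filters * expansion, 1)(shortcut)
    return layers.Activation('relu')(layers.add([out, shortcut]))
```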
Now let's get into the code.
Import the modules:
```python
import tensorflow as tf
from tensorflow.keras import layers, Sequential
import tensorflow.keras as keras
import os
```
1.2 Implementing the Basic Block module
```python
# Basic Block module.
class BasicBlock(layers.Layer):

    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()

        self.conv1 = layers.Conv2D(filter_num, (3, 3), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')

        # The first conv may downsample (when stride > 1); the second conv keeps
        # the spatial size unchanged, so its stride is fixed at 1.
        self.conv2 = layers.Conv2D(filter_num, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()

        if stride != 1:
            # Downsample the identity branch with a 1x1 conv so the shapes match.
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, (1, 1), strides=stride))
        else:
            self.downsample = lambda x: x

    def call(self, inputs, training=None):
        # inputs: [b, h, w, c]
        out = self.conv1(inputs)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        identity = self.downsample(inputs)

        # layers.add sums the two branches element-wise.
        output = layers.add([out, identity])
        output = tf.nn.relu(output)

        return output
```
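A quick way to sanity-check the block on a random tensor (the shapes here are just an example):

```python
# With stride=2 the block halves the spatial size and keeps filter_num channels.
block = BasicBlock(64, stride=2)
x = tf.random.normal([4, 32, 32, 64])
print(block(x).shape)  # (4, 16, 16, 64)
```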
1.3 The Res Block module
```python
class ResNet(keras.Model):
    # layer_dims: [2, 2, 2, 2] -> 4 Res Blocks, each containing 2 Basic Blocks.
    # num_classes: size of the final dense layer, i.e. the number of output classes.
    def __init__(self, layer_dims, num_classes=6):
        super(ResNet, self).__init__()

        # Stem (pre-processing) layers; this part is flexible, the MaxPool2D is optional.
        self.stem = Sequential([layers.Conv2D(64, (3, 3), strides=(1, 1)),
                                layers.BatchNormalization(),
                                layers.Activation('relu'),
                                layers.MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding='same')
                                ])
        # Create the 4 Res Blocks. The filter counts do not have to double each time;
        # these are simply the usual empirical values.
        self.layer1 = self.build_resblock(64,  layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)

        self.avgpool = layers.GlobalAveragePooling2D()
        self.fc = layers.Dense(num_classes)

    def call(self, inputs, training=None):
        # Forward pass, using the layers prepared in __init__.
        x = self.stem(inputs)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        # Global average pooling collapses the spatial dimensions, so no reshape
        # is needed; the result has shape [batch_size, channels].
        x = self.avgpool(x)
        # [b, num_classes]
        x = self.fc(x)

        return x

    # Build one Res Block consisting of several Basic Blocks.
    def build_resblock(self, filter_num, blocks, stride=1):
        res_blocks = Sequential()
        # The first Basic Block may downsample; each Res Block downsamples at most once.
        res_blocks.add(BasicBlock(filter_num, stride))

        for _ in range(1, blocks):
            # stride=1 here, so only the first Basic Block downsamples.
            res_blocks.add(BasicBlock(filter_num, stride=1))

        return res_blocks

def resnet18():
    return ResNet([2, 2, 2, 2])

model = resnet18()
model.build(input_shape=(None, 32, 32, 3))
model.summary()
```
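Feeding a dummy batch through the untrained network confirms the output shape (one score per class, 6 by default):

```python
x = tf.random.normal([2, 32, 32, 3])
logits = model(x)
print(logits.shape)  # (2, 6)
```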
At this point the plain residual network is complete.
1.4 Adding CBAM to the residual module
CBAM adds two attention mechanisms on top of a conventional CNN:
1. channel attention
2. spatial attention

The feature map is first multiplied by the channel attention weights and then by the spatial attention map, which is exactly what the modified Basic Block below does.
1.5 TensorFlow 2.0 + ResNet18 + CBAM + garbage classification
```python
import tensorflow as tf
from tensorflow.keras import layers, Sequential, regularizers, optimizers
import tensorflow.keras as keras
```
Define a 3×3 convolution helper; the kernel_initializer can be "he_normal" or "glorot_normal".
```python
def regurlarized_padded_conv(*args, **kwargs):
    return layers.Conv2D(*args, **kwargs, padding="same",
                         use_bias=False,
                         kernel_initializer="he_normal",
                         kernel_regularizer=regularizers.l2(5e-4))
```
Channel attention
```python
class ChannelAttention(layers.Layer):
    def __init__(self, in_planes, ration=16):
        super(ChannelAttention, self).__init__()

        # Squeeze the spatial dimensions with both average and max pooling.
        self.avg = layers.GlobalAveragePooling2D()
        self.max = layers.GlobalMaxPooling2D()

        # Shared MLP as two 1x1 convolutions: reduce channels by `ration`, then restore them.
        self.conv1 = layers.Conv2D(in_planes // ration, kernel_size=1, strides=1,
                                   padding="same",
                                   kernel_regularizer=regularizers.l2(1e-4),
                                   use_bias=True, activation=tf.nn.relu)
        self.conv2 = layers.Conv2D(in_planes, kernel_size=1, strides=1,
                                   padding="same",
                                   kernel_regularizer=regularizers.l2(1e-4),
                                   use_bias=True)

    def call(self, inputs):
        avg = self.avg(inputs)
        max = self.max(inputs)
        # Reshape [b, c] -> [b, 1, 1, c] so the 1x1 convolutions can be applied.
        avg = layers.Reshape((1, 1, avg.shape[1]))(avg)
        max = layers.Reshape((1, 1, max.shape[1]))(max)

        avg_out = self.conv2(self.conv1(avg))
        max_out = self.conv2(self.conv1(max))

        # Fuse the two descriptors and squash them into (0, 1) channel weights.
        out = avg_out + max_out
        out = tf.nn.sigmoid(out)

        return out
```
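A quick shape check (illustrative): the attention output broadcasts against the input feature map.

```python
ca = ChannelAttention(64)
x = tf.random.normal([2, 32, 32, 64])
print(ca(x).shape)        # (2, 1, 1, 64) -- one weight per channel
print((ca(x) * x).shape)  # (2, 32, 32, 64)
```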
Spatial attention
```python
class SpatialAttention(layers.Layer):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        # A single conv maps the stacked [avg, max] maps to a 1-channel attention map.
        self.conv1 = regurlarized_padded_conv(1, kernel_size=kernel_size, strides=1,
                                              activation=tf.nn.sigmoid)

    def call(self, inputs):
        # Pool across the channel axis to get two [b, h, w] descriptors.
        avg_out = tf.reduce_mean(inputs, axis=3)
        max_out = tf.reduce_max(inputs, axis=3)
        # Stack them into [b, h, w, 2] and produce the spatial attention map.
        out = tf.stack([avg_out, max_out], axis=3)
        out = self.conv1(out)

        return out
```
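And the corresponding check for the spatial branch:

```python
sa = SpatialAttention()
x = tf.random.normal([2, 32, 32, 64])
print(sa(x).shape)  # (2, 32, 32, 1) -- one weight per spatial position
```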
The Basic Block module (with CBAM)
```python
class BasicBlock(layers.Layer):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super(BasicBlock, self).__init__()

        self.conv1 = regurlarized_padded_conv(out_channels, kernel_size=3, strides=stride)
        self.bn1 = layers.BatchNormalization()

        self.conv2 = regurlarized_padded_conv(out_channels, kernel_size=3, strides=1)
        self.bn2 = layers.BatchNormalization()

        ######## attention modules ########
        self.ca = ChannelAttention(out_channels)
        self.sa = SpatialAttention()

        # If stride != 1 (downsampling) or the channel counts differ, project the
        # shortcut with a 1x1 conv; otherwise use an identity mapping.
        if stride != 1 or in_channels != self.expansion * out_channels:
            self.shortcut = Sequential([regurlarized_padded_conv(self.expansion * out_channels,
                                                                 kernel_size=1, strides=stride),
                                        layers.BatchNormalization()])
        else:
            self.shortcut = lambda x, _: x

    def call(self, inputs, training=False):
        out = self.conv1(inputs)
        out = self.bn1(out, training=training)
        out = tf.nn.relu(out)

        out = self.conv2(out)
        out = self.bn2(out, training=training)

        ######## apply attention ########
        out = self.ca(out) * out
        out = self.sa(out) * out

        out = out + self.shortcut(inputs, training)
        out = tf.nn.relu(out)

        return out
```
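An illustrative shape check for the CBAM-augmented block; with stride=2 the spatial size is halved and the shortcut is projected to the new channel count:

```python
block = BasicBlock(in_channels=32, out_channels=64, stride=2)
x = tf.random.normal([2, 64, 64, 32])
print(block(x).shape)  # (2, 32, 32, 64)
```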
The Res Block module
Training on my CPU was far too slow, so I commented out two of the residual stages; the details are in the Res Block code below:
```python
class ResNet(keras.Model):
    def __init__(self, layer_dims, num_classes=6):
        super(ResNet, self).__init__()

        self.in_channels = 64

        # Stem (pre-processing) convolution.
        self.stem = Sequential([
            regurlarized_padded_conv(64, kernel_size=3, strides=1),
            layers.BatchNormalization()
        ])

        # Create the residual stages (the last two are commented out to keep the
        # network small enough to train on a CPU).
        self.layer1 = self.build_resblock(32, layer_dims[0], stride=1)
        self.layer2 = self.build_resblock(64, layer_dims[1], stride=2)
        # self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        # self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)

        self.final_bn = layers.BatchNormalization()
        self.avgpool = layers.GlobalAveragePooling2D()
        self.fc = layers.Dense(num_classes, activation="softmax")

    def call(self, inputs, training=False):
        out = self.stem(inputs, training)
        out = tf.nn.relu(out)

        out = self.layer1(out, training=training)
        out = self.layer2(out, training=training)
        # out = self.layer3(out, training=training)
        # out = self.layer4(out, training=training)

        out = self.final_bn(out)
        out = self.avgpool(out)
        out = self.fc(out)

        return out

    # Build one residual stage; only its first Basic Block may downsample.
    def build_resblock(self, out_channels, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        res_blocks = Sequential()

        for stride in strides:
            res_blocks.add(BasicBlock(self.in_channels, out_channels, stride))
            self.in_channels = out_channels

        return res_blocks
```
Note!
Because the last two residual stages are commented out, the network is no longer 18 layers deep but 10, so the stage configuration is [2, 2] instead of [2, 2, 2, 2]:
```python
def ResNet18():
    return ResNet([2, 2])
```
Data preprocessing
```python
import numpy as np
import matplotlib.pyplot as plt
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img
import glob, os, random

# Path to the dataset.
base_path = "./data"

# Check the dataset size (2295 images here).
img_list = glob.glob(os.path.join(base_path, "*/*.jpg"))
print(len(img_list))
```
Split the data into training and validation sets
```python
# Split the dataset into training and validation generators.
train_datagen = ImageDataGenerator(
    rescale=1./255, shear_range=0.1, zoom_range=0.1,
    width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True,
    vertical_flip=True, validation_split=0.1)

test_data = ImageDataGenerator(rescale=1./255, validation_split=0.1)

train_generator = train_datagen.flow_from_directory(base_path, target_size=(300, 300),
                                                    batch_size=16,
                                                    class_mode="categorical",
                                                    subset="training", seed=0)

validation_generator = test_data.flow_from_directory(base_path, target_size=(300, 300),
                                                     batch_size=16,
                                                     class_mode="categorical",
                                                     subset="validation", seed=0)

labels = (train_generator.class_indices)
labels = dict((v, k) for k, v in labels.items())
print(labels)
```
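Optionally, pull one batch from the generator to check the shapes (the label dimension equals the number of classes found in the dataset folder):

```python
x_batch, y_batch = next(train_generator)
print(x_batch.shape, y_batch.shape)  # (16, 300, 300, 3) (16, num_classes)
```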
Training
```python
model = ResNet18()
model.build(input_shape=(None, 300, 300, 3))
model.summary()

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
# Note: the generators use batch_size=16, so steps_per_epoch=2068//16 would cover
# the whole training set; //32 simply visits about half of it each epoch.
model.fit_generator(train_generator, epochs=100, steps_per_epoch=2068//32,
                    validation_data=validation_generator, validation_steps=227//32)
```
I won't include the test code here; it's already in the previous garbage-classification post.
And that's it. Because I commented out two residual stages, the training results are frankly terrible — there's nothing to be done about it, my poor GPU simply can't handle the full model.
This post stitches together material from the following excellent blog posts; for any single topic, see the links below:
《CBAM 注意力机制详解》 (CBAM attention mechanism explained)
https://blog.csdn.net/abc13526222160/article/details/103765484
《十分钟理解残差网络》 (Understand residual networks in ten minutes)
https://blog.csdn.net/fendouaini/article/details/82027389?depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-3&utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-3
《残差网络详解》 (Residual networks explained)
https://blog.csdn.net/abc13526222160/article/details/90057121?depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-9&utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-9