DLinCV-MNIST-Part1

Overview

  MNIST is a common dataset for handwritten digit recognition. Below is a summary and analysis of the experiments I ran on the MNIST dataset.

Experiment Description

Purpose

  These experiments have two goals. The first is to become more familiar with how the basic building blocks of a CNN are used in practice, e.g. the parameters involved when applying tricks such as dropout and batch normalization, and how to extract the outputs of intermediate layers. The second is to measure for myself how much each technique actually improves the model, recording the data and plotting the curves. The code is based on the example from the PyTorch website.

Variables Tested

  The variables tested in these experiments are:

  • number of convolutional layers in the CNN
  • number of output channels per layer
  • batch size
  • number of epochs
  • learning rate
  • different optimization algorithms (SGD with momentum, RMSprop, Adam)
  • different activation functions (ReLU, LeakyReLU, ELU, tanh)
  • effect of dropout
  • effect of batch normalization

Data Recorded

  The data recorded for each run are the loss and error curves on the training and test sets, plus the accuracy over the last ten epochs of each run, used for observation and comparison.
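  For context, the "Test set: Average loss: ..., Accuracy: .../10000" lines quoted throughout come from an evaluation loop of roughly this form (a minimal sketch modeled on the official PyTorch MNIST example; model and test_loader are assumed to be defined as in that example):

import torch
import torch.nn.functional as F

def test(model, test_loader):
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1)                  # index of the max log-probability
            correct += pred.eq(target).sum().item()
    test_loss /= len(test_loader.dataset)
    print('Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))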

Experiments

Baseline Model and Results

  First, the baseline model and its main hyperparameters. The model is defined as follows:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        # used to grab the feature maps of the hidden layers (see the last section)
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
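
  For reference, the 320 in nn.Linear(320, 50) is just the size of the flattened feature map: a 28×28 input becomes 24×24 after the 5×5 conv1, 12×12 after 2×2 max pooling, 8×8 after conv2, and 4×4 after the second pooling, so the classifier sees 20 × 4 × 4 = 320 values.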

  The main hyperparameters are:

parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                    help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                    help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
                    help='number of epochs to train (default: 10)')
parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                    help='learning rate (default: 0.01)')
parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                    help='SGD momentum (default: 0.5)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='how many batches to wait before logging training status')
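
  Roughly, these arguments feed the data loaders and the SGD optimizer as in the reference example; a minimal sketch (the './data' path and the bare ToTensor transform are illustrative, the original example also normalizes the images):

import argparse
import torch
import torch.optim as optim
from torchvision import datasets, transforms

parser = argparse.ArgumentParser()   # the add_argument calls above attach to this parser
args = parser.parse_args()

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transforms.ToTensor()),
    batch_size=args.batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False, transform=transforms.ToTensor()),
    batch_size=args.test_batch_size, shuffle=False)

model = Net()
optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)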

  Training results of the baseline model:

Test set: Average loss: 0.2003, Accuracy: 9438/10000 (94%)
Test set: Average loss: 0.1257, Accuracy: 9614/10000 (96%)
Test set: Average loss: 0.1005, Accuracy: 9699/10000 (97%)
Test set: Average loss: 0.0825, Accuracy: 9727/10000 (97%)
Test set: Average loss: 0.0778, Accuracy: 9759/10000 (98%)
Test set: Average loss: 0.0656, Accuracy: 9785/10000 (98%)
Test set: Average loss: 0.0698, Accuracy: 9766/10000 (98%)
Test set: Average loss: 0.0632, Accuracy: 9801/10000 (98%)
Test set: Average loss: 0.0573, Accuracy: 9815/10000 (98%)
Test set: Average loss: 0.0545, Accuracy: 9824/10000 (98%)

  After 10 epochs the test accuracy is 98.24%.
  The loss and error curves during training are shown below:
                photo1

Number of Convolutional Layers

  Change the number of convolutional layers and observe the effect.

CNNlayer_1

  Following LeNet, add a 20→120 convolutional layer and change the kernel size to 3; the other hyperparameters are unchanged.
  The model is:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=3)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)
        self.conv3 = nn.Conv2d(20, 120, kernel_size=3)
        self.conv3_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(120*9, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = F.relu(self.conv3_drop(self.conv3(x)))
        x = x.view(-1, 120*9)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

  Results:

Test set: Average loss: 0.1619, Accuracy: 9500/10000 (95%)
Test set: Average loss: 0.0911, Accuracy: 9708/10000 (97%)
Test set: Average loss: 0.0672, Accuracy: 9787/10000 (98%)
Test set: Average loss: 0.0617, Accuracy: 9803/10000 (98%)
Test set: Average loss: 0.0492, Accuracy: 9843/10000 (98%)
Test set: Average loss: 0.0464, Accuracy: 9852/10000 (99%)
Test set: Average loss: 0.0423, Accuracy: 9865/10000 (99%)
Test set: Average loss: 0.0428, Accuracy: 9864/10000 (99%)
Test set: Average loss: 0.0388, Accuracy: 9881/10000 (99%)
Test set: Average loss: 0.0374, Accuracy: 9881/10000 (99%)

  Convergence is slightly faster and test accuracy improves a little, but the effect is not significant. The curves are shown below:
              photo2

CNNlayer_2

  Add 20→35 and 35→85 convolutional layers to the original model, remove the pooling after conv2, and change the kernel size to 3. The model is:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=3)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)
        self.conv3 = nn.Conv2d(20, 35, kernel_size=3)
        self.conv4 = nn.Conv2d(35, 85, kernel_size=3)
        self.conv4_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(85*7*7, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4_drop(self.conv4(x)))
        x = x.view(-1, 85*7*7)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

  Training results:

Test set: Average loss: 0.1826, Accuracy: 9413/10000 (94%)
Test set: Average loss: 0.0973, Accuracy: 9694/10000 (97%)
Test set: Average loss: 0.0822, Accuracy: 9748/10000 (97%)
Test set: Average loss: 0.0649, Accuracy: 9795/10000 (98%)
Test set: Average loss: 0.0537, Accuracy: 9837/10000 (98%)
Test set: Average loss: 0.0543, Accuracy: 9825/10000 (98%)
Test set: Average loss: 0.0484, Accuracy: 9851/10000 (99%)
Test set: Average loss: 0.0466, Accuracy: 9850/10000 (98%)
Test set: Average loss: 0.0439, Accuracy: 9858/10000 (99%)
Test set: Average loss: 0.0376, Accuracy: 9883/10000 (99%)

  The results are essentially the same as CNNlayer_1.

CNNlayer_3

  Try removing a convolutional layer from the original model, leaving only one. The model is:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5)
        self.conv1_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(20*12*12, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = self.conv1_drop(x)
        x = x.view(-1, 20*12*12)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

  Training results:

Test set: Average loss: 0.1946, Accuracy: 9420/10000 (94%)
Test set: Average loss: 0.1399, Accuracy: 9559/10000 (96%)
Test set: Average loss: 0.1036, Accuracy: 9695/10000 (97%)
Test set: Average loss: 0.0917, Accuracy: 9707/10000 (97%)
Test set: Average loss: 0.0785, Accuracy: 9742/10000 (97%)
Test set: Average loss: 0.0736, Accuracy: 9764/10000 (98%)
Test set: Average loss: 0.0678, Accuracy: 9788/10000 (98%)
Test set: Average loss: 0.0676, Accuracy: 9784/10000 (98%)
Test set: Average loss: 0.0607, Accuracy: 9795/10000 (98%)
Test set: Average loss: 0.0606, Accuracy: 9814/10000 (98%)

  Compared with the baseline model, accuracy drops slightly. The curves are shown below:
                

Summary

  With 10 epochs, small changes to the number of conv layers produced no obvious improvement or degradation in accuracy.

Number of Channels

  Changing the channel counts is equivalent to changing the width of the model. Observe the results.

channel_1

  Change the number of output channels of the second conv layer to 120 and train. The model is:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 120, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(120*4*4, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 120*4*4)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

  Training results:

Test set: Average loss: 0.1236, Accuracy: 9623/10000 (96%)
Test set: Average loss: 0.0748, Accuracy: 9747/10000 (97%)
Test set: Average loss: 0.0577, Accuracy: 9824/10000 (98%)
Test set: Average loss: 0.0547, Accuracy: 9831/10000 (98%)
Test set: Average loss: 0.0427, Accuracy: 9865/10000 (99%)
Test set: Average loss: 0.0398, Accuracy: 9875/10000 (99%)
Test set: Average loss: 0.0408, Accuracy: 9871/10000 (99%)
Test set: Average loss: 0.0331, Accuracy: 9894/10000 (99%)
Test set: Average loss: 0.0324, Accuracy: 9891/10000 (99%)
Test set: Average loss: 0.0328, Accuracy: 9895/10000 (99%)

  Convergence is clearly faster: accuracy after a single epoch is about two percentage points higher, and the final accuracy also improves. Compared with CNNlayer_1 this model is slightly simpler but performs a bit better, possibly because that model's capacity was somewhat too high. Curves:
                photo4

channel_2

  Reduce the number of output channels of the second conv layer to 10 and train. The model is:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 10, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(10*4*4, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 10*4*4)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

  Training results:

Test set: Average loss: 0.2563, Accuracy: 9233/10000 (92%)
Test set: Average loss: 0.1654, Accuracy: 9504/10000 (95%)
Test set: Average loss: 0.1320, Accuracy: 9586/10000 (96%)
Test set: Average loss: 0.1135, Accuracy: 9650/10000 (96%)
Test set: Average loss: 0.0985, Accuracy: 9695/10000 (97%)
Test set: Average loss: 0.0923, Accuracy: 9711/10000 (97%)
Test set: Average loss: 0.0835, Accuracy: 9731/10000 (97%)
Test set: Average loss: 0.0778, Accuracy: 9769/10000 (98%)
Test set: Average loss: 0.0754, Accuracy: 9774/10000 (98%)
Test set: Average loss: 0.0722, Accuracy: 9783/10000 (98%)

  Performance degrades and convergence is noticeably slower. Curves:
                photo5

Summary

  Raising the channel count of the last conv layer to 120 gives a clear improvement. It also shows that, for a fixed number of epochs, simply increasing model capacity does not necessarily increase accuracy; possible reasons are overfitting, or that the number of epochs is not yet enough for the higher-capacity model.

Number of Epochs

  Vary the number of epochs for different models and compare both across models and within a model.
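  Since the epoch count is exposed through the --epochs flag shown earlier, each run below only changes the command line, e.g. (assuming the training script is called main.py):

python main.py --epochs 30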

original_30

  Using the baseline model, change the number of epochs to 30; the last ten epochs give:

Test set: Average loss: 0.0397, Accuracy: 9878/10000 (99%)
Test set: Average loss: 0.0382, Accuracy: 9874/10000 (99%)
Test set: Average loss: 0.0385, Accuracy: 9883/10000 (99%)
Test set: Average loss: 0.0363, Accuracy: 9889/10000 (99%)
Test set: Average loss: 0.0366, Accuracy: 9891/10000 (99%)
Test set: Average loss: 0.0364, Accuracy: 9903/10000 (99%)
Test set: Average loss: 0.0350, Accuracy: 9892/10000 (99%)
Test set: Average loss: 0.0346, Accuracy: 9900/10000 (99%)
Test set: Average loss: 0.0356, Accuracy: 9896/10000 (99%)
Test set: Average loss: 0.0346, Accuracy: 9898/10000 (99%)

  Accuracy improves slightly.

original_100

  Using the baseline model, change the number of epochs to 100; the final epochs give:

Test set: Average loss: 0.0303, Accuracy: 9905/10000 (99%)
Test set: Average loss: 0.0307, Accuracy: 9908/10000 (99%)
Test set: Average loss: 0.0311, Accuracy: 9910/10000 (99%)
Test set: Average loss: 0.0310, Accuracy: 9911/10000 (99%)
Test set: Average loss: 0.0302, Accuracy: 9914/10000 (99%)
Test set: Average loss: 0.0296, Accuracy: 9913/10000 (99%)
Test set: Average loss: 0.0311, Accuracy: 9913/10000 (99%)
Test set: Average loss: 0.0313, Accuracy: 9905/10000 (99%)
Test set: Average loss: 0.0292, Accuracy: 9914/10000 (99%)
Test set: Average loss: 0.0309, Accuracy: 9912/10000 (99%)
Test set: Average loss: 0.0297, Accuracy: 9916/10000 (99%)

  Accuracy reaches 99.16%. Curves:
                photo7

channel_1_30

  Using the channel_1 model, change the number of epochs to 30; results:

Test set: Average loss: 0.0257, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0229, Accuracy: 9925/10000 (99%)
Test set: Average loss: 0.0247, Accuracy: 9922/10000 (99%)
Test set: Average loss: 0.0227, Accuracy: 9926/10000 (99%)
Test set: Average loss: 0.0231, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0232, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0245, Accuracy: 9918/10000 (99%)
Test set: Average loss: 0.0239, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0239, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0228, Accuracy: 9930/10000 (99%)

  Accuracy rises to 99.30%, a fairly clear improvement, which also suggests that channel_1 has a higher capacity than the baseline model. Curves:
                  photo8

channel_1_50

  Using the channel_1 model, change the number of epochs to 50; results:

Test set: Average loss: 0.0220, Accuracy: 9929/10000 (99%)
Test set: Average loss: 0.0195, Accuracy: 9930/10000 (99%)
Test set: Average loss: 0.0211, Accuracy: 9922/10000 (99%)
Test set: Average loss: 0.0186, Accuracy: 9942/10000 (99%)
Test set: Average loss: 0.0193, Accuracy: 9942/10000 (99%)
Test set: Average loss: 0.0187, Accuracy: 9943/10000 (99%)
Test set: Average loss: 0.0204, Accuracy: 9933/10000 (99%)
Test set: Average loss: 0.0210, Accuracy: 9933/10000 (99%)
Test set: Average loss: 0.0200, Accuracy: 9936/10000 (99%)
Test set: Average loss: 0.0195, Accuracy: 9940/10000 (99%)

  Accuracy rises to 99.40%.

Summary

  Increasing the number of epochs raises accuracy. The gain is clear when starting from a small epoch count, but once the count passes a certain range, further increases bring little improvement.

Batch Size

  The batch-size tests are as follows:

original_10_BS32

  Based on the baseline model, change the batch size to 32:

Test set: Average loss: 0.0904, Accuracy: 9713/10000 (97%)
Test set: Average loss: 0.0694, Accuracy: 9789/10000 (98%)
Test set: Average loss: 0.0588, Accuracy: 9816/10000 (98%)
Test set: Average loss: 0.0596, Accuracy: 9810/10000 (98%)
Test set: Average loss: 0.0488, Accuracy: 9845/10000 (98%)
Test set: Average loss: 0.0506, Accuracy: 9861/10000 (99%)
Test set: Average loss: 0.0484, Accuracy: 9852/10000 (99%)
Test set: Average loss: 0.0455, Accuracy: 9861/10000 (99%)
Test set: Average loss: 0.0428, Accuracy: 9880/10000 (99%)

                    photo10

original_10_BS256

  Based on the baseline model, change the batch size to 256:

Test set: Average loss: 0.5954, Accuracy: 8553/10000 (86%)
Test set: Average loss: 0.3238, Accuracy: 9136/10000 (91%)
Test set: Average loss: 0.2496, Accuracy: 9274/10000 (93%)
Test set: Average loss: 0.2023, Accuracy: 9407/10000 (94%)
Test set: Average loss: 0.1705, Accuracy: 9484/10000 (95%)
Test set: Average loss: 0.1527, Accuracy: 9538/10000 (95%)
Test set: Average loss: 0.1372, Accuracy: 9579/10000 (96%)
Test set: Average loss: 0.1283, Accuracy: 9608/10000 (96%)
Test set: Average loss: 0.1186, Accuracy: 9629/10000 (96%)
Test set: Average loss: 0.1090, Accuracy: 9663/10000 (97%)

                    photo10

Summary

  The batch size directly determines how many optimization steps are taken within one epoch: when it is large, each epoch contains fewer updates and convergence is slower, hence the worse results. Comparing the loss curves of the two runs, the batch-size-256 model clearly converges more slowly than the batch-size-32 one.
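  For reference: MNIST has 60,000 training images, so a batch size of 32 gives roughly 60000 / 32 ≈ 1875 parameter updates per epoch, while 256 gives only about 60000 / 256 ≈ 234, which matches the much slower convergence seen above.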

Learning Rate

channel_1_30_LR1

  Based on the channel_1 model, decay the learning rate to 95% of its value after every epoch, implemented as:

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.95)
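
  A minimal sketch of where the scheduler fits (train/test/args as in the original script; with recent PyTorch versions scheduler.step() is called once per epoch, after that epoch's optimizer updates):

for epoch in range(1, args.epochs + 1):
    train(epoch)        # one pass over the training set
    test()              # prints the "Test set: ..." line for this epoch
    scheduler.step()    # lr <- lr * 0.95 after every epoch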

  Results:

Test set: Average loss: 0.0255, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0234, Accuracy: 9930/10000 (99%)
Test set: Average loss: 0.0240, Accuracy: 9928/10000 (99%)
Test set: Average loss: 0.0223, Accuracy: 9927/10000 (99%)
Test set: Average loss: 0.0235, Accuracy: 9930/10000 (99%)
Test set: Average loss: 0.0237, Accuracy: 9922/10000 (99%)
Test set: Average loss: 0.0244, Accuracy: 9922/10000 (99%)
Test set: Average loss: 0.0237, Accuracy: 9927/10000 (99%)
Test set: Average loss: 0.0238, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0240, Accuracy: 9929/10000 (99%)

                    photo12

channel_1_30_LR2

  Based on channel_1, multiply the learning rate by 0.5 every 5 epochs, implemented as:

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

  Results:

Test set: Average loss: 0.0242, Accuracy: 9928/10000 (99%)
Test set: Average loss: 0.0253, Accuracy: 9922/10000 (99%)
Test set: Average loss: 0.0232, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0234, Accuracy: 9923/10000 (99%)
Test set: Average loss: 0.0241, Accuracy: 9922/10000 (99%)
Test set: Average loss: 0.0252, Accuracy: 9921/10000 (99%)
Test set: Average loss: 0.0238, Accuracy: 9926/10000 (99%)
Test set: Average loss: 0.0239, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0242, Accuracy: 9926/10000 (99%)

  The results are essentially the same as the previous run.

channel_1_30_LR3

  Based on channel_1, multiply the learning rate by 0.1 every 5 epochs, implemented as:

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

  Results:

Test set: Average loss: 0.0264, Accuracy: 9923/10000 (99%)
Test set: Average loss: 0.0234, Accuracy: 9934/10000 (99%)
Test set: Average loss: 0.0252, Accuracy: 9923/10000 (99%)
Test set: Average loss: 0.0235, Accuracy: 9922/10000 (99%)
Test set: Average loss: 0.0235, Accuracy: 9926/10000 (99%)
Test set: Average loss: 0.0236, Accuracy: 9925/10000 (99%)
Test set: Average loss: 0.0265, Accuracy: 9918/10000 (99%)
Test set: Average loss: 0.0247, Accuracy: 9923/10000 (99%)
Test set: Average loss: 0.0245, Accuracy: 9924/10000 (99%)
Test set: Average loss: 0.0242, Accuracy: 9929/10000 (99%)

  The results are essentially the same as the previous two runs.

Summary

  With 30 training epochs, decaying the learning rate did not bring any clear performance improvement.

Different Optimization Algorithms (SGD, RMSprop, Adam)

channel_1_10_RMSprop

  Use RMSprop as the optimizer with 10 epochs; the optimizer is defined as:

optimizer = optim.RMSprop(model.parameters(), lr=0.01)

  Results:

Test set: Average loss: 0.1788, Accuracy: 9493/10000 (95%)
Test set: Average loss: 0.1114, Accuracy: 9666/10000 (97%)
Test set: Average loss: 0.1353, Accuracy: 9602/10000 (96%)
Test set: Average loss: 0.1280, Accuracy: 9680/10000 (97%)
Test set: Average loss: 0.1577, Accuracy: 9562/10000 (96%)
Test set: Average loss: 0.1119, Accuracy: 9690/10000 (97%)
Test set: Average loss: 0.1048, Accuracy: 9739/10000 (97%)
Test set: Average loss: 0.1175, Accuracy: 9700/10000 (97%)
Test set: Average loss: 0.1450, Accuracy: 9648/10000 (96%)
Test set: Average loss: 0.1068, Accuracy: 9736/10000 (97%)

  Curves:
                    photo12

channel_1_30_RMSprop

  Use RMSprop as the optimizer with 30 epochs; results:

Test set: Average loss: 0.1161, Accuracy: 9733/10000 (97%)
Test set: Average loss: 0.1182, Accuracy: 9680/10000 (97%)
Test set: Average loss: 0.1162, Accuracy: 9752/10000 (98%)
Test set: Average loss: 0.1125, Accuracy: 9717/10000 (97%)
Test set: Average loss: 0.1125, Accuracy: 9709/10000 (97%)
Test set: Average loss: 0.1116, Accuracy: 9738/10000 (97%)
Test set: Average loss: 0.1140, Accuracy: 9714/10000 (97%)
Test set: Average loss: 0.1124, Accuracy: 9723/10000 (97%)
Test set: Average loss: 0.1351, Accuracy: 9630/10000 (96%)
Test set: Average loss: 0.1195, Accuracy: 9701/10000 (97%)
Test set: Average loss: 0.1305, Accuracy: 9694/10000 (97%)

  Curves:
                    photo12
  The results show that with lr = 0.01 this optimizer does not perform well.

channel_1_10_Adam

  Use the Adam optimizer; the optimizer is defined as:

optimizer = optim.Adam(model.parameters(),lr=0.01)

  Results:

Test set: Average loss: 0.1520, Accuracy: 9556/10000 (96%)
Test set: Average loss: 0.1365, Accuracy: 9596/10000 (96%)
Test set: Average loss: 0.1425, Accuracy: 9574/10000 (96%)
Test set: Average loss: 0.1134, Accuracy: 9655/10000 (97%)
Test set: Average loss: 0.1271, Accuracy: 9644/10000 (96%)
Test set: Average loss: 0.1474, Accuracy: 9620/10000 (96%)
Test set: Average loss: 0.1142, Accuracy: 9645/10000 (96%)
Test set: Average loss: 0.1287, Accuracy: 9644/10000 (96%)
Test set: Average loss: 0.1320, Accuracy: 9614/10000 (96%)
Test set: Average loss: 0.1448, Accuracy: 9580/10000 (96%)

                    photo12
  The run does not converge; the step size is presumably unsuitable and pushes optimization in the wrong direction, so the learning rate is lowered and the experiment repeated.

channel_1_10_Adam_2 (LR = 0.001)

  Lower the learning rate to 0.001:

Test set: Average loss: 0.0566, Accuracy: 9825/10000 (98%)
Test set: Average loss: 0.0441, Accuracy: 9868/10000 (99%)
Test set: Average loss: 0.0434, Accuracy: 9873/10000 (99%)
Test set: Average loss: 0.0371, Accuracy: 9886/10000 (99%)
Test set: Average loss: 0.0341, Accuracy: 9890/10000 (99%)
Test set: Average loss: 0.0303, Accuracy: 9906/10000 (99%)
Test set: Average loss: 0.0317, Accuracy: 9895/10000 (99%)
Test set: Average loss: 0.0285, Accuracy: 9919/10000 (99%)
Test set: Average loss: 0.0287, Accuracy: 9921/10000 (99%)
Test set: Average loss: 0.0291, Accuracy: 9918/10000 (99%)

                    photo12
  Compared with the baseline, the model now converges very quickly, reaching 99.21% accuracy within just 10 epochs.

channel_1_10_Adam_3 (LR = 0.002)

  Lower the learning rate to 0.002:

Test set: Average loss: 0.0544, Accuracy: 9829/10000 (98%)
Test set: Average loss: 0.0404, Accuracy: 9866/10000 (99%)
Test set: Average loss: 0.0378, Accuracy: 9886/10000 (99%)
Test set: Average loss: 0.0398, Accuracy: 9878/10000 (99%)
Test set: Average loss: 0.0332, Accuracy: 9897/10000 (99%)
Test set: Average loss: 0.0304, Accuracy: 9916/10000 (99%)
Test set: Average loss: 0.0324, Accuracy: 9901/10000 (99%)
Test set: Average loss: 0.0306, Accuracy: 9909/10000 (99%)
Test set: Average loss: 0.0376, Accuracy: 9882/10000 (99%)
Test set: Average loss: 0.0325, Accuracy: 9911/10000 (99%)

                    photo12

Summary

  The optimizers tested are SGD (with momentum), RMSprop and Adam. With a suitable learning rate, Adam makes the model converge quickly and, for the same number of epochs, reaches a higher accuracy than SGD. RMSprop performed poorly, probably because the learning rate chosen for it was unsuitable.

Different Activation Functions (ReLU, LeakyReLU, ELU, tanh)

  The models and training data are as follows; the four runs share the channel_1 architecture and differ only in the activation function, as sketched below:
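  A rough sketch of the change (not the original code): channel_1's forward pass with the non-linearity passed in as a parameter, so the four runs differ only in which function is supplied.

import torch
import torch.nn.functional as F

# Hypothetical helper: channel_1's forward with a pluggable activation.
def forward_with(self, x, act=F.relu):
    x = act(F.max_pool2d(self.conv1(x), 2))
    x = act(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
    x = x.view(-1, 120*4*4)
    x = act(self.fc1(x))
    x = F.dropout(x, training=self.training)
    return F.log_softmax(self.fc2(x), dim=1)

# act=F.relu, act=F.leaky_relu, act=F.elu, act=torch.tanh give the four runs below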

channel_1_10_RELU

Test set: Average loss: 0.1236, Accuracy: 9623/10000 (96%)
Test set: Average loss: 0.0748, Accuracy: 9747/10000 (97%)
Test set: Average loss: 0.0577, Accuracy: 9824/10000 (98%)
Test set: Average loss: 0.0547, Accuracy: 9831/10000 (98%)
Test set: Average loss: 0.0427, Accuracy: 9865/10000 (99%)
Test set: Average loss: 0.0398, Accuracy: 9875/10000 (99%)
Test set: Average loss: 0.0408, Accuracy: 9871/10000 (99%)
Test set: Average loss: 0.0331, Accuracy: 9894/10000 (99%)
Test set: Average loss: 0.0324, Accuracy: 9891/10000 (99%)
Test set: Average loss: 0.0328, Accuracy: 9895/10000 (99%)

channel_1_10_LeakyRELU

Test set: Average loss: 0.1230, Accuracy: 9620/10000 (96%)
Test set: Average loss: 0.0744, Accuracy: 9749/10000 (97%)
Test set: Average loss: 0.0587, Accuracy: 9821/10000 (98%)
Test set: Average loss: 0.0538, Accuracy: 9834/10000 (98%)
Test set: Average loss: 0.0441, Accuracy: 9858/10000 (99%)
Test set: Average loss: 0.0397, Accuracy: 9877/10000 (99%)
Test set: Average loss: 0.0412, Accuracy: 9869/10000 (99%)
Test set: Average loss: 0.0333, Accuracy: 9888/10000 (99%)
Test set: Average loss: 0.0311, Accuracy: 9903/10000 (99%)
Test set: Average loss: 0.0344, Accuracy: 9888/10000 (99%)

channel_1_10_ELU

Test set: Average loss: 0.1113, Accuracy: 9638/10000 (96%)
Test set: Average loss: 0.0795, Accuracy: 9721/10000 (97%)
Test set: Average loss: 0.0548, Accuracy: 9823/10000 (98%)
Test set: Average loss: 0.0617, Accuracy: 9784/10000 (98%)
Test set: Average loss: 0.0523, Accuracy: 9815/10000 (98%)
Test set: Average loss: 0.0530, Accuracy: 9814/10000 (98%)
Test set: Average loss: 0.0478, Accuracy: 9844/10000 (98%)
Test set: Average loss: 0.0428, Accuracy: 9863/10000 (99%)
Test set: Average loss: 0.0387, Accuracy: 9869/10000 (99%)
Test set: Average loss: 0.0421, Accuracy: 9851/10000 (99%)

channel_1_10_tanh

Test set: Average loss: 0.2451, Accuracy: 9309/10000 (93%)
Test set: Average loss: 0.1404, Accuracy: 9582/10000 (96%)
Test set: Average loss: 0.0983, Accuracy: 9704/10000 (97%)
Test set: Average loss: 0.0770, Accuracy: 9756/10000 (98%)
Test set: Average loss: 0.0638, Accuracy: 9796/10000 (98%)
Test set: Average loss: 0.0602, Accuracy: 9813/10000 (98%)
Test set: Average loss: 0.0561, Accuracy: 9821/10000 (98%)
Test set: Average loss: 0.0495, Accuracy: 9830/10000 (98%)
Test set: Average loss: 0.0469, Accuracy: 9841/10000 (98%)
Test set: Average loss: 0.0434, Accuracy: 9850/10000 (98%)

Summary

  The differences between these activation functions turn out to be small.

Effect of Dropout

channel_1_10_dropout

  With dropout, the model is defined as follows:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv1_drop = nn.Dropout2d()
        self.conv2 = nn.Conv2d(10, 120, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(120*4*4, 50)
        self.fc1_drop = nn.Dropout()   # defined but not used below; F.dropout is used instead
        self.fc2 = nn.Linear(50, 10)
        self.fc2_drop = nn.Dropout()   # likewise unused

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv1_drop(self.conv2(x)), 2))  # spatial dropout on conv2's output
        x = self.conv2_drop(x)
        x = x.view(-1, 120*4*4)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

  Training results:

Test set: Average loss: 0.1430, Accuracy: 9573/10000 (96%)
Test set: Average loss: 0.0937, Accuracy: 9707/10000 (97%)
Test set: Average loss: 0.0754, Accuracy: 9774/10000 (98%)
Test set: Average loss: 0.0606, Accuracy: 9811/10000 (98%)
Test set: Average loss: 0.0548, Accuracy: 9825/10000 (98%)
Test set: Average loss: 0.0521, Accuracy: 9835/10000 (98%)
Test set: Average loss: 0.0464, Accuracy: 9852/10000 (99%)
Test set: Average loss: 0.0452, Accuracy: 9852/10000 (99%)
Test set: Average loss: 0.0409, Accuracy: 9869/10000 (99%)
Test set: Average loss: 0.0402, Accuracy: 9871/10000 (99%)

channel_1_10_nodrop

  Without dropout, the model is defined as follows:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 120, kernel_size=5)
        self.fc1 = nn.Linear(120*4*4, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 120*4*4)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

  Training results:

Test set: Average loss: 0.1155, Accuracy: 9646/10000 (96%)
Test set: Average loss: 0.0686, Accuracy: 9788/10000 (98%)
Test set: Average loss: 0.0506, Accuracy: 9844/10000 (98%)
Test set: Average loss: 0.0407, Accuracy: 9868/10000 (99%)
Test set: Average loss: 0.0387, Accuracy: 9873/10000 (99%)
Test set: Average loss: 0.0346, Accuracy: 9890/10000 (99%)
Test set: Average loss: 0.0339, Accuracy: 9878/10000 (99%)
Test set: Average loss: 0.0304, Accuracy: 9901/10000 (99%)
Test set: Average loss: 0.0294, Accuracy: 9901/10000 (99%)
Test set: Average loss: 0.0313, Accuracy: 9894/10000 (99%)

                    photo12

Summary

  Comparing the two runs, removing dropout actually raises accuracy slightly. The likely explanation is that without dropout this model's capacity is not excessive, so it does not overfit, whereas adding dropout lowers the effective capacity and therefore hurts performance.
  I think it is better to add dropout only after the model has been judged to be overfitting.

Effect of Batch Normalization

  The models here build on the version without dropout.

channel_1_10_noBN (=channel_1_10_nodrop)

  Without BN, the training data are:

Test set: Average loss: 0.0985, Accuracy: 9722/10000 (97%)
Test set: Average loss: 0.0630, Accuracy: 9803/10000 (98%)
Test set: Average loss: 0.0509, Accuracy: 9842/10000 (98%)
Test set: Average loss: 0.0417, Accuracy: 9866/10000 (99%)
Test set: Average loss: 0.0395, Accuracy: 9880/10000 (99%)
Test set: Average loss: 0.0370, Accuracy: 9875/10000 (99%)
Test set: Average loss: 0.0335, Accuracy: 9874/10000 (99%)
Test set: Average loss: 0.0349, Accuracy: 9883/10000 (99%)
Test set: Average loss: 0.0317, Accuracy: 9897/10000 (99%)
Test set: Average loss: 0.0298, Accuracy: 9901/10000 (99%)

channel_1_10_BN

  With BN, the model is defined as follows:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv1_BN = nn.BatchNorm2d(self.conv1.out_channels)
        self.conv2 = nn.Conv2d(10, 120, kernel_size=5)
        self.conv2_BN = nn.BatchNorm2d(self.conv2.out_channels)
        self.fc1 = nn.Linear(120*20*20, 50)
        self.fc1_BN = nn.BatchNorm1d(self.fc1.out_features)  # 1d for a fully connected layer; not applied in forward below
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(self.conv1_BN(self.conv1(x)))
        x = F.relu(self.conv2_BN(self.conv2(x)))
        x = x.view(-1, 120*20*20)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

  Results:

Test set: Average loss: 0.0646, Accuracy: 9796/10000 (98%)
Test set: Average loss: 0.0506, Accuracy: 9839/10000 (98%)
Test set: Average loss: 0.0375, Accuracy: 9869/10000 (99%)
Test set: Average loss: 0.0333, Accuracy: 9884/10000 (99%)
Test set: Average loss: 0.0319, Accuracy: 9889/10000 (99%)
Test set: Average loss: 0.0360, Accuracy: 9881/10000 (99%)
Test set: Average loss: 0.0315, Accuracy: 9896/10000 (99%)
Test set: Average loss: 0.0303, Accuracy: 9890/10000 (99%)
Test set: Average loss: 0.0302, Accuracy: 9898/10000 (99%)
Test set: Average loss: 0.0291, Accuracy: 9896/10000 (99%)

                    photo12
  Convergence is noticeably faster.

Summary

  Compared with dropout, BN mainly helps optimization, preventing activations from entering saturated regions where training would stall.

Intermediate Layer Outputs

  To finish the experiments with something fun: we know a CNN extracts features, but not what those features look like, so we dump the outputs of the intermediate conv layers and obtain feature images like the following:
photo21
  The figure above shows the features learned by the two convolutional layers for 5 samples.
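
  A minimal sketch of how such feature maps can be captured with forward hooks (model is assumed to be a trained Net as above, data a batch of MNIST images; the output file names are made up):

import torch
from torchvision.utils import save_image

features = {}
def capture(name):
    def hook(module, inp, out):
        features[name] = out.detach()
    return hook

model.conv1.register_forward_hook(capture('conv1'))
model.conv2.register_forward_hook(capture('conv2'))

with torch.no_grad():
    model(data[:5])                        # forward pass on 5 sample digits

for name, fmap in features.items():       # fmap shape: (5, channels, H, W)
    for i in range(fmap.size(0)):
        # save each sample's channels as a grid of grayscale images
        save_image(fmap[i].unsqueeze(1), '{}_sample{}.png'.format(name, i), normalize=True)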

Conclusion

  A brief summary of these experiments. Appropriately increasing the depth and width of the model, i.e. its capacity, helps accuracy. The batch size directly determines the number of updates per epoch; if it is too large, too few optimization steps happen per epoch and the results suffer. Within a certain range, raising the number of epochs is a crude but effective way to improve accuracy, but past a certain point further epochs bring little additional benefit. Among the optimizers, RMSprop and Adam should be trained with a relatively small learning rate. When building a model, do not rush to add dropout; watch the curves and consider dropout only once the model is overfitting. BN makes optimization easier.

References

[1] The original reference code (the official PyTorch MNIST example)