DeepLearning-Project-MNIST

Overview

  This is the first project for the deep learning elective course. The dataset is MNIST, and the code comes from an example in the official PyTorch documentation.

Network Model

The Example Model

  First, inspect the Net defined by the example. It is a basic CNN, defined as follows:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)  # dim made explicit; the bare call is deprecated

  Printing the model gives:

$ python project_MNIST.py 
Net (
(conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
(conv2_drop): Dropout2d (p=0.5)
(fc1): Linear (320 -> 50)
(fc2): Linear (50 -> 10)
)

  From the printed structure we can see that this CNN consists of two convolutional layers, one Dropout layer, and two fully connected layers, and each convolution is followed by a pooling operation.
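
  As a quick sanity check on the flattened size used in x.view(-1, 320) (not part of the original example; a minimal sketch assuming a recent PyTorch): a 28×28 input becomes 24×24 after conv1, 12×12 after pooling, 8×8 after conv2, and 4×4 after pooling, so the flattened size is 20 × 4 × 4 = 320.

import torch

net = Net()
print(net)                         # prints the model structure
dummy = torch.randn(1, 1, 28, 28)  # one fake MNIST-sized image
out = net(dummy)
print(out.shape)                   # torch.Size([1, 10]), one log-probability per digit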

Comparison with LeNet

  LeNet has been tested on the MNIST dataset before, so we use it for a simple comparison with the example model. The LeNet model is defined as follows:

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

  The printed network is:

Net(
(conv1): Conv2d (1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d (6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120)
(fc2): Linear(in_features=120, out_features=84)
(fc3): Linear(in_features=84, out_features=10)
)

  Comparing the two, LeNet also has two convolutional layers, although its channel counts differ slightly from the example's. LeNet has three fully connected layers and no Dropout layer. In the forward pass, the example model applies max pooling before ReLU, whereas LeNet applies ReLU before max pooling.
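
  One detail worth noting in the comparison (my own observation, not from the example): this LeNet's fc1 expects 16 × 5 × 5 = 400 flattened features, which corresponds to a 32×32 input as in the original PyTorch tutorial; raw 28×28 MNIST images would give 16 × 4 × 4 = 256 features, so they would have to be padded or resized to 32×32 before passing through this LeNet unchanged. A minimal shape check under that assumption:

import torch

lenet = Net()                       # the LeNet-style Net defined above
dummy = torch.randn(1, 1, 32, 32)   # 32x32 input assumed
print(lenet(dummy).shape)           # torch.Size([1, 10])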

Parameters

  The example program exposes the tunable parameters through the argparse module. The parameters include the following (a minimal argparse sketch follows the list):

  • batch-size: number of samples in each training batch
  • test-batch-size: number of samples in each test batch
  • epochs: number of training epochs
  • lr: learning rate
  • momentum: momentum value for stochastic gradient descent
  • no-cuda: disable GPU acceleration (CUDA)
  • seed: random seed
  • log-interval: number of batches between training-status printouts
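
  A minimal sketch of how these options can be declared with argparse (the defaults shown here are illustrative, not necessarily the example's exact values):

import argparse

parser = argparse.ArgumentParser(description='PyTorch MNIST project (sketch)')
parser.add_argument('--batch-size', type=int, default=64)
parser.add_argument('--test-batch-size', type=int, default=1000)
parser.add_argument('--epochs', type=int, default=10)
parser.add_argument('--lr', type=float, default=0.01)
parser.add_argument('--momentum', type=float, default=0.5)
parser.add_argument('--no-cuda', action='store_true', default=False)
parser.add_argument('--seed', type=int, default=1)
parser.add_argument('--log-interval', type=int, default=10)
args = parser.parse_args()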

  The role of momentum in gradient descent is to keep a running velocity: each update retains a fraction (the momentum coefficient) of the previous update direction and adds the new gradient step. Commonly used values are [0.5, 0.9, 0.95, 0.99]; a common practice is to start at 0.5 and raise it to 0.99 after a certain number of iterations. Compared with plain SGD, this configuration usually speeds up convergence considerably.
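
  To make the momentum update concrete, a tiny self-contained numerical sketch (SGD with momentum minimizing f(w) = w², purely illustrative):

lr, momentum = 0.01, 0.9
w, v = 5.0, 0.0                   # parameter and velocity
for step in range(200):
    grad = 2 * w                  # gradient of f(w) = w^2
    v = momentum * v - lr * grad  # keep a fraction of the previous update
    w = w + v                     # move along the accumulated velocity
print(w)                          # approaches 0, the minimum of f
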
  The main goal of this project is to tune these parameters and observe how the CNN's training results change.

Parameter Tuning and Result Comparison

Tuning batch-size

batch-size = 50

Train Epoch: 10 [0/60000 (0%)]	Loss: 0.090874
Train Epoch: 10 [5000/60000 (8%)] Loss: 0.185704
Train Epoch: 10 [10000/60000 (17%)] Loss: 0.284757
Train Epoch: 10 [15000/60000 (25%)] Loss: 0.154577
Train Epoch: 10 [20000/60000 (33%)] Loss: 0.094958
Train Epoch: 10 [25000/60000 (42%)] Loss: 0.321375
Train Epoch: 10 [30000/60000 (50%)] Loss: 0.052121
Train Epoch: 10 [35000/60000 (58%)] Loss: 0.286187
Train Epoch: 10 [40000/60000 (67%)] Loss: 0.141594
Train Epoch: 10 [45000/60000 (75%)] Loss: 0.183933
Train Epoch: 10 [50000/60000 (83%)] Loss: 0.107566
Train Epoch: 10 [55000/60000 (92%)] Loss: 0.203557

Test set: Average loss: 0.0502, Accuracy: 9819/10000 (98.19%)

batch-size = 100

Train Epoch: 10 [0/60000 (0%)]	Loss: 0.191366
Train Epoch: 10 [10000/60000 (17%)] Loss: 0.208016
Train Epoch: 10 [20000/60000 (33%)] Loss: 0.146529
Train Epoch: 10 [30000/60000 (50%)] Loss: 0.063239
Train Epoch: 10 [40000/60000 (67%)] Loss: 0.205667
Train Epoch: 10 [50000/60000 (83%)] Loss: 0.131913

Test set: Average loss: 0.0469, Accuracy: 9831/10000 (98.31%)

batch-size = 200

Train Epoch: 10 [0/60000 (0%)]	Loss: 0.278308
Train Epoch: 10 [20000/60000 (33%)] Loss: 0.221830
Train Epoch: 10 [40000/60000 (67%)] Loss: 0.306337

Test set: Average loss: 0.0882, Accuracy: 9708/10000 (97.08%)

  From the comparison, with a smaller batch size each epoch performs more iterations and convergence is faster; with a larger batch size, convergence slows down.
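
  For context, the batch size is simply the value handed to the training DataLoader; a minimal sketch (the data path is illustrative, and the normalization constants are the values commonly used for MNIST, assumed here rather than copied from the example):

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # commonly used MNIST mean/std
])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True, transform=transform),
    batch_size=50,    # the value being varied in this comparison
    shuffle=True)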

Tuning epochs

Epoch = 10

Train Epoch: 10 [53760/60000 (90%)]	Loss: 0.316007
Train Epoch: 10 [54400/60000 (91%)] Loss: 0.107930
Train Epoch: 10 [55040/60000 (92%)] Loss: 0.115345
Train Epoch: 10 [55680/60000 (93%)] Loss: 0.107533
Train Epoch: 10 [56320/60000 (94%)] Loss: 0.162614
Train Epoch: 10 [56960/60000 (95%)] Loss: 0.139190
Train Epoch: 10 [57600/60000 (96%)] Loss: 0.201872
Train Epoch: 10 [58240/60000 (97%)] Loss: 0.102919
Train Epoch: 10 [58880/60000 (98%)] Loss: 0.329025
Train Epoch: 10 [59520/60000 (99%)] Loss: 0.173940

Test set: Average loss: 0.0545, Accuracy: 9824/10000 (98.24%)

Epoch = 20

Train Epoch: 20 [53760/60000 (90%)]	Loss: 0.154205
Train Epoch: 20 [54400/60000 (91%)] Loss: 0.151837
Train Epoch: 20 [55040/60000 (92%)] Loss: 0.124796
Train Epoch: 20 [55680/60000 (93%)] Loss: 0.115554
Train Epoch: 20 [56320/60000 (94%)] Loss: 0.076441
Train Epoch: 20 [56960/60000 (95%)] Loss: 0.130018
Train Epoch: 20 [57600/60000 (96%)] Loss: 0.125328
Train Epoch: 20 [58240/60000 (97%)] Loss: 0.231644
Train Epoch: 20 [58880/60000 (98%)] Loss: 0.307199
Train Epoch: 20 [59520/60000 (99%)] Loss: 0.066124

Test set: Average loss: 0.0385, Accuracy: 9874/10000 (98.74%)

Epoch = 100

Train Epoch: 100 [53760/60000 (90%)]	Loss: 0.036712
Train Epoch: 100 [54400/60000 (91%)] Loss: 0.055174
Train Epoch: 100 [55040/60000 (92%)] Loss: 0.208764
Train Epoch: 100 [55680/60000 (93%)] Loss: 0.067438
Train Epoch: 100 [56320/60000 (94%)] Loss: 0.090439
Train Epoch: 100 [56960/60000 (95%)] Loss: 0.182915
Train Epoch: 100 [57600/60000 (96%)] Loss: 0.037201
Train Epoch: 100 [58240/60000 (97%)] Loss: 0.037007
Train Epoch: 100 [58880/60000 (98%)] Loss: 0.107094
Train Epoch: 100 [59520/60000 (99%)] Loss: 0.021533

Test set: Average loss: 0.0283, Accuracy: 9904/10000 (99.04%)

  The comparison shows that, with a suitable learning rate, the parameters have essentially converged by epoch 10, so increasing the number of epochs further brings no obvious improvement in accuracy.
  Note that these tests used lr=0.01. When the learning rate is lr=0.001, i.e. too low, the model has not yet converged after 10 epochs, and in that case increasing the number of epochs does noticeably improve accuracy.

Tuning lr

lr = 0.001

Train Epoch: 10 [0/60000 (0%)]	Loss: 0.506689
Train Epoch: 10 [10000/60000 (17%)] Loss: 0.486579
Train Epoch: 10 [20000/60000 (33%)] Loss: 0.425447
Train Epoch: 10 [30000/60000 (50%)] Loss: 0.439384
Train Epoch: 10 [40000/60000 (67%)] Loss: 0.526765
Train Epoch: 10 [50000/60000 (83%)] Loss: 0.635956

Test set: Average loss: 0.2606, Accuracy: 9251/10000 (93.51%)

lr = 0.01

Train Epoch: 10 [0/60000 (0%)]	Loss: 0.191366
Train Epoch: 10 [10000/60000 (17%)] Loss: 0.208016
Train Epoch: 10 [20000/60000 (33%)] Loss: 0.146529
Train Epoch: 10 [30000/60000 (50%)] Loss: 0.063239
Train Epoch: 10 [40000/60000 (67%)] Loss: 0.205667
Train Epoch: 10 [50000/60000 (83%)] Loss: 0.131913

Test set: Average loss: 0.0469, Accuracy: 9831/10000 (98.31%)

lr = 0.05

Train Epoch: 10 [0/60000 (0%)]	Loss: 0.191366
Train Epoch: 10 [10000/60000 (17%)] Loss: 0.208016
Train Epoch: 10 [20000/60000 (33%)] Loss: 0.146529
Train Epoch: 10 [30000/60000 (50%)] Loss: 0.063239
Train Epoch: 10 [40000/60000 (67%)] Loss: 0.205667
Train Epoch: 10 [50000/60000 (83%)] Loss: 0.131913

Test set: Average loss: 0.0469, Accuracy: 9835/10000 (98.35%)

  The comparison shows that with a learning rate of 0.01 or 0.05 the model has essentially converged after 10 epochs, so the difference between the two is not very noticeable. With a learning rate of 0.001, however, the learning rate is too low and the model has not converged after 10 epochs, so the accuracy gap is large. Based on this, 0.01 appears to be a reasonable learning rate.

Hidden-Layer Outputs

  For the first 5 samples, we output the results of the two convolutional layers to see what each layer has learned.
  The plotting function is as follows:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

def show_images(images):
    # images reshape to (batch_size, D)
    images = np.reshape(images, [images.shape[0], -1])
    # use to get the num of images in every row
    sqrtn = int(np.ceil(np.sqrt(images.shape[0])))
    # use to get the size of the images
    sqrtimg = int(np.ceil(np.sqrt(images.shape[1])))

    fig = plt.figure(figsize=(sqrtn, sqrtn))
    gs = gridspec.GridSpec(sqrtn, sqrtn)
    gs.update(wspace=0.05, hspace=0.05)

    for i, img in enumerate(images):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(img.reshape([sqrtimg, sqrtimg]))
    return
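
  A quick standalone check of show_images on some fake feature maps (illustrative input only):

show_images(np.random.rand(10, 24, 24))  # 10 fake 24x24 "feature maps"
plt.show()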

  To output a hidden layer, simply call the function above at the corresponding position in Net's forward function. One thing to note in particular is that the x passed through forward is a Variable, which has no numpy() method; x must first be taken out of the Variable as a tensor (via .data), and then converted with numpy() into the input for the function. The code is as follows:

def forward(self, x):
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    # use to get the photo of hidden_layer
    for i in range(0, 5):
        testphoto = x[i].data.numpy()  # !!! Variable != tensor; .data gives the underlying tensor
        show_images(testphoto)
        plt.show()
    x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
    x = x.view(-1, 320)
    x = F.relu(self.fc1(x))
    x = F.dropout(x, training=self.training)
    x = self.fc2(x)
    return F.log_softmax(x, dim=1)

  The outputs are summarized in the figure below:
photo1

  After discussing with a senior student, there is another feasible way to output a hidden layer: split the network into two networks and take the output of the first one, which is exactly the hidden-layer output of the original network.
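
  A minimal sketch of that idea (the class name and structure here are my own illustration, not from the example): the first sub-network stops right after the first conv + pool + ReLU, so its output is exactly the hidden activation visualized above.

import torch.nn as nn
import torch.nn.functional as F

class FeatureNet(nn.Module):
    """First half of the example Net: everything up to the first hidden activation."""
    def __init__(self):
        super(FeatureNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)

    def forward(self, x):
        # same first step as the example Net's forward
        return F.relu(F.max_pool2d(self.conv1(x), 2))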

Summary

  Overall, the original parameters in the example program are already fairly good: with epoch=10 the accuracy reaches 98%, and increasing the number of epochs pushes it to roughly 99.0%. However, accuracies of around 99.5% have already been achieved on the MNIST dataset, so to improve further one would have to consider changing the network model. This first project was fairly easy; its main purpose was to get us familiar with the environment, although getting the hidden-layer output working did take a little extra time.