TF Notes: Deep MNIST for Experts

2017-02-16 Lu Huang

Original post: https://hlthu.github.io/tensorflow/2017/02/16/tf-learn-mnist-experts.html

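TensorFlow is a powerful library for implementing and training deep neural networks. In this post we will:

Build a Softmax regression model
Train the model with TensorFlow
Check the model's accuracy on the test data
Build, train, and test a multilayer convolutional neural network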

Loading the MNIST data

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

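Starting an InteractiveSession

An InteractiveSession installs itself as the default session, so later calls such as train_step.run() and accuracy.eval() run in it without passing the session explicitly.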

import tensorflow as tf
sess = tf.InteractiveSession()

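The computation graph

TensorFlow does its heavy computation outside Python. The role of the Python code is therefore to build this external computation graph and to dictate which parts of it should be run.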

Building a Softmax regression model

The regression model in this section has only a single linear layer.

Placeholders

First, create nodes for the input images and the target output labels.

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

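Here x and y_ are not concrete values. They are called placeholders: values we supply when we ask TensorFlow to run a computation. Each row of x is a 28x28 = 784-pixel MNIST image flattened into a vector, and each row of y_ is a one-hot vector over the 10 digit classes. The None in the first dimension means the batch size can be anything.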
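To make the shapes concrete, here is a minimal pure-Python sketch (no TensorFlow) of what one row of x and one row of y_ contain. The helpers flatten_image and one_hot are hypothetical, and the row-by-row flattening order is an assumption about how the 784-element vectors are laid out; read_data_sets with one_hot=True already returns data in this form.

```python
# Hypothetical helpers illustrating the placeholder shapes:
# x holds flattened 28x28 images (784 values), y_ holds one-hot labels (10 classes).

def flatten_image(image):
    """Flatten a 28x28 nested list row by row (row-major) into a 784-element list."""
    return [pixel for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Return a one-hot vector: 1.0 at index `label`, 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

image = [[0.0] * 28 for _ in range(28)]
image[0][1] = 1.0  # row 0, column 1 -> flat index 1
image[1][0] = 0.5  # row 1, column 0 -> flat index 28
flat = flatten_image(image)
print(len(flat))          # 784
print(flat[1], flat[28])  # 1.0 0.5
print(one_hot(3))
```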

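Variables

Next, define the model's weights W and bias b. Both start as zeros: W is 784x10 because there are 784 input features and 10 outputs, and b has one entry per class.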

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

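A variable is a value that lives in TensorFlow's computation graph; it can be used and even modified by the computation.

Before variables can be used in a session, they must be initialized in that session. The following operation initializes all variables at once: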

sess.run(tf.global_variables_initializer())

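Predicted class and loss function

Implement the regression model: multiply the vectorized input images x by the weight matrix W and add the bias b.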

y = tf.matmul(x,W) + b

Here we use the cross-entropy loss function:

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

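Here tf.nn.softmax_cross_entropy_with_logits applies softmax to the model's unnormalized predictions (the logits y) and computes the cross-entropy against the labels, summing over all classes. tf.reduce_mean then averages these per-example values over the batch. Passing labels and logits as keyword arguments avoids mixing up their order; TensorFlow 1.0 and later 1.x releases require it.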
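For one example with logits z and one-hot label t, the loss is -sum_j t_j * log(softmax(z)_j). A minimal pure-Python sketch of that computation and the batch average follows; the helper names and the toy three-class data are illustrative, not TensorFlow's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, labels):
    """-sum_j labels[j] * log(softmax(logits)[j]) for a single example.
    Terms with a zero label contribute nothing, so they are skipped."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(labels, probs) if t > 0)

def mean_cross_entropy(batch_logits, batch_labels):
    """Average the per-example losses (the role tf.reduce_mean plays)."""
    losses = [softmax_cross_entropy(z, t) for z, t in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)

# Two examples with three classes (MNIST would have ten).
logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(mean_cross_entropy(logits, labels))
```

Note that uniform logits give a loss of log(3) for three classes: the model is maximally unsure.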

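Training the model

Because TensorFlow knows the whole computation graph, it can use automatic differentiation to compute the gradient of the loss with respect to each variable. It also provides many built-in optimization algorithms; here we use steepest gradient descent.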

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

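Here 0.5 is the step size (learning rate). The returned operation, train_step, applies one gradient descent update to the parameters each time it runs, so training amounts to running train_step repeatedly.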
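A pure-Python sketch of what each run of train_step does for this model: compute the mean cross-entropy over the batch, take its gradient, and move every parameter a step of size 0.5 against it. It relies on the standard result that the gradient of softmax cross-entropy with respect to the logits is softmax(logits) - labels. The toy data and helper names are hypothetical; TensorFlow derives the gradients automatically rather than from hand-written formulas like these.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def loss_and_grads(W, b, xs, ys):
    """Mean softmax cross-entropy over a batch and its gradients w.r.t. W and b.
    For softmax + cross-entropy, dLoss/dlogits = p - y."""
    n_in, n_out = len(W), len(b)
    gW = [[0.0] * n_out for _ in range(n_in)]
    gb = [0.0] * n_out
    loss = 0.0
    for x, y in zip(xs, ys):
        logits = [sum(x[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
        p = softmax(logits)
        loss -= sum(t * math.log(q) for t, q in zip(y, p) if t > 0)
        for j in range(n_out):
            d = (p[j] - y[j]) / len(xs)  # averaged over the batch, like reduce_mean
            gb[j] += d
            for i in range(n_in):
                gW[i][j] += x[i] * d
    return loss / len(xs), gW, gb

def gradient_descent_step(W, b, xs, ys, learning_rate=0.5):
    """Update W and b in place against the gradient; return the loss before the step."""
    loss, gW, gb = loss_and_grads(W, b, xs, ys)
    for i in range(len(W)):
        for j in range(len(b)):
            W[i][j] -= learning_rate * gW[i][j]
    for j in range(len(b)):
        b[j] -= learning_rate * gb[j]
    return loss

# Toy problem: 2 input features, 2 classes, zero-initialized like W and b above.
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [[1.0, 0.0], [0.0, 1.0]]
losses = [gradient_descent_step(W, b, xs, ys) for _ in range(20)]
print(losses[0], losses[-1])
```

With zero-initialized parameters every class gets probability 1/2, so the first loss is log(2); each subsequent step lowers it.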

for i in range(1000):
  batch = mnist.train.next_batch(100)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

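In each training iteration we load a batch of 100 training examples and run train_step, using feed_dict to substitute the batch for the placeholders x and y_. Note that feed_dict is not limited to placeholders: it can replace any tensor in the graph.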
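The loop relies on mnist.train.next_batch(100) returning a fresh (images, labels) batch on each call. As a rough illustration of that idea, here is a hypothetical generator that reshuffles the data every epoch and slices it into fixed-size batches. It is a simplified sketch, not the actual implementation of next_batch, which may differ in details such as how the end of an epoch is handled.

```python
import random

def minibatches(images, labels, batch_size, seed=0):
    """Yield (images, labels) batches forever, reshuffling at every epoch.
    Simplification: leftover examples that do not fill a whole batch are skipped."""
    rng = random.Random(seed)
    order = list(range(len(images)))
    while True:
        rng.shuffle(order)
        for start in range(0, len(order) - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield [images[i] for i in idx], [labels[i] for i in idx]

images = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
batches = minibatches(images, labels, batch_size=4)
xs, ys = next(batches)
print(len(xs), len(ys))  # 4 4
```

Shuffling the indices rather than the data keeps each image paired with its label.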

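Evaluating the model

tf.argmax is an extremely useful function that returns the index of the largest entry of a tensor along a given axis. For example, tf.argmax(y,1) is the label the model considers most likely for each input, while tf.argmax(y_,1) is the true label. tf.equal then checks whether the prediction matches the truth.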

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

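This gives a tensor of booleans (True or False). To turn it into an accuracy, cast the booleans to floating point (True becomes 1, False becomes 0) and take the mean: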

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

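Here tf.cast converts the booleans to floats before the mean is taken. For example, [True, False, True, True] becomes [1, 0, 1, 1], whose mean is 0.75.

Finally, evaluate the accuracy on the test data.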
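The same argmax, equal, cast, and mean pipeline can be sketched in plain Python on a small made-up batch of predictions (the argmax and accuracy helpers are hypothetical stand-ins for the TensorFlow ops; here ties go to the first index).

```python
def argmax(values):
    """Index of the largest entry for one row (ties go to the first index here)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]  # booleans
    return sum(float(c) for c in correct) / len(correct)  # cast to float, then mean

predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
print(accuracy(predictions, labels))  # 0.75
```

Only the third row is wrong (predicted class 2, true class 1), which reproduces the [True, False, True, True]-style example: three of four correct.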

print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

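Building a multilayer convolutional network

Weight initialization

This model needs many weights and biases, so define two helper functions to create them. Weights are initialized with a small amount of noise (a truncated normal with standard deviation 0.1) to break symmetry and prevent zero gradients. Since the network uses ReLU neurons, biases are initialized to a slightly positive value (0.1) to avoid "dead" neurons.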

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
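tf.truncated_normal is documented as sampling from a normal distribution but dropping and re-picking any value more than 2 standard deviations from the mean. The rejection-sampling sketch below mimics that rule in pure Python; it is an illustration with a hypothetical helper name, not TensorFlow's implementation. With stddev=0.1, every initial weight lands in [-0.2, 0.2].

```python
import random

def truncated_normal(n, mean=0.0, stddev=0.1, seed=0):
    """Draw n samples from N(mean, stddev^2), re-drawing any sample that lands
    more than 2 standard deviations from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2 * stddev:  # reject the tails
            samples.append(v)
    return samples

w = truncated_normal(1000)
print(min(w), max(w))
```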

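Convolution and pooling

The convolutions use a stride of 1 and zero padding ('SAME'), so the output has the same spatial size as the input. Pooling is max pooling over 2x2 blocks with a stride of 2, which halves the width and height.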

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
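With padding='SAME', TensorFlow's output spatial size is ceil(input_size / stride), whatever the kernel size. A short arithmetic sketch (the same_output_size helper is hypothetical) traces a 28x28 MNIST image through the two conv/pool stages, which is where the 7x7 in the densely connected layer comes from.

```python
import math

def same_output_size(in_size, stride):
    """Spatial output size for padding='SAME': ceil(in_size / stride)."""
    return math.ceil(in_size / stride)

size = 28
size = same_output_size(size, 1)  # first conv2d, stride 1: 28
size = same_output_size(size, 2)  # first max_pool_2x2, stride 2: 14
size = same_output_size(size, 1)  # second conv2d: 14
size = same_output_size(size, 2)  # second max_pool_2x2: 7
print(size)  # 7
```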

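First convolutional layer

We now implement the first layer: a convolution followed by max pooling. The convolution computes 32 feature maps, so its weight tensor has shape [5, 5, 1, 32]. The first two dimensions are the patch (kernel) size, the third is the number of input channels, and the last is the number of output channels, i.e. feature maps. Each output feature map also gets its own bias.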

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

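To apply the layer, first reshape x into a 4-D tensor. The second and third dimensions are the image width and height, and the last dimension is the number of color channels (1, since MNIST is grayscale). The -1 in the first dimension lets TensorFlow infer the batch size.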

x_image = tf.reshape(x, [-1,28,28,1])

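Next, convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool. Pooling reduces the image size to 14x14.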

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
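To see what h_conv1 and h_pool1 compute for a single channel, here is a small pure-Python sketch: a stride-1, zero-padded 'SAME' convolution (like tf.nn.conv2d, it is really a cross-correlation, so the kernel is not flipped), a bias, ReLU, and 2x2 max pooling. The 4x4 image, 3x3 kernel, bias, and helper names are made up for illustration; the real layer uses 5x5 kernels and 32 filters.

```python
def conv2d_same(image, kernel):
    """Stride-1 'SAME' 2-D cross-correlation for one channel and one filter.
    Assumes an odd kernel size; positions outside the image count as zero."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < h and 0 <= cc < w:  # zero padding elsewhere
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def pool_2x2(fmap):
    """Max over non-overlapping 2x2 blocks (stride 2). Assumes even height and width."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]), 2)]
            for r in range(0, len(fmap), 2)]

image = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 0.0],
         [2.0, 0.0, 1.0, 1.0],
         [1.0, 1.0, 0.0, 2.0]]
kernel = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, -1.0]]
bias = 0.5
fmap = relu([[v + bias for v in row] for row in conv2d_same(image, kernel)])
pooled = pool_2x2(fmap)
print(pooled)  # [[0.5, 2.5], [1.5, 2.5]]
```

The convolution keeps the 4x4 size, and pooling halves it to 2x2, mirroring 28x28 to 14x14 in the real layer.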

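Second convolutional layer

To build a deep network, stack a second layer of the same type. It takes the 32 channels from the first layer as input and produces 64 feature maps, again with 5x5 kernels.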

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

Pooling is again 2x2 max pooling, which reduces the image size to 7x7.

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

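Densely connected layer

At this point the image has been reduced to 7x7 with 64 channels, so the densely connected layer has 7x7x64 = 3136 inputs; we give it 1024 output neurons. The pooling output is reshaped into a batch of vectors, multiplied by a weight matrix, offset by a bias, and passed through ReLU.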

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
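Reshaping h_pool2 to [-1, 7*7*64] keeps each example's values in row-major order, with the channel index varying fastest. A minimal pure-Python sketch of that ordering for one example (the flatten_hwc helper is hypothetical): element (h, w, c) ends up at index (h*7 + w)*64 + c of the 3136-element vector that is multiplied by W_fc1.

```python
def flatten_hwc(tensor):
    """Row-major flatten of a [height][width][channels] nested list."""
    return [v for row in tensor for pixel in row for v in pixel]

H, W, C = 7, 7, 64
# Store each position's own (h, w, c) coordinates so the ordering is visible.
tensor = [[[(h, w, c) for c in range(C)] for w in range(W)] for h in range(H)]
flat = flatten_hwc(tensor)
print(len(flat))  # 3136
h, w, c = 2, 5, 10
print(flat[(h * W + w) * C + c])  # (2, 5, 10)
```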

Dropout

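To reduce overfitting, we apply dropout before the output layer. We create a placeholder, keep_prob, for the probability that a neuron's output is kept during dropout.

This lets us turn dropout on during training (keep_prob 0.5) and off during testing (keep_prob 1.0). tf.nn.dropout handles both the masking and the scaling: kept outputs are scaled up by 1/keep_prob, so no extra rescaling is needed at test time.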

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
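The masking-plus-scaling behavior described above is often called inverted dropout. Here is a pure-Python sketch of the same idea, assuming a list of activations and a seeded random.Random for reproducibility (the dropout helper is hypothetical, not TensorFlow's code). With keep_prob = 1.0 nothing changes, which is why evaluation feeds keep_prob: 1.0.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale kept values by
    1/keep_prob so the expected output equals the input; zero the rest."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(42)
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dropout(activations, keep_prob=1.0, rng=rng))  # unchanged, as at test time
print(dropout(activations, keep_prob=0.5, rng=rng))  # each entry is 0.0 or doubled
```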

Output layer

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
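As a sanity check on the shapes, the network's trainable parameter count can be tallied by hand. This arithmetic sketch multiplies out the weight shapes passed to weight_variable and adds the bias lengths; it shows that the 3136x1024 densely connected layer holds the vast majority of the roughly 3.27 million parameters.

```python
# Layer name -> (weight shape, bias length), matching the definitions above.
layers = {
    "conv1": ([5, 5, 1, 32], 32),
    "conv2": ([5, 5, 32, 64], 64),
    "fc1": ([7 * 7 * 64, 1024], 1024),
    "fc2": ([1024, 10], 10),
}

def count(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n

counts = {name: count(w_shape) + b_len for name, (w_shape, b_len) in layers.items()}
total = sum(counts.values())
print(counts)
print("total", total)  # total 3274634
```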

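Training and evaluating the model

The training code is almost the same as for the Softmax regression model, with three differences: steepest gradient descent is replaced by the Adam optimizer (learning rate 1e-4), keep_prob is added to feed_dict to control the dropout rate, and the training accuracy is logged every 100 iterations.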

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
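For intuition about tf.train.AdamOptimizer(1e-4), here is a scalar sketch of the standard Adam update (as given in the original Adam paper) minimizing (w - 3)^2. It keeps moving averages of the gradient and squared gradient, bias-corrects them, and divides one by the square root of the other. TensorFlow's own implementation arranges the epsilon term slightly differently, and the learning rate of 0.1 here is chosen only so the toy converges quickly.

```python
import math

def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1):
    """Scalar Adam: bias-corrected moving averages of g (m) and g^2 (v)."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def grad(w):
    """Gradient of (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

print(adam_minimize(grad, 0.0, steps=1))     # first step moves by about lr
print(adam_minimize(grad, 0.0, steps=2000))  # close to 3
```

Because of the bias correction, the first step has size close to lr regardless of the gradient's magnitude.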

Code

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import sys

from tensorflow.examples.tutorials.mnist import input_data

import tensorflow as tf

def weight_variable(shape):
    # Small truncated-normal noise breaks symmetry between units
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # Slightly positive biases help avoid "dead" ReLU units
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def main(_):
    mnist = input_data.read_data_sets('../data', one_hot=True)

    sess = tf.InteractiveSession()

    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])

    # first conv layer
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    # second conv layer
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    # dense connect layer
    W_fc1 = weight_variable([7*7*64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # dropout
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_dropout = tf.nn.dropout(h_fc1, keep_prob)

    # output layer
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y = tf.matmul(h_fc1_dropout, W_fc2) + b_fc2

    # cross_entropy
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

    # train step
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    # accuracy
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # for r0.12
    # tf.global_variables_initializer().run()

    # for r0.11
    sess.run(tf.initialize_all_variables())

    for i in range(10000):
        batch = mnist.train.next_batch(100)
        if i%100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
            print("Step %d, training accuracy %g" % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    print("test accuracy %g" % accuracy.eval(
        feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

if __name__ == '__main__':
    tf.app.run(main=main)