Instructor Note: The code in this notebook is in the file MNIST_CNN.ipynb in the student downloads
We use the MNIST database of handwritten digits. The instructions assume you run this notebook in the tf_env Anaconda environment, from the ch15 examples folder.
from tensorflow.keras.datasets import mnist
The load_data function loads the training and testing sets:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
Check the shapes of the training set images (X_train), training set labels (y_train), testing set images (X_test) and testing set labels (y_test):
X_train.shape
y_train.shape
X_test.shape
y_test.shape
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
# sns.set(font_scale=2)
import numpy as np
Let's display 24 randomly selected training images with their labels:
index = np.random.choice(np.arange(len(X_train)), 24, replace=False)  # 24 random indices
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(16, 9))
for item in zip(axes.ravel(), X_train[index], y_train[index]):
    axes, image, target = item
    axes.imshow(image, cmap=plt.cm.gray_r)
    axes.set_xticks([])  # remove x-axis tick marks
    axes.set_yticks([])  # remove y-axis tick marks
    axes.set_title(target)
plt.tight_layout()
Keras convnets expect each image as a three-dimensional array of shape (width, height, channels), so each 28-by-28 grayscale MNIST image must become (28, 28, 1). The reshape method receives a tuple representing the new shape:
X_train = X_train.reshape((60000, 28, 28, 1))
X_train.shape
X_test = X_test.reshape((10000, 28, 28, 1))
X_test.shape
Next, normalize the data by converting the pixel values from integers in the range 0-255 to floats in the range 0.0-1.0:
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
The labels are currently the integer digit values 0-9. The network's output layer will instead produce 10 probabilities, so we one-hot encode each label as a 10-element array in which only the element at the label's index is 1.0. For example, the digit 7 becomes:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
The tensorflow.keras.utils function to_categorical performs one-hot encoding, transforming y_train and y_test into two-dimensional arrays of categorical data:
from tensorflow.keras.utils import to_categorical
y_train = to_categorical(y_train)
y_train.shape
y_train[0] # one sample’s categorical data
y_test = to_categorical(y_test)
y_test.shape
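To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding does for a single label (illustrative only; this is not how to_categorical is implemented):
import numpy as np

label = 7
one_hot = np.zeros(10, dtype='float32')  # ten zeros, one per class
one_hot[label] = 1.0  # mark the element at the label's index
print(one_hot)  # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]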
A Sequential model stacks layers that execute sequentially, each layer's output becoming the next layer's input:
from tensorflow.keras.models import Sequential
cnn = Sequential()
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D


Conv2D implements the convolution layer:
cnn.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',
               input_shape=(28, 28, 1)))
filters=64: The number of filters in the resulting feature map.
kernel_size=(3, 3): The size of the kernel used in each filter.
activation='relu': The Rectified Linear Unit activation function, used to produce this layer's output.
input_shape=(28, 28, 1): The shape of each input sample. You specify input_shape only for this first Conv2D layer, which is actually the first hidden layer; each subsequent layer infers its input shape from the previous layer's output shape.
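For reference, ReLU simply passes positive values through unchanged and replaces negative values with 0. A minimal NumPy sketch of the function (illustrative, not Keras's implementation):
import numpy as np

def relu(x):
    return np.maximum(0, x)  # element-wise max(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]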
To reduce overfitting and computation time, we add a MaxPooling2D layer, which keeps only the maximum value in each 2-by-2 pool of features, halving each dimension of the previous layer's output (26-by-26 becomes 13-by-13):
cnn.add(MaxPooling2D(pool_size=(2, 2)))
We add a second convolution layer with 128 filters, followed by a second pooling layer:
cnn.add(Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2)))
Dense layers expect one-dimensional inputs, so we flatten the previous layer's 5-by-5-by-128 output. The Flatten layer's output will be 1-by-3200 (5 × 5 × 128):
cnn.add(Flatten())
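As a sanity check, here's a small sketch tracing how the feature-map dimensions shrink layer by layer (it assumes the Conv2D default of 'valid' padding, i.e., no padding, and non-overlapping 2-by-2 pooling):
def conv_out(n, kernel=3):   # output size of a 'valid' convolution
    return n - kernel + 1

def pool_out(n, pool=2):     # output size of non-overlapping pooling
    return n // pool

n = 28                 # input image width/height
n = conv_out(n)        # 26 after the first Conv2D
n = pool_out(n)        # 13 after the first MaxPooling2D
n = conv_out(n)        # 11 after the second Conv2D
n = pool_out(n)        # 5 after the second MaxPooling2D
print(n * n * 128)     # 3200 values reach the Flatten layer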
The Flatten layer's outputs represent the learned digit features. Now we use Dense layers to classify them. This Dense layer creates 128 neurons (units) that learn from the 3200 outputs of the previous layer:
cnn.add(Dense(units=128, activation='relu'))
Deeper convnets often contain additional Dense layers, commonly with 4096 neurons. Our final Dense layer classifies its inputs into neurons representing the classes 0-9. The softmax activation function converts the values of these 10 neurons into classification probabilities:
cnn.add(Dense(units=10, activation='softmax'))
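Softmax exponentiates each neuron's value and divides by the sum of the exponentials, so the outputs are all positive and sum to 1. A minimal NumPy sketch (illustrative; Keras's implementation is numerically hardened):
import numpy as np

def softmax(z):
    exp_z = np.exp(z - z.max())  # subtract the max for numerical stability
    return exp_z / exp_z.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # ~[0.09 0.24 0.67], sums to 1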
The summary method shows the model's layers and parameter counts. In the Output Shape column, None means the model does not know in advance how many training samples you're going to provide:
cnn.summary()
You can visualize a model with the plot_model function from module tensorflow.keras.utils:
from tensorflow.keras.utils import plot_model
from IPython.display import Image
plot_model(cnn, to_file='convnet.png', show_shapes=True,
           show_layer_names=True)
Image(filename='convnet.png') # display resulting image in notebook
Before training, we configure the model with the compile method:
cnn.compile(optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy'])
optimizer='adam': The optimizer the model uses to adjust the weights throughout the neural network as it learns. 'adam' performs well across a wide variety of models [1], [2].
loss='categorical_crossentropy': The loss function used by the optimizer in multi-classification networks like ours, which predicts 10 classes. For binary classification you'd use 'binary_crossentropy', and for regression, 'mean_squared_error'.
metrics=['accuracy']: A list of metrics the network produces to help you evaluate the model.
We train the model with the fit method:
cnn.fit(X_train, y_train, epochs=5, batch_size=64, validation_split=0.1)
epochs=5: Neural networks train iteratively over time; each epoch processes every training dataset sample once.
batch_size=64: The number of samples to process at a time.
validation_split=0.1: The model should reserve the last 10% of the training samples for validation after each epoch. If the validation results do not improve, you can tune the fit method's hyperparameters, or possibly change the layer composition of your model. If you have a separate validation set, you can pass it via the validation_data argument instead, as in the sketch below.
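As a minimal sketch of the validation_data alternative, assuming you first carve your own validation set out of the training data (X_tr, X_val, y_tr and y_val are placeholder names introduced here):
# manually set aside the last 10% of the training samples for validation
split = int(0.9 * len(X_train))
X_tr, X_val = X_train[:split], X_train[split:]
y_tr, y_val = y_train[:split], y_train[split:]
cnn.fit(X_tr, y_tr, epochs=5, batch_size=64,
        validation_data=(X_val, y_val))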
The fit method shows the progress of each epoch, how long the epoch took to execute, and the evaluation metrics for that epoch. The model achieves high training accuracy (acc) and validation accuracy (val_acc), given that we have not yet tried to tune the hyperparameters or tweak the number and types of the layers.
evaluate Method
The evaluate method checks the model against data it did not see during training, returning the loss and the metrics we requested:
loss, accuracy = cnn.evaluate(X_test, y_test)
loss
accuracy
predict Method
The predict method returns, for each sample, an array of 10 probabilities, one per class:
predictions = cnn.predict(X_test)
The first test sample's one-hot label contains 1.0 at index 7, so the expected digit is 7:
y_test[0]
Display the probabilities predict returned for the first test sample:
for index, probability in enumerate(predictions[0]):
    print(f'{index}: {probability:.10%}')
Compare the index of the highest probability in predictions[0] to the index of the element containing 1.0 in y_test[0]. Next, let's locate the incorrectly predicted images. First, reshape the samples from the (28, 28, 1) shape that Keras required for learning back to (28, 28), which Matplotlib requires to display the images:
images = X_test.reshape((10000, 28, 28))
incorrect_predictions = []
In the loop below, p is the predicted value array and e is the expected value array; NumPy's argmax function determines the index of an array's highest valued element:
for i, (p, e) in enumerate(zip(predictions, y_test)):
    predicted, expected = np.argmax(p), np.argmax(e)
    if predicted != expected:  # prediction was incorrect
        incorrect_predictions.append(
            (i, images[i], predicted, expected))
len(incorrect_predictions) # number of incorrect predictions
Display the incorrect predictions, labeling each image with its index, predicted value (p) and expected value (e):
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(16, 12))
for axes, item in zip(axes.ravel(), incorrect_predictions):
    index, image, predicted, expected = item
    axes.imshow(image, cmap=plt.cm.gray_r)
    axes.set_xticks([])  # remove x-axis tick marks
    axes.set_yticks([])  # remove y-axis tick marks
    axes.set_title(f'index: {index}\np: {predicted}; e: {expected}')
plt.tight_layout()
The following function displays the classification probabilities for a given prediction:
def display_probabilities(prediction):
    for index, probability in enumerate(prediction):
        print(f'{index}: {probability:.10%}')
Let's view the probabilities for a few of the incorrect predictions (your index values may differ, because randomness in training affects which samples the model gets wrong):
display_probabilities(predictions[340])
display_probabilities(predictions[740])
display_probabilities(predictions[1260])
[1] https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
[2]
Save the trained model with the save method:
cnn.save('mnist_cnn.h5')
You can later reload the model with the load_model function, then call predict to make additional predictions on new data, or fit to train it with additional data:
from tensorflow.keras.models import load_model
cnn = load_model('mnist_cnn.h5')
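As a minimal sketch of predicting with the reloaded model (illustrative; it reuses the first preprocessed test image rather than genuinely new data):
sample = X_test[0].reshape((1, 28, 28, 1))  # predict expects a batch dimension
probabilities = cnn.predict(sample)
print(np.argmax(probabilities[0]))  # index of the most probable digit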
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 15 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.