Once you have installed Keras and made sure it works, let’s do something with it.

The typical “Hello, World!” example for neural networks is recognizing handwritten digits. There is a famous dataset for this, MNIST, which contains grayscale images of handwritten digits from 0 to 9.

The images are 28x28 pixels in size:

(Figure: example digits from the MNIST dataset)

Neural networks may sound complicated, but with Keras it is possible to create a neural network, train it, and estimate the accuracy of its predictions on unseen data in only 10-20 lines of code.

Here is the Jupyter notebook for the tutorial; please feel free to download it and play with it.

Open the notebook code.

To make things more fun, I use the dataset provided by the Kaggle training competition Digit Recognizer. It uses the MNIST dataset (download it), and as a side effect, we will be able to score the result on the competition’s Kaggle Leaderboard.
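The snippets below assume the training data has already been loaded into the train_images and train_labels frames. Here is a minimal sketch of how that could be done with pandas; the train.csv filename is Kaggle’s, but the one-hot encoding via pd.get_dummies is my assumption about the preprocessing, and the original notebook may do it differently:

# A possible way to prepare the data used below; the original notebook
# may do this differently.
import pandas as pd

raw = pd.read_csv('train.csv')                    # 42000 rows: 'label' + 784 pixel columns
train_images = raw.drop('label', axis=1) / 255.0  # scale pixel values into [0, 1]
train_labels = pd.get_dummies(raw['label'])       # one-hot encode the digits 0-9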

Creating the model

For this tutorial, I created a very simple network with one hidden fully connected (dense) layer of 32 nodes.
The input layer has 784 nodes, one for every pixel in the image.
Each of the 32 nodes in the hidden layer is connected to each of the 784 input nodes, and the output layer has 10 nodes with the standard softmax activation.

The whole code for creating this neural network:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_dim=784),   # hidden layer: 32 nodes, one weight per input pixel
    Activation('relu'),
    Dense(10),                  # output layer: one node per digit class
    Activation('softmax')
])

# Multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

This simple net requires a total of (784 + 1)*32 + (32 + 1)*10 = 25450 parameters to fit (the +1 terms are the per-node biases).
Not a small number for such a simple NN! And we have only 42000 digits in the training set.
So, as you can imagine, we definitely should look out for overfitting here.
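If you want to double-check this arithmetic, Keras can print a per-layer breakdown:

# Print a per-layer parameter count; the layer names may differ in your run.
model.summary()
# Dense(32): 784*32 + 32 = 25120 parameters
# Dense(10): 32*10 + 10  =   330 parameters
# Total:                    25450 parameters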

Fitting the model

The code for fitting (training) the model:

history = model.fit(train_images.values, train_labels.values,
                    validation_split=0.05, nb_epoch=25, batch_size=64)

Estimating accuracy

The validation_split=0.05 parameter for model.fit() means that 5% of the training data is set aside and, after every epoch, used to validate accuracy (i.e. to estimate the accuracy of the model on unseen data).

As we can see from the following picture, overfitting already happens at around epoch 8-10.

(Figure: training and validation accuracy over epochs)
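The plot itself is easy to reproduce: the history object returned by model.fit() keeps the per-epoch metrics (Keras 1.x stores them under the 'acc' and 'val_acc' keys). A minimal matplotlib sketch:

# Plot training vs. validation accuracy per epoch from the fit history.
import matplotlib.pyplot as plt

plt.plot(history.history['acc'], label='training accuracy')
plt.plot(history.history['val_acc'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()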

Basically, what we did here was tune the hyperparameter “number of epochs” using validation.

Submitting to Kaggle

To create the submission file, I fit the model on the whole training dataset with nb_epoch=10.
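For reference, here is one way such a submission could be produced; the test.csv file and the ImageId/Label columns come from Kaggle, but the exact code in the original notebook may differ:

import numpy as np
import pandas as pd

# Refit on the full training set, stopping before overfitting kicks in.
model.fit(train_images.values, train_labels.values, nb_epoch=10, batch_size=64)

# Predict a digit for every test image and write the Kaggle submission file.
test_images = pd.read_csv('test.csv') / 255.0
predictions = np.argmax(model.predict(test_images.values), axis=1)

submission = pd.DataFrame({'ImageId': np.arange(1, len(predictions) + 1),
                           'Label': predictions})
submission.to_csv('submission.csv', index=False)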

For this simple NN, I got a score of 0.96257. For this competition, the score is defined as the categorization accuracy of the predictions (the percentage of images you get correct).
So, 96% accuracy on unseen test data is good, right?

But nowadays, this result isn’t considered good. The best you can currently get with a single-hidden-layer NN is around 99.3% accuracy.
But please don’t use the Kaggle leaderboard as a target for accuracy: apparently, some people with 100% accuracy simply cheated (i.e. somehow overfitted on the test dataset).

Can we do better?

The next step from here would be to try hyperparameter tuning.
Please feel free to do it yourself; here is the source of the code for this post.