Experiment with an AI that can understand your handwriting and discover how this ability is slowly developed through a learning process like the one humans follow.
Understanding handwriting is not an easy task for a computer. Handwriting is irregular by nature: characters change shape even when written by the same person, different people write some characters in slightly different ways, and a change in the length or angle of a line might be all that’s needed for one character to turn into another. Humans often have trouble reading handwriting, even their own.
To help computers read, we first simplified the task. Barcodes (which have existed since 1951) only require detecting an alternating sequence of black and white lines. OCR (Optical Character Recognition) can read text printed in arbitrary fonts, but relies on their regularity. It can also have trouble with ambiguous characters, so it’s sometimes helped by special fonts that use distinct character shapes. Palm Pilot devices from the late 90s allowed handwritten input, but used a special simplified alphabet with characters drawn in a particular size and with a specific set of strokes.
Programming an AI with a precise definition of the shape of each number so it can tell them apart is not an easy task. The definition has to be broad enough to deal with style variations, and the point at which a 1 becomes a 7, or a 3 becomes an 8, is a very gray area. Humans learn the numbers by looking at examples, and with time recognition becomes almost instantaneous and automatic.
The AI you used at the top of this page uses a type of system called an artificial neural network, which also learns by looking at examples. Neural networks have that name because they are inspired by biological brains and their collection of neurons, which connect and transmit signals to take input from the sensory organs and generate an appropriate response in muscles or other organs. Artificial neural networks are similarly made of thousands of artificial neurons connected in a very complex network, and they also exchange simple signals that travel between an input and an output.
Neural networks commonly learn to perform tasks by training with thousands or millions of examples. They are particularly good at recognizing patterns and shapes: numbers, letters and, of course, pictures of cats.
In our number-recognizing AI the input is a 28 by 28 pixel image. That is, 784 individual points (28 × 28 = 784), each associated with a number that goes from 1 (white) to 0 (black), with shades of gray as decimals in between. The shape you draw is centered and zoomed before being used as input. Try writing a small number in a corner of the recognizer below and see how it’s normalized to simplify the task.
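The text does not describe how the exhibit centers and zooms a drawing, so the following is only a minimal sketch of one way it could be done. The function name normalize, the margin parameter and the representation of a drawing as a list of (x, y) ink points are all assumptions made for illustration.

```python
def normalize(points, size=28, margin=4):
    """Center and scale (x, y) ink points into a size x size grid.

    Returns a flat list of size * size values: 0.0 for ink (black) and
    1.0 for empty background (white), following the convention in the text.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    # Longest side of the bounding box; fall back to 1 for a single dot.
    extent = max(width, height) or 1
    # Zoom so the longest side fills the area inside the margin.
    scale = (size - 2 * margin - 1) / extent
    # Offsets that center the scaled drawing on both axes.
    off_x = (size - 1 - width * scale) / 2
    off_y = (size - 1 - height * scale) / 2

    pixels = [1.0] * (size * size)
    for x, y in points:
        col = round((x - min_x) * scale + off_x)
        row = round((y - min_y) * scale + off_y)
        pixels[row * size + col] = 0.0
    return pixels


# A tiny vertical stroke drawn near the top-left corner of the canvas.
small_stroke = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5)]
image = normalize(small_stroke)
```

After normalization the stroke spans the central area of the 784-value image no matter where or how small it was drawn. A real recognizer would also connect the points into continuous lines, which this sketch leaves out.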
Since we’re limiting the task to recognizing numbers, we have ten outputs: one for each possible digit from 0 to 9. Each output produces a signal which tells us how much confidence the network has that the input is that particular number. At any point the neural network might consider different answers as likely. In the next recognizer you can see bars for each output, which grow as the signal becomes stronger. Try drawing a number slowly and see how the bars get bigger and smaller as you proceed.
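The text does not say how the ten output signals are computed. A common choice in digit classifiers is the softmax function, which turns raw scores into confidences that add up to 1. The sketch below assumes that choice, and its scores are made up for illustration rather than taken from the exhibit.

```python
import math


def softmax(scores):
    """Turn raw output scores into positive confidences that sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the digits 0 to 9 (invented values).
scores = [0.1, 2.0, 0.3, 0.2, 0.1, 0.0, 0.1, 1.6, 0.2, 0.1]
confidences = softmax(scores)

# The recognized digit is the output with the strongest signal.
best = max(range(10), key=lambda digit: confidences[digit])
```

With these scores the strongest output is the 1, with the 7 as a close runner-up, which mirrors the situation where a drawing sits in the gray area between two digits.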
Each input is connected to all others and then to the outputs through around 80,000 connections. These connections grow stronger or become weaker as the training progresses, and substructures that are able to detect certain patterns and shapes in the input emerge.
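The text gives only the approximate total, not the layout of the network. One layout that would produce a figure close to 80,000 is a single intermediate layer of 100 neurons between the 784 inputs and the 10 outputs. That layer size is purely an assumption, used here to show how such a count comes about.

```python
def count_connections(layer_sizes):
    """Count the connections in a fully connected network.

    Every neuron in one layer connects to every neuron in the next,
    so each pair of adjacent layers contributes (size a) * (size b).
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# 784 inputs and 10 outputs come from the text; the 100 is an assumption.
layers = [28 * 28, 100, 10]
total = count_connections(layers)  # 784 * 100 + 100 * 10 = 79400
```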
Our network’s training was done with the help of the MNIST database, a collection of 70,000 images of individual digits handwritten by high school students and employees of the United States Census Bureau, shared to help AI researchers. Each image in the MNIST database was pre-labeled by a human with the number pictured, so the AI can verify by itself whether its output was correct.
The pictures are used as input, one at a time, and the output is checked. Feedback (whether it was right or wrong) is provided to the network after each picture so its connections can be adjusted. Thanks to this process responses become more and more accurate over time. The training is lengthy, but done automatically by the computer.
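The loop described above (show one picture, check the answer, adjust the connections) can be sketched with the perceptron rule, one of the simplest learning rules. This is not necessarily the method the exhibit uses, since the text does not name one. The function names, the learning rate and the tiny 2 by 2 toy pictures are all assumptions.

```python
import random


def predict(weights, pixels):
    """Return the output whose weighted sum of inputs is largest."""
    scores = [sum(w * p for w, p in zip(row, pixels)) for row in weights]
    return max(range(len(scores)), key=lambda k: scores[k])


def train(examples, n_inputs, n_outputs, epochs=50, rate=0.1, seed=0):
    """Adjust one layer of connections using feedback after each picture."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    for _ in range(epochs):
        for pixels, label in examples:
            guess = predict(weights, pixels)
            if guess != label:
                # Wrong answer: strengthen the connections to the correct
                # output and weaken the ones to the output that was chosen.
                for i, p in enumerate(pixels):
                    weights[label][i] += rate * p
                    weights[guess][i] -= rate * p
    return weights


# Toy 2 x 2 pictures, row by row, with 0 = black ink and 1 = white.
examples = [
    ([0, 1, 0, 1], 0),  # vertical bar on the left, labeled 0
    ([0, 0, 1, 1], 1),  # horizontal bar on top, labeled 1
]
weights = train(examples, n_inputs=4, n_outputs=2)
answers = [predict(weights, pixels) for pixels, _ in examples]
```

Real training works on the same principle, but with tens of thousands of pictures, many more connections and a more gradual adjustment rule.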
The recognizer below was trained using only around 300 pictures. You can see that as you write the bars all remain quite low (the network has low confidence in the response) and they jump around a lot (it’s very sensitive to small changes in how the number is written). Some numbers aren’t recognized unless they’re drawn in a very specific way.
The next recognizer was trained with around 1,500 pictures. Its performance is much better and it can identify many numbers confidently. It still has trouble with certain shapes, recognizing a number or not depending on how it’s drawn. For instance, it doesn’t recognize the 8 unless it’s drawn with a particular width (not too thin, not too wide) and with the upper circle bigger than the bottom one.
For convenience in experimenting, here is the fully trained network (using 70,000 pictures) once more.
Neural networks often understand complex shapes as the combination of simpler visual patterns. You can explore the way our fully trained recognizer sees numbers by drawing certain shapes and seeing which output bars activate. For instance, notice how drawing a plus sign raises the 4 signal, or how drawing an x raises the 8. Other interesting shapes to try are two or three parallel lines (either vertical or horizontal), rotated T shapes and half circles.
The network can also deal very well with noisy images. Try drawing a number using dots or dashes instead of full lines and you’ll notice it’s still able to identify the numbers quite well.
You can experiment with the interactive Neural Numbers exhibit this trail is based on, which allows you to train the neural network up to any point and try out many different options. Also, the Talk to Me exhibit shows in detail how neural networks can be used for speech recognition.
Text is available under the Creative Commons Attribution License.