Deep learning is becoming increasingly important in our daily lives. The way we unlock our phones and the intelligent LED lights on the streets are both examples of AI applied to image recognition. There are many open-source resources on the internet about AI, including handwritten digit recognition, but how could we build a model around our own dataset? In this article, I am sharing an application with you all: the Neural Network Console.
This article is divided into four parts: first, preparing the datasets; second, building the network model; third, creating the dataset; and last, evaluating the model.
Let's watch a video and set up the network concept first:
Neural Network Model for hand sign:
The following are the basic concepts of each block.
A colour picture is made up of three colours: red, green, and blue. Each colour is stored as a separate layer, and every pixel in a layer has an intensity value from 0 to 255.
In this article, we will pick the red layer as a demonstration.
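As a rough illustration of what "picking the red layer" means, the sketch below pulls one colour layer out of a tiny image. The 2x2 image, its pixel values, and the helper name `extract_channel` are made up for this example; Neural Network Console handles this step for you.

```python
# A tiny 2x2 RGB image as nested lists: image[row][col] = (R, G, B),
# each value in the range 0..255. The pixel values are invented for illustration.
image = [
    [(255, 0, 0), (128, 64, 32)],
    [(0, 255, 0), (10, 20, 30)],
]


def extract_channel(img, channel):
    """Return one colour layer (0 = red, 1 = green, 2 = blue) as a 2D list."""
    return [[pixel[channel] for pixel in row] for row in img]


red_layer = extract_channel(image, 0)
print(red_layer)  # [[255, 128], [0, 10]]
```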
Input: the neural network input layer specifies the input size.
MulScalar is a function that multiplies the input by a scalar value.
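A minimal sketch of the idea, assuming the scalar is 1/255 so that pixel values from 0 to 255 are rescaled to the range 0 to 1. That value is a common choice but is an assumption here, and `mul_scalar` is a hypothetical helper, not the console's own code.

```python
def mul_scalar(layer, value):
    """Multiply every element of a 2D layer by the same scalar."""
    return [[x * value for x in row] for row in layer]


# Example red layer with values in 0..255 (invented for illustration).
red_layer = [[255, 128], [0, 51]]

# Assumed scalar: 1/255, which maps 0..255 onto roughly 0..1.
scaled = mul_scalar(red_layer, 1.0 / 255.0)
print(scaled)
```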
ImageAugmentation randomly alters the input image.
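To give a feel for what "randomly alters" can mean, here is a small sketch that randomly flips a layer left to right and shifts its brightness. These two alterations, the `max_shift` parameter, and the function name `augment` are assumptions chosen for illustration; the console's layer offers its own set of options.

```python
import random


def augment(layer, rng, max_shift=20):
    """Randomly flip a 2D layer left-right and shift its brightness."""
    flipped = rng.random() < 0.5                 # flip about half of the time
    shift = rng.randint(-max_shift, max_shift)   # brightness offset
    out = []
    for row in layer:
        new_row = row[::-1] if flipped else list(row)
        # Keep every value inside the valid 0..255 range.
        out.append([min(255, max(0, v + shift)) for v in new_row])
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layer = [[0, 100, 200], [50, 150, 250]]
augmented = augment(layer, rng)
print(augmented)
```

Because the alteration differs on every call, the network sees slightly different versions of the same picture during training.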
Convolution performs a two-dimensional convolution:

$$O_{x,y,m} = \sum_{i,j,n} W_{i,j,n,m} \, I_{x+i,\,y+j,\,n} + b_m$$

where O is the output; I is the input; i,j index positions within the kernel; x,y,n are the input indices; m is the output map (OutMaps property); W is the kernel weight; and b is the bias term of each kernel. The KernelShape is the size of the window that we convolve over the image; for example, a 5×5 PNG could be given a KernelShape of 2×2.
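The convolution sum can be sketched directly in plain Python. This version assumes no padding and a stride of 1, so a 5×5 input with a 2×2 kernel gives a 4×4 output; the function name `conv2d` and the all-ones test data are made up for the example.

```python
def conv2d(inp, weights, bias):
    """2D convolution with no padding and stride 1 (assumed settings).

    inp[x][y][n]        : input with N channels
    weights[i][j][n][m] : kernel of size KH x KW, N input channels, M output maps
    bias[m]             : one bias per output map
    Returns out[x][y][m].
    """
    height, width, n_in = len(inp), len(inp[0]), len(inp[0][0])
    kh, kw = len(weights), len(weights[0])
    n_out = len(bias)
    out_h, out_w = height - kh + 1, width - kw + 1
    out = [[[0.0] * n_out for _ in range(out_w)] for _ in range(out_h)]
    for x in range(out_h):
        for y in range(out_w):
            for m in range(n_out):
                total = bias[m]
                for i in range(kh):
                    for j in range(kw):
                        for n in range(n_in):
                            total += weights[i][j][n][m] * inp[x + i][y + j][n]
                out[x][y][m] = total
    return out


# 5x5 single-channel input of ones, one 2x2 kernel of ones, bias 0.5.
inp = [[[1.0] for _ in range(5)] for _ in range(5)]
weights = [[[[1.0]] for _ in range(2)] for _ in range(2)]
bias = [0.5]

out = conv2d(inp, weights, bias)
print(len(out), len(out[0]))  # 4 4
print(out[0][0][0])           # 4.5
```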
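ReLU (Rectified Linear Unit) is an activation function that outputs max(0, x): negative inputs become 0, while positive inputs pass through unchanged in the linear region.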
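ReLU is commonly defined as max(0, x). A minimal sketch, with the helper name `relu` and the sample values chosen only for illustration:

```python
def relu(values):
    """Replace negative values with 0 and keep positive values unchanged."""
    return [max(0.0, v) for v in values]


print(relu([-2.0, -0.5, 0.0, 1.5]))  # [0.0, 0.0, 0.0, 1.5]
```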
MaxPooling outputs the maximum value of local inputs.
Here, MaxPooling uses 2x2 downsampling: it reduces the size of the picture by keeping only the largest value in each 2x2 block to represent the signal. The figure below shows a 4x4 red layer downsampled to 2x2.
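The 2x2 downsampling step can be sketched as follows. The 4x4 values are invented for the example, and `max_pool_2x2` is a hypothetical helper that assumes the layer has even width and height.

```python
def max_pool_2x2(layer):
    """Keep the largest value of each non-overlapping 2x2 block."""
    rows, cols = len(layer), len(layer[0])
    return [
        [
            max(layer[r][c], layer[r][c + 1],
                layer[r + 1][c], layer[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# A 4x4 red layer (values invented for illustration).
red = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

pooled = max_pool_2x2(red)
print(pooled)  # [[6, 8], [9, 6]]
```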
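Tanh is an activation function that squashes its input into the range -1 to 1. It is close to linear for inputs near 0 and flattens out for large positive or negative inputs.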
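The hyperbolic tangent maps any input to a value between -1 and 1. A short sketch using Python's standard `math.tanh`, with sample inputs chosen for illustration:

```python
import math


def tanh_layer(values):
    """Apply the hyperbolic tangent to every value."""
    return [math.tanh(v) for v in values]


outputs = tanh_layer([-3.0, 0.0, 3.0])
print(outputs)  # first value close to -1, middle value 0.0, last value close to 1
```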
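Affine: a fully connected layer. Every input value is connected to every output neuron through a learned weight, o = Wi + b, and the OutShape property sets the number of outputs.
For example, after pooling has reduced the data to one-fourth of the picture, the affine layer multiplies every remaining value by its own learned weight and sums the results, combining the features from the whole picture into the values used for recognition.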
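In Neural Network Console's layer reference, Affine is a fully connected layer of the form o = Wi + b. The sketch below shows that computation on a three-value input; the weights, bias, and function name `affine` are invented for illustration, whereas in a real network the weights are learned during training.

```python
def affine(inputs, weights, bias):
    """Fully connected layer: o[k] = sum_i weights[k][i] * inputs[i] + bias[k]."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


inputs = [1.0, 2.0, 3.0]          # flattened input values
weights = [
    [1.0, 0.0, -1.0],             # weights feeding output neuron 0
    [0.5, 0.5, 0.5],              # weights feeding output neuron 1
]
bias = [0.0, 1.0]

outputs = affine(inputs, weights, bias)
print(outputs)  # [-2.0, 4.0]
```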
Softmax outputs the softmax of its inputs, which is a probability distribution over a list of potential outcomes. It is used when you want to obtain probabilities in a categorisation problem, or output values ranging from 0.0 to 1.0 that sum up to 1.0.
After passing through the model, the data arrives as one weighted score per category. Softmax converts these scores with

$$o_x = \frac{\exp(i_x)}{\sum_j \exp(i_j)}$$

so that each category receives a probability that grows with its score.
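The softmax calculation, exp of each input divided by the sum of exp over all inputs, can be sketched as below. Subtracting the largest input before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result. The three scores are made up for the example.

```python
import math


def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(values)                        # stability trick, result unchanged
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs)       # roughly [0.090, 0.245, 0.665]
print(sum(probs))  # approximately 1.0
```

The category with the highest score gets the highest probability, which is the one the model reports as its prediction.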
The next part will be to create the dataset.
Future development
Could the Neural Network Console recognise a face? Yes! It would be interesting to use this app as a starting point.
Training the network on different datasets will produce different results.