
How to build your own dataset with a Neural Network Console? (Part 2)

Deep learning has become increasingly important in our daily lives. The way we unlock our phones and the intelligent street lights around us are both examples of AI applied to image recognition. There are many open-source AI projects on the internet, including handwritten digit recognition, but how can we build our own dataset? In this article, I am sharing an application with you all: the Neural Network Console.

This article series is divided into four parts: first, preparing the datasets; second, building the network model; third, creating the dataset; and finally, evaluating the model.

Let's watch a video and set up the network concept first:

Neural Network Model for hand sign:

[Figure: Neural network model for hand sign recognition]

The following are the basic concepts of each block.

A picture consists of three colours: red, green, and blue. Each colour is a separate layer with its own gradient, with intensity values numbered from 0 to 255.

[Figure: An image separated into red, green, and blue layers]

In this article, we will pick the red layer as a demonstration.

[Figure: The red layer of the sample image]
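To make the channel idea concrete, here is a minimal NumPy sketch of picking out the red layer of an image. The tiny 2×2 image below is a stand-in for a real photo; the Neural Network Console handles this separation internally.

```python
import numpy as np

# A tiny 2x2 RGB image (values 0-255); a stand-in for a real photo.
image = np.array([
    [[255, 10, 20], [30, 200, 40]],
    [[50, 60, 250], [128, 128, 128]],
], dtype=np.uint8)

# Channel 0 holds the red intensities of every pixel.
red_layer = image[:, :, 0]
print(red_layer)
```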

Input: the neural network input layer specifies the input size.

MulScalar is a function that multiplies the input by a constant value.
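As an illustration, a MulScalar step is just an element-wise multiplication; a common choice (an assumption here, not taken from the article) is a factor of 1/255 to normalise pixel values to the 0.0–1.0 range:

```python
import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.float32)

# MulScalar with value 1/255: rescales 0-255 pixel values to 0.0-1.0.
scaled = pixels * (1.0 / 255.0)
print(scaled)
```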

ImageAugmentation randomly alters the input image.

Convolution: Ox,y,m = Σi,j,n Wi,j,n,m Ix+i,y+j,n + bm (two-dimensional convolution), where O is the output; I is the input; i,j are the kernel indices; x,y are the spatial indices; n is the input map index; m is the output map index (OutMaps property); W is the kernel weight; and b is the bias term of each kernel. The KernelShape is the size of the window that we want to convolve; for example, a 5×5 image could be given a KernelShape of 2×2.
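The formula above can be sketched as a naive NumPy loop for the single-map case (one input map, one kernel), so each output value is the windowed sum of kernel weights times input values plus the bias. This is only for illustration; the console's own layer is far more optimised.

```python
import numpy as np

def conv2d(I, W, b):
    """Naive 'valid' 2D convolution: one input map, one kernel."""
    kh, kw = W.shape
    oh, ow = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    O = np.empty((oh, ow))
    for x in range(oh):
        for y in range(ow):
            # O[x, y] = sum over the kernel window, plus the bias term.
            O[x, y] = np.sum(W * I[x:x + kh, y:y + kw]) + b
    return O

I = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
W = np.ones((2, 2))          # a hypothetical 2x2 kernel of ones
result = conv2d(I, W, 0.0)
print(result)
```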

ReLU (Rectified Linear Unit) is an activation function that passes positive inputs through unchanged and outputs zero for negative inputs, as the ramp function below shows.

[Figure: The ReLU (ramp) function]
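In NumPy, ReLU is one line, as this small sketch shows:

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negatives become 0, positives pass through.
    return np.maximum(0.0, x)

out = relu(np.array([-2.0, -0.5, 0.0, 3.0]))
print(out)
```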

MaxPooling outputs the maximum value of local inputs.

This MaxPooling uses a 2×2 downsampling method, which reduces the size of the picture and keeps the largest value in each window to represent the signal. The figure below shows a 4×4 red layer downsampled to 2×2.

[Figure: 2×2 max pooling of a 4×4 red layer]
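The 4×4 to 2×2 reduction described above can be sketched in NumPy with a reshape trick; the example values are made up for illustration:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling: keep the largest value in each 2x2 window."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

layer = np.array([[ 10,  20,  30,  40],
                  [ 50,  60,  70,  80],
                  [ 90, 100, 110, 120],
                  [130, 140, 150, 160]])
pooled = max_pool_2x2(layer)
print(pooled)     # a 4x4 layer reduced to 2x2
```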

Tanh is an activation function that squashes its input into the range -1 to 1.

[Figure: The tanh function]
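A quick sketch of that squashing behaviour with NumPy's built-in tanh:

```python
import numpy as np

x = np.array([-10.0, 0.0, 10.0])
y = np.tanh(x)
# Large negative inputs approach -1, zero maps to 0,
# and large positive inputs approach +1.
print(y)
```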

 

Affine: a fully connected layer that computes a weighted sum of all its inputs plus a bias, O = WI + b, letting the network re-weight the reduced image into a form suitable for recognition.

For example, after pooling, the current data is one-fourth of the original picture; the affine layer scales and combines these values so that the reduced representation keeps a ratio similar to the original dataset for recognition.

[Figure: The affine (fully connected) layer]
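A minimal sketch of the affine computation O = WI + b, with made-up weights and a flattened three-value input (the real layer learns these weights during training):

```python
import numpy as np

I = np.array([1.0, 2.0, 3.0])       # flattened input vector
W = np.array([[1.0, 0.0, 0.0],      # 2 output units, 3 inputs each
              [0.0, 1.0, 1.0]])
b = np.array([0.5, -0.5])           # one bias per output unit

O = W @ I + b                        # weighted sum plus bias
print(O)
```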

Softmax shows probability distributions of a list of potential outcomes.

Softmax outputs the Softmax of inputs. This is used when you want to obtain probabilities in a categorisation problem or output values ranging from 0.0 to 1.0 that sum up to 1.0.

After passing through the model, the data is given a weight for each class; taking the dot product with these weights and applying the algorithm ox = exp(ix) / Σj exp(ij), we obtain a probability for each class that grows with its weight.

 

[Figure: Softmax output probabilities]
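The softmax formula above can be sketched in a few lines of NumPy; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(i):
    # ox = exp(ix) / sum_j exp(ij); subtracting max(i) avoids overflow.
    e = np.exp(i - np.max(i))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())   # probabilities that sum to 1.0
```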

The next part will be to create the dataset.

Future development

Could the Neural Network Console recognise a face? Yes! This app would be an interesting starting point.

By inputting different data sets, various results will be generated.

Other Articles in this Series

PART 1

Coming Soon

PART 3

PART 4
