1. Model representation
1.1 Neural network model
A typical neuron has input wires called dendrites and an output wire called an axon. The nucleus can be viewed as the computational unit. We can simplify the model as follows:
Terms: a neuron, or an artificial neuron, with a sigmoid (logistic) activation function.
1.2 Some notations in the neural networks
Layer 1: Input layer
Layer 2: Hidden layer (every layer between the input and output layers is called a hidden layer)
Layer 3: Output layer
- $x$, $\Theta$ are parameter vectors. $\Theta$ is also called the weights.
- $\Theta^{(j)}$ = matrix of weights mapping from layer $j$ to layer $j+1$. If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$.
- $a_i^{(j)}$ = "activation" of unit $i$ in layer $j$.
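As a quick sanity check on the dimension rule, here is a minimal NumPy sketch; the layer sizes 3 and 4 are purely illustrative:

```python
import numpy as np

# Illustrative sizes: layer j has s_j = 3 units, layer j+1 has s_{j+1} = 4 units.
s_j, s_j1 = 3, 4

# Theta^{(j)} maps layer j (plus its bias unit) to layer j+1,
# so its shape is s_{j+1} x (s_j + 1).
Theta = np.zeros((s_j1, s_j + 1))
print(Theta.shape)  # (4, 4)
```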
$$
\begin{aligned}
a_1^{(2)} &= g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3) \\
a_2^{(2)} &= g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3) \\
a_3^{(2)} &= g(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3) \\
h_\Theta(x) = a_1^{(3)} &= g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)})
\end{aligned}
$$
where $g$ is the sigmoid/logistic activation function.
1.3 Forward propagation in a neural network
Computing the activations layer by layer, from the input layer to the hidden layer to the output layer, as shown in the figure above, is called forward propagation.
Now we will vectorize the model. We define

$$
\begin{aligned}
z_1^{(2)} &= \Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3 \\
z_2^{(2)} &= \Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3 \\
z_3^{(2)} &= \Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3
\end{aligned}
$$

so we can rewrite them as $z^{(2)} = [z_1^{(2)},\ z_2^{(2)},\ z_3^{(2)}]^T = \Theta^{(1)} x$. If we treat $x$ as $a^{(1)}$, then $z^{(2)} = \Theta^{(1)} a^{(1)}$. In general,

$$z^{(j+1)} = \Theta^{(j)} a^{(j)}$$

And $a_1^{(2)} = g(z_1^{(2)}),\ a_2^{(2)} = g(z_2^{(2)}),\ a_3^{(2)} = g(z_3^{(2)})$ can be written as $a^{(2)} = g(z^{(2)})$.
For the neural network model above, if we cover up the input layer, the remaining model looks just like logistic regression.

Logistic regression: $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$

The simplified neural network model: $h_\Theta(x) = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)})$
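The vectorized forward-propagation steps above can be sketched in NumPy. This is a minimal sketch, not the course's reference implementation: the weights are random placeholders, and the 3-3-1 layer sizes (matching the model above) are assumptions:

```python
import numpy as np

def sigmoid(z):
    """The logistic activation g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Theta^{(1)}: 3 hidden units x (3 inputs + bias); Theta^{(2)}: 1 output x (3 hidden + bias).
Theta1 = rng.standard_normal((3, 4))
Theta2 = rng.standard_normal((1, 4))

def forward(x, Thetas):
    a = x                                  # treat the input x as a^{(1)}
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))     # prepend the bias unit a_0 = 1
        z = Theta @ a                      # z^{(j+1)} = Theta^{(j)} a^{(j)}
        a = sigmoid(z)                     # a^{(j+1)} = g(z^{(j+1)})
    return a                               # final activation = h_Theta(x)

h = forward(np.array([1.0, 0.0, 1.0]), [Theta1, Theta2])
print(h)  # a single activation in (0, 1)
```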
1.4 Other network architectures
2. How to compute a complex nonlinear function?
$x_1, x_2 \in \{0, 1\}$
2.1 AND
$y = x_1 \text{ AND } x_2$

$\Theta^{(1)} = [-30 \;\; 20 \;\; 20]$
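With these weights a single sigmoid neuron reproduces AND, because $g(z) \approx 1$ when $z \ge 10$ and $g(z) \approx 0$ when $z \le -10$. A quick check over all four inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Theta = np.array([-30.0, 20.0, 20.0])  # the weights [bias, x1, x2] from above

for x1 in (0, 1):
    for x2 in (0, 1):
        h = sigmoid(Theta @ np.array([1.0, x1, x2]))
        print(x1, x2, round(h))  # rounds to x1 AND x2
```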
2.2 OR
$y = x_1 \text{ OR } x_2$

$\Theta^{(1)} = [-10 \;\; 20 \;\; 20]$
2.3 NOT
$y = \text{NOT } x_1$

$\Theta^{(1)} = [10 \;\; -20]$
2.4 (NOT $x_1$) AND (NOT $x_2$)

$\Theta^{(1)} = [10 \;\; -20 \;\; -20]$
2.5 XNOR
$y = (x_1 \text{ AND } x_2) \text{ OR } ((\text{NOT } x_1) \text{ AND } (\text{NOT } x_2))$
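Composing the AND, OR, and (NOT $x_1$) AND (NOT $x_2$) neurons from the previous subsections into a two-layer network yields XNOR; a minimal sketch using those same weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1: row 0 computes x1 AND x2; row 1 computes (NOT x1) AND (NOT x2).
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Layer 2: OR of the two hidden activations.
Theta2 = np.array([-10.0, 20.0, 20.0])

def xnor(x1, x2):
    a1 = np.array([1.0, x1, x2])                       # input with bias unit
    a2 = sigmoid(Theta1 @ a1)                          # hidden activations
    h = sigmoid(Theta2 @ np.concatenate(([1.0], a2)))  # OR of hidden units
    return round(h)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xnor(x1, x2))  # 1 exactly when x1 == x2
```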
* We are able to put these pieces together to build new, more complex functions.
3. Multi-class Classification
The output $y_i$ will be

$$
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\quad
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},\quad
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix},\quad
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
$$

depending on what the corresponding input $X_i$ is. In this way, we can implement multi-class classification.
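A prediction is typically read off such a one-hot scheme by taking the output unit with the largest activation. A minimal sketch; the four class names and the output activations here are hypothetical, not from the notes:

```python
import numpy as np

# Hypothetical class labels for a four-class problem.
classes = ["pedestrian", "car", "motorcycle", "truck"]

# Hypothetical output activations h_Theta(x) for one input: four values in (0, 1).
h = np.array([0.10, 0.80, 0.05, 0.05])

# The predicted class is the unit with the largest activation; its one-hot
# target vector would be [0, 1, 0, 0]^T in the notation above.
prediction = int(np.argmax(h))
print(classes[prediction])  # car
```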