ML Notes: Week 4 – Neural Networks – Representation

1. Model representation

1.1 Neural network model

A typical neuron has input wires called dendrites and an output wire called an axon; the nucleus can be regarded as the computational unit. We can simplify this into the following model:
[Figure: simplified neuron model]
Terminology: this unit is a neuron (artificial neuron) with a sigmoid (logistic) activation function.

1.2 Some notations in the neural networks

[Figure: a three-layer neural network]
layer 1: Input layer
layer 2: Hidden layer (any layer that is neither the input nor the output layer is called a hidden layer)
layer 3: Output layer

  • $x$, $\Theta$

    $x$ is the input (feature) vector and $\Theta$ holds the parameters; the parameters $\Theta$ are also called the weights.

  • $\Theta^{(j)}$

    $\Theta^{(j)}$ = matrix of weights mapping from layer $j$ to layer $j+1$. If a network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$. For example, with $s_1 = 3$ input units and $s_2 = 3$ hidden units as in the network above, $\Theta^{(1)}$ is a $3 \times 4$ matrix.

  • $a_i^{(j)}$

    $a_i^{(j)}$ = “activation” of unit $i$ in layer $j$.

$$\begin{aligned}
a_1^{(2)} &= g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3) \\
a_2^{(2)} &= g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3) \\
a_3^{(2)} &= g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3) \\
h_\Theta(x) = a_1^{(3)} &= g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})
\end{aligned}$$

where $g$ is the sigmoid/logistic activation function, $g(z) = \frac{1}{1 + e^{-z}}$.

1.3 Forward propagation in a neural network

The process of computing the activations shown above, moving from the input layer to the hidden layer and then to the output layer, is called forward propagation.

Now we will vectorize the model. We define

$$\begin{aligned}
z_1^{(2)} &= \Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3 \\
z_2^{(2)} &= \Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3 \\
z_3^{(2)} &= \Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3
\end{aligned}$$
We can rewrite this as

$$z^{(2)} = \begin{bmatrix} z_1^{(2)} & z_2^{(2)} & z_3^{(2)} \end{bmatrix}^T = \Theta^{(1)} x.$$

If we treat $x$ as $a^{(1)}$, then $z^{(2)} = \Theta^{(1)} a^{(1)}$. In general,

$$z^{(j+1)} = \Theta^{(j)} a^{(j)}.$$

And

$$a_1^{(2)} = g(z_1^{(2)}),\quad a_2^{(2)} = g(z_2^{(2)}),\quad a_3^{(2)} = g(z_3^{(2)})$$

can be written as

$$a^{(2)} = g(z^{(2)}).$$

For the above neural network model, if we cover up the input layer, what remains looks just like logistic regression.
[Figure: the network with the input layer covered]
Logistic regression:

$$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$$
The simplified neural network model:

$$h_\Theta(x) = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})$$
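To make the vectorized rule $z^{(j+1)} = \Theta^{(j)} a^{(j)}$, $a^{(j+1)} = g(z^{(j+1)})$ concrete, here is a minimal NumPy sketch of forward propagation for the small 3-3-1 network above. The weight values and helper names are made-up placeholders; only the shapes and the recursion follow the notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """a^(1) = x, then z^(j+1) = Theta^(j) a^(j) and a^(j+1) = g(z^(j+1))."""
    a = np.asarray(x, dtype=float)           # a^(1) = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))       # prepend the bias unit a_0 = 1
        z = theta @ a                        # z^(j+1) = Theta^(j) a^(j)
        a = sigmoid(z)                       # a^(j+1) = g(z^(j+1))
    return a                                 # h_Theta(x)

# The 3-3-1 network from the notes: Theta^(1) is 3x4, Theta^(2) is 1x4.
# The weight values here are arbitrary placeholders.
rng = np.random.default_rng(0)
theta1 = rng.standard_normal((3, 4))
theta2 = rng.standard_normal((1, 4))

x = np.array([2.0, 1.0, 3.0])
print(forward_propagate(x, [theta1, theta2]))   # one value in (0, 1)
```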

1.4 Other network architectures

[Figure: examples of other network architectures]

2. How to compute a complex nonlinear function?

$x_1, x_2 \in \{0, 1\}$

2.1 AND

$y = x_1$ AND $x_2$
[Figure: AND network]

$$\Theta^{(1)} = \begin{bmatrix} -30 & 20 & 20 \end{bmatrix}$$
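As a quick sanity check (a small sketch added here, not part of the original notes), computing $g(\Theta^{(1)} [1, x_1, x_2]^T)$ with these weights reproduces the AND truth table, because $g(-30) \approx 0$, $g(-10) \approx 0$, and $g(10) \approx 1$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-30.0, 20.0, 20.0])   # Theta^(1) for AND

for x1 in (0, 1):
    for x2 in (0, 1):
        h = sigmoid(theta @ np.array([1.0, x1, x2]))   # bias unit x_0 = 1
        print(x1, x2, round(h, 4))
# Prints approximately: 0 0 0.0, 0 1 0.0, 1 0 0.0, 1 1 1.0
```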

2.2 OR

$y = x_1$ OR $x_2$
[Figure: OR network]

$$\Theta^{(1)} = \begin{bmatrix} -10 & 20 & 20 \end{bmatrix}$$

2.3 NOT

$y =$ NOT $x_1$
[Figure: NOT network]

$$\Theta^{(1)} = \begin{bmatrix} 10 & -20 \end{bmatrix}$$

2.4 (NOT $x_1$) AND (NOT $x_2$)

[Figure: (NOT $x_1$) AND (NOT $x_2$) network]

$$\Theta^{(1)} = \begin{bmatrix} 10 & -20 & -20 \end{bmatrix}$$

2.5 XNOR

$y = (x_1$ AND $x_2)$ OR ((NOT $x_1$) AND (NOT $x_2$))
[Figure: XNOR network]

* By putting these pieces together, we are able to build more complex nonlinear functions; see the sketch below.
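Here is a small sketch of that composition, reusing the weight matrices from the previous subsections: the hidden layer computes $x_1$ AND $x_2$ and (NOT $x_1$) AND (NOT $x_2$), and the output layer ORs those two activations, which yields XNOR.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Theta^(1): row 1 = AND, row 2 = (NOT x1) AND (NOT x2)
theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Theta^(2): OR of the two hidden units
theta2 = np.array([[-10.0, 20.0, 20.0]])

def xnor(x1, x2):
    a1 = np.array([1.0, x1, x2])                         # input with bias unit
    a2 = np.concatenate(([1.0], sigmoid(theta1 @ a1)))   # hidden layer with bias unit
    return sigmoid(theta2 @ a2)[0]                       # output h_Theta(x)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2), 4))
# Prints approximately: 0 0 1.0, 0 1 0.0, 1 0 0.0, 1 1 1.0
```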

3. Multi-class Classification

The output $y_i$ will be one of

$$\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\quad \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},\quad \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix},\quad \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

depending on which class the corresponding input $x_i$ belongs to. In this way, we can implement multi-class classification.
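As an illustrative sketch (the helper name and the example numbers are made up here, not from the notes), the one-hot targets and the predicted class for a 4-class problem could be handled like this:

```python
import numpy as np

def one_hot(label, num_classes=4):
    """Encode a class index (0..num_classes-1) as one of the column vectors above."""
    y = np.zeros(num_classes)
    y[label] = 1.0
    return y

print(one_hot(2))          # [0. 0. 1. 0.]

# If h is the network's 4-dimensional output for one example,
# the predicted class is the index of its largest entry:
h = np.array([0.1, 0.7, 0.15, 0.05])   # made-up output values
print(np.argmax(h))                    # 1
```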