More layers¶
Two layers of neurons¶
In the previous chapter we have seen that the MCP neuron with the step activation function realizes the inequality condition $w_0 + w_1 x_1 + w_2 x_2 > 0$, for which the line $w_0 + w_1 x_1 + w_2 x_2 = 0$ is the dividing line.
Imagine now that we have more such conditions: two, three, etc., in general $k$ inequality conditions. A point satisfying all of them simultaneously (their conjunction) lies in a convex region of the plane, as illustrated in Fig. 7.

Fig. 7 Sample convex regions in the plane obtained, from left to right, with one inequality condition, and conjunctions of 2, 3, or 4 inequality conditions, with the latter two yielding polygons.¶
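To make this concrete, here is a minimal sketch (illustrative code, not part of the lecture's neural library) checking whether a point satisfies a conjunction of conditions of the form $w_0 + w_1 x_1 + w_2 x_2 > 0$. The three sample conditions below carve out a triangle (they coincide with the first three conditions of the classifier built later in this chapter).
def in_convex_region(point, conditions):
    """True if the point satisfies all conditions w0 + w1*x1 + w2*x2 > 0."""
    x1, x2 = point
    return all(w0 + w1*x1 + w2*x2 > 0 for (w0, w1, w2) in conditions)

# three half-plane conditions whose conjunction is a triangle (illustrative values)
triangle = [(-0.1, 1, 0),    # x1 > 0.1
            (-0.1, 0, 1),    # x2 > 0.1
            ( 1, -1, -1)]    # x1 + x2 < 1

print(in_convex_region([0.3, 0.3], triangle))   # True  - inside the triangle
print(in_convex_region([0.9, 0.9], triangle))   # False - outside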
Convex region
By definition, a region is convex if for any two of its points the straight segment connecting them is entirely contained in the region.
Clearly, a half-plane is convex, and so is an intersection of any number of half-planes (or, more generally, of convex sets).
Indeed, let two points belong to the intersection; then they belong to each of the intersected sets, and the segment connecting them lies in the intersection if and only if all the intersected sets contain it, which is the case because each of them is convex.
Architectures of networks capable of classifying data in the corresponding regions are shown in Fig. 8.

Fig. 8 Networks capable of classifying data in the corresponding regions of Fig. 7.¶
In the geometric interpretation, the first neuron layer represents the inequality conditions (one neuron per dividing line), while the neuron in the second layer carries out their conjunction, acting as an AND gate.
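For instance, a single MCP neuron can act as an AND gate on the binary outputs $p_1,\dots,p_k$ of the first layer. A minimal sketch with one possible, purely illustrative choice of weights (the classifier built later in this chapter uses a different but equivalent choice, with bias weight -1 and weights 0.4):
import numpy as np

def step(s):
    return 1 if s > 0 else 0      # step activation function

def and_gate(p):
    """AND of binary inputs p_1,...,p_k: all weights equal to 1 and
    bias weight -(k-0.5), so the signal is positive only if all p_i = 1."""
    k = len(p)
    return step(np.sum(p) - (k - 0.5))

print(and_gate([1, 1, 1]))   # 1
print(and_gate([1, 0, 1]))   # 0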
The situation generalizes in an obvious way to data in more dimensions. In that case we have more black dots in the inputs in Fig. 8. Geometrically, for $n$-dimensional input each condition defines a half-space bounded by a hyperplane, and the conjunction of the conditions yields a convex region in $n$ dimensions.
Note
If there are numerous neurons in the first layer, i.e. many inequality conditions, the resulting convex polygon can have correspondingly many sides, so convex regions can be reproduced with arbitrary accuracy.
Important
A perceptron with two neuron layers (with sufficiently many neurons in the intermediate layer) can classify points belonging to a convex region in the space of its inputs.
Three or more layers of neurons¶
We have just shown that a two-layer network may classify a convex polygon. Imagine now that we produce two such figures in the second layer of neurons, for instance as in the following network:

Note that the first and second neuron layers are not fully connected here, as we “stack on top of each other” two networks producing triangles, as in the third panel of Fig. 8. Next, in the third neuron layer (here having a single neuron) we implement a gate which fires when the point belongs to the first (bigger) triangle and does not belong to the second (smaller) one. The resulting region, shown in Fig. 9, is no longer convex.

Fig. 9 Triangle with a triangular hollow.¶
Generalizing this argument to other shapes, one can show an important theorem:
Important
A perceptron with three or more neuron layers (with sufficiently many neurons in intermediate layers) can classify points belonging to any region in the space of its inputs, in particular a non-convex or disconnected one.
Note
It is worth stressing here that three layers provide full functionality! Adding more layers to a classifier does not increase its capabilities.
Feeding forward in Python¶
Before proceeding with an explicit example, we need a Python code for propagation of the signal in a general fully-connected feed-forward network. First, we represent the architecture of a network with a list of the numbers of nodes in its subsequent layers, arch = $[n_0, n_1, \dots, n_l]$, where $n_0$ is the number of inputs and $n_1, \dots, n_l$ are the numbers of neurons in the consecutive neuron layers. For instance, for two inputs, four neurons in the intermediate layer, and one output neuron we take
arch=[2,4,1]
arch
[2, 4, 1]
In the codes of this course we use the convention of Fig. 5, namely, the bias is treated uniformly with the remaining signal. However, the bias nodes are not included in counting the numbers $n_i$ of nodes in the layers.
plt.show(draw.plot_net(arch))

Here, black dots mean input, gray dots indicate the bias nodes carrying input =1, and the blue blobs are the neurons.
Next, we need the weights of the connections. There are $(n_{i-1}+1)\,n_i$ of them between layers $i-1$ and $i$: each of the $n_{i-1}$ nodes of the preceding layer, together with its bias node, is connected to each of the $n_i$ neurons of layer $i$.
Thus, to store all the weights of a network we actually need three indices: one for the layer, one running over the nodes of the preceding layer, and one over the neurons of the given layer. We could have used a three-dimensional array here, but since we number the neuron layers starting from 1, and arrays start numbering from 0, it is somewhat more convenient to use the Python dictionary structure. We then store the weights as
w = {1: ..., 2: ..., ..., l: ...},
where the key is the label of the neuron layer and the value is a two-dimensional array of the weights of that layer, of dimensions $(n_{i-1}+1) \times n_i$. For our sample architecture [2,4,1] we may take, for instance,
w={1:np.array([[1,2,1,1],[2,-3,0.2,2],[-3,-3,5,7]]),
   2:np.array([[1],[0.2],[2],[2],[-0.5]])}
print(w[1])
print("")
print(w[2])
[[ 1. 2. 1. 1. ]
[ 2. -3. 0.2 2. ]
[-3. -3. 5. 7. ]]
[[ 1. ]
[ 0.2]
[ 2. ]
[ 2. ]
[-0.5]]
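As a quick consistency check (using the dictionary w just defined), the shapes of the weight arrays follow the pattern $(n_{i-1}+1) \times n_i$ mentioned above:
for i in w:
    print(i, w[i].shape)    # expect: 1 (3, 4) and 2 (5, 1) for arch = [2, 4, 1]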
For the signal propagating along the network we also correspondingly use a dictionary of the form
x = {0: ..., 1: ..., ..., l: ...},
where the key labels the layer (0 for the input layer) and the value is the array of signals leaving the nodes of that layer, with the bias node included in all layers except the output one.
Next, we present the corresponding formulas in rather painful detail, as this is key to avoiding any possible confusion related to the notation.
We already know from (2) that for a single neuron with $n$ inputs the entering signal is $s = \sum_{j=0}^{n} x_j w_j$, with the bias node carrying $x_0 = 1$.

With more layers (labeled with superscript $i = 1, \dots, l$), the signal entering neuron $m$ of layer $i$ is

$$s_m^{(i)} = \sum_{j=0}^{n_{i-1}} x_j^{(i-1)} w_{jm}^{(i)}.$$

Note that the summation starts from $j=0$, which takes into account the bias node of the preceding layer, $x_0^{(i-1)} = 1$.

In the algebraic matrix notation, we can also write more compactly

$$s^{(i)} = x^{(i-1)} W^{(i)},$$

where $W^{(i)}$ is the $(n_{i-1}+1) \times n_i$ matrix of the weights of layer $i$.

As we already know very well, the output from a neuron is obtained by acting on its incoming input with an activation function. Thus we finally have

$$x_m^{(i)} = f\big(s_m^{(i)}\big), \qquad m = 1, \dots, n_i,$$

with the bias nodes set to one, $x_0^{(i)} = 1$. The figure below illustrates the notation.

The implementation of the feed-forward propagation in Python is the following:
def feed_forward(ar, we, x_in, f=func.step):
    """
    Feed-forward propagation

    input:
    ar   - array of numbers of nodes in subsequent layers [n_0, n_1,..., n_l]
           (from input layer 0 to output layer l, bias nodes not counted)
    we   - dictionary of weights for neuron layers 1, 2,..., l in the format
           {1: array[n_0+1, n_1],..., l: array[n_(l-1)+1, n_l]}
    x_in - input vector of length n_0 (bias not included)
    f    - activation function (default: step)

    return:
    x - dictionary of signals leaving subsequent layers in the format
        {0: array[n_0+1],..., l-1: array[n_(l-1)+1], l: array[n_l]}
        (the output layer carries no bias)
    """
    l=len(ar)-1                              # number of neuron layers
    x_in=np.insert(x_in,0,1)                 # input, with the bias node inserted
    x={}                                     # empty dictionary x
    x.update({0: np.array(x_in)})            # add the input signal to x
    for i in range(1,l):                     # loop over layers except the last one
        s=np.dot(x[i-1],we[i])               # signal entering layer i: matrix multiplication
        y=[f(s[k]) for k in range(ar[i])]    # output from the activation function
        x.update({i: np.insert(y,0,1)})      # prepend the bias node and update x
    # the output layer l - no bias node is added
    s=np.dot(x[l-1],we[l])                   # signal
    y=[f(s[q]) for q in range(ar[l])]        # output
    x.update({l: y})                         # update x
    return x
For brevity, we adopt the convention that the bias node is not passed in the input argument; it is inserted inside the function with np.insert(x_in,0,1). As usual, we use np.dot for matrix multiplication.
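As a quick check of this convention:
print(np.insert([2, -1], 0, 1))   # [ 1  2 -1]: the bias node prepended to the input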
Next, we test how feed_forward works on a sample input.
xi=[2,-1]
x=func.feed_forward(arch,w,xi,func.step)
print(x)
{0: array([ 1, 2, -1]), 1: array([1, 1, 0, 0, 0]), 2: [1]}
The output from this network is obtained as
x[2][0]
1
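We can reproduce this result "by hand" with explicit matrix multiplications, following the formulas above (a sketch reusing the objects xi, w, and func.step already defined in this section):
x0 = np.insert(xi, 0, 1)                           # [1, 2, -1]: the input with the bias node
s1 = np.dot(x0, w[1])                              # signals entering the intermediate layer
y1 = np.insert([func.step(s) for s in s1], 0, 1)   # its output, with the bias node prepended
s2 = np.dot(y1, w[2])                              # signal entering the output neuron
print([func.step(s) for s in s2])                  # [1], as obtained above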
Digression on linear networks¶
Let us now make the following observation. Suppose we have a network with a linear activation function, $f(s) = c\,s$, with a constant $c$ (for simplicity of the argument we disregard the bias nodes here). Then the output of neuron $m$ in layer $i$ is

$$x_m^{(i)} = c \sum_{j} x_j^{(i-1)} w_{jm}^{(i)},$$

or, in the matrix notation,

$$x^{(i)} = c\, x^{(i-1)} W^{(i)}.$$

Iterating this, we get for the signal in the output layer

$$x^{(l)} = c^l\, x^{(0)} W^{(1)} W^{(2)} \cdots W^{(l)} = x^{(0)} \widetilde{W},$$

where $\widetilde{W} = c^l\, W^{(1)} W^{(2)} \cdots W^{(l)}$ is a single effective weight matrix. A multilayer network with a linear activation function is thus equivalent to a single-layer network.
Note
For that reason, it does not make sense to consider multiple layer networks with linear activation function.
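A quick numerical illustration of this point (a sketch with purely illustrative weight matrices and the bias nodes omitted): for the identity activation, propagating the signal layer by layer gives the same result as multiplying by the single matrix $W^{(1)} W^{(2)}$.
W1 = np.array([[1., 2.], [0., -1.], [3., 1.]])    # illustrative weights: 3 nodes -> 2 neurons
W2 = np.array([[2., 1., 0.5], [-1., 0., 1.]])     # illustrative weights: 2 nodes -> 3 neurons
x0 = np.array([0.5, -1.0, 2.0])                   # sample input

layer_by_layer = np.dot(np.dot(x0, W1), W2)       # two linear layers, one after the other
single_layer   = np.dot(x0, np.dot(W1, W2))       # one layer with the effective matrix W1 W2
print(np.allclose(layer_by_layer, single_layer))  # True - matrix multiplication is associative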
Visualization¶
For visualization of simple networks, in the draw module of the neural library we provide some drawing functions which show the weights, as well as the signals. Function plot_net_w draws the positive weights in red and the negative ones in blue, with the widths reflecting their magnitude. The last parameter, here 0.5, rescales the widths so that the graphics looks nice. Function plot_net_w_x prints in addition the values of the signal leaving the nodes of each layer.
plt.show(draw.plot_net_w(arch,w,0.5))

plt.show(draw.plot_net_w_x(arch,w,0.5,x))

Classifier with three neuron layers¶
We are now ready to explicitly construct an example of a binary classifier of points in a concave region: the triangle with a triangular hollow of Fig. 9. The network architecture is
arch=[2,6,2,1]
plt.show(draw.plot_net(arch))

The geometric conditions and the corresponding weights for the first neuron layer are
neuron | $w_0$ | $w_1$ | $w_2$ | inequality condition
---|---|---|---|---
1 | -0.1 | 1 | 0 | $x_1 > 0.1$
2 | -0.1 | 0 | 1 | $x_2 > 0.1$
3 | 1 | -1 | -1 | $x_1 + x_2 < 1$
4 | -0.25 | 1 | 0 | $x_1 > 0.25$
5 | -0.25 | 0 | 1 | $x_2 > 0.25$
6 | 0.8 | -1 | -1 | $x_1 + x_2 < 0.8$
Conditions 1-3 provide boundaries for the bigger triangle, and 4-6 for the smaller one contained in the bigger one. In the second neuron layer we need to realize two AND gates for conditions 1-3 and 4-6, respectively, hence we take
neuron | $w_0$ | $w_1$ | $w_2$ | $w_3$ | $w_4$ | $w_5$ | $w_6$
---|---|---|---|---|---|---|---
1 | -1 | 0.4 | 0.4 | 0.4 | 0 | 0 | 0
2 | -1 | 0 | 0 | 0 | 0.4 | 0.4 | 0.4
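A quick check of this choice (using the step function from the lecture's library): with the bias weight -1 and three weights 0.4, the signal of an AND neuron exceeds zero only when all three of its conditions are satisfied.
print(func.step(-1 + 3*0.4))   # 1: all three conditions satisfied
print(func.step(-1 + 2*0.4))   # 0: one condition violated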
Finally, in the output layer we take the weights realizing the condition "the point is in the bigger triangle AND NOT in the smaller triangle":
neuron | $w_0$ | $w_1$ | $w_2$
---|---|---|---
1 | -1 | 1.2 | -0.6
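Denoting by p and q the second-layer outputs for the bigger and the smaller triangle, respectively, we may verify this gate on all four combinations; only p=1, q=0 activates the output neuron:
for p in [0, 1]:
    for q in [0, 1]:
        print(p, q, func.step(-1 + 1.2*p - 0.6*q))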
Putting all this together, the weight dictionary is
w={1:np.array([[-0.1,-0.1,1,-0.25,-0.25,0.8],[1,0,-1,1,0,-1],[0,1,-1,0,1,-1]]),
2:np.array([[-1,-1],[0.4,0],[0.4,0],[0.4,0],[0,0.4],[0,0.4],[0,0.4]]),
3:np.array([[-1],[1.2],[-0.6]])}
Feeding forward a sample input yields
xi=[0.2,0.3]
x=func.feed_forward(arch,w,xi)
plt.show(draw.plot_net_w_x(arch,w,1,x))

We have just found that point [0.2,0.3] is within our region (1 from the output layer). Actually, we have more information from the intermediate layers. From the second neuron layer we know that the point belongs to the bigger triangle (1 from the lower neuron) and does not belong to the smaller triangle (0 from the upper neuron). From the first neuron layer we may read out which of the six inequality conditions are satisfied.
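For comparison, we may feed in a point lying inside the hollow, for instance [0.35,0.35] (a hypothetical test point): both second-layer gates then fire and the output neuron yields 0.
xi_h = [0.35, 0.35]                        # a point inside the smaller triangle (the hollow)
x_h = func.feed_forward(arch, w, xi_h)
print(x_h[2], x_h[3])                      # both AND gates give 1, the output gives 0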
Next, we test how our network works for other points. First, we define a function generating a random point in the square $[0,1]\times[0,1]$, propagating it through the network, and returning its coordinates together with the resulting label:
def po():
    xi=[np.random.random(),np.random.random()] # random point from the [0,1]x[0,1] square
    x=func.feed_forward(arch,w,xi)             # feed forward
    return [xi[0],xi[1],x[3][0]]               # the point's coordinates and label
samp=np.array([po() for _ in range(10000)])
print(samp[:5])
[[0.71435539 0.05281912 0. ]
[0.54490285 0.59347251 0. ]
[0.95485103 0.77741598 0. ]
[0.71722458 0.42208055 0. ]
[0.54892879 0.62149029 0. ]]
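The figure below can be produced, for instance, by a scatter plot of the sample with the points colored according to the obtained label (a sketch assuming matplotlib imported as plt, as in the plotting calls above):
plt.figure(figsize=(4, 4))
plt.scatter(samp[:, 0], samp[:, 1], c=samp[:, 2], s=1)   # color given by the label 0 or 1
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')
plt.show()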

We can see that our little machine works perfectly well!
At this point the reader might rightly say that the preceding results are trivial: in essence, we have just been implementing some geometric conditions and their conjunctions.
However, as in the case of single-layer networks, there is an important argument against this apparent triviality. Imagine again we have a data sample with labels, and only this, as in the example of the single MCP neuron of chapter MCP Neuron. Then we do not have the dividing conditions to begin with and need some efficient way to find them. This is exactly what teaching of classifiers will do: it sets the weights in such a way that the proper conditions are implicitly built in. After the material of this chapter, the reader should be convinced that this is perfectly possible and there is nothing magical about it! In the next chapter we will show how to do it.
Exercises¶
Design a network and run the code from this lecture for a convex region of your choice.
Design and program a classifier for four categories of points belonging to regions formed by 2 intersecting lines (hint: include four output cells).