Kohonen Self-Organizing Maps

Self-organizing maps are a type of neural network algorithm. Their main difference from networks trained with the back-propagation algorithm is that learning is unsupervised, so the result depends solely on the structure of the input data. Neural networks of this type are often used to solve various problems, from recovering missing data to data analysis and pattern detection.

The self-organizing map (SOM) algorithm is a method of clustering multidimensional vectors. An important distinction of the SOM algorithm is that all the neurons (nodes, class centers) are ordered in a structure, usually a two-dimensional grid. During learning, not only the winner neuron but also its neighbors are modified, although to a smaller degree. Due to this, the SOM algorithm can be considered a method of projecting a multidimensional space onto a space of lower dimension: vectors that are similar in the original space end up close to each other on the resulting map.

The SOM algorithm uses an ordered structure of neurons; usually one- or two-dimensional grids are used. Each neuron i is an n-dimensional column vector:

$w_i = (w_{i1}, w_{i2}, \ldots, w_{in})^T$,

where n is defined by the dimension of the original space (the dimension of the input vectors). One- or two-dimensional grids are used because structures of higher dimension are difficult to display; reducing them to the two dimensions of the screen is problematic.

Typically, neurons are located at the nodes of a two-dimensional grid with rectangular or hexagonal cells. As mentioned above, neurons interact with each other, and the magnitude of this interaction is defined by the distance between the neurons on the map.

The number of neurons in the grid determines the level of detail of the result and affects the accuracy and the generalization capacity of the map.
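
To make the grid structure concrete, here is a minimal sketch in Python with NumPy (an assumed dependency; the function name grid_coords is illustrative) showing how neuron positions on a rectangular or hexagonal map can be stored and how the map distance between two neurons, which governs their interaction, can be computed:

    import numpy as np

    def grid_coords(rows, cols, hexagonal=False):
        """Positions of the neurons on the map as (row, col) coordinates."""
        coords = np.array([[r, c] for r in range(rows) for c in range(cols)],
                          dtype=float)
        if hexagonal:
            coords[:, 1] += 0.5 * (coords[:, 0] % 2)   # shift odd rows by half a cell
            coords[:, 0] *= np.sqrt(3) / 2             # even out vertical spacing
        return coords

    # Map distance between neurons 3 and 42 on a 10 x 10 hexagonal grid
    coords = grid_coords(10, 10, hexagonal=True)
    print(np.linalg.norm(coords[3] - coords[42]))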

Map Initialization

When the SOM algorithm is applied, the grid configuration (rectangular or hexagonal) and the number of neurons in the network are defined in advance. In some cases, it is recommended to use the maximum possible number of neurons on the map; the initial learning radius then largely determines the generalization capacity of the obtained map. If the number of map nodes exceeds the number of examples in the training sample, the success of the algorithm depends to a large extent on an appropriately selected initial learning radius. However, if the map contains tens of thousands of neurons, learning takes too much time for practical problems, so an acceptable compromise must be reached when choosing the number of nodes.

Before the map learning starts, the weight coefficients of the neurons are initialized. An appropriately selected initialization method can significantly accelerate learning and lead to better results. There are three common ways to initialize the weights (a sketch of each is given below):

1. Initialization with random values, when random values are assigned to all weights.
2. Initialization with examples, when randomly selected examples from the training sample are assigned as initial values.
3. Linear initialization, when the weights are initialized with vectors spread linearly over the plane spanned by the two first principal components of the source dataset.

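The following sketch illustrates the three initialization strategies listed above, assuming NumPy and weights stored as a (rows*cols, n) array; the function names and the SVD-based variant of linear initialization are illustrative choices, not a prescribed implementation:

    import numpy as np

    def init_random(rows, cols, data, rng):
        """Random values drawn within the range of the training data."""
        lo, hi = data.min(axis=0), data.max(axis=0)
        return rng.uniform(lo, hi, size=(rows * cols, data.shape[1]))

    def init_samples(rows, cols, data, rng):
        """Randomly selected examples from the training sample."""
        return data[rng.integers(0, len(data), size=rows * cols)].copy()

    def init_linear(rows, cols, data):
        """Nodes spread over the plane of the two first principal components."""
        centered = data - data.mean(axis=0)
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        pc = vt[:2] * s[:2, None] / np.sqrt(len(data))   # scaled principal directions
        grid = np.array([[r, c] for r in np.linspace(-1, 1, rows)
                                for c in np.linspace(-1, 1, cols)])
        return data.mean(axis=0) + grid @ pc

    rng = np.random.default_rng(0)
    data = rng.random((200, 5))          # toy training sample
    w0 = init_linear(10, 10, data)       # (100, 5) initial weight matrix
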
Learning

Learning is a sequence of corrections of the vectors that describe the neurons. At each learning step, one vector is randomly selected from the source dataset and compared with the coefficient vectors of all neurons. Only one winner neuron, the one most similar to the input vector, is selected. Similarity here means the distance between the vectors calculated in Euclidean space. Thus, if the winner neuron is denoted as c, we get:

$\left\| x - w_c \right\| = \min_i \left\| x - w_i \right\|$

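A minimal sketch of this winner selection step, assuming the neuron vectors are stored as rows of a NumPy array; find_winner is an illustrative name:

    import numpy as np

    def find_winner(weights, x):
        """Index of the neuron closest to the input vector x (Euclidean distance)."""
        return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

    rng = np.random.default_rng(0)
    weights = rng.random((100, 5))       # 100 neurons, 5-dimensional input space
    print(find_winner(weights, rng.random(5)))
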
Once the winner neuron is found, the weights of the neural network are corrected. The vector that describes the winner neuron and the vectors that describe its neighbors in the grid are moved towards the input vector.

To modify the weight coefficients, use the following formula:

$w_i(t+1) = w_i(t) + h_{ci}(t)\left[x(t) - w_i(t)\right]$,

where t is the epoch number (discrete time). The vector x(t) is randomly selected from the training sample at iteration t. The function $h_{ci}(t)$ is called the neighborhood function of the neurons. It is a non-increasing function of time and of the distance between the winner neuron c and the neighboring neurons i in the grid. This function consists of two parts: a function of distance and a function of the learning rate over time.
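
A sketch of a single correction step that applies this formula to every neuron at once, assuming NumPy; here h holds the neighborhood values h_ci(t) computed elsewhere for each neuron i:

    import numpy as np

    def update_weights(weights, x, h):
        """w_i(t+1) = w_i(t) + h_ci(t) * (x(t) - w_i(t)) for all neurons i."""
        return weights + h[:, None] * (x - weights)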

Usually, one of two distance functions is used: a simple step function that is constant for the neurons within the learning radius of the winner and zero outside it, or the Gaussian function

$h(d, t) = \exp\left(-\frac{d^2}{2\sigma^2(t)}\right)$,

where d is the distance between neuron i and the winner neuron c on the map. The best result is obtained using the Gaussian distance function. The value $\sigma(t)$ is a function decreasing with time and is usually called the learning radius; it is selected large enough at the initial stage of learning and is gradually decreased until practically only the winner neuron is modified. The radius most frequently decreases linearly with time.
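
A sketch of the Gaussian distance function together with a learning radius that decreases linearly with time; the parameter names sigma_start and sigma_end are assumptions for illustration:

    import numpy as np

    def radius(t, t_max, sigma_start, sigma_end=1.0):
        """Learning radius sigma(t), decreasing linearly with time."""
        return sigma_start + (sigma_end - sigma_start) * t / t_max

    def gaussian_neighborhood(coords, winner, sigma):
        """Distance part of h for every neuron, given the map coordinates of the neurons."""
        d = np.linalg.norm(coords - coords[winner], axis=1)
        return np.exp(-d ** 2 / (2.0 * sigma ** 2))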

Consider the learning rate function a(t). This is also a function decreasing with time. The most commonly used forms of this function are a linear decay and a function inversely proportional to time:

$a(t) = \frac{A}{t + B}$,

where A and B are constants. When this function is used, all the vectors from the learning sample make approximately equal contributions to the learning outcome.
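
Both schedules can be sketched as follows; the default constants (a_start, a_end, A, B) are illustrative values, not recommendations:

    def rate_linear(t, t_max, a_start=0.5, a_end=0.01):
        """Learning rate decreasing linearly with time."""
        return a_start + (a_end - a_start) * t / t_max

    def rate_inverse(t, A=1.0, B=100.0):
        """Learning rate inversely proportional to time: a(t) = A / (t + B)."""
        return A / (t + B)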

Learning consists of two main phases. At the initial stage, large values of the learning rate and radius are selected, which allows distributing the neuron vectors to match the distribution of the sample. Then the weights are fine-tuned, with learning rate values much smaller than at the initial stage. If linear initialization is used, the initial stage of rough tuning can be skipped.
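
Putting the pieces together, a compact end-to-end sketch of the two-phase learning procedure might look as follows; all parameter values (grid size, epoch counts, radii, rates) are illustrative assumptions, and the map here is initialized with random examples:

    import numpy as np

    def train_som(data, rows=10, cols=10, epochs=(1000, 4000), seed=0):
        rng = np.random.default_rng(seed)
        coords = np.array([[r, c] for r in range(rows) for c in range(cols)], float)
        weights = data[rng.integers(0, len(data), rows * cols)].copy()

        # (initial radius, final radius, initial rate, final rate) per phase
        phases = [(max(rows, cols) / 2, 1.0, 0.5, 0.05),   # rough ordering
                  (1.0, 0.1, 0.05, 0.01)]                  # fine tuning
        for (s0, s1, a0, a1), t_max in zip(phases, epochs):
            for t in range(t_max):
                frac = t / t_max
                sigma = s0 + (s1 - s0) * frac              # linearly decreasing radius
                alpha = a0 + (a1 - a0) * frac              # linearly decreasing rate
                x = data[rng.integers(len(data))]          # random training vector
                winner = np.argmin(np.linalg.norm(weights - x, axis=1))
                d = np.linalg.norm(coords - coords[winner], axis=1)
                h = alpha * np.exp(-d ** 2 / (2 * sigma ** 2))
                weights += h[:, None] * (x - weights)
        return weights

    # Example: map 3-dimensional points onto a 10 x 10 grid
    weights = train_som(np.random.default_rng(1).random((500, 3)))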

See also:

Library of Methods and Models | Detecting Categories | ISmSelfOrganizingMap