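where $x_{st}$ represents the measurement of the $t$-th descriptor ($t = 1, \dots, M$). A data set can therefore be represented by the following $N \times M$ matrix:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1M} \\ x_{21} & x_{22} & \cdots & x_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NM} \end{pmatrix}$$

Here, $N$ is the number of input vectors.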
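To make the data matrix concrete, the following minimal sketch builds one input vector per DNA sequence from its oligonucleotide frequencies. The helper names `oligo_frequency_vector` and `data_matrix`, the oligonucleotide length `k`, the lexicographic ordering of the descriptors, and the skipping of windows that contain characters other than A, C, G, T are illustrative assumptions, not details taken from the method description.

```python
from __future__ import annotations

from itertools import product


def oligo_frequency_vector(seq: str, k: int = 2) -> list[float]:
    """Relative frequencies of all 4**k oligonucleotides in seq (hypothetical helper).

    Windows containing characters other than A, C, G, T are skipped (an assumption).
    """
    oligos = ["".join(p) for p in product("ACGT", repeat=k)]  # lexicographic order
    index = {oligo: t for t, oligo in enumerate(oligos)}     # oligonucleotide -> descriptor t
    counts = [0] * len(oligos)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        t = index.get(seq[start:start + k])
        if t is not None:
            counts[t] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def data_matrix(seqs: list[str], k: int = 2) -> list[list[float]]:
    """N x M matrix: row s is the input vector of sequence s, column t the t-th descriptor."""
    return [oligo_frequency_vector(s, k) for s in seqs]
```

For example, `data_matrix(["ACGTACGT", "AAAATTTT"], k=2)` yields a 2 × 16 matrix, one row per sequence.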
The initial weight vectors are determined from the first and second principal components of the $M$-dimensional input space, obtained by principal component analysis (PCA). Along the first dimension, the lattice points span a width of five times the standard deviation ($5\delta_1$) of the first principal component. The size of the second dimension, $J$, is the smallest integer greater than $(\delta_2/\delta_1) \times I$, where $\delta_2$ is the standard deviation of the second principal component. The number of lattice points in the first dimension, $I$, is set by the user. The weight vector on the $(i, j)$-th lattice point, $w_{ij}$, is given by

$$w_{ij} = x_{av} + \frac{5\delta_1}{I}\left[ b_1\left(i - \frac{I}{2}\right) + b_2\left(j - \frac{J}{2}\right) \right]$$
Here, $x_{av}$ is the average vector of the oligonucleotide frequencies over all input vectors, and $b_1$ and $b_2$ are the eigenvectors of the first and second principal components.
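The initialization can be sketched in plain Python as follows, assuming the initialization formula $w_{ij} = x_{av} + (5\delta_1/I)\,[\,b_1(i - I/2) + b_2(j - J/2)\,]$ for $i = 1, \dots, I$ and $j = 1, \dots, J$ (our reconstruction). The principal components are obtained here by power iteration with deflation on the covariance matrix rather than by a library eigen-solver, and the population covariance (dividing by $N$) is used; both are our choices. `init_weights` and `_top_eigenpair` are hypothetical names.

```python
from __future__ import annotations

import math
import random


def _matvec(A: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def _top_eigenpair(C: list[list[float]], iters: int = 1000, seed: int = 0) -> tuple[float, list[float]]:
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(C))]
    for _ in range(iters):
        w = _matvec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # C v = 0, so only a zero eigenvalue is reachable from v
            break
        v = [x / norm for x in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = sum(a * b for a, b in zip(v, _matvec(C, v)))  # Rayleigh quotient
    return lam, v


def init_weights(X: list[list[float]], I: int) -> tuple[list, int]:
    """Return (W, J) with W[i-1][j-1] = w_ij, laid out along the first two PCs."""
    N, M = len(X), len(X[0])
    x_av = [sum(row[t] for row in X) / N for t in range(M)]
    # population covariance matrix of the input vectors (dividing by N is our choice)
    C = [[sum((row[a] - x_av[a]) * (row[b] - x_av[b]) for row in X) / N
          for b in range(M)] for a in range(M)]
    lam1, b1 = _top_eigenpair(C)
    # deflation removes the first component so power iteration finds the second
    C2 = [[C[a][b] - lam1 * b1[a] * b1[b] for b in range(M)] for a in range(M)]
    lam2, b2 = _top_eigenpair(C2, seed=1)
    delta1, delta2 = math.sqrt(max(lam1, 0.0)), math.sqrt(max(lam2, 0.0))
    if delta1 == 0.0:
        raise ValueError("input vectors have zero variance")
    J = math.floor(delta2 / delta1 * I) + 1  # smallest integer greater than (δ2/δ1)·I
    step = 5.0 * delta1 / I                  # lattice spacing: I points span about 5δ1
    W = [[[x_av[t] + step * ((i - I / 2) * b1[t] + (j - J / 2) * b2[t]) for t in range(M)]
          for j in range(1, J + 1)]
         for i in range(1, I + 1)]
    return W, J
```

`W[i-1][j-1]` holds $w_{ij}$. Because eigenvector signs are arbitrary, the map may come out mirrored between implementations without any change to its structure.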
Step 2: Adaptation of weight vectors to the input vectors.
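Let $w_{i'j'}$ denote the weight vector with the minimum Euclidean distance to the input vector $x_k$ among all weight vectors $w_{ij}$ ($i = 1, 2, \dots, I$; $j = 1, 2, \dots, J$). The input vector $x_k$ is then classified into the set $S_{ij}$ of every lattice point $(i, j)$ satisfying $i' - \beta(r) \le i \le i' + \beta(r)$ and $j' - \beta(r) \le j \le j' + \beta(r)$, so a single input vector can belong to several neighboring sets. After all input vectors have been classified, the weight vectors are updated by

$$w_{ij}^{(r+1)} = w_{ij}^{(r)} + \alpha(r)\left(\frac{1}{N_{ij}}\sum_{x_k \in S_{ij}} x_k - w_{ij}^{(r)}\right)$$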
The parameters $\alpha(r)$ and $\beta(r)$ are the learning coefficients for the $r$-th cycle, and $N_{ij}$ is the number of input vectors in $S_{ij}$. The coefficients are calculated as follows:
$$\alpha(r) = \max\{0.01,\ \alpha(1)\,(1 - r/T)\}$$

$$\beta(r) = \max\{0,\ \beta(1) - r\}$$
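Here, $\alpha(1)$ and $\beta(1)$ are the initial values of the coefficients, and $T$ is the total number of learning cycles. The learning process is monitored by the total distance between each input vector $x_k$ and its nearest weight vector $w_{i'j'}$:

$$D = \sum_{k=1}^{N} \lVert x_k - w_{i'j'} \rVert$$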
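A minimal sketch of Step 2 follows. It assumes the batch update $w_{ij} \leftarrow w_{ij} + \alpha(r)\,(\bar{x}_{S_{ij}} - w_{ij})$, where $\bar{x}_{S_{ij}}$ is the mean of the $N_{ij}$ vectors in $S_{ij}$, and assumes the monitored quantity is the sum of unsquared Euclidean distances, measured before each update. It further assumes that the $\beta(r)$ neighborhood is clipped at the lattice border, that weight vectors with an empty $S_{ij}$ are left unchanged, and that the coefficient formulas are applied literally for $r = 1, \dots, T$. All function names are hypothetical; weights are stored as `W[i][j]` with 0-based indices.

```python
from __future__ import annotations

import math


def alpha(r: int, alpha1: float, T: int) -> float:
    """alpha(r) = max{0.01, alpha(1) * (1 - r/T)}."""
    return max(0.01, alpha1 * (1 - r / T))


def beta(r: int, beta1: int) -> int:
    """beta(r) = max{0, beta(1) - r}."""
    return max(0, beta1 - r)


def nearest(W, x):
    """0-based lattice indices (i', j') of the weight vector closest to x."""
    return min(((i, j) for i in range(len(W)) for j in range(len(W[0]))),
               key=lambda ij: math.dist(W[ij[0]][ij[1]], x))


def batch_cycle(X, W, a, b):
    """One batch-learning cycle with coefficients a = alpha(r), b = beta(r).

    Returns the updated weights and the total distance D measured before the update.
    """
    I, J, M = len(W), len(W[0]), len(X[0])
    sums = [[[0.0] * M for _ in range(J)] for _ in range(I)]
    counts = [[0] * J for _ in range(I)]  # N_ij
    D = 0.0
    for x in X:
        ib, jb = nearest(W, x)
        D += math.dist(W[ib][jb], x)
        # put x into S_ij for every (i, j) with |i - i'| <= b and |j - j'| <= b,
        # clipped at the lattice border (no wrap-around; an assumption)
        for i in range(max(0, ib - b), min(I, ib + b + 1)):
            for j in range(max(0, jb - b), min(J, jb + b + 1)):
                counts[i][j] += 1
                for t in range(M):
                    sums[i][j][t] += x[t]
    new_W = [[list(w) for w in row] for row in W]
    for i in range(I):
        for j in range(J):
            n = counts[i][j]
            if n:  # weights with an empty S_ij are left unchanged (an assumption)
                new_W[i][j] = [w + a * (s / n - w) for w, s in zip(W[i][j], sums[i][j])]
    return new_W, D


def train(X, W, T, alpha1, beta1):
    """Run T cycles; return the final weights and the total distance of each cycle."""
    history = []
    for r in range(1, T + 1):
        W, D = batch_cycle(X, W, alpha(r, alpha1, T), beta(r, beta1))
        history.append(D)
    return W, history
```

`train` returns the adapted weights together with the per-cycle total distance, which can be plotted to check that learning has settled. Because every weight is moved toward the mean of its whole set $S_{ij}$ only once per cycle, the result does not depend on the order in which input vectors are presented.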
Step 3: Classification of input vectors to weight vectors
Each input vector is assigned to the lattice point whose weight vector has the minimum Euclidean distance to it.
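Step 3 reduces to a nearest-neighbor assignment. The sketch below, using the hypothetical name `classify`, returns for each lattice point the indices of the input vectors mapped to it; ties are broken in favor of the first lattice point in row-major order, which is our choice.

```python
import math
from collections import defaultdict


def classify(X, W):
    """Map each lattice point (i, j), 0-based, to the indices k of the input vectors
    whose nearest weight vector (Euclidean distance) is W[i][j]."""
    lattice = [(i, j) for i in range(len(W)) for j in range(len(W[0]))]
    clusters = defaultdict(list)
    for k, x in enumerate(X):
        # min() keeps the first lattice point in row-major order on ties
        i, j = min(lattice, key=lambda ij: math.dist(W[ij[0]][ij[1]], x))
        clusters[(i, j)].append(k)
    return dict(clusters)
```

Lattice points to which no input vector is assigned simply do not appear in the returned dictionary.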