Using Modified S Transform For Data Mining Computer Science Essay

Abstract: This paper presents a new approach to power signal time series data mining using an S-transform based K-means clustering technique. The power signal time series disturbance data are first pre-processed with an advanced signal processing tool, the S-transform, and various statistical features are extracted, which serve as inputs to the K-means algorithm for disturbance event detection. A Particle Swarm Optimization (PSO) technique is used to optimize the cluster centers, which can then be fed to a decision tree for pattern classification. The proposed hybrid PSO-K-means clustering technique provides accurate classification rates even under noisy conditions, compared to existing techniques, which shows the efficacy and robustness of the proposed algorithm for time-varying databases such as power signal data.

Key Words: K-means clustering, Time-varying power signal data, Particle swarm optimization, S-transform, decision tree, classification.

1. Introduction

Time series clustering has been very effective in providing pertinent and useful information in various application domains as a part of temporal data mining research. Time series data are of interest because of their pervasiveness in areas ranging from science, engineering, business, finance, and economics to healthcare. The goal of clustering is to identify the structure of an unlabeled data set by objectively organizing the data into homogeneous groups, where the within-group object similarity is maximized and the between-group object similarity is minimized. Clustering is required when no labeled data are available, whether binary, categorical, numerical, textual, interval, spectral, temporal, multimedia, or a mixture of these types. Data are termed static if all their feature values do not change with time, or change negligibly; the bulk of clustering analysis has been applied to static data. Cluster analysis has become an important technique in exploratory data analysis, pattern recognition, machine learning, neural computing, and other engineering system studies. Clustering aims at identifying and extracting significant groups in the underlying data.

Unlike static data, the features of a time series comprise values that change with time. Given unlabelled time series data streams, it is often desirable to determine the groups of data that belong to a similar time series. The unlabelled time series data could be obtained by monitoring one or more processes over a certain period of time. Just as with static data clustering, time series data clustering requires a clustering algorithm or method to form clusters from a given set of unlabelled data objects, and the algorithm is chosen depending on the type of data available and its area of application [1-7]. Various algorithms have been developed to cluster different types of time series data. In order to apply these clustering algorithms, it is necessary to convert the time series data to static data and to modify the similarity or distance measures used for clustering. This is known as the feature-based approach and is accomplished by processing the raw time series data with a time-frequency transform such as the modified S-transform. The raw time series database comprises power line disturbance data recorded or simulated using MATLAB software. The various power line disturbances include voltage sag, swell, impulsive and oscillatory transients, multiple notches, momentary interruption, harmonics, and voltage flickers. In order to improve the quality of electrical power supplied to customers, it is customary to continuously record the disturbance waveforms using power-monitoring instruments. Unfortunately, most of these recorders rely on visual inspection of the data records, creating an unprecedented volume of data to be inspected by engineers. Automatic recognition or classification of this voluminous data is therefore required for improving the quality of the electricity supply.

Although wavelet multiresolution analysis combined with a large number of neural networks provides efficient classification of power quality (PQ) events, time-domain featured disturbances such as sags and swells may not be easily classified [8], [9]. In addition, if an important disturbance frequency component is not precisely extracted by the wavelet transform, which consists of octave band-pass filters, the classification accuracy may be limited. This paper therefore presents a generalized discrete S-transform [10] with a modified Gaussian window, derived from the STFT, for the detection and localization of power line disturbance signal time series data. The S-transform with modified Gaussian window, or Modified S-transform (MST), is essentially a variable-window STFT whose window width varies inversely with frequency. The MST produces a time-frequency representation of a time-varying signal by uniquely combining frequency-dependent resolution with simultaneous localization of the real and imaginary spectra. Since the MST provides the local spectrum of a signal, time averaging of the local spectrum gives the Fourier spectrum. The MST of a power line disturbance signal provides contours that closely resemble the disturbance patterns, unlike the wavelet transform, and hence the features extracted from it are very suitable for developing highly efficient and accurate classification schemes. Further, MST analysis of time-varying signal data yields all the quantifiable parameters needed for localization, detection, and quantification of the signals comprising a time series database.

After the features are extracted from the time series data using a fast discrete S-transform, clustering analysis is used to group the data into clusters and thereby identify the class of the data. The well-known Fuzzy C-means algorithm [12] is commonly used for data clustering but suffers from the trial-and-error choice of the initial cluster centers and from the noise present in the original time series data. In addition, the traditional Fuzzy C-means algorithm gets stuck in local minima and therefore does not offer robustness in clustering the features. To overcome this problem, a simpler hybrid K-means clustering algorithm combined with a Particle Swarm Optimization (PSO) technique [13-16] is used to cluster the features into distinct groups that may be used further for classification. PSO is a population-based algorithm that simulates bird flocking or fish schooling behavior to find optimized cluster centers via an objective function. Unlike other evolutionary learning algorithms, PSO has fewer parameters to decide; it can be easily implemented and has a stable convergence characteristic with good computational efficiency. However, to further improve the clustering performance of PSO, a hybrid GPAC-Adaptive PSO, which is more effective and capable of solving non-linear optimization problems faster and with better accuracy in detecting the global best solution, is used in this paper. The main concept of this development is based on passive congregation and a kind of coordinated aggregation observed in swarms, known as GPAC-PSO, modified further to yield GPAC-Adaptive PSO, or simply GPAC-APSO.

This paper is organized as follows: Section-2 presents the modified discrete S-transform used to extract features from raw time-varying power signal data, and its implementation is described in Section-3. Section-4 describes K-means clustering, and Section-5 describes the cluster center optimization, including the GPAC-APSO scheme for improving the clustering performance of K-means-PSO. Finally, Section-6 gives the computational results, and the conclusion is presented in Section-7.

Modified Discrete S-Transform For Feature Extraction

The short-time Fourier transform (STFT) of a signal x(t) is given by

STFT(τ, f) = ∫ x(t) w(τ − t) e^(−j2πft) dt (1)

where τ and f denote time and frequency, x(t) is the signal, and the position of the translating window is determined by τ, which has the same dimension as t. An alternate expression for (1) is obtained through the use of the convolution theorem as

STFT(τ, f) = ∫ X(α + f) W(α) e^(j2πατ) dα (2)

where W is the Fourier transform of w, X is the Fourier transform of x, and the convolution variable α has the same dimension as f. One of the disadvantages of the STFT is the fixed width and height of the window: signal components with periods longer than the window width are misinterpreted, and a width that works well for low-frequency components cannot detect the high-frequency components. The S-transform is a time-localized Fourier transform with a window whose width and height vary with frequency. Although the theory of the S-transform has been presented in references [9], [10] and originally by Stockwell et al. [8], some of the equations and computational steps are outlined below (all integrals run over −∞ to ∞):

The general window function w(t, f) is chosen as a frequency-dependent Gaussian,

w(t, f) = (|f| / (λ√(2π))) e^(−t²f² / (2λ²)), with standard deviation σ(f) = λ/|f| (3)

The window is normalized so that

∫ w(t, f) dt = 1 (4)

Here the scaling factor λ is normally set to a value of 0.2 for the best overall performance of the S-transform, where the contours exhibit the least edge effects, while for computing the highest-frequency components of very short-duration oscillatory transients λ is made equal to 5. The S-transform performs multiresolution analysis on the signal because the width of its window varies inversely with frequency: this gives high time resolution at high frequencies and, on the other hand, high frequency resolution at low frequencies.

The expression for the S-transform is given by

S(τ, f) = ∫ x(t) {(|f| / (λ√(2π))) e^(−(τ−t)²f²/(2λ²))} e^(−j2πft) dt (5)

Here the window function is used as given in (3).

An alternative formulation of the S-transform, obtained from (2), is

S(τ, f) = ∫ X(α + f) e^(−2π²α²λ²/f²) e^(j2πατ) dα, f ≠ 0 (6)

The S-transform has the advantage that it provides multiresolution analysis while retaining the absolute phase of each frequency. This has led to its application for the detection and interpretation of events in time series in a variety of disciplines. In expression (5) for the S-transform, the scalable Gaussian window (the product of |f|/(λ√(2π)) and the real exponential) localizes the complex Fourier sinusoid, giving the S-transform analyzing function (the term in braces in (5)). The CWT provides time resolution by translating its whole analyzing function (the wavelet) along the time axis. The S-transform is different because only the amplitude envelope of the analyzing function (the window) translates; the oscillations are given by the fixed Fourier sinusoid, which does not depend on τ. Since the local oscillatory properties of the analyzing function determine the phase of the local spectrum, the S-transform can be considered as having a "fixed" phase reference. This fixed phase reference gives the S-transform some advantages over the wavelet transform.

The inverse S-transform is given by

x(t) = ∫ {∫ S(τ, f) dτ} e^(j2πft) df (6)

S(τ, f) is complex, and it is represented as

S(τ, f) = A(τ, f) e^(jΦ(τ, f)) (7)

where A(τ, f) is the amplitude of the S-spectrum and Φ(τ, f) is the phase of the S-spectrum. The S-spectrum is an improvement on the wavelet transform in that the average over τ of all the local spectra does indeed give the same result as the Fourier transform.

The discrete version of the S-transform of a signal x[k] is obtained as

S[j, n] = Σ_{m=0}^{N−1} X[m + n] e^(−2π²m²λ²/n²) e^(j2πmj/N), n ≠ 0 (8)

with S[j, 0] = (1/N) Σ_{m=0}^{N−1} x[m] for the zero-frequency voice, where X[m + n] is obtained by shifting the discrete Fourier transform (DFT) of x[k] by n, X[m] being given by

X[m] = (1/N) Σ_{k=0}^{N−1} x[k] e^(−j2πmk/N)

and j, m, n = 0, 1, …, N − 1.

Another version of the discrete S-transform, used for computation, evaluates the windowed transform directly for m = 1, 2, …, M and n = 0, 1, 2, …, N/2, where M is the number of data points of the signal x[m], N is the width of the window, and the signal vector x[m] is padded with zeros at the beginning or at the end.

Implementation of S-Transform (ST)

The computation of the S-Transform is efficiently implemented using the convolution theorem and FFT. The following steps are used for the computation of S-Transform:

Compute the DFT of the signal x[k] using an FFT routine and shift the spectrum X[m] to X[m + n].

Compute the Gaussian window function in the frequency domain for the required frequency n.

Compute the inverse Fourier transform of the product of the shifted DFT and the Gaussian window function to give one row of the ST matrix.
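The three steps above can be sketched in Python/NumPy as follows. This is a minimal illustration, not the authors' MATLAB code; the window-scaling parameter `lam` is an assumption standing in for the modified-window constant λ.

```python
import numpy as np

def s_transform(x, lam=1.0):
    """Discrete S-transform of a real signal x via the FFT/convolution route.

    Returns an (N//2+1) x N complex matrix: rows index frequency (voice) n,
    columns index time. lam is the Gaussian-window scaling parameter.
    """
    N = len(x)
    X = np.fft.fft(x)                      # step 1: DFT of the signal
    S = np.zeros((N // 2 + 1, N), dtype=complex)
    S[0, :] = np.mean(x)                   # n = 0 voice: signal mean
    m = np.fft.fftfreq(N) * N              # symmetric DFT bin indices
    for n in range(1, N // 2 + 1):
        # step 2: frequency-domain Gaussian window for voice n
        W = np.exp(-2.0 * (np.pi ** 2) * (m ** 2) * lam ** 2 / n ** 2)
        # step 3: inverse FFT of (shifted spectrum x window) gives one ST row
        S[n, :] = np.fft.ifft(np.roll(X, -n) * W)
    return S
```

For a unit-amplitude cosine, the row of the resulting matrix with the largest magnitude sits at the signal's frequency bin, which is the behaviour the contour-based feature extraction later in the paper relies on.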

The output of the S-transform is an n × m matrix, whose rows pertain to frequency and whose columns indicate time. Each column thus represents the "local spectrum" for that point in time. From the ST matrix we obtain the frequency-time contours of the amplitude spectrum, and these contours can be used to visually classify the nature of the signal, its change of frequency, and the harmonic content, if any, during the distortion.

Although there are several algorithms available for time series data clustering, the simplest and most widely used are the K-means and Fuzzy C-means approaches. Other methods include self-organizing maps, similarity and distance measures, short-time-series distance measures, probability-based methods, linear predictive coding, etc. The main idea behind the K-means algorithm is the minimization of an objective function, normally chosen as the total distance of all patterns from their respective cluster centers. The usual approach is to start with arbitrary cluster centers and then proceed iteratively. The distribution of feature values among clusters and the updating of cluster centers are the two main steps of the K-means algorithm, which alternates between them until the value of the objective function stops decreasing. The objective function of K-means is not convex and may therefore contain local minima; consequently, while minimizing it, there is a possibility of getting stuck at a local minimum or saddle point. The performance of the K-means algorithm also depends on the initial choice of the cluster centers. Besides, the Euclidean norm is sensitive to noise and outliers, so the K-means algorithm can be affected by both. To overcome these problems, we provide an evolutionary clustering method based on PSO, which has a higher ability to find near-optimal solutions in the search space.

Particle swarm optimization (PSO) is a population-based algorithm that simulates bird flocking or fish schooling behavior to achieve a self-evolving system. It can automatically search for the optimum solution in the solution space, but the search is not random: the fitness function, chosen according to the problem at hand, guides the search. Unlike other evolutionary learning algorithms, PSO has fewer parameters to decide; it can be easily implemented and has a stable convergence characteristic with good computational efficiency. This paper presents an evolutionary PSO learning-based method to optimally cluster N data points into K clusters. The main concept of this development is based on the passive congregation GPAC-PSO and a kind of coordinated aggregation observed in swarms. As with GA, GPAC-PSO can also suffer from premature convergence and thus produce an inaccurate solution.

K-Means Clustering

In the field of clustering, the K-means algorithm is the most popular algorithm for finding a partition that minimizes the mean square error (MSE) measure. Clustering analysis is a technique that groups similar sample points from a data set; it is a branch of multivariate analysis and of unsupervised learning in pattern recognition. For a space S with K groups and N points {x1, x2, …, xN}, clustering analysis is defined as follows. Among the many clustering techniques, the K-means algorithm (generally called hard C-means) is one of the best-known hard clustering techniques. It allocates each data point xi to the closest cluster center zj using the Euclidean distance

d(xi, zj) = ||xi − zj||
A good method will cluster the data set X = {x1, x2, …, xN} into K well-formed partitions with 2 ≤ K ≤ N − 1. When the data set is unlabelled, it is very important to define an objective function for the clustering method. Intuitively, each cluster should be as compact as possible, so the objective function of the K-means algorithm is formed with the Euclidean norm:

J = Σ_{j=1}^{K} Σ_{xi ∈ Cj} ||xi − zj||²

where zj is the jth cluster center. The necessary condition for a minimum of J is

zj = (1/Nj) Σ_{xi ∈ Cj} xi
where Nj is the number of points belonging to cluster Cj.

The K-means clustering process can be stopped when any one of the following criteria is satisfied: the maximum number of iterations has been exceeded, there is little change in the centroid vectors over a number of iterations, or there are no cluster membership changes.
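The two alternating steps and the three stopping criteria above can be sketched as follows (a minimal NumPy illustration; the random initialization from data points and the tolerance value are assumptions):

```python
import numpy as np

def k_means(X, K, max_iter=100, tol=1e-4, seed=0):
    """Plain K-means on an (N, d) feature array X; returns (centers, labels).

    Stops on max_iter, small centroid movement (tol), or unchanged labels,
    matching the three criteria in the text."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # assignment step: nearest center by Euclidean distance
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        # update step: centroid of each (non-empty) cluster
        new_centers = np.array([X[new_labels == j].mean(axis=0)
                                if np.any(new_labels == j) else centers[j]
                                for j in range(K)])
        stop = (np.all(new_labels == labels)
                or np.linalg.norm(new_centers - centers) < tol)
        centers, labels = new_centers, new_labels
        if stop:
            break
    return centers, labels
```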

Particle Swarm Optimization

Particle swarm optimization (PSO) is a population-based stochastic search process modeled on the social behavior of a bird flock. The algorithm maintains a population of particles, where each particle represents a potential solution to an optimization problem; in the context of PSO, the swarm is this collection of potential solutions. The aim of PSO is to find the particle position that results in the best evaluation of a given fitness (objective) function. Each particle represents a position in Nd-dimensional space and is "flown" through this multi-dimensional search space, adjusting its position towards both its own best previous position and that of the best particle in the swarm. For a particle moving in the multidimensional search space, let x_ij(t) and v_ij(t) denote the position and velocity of the ith particle in the jth dimension at time t.

Using the above notation, a particle's position is adjusted according to

v_ij(t + 1) = w v_ij(t) + c1 r1j(t) [y_ij(t) − x_ij(t)] + c2 r2j(t) [ŷ_j(t) − x_ij(t)]

x_ij(t + 1) = x_ij(t) + v_ij(t + 1)

where w is the inertia weight, c1 and c2 are the acceleration constants, r1j(t) and r2j(t) are random numbers uniformly distributed in [0, 1], y_i is the personal best position of the particle, ŷ is the global best position, and j = 1, …, Nd. The velocity is thus calculated from three contributions: (1) a fraction of the previous velocity, (2) the cognitive component, which is a function of the distance of the particle from its personal best position, and (3) the social component, which is a function of the distance of the particle from the best particle found thus far (i.e., the best of the personal bests).

The personal best position of particle i is calculated (for minimization of f) as

y_i(t + 1) = y_i(t) if f(x_i(t + 1)) ≥ f(y_i(t)), and y_i(t + 1) = x_i(t + 1) otherwise.
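A minimal gbest PSO following the velocity, position, and personal-best updates above can be sketched as follows. The search bounds and seed are illustrative assumptions; w, c1, and c2 use the values reported later in the paper.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=200, w=0.72, c1=1.49, c2=1.49,
                 bounds=(-5.0, 5.0), seed=0):
    """gbest PSO with the inertia-weight velocity update described above.
    Returns (global best position, best objective value found)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))          # positions
    v = np.zeros((n_particles, dim))                     # velocities
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()                 # global best
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # velocity update: inertia + cognitive + social components
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                       # position update
        vals = np.array([f(p) for p in x])
        better = vals < pbest_val                        # personal-best update
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()
```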
5.1. PSO Clustering

In the context of clustering, a single particle represents the Nc cluster centroid vectors. That is, each particle xi is constructed as

xi = (m_i1, …, m_ij, …, m_iNc)

where m_ij refers to the jth cluster centroid vector of the ith particle. A swarm therefore represents a number of candidate clusterings of the current data vectors. The fitness of a particle is easily measured as the quantization error,

J_e = [ Σ_{j=1}^{Nc} ( Σ_{z_p ∈ C_ij} d(z_p, m_ij) / |C_ij| ) ] / Nc

where d is the Euclidean distance defined earlier and |C_ij| is the number of data vectors belonging to cluster C_ij.
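The quantization-error fitness can be sketched as follows (a NumPy illustration; skipping empty clusters in the average is an assumption the paper does not specify):

```python
import numpy as np

def quantization_error(centroids, Z):
    """Quantization error J_e of one particle: the average, over clusters,
    of the mean distance from each member data vector to its centroid."""
    # distance of every data vector to every centroid
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)            # assign each vector to its nearest centroid
    per_cluster = []
    for j in range(len(centroids)):
        members = d[labels == j, j]
        if len(members):                 # assumed: empty clusters are skipped
            per_cluster.append(members.mean())
    return float(np.mean(per_cluster))
```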

The next section first presents a standard gbest PSO for clustering data into a given number of clusters and then shows how K-means and the PSO algorithm can be combined to further improve the performance of PSO clustering. The hybrid PSO algorithm consistently performs better than the other two approaches as the number of clusters increases: the K-means algorithm exhibits a faster but premature convergence to a large quantization error, while the PSO algorithms converge more slowly to lower quantization errors. We then proceed to develop more effective PSO algorithms by reflecting recent advances in swarm intelligence and by introducing new concepts. Under these conditions, one new hybrid PSO algorithm is proposed that is more effective and capable of solving non-linear optimization problems faster and with better accuracy in detecting the global best solution. The main concept of this development is based on passive congregation and a kind of coordinated aggregation observed in swarms, known as GPAC-PSO, modified further to yield GPAC-Adaptive PSO, or simply GPAC-APSO.
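The combination just described, with one particle seeded from a K-means result and the rest of the swarm random, can be sketched as follows (an illustration; the inner K-means run, its tolerance, and the random sampling range are assumptions):

```python
import numpy as np

def hybrid_kmeans_pso_init(Z, K, n_particles=20, seed=0):
    """Build the initial PSO swarm for clustering an (N, d) data array Z:
    particle 0 carries the K-means result, the rest are random centroid
    sets drawn from the data range. Returns (n_particles, K, d)."""
    rng = np.random.default_rng(seed)
    # one short K-means run (assignment/update loop)
    centers = Z[rng.choice(len(Z), K, replace=False)]
    for _ in range(100):
        labels = np.linalg.norm(Z[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(K)])
        if np.linalg.norm(new - centers) < 1e-4:
            centers = new
            break
        centers = new
    # random particles spanning the data range, then seed particle 0
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    swarm = rng.uniform(lo, hi, (n_particles, K, Z.shape[1]))
    swarm[0] = centers
    return swarm
```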

Congregation, on the other hand, is swarming by social forces, which are the source of attraction of a particle to others, and it is classified into two types: social and passive. Social congregation usually happens when the swarm's fidelity is high, such as under genetic relation, and it necessitates active information transfer; e.g., ants with high genetic relation use antennal contacts to transfer information about the location of resources. We do not consider social congregation in this paper because it frequently displays a division of labor. Passive congregation, finally, is an attraction of a particle to other swarm members in which there is no display of social behavior: particles need only monitor their environment and immediate surroundings, such as the position and speed of their neighbors, and such information transfer can be employed in passive congregation. The global variant of passive-congregation PSO (GPAC) is enhanced with the constriction factor approach.

The swarm of the enhanced GPAC is manipulated by the velocity and position updates

v_i(t + 1) = χ [ v_i(t) + c1 rand1 (P_i − x_i(t)) + c2 rand2 (P_k − x_i(t)) + c3 rand3 (P_r − x_i(t)) ] (13)

x_i(t + 1) = x_i(t) + v_i(t + 1) (14)

where i = 1, 2, …, N; χ is the constriction factor; c1, c2, and c3 are the cognitive, social, and passive congregation parameters, respectively; rand1, rand2, and rand3 are random numbers uniformly distributed within [0, 1]; P_i is the best previous position of the ith particle; P_k is the global best position ever attained among all particles in the case of the enhanced GPAC; and P_r is the position of the passive congregator (the position of a randomly chosen particle-r). The position of the ith particle in the n-dimensional decision space is limited by minimum and maximum position vectors, which express the inequality constraints. The velocities of the ith particle are likewise limited, where the maximum velocity in the lth dimension of the search space is proposed as

v_l,max = (x_l,max − x_l,min) / N_r (15)

where x_l,max and x_l,min are the limits in the lth dimension of the search space. The maximum velocities are constructed over small intervals of the search space for a better balance between exploration and exploitation. N_r, a chosen number of search intervals for the particles, is an important parameter of the enhanced GPAC algorithm: a small N_r facilitates global exploration (searching new areas), while a large one tends to facilitate local exploration (fine-tuning of the current search area). A suitable value of N_r usually balances the global and local exploration abilities and consequently reduces the number of iterations required to locate the optimum solution.

The basic steps of the enhanced GPAC are listed below:

Step-1. Generate a swarm of N particles with uniform probability distribution, initial positions x_i(0) and velocities v_i(0), and initialize the random parameters. Evaluate each particle-i using the objective function f (e.g., f to be minimized).

Step-2. For each particle-i, calculate the distance ||x_i − x_j|| between its position and the positions of all other particles, where x_i and x_j are the position vectors of particle-i and particle-j, respectively.

Step-3. For each particle-i, determine the nearest particle, particle-k, with a better evaluation than its own, and set it as the leader of particle-i; in the case of the enhanced GPAC, the globally best particle is set as the leader.

Step-4. For each particle-i, randomly select a particle-r and set it as the passive congregator of particle-i.

Step-5. Update the velocities and positions of the particles using equations (13), (14), and (15).

Step-6. Check if the limits of positions and velocities are enforced. If the limits are violated, then they are replaced by the respective limits.

Step-7. Evaluate each particle using the objective function f.

Step-8. If the stopping criteria are not satisfied, go to Step 2.


The enhanced GPAC algorithm is terminated if one of the following criteria is satisfied: (1) no improvement of the global best is observed in the last 30 generations, or (2) the maximum number of allowed iterations is reached.
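One enhanced-GPAC iteration (Steps 2-5) can be sketched as follows; the constriction factor `chi` and the parameters c1, c2, c3 are assumed values, and the leader selection falls back to the global best when no better particle exists:

```python
import numpy as np

def gpac_step(x, v, pbest, f_vals, chi=0.73, c1=2.05, c2=2.05, c3=1.0, rng=None):
    """One enhanced-GPAC velocity/position step for an (N, d) swarm.
    Each particle's leader is its nearest particle with a better fitness
    (global best as fallback); the passive congregator is a random particle."""
    rng = rng or np.random.default_rng(0)
    N, d = x.shape
    gbest = x[f_vals.argmin()]
    new_v = np.empty_like(v)
    for i in range(N):
        dists = np.linalg.norm(x - x[i], axis=1)        # Step-2: distances
        better = np.where(f_vals < f_vals[i])[0]        # Step-3: leader choice
        leader = x[better[dists[better].argmin()]] if len(better) else gbest
        congregator = x[rng.integers(N)]                # Step-4: random congregator
        r1, r2, r3 = rng.random(3)
        # Step-5: constricted velocity update, eq. (13)
        new_v[i] = chi * (v[i] + c1 * r1 * (pbest[i] - x[i])
                          + c2 * r2 * (leader - x[i])
                          + c3 * r3 * (congregator - x[i]))
    return x + new_v, new_v                             # position update, eq. (14)
```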


PSO and GPAC-APSO Based K-Means Clustering Algorithm for Time Series Pattern Classification

The hybrid K-means PSO algorithm is applied to the time series data obtained from an electricity distribution network, and the resulting feature clusters clearly classify the disturbance events of the time series. After the complex S-matrix is obtained using the procedure outlined earlier, the feature vectors F1, F2, F3, and F4 required for clustering are chosen from the sets of features described in Table-1. These features are used to classify the time series data into several classes with a hybrid clustering technique based on the PSO and K-means algorithms. The K-means algorithm tends to converge faster (after fewer function evaluations) than PSO, but usually with a less accurate clustering. This section shows that the performance of PSO clustering can be improved further by seeding the initial swarm with the result of the K-means algorithm. The hybrid algorithm first executes the K-means algorithm once; this K-means run is terminated when the maximum number of iterations is exceeded or when the average change in the centroid vectors is less than 0.0001 (a user-specified parameter). The result of the K-means algorithm is then used as one of the particles, while the rest of the swarm is initialized randomly. The features extracted from the S-transform are:

1. F1 = max(A) + min(A) − max(B) − min(B) (20)

where A is the amplitude-versus-time curve from the S-matrix under disturbance and B is the amplitude-versus-time curve of the S-matrix without disturbance.

2. F2 = standard deviation of the magnitude-versus-time spectrum obtained from the ST matrix:

F2 = sqrt( (1/N) Σ_i (A_i − mean(A))² ) (21)

3. F3 = energy in the S-transform output = Σ |S[j, n]|² (22)

4. F4 = total harmonic distortion (THD) = sqrt( Σ_{n=2}^{N} V_n² ) / V_1 (23)

where N is the number of points in the FFT and V_n is the value of the nth harmonic component of the FFT.

F = estimated frequency at the maximum amplitude obtained from the ST matrix.
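The four features can be sketched as follows (an illustration assuming the amplitude-time curves A and B are taken as the column-wise maxima of the ST magnitude matrices, and that the THD input is a separate vector of FFT harmonic magnitudes):

```python
import numpy as np

def st_features(S_dist, S_ref, harmonics=None):
    """Features F1..F4 of equations (20)-(23) from S-transform output.

    S_dist / S_ref: complex ST matrices for the disturbed and clean signals.
    harmonics: optional FFT magnitude vector [V_0, V_1, V_2, ...] for THD.
    """
    A = np.abs(S_dist).max(axis=0)        # amplitude-time curve, disturbance
    B = np.abs(S_ref).max(axis=0)         # amplitude-time curve, clean signal
    F1 = A.max() + A.min() - B.max() - B.min()            # eq. (20)
    F2 = float(np.std(A))                                 # eq. (21)
    F3 = float(np.sum(np.abs(S_dist) ** 2))               # eq. (22): energy
    F4 = None
    if harmonics is not None:                             # eq. (23): THD
        V = np.asarray(harmonics, dtype=float)
        F4 = float(np.sqrt(np.sum(V[2:] ** 2)) / V[1])    # V_1 = fundamental
    return F1, F2, F3, F4
```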

Table-1 Features extracted from the S-transform with noise, for the disturbance classes: Sag (60%), Swell (50%), Momentary Interruption (MI), Sag with Harmonic (60%), Swell with Harmonic (50%), Flicker (4%, 5 Hz), Notch + harmonics, Spike + harmonics, Transient (low frequency), and Transient (high frequency).

Computational Results for PSO Optimized K-means algorithm

This section compares the results of the K-means, PSO, and hybrid clustering algorithms on six time series events. The main purpose is to compare the quality of the respective clusterings, where quality is measured according to the following three criteria:

the quantization error

the intra-cluster distances, i.e. the distance between data vectors within a cluster, where the objective is to minimize the intra-cluster distances;

the inter-cluster distances, i.e. the distance between the centroids of the clusters, where the objective is to maximize the distance between clusters.

The latter two objectives, respectively, correspond to crisp, compact clusters that are well separated.

For all the results reported, an average over 30 simulations is given. All algorithms are run for 1000 function evaluations, and 20 particles are chosen for the PSO-based clustering. The parameters chosen for PSO are w = 0.72 and c1 = c2 = 1.49 to ensure good convergence.

Fig.2 shows the plot of feature F1 versus F2, clearly showing the clusters belonging to steady disturbance patterns such as voltage sag (dip), voltage swell (rise), harmonics, and the normal signal time series, as well as transient disturbance patterns such as oscillatory transients, notches, and spikes. From Fig.2 it can be seen that although K-means forms clusters, there may be a number of misclassifications, while GPAC-APSO produces clear cluster centers separating the above-mentioned disturbance patterns in comparison with K-means, as shown in Fig.3. The fitness functions of K-means and the different PSO variants plotted in Fig.4 show that GPAC-APSO converges better than the plain K-means algorithm.

Similar plots for the features F2 and F4 show clusters relating to the presence or absence of harmonics, and the corresponding fitness values, respectively. The fitness function converges rapidly to a low quantization error for both the K-means and GPAC-APSO algorithms. Cluster 1 belongs to power signal time series data in which the harmonic content is less than 5% of the fundamental component; when the harmonic content exceeds 7-10% of the fundamental component, cluster 2 becomes prominent and collects such data points. However, K-means does not produce a very clear boundary between the different patterns, while GPAC-APSO is very effective in separating the clusters with greater clarity than the other PSO variants.


Conclusion

An evolutionary clustering technique has been developed by hybridizing the K-means algorithm and PSO for time-series clustering of power signal disturbance data. It can be considered a viable and efficient heuristic for finding optimal or near-optimal solutions to the clustering problem of allocating N data points to K clusters. The proposed method is very efficient and simple to implement for clustering analysis when the number of clusters is known a priori. According to the simulation results of applying the proposed approach to a number of time series data sets occurring in power network disturbances with different geometrical properties, the proposed approach is more robust than traditional clustering algorithms. Several variants of PSO have also been presented in this paper, and the modified GPAC-PSO, known as GPAC-APSO, is found to be more efficient, with quicker convergence than the other PSO variants.