River Hooghly, an important distributary of the river Ganga, has been affected by the indiscriminate discharge of polluted and untreated sewage sludge and industrial waste into its waters. A water quality index (WQI) was developed using five water quality parameters, namely dissolved oxygen, biochemical oxygen demand, pH, total coliform and fecal coliform, measured at eight different stations along the river Hooghly from 2002 to 2007. In the present study, the WQI was calculated by both the DELPHI and the CCME methods, each of which reflects the quality of the water with respect to its pollution level. Cluster analysis was used to highlight the relationships among the stations with respect to the WQI. The study also presents a computer-simulated artificial neural network (ANN) model for evaluating the relationship between the water quality parameters of samples collected at the different stations along the Hooghly river and the resulting water quality measurement. Finally, the two water quality methods (CCME and DELPHI) were compared statistically using the coefficient of determination (R2), root mean square error (RMSE) and absolute average deviation (AAD) on the validation data set.
Keywords: ANN Model, CCME Method, Cluster Analysis, DELPHI Process, River Hoogly, Water Quality Index
The water quality index (Abbasi, 1998; Coulston and Mrak, 1977) is a numerical indicator of the physical, chemical, biological or radiological condition of a water source, used to determine its quality before use for purposes such as drinking, agriculture, aquatic life, recreation and industry (Carpenter et al., 1998; Jarvie et al., 1998; Sargaonkar and Deshpande, 2003). The environment we live in is greatly polluted by various biological activities. If the equilibrium among the activities of different organisms (plants, animals and micro-organisms) is seriously disturbed, a new equilibrium will sooner or later be attained, and it may not be favorable to humans, plants or animals. Water pollution is the contamination of water bodies by chemical, particulate or bacterial matter that degrades the quality of the water.
Sewage, agricultural and industrial waste discharges are the main causes of water pollution. The River Hooghly, which receives mixed domestic sewage together with treated and untreated industrial effluents from the cities on its banks, is now among the most polluted rivers in the world; its water is unhygienic even for bathing and drinking. Contaminated water bodies may have undesirable colour, odour, taste, turbidity, organic matter content, harmful chemicals, toxic and heavy metals, pesticides, oily matter, industrial waste products, radioactivity, high total dissolved solids (TDS), acids, alkalis, domestic sewage, viruses, bacteria, protozoa, rotifers, worms, etc. (Carpenter et al., 1998; Jarvie et al., 1998). Contaminated drinking water may also pose human health risks such as tumors, ulcers and skin disorders (Hogan, 2010).
However, not all available water sources are suitable for every purpose. Water quality indices (WQIs) have been developed to assess the suitability of water for a variety of uses. A water quality index is a single-number quantitative expression of the overall water quality at a certain location and time, based on several water quality parameters (Bordalo and Wiebe, 2006; Boyacioglu, 2007; Brown et al., 1970; Landwehr and Deininger, 1976; Pesce and Wunderlin, 2006; Sanchez et al., 2007; Sargaonkar and Deshpande, 2003; Singh and Anandh, 1996). Different techniques have been used in an attempt to convert complex water quality data into simple information that is easy to understand and accessible to the public (Bennetts et al., 2006; Pulido-Leboeuf et al., 2003). The water quality index is a dimensionless value ranging from 0 to 100 (Parmar and Parmar, 2010), with a higher value indicating better water quality (Cude, 2001; Pandey and Sundaram, 2002). An index based on a few very important parameters can therefore provide a simple measurement of water quality and give the public a general idea of possible problems with the water in the region. Several water quality parameters can be included in the index (APHA, AWWA, WPCF, 1976; AWWA, 1971; Hooda and Kaur, 1999; Kannel et al., 2007; Karia and Christian, 2001; Metcalf and Eddy, 1992; Ramalho, 1983). These parameters are:
dissolved oxygen (DO)
biochemical oxygen demand (BOD)
2. Materials and methods
2.1. Study sites and collected data
To determine the water quality index, water samples were collected, typically on a monthly basis, across the width of the river Hooghly at all eight sites (Berhampore, Palta, Srirampore, Howrah (Shibpur), Garden Reach, Dakhshineswar, Uluberia and Diamond Harbour) during the study period (2002-2008). Of the 19 water quality parameters measured, five were selected for the analysis: pH, dissolved oxygen, biochemical oxygen demand, total coliform and fecal coliform.
Several methods have been proposed in the past to derive a suitable WQI, each with its own advantages and disadvantages.
2.2. Calculation of CCME Water Quality Index (CCME 2001)
Each water quality index (indicator) was calculated using the method developed by the Canadian Council of Ministers of the Environment (CCME, 2001), which combines three measures of water quality, scope (F1), frequency (F2) and amplitude (F3), as in Eq. (1):

$$\mathrm{WQI} = 100 - \frac{\sqrt{F_1^2 + F_2^2 + F_3^2}}{1.732} \tag{1}$$
For each indicator, a ranking scale with five categories was used, each category corresponding to a specific level of water quality, as shown in Table 1.
Table 1: Grading Scale used for the water quality indicator
Here F1 (scope) describes the extent of non-compliance with quality guidelines over the time period of interest and was calculated as

$$F_1 = \left(\frac{\text{number of failed variables}}{\text{total number of variables}}\right) \times 100 \tag{2}$$

where failed variables are those water quality variables whose objectives were not met at least once during the time period considered for the index calculation.
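F2 (frequency) represents the percentage of individual tests that do not meet their objectives (failed tests):

$$F_2 = \left(\frac{\text{number of failed tests}}{\text{total number of tests}}\right) \times 100 \tag{3}$$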
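A minimal sketch of how F1 and F2 can be counted, in Python. The objectives and test values below are hypothetical (they are not the guideline values or measurements used in this study), and the three objective kinds ("max", "min", "range") are an assumed encoding; pH is given a range because it has both a lower and an upper limit.

```python
# Hypothetical objectives and test results, for illustration only; these are not
# the guideline values or measurements used in the study.
EXAMPLE_OBJECTIVES = {
    "DO": ("min", 5.0),            # must not fall below
    "BOD": ("max", 3.0),           # must not exceed
    "pH": ("range", (6.5, 8.5)),   # must stay inside [low, high]
}
EXAMPLE_TESTS = [
    ("DO", 6.1), ("DO", 4.2),
    ("BOD", 2.5), ("BOD", 3.8),
    ("pH", 7.4), ("pH", 7.9),
]


def failed(value, objective):
    """Return True if a single test value does not meet its objective."""
    kind, limit = objective
    if kind == "max":
        return value > limit
    if kind == "min":
        return value < limit
    if kind == "range":
        low, high = limit
        return value < low or value > high
    raise ValueError(f"unknown objective kind: {kind!r}")


def scope_and_frequency(tests, objectives):
    """Return (F1, F2) in percent.

    F1: share of variables that failed at least once during the period.
    F2: share of individual tests that failed.
    """
    variables = {var for var, _ in tests}
    failed_tests = [(var, v) for var, v in tests if failed(v, objectives[var])]
    failed_variables = {var for var, _ in failed_tests}
    f1 = 100.0 * len(failed_variables) / len(variables)
    f2 = 100.0 * len(failed_tests) / len(tests)
    return f1, f2


if __name__ == "__main__":
    f1, f2 = scope_and_frequency(EXAMPLE_TESTS, EXAMPLE_OBJECTIVES)
    print(f"F1 = {f1:.2f}, F2 = {f2:.2f}")  # F1 = 66.67, F2 = 33.33
```

With these made-up values, two of the three variables and two of the six tests fail.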
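F3 (amplitude) represents the amount by which failed test values do not meet their objectives, and it was calculated in three steps. The number of times by which an individual value is greater than (or, when the objective is a minimum, less than) the objective is termed an 'excursion'. When the test value must not exceed the objective, it is expressed as follows:

$$\text{excursion}_i = \frac{\text{failed test value}_i}{\text{objective}_j} - 1 \tag{4}$$

For cases where the test value must not fall below the objective:

$$\text{excursion}_i = \frac{\text{objective}_j}{\text{failed test value}_i} - 1 \tag{5}$$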
The collective amount by which individual tests are out of compliance is calculated by summing the excursions of individual tests from their objectives and dividing by the total number of tests. This variable, referred to as the normalized sum of excursions (NSE), is calculated as

$$\mathrm{NSE} = \frac{\sum_{i=1}^{n} \text{excursion}_i}{\text{number of tests}} \tag{6}$$
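The excursion and NSE steps can be sketched in the same hypothetical setting (made-up objectives and test values, not the study's data). Treating a range objective as a combination of a lower and an upper bound is an assumption of this sketch; the CCME excursion formulas themselves only distinguish objectives that must not be exceeded from those that must not be undercut.

```python
# Hypothetical objectives/tests, repeated so this block runs on its own.
EXAMPLE_OBJECTIVES = {
    "DO": ("min", 5.0),
    "BOD": ("max", 3.0),
    "pH": ("range", (6.5, 8.5)),
}
EXAMPLE_TESTS = [
    ("DO", 6.1), ("DO", 4.2),
    ("BOD", 2.5), ("BOD", 3.8),
    ("pH", 7.4), ("pH", 7.9),
]


def excursion(value, objective):
    """How many times a failed value lies beyond its objective; 0.0 if it passes.

    Test values must be positive in the "min" case (the value is a divisor).
    """
    kind, limit = objective
    if kind == "max" and value > limit:
        return value / limit - 1.0       # objective must not be exceeded
    if kind == "min" and value < limit:
        return limit / value - 1.0       # objective must not be undercut
    if kind == "range":
        # assumption: treat each bound of a range like a max or a min objective
        low, high = limit
        if value > high:
            return value / high - 1.0
        if value < low:
            return low / value - 1.0
    return 0.0


def normalized_sum_of_excursions(tests, objectives):
    """NSE, Eq. (6): sum of excursions over ALL tests (passing ones add 0) / number of tests."""
    total = sum(excursion(v, objectives[var]) for var, v in tests)
    return total / len(tests)


if __name__ == "__main__":
    print(normalized_sum_of_excursions(EXAMPLE_TESTS, EXAMPLE_OBJECTIVES))
```

Passing tests are kept in the denominator, so many compliant tests dilute a few large excursions.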
F3 is then calculated by an asymptotic function that scales the normalized sum of excursions (NSE) to a range between 0 and 100:

$$F_3 = \frac{\mathrm{NSE}}{0.01\,\mathrm{NSE} + 0.01} \tag{7}$$
Once the CCME WQI value has been determined, water quality can be categorized by assigning it to one of the levels in Table 1.
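The final step, scaling NSE into F3 and combining F1, F2 and F3 through Eq. (1), can be sketched as follows. The category boundaries are the standard CCME (2001) bands (Excellent 95-100, Good 80-94, Fair 65-79, Marginal 45-64, Poor 0-44); they are assumed here and may differ from the cut-offs used in Table 1. The inputs in the usage block are hypothetical.

```python
import math


def f3_from_nse(nse):
    """Amplitude F3: asymptotic scaling of NSE onto the range 0 to 100 (CCME, 2001)."""
    return nse / (0.01 * nse + 0.01)


def ccme_wqi(f1, f2, f3):
    """Eq. (1). The divisor 1.732 (about sqrt(3)) keeps the result within 0 to 100.

    1.732 is slightly below sqrt(3), so the all-100 worst case gives a tiny
    negative number; it is clamped to 0 here.
    """
    return max(0.0, 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732)


def ccme_category(wqi):
    """Assumed standard CCME (2001) bands, not necessarily the study's Table 1."""
    if wqi >= 95:
        return "Excellent"
    if wqi >= 80:
        return "Good"
    if wqi >= 65:
        return "Fair"
    if wqi >= 45:
        return "Marginal"
    return "Poor"


if __name__ == "__main__":
    # Hypothetical inputs: 2 of 3 variables and 2 of 6 tests failing,
    # with two excursions (DO 4.2 vs a minimum of 5, BOD 3.8 vs a maximum of 3).
    nse = ((5.0 / 4.2 - 1.0) + (3.8 / 3.0 - 1.0)) / 6
    wqi = ccme_wqi(200 / 3, 100 / 3, f3_from_nse(nse))
    print(round(wqi, 1), ccme_category(wqi))
```

For these hypothetical inputs (F1 = 66.67, F2 = 33.33, NSE of about 0.076), F3 is about 7.1 and the index is about 56.8, which falls in the Marginal band. Scope and frequency dominate here because the excursions are small.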
2.3. Calculation of water quality index with DELPHI process
The DELPHI technique is a systematic procedure for incorporating the judgments of a large and diverse panel of experts into the water quality management process (Alexander, 1999; Saha et al., 2007; Walski and Parker, 1974). Researchers follow two basic approaches: the aggregative method and the multiplicative method. In the aggregative method, the final weight (wi) of each individual parameter is multiplied by its quality rating (qi), and the sum of these products gives the required single-number WQI. The quality rating is measured on a scale of 0 to 100 points, from most polluted (0) to least polluted (100).
Method 1: Aggregative Method (Saha et al., 2007)
The aggregative index is of the form

$$\mathrm{WQI}_a = \sum_{i=1}^{n} q_i w_i, \qquad \sum_{i=1}^{n} w_i = 1$$
where WQIa is the aggregative water quality index between 0 and 100, qi the quality of ith parameter between 0 and 100, wi the weight of ith parameter (between 0 and 1), and n is the total number of parameters.
A limitation of this type of index is that, when a single highly relevant parameter exceeds its permissible limit, the weighted mean does not lower the index sufficiently. In Table 2, high index values correspond to low levels of contamination (i.e., good water quality) and low values to high levels of contamination (i.e., poor water quality).
Table 2. Classification of water quality based on WQI using the aggregative method
Good to Excellent
Good to Moderate
Bad to very Bad
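The aggregative index, and the limitation noted above, can be illustrated with a short sketch. The equal weights and the ratings are hypothetical (the study's DELPHI weights are not given in the text), and the check that the weights sum to 1 is an assumption that keeps WQIa within 0 to 100.

```python
def wqi_aggregative(q, w):
    """WQI_a = sum of q_i * w_i (weighted arithmetic mean).

    q: quality ratings, 0 (most polluted) to 100 (least polluted)
    w: weights between 0 and 1, assumed to sum to 1
    """
    if len(q) != len(w):
        raise ValueError("q and w must have the same length")
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(qi * wi for qi, wi in zip(q, w))


# Hypothetical equal weights for pH, DO, BOD, total coliform and faecal coliform;
# these are not the study's DELPHI weights.
WEIGHTS = [0.2] * 5

if __name__ == "__main__":
    print(wqi_aggregative([90, 90, 90, 90, 90], WEIGHTS))  # every parameter rated 90
    print(wqi_aggregative([90, 90, 90, 90, 5], WEIGHTS))   # one parameter rated 5
```

With every parameter rated 90 the index is 90; dropping a single parameter to 5 only lowers it to 73, which is how one failing parameter can be masked by the others.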
Method 2: Multiplicative Method (Saha et al., 2007)
The multiplicative form of the index is

$$\mathrm{WQI}_m = \prod_{i=1}^{n} q_i^{\,w_i}$$
In this index, the weights of the individual parameters are likewise assigned on the basis of subjective opinion. The classification is shown in Table 3.
Table 3. Classification of water quality based on WQI using Multiplicative method
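A matching sketch of the multiplicative form, again with hypothetical equal weights rather than the study's DELPHI weights. Because it is a weighted geometric mean, a single very low rating pulls the index down much more than in the weighted arithmetic mean.

```python
def wqi_multiplicative(q, w):
    """WQI_m = product of q_i ** w_i (weighted geometric mean).

    q: quality ratings, 0 (most polluted) to 100 (least polluted)
    w: weights between 0 and 1, assumed to sum to 1
    """
    if len(q) != len(w):
        raise ValueError("q and w must have the same length")
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    result = 1.0
    for qi, wi in zip(q, w):
        result *= qi ** wi  # a rating of 0 drives the whole index to 0
    return result


WEIGHTS = [0.2] * 5  # hypothetical equal weights, not the study's DELPHI weights

if __name__ == "__main__":
    ratings = [90, 90, 90, 90, 5]  # one parameter rated very poor
    print(wqi_multiplicative(ratings, WEIGHTS))
```

For ratings of 90, 90, 90, 90 and 5 with weights of 0.2, the multiplicative index is about 50.5, whereas the weighted arithmetic mean of the same ratings is 73; a rating of 0 sends the multiplicative index to 0.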
3. Results and Discussions
3.1. Cluster analysis
To overcome the limitations of univariate statistical analysis, multivariate techniques such as cluster analysis (CA) were used in this study to interpret a large number of meaningful data without losing much information (Jackson, 1991; Meglen, 1992). Cluster analysis classifies objects into groups (clusters) such that objects within a cluster are similar to each other but distinct from the objects in other clusters (Helena et al., 2000; Raghunath et al., 2002; Simeonov et al., 2003a; Simeonova et al., 2003b; Simeonov et al., 2004; Singh et al., 2004; Vega et al., 1998; Voncina et al., 2002). Cluster analysis can be performed on many different types of data sets. Hierarchical clustering investigates grouping in the data simultaneously over a variety of scales by creating a cluster tree. The tree is not a single set of clusters but a multilevel hierarchy, in which clusters at one level are joined into clusters at the next higher level. Hierarchical agglomerative cluster analysis was performed on the normalized data set using Ward's method, with squared Euclidean distances as the measure of proximity (Einax et al., 1997; Fovell, 1993). Dendrograms were generated in MATLAB 7 (The MathWorks, Inc., ver. 7.0.1) from the water quality index data set of the Hooghly river to identify similar sampling sites along the river stretch.
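The Ward procedure described above can be sketched in plain Python; this is not the MATLAB routine used in the study. Each merge joins the pair of clusters whose union adds the least to the within-cluster sum of squares, computed from squared Euclidean distances between centroids, after z-score normalization. The function returns the flat groups obtained by cutting the tree at k clusters rather than drawing a dendrogram. Four of the station WQI values are hypothetical placeholders (see the comment), so the output only illustrates the mechanics.

```python
import statistics

STATIONS = ["Berhampore", "Palta", "Srirampore", "Howrah (Shibpur)",
            "Garden Reach", "Dakhshineswar", "Uluberia", "Diamond Harbour"]
# Average DELPHI WQI per station. Berhampore, Palta, Uluberia and Diamond Harbour
# use values reported in the text; the four middle values are hypothetical
# placeholders inside the reported 24-28.32 range.
WQI = [62.25, 36.67, 24.0, 26.5, 27.1, 28.32, 35.97, 62.77]


def normalize(rows):
    """z-score each column (variable) so that no variable dominates the distance."""
    scaled = []
    for col in zip(*rows):
        mu = statistics.mean(col)
        sd = statistics.pstdev(col)
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*scaled)]


def ward_clusters(rows, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    Merging clusters A and B raises the within-cluster sum of squares by
    |A||B| / (|A| + |B|) times the squared Euclidean distance between their
    centroids; each step merges the pair with the smallest increase.
    """
    members = [[i] for i in range(len(rows))]
    centroids = [list(r) for r in rows]
    while len(members) > k:
        best = None
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                na, nb = len(members[a]), len(members[b])
                d2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(members[a]), len(members[b])
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        members[a] = members[a] + members[b]
        del members[b], centroids[b]  # b > a, so index a is unaffected
    return members


if __name__ == "__main__":
    for group in ward_clusters(normalize([[v] for v in WQI]), k=3):
        print([STATIONS[i] for i in group])
```

Recording the merge cost at each step would give the heights needed to draw a dendrogram.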
3.2. ANN modeling
In the present study, different neural network models and algorithms were tested and optimized to obtain the best model structure for predicting the water quality index of the sampling stations along the river Hooghly. The modelling method is based on the feedforward backpropagation algorithm (Rumelhart et al., 1986). The ANN model was trained on examples from the calculated data sets with known outputs in order to analyze the existing processes. An ANN architecture typically comprises three types of neuron layers: an input layer (independent variables), one or more hidden layers and an output layer (dependent variables). The input layer, in which each neuron holds a single input value and passes it on through its weighted connections, receives information from external sources and transfers it to the hidden layer for processing (Ozdemir et al., 2011). The net input of each neuron (aj) is the sum of all input values Xi, each multiplied by its weight Wji, plus a bias term Zj:

$$a_j = \sum_{i=1}^{n} W_{ji} X_i + Z_j$$
The output value (tj) is obtained by passing the net input of the output neuron through its linear transfer function (purelin):

$$t_j = \mathrm{purelin}(a_j) = a_j$$
In the present study, two types of transfer function were applied: a tan-sigmoid transfer function (tansig) in the hidden layer and a linear transfer function (purelin) in the output layer. The network was trained with the Levenberg-Marquardt backpropagation algorithm. The inputs to the ANN model were pH, dissolved oxygen, biochemical oxygen demand, total coliform and faecal coliform, and the output was the water quality index; these are the same factors considered in the cluster analysis. All neural network calculations were implemented using the Neural Network Toolbox of MATLAB version 7.0.1.
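A minimal sketch of one forward pass through the network described above, with five inputs, ten tansig hidden neurons (the hidden-layer size given in the captions of Fig. 3 and Fig. 4) and one purelin output neuron. It is written in plain Python rather than with the MATLAB Neural Network Toolbox used in the study; the weights are random placeholders, input scaling is assumed to have been done beforehand, and Levenberg-Marquardt training is not shown.

```python
import math
import random


def tansig(n):
    # MATLAB's tansig, 2 / (1 + exp(-2n)) - 1, is mathematically equal to tanh(n)
    return math.tanh(n)


def purelin(n):
    # linear transfer function: the output equals the net input
    return n


def forward(x, w_hidden, z_hidden, w_out, z_out):
    """One forward pass of a 5-10-1 feedforward network.

    x        : 5 inputs (pH, DO, BOD, TC, FC), assumed already scaled
    w_hidden : 10 rows of 5 weights W_ji
    z_hidden : 10 hidden biases Z_j
    w_out    : 10 output weights
    z_out    : output bias
    """
    # hidden layer: net input a_j = sum_i W_ji * X_i + Z_j, then tansig
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + z)
              for row, z in zip(w_hidden, z_hidden)]
    # output layer: weighted sum of hidden outputs plus bias, then purelin
    a_out = sum(w * h for w, h in zip(w_out, hidden)) + z_out
    return purelin(a_out)


def random_network(n_in=5, n_hidden=10, seed=0):
    """Hypothetical untrained weights drawn uniformly from [-0.5, 0.5]."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    z_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_hidden, z_hidden, w_out, 0.0


if __name__ == "__main__":
    net = random_network()
    print(forward([0.2, -0.1, 0.5, 0.3, -0.4], *net))
```

Because each tansig output lies between -1 and 1, the network output is bounded by the sum of the absolute output weights plus the bias; training adjusts the weights so that this output tracks the calculated WQI.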
Cluster analysis was performed to identify the spatial similarity among the sampling sites of the monitoring network with respect to their water quality index. The resulting dendrograms (Fig. 1 and Fig. 2), one for each WQI calculation method, group all eight sampling stations according to the water quality index of the Hooghly River.
Fig. 1. Dendrogram based on agglomerative hierarchical clustering using CCME method
Fig. 2. Dendrogram based on agglomerative hierarchical clustering using DELPHI method
For the CCME method, the clustering procedure generated two statistically significant groups based on the average calculated water quality index for 2002-2008: cluster 1 (sites 2, 3, 4, 5 and 7) and cluster 2 (sites 1 and 8, with site 6 distantly related to them), which can be classified as relatively moderate, low and very high pollution stations, respectively. With the DELPHI technique, all eight stations on the river can be grouped into three major significant clusters with similar characteristic features: cluster 1, Srirampore, Howrah (Shibpur), Garden Reach and Dakhshineswar (WQI between 24 and 28.32); cluster 2, Palta and Uluberia (WQI of 36.67 and 35.97); and cluster 3, Berhampore and Diamond Harbour (WQI of 62.25 and 62.77), as presented in Fig. 2. Cluster 3 (the most significant grouping) was characterized by a larger Euclidean linkage distance than the other two clusters. CA is therefore useful for reliably classifying the water quality index across the whole river basin and can help design a future spatial sampling strategy in an optimal manner, so that the number of sampling sites in the monitoring network can be reduced.
3.3. Development of ANN model
Machine learning techniques such as artificial neural networks (ANNs) have recently gained prominence as powerful tools for data modelling and simulation, and can be useful in ecological studies (Moghaddam and Khajeh, 2011; Recknagel, 2001; Sinha et al., 2012). The ANN must be generated and optimized to find the model configuration that gives the lowest training error with minimal computing time. In the present study, an ANN-based model was developed to describe the average calculated water quality index of all eight sampling stations along the Hooghly river basin, using both the CCME and DELPHI techniques, for 2002-2008. All analyses were based on the calculated data set. The training procedure is repeated until the errors become small enough and the correlation coefficient (R) between the model predictions and the experimental results approaches 1. After training, the ANN can be validated using independent data (Tokar and Johnson, 1999). The goodness of fit of the trained network is shown in Fig. 3 and Fig. 4: the regression plot for the DELPHI method (Fig. 4) has a correlation coefficient of 0.987, compared with 0.954 for the CCME method (Fig. 3).
Fig. 3. Regression plot of WQI (Experimental vs. Predicted) using the CCME method with five input variables, ten processing elements in the hidden layer, and one output variable
Fig. 4. Regression plot of WQI (Experimental vs. Predicted) using the DELPHI method with five input variables, ten processing elements in the hidden layer, and one output variable
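The performance of the DELPHI and CCME methods was also analysed statistically by the root mean squared error (RMSE), coefficient of determination (R2) and absolute average deviation (AAD) as follows (Geyikci et al., 2012):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i,\mathrm{pred}} - y_{i,\mathrm{exp}}\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_{i,\mathrm{pred}} - y_{i,\mathrm{exp}}\right)^2}{\sum_{i=1}^{n}\left(y_{i,\mathrm{exp}} - \bar{y}_{\mathrm{exp}}\right)^2}$$

$$\mathrm{AAD}\,(\%) = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_{i,\mathrm{pred}} - y_{i,\mathrm{exp}}}{y_{i,\mathrm{exp}}}\right|$$

where n is the number of data points, $y_{i,\mathrm{pred}}$ is the value predicted by the ANN, $y_{i,\mathrm{exp}}$ is the actual WQI value calculated by the CCME or DELPHI method, and the overbar denotes the average of the corresponding values. Table 4 presents the statistical comparison between the CCME and DELPHI techniques. In the present study, both methods provided good determinations of the water quality of the Hooghly river, but the DELPHI method clearly outperformed the CCME method both in data fitting during ANN model development and in estimation capability.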
Table 4. Comparison of the DELPHI and CCME methods for determining WQI
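The three comparison statistics can be computed as below. This is a sketch assuming the usual definitions: RMSE as the root of the mean squared difference, R2 as one minus the ratio of the residual sum of squares to the total sum of squares of the calculated values, and AAD as the mean absolute relative deviation in percent. The toy vectors in the usage block are hypothetical, not the study's validation data.

```python
import math


def rmse(pred, actual):
    """Root mean squared error between ANN predictions and calculated WQI values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def r_squared(pred, actual):
    """Coefficient of determination: 1 - SS_res / SS_tot (SS_tot about the mean of actual)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(pred, actual))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def aad(pred, actual):
    """Absolute average deviation in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((p - a) / a) for p, a in zip(pred, actual)) / len(actual)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0]   # hypothetical ANN outputs
    calculated = [1.0, 2.0, 4.0]  # hypothetical calculated WQI values
    print(rmse(predicted, calculated), r_squared(predicted, calculated), aad(predicted, calculated))
```

A better method shows lower RMSE and AAD and an R2 closer to 1 on the same validation data.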
It therefore appears more rational and reliable to calculate the annual water quality index from the five parameters (pH, dissolved oxygen, biochemical oxygen demand, total coliform and faecal coliform) at the eight sampling stations using the DELPHI process.
The present case study of water quality at various stations along the river Hooghly shows that the water of the Hooghly is somewhat polluted by pollutants received from various industrial and domestic sources. Both the DELPHI and CCME methods were applied to the monthly average data for each year. Hierarchical cluster analysis and the developed ANN model were applied to the WQI of the Hooghly river basin, which has very poor water quality. Overall, water quality ranges from poor to marginal, depending on the river reach and sampling year. Hierarchical cluster analysis grouped the eight sampling stations into three major clusters with similar characteristics based on the water quality index calculated by the DELPHI method. The WQI was formulated by both the DELPHI and CCME techniques, and the root mean square error (RMSE), coefficient of determination (R2) and absolute average deviation (AAD) were used together to compare the performance of the two methods. The DELPHI method was found to have higher predictive capability than the CCME method.