Application Of Weather Generator For Environmental Parameters Estimation Biology Essay



The major source of water for the Indus basin is snowmelt runoff from the northern areas of Pakistan. Agriculture and hydropower depend mainly on this snowmelt runoff. Temperature is the major parameter driving snowmelt, and precipitation is the source of both snowfall and rainfall runoff. For better management and planning of the country's water resources, the prediction of environmental parameters is of great importance. A stochastic model was used to generate weather parameters: the LARS-WG model was applied in the northern areas to predict minimum and maximum temperature and rainfall. The model was tested using different statistical tests, such as the F-test, t-test, Mann-Whitney test and Levene's test, to assess climate parameters on different time scales. This is the first such attempt in Pakistan, and it should help researchers to use stochastic techniques effectively for planning and management of water resources in the country.

Key words: Weather Generator, LARS-WG, F-test, T-test, Climate parameters, stochastic model


Stochastic weather generators are numerical models which produce long-term synthetic daily time series of climate variables such as temperature, precipitation and solar radiation (Richardson, 1981; Richardson and Wright, 1984; Racsko et al., 1991). The climate data generated by stochastic weather generators have been widely used for hydrological applications, environmental management, water quality, erosion and agricultural risk management (Soltani and Hoogenboom, 2003; Zhang, 2004; Yu, 2003, 2005; Zhang, 2005).

Chineke et al. (2000) conducted a study at 17 stations in Nigeria to check the dependence pattern of daily weather variables such as daily maximum and minimum temperature. Daily maximum and minimum temperatures had average lag-one serial correlation coefficients of 0.833 and 0.802 respectively. The correlation between maximum and minimum temperature averaged 0.387 and varied with season and location. The results of Student's t-test showed that the persistence within maximum and minimum temperature and their interdependence did not differ significantly 95% of the time at most of the study sites. The use of these models for forecasting was stressed. Semenov (1999) found that the stochastic weather generator LARS-WG was valid for Europe and performed well in simulating different weather statistics, including the climatic extremes relevant to agriculture. Chineke et al. (1999) noted that the long-term time series of daily meteorological data required in many fields are not always available or appropriate for use at many locations. The WGEN weather generator model was evaluated for its ability to reproduce daily observed data at 17 sites in Nigeria. The Wilcoxon Mann-Whitney U-test showed that the number of months for which the difference between observed and WGEN-generated data was significant was less than 4 for most of the study sites. According to Semenov et al. (1998), stochastic weather generators are used in hydrological, agricultural and environmental management studies. These studies require long-term time series for risk assessment, and the generators can produce weather time series of any length. The performance of a generator must be tested, and the accuracy required depends on the application of the data. Two commonly used weather generators, LARS-WG and WGEN, were compared at 18 sites in the USA, Europe and Asia using different statistical tests. The data generated by LARS-WG matched the observed data more closely.
The implications for use and development of weather generators were also discussed.

Barrow and Semenov (1995) described a method of producing high-resolution scenarios based on regression downscaling techniques linked with a stochastic weather generator. To construct the climate change scenarios, the Meteorological Office high-resolution GCM transient experiment (UKTR) was used. Site-specific, UKTR-derived changes in a number of weather statistics were used to perturb the parameters of the stochastic weather generator (LARS-WG), which was initially calibrated using observed daily climate data. The data required by crop growth simulation models were simulated by LARS-WG. This method permits changes to a wider set of climate parameters in the scenarios, including variability. Results were presented for two European sites. Soltani et al. (2000) evaluated the ability of the WGEN model to generate long-term weather series in situations where the historic weather record was only 3-10 years long. The generated series were used to simulate the yield of irrigated and rainfed chickpea. To do so, four 100-year samples of weather data were generated for Tabriz, Iran, with WGEN parameterized from historic weather data of 3, 5, 7 and 10 years. Each of the actual and generated weather series was used as input to a chickpea crop model under irrigated and rainfed conditions at three planting dates. Results showed that the generated data were very similar to the actual data used for parameter estimation for all the base periods tested. However, the means and distributions of the generated data and the long-term historic data differed significantly. As the number of base years increased from 3 to 10, the percentage of significant differences was 38, 26, 17 and 13% respectively. A longer base period is therefore needed to generate data similar to the long-term record. When the purpose is to generate recent historic data rather than a long-term period, WGEN can be used as a reliable source of weather data.
Zhang et al. (2004) used the WGEN weather generator to simulate daily maximum and minimum air temperature, precipitation and solar radiation for six Canadian climate stations. To evaluate the WGEN model, the observed data were compared with the WGEN-simulated daily data. The comparisons between observed and WGEN-generated data produced, in general, statistically significant correlations for maximum air temperature, solar radiation and precipitation.

Study Area

The application of stochastic weather generators in the water sector is of great importance because water has a vital role in sustaining the quality of life on earth. This precious commodity plays a basic role in all sectors of the economy. In Pakistan its importance is greater still, because the economic life of the country depends on agriculture. Most of the fresh water originates from the northern part of the Indus basin, which feeds the entire Indus Basin Irrigation System. The major sources of water are snowmelt and rainfall, which are mainly affected by climate variables such as temperature and precipitation. The climate parameters change on both temporal and spatial scales. Long data records are required for planning and management of water resources projects, but most of the time such records are not available. To overcome this constraint, weather generators are efficient tools for generating the long-term time series of weather parameters required for water resources management in Pakistan.

The study was conducted on the upper Indus catchment. The Indus catchment is divided into a number of sub-catchments, in which meteorological stations have been established to measure climate parameters such as temperature, wind velocity and precipitation. Nine sites with different elevations were selected for the study. The elevations vary from 614 m at Kotli to 2317 m at Skardu. The meteorological stations are located at the lower elevations in the valleys. A detailed description of the locations and other parameters is given in Table 1.

Table 1. Locations and elevations of the selected sites (source: SWHP, 2002)

Data Analysis

Measurement of climate data (precipitation, temperature, humidity and wind velocity) is the primary responsibility of the Pakistan Meteorological Department. Twenty-five years of daily precipitation and maximum and minimum temperature data for nine sites in the northern areas of Pakistan were collected from the Pakistan Meteorological Department. The rainfall and temperature data of the Astor site for the 25 years from 1981 to 2005 were analyzed and show an increasing trend in maximum and minimum temperatures (Figure 1). Over the 25 years, maximum temperature increased by 0.73 °C and minimum temperature by 0.2 °C. The rainfall trend was also analyzed; Figure 2 shows an increasing trend in rainfall from 1981 to 1995, followed by a decreasing trend from 1996 to 2005.

Figure 1. Annual and five year running mean temperature at Astor from 1981 to 2005

Figure 2. Annual and five year running mean rainfall at Astor from 1981 to 2005

Model Description

The weather generator model LARS-WG utilizes semi-empirical distributions for the lengths of wet and dry day series, daily precipitation and daily solar radiation. The semi-empirical distribution $\mathrm{Emp} = \{a_0, a_i; h_i, i = 1, \ldots, 10\}$ is a histogram with ten intervals, $[a_{i-1}, a_i)$, where $a_{i-1} < a_i$, and $h_i$ denotes the number of events from the observed data in the $i$-th interval. Random values from the semi-empirical distributions are chosen by first selecting one of the intervals (using the proportion of events in each interval as the selection probability), and then selecting a value within that interval from the uniform distribution. Such a distribution is flexible and can approximate a wide variety of shapes by adjusting the intervals $[a_{i-1}, a_i)$. The cost of this flexibility, however, is that the distribution requires 21 parameters (11 values denoting the interval bounds and 10 values indicating the number of events within each interval) to be specified, compared with, for example, 3 parameters for the mixed-exponential distribution used in an earlier version of the model to define the dry and wet day series (Racsko et al., 1991).
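The two-step sampling described above can be illustrated with a minimal Python sketch. It is not the LARS-WG source code; the function names `build_semi_empirical` and `sample_semi_empirical` and the example bounds are hypothetical, chosen only to show how a histogram with counts per interval can be sampled.

```python
import random

def build_semi_empirical(observations, bounds):
    """Count observed events h_i in each interval [a_{i-1}, a_i)."""
    counts = [0] * (len(bounds) - 1)
    for x in observations:
        for i in range(len(counts)):
            if bounds[i] <= x < bounds[i + 1]:
                counts[i] += 1
                break  # values outside all intervals are ignored
    return counts

def sample_semi_empirical(bounds, counts, rng):
    """Draw one value: choose an interval with probability proportional
    to its count, then draw uniformly inside that interval."""
    i = rng.choices(range(len(counts)), weights=counts, k=1)[0]
    return rng.uniform(bounds[i], bounds[i + 1])

# Hypothetical example: 11 bounds define 10 intervals (21 parameters in total).
bounds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
observed = [0.5, 0.7, 1.2, 3.3, 3.9, 9.5]
counts = build_semi_empirical(observed, bounds)
value = sample_semi_empirical(bounds, counts, random.Random(1))
```

Intervals with a zero count are never selected, so the sampled values follow the shape of the observed histogram.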

Precipitation occurrence is modeled as alternating wet and dry series, where a wet day is defined as a day with precipitation > 0.0 mm (Semenov, 1998). The length of each series is chosen randomly from the wet or dry semi-empirical distribution for the month in which the series starts. In determining the distributions, observed series are also allocated to the month in which they start. For a wet day, the precipitation value is generated from the semi-empirical precipitation distribution for the particular month, independent of the length of the wet series or the amount of precipitation on previous days.

Daily minimum and maximum temperatures are considered as stochastic processes with daily means and daily standard deviations conditioned on the wet or dry status of the day. The technique used to simulate the process is very similar to that presented in Racsko et al. (1991). The seasonal cycles of means and standard deviations are modeled by finite Fourier series of order 3, and the residuals are approximated by a normal distribution. The Fourier series for the mean is fitted to the observed mean values for each month. Before fitting the standard deviation Fourier series, the observed standard deviations for each month are adjusted to give an estimated average daily standard deviation by removing the estimated effect of the changes in the mean within the month. The adjustment is calculated using the fitted Fourier series already obtained for the mean. The observed residuals, obtained by removing the fitted mean value from the observed data, are used to estimate the time autocorrelation of minimum and maximum temperatures. For simplicity, both autocorrelations are assumed to be constant through the whole year for both dry and wet days, with the average value from the observed data being used. Minimum and maximum temperature residuals have a pre-set cross-correlation of 0.6. Occasionally, the simulated minimum temperature is greater than the simulated maximum temperature, in which case the program replaces the minimum temperature by the maximum less 0.1.
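Two pieces of the temperature scheme are easy to show in code: evaluating a Fourier series of order 3 for the seasonal cycle, and the correction applied when the simulated minimum exceeds the maximum. The sketch below is an illustration only; the function names, the coefficient layout and the 365-day period are assumptions, not the LARS-WG implementation.

```python
import math

def fourier_order3(day_of_year, a0, a, b, period=365.0):
    """Evaluate a0 + sum_{k=1..3} [a_k cos(2*pi*k*d/T) + b_k sin(2*pi*k*d/T)].

    a and b are hypothetical lists holding the three cosine and three
    sine coefficients; period T = 365 days is an assumption.
    """
    value = a0
    for k in range(1, 4):
        angle = 2.0 * math.pi * k * day_of_year / period
        value += a[k - 1] * math.cos(angle) + b[k - 1] * math.sin(angle)
    return value

def enforce_min_below_max(tmin, tmax):
    """If the simulated minimum exceeds the maximum, replace the
    minimum by the maximum less 0.1, as described in the text."""
    if tmin > tmax:
        tmin = tmax - 0.1
    return tmin, tmax
```

The same `fourier_order3` form can serve for both the daily mean and the daily standard deviation, each with its own coefficients for wet and dry days.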

Model Application

As mentioned earlier, nine sites on the upper Indus basin were selected for this study. Twenty-five years (1981-2005) of daily data were used as input for LARS-WG. The generator can use its parameter file to generate a time series of synthetic data of any length. For each of the nine sites, 250 years of daily weather data were generated using LARS-WG. Statistical analysis was carried out to test the validity of the model results. With longer data series, statistical tests have more power to produce significant results when there is a difference between the observed and simulated data (Semenov et al., 1998). To test the results, the simulated data were divided into ten spans, each twenty-five years in length. Each span was compared with the observed data.
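The span-wise comparison can be sketched as follows. The helper names `split_into_spans` and `count_not_rejected` are hypothetical, and the sketch assumes one record per simulated year and a 5% significance level for counting the tests that are not rejected.

```python
def split_into_spans(yearly_records, span_length=25):
    """Split a list with one entry per simulated year into
    consecutive, non-overlapping spans of equal length."""
    if len(yearly_records) % span_length != 0:
        raise ValueError("series length must be a multiple of span_length")
    return [yearly_records[i:i + span_length]
            for i in range(0, len(yearly_records), span_length)]

def count_not_rejected(p_values, alpha=0.05):
    """Number of spans whose test was not rejected at level alpha."""
    return sum(1 for p in p_values if p >= alpha)

# 250 simulated years give ten spans of 25 years each.
spans = split_into_spans(list(range(250)))
```

Each span would then be tested against the 25 observed years, and the count of non-rejected tests out of ten reported per site and month.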

Precipitation occurrence provides a basis for the other generated variables and is therefore a critical component of the weather generator. Different statistical tests were used to compare a variety of characteristics of the data. It is important not only for the simulated data to be similar to the observed data on average; the distributions of observed and simulated data should also be similar across their whole range. The Kolmogorov-Smirnov test (KS-test) was used to compare the probability distributions for each month. The KS-test is a non-parametric, distribution-free test of whether two data sets differ significantly, i.e. whether they come from different distributions. It is an alternative to the Chi-square goodness of fit test. The KS-test compares the two empirical distribution functions as follows:


$$D = \max_x \left| E_1(x) - E_2(x) \right|$$

where $E_1$ and $E_2$ are the empirical distribution functions of the two samples.
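As an illustration, the two-sample KS statistic (the largest absolute gap between two empirical distribution functions) can be computed with a few lines of standard-library Python. This is a sketch of the statistic only; the function name `ks_statistic` is hypothetical and no p-value is computed.

```python
from bisect import bisect_right

def ks_statistic(sample1, sample2):
    """Largest absolute difference between the two empirical
    distribution functions, evaluated at every data point."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    for x in s1 + s2:
        e1 = bisect_right(s1, x) / n1  # fraction of sample1 <= x
        e2 = bisect_right(s2, x) / n2  # fraction of sample2 <= x
        d = max(d, abs(e1 - e2))
    return d
```

Because both empirical distribution functions are step functions that change only at data points, checking the pooled data points is enough to find the maximum.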

On a daily basis, non-parametric tests were used because the precipitation data were not normally distributed. The Mann-Whitney U-test was used to compare central tendency. It is an alternative to the independent-groups t-test when the assumption of normality is not met. Unlike the t-test, it is a non-parametric test and makes no assumptions about the distribution of the data. Like many non-parametric tests, it uses the ranks of the data rather than their raw values to calculate the statistic. Because it makes no distributional assumption, it is not as powerful as the t-test when the data are in fact normally distributed.

The null hypothesis to be tested is that the two samples are from identical populations. The test statistic for the Mann-Whitney test is U. This value is compared to a table of critical values for U based on the sample size of each group. If U exceeds the critical value for U at some significance level (usually 0.05) it means that there is evidence to reject the null hypothesis in favor of the alternative hypothesis. For sample sizes greater than 8, a z-value can be used to approximate the significance level for the test. In this case, the calculated z is compared to the standard normal significance levels.

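$$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$

where $R_1$ is the sum of the ranks of the observed data, $R_2$ is the sum of the ranks of the generated data, $n_1$ and $n_2$ are the corresponding sample sizes, and $U$ is the smaller of $U_1$ and $U_2$. The normal approximation is

$$z = \frac{U - \mu_U}{S_U}, \qquad \mu_U = \frac{n_1 n_2}{2}, \qquad S_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

where $\mu_U$ and $S_U$ are the mean and standard deviation of $U$ under the null hypothesis.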
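A minimal sketch of the rank-sum calculation and its normal approximation is given below. It assumes the common textbook form of U (smaller of the two group statistics) and applies no tie correction to the standard deviation; the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney(observed, generated):
    """Return (U, z): the smaller U statistic and its normal approximation."""
    n1, n2 = len(observed), len(generated)
    ranks = average_ranks(list(observed) + list(generated))
    r1 = sum(ranks[:n1])   # rank sum of the observed data
    r2 = sum(ranks[n1:])   # rank sum of the generated data
    u1 = n1 * n2 + n1 * (n1 + 1) / 2.0 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2.0 - r2
    u = min(u1, u2)
    mean_u = n1 * n2 / 2.0
    s_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # no tie correction
    return u, (u - mean_u) / s_u
```

With U taken as the smaller of the two values, z is never positive, so the comparison with the standard normal significance levels uses its absolute value.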


Levene's test was used to compare variability. It has the advantage of being less sensitive to deviations from normality than the classical variance tests and is widely used as a test of homogeneity of variance. The test statistic, which approximately follows an F distribution with (k-1) and (N-k) degrees of freedom, is computed as follows:



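$$W = \frac{(N - k)}{(k - 1)} \cdot \frac{\sum_{i=1}^{k} N_i \left( \bar{Z}_{i.} - \bar{Z}_{..} \right)^2}{\sum_{i=1}^{k} \sum_{j=1}^{N_i} \left( Z_{ij} - \bar{Z}_{i.} \right)^2}$$

$$Z_{ij} = \left| Y_{ij} - \tilde{Y}_i \right|, \qquad \bar{Z}_{i.} = \frac{1}{N_i} \sum_{j=1}^{N_i} Z_{ij}, \qquad \bar{Z}_{..} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{N_i} Z_{ij}$$

where $k$ is the number of groups (here the observed and the generated data), $N_i$ is the number of values in group $i$, $N$ is the total number of values, $Y_{ij}$ is the $j$-th value in group $i$ and $\tilde{Y}_i$ is the median of group $i$.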
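The Levene statistic can be sketched in Python as a one-way analysis of variance on absolute deviations from the group centre. The sketch below assumes the median-centred variant; the function name `levene_statistic` is hypothetical and only the statistic, not its p-value, is computed.

```python
from statistics import mean, median

def levene_statistic(groups):
    """Levene's W computed from absolute deviations about each group median."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        centre = median(g)
        z.append([abs(y - centre) for y in g])
    z_group_means = [mean(zi) for zi in z]
    z_overall = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_overall) ** 2
                  for zi, zm in zip(z, z_group_means))
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

Two groups with the same spread but different locations give W = 0, which shows that the statistic responds to differences in variability rather than in level.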

Characteristics of observed and simulated monthly and annual total precipitation data were compared by taking means, standard deviations, skewness and kurtosis for all the months and for each span. Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.


$$\text{skewness} = \frac{\sum_{i=1}^{N} \left( Y_i - \bar{Y} \right)^3}{(N - 1)\, s^3}$$

where $\bar{Y}$ is the mean, $s$ is the standard deviation, and $N$ is the number of data points. The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail.

Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak.


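$$\text{kurtosis} = \frac{\sum_{i=1}^{N} \left( Y_i - \bar{Y} \right)^4}{(N - 1)\, s^4}$$

where $\bar{Y}$ is the mean, $s$ is the standard deviation, and $N$ is the number of data points.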

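The kurtosis for a standard normal distribution is three. For this reason, the following definition of kurtosis (excess kurtosis) is used:

$$\text{kurtosis} = \frac{\sum_{i=1}^{N} \left( Y_i - \bar{Y} \right)^4}{(N - 1)\, s^4} - 3$$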


This definition is used so that the standard normal distribution has a kurtosis of zero. In addition, with the second definition positive kurtosis indicates a "peaked" distribution and negative kurtosis indicates a "flat" distribution.
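Both shape measures can be computed directly from their definitions. The sketch below assumes the (N - 1) divisor with the sample standard deviation, which is one of several conventions in use, and returns the kurtosis with 3 subtracted; the function name `skewness_and_excess_kurtosis` is hypothetical.

```python
import math

def skewness_and_excess_kurtosis(data):
    """Return (skewness, kurtosis - 3) using the (N - 1) divisor
    and the sample standard deviation s."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((y - mean) ** 2 for y in data) / (n - 1))
    skew = sum((y - mean) ** 3 for y in data) / ((n - 1) * s ** 3)
    kurt = sum((y - mean) ** 4 for y in data) / ((n - 1) * s ** 4)
    return skew, kurt - 3.0

# Hypothetical monthly totals with one large value in the right tail.
skew, excess = skewness_and_excess_kurtosis([1.0, 1.0, 1.0, 1.0, 10.0])
```

A symmetric sample such as 1, 2, 3, 4, 5 gives a skewness of zero, while the example with a single large value gives a positive skewness (long right tail).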

Means of the observed and simulated monthly total precipitation were compared using the t-test. It is a parametric test which assumes that the populations from which the samples are drawn are normally distributed. It tests the hypothesis that the samples came from populations with equal means, and it has more power than a non-parametric test to produce significant results when a difference exists. The test statistic is computed as follows:


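$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

with $\nu = n_1 + n_2 - 2$, where $\nu$ is the number of degrees of freedom (d.f.).

Here $\bar{x}_1$ and $\bar{x}_2$ are the means of the observed and generated data respectively, $n_1$ and $n_2$ are the numbers of observations of the observed and generated data, and $S_p^2$ is the combined estimate of the common variance $\sigma^2$, obtained from the sample variances $S_1^2$ and $S_2^2$:

$$S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}} \qquad (10)$$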

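The null hypothesis is rejected if the absolute value of the t-statistic exceeds the critical value at the specified level of significance ($\alpha$). When the variances are not equal, the t-test comparing means is based on the separate sample variances $S_1^2$ and $S_2^2$:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$

with

$$\nu = \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2}{\frac{\left( S_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( S_2^2 / n_2 \right)^2}{n_2 - 1}} \qquad (12)$$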
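The two forms of the t-test can be sketched with the standard library as shown below. The function names are hypothetical, the unequal-variance form assumes the usual Welch-Satterthwaite degrees of freedom, and only the statistic and its degrees of freedom are returned (a critical value or p-value would come from t tables or a statistics package).

```python
import math
from statistics import mean, variance

def pooled_t(x1, x2):
    """Equal-variance t-statistic and its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    sp = math.sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2))
                   / (n1 + n2 - 2))
    t = (mean(x1) - mean(x2)) / (sp * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2

def welch_t(x1, x2):
    """Unequal-variance t-statistic with Welch-Satterthwaite d.f."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2
    t = (mean(x1) - mean(x2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

When the two samples have the same size and the same variance, both functions return the same t and the same degrees of freedom, which is a quick sanity check on an implementation.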

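That is why the variances are compared first. The F-test was applied to the variances of all values for each month across all the years; this measures the inter-annual variability. The F-distribution does not depend on the population variance but only on the two degrees-of-freedom parameters $\nu_1$ and $\nu_2$. The procedure for the F-test is as follows:

$$F = \frac{S_1^2}{S_2^2} \quad \left( \text{assuming that } S_1^2 \text{ is larger than } S_2^2 \right) \qquad (13)$$

where $S_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} \left( x_{1i} - \bar{x}_1 \right)^2$ and $S_2^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} \left( x_{2i} - \bar{x}_2 \right)^2$ are the sample variances, with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.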
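A short sketch of the variance-ratio calculation is shown below. It places the larger sample variance in the numerator, as the procedure assumes, and returns the matching degrees of freedom; the function name `f_ratio` is hypothetical.

```python
from statistics import variance

def f_ratio(x1, x2):
    """Return (F, numerator d.f., denominator d.f.) with the larger
    sample variance in the numerator so that F >= 1."""
    v1, v2 = variance(x1), variance(x2)
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    return v2 / v1, len(x2) - 1, len(x1) - 1

# Hypothetical monthly totals for an observed and a simulated span.
f, df_num, df_den = f_ratio([1.0, 2.0, 3.0, 4.0, 5.0],
                            [2.0, 4.0, 6.0, 8.0, 10.0])
```

In this example the second sample has four times the variance of the first, so F = 4 with 4 and 4 degrees of freedom.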

The F-test and t-test both assume that the observed and simulated data are random samples from existing distributions, and they test the null hypothesis that the two distributions are the same. Each test produces a p-value: the probability of obtaining a difference at least as large as the one observed if both data sets really came from the same distribution (no difference between observed and simulated data for that variable). A low p-value indicates that the two data sets are unlikely to come from the same distribution (i.e. the model is not performing well for that variable). Similarly, for the annual total amount of precipitation the same characteristics were computed, and both the t-test and the F-test were performed for each span at the nine sites.

An additional test was performed only for annual total amount of precipitation to test the independence of data sets. Lag-one autocorrelation coefficients were computed.

The autocorrelation function (ACF) is the plot of autocorrelations and is very useful when examining stationarity and when selecting from among various non-stationary models. Autocorrelation is one of the major tools in time series modeling.

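Given measurements $Y_1, Y_2, \ldots, Y_N$ at times $X_1, X_2, \ldots, X_N$, the lag $k$ autocorrelation function is defined as

$$r_k = \frac{\sum_{i=1}^{N-k} \left( Y_i - \bar{Y} \right) \left( Y_{i+k} - \bar{Y} \right)}{\sum_{i=1}^{N} \left( Y_i - \bar{Y} \right)^2}$$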


Autocorrelation is a correlation coefficient. However, instead of correlation between two different variables, the correlation is between two values of the same variable at times Xi and Xi+k. When the autocorrelation is used to detect non-randomness, it is usually only the first (lag-1) autocorrelation that is of interest.

We test the null hypothesis that the autocorrelation coefficient is equal to zero, i.e. that the data were generated by a random process. For autocorrelations, we examine the t-statistic (T) for a particular lag to test whether or not the corresponding autocorrelation coefficient equals zero. One commonly used rule is that a t-statistic greater than 2 in absolute value indicates that the corresponding autocorrelation is not equal to zero.
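The lag-k autocorrelation and a simple t-statistic can be sketched as follows. The standard error used here, 1/sqrt(N), is a common large-sample approximation and is an assumption of this sketch; the text does not state which formula was used. The function names are hypothetical.

```python
import math

def autocorrelation(series, lag=1):
    """Lag-k autocorrelation coefficient of a series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    den = sum((y - mean) ** 2 for y in series)
    return num / den

def autocorrelation_t(series, lag=1):
    """t-statistic r_k / SE, assuming SE is approximately 1/sqrt(N)."""
    return autocorrelation(series, lag) * math.sqrt(len(series))

# A strictly alternating series is strongly negatively autocorrelated.
t_value = autocorrelation_t([1.0, -1.0] * 5)
```

For the alternating example the lag-one coefficient is -0.9 and the t-statistic is below -2, so the rule of thumb would flag the series as non-random.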

Results and Discussions

Medians of the observed and LARS-WG generated daily data sets were compared using the Mann-Whitney U-test. Each span was compared with the observed data set. As there were 10 spans, the number of successes out of 10 for each site on a daily basis is shown in Table 2. LARS-WG showed moderate performance, but at some sites there were months in which not a single test accepted that the medians of the two data sets were the same. Similarly, Levene's test (Table 3), comparing the variances of the data sets, showed good performance of LARS-WG except at Bunji, where there were six months in which all tests were rejected.

Means of the observed and LARS-WG generated maximum temperature on a monthly and annual basis were compared using the t-test, as maximum temperature was normally distributed on these time scales. Each span was compared with the observed data set. LARS-WG performed very well, especially on an annual basis, where not a single test was rejected at any site. The variances, however, were significantly different, as shown in Table 4: all tests for all sites were rejected at the 5% level of significance (except that a single test was not rejected at Astore and Chilas). LARS-WG performed well on a monthly basis; the means and variances of the observed and simulated data sets were not significantly different, as shown in Table 5. At most of the sites only a few tests were rejected.

Medians of the observed and LARS-WG generated daily minimum temperature were compared. Many tests were rejected at the 5% level of significance, as shown in Table 6, which means that LARS-WG did not reproduce the medians of the observed daily minimum temperature. The same was the case with Levene's test: the variation of values about their median was also significantly different at most of the sites, as shown by the Levene's test results in Table 7.

Means of the observed and LARS-WG generated minimum temperature on a monthly and annual basis were compared using the t-test, and the numbers of successes are shown in Table 8. Each span was compared with the observed data set. Means of the observed and simulated annual minimum temperature were statistically in strong agreement, as all the tests for all sites were accepted at the 95% confidence level. Means of monthly minimum temperature were also in strong agreement, as shown in Table 9. The difference in variability between the observed and simulated data sets was statistically significant on an annual basis, as shown in Table 10. LARS-WG performed well on a monthly basis, where there was less inter-annual variability; at most of the sites only a few F-tests were rejected.

Precipitation is a very important weather factor, as the other generated weather variables depend upon it. The results of comparing observed and simulated daily precipitation data with Levene's test and the Mann-Whitney test for each month are shown in Tables 11 and 12. The Mann-Whitney test produced some significant values, and Levene's test also produced some significant values for each month.

Means of the observed and LARS-WG generated monthly and annual total precipitation were compared using the t-test, and the numbers of successes are shown in Table 12. Means of the observed and simulated annual total precipitation were statistically in strong agreement, as no test for any site was rejected at the 5% level of significance. Means of monthly total precipitation were also in strong agreement, with the minimum number of successes at Gari Dopata for the month of June. The difference in variability between the observed and simulated data sets was statistically significant on an annual basis, while on a monthly basis the performance was better, as shown in Table 13.

The trend of observed annual total precipitation for the nine sites is shown in the figures. There was no pattern in annual rainfall. The autocorrelation results also revealed no significant autocorrelation among the data points (except at Skardu, where one span showed significant autocorrelation). Some tables showing the lag-one autocorrelation results are given.

Table 15: Results of Autocorrelation.


LARS-WG reproduced the means and variances of rainfall and of maximum and minimum temperature well on a monthly basis, but its performance on a daily basis was not good. CLIGEN performed well on a daily basis; on a monthly basis it reproduced the means well, but the inter-annual variances of maximum and minimum temperature were not reproduced well.

When comparing the daily observed and simulated precipitation, a common trend was that the daily variability of the simulated data was greater than that of the observed data, and Levene's test produced more significant results during the winter season. Similarly, the Mann-Whitney test also produced more significant results during the winter season. For the monthly total amount of precipitation there was high inter-annual variability, as the F-test produced several p-values less than 0.05, but the hypothesis of equal means of observed and simulated data was not rejected more than twice in any month at any site (except at Bunji, where there were 5 and 4 rejections in November and December respectively). All the F-tests showed significant variability in the annual total amount of precipitation, but none of the t-tests was significant at any site. LARS-WG is a random generator; its randomness was tested and only a single significant result was found, at Skardu.