This project begins with an extensive biography of Sir Ronald Aylmer Fisher, a genius who almost single-handedly created the foundations for modern statistical science [Hald. P]. It is therefore important to acknowledge his life, which gave statistics its identity.
Many of the basic ideas of statistics that are required to understand statistical estimation will then be discussed. When scientists carry out experiments, they use the observed data obtained from random samples to gain knowledge about the respective population. This is the underlying idea of the frequentist approach to statistical estimation. The Bayesian approach differs in that it assumes prior knowledge about the population before the observations are made. This paper gives an in-depth account of the frequentist approach to statistical estimation, which was initiated by Sir Ronald Aylmer Fisher.
There will be a formal introduction to the method of maximum likelihood estimation, which is by far the most popular technique for deriving estimators [Casella and Berger, 1990, p.289]. The method is often preferred because, under suitable regularity conditions, it produces estimators with desirable properties such as consistency and efficiency. Like any other method of point estimation it has both advantages and disadvantages, but it is still widely regarded as the more valuable approach.
This will be followed by the historical development of maximum likelihood estimation, which was consolidated by Sir Ronald Aylmer Fisher's contributions. He produced a simple recipe that purports to lead to the optimum solution for all parametric problems and beyond, and not only promises an optimum estimate, but also a simple all-purpose assessment of its accuracy [Stigler, 2007, p.598]. A brief account of the state of the method before Fisher's work will be given to show why Fisher was motivated to carry out his studies. Alongside this, Fisher's challenging mathematical work over the period 1912-1922 will be evaluated briefly in order to account for the acceptance of his method.
Furthermore, a brief account will be given of Fisher's influence on the more recent progress of the method. The properties of maximum likelihood estimators will be discussed at length and contrasted with those of estimators obtained from other methods of estimation. Estimators that satisfy the criteria of consistency, efficiency and sufficiency are known as good estimators [Hald, 2007, p. 6]. Fisher formulated these criteria after he had produced his well-defined method. This paper will show that maximum likelihood estimators possess these qualities, which gives them an advantage over other estimators.
Ronald Aylmer Fisher's Biography
Sir Ronald Aylmer Fisher was a British statistician born on 17 February 1890 in London, England. He began attending Harrow, an independent boys' school, in 1904. There he performed outstandingly in a mathematical essay competition and won the Neeld Medal in 1906. He continued his education and enrolled to study mathematics and astronomy at Cambridge University in October 1909. During his time at university he developed an interest in biology and was keen to form a Cambridge University Eugenics Society. He graduated with distinction in the Mathematical Tripos of 1912. He was again recognised for his achievements and was awarded a Wollaston studentship, which allowed him to continue his education by studying the theory of errors. It was Fisher's interest in the theory of errors that eventually led him to investigate statistical problems. During his final year as an undergraduate he wrote the paper On an absolute criterion for fitting frequency curves. This paper illustrated his interest in statistical problems and contained his initial theory of the method of maximum likelihood.
After leaving Cambridge University in 1913, he travelled to Canada and worked on a farm for a short time. He then returned to England and worked temporarily for various institutions until 1919, the year after the First World War ended. With the distress of the war over, he was in a position to settle his future career when he was offered two jobs at the same time: Karl Pearson offered him the post of chief statistician at the Galton Laboratory, and he was also offered the post of statistician at the Rothamsted Agricultural Experiment Station.
He rejected Pearson's offer and accepted the position at Rothamsted, a well-known establishment in which the effects of nutrition and soil type on plant fertility are studied. This position evidently appealed more to Fisher because of his experience of, and interest in, farming. He made many contributions to statistics and genetics whilst working there.
Up to that time he had only a few papers to his credit, including two which were most important, one in mathematical statistics, and the other in genetics, the two fields in which he ultimately made his greatest contributions [Bennet, 1990, p.xvi]. Charles Robert Darwin had formed the idea of natural selection in the 19th century, before genetics was discovered. Fisher was interested in Darwin's evolutionary theory, which concluded that individuals with advantageous variations are more likely to survive and reproduce than those without them. Fisher carried out many biological experiments, and the genetic results he obtained were analysed using statistical methods.
In 1921 he introduced the concept of likelihood, which enabled him to present a new definition of statistics. He also developed criteria for estimation by defining the required concepts of consistency, efficiency and sufficiency. Fisher published a number of important texts, in particular Statistical Methods for Research Workers (1925). This book was extremely significant because it served as a manual for the methods that he formulated whilst working at Rothamsted.
Karl Pearson, Galton Professor of Eugenics, retired from University College London in 1933. His department was divided into two: statistics and eugenics. Fisher was appointed as the new Galton Professor of Eugenics and Egon S. Pearson (Karl Pearson's son) as head of the statistics department. Fisher continued his study of genetics and statistics there for ten years. Although he again faced distressing circumstances during the Second World War, he continued with his work and contributions. In 1943, during this difficult period, he became Balfour Professor of Genetics at Cambridge University, where he had started his own educational journey. After his retirement in 1957 he spent the last few years of his life at the University of Adelaide, Australia, as a research fellow.
Fundamentals of statistics
Statistics is an important branch of applied mathematics and may be regarded as the study of observational data using mathematical tools. The observational data are obtained from an experiment carried out on a representative sample of adequate size from the population. This idea of a "population" applies not only to living creatures but also to non-living objects. In reality a population may be infinitely large, so it may be divided into distinguishable subgroups; this makes the population under study smaller and allows one to take a reasonable sample. "Just as a single observation may be regarded as an individual, and its repetition as generating a population, so the entire result of an extensive experiment may be regarded as but one of a population of such experiments." [Fisher, page 1].
If we can find a mathematical form for the population which adequately represents the data, and then calculate from the data the best possible estimates of the required parameters, then it would seem that there is little, or nothing, more that the data can tell us; we shall have extracted from it all the available relevant information.
The way that the population is distributed can be represented by a mathematical equation that involves a certain number, usually a few, of parameters or "constants" entering into the mathematical formula [1, page 1]. The population is characterized by these parameters, so knowing their precise values would tell us everything that we could possibly find out about the population. Although it is not possible to determine the precise values of these parameters, they can be approximated using statistical estimation methods. It is also important to note that, no matter how good our approximations are, the estimates (also called statistics) of the parameters will still be imprecise.
A statistic will change from sample to sample within the population. For example, if we calculate the sample mean (a statistic) for many random samples taken from the population, we will observe that the mean changes each time. We can therefore represent the distribution of this statistic mathematically. It is important to note that if we know the distribution of the sample from which the statistic is derived, then we can find the distribution of the statistic from it.
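The variation of a statistic from sample to sample can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the normal population, its mean of 10 and standard deviation of 2, the sample size of 25 and the function name sample_means are all chosen here and do not come from the sources cited.

```python
import random
import statistics

def sample_means(population_mean, population_sd, sample_size, n_samples, seed=0):
    """Draw n_samples random samples and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        # Assumed population: normal with the given mean and standard deviation
        sample = [rng.gauss(population_mean, population_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

means = sample_means(population_mean=10.0, population_sd=2.0,
                     sample_size=25, n_samples=2000)

# The sample mean differs from sample to sample; its own distribution is
# centred near the population mean with a much smaller spread.
spread_of_means = statistics.stdev(means)
print(min(means), max(means), statistics.fmean(means), spread_of_means)
```

For a normal population the distribution of the sample mean is known exactly: it is normal with the same mean and with standard deviation equal to the population standard deviation divided by the square root of the sample size, here 2/5 = 0.4. The simulated spread should be close to that value.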
The method of maximum likelihood
This is the best known, most widely used, and most important of the methods of estimation [Garthwaite, Jolliffe, and Jones, 1995, p.41]. The name of the method gives its simple idea away: we find the value of the parameter θ which maximizes the likelihood function L(θ; x) [Garthwaite, Jolliffe, and Jones, 1995, p.41]. This indicates that the likelihood function plays a fundamental role in the method. The function allows us to estimate the unknown parameters based on the known observed data obtained from random samples. This is unlike the idea of probability, which allows us to predict unknown outcomes based on known parameters [Wikipedia, likelihood function].
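A small numerical sketch may help to make the idea concrete. It assumes a binomial experiment with 7 successes in 20 trials; these data, the function names and the grid search are illustrative choices, not part of the cited texts. The data are held fixed and the likelihood is examined as a function of the unknown success probability p.

```python
import math

def log_likelihood(p, successes, trials):
    """Log-likelihood of p for binomial data (constant term dropped)."""
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

def grid_mle(successes, trials, steps=999):
    """Return the grid value of p in (0, 1) with the largest log-likelihood."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]  # 0.001, ..., 0.999
    return max(grid, key=lambda p: log_likelihood(p, successes, trials))

# Hypothetical data: 7 successes observed in 20 trials
p_hat = grid_mle(successes=7, trials=20)
print(p_hat)
```

The maximizing value agrees with the closed-form answer for this model, the observed proportion 7/20 = 0.35. A grid search is used only because it is easy to read; in practice the maximum is usually found by calculus or by a numerical optimizer.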
The method of maximum likelihood occurs in various rudimentary forms before Fisher, but not under this name [Hald, p.214]. Although the method was not then known in the form in which it is known today, it was an important idea waiting to be discovered. This raises the following question: why was it necessary to develop another method of estimation? The simple answer is that the existing methods of estimation were not good enough, although this can be interpreted in different ways depending on the nature of the estimation problem. In the process of developing new ideas related to the existing methods, Carl Friedrich Gauss (1816), Gotthilf Heinrich Ludwig Hagen (1837) and Francis Ysidro Edgeworth (1909) made contributions that implied the method of maximum likelihood. However, they all fell short in some way because their procedures lacked a satisfactory proof. Although their ideas appear in some areas of Fisher's work, it is stated that he did not know these results when he wrote his first papers on maximum likelihood [Hald, p.215].
Gauss (1809) had derived the normal distribution, which initiated the development of the method of maximum likelihood. Laplace (1812) used the method of moments to show that the residual sum of squares divided by the sample size n is a biased estimate of the variance, σ². Gauss (1823) then used his frequentist approach to the method of least squares to prove that the residual sum of squares divided by n - m, where m is the number of fitted parameters, is an unbiased estimate of σ² [Hald, p.215]. One of the biggest problems with their estimation procedure was that their estimate depends on the parameterization of the model, which means that the resulting estimate is arbitrary [Hald, p.7].
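The difference between a biased and an unbiased estimate of the variance can be checked by simulation. The sketch below assumes the simplest case, a normal sample in which only the mean is fitted, so that the two candidate estimates are the sum of squared deviations divided by n and by n - 1. The population (standard deviation 2, so variance 4), the sample size of 5 and the function name are illustrative assumptions.

```python
import random

def average_variance_estimates(sample_size, n_samples, sd=2.0, seed=1):
    """Average two variance estimates over many simulated normal samples."""
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, sd) for _ in range(sample_size)]
        mean = sum(sample) / sample_size
        sum_sq = sum((x - mean) ** 2 for x in sample)
        total_div_n += sum_sq / sample_size                 # divisor n
        total_div_n_minus_1 += sum_sq / (sample_size - 1)   # divisor n - 1
    return total_div_n / n_samples, total_div_n_minus_1 / n_samples

avg_div_n, avg_div_n_minus_1 = average_variance_estimates(sample_size=5, n_samples=20000)
print(avg_div_n, avg_div_n_minus_1)  # the true variance is sd**2 = 4.0
```

On average the estimate with divisor n falls short of the true variance by the factor (n - 1)/n, here 4/5, while the estimate with divisor n - 1 averages close to 4.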
Fisher wanted to create a new method which gave estimates that are invariant to parameter transformation. This consideration would have eliminated such criteria as that the estimate should be "unbiased" [Fisher, 1973, p.146]. He introduced his new method of estimation in the paper On an absolute criterion for fitting frequency curves [Fisher, 1912]. The paper begins with his criticism of the existing methods: the method of least squares, because he thought it was not applicable to frequency curves and arbitrariness arises in the scaling of the abscissa line [Fisher]; and the method of moments, because it is arbitrary and does not define a rule for choosing which moments to use in the estimating equations. Although he introduced the method in this paper, it is not apparent to the reader that he is stating a new method, because he seems to be using a modified version of the existing inverse probability methods. Hence the paper was not given much importance at the time. This set him on a mission to find out what had caused this disapproval and misunderstanding.
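Invariance to parameter transformation can be illustrated numerically. The sketch below assumes hypothetical binomial data (7 successes in 20 trials) and fits the same model twice, once in terms of the success probability p and once in terms of the odds p / (1 - p). The ternary search routine and all names are illustrative choices.

```python
import math

def maximize(f, lo, hi, iterations=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return (lo + hi) / 2.0

successes, trials = 7, 20  # hypothetical data

def loglik_p(p):
    """Binomial log-likelihood in terms of the success probability p."""
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

def loglik_odds(odds):
    """The same log-likelihood, reparameterized by odds = p / (1 - p)."""
    return successes * math.log(odds) - trials * math.log(1.0 + odds)

p_hat = maximize(loglik_p, 1e-6, 1.0 - 1e-6)
odds_hat = maximize(loglik_odds, 1e-6, 100.0)
print(p_hat, odds_hat, p_hat / (1.0 - p_hat))
```

Maximizing over the odds directly gives the same answer as transforming the estimate of p, so the maximum likelihood estimate does not depend on which parameterization is chosen. Unbiasedness, by contrast, is generally not preserved under nonlinear transformations of the parameter.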
After years of study he discovered the distinction between the concepts of likelihood and inverse probability. This was a breakthrough for his method. In 1921 Fisher became the first to introduce the concept of likelihood: he gave a formal definition and explained the importance of the likelihood function in his method. This concept was also one of his great achievements in the development of statistics. It allowed him to rectify the mistake in the poorly received paper On an absolute criterion for fitting frequency curves [Fisher, 1912], in which he had initially presented the method as derived, incorrectly, from the idea of inverse probability: "I must indeed plead guilty in my original statement of the Method of Maximum Likelihood (1912) to having based my argument upon the principle of inverse probability" [Fisher]. Once he had introduced the concept of likelihood he corrected the misinterpretation of that paper. This led to his well-defined method of estimation in 1922.
At that point the method of maximum likelihood was the talk of the town in statistics, and it led to less interest in inverse probability methods. Fisher said that the theory of inverse probability is founded upon an error, and must be wholly rejected [Fisher, p.9]. By this he meant that it is not appropriate to reverse the idea of probability to make estimations. This must not be misunderstood to mean that we cannot draw inferences about populations from the knowledge of a sample.
Properties of statistics
The distribution of a statistic allows us to choose the most suitable statistic to use for estimation. Hence, from the behaviour of their distributions, we can separate statistics into groups. If we calculate a statistic, for example the mean, from a very large sample, it will be a much more accurate estimate of the population mean than an estimate obtained from a smaller sample.
As a result, as the sample size gets larger and larger, the differences between the values of the statistic from sample to sample get smaller and smaller. In fact, as the samples are made larger without limit, the statistic will usually tend to some fixed value characteristic of the population, and, therefore, expressible in terms of the parameters of the population.
There is only one correct parametric function to which the statistic can be equated. If the statistic is equated to another parametric function then, regardless of how large the sample is made, it will still tend to an incorrect fixed value. Such statistics are called inconsistent statistics. Conversely, consistent statistics are equated to the correct parametric function, so that as the sample size increases they tend to the "correct" value. For large samples, the errors between the actual value and the estimate tend to be normally distributed.
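The contrast between consistent and inconsistent statistics can be sketched by simulation. The example assumes an exponential population with mean 1, chosen here because its mean and median differ, and treats both the sample mean and the sample median as candidate estimates of the population mean. The population, the sample sizes and the function name are illustrative assumptions.

```python
import math
import random
import statistics

def mean_and_median(sample_size, population_mean=1.0, seed=2):
    """Return the sample mean and sample median of one exponential sample."""
    rng = random.Random(seed)
    # expovariate takes the rate, which is 1 / mean
    sample = [rng.expovariate(1.0 / population_mean) for _ in range(sample_size)]
    return statistics.fmean(sample), statistics.median(sample)

results = {n: mean_and_median(n) for n in (100, 10_000, 200_000)}
for n, (mean, median) in results.items():
    print(n, round(mean, 3), round(median, 3))

# The population median of this exponential distribution is mean * ln 2
limit_of_median = 1.0 * math.log(2.0)
print(limit_of_median)
```

As the sample grows, the sample mean settles near the population mean of 1, whereas the sample median settles near ln 2, roughly 0.69. Equated to the population mean, the median is therefore inconsistent for this population; equated to the population median, it would be consistent.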
The mean value of the squared errors is the variance. The variance is inversely proportional to the sample size. Hence, if we increase the sample size, the variance decreases in inverse proportion.
Suppose we calculate consistent statistics, such as the mean, from many random samples of the same size from a population. The variances of different consistent statistics will in general differ, and the statistic with the smaller variance can be classed as the more "efficient" statistic.
R. A. Fisher illustrated this idea with this example: "...If from a large sample of (say) 1000 observations we calculate an efficient statistic, A, and a second consistent statistic, B, having twice the variance of A, then B will be a valid estimate of the required parameter, but one definitely inferior to A in its accuracy. Using the statistic B, a sample of 2000 values would be required to obtain as good an estimate as is obtained by using the statistic A from a sample of 1000 values. We may say, in this sense, that the statistic B makes use of 50 per cent of the relevant information available in the observations; or, briefly, that its efficiency is 50 per cent. The term "efficient" in its absolute sense is reserved for statistics the efficiency of which is 100 per cent."
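The notion of efficiency can be sketched with a simulation comparing two consistent statistics. The example assumes a standard normal population and takes the sample mean in the role of statistic A and the sample median in the role of statistic B. These choices, the sample size of 101 and the function name are illustrative and are not taken from Fisher's text.

```python
import random
import statistics

def estimate_efficiency(sample_size=101, n_samples=4000, seed=3):
    """Estimate the variances of the sample mean and median, and their ratio."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    var_mean = statistics.variance(means)
    var_median = statistics.variance(medians)
    # Efficiency of the median relative to the mean
    return var_mean, var_median, var_mean / var_median

var_mean, var_median, efficiency = estimate_efficiency()
print(var_mean, var_median, efficiency)
```

For normal data, large-sample theory gives the median an efficiency of 2/π, about 64 per cent, relative to the mean, and the simulated ratio should be of that order. In the sense of the passage above, the median of a sample of about 1570 observations would be roughly as accurate as the mean of 1000.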
Although not all consistent estimates are efficient, they are still valuable and can be adequately accurate in estimation problems. Efficient statistics become strictly necessary in one situation: "If we are to make accurate tests of goodness of fit, the methods of fitting employed must not introduce errors of fitting comparable to the errors of random sampling; when this requirement is investigated, it appears that when tests of goodness of fit are required, the statistics employed in fitting must be not only consistent, but must be of 100 per cent efficiency. This is a very serious limitation to the use of inefficient statistics, since in the examination of any body of data it is desirable to be able at any time to test the validity of one or more of the provisional assumptions which have been made." [1, page 1] For large samples it can be shown that all efficient statistics tend to the same value. Therefore, it is acceptable to use any one efficient statistic.
Furthermore, there are "sufficient statistics", which contain all the relevant information in the observations, even for small samples. When they exist, these statistics "...are definitely superior to other efficient statistics" [1, page 1]. Examples of sufficient statistics are the arithmetic mean of samples from the normal distribution, or from the Poisson series; it is the fact of providing sufficient statistics for these two important types of distribution which gives to the arithmetic mean its theoretical importance [1, page 1].
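Sufficiency of the arithmetic mean for the Poisson series can be checked numerically. The sketch below uses two hypothetical samples of the same size with the same mean but different individual values, and compares how strongly each favours one parameter value over another. The samples, the parameter values and the function names are illustrative assumptions.

```python
import math

def poisson_loglik(lam, data):
    """Log-likelihood of the Poisson mean lam for a list of counts."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

def loglik_ratio(data, lam1, lam2):
    """Log of the likelihood ratio comparing lam1 with lam2."""
    return poisson_loglik(lam1, data) - poisson_loglik(lam2, data)

# Two hypothetical samples: same size, same arithmetic mean, different values
sample_a = [0, 1, 2, 3, 4]
sample_b = [2, 2, 2, 2, 2]

ratio_a = loglik_ratio(sample_a, 1.5, 3.0)
ratio_b = loglik_ratio(sample_b, 1.5, 3.0)
print(ratio_a, ratio_b)
```

The two log-likelihood ratios agree, because the Poisson likelihood depends on the parameter only through the sample size and the sample total. Once the arithmetic mean is known, the individual counts carry no further information about the parameter.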