# Probability concepts and regression models



PROBABILITY CONCEPTS AND APPLICATIONS AND REGRESSION MODELS


Probability Concepts and Applications

August 4, 2014

ABSTRACT

This paper focuses on probability concepts and their applications and on regression models. It discusses elements of quantitative methods such as probability concepts, types of probability, and probability distributions. It also looks at regression models and at how variables contribute to these models. Different types of regression are mentioned in the discussion of model building. Finally, it covers decision-making processes in which owners and managers use probability concepts to evaluate information about new opportunities.

Probability Concepts and Applications

1. Describe the rationale for utilizing probability concepts.

Probability concepts are used to answer the question: what is the likelihood of an event occurring? Analysts also use these concepts to make logical and consistent investment decisions and to manage expectations in an environment of risk. For analysts, it is essential to know the likelihood that any single statistical decision is wrong. Analysts collect their data by observing variables. A sample of the data is then used to extract information for the decision-making process, but this information is incomplete and sampling error can occur. Hence, probability concepts are used to model the sampled population and to test hypotheses. Any business needs probability to be successful, because most assessments are made in the face of uncertainty. In this process, probability serves as a replacement for certainty, a substitute for absolute knowledge. For example, if a local pizza company is given the right to sell pizzas at a local college game, the manager needs to know how many pizzas to have on hand so as neither to run short nor to have many pizzas left over, and that decision involves uncertainty. The manager has to estimate how many pizzas will be demanded. If he chooses to stock 300 pizzas, he might say that the chance of needing more than 300 pizzas is 0.30. This is a value between 0 and 1 that reflects his uncertainty about the demand for pizzas at the game.

Is there more than one type of probability? If so, describe the different types of probability.

There is more than one type of probability: objective and subjective.

Objective probability

The probability that an event will occur, based on an analysis in which each measure is a recorded observation rather than a subjective estimate. Objective probabilities are a more accurate way to determine probabilities than subjective measures, such as personal estimates.

For instance, the objective probability of tossing a coin once and getting a head is

$$P(\text{head}) = \frac{1}{2} = \frac{\text{number of ways of getting a head}}{\text{number of possible outcomes (head or tail)}}$$

When performing any statistical analysis, it is important for each observation to be an independent event that has not been subject to manipulation. The less biased each observation is, the less biased the end probability will be.
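The link between the classical value of 1/2 and recorded observations can be illustrated with a short simulation. This is only a sketch: the function name `relative_frequency_of_heads`, the fixed seed, and the toss counts are illustrative assumptions, and a simulated fair coin stands in for real recorded tosses.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate independent tosses of a fair coin and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# Classical (objective) probability: one favourable outcome out of two.
classical = 1 / 2

# The observed relative frequency tends toward the classical value as tosses accumulate.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n), classical)
```

With few tosses the observed share can sit well away from 0.5; with many independent tosses it settles close to it.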

Subjective Probability

A probability derived from an individual's personal judgment about whether a specific outcome is likely to occur. Subjective probabilities contain no formal calculations and only reflect the subject's opinions and past experience.

Subjective probabilities differ from person to person. Because the probability is subjective, it contains a high degree of personal bias. An example of subjective probability is asking New York Yankees fans, before the baseball season starts, what the chances are of New York winning the World Series. While there is no mathematical proof behind the answer, fans might still reply in percentage terms, for example that the Yankees have a 25% chance of winning the World Series.

1. Briefly discuss probability distributions. What is a normal distribution? Please provide a written example of how 'understanding distribution' can be an asset for any business project.

A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. This range lies between the minimum and maximum statistically possible values, but where a possible value is likely to be plotted on the probability distribution depends on a number of factors, including the distribution's mean, standard deviation, skewness and kurtosis.

Academics and fund managers alike may determine a particular stock's probability distribution to estimate the returns that the stock may yield in the future. The stock's history of returns, which can be measured over any time interval, will likely comprise only a fraction of the stock's returns, which subjects the analysis to sampling error. Increasing the sample size reduces this error. There are many different classifications of probability distributions, including the chi-square, normal, and binomial distributions.
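How sampling error shrinks with sample size can be sketched with the standard error of the mean, which is the sample standard deviation divided by the square root of the sample size. The return parameters below (`TRUE_MEAN`, `TRUE_SD`), the seed, and the sample sizes are hypothetical values chosen only for illustration, and normally distributed returns are an assumption.

```python
import math
import random
import statistics


def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical monthly return parameters (assumptions, not taken from the essay).
TRUE_MEAN, TRUE_SD = 0.01, 0.05

rng = random.Random(42)
for n in (12, 120, 1200):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    print(n, round(statistics.mean(sample), 4), round(standard_error(sample), 4))
```

Because the standard error falls with the square root of the sample size, a hundred times more observations cut it by roughly a factor of ten.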

A probability distribution is a mapping of all the possible values of a random variable to their corresponding probabilities for a given sample space.

The probability distribution is denoted as

$$P(X = x)$$

which can be written in short form as

$$P(x)$$

The probability distribution can also be referred to as a set of ordered pairs of outcomes and their probabilities. This is known as the probability function $f(x)$.

This set of ordered pairs can be written as:

$$(x, f(x))$$

where the function is defined as:

$$f(x) = P(X = x)$$

The normal distribution is a continuous bell-shaped distribution that is a function of two parameters, the mean and the standard deviation of the distribution.
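Because the normal distribution is fully described by its mean and standard deviation, probabilities for any interval follow from those two numbers. A minimal sketch using Python's standard-library `statistics.NormalDist` is given below; the mean of 500 units and standard deviation of 100 units are hypothetical figures, not values from this paper.

```python
from statistics import NormalDist

# Hypothetical monthly unit sales: mean 500, standard deviation 100 (assumed values).
demand = NormalDist(mu=500, sigma=100)

# Probability that sales fall within one and two standard deviations of the mean.
within_one_sd = demand.cdf(600) - demand.cdf(400)
within_two_sd = demand.cdf(700) - demand.cdf(300)

# Probability that sales exceed 650 units in a month.
above_650 = 1 - demand.cdf(650)

print(round(within_one_sd, 4), round(within_two_sd, 4), round(above_650, 4))
```

For any normal distribution, about 68% of values fall within one standard deviation of the mean and about 95% within two, whatever the particular mean and standard deviation are.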

Understanding probability distributions is useful for entrepreneurs who need to anticipate potential results. For example, probability distributions come in handy for any business project that deals with sales predictions, risk assessments, or scenario analysis. Many companies rely on sales forecasts to predict revenue, so the probability distribution of how many units the firm expects to sell in a given period can help it anticipate revenues for that period. The distribution also allows a company to see the worst and best possible outcomes and plan for both. The worst outcome could be 100 units sold in a month, while the best result could be 1,000 units sold in that month.

The distribution shows which outcomes are most likely in a risky proposition and whether the rewards for taking specific actions compensate for those risks. For instance, if the probability analysis shows that the cost of launching a new project is likely to be \$350,000, the company must determine whether the potential revenues will exceed that amount to make it a profitable venture.

For instance, the probability distribution can show that the most likely scenario for a new product launch will cost \$250,000, while the best possible scenario shows it will cost \$150,000 and the worst possible scenario shows it will cost \$500,000. Businesses can work toward the best possible outcome while preparing for the worst.
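One way to combine the three launch scenarios into a single planning figure is a probability-weighted (expected) cost. In the sketch below the three cost figures come from the example above, while the probabilities attached to them (0.20, 0.60, 0.20) are purely hypothetical assumptions, as are the names `scenarios` and `expected_cost`.

```python
# Scenario costs come from the text; the probabilities are assumed for illustration.
scenarios = {
    "best": (150_000, 0.20),
    "most likely": (250_000, 0.60),
    "worst": (500_000, 0.20),
}


def expected_cost(scenarios):
    """Probability-weighted average of the scenario costs."""
    return sum(cost * prob for cost, prob in scenarios.values())


total_probability = sum(prob for _, prob in scenarios.values())
print(round(total_probability, 6), round(expected_cost(scenarios), 2))
```

Under these assumed probabilities the expected cost is about 280,000 dollars, above the most likely figure because the worst case is far more expensive than the best case is cheap.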

Regression Models

1. What benefit does a variable provide when developing and examining models?

Variables are what allow a model to predict an outcome. The benefit of a variable when developing and examining models is judged by what it adds: the best model has a high $r^2$ and few independent variables. The $r^2$ never decreases as more variables are added to the model. However, if too many independent variables are in the model, the adjusted $r^2$ can decrease, which signals problems in the model. Therefore, if a new variable is added to the model and the adjusted $r^2$ increases, that variable should remain in the model. On the other hand, if the new variable decreases the adjusted $r^2$, it should be removed to provide the best model. Variable selection methods such as stepwise regression and best subsets regression are used to select the best variables. If a selected variable does not benefit the regression, the outcome from the model could be wrong.
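The keep-or-remove rule can be made concrete with the usual adjusted $r^2$ formula, $1 - (1 - r^2)(n - 1)/(n - k - 1)$, where $n$ is the number of observations and $k$ the number of independent variables. The sample size and $r^2$ values below are hypothetical, chosen only to show a case where $r^2$ rises while adjusted $r^2$ falls.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted r-squared: penalizes r-squared for each added independent variable."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical example: 30 observations, a fourth variable lifts r2 from 0.800 to 0.805.
before = adjusted_r2(0.800, n_obs=30, n_predictors=3)
after = adjusted_r2(0.805, n_obs=30, n_predictors=4)

print(round(before, 4), round(after, 4))
print("keep new variable" if after > before else "remove new variable")
```

Here the small gain in $r^2$ does not offset the penalty for the extra variable, so the rule described above would remove it.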

1. Explain the purpose of simple linear regression and scatter diagrams. Please provide a simple linear regression model and define each variable used.

Simple linear regression is the most commonly used technique for determining how one variable of interest (the response variable) is affected by changes in another variable (the explanatory variable).

Simple linear regression is used for three main purposes:

1. To describe the linear dependence of one variable on another

2. To predict values of one variable from values of another, for which more data are available

3. To correct for the linear dependence of one variable on another, in order to clarify other features of its variability.

Any line fitted through a cloud of data will deviate from each data point to greater or lesser degree. The vertical distance between a data point and the fitted line is termed a "residual". This distance is a measure of prediction error, in the sense that it is the discrepancy between the actual value of the response variable and the value predicted by the line.

Linear regression determines the best-fit line through a scatter plot of data, such that the sum of squared residuals is minimized; equivalently, it minimizes the error variance. The fit is "best" in precisely that sense: the sum of squared errors is as small as possible. That is why it is also termed "Ordinary Least Squares" regression. Scatter diagrams are used to plot the predictor variable against the response variable. For instance, an experiment is run to investigate the relationship between speed and fuel economy for a car operating at highway speeds. The experiment is run in the laboratory rather than on the road in order to control for outside factors such as weather, road surface, tire wear and the like. This does limit the scope of the experiment, since the fuel economies obtained in the laboratory may not match those obtained by the average driver on actual highways. There are 7 levels of speed in the experiment. At each level the car is maintained at that speed and the fuel economy in miles per gallon is measured. The speeds are run in random order. The data are shown below.

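| Run Order | Speed, X (MPH) | Fuel Economy, Y (MPG) |
|-----------|----------------|-----------------------|
| 4         | 40             | 33                    |
| 7         | 45             | 31                    |
| 1         | 50             | 29                    |
| 2         | 60             | 29                    |
| 6         | 70             | 25                    |
| 3         | 75             | 22                    |
| 5         | 80             | 20                    |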

Fuel Economy vs Speed

One can see from the data that when the speed increases from 40 MPH to 80 MPH, the fuel economy decreases from 33 MPG to 20 MPG. Overall, the fuel economy decreases by 13 MPG over an increase of 40 MPH. If one uses the change in Y over the change in X as an approximation for the slope of a line, one obtains a naive estimate of the slope, $\tilde{\beta}_1 = -13/40 = -0.325$. Extending this back to the point where X = 0, one gets a naive estimate of the intercept, $\tilde{\beta}_0 = 33 + 40(0.325) = 46$. An equation of one line that might be used to approximate the relationship between speed, X, and fuel economy, Y, is: $Y = 46 - 0.325X$
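The naive line can be compared with the ordinary least squares line for the same seven observations. The sketch below uses the textbook formulas (slope equals the sum of cross-deviations divided by the sum of squared X deviations, intercept equals mean Y minus slope times mean X); the function names are illustrative, and only the data values come from the experiment above.

```python
speed = [40, 45, 50, 60, 70, 75, 80]  # X, in MPH
mpg = [33, 31, 29, 29, 25, 22, 20]    # Y, in MPG


def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared residuals (actual Y minus the line's predicted Y)."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares_line(speed, mpg)
sse_ols = sum_squared_errors(speed, mpg, b0, b1)
sse_naive = sum_squared_errors(speed, mpg, 46, -0.325)
print(round(b0, 3), round(b1, 3), round(sse_ols, 3), round(sse_naive, 3))
```

For these data the least squares line works out to Y = 45 - 0.3X, with a sum of squared errors of 7.5 against roughly 10.16 for the naive line, which illustrates the sense in which the least squares fit is "best".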

The Standard Form of the Regression Equation

The standard form for the regression equation or formula is:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

where

$Y$ = dependent variable

$X$ = independent variable

$\beta_1$ = slope of the regression line, or the multiplier of $X$

$\beta_0$ = intercept, or the point where the regression line crosses the vertical y-axis

$\epsilon$ = random error

1. Describe multiple regression analysis and discuss potential uses for this model

Multiple regression analysis is a powerful technique for predicting the unknown value of a variable from the known values of two or more other variables, also called the predictors. The general purpose of multiple regression (the term was first used by Pearson, 1908) is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. For example, a real estate agent might record for each listing the size of the house (in square feet), the number of bedrooms, the average income in the respective neighborhood according to census data, and a subjective rating of the appeal of the house. Once this information has been compiled for various houses, it would be interesting to see whether and how these measures relate to the price for which a house is sold. For example, you might learn that the number of bedrooms is a better predictor of the price for which a house sells in a particular neighborhood than how "pretty" the house is (subjective rating). You may also detect "outliers," that is, houses that should really sell for more, given their location and characteristics.

Personnel professionals customarily use multiple regression procedures to determine equitable compensation. You can determine a number of factors or dimensions such as "amount of responsibility" (Resp) or "number of people to supervise" (No_Super) that you believe to contribute to the value of a job. The personnel analyst then usually conducts a salary survey among comparable companies in the market, recording the salaries and respective characteristics (i.e., values on dimensions) for different positions. This information can be used in a multiple regression analysis to build a regression equation of the form:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \epsilon$$

Once a multiple regression equation has been constructed, one can check how good it is (in terms of predictive ability) by examining the coefficient of determination ($R^2$). $R^2$ always lies between 0 and 1.

The closer $R^2$ is to 1, the better the model and its predictions.
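A small worked sketch can show how a multiple regression equation is fitted and how its coefficient of determination is computed. The code below solves the normal equations directly in plain Python; the salary survey figures are entirely hypothetical (constructed around a salary of roughly 30 + 5·Resp + 2·No_Super, in thousands of dollars, plus small deviations), and the helper names are illustrative. In practice a statistics package would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(predictors, y):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(row) for row in predictors]  # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(predictors, y, coefs):
    """Coefficient of determination: 1 - SSE / SST."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in predictors]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical survey: (Resp, No_Super) for each position, salary in $1,000s.
positions = [(1, 0), (2, 3), (3, 1), (4, 5), (5, 2), (6, 8), (7, 4), (8, 10)]
salary = [35.5, 45.5, 47.3, 59.7, 59.2, 75.8, 73.4, 89.6]

coefs = fit_multiple_regression(positions, salary)
r2 = r_squared(positions, salary, coefs)
print([round(b, 3) for b in coefs], round(r2, 4))
```

The fitted coefficients land near the values used to construct the data, and $R^2$ is close to 1 because the hypothetical salaries were built to follow the two predictors closely; real survey data would rarely fit this well.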

A related question is whether the independent variables individually influence the dependent variable significantly. Statistically, it is equivalent to testing the null hypothesis that the relevant regression coefficient is zero.

This can be done using a t-test. If the t-test of a regression coefficient is significant, it indicates that the variable in question influences Y significantly while controlling for the other independent explanatory variables.

Assumptions

The multiple regression technique does not test whether the data are linear. On the contrary, it proceeds by assuming that the relationship between Y and each of the Xi's is linear. Hence, as a rule, it is prudent to always look at the scatter plots of (Y, Xi), i = 1, 2, …, k. If any plot suggests nonlinearity, one may use a suitable transformation to attain linearity.

Another important assumption is the absence of multicollinearity: the independent variables are not related among themselves. At a very basic level, this can be tested by computing the correlation coefficient between each pair of independent variables.
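The basic pairwise check can be sketched as follows. The Pearson correlation coefficient is computed for every pair of independent variables; the listing data, the variable names, and the idea of flagging coefficients near +1 or -1 are illustrative assumptions rather than part of the original analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical house listings (assumed values for illustration only).
predictors = {
    "size_sqft": [1100, 1400, 1600, 1900, 2300, 2600],
    "bedrooms": [2, 3, 3, 4, 4, 5],
    "income": [48, 61, 45, 70, 52, 66],
}

names = list(predictors)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson_r(predictors[first], predictors[second])
        print(first, second, round(r, 2))
```

In this made-up data, house size and number of bedrooms are strongly correlated, which is the kind of pair that would prompt a closer look before both are kept in the same model.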

Other assumptions include those of homoscedasticity and normality.

Multiple regression analysis is used when one is interested in predicting a continuous dependent variable from a number of independent variables. If the dependent variable is dichotomous, then logistic regression should be used.
