Linear Regression for Investors
In general, linear regression can be used to identify trends in the behavior of one asset relative to another asset or to a relevant index. Investors most commonly use regression analysis to establish the variability of an asset relative to its underlying index and to estimate required rates of return based on that relative volatility. Linear regression is the final statistical concept that we'll cover. You'll be happy to know that no hand computation is involved, because a spreadsheet or dedicated statistics software can run the analysis from two lists of numbers. You can actually do least squares regression with a pad, pencil and slide rule, but I wouldn't recommend it.
Linear regression fits a straight line to a series of points and provides an equation for that line. The equation can then be used to estimate additional points by plugging values into it. In its simplest form, there is one independent variable and one dependent variable. It's possible to have more than one independent variable, but for the purposes of this site we only need to worry about one. When you input two series (lists) of numbers, identify one as independent and the other as dependent, and run a linear regression analysis, the output will provide the information you need to construct an equation, i.e., the regression model, in the form Y = a + bX, which you should recognize as the equation of a straight line. Y is the dependent variable, X is the independent variable, a is the Y intercept and b is the slope of the line.
In linear regression there is actually a third term, e, which is the error of the estimate. Because we can't know the error of an estimate in advance, it's dropped from the equation. The probable size of the error, however, is of concern to us, and I've addressed that below. If there is little or no correlation between the variables, the error will be too large and the model will be unreliable.
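For readers curious about what the spreadsheet is doing behind the scenes, here is a minimal Python sketch of the least squares calculation that produces Y = a + bX. It assumes two plain lists of numbers with no missing values; the function names `fit_line` and `predict` are hypothetical, not part of any spreadsheet or statistics package. The slope b is the covariance of X and Y divided by the variance of X, and the intercept a makes the line pass through the point (mean of X, mean of Y).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X values must not all be equal")
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Estimate Y for a given X using the regression model Y = a + bX."""
    return a + b * x


# Toy data lying exactly on Y = 1 + 2X
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)              # 1.0 2.0
print(predict(a, b, 10))  # 21.0
```

In practice you'd let a spreadsheet or statistics package do this; the sketch only shows that the slope and intercept come from a handful of sums.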
The regression equation allows us to compute estimated values of Y for any value of X. A simple example would be estimating the diameter of trees given their age. If you went to a sawmill, measured the diameter of 50 logs and counted the growth rings on each, you'd have what you need to create a regression model that estimates tree diameter from age. It's a pretty good bet that diameter is highly correlated with age, so the decision to try regression analysis is reasonable. The age in years would be the independent variable and the diameter in inches would be the dependent variable. After running the regression analysis, we would have a mathematical model to estimate tree diameter from age, which would be useful if you owned timberland and wanted to estimate how large your trees would grow over any given number of years. Here's a real example relevant to investing:


Here we have regressed the returns of a large-cap growth fund on the returns of the S&P 500. From the regression output we can assemble the formula that is the regression model:
Y = a + bX = 3.25 + 1.23(X)
The slope of the regression line, b, is a measure of the expected change in Y per unit of change in X. This is used in investing as a relative measure of variability, which will be discussed under Risk & Return. Now we can use this formula to compute values of Y for any value of X, where X is the return on the S&P 500 index and Y is the estimated return on the large-cap growth fund. For example, if there's a consensus among analysts that the return on the S&P 500 will be 15% next year, we can estimate the expected return on the growth fund as 3.25 + 1.23(15) = 21.70%. The error term, e, noted above would be the difference between the estimated 21.70% and the fund's actual return for that year.
We can get an idea of how accurate the regression model is by examining the graph and the regression output. The straight line in the graph is the regression line, i.e., the line described by the formula Y = 3.25 + 1.23(X). By examination, the line appears to fit the data points well. If the points were more dispersed, it would be hard to make an evaluation on this basis. So we look at the R-squared in the regression output, 0.928. This is known as the coefficient of determination, and it is used as a measure of the "goodness of fit." A value of 0.928 indicates an excellent fit; with linear regression, 0.5 and greater is pretty darned good. R-squared measures the dispersion of the data points around the regression line: it tells us what percent of the variation in the dependent variable is explained by its correlation with the independent variable. In the example, 92.8% of the variation in Y (the fund's return) is explained by its correlation with X (the S&P 500's return). But remember, correlation does not necessarily mean causation.
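The fund estimate and the idea behind R-squared can be sketched in a few lines of Python. The names `estimate` and `r_squared` are illustrative, and because the fund's underlying return data isn't reproduced here, the sketch cannot recompute the 0.928; `r_squared` is written for whatever lists you have. It uses the form 1 - (unexplained variation / total variation), which equals the squared correlation when a and b come from a least squares fit with an intercept.

```python
def estimate(a, b, x):
    """Plug x into the regression model Y = a + bX."""
    return a + b * x


def r_squared(xs, ys, a, b):
    """Coefficient of determination: share of the variation in Y explained by the line."""
    mean_y = sum(ys) / len(ys)
    # Unexplained variation: squared distances of the points from the regression line
    ss_residual = sum((y - estimate(a, b, x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared distances of the points from the mean of Y
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# The article's model, returns in percent: Y = 3.25 + 1.23(X) with X = 15
fund_estimate = estimate(3.25, 1.23, 15)
print(round(fund_estimate, 2))  # 21.7
```

A perfect fit leaves no unexplained variation, so R-squared is 1; as the points scatter farther from the line, the ratio grows and R-squared falls toward 0.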
R-squared is the square of the correlation coefficient, R. But remember, squaring is a one-way street. Taking the square root of R-squared recovers only the size of R, not its sign: in this case R could be +0.963 or -0.963, and, as you know from the discussion of correlation, there is a vast difference between a correlation of +0.963 and -0.963. Looking at the graph and seeing how well the line fits the data, it's pretty safe to conclude that R is +0.963. In a regression with one independent variable, R always has the same sign as the slope, and here the slope, 1.23, is positive.
If there's still any question about the validity of the estimate, you can use the t statistic. If it isn't included in the regression output, t can be computed by dividing the X coefficient by the standard error of the X coefficient. In this case t = 10.167. According to the t distribution tables, with 8 degrees of freedom, you can be at least 95% certain that the slope of the regression line is greater than zero if t is equal to or greater than 1.86 (a one-tailed test). You need to be careful about how you interpret this. It doesn't mean we have at least 95% confidence in the model. It merely means we are at least 95% certain that the linear regression model is better than selecting a value at random. The use of linear regression will be discussed further under Risk & Return.
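Recovering the sign of R and checking the t statistic can both be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the 1.86 critical value is the one quoted in the text rather than computed from a t distribution, and `t_from_r_squared` uses the standard identity t = R * sqrt(n - 2) / sqrt(1 - R^2) for a one-variable regression, with n = 10 observations assumed because 8 degrees of freedom corresponds to n - 2.

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """Recover R from R-squared; squaring discards the sign, so take it from the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


def t_from_coefficient(coef, std_error):
    """t statistic as described in the text: X coefficient / its standard error."""
    return coef / std_error


def t_from_r_squared(r_squared, n):
    """Magnitude of the slope's t statistic for a one-variable regression with n points."""
    return math.sqrt(r_squared * (n - 2) / (1 - r_squared))


# Critical value quoted in the text: 95%, one-tailed, 8 degrees of freedom
T_CRITICAL_95_ONE_TAILED_8DF = 1.86

r = correlation_from_r_squared(0.928, 1.23)  # slope of 1.23 is positive
t = t_from_r_squared(0.928, n=10)           # n = 10 assumed (8 df = n - 2)
print(round(r, 3), round(t, 2), t >= T_CRITICAL_95_ONE_TAILED_8DF)
```

With the rounded R-squared of 0.928, the identity gives t of about 10.15, close to the 10.167 obtained from the coefficient and its standard error; the small gap presumably comes from rounding R-squared to three decimals.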