Info Hzl
There are two reasons for using this estimator: one practical, one theoretical. If any column of $X$ also appears in $Z$, then that column of $X$ is reproduced exactly in $\hat{X}$. This is easy to show. In the expression for $\hat{X}$, if the $k$th column in $X$ is one of the columns in $Z$, say the $l$th, then the $k$th column in $(Z'Z)^{-1}Z'X$ will be the $l$th column of an $L \times L$ identity matrix. This result means that the $k$th column in $\hat{X} = Z(Z'Z)^{-1}Z'X$ will be the $l$th column in $Z$, which is the $k$th column in $X$. This result is...
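The reproduction property described above can be checked numerically. The following is a minimal sketch, assuming simulated data and NumPy; the names `Z`, `X`, `coef`, and `X_hat` are illustrative and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical instruments Z (n x L, L = 3) and regressors X (n x K, K = 2).
# The first column of X is deliberately a copy of the second column of Z.
Z = rng.normal(size=(n, 3))
X = np.column_stack([Z[:, 1], rng.normal(size=n)])

# (Z'Z)^{-1} Z'X, computed by solving the linear system instead of inverting.
coef = np.linalg.solve(Z.T @ Z, Z.T @ X)

# Fitted values X_hat = Z (Z'Z)^{-1} Z'X.
X_hat = Z @ coef

# The first column of coef should be the second column of a 3 x 3 identity
# matrix, so the first column of X_hat should reproduce the first column of X.
print(np.round(coef[:, 0], 10))
print(np.allclose(X_hat[:, 0], X[:, 0]))
```

The second column of `X` is not in `Z`, so its fitted values generally differ from the original column.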
Info Crt
4. Obtain the reduced form for the model in Exercise 1 under each of the assumptions made in parts a and in parts b1 and b9. 5. The following model is specified All variables are measured as deviations from their means. The sample of 25 observations produces the following matrix of sums of squares and cross products
Info Dis
The solution is $b = (X'X)^{-1}X'y = (-0.50907, -0.01658, 0.67038, -0.002326, -0.00009401)'$.
3.2.3 ALGEBRAIC ASPECTS OF THE LEAST SQUARES SOLUTION
The normal equations are $X'Xb - X'y = -X'(y - Xb) = -X'e = 0$. (3-12) Hence, for every column $x_k$ of $X$, $x_k'e = 0$. If the first column of $X$ is a column of 1s, then there are three implications. 1. The least squares residuals sum to zero. This implication follows from $x_1'e = i'e = \sum_i e_i = 0$. 2. The regression hyperplane passes through the point of means of the data. The first normal...
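The algebraic implications listed above can be illustrated with a short numerical check. This is a sketch on simulated data, assuming NumPy; the data-generating values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40

# Hypothetical design matrix whose first column is a column of 1s.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Least squares solution of the normal equations X'Xb = X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# X'e should be a zero vector: every column of X is orthogonal to e.
orthogonality = X.T @ e

# Implication 1: the residuals sum to zero.
residual_sum = e.sum()

# Implication 2: the fitted hyperplane passes through the point of means,
# so mean(y) minus mean(x)'b should be zero.
mean_gap = y.mean() - X.mean(axis=0) @ b

print(orthogonality, residual_sum, mean_gap)
```

Dropping the column of 1s from `X` would leave the orthogonality result intact but would generally break the two implications that depend on the constant term.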
Treatment Effects
The basic model of selectivity outlined earlier has been extended in an impressive variety of directions. An interesting application that has found wide use is the measurement of treatment effects and program effectiveness. An earnings equation that accounts for the value of a college education is where $C_i$ is a dummy variable indicating whether or not the individual attended college. The same format has been used in any number of other analyses of programs, experiments, and treatments. The...
Summary And Conclusions
This chapter has focused on two uses of the linear regression model: hypothesis testing and basic prediction. The central result for testing hypotheses is the F statistic. The F ratio can be produced in two equivalent ways: first, by measuring the extent to which the unrestricted least squares estimate differs from what a hypothesis would predict, and second, by measuring the loss of fit that results from assuming that a hypothesis is correct. We then extended the F statistic to more general...
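The equivalence of the two routes to the F ratio can be seen in a small computation. The sketch below assumes a set of linear restrictions written as $Rb = q$ (notation assumed here, since the excerpt does not show it), simulated data, and NumPy.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

# Hypothetical hypothesis R b = q: the last two slopes are zero (J = 2).
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
q = np.zeros(2)
J = R.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y            # unrestricted least squares
e = y - X @ b
s2 = e @ e / (n - K)

# Way 1: how far the unrestricted estimate is from the hypothesis.
m = R @ b - q
middle = np.linalg.inv(R @ XtX_inv @ R.T)
F_discrepancy = (m @ middle @ m) / J / s2

# Way 2: loss of fit when the restrictions are imposed.
b_r = b - XtX_inv @ R.T @ middle @ m   # restricted least squares
e_r = y - X @ b_r
F_loss_of_fit = ((e_r @ e_r - e @ e) / J) / (e @ e / (n - K))

print(F_discrepancy, F_loss_of_fit)
```

The two numbers agree up to rounding error because the increase in the sum of squared residuals equals the quadratic form in the discrepancy vector.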
Integrated Processes And Differencing
A process that figures prominently in recent work is the random walk with drift, $y_t = \mu + y_{t-1} + \varepsilon_t$. That is, $y_t$ is the simple sum of what will eventually be an infinite number of random variables, possibly with nonzero mean. If the innovations are being generated by the same zero-mean, constant-variance distribution, then the variance of $y_t$ would obviously be infinite. As such, the random walk is clearly a nonstationary process, even if $\mu$ equals zero. On the other hand, the first difference of $y_t$, $y_t - y_{t-1}$, is simply the...
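A short simulation makes the contrast between the level and the first difference concrete. This is a sketch only: it assumes the process starts at $y_0 = 0$ at a finite date (not in the infinite past), uses standard normal innovations, and uses NumPy; `mu`, `T`, and the array names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, mu = 500, 0.2

# One path: y_t = mu + y_{t-1} + eps_t with y_0 = 0 is a cumulative sum.
eps = rng.normal(size=T)
y = np.cumsum(mu + eps)

# First difference y_t - y_{t-1}, available for t = 2, ..., T.
dy = np.diff(y)

# Many independent paths, to see how the cross-path variance of y_t
# grows with t while the variance of the differences stays constant.
paths = np.cumsum(mu + rng.normal(size=(2000, T)), axis=1)
var_level = paths.var(axis=0)
var_diff = np.diff(paths, axis=1).var(axis=0)

print(var_level[9], var_level[-1])
print(var_diff[9], var_diff[-1])
```

With a finite starting date the variance of the level grows roughly in proportion to $t$, which is the finite-sample counterpart of the infinite variance described in the text.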
Exercises
1. The Two Variable Regression. For the regression model $y = \alpha + \beta x + \varepsilon$, a. Show that the least squares normal equations imply $\sum_i e_i = 0$ and $\sum_i x_i e_i = 0$. b. Show that the solution for the constant term is $a = \bar{y} - b\bar{x}$. c. Show that the solution for $b$ is $b = \left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right] / \left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]$. d. Prove that these two values uniquely minimize the sum of squares by showing that the diagonal elements of the second derivatives matrix of the sum of squares with respect to the parameters are both positive and that the...
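The formulas in parts a through c can be checked numerically before attempting the proofs. The following sketch uses simulated data and NumPy; it is a numerical check, not a solution to the exercise, and the parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=30)
y = 1.5 + 0.8 * x + rng.normal(size=30)

# Part c: slope as the ratio of the cross-product sum to the sum of squares.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Part b: constant term a = ybar - b * xbar.
a = y.mean() - b * x.mean()

# Part a: the residuals should satisfy sum(e) = 0 and sum(x * e) = 0.
e = y - a - b * x
print(a, b, e.sum(), (x * e).sum())
```

As a cross-check, `np.polyfit(x, y, 1)` returns the slope and intercept of the same least squares line.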
Censoring And Truncation In Models For Counts
Truncation and censoring are relatively common in applications of models for counts (see Section 21.9). Truncation often arises as a consequence of discarding what appear to be unusable data, such as the zero values in survey data on the number of uses of recreation facilities (Shaw (1988) and Bockstael et al. (1990)). The zero values in this setting might represent a discrete decision not to visit the site, which is a qualitatively different decision from the positive number for someone who had...
Testing For Overdispersion
The Poisson model has been criticized because of its implicit assumption that the variance of $y_i$ equals its mean. Many extensions of the Poisson model that relax this assumption have been proposed by Hausman, Hall, and Griliches (1984), McCullagh and Nelder (1983), and Cameron and Trivedi (1986), to name but a few. The first step in this extended analysis is usually a test for overdispersion in the context of the simple model. A number of authors have devised tests for overdispersion within the...
Incidental Truncation In A Bivariate Distribution
Suppose that $y$ and $z$ have a bivariate distribution with correlation $\rho$. We are interested in the distribution of $y$ given that $z$ exceeds a particular value. Intuition suggests that if $y$ and $z$ are positively correlated, then the truncation of $z$ should push the distribution of $y$ to the right. As before, we are interested in (1) the form of the incidentally truncated distribution and (2) the mean and variance of the incidentally truncated random variable. Since it has dominated the empirical literature,...
Some Issues In Specification
Two issues that commonly arise in microeconomic data, heteroscedasticity and nonnormality, have been analyzed at length in the tobit setting. Maddala and Nelson (1975), Hurd (1979), Arabmazar and Schmidt (1982a,b), and Brown and Moffitt (1982) all have varying degrees of pessimism regarding how inconsistent the maximum likelihood estimator will be when heteroscedasticity occurs. Not surprisingly, the degree of censoring is the primary determinant. Unfortunately, all the analyses have been carried...
The Censored Normal Distribution
The relevant distribution theory for a censored variable is similar to that for a truncated one. Once again, we begin with the normal distribution, as much of the received work has been based on an assumption of normality. We also assume that the censoring point is zero, although this is only a convenient normalization. In a truncated distribution, only the part of the distribution above $y = 0$ is relevant to our computations. To make the distribution integrate to one, we scale it up by the...
Introduction
In Chapter 9, we extended the classical linear model to allow the conditional mean to be a nonlinear function. But we retained the important assumptions about the disturbances: that they are uncorrelated with each other and that they have a constant variance, conditioned on the independent variables. In this and the next several chapters, we extend the multiple regression model to disturbances that violate these classical assumptions. The generalized linear regression model is $y = X\beta + \varepsilon$, $E[\varepsilon \mid X] = 0$,...
Restrictions And Nested Models
One common approach to testing a hypothesis is to formulate a statistical model that contains the hypothesis as a restriction on its parameters. A theory is said to have testable implications if it implies some testable restrictions on the model. Consider, for example, a simple model of investment, $I_t$, suggested by Section 3.3.2, $\ln I_t = \beta_1 + \beta_2 i_t + \beta_3 \Delta p_t + \beta_4 \ln Y_t + \beta_5 t + \varepsilon_t$, (6-1) which states that investors are sensitive to nominal interest rates, $i_t$, the rate of inflation, $\Delta p_t$, the log of real output,...
Info Acb
This result is the same one we had for the linear model with $X^0$ in the role of $X$. You should check that when $\varepsilon(\beta) = y - X\beta$, our results for the linear model in Section 9.5.1 are replicated exactly. This problem, however, is highly nonlinear in most cases, and the repeated least squares approach is unlikely to be effective. But it is a straightforward minimization problem in the frameworks of Appendix E, and instead, we can just treat estimation here as a problem in nonlinear optimization. We have...
Minimum Mean Squared Error Predictor
As an alternative approach, consider the problem of finding an optimal linear predictor for $y$. Once again, ignore Assumption A6 and, in addition, drop Assumption A1 that the conditional mean function, $E[y \mid x]$, is linear. For the criterion, we will use the mean squared error rule, so we seek the minimum mean squared error linear predictor of $y$, which we'll denote $x'\gamma$. The expected squared error of this predictor is $\mathrm{MSE} = E_{y,x}\{y - E[y \mid x]\}^2 + E_{y,x}\{E[y \mid x] - x'\gamma\}^2$. We seek the $\gamma$ that minimizes this...
Choosing Between Nonnested Models
The classical testing procedures that we have been using have been shown to be most powerful for the types of hypotheses we have considered. Although use of these procedures is clearly desirable, the requirement that we express the hypotheses in the form of restrictions on the model $y = X\beta + \varepsilon$ can be limiting. Two common exceptions are the general problem of determining which of two possible sets of regressors is more appropriate and whether a linear or loglinear model is more appropriate for a...
DEFINITION 5.1 Asymptotic Efficiency
An estimator is asymptotically efficient if it is consistent, asymptotically normally distributed, and has an asymptotic covariance matrix that is not larger than the asymptotic covariance matrix of any other consistent, asymptotically normally distributed estimator. In Chapter 17, we will show that if the disturbances are normally distributed, then the least squares estimator is also the maximum likelihood estimator. Maximum likelihood estimators are asymptotically efficient among consistent...
Nonnormal Disturbances And Large Sample Tests
The distributions of the F, t, and chi-squared statistics that we used in the previous section rely on the assumption of normally distributed disturbances. Without this assumption, the exact distributions of these statistics depend on the data and the parameters and are not F, t, and chi-squared. At least at first blush, it would seem that we need either a new set of critical values for the tests or...
The Goldfeld-Quandt Test
By narrowing our focus somewhat, we can obtain a more powerful test. Two tests that are relatively general are the Goldfeld-Quandt (1965) test and the Breusch-Pagan (1979) Lagrange multiplier test. For the Goldfeld-Quandt test, we assume that the observations can be divided into two groups in such a way that under the hypothesis of homoscedasticity, the disturbance variances would be the same in the two groups, whereas under the alternative, the disturbance variances would differ systematically....
Partitioned Regression And Partial Regression
It is common to specify a multiple regression model when, in fact, interest centers on only one or a subset of the full set of variables. Consider the earnings equation discussed in Example 2.2. Although we are primarily interested in the association of earnings and education, age is, of necessity, included in the model. The question we consider here is what computations are involved in obtaining, in isolation, the coefficients of a subset of the variables in a multiple regression, for example,...
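The excerpt breaks off before stating the result, so the sketch below assumes the standard partialling-out result (often called the Frisch-Waugh theorem): the coefficients on a subset of variables can be obtained by first removing the influence of the other variables from both the dependent variable and that subset. The data are simulated, and the labels "age" and "education" are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 80

# X1: variables to be partialled out (constant and a hypothetical "age").
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
# X2: the variable of interest (a hypothetical "education"), correlated with age.
X2 = (0.5 * X1[:, 1] + rng.normal(size=n)).reshape(-1, 1)
y = X1 @ np.array([1.0, 0.3]) + 0.7 * X2[:, 0] + rng.normal(size=n)


def ols(A, v):
    """Least squares coefficients from regressing v on the columns of A."""
    return np.linalg.lstsq(A, v, rcond=None)[0]


# Full multiple regression of y on [X1, X2].
b_full = ols(np.column_stack([X1, X2]), y)

# Two-step computation: residuals of y and of X2 after regression on X1,
# then a regression of the first set of residuals on the second.
y_star = y - X1 @ ols(X1, y)
X2_star = X2 - X1 @ ols(X1, X2)
b2_partial = ols(X2_star, y_star)

print(b_full[2:], b2_partial)
```

The coefficient on the variable of interest is the same in both computations, which is what makes it possible to study that coefficient in isolation.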
The Least Squares Coefficient Vector
The least squares coefficient vector minimizes the sum of squared residuals, $\sum_{i=1}^{n} e_{i0}^2 = \sum_{i=1}^{n} (y_i - x_i'b_0)^2$, (3-1) where $b_0$ denotes the choice for the coefficient vector. In matrix terms, minimizing the sum of squares in (3-1) requires us to choose $b_0$ to $\text{Minimize}_{b_0}\; S(b_0) = e_0'e_0 = (y - Xb_0)'(y - Xb_0)$. (3-2) Expanding this gives $e_0'e_0 = y'y - b_0'X'y - y'Xb_0 + b_0'X'Xb_0$ (3-3) or $S(b_0) = y'y - 2y'Xb_0 + b_0'X'Xb_0$. The necessary condition for a minimum is $\partial S(b_0)/\partial b_0 = -2X'y + 2X'Xb_0 = 0$. 1 We shall have to establish that the practical approach of fitting the line as closely as possible to the data by least squares leads...
Testing Nonlinear Restrictions
The preceding discussion has relied heavily on the linearity of the regression model. When we analyze nonlinear functions of the parameters and nonlinear regression models, most of these exact distributional results no longer hold. The general problem is that of testing a hypothesis that involves a nonlinear function of the regression coefficients, $H_0: c(\beta) = q$. We shall look first at the case of a single restriction. The more general one, in which $c(\beta) = q$ is a set of restrictions, is a simple extension. The...
THEOREM 4.4 Independence of $b$ and $s^2$
If $\varepsilon$ is normally distributed, then the least squares coefficient estimator $b$ is statistically independent of the residual vector $e$ and therefore of all functions of $e$, including $s^2$. The ratio $t_k = \frac{(b_k - \beta_k)/\sqrt{\sigma^2 S^{kk}}}{\sqrt{[(n-K)s^2/\sigma^2]/(n-K)}} = \frac{b_k - \beta_k}{\sqrt{s^2 S^{kk}}}$, where $S^{kk}$ is the $k$th diagonal element of $(X'X)^{-1}$, has a t distribution with $n - K$ degrees of freedom. We can use $t_k$ to test hypotheses or form confidence intervals about the individual elements of $\beta$. A common test is whether a parameter $\beta_k$ is significantly different from zero. The appropriate test statistic
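The t ratio above can be computed directly from the least squares output. The sketch below uses simulated data and NumPy, and writes $S^{kk}$ for the $k$th diagonal element of $(X'X)^{-1}$ (a notational assumption here); the "true" parameter values are hypothetical and are known only because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 50, 3
beta = np.array([1.0, 0.5, 0.0])   # hypothetical true coefficients
sigma2 = 4.0                        # hypothetical true disturbance variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ beta + np.sqrt(sigma2) * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)
S_kk = np.diag(XtX_inv)

# Long form: a standard normal variable divided by the square root of a
# chi-squared variable over its degrees of freedom. sigma2 appears in both.
numerator = (b - beta) / np.sqrt(sigma2 * S_kk)
denominator = np.sqrt(((n - K) * s2 / sigma2) / (n - K))
t_long = numerator / denominator

# Short form: sigma2 cancels, leaving only observable quantities.
t_short = (b - beta) / np.sqrt(s2 * S_kk)

# Usual t ratios for the hypothesis that each coefficient equals zero.
t_zero = b / np.sqrt(s2 * S_kk)
print(t_long, t_short, t_zero)
```

The long and short forms agree, which shows why the statistic is usable in practice even though the disturbance variance is unknown.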
The Population Orthogonality Conditions
Let $x$ denote the vector of independent variables in the population regression model and for the moment, based on assumption A5, the data may be stochastic or nonstochastic. Assumption A3 states that the disturbances in the population are stochastically orthogonal to the independent variables in the model; that is, $E[\varepsilon \mid x] = 0$. It follows that $\mathrm{Cov}[x, \varepsilon] = 0$. Since (by the law of iterated expectations, Theorem B.1) $E_x\{E[\varepsilon \mid x]\} = E[\varepsilon] = 0$, we may write this as The right-hand side is not a function of $y$ so the...
Info Rri
Overidentified case: In this case, the optimal weighting matrix, that is, the $W$ that produces the most efficient estimator, is $W = V^{-1}$. That is, the best weighting matrix is the inverse of the asymptotic covariance of the moment vector. THEOREM 10.6 Generalized Method of Moments Estimator: The Minimum Distance Estimator obtained by using $W = V^{-1}$ is the Generalized Method of Moments, or GMM estimator. The GMM estimator is consistent, asymptotically normally distributed, and has asymptotic covariance...
The Restricted Least Squares Estimator
A different approach to hypothesis testing focuses on the fit of the regression. Recall that the least squares vector $b$ was chosen to minimize the sum of squared deviations, $e'e$. Since $R^2$ equals $1 - e'e/(y'M^0y)$ and $y'M^0y$ is a constant that does not involve $b$, it follows that $b$ is chosen to maximize $R^2$. One might ask whether choosing some other value for the slopes of the regression leads to a significant loss of fit. For example, in the investment equation in Example 6.1, one might be interested...
Insufficient Observations
In some circumstances, the data series are not long enough to estimate one or the other of the separate regressions for a test of structural change. For example, one might surmise that consumers took a year or two to adjust to the turmoil of the two oil price shocks in 1973 and 1979, but that the market never actually fundamentally changed or that it only changed temporarily. We might consider the same test as before, but now only single out the four years 1974, 1975, 1980, and 1981 for special...
$\operatorname{plim} \frac{1}{n} X^{0\prime}X^{0} = Q^0$ (9-9)
where $Q^0$ is a positive definite matrix. To establish consistency of $b$ in the linear model, we required $\operatorname{plim}(1/n)X'\varepsilon = 0$. We will use the counterpart to this for the pseudoregressors. This is the orthogonality condition noted earlier in (5-4). In particular, note that orthogonality of the disturbances and the data is not the same condition. Finally, asymptotic normality can be established under general conditions if With these in hand, the asymptotic properties of the nonlinear least squares...
Info Nub
$\operatorname{plim}\; n\left[(X'Z(Z'Z)^{-1}Z'X)^{-1} - (X'X)^{-1}\right]$. To compare the two matrices in the brackets, we can compare their inverses. The inverse of the first is $X'Z(Z'Z)^{-1}Z'X = X'(I - M_Z)X = X'X - X'M_ZX$. Since $M_Z$ is a nonnegative definite matrix, it follows that $X'M_ZX$ is also. So, $X'Z(Z'Z)^{-1}Z'X$ equals $X'X$ minus a nonnegative definite matrix. Since $X'Z(Z'Z)^{-1}Z'X$ is smaller, in the matrix sense, than $X'X$, its inverse is larger. Under the hypothesis, the asymptotic covariance matrix of the LS estimator is never larger...
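The matrix comparison in this argument can be checked on a finite sample. The sketch below uses simulated instruments and regressors with NumPy; the names `P_Z`, `M_Z`, `A`, and `B` are illustrative, and the check concerns the sample matrices only, not the probability limits.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100

# Hypothetical instruments Z and regressors X correlated with them.
Z = rng.normal(size=(n, 3))
X = Z[:, :2] + rng.normal(size=(n, 2))

# Projection onto the columns of Z and the corresponding residual maker.
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
M_Z = np.eye(n) - P_Z

A = X.T @ P_Z @ X     # X'Z (Z'Z)^{-1} Z'X
B = X.T @ X           # X'X

# B - A should equal X'M_Z X, which is nonnegative definite.
gap = B - A
eig_gap = np.linalg.eigvalsh(gap)

# Because A is smaller than B in the matrix sense, inv(A) - inv(B)
# should also be nonnegative definite.
eig_inv_gap = np.linalg.eigvalsh(np.linalg.inv(A) - np.linalg.inv(B))

print(eig_gap, eig_inv_gap)
```

All eigenvalues printed are nonnegative up to rounding error, matching the ordering of the two matrices and of their inverses described above.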
Spherical Disturbances
The fourth assumption concerns the variances and covariances of the disturbances: $\mathrm{Var}[\varepsilon_i \mid X] = \sigma^2$, for all $i = 1, \ldots, n$, and $\mathrm{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0$, for all $i \neq j$. Constant variance is labeled homoscedasticity. Consider a model that describes the profits of firms in an industry as a function of, say, size. Even accounting for size, measured in dollar terms, the profits of large firms will exhibit greater variation than those of smaller firms. The homoscedasticity assumption would be inappropriate here. Also, survey data on household expenditure...
Partial Regression And Partial Correlation Coefficients
The use of multiple regression involves a conceptual experiment that we might not be able to carry out in practice, the ceteris paribus analysis familiar in economics. To pursue Example 2.2, a regression equation relating earnings to age and education enables us to do the conceptual experiment of comparing the earnings of two individuals of the same age with different education levels, even if the sample contains no such pair of individuals. It is this characteristic of the regression that is...



