Exercises

13.1 Show that the solution to the Yule-Walker equations (13.07) for the AR(2) process is given by equations (13.08).

13.2 Demonstrate that the first $p + 1$ Yule-Walker equations for the AR($p$) process are
$$v_0 - \sum_{i=1}^{p} \rho_i v_i = \sigma_\varepsilon^2, \qquad v_i - \sum_{j=1}^{p} \rho_j v_{|i-j|} = 0, \quad i = 1, \ldots, p. \qquad (13.95)$$
Then rewrite these equations using matrix notation.

13.3 Consider the AR(2) process for which the covariance matrix (13.09) of three consecutive observations has elements specified by equations (13.08). Show that necessary conditions for stationarity are that $\rho_1$ and $\rho_2$...


situation, the obvious way to estimate $\gamma$ is to use $\hat\gamma \equiv g(\hat\theta)$. Since $\hat\theta$ is a random variable, so is $\hat\gamma$. The problem is to estimate the variance of $\hat\gamma$. Since $\hat\gamma$ is a function of $\hat\theta$, it seems logical that $\operatorname{Var}(\hat\gamma)$ should be a function of $\operatorname{Var}(\hat\theta)$. If $g(\theta)$ is a linear or affine function, then we already know how to calculate $\operatorname{Var}(\hat\gamma)$; recall the result (3.33). The idea of the delta method is to find a linear approximation to $g(\theta)$ and then apply (3.33) to this approximation. It is frequently necessary in econometrics to...
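The linearization step can be sketched numerically. The function below is a minimal illustration, not the text's own procedure: it approximates the Jacobian of a hypothetical user-supplied function `g` by central differences at $\hat\theta$ and then applies the covariance rule for an affine function, $J\,\operatorname{Var}(\hat\theta)\,J^\top$. The step size `eps` is an assumption chosen for smooth functions of moderately scaled parameters.

```python
import numpy as np


def delta_method_cov(g, theta_hat, cov_theta, eps=1e-6):
    """Delta-method covariance of gamma_hat = g(theta_hat).

    g is linearized around theta_hat; its Jacobian J is approximated by
    central differences, and the covariance of the affine approximation
    g(theta_hat) + J (theta - theta_hat) is J Var(theta_hat) J'.
    """
    theta_hat = np.atleast_1d(np.asarray(theta_hat, dtype=float))
    cov_theta = np.atleast_2d(np.asarray(cov_theta, dtype=float))
    m = np.atleast_1d(g(theta_hat)).size
    k = theta_hat.size
    jac = np.empty((m, k))
    for j in range(k):
        step = np.zeros(k)
        step[j] = eps
        g_plus = np.atleast_1d(g(theta_hat + step))
        g_minus = np.atleast_1d(g(theta_hat - step))
        jac[:, j] = (g_plus - g_minus) / (2.0 * eps)
    return jac @ cov_theta @ jac.T


# gamma = theta^2 at theta_hat = 3 with Var(theta_hat) = 0.1:
# the derivative is 6, so the approximate variance is 36 * 0.1 = 3.6.
var_gamma = delta_method_cov(lambda t: t**2, 3.0, 0.1)
```

For a ratio such as $\theta_1/\theta_2$, the same call works with a 2-vector `theta_hat` and a $2 \times 2$ covariance matrix.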

Granger Causality

One common use of vector autoregressions is to test the hypothesis that one or more of the variables in a VAR do not Granger cause the others. The concept of Granger causality was developed by Granger (1969). Other, closely related, definitions of causality have been suggested, notably by Sims (1972). Suppose we divide the variables in a VAR into two groups, $Y_{t1}$ and $Y_{t2}$, which are row vectors of dimensions $g_1$ and $g_2$, respectively. Then we may say that $Y_{t2}$ does not Granger cause $Y_{t1}$ if the...
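In the simplest case, with a single variable in each group, the hypothesis can be tested with an F test of whether the lags of $y_2$ can be dropped from a regression of $y_1$ on a constant, its own lags, and the lags of $y_2$. The sketch below assumes scalar series, a lag length `p` chosen by the user, and OLS estimation of the single VAR equation. The helper names `lagged`, `ssr`, and `granger_f_test` are hypothetical, and the data are simulated so that $y_2$ Granger causes $y_1$ by construction.

```python
import numpy as np


def lagged(x, p):
    """Columns x_{t-1}, ..., x_{t-p} for t = p, ..., n-1 (0-based time index)."""
    n = len(x)
    return np.column_stack([x[p - i:n - i] for i in range(1, p + 1)])


def ssr(X, target):
    """Sum of squared OLS residuals from regressing target on X."""
    coef = np.linalg.lstsq(X, target, rcond=None)[0]
    resid = target - X @ coef
    return float(resid @ resid)


def granger_f_test(y1, y2, p):
    """F statistic for H0: p lags of y2 add nothing to a regression of y1 on its own p lags."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    target = y1[p:]
    const = np.ones((target.size, 1))
    x_restricted = np.hstack([const, lagged(y1, p)])            # own lags only
    x_unrestricted = np.hstack([x_restricted, lagged(y2, p)])   # add lags of y2
    rssr = ssr(x_restricted, target)
    ussr = ssr(x_unrestricted, target)
    df_num = p
    df_den = target.size - x_unrestricted.shape[1]
    f_stat = ((rssr - ussr) / df_num) / (ussr / df_den)
    return f_stat, df_num, df_den


rng = np.random.default_rng(0)
n = 500
y2 = rng.standard_normal(n)
y1 = np.zeros(n)
for t in range(1, n):
    # y2 enters the equation for y1 with one lag.
    y1[t] = 0.5 * y1[t - 1] + 0.8 * y2[t - 1] + rng.standard_normal()
f_stat, df_num, df_den = granger_f_test(y1, y2, p=2)
```

The statistic would be compared with the $F(p, \text{df\_den})$ distribution. With vector-valued groups, every equation for $Y_{t1}$ would need to be considered jointly.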

Estimation of AR Models

We have already studied a variety of ways of estimating the model (13.32) when $u_t$ follows an AR(1) process. In Chapter 7, we discussed three estimation methods. The first was estimation by a nonlinear regression, in which the first observation is dropped from the sample. The second was estimation by feasible GLS, possibly iterated, in which the first observation can be taken into account. The third was estimation by the GNR that corresponds to the nonlinear regression with an extra artificial...

Asymmetric Bootstrap Confidence Intervals

Let us denote by $\hat F^*$ the EDF of the $B$ bootstrap statistics $\tau^*_j$. For given $\theta_0$, the bootstrap P value is computed from (5.10) using $\hat F^*$. If this P value is greater than or equal to $\alpha$, then $\theta_0$ belongs to the $1 - \alpha$ confidence interval. If $\hat F^*$ were the CDF of a continuous distribution, we could express the confidence interval in terms of the quantiles of this distribution, just as in (5.13). In the limit as $B \to \infty$, the limiting distribution of the $\tau^*_j$, which we call the ideal bootstrap distribution, is usually continuous, and...
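Expressed through quantiles of $\hat F^*$, an equal-tail interval inverts the studentized statistic: if $c^*_{\alpha/2}$ and $c^*_{1-\alpha/2}$ are the lower and upper quantiles of the $\tau^*_j$, the interval is $[\hat\theta - s_\theta c^*_{1-\alpha/2},\; \hat\theta - s_\theta c^*_{\alpha/2}]$. It is asymmetric whenever the bootstrap distribution is. The sketch below assumes that the $\tau^*_j$ have already been computed as $(\theta^*_j - \hat\theta)/s^*_j$ and that $B$ is chosen so that $(B+1)\alpha/2$ is an integer, as with $B = 999$ and $\alpha = 0.05$. The function name is hypothetical.

```python
import numpy as np


def percentile_t_interval(theta_hat, se_hat, tau_star, alpha=0.05):
    """Equal-tail bootstrap interval from B studentized bootstrap statistics."""
    tau_star = np.sort(np.asarray(tau_star, dtype=float))
    B = tau_star.size
    # Order statistics at positions (B+1)*alpha/2 and (B+1)*(1-alpha/2)
    # serve as the alpha/2 and 1-alpha/2 quantiles of the EDF of tau*.
    lo_idx = int(round((B + 1) * alpha / 2)) - 1
    hi_idx = int(round((B + 1) * (1 - alpha / 2))) - 1
    c_lo, c_hi = tau_star[lo_idx], tau_star[hi_idx]
    # The upper quantile determines the lower limit, and vice versa.
    return theta_hat - se_hat * c_hi, theta_hat - se_hat * c_lo


# A right-skewed set of bootstrap statistics gives an interval that
# extends farther below theta_hat than above it.
lower, upper = percentile_t_interval(0.0, 1.0, np.linspace(-1.0, 5.0, 999))
```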

Random Walks and Unit Roots

The asymptotic results we have developed so far depend on various regularity conditions that are violated if nonstationary time series are included in the set of variables in a model. In such cases, specialized econometric methods must be employed that are strikingly different from those we have studied so far. The fundamental building block for...

¹ In the literature, such series are usually described as being integrated of order one, but this usage strikes us as being needlessly ungrammatical.
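A quick simulation shows why such series violate the usual regularity conditions. This is a sketch that assumes the standard random walk $w_t = w_{t-1} + \varepsilon_t$ with $w_0 = 0$ and $\varepsilon_t \sim \mathrm{NID}(0,1)$, because the text's own definition is cut off. Under these assumptions, $\operatorname{Var}(w_t) = t$, so the variance grows without bound instead of settling down. The function name is hypothetical.

```python
import numpy as np


def simulate_random_walks(n_steps, n_reps, rng):
    """Each row is one path w_1, ..., w_n with w_t = w_{t-1} + e_t, w_0 = 0, e_t ~ N(0, 1)."""
    innovations = rng.standard_normal((n_reps, n_steps))
    return np.cumsum(innovations, axis=1)


rng = np.random.default_rng(42)
paths = simulate_random_walks(100, 20_000, rng)
# Cross-replication variance at each date t; it should be close to t.
var_by_t = paths.var(axis=0)
```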

$$(Z^\top Z)^{-1}Z^\top y = (Z^\top Z)^{-1}Z^\top P_X y + (Z^\top Z)^{-1}Z^\top M_X y. \qquad (15.49)$$

The leading factor $(Z^\top Z)^{-1}$ has no effect on the test, because it is just a square matrix of full rank. Since some columns of $Z$ generally lie in $S(X)$, some of the rows of the matrix $Z^\top M_X$ usually are identically zero. Thus, as before, we let $Z'$ denote the remaining columns of $Z$. Then what we really want to test is whether the plim of the vector $n^{-1}Z'^\top M_X y$ is zero. This calls for a conditional moment test. Since the model $H_1$ is linear, such a test can be implemented without an explicit GNR...

Same-Order Notation

Before we can discuss models in which one or more of the regressors has a unit root, it is necessary to introduce the concept of the same-order relation and its associated notation. Almost all of the quantities that we encounter in econometrics depend on the sample size. In many cases, when we are using asymptotic theory, the only thing about these quantities that concerns us is the rate at which they change as the sample size changes. The same-order relation provides a very convenient way to...
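As a small illustration of rates: $\sum_{t=1}^n t = n(n+1)/2$ and $\sum_{t=1}^n t^2 = n(n+1)(2n+1)/6$. After division by $n^2$ and $n^3$ respectively, they approach the finite, nonzero limits $1/2$ and $1/3$, so the first sum is of the same order as $n^2$ and the second of the same order as $n^3$. The hypothetical helper below checks this numerically for a given `n`.

```python
def scaled_sums(n):
    """Return n^{-2} * sum(t) and n^{-3} * sum(t^2) for t = 1, ..., n."""
    s1 = sum(range(1, n + 1))
    s2 = sum(t * t for t in range(1, n + 1))
    return s1 / n**2, s2 / n**3


# Both ratios stay bounded and away from zero as n grows.
ratio_linear, ratio_square = scaled_sums(10_000)
```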

$$\hat\beta_1 = \frac{x_1^\top M_2 y}{x_1^\top M_2 x_1},$$

and, by a calculation similar to that leading to (3.28), its variance is
$$\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{x_1^\top M_2 x_1} = \frac{\sigma^2}{\|M_2 x_1\|^2}. \qquad (3.31)$$
Thus $\operatorname{Var}(\hat\beta_1)$ is equal to the variance of the error terms divided by the squared length of the vector $M_2x_1$. The intuition behind (3.31) is simple. How much information the sample gives us about $\beta_1$ is proportional to the squared Euclidean length of the vector $M_2x_1$, which is the denominator of the right-hand side of (3.31). When $\|M_2x_1\|$ is big, either because $n$ is large or because at least some elements of $M_2x_1$ are large, $\hat\beta_1$ will...
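Result (3.31) can be checked numerically against the $(1,1)$ element of $\sigma^2(X^\top X)^{-1}$ for the full regressor matrix $X = [x_1 \;\; X_2]$. The sketch below assumes an arbitrary simulated design, a constant in $X_2$, and a known error variance `sigma2`. All of these are illustrative choices, not data from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 50, 2.0
x1 = rng.standard_normal(n)
X2 = np.column_stack([np.ones(n), rng.standard_normal(n)])
X = np.column_stack([x1, X2])

# M2 = I - X2 (X2'X2)^{-1} X2' annihilates the columns of X2.
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
m2x1 = M2 @ x1

var_fwl = sigma2 / np.sum(m2x1**2)                  # sigma^2 / ||M2 x1||^2
var_full = sigma2 * np.linalg.inv(X.T @ X)[0, 0]    # (1,1) element of sigma^2 (X'X)^{-1}
```

The two numbers agree, which is the FWL Theorem at work: only the part of $x_1$ orthogonal to $S(X_2)$ carries information about $\beta_1$.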

Exercises

3.1 Generate a sample of size 25 from the model (3.11), with $\beta_1 = 1$ and $\beta_2 = 0.8$. For simplicity, assume that $y_0 = 0$ and that the $u_t$ are NID(0, 1). Use this sample to compute the OLS estimates $\hat\beta_1$ and $\hat\beta_2$. Repeat at least 100 times, and find the averages of the $\hat\beta_1$ and the $\hat\beta_2$. Use these averages to estimate the bias of the OLS estimators of $\beta_1$ and $\beta_2$. Repeat this exercise for sample sizes of 50, 100, and 200. What happens to the bias of $\hat\beta_1$ and $\hat\beta_2$ as the sample size is increased?

3.2 Consider a sequence of...

Nonlinear Regression

Up to this point, we have discussed only linear regression models. For each observation $t$ of any regression model, there is an information set $\Omega_t$ and a suitably chosen vector $X_t$ of explanatory variables that belong to $\Omega_t$. A linear regression model consists of all DGPs for which the expectation of the dependent variable $y_t$ conditional on $\Omega_t$ can be expressed as a linear combination $X_t\beta$ of the components of $X_t$, and for which the error terms satisfy suitable requirements, such as being IID. Since,...

Basic Concepts of Maximum Likelihood Estimation

Models that are estimated by maximum likelihood must be fully specified parametric models, in the sense of Section 1.3. For such a model, once the parameter values are known, all necessary information is available to simulate the dependent variable(s). In Section 1.2, we introduced the concept of the probability density function, or PDF, of a scalar random variable and of the joint density function, or joint PDF, of a set of random variables. If we can simulate the dependent variable, this...

Probability Distributions

We may now make explicit the general rules that must be obeyed by probability distributions in assigning probabilities to events. There are just three of these rules: (i) all probabilities lie between 0 and 1; (ii) the null set is assigned probability 0, and the full set of possibilities is assigned probability 1; (iii) the probability assigned to an event that is the union of two disjoint events is the sum of the probabilities assigned to those disjoint events. We will not often need to make explicit...

Correlation Between Error Terms and Regressors

We now briefly discuss two common situations in which the error terms will be correlated with the regressors and will therefore not have mean zero conditional on them. The first one, usually referred to by the name errors in variables, occurs whenever the independent variables in a regression model are measured with error. The second situation, often simply referred to as simultaneity, occurs whenever two or more endogenous variables are jointly determined by a system of simultaneous equations.
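The errors-in-variables case can be illustrated by simulation. The sketch below assumes a simple setup, not taken from the text: a true regressor $x^*_t \sim N(0,1)$, an observed regressor $x_t = x^*_t + v_t$ with $v_t \sim N(0,1)$, and $y_t = \beta x^*_t + u_t$ with $\beta = 1$. Because the error in the regression of $y_t$ on $x_t$ is $u_t - \beta v_t$, which is correlated with $x_t$, the OLS slope converges to $\beta\,\sigma_x^2/(\sigma_x^2 + \sigma_v^2) = 0.5$ rather than to 1.

```python
import numpy as np


def ols_slope(x, y):
    """Slope coefficient from regressing y on a constant and x."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sum(xc * xc)


rng = np.random.default_rng(7)
n, beta = 200_000, 1.0
x_true = rng.standard_normal(n)                 # sigma_x^2 = 1
x_obs = x_true + rng.standard_normal(n)         # measurement error, sigma_v^2 = 1
y = beta * x_true + rng.standard_normal(n)
b_hat = ols_slope(x_obs, y)                     # close to 0.5, not 1
```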

Exercises

6.1 Let the expectation of a random variable $Y$ conditional on a set of other random variables $X_1, \ldots, X_k$ be the deterministic function $h(X_1, \ldots, X_k)$ of the conditioning variables. Let $\Omega$ be the information set consisting of all deterministic functions of the $X_i$, $i = 1, \ldots, k$. Show that $E(Y \mid \Omega) = h(X_1, \ldots, X_k)$. Hint: Use the Law of Iterated Expectations for $\Omega$ and the information set defined by the $X_i$.

6.2 Consider a model similar to (3.20), but with error terms that are normally distributed, for $t = 1, 2, \ldots, n$. If the...

$$AA^{-1} = A^{-1}A = I$$

If $A$ is symmetric, then so is $A^{-1}$. If $A$ is triangular, then so is $A^{-1}$. Except in certain special cases, it is not easy to calculate the inverse of a matrix by hand. One such special case is that of a diagonal matrix, say $D$, with typical diagonal element $D_{ii}$. It is easy to verify that $D^{-1}$ is also a diagonal matrix, with typical diagonal element $D_{ii}^{-1}$. If an $n \times n$ square matrix $A$ is invertible, then its rank is $n$. Such a matrix is said to have full rank. If a square matrix does not have full rank,...
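The diagonal case is easy to verify numerically. In this sketch, with arbitrary nonzero diagonal entries chosen for illustration, inverting each diagonal element produces a matrix that reproduces the identity and matches a general-purpose inverse.

```python
import numpy as np

d = np.array([2.0, -4.0, 0.5])   # nonzero diagonal elements D_ii
D = np.diag(d)
D_inv = np.diag(1.0 / d)         # typical diagonal element D_ii^{-1}
```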

Weighted Least Squares

It is particularly easy to obtain GLS estimates when the error terms are heteroskedastic but uncorrelated. This implies that the matrix $\Omega$ is diagonal. Let $\sigma_t^2$ denote the $t$th diagonal element of $\Omega$. Then $\Omega^{-1}$ is a diagonal matrix with $t$th diagonal element $\sigma_t^{-2}$, and $\Psi$ can be chosen as the diagonal matrix with $t$th diagonal element $\sigma_t^{-1}$. Thus we see that, for a typical observation, regression (7.03) can be written as
$$\sigma_t^{-1} y_t = \sigma_t^{-1} X_t \beta + \sigma_t^{-1} u_t. \qquad (7.12)$$
This regression is to be estimated by OLS. The regressand and...
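Running OLS on the weighted regression (7.12) gives exactly the GLS estimator $(X^\top\Omega^{-1}X)^{-1}X^\top\Omega^{-1}y$ with diagonal $\Omega$. The sketch below checks this equivalence. It assumes that the $\sigma_t$ are known, which is rarely true in practice, and uses simulated data chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
sigma = np.exp(0.5 * rng.standard_normal(n))   # assumed known sigma_t
y = X @ np.array([1.0, 2.0]) + sigma * rng.standard_normal(n)

# Weighted least squares: divide observation t by sigma_t, then run OLS.
Xw, yw = X / sigma[:, None], y / sigma
beta_wls = np.linalg.lstsq(Xw, yw, rcond=None)[0]

# GLS formula with the diagonal matrix Omega^{-1}.
omega_inv = np.diag(1.0 / sigma**2)
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)
```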

Linear Independence

In order to define the OLS estimator by the formula (1.46), it is necessary to assume that the $k \times k$ square matrix $X^\top X$ is invertible, or nonsingular. Equivalently, as we saw in Section 1.4, we may say that $X^\top X$ has full rank. This condition is equivalent to the condition that the columns of $X$ should be linearly independent. This is a very important concept for econometrics. Note that the meaning of linear independence is quite different from the meaning of statistical independence, which we...

$X^\top X$ Is Invertible If and Only If the Columns of $X$ Are Linearly Independent

But (2.14) and (2.15) cannot both be true, and so $(X^\top X)^{-1}$ cannot exist. Thus a necessary condition for the existence of $(X^\top X)^{-1}$ is that the columns of $X$ should be linearly independent. With a little more work, it can be shown that this condition is also sufficient, and so, if the regressors $x_1, \ldots, x_k$ are linearly independent, $X^\top X$ is invertible. If the $k$ columns of $X$ are not linearly independent, then they will span a subspace of dimension less than $k$, say $k'$, where $k'$ is the largest number of columns...

Efficiency of the OLS Estimator

One of the reasons for the popularity of ordinary least squares is that, under certain conditions, the OLS estimator can be shown to be more efficient than many competing estimators. One estimator is said to be more efficient than another if, on average, the former yields more accurate estimates than the latter. The reason for the terminology is that an estimator which yields more accurate estimates can be thought of as utilizing the information available in the sample more efficiently. For a...

The t Test with Predetermined Regressors

If we relax the assumption of exogenous regressors, the analysis becomes more complicated. Readers not interested in the algebraic details may well wish to skip to the next section, since what follows is not essential for understanding the rest of this chapter. However, this subsection provides an excellent example of how asymptotic theory works, and it illustrates clearly just why we can relax some assumptions but not others. We begin by applying a CLT to the $k$-vector $v \equiv n^{-1/2}X^\top u = n^{-1/2}\sum_{t=1}^{n} u_t X_t^\top$...

$$A^\top A = (A^\top A)^\top \quad\text{and}\quad AA^\top = (AA^\top)^\top$$

It is frequently necessary to multiply a matrix, say $B$, by a scalar, say $a$. Multiplication by a scalar works exactly the way one would expect: every element of $B$ is multiplied by $a$. Since multiplication by a scalar is commutative, we can write this either as $aB$ or as $Ba$, but $aB$ is the more common notation. Occasionally, it is necessary to multiply two matrices together element by element. The result is called the direct product of the two matrices. The direct product of A and B is denoted A B,...

The Multivariate Normal Distribution


The results of the previous subsection can be extended to linear combinations of normal random variables that are not necessarily independent. In order to do so, we introduce the multivariate normal distribution. As the name suggests, this is a family of distributions for random vectors, with the scalar normal distributions being special cases of it. The pair of random variables 31 and w considered above follow the bivariate normal distribution, another special case of the multivariate normal...
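One standard way to generate a multivariate normal vector, assumed here rather than taken from the truncated text, is to start from independent standard normals $z$ and form $x = \mu + Az$, where $AA^\top = \Omega$. Then $x \sim N(\mu, \Omega)$. The sketch uses the lower-triangular Cholesky factor for $A$. The particular $\mu$ and $\Omega$ are arbitrary illustrative values.

```python
import numpy as np

mu = np.array([1.0, -2.0])
omega = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.linalg.cholesky(omega)            # lower triangular, A @ A.T == omega

rng = np.random.default_rng(11)
z = rng.standard_normal((100_000, 2))    # rows of independent N(0, 1) pairs
x = mu + z @ A.T                         # each row is mu + A z
```

Any $A$ with $AA^\top = \Omega$ would do. Cholesky is simply a convenient and cheap choice for a positive definite $\Omega$.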

Testing Common Factor Restrictions

Any of the techniques discussed in Sections 6.7 and 6.8 can be used to test common factor restrictions. In practice, if the error terms are believed to be homoskedastic, the easiest approach is probably to use an asymptotic F test. For the example of equations (7.72) and (7.73), the restricted sum of squared residuals, RSSR, is obtained from NLS estimation of $H_1$, and the unrestricted one, USSR, is obtained from OLS estimation of $H_2$. Then the test statistic is
$$\frac{(\mathrm{RSSR} - \mathrm{USSR})/r}{\mathrm{USSR}/(n - k - r - 2)} \overset{a}{\sim} F(r, n - k - r - 2), \qquad (7.79)$$
where $r$ is the...
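The statistic itself is simple arithmetic once the two sums of squared residuals are available. This minimal sketch, with a hypothetical function name, takes RSSR, USSR, the number of restrictions `r`, and the residual degrees of freedom of the unrestricted regression as inputs. The caller must supply the degrees of freedom appropriate to the model being tested.

```python
def asymptotic_f_stat(rssr, ussr, r, df_resid):
    """F = [(RSSR - USSR) / r] / [USSR / df_resid], compared with F(r, df_resid)."""
    if rssr < ussr:
        raise ValueError("restricted SSR cannot be smaller than unrestricted SSR")
    return ((rssr - ussr) / r) / (ussr / df_resid)


# Example: RSSR = 120, USSR = 100, r = 2, 50 residual degrees of freedom.
f_example = asymptotic_f_stat(120.0, 100.0, 2, 50)   # (20/2) / (100/50) = 5.0
```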

The 3-Month Treasury Bill Rate, $R_t$, for Canada, 1967:1 to 1998:4

7.1 Using the fact that $E(uu^\top \mid X) = \Omega$ for regression (7.01), show directly, without appeal to standard OLS results, that the covariance matrix of the GLS estimator $\hat\beta_{\mathrm{GLS}}$ is given by (7.05).

7.2 Show that the matrix (7.11), reproduced here for easy reference, is positive semidefinite. As in Section 6.2, this may be done by showing that this matrix can be expressed in the form $Z^\top M Z$, for some $n \times k$ matrix $Z$ and some $n \times n$ orthogonal projection matrix $M$. It is helpful to express $\Omega^{-1}$ as $\Psi\Psi^\top$, as in (7.02).

7.3 Using the data in...

Discrete and Continuous Random Variables

The easiest sort of probability distribution to consider arises when $X$ is a discrete random variable, which can take on a finite, or perhaps a countably infinite, number of values, which we may denote as $x_1, x_2, \ldots$. The probability distribution simply assigns probabilities, that is, numbers between 0 and 1, to each of these values, in such a way that the probabilities sum to 1:
$$\sum_{i} p(x_i) = 1,$$
where $p(x_i)$ is the probability assigned to $x_i$. Any assignment of nonnegative probabilities that sum to one automatically...

The AR(1) Process

One of the simplest and most commonly used stochastic processes is the first-order autoregressive process, or AR(1) process. We have already encountered regression models with error terms that follow such a process in Sections 6.1 and 6.6. Recall from (6.04) that the AR(1) process can be written as
$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{IID}(0, \sigma_\varepsilon^2), \qquad |\rho| < 1. \qquad (7.29)$$
The error at time $t$ is equal to some fraction $\rho$ of the error at time $t - 1$, with the sign changed if $\rho < 0$, plus the innovation $\varepsilon_t$. Since it is assumed that $\varepsilon_t$ is...
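For a stationary AR(1) process (7.29), the standard results are $\operatorname{Var}(u_t) = \sigma_\varepsilon^2/(1-\rho^2)$ and $\operatorname{Cov}(u_t, u_{t-j}) = \rho^{j}\operatorname{Var}(u_t)$. The sketch below builds the implied covariance matrix of $n$ consecutive observations under these assumptions. The function name is hypothetical.

```python
import numpy as np


def ar1_covariance(n, rho, sigma_eps2=1.0):
    """Covariance matrix of n consecutive observations of a stationary AR(1) process."""
    if abs(rho) >= 1:
        raise ValueError("stationarity requires |rho| < 1")
    # |t - s| for every pair of dates t, s.
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return sigma_eps2 / (1.0 - rho**2) * rho ** lags


omega = ar1_covariance(4, 0.5, sigma_eps2=2.0)
```

The diagonal satisfies $v_0 = \rho^2 v_0 + \sigma_\varepsilon^2$, which is the variance of both sides of (7.29) when $u_{t-1}$ and $\varepsilon_t$ are uncorrelated.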

Moving Average Processes

Autoregressive processes are not the only way to model stationary time series. Another type of stochastic process is the moving average, or MA, process. The simplest of these is the first-order moving average, or MA(1), process
$$u_t = \varepsilon_t + \alpha_1 \varepsilon_{t-1}, \qquad \varepsilon_t \sim \mathrm{IID}(0, \sigma_\varepsilon^2), \qquad (7.37)$$
in which the error term $u_t$ is a weighted average of two successive innovations, $\varepsilon_t$ and $\varepsilon_{t-1}$. It is not difficult to calculate the covariance matrix for an MA(1)...

* For a complex number $a + bi$, with $a$ and $b$ real, the absolute value is $(a^2 + b^2)^{1/2}$.
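From (7.37), with IID innovations, $\operatorname{Var}(u_t) = \sigma_\varepsilon^2(1 + \alpha_1^2)$, $\operatorname{Cov}(u_t, u_{t-1}) = \sigma_\varepsilon^2\alpha_1$, and all covariances at lags of two or more are zero, so the covariance matrix is tridiagonal. This sketch, with a hypothetical function name, builds that matrix.

```python
import numpy as np


def ma1_covariance(n, alpha1, sigma_eps2=1.0):
    """Covariance matrix of n consecutive observations of u_t = e_t + alpha1 * e_{t-1}."""
    omega = np.zeros((n, n))
    idx = np.arange(n)
    omega[idx, idx] = sigma_eps2 * (1.0 + alpha1**2)      # variances
    omega[idx[:-1], idx[1:]] = sigma_eps2 * alpha1        # first superdiagonal
    omega[idx[1:], idx[:-1]] = sigma_eps2 * alpha1        # first subdiagonal
    return omega


omega_ma = ma1_covariance(4, 0.5, sigma_eps2=2.0)
```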

Exercises

2.1 Consider two vectors $x$ and $y$ in $E^2$. Let $x = [x_1 \;\; x_2]^\top$ and $y = [y_1 \;\; y_2]^\top$. Show trigonometrically that $x^\top y \equiv x_1y_1 + x_2y_2$ is equal to $\|x\|\,\|y\|\cos\theta$, where $\theta$ is the angle between $x$ and $y$.

2.2 A vector in $E^n$ can be normalized by multiplying it by the reciprocal of its norm. Show that, for any $x \in E^n$ with $x \neq 0$, the norm of $x/\|x\|$ is 1. Now consider two vectors $x, y \in E^n$. Compute the norm of the sum and of the difference of $x$ normalized and $y$ normalized, that is, of
$$\frac{x}{\|x\|} + \frac{y}{\|y\|} \quad\text{and}\quad \frac{x}{\|x\|} - \frac{y}{\|y\|}.$$
By using the fact that the norm of any nonzero vector is...

Asymmetric Confidence Intervals

The confidence interval (5.06), which is the same as the interval (5.08), is a symmetric one, because $\theta_l$ is as far below $\hat\theta$ as $\theta_u$ is above it. Although many confidence intervals are symmetric, not all of them share this property. The symmetry of (5.06) is a consequence of the symmetry of the standard normal distribution and of the form of the test statistic (5.04). It is possible to construct confidence intervals based on two-tailed tests even when the distribution of the test statistic is not...

Conditional Expectations

Whenever we can describe the distribution of a random variable, $X_1$, conditional on another, $X_2$, either by a conditional CDF or a conditional PDF, we can consider the conditional expectation or conditional mean of $X_1$. If it exists, this conditional expectation is just the ordinary expectation computed using the conditional distribution. If $x_2$ is a possible value for $X_2$, then this conditional expectation is written as $E(X_1 \mid x_2)$. For a given value $x_2$, the conditional expectation $E(X_1 \mid x_2)$ is, like...

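$$F_{\beta_2} = \frac{(\hat\beta_2 - \beta_2^0)^\top X_2^\top M_1 X_2 (\hat\beta_2 - \beta_2^0)/k_2}{y^\top M_X y/(n - k)}, \qquad (5.26)$$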

where $k = k_1 + k_2$; see Exercise 5.8. When multiplied by $k_2$, this F statistic is in the form of (5.18). For the purposes of inference on $\beta_2$, regression (5.24) is, by the FWL Theorem, equivalent to the regression
$$M_1 y = M_1 X_2 \beta_2 + \text{residuals}.$$
Thus $\operatorname{Var}(\hat\beta_2)$ is equal to $\sigma^2 (X_2^\top M_1 X_2)^{-1}$. Since the denominator of (5.26) is just $s^2$, the OLS estimate of the error variance from running regression (5.24), $k_2$ times the F statistic (5.26) can be written in the form of (5.18), with
$$\widehat{\operatorname{Var}}(\hat\beta_2) = s^2 (X_2^\top M_1 X_2)^{-1}$$
providing a consistent estimator of the variance of $\hat\beta_2$; compare (3.50)....

$$x^\top B^\top B x = (Bx)^\top Bx = \|Bx\|^2 \ge 0. \qquad (3.25)$$

This result can hold with equality only if $Bx = 0$. But, in that case, since $x \neq 0$, the columns of $B$ are linearly dependent. We express this circumstance by saying that $B$ does not have full column rank. Note that $B$ can have full rank but not full column rank if $B$ has fewer rows than columns, in which case the maximum possible rank equals the number of rows. However, a matrix with full column rank necessarily also has full rank. When $B$ does have full column rank, it follows from (3.25) that $B^\top B$ is...

Exercises

4.1 Suppose that the random variable $z$ follows the $N(0,1)$ density. If $z$ is a test statistic used in a two-tailed test, the corresponding P value, according to (4.07), is $p(z) \equiv 2(1 - \Phi(|z|))$. Show that $F_p(\cdot)$, the CDF of $p(z)$, is the CDF of the uniform distribution on $[0,1]$. In other words, show that
$$F_p(x) = x \quad \text{for all } x \in [0, 1].$$

4.2 Extend Exercise 1.6 to show that the third and fourth moments of the standard normal distribution are 0 and 3, respectively. Use these results in order to calculate the centered and uncentered third and...

Exercises

1.1 Consider a sample of $n$ observations, $y_1, y_2, \ldots, y_n$, on some random variable $Y$. The empirical distribution function, or EDF, of this sample is a discrete distribution with $n$ possible points. These points are just the $n$ observed points, $y_1, y_2, \ldots, y_n$. Each point is assigned the same probability, which is just $1/n$, in order to ensure that all the probabilities sum to 1. Compute the expectation of the discrete distribution characterized by the EDF, and show that it is equal to the sample mean, that...

$$X^\top X \beta = X^\top y. \qquad (1.45)$$

To find the estimator that solves (1.45), we simply multiply it by the inverse of the matrix $X^\top X$, assuming that this inverse exists. This yields the famous formula
$$\hat\beta = (X^\top X)^{-1}X^\top y. \qquad (1.46)$$
The estimator $\hat\beta$ given by this formula is generally called the ordinary least squares, or OLS, estimator for the linear regression model. Why it is called this, rather than the MM estimator, will be explained shortly.
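In numerical work, the OLS formula (1.46) is usually evaluated by solving the normal equations rather than by forming $(X^\top X)^{-1}$ explicitly. This is a design choice for numerical accuracy, and both routes give the same estimator when $X^\top X$ is invertible. The sketch below assumes that the columns of `X` are linearly independent. The function name `ols` is hypothetical.

```python
import numpy as np


def ols(X, y):
    """OLS estimator: solve the normal equations X'X beta = X'y.

    Assumes the columns of X are linearly independent, so X'X is invertible.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)


# Data lying exactly on the line y = 1 + 2x recover beta = (1, 2).
X_demo = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y_demo = np.array([1.0, 3.0, 5.0])
beta_demo = ols(X_demo, y_demo)
```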

Misspecification of Linear Regression Models

Up to this point, we have assumed that the DGP belongs to the model that is being estimated, or, in other words, that the model is correctly specified. This is obviously a very strong assumption indeed. It is therefore important to know something about the statistical properties of $\hat\beta$ when the model is not correctly specified. In this section, we consider a simple case of misspecification, namely, underspecification. In order to understand underspecification better, we begin by discussing its...
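A standard consequence of underspecification is this: if the DGP is $y = X_1\beta_1 + X_2\beta_2 + u$ with $E(u \mid X) = 0$, but $y$ is regressed on $X_1$ alone, then $E(\hat\beta_1 \mid X) = \beta_1 + (X_1^\top X_1)^{-1}X_1^\top X_2\beta_2$. The sketch below checks that algebra by using a noise-free $y$, so that the estimate equals its expectation exactly. The design, the correlation between $X_1$ and the omitted $X_2$, and the parameter values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
X1 = np.column_stack([np.ones(n), rng.standard_normal(n)])
# Omitted regressor, correlated with the second column of X1.
X2 = (0.7 * X1[:, 1] + rng.standard_normal(n))[:, None]
beta1, beta2 = np.array([1.0, 2.0]), np.array([1.5])

# With u = 0, the underspecified OLS estimate equals its conditional expectation.
y = X1 @ beta1 + X2 @ beta2
beta1_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)
bias = np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta2)   # (X1'X1)^{-1} X1'X2 beta2
```

The bias vanishes only if $X_1^\top X_2 \beta_2 = 0$, for example when the omitted regressors are orthogonal to the included ones.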

$$P_X y = X(X^\top X)^{-1}X^\top y$$

Since this takes the form $Xb$ for $b = \hat\beta$, it is a linear combination of the columns of $X$, and hence it belongs to $S(X)$. From (2.20), it is easy to show that $P_X X = X$. Since any vector in $S(X)$ can be written as $Xb$ for some $b \in \mathbb{R}^k$, we see that
$$P_X Xb = Xb. \qquad (2.22)$$
We saw from (2.21) that the result of acting on any vector $y \in E^n$ with $P_X$ is a vector in $S(X)$. Thus the invariant subspace of the projection $P_X$ must be contained in $S(X)$. But, by (2.22), every vector in $S(X)$ is mapped into itself by $P_X$. Therefore, the image of $P_X$, which...
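These properties of $P_X$ are easy to confirm numerically. The sketch below uses a random $X$ with linearly independent columns, an illustrative assumption, and checks that $P_X X = X$, that $P_X$ is idempotent, that every vector $Xb$ is left unchanged, and that $M_X = I - P_X$ annihilates both $X$ and $P_X y$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 3
X = rng.standard_normal((n, k))
P = X @ np.linalg.solve(X.T @ X, X.T)   # P_X = X (X'X)^{-1} X'
M = np.eye(n) - P                       # M_X = I - P_X
y = rng.standard_normal(n)
b = rng.standard_normal(k)
```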

Multivariate Distributions

A vector-valued random variable takes on values that are vectors. It can be thought of as several scalar random variables that have a single, joint distribution. For simplicity, we will focus on the case of bivariate random variables, where the vector is of length 2. A continuous, bivariate r.v. $(X_1, X_2)$ has a distribution function
$$F(x_1, x_2) = \Pr\big((X_1 \le x_1) \cap (X_2 \le x_2)\big),$$
where $\cap$ is the symbol for set intersection. Thus $F(x_1, x_2)$ is the joint probability that both $X_1 \le x_1$ and $X_2 \le x_2$. For continuous...

The Specification of Regression Models

We now return our attention to the regression model (1.01) and revert to the notation of Section 1.1, in which $y_t$ and $X_t$ respectively denote the dependent and independent variables. The model (1.01) can be interpreted as a model for the mean of $y_t$ conditional on $X_t$. Let us assume that the error term $u_t$ has mean 0 conditional on $X_t$. Then, taking conditional expectations of both sides of (1.01), we see that
$$E(y_t \mid X_t) = \beta_1 + \beta_2 X_t + E(u_t \mid X_t) = \beta_1 + \beta_2 X_t.$$
Without the key assumption that $E(u_t \mid X_t) = 0$, the second...