USING R STUDIO ONLY
---
title: "Assignment 2"
subtitle: Applied Econometrics
fontsize: 12pt
output:
  pdf_document:
    keep_tex: yes
  html_document:
    df_print: paged
---
### Name:
### Student ID:
## Empirical Exercises 1
Using the data set **Growth** described in Empirical Exercise E4.1, but excluding the data for Malta, carry out the following exercises.
a. Construct a table that shows the sample mean, standard deviation, and minimum and maximum values for the series *Growth, TradeShare, YearsSchool, Oil, Rev_Coups, Assassinations,* and *RGDP60*. Include the appropriate units for all entries. [Hint: Some initial R code is written below. Complete the remaining part.]
```{r}
library(readxl)
library(lmtest)
library(sandwich)
growth.dat = read_excel("Growth.xlsx")
# drop Malta (row 65) from the data
growth.dat = growth.dat[-65, ]
# Calculate mean of every column except the first
g.mean = apply(growth.dat[,-1], 2, mean)
# Calculate standard deviation
# Calculate minimum
# Calculate maximum
```
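The remaining statistics can follow the same `apply` pattern as the mean. The sketch below is only an illustration: `toy` is a made-up data frame with invented values and three of the variables, standing in for `growth.dat[,-1]`, and `summary_table` is a hypothetical name.

```r
# Toy stand-in for growth.dat[, -1]; the values are invented for illustration.
toy <- data.frame(
  growth      = c(1.9, 0.4, 3.1, 2.2),
  tradeshare  = c(0.55, 0.32, 0.81, 0.47),
  yearsschool = c(4.0, 2.5, 7.1, 5.3)
)

# One row per variable, one column per statistic.
summary_table <- data.frame(
  mean = apply(toy, 2, mean),
  sd   = apply(toy, 2, sd),
  min  = apply(toy, 2, min),
  max  = apply(toy, 2, max)
)
print(round(summary_table, 3))
```

Replacing `toy` with `growth.dat[,-1]` should give the table for the real data; the units still have to be added by hand from the data description.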
b. Run a regression of *Growth* on *TradeShare, YearsSchool, Rev_Coups, Assassinations,* and *RGDP60*. What is the value of the coefficient on *Rev_Coups*? Interpret the value of this coefficient. Is it large or small in a real-world sense?
**Answer:** [Your answer will be here.]
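One way to run such a regression with heteroskedasticity-robust standard errors is sketched below, using the `lmtest` and `sandwich` packages the template already loads. The data frame `sim` is simulated with invented coefficients, and its column names follow the exercise text, which may differ from the headers in Growth.xlsx.

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()

# Simulated stand-in for growth.dat; all numbers are invented.
set.seed(1)
n <- 64
sim <- data.frame(
  TradeShare     = runif(n, 0.1, 1.2),
  YearsSchool    = runif(n, 0.2, 10),
  Rev_Coups      = runif(n, 0, 1),
  Assassinations = runif(n, 0, 2.5),
  RGDP60         = runif(n, 400, 9000)
)
sim$Growth <- 0.5 + 1.3 * sim$TradeShare + 0.6 * sim$YearsSchool -
  2.0 * sim$Rev_Coups + rnorm(n)

fit <- lm(Growth ~ TradeShare + YearsSchool + Rev_Coups + Assassinations + RGDP60,
          data = sim)

# Coefficient table with heteroskedasticity-robust (HC1) standard errors.
robust_table <- coeftest(fit, vcov. = vcovHC(fit, type = "HC1"))
print(robust_table)

# Pull out the single coefficient the question asks about.
robust_table["Rev_Coups", "Estimate"]
```

The point estimates are the same as those from `summary(fit)`; only the standard errors, t-statistics, and p-values change when the robust covariance matrix is used.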
c. Use the regression to predict the average annual growth rate for a country that has average values for all regressors.
d. Repeat (c), but now assume that the country’s value for *TradeShare* is one standard deviation above the mean.
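Parts (c) and (d) both reduce to calling `predict()` on a one-row data frame of regressor values. A minimal sketch, assuming a simulated data set `sim` with only two regressors and invented coefficients:

```r
# Simulated stand-in data with two regressors only; values are invented.
set.seed(2)
n <- 64
sim <- data.frame(TradeShare  = runif(n, 0.1, 1.2),
                  YearsSchool = runif(n, 0.2, 10))
sim$Growth <- 0.5 + 1.3 * sim$TradeShare + 0.25 * sim$YearsSchool + rnorm(n)
fit <- lm(Growth ~ TradeShare + YearsSchool, data = sim)

# (c) A country with average values of every regressor.
at_means <- data.frame(TradeShare  = mean(sim$TradeShare),
                       YearsSchool = mean(sim$YearsSchool))
pred_c <- predict(fit, newdata = at_means)

# (d) Same country, but TradeShare one standard deviation above its mean.
plus_sd <- at_means
plus_sd$TradeShare <- plus_sd$TradeShare + sd(sim$TradeShare)
pred_d <- predict(fit, newdata = plus_sd)

# The gap equals the TradeShare coefficient times sd(TradeShare).
c(pred_c = unname(pred_c), pred_d = unname(pred_d))
```

Because the regression includes an intercept, the prediction at the regressor means coincides with the sample mean of the dependent variable, which gives a quick check on the result.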
e. Why is *Oil* omitted from the regression? What would happen if it were included?
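A useful first check for part (e) is whether *Oil* varies at all in the sample, for example by tabulating that column. The sketch below shows what `lm()` does with a regressor that is constant across observations; the data are invented, and whether *Oil* really behaves this way in Growth.xlsx is an assumption to verify against the real file.

```r
# Invented data in which Oil equals 0 for every observation.
set.seed(3)
n <- 30
sim <- data.frame(TradeShare = runif(n, 0.1, 1.2), Oil = rep(0, n))
sim$Growth <- 1 + 1.5 * sim$TradeShare + rnorm(n)

fit <- lm(Growth ~ TradeShare + Oil, data = sim)

# lm() cannot separate a constant column from the intercept,
# so it reports NA for that coefficient.
coef(fit)
```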
## Empirical Exercises 2
In the empirical exercises on earnings and height in Chapters 4 and 5, you estimated a relatively large and statistically significant effect of a worker’s height on his or her earnings. One explanation for this result is omitted variable bias: Height is correlated with an omitted factor that affects earnings. For example, Case and Paxson (2008) suggest that cognitive ability (or intelligence) is the omitted factor. The mechanism they describe is straightforward: Poor nutrition and other harmful environmental factors in utero and in early childhood have, on average, deleterious effects on both cognitive and physical development. Cognitive ability affects earnings later in life and thus is an omitted variable in the regression.
a. Suppose that the mechanism described above is correct. Explain how this leads to omitted variable bias in the OLS regression of *Earnings* on *Height*. Does the bias lead the estimated slope to be too large or too small?
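The mechanism can be illustrated with a small simulation. Everything in the sketch below is invented for illustration (the variable `ability`, the coefficients, and the noise levels); it only shows the direction of the bias when an omitted factor is positively related to both the regressor and the outcome.

```r
# Simulation with invented parameters: ability raises both height and earnings.
set.seed(4)
n <- 5000
ability  <- rnorm(n)
height   <- 67 + 1.5 * ability + rnorm(n, sd = 3)
earnings <- 20000 + 300 * height + 4000 * ability + rnorm(n, sd = 5000)

short <- lm(earnings ~ height)            # ability omitted
long  <- lm(earnings ~ height + ability)  # ability included

# The short regression attributes part of ability's effect to height.
c(short = unname(coef(short)["height"]),
  long  = unname(coef(long)["height"]))
```

In this setup the slope from the short regression is larger than the one from the long regression, because height picks up part of the effect of the omitted variable.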
If the mechanism described above is correct, the estimated effect of height on earnings should disappear if a variable measuring cognitive ability is included in the regression. Unfortunately, there isn’t a direct measure of cognitive ability in the data set, but the data set does include years of education for each individual. Because students with higher cognitive ability are more likely to attend school longer, years of education might serve as a control variable for cognitive ability; in this case, including education in the regression will eliminate, or at least attenuate, the omitted variable bias problem.
Use the years of education variable (*educ*) to construct four indicator variables for whether a worker has less than a high school diploma (*LT_HS*=1 if $educ < 12$, 0 otherwise), a high school diploma (*HS*=1 if $educ = 12$, 0 otherwise), some college (*Some_Col*=1 if $12 < educ < 16$, 0 otherwise), or a bachelor’s degree or higher (*College*=1 if $educ \ge 16$, 0 otherwise). [Hint: Complete the remaining parts of the R code.]
```{r}
# I will show how to generate LT_HS. You need to generate the other binary variables in a similar way.
library(readxl)
library(lmtest)
library(sandwich)
library(car)
h.dat = read_excel("Earnings_and_Height.xlsx")
attach(h.dat)
lt_hs = as.numeric(educ < 12)
```
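The other three indicators can be built with the same comparison-then-`as.numeric` pattern. A minimal sketch, using an invented vector `educ` in place of the column from Earnings_and_Height.xlsx; the lowercase names `hs`, `some_col`, and `college` are hypothetical and simply mirror `lt_hs`.

```r
# Invented years-of-education values standing in for h.dat$educ.
educ <- c(8, 11, 12, 12, 13, 15, 16, 18)

lt_hs    <- as.numeric(educ < 12)
hs       <- as.numeric(educ == 12)
some_col <- as.numeric(educ > 12 & educ < 16)
college  <- as.numeric(educ >= 16)

# Every worker falls in exactly one category, so the four indicators sum to 1.
stopifnot(all(lt_hs + hs + some_col + college == 1))
cbind(educ, lt_hs, hs, some_col, college)
```

The sum-to-one check is worth keeping in mind when deciding how many of the four indicators can enter a regression that also has an intercept.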
b. Focusing first on women only, run a regression of (1) *Earnings* on *Height* and (2) *Earnings* on *Height*, including *LT_HS, HS,* and *Some_Col* as control variables.
(i) Compare the estimated coefficient on *Height* in regressions (1) and (2). Is there a large change in the coefficient? Has it changed in a way consistent with the cognitive ability explanation? Explain.
(ii) The regression omits the control variable *College*. Why?
(iii) Test the joint null hypothesis that the coefficients on the education variables are equal to 0.
(iv) Discuss the values of the estimated coefficients on *LT_HS*, *HS*, and *Some_Col*. (Each of the estimated coefficients is negative, and the coefficient on *LT_HS* is more negative than the coefficient on *HS*, which in turn is more negative than the coefficient on *Some_Col*. Why? What do the coefficients measure?)
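For the joint test in (iii), one option is `linearHypothesis()` from the `car` package combined with a robust covariance matrix from `sandwich`, both of which the template loads. The sketch below runs on simulated data with invented coefficients; the data frame `sim` and its column names are assumptions, not the contents of the real file.

```r
library(car)       # linearHypothesis()
library(sandwich)  # vcovHC()

# Simulated stand-in for the women-only sample; all numbers are invented.
set.seed(5)
n <- 500
educ <- sample(8:18, n, replace = TRUE)
sim <- data.frame(
  height   = rnorm(n, mean = 64, sd = 2.7),
  LT_HS    = as.numeric(educ < 12),
  HS       = as.numeric(educ == 12),
  Some_Col = as.numeric(educ > 12 & educ < 16)
)
sim$earnings <- 10000 + 500 * sim$height - 20000 * sim$LT_HS -
  12000 * sim$HS - 6000 * sim$Some_Col + rnorm(n, sd = 15000)

fit <- lm(earnings ~ height + LT_HS + HS + Some_Col, data = sim)

# H0: the three education coefficients are all zero (robust F-test).
joint <- linearHypothesis(fit,
                          c("LT_HS = 0", "HS = 0", "Some_Col = 0"),
                          vcov. = vcovHC(fit, type = "HC1"))
print(joint)
```

With *College* as the omitted category, each education coefficient in this setup is measured relative to workers with a bachelor’s degree or higher.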
## Empirical Exercises 3
Use the data set **cps12.xlsx** to answer the following questions.
a. Run a regression of average hourly earnings (*AHE*) on age (*Age*). What is the estimated intercept? What is the estimated slope?
b. Run a regression of *AHE* on *Age*, gender (*Female*), and education (*Bachelor*). What is the estimated effect of *Age* on earnings? Construct a 95% confidence interval for the coefficient on *Age* in the regression.
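A 95% confidence interval is the estimate plus or minus a critical value times the standard error. The sketch below computes it by hand and with `confint()` on simulated data; `sim` and its coefficients are invented, the column names follow the exercise text, and the standard errors are the homoskedasticity-only ones from `summary()`.

```r
# Simulated stand-in for cps12; all numbers are invented.
set.seed(6)
n <- 1000
sim <- data.frame(Age      = sample(25:34, n, replace = TRUE),
                  Female   = rbinom(n, 1, 0.45),
                  Bachelor = rbinom(n, 1, 0.5))
sim$AHE <- 2 + 0.5 * sim$Age - 3.5 * sim$Female + 8 * sim$Bachelor +
  rnorm(n, sd = 9)

fit <- lm(AHE ~ Age + Female + Bachelor, data = sim)

# Manual interval: estimate +/- t critical value * standard error.
est <- coef(summary(fit))["Age", "Estimate"]
se  <- coef(summary(fit))["Age", "Std. Error"]
ci_manual <- est + c(-1, 1) * qt(0.975, df = df.residual(fit)) * se

# Built-in equivalent.
ci_builtin <- confint(fit, "Age", level = 0.95)
ci_builtin
```

For a robust interval, the same formula applies with the standard error taken from a robust covariance matrix such as `vcovHC(fit, type = "HC1")`.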
c. Are the results from the regression in (b) substantively different from the results in (a) regarding the effect of *Age* on *AHE*? Does the regression in (a) seem to suffer from omitted variable bias?
d. Bob is a 26-year-old male worker with a high school diploma. Predict Bob’s earnings using the estimated regression in (b). Alexis is a 30-year-old female worker with a college degree. Predict Alexis’s earnings using the regression.
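Both predictions can come from one `predict()` call on a two-row data frame. The sketch below uses simulated data with invented coefficients, and it assumes the usual coding: `Female` is 1 for women and `Bachelor` is 1 for workers with a college degree, so a high school diploma corresponds to `Bachelor = 0`.

```r
# Simulated stand-in for cps12; all numbers are invented.
set.seed(7)
n <- 1000
sim <- data.frame(Age      = sample(25:34, n, replace = TRUE),
                  Female   = rbinom(n, 1, 0.45),
                  Bachelor = rbinom(n, 1, 0.5))
sim$AHE <- 2 + 0.5 * sim$Age - 3.5 * sim$Female + 8 * sim$Bachelor +
  rnorm(n, sd = 9)
fit <- lm(AHE ~ Age + Female + Bachelor, data = sim)

# Bob: 26, male, high school diploma. Alexis: 30, female, college degree.
workers <- data.frame(Age = c(26, 30), Female = c(0, 1), Bachelor = c(0, 1),
                      row.names = c("Bob", "Alexis"))
pred <- predict(fit, newdata = workers)
pred
```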
e. Are gender and education determinants of earnings? Test the null hypothesis that *Female* can be deleted from the regression. Test the null hypothesis that *Bachelor* can be deleted from the regression. Test the null hypothesis that both *Female* and *Bachelor* can be deleted from the regression.
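The single-variable tests are t-tests on the individual coefficients, and the joint test is an F-test comparing the regression with and without both variables. The sketch below uses base R only and simulated data with invented coefficients; it relies on homoskedasticity-only standard errors, so a robust version would need a robust covariance matrix instead.

```r
# Simulated stand-in for cps12; all numbers are invented.
set.seed(8)
n <- 1000
sim <- data.frame(Age      = sample(25:34, n, replace = TRUE),
                  Female   = rbinom(n, 1, 0.45),
                  Bachelor = rbinom(n, 1, 0.5))
sim$AHE <- 2 + 0.5 * sim$Age - 3.5 * sim$Female + 8 * sim$Bachelor +
  rnorm(n, sd = 9)

full       <- lm(AHE ~ Age + Female + Bachelor, data = sim)
restricted <- lm(AHE ~ Age, data = sim)

# Single exclusions: t-statistics and p-values from the full regression.
single <- coef(summary(full))[c("Female", "Bachelor"), ]
single

# Joint exclusion: F-test of the restricted against the unrestricted model.
joint <- anova(restricted, full)
joint
```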