It is common to hear about "control variables" in econometric models. We also often hear that a result is questionable because the specified model does not include enough control variables. What exactly are we talking about?
Control variables are simply variables that economists add to a regression to avoid bias in the estimate of the parameter of interest. For example, suppose you are interested in the effect of the euro/dollar exchange rate on economic growth in France. A regression with the exchange rate as the only explanatory variable (x) and economic growth as the explained variable (y) is likely to be misleading. Other variables that explain growth are left out of the regression, and if they are correlated with the exchange rate, the parameter linking the two variables will be estimated incorrectly. Including them (also known as "controlling for other variables") avoids this bias in the estimate of the parameter of interest, i.e., the one linking growth to the exchange rate.
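To see the problem concretely, here is a minimal Python sketch on simulated data, not real exchange-rate figures. It assumes a hypothetical data-generating process in which an omitted determinant of growth, z, is correlated with the exchange rate x. The true coefficients (0.5 and 2.0), the strength of the correlation, and the helper `ols` are illustrative choices, not taken from any actual study.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000

# Hypothetical data-generating process (illustrative only, not real data):
# z is a determinant of growth that is also correlated with the exchange rate x.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)       # "exchange rate", correlated with z
u = rng.normal(size=n)                 # error term, independent of x and z
beta_x, beta_z = 0.5, 2.0              # assumed true effects on growth
y = 1.0 + beta_x * x + beta_z * z + u  # "growth"

def ols(y, *regressors):
    """Return OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short_coef = ols(y, x)     # z omitted: its effect ends up in the error term
long_coef = ols(y, x, z)   # z included as a control variable

print(f"true effect of x:      {beta_x:.3f}")
print(f"without control for z: {short_coef[1]:.3f}")
print(f"with control for z:    {long_coef[1]:.3f}")
```

Under these assumptions, the regression without z should give an exchange-rate coefficient close to 0.5 + 2.0 × 0.8 / 1.64 ≈ 1.48, roughly three times the true effect, while adding z as a control brings the estimate back close to 0.5.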
Let's explain this in more mathematical terms: when a variable that helps explain the dependent variable is not specified in the model, it ends up in the error term (everything that explains y but is not in the x). If this variable is correlated with the explanatory variables specified in the model, the assumption that the explanatory variables are uncorrelated with the error term is violated, and ordinary least squares (OLS) needs that assumption to produce unbiased and consistent estimates. As a result, the estimated coefficients of the explanatory variables that are correlated with the omitted variable are biased.
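For the simplest case of a single omitted variable, the textbook omitted-variable-bias formula makes this precise. The notation is introduced here for illustration: z is the omitted variable, the tilde marks estimates from the regression that leaves z out, and the true model is assumed to satisfy E[u | x, z] = 0.

```latex
\begin{aligned}
\text{True model:}\quad & y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + u_i,\\
\text{Estimated model:}\quad & y_i = \tilde\beta_0 + \tilde\beta_1 x_i + v_i,
  \qquad v_i = \beta_2 z_i + u_i,\\
\text{Bias:}\quad & \mathbb{E}\big[\tilde\beta_1 \mid x, z\big] = \beta_1 + \beta_2\,\tilde\delta_1,
  \qquad \tilde\delta_1 = \frac{\sum_i (x_i - \bar x)(z_i - \bar z)}{\sum_i (x_i - \bar x)^2}.
\end{aligned}
```

Here the slope from regressing z on x is the delta-tilde term. The bias term vanishes only if z does not explain y (beta_2 = 0) or if z is uncorrelated with x in the sample (delta-tilde = 0). Otherwise the estimate is pushed up when beta_2 and the correlation have the same sign, and down when they have opposite signs.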
We can therefore see the importance of specifying, from the outset, all the relevant explanatory variables (x) that explain a variable y, even if we are only interested in the effect of one of them[1].
[1] For readers more advanced in econometrics, the Frisch-Waugh theorem (also known as the Frisch-Waugh-Lovell theorem) shows directly why including control variables matters, and why their relevance is conditional (i.e., conditional on their correlation with the explanatory variables). The theorem shows that the coefficient on one explanatory variable in a multiple regression equals the coefficient obtained by regressing the residuals of y on the residuals of that variable, both taken from regressions on the other explanatory variables. Leaving out another variable that explains y therefore biases this coefficient whenever the omitted variable is correlated with the variable of interest. If it is uncorrelated, the coefficient is unaffected. This shows the importance of including the relevant control variables in a regression.
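A minimal numerical check of the Frisch-Waugh-Lovell result, assuming simulated data: the variable names `x1` (variable of interest) and `x2` (a correlated control), the chosen coefficients, and the helpers `ols` and `residualize` are hypothetical and only serve as an illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 1_000

# Hypothetical simulated data: x2 is a control variable correlated with x1.
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)

def ols(y, X):
    """OLS coefficients of y on the columns of X (X must include a constant if one is wanted)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def residualize(v, X):
    """Residuals from regressing v on X: the part of v not explained by X."""
    return v - X @ ols(v, X)

const = np.ones(n)
Z = np.column_stack([const, x2])  # regressors to partial out: constant and control

# Coefficient on x1 in the full multiple regression.
b_full = ols(y, np.column_stack([const, x1, x2]))[1]
# Residual-on-residual slope (no constant needed: the residuals have mean zero).
b_fwl = ols(residualize(y, Z), residualize(x1, Z)[:, None])[0]
# Short regression that omits the correlated control x2.
b_short = ols(y, np.column_stack([const, x1]))[1]

print(f"full regression:   {b_full:.4f}")
print(f"FWL residual fit:  {b_fwl:.4f}")
print(f"x2 omitted:        {b_short:.4f}")
```

The first two numbers should agree up to floating-point error, as the theorem states. The third differs because x1 is not purged of its correlation with x2. Under these assumptions it should land near 0.5 + 2.0 × 0.6 / 1.36 ≈ 1.38.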
Julien Pinter