In this post, I will show how to use recursive estimation to determine how robust your panel estimates are to outliers. I first saw this procedure in Herzer and Nunnenkamp (2013). Basically, if you have a panel of, say, 160 countries, you check how the coefficient estimates and their significance change when one country, two countries, or a small group of countries is removed from the panel. Once an influential country (or group of countries) is identified, you can enrich your paper by discussing its characteristics or peculiarities. For example, what if it is a country with several military coups, a prolonged war, or the only tax haven in your sample?
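To make the idea concrete, here is a minimal Python sketch of the one-unit-at-a-time removal. It is not the R or EViews code of this post: the estimator is a plain pooled OLS slope, the names `ols_slope`, `pooled_slope` and `leave_one_out` are hypothetical, and the three-unit panel is made-up toy data in which unit "C" is deliberately an outlier.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def pooled_slope(panel, exclude=()):
    """Pooled slope over all units of the panel except those in `exclude`."""
    rows = [obs for unit, series in panel.items()
            if unit not in exclude
            for obs in series]
    return ols_slope([x for x, _ in rows], [y for _, y in rows])


def leave_one_out(panel):
    """Re-estimate the pooled slope once per unit, dropping that unit."""
    return {unit: pooled_slope(panel, exclude=(unit,)) for unit in panel}


# Toy panel (assumption, not real data): unit -> list of (x, y) observations.
panel = {
    "A": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    "B": [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)],
    "C": [(1.0, 9.0), (2.0, 5.0), (3.0, 1.0)],  # outlier with a negative slope
}

full = pooled_slope(panel)       # estimate on the full sample
results = leave_one_out(panel)   # one estimate per removed unit

for unit, slope in results.items():
    print(f"without {unit}: slope = {slope:.3f} (full sample: {full:.3f})")
```

In this toy example, the slope changes sign only when "C" is dropped, which is exactly the kind of pattern that would lead you to look more closely at that unit. In a real application you would also store the standard error or t-statistic of each re-estimation, not only the coefficient.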
In that article, only one unit was removed at a time. In this case, you can prepare charts in which the reader can see how robust your estimates are to outliers. As an anecdote, at some point during the revision stage of my publication "Statistical disclosure and economic growth: What is the nexus?", I prepared this kind of chart. However, the article had to be shortened, and I ended up removing that part. I am fairly certain that I will include this kind of chart when applying panel data methods in some of my future articles. Even if I don’t include them, I will use the procedure almost routinely in the preliminary analyses for my articles.
If you think this post may be useful to you, unfortunately not all the news is good. If you are using software that does not allow programming, you will not be able to apply this procedure. In this post, I describe how to apply it in R and in EViews. The only difficulty you will face is adapting the code to your own estimation method and data. For example, I do the demonstration in EViews using the Panel Group Mean FMOLS estimator, and in R using the Mean Group estimator.
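For readers unfamiliar with the Mean Group estimator, the sketch below shows its basic logic in Python: run a separate regression for each unit, average the unit-specific coefficients, and compute the standard error from the dispersion of those coefficients across units. This is only an illustration under simplifying assumptions (one regressor, no trends or other controls), not the R code of this post; the function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ols_slope(xs, ys):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def mean_group(panel):
    """Mean Group estimate of the slope.

    Returns (coefficient, standard error, t-statistic), where the
    coefficient is the simple average of the unit-by-unit OLS slopes and
    the standard error is their sample standard deviation over sqrt(N).
    """
    slopes = [ols_slope([x for x, _ in series], [y for _, y in series])
              for series in panel.values()]
    n_units = len(slopes)
    coef = mean(slopes)
    se = stdev(slopes) / sqrt(n_units)
    return coef, se, coef / se


# Toy panel (assumption, not real data): unit -> list of (x, y) observations.
panel = {
    "A": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    "B": [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)],
    "C": [(1.0, 1.0), (2.0, 3.5), (3.0, 6.0)],
}

coef, se, t_stat = mean_group(panel)
print(f"MG coefficient = {coef:.3f}, s.e. = {se:.3f}, t = {t_stat:.2f}")
```

Because the estimator is an average across units, removing a unit simply means dropping its series from `panel` and calling `mean_group` again, which is why it combines naturally with the removal procedure.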
For the demonstration, I use one of the very few provincial datasets for Cuba that can illustrate the procedure. Specifically, I use average salary and investment per capita over the period 1999-2010 for the fourteen provinces and the special municipality (Isla de la Juventud). I then estimate regressions with average salary as the dependent variable and investment as the regressor. Interestingly, before running the estimations I did not expect a significant coefficient for investment, because the Cuban government allocates investment through a very centralized mechanism and inefficiency in the allocation of resources is rampant. However, I got a big surprise: as you will see in the demonstration example in the code, the coefficient for investment per capita is positive and highly significant. Of course, the period is very short, and I did not perform any robustness check other than the one described in this post.
You can download the data here, the R code here and the EViews code here (the R and EViews code files are in txt format, but you can change the extensions to *.R and *.prg). If you plan to use EViews and want to remove more than one unit at a time, you will still need the R code to obtain the matrix of combinations. Both files include comments to help less experienced readers, and they contain examples for the removal of one unit (province) and of two units. As I explain in the R code, you can modify it to analyze the removal of more than two units. Although this is not mentioned in the EViews code, it can also be easily modified for the removal of more than two units at a time. As usual, I have created a YouTube video in English and another in Spanish describing all this. I strongly recommend watching the video, because there I provide important complementary information.
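As a rough illustration of the matrix of combinations mentioned above, the following Python sketch lists every set of k units to remove from a panel of fifteen units. It is not the R code of this post; the labels P1 to P15 are placeholders rather than the real province names, and `removal_sets` is a hypothetical helper.

```python
from itertools import combinations

# Placeholder labels (assumption) for the fourteen provinces plus the
# special municipality: fifteen cross-sectional units in total.
units = [f"P{i}" for i in range(1, 16)]


def removal_sets(units, k):
    """All distinct sets of k units to drop, one tuple per re-estimation."""
    return list(combinations(units, k))


singles = removal_sets(units, 1)  # one unit removed at a time
pairs = removal_sets(units, 2)    # two units removed at a time

print(len(singles), "re-estimations when removing one unit")
print(len(pairs), "re-estimations when removing two units")
```

Each tuple defines one re-estimation on the panel without those units. The number of tuples grows quickly with k, so removing three or more units at a time means many more regressions to run and summarize.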