In this post, I summarize some famous examples of selection bias in finance and economics. Hopefully they serve as a useful reminder that even when identification and estimation are sound, there are always lingering concerns about external validity.
In general, the data we use come from third-party providers who collect them according to their own criteria. This means that there are generally two types of selection bias that one may worry about:
Compustat, perhaps one of the most widely used sources of accounting and financial data, primarily covers public companies. This is an important consideration when one studies topics that may affect private and public firms differently, such as the role of financing constraints or investment behavior.
To see the sampling bias of Compustat most bluntly, consider this table from Crouzet and Mehrotra (2020), who juxtapose the Quarterly Financial Report (QFR) data with Compustat:
The last column reports the average value for the Compustat manufacturing segment, while the first four columns report the distribution of equivalent statistics for the QFR sample. The Compustat average is close to the average size of the top 1% of the QFR sample!
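To get a feel for how this kind of gap can arise mechanically, here is a minimal simulation sketch. It uses purely synthetic data, not QFR or Compustat. The lognormal size distribution, its parameters, the 1% "listed" share, and the function name `simulate_size_selection` are all assumptions made for illustration, and the rule that only the largest firms end up in the selected sample is a deliberately stark simplification of how firms become public.

```python
import random
import statistics


def simulate_size_selection(n_firms=100_000, listed_share=0.01, seed=0):
    """Compare a synthetic firm population with a size-selected subsample."""
    rng = random.Random(seed)

    # Hypothetical firm sizes: a heavy right tail, as firm size data often have.
    # The lognormal parameters are arbitrary and chosen only for illustration.
    sizes = sorted(rng.lognormvariate(0.0, 2.0) for _ in range(n_firms))

    # Assumption: the "listed" sample consists of exactly the largest firms.
    n_listed = max(1, int(n_firms * listed_share))
    listed = sizes[-n_listed:]

    return {
        "population_mean": statistics.fmean(sizes),
        "population_median": statistics.median(sizes),
        "listed_mean": statistics.fmean(listed),
        "listed_min": listed[0],
    }


result = simulate_size_selection()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the mean of the selected sample describes the far right tail of the population rather than a typical firm, which is the pattern the table shows. Any estimate that varies with firm size would be subject to the same distortion.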
So how much does this affect results? Take a look at this fascinating figure from Zwick and Mahon (2017), who estimate the effect of temporary tax incentives on investment: