Law as Data pp. 21–57
DOI: 10.37911/9781947864085.02
2. Big Data, Machine Learning, and the Credibility Revolution in Empirical Legal Studies
Authors: Ryan Copus, Harvard University; Ryan Hübert, University of California, Davis; and Hannah Laqueur, University of California, Davis
Excerpt
The so-called credibility revolution changed empirical research (see Angrist and Pischke 2010). Before the revolution, researchers frequently relied on attempts to statistically model the world to make causal inferences from observational data. They would control for confounders, make functional form assumptions about the relationships between variables, and read regression coefficients on variables of interest as causal estimates. In essence, they would rely heavily on ex post statistical analysis to make causal inferences. The revolution centered around the idea that the only way to truly account for possible sources of bias is to remove the influence of all confounders ex ante through better research design. Thus, since the revolution, researchers have attempted to design studies around sources of random or as-if random variation, either with experiments or what have become known as “quasi-experimental” designs. This credibility revolution has increasingly brought quantitative researchers into agreement that, in the words of Donald Rubin, “design trumps analysis” (Rubin 2008).
However, the research landscape has changed dramatically in recent years. We are now in an era of “big data.” At the same time as the internet vastly expanded the number of available data sources, sophisticated computational resources became widely accessible. This has opened up a whole new frontier for social scientists and empirical legal scholars: textual data. Indeed, most of the information we have about law, politics, and society is contained in texts of one kind or another, almost all of which are now digitized and available online. For example, in the 1990s, federal courts began to adopt online case records management (known as CM/ECF), where attorneys, clerks, and judges file and access documents related to each case. Using the federal government’s PACER database (available at pacer.gov), researchers (both academic and professional) can now easily access the dockets and filings for each case that is filed in a federal court. LexisNexis, Westlaw, and other companies have further improved access by providing raw text versions of a wide range of legal documents, along with expert-coded metadata to help researchers more easily find what they are looking for. And yet, despite the potential of these newly available resources, the sheer volume presents challenges for researchers. A core problem is how to draw substantively important inferences from a mountain of often unstructured digitized text. To deal with this challenge, researchers are turning their attention back toward the tools of statistical analysis. As many of the essays in this volume demonstrate, there is now a surging interest among researchers in one particularly powerful tool of statistical analysis: machine learning.
This chapter addresses the place of machine learning in a post–“credibility revolution” landscape. We begin with an overview of machine learning and then make four main points. First, design still trumps analysis. The lessons of the credibility revolution should not be forgotten in the excitement around machine learning; machine learning does nothing to address the problem of omitted variable bias. Nonetheless, machine learning can improve a researcher’s data analysis. Indeed, with growing concerns about the reliability of even design-based research, perhaps we should be aiming for triangulation rather than design purism. Further, for some questions, we do not have the luxury of waiting for a strong design, and we need a best approximation of an answer in the meantime. Second, even design-committed researchers should not ignore machine learning: it can be used in service of design-based studies to make causal estimates less variable, less biased, and more heterogeneous. Third, there are important policy-relevant prediction problems for which machine learning is particularly valuable (e.g., predicting recidivism in the criminal justice system). Yet even with research questions centered around prediction, a focus on design is still essential. As with causal inference, researchers cannot simply rely on statistical models but must also carefully consider threats to the validity of predictions. We briefly review some of these threats: GIGO (“garbage in, garbage out”), selective labels, and Campbell’s law. Fourth, the predictive power of machine learning can be leveraged for descriptive research. Where possible, we illustrate these points using examples drawn from real-world research.
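The first point, that design still trumps analysis, can be illustrated with a small simulation. The sketch below is not drawn from the chapter: the data-generating process, the function names `simulate` and `diff_in_means`, and all parameter values are assumptions made purely for illustration. An unobserved confounder `u` drives both treatment take-up and the outcome in the observational sample, while treatment is assigned by coin flip in the randomized sample. With a binary treatment and no observed covariates, the most flexible possible model of the outcome given treatment reduces to the two group means, so no learner fitted to the observational data could recover the true effect.

```python
import numpy as np


def simulate(n=20000, true_effect=1.0, seed=0):
    """Draw one observational and one randomized sample from the same population.

    Hypothetical data-generating process, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)       # unobserved confounder
    noise = rng.normal(size=n)   # idiosyncratic outcome noise

    # Observational design: units with high u are more likely to be treated.
    d_obs = (u + rng.normal(size=n) > 0).astype(float)
    # Randomized design: treatment is a coin flip, independent of u.
    d_rand = rng.integers(0, 2, size=n).astype(float)

    # The outcome depends on treatment and on the confounder in both designs.
    y_obs = true_effect * d_obs + 2.0 * u + noise
    y_rand = true_effect * d_rand + 2.0 * u + noise
    return d_obs, y_obs, d_rand, y_rand


def diff_in_means(d, y):
    """Difference in mean outcomes between treated and untreated units."""
    return y[d == 1].mean() - y[d == 0].mean()


if __name__ == "__main__":
    d_obs, y_obs, d_rand, y_rand = simulate()
    print("true effect:            1.0")
    print("observational estimate:", round(diff_in_means(d_obs, y_obs), 2))
    print("randomized estimate:   ", round(diff_in_means(d_rand, y_rand), 2))
```

Under these assumptions the observational estimate overstates the effect because treated units have systematically higher values of `u`, whereas the randomized estimate lands close to the true value. The gap comes from the design, not from the choice of estimator.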
Bibliography
Abrams, D., M. Bertrand, and S. Mullainathan. 2012. “Do Judges Vary in Their Treatment of Race?” Journal of Legal Studies 41 (2): 347–383.
Angrist, J. D., and J.-S. Pischke. 2010. “The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con out of Econometrics.” Journal of Economic Perspectives 24 (2): 3–30.
Athey, S., and G. W. Imbens. 2017. “The State of Applied Econometrics: Causality and Policy Evaluation.” Journal of Economic Perspectives 31 (2): 3–32.
Bang, H., and J. M. Robins. 2005. “Doubly Robust Estimation in Missing Data and Causal Inference Models.” Biometrics 61 (4): 962–973.
Berk, R. A., S. B. Sorenson, and G. Barnes. 2016. “Forecasting Domestic Violence: A Machine Learning Approach to Help Inform Arraignment Decisions.” Journal of Empirical Legal Studies 13 (1): 94–115.
Campbell, D. T. 1979. “Assessing the Impact of Planned Social Change.” Evaluation & Program Planning 2 (1): 67–90.
Chen, D. L., T. J. Moskowitz, and K. Shue. 2016. “Decision-Making Under the Gambler’s Fallacy: Evidence from Asylum Judges, Loan Officers, and Baseball Umpires.” Quarterly Journal of Economics 131 (3): 1181–1242.
Chilton, A. S., and M. K. Levy. 2015. “Challenging the Randomness of Panel Assignment in the Federal Courts of Appeals.” Cornell Law Review 101 (1): 1–56.
Copus, R., and R. Hübert. 2017. “Detecting Inconsistency in Governance.” Working paper, Social Science Research Network (SSRN). https://ssrn.com/abstract=2812914.
Diamond, A., and J. S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics & Statistics 95 (3): 932–945.
Drake, C. 1993. “Effects of Misspecification of the Propensity Score on Estimators of Treatment Effect.” Biometrics 49 (4): 1231–1236.
Fischman, J. B. 2014. “Measuring Inconsistency, Indeterminacy, and Error in Adjudication.” American Law & Economics Review 16 (1): 40–85.
Goel, S., J. M. Rao, and R. Shroff. 2016. “Precinct or Prejudice? Understanding Racial Disparities in New York City’s Stop-and-Frisk Policy.” Annals of Applied Statistics 10 (1): 365–394.
Grimmer, J. 2015. “We Are All Social Scientists Now: How Big Data, Machine Learning, and Causal Inference Work Together.” PS: Political Science & Politics 48 (1): 80–83.
Grus, J. 2015. Data Science from Scratch: First Principles with Python. Sebastopol, CA: O'Reilly Media.
Hastie, T., R. Tibshirani, and J. Friedman. 2008. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer.
Holland, P. W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–960.
James, G., D. Witten, T. Hastie, and R. Tibshirani. 2013. An Introduction to Statistical Learning. New York, NY: Springer.
Kleinberg, J., H. Lakkaraju, J. Leskovec, J. Ludwig, and S. Mullainathan. 2018. “Human Decisions and Machine Predictions.” The Quarterly Journal of Economics 133 (1): 237–293.
Kleinberg, J., J. Ludwig, S. Mullainathan, and Z. Obermeyer. 2015. “Prediction Policy Problems.” American Economic Review 105 (5): 491–495.
Lakkaraju, H., J. Kleinberg, J. Leskovec, J. Ludwig, and S. Mullainathan. 2017. “The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 275–284. New York, NY: Association for Computing Machinery.
LaLonde, R. J. 1986. “Evaluating the Econometric Evaluations of Training Programs with Experimental Data.” American Economic Review 76 (4): 604–620.
Laqueur, H., and R. Copus. 2016. “Synthetic Crowdsourcing: A Machine-Learning Approach to the Problems of Inconsistency and Bias in Adjudication.” Working paper, Social Science Research Network (SSRN). https://ssrn.com/abstract=2694326.
Martin, A. D., and K. M. Quinn. 2002. “Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the US Supreme Court, 1953–1999.” Political Analysis 10 (2): 134–153.
Mullainathan, S., and J. Spiess. 2017. “Machine Learning: An Applied Econometric Approach.” Journal of Economic Perspectives 31 (2): 87–106.
Nakosteen, R., and M. Zimmer. 2014. “Approval of Social Security Disability Appeals: Analysis of Judges’ Decisions.” Applied Economics 46 (23): 2783–2791.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.
Ramji-Nogales, J., A. I. Schoenholtz, and P. G. Schrag. 2007. “Refugee Roulette: Disparities in Asylum Adjudication.” Stanford Law Review 60 (2): 295–411.
Robins, J. M., A. Rotnitzky, and L. P. Zhao. 1994. “Estimation of Regression Coefficients When Some Regressors Are Not Always Observed.” Journal of the American Statistical Association 89 (427): 846–866.
Rosenbaum, P. R., and D. B. Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55.
Rubin, D. B. 2008. “For Objective Causal Inference, Design Trumps Analysis.” Annals of Applied Statistics 2 (3): 808–840.
Stith, K., and J. A. Cabranes. 1998. Fear of Judging: Sentencing Guidelines in the Federal Courts. Chicago, IL: University of Chicago Press.
Tiller, E. H., and F. B. Cross. 1999. “A Modest Proposal for Improving American Justice.” Columbia Law Review 99 (1): 215–234.
Van der Laan, M. J., E. C. Polley, and A. E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics & Molecular Biology 6 (1): 1–23.
Van der Laan, M. J., and S. Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. New York, NY: Springer Science & Business Media.
Westreich, D., J. Lessler, and M. J. Funk. 2010. “Propensity Score Estimation: Machine Learning and Classification Methods as Alternatives to Logistic Regression.” Journal of Clinical Epidemiology 63 (8): 826–833.