Law as Data pp. 313–337
DOI: 10.37911/9781947864085.11

11. Case Vectors: Spatial Representations of the Law Using Document Embeddings

Authors: Elliott Ash, ETH Zurich; and Daniel L. Chen, University of Toulouse

Excerpt

Law is an artifact of language. In this chapter, we ask what can be gained by applying to the law new techniques from natural language processing that translate words and documents into vectors within a space. These vector representations of words and documents are information-dense, in the sense of retaining information about semantic content and meaning, while also being computationally tractable. This combination of information density and computational tractability opens up a wide range of mathematical tools that can be used to generate quantitative and empirically testable hypotheses about the law.
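As a rough illustration of what it means to place documents in a vector space, the sketch below maps three toy texts to word-count vectors and compares them by cosine similarity. This is a simplified stand-in and not the chapter's method: the chapter concerns learned document embeddings, whereas this example uses plain counts, and the texts, names, and vocabulary are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy "opinions"; not drawn from any real corpus.
docs = {
    "contract_a": "the defendant breached the contract and owes damages",
    "contract_b": "the contract was breached and damages are owed",
    "criminal_c": "the jury found the defendant guilty of burglary",
}


def to_vector(text, vocabulary):
    """Map a document to a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Every distinct word in the toy corpus becomes one dimension of the space.
vocabulary = sorted({w for text in docs.values() for w in text.split()})
vectors = {name: to_vector(text, vocabulary) for name, text in docs.items()}

sim_ab = cosine(vectors["contract_a"], vectors["contract_b"])
sim_ac = cosine(vectors["contract_a"], vectors["criminal_c"])
print(f"contract_a vs contract_b: {sim_ab:.3f}")
print(f"contract_a vs criminal_c: {sim_ac:.3f}")
```

Once documents are vectors, similarity reduces to arithmetic on those vectors. Learned embeddings differ mainly in how the vectors are produced: they are dense and low-dimensional and are fitted to capture meaning rather than raw word overlap.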

This new approach to legal studies addresses the shortcomings of existing methods for studying legal language. At a theoretical level, even the best formal models of legal decision-making require strong simplifying assumptions that treat the law metaphorically. The case-space literature, for example, assumes the language of law to be a function over an idealized geometric space, where the law separates the fact space into “liable” and “not liable” or “guilty” and “not guilty.” Case-space models give us some insight into the legal reasoning process, but they have been limited because it has been infeasible to empirically realize the legal case-space model in any formal mathematical way.
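The case-space idea described above can be pictured as a function from a point in a fact space to an outcome. The sketch below is a hypothetical toy rule in a two-dimensional fact space; the dimensions (care taken, harm caused), the threshold, and the function name are assumptions made for illustration and are not taken from the case-space literature.

```python
def negligence_rule(care_taken, harm_caused, care_standard=0.5):
    """Toy legal rule over a two-dimensional fact space.

    Returns "liable" when some harm occurred and the care taken fell
    below the (hypothetical) standard; otherwise "not liable".
    """
    if harm_caused > 0 and care_taken < care_standard:
        return "liable"
    return "not liable"


# Each case is a point (care_taken, harm_caused) in the fact space.
cases = [(0.2, 1.0), (0.8, 1.0), (0.2, 0.0)]
outcomes = [negligence_rule(care, harm) for care, harm in cases]

for facts, outcome in zip(cases, outcomes):
    print(facts, outcome)
```

The rule partitions the plane into a "liable" region and a "not liable" region. The difficulty the text points to is that real cases do not arrive as coordinates in such a space.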

Likewise, because law consists of text, the standard empirical research methods are somewhat limited in the questions that can be asked. Traditionally, text-based empirical legal studies research has relied on small-scale datasets, where legal variables are manually coded (e.g., Songer and Haire 1992). Hand-coding legal documents is labor-intensive and requires subjective and simplifying decisions.

Bibliography

Ash, E. 2016. “The Political Economy of Tax Laws in the US States.” Working paper, Columbia University, New York, NY. https://pdfs.semanticscholar.org/1a4d/365571252a70c9db0e94b0a5d98f128e7c78.pdf.

———. 2018. “Emerging Tools for a ‘Driverless’ Legal System: Comment.” Journal of Institutional & Theoretical Economics 174 (1): 206–213.

Ash, E., D. L. Chen, and S. Naidu. 2017. “Ideas have Consequences: The Impact of Law and Economics on American Justice.” Working paper. https://users.nber.org/~dlchen/papers/Ideas_Have_Consequences.pdf.

Ash, E., D. L. Chen, and A. Ornaghi. 2018. “Implicit Bias in the Judiciary: Evidence from Judicial Language Associations.” Working paper. https://users.nber.org/~dlchen/papers/Implicit_Bias_in_the_Judiciary.pdf.

Ash, E., D. L. Chen, and W. Liu. 2017. “The (Non-)Polarization of US Circuit Court Judges, 1930–2013.” Working paper. https://users.nber.org/~dlchen/papers/Polarization_of_US_Circuit_Court_Judges_slides.pdf.

Ash, E., W. B. MacLeod, and S. Naidu. 2018. “The Language of Contract: Promises and Power in Union Collective Bargaining Agreements.” Working paper. http://elliottash.com/wp-content/uploads/2019/03/paper-ash-macleod-naidu-2019-03-30.pdf.

Blei, D. M. 2012. “Probabilistic Topic Models.” Communications of the ACM 55 (4): 77–84.

Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3:993–1022.

Bolukbasi, T., K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai. 2016. “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.” In Advances in Neural Information Processing Systems 29, 4349–4357. Red Hook, NY: Curran Associates.

Breyer, S. 2009. “Economic Reasoning and Judicial Review.” Economic Journal 119 (535): F123–F135.

Caliskan, A., J. J. Bryson, and A. Narayanan. 2017. “Semantics Derived Automatically from Language Corpora Contain Human-Like Biases.” Science 356 (6334): 183–186.

Cameron, C., and L. Kornhauser. 2017. “What Courts Do . . . And How to Model It.” Draft chapter of a book-in-progress on the positive political theory of courts.

Carlson, K., M. A. Livermore, and D. Rockmore. 2016. “A Quantitative Analysis of Writing Style on the US Supreme Court.” Washington University Law Review 93 (6): 1461–1510.

Dai, A. M., C. Olah, and Q. V. Le. 2015. “Document Embedding with Paragraph Vectors.” Working paper.

Epstein, L., A. D. Martin, K. M. Quinn, and J. A. Segal. 2007. “Ideological Drift among Supreme Court Justices: Who, When, and How Important?” Northwestern University Law Review 101 (4): 1483–1542.

Fagan, J., and E. Ash. 2017. “New Policing, New Segregation? From Ferguson to New York.” Georgetown Law Journal 106 (1): 25–102.

Ganglmair, B., and M. Wardlaw. 2017. “Complexity, Standardization, and the Design of Loan Agreements.” Working paper, Social Science Research Network (SSRN). https://ssrn.com/abstract=2952567.

Garg, N., L. Schiebinger, D. Jurafsky, and J. Zou. 2018. “Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes.” Proceedings of the National Academy of Sciences 115 (16): E3635–E3644.

Haire, S. B., D. R. Songer, and S. A. Lindquist. 2003. “Appellate Court Supervision in the Federal Judiciary: A Hierarchical Perspective.” Law & Society Review 37 (1): 143–168.

Iyyer, M., P. Enns, J. L. Boyd-Graber, and P. Resnik. 2014. “Political Ideology Detection Using Recursive Neural Networks.” In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 1113–1122. Stroudsburg, PA: Association for Computational Linguistics.

Jurafsky, D., and J. H. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall.

Kozlowski, A. C., M. Taddy, and J. A. Evans. 2018. “The Geometry of Culture: Analyzing Meaning through Word Embeddings.” Working paper, arXiv:1803.09288 [cs.CL]. https://arxiv.org/abs/1803.09288.

Le, Q., and T. Mikolov. 2014. “Distributed Representations of Sentences and Documents.” In Proceedings of the 31st International Conference on Machine Learning, edited by E. P. Xing and T. Jebara, 32:1188–1196. Beijing, China: PMLR.

Lee, J. A., and M. Verleysen. 2007. Nonlinear Dimensionality Reduction. New York, NY: Springer Science & Business Media.

Leibon, G., M. A. Livermore, R. Harder, A. Riddell, and D. Rockmore. 2018. “Bending the Law: Geometric Tools for Quantifying Influence in the Multinetwork of Legal Opinions.” Artificial Intelligence & Law 26 (2): 145–167.

Levy, O., and Y. Goldberg. 2014. “Dependency-Based Word Embeddings.” In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 302–308. Stroudsburg, PA: Association for Computational Linguistics.

Levy, O., Y. Goldberg, and I. Dagan. 2015. “Improving Distributional Similarity with Lessons Learned from Word Embeddings.” Transactions of the Association for Computational Linguistics 3:211–225.

Livermore, M. A., A. Riddell, and D. Rockmore. 2017. “The Supreme Court and the Judicial Genre.” Arizona Law Review 59 (4): 837–901.

McConnell, M. W., and R. A. Posner. 1989. “An Economic Approach to Issues of Religious Freedom.” The University of Chicago Law Review 56 (1): 1–60.

Mikolov, T., I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013. “Distributed Representations of Words and Phrases and their Compositionality.” In Advances in Neural Information Processing Systems 26, 3111–3119. Red Hook, NY: Curran Associates.

Rachlinski, J. J., S. L. Johnson, A. J. Wistrich, and C. Guthrie. 2009. “Does Unconscious Racial Bias Affect Trial Judges?” Notre Dame Law Review 84 (3): 1195–1246.

Rudolph, M., and D. Blei. 2017. “Dynamic Bernoulli Embeddings for Language Evolution.” Working paper, arXiv:1703.08052 [stat.ML]. https://arxiv.org/abs/1703.08052.

Rudolph, M., F. Ruiz, S. Athey, and D. Blei. 2017. “Structured Embedding Models for Grouped Data.” In Advances in Neural Information Processing Systems 30, 250–260. Red Hook, NY: Curran Associates.

Rudolph, M., F. Ruiz, S. Mandt, and D. Blei. 2016. “Exponential Family Embeddings.” In Advances in Neural Information Processing Systems 29, 478–486. Red Hook, NY: Curran Associates.

Ruiz, F. J. R., S. Athey, and D. M. Blei. 2017. “SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements.” Working paper, arXiv:1711.03560 [stat.ML]. https://arxiv.org/abs/1711.03560.

Songer, D. R., and S. Haire. 1992. “Integrating Alternative Approaches to the Study of Judicial Voting: Obscenity Cases in the US Courts of Appeals.” American Journal of Political Science 36 (4): 963–982.

van der Maaten, L., and G. Hinton. 2008. “Visualizing Data using t-SNE.” Journal of Machine Learning Research 9:2579–2605.
