General Course Information
1.1 Course details
Course code: | LLAW6313 / JDOC6313 |
Course name: | Law as Data |
Programme offered under: | LLM Programme / JD Programme |
Semester: | First |
Prerequisites / Co-requisites: | No |
Credit point value: | 9 credits / 6 credits |
1.2 Course description
Law is created, transmitted, and performed through speech. By summarizing and extracting information from large amounts of text, we can better understand legal behaviour and institutions. This course has three objectives. First, to introduce some of the building blocks for treating legal text as data. Second, to gain hands-on experience in analysing text data using the Python programming language. Third, to explore how quantitative methods for text analysis can yield social scientific insights. Motivating examples are provided throughout. No knowledge of Python is necessary, although prior exposure to programming will be very helpful. Knowledge of calculus and linear algebra is highly recommended.
Topics to be covered include:
Introduction and Basic Python Syntax. What can text tell us? This module looks at how text data can illuminate questions in law and social science. Basic concepts of Python coding are reviewed, alongside regular expressions.
Machine Learning. How do machines learn patterns from data? By optimizing an objective function, of course! We will undertake a high-level survey of machine learning techniques, beginning with linear models such as regressions and progressing to non-linear models such as random forests and neural networks.
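As a minimal sketch of what optimizing an objective function can look like, the example below fits a one-variable linear regression by gradient descent on the mean squared error. The toy data, learning rate, step count, and function names are assumptions chosen for illustration and are not taken from the course materials.

```python
def mse(w, b, xs, ys):
    """Objective function: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Minimize the MSE by plain gradient descent, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the objective.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, mse = {mse(w, b, xs, ys):.2e}")
```

The same pattern (define an objective, then adjust parameters to reduce it) carries over to non-linear models, although the objective and the optimization routine are usually more elaborate.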
Pre-Processing Text. Text comprises many characters and symbols and comes in many shapes and sizes. How can we standardize and clean text documents to make them fit for purpose? We will discuss why and how to pre-process text documents and the consequences of pre-processing choices.
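A minimal sketch of one possible pre-processing pipeline is given below: lowercasing, punctuation removal, whitespace tokenization, and optional stop-word removal. The stop-word list, the sample sentence, and the function name `preprocess` are hypothetical and chosen only for illustration.

```python
import string

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that"}

# Translation table mapping every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, remove_stop_words=True):
    """Lowercase, strip punctuation, split on whitespace, optionally drop stop words."""
    tokens = text.lower().translate(PUNCT_TO_SPACE).split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


sentence = "The Court held that the statute is void."
tokens = preprocess(sentence)
tokens_with_stops = preprocess(sentence, remove_stop_words=False)
print(tokens)
print(tokens_with_stops)
```

Comparing the two outputs shows how a single choice changes the data handed to later steps. A more aggressive stop-word list that included "not", for instance, would erase negations that can matter a great deal in legal text.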
Bag-of-Words Representations. We have to represent text numerically to perform computational operations on it. How can we turn documents into vectors? In this module, we study one of the most basic numerical representations of text: the bag-of-words (BOW) model. The BOW model is, in essence, a word frequency count that disregards order. Despite its simplicity, the BOW model, including weighted variants like TF-IDF, performs well in many applications.
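The idea of a word frequency count that disregards order can be sketched in a few lines of Python. The toy corpus and function names below are hypothetical, and the TF-IDF weighting shown (raw count times log(N / document frequency)) is only one of several common variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already lowercased and tokenized.
docs = [
    ["court", "dismissed", "appeal"],
    ["court", "allowed", "appeal", "appeal"],
    ["contract", "breach", "damages"],
]

# Vocabulary: one vector dimension per distinct word, in sorted order.
vocab = sorted({word for doc in docs for word in doc})


def bow_vector(doc):
    """Count how often each vocabulary word occurs; word order is discarded."""
    counts = Counter(doc)
    return [counts[word] for word in vocab]


def tfidf_vector(doc):
    """Weight raw counts by log(N / document frequency)."""
    n_docs = len(docs)
    counts = Counter(doc)
    vector = []
    for word in vocab:
        df = sum(1 for d in docs if word in d)  # documents containing the word
        vector.append(counts[word] * math.log(n_docs / df))
    return vector


bow = [bow_vector(doc) for doc in docs]
tfidf = [tfidf_vector(doc) for doc in docs]
print(vocab)
print(bow)
```

Because only counts are kept, reordering the words of a document leaves its vector unchanged. Under this weighting, words that appear in fewer documents receive a larger weight per occurrence than words that appear in many.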
Topic Modelling. Can a machine identify topics in a text corpus without any human supervision? In this module, we will examine how empirical relationships between words, topics, and documents can be exploited to classify and describe the content of large corpora. Models to be considered include Latent Dirichlet Allocation and Non-negative Matrix Factorization. The topics estimated by these models are probability distributions over words and must be interpreted by the researcher.
Word Embeddings. Apples and oranges are both fruits. But apples are red and oranges are, well, orange. Can we represent words as vectors in a way that captures such similarities and differences? We will study how algorithms such as word2vec generate word embeddings by training neural networks to predict a word from its surrounding context, or the context from the word. Word embeddings have improved performance on many natural language processing tasks.
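Once words are represented as vectors, similarity between them is commonly measured by cosine similarity. The sketch below uses three-dimensional vectors that are invented for illustration; they are not produced by word2vec, and real embeddings are learned from a corpus and typically have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
embeddings = {
    "apple": [0.9, 0.8, 0.1],
    "orange": [0.8, 0.9, 0.2],
    "statute": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot product over product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


fruit_sim = cosine_similarity(embeddings["apple"], embeddings["orange"])
mixed_sim = cosine_similarity(embeddings["apple"], embeddings["statute"])
print(f"apple vs orange:  {fruit_sim:.3f}")
print(f"apple vs statute: {mixed_sim:.3f}")
```

With these made-up vectors, the two fruits score much closer to each other than either does to "statute", which is the kind of pattern trained embeddings are intended to capture.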
1.3 Course teachers
| Name | E-mail address | Office | Consultation |
Course convenor | Benjamin Chen | benched@hku.hk | CCT 512 | By email |
Learning Outcomes
2.1 Course Learning Outcomes (CLOs) for this course
CLO 1 Describe and explain basic approaches to text as data.
CLO 2 Describe and explain how quantitative methods for analysing text can be used to illuminate empirical relationships germane to law and legal institutions.
CLO 3 Apply their knowledge and skills to assess the viability of academic studies or commercial applications that apply quantitative methods for analysing text.
CLO 4 Demonstrate their knowledge of quantitative methods for analysing text by ideating plausible studies or applications and recognizing their limitations.
2.2 LLM and JD Programme Learning Outcomes (PLOs)
Please refer to the following link:
LLM – https://course.law.hku.hk/llm-plo/
JD – https://course.law.hku.hk/jd-plo/
2.3 Programme Learning Outcomes to be achieved in this course
| PLO A | PLO B | PLO C | PLO D | PLO E | PLO F |
CLO 1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
CLO 2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
CLO 3 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
CLO 4 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Assessment(s)
3.1 Assessment Summary
Assessment task | Due date | Weighting | Feedback method* | Course learning outcomes |
Five in-class quizzes | TBC | 20% | 1, 2, 3, 4 | |
Programming assignment | TBC | 30% | 1, 2, 3, 4 | |
Proposal that applies law as data methods | TBC | 50% | 1, 2, 3, 4 | |
*Feedback method (to be determined by course teacher) | |
1 | A general course report to be disseminated through Moodle |
2 | Individual feedback to be disseminated by email / through Moodle |
3 | Individual review meeting upon appointment |
4 | Group review meeting |
5 | In-class verbal feedback |
3.2 Assessment Detail
To be advised by course convenor(s).
3.3 Grading Criteria
Please refer to the following link: https://www.law.hku.hk/_files/law_programme_grade_descriptors.pdf
Learning Activities
4.1 Learning Activity Plan
Seminar: | 3 hours / week for 12 teaching weeks |
Private study time: | 9.5 hours / week for 12 teaching weeks |
Remarks: the normative student study load per credit unit is 25 ± 5 hours (i.e. 150 ± 30 hours for a 6-credit course), which includes all learning activities and experiences inside and outside the classroom, as well as all assessment tasks, examinations, and associated preparation.
4.2 Details of Learning Activities
To be advised by course convenor(s).
Learning Resources
5.1 Resources
Reading materials: | Reading materials are posted on Moodle |
Core reading list: | TBA |
Recommended reading list: | TBA |
5.2 Links
Please refer to the following link: http://www.law.hku.hk/course/learning-resources/