The most common question we hear from students considering LexIQ's essay marking is: "Can AI really mark a law essay as well as a human tutor?" It is a fair question. Legal analysis requires nuanced judgment: can an algorithm tell whether your IRAC structure is sound, or whether your case citations are appropriate?
We decided to test it properly.
The Methodology
We selected 100 law essays across five subjects (Contract, Tort, Criminal, Public, and Land Law) at varying quality levels (Third through First). Each essay was marked by:
- Two experienced human tutors (practising barristers with 5+ years of university teaching experience)
- LexIQ's AI marking system
All markers were blind to each other's grades. We compared:
- Grade agreement (exact match and within one grade band)
- Identification of key strengths and weaknesses
- Specificity and actionability of feedback
The Results
Grade Accuracy
| Metric | AI vs Tutor 1 | AI vs Tutor 2 | Tutor 1 vs Tutor 2 |
|---|---|---|---|
| Exact grade match | 61% | 58% | 64% |
| Within one band | 89% | 87% | 91% |
| Average score difference | 3.2 marks | 3.8 marks | 2.7 marks |
The key finding: AI-human agreement within one band (87-89%) was only marginally lower than human-human agreement (91%). In other words, the AI disagrees with a human marker about as often as two experienced human markers disagree with each other.
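For the statistically curious, here is a minimal Python sketch of how the three metrics in the table can be computed from paired marks, assuming "exact grade match" means both markers assign the same grade band. The band boundaries follow the standard UK undergraduate scale (First at 70+, 2:1 at 60-69, 2:2 at 50-59, Third at 40-49); the sample marks are invented for illustration and are not our study data.

```python
# Sketch: marker-agreement metrics for paired essay marks.
# Band thresholds follow the standard UK undergraduate scale;
# the sample marks below are hypothetical, not the study data.

def band(mark: int) -> int:
    """Map a percentage mark to a grade band index (0 = Fail ... 4 = First)."""
    if mark >= 70:
        return 4  # First
    if mark >= 60:
        return 3  # 2:1 (Upper Second)
    if mark >= 50:
        return 2  # 2:2 (Lower Second)
    if mark >= 40:
        return 1  # Third
    return 0      # Fail

def agreement(a: list[int], b: list[int]) -> dict[str, float]:
    """Exact-band match rate, within-one-band rate, and mean absolute mark gap."""
    n = len(a)
    exact = sum(band(x) == band(y) for x, y in zip(a, b)) / n
    within_one = sum(abs(band(x) - band(y)) <= 1 for x, y in zip(a, b)) / n
    avg_gap = sum(abs(x - y) for x, y in zip(a, b)) / n
    return {"exact_match": exact, "within_one_band": within_one, "avg_gap_marks": avg_gap}

# Hypothetical marks for six essays, scored by two different markers:
ai_marks    = [68, 55, 72, 61, 48, 66]
tutor_marks = [65, 58, 70, 57, 52, 68]
print(agreement(ai_marks, tutor_marks))
# {'exact_match': 0.666..., 'within_one_band': 1.0, 'avg_gap_marks': 3.0}
```

Running this kind of check on any set of double-marked essays is a quick way to see how much two markers naturally diverge, which is the benchmark an AI marker should be judged against.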
Feedback Quality
Human tutors excelled at:
- Identifying subtle issues with argument flow
- Providing encouragement alongside criticism
- Connecting feedback to specific course content
AI excelled at:
- Consistency — applying the same standards to every essay
- Comprehensiveness — annotating every paragraph, not just the weakest ones
- Speed — 2-3 minutes per essay vs. 1-2 hours
- Specificity — providing concrete rewrites rather than general comments
Where AI Falls Short
The AI occasionally:
- Missed context-specific nuances (e.g., a lecturer's particular approach to a topic)
- Over-penalised unconventional but valid arguments
- Struggled with highly interdisciplinary essays
Where AI Outperformed
The AI consistently:
- Identified descriptive vs. analytical writing more reliably than human markers
- Provided more actionable feedback (specific rewrites vs. "needs more analysis")
- Caught OSCOLA referencing errors that human markers overlooked
- Applied grade calibration more consistently (human markers showed grade inflation for well-written but analytically weak essays)
What This Means for Students
AI essay marking is not a replacement for human tutoring — but it is a powerful complement. The optimal approach is:
- Use AI marking for regular practice — get feedback on every essay you write, not just the ones you can afford to send to a tutor
- Use human tutoring for specific challenges — exam technique, mooting preparation, dissertation supervision
- Use both for high-stakes work — get AI feedback first, revise, then get human feedback on the improved version
The biggest advantage of AI marking is volume. Most students get meaningful feedback on 2-3 essays per year. With AI marking, you can get feedback on 2-3 essays per week. This volume of practice-and-feedback cycles is what drives real improvement.
Our Commitment to Accuracy
We continuously calibrate LexIQ's marking against Russell Group standards. Our system is trained on the marking criteria used by Oxford, Cambridge, UCL, KCL, Edinburgh, and other leading law schools. We regularly audit AI grades against human markers and update our models accordingly.
We also apply strict grade calibration to prevent inflation — our system is deliberately conservative, because a false sense of security is more harmful than a tough grade.
Judge for yourself. Try the free Instant Essay Diagnosis — paste a paragraph and see whether the grade estimate and feedback match your expectations. Or upload your full essay for complete analysis from £8.99.
