Five tips for successful use of Technology Assisted Review (TAR)

Out-Law Analysis | 15 Feb 2018 | 11:52 am | 5 min. read

ANALYSIS: Courts and tribunals around the world are encouraging the use of technology assisted review (TAR) in the disclosure process. The Supreme Court of Victoria in Australia even made its use compulsory in one case a year ago.

TAR can significantly streamline the document review process, particularly in complex disputes requiring the analysis of large volumes of documents. Document review is not an automated, 'lawyer-free' process - expertise and vigilance are required to maximise the effectiveness of TAR and to ensure slip-ups are avoided. Here are our top five tips for successfully navigating the TAR process.

What is TAR?

TAR is an umbrella term which captures a suite of tools and approaches to legal document review. Broadly, TAR covers the use of any technology which makes the legal document management and review process more efficient and reliable when compared to traditional manual review methods. Although TAR is most commonly used in document review for litigation, there is scope to use some of the tools to manage and organise documents in non-contentious matters.

TAR is suitable for any type of case where there are too many documents for a human to properly consider. Even on small cases, TAR can be used to cluster certain documents so that they are presented to human reviewers in a sensible order. This can help reviewers identify patterns in the documents and can cut down the time it takes to review each document.

The efficiency gains from the use of TAR will be more dramatic on any case involving large numbers of documents, typically greater than 100,000. Of course, the potential efficiencies that can be gained by using TAR need to be balanced with service provider costs. This should be assessed on a case-by-case basis and will depend on the nature of the matter and the amounts in dispute.

TAR incorporates some or all of:

Predictive coding/continuous active learning (CAL): These terms refer to the use of algorithms based on statistical modelling to recall documents that can be considered conceptually similar to a sample set of subjectively reviewed documents, referred to as a 'seed set'. A reviewer then verifies whether the recalled documents are consistent with the coding of the seed set and those results are fed back into the algorithm, which in turn increases the size of the seed set.

Predictive coding should be viewed as a prioritisation tool rather than a substitute for subjective document review. While it can quickly return results more likely to be relevant or useful, some subjective review will still be required to ensure the consistency and integrity of results.
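The feedback loop described above can be illustrated with a minimal sketch. This is not any vendor's actual algorithm - commercial tools use far more sophisticated statistical models - but it shows the basic idea: score each unreviewed document against the documents already coded relevant, so a human reviewer sees the most promising documents first. The function names and the cosine-similarity scoring are illustrative assumptions.

```python
from collections import Counter
import math

def tokens(text):
    # crude tokeniser: lowercase, keep alphabetic words only
    return [t for t in text.lower().split() if t.isalpha()]

def cosine(a, b):
    # cosine similarity between two term-frequency Counters
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def rank_by_relevance(seed_relevant, unreviewed):
    """Rank unreviewed documents by similarity to the 'seed set' of
    documents a reviewer has already coded as relevant. In a real CAL
    loop, the reviewer's decisions on the top-ranked documents would be
    fed back in, growing the seed set for the next iteration."""
    centroid = Counter()
    for doc in seed_relevant:
        centroid.update(tokens(doc))
    scored = [(cosine(centroid, Counter(tokens(d))), d) for d in unreviewed]
    return [d for score, d in sorted(scored, reverse=True)]
```

In practice the loop runs repeatedly: rank, review the top of the ranking, add those decisions to the seed set, and re-rank.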

Concept clustering/mining: These terms are used to refer to the organisation and display of document sets or search results in an interactive, graphical format, which enables subsets of documents containing similar and related concepts to be virtually clustered around each other. Concepts can be defined by the user using traditional search terms or 'browsed' by statistical prevalence. For example, the technology may reveal that a set of documents is made up of two separate subsets of documents: one where all documents contain frequent mentions of the words 'train', 'ticket' and 'purchase', and another where all documents contain frequent mentions of the words 'conductor', 'train' and 'fine'.

Although the display configurations vary between service providers, documents which reference particular sets of concepts will often be represented as individual 'dots' within a circle, with lines connecting other circles with similar concepts. Going back to the previous example, a set of documents containing frequent mentions of the words 'train', 'ticket' and 'purchase' would display a logical link to a set of documents containing frequent mentions of the words 'carriage', 'ticket' and 'buy'.
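A minimal sketch of the clustering idea, using the train-ticket example above: documents whose word sets overlap sufficiently are grouped together. Real concept-clustering tools use statistical models of term prevalence rather than this simple greedy Jaccard-overlap pass, so treat the threshold and the grouping logic as illustrative assumptions only.

```python
def jaccard(a, b):
    # overlap between two sets of words, from 0.0 (disjoint) to 1.0 (identical)
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering: a document joins the first cluster
    whose founding member shares enough vocabulary with it, otherwise it
    starts a new cluster."""
    clusters = []  # each cluster is a list of (doc, token_set) pairs
    for doc in docs:
        toks = set(doc.lower().split())
        for c in clusters:
            if jaccard(toks, c[0][1]) >= threshold:
                c.append((doc, toks))
                break
        else:
            clusters.append([(doc, toks)])
    return [[doc for doc, _ in c] for c in clusters]
```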

Email threading: This is the use of statistical methods to create logical links between similar emails that would not be considered identical, such as emails that are forwarded to different recipients or email chains which branch off from a main email thread.
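One common building block of email threading can be sketched as follows: stripping reply and forward prefixes from subject lines so that messages in the same conversation share a key. Commercial tools also use message headers and statistical text comparison; this subject-normalisation sketch is an illustrative simplification.

```python
import re
from collections import defaultdict

# matches one or more leading "Re:", "Fw:" or "Fwd:" prefixes
PREFIX = re.compile(r'^\s*((re|fwd|fw)\s*:\s*)+', re.IGNORECASE)

def thread_key(subject):
    """Strip reply/forward prefixes so related emails share a key."""
    return PREFIX.sub('', subject).strip().lower()

def group_threads(emails):
    # emails: list of dicts, each with a 'subject' field (an assumption
    # about the input shape, for illustration only)
    threads = defaultdict(list)
    for email in emails:
        threads[thread_key(email['subject'])].append(email)
    return dict(threads)
```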

'Near duplicates' identification: This covers the use of 'fuzzy' searching, synonyms and custom dictionaries, combined with statistical methods, to identify documents that contain similar concepts that would not be considered identical, for example, the use of idiosyncratic site-specific terms on particular projects.
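The near-duplicate idea can be sketched with Python's standard-library `difflib.SequenceMatcher`, which scores how much two texts overlap. Production tools combine this kind of fuzzy matching with synonym lists and custom dictionaries; the threshold here is an arbitrary illustrative choice.

```python
from difflib import SequenceMatcher

def near_duplicates(docs, threshold=0.8):
    """Return index pairs of documents whose text similarity meets the
    threshold - similar but not necessarily identical documents."""
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            ratio = SequenceMatcher(None, docs[i], docs[j]).ratio()
            if ratio >= threshold:
                pairs.append((i, j, round(ratio, 2)))
    return pairs
```

Note the pairwise comparison is quadratic in the number of documents, which is why real systems use statistical shortcuts before any exact comparison.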

Top five tips for effective use of TAR

1. Constant vigilance: remember to monitor and re-train the system

TAR is not a 'set and forget' process. It is an iterative process that requires the legal team to think critically about which documents are being pulled into the pool for subjective review, and which ones are being excluded from it. Constant tuning and tweaking, including manual review of the documents recalled by the technology, are essential to ensure that the system is getting it right. The algorithm will then be able to refine its searches based on these corrections.

2. One knowledgeable lawyer reviewing a smaller review set trumps multiple lawyers reviewing larger sets

Unlike traditional review methods, which distribute a large review load across a number of reviewers, TAR realises greater efficiency gains when one or two knowledgeable lawyers review a smaller training set correctly. The algorithm, rather than reviewers, can do the leg work in identifying related documents, which can be verified by less experienced reviewers, supervised by the senior reviewers.

It is important that only one or two senior reviewers who are intimately familiar with the subject matter of the dispute perform "sprints" to ensure consistency of the approach and reasoning for coding and to avoid inaccurate or inconsistent coding. Coding errors, especially in the training set, will be amplified over the entire database.

A sprint is the process of reviewing and verifying a random set of documents coded for relevance by less experienced reviewers. Ideally, a sprint will involve the review of 10,000 to 15,000 documents, depending on the size of the database. However - and in the right circumstances - as few as 3,000 random documents may be sufficient for a set of one million documents.
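The random-sample step of a sprint can be sketched in a few lines. The function name and the fixed seed (used so a sample can be reproduced and audited) are illustrative assumptions, not a description of any particular review platform.

```python
import random

def draw_sprint(doc_ids, sprint_size, seed=None):
    """Draw a random verification sample (a 'sprint') from the pool of
    coded documents. A fixed seed makes the draw reproducible for audit."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, min(sprint_size, len(doc_ids)))
```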

3. Consider running multiple simultaneous TAR processes if there are lots of issues in dispute

A single TAR process does not work well when there are many issues in dispute or many categories for discovery. Consider whether separate TAR processes should be performed for each issue or group of related issues. This can be helpful when there is a large team working on the matter, with different lawyers specialising in different aspects.

The downside to running multiple TAR processes is that it is difficult to avoid 'double-handling' of documents, where the same documents are reviewed two or more times for different purposes. However, this issue also affects traditional methods of document review.

4. Exploit your silver bullets – if you don't have any, make them yourself

It is worthwhile locating 'example documents' then letting the algorithm do its work to find similar documents. An example document is either a highly relevant contemporaneous document or a 'dummy' document created by the legal team to test or guide the system. These documents should capture relevant content and context so as to be a useful springboard for similarity, email threading and clustering tools.

If an example document you are using is a dummy or not part of your data collection set, ensure it is quarantined and excluded from any disclosure processes.

5. Blind faith is inadvisable – always keep potential blind spots in mind

Although TAR promises some extraordinary benefits, it is not infallible. TAR identifies documents that are more likely to be relevant based on their content. Remember that documents, or patterns of documents, may also be relevant because of what they do not contain. Sometimes, context drives relevance rather than content: for example, in a case of possible fraud, concealment or misrepresentation, a daily report which purports to be a true and accurate record of the day's events but which only lists delay events caused by one party may be relevant to showing that a party was deliberately concealing certain matters.

Similarly, TAR does not work on photographs, drawings or other documents that contain very little text and it does not work well on spreadsheets.

Gemma Thomas and Rahul Thyagarajan are Australia-based construction disputes experts at Pinsent Masons, the law firm behind Out-Law.