There are numerous examples of how AI is applied in the workplace to take over routine, manual tasks so that people can focus on higher-value, more sophisticated work. In particular, AI can lend a much-needed helping hand by creating summaries of documents that run to hundreds of pages. At the Data Innovation Summit, Nina Hristozova, Data Scientist, and Milda Norkute, Senior Designer at Thomson Reuters, will share their experience building an AI-powered summarization tool that acts almost like a helpful colleague to their editors, aiding them by creating summaries of lengthy court cases. We invited Nina and Milda to tell us a bit more about how the initiative started, how it was received by the editors, and how they achieved explainability.
Hyperight: Hello Nina and Milda, I’m very glad to welcome you to the 6th edition of the Data Innovation Summit. As an intro to our discussion, please tell us a bit more about yourselves and your backgrounds.
Milda Norkute: Sure, I guess I can go first. I am a Senior Designer at Thomson Reuters Labs. The Labs pursue emerging trends in technology and data-driven innovation through collaborative experimentation with customers across Thomson Reuters core business segments. My role focuses on UX research and on designing our concepts, figuring out how and where to put the human in the loop (HITL) in AI-powered systems.
I have been part of the Labs for just over two years now. Before that, I worked at Nokia and CERN, and I did a master’s in human-computer interaction and design, which was my way of “breaking into tech”. Before deciding to pursue a career in technology, I completed a bachelor’s in psychology and worked in media and advertising research for two years. I have never regretted the switch and was lucky to end up at the Labs, where I get to work on many interesting problems: for example, what the best way is to surface AI recommendations, or the different ways to explain those recommendations to users, among many others.
Nina Hristozova: And I am a Data Scientist, also working at the Labs, where my focus is primarily on developing machine and deep learning models for Natural Language Processing (NLP) tasks. Some examples of such tasks are text summarization, custom named-entity extraction, text classification, and sentiment analysis. As part of my day-to-day job, I love to drive innovation, bring a customer-first mindset and help my stakeholders make informed decisions.
The programming language I primarily use is Python, with which I have a love-hate relationship: it is not the fastest or most memory-efficient language, but at the same time it is very easy to write and boosts productivity. I got my first hands-on experience with data science during my final year at university while working on my thesis on efficient egocentric visual perception, combining eye-tracking, a software retina and deep learning. I experienced the whole process, from running experiments and collecting the eye-tracking data, to identifying its most significant parts, to architecting, training, and evaluating a Convolutional Neural Network (CNN) model.
Hyperight: Your Data Innovation Summit 2021 session will focus on a really interesting case study that is a dream of every writer and editor. You will present an AI summarization tool that improves legal editors’ work by creating legal document summaries, which the editors only need to review instead of writing from scratch. The use case is a great example of how AI can help in the workplace and improve workflow, freeing up time for people to focus on more important activities. Could you please tell us a bit more about this AI summarization tool and how the initiative started?
Milda Norkute: The legal editorial tool is used by editors who traditionally are responsible for monitoring and collecting new court cases to perform various editorial tasks. One such task is case summarization. To complete it, editors read the case and write a short summary of the allegations made by the plaintiffs. The editors, who are trained lawyers, must follow strict guidelines on how to write this summary. All the necessary information is available in the original court case, which can range from 10 to 100 pages. As you can imagine, reading these cases and finding the information needed to write the allegations summary is no easy feat: it takes quite a bit of time and is also quite repetitive. We explored many different models and approaches before arriving at this AI-powered summarization tool.
Nina Hristozova: The model we ended up with is a Pointer-Generator network, trained from scratch on nearly 1 million court cases. One of the reasons we chose this model was that it could take inputs of roughly 5,000 words. Apart from standard evaluation metrics like ROUGE, we measured the quality of the summarization model in a blind evaluation experiment conducted with editors. In the evaluation, we asked editors whether a summary, generated either by the model or written by another editor – without them knowing which – would be publishable with minor edits. We found that 75±10% of the summaries produced by the model were publishable with minor edits, compared to 88±10% of the summaries written by human editors from scratch. From this, we concluded that the editors should review the machine-generated summaries and enhance them if needed before publication, instead of pursuing a fully automated approach. This has greatly reduced the time legal editors spend summarizing court case filings.
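To give a flavour of the ROUGE metric mentioned here: it scores a candidate summary by its n-gram overlap with a reference summary. Below is a minimal, illustrative sketch of ROUGE-1 F1 (unigram overlap) using only the standard library; the sentences are made up, and real evaluations would use a full implementation such as the rouge-score package, which also handles stemming and longer n-grams.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision
    and recall between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Overlap = clipped count of unigrams shared by both summaries
    overlap = sum(min(ref_counts[w], cand_counts[w]) for w in cand_counts)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: the candidate copies 5 of the 8 reference words
ref = "the plaintiff alleges breach of contract and fraud"
cand = "plaintiff alleges breach of contract"
print(round(rouge1_f1(ref, cand), 3))  # → 0.769
```

Note that a metric like this rewards surface overlap only, which is one reason the team paired it with the human blind evaluation described above.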
Hyperight: People are often unsure about AI and how it might affect their day-to-day work. Did you experience any pushback about this project, or any other challenges in its implementation?
Milda Norkute: The summarization tool was received very positively. From the very beginning, editors realized that this was not a job replacer, but rather a job enhancer that augmented their workflow for the better. The editors said they were happy about the time they were able to save, and that it was much easier to review and edit a summary when needed than to write it from scratch. They could spend their time on the higher-level, more sophisticated task of summary review. From our conversations with them, it sounded like most of them viewed the summarization tool as a colleague who helps them do their job. In fact, I would say that most of the stakeholders we work with on projects like this are excited about the value AI solutions can add to their work.
Nina Hristozova: One of the challenges we did experience was ensuring trust in the outputs of the AI. We ran additional iterations of the project in which we included explainability features that enhanced the editors’ trust in the summaries.
I would add that it’s imperative to have humans involved throughout the process, and this project would not have been possible without the input of the editor subject matter experts (SMEs). The SMEs knew this tool was being built and helped us make it as good as possible. We closely tailored our experiments to their day-to-day work.
Hyperight: One of the main points of your session is the explainability layer added to your machine-generated summaries. AI explainability is a broad subject, but could you tell us what concrete goals you were trying to achieve in your case, and which aspects were most important when adding explainability to the AI system?
Nina Hristozova: Since the AI model went into active use, the primary task of the editors has become reviewing and editing the machine-generated summaries rather than creating them from scratch from the long input documents. However, to validate a machine-generated summary, the editors still had to manually review the initial pages of the court case to check the summary’s factual correctness and whether it captured everything of interest. So, knowing that, our next question was: “How can we help the review process now?” In effect, how could we further improve trust in the outputs?
Milda Norkute: Exactly, and one of the things we heard in our conversations with the editors was that one feature they would really like is the possibility to see where the summary came from. So we had a chat with the team and decided to spend some time looking into how we could do that. We were also curious whether this new feature would help the editors trust the summarization tool more, as well as make them more efficient. We wanted this new feature to have an additional positive impact on the users.
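To make “seeing where the summary came from” concrete: a real implementation would likely draw on model internals such as the pointer-generator’s copy attention, but as a very rough, hypothetical stand-in, one can link each summary sentence to the source sentence it shares the most words with, and highlight that sentence for the reviewer. The court-case sentences below are invented for illustration.

```python
import string

def words(text: str) -> set[str]:
    """Lowercase a sentence and strip punctuation from each word."""
    return {w.strip(string.punctuation) for w in text.lower().split()}

def best_source_match(summary_sentence: str, source_sentences: list[str]) -> int:
    """Return the index of the source sentence with the largest word
    overlap -- a naive proxy for attention-based summary/source alignment."""
    summary_words = words(summary_sentence)
    scores = [len(summary_words & words(s)) for s in source_sentences]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical court-case sentences
source = [
    "The complaint was filed in the district court.",
    "Plaintiffs allege breach of contract by the defendant.",
    "The defendant moved to dismiss the case.",
]
idx = best_source_match("Plaintiffs allege breach of contract.", source)
print(idx)  # → 1, i.e. the second source sentence would be highlighted
```

Word overlap is of course far weaker evidence than the model’s own attention weights, which indicate which input tokens the network actually attended to while generating each output token.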
Hyperight: And as the last point, how do you see the future unfolding in terms of machine-generated text? What are the most significant trends we can expect when it comes to applying AI in various areas of producing text documents?
Nina Hristozova: Natural Language Processing (NLP) presents new challenges to the research community every day. Some of the topics I am personally very passionate about are Human-Centric AI, monitoring data drift and active learning. What I love about these topics is that they all present a fresh way of looking at already established processes.
For example, let’s look at the field of Human-Centric AI (HCAI). It includes AI ethics, AI explainability, trustworthy AI, and so on. Until now, the main focus in data science research and applied data science has been to improve the performance of models and tasks as much as possible, even if only by 0.1%. Now, with the incoming regulations, the advancements in compute power, the move to the cloud, and so on, we can expect greater adoption of AI models. The emerging area of HCAI is therefore becoming crucial for that adoption. The more we talk about these topics, the better for everybody!
Milda Norkute: I think one of the limitations on where and how AI can be applied comes from the computing power needed and, therefore, the cost of creating and maintaining AI solutions. I expect that future work will focus on reducing the impact of those issues. Data availability for training models is another pain point: not many companies have data in the quantity and quality required to build powerful, well-performing AI models. So I would guess there will be more work and progress in the area of transfer learning, where knowledge gained from one task is applied to a different but related problem. More specifically, within NLP it is currently tricky to work with very long text documents: there is a limit on how many tokens you can use, for example, for model training. So I think this will be another area where work will be done to get around these limits. Finally, I also think that AI explainability will remain an important research topic, and I personally look forward to doing more work in this area.
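One common workaround for the token limits Milda mentions is to split a long document into overlapping chunks that each fit the model’s input size, summarize the chunks separately, and then combine the results. A minimal sketch of the chunking step follows; the chunk size and overlap values are purely illustrative, and production systems would split on sentence or section boundaries rather than raw token counts.

```python
def chunk_tokens(tokens: list[str], max_len: int, overlap: int) -> list[list[str]]:
    """Split a long token sequence into overlapping windows so that
    each piece fits within a model's maximum input length."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end
    return chunks

# Toy document of 10 tokens, split into windows of 4 with 1 token of overlap
doc = [f"tok{i}" for i in range(10)]
print([len(c) for c in chunk_tokens(doc, max_len=4, overlap=1)])  # → [4, 4, 4]
```

The overlap gives each window some shared context with its neighbours, which helps avoid cutting a relevant passage in half at a chunk boundary.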