Natural language processing augments analytics and data use
Natural language processing is a key feature of modern BI and analytics platforms that simplifies and democratizes analytics across the company.
As organizations compete to operationalize data, analyze it and generate predictions, they need to empower business decision-makers as well as data professionals. With NLP, non-technical users can simply type a question in plain language instead of writing a query in a language such as SQL. The platforms also provide assistive capabilities, such as type-ahead and popular search phrases, to make working with data even easier.
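Under the hood, these platforms map a free-form question onto a governed semantic model. The toy function below gestures at that idea with a hand-built synonym table; it is not how ThoughtSpot or any other vendor actually implements search, and every name in it is illustrative.

```python
# Toy mapping from a natural language question to SQL via a synonym table.
# The semantic layer, table name and matching logic are illustrative only;
# real BI platforms use much richer semantic models, ranking and type-ahead.
SEMANTIC_LAYER = {
    "revenue": ["revenue", "sales", "turnover"],
    "region": ["region", "territory", "zip code"],
    "year": ["year", "quarter", "month"],
}

def question_to_sql(question: str, table: str = "sales_facts") -> str:
    q = question.lower()
    columns = [col for col, synonyms in SEMANTIC_LAYER.items()
               if any(s in q for s in synonyms)]
    return f"SELECT {', '.join(columns) or '*'} FROM {table};"

print(question_to_sql("Show me sales by territory for this year"))
# SELECT revenue, region, year FROM sales_facts;
```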
When businesspeople can ask their own questions, more sophisticated users are freed from mundane tasks, and it becomes easier to collaborate on model engineering, said Cindi Howson, chief data officer at BI and AI-driven analytics platform provider ThoughtSpot.
“In ThoughtSpot, it will tell you [a metric in] this zip code is 300% higher than that zip code, which a businessperson can take and tell the data scientist, ‘This is where I want you to focus your efforts,'” she said.
ThoughtSpot and other BI and analytics vendors, such as Qlik, have formed partnerships with companies that specialize in NLP to extend their capabilities. For example, ThoughtSpot and VoiceBase announced a partnership in 2020 to make it easier to search voice data, share insights and use them to drive business value. That way, it’s possible to understand how many angry phone calls a call center received and whether a representative’s empathetic treatment of a customer led to increased sales.
NLP has many use cases, such as predicting litigation risk in auto claims. It can supplement what a claims adjuster learns from phone conversations with the various parties to a claim in multiple ways.
“NLP can also understand [other] unstructured data such as notes, assigning a qualitative assessment indicating the likelihood the insured driver was at fault,” said Kieran Wilcox, director of claims solutions at AI-as-a-service company Clara Analytics. “Another capability we’re starting to see is that NLP can fill in missing information that might be in the claims adjuster’s notes but was never included in the structured data.”
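As a rough illustration of what Wilcox describes, the toy script below scores an adjuster's free-text note for fault-related language. The cue phrases, weights and note are invented; a production system such as Clara Analytics' would use trained NLP models, not keyword lists.

```python
# Toy pass over a claims adjuster's free-text note: flag fault-related language
# and return a rough likelihood that the insured driver was at fault.
# Cue phrases, weights and the note itself are invented for illustration.
FAULT_CUES = {
    "rear-ended the claimant": 0.8,
    "ran the red light": 0.9,
    "failed to yield": 0.7,
}

def insured_fault_likelihood(note: str) -> float:
    note = note.lower()
    return max((w for cue, w in FAULT_CUES.items() if cue in note), default=0.0)

note = "Insured admits he rear-ended the claimant at a stop light; no injuries reported."
print(insured_fault_likelihood(note))  # 0.8
```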
It’s not what people say, it’s what they mean
Ten different people may ask the same question 10 different ways, which is why semantic relationships are so important. Worse, humans don’t always say what they mean.
“We let customers define their own synonyms but getting to some industry taxonomies is where I think the technology will actually go,” Howson said.
Graph technologies are used to understand relationships among words, individuals and things. Graphs can recommend content or suggest popular queries, and they're an ideal way to layer additional information onto the relationships extracted from natural language data, said Paul Milligan, director of product strategy at Linguamatics, an NLP text mining products and solutions vendor.
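The sketch below, built on the open source networkx library, shows the basic idea: once terms and queries are connected in a graph, related queries fall out as simple neighbor lookups. The nodes and edges are invented for illustration.

```python
# Sketch of graph-based query recommendation with the open source networkx
# library. Nodes and edges are invented; in practice they would come from
# extracted entities, document links and query logs at much larger scale.
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("revenue", "revenue by region"),
    ("revenue", "revenue by quarter"),
    ("region", "revenue by region"),
    ("churn", "churn by customer segment"),
])

def suggest_queries(term: str, limit: int = 3) -> list:
    # Related queries are simply the term's neighbors in the graph.
    return list(g.neighbors(term))[:limit] if term in g else []

print(suggest_queries("revenue"))  # ['revenue by region', 'revenue by quarter']
```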
“Different problems require different NLP methodologies, so it’s important [to answer] the questions the analyst has,” he said. “Also, look for clarification on performance and scalability and published articles that [explain] use cases and accuracy figures.”
Embedded analytics on the rise
Meanwhile, embedded analytics are making their way into many kinds of apps, whether for ranking game players or providing a visual dashboard for decision-making purposes. Before grabbing an NLP API and plugging it into an application, it’s wise to understand what the total cost of using that API will be, including training data, model development and model deployment. At scale, the costs may be prohibitive.
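A quick back-of-the-envelope calculation makes the point. Every number below is a placeholder to be replaced with the vendor's actual pricing, the application's actual volume and the real one-time costs.

```python
# Back-of-the-envelope check on per-call NLP API costs at scale.
# All figures are placeholders, not real vendor pricing.
calls_per_day = 50_000
price_per_call = 0.002          # hypothetical per-request price in dollars
one_time_costs = 75_000         # labeling, model development, deployment

annual_usage = calls_per_day * 365 * price_per_call
print(f"Year one: ${annual_usage + one_time_costs:,.0f}  (then ${annual_usage:,.0f}/yr)")
# Year one: $111,500  (then $36,500/yr)
```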
Because they need access to data and models, data scientists and analysts look for operational support, source and version control, and a means of distribution.
“A remaining bottleneck is the lack of training data in domains such as healthcare where data isn’t really accessible due to privacy concerns,” Milligan said. “Human-in-the-loop tools can alleviate this by providing an initial semiautomated process using a small amount of training material with more automation over time as more data is reviewed.”
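A minimal version of the human-in-the-loop pattern Milligan describes might look like the following sketch, built on scikit-learn with invented data: train on a handful of labels, flag the model's least confident predictions for human review, then fold the new labels back into the training set.

```python
# Minimal human-in-the-loop (uncertainty sampling) sketch with invented data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = ["patient reports chest pain", "follow-up scheduled next week"]
labels = [1, 0]  # 1 = clinically significant, 0 = administrative
unlabeled = ["severe shortness of breath noted", "insurance form resubmitted"]

vec = TfidfVectorizer()
X = vec.fit_transform(labeled + unlabeled)
clf = LogisticRegression().fit(X[: len(labeled)], labels)

probs = clf.predict_proba(X[len(labeled):])
uncertainty = 1 - probs.max(axis=1)   # near-even probabilities mean low confidence
to_review = unlabeled[int(np.argmax(uncertainty))]
print("Send to human reviewer:", to_review)
```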
Voice interfaces are on the horizon, but there are challenges
NLP experts often disagree about how advanced the state of the art in NLP really is. Those who are most confident point to Alexa and Siri as evidence that voice interfaces work, but skeptics cite the same examples to underscore their imperfections.
The right measure isn’t a universal level of accuracy, like 95%, but an accuracy level that’s appropriate for the use case. Usually, if Alexa or Siri misunderstands a query, it’s a mildly annoying user experience. However, providing a patient with the wrong medical diagnosis could be malpractice. While using synthetic data in healthcare is an option, it may not be adequate to test models, Milligan said.
Right now, BI and analytics platforms require users to type in their queries because it’s a simpler problem to solve than speech recognition. Natural language understanding can falter for myriad reasons, including a failure to understand foreign or domestic accents and individual speech habits. Analytics and BI platforms are judged on their ability to analyze information accurately, so it’s inadvisable to race forward with a voice interface that may negatively impact the platform’s accuracy and the vendor’s reputation.
“We are ready for a revolution in augmented analytics with recent advancements in text-based modeling and multimodal deep learning architectures that combine written text with other modalities such as video and audio,” said Toshish Jawale, CTO and co-founder of Symbl, a conversational intelligence platform for developers.
Traditional NLP tasks have become significantly more sophisticated, Jawale said. The high-level skills once needed to hand-curate a narrowly targeted model are required less and less.
“More out-of-the-box models are able to perform general purpose NLP tasks without the need for larger training cycles and data curation,” he said.
For example, zero-shot and few-shot learning techniques have enabled systems that generalize well enough to perform tasks they weren’t specifically trained to do. These capabilities make it possible for someone with a basic understanding of NLP to build sophisticated systems while focusing on the business problem at hand.
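For instance, the open source Hugging Face Transformers library exposes zero-shot classification as an off-the-shelf pipeline: candidate labels are supplied at query time, with no task-specific training data or fine-tuning. The model and labels below are illustrative choices, not recommendations.

```python
# Zero-shot classification with an off-the-shelf model: no task-specific
# training data, just candidate labels supplied at query time.
# Requires the Hugging Face transformers library; the model is one common choice.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The customer called twice about a late refund and threatened to cancel.",
    candidate_labels=["billing complaint", "product feedback", "cancellation risk"],
)
print(result["labels"][0], round(result["scores"][0], 2))
```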
“Human language is full of semantic and syntactic nuances,” said Abhishek Pakhira, COO of AI solution provider Aureus Tech Systems. “NLP helps machines make sense of these nuances to determine context and exact meaning in verbal and written language, which can be everything from eDiscovery to a chatbot helping a customer troubleshoot a problem. Perhaps the most fascinating aspect of NLP is ‘transfer learning,’ where the machine can take learnings from one context and apply them to another.”
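A bare-bones version of the transfer learning Pakhira mentions is to reuse a sentence encoder pretrained on general text as a fixed feature extractor and train only a small classifier for the new domain. The sketch below uses the sentence-transformers and scikit-learn libraries with invented examples.

```python
# Transfer learning in its simplest form: reuse a pretrained sentence encoder
# as a fixed feature extractor, then train a small classifier on a new domain.
# Requires sentence-transformers and scikit-learn; the data is invented.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # pretrained on general corpora
texts = ["password reset not working", "invoice shows the wrong amount"]
labels = ["technical", "billing"]

clf = LogisticRegression().fit(encoder.encode(texts), labels)
print(clf.predict(encoder.encode(["I was charged twice this month"]))[0])  # billing
```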
Basic data literacy is still wise
The whole point of using NLP in analytics is to simplify the platform so less sophisticated users can take advantage of it, but ease of use only goes so far. It doesn’t teach the masses how to think like an analyst, although, through practice and interaction with the platform, even average business professionals can learn to ask questions that yield better-quality answers.
More fundamentally, people within an insight-driven enterprise should have a 101-level understanding of data literacy, meaning they have a basic understanding of data, the data lifecycle and the need for data governance. While platform use may not require it, a common level of understanding helps a company build a data-driven culture.