Last year, John Snow Labs and Gradient Flow issued the first NLP Industry Survey, exploring the applications, practices, and challenges practitioners face when deploying natural language processing (NLP) technology. Now, in its second year, the research gives a new outlook on what transpired over the last 365 days and in the wake of the COVID-19 pandemic.
Adoption, investment, and budgets are increasing
Surprisingly, and despite the uncertain times, progress in NLP adoption has been consistent. For example, NLP budgets increased in 2020 — a time when many organizations were forced to scale back on IT projects. This year, investments in NLP have continued on that trajectory, with 60% of tech leaders indicating that their NLP budgets grew by at least 10%, while 33% reported a 30% increase, and 15% said their budget more than doubled.
While it’s encouraging to see budgets increasing so significantly, it’s important to understand how these funds are being put to use. As in the 2020 survey, document classification and named entity recognition (NER) were the most popular NLP use cases. NER is usually the first step in information extraction projects, as it enables users to identify key entities — companies, products, locations, and others — within text.
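To make the NER step concrete, here is a minimal sketch of entity identification using a plain dictionary (gazetteer) lookup. It is a toy stand-in rather than a trained NER model of the kind the libraries in this survey provide. The `GAZETTEER` entries, the label names, and the `find_entities` helper are assumptions made up for illustration.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    label: str
    start: int
    end: int


# Hypothetical gazetteer for illustration only; production NER relies on
# trained statistical models, not a fixed phrase list.
GAZETTEER = {
    "John Snow Labs": "ORG",
    "Gradient Flow": "ORG",
    "Spark NLP": "PRODUCT",
    "San Francisco": "LOC",
}


def find_entities(text: str, gazetteer: dict[str, str] = GAZETTEER) -> list[Entity]:
    """Return every gazetteer phrase found in `text`, in order of appearance."""
    entities = []
    for phrase, label in gazetteer.items():
        # Word boundaries keep a phrase from matching inside a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text):
            entities.append(Entity(match.group(), label, match.start(), match.end()))
    return sorted(entities, key=lambda entity: entity.start)


sample = "John Snow Labs presented Spark NLP at an event in San Francisco."
for entity in find_entities(sample):
    print(f"{entity.label:8} {entity.text}")
```

A lookup like this misses any entity that is not already in the list, which is one reason practitioners turn to trained models for real information extraction projects.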
A sign of the practical benefits of NER is that companies further along the NLP adoption curve use it at a higher rate than early-stage companies. NER is increasingly applied alongside another use case that is drawing attention because of the rise of AI: entity linking and knowledge graphs. With large language models and related open-source alternatives now available, we can also expect growth in question answering (Q&A) and natural language generation (NLG) use cases in the coming years.
A few Natural Language Processing tools stand out in particular
When it comes to the tools making such NLP use cases possible, many technologists are using multiple libraries and cloud services together to bring their projects to life. 83% of respondents reported using at least one of the four major cloud providers, while 53% of respondents reported using at least one of the following NLP libraries popular within the Python ecosystem: Hugging Face, spaCy, Natural Language Toolkit (NLTK), Gensim, or Flair.
For the second year in a row, Spark NLP was the most widely used library among all respondents (31%). Spark NLP is known for its strength in the healthcare space, an industry fraught with its own unique jargon, nuances, and data-sharing restrictions imposed by higher compliance and Responsible AI requirements. As such, solutions that require no data sharing, like Spark NLP, will remain attractive for users in highly regulated industries, such as healthcare and finance. In fact, Spark NLP is by far the most widely adopted library in healthcare (59%) and finance (40%).
Accuracy remains a top priority
Accuracy has remained the top priority for the largest share of practitioners (44%) when evaluating NLP solutions, and it’s also one of the most challenging to achieve. This is true regardless of company size, level of maturity, and level of expertise. The good news is that users are properly focused on getting it right from the start, knowing that the efficacy of their models will have subsequent implications.
People still have questions about
The bad news? It’s unlikely that users will ever stop questioning the accuracy of their natural language processing models. Models degrade over time. They behave differently across production environments. Language tends to be local, since people develop jargon constantly. As a result, the technology requires ongoing monitoring and tuning. This is foundational for successful AI projects, and it’s where a lot of organizations struggle. In fact, aside from steep cost (33%), difficulty tuning models is the biggest gripe plaguing users of popular NLP cloud services (28%).
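As a rough sketch of what ongoing monitoring can look like, the snippet below tracks accuracy over a rolling window of labeled predictions and flags the model once it falls below a baseline. The `AccuracyMonitor` class, the window size, and the tolerance are illustrative assumptions, not something taken from the survey or from any particular library.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent `window` labeled predictions."""

    def __init__(self, baseline: float, window: int = 100, tolerance: float = 0.05):
        self.baseline = baseline    # accuracy measured when the model was deployed
        self.tolerance = tolerance  # allowed drop before the model is flagged
        self.outcomes = deque(maxlen=window)  # True for each correct prediction

    def record(self, predicted: str, actual: str) -> None:
        """Store whether one prediction matched its later-confirmed label."""
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self) -> float:
        if not self.outcomes:
            raise ValueError("no labeled predictions recorded yet")
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self) -> bool:
        """Flag the model only once the window is full and accuracy has dropped."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.rolling_accuracy() < self.baseline - self.tolerance


# Hypothetical stream of (predicted, actual) labels from production.
monitor = AccuracyMonitor(baseline=0.90, window=10, tolerance=0.05)
for predicted, actual in [("invoice", "invoice")] * 7 + [("invoice", "contract")] * 3:
    monitor.record(predicted, actual)

print(monitor.rolling_accuracy(), monitor.needs_attention())
```

In practice the hard part is obtaining the confirmed labels. A check like this only works where some share of production predictions is reviewed by people or validated downstream.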
Conclusion on Natural Language Processing Trends
It’s an exciting time for natural language processing, and it will be interesting to see how practitioners cope with these challenges and make progress over the next year. The full 2021 NLP Industry Survey results can be found on the Gradient Flow website. Research like this, as well as real-world use cases of NLP, will be presented at the upcoming NLP Summit (October 5–7), a free, virtual event for the AI & NLP community. You can register here.
Learn More About NLP and NLP Research at ODSC West 2021
Our upcoming event, ODSC West 2021, takes place November 16th-18th in San Francisco and will feature a plethora of talks, workshops, and training sessions on NLP and NLP research. You can register now for 30% off all ticket types before the discount drops to 20% in a few weeks. Some highlighted sessions on NLP and NLP research include:
- Transferable Representation in Natural Language Processing: Kai-Wei Chang, PhD | Director/Assistant Professor | UCLA NLP/UCLA CS
- Build a Question Answering System using DistilBERT in Python: Jayeeta Putatunda | Data Scientist | MediaMath
- Introduction to NLP and Topic Modeling: Zhenya Antić, PhD | NLP Consultant/Founder | Practical Linguistics Inc
- NLP Fundamentals: Leonardo De Marchi | Lead Instructor | ideai.io
Sessions on Deep Learning and Deep Learning Research:
- GANs: Theory and Practice, Image Synthesis With GANs Using TensorFlow: Ajay Baranwal | Center Director | Center for Deep Learning in Electronic Manufacturing, Inc
- Machine Learning With Graphs: Going Beyond Tabular Data: Dr. Clair J. Sullivan | Data Science Advocate | Neo4j
- Deep Dive into Reinforcement Learning with PPO using TF-Agents & TensorFlow 2.0: Oliver Zeigermann | Software Developer | embarc Software Consulting GmbH
- Get Started with Time-Series Forecasting using the Google Cloud AI Platform: Karl Weinmeister | Developer Relations Engineering Manager | Google
Sessions on Machine Learning:
- Towards More Energy-Efficient Neural Networks? Use Your Brain!: Olaf de Leeuw | Data Scientist | Dataworkz
- Practical MLOps: Automation Journey: Evgenii Vinogradov, PhD | Head of DHW Development | YooMoney
- Applications of Modern Survival Modeling with Python: Brian Kent, PhD | Data Scientist | Founder The Crosstab Kite
- Using Change Detection Algorithms for Detecting Anomalous Behavior in Large Systems: Veena Mendiratta, PhD | Adjunct Faculty, Network Reliability, and Analytics Researcher | Northwestern University