An Introduction to Natural Language Processing (NLP)
Everything we express (either verbally or in written) carries huge amounts of information. The topic we choose, our tone, our selection of words, everything adds some type of information that can be interpreted and value can be extracted from it. In theory, we can understand and even predict human behavior using that information.
But there is a problem: one person may generate hundreds or thousands of words in a declaration, each sentence with its corresponding complexity. If you want to scale and analyze several hundreds, thousands, or millions of people or declarations in a given geography, then the situation is unmanageable.
Data generated from conversations, declarations, or even tweets are examples of unstructured data. Unstructured data doesn’t fit neatly into the traditional row and column structure of relational databases, and represent the vast majority of data available in the actual world. It is messy and hard to manipulate. Nevertheless, thanks to the advances in disciplines like machine learning, a big revolution is going on regarding this topic. Nowadays it is no longer about trying to interpret text or speech based on its keywords (the old fashioned mechanical way), but about understanding the meaning behind those words (the cognitive way). This way, it is possible to detect figures of speech like irony or even perform sentiment analysis.
Natural Language Processing or NLP is a field of artificial intelligence that gives the machines the ability to read, understand and derive meaning from human languages.
It is a discipline that focuses on the interaction between data science and human language, and is scaling to countless industries. Today, NLP is booming thanks to huge improvements in the access to data and increases in computational power, which are allowing practitioners to achieve meaningful results in areas like healthcare, media, finance, and human resources, among others.
[Related article: Essential NLP Tools, Code, and Tips]
Use Cases of NLP
In simple terms, NLP represents the automatic handling of natural human language like speech or text, and although the concept itself is fascinating, the real value behind this technology comes from the use cases.
NLP can help you with lots of tasks and the fields of application just seem to increase on a daily basis. Let’s mention some examples:
- NLP enables the recognition and prediction of diseases based on electronic health records and patient’s own speech. This capability is being explored in health conditions that go from cardiovascular diseases to depression and even schizophrenia. For example, Amazon Comprehend Medical is a service that uses NLP to extract disease conditions, medications, and treatment outcomes from patient notes, clinical trial reports, and other electronic health records.
- Organizations can determine what customers are saying about a service or product by identifying and extracting information in sources like social media. This sentiment analysis can provide a lot of information about customers choices and their decision drivers.
- An inventor at IBM developed a cognitive assistant that works like a personalized search engine by learning all about you and then remind you of a name, a song, or anything you can’t remember the moment you need it to.
- Companies like Yahoo and Google filter and classify your emails with NLP by analyzing text in emails that flow through their servers and stopping spam before they even enter your inbox.
- To help to identify fake news, the NLP Group at MIT developed a new system to determine if a source is accurate or politically biased, detecting if a news source can be trusted or not.
- Amazon’s Alexa and Apple’s Siri are examples of intelligent voice driven interfaces that use NLP to respond to vocal prompts and do everything like find a particular shop, tell us the weather forecast, suggest the best route to the office, or turn on the lights at home.
- Having insight into what is happening and what people are talking about can be very valuable to financial traders. NLP is being used to track news, reports, comments about possible mergers between companies, everything can be then incorporated into a trading algorithm to generate massive profits. Remember: buy the rumor, sell the news.
- NLP is also being used in both the search and selection phases of talent recruitment, identifying the skills of potential hires and also spotting prospects before they become active on the job market.
- Powered by IBM Watson NLP technology, LegalMation developed a platform to automate routine litigation tasks and help legal teams save time, drive down costs and shift strategic focus.
NLP is particularly booming in the healthcare industry. This technology is improving care delivery, disease diagnosis, and bringing costs down while healthcare organizations are going through a growing adoption of electronic health records. The fact that clinical documentation can be improved means that patients can be better understood and benefited through better healthcare. The goal should be to optimize their experience, and several organizations are already working on this.
Companies like Winterlight Labs are making huge improvements in the treatment of Alzheimer’s disease by monitoring cognitive impairment through speech and they can also support clinical trials and studies for a wide range of central nervous system disorders. Following a similar approach, Stanford University developed Woebot, a chatbot therapist with the aim of helping people with anxiety and other disorders.
However, there is still serious controversy around the subject. A couple of years ago, Microsoft demonstrated that by analyzing large samples of search engine queries, they could identify internet users who were suffering from pancreatic cancer even before they have received a diagnosis of the disease. How would users react to such diagnosis? And what would happen if you were tested as a false positive? (meaning that you can be diagnosed with the disease even though you don’t have it). This recalls the case of Google Flu Trends which in 2009 was announced as being able to predict influenza but later on vanished due to its low accuracy and inability to meet its projected rates.
NLP may be the key to effective clinical support in the future, but there are still many challenges to face in the short term.
How does the future look like?
At the moment NLP is battling to detect nuances in language meaning, whether due to lack of context, spelling errors, or dialectal differences.
On March 2016 Microsoft launched Tay, an Artificial Intelligence (AI) chatbot released on Twitter as an NLP experiment. The idea was that as more users conversed with Tay, the smarter it would get. Well, the result was that after 16 hours Tay had to be removed due to its racist and abusive comments:
Microsoft learned from its own experience and some months later released Zo, its second generation English-language chatbot that won’t be caught making the same mistakes as its predecessor. Zo uses a combination of innovative approaches to recognize and generate conversation, and other companies are exploring with bots that can remember details specific to an individual conversation.
Although the future looks extremely challenging and full of threats for NLP, the discipline is developing at a very fast pace (probably like never before) and we are likely to reach a level of advancement in the coming years that will make complex applications look possible.
Interested in these topics? Follow me on Twitter at @lopezyse where I post all about Data Science, Technology, and the most interesting advances!
Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday.