Upcoming Live Training: NLP Fundamentals with Leonardo De Marchi
I have had the pleasure of presenting at eight ODSC events so far. Every trip has had something special: San Francisco, Boston, London, and so on. I got to know these cities bit by bit, conference by conference, restaurant by restaurant. But what is really special about an ODSC conference is the community: I’m still in touch with at least one attendee per event, which has not happened for me at any other conference.
I have also watched the AI field grow at these conferences, where there are always presentations on cutting-edge topics by knowledgeable industry experts from around the world.
This year, unfortunately, is different. But every crisis brings new opportunities, and this one is no exception. ODSC is launching a new remote training platform, AI+ Training, that will allow anyone in the world to access world-class training in machine learning, deep learning, NLP, and other hot topics in the realm of AI.
I’m lucky enough to be part of this new offering with a brand new course on natural language processing this August 5th: NLP Fundamentals.
I decided to create a brand new course on NLP because the field has changed drastically over the past five years. I’ve been in the AI space for ten years now, and I vividly remember how NLP was once viewed in the conference sphere: with a sense of defeat. Many researchers had spent years on rule-based and statistical NLP, always chasing the magic formula that would solve NLP’s riddles, and always reaching some sort of plateau. Around that time, neural networks started gaining traction and researchers began exploring new ways of using them in NLP.
Fast forward three years and there was a groundbreaking new algorithm: word2vec. We finally had a way to project words into a mathematical space and perform mathematical operations on them. The atmosphere changed completely, with rediscovered enthusiasm and hope. It became possible to add and subtract words in that space and obtain another word that made sense.
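To make the idea of word arithmetic concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are invented for illustration (real word2vec embeddings are learned from text and have many more dimensions), and the helper names `cosine` and `analogy` are hypothetical, not part of any library.

```python
import math

# Toy vectors invented for this example; they are NOT real word2vec output.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.4, 0.5, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c).

    The three input words are excluded from the candidates, which is
    the usual convention for this kind of query.
    """
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))


result = analogy("king", "man", "woman")
print(result)
```

With these toy values the query "king - man + woman" lands closest to "queen". With real embeddings the same kind of query is typically run through a library such as Gensim rather than written by hand.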
Another breakthrough came four years later, in 2017, with the introduction of transformers. Like recurrent neural networks (RNNs), transformers are designed to handle ordered sequences of data. Unlike RNNs, they do not require the data to be processed in order, which allows for much better parallelization: they can process the end of a sentence before the beginning.
Now, state-of-the-art models are built using transformers. Two particularly successful projects are GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), which we will see in my upcoming training.
The workshop will follow this evolution of NLP, keeping only the approaches that are still used in industry. It’s divided into five parts, each with a practical coding example.
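The order-independence can be illustrated with a heavily simplified sketch of the attention step at the core of a transformer. This is an assumption-laden toy: it skips the learned query, key and value projections, multiple heads and positional encodings that real transformers use, and the token vectors and the `attend` helper are made up for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(position, embeddings):
    """Scaled dot-product attention output for a single position.

    The token at `position` is compared with every token in the
    sequence, and the result is a weighted average of all token vectors.
    Nothing here depends on the output of any other position.
    """
    query = embeddings[position]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale
              for key in embeddings]
    weights = softmax(scores)
    return [sum(w * vec[d] for w, vec in zip(weights, embeddings))
            for d in range(len(query))]


# Made-up 2-dimensional vectors standing in for three tokens.
TOKENS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Compute the outputs first to last, then last to first.
forward = {i: attend(i, TOKENS) for i in range(len(TOKENS))}
backward = {i: attend(i, TOKENS) for i in reversed(range(len(TOKENS)))}

print(forward == backward)
```

Because each position only reads the input vectors, the outputs are the same whichever order they are computed in, so they could all be computed at once. An RNN, by contrast, needs the hidden state from the previous step before it can process the next token.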
Here is what we are going to see:
Lesson 1: Text Representation (60m)
Familiarize yourself with NLP fundamentals and text preprocessing to prepare the data for our models. We will go through the main steps like removing stopwords, stemming, one-hot encoding, and more.
Lesson 2: Topic Modeling (45m)
We will see what LDA is and how it can help to extract information from documents. We will also try different clustering techniques and implement a non-negative matrix factorization.
Lesson 3: Text Classification (30m)
We will learn how it’s possible to represent text and how a classifier can use this representation. We will use TF-IDF and experiment with a couple of supervised learning models.
Lesson 4: Introduction to Deep Learning in NLP (45m)
Understand word embeddings, how they work, and how to use them. We will go through the main concepts behind word embeddings and see some practical examples using the Gensim library.
Lesson 5: Overview of Advanced Deep NLP (15m)
We will introduce the most recent developments of deep learning in NLP. In particular, we will see how to leverage BERT and ELMo and their pre-trained models to solve NLP problems.
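To give a flavor of the first lessons, here is a small sketch of stopword removal followed by TF-IDF weighting, written in plain Python. It is not taken from the course material: the stopword list, the sample documents and the helper names are all hypothetical, stemming is left out, and the formula is the basic textbook one (term frequency times the log of inverse document frequency), whereas libraries often apply a smoothed variant.

```python
import math
from collections import Counter

# A tiny, hypothetical stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "is", "on", "and"}


def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (t.strip(".,!?") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample documents.
DOCS = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Cats and dogs.",
]

for doc_weights in tf_idf(DOCS):
    print(doc_weights)
```

In the first document, "cat" gets a higher weight than "sat" because "sat" also appears in the second document. That is the intuition a classifier can exploit: terms that are frequent in one document but rare elsewhere say more about it.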
More on Leonardo De Marchi:
Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks, like JustGiving.
He now works as Head of Data Science and Analytics at Badoo, the largest dating site with over 500 million users. He is also the lead instructor at ideai.io, a company specializing in Reinforcement Learning, Deep Learning and Machine Learning training. More details on the workshops can be found here.
He is also a contractor for several companies and for the European Commission, as an expert in AI and Machine Learning. As an author, he wrote “Hands-On Deep Learning” and authored an online training course for O’Reilly, Introduction to Reinforcement Learning.
In the academic world, he also helped set up the PhD centre on Interactive Artificial Intelligence and will take part in the Inner Assessment Board to assign funding to Irish research in AI.