5 Hands-on Skills Every Data Scientist Needs in 2020 — Coming to ODSC East 2020
At ODSC events, our goal is to provide training and workshops that help everyone from beginners to experienced data scientists and software engineers accelerate their hands-on skills in all areas of data science and AI. Here are our top five hands-on training focus areas that every data scientist should know and that we’re paying extra attention to at ODSC East 2020 this April 13–17.
Natural Language Processing
Recent breakthroughs in natural language processing (NLP), coupled with the fact that many companies are awash in human language data, solidify this as one of the most in-demand hands-on skills of 2020. NLP transformer architectures were among ODSC speakers’ favorite topics in 2019, thanks to major advances such as OpenAI’s GPT-2, Google’s BERT, the Allen Institute for AI’s ELMo, and Facebook’s RoBERTa. Libraries like Hugging Face’s Transformers have greatly accelerated their adoption. These pre-trained models, BERT in particular, have been hailed by many as NLP’s ImageNet moment. By employing techniques like bidirectional sequence modeling and transformers, these models save data scientists the time and expense normally required to train NLP models from scratch. Firing up and fine-tuning these pre-trained models should be at the top of anyone’s list for 2020.
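To make the idea of bidirectional context a little more concrete, here is a toy sketch of scaled dot-product self-attention in plain Python. It is only an illustration under simplifying assumptions: the token vectors are made up, and the queries, keys, and values are the raw inputs rather than learned projections, so it is not how BERT or GPT-2 are actually implemented. The hypothetical `causal` flag switches between a left-to-right view (each token sees only earlier positions, as in GPT-2) and a bidirectional view (each token sees the whole sequence, as in BERT).

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors, causal=False):
    """Toy self-attention where queries = keys = values = the inputs."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for i, query in enumerate(vectors):
        # Left-to-right: positions 0..i only. Bidirectional: every position.
        visible = range(i + 1) if causal else range(len(vectors))
        scores = [
            sum(q * k for q, k in zip(query, vectors[j])) / math.sqrt(dim)
            for j in visible
        ]
        weights = softmax(scores)
        # Hidden positions get zero weight.
        weights += [0.0] * (len(vectors) - len(weights))
        mixed = [
            sum(w * vectors[j][d] for j, w in enumerate(weights))
            for d in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up two-dimensional "token" vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, bidirectional = self_attention(tokens)
_, left_to_right = self_attention(tokens, causal=True)
```

In the bidirectional case the first token spreads its attention over all three positions, while in the left-to-right case it can only attend to itself.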
Deep Learning and Machine Learning
Hands-on skills with deep learning and machine learning are the bread and butter of any active practitioner. TensorFlow, PyTorch, Keras, and scikit-learn are a few of the popular tools and frameworks that saw major releases in 2019, further cementing their position as the leading machine learning and deep learning tools, and they are featured in many of our hands-on training sessions. The release of TensorFlow 2.0 helped it keep its position as the top framework, but PyTorch continued to gain traction in 2019. Getting some hands-on experience with the latest releases of these widely used frameworks is a must for 2020.
MLOps & Workflow
Our new MLOps and Data Engineering focus area tracks coincide with a massive ramp-up of efforts to increase the percentage of data science projects deployed to production. Machine learning lifecycle tools like MLflow and Kubeflow continue to grow in popularity, as do workflow tools like Airflow. Interest in AutoML grew rapidly in 2019, given its potential as a productivity tool at all stages of the machine learning life cycle. Sessions on labeling and annotation (LabelImg, etc.), model interoperability, pipelines, deployment, and testing saw increased interest. This, coupled with a significant drop in the cost of modeling in 2019, helped accelerate deployment in production environments. Understanding how to use some of these tools to build, test, deploy, and monitor your model in production will therefore be the norm in 2020.
Trusted and Responsible AI
Increased model deployment in the real world has greatly raised the importance of Trusted and Responsible AI. Hands-on experience in security, privacy, fairness, and explainability is pretty much a requirement for anyone practicing data science today. Tools like IBM’s AI Fairness 360 Toolkit and Google’s Differential Privacy library, which allows one to draw insights from massive datasets while protecting user privacy, were but two of the many popular 2019 projects that allowed teams to put Responsible AI into practice. Microsoft’s SEAL, TensorFlow Privacy, RBC Capital’s AdverTorch, InterpretML, and Alibi were some of the additional tools released in this category that practitioners can use to implement responsible AI.
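As a rough illustration of how differential privacy lets you report an aggregate while protecting individuals, here is a minimal sketch of the Laplace mechanism applied to a count, using only the Python standard library. It does not use Google’s library or any of the tools named above; the helper names `laplace_noise` and `private_count`, the sample ages, and the `epsilon` value are all made up for the example. A production system would also need a cryptographically secure noise source and privacy budget tracking, which this sketch leaves out.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one record changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Seeded generator for reproducibility only; this is not secure randomness.
rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 67, 38, 44]  # made-up data
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

A smaller `epsilon` adds more noise and gives stronger privacy, while a larger `epsilon` keeps the reported count closer to the true value.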
Research
arXiv published over 21,000 papers on AI and data science topics in 2019 alone, double the 2018 figure. Research is not an area one would normally associate with hands-on training sessions; however, more experienced practitioners can benefit from staying current on emerging research topics, especially those that move quickly into real-world applications. For example, Pieter Abbeel, a leading researcher from the UC Berkeley BAIR lab, ran a deep reinforcement learning session that was very well received at ODSC West, and it is a workshop topic we will continue to explore in 2020. Other research topics that we are excited about in terms of real-world potential include federated learning, advances in recommendation systems, adversarial deep learning, active learning, semi-supervised and self-supervised learning, causal inference with machine learning, and detecting audiovisual fakes and deepfakes. These are all areas where we will be hosting hands-on workshops in 2020.
Bonus Skill — AI for Climate
AI for social good is an important track at all our events. We are excited to host our first AI for Climate track, and are welcoming industry experts such as Microsoft researcher Lester Mackey, who won the $50K Prize4Life ALS disease progression prediction challenge. He has also won prizes for temperature and precipitation forecasting in the yearlong, real-time $800K Subseasonal Climate Forecast Rodeo. Experts like Lester Mackey are helping us understand how we can put our data science talents to use in tackling one of the most important issues of our time.
ODSC East 2020
Hands-on skills are in demand now more than ever. With 280 speakers, no other conference, or format for that matter, gives you this breadth and depth of hands-on training in so short a time. ODSC is the perfect conference to pick up new data science skills and expand on existing ones. Whether you’re a beginner or a seasoned pro, our events aim to set you up to meet your training needs for the remainder of the year and beyond. Learn more about ODSC East 2020 hands-on training sessions here and register now for 60% off by Friday, January 10th, 2020.