ODSC Europe 2024 Virtual Sessions Now Available On-Demand!
We had a great time connecting with Europe’s AI community at ODSC Europe 2024 earlier this month. If you weren’t able to join us, you can still get a taste of the expert-led AI talks that were featured through the on-demand videos listed below. You can watch everything here with a subscription to Ai+ Training.
Tabular Learning: skrub and Foundation Models
Gaël Varoquaux, PhD | Research Director at Inria | scikit-learn Author | Co-Founder of Probabl
While tabular data is central to all organizations, it seems left out of the AI discussion, which has focused on images, text, and sound. In this talk, you will explore how the speaker progressively rethought the tabular machine learning workflow, building tools that require less wrangling, including a new library, skrub, that facilitates complex tabular-learning pipelines by expressing as much of the wrangling as possible as high-level operations and automating them.
The talk also covers the CARTE model, which shows how pretrained models can bring value to downstream table analytics without manually transforming these tables to please the model.
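For a quick taste of what skrub looks like in practice, here is a minimal sketch, assuming a recent skrub release (the employee-salaries example dataset ships with the library):

```python
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from skrub import TableVectorizer
from skrub.datasets import fetch_employee_salaries

# A messy real-world table mixing dates, categories, and free-text columns
dataset = fetch_employee_salaries()

# TableVectorizer picks a sensible encoding for each column type, replacing
# much of the manual wrangling that usually precedes model fitting.
model = make_pipeline(TableVectorizer(), HistGradientBoostingRegressor())
model.fit(dataset.X, dataset.y)
```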
AI Development Lifecycle: Learnings of What Changed with LLMs
Noé Achache | Engineering Manager & Generative AI Lead | Sicara
Using LLMs to build models and pipelines has made it incredibly easy to build proofs of concept, but much more challenging to evaluate the models. As a result, the evaluation step is often neglected, leading to pointless iterations and a lack of knowledge about the true performance of the product. In this talk, we will explore the lessons learned from building products that are typical use cases of these technologies, and zoom in on a specific use case: a RAG (Retrieval-Augmented Generation) tool for a medical company.
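One lesson that generalizes well: evaluate retrieval separately from generation. Here is a minimal sketch of such a check; the `retriever` interface and annotation format are assumptions for illustration, not the speaker’s code:

```python
def recall_at_k(eval_set, retriever, k=5):
    """Fraction of questions whose annotated source chunk appears in the
    top-k retrieved results: a cheap first check before judging generation."""
    hits = 0
    for item in eval_set:  # each item: {"question": str, "source_id": str}
        retrieved_ids = retriever(item["question"], k=k)  # returns chunk ids
        hits += item["source_id"] in retrieved_ids
    return hits / len(eval_set)
```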
Reinforcement Learning with Human Feedback
Luis Serrano, PhD | Author of Grokking Machine Learning and Creator of Serrano Academy
Although LLMs are tremendously successful at generating text, a very important step in their fine-tuning involves humans evaluating the output, and reinforcement learning with human feedback (RLHF) is a widely used method for improving the model with that feedback. In this talk, we’ll explore several aspects (a sketch of the DPO loss follows the list), including:
- How RLHF is used to fine-tune large language models
- Proximal Policy Optimization (PPO)
- Direct Preference Optimization (DPO)
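Of these, DPO is the simplest to state: it trains directly on preference pairs, with no separate reward model or PPO loop. A minimal sketch of its loss in PyTorch (the tensor names are illustrative, not from the session):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is the log-probability a model assigns to a response:
    'chosen' is the human-preferred answer, 'rejected' the dispreferred one."""
    # How much more likely each response became under the policy
    # relative to the frozen reference model
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between preferred and dispreferred responses
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```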
Data Morph: A Cautionary Tale of Summary Statistics
Stefanie Molin | Data Scientist, Software Engineer | Bloomberg | Author of Hands-On Data Analysis with Pandas
Statistics do not come intuitively to humans, who always try to find simple ways to describe complex things. Given a complex dataset, they may feel tempted to use simple summary statistics like the mean, median, or standard deviation to describe it. However, these numbers are not a replacement for visualizing the distribution.
To illustrate this fact, researchers have generated many datasets that are very different visually but share the same summary statistics. In this talk, Stefanie will discuss Data Morph, an open-source package that builds on previous research from Autodesk (the Datasaurus Dozen) using simulated annealing to perturb an arbitrary input dataset into a variety of shapes, while preserving the mean, standard deviation, and correlation to multiple decimal places. Stefanie will showcase how it works, discuss the challenges faced during development, and explore the limitations of this approach.
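The core trick is easy to sketch: repeatedly nudge one point toward a target shape and keep the move only if the summary statistics survive. A simplified, annealing-style illustration of that loop (not Data Morph’s actual API):

```python
import numpy as np

rng = np.random.default_rng(0)

def summary(pts):
    """The statistics to hold fixed: means, standard deviations, correlation."""
    x, y = pts[:, 0], pts[:, 1]
    return np.array([x.mean(), y.mean(), x.std(ddof=1), y.std(ddof=1),
                     np.corrcoef(x, y)[0, 1]])

def morph_step(pts, dist_to_shape, tol=0.01, scale=0.1):
    """Nudge one random point; accept the move only if the statistics stay
    unchanged to within `tol` and the point got closer to the target shape."""
    candidate = pts.copy()
    i = rng.integers(len(pts))
    candidate[i] += rng.normal(scale=scale, size=2)
    stats_preserved = np.all(np.abs(summary(candidate) - summary(pts)) < tol)
    moved_closer = dist_to_shape(candidate[i]) < dist_to_shape(pts[i])
    return candidate if (stats_preserved and moved_closer) else pts
```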
Orchestrating LLM AI Agents with CrewAI
Alessandro Romano | Senior Data Scientist | Kuehne Nagel
This talk will explore the integration of Large Language Models using CrewAI, an open-source platform designed for orchestrating multiple AI agents. The talk covers the fundamentals of LLMs, their integration challenges, and how CrewAI enhances their collaborative capabilities. Key themes include inter-LLM communication, dynamic task decomposition, adaptive learning, and ethical considerations. You’ll learn how and when to use CrewAI, as well as how it compares to other frameworks. Through real-world examples, this session will provide insights into leveraging CrewAI to improve LLM efficiency and tackle complex problems across various industries.
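To give a sense of the programming model, here is a minimal two-agent crew; exact arguments vary by CrewAI version, and a configured LLM backend (for example, an OpenAI API key) is assumed:

```python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Market Researcher",
    goal="Identify notable AI trends in logistics",
    backstory="An analyst who digs through industry reports and news.",
)
writer = Agent(
    role="Report Writer",
    goal="Turn research notes into a concise briefing",
    backstory="A technical writer who values brevity.",
)

research = Task(
    description="List three notable AI trends in logistics.",
    expected_output="Three trends, each with a one-line summary.",
    agent=researcher,
)
briefing = Task(
    description="Write a 150-word briefing based on the research notes.",
    expected_output="A 150-word briefing.",
    agent=writer,
)

# The crew runs the tasks in order, passing each result to the next agent.
crew = Crew(agents=[researcher, writer], tasks=[research, briefing])
print(crew.kickoff())
```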
Do Large Language Models Have a Duty to Tell the Truth?
Brent Mittelstadt, PhD | Associate Professor | University of Oxford
Careless speech is a new type of harm created by large language models (LLMs), one that poses cumulative, long-term risks to science, education, and the development of shared social truths in democratic societies.
This talk examines the existence and feasibility of a legal duty for LLM providers to create models that “tell the truth.” LLM providers should be required to mitigate careless speech and better align their models with truth through open, democratic processes. Careless speech is defined and contrasted with the simplified concept of “ground truth” in LLMs and with prior discussions of truth-related risks in LLMs, including hallucinations, misinformation, and disinformation. The talk concludes by proposing a pathway to create a legal truth duty applicable to providers of both narrow- and general-purpose LLMs, and discusses “zero-shot translation” as a prompting method to constrain LLMs and better align their outputs with verified, truthful information.
An Intro to Federated Learning with Flower
Daniel J. Beutel | Co-Founder & CEO | Flower Labs
Federated Learning is revolutionizing the way we think about data privacy and distributed machine learning by enabling model training across multiple devices or servers without centralizing data. In this session, you will explore the fundamental concepts of federated learning and its growing importance in a world where model architectures are data-hungry, but protecting sensitive data is paramount.
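The key idea is that only model parameters ever travel over the network. A minimal client sketch using Flower’s classic NumPyClient interface, assuming a Keras-style model for brevity:

```python
import flwr as fl

class FLClient(fl.client.NumPyClient):
    """Wraps a local model; the raw training data never leaves this machine."""

    def __init__(self, model, x_train, y_train):
        self.model, self.x_train, self.y_train = model, x_train, y_train

    def get_parameters(self, config):
        return self.model.get_weights()        # model weights as NumPy arrays

    def fit(self, parameters, config):
        self.model.set_weights(parameters)     # receive the current global model
        self.model.fit(self.x_train, self.y_train, epochs=1, verbose=0)
        return self.model.get_weights(), len(self.x_train), {}

    def evaluate(self, parameters, config):
        self.model.set_weights(parameters)
        loss, acc = self.model.evaluate(self.x_train, self.y_train, verbose=0)
        return loss, len(self.x_train), {"accuracy": acc}

# Connect to a Flower server, which averages the clients' weight updates:
# fl.client.start_numpy_client(server_address="127.0.0.1:8080",
#                              client=FLClient(model, x_train, y_train))
```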
Beyond Interpretability: An Interdisciplinary Approach to Communicate Machine Learning Outcomes
Merve Alanyali, PhD | Head of Data Science Research and Academic Partnerships | Allianz Personal
Explainable AI (XAI) is one of the hottest topics among AI researchers and practitioners. These explanations, however, often focus solely on providing technical interpretations of how a given machine learning model generates a certain outcome. To take a step beyond these technical explanations, the Allianz Personal data science team, together with collaborators from the University of Bristol, investigated explaining AI decision-making through a socio-technical lens. In this talk, you’ll hear how they extended the concept of XAI through this multidisciplinary collaboration, along with reflections on the insights gained from setting up an interdisciplinary collaboration between industry and academia.
Beyond Aesthetics: Do Text-to-Image Models Equally Serve Everyone?
Nithish Kannen | Researcher (AI Resident) | Google DeepMind
Text-to-Image (T2I) models are set to revolutionize a wide range of industries, from digital arts and advertising to education and beyond, which raises important ethical and social considerations: are these models truly representative of the global cultures and communities that embrace them now and will increasingly use them in the years to come?
In this talk, you’ll explore some of the challenges in T2I development and see innovative methods for assessing the cultural competence of T2I models. We’ll explore best practices for building evaluation resources, discuss the importance of cultural diversity in T2I generation, and examine how faithfulness, realism, and diversity interplay in the development of these models. Finally, the talk covers some of the early efforts to address these challenges, aiming to create generative models that truly serve the global community.
Why Gaussian Splatting is a New Neural Imagination Engine
Oles Petriv | Co-founder and CTO | Reface
In AI, new techniques are continually emerging to push the boundaries of what’s possible. One such innovation is Gaussian splatting, a method poised to revolutionize the representation of visual and semantic information. In his talk, “Why Gaussian Splatting is a New Neural Imagination Engine,” Oles Petriv will shed light on why this technique is making waves in the AI community. By offering a more efficient and clear representation of data, Gaussian splatting not only reduces the cost of model training but also simplifies its integration into existing machine learning pipelines, 3D software, and game engines.
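At its core, a splat is just a Gaussian kernel composited front to back. A toy 2D version of the rendering step, with isotropic Gaussians for illustration only:

```python
import numpy as np

def render_splats(h, w, centers, colors, opacities, sigma=6.0):
    """Alpha-composite 2D Gaussian splats onto an image, front to back."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    image = np.zeros((h, w, 3))
    transmittance = np.ones((h, w))       # fraction of light not yet absorbed
    for (cx, cy), color, opacity in zip(centers, colors, opacities):
        gauss = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        alpha = opacity * gauss           # this splat's per-pixel coverage
        image += (transmittance * alpha)[..., None] * np.asarray(color)
        transmittance *= 1.0 - alpha      # later splats show through less
    return image

img = render_splats(64, 64, centers=[(20, 32), (44, 32)],
                    colors=[(1.0, 0, 0), (0, 0, 1.0)], opacities=[0.9, 0.9])
```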
Gender Bias in Machine Learning
Shalvi Mahajan | Senior Data Scientist | SAP SE
Gender bias in machine learning is a pervasive issue with significant implications, as it often mirrors and amplifies societal stereotypes, affecting areas like product design and service delivery. Natural language processing models, including Large Language Models (LLMs), are particularly prone to this bias, as they are trained on vast datasets that reflect historical gender norms, leading to problematic assumptions — such as defaulting to female for “nurse” and male for “doctor.” This bias is largely fueled by the use of biased training data, which reinforces stereotypes in subtle yet impactful ways. Addressing this challenge requires improving dataset diversity, refining algorithms for greater transparency, and implementing fairness-aware techniques during model development. Additionally, the development of ethical guidelines and regulations is crucial to ensure responsible and accountable deployment of ML systems. This talk will explore these issues and potential solutions for creating more equitable AI systems.
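As one illustration of a fairness-aware check (a generic example, not the speaker’s specific method), the fairlearn library can quantify whether predictions differ across groups:

```python
import numpy as np
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score

# Toy labels and predictions with a binary sensitive attribute (illustrative only)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
gender = np.array(["F", "F", "F", "F", "M", "M", "M", "M"])

# Demographic parity: positive-prediction rates should match across groups
dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=gender)
print(f"demographic parity difference: {dpd:.2f}")  # 0 means equal selection rates

# Accuracy per group, to spot disparate error rates
frame = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
                    sensitive_features=gender)
print(frame.by_group)
```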
Using Generative AI to Better Understand B2B Audiences: from Topic Modelling to Text Classification
Lourens Walters | Senior Data Scientist | Informa
In the complex and data-rich world of B2B marketing, understanding audience interests and improving data quality is paramount for driving successful campaigns. The IIRIS team at Informa has been at the forefront of this challenge, supporting the promotion of 1,500 trade shows by collecting, enriching, and analyzing a staggering 2.5 billion customer interactions. In the talk “Using Generative AI to Better Understand B2B Audiences: From Topic Modelling to Text Classification,” you’ll gain insights into how a fusion of traditional Machine Learning (ML) techniques and cutting-edge Generative AI, particularly Large Language Models (LLMs), is being used to overcome these challenges.
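On the traditional-ML side of that fusion, topic modelling can be as simple as TF-IDF plus matrix factorization. A minimal sketch with placeholder documents (nothing here is Informa’s pipeline):

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Register for the logistics and supply chain expo",
    "New pharma regulation webinar for compliance teams",
    "Supply chain analytics track announced for the expo",
    "Compliance checklist for pharma manufacturers",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

nmf = NMF(n_components=2, random_state=0)
doc_topics = nmf.fit_transform(X)       # document-by-topic weights

# Print the most characteristic words for each discovered topic
terms = tfidf.get_feature_names_out()
for k, component in enumerate(nmf.components_):
    top_terms = terms[component.argsort()[::-1][:4]]
    print(f"topic {k}: {', '.join(top_terms)}")
```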
Design and Build Powerful LLM Agents
Valentina Alto | AI and Apps Tech Architect | Microsoft
When it comes to generative AI-powered applications, one of the most trending frameworks is the agent: a highly specialized entity that can achieve the user’s goal by planning and interacting with the surrounding ecosystem. In this session, you’ll explore the main components of AI agents, such as LLMs, Prompts, Memory, and Tools. We will also see architectural best practices for building robust, enterprise-scale agents, focusing on emerging trends like semantic caching and GraphRAG.
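Stripped to a skeleton, an agent is a loop over exactly those components. A toy illustration (the `TOOL:` reply convention and the `llm` callable are invented here for clarity, not taken from the session):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MiniAgent:
    """The four ingredients: an LLM, a prompt, memory, and tools."""
    llm: Callable[[str], str]                     # any text-in/text-out model
    system_prompt: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)

    def run(self, goal: str) -> str:
        context = "\n".join([self.system_prompt, *self.memory, f"Goal: {goal}"])
        reply = self.llm(context)
        # Assumed convention: the model answers "TOOL:<name>:<arg>" to use a tool
        if reply.startswith("TOOL:"):
            _, name, arg = reply.split(":", 2)
            reply = self.tools[name](arg)
        self.memory.append(reply)                 # persist context across turns
        return reply
```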
What’s next?
Looking for more expert-led talks and AI hands-on deep dives? Join us for ODSC West 2024 this October 29th-31st for 300+ hours of content, 250+ world-class experts, 2 co-located summits and much more!