More Speakers and Sessions Announced for the 2024 Data Engineering Summit

ODSC - Open Data Science
Mar 20, 2024

We couldn’t be more excited to announce that the schedule for the Data Engineering Summit, co-located with ODSC East this April 23–24, is now live! We’ve got an impressive line-up of experts, thought leaders, and practitioners. Check below for just a taste of what is in store for you.

Experimentation Platform at DoorDash

Yixin Tang │Engineering Manager │DoorDash

The experimentation platform is an essential component of DoorDash's decision-making, using big data tools to support thousands of decisions every day. Explore how DoorDash uses the platform to inform decisions about business strategy, machine learning models, optimization algorithms, and infrastructure changes.

Data Infrastructure through the Lens of Scale, Performance, and Usability

Elliott Cordo │Founder, Architect, Builder │Datafutures

Despite their apparent benefits (saving time, boosting productivity), monoliths pose growing challenges, especially as complexity increases and teams get larger. This session will review strategies and technologies for avoiding monoliths and their pitfalls.

From Research to the Enterprise: Leveraging Foundation Models for Enhanced ETL, Analytics, and Deployment

Ines Chami │Co-founder and Chief Scientist │NUMBERS STATION AI

Join this session to explore recent research from Stanford University and Numbers Station AI on applying foundation models to structured data, and on how those models can be used across the modern data stack.

Building Data Contracts with Open-Source Tools

Jean-Georges Perrin │CIO │AbeaData

In this session, you’ll explore data contracts, starting with an introduction that covers:

  • What is a data contract?
  • What is its purpose?
  • Why does it simplify data engineers’ lives?

Then you’ll get hands-on, using open-source tools to generate the skeleton of a data contract and learning about the data contract life cycle along the way.

Why the Hype Around dbt is Justified

Dustin Dorsey │Sr. Cloud Data Architect │Onix

In just 30 minutes, you’ll learn what dbt really is, what makes it unique, and why it is so much more than just SQL. You’ll discuss what makes it so popular (and unpopular) as a data transformation tool and the factors driving those opinions, dispelling some misconceptions along the way.

Clean as You Go: Basic Hygiene in the Modern Data Stack

Eric Callahan │Principal, Data Solutions │Pickaxe Foundry

Join this session for an overview of the challenges that arise from the “I’ll clean it up later” mindset, in particular:

  • Piles of small cleanup tasks for later
  • Confusion among peers who try to use incomplete data assets
  • Lack of metadata to activate throughout the Modern Data Stack

You’ll also learn about solutions that can provide long-term benefits.

Unlocking the Unstructured with Generative AI: Trends, Models, and Future Directions

Jay Mishra │Chief Operating Officer │Astera

Join this session to explore innovative applications of generative AI in natural language processing and computer vision. The session highlights the technologies driving this evolution, including transformer architectures, attention mechanisms, and the integration of OCR for processing scanned documents.

Deciphering Data Architectures

James Serra │Data & AI Architect │Microsoft

Join us for a guided tour of data fabric, data lakehouse, and data mesh, covering the pros and cons of each. You will also examine common data architecture concepts, including data warehouses and data lakes, to help you determine the most appropriate data architecture for your needs.

Designing ETL Pipelines with Delta Lake and Structured Streaming — How to Architect Things Right

Tathagata Das │Staff Software Engineer │Databricks

Structured Streaming has proven to be a powerful framework for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark’s built-in functions make it easy for developers to express complex computations. Delta Lake, for its part, is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads, which makes it a strong choice for storing structured data. Together, they can make it very easy to build pipelines in many common scenarios.

In a complex ecosystem of storage systems and workloads, it’s important for a developer to understand the problem that needs to be solved. Understanding the requirements of the problem allows you to architect your pipeline to be as resource-efficient as possible. Join this session to examine a number of common streaming design patterns for building such pipelines.

Data Engineering in the Era of Gen AI

Ryan Boyd │Co-founder │MotherDuck

This talk explores the changes in hardware and mindsets enabling a new breed of software optimized for the 95% of us who do not have petabytes to process daily. Instead of focusing on consensus algorithms for large-scale distributed compute, can our engineers focus on making data more accessible and usable, and on reducing the time between “problem statement” and “answer”?

The Value of a Semantic Layer for GenAI

Jeff Curran │Senior Data Scientist │AtScale

Krishna Srihasam │Senior Data Scientist │AtScale

In this session, you’ll learn how to incorporate business terminology and logic into an LLM, enabling users to query the database in natural language instead of SQL. You’ll also explore the results of coupling this LLM with AtScale’s query engine in a chatbot backed by an LLM and a semantic layer.

Unlock Safety & Savings: Mastering a Secure, Cost-Effective Cloud Data Lake

Ori Nakar │Principal Engineer, Threat Research │Imperva

Johnathan Azaria │Data Science TechLead │Imperva

Explore two novel techniques for data lake monitoring that leverage both object store logs and query engine logs. Dive deep into our aggregation strategies and discover how anomaly detection can be applied to this consolidated data. You’ll see how enhanced access control mechanisms can fortify your data lake’s security, mitigating the risk of data leaks and data corruption. Additionally, we’ll shed light on how to harness these insights to minimize the attack surface and to identify and fix cost anomalies and system glitches.

Sign me up!

Get your pass to attend these sessions and more at the Data Engineering Summit this April. But you’d better act fast. Prices go up soon.

Originally posted on OpenDataScience.com

