Navigating the Challenges of LLMs in Big Data Analytics with Google Cloud

ODSC - Open Data Science
5 min read · Apr 1, 2024

Editor’s note: Rohan Johar & Mohammad Soltanieh-ha are speakers for ODSC East 2024 this April 23–25. Be sure to check out their talk, “LLMs Meet Google Cloud: A New Frontier in Big Data Analytics,” there!

In big data analytics, Large Language Models (LLMs) offer a unique opportunity to generate additional insights. As newer models are released, organizations will need to evaluate key architectural decisions to support their AI use cases. One of the primary decisions is whether to use an open-source or commercial LLM within a specific solution.


Open-source LLMs offer a higher degree of flexibility and customization but require additional technical investment to manage the deployment infrastructure. Managing these models demands substantial technical know-how, particularly when it comes to maintenance and updates. Scalability also emerges as a challenge when processing large datasets. Conversely, commercial LLM products bring their own set of challenges, including cost, customization limitations, and potential data security concerns.

In December 2023, Google introduced Gemini, a new set of multimodal models designed to “operate across and combine different types of information.” Within Google Cloud, the Gemini API allows for integrating a fixed version of a pre-trained LLM into your cloud environment for inference. Like many Google Cloud solutions, this setup is highly scalable, so it can handle growing data needs. Notably, the data processed through this system isn’t used for further model training, which addresses privacy concerns. However, each API call incurs a cost. This setup is advantageous for organizations that prioritize control, compliance, and scalability in their operations.
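Because each call is billed, it can help to run a back-of-the-envelope estimate before pointing an LLM at a large dataset. The sketch below is only an illustration: the `Pricing` class, the `estimate_cost` function, and the idea of separate input and output rates per 1,000 billing units are assumptions made for this example, and the numbers you plug in should come from the current pricing page rather than from this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical rates per 1,000 billing units (e.g. characters or tokens).

    These are placeholders, not actual Gemini API prices.
    """
    input_per_1k_units: float
    output_per_1k_units: float


def estimate_cost(
    num_calls: int,
    avg_input_units: int,
    avg_output_units: int,
    pricing: Pricing,
) -> float:
    """Estimate the total cost of `num_calls` requests of average size."""
    input_cost = num_calls * avg_input_units / 1000 * pricing.input_per_1k_units
    output_cost = num_calls * avg_output_units / 1000 * pricing.output_per_1k_units
    return input_cost + output_cost
```

Calling `estimate_cost` with the row count of a table as `num_calls` gives a rough upper bound for a one-request-per-row design, which makes it easier to compare against a batched design or a self-hosted open model.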

In February 2024, Google announced Gemma, a set of “lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.” Alongside the Gemini API, Google Cloud also provides the option to deploy open-source and third-party models. Data scientists and engineers will need to evaluate multiple models to select the appropriate LLM for their solution architecture.

Our upcoming workshop at ODSC East 2024, titled “LLMs Meet Google Cloud: A New Frontier in Big Data Analytics,” is designed to address these architectural decisions head-on. We will start by introducing the core components of big data analytics in the cloud, focusing on the comprehensive suite of services offered by Google Cloud. Our sessions will include detailed demonstrations of effectively utilizing these powerful LLMs at scale and integrating them into your big data toolkit.

Exploring Google Cloud’s Big Data Analytics Services

The workshop will guide you through the key Google Cloud services instrumental in big data analytics. We will explore Cloud Storage for efficient data management, Compute Engine for robust computing power, and BigQuery for advanced data processing and analysis. Each of these services plays a vital role in managing and analyzing large datasets, forming the foundation of a robust big data analytics framework.

A significant portion of the workshop will be dedicated to understanding and leveraging Google Cloud’s Gemini API. We’ll demonstrate how to bring pre-trained LLMs into your cloud environment, focusing on practical applications and best practices. You’ll learn how to maintain data sovereignty and compliance while benefiting from the advanced capabilities of LLMs.
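As a rough illustration of what applying an LLM at scale can look like, the sketch below groups records into batches and sends one prompt per batch instead of one per record. It is not taken from the workshop material: `batched`, `summarize_in_batches`, the prompt wording, and the `call_model` parameter are all hypothetical names chosen for this example. `call_model` stands in for whatever client function wraps your model endpoint, so the code runs without any cloud credentials.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def summarize_in_batches(
    records: Iterable[str],
    call_model: Callable[[str], str],
    batch_size: int = 20,
) -> List[str]:
    """Send one prompt per batch of records and collect the responses.

    `call_model` is a hypothetical stand-in for a real LLM client call.
    """
    results: List[str] = []
    for batch in batched(records, batch_size):
        prompt = "Summarize the following records:\n" + "\n".join(
            f"- {record}" for record in batch
        )
        results.append(call_model(prompt))
    return results
```

Keeping the model call behind a plain function argument also makes it straightforward to swap a hosted model for an open one when comparing options.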


In conclusion, this workshop is intended for professionals looking to navigate the complexities of using LLMs in big data analytics. The session is also geared toward new cloud users, helping them understand core concepts before diving into more advanced capabilities.

Join us at ODSC East 2024 to equip yourself with the knowledge and tools to bring advanced LLM capabilities into your big data analytics pipeline and transform how your organization leverages data.

About the Authors/ODSC East Speakers:

Rohan Johar, an AI specialist in Google Cloud’s customer engineering organization, advises enterprises in Boston on digital transformation strategy. Over his career, Rohan has worked on projects that span the entire data ecosystem, from data warehousing to generative AI. Prior to Google, he worked as a cloud engineer at Oracle and as an engineering applications intern at Tesla. Rohan holds a bachelor’s degree in computer science from Boston University, a graduate certificate in innovation and entrepreneurship from the Harvard Extension School, and eight cloud certifications.

Mohammad Soltanieh-ha is a Clinical Assistant Professor in the Information Systems department at Boston University’s Questrom School of Business. He specializes in data science programming, big data analytics, and business applications. He earned his Ph.D. in computational physics from Northeastern University and currently focuses his research on computer vision applications in cancer diagnosis, macroeconomic forecasting, and high-performance computing. Mohammad holds leadership roles at Google and the American Physical Society (APS). He founded APS’s data science unit in 2019 and serves as a Faculty Expert at Google Cloud, where he supports cloud computing education and best practices for fellow faculty members.

Originally posted on OpenDataScience.com

