Ensuring Safety and Trustworthiness in Generative AI with Guardrails

ODSC - Open Data Science

Editor’s note: Evaline Ju and Gaurav Kumbhat are speakers for ODSC East 2025 this May 13th to 15th. Be sure to check out their talk, “Guardrails in Generative AI Workflows via Orchestration,” there!

Artificial intelligence has been one of the fastest-growing technology fields, and generative AI has been at its forefront. Generative AI refers to the ability of an AI model to create content such as text, images, or speech. If you have used or heard of OpenAI’s ChatGPT chatbot, Google’s Gemini Live, or IBM’s watsonx, these applications are all examples of generative AI; they run on or provide large language models (LLMs): OpenAI’s GPT models, Google’s Gemini models, and IBM’s Granite models, respectively. LLMs have revolutionized the field of natural language processing (NLP) with their ability to comprehend and generate human-like text.

With the increase in model usage and applications, there have also been growing concerns about harmful interactions with these models. Recent high-profile examples include a model falsely accusing a basketball player of vandalism (hallucination), chatbots providing incorrect information (misinformation), a recruiting AI agent rejecting candidates because of their age (bias), and many more. These incidents have raised serious concerns about the use of this technology and have highlighted the need for safety measures around generative models. The tools and protocols designed to ensure AI applications and systems operate within ethical, legal, and technical boundaries are called “guardrails”; they help promote correctness, safety, and fairness, among other trustworthy AI concepts.

In a typical pipeline, or “LLM inference workflow” (refer to the diagram below), a user gives input text (a prompt) to a trained LLM, which generates a text response. A number of guardrail techniques can be inserted at different points in this workflow.
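As a rough illustration, the Python sketch below shows where input and output checks sit in such a workflow. The `check_input`, `check_output`, and `generate` functions are hypothetical stand-ins, not any specific product or library API:

```python
# Minimal sketch of an LLM inference workflow with guardrail checkpoints.
# All function names here are illustrative placeholders.

def check_input(prompt: str) -> bool:
    """Return True if the prompt passes input guardrails (e.g., no abusive content)."""
    # A real guardrail would call a classifier or rules engine here.
    return "forbidden" not in prompt.lower()

def check_output(text: str) -> bool:
    """Return True if the generated text is appropriate for end-user viewing."""
    return "forbidden" not in text.lower()

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM inference endpoint."""
    return f"Model response to: {prompt}"

def guarded_inference(prompt: str) -> str:
    if not check_input(prompt):
        return "Input rejected by guardrails."
    response = generate(prompt)
    if not check_output(response):
        return "Response withheld by guardrails."
    return response

print(guarded_inference("Tell me about trustworthy AI."))
```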

The variety of applications built on generative models has produced an equally large variety in the models they use and in how they use them. In turn, an assortment of guardrails needs to be implemented, and with the field of generative AI still rapidly evolving, the diversity of guardrail techniques will continue to grow. A simple guardrail can guard against hate speech, abuse, and profanity (HAP). For user prompts, it can check whether the user has entered inappropriate content; for LLM output, it can check that the generated text is appropriate for end-user viewing. More involved inference workflows, such as retrieval-augmented generation (RAG), where external information is provided to the LLM so it can generate more accurate or more suitable content, may require equally complex guardrails. One example is a guardrail that checks the relevancy of the model’s answers to the user’s input, as sketched below.
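One way such a relevancy check could be sketched is by comparing embeddings of the question and the generated answer. The example below assumes the sentence-transformers library and an off-the-shelf embedding model; the model name and threshold are arbitrary choices for demonstration, not a prescribed setup:

```python
# Illustrative answer-relevancy check for a RAG workflow.
# Assumes the sentence-transformers package is installed; the model name
# and threshold are arbitrary choices for demonstration.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def is_relevant(question: str, answer: str, threshold: float = 0.5) -> bool:
    """Flag answers whose embedding similarity to the question is too low."""
    q_emb = embedder.encode(question, convert_to_tensor=True)
    a_emb = embedder.encode(answer, convert_to_tensor=True)
    similarity = util.cos_sim(q_emb, a_emb).item()
    return similarity >= threshold

print(is_relevant("What is the capital of France?", "Paris is the capital of France."))
```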

Given the constant change in this space, combined with growing usage in live production systems, adding guardrails requires a systematic approach that anticipates future volatility and transitions without affecting the higher-level applications that consume guardrails. Numerous factors must be taken into account, such as how much text each guardrail can process at once, or how users may want generative model results streamed back to them. We have designed an orchestrator framework that accounts for such factors and allows users to easily add guardrails to their LLM inference workflows.
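To give a flavor of these concerns, the sketch below shows, in simplified form, how a streamed response might be buffered into spans small enough for a detector and released only after each span passes. This is an illustrative toy, not the orchestrator framework discussed in the talk; the size limit and detector logic are assumptions:

```python
# Hypothetical sketch of two orchestration concerns: limiting how much text a
# detector sees at once, and checking a streamed response incrementally.
from typing import Iterable, Iterator

MAX_DETECTOR_CHARS = 512  # assumed per-detector input limit

def detector(span: str) -> bool:
    """Placeholder detector; returns True if the span is acceptable."""
    return "forbidden" not in span.lower()

def guarded_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and release each span only after it passes the detector."""
    buffer = ""
    for token in tokens:
        buffer += token
        if len(buffer) >= MAX_DETECTOR_CHARS:
            if not detector(buffer):
                yield "[stream stopped by guardrails]"
                return
            yield buffer
            buffer = ""
    if buffer:
        yield buffer if detector(buffer) else "[final chunk withheld by guardrails]"

# Example: stream a short response token by token.
for piece in guarded_stream(["Hello", " ", "world", "!"]):
    print(piece, end="")
print()
```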

Be sure to check out our talk at ODSC East to learn more about implementing and orchestrating guardrails in production systems!

Authors

Evaline Ju is a senior engineer working on the watsonx platform engineering team of IBM Research and based in Denver, Colorado. She currently focuses on building guardrails infrastructure for large language model workflows. Her previous experience includes MLOps for IBM’s cloud ML offerings.

LinkedIn: www.linkedin.com/in/evalineju

Gaurav Kumbhat is a software architect for the Guardrails team within the watsonx platform engineering team at IBM Research, based in Austin, Texas. He is currently working on developing guardrails infrastructure for generative model workflows. His previous experience includes customization of language models, developing NLP algorithms, and MLOps for IBM’s cloud ML offerings.

LinkedIn: https://www.linkedin.com/in/gkumbhat/
