EU AI Act Checker Uncovers Big Tech’s Compliance Challenges

ODSC - Open Data Science
3 min read · Oct 22, 2024


A new tool designed to assess AI models’ compliance with the European Union’s forthcoming AI Act has revealed significant shortcomings among prominent AI systems. According to data reviewed by Reuters, several leading models, including those from Meta and OpenAI, fall short in key areas such as cybersecurity resilience and the avoidance of discriminatory output.

The EU AI Act, set to come into effect in stages over the next two years, has sparked considerable attention since the release of OpenAI’s ChatGPT in late 2022. The chatbot’s immense popularity and subsequent concerns about the potential risks posed by general-purpose AI (GPAI) models pushed EU lawmakers to enact stringent new regulations governing the use of AI technologies.

Now, the newly developed AI compliance checker, created by Swiss startup LatticeFlow AI in collaboration with research institutes ETH Zurich and Bulgaria’s INSAIT, has tested several generative AI models across multiple categories.

The models were scored on a scale of 0 to 1 in areas such as technical robustness, safety, and compliance with the AI Act.

Compliance Shortfalls in Key Areas

The initial results from LatticeFlow’s “Large Language Model (LLM) Checker” show that while many models performed well overall, several exhibited weaknesses in critical areas. According to the company’s leaderboard, models from Alibaba, Anthropic, OpenAI, Meta, and Mistral all received average scores of 0.75 or higher.

However, the tool revealed specific compliance gaps that may require attention from tech companies to avoid future penalties. For instance, OpenAI’s GPT-3.5 Turbo received a low score of 0.46 when evaluated for discriminatory output, which has been a persistent issue in AI development.

Similarly, Alibaba Cloud’s “Qwen1.5 72B Chat” model scored only 0.37 in this category, signaling potential bias in its outputs based on gender, race, and other factors. On the cybersecurity front, the test also highlighted vulnerabilities to “prompt hijacking,” a type of cyberattack where malicious actors disguise harmful prompts as legitimate ones.

Meta’s “Llama 2 13B Chat” received a score of 0.42 in this category, while Mistral’s “8x7B Instruct” scored 0.38. Despite these challenges, some models performed exceptionally well. Google-backed Anthropic’s “Claude 3 Opus” achieved the highest average score of 0.89, showcasing a strong alignment with the AI Act’s technical requirements.

Preparing for EU AI Act Enforcement

While the EU is still working out how the AI Act will be enforced, the LatticeFlow test offers an early glimpse into where companies may need to focus to ensure compliance. Firms failing to meet the requirements of the AI Act could face fines of up to 35 million euros ($38 million) or 7% of global annual turnover.

LatticeFlow’s CEO, Petar Tsankov, remains optimistic, stating, “The EU is still working out all the compliance benchmarks, but we can already see some gaps in the models. With a greater focus on optimizing for compliance, we believe model providers can be well-prepared to meet regulatory requirements.”

The European Commission has also acknowledged the importance of tools like LatticeFlow’s LLM Checker in translating the AI Act’s guidelines into technical standards. A spokesperson for the Commission described the platform as a “first step” toward ensuring AI models meet the new regulations.

As companies gear up for the enforcement of the AI Act, tools like LatticeFlow’s compliance checker will play a crucial role in helping AI developers fine-tune their models, ensuring they align with EU standards and avoid costly penalties.


