Watch how your AI grows - what do you feed it?

We ask pharmaceutical companies what goes into their drugs. We don't ask the same about AI - and that's worth examining.

For technologies with stark potential for misuse, it's standard to review how they are developed. AI is the exception - here, we focus on "How do we control this weapon?" Why is AI different?

General-purpose AI

Instead of looking into "How is it created, and for which purpose?", the dominant public framing simply accepts that generative AI can be used for everything. After all, this is where the power of such "general-purpose AI" supposedly comes from. But general-purpose is not the same as all-purpose.

Leading AI labs like Anthropic and OpenAI themselves state that their most capable models risk providing unsafe information, for example on the creation of biological and chemical weapons. Thus, an AI that knows literally everything cannot be the goal.

The seriousness with which we are having this discussion is relatively new. Before the advent of large language models (like the ones behind ChatGPT), it was standard practice in applied and domain-specific AI development to spend a lot of time engineering and filtering the training data. One reason was that the models we had back then were less powerful, so we were forced to make them work in one domain.

Now, with foundation models that work well across a broad set of domains, the narrative has flipped. Making an AI work well for a given task happens mostly by adapting a ready-trained model. We've replaced pre-training data curation with post-training methods.

This increases convenience - and this is great for many purposes. But we need to be clear about where the limitations of this approach lie.

The different types of risks posed by general-purpose AI

On a general level, we can distinguish two categories of risks that occur when we use generative AI: those that arise from users' interaction with AI, and those which relate to dangerous knowledge shared with users.

Interaction: This takes forms like sycophantic behavior, spreading hate speech and biases, encouraging harmful behavior, etc. These risks can be mitigated well by post-training methods (e.g. based on human feedback).
Knowledge: Prominent recent examples are insights that help build biological and chemical weapons, or help hackers attack cloud applications. Here, both post-training methods and data filtering before training can be applied.

Consequently, interaction controls sit "on top" of the AI model - where they can be optimized, but also jailbroken - while data filtering changes what is inside the model, i.e. what it can fall back on.

This distinction matters. Knowledge-based risks have been at the core of recent discussions in which tech CEOs warn about the dangers AI models pose to humanity. These companies have focussed on growing their AI models as fast as possible - for which they needed as much data as they could get. Slowing down to curate data would have meant slower growth and falling behind.

But what if such filtering methods had been the most efficient way to avoid the very risks they are flagging?

Can filtering training data make AI safer?

Scientists found: Filtering dangerous data out before training is more than ten times more robust than fixing the model afterward - and it doesn't reduce quality elsewhere.

Concretely, in a recent research paper, scientists from EleutherAI, the UK AI Security Institute and the University of Oxford investigated how to make the latest AI models (LLMs) more resistant to misuse by malicious users. [1] Using biological threats (e.g. bio-weapons) as an example, they showed: Filtering out the right data before training is more than ten times more robust against attempts to get 'dangerous knowledge' back out of the model, compared to adjusting the model after training has finished. At the same time, the model stayed as capable on topics that were not filtered out.

Remember what we discussed before: The AI labs' original assumption was that curation slows you down - but this research calls that assumption into question.

This mechanism isn't perfect, and there is still some knowledge left that can be misused. The researchers also highlight that an AI trained on filtered data could still access dangerous knowledge later on, e.g. through tools like web search. However, the reduction in "dangerous knowledge" that an AI can share is significant.
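To make the idea of "filtering before training" more tangible, here is a minimal Python sketch. It is not the pipeline from the paper: the blocklist terms, the Document class and the function names are all hypothetical, and a real setup would rely on expert-curated term lists and most likely a trained classifier rather than simple keyword matching. It only illustrates the principle that flagged documents never reach the training set.

```python
from dataclasses import dataclass

# Hypothetical blocklist, for illustration only. A real filter would use
# expert-curated terms and probably a classifier as a second stage.
BLOCKLIST = {"pathogen enhancement", "toxin synthesis"}


@dataclass
class Document:
    doc_id: str
    text: str


def is_flagged(doc, blocklist=BLOCKLIST, min_hits=1):
    """Return True if the document mentions at least min_hits blocklist terms."""
    text = doc.text.lower()
    hits = sum(1 for term in blocklist if term in text)
    return hits >= min_hits


def filter_corpus(docs):
    """Split a corpus into documents kept for training and documents removed."""
    kept, removed = [], []
    for doc in docs:
        (removed if is_flagged(doc) else kept).append(doc)
    return kept, removed


corpus = [
    Document("a", "A recipe for sourdough bread."),
    Document("b", "Notes on toxin synthesis routes."),
    Document("c", "An introduction to Python dataclasses."),
]

kept, removed = filter_corpus(corpus)
print([d.doc_id for d in kept])     # ['a', 'c']
print([d.doc_id for d in removed])  # ['b']
```

Keeping the removed documents in a separate list, rather than silently dropping them, makes it possible to review and report what was filtered out.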

Pre-think from the beginning what AI shouldn't know

What changed since the release of ChatGPT is the framing: AI models can now supposedly solve everything. We don't even ask how these models can be restricted to a specific purpose, although most business applications have a specific intended use from the very beginning. For those, there is no reason - and no added value - in having an AI that knows how to build weapons.

When tech leaders call for stronger regulation, the subtext is mostly: As long as we take care of it, everything will be fine. They do not highlight that they accelerated these knowledge-based risks by scaling their AI models as fast as possible; instead, they ask for more control to manage the risks.

Knowing how knowledge-based risks can be mitigated, we should meet such statements with strong scepticism.

We shouldn't blindly follow the tech leaders' calls for stronger regulation. In the pharmaceutical industry, we have strong controls and reporting on which ingredients are used to create drugs, and which measures have been taken to quantify potential risks. Such an approach is missing for AI today, but it would help to decrease risks here as well.

Good regulation does not need to reinvent the wheel. Maybe we just need to get back to the roots and re-emphasize the data engineering and data filtering challenges that have always been part of AI development.

How to make your own AI solutions more secure

Now, what does this mean for business managers or product owners seeking to integrate AI into their solutions?

As we learned, the knowledge that gets into an AI is a main source of both its usefulness and its potential risks. But it's not only the data used to train an AI that matters. In many AI applications, we rely on capabilities like web search or RAG knowledge bases, which dynamically introduce new knowledge.

These capabilities can pose risks if they provide harmful or wrong information. But the good news is: How you integrate such capabilities is fully under your control. Consequently, take your time in assessing the data sources on which your AI solutions ground their information.
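One simple way to keep control over grounding sources is an allowlist: only results from domains you have reviewed are passed on to the model. The following Python sketch illustrates this under stated assumptions. The domain names, the shape of the result dictionaries and the function names are hypothetical, and a real web search or RAG integration will look different depending on the tools you use.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of reviewed sources, for illustration only.
ALLOWED_DOMAINS = {"who.int", "example-company.com"}


def is_trusted(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def filter_retrieved(results):
    """Keep only retrieved results whose source URL is on the allowlist."""
    return [r for r in results if is_trusted(r["url"])]


results = [
    {"url": "https://www.who.int/news", "snippet": "Health guidance"},
    {"url": "https://random-forum.example/post/1", "snippet": "Unverified claim"},
    {"url": "https://who.int.attacker.example/", "snippet": "Look-alike domain"},
]

trusted = filter_retrieved(results)
print([r["url"] for r in trusted])  # ['https://www.who.int/news']
```

Matching on the full host name, instead of checking whether the URL merely contains an allowed string, avoids accepting look-alike domains such as the third entry above.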

So, if you only remember one thing from this article, it should be: controlling and being transparent about the data that goes into AI matters. Further, when assessing the risks posed by AI, separate harmful knowledge from harmful user interaction.

References:

[1] Kyle O'Brien et al. (EleutherAI, the UK AI Security Institute and the University of Oxford): Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs. arXiv:2508.06601v2