The EU's new AI training data transparency rules – what in-house lawyers need to know

Imagine this. You’re in a board meeting and someone casually mentions that the company’s AI supplier now has to publish details of the data they used to train their model. All eyes turn to you. What do you say?

As of 2 August 2025, this scenario is a reality: the European Commission has published a template that providers of general-purpose AI models must use to disclose how they trained their models. The requirement comes from Article 53(1)(d) of the EU AI Act.

Here’s what in-house teams need to know.

Why is this happening?

The AI Act introduces transparency obligations for providers of general-purpose AI models. These models – think of the large language models behind tools like ChatGPT – are trained on vast amounts of data, much of it scraped from the internet. Until now, little has been known about the make-up of that data.

The new rules aim to:

  • give rights holders visibility over whether their copyrighted works were used
  • allow regulators, businesses and the public to assess bias, legality and risk in these models
  • improve trust in AI technologies
  • promote fair competition by levelling the playing field between smaller developers and large tech companies
  • ensure downstream users of AI models can make more informed decisions about their own compliance frameworks.

What must AI providers do now?

Providers must publish a “public summary” that is comprehensive but not overly technical. It must include:

  • Types of data – a breakdown of data by modality (text, images, audio, video and other)
  • Volume of data – estimated size of the datasets in broad ranges, including how much was drawn from public sources, licensed sources, or user-generated input
  • Sources – identification of large public datasets and a narrative explanation of other datasets used, including:
    • details of the most common domain names scraped
    • any web crawling or scraping methods
    • the timeframe during which data was collected
  • User data – whether user interactions with the provider’s models or services have been used as training data
  • Synthetic data – whether synthetic data (i.e. data generated by other AI models) has been used
  • Data safeguards – information on steps taken to avoid using copyrighted or illegal content, including filters, licensing arrangements and respect for “opt-outs” from text and data mining exceptions.

The official template is available on the European Commission's website.

Key dates to note

  • 2 August 2025: The obligation applies to models placed on the market on or after this date.
  • 2 August 2026: The AI Office will begin enforcement and can impose fines of up to 3% of global annual turnover (or €15 million, if higher) for non-compliance.
  • 2 August 2027: Deadline for publishing summaries for models that were already on the market before 2 August 2025.

Why it matters to in-house lawyers

  1. Contracts with AI suppliers – Consider adding clauses requiring your AI vendors to comply with these obligations, provide copies of their public summaries, and notify you of any significant changes.
  2. IP enforcement – For businesses with significant IP portfolios, these summaries will be a valuable resource to monitor whether copyrighted content has been used to train third-party AI models.
  3. AI governance – This new level of transparency will support better due diligence, risk assessments and supplier management. It also provides a benchmark for your own AI policies.
  4. Policy and training – Expect questions from the board and senior leadership about how your organisation collects, uses and licenses data for AI. Transparency requirements for suppliers will lead to greater scrutiny of your own practices.
  5. Cross-border impact – Even UK-based companies will be affected if they supply AI models into the EU market or use models from providers that do. This is particularly relevant for any group with European operations or customers.

What’s next?

Providers must now complete and publish their summaries in plain language and make them accessible to the public. Summaries will be hosted on providers’ websites and linked wherever the models are distributed. The AI Office will monitor these disclosures and can request corrections or updates.

Immediate steps for in-house lawyers

  • Map out where and how your business uses third-party AI models.
  • Ask your AI suppliers for copies of their public summaries.
  • Update supplier contract templates to include obligations around the new disclosure rules.
  • Prepare to brief senior stakeholders on how this transparency might influence AI strategy and procurement.

Takeaway

For in-house lawyers, this is a new tool to add to your AI governance toolkit. With these rules now in force, you have a clearer view of what’s inside the “black box” of your supplier’s AI models. That knowledge will support compliance, risk management, and informed decision-making.

It’s another step towards responsible AI – and one more thing to brief your board about.
