
EU AI Act compliance for LLM applications: a deployer's guide

The high-risk deadlines just moved to December 2027, but the transparency rules still land on 2 August 2026. What EU AI Act compliance actually asks of teams deploying LLM applications, and when, as of July 2026.

Rogerio Chaves · July 3, 2026 · Article

On 29 June 2026, the Council of the EU gave its final approval to the omnibus package that rewrites the AI Act's calendar. The high-risk obligations that were due to hit on 2 August 2026 now apply from 2 December 2027. The transparency rules in Article 50 did not move: they land on 2 August 2026, four weeks from now. So this is a strange month to be running an LLM application in production. The deadline most EU AI Act compliance projects were racing toward just slipped by sixteen months, while a quieter one is arriving on schedule.

This guide is for teams deploying LLM applications: a support chatbot built on the GPT or Claude APIs, an internal copilot, an agent that can act on customer accounts, a resume screener. It is not about training frontier models. The labs got their own chapter of the Act, the general-purpose AI rules, in force since August 2025, and those duties sit with OpenAI, Anthropic, Google, and Mistral. The question here is which obligations reach the people shipping applications on top, and when.

One caveat before the article numbers start: I am an engineer, not a lawyer, and this is not legal advice. Treat it as a map of Regulation (EU) 2024/1689 drawn from the engineering side, accurate as of 3 July 2026, with citations so your counsel can check every claim. Decisions that carry seven-figure fines deserve an actual lawyer.

Where the dates stand, July 2026

The AI Act entered into force on 1 August 2024 and was designed to phase in over three years. In November 2025 the Commission proposed the digital omnibus on AI, a simplification package that pushes the high-risk dates back, mainly because the harmonised standards and templates that companies need in order to comply were not ready. Parliament adopted it on 16 June 2026 and the Council on 29 June 2026. As I write this, publication in the Official Journal is pending, with entry into force three days after publication. The new dates are fixed calendar dates, not triggers tied to standards availability.

Date | What applies
2 February 2025 | Prohibited practices (Article 5) and AI literacy duties (Article 4). Already in force.
2 August 2025 | General-purpose AI model obligations, governance structures, and the penalties framework. Already in force.
2 August 2026 | Article 50 transparency duties: chatbot disclosure, machine-readable marking of AI-generated content. The Act's remaining provisions also apply, minus the carve-outs below.
2 December 2026 | Marking deadline for generative systems already on the market before 2 August 2026. New prohibitions on generating non-consensual intimate imagery and CSAM take effect.
2 December 2027 | High-risk obligations for the use cases in Annex III (hiring, credit, education, and the rest). Previously 2 August 2026.
2 August 2028 | High-risk obligations for AI embedded in products regulated under Annex I (medical devices, machinery, vehicles). Previously 2 August 2027.

Two things in this table are easy to misread. The December 2027 delay covers the high-risk regime, which is the heaviest but narrowest part of the Act; everything already in force stays in force. And the omnibus was not purely relief: it added the new Article 5 prohibitions on AI that generates non-consensual intimate imagery or child sexual abuse material, and it moved the national regulatory sandbox deadline to 2 August 2027.
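The table above can also be kept as data, so a planning script can answer "what applies on this date?" without anyone re-reading the post. A minimal Python sketch follows. The names `MILESTONES`, `applicable_on`, and `next_milestone` are illustrative, and the dates are copied from the table, so they inherit its as-of-July-2026 caveat.

```python
from datetime import date

# Dates copied from the table above (post-omnibus, as of July 2026).
# Assumption: the short labels are paraphrases for planning, not legal text.
MILESTONES = [
    (date(2025, 2, 2), "Prohibited practices (Art. 5) and AI literacy (Art. 4)"),
    (date(2025, 8, 2), "General-purpose AI model obligations, governance, penalties"),
    (date(2026, 8, 2), "Article 50 transparency duties"),
    (date(2026, 12, 2), "Marking deadline for existing generative systems; new Art. 5 prohibitions"),
    (date(2027, 12, 2), "High-risk obligations, Annex III use cases"),
    (date(2028, 8, 2), "High-risk obligations, Annex I regulated products"),
]


def applicable_on(day: date) -> list:
    """Return the labels of every milestone that has started by `day`."""
    return [label for start, label in MILESTONES if start <= day]


def next_milestone(day: date):
    """Return (date, label) of the first milestone after `day`, or None."""
    for start, label in MILESTONES:
        if start > day:
            return start, label
    return None


if __name__ == "__main__":
    today = date(2026, 7, 3)
    upcoming = next_milestone(today)
    print(applicable_on(today))
    print(upcoming, (upcoming[0] - today).days, "days away")
```

Keeping the dates in one list means the next amendment is a one-line change rather than a search through planning documents.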

Provider or deployer: you are probably both

The Act splits duties between two roles (Article 3). A provider develops an AI system and places it on the market or puts it into service under its own name (Art. 3(3)). A deployer uses an AI system under its authority (Art. 3(4)). Every obligation in the Act hangs off which role you hold for which system, so this classification is worth fifteen careful minutes before anything else.

Buying model access does not make you "just a user", which surprises most engineering teams I talk to. If you build a support chatbot on the Claude API and ship it to your customers, you developed an AI system and put it into service under your own name. You are the provider of that system. Anthropic remains the provider of the general-purpose AI model underneath, with separate obligations (Articles 53 and 55) that stay on their side of the API. And this holds for purely internal tools too, because "putting into service" includes supplying a system for your own use (Art. 3(11)).

You are a deployer for the systems you did not build: the vendor HR screening tool, the SaaS meeting-notes bot, Copilot in the hands of your engineers. A mid-size company typically holds both roles at once, provider of its two customer-facing LLM features and deployer of a dozen bought ones.

Two edge cases worth knowing about:

  • Fine-tuning does not usually make you a model provider. The Commission's guidelines for GPAI providers (July 2025) presume a downstream company becomes the provider of a general-purpose AI model only when its modification uses more than a third of the original model's training compute. A fine-tune on a few thousand support tickets is nowhere near that. Your system-level responsibilities stay exactly as they were.
  • Deployers can slide into the provider seat. Under Article 25, if you put your own name on a high-risk system already on the market, substantially modify one, or take a system that was not high-risk and repurpose it for a high-risk use, the provider obligations transfer to you. Wiring a general-purpose assistant into promotion decisions is enough to trigger that third path.
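The one-third compute presumption in the first bullet is plain arithmetic, and it can help to see how far an ordinary fine-tune sits from it. The sketch below is only an illustration of that comparison: the function name is hypothetical, and the FLOP figures in the usage example are made-up round numbers, not measurements of any real model.

```python
def presumed_model_provider(modification_flops: float,
                            original_training_flops: float) -> bool:
    """True when a modification uses more than one third of the original
    model's training compute (the presumption described above)."""
    if original_training_flops <= 0:
        raise ValueError("original training compute must be positive")
    return modification_flops > original_training_flops / 3


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    original = 1e25      # assumed training compute of the base model
    small_finetune = 1e19  # assumed compute for a small fine-tune
    print(presumed_model_provider(small_finetune, original))
    print(f"share of original compute: {small_finetune / original:.0e}")
```

The result only speaks to the model-provider question; the system-level role under Article 3 is unaffected either way.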

The risk tiers, applied to real LLM applications

The Act classifies what a system is used for, so the same model can sit in different tiers on different screens of your product. A use is prohibited (Article 5), high-risk (Article 6: safety components of regulated products, or one of the use cases listed in Annex III), subject to transparency duties (Article 50), or none of the above. LLM applications mostly live in the bottom two tiers, with one sharp exception.

A customer support chatbot is not on the Annex III list; customer service is not a regulated use case. Its duties are the Article 50 ones: the provider must ensure people are told they are talking to an AI system, unless that is already obvious to a reasonably well-informed person, and the disclosure has to be there at the latest at first interaction. GDPR applies to the conversation data exactly as before; nothing in the AI Act displaces it. The thing to watch is scope creep. The day the bot starts influencing who gets a payment plan, you are brushing against Annex III point 5(b), creditworthiness assessment, and that is a different tier entirely.
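In code, "at the latest at first interaction" can be as simple as a per-session flag. The following is a minimal sketch, assuming a session object that wraps whatever produces the model output. `ChatSession` and the disclosure wording are hypothetical, and the wording has not been reviewed by counsel.

```python
from dataclasses import dataclass, field

# Hypothetical wording; the actual text is a product and legal decision.
AI_DISCLOSURE = "You are chatting with an AI assistant."


@dataclass
class ChatSession:
    disclosed: bool = False
    transcript: list = field(default_factory=list)

    def reply(self, model_output: str) -> str:
        """Return the message to show, prefixing the disclosure exactly once."""
        if not self.disclosed:
            message = f"{AI_DISCLOSURE}\n\n{model_output}"
            self.disclosed = True
        else:
            message = model_output
        # Keeping the shown message in the transcript leaves evidence
        # that the disclosure was actually displayed.
        self.transcript.append(message)
        return message


if __name__ == "__main__":
    session = ChatSession()
    print(session.reply("Hi, how can I help?"))
    print(session.reply("Your order shipped yesterday."))
```

Storing the displayed message, not just the raw model output, is the design choice that matters here: it is what lets you show later that the disclosure reached the user.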

An internal copilot, staff asking questions over your docs and drafting emails, sits in minimal-risk territory. What applies today is Article 4: providers and deployers must take measures so the staff operating and using their AI systems have a sufficient level of AI literacy, in force since 2 February 2025 (the omnibus softened the exact wording, but the duty stays). It is a real duty, and the cheapest one on this page to discharge: usage guidelines, short training, a record that it happened.

An agent acting on customer data (one that reads the account, issues refunds, or changes bookings) has no dedicated category in the Act; "agent" appears nowhere in the risk tiers. It is an AI system like any other, classified by what it is used for. Article 50 disclosure applies, since customers interact with it. Its actions on personal data are a GDPR matter. Most of its risk today is plain liability, such as an agent promising a refund your policy does not cover, which no regulation was needed to make expensive. The AI Act starts caring at the same point it does for the chatbot: the moment the agent's outputs feed a listed purpose like credit, insurance pricing, or hiring.

Anything touching HR or recruiting is the exception. Annex III point 4 lists AI used for "the recruitment or selection of natural persons, in particular to place targeted job advertisements, to analyse and filter job applications, and to evaluate candidates", and AI making decisions on promotion, termination, task allocation, or "to monitor and evaluate the performance and behaviour of persons" at work. An LLM ranking resumes is squarely on that list: high-risk, with obligations from 2 December 2027. Article 6(3) offers a derogation where the system does not pose a significant risk of harm, for example because it only performs a narrow procedural task or preparatory work, but with an override: a system that profiles natural persons is always high-risk. A provider that relies on the derogation must document its assessment before placing the system on the market (Art. 6(4)). One workplace use is already banned outright rather than high-risk: inferring employees' emotions, per Article 5(1)(f), with only medical and safety exceptions, in force since February 2025. An LLM scoring your support team's tone for performance reviews is an uncomfortable distance from that line.

On top of high-risk classification, a subset of deployers must run a fundamental rights impact assessment before first use (Article 27): bodies governed by public law, private entities providing public services, and deployers assessing creditworthiness or pricing life and health insurance. A typical B2B SaaS deploying a resume screener is not on that list; a bank deploying the same screener may well be.

What EU AI Act compliance means in engineering terms

The high-risk requirements read like procurement boilerplate until you notice they describe infrastructure your team either already has or has been putting off. Five of them map directly to engineering practice. From December 2027 they are hard requirements for Annex III systems, with providers carrying the long list (Article 16) and deployers a shorter one (Article 26).

Logging and record-keeping. High-risk systems must "technically allow for the automatic recording of events (logs) over the lifetime of the system" (Article 12), so that risky situations can be identified and traced. Providers keep the logs under their control for at least six months (Article 19), and deployers keep the logs their systems generate, also at least six months (Article 26(6)). For an LLM application, a log that can explain an outcome is a trace: model and version, prompt version, retrieved context, tool calls, output, and who acted on it. A row saying "request served, 200 OK" proves nothing about why the system rejected a candidate.
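As a rough picture of what such a trace could look like in code, here is a minimal Python sketch. The field names mirror the list in the paragraph above; `TraceRecord`, `may_delete`, and `export_jsonl` are hypothetical names, and treating "at least six months" as 183 days is an assumption to confirm with counsel.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "at least six months" approximated as 183 days.
MIN_RETENTION = timedelta(days=183)


@dataclass
class TraceRecord:
    trace_id: str
    timestamp: datetime
    model: str
    model_version: str
    prompt_version: str
    retrieved_context: list
    tool_calls: list
    output: str
    acted_on_by: Optional[str] = None  # who acted on the output, if anyone


def may_delete(record: TraceRecord, now: datetime,
               retention: timedelta = MIN_RETENTION) -> bool:
    """A record becomes deletable only once the retention period has passed."""
    return now - record.timestamp >= retention


def export_jsonl(records) -> str:
    """Serialise records as JSON Lines, one trace per line, for an export."""
    lines = []
    for record in records:
        row = asdict(record)
        row["timestamp"] = record.timestamp.isoformat()
        lines.append(json.dumps(row, sort_keys=True))
    return "\n".join(lines)


if __name__ == "__main__":
    rec = TraceRecord(
        trace_id="t-001",
        timestamp=datetime(2026, 1, 1, tzinfo=timezone.utc),
        model="example-model",  # hypothetical model name
        model_version="v1",
        prompt_version="screening-prompt-3",
        retrieved_context=["job description excerpt"],
        tool_calls=[{"name": "lookup_candidate"}],
        output="rejected",
        acted_on_by="recruiter-17",
    )
    print(export_jsonl([rec]))
```

The retention check is written as a guard on deletion, so a cleanup job that forgets the rule fails safe by keeping data rather than dropping it.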

Transparency, in both directions. Article 50 is the user-facing half: disclosure for chatbots, machine-readable marking for generated content, labels on deepfakes and on AI-written text published on matters of public interest. Article 13 is the business-facing half: a high-risk system must ship with instructions for use covering its capabilities, limitations, declared accuracy metrics, the human oversight measures, and how to interpret its logs. If you buy AI systems, Article 13 describes the document you should be demanding from vendors. If you sell one, it is the document you owe.

Human oversight. High-risk systems must be designed so natural persons can effectively oversee them (Article 14): the people overseeing need to understand the system's capabilities and limits, stay aware of automation bias, interpret outputs correctly, and be able to disregard, override, or stop the system. Deployers must assign oversight to people with "the necessary competence, training and authority" (Article 26(2)). In the stack, that is a review queue and a kill switch. In the org, it is a named owner with a training record, and "authority" means they can halt the system without convening a meeting first.
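The review queue and kill switch can be pictured as a thin gate in front of the system's outputs. The sketch below is one possible shape, not a prescribed design: `OversightGate`, `SystemHalted`, and the `needs_review` flag are all hypothetical, and a real deployment would persist this state rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Optional


class SystemHalted(RuntimeError):
    """Raised when an output is requested from a system that was stopped."""


@dataclass
class OversightGate:
    owner: str                        # named person accountable for the system
    halted_by: Optional[str] = None   # set once the kill switch is used
    review_queue: list = field(default_factory=list)

    def halt(self, by: str) -> None:
        """Kill switch: takes effect immediately, with no approval step."""
        self.halted_by = by

    def submit(self, output: str, needs_review: bool) -> Optional[str]:
        """Release an output, or hold it for a human when review is needed."""
        if self.halted_by is not None:
            raise SystemHalted(f"halted by {self.halted_by}")
        if needs_review:
            self.review_queue.append(output)
            return None
        return output

    def review(self, approve: bool,
               override: Optional[str] = None) -> Optional[str]:
        """Decide on the oldest held output: approve, override, or disregard."""
        held = self.review_queue.pop(0)
        if override is not None:
            return override
        return held if approve else None


if __name__ == "__main__":
    gate = OversightGate(owner="jane.doe")
    gate.submit("reject candidate 42", needs_review=True)
    print(gate.review(approve=False))  # reviewer disregards the output
    gate.halt(by="jane.doe")
```

The three reviewer outcomes map onto "disregard, override, or stop" from the paragraph above; recording who halted the system gives the audit trail a name to attach to the decision.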

Accuracy and robustness. High-risk systems must achieve "an appropriate level of accuracy, robustness, and cybersecurity" and perform consistently in those respects throughout their lifecycle (Article 15), with the accuracy metrics declared in the instructions for use and resilience against data poisoning, model poisoning, and adversarial examples. Declaring an accuracy metric for an LLM system means having the evaluation suite that produces it: a baseline before release, regression runs when the prompt or model changes, and monitoring after. "Consistently throughout the lifecycle" is a claim about production behavior, and only observability can back it.
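A regression run of the kind described can be reduced to two small functions: compute the metric on a fixed evaluation set, then compare it with the released baseline. This is a deliberately simple sketch using exact-match accuracy. The function names and the 0.02 tolerance are assumptions, and real LLM evaluations usually need richer scoring than string equality.

```python
def accuracy(predictions, expected) -> float:
    """Fraction of predictions that exactly match the expected labels."""
    if len(predictions) != len(expected) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected)


def passes_regression_gate(baseline: float, candidate: float,
                           tolerance: float = 0.02) -> bool:
    """True when the candidate is no worse than baseline minus tolerance.

    Assumption: the tolerance is a team-chosen value, not something the
    Act specifies.
    """
    return candidate >= baseline - tolerance


if __name__ == "__main__":
    expected = ["approve", "reject", "approve", "reject"]
    new_outputs = ["approve", "reject", "reject", "reject"]
    score = accuracy(new_outputs, expected)
    print(score, passes_regression_gate(baseline=0.80, candidate=score))
```

Run on every prompt or model change, the stored scores double as the history behind a declared accuracy figure.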

Post-market monitoring and serious incidents. Providers must run a documented post-market monitoring system that "actively and systematically" collects and analyses performance data over the system's lifetime, including data from deployers (Article 72). Serious incidents, defined in Art. 3(49) as death or serious harm to health, serious and irreversible disruption of critical infrastructure, infringement of fundamental-rights protections, or serious harm to property or the environment, must be reported to the market surveillance authority within 15 days of awareness, 10 days for a death, 2 days for widespread infringement or critical-infrastructure disruption (Article 73). Deployers monitor operation against the instructions for use, inform the provider of relevant developments, suspend use when they identify a risk, and report serious incidents up the chain (Article 26(5)). None of this works retroactively. If the reporting clock starts when you become aware, you need the monitoring that makes you aware and the traces that let you reconstruct what happened.
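The reporting clocks are easy to encode so that an incident tool can show a date instead of a rule. The sketch below uses the 15, 10, and 2 day limits quoted above and treats them as outer limits counted in calendar days from awareness. That counting convention, and the names `IncidentKind` and `report_by`, are assumptions to check with counsel.

```python
from datetime import date, timedelta
from enum import Enum


class IncidentKind(Enum):
    # Values are the outer reporting limits in days, as quoted in the text.
    OTHER_SERIOUS = 15
    DEATH = 10
    WIDESPREAD_OR_CRITICAL_INFRASTRUCTURE = 2


def report_by(aware_on: date, kind: IncidentKind) -> date:
    """Latest reporting date, assuming calendar days counted from awareness."""
    return aware_on + timedelta(days=kind.value)


if __name__ == "__main__":
    aware = date(2028, 1, 10)  # hypothetical awareness date
    for kind in IncidentKind:
        print(kind.name, report_by(aware, kind).isoformat())
```

Because everything keys off `aware_on`, the calculation is only as good as the monitoring that sets that date.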

The fine schedule, once and without drama (Article 99): up to EUR 35 million or 7% of worldwide annual turnover, whichever is higher, for prohibited practices; up to EUR 15 million or 3% for breaching the core provider, deployer, or transparency obligations (Articles 16, 26, and 50 all sit in this bucket); up to EUR 7.5 million or 1% for supplying incorrect information to authorities. For SMEs, each cap applies at whichever amount is lower.
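The "whichever is higher" and "whichever is lower" rules are easiest to see with numbers. Here is a small sketch of the caps as described above. The tier keys and the function name are illustrative, the turnover figures are invented, and the result is a ceiling, not a prediction of any actual fine.

```python
# (fixed cap in EUR, percent of worldwide annual turnover), per the text above.
FINE_TIERS = {
    "prohibited_practice": (35_000_000, 7),
    "core_obligation": (15_000_000, 3),
    "incorrect_information": (7_500_000, 1),
}


def max_fine_eur(tier: str, worldwide_turnover_eur: float,
                 is_sme: bool = False) -> float:
    """Upper bound of the fine: the higher of the two amounts,
    or the lower of the two for SMEs."""
    fixed_cap, percent = FINE_TIERS[tier]
    turnover_cap = worldwide_turnover_eur * percent / 100
    if is_sme:
        return min(fixed_cap, turnover_cap)
    return max(fixed_cap, turnover_cap)


if __name__ == "__main__":
    # Hypothetical companies for illustration.
    print(max_fine_eur("core_obligation", 2_000_000_000))
    print(max_fine_eur("core_obligation", 10_000_000, is_sme=True))
```

For the hypothetical EUR 2 billion company the 3% figure dominates; for the hypothetical EUR 10 million SME the percentage produces the lower, and therefore applicable, cap.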

What to put in place this quarter

Ordered by deadline, for Q3 2026:

  1. Inventory and role mapping. One afternoon with a spreadsheet: every LLM system, whether you are provider or deployer of it, which tier it falls in, which date makes it regulated. Most teams find more systems than they expected and one or two that are high-risk-adjacent.
  2. An Article 50 pass before 2 August. Chatbots identify themselves at first interaction unless it is genuinely obvious. Generated image, audio, and video content carries machine-readable marks. Systems already on the market before 2 August 2026 have until 2 December 2026 for the marking part; new ones comply from day one.
  3. A check against the prohibited list. Nothing inferring emotions in the workplace, nothing that amounts to social scoring. This has been live since February 2025, and the list grows by two entries in December 2026.
  4. Tracing with six-month retention and export. The substrate for Articles 12, 19, and 26(6): full context per request, retained at least six months, exportable, because the first thing an auditor asks for is a copy.
  5. Evals as accuracy evidence. A baseline per release, regression runs on every prompt or model change, online monitoring in production. This is Article 15's declared accuracy metric and Article 72's systematic data collection, and it is the same machinery you want for quality anyway.
  6. Human oversight wiring. A named owner per system, a working override and stop path, and a training record, which also counts toward the Article 4 literacy duty.
  7. An incident runbook with the clocks in it. What counts as a serious incident for your system, who informs whom, and the 15-, 10-, and 2-day deadlines written down where the on-call can find them.
  8. Vendor documentation requests. Ask your LLM vendors for Article 13-style instructions for use, and check whether any of your uses needs a fundamental rights impact assessment under Article 27.

Full disclosure on items 4 and 5: this substrate is our business. LangWatch is an EU-based LLM engineering platform, and its tracing, evaluations, and audit exports map onto the record-keeping and monitoring duties above; that mapping is why compliance teams in regulated industries end up talking to us. Whatever tooling you pick, the requirement is the same: evidence of what your system did, and evidence that you were watching.

The dates in this post have moved once already, so before planning a quarter around them, check the current state on the Commission's AI Act Service Desk or the AI Act Explorer. What does not expire with the next amendment is the substrate: a team that can already explain what its system did last Tuesday, with traces and eval scores, will find that most of this list was already done.

Frequently asked questions

Does the EU AI Act apply to companies outside the EU?
Yes, in two ways. Article 2 covers providers placing AI systems on the EU market regardless of where they are established, and providers and deployers located in third countries where the output produced by the AI system is used in the EU. A US company whose chatbot serves EU customers is in scope on both counts.
Is a customer support chatbot high-risk under the EU AI Act?
Usually no. Customer service is not one of the use cases listed in Annex III, so a support chatbot sits in the transparency tier: under Article 50 users must be told they are talking to an AI system unless that is already obvious, from 2 August 2026. It becomes high-risk only if it is used for a listed purpose, such as assessing creditworthiness or making hiring decisions.
What logging does the EU AI Act require for LLM applications?
For high-risk systems, Article 12 requires the system to technically allow automatic recording of events over its lifetime, and both providers (Article 19) and deployers (Article 26(6)) must keep the automatically generated logs for at least six months. For LLM apps the practical form is tracing: model and prompt versions, retrieved context, tool calls, and outputs per request. These duties apply to Annex III systems from 2 December 2027.
Do I need a fundamental rights impact assessment (FRIA)?
Only some deployers of high-risk systems do: bodies governed by public law, private entities providing public services, and deployers using AI for creditworthiness assessment or life and health insurance pricing (Annex III points 5(b) and 5(c)). Article 27 requires the assessment before first use and notification of the market surveillance authority. A private company deploying an internal copilot does not need one.
What are my obligations if I use GPT or Claude via API?
The general-purpose AI model obligations (Articles 53 and 55) stay with OpenAI or Anthropic. But if you build an application on the API and ship it, even internally, you are the provider of that AI system under Article 3, and the system-level duties that match its risk tier are yours: Article 50 transparency for a chatbot, the full high-risk set if it is used for an Annex III purpose. Ordinary fine-tuning does not change this; under the Commission's GPAI guidelines you would only become a model provider if your modification used more than a third of the original model's training compute.
When do the EU AI Act rules apply to LLM applications?
In stages, as amended in June 2026: prohibited practices and AI literacy since 2 February 2025, general-purpose AI model rules since 2 August 2025, Article 50 transparency from 2 August 2026, high-risk obligations for Annex III use cases from 2 December 2027, and high-risk AI embedded in regulated products from 2 August 2028.