LLM Integration Patterns for SaaS Products

Empirium Team · 11 min read

Every SaaS product is adding AI features. Most are doing it wrong — bolting a chatbot onto the sidebar, wrapping every feature in "AI-powered," and hoping users figure out what changed. The result is a product that feels gimmicky instead of genuinely useful.

There are four distinct patterns for integrating LLMs into existing SaaS products. Each solves a different problem, requires a different architecture, and has a different impact on user experience. Choosing the wrong pattern is worse than shipping no AI at all.

Integration Patterns

1. The Copilot Pattern

The model assists the user in their existing workflow. The user stays in control. The AI suggests, drafts, or autocompletes — but the human makes the final decision.

Examples: GitHub Copilot (code suggestions), Notion AI (writing assistance), Figma AI (design suggestions).

Best for: Tasks where the user has domain expertise and the AI accelerates their work. Content creation, data analysis, code writing, design iteration.

Architecture: The copilot runs in parallel with user actions. It observes context (current document, recent edits, cursor position) and generates suggestions asynchronously. Suggestions are displayed non-intrusively — inline hints, side panels, or keyboard-triggered completions.
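
A minimal sketch of that loop, with hypothetical `llm` and `editor` stand-ins rather than any specific SDK: context goes in asynchronously, and stale suggestions are discarded if the user keeps typing.

// Copilot suggestion loop sketch. `llm` and `editor` are hypothetical stand-ins
// for your model client and editor integration, not a specific SDK.
declare const llm: {
  complete(opts: { prompt: string; maxTokens: number }): Promise<{ text: string }>;
};
declare const editor: { showInlineHint(text: string): void };

interface EditorContext {
  documentText: string;
  cursorOffset: number;
}

let latestRequest = 0;

async function suggestCompletion(ctx: EditorContext): Promise<void> {
  const requestId = ++latestRequest;

  const suggestion = await llm.complete({
    prompt: ctx.documentText.slice(0, ctx.cursorOffset), // text up to the cursor
    maxTokens: 64, // short suggestions keep latency and cost low
  });

  // Discard stale responses: the user has typed since this request started
  if (requestId !== latestRequest) return;

  editor.showInlineHint(suggestion.text); // non-intrusive; accept or ignore
}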

Key metric: Time saved per task, not AI accuracy. A copilot that saves 30% of time but is only 70% accurate can still be valuable — the user corrects the 30% of suggestions that are wrong faster than writing from scratch.

2. The Agent Pattern

The model takes autonomous action on behalf of the user. The user defines a goal, and the agent figures out the steps.

Examples: Customer support agents, automated email responses, scheduling assistants.

Best for: Repetitive, well-defined workflows where the cost of errors is manageable. Support ticket triage, lead qualification, report generation, data entry.

Architecture: Agents need a production-grade orchestration layer with tool access, state management, and fallback chains. They operate asynchronously and report results to the user.
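
In compressed form, the loop looks something like this (the `llm.decide` interface, tool registry, and escalateToHuman are illustrative names, not a specific framework): the agent alternates between model decisions and tool calls, carries state in its history, and escalates rather than failing silently.

// Minimal agent orchestration sketch. All names here are illustrative.
type AgentAction =
  | { type: 'tool'; name: string; args: Record<string, unknown> }
  | { type: 'finish'; result: string }
  | { type: 'escalate'; reason: string };

declare const llm: {
  decide(opts: { goal: string; history: string[] }): Promise<AgentAction>;
};
declare const tools: Record<string, (args: Record<string, unknown>) => Promise<string>>;
declare function escalateToHuman(goal: string, reason: string): string;

async function runAgent(goal: string, maxSteps = 10): Promise<string> {
  const history: string[] = []; // state carried across steps

  for (let step = 0; step < maxSteps; step++) {
    const action = await llm.decide({ goal, history });

    if (action.type === 'finish') return action.result;
    if (action.type === 'escalate') return escalateToHuman(goal, action.reason);

    const output = await tools[action.name](action.args); // tool access
    history.push(`${action.name}: ${output}`);
  }

  // Step budget exhausted: hand off to a human instead of failing silently
  return escalateToHuman(goal, 'step limit reached');
}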

Key metric: Task completion rate and error rate. An agent that completes 85% of tasks correctly and escalates the other 15% is valuable. An agent that completes 95% but silently fails on 5% is dangerous.

3. The Classifier Pattern

The model categorizes, routes, or scores data. No generation — just classification.

Examples: Email priority scoring, support ticket categorization, lead scoring, content moderation, sentiment analysis.

Best for: High-volume data processing where the categories are well-defined and the cost per classification needs to be low.

Architecture: Classifiers are the simplest LLM integration. Input goes in, a label comes out. Use structured outputs (JSON mode or function calling) to ensure consistent labels. For high volume, consider fine-tuning a smaller model — classification does not need frontier model capabilities.

// Illustrative call: assumes an `llm` client wrapper exposing a classify() helper
const classification = await llm.classify({
  input: supportTicket.body,
  categories: ['billing', 'technical', 'feature_request', 'complaint', 'spam'],
  outputFormat: 'json',
});
// Cost: ~$0.001 per classification with GPT-4o-mini

Key metric: Accuracy against human labels. Target 90%+ for production use. Below that, the correction overhead exceeds the automation benefit.
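
Tracking that number is straightforward if you keep a small human-labeled sample: replay it through whatever classify wrapper you use and compare (sketched below, where the `classify` function name is an assumption).

// Measure classifier accuracy against human labels.
interface LabeledExample {
  input: string;
  humanLabel: string;
}

declare function classify(input: string): Promise<string>;

async function measureAccuracy(examples: LabeledExample[]): Promise<number> {
  let correct = 0;
  for (const example of examples) {
    const predicted = await classify(example.input);
    if (predicted === example.humanLabel) correct++;
  }
  return correct / examples.length; // target 0.90+ before automating in production
}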

4. The Generator Pattern

The model creates content from structured data. The user provides inputs or triggers, and the AI generates complete outputs.

Examples: Product description generation, report writing, personalized email campaigns, code generation from specs.

Best for: Content production at scale where templates are too rigid and human writing is too slow.

Architecture: Generators need strong input validation (garbage in, garbage out), output formatting controls, and quality gates. Always include a human review step for externally published content.
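
A sketch of that pipeline, with an illustrative `llm.generate` call and placeholder checks: validate the structured input, generate, then run a cheap automated gate before anything reaches a human reviewer.

// Generator pipeline sketch. `llm.generate` and the checks are illustrative.
declare const llm: {
  generate(opts: { template: string; data: Record<string, string> }): Promise<string>;
};

function validateInput(data: Record<string, string>): void {
  for (const [key, value] of Object.entries(data)) {
    if (!value.trim()) throw new Error(`Missing required field: ${key}`); // garbage in, garbage out
  }
}

function passesQualityGate(draft: string): boolean {
  // Cheap automated checks before a human ever sees the draft
  return draft.length > 100 && !draft.toLowerCase().includes('as an ai');
}

async function generateDescription(data: Record<string, string>): Promise<string> {
  validateInput(data);
  const draft = await llm.generate({ template: 'product_description', data });
  if (!passesQualityGate(draft)) {
    throw new Error('Draft failed quality gate'); // caller regenerates or falls back
  }
  return draft; // still goes through human review before external publication
}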

Key metric: Quality score and generation cost vs human writing cost. If AI-generated content requires 20 minutes of human editing, compare that to the 45 minutes of writing from scratch.

User Experience Design for AI Features

The UX around AI features matters more than the AI itself. Users do not care about your model. They care about whether the feature helps them or wastes their time.

Loading States

LLM responses take 1-5 seconds. That is an eternity in UI terms. Handle it:

  • Streaming: Show tokens as they arrive. This is mandatory for any text generation feature. Users read while the model writes, and the perceived wait time drops to zero.
  • Progress indicators: For multi-step agent tasks, show which step is being executed. "Searching your CRM... Found 12 matches... Generating summary..."
  • Skeleton screens: For classification or scoring, show the UI structure immediately with a loading indicator where the AI result will appear.

Confidence Indicators

Users need to know when to trust the AI output:

  • High confidence: Show the result directly. No extra UI needed.
  • Medium confidence: Show the result with a subtle indicator: "AI-generated — review before sending."
  • Low confidence: Show the result with a prominent warning and easy access to the source data or a human alternative.
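
One way to wire this up, assuming your pipeline produces some confidence signal (logprobs, a self-rating, or a validator score); the thresholds below are placeholders to tune per feature:

// Map a confidence score to a UI treatment. Thresholds are illustrative.
type Treatment = 'show' | 'show_with_notice' | 'warn_and_offer_fallback';

function treatmentFor(confidence: number): Treatment {
  if (confidence >= 0.9) return 'show'; // display directly, no extra UI
  if (confidence >= 0.6) return 'show_with_notice'; // "AI-generated, review before sending"
  return 'warn_and_offer_fallback'; // prominent warning plus a human alternative
}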

Fallback Flows

What happens when the AI fails? Every AI feature needs a non-AI fallback:

  • The AI email draft fails → show the template picker
  • The AI classifier is uncertain → route to manual triage
  • The AI search returns no results → fall back to keyword search

Users should never hit a dead end because the AI broke.
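
A generic wrapper captures the rule: try the AI path, and on any failure or timeout hand the user the non-AI flow. The names below are illustrative.

// Every AI feature call carries its non-AI fallback with it
async function withFallback<T>(
  aiPath: () => Promise<T>,
  fallback: () => T,
  timeoutMs = 5000,
): Promise<T> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('AI timeout')), timeoutMs),
  );
  try {
    return await Promise.race([aiPath(), timeout]);
  } catch {
    return fallback(); // template picker, manual triage, keyword search, ...
  }
}

// Usage: AI email draft, with the template picker as the fallback
// const draft = await withFallback(() => draftEmailWithAI(ctx), () => openTemplatePicker());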

Architecture for LLM Features

API Gateway Pattern

All LLM requests should flow through a centralized gateway that handles:

  • Rate limiting: Per user, per feature, per organization
  • Cost tracking: Every request is logged with cost attribution
  • Model routing: Different features use different models based on quality/cost requirements
  • Caching: Identical or near-identical queries serve cached responses
  • Fallback: If the primary provider is down, route to the secondary

Feature Code → AI Gateway → Model Router → Provider API
                  ↓              ↓
              Cost Logger    Cache Layer

This pattern means you can switch providers, adjust rate limits, or add new models without touching feature code.
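
A stripped-down gateway makes the flow concrete. Every dependency sits behind an interface, so feature code never touches a provider directly; all of the names here are illustrative.

// Minimal AI gateway sketch. Interfaces and names are illustrative.
interface AIRequest { feature: string; userId: string; prompt: string; }
interface Provider { complete(prompt: string): Promise<string>; }

declare const rateLimiter: { allow(userId: string, feature: string): boolean };
declare const cache: { get(key: string): string | undefined; set(key: string, value: string): void };
declare const modelRouter: { pick(feature: string): Provider }; // quality/cost-based routing
declare const costLogger: { log(req: AIRequest, response: string): void };

async function gateway(req: AIRequest): Promise<string> {
  if (!rateLimiter.allow(req.userId, req.feature)) {
    throw new Error('Rate limit exceeded'); // per user, per feature
  }

  const cacheKey = `${req.feature}:${req.prompt}`;
  const cached = cache.get(cacheKey);
  if (cached) return cached; // identical query, no model call

  const provider = modelRouter.pick(req.feature); // falls back to secondary on outage
  const response = await provider.complete(req.prompt);

  costLogger.log(req, response); // per-user, per-feature cost attribution
  cache.set(cacheKey, response);
  return response;
}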

Streaming Responses

For any user-facing text generation, streaming is non-negotiable. The architecture differs from standard request-response:

// Server-Sent Events for streaming (Express; `llm` and `buildMessages`
// are illustrative helpers, not a specific SDK)
import express from 'express';

const app = express();

app.get('/api/ai/generate', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  const stream = await llm.stream({
    model: 'claude-sonnet-4-20250514',
    messages: buildMessages(req.query),
  });

  // Forward each chunk to the client as an SSE event
  for await (const chunk of stream) {
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  res.end();
});
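
On the client, the browser's built-in EventSource API consumes that stream and appends tokens as they arrive. The `text` field on each chunk is an assumption; match it to whatever your server actually serializes.

// Client side: render tokens as they arrive (standard EventSource API).
// Assumes each chunk carries a `text` field; adjust to your server's chunk shape.
const output = document.getElementById('ai-output')!;
const source = new EventSource('/api/ai/generate?prompt=...');

source.onmessage = (event) => {
  const chunk = JSON.parse(event.data);
  output.textContent = (output.textContent ?? '') + (chunk.text ?? ''); // user reads while the model writes
};

source.onerror = () => source.close(); // network error or server closed the stream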

Caching Strategy

Not all AI requests need a fresh model call. For cost optimization:

  • Exact match caching: Same input → same output. Works for classification and structured queries. Cache TTL: 1-24 hours.
  • Semantic caching: Similar inputs → cached output. Works for FAQ-style queries. Requires embedding comparison at request time.
  • User-level caching: Cache personalized results per user. Works for repeated workflows.

At scale, caching reduces AI costs by 30-50% with minimal impact on response quality.
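
The exact-match layer is only a few lines. The sketch below uses an in-memory map for illustration (production would use Redis or similar) and a TTL from the range above:

// Exact-match cache sketch: in-memory for illustration, Redis or similar in production
import { createHash } from 'node:crypto';

const responseCache = new Map<string, { value: string; expiresAt: number }>();
const TTL_MS = 60 * 60 * 1000; // 1 hour, within the 1-24 hour range above

async function cachedCall(
  input: string,
  call: (input: string) => Promise<string>,
): Promise<string> {
  const key = createHash('sha256').update(input).digest('hex');
  const hit = responseCache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit, no model call

  const value = await call(input);
  responseCache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}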

Anti-Patterns to Avoid

The "AI Everything" Approach

Not every feature benefits from AI. Adding AI to a date picker, a settings page, or a file upload does not improve the product — it adds latency, cost, and confusion. Apply AI to features where the alternative is significantly worse without it.

The Chatbot Sidebar

Bolting a generic chatbot onto your SaaS product is the laziest possible AI integration. Users do not want to type natural language to do things they can already do with two clicks. A chatbot is appropriate when the query space is genuinely open-ended (support, research, exploration). For structured workflows, use pattern-specific UI.

Ignoring Latency

An AI feature that takes 8 seconds to respond will be abandoned after the second use. If you cannot achieve acceptable latency (under 2 seconds for interactive features, streaming for longer outputs), do not ship the feature. Improve it first.

No Graceful Degradation

If your AI provider has an outage, what happens to your product? If the answer is "those features stop working," your architecture is fragile. Every AI feature needs a fallback path — a simpler model, a template, a manual workflow.

Invisible AI Costs

AI costs are per-request, not per-seat. A power user making 500 AI requests per day costs 50x more to serve than a casual user making 10. If your pricing model is flat per-seat, AI power users will destroy your margins. Build cost monitoring from day one and consider usage-based pricing for AI features.

FAQ

How should I price AI features? Three options: include in the base plan (works if AI usage is predictable), add-on tier (separate AI plan at higher price), or usage-based (charge per AI action). Usage-based is the most sustainable but hardest to communicate. Most SaaS products start with an add-on tier and migrate to usage-based as AI features become core.

How do I A/B test AI vs non-AI features? Expose the AI feature to 50% of users and measure task completion time, user satisfaction, and retention. The key metric is not "do users like AI" — it is "do users accomplish their goals faster." If the AI version is slower or more confusing, the non-AI version wins regardless of how impressive the technology is.

What happens when the model API is down? Your product should still work. Every AI feature has a non-AI fallback. Monitoring detects API failures within 30 seconds and automatically routes to fallbacks. Users may see "AI features temporarily limited" — not a broken interface.

Should I build or buy my AI layer? If AI is a core differentiator, build. If AI is a feature enhancement, buy (use a managed service like OpenAI Assistants or pre-built components). If you are unsure, prototype with a managed service and migrate to custom if the feature proves valuable. We help SaaS teams make this decision.
