LLM Integration Patterns for SaaS Products

Empirium Team · 11 min read

Every SaaS product is adding AI features. Most are doing it wrong — bolting a chatbot onto the sidebar, wrapping every feature in "AI-powered," and hoping users figure out what changed. The result is a product that feels gimmicky instead of genuinely useful.

There are four distinct patterns for integrating LLMs into existing SaaS products. Each solves a different problem, requires a different architecture, and has a different impact on user experience. Choosing the wrong pattern is worse than shipping no AI at all.

Integration Patterns

1. The Copilot Pattern

The model assists the user in their existing workflow. The user stays in control. The AI suggests, drafts, or autocompletes — but the human makes the final decision.

Examples: GitHub Copilot (code suggestions), Notion AI (writing assistance), Figma AI (design suggestions).

Best for: Tasks where the user has domain expertise and the AI accelerates their work. Content creation, data analysis, code writing, design iteration.

Architecture: The copilot runs in parallel with user actions. It observes context (current document, recent edits, cursor position) and generates suggestions asynchronously. Suggestions are displayed non-intrusively — inline hints, side panels, or keyboard-triggered completions.
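
A minimal sketch of that loop, with hypothetical `llm` and `editor` stand-ins rather than any specific SDK: context goes in asynchronously, and stale suggestions are discarded if the user keeps typing.

// Copilot suggestion loop sketch. `llm` and `editor` are hypothetical stand-ins
// for your model client and editor integration, not a specific SDK.
declare const llm: {
  complete(opts: { prompt: string; maxTokens: number }): Promise<{ text: string }>;
};
declare const editor: { showInlineHint(text: string): void };

interface EditorContext {
  documentText: string;
  cursorOffset: number;
}

let latestRequest = 0;

async function suggestCompletion(ctx: EditorContext): Promise<void> {
  const requestId = ++latestRequest;

  const suggestion = await llm.complete({
    prompt: ctx.documentText.slice(0, ctx.cursorOffset), // text up to the cursor
    maxTokens: 64, // short suggestions keep latency and cost low
  });

  // Discard stale responses: the user has typed since this request started
  if (requestId !== latestRequest) return;

  editor.showInlineHint(suggestion.text); // non-intrusive; accept or ignore
}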

Key metric: Time saved per task, not AI accuracy. A copilot that saves 30% of time but is only 70% accurate can still be valuable — the user corrects the 30% of suggestions that are wrong faster than writing from scratch.

2. The Agent Pattern

The model takes autonomous action on behalf of the user. The user defines a goal, and the agent figures out the steps.

Examples: Customer support agents, automated email responses, scheduling assistants.

Best for: Repetitive, well-defined workflows where the cost of errors is manageable. Support ticket triage, lead qualification, report generation, data entry.

Architecture: Agents need a production-grade orchestration layer with tool access, state management, and fallback chains. They operate asynchronously and report results to the user.
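
In compressed form, the loop looks something like this (the `llm.decide` interface, tool registry, and escalateToHuman are illustrative names, not a specific framework): the agent alternates between model decisions and tool calls, carries state in its history, and escalates rather than failing silently.

// Minimal agent orchestration sketch. All names here are illustrative.
type AgentAction =
  | { type: 'tool'; name: string; args: Record<string, unknown> }
  | { type: 'finish'; result: string }
  | { type: 'escalate'; reason: string };

declare const llm: {
  decide(opts: { goal: string; history: string[] }): Promise<AgentAction>;
};
declare const tools: Record<string, (args: Record<string, unknown>) => Promise<string>>;
declare function escalateToHuman(goal: string, reason: string): string;

async function runAgent(goal: string, maxSteps = 10): Promise<string> {
  const history: string[] = []; // state carried across steps

  for (let step = 0; step < maxSteps; step++) {
    const action = await llm.decide({ goal, history });

    if (action.type === 'finish') return action.result;
    if (action.type === 'escalate') return escalateToHuman(goal, action.reason);

    const output = await tools[action.name](action.args); // tool access
    history.push(`${action.name}: ${output}`);
  }

  // Step budget exhausted: hand off to a human instead of failing silently
  return escalateToHuman(goal, 'step limit reached');
}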

Key metric: Task completion rate and error rate. An agent that completes 85% of tasks correctly and escalates the other 15% is valuable. An agent that completes 95% but silently fails on 5% is dangerous.

3. The Classifier Pattern

The model categorizes, routes, or scores data. No generation — just classification.

Examples: Email priority scoring, support ticket categorization, lead scoring, content moderation, sentiment analysis.

Best for: High-volume data processing where the categories are well-defined and the cost per classification needs to be low.

Architecture: Classifiers are the simplest LLM integration. Input goes in, a label comes out. Use structured outputs (JSON mode or function calling) to ensure consistent labels. For high volume, consider fine-tuning a smaller model — classification does not need frontier model capabilities.

// Illustrative call: assumes an `llm` client wrapper exposing a classify() helper
const classification = await llm.classify({
  input: supportTicket.body,
  categories: ['billing', 'technical', 'feature_request', 'complaint', 'spam'],
  outputFormat: 'json',
});
// Cost: ~$0.001 per classification with GPT-4o-mini

Key metric: Accuracy against human labels. Target 90%+ for production use. Below that, the correction overhead exceeds the automation benefit.
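
Tracking that number is straightforward if you keep a small human-labeled sample: replay it through whatever classify wrapper you use and compare (sketched below, where the `classify` function name is an assumption).

// Measure classifier accuracy against human labels.
interface LabeledExample {
  input: string;
  humanLabel: string;
}

declare function classify(input: string): Promise<string>;

async function measureAccuracy(examples: LabeledExample[]): Promise<number> {
  let correct = 0;
  for (const example of examples) {
    const predicted = await classify(example.input);
    if (predicted === example.humanLabel) correct++;
  }
  return correct / examples.length; // target 0.90+ before automating in production
}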

4. The Generator Pattern

The model creates content from structured data. The user provides inputs or triggers, and the AI generates complete outputs.

Examples: Product description generation, report writing, personalized email campaigns, code generation from specs.

Best for: Content production at scale where templates are too rigid and human writing is too slow.

Architecture: Generators need strong input validation (garbage in, garbage out), output formatting controls, and quality gates. Always include a human review step for externally published content.
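
A sketch of that pipeline, with an illustrative `llm.generate` call and placeholder checks: validate the structured input, generate, then run a cheap automated gate before anything reaches a human reviewer.

// Generator pipeline sketch. `llm.generate` and the checks are illustrative.
declare const llm: {
  generate(opts: { template: string; data: Record<string, string> }): Promise<string>;
};

function validateInput(data: Record<string, string>): void {
  for (const [key, value] of Object.entries(data)) {
    if (!value.trim()) throw new Error(`Missing required field: ${key}`); // garbage in, garbage out
  }
}

function passesQualityGate(draft: string): boolean {
  // Cheap automated checks before a human ever sees the draft
  return draft.length > 100 && !draft.toLowerCase().includes('as an ai');
}

async function generateDescription(data: Record<string, string>): Promise<string> {
  validateInput(data);
  const draft = await llm.generate({ template: 'product_description', data });
  if (!passesQualityGate(draft)) {
    throw new Error('Draft failed quality gate'); // caller regenerates or falls back
  }
  return draft; // still goes through human review before external publication
}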

Key metric: Quality score and generation cost vs human writing cost. If AI-generated content requires 20 minutes of human editing, compare that to the 45 minutes of writing from scratch.

User Experience Design for AI Features

The UX around AI features matters more than the AI itself. Users do not care about your model. They care about whether the feature helps them or wastes their time.

Loading States

LLM responses take 1-5 seconds. That is an eternity in UI terms. Handle it:

  • Streaming: Show tokens as they arrive. This is mandatory for any text generation feature. Users read while the model writes, and the perceived wait time drops to zero.
  • Progress indicators: For multi-step agent tasks, show which step is being executed. "Searching your CRM... Found 12 matches... Generating summary..."
  • Skeleton screens: For classification or scoring, show the UI structure immediately with a loading indicator where the AI result will appear.

Confidence Indicators

Users need to know when to trust the AI output:

  • High confidence: Show the result directly. No extra UI needed.
  • Medium confidence: Show the result with a subtle indicator: "AI-generated — review before sending."
  • Low confidence: Show the result with a prominent warning and easy access to the source data or a human alternative.
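
One way to wire this up, assuming your pipeline produces some confidence signal (logprobs, a self-rating, or a validator score); the thresholds below are placeholders to tune per feature:

// Map a confidence score to a UI treatment. Thresholds are illustrative.
type Treatment = 'show' | 'show_with_notice' | 'warn_and_offer_fallback';

function treatmentFor(confidence: number): Treatment {
  if (confidence >= 0.9) return 'show'; // display directly, no extra UI
  if (confidence >= 0.6) return 'show_with_notice'; // "AI-generated, review before sending"
  return 'warn_and_offer_fallback'; // prominent warning plus a human alternative
}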

Fallback Flows

What happens when the AI fails? Every AI feature needs a non-AI fallback:

  • The AI email draft fails → show the template picker
  • The AI classifier is uncertain → route to manual triage
  • The AI search returns no results → fall back to keyword search

Users should never hit a dead end because the AI broke.
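
A generic wrapper captures the rule: try the AI path, and on any failure or timeout hand the user the non-AI flow. The names below are illustrative.

// Every AI feature call carries its non-AI fallback with it
async function withFallback<T>(
  aiPath: () => Promise<T>,
  fallback: () => T,
  timeoutMs = 5000,
): Promise<T> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('AI timeout')), timeoutMs),
  );
  try {
    return await Promise.race([aiPath(), timeout]);
  } catch {
    return fallback(); // template picker, manual triage, keyword search, ...
  }
}

// Usage: AI email draft, with the template picker as the fallback
// const draft = await withFallback(() => draftEmailWithAI(ctx), () => openTemplatePicker());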

Architecture for LLM Features

API Gateway Pattern

All LLM requests should flow through a centralized gateway that handles:

  • Rate limiting: Per user, per feature, per organization
  • Cost tracking: Every request is logged with cost attribution
  • Model routing: Different features use different models based on quality/cost requirements
  • Caching: Identical or near-identical queries serve cached responses
  • Fallback: If the primary provider is down, route to the secondary

Feature Code → AI Gateway → Model Router → Provider API
                  ↓              ↓
              Cost Logger    Cache Layer

This pattern means you can switch providers, adjust rate limits, or add new models without touching feature code.
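
A stripped-down gateway makes the flow concrete. Every dependency sits behind an interface, so feature code never touches a provider directly; all of the names here are illustrative.

// Minimal AI gateway sketch. Interfaces and names are illustrative.
interface AIRequest { feature: string; userId: string; prompt: string; }
interface Provider { complete(prompt: string): Promise<string>; }

declare const rateLimiter: { allow(userId: string, feature: string): boolean };
declare const cache: { get(key: string): string | undefined; set(key: string, value: string): void };
declare const modelRouter: { pick(feature: string): Provider }; // quality/cost-based routing
declare const costLogger: { log(req: AIRequest, response: string): void };

async function gateway(req: AIRequest): Promise<string> {
  if (!rateLimiter.allow(req.userId, req.feature)) {
    throw new Error('Rate limit exceeded'); // per user, per feature
  }

  const cacheKey = `${req.feature}:${req.prompt}`;
  const cached = cache.get(cacheKey);
  if (cached) return cached; // identical query, no model call

  const provider = modelRouter.pick(req.feature); // falls back to secondary on outage
  const response = await provider.complete(req.prompt);

  costLogger.log(req, response); // per-user, per-feature cost attribution
  cache.set(cacheKey, response);
  return response;
}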

Streaming Responses

For any user-facing text generation, streaming is non-negotiable. The architecture differs from standard request-response:

// Server-Sent Events for streaming (Express; `llm` and `buildMessages`
// are illustrative helpers, not a specific SDK)
import express from 'express';

const app = express();

app.get('/api/ai/generate', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  const stream = await llm.stream({
    model: 'claude-sonnet-4-20250514',
    messages: buildMessages(req.query),
  });

  // Forward each chunk to the client as an SSE event
  for await (const chunk of stream) {
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  res.end();
});
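
On the client, the browser's built-in EventSource API consumes that stream and appends tokens as they arrive. The `text` field on each chunk is an assumption; match it to whatever your server actually serializes.

// Client side: render tokens as they arrive (standard EventSource API).
// Assumes each chunk carries a `text` field; adjust to your server's chunk shape.
const output = document.getElementById('ai-output')!;
const source = new EventSource('/api/ai/generate?prompt=...');

source.onmessage = (event) => {
  const chunk = JSON.parse(event.data);
  output.textContent = (output.textContent ?? '') + (chunk.text ?? ''); // user reads while the model writes
};

source.onerror = () => source.close(); // network error or server closed the stream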

Caching Strategy

Not all AI requests need a fresh model call. For cost optimization:

  • Exact match caching: Same input → same output. Works for classification and structured queries. Cache TTL: 1-24 hours.
  • Semantic caching: Similar inputs → cached output. Works for FAQ-style queries. Requires embedding comparison at request time.
  • User-level caching: Cache personalized results per user. Works for repeated workflows.

At scale, caching reduces AI costs by 30-50% with minimal impact on response quality.
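
The exact-match layer is only a few lines. The sketch below uses an in-memory map for illustration (production would use Redis or similar) and a TTL from the range above:

// Exact-match cache sketch: in-memory for illustration, Redis or similar in production
import { createHash } from 'node:crypto';

const responseCache = new Map<string, { value: string; expiresAt: number }>();
const TTL_MS = 60 * 60 * 1000; // 1 hour, within the 1-24 hour range above

async function cachedCall(
  input: string,
  call: (input: string) => Promise<string>,
): Promise<string> {
  const key = createHash('sha256').update(input).digest('hex');
  const hit = responseCache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit, no model call

  const value = await call(input);
  responseCache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}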

Anti-Patterns to Avoid

The "AI Everything" Approach

Not every feature benefits from AI. Adding AI to a date picker, a settings page, or a file upload does not improve the product — it adds latency, cost, and confusion. Apply AI to features where the alternative is significantly worse without it.

The Chatbot Sidebar

Bolting a generic chatbot onto your SaaS product is the laziest possible AI integration. Users do not want to type natural language to do things they can already do with two clicks. A chatbot is appropriate when the query space is genuinely open-ended (support, research, exploration). For structured workflows, use pattern-specific UI.

Ignoring Latency

An AI feature that takes 8 seconds to respond will be abandoned after the second use. If you cannot achieve acceptable latency (under 2 seconds for interactive features, streaming for longer outputs), do not ship the feature. Improve it first.

No Graceful Degradation

If your AI provider has an outage, what happens to your product? If the answer is "those features stop working," your architecture is fragile. Every AI feature needs a fallback path — a simpler model, a template, a manual workflow.

Invisible AI Costs

AI costs are per-request, not per-seat. A power user making 500 AI requests per day costs 50x more to serve than a casual user making 10. If your pricing model is flat per-seat, AI power users will destroy your margins. Build cost monitoring from day one and consider usage-based pricing for AI features.

FAQ

How should I price AI features? Three options: include in the base plan (works if AI usage is predictable), add-on tier (separate AI plan at higher price), or usage-based (charge per AI action). Usage-based is the most sustainable but hardest to communicate. Most SaaS products start with an add-on tier and migrate to usage-based as AI features become core.

How do I A/B test AI vs non-AI features? Expose the AI feature to 50% of users and measure task completion time, user satisfaction, and retention. The key metric is not "do users like AI" — it is "do users accomplish their goals faster." If the AI version is slower or more confusing, the non-AI version wins regardless of how impressive the technology is.

What happens when the model API is down? Your product should still work. Every AI feature has a non-AI fallback. Monitoring detects API failures within 30 seconds and automatically routes to fallbacks. Users may see "AI features temporarily limited" — not a broken interface.

Should I build or buy my AI layer? If AI is a core differentiator, build. If AI is a feature enhancement, buy (use a managed service like OpenAI Assistants or pre-built components). If you are unsure, prototype with a managed service and migrate to custom if the feature proves valuable. We help SaaS teams make this decision.
