AI/ML System Design Patterns

Ten battle-tested architecture patterns for building production AI/ML systems on AWS. Each pattern covers the problem, solution architecture, and rationale for key design decisions.

1. RAG-Powered Chatbot for Enterprise Knowledge Base

Problem

A large enterprise has 50K+ internal documents (PDFs, wikis, Confluence). Employees spend 2+ hours/day searching for information. A standard LLM can't answer questions about internal data.

Solution

Services: Amazon Bedrock (Claude) + OpenSearch Serverless + LangChain + Lambda + S3 + API Gateway + DynamoDB

User --[API Gateway]--> Lambda --[query]--> OpenSearch (vector store)
                                |--[context+query]--> Bedrock (Claude)
                                |--[answer]--> DynamoDB (chat history)
                                |--[response]--> User

Why This Solution

RAG (Retrieval-Augmented Generation) grounds LLM responses in enterprise data. Documents are chunked, embedded, and indexed in an OpenSearch Serverless vector store. When a user asks a question, the system retrieves the top-K most relevant chunks and injects them into the LLM prompt as context. This reduces hallucination (it does not eliminate it) and keeps the documents inside the company's own AWS account. OpenSearch Serverless scales with query volume, so there is no search cluster to size or manage, and Bedrock serves the model without any GPU infrastructure to run. LangChain handles the orchestration, document chunking, and prompt templating.
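
The retrieve-then-prompt step described above can be sketched in plain Python. This is a minimal illustration, not the LangChain or OpenSearch API: the toy 3-dimensional embeddings, the `top_k_chunks` and `build_prompt` helpers, and the prompt wording are all assumptions made for the example. In the real system the vector store performs the similarity search.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, chunks, k=2):
    """Return the text of the k chunks most similar to the query.

    chunks is a list of (text, embedding) pairs (hypothetical shape).
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    """Inject the retrieved chunks into the prompt as grounding context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Toy data: real embeddings would come from an embedding model.
CHUNKS = [
    ("VPN access requires a hardware token.", [0.9, 0.1, 0.0]),
    ("Expense reports are due on the 5th.", [0.0, 0.9, 0.1]),
    ("VPN outages are reported to the IT desk.", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]
selected = top_k_chunks(query_vec, CHUNKS, k=2)
prompt = build_prompt("How do I get VPN access?", selected)
```

Only the two VPN chunks reach the prompt; the unrelated expense chunk is filtered out by the similarity ranking.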

2. Real-Time Fraud Detection System

Problem

A payment processor handles 10M transactions/day. Current rule-based system catches only 60% of fraud with a 5% false positive rate. Need real-time scoring in <100ms.

Solution

Services: Kinesis + SageMaker + Lambda + DynamoDB + QuickSight + S3 + EventBridge + SNS

Transaction --[Kinesis]--> Lambda --[feature vector]--> SageMaker endpoint
                                      |--[score]--> DynamoDB (history)
                                      |--[alert]--> EventBridge --> SNS

Why This Solution

SageMaker real-time endpoints can serve a compact model such as XGBoost with inference latency under 50ms, which leaves headroom within the 100ms budget for streaming and feature lookup. Kinesis streams transactions in real time. Lambda pre-processes each transaction and calls the endpoint. DynamoDB stores transaction history and feature values. The model (XGBoost or a deep learning model) detects patterns that rules can't catch. With an auto scaling policy configured, SageMaker scales the endpoint with traffic. Model retraining happens weekly with new labeled data.
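
The Lambda step (build a feature vector, get a score, decide what to do) might look roughly like the sketch below. Everything here is illustrative: the three features, the 0.9 and 0.6 thresholds, and `stub_score`, which stands in for the call to the SageMaker endpoint, are assumptions, not part of the pattern.

```python
def build_features(txn, history):
    """Turn a transaction plus recent amounts into a small feature vector."""
    amounts = history or [txn["amount"]]
    avg = sum(amounts) / len(amounts)
    return [
        txn["amount"],
        txn["amount"] / avg if avg else 0.0,                     # ratio to usual spend
        1.0 if txn["country"] != txn["home_country"] else 0.0,   # foreign flag
    ]

def route(score, block_at=0.9, review_at=0.6):
    """Map a fraud score in [0, 1] to an action (thresholds are hypothetical)."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "approve"

def handle(txn, history, score_fn):
    """score_fn stands in for invoking the model endpoint."""
    score = score_fn(build_features(txn, history))
    return {"txn_id": txn["id"], "score": score, "action": route(score)}

def stub_score(features):
    """Placeholder model: NOT a real fraud model, just enough to run the flow."""
    _, ratio, foreign = features
    return min(1.0, 0.1 * ratio + 0.5 * foreign)

risky = handle(
    {"id": "t1", "amount": 500.0, "country": "FR", "home_country": "US"},
    [50.0, 50.0],
    stub_score,
)
normal = handle(
    {"id": "t2", "amount": 50.0, "country": "US", "home_country": "US"},
    [50.0, 50.0],
    stub_score,
)
```

Passing the scorer in as `score_fn` keeps the routing logic testable without a deployed endpoint.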

3. Document Processing Pipeline (OCR + NLP)

Problem

An insurance company receives 50K claims/day as PDFs, images, and scanned documents. Manual data entry takes 3 days per claim and has a 15% error rate.

Solution

Services: Textract + Comprehend + Step Functions + DynamoDB + S3 + Lambda + Bedrock

Upload --[S3 trigger]--> Textract (OCR) --> Comprehend (entities)
                                        |--> Bedrock (ambiguity resolution)
                                        |--> DynamoDB (structured output)

Why This Solution

Textract extracts text, forms, and tables from documents with >95% accuracy. Comprehend performs entity extraction (names, dates, policy numbers). Step Functions orchestrates the multi-step workflow with error handling and retries. Bedrock (Claude) handles the reasoning for ambiguous cases that entity extraction alone can't resolve. The pipeline processes each claim in <2 minutes end-to-end, compared with 3 days manually.
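
One way to decide which claims count as "ambiguous" is to gate on entity confidence. The sketch below assumes entities arrive as dicts with `Type`, `Text`, and `Score` keys (the shape Comprehend uses for detected entities); the 0.85 cutoff, the `triage_claim` helper, and the `POLICY_NUMBER` type (which would need a custom entity recognizer) are assumptions for illustration.

```python
# Fields a claim must contain before it can be stored (hypothetical list).
REQUIRED_TYPES = {"PERSON", "DATE", "POLICY_NUMBER"}

def triage_claim(entities, min_score=0.85):
    """Split extracted entities into accepted fields and ambiguous ones.

    A claim is escalated to the LLM step when any entity is below the
    confidence cutoff or a required field is missing.
    """
    accepted = {}
    ambiguous = []
    for ent in entities:
        if ent["Score"] < min_score:
            ambiguous.append(ent)
        elif ent["Type"] not in accepted:
            accepted[ent["Type"]] = ent["Text"]   # keep first confident value
    missing = REQUIRED_TYPES - set(accepted)
    return {
        "fields": accepted,
        "ambiguous": ambiguous,
        "missing": sorted(missing),
        "needs_llm": bool(ambiguous or missing),
    }

entities = [
    {"Type": "PERSON", "Text": "Jane Doe", "Score": 0.99},
    {"Type": "DATE", "Text": "03/04/2024", "Score": 0.62},
]
result = triage_claim(entities)
```

Here the low-confidence date and the missing policy number both send the claim to the ambiguity-resolution step instead of straight to DynamoDB.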

4. Personalized Product Recommendation Engine

Problem

An e-commerce platform with 10M products wants to increase conversion rate. Current "popular items" recommendations lead to a 2% CTR. Need personalized real-time recommendations.

Solution

Services: SageMaker + Personalize + ElastiCache + DynamoDB + Lambda + API Gateway + CloudFront

User --[CloudFront]--> API Gateway --> Lambda --> Personalize (real-time)
                                              |--> ElastiCache (hot cache)
                                              |--> DynamoDB (user profile)

Why This Solution

Amazon Personalize provides real-time and batch recommendations without requiring ML expertise; it is based on the same technology used at Amazon.com. SageMaker trains custom models for cold-start scenarios (new users, new items). ElastiCache caches popular recommendations so that repeat requests are served with sub-millisecond cache reads instead of a fresh inference call. Personalize supports multiple recommendation types: related items, frequently bought together, and personalized ranking.
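
The caching layer follows a cache-aside pattern: check the cache, fall back to the recommender on a miss, then store the result with a time-to-live. The sketch below uses an in-memory dict as a stand-in for ElastiCache and a `fetch_fn` callback as a stand-in for the Personalize call; the class, key format, and 60-second TTL are assumptions.

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for ElastiCache)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]    # expired: drop and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

def recommend(user_id, cache, fetch_fn):
    """Cache-aside lookup: returns (recommendations, origin)."""
    key = f"recs:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached, "cache"
    recs = fetch_fn(user_id)        # stand-in for the real-time recommender
    cache.set(key, recs)
    return recs, "recommender"
```

A short TTL keeps recommendations reasonably fresh while absorbing repeated requests from the same user during a session.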

5. Video Content Moderation Pipeline

Problem

A UGC platform gets 100K video uploads/day. 5% contain inappropriate content. Manual moderation doesn't scale and exposes reviewers to harmful content.

Solution

Services: Rekognition + Step Functions + Lambda + S3 + DynamoDB + SNS + SQS + A2I (Human-in-Loop)

Upload --[S3 trigger]--> Rekognition (auto-moderation)
                    95% clear --> DynamoDB (approved)
                    5% borderline --> A2I (human review)

Why This Solution

Rekognition Video detects unsafe content, celebrities, and text in videos. It resolves the clear-cut cases (about 95% of uploads) automatically. The remaining 5% (borderline cases) go to Augmented AI (A2I) for human review, which limits reviewers' exposure to harmful content. DynamoDB tracks moderation status. Step Functions orchestrates the workflow and makes it easy to change as policies evolve.
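
The split between automatic decisions and human review usually comes down to confidence thresholds. The sketch below assumes each detected label is a dict with `Name` and `Confidence` (0 to 100); the 90 and 50 cutoffs and the `moderate` helper are hypothetical and would be tuned against the platform's own policy.

```python
def moderate(labels, reject_at=90.0, review_at=50.0):
    """Route a video based on its highest-confidence unsafe-content label.

    - no label above review_at      -> approved automatically
    - a label at or above reject_at -> rejected automatically
    - anything in between           -> sent to human review
    """
    top = max((label["Confidence"] for label in labels), default=0.0)
    if top >= reject_at:
        return "rejected"
    if top >= review_at:
        return "human_review"
    return "approved"

clean = moderate([])
borderline = moderate([{"Name": "Suggestive", "Confidence": 71.5}])
clear_violation = moderate([
    {"Name": "Violence", "Confidence": 97.2},
    {"Name": "Suggestive", "Confidence": 55.0},
])
```

Widening or narrowing the band between the two thresholds directly controls how much volume reaches human reviewers.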

6. Real-Time Voice/Speech Analytics

Problem

A contact center receives 50K calls/day. Need to analyze sentiment, detect escalation keywords, and generate transcript summaries in real-time to help agents.

Solution

Services: Amazon Connect + Transcribe + Comprehend + Lambda + DynamoDB + Bedrock + QuickSight

Call audio --[Connect]--> Transcribe (speech-to-text)
                    |--> Comprehend (real-time sentiment)
                    |--> Bedrock (summaries & suggested responses)
                    |--> QuickSight (manager dashboards)

Why This Solution

Transcribe provides real-time speech-to-text with speaker diarization. Comprehend performs real-time sentiment analysis. Bedrock generates call summaries and suggested responses. Lambda triggers actions based on detected keywords (e.g. "I want to cancel" → notify retention team). Managers get real-time dashboards in QuickSight.
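
The keyword-triggered actions can be sketched as a small rule table applied to each transcript segment. The rule patterns, action names, and the `process_call` helper below are assumptions for illustration; only the "I want to cancel" example comes from the text above.

```python
import re

# Pattern -> action name (hypothetical rules and action identifiers).
ESCALATION_RULES = {
    r"\bcancel\b": "notify_retention_team",
    r"\b(supervisor|manager)\b": "notify_supervisor",
}

def detect_actions(segment_text, rules=ESCALATION_RULES):
    """Return the actions whose keyword pattern appears in one segment."""
    text = segment_text.lower()
    return [action for pattern, action in rules.items() if re.search(pattern, text)]

def process_call(segments):
    """Scan segments in order, firing each action at most once per call."""
    fired = []
    for segment in segments:
        for action in detect_actions(segment):
            if action not in fired:
                fired.append(action)
    return fired

actions = process_call([
    "Hi, I have a question about my bill.",
    "Honestly, I want to cancel.",
    "If you can't help, I want to cancel and speak to a manager.",
])
```

De-duplicating per call avoids paging the retention team every time the customer repeats the same phrase.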

7. Predictive Maintenance for Industrial Equipment

Problem

A manufacturing plant has 10K machines. Unexpected failures cause $1M/hour in downtime. Current maintenance is schedule-based, replacing parts too early or too late.

Solution

Services: IoT Core + SageMaker + Timestream + Grafana + Lambda + S3 + EventBridge

Sensors --[IoT Core]--> Timestream (time-series store)
                    |--> SageMaker (anomaly detection)
                    |--> Grafana (real-time dashboards)
                    |--> EventBridge (alerts)

Why This Solution

IoT Core ingests sensor data. SageMaker trains anomaly detection models on historical failure data. Timestream stores time-series sensor data with built-in analytics functions. Grafana provides real-time dashboards. The system predicts failures 48–72 hours in advance with >90% accuracy. SageMaker's built-in algorithms (Random Cut Forest) are purpose-built for anomaly detection.
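
To show the shape of streaming anomaly scoring, the sketch below uses a rolling z-score. This is a deliberately simpler stand-in, not Random Cut Forest: it flags a reading when it sits far from the recent mean. The window size, the threshold of 3, and the class name are assumptions.

```python
from collections import deque
import statistics

class RollingAnomalyDetector:
    """Flags readings that deviate strongly from a sliding window of history."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.values = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def score(self, x):
        """Return |x - mean| / std over the window, then record x."""
        if len(self.values) < self.min_samples:
            s = 0.0                                   # not enough history yet
        else:
            mean = statistics.fmean(self.values)
            std = statistics.pstdev(self.values)
            if std > 0:
                s = abs(x - mean) / std
            else:
                s = float("inf") if x != mean else 0.0
        self.values.append(x)
        return s

    def is_anomaly(self, x):
        return self.score(x) > self.threshold

# Bearing temperature in degrees C (made-up readings), ending in a spike.
readings = [70.0, 70.5, 69.5, 70.2, 69.8, 70.1, 95.0]
detector = RollingAnomalyDetector()
flags = [detector.is_anomaly(r) for r in readings]
```

Only the final spike is flagged; in the architecture above, a flag like this is what would be published to EventBridge as an alert.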

8. MLOps Continuous Training Pipeline

Problem

A recommendation model's accuracy degrades 5% per week as user behavior shifts. Manual retraining takes 3 days and requires engineer intervention each time.

Solution

Services: SageMaker Pipelines + Lambda + CodeCommit + Step Functions + Model Registry + CloudWatch

Schedule/Drift detection --[Step Functions]--> SageMaker Pipelines
                                     |--> Model Registry (approval gate)
                                     |--> Canary deployment
                                     |--> CloudWatch (monitoring)

Why This Solution

SageMaker Pipelines automates the entire ML workflow: data validation, training, evaluation, and deployment. Step Functions triggers pipelines on schedule or when data drift is detected. Model Registry tracks model versions with approval gates. Canary deployments automatically roll back if metrics degrade. Automated retraining keeps accuracy stable without human intervention.
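
The drift check that triggers retraining can be illustrated with the Population Stability Index (PSI), one common drift metric; the pattern above does not name a specific metric, so this choice is an assumption. The 10-bin histogram, the 0.2 threshold (a frequently quoted rule of thumb), and the helper names are also assumptions.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a live sample.

    Bins are equal-width over the baseline's range; live values outside
    that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def should_retrain(baseline, live, threshold=0.2):
    """True when the live feature distribution has drifted past the threshold."""
    return psi(baseline, live) > threshold

baseline = [i / 100 for i in range(100)]          # roughly uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]     # mass moved to the upper half
stable_decision = should_retrain(baseline, baseline)
drift_decision = should_retrain(baseline, shifted)
```

A scheduled job could compute this per feature and start the pipeline only when at least one feature crosses the threshold.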

9. Intelligent Document Search and Discovery

Problem

A law firm has 5M legal documents. Keyword search misses relevant results 40% of the time. Associates spend hours manually finding related cases.

Solution

Services: OpenSearch (k-NN) + Bedrock + Lambda + S3 + Textract + Neptune

Document --[Textract]--> Bedrock (embeddings) --> OpenSearch k-NN
         --[Textract]--> Bedrock (entities) --> Neptune (knowledge graph)

Query --[OpenSearch k-NN]--> (semantic results)
     --[Neptune]--> (graph traversal results)

Why This Solution

k-NN in OpenSearch enables semantic search using embeddings. Bedrock generates embeddings and summarizes results. Textract extracts text from scanned documents. Neptune builds a knowledge graph of document relationships, citations, and entities. Combined vector + graph search provides much better recall than keyword search alone.
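
The text does not say how the vector and graph result lists are merged. One simple option, shown here as an assumption rather than the pattern's prescribed method, is reciprocal rank fusion (RRF): each document earns 1 / (k + rank) from every list it appears in, and the sums are sorted. The document IDs are made up.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over every
    list that contains it; ties are broken by document ID for stability.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

semantic_hits = ["case_12", "case_07", "case_33"]   # from k-NN search
graph_hits = ["case_07", "case_41", "case_12"]      # from graph traversal
fused = reciprocal_rank_fusion([semantic_hits, graph_hits])
```

Documents that appear in both lists rise to the top, while results found by only one retriever are still kept, which is where the recall gain comes from.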

10. Generative AI Content Creation Pipeline

Problem

A marketing agency needs to produce 1000 personalized email campaigns/day. Manual copywriting takes 30 min/email. Need consistent brand voice across all content.

Solution

Services: Bedrock + Step Functions + S3 + DynamoDB + Lambda + API Gateway + Comprehend + SNS + SES

Trigger --[API Gateway]--> Step Functions
              |--> Lambda: fetch customer segments
              |--> Bedrock: generate personalized copy
              |--> Comprehend: quality & sentiment check
              |--> SNS: human approval (high-value sends)
              |--> SES: send campaign

Why This Solution

Bedrock (Claude) generates personalized email copy based on customer segments using prompt templates. Step Functions orchestrates the pipeline: fetch customer data → generate content → check quality with Comprehend sentiment → human approval for high-value sends. DynamoDB stores generation metadata. S3 stores templates and assets. The pipeline generates 1000 emails in <5 minutes with consistent brand voice.
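
The prompt-template and quality-gate steps can be sketched as two small functions. The template wording, the `next_step` routing rules, and the step names are assumptions; the sentiment labels match the four values Comprehend returns (POSITIVE, NEGATIVE, NEUTRAL, MIXED).

```python
from string import Template

# Hypothetical prompt template; a real one would live in S3 with the brand guide.
PROMPT = Template(
    "Write a $tone marketing email for the '$segment' customer segment "
    "about $product. Follow this brand voice guide: $voice"
)

def build_prompt(segment, product, voice, tone="friendly"):
    """Fill the template for one customer segment."""
    return PROMPT.substitute(segment=segment, product=product, voice=voice, tone=tone)

def next_step(sentiment, is_high_value):
    """Decide what the workflow does after the quality check.

    Copy that reads as negative or mixed is regenerated; high-value sends
    wait for human approval; everything else is sent directly.
    """
    if sentiment in {"NEGATIVE", "MIXED"}:
        return "regenerate"
    if is_high_value:
        return "human_approval"
    return "send"

prompt = build_prompt(
    segment="lapsed subscribers",
    product="the spring sale",
    voice="warm, concise, no jargon",
)
```

Keeping the brand voice in the template rather than in each request is what makes the output consistent across segments.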
