A practical guide to using Amazon Bedrock with Claude models for production AI — from setup to scaling, without the complexity.
Most companies that want to use AI in production hit the same wall: they experiment with API keys and chatbot wrappers, but never get to something their team can rely on daily. Amazon Bedrock changes that equation — it gives you managed access to Claude models inside your existing AWS infrastructure, with the security and scaling you already trust.
Here's how to actually use it, based on what we've built for clients.
Why Bedrock Instead of Direct API Access
If you're already on AWS, Bedrock is the obvious choice. But it's worth spelling out why:
No API key management. Bedrock uses IAM roles — the same authentication your team already manages. No separate billing accounts, no API keys sitting in environment variables, no third-party vendor security reviews. Claude runs inside your AWS account, governed by the same policies as everything else.
Data stays in your VPC. For regulated industries — healthcare, finance, legal — this matters enormously. Your prompts and responses never leave AWS. You can use VPC endpoints to ensure traffic never touches the public internet. That's a compliance conversation that takes weeks with a standalone API and minutes with Bedrock.
Pay-per-token pricing with no commitment. You don't need to negotiate enterprise contracts or commit to monthly minimums. Run a proof of concept for a few dollars, then scale to thousands of daily requests without changing anything in your architecture.
Model switching is trivial. Today you're using Claude Sonnet for speed. Tomorrow you need Claude Opus for complex reasoning. That's a one-line change in your code — same endpoint, same authentication, same infrastructure.
Getting Started: The Minimum Setup
Here's what you need to get Claude running on Bedrock. This takes about 15 minutes if you already have an AWS account.
Step 1: Enable Model Access
In the AWS Console, go to Amazon Bedrock and request access to Anthropic Claude models. This is a one-click approval for most regions. You'll want us-east-1 or us-west-2 for the widest model selection.
Step 2: Set Up IAM Permissions
Your application needs an IAM role with bedrock:InvokeModel permission. Here's the minimum policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-*"
    }
  ]
}
Attach this to your Lambda execution role, ECS task role, or EC2 instance profile — wherever your application runs.
Step 3: Make Your First Call
Using the AWS SDK, calling Claude on Bedrock looks like this:
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

const response = await client.send(new InvokeModelCommand({
  modelId: "anthropic.claude-sonnet-4-20250514-v1:0",
  contentType: "application/json",
  body: JSON.stringify({
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 1024,
    messages: [
      { role: "user", content: "Summarize this invoice and extract the total amount." }
    ]
  })
}));

const result = JSON.parse(new TextDecoder().decode(response.body));
console.log(result.content[0].text);
That's it. No API keys, no external dependencies. If your code can already call AWS services, it can call Claude.
Real Architecture Patterns We Use
Pattern 1: Document Processing Pipeline
This is the most common pattern we build for clients. Documents come in — invoices, contracts, reports — and Claude extracts structured data.
The flow: S3 upload triggers a Lambda function. Lambda reads the document, sends it to Claude on Bedrock with extraction instructions, and writes the structured output to DynamoDB. A second Lambda handles any items flagged for human review.
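The extraction step in the middle of that flow is where most of the design work lives. Here's a sketch of the prompt construction and output handling the Lambda would do — the field names (vendor, total, dueDate) are illustrative placeholders, not a fixed schema:

```typescript
// Build the extraction prompt sent to Claude. The field names here
// (vendor, total, dueDate) are examples — adapt them to your documents.
function buildExtractionPrompt(documentText: string): string {
  return [
    "Extract the following fields from this invoice and respond with JSON only:",
    '{"vendor": string, "total": number, "dueDate": "YYYY-MM-DD"}',
    "",
    "Invoice text:",
    documentText,
  ].join("\n");
}

interface InvoiceFields {
  vendor: string;
  total: number;
  dueDate: string;
}

// Parse Claude's reply. Anything malformed gets flagged for human review
// (the second Lambda in the flow) instead of crashing the pipeline.
function parseExtraction(reply: string): InvoiceFields | { needsReview: true; raw: string } {
  try {
    const parsed = JSON.parse(reply);
    if (typeof parsed.vendor === "string" && typeof parsed.total === "number") {
      return parsed as InvoiceFields;
    }
  } catch {
    // fall through to the review branch
  }
  return { needsReview: true, raw: reply };
}
```

The important design choice is the fallback: a parse failure routes to human review rather than erroring, so one odd document never blocks the batch.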
Why this works: It's entirely serverless. You pay nothing when no documents are coming in. When a client uploads 500 invoices at once, Lambda scales automatically. Claude handles the understanding — dates, amounts, line items, vendor names — and your code handles the plumbing.
Cost reality: Processing a typical one-page invoice costs about $0.003 with Claude Sonnet. That's $3 for a thousand invoices. Compare that to manual data entry or legacy OCR tools that cost $0.10+ per page and still need human correction.
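The arithmetic behind estimates like this is simple enough to script. The token counts and per-million-token rates below are placeholders — substitute the current Bedrock pricing for whichever model you choose:

```typescript
// Rough per-document cost estimate. Rates are USD per 1M tokens and are
// placeholder values — check current Bedrock pricing for your model.
function estimateCostPerDoc(
  inputTokens: number,
  outputTokens: number,
  inputRatePerMTok: number,
  outputRatePerMTok: number,
): number {
  return (inputTokens * inputRatePerMTok + outputTokens * outputRatePerMTok) / 1_000_000;
}

// Example: a one-page invoice at ~600 input tokens and ~80 output tokens,
// with assumed rates of $3/M input and $15/M output.
const perInvoice = estimateCostPerDoc(600, 80, 3, 15); // ≈ $0.003
const perThousand = perInvoice * 1000;                 // ≈ $3
```

Run this against your own document sizes before committing to a pipeline — page length and output verbosity dominate the cost, not request count.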
Pattern 2: Internal Knowledge Assistant
Every company has institutional knowledge trapped in wikis, Slack threads, and shared drives that nobody can find when they need it.
The flow: Ingest your documents into an S3 bucket. Use Bedrock Knowledge Bases to automatically chunk, embed, and index them. When someone asks a question, Bedrock retrieves relevant chunks and sends them to Claude with the question. Claude synthesizes an answer with source citations.
Why this works: Bedrock Knowledge Bases handles the RAG (retrieval-augmented generation) pipeline for you. No vector database to manage, no embedding pipeline to build, no chunking strategy to tune. You point it at S3 and it works.
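The query side is one API call. A sketch of the request shape, assuming a knowledge base ID and model ARN you'd supply from your own account — this object is the input to RetrieveAndGenerateCommand in @aws-sdk/client-bedrock-agent-runtime:

```typescript
// Build the input for a Bedrock Knowledge Bases RetrieveAndGenerate call.
// knowledgeBaseId and modelArn are placeholders for your own resources.
function buildKbQuery(question: string, knowledgeBaseId: string, modelArn: string) {
  return {
    input: { text: question },
    retrieveAndGenerateConfiguration: {
      type: "KNOWLEDGE_BASE" as const,
      knowledgeBaseConfiguration: {
        knowledgeBaseId,
        modelArn,
      },
    },
  };
}

// Usage sketch with the SDK client:
//   const response = await client.send(
//     new RetrieveAndGenerateCommand(buildKbQuery(question, kbId, modelArn))
//   );
// response.output.text holds the synthesized answer;
// response.citations holds the source chunks it was grounded in.
```

Note that the citations come back structured, so surfacing "here's where this answer came from" in your UI is a rendering problem, not an AI problem.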
What to watch for: Quality depends entirely on your source documents. If your wiki is full of outdated pages, Claude will confidently cite outdated information. Clean your sources first — this is the step most teams skip and then blame the AI for being wrong.
Pattern 3: Customer-Facing AI Features
Adding Claude-powered features to your product — smart search, content generation, automated responses — requires more thought about reliability and cost control.
The flow: Your application calls a backend API. The API constructs a prompt with the user's input plus relevant context from your database. It calls Claude on Bedrock, validates the response, and returns it to the user.
Critical guardrails:
- Set max_tokens aggressively. If you expect a one-paragraph response, set max_tokens to 300, not 4096. This caps your cost per request and prevents runaway responses.
- Add response validation. Before returning Claude's output to your user, check that it matches expected formats. If you asked for JSON, parse it. If you asked for a summary under 100 words, check the length.
- Implement caching. If the same question gets asked repeatedly (and it will), cache responses in ElastiCache or DynamoDB. This cuts costs dramatically and improves response times from seconds to milliseconds.
- Use Bedrock Guardrails. Configure content filters, denied topics, and PII redaction directly in Bedrock. This runs before and after Claude's response, catching issues your application code might miss.
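The response-validation guardrail above is worth making concrete. A minimal sketch of the two checks — JSON parseability and a word-count cap — that would run in your backend before anything reaches the user:

```typescript
// Returns the parsed object if Claude's reply is valid JSON, null otherwise.
// A null result should trigger a retry or a fallback, never a raw passthrough.
function validateJsonResponse(raw: string): object | null {
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null ? parsed : null;
  } catch {
    return null;
  }
}

// Checks a length constraint you stated in the prompt, e.g. "under 100 words".
function withinWordLimit(text: string, maxWords: number): boolean {
  return text.trim().split(/\s+/).length <= maxWords;
}
```

These checks are cheap; the expensive mistake is skipping them and letting a malformed response break your UI.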
Cost Management That Actually Works
AI costs can spiral if you're not paying attention. Here's how we keep them predictable:
Track per-feature costs, not just total spend. Use AWS Cost Allocation Tags on your Bedrock calls. Tag by feature, by customer tier, by environment. When your bill jumps, you'll know exactly which feature caused it — not just that "AI costs went up."
Right-size your model choice. Claude Haiku costs a small fraction of what Claude Opus does. For classification tasks, simple extraction, and routing decisions, Haiku is more than enough. Save Opus for complex reasoning tasks where quality directly impacts business outcomes. Most production workloads should use Sonnet as the default — it hits the sweet spot of capability and cost.
Set billing alarms. CloudWatch alarms on Bedrock spend are free to set up and will save you from surprise bills. Set them at 50%, 80%, and 100% of your expected monthly spend.
Implement token budgets per user. If your application is user-facing, set daily or monthly token limits per user or per account. This prevents a single power user (or a bug) from consuming your entire budget.
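A per-user budget is a small piece of code. This sketch keeps counts in memory for clarity — in production you'd back it with DynamoDB or ElastiCache so it survives restarts and works across instances:

```typescript
// Daily token budget per user. In-memory sketch only — persist the counts
// in DynamoDB or ElastiCache for a real deployment.
class TokenBudget {
  private used = new Map<string, number>();
  constructor(private dailyLimit: number) {}

  // Returns true and records usage if the request fits the remaining budget.
  tryConsume(userId: string, tokens: number): boolean {
    const current = this.used.get(userId) ?? 0;
    if (current + tokens > this.dailyLimit) return false;
    this.used.set(userId, current + tokens);
    return true;
  }

  // Call from a daily scheduled job (e.g. an EventBridge rule).
  reset(): void {
    this.used.clear();
  }
}
```

Check the budget before calling Bedrock, not after — the point is to refuse the call, not to bill for it and then complain.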
Monitoring and Observability
You can't improve what you don't measure. Here's the minimum monitoring setup:
CloudWatch Metrics: Bedrock automatically publishes invocation count, latency, and error rates. Create a dashboard that shows these by model ID. You'll quickly see if a model is slower than expected or if error rates spike.
CloudWatch Logs: Enable model invocation logging to capture inputs and outputs. This is essential for debugging — when a user reports a bad response, you need to see exactly what prompt was sent and what came back. Store logs in S3 for long-term analysis.
Custom metrics that matter:
- Tokens per request — tracks whether your prompts are growing unexpectedly
- Cache hit rate — tells you if your caching strategy is working
- Human escalation rate — for automated workflows, how often a human needs to intervene
- End-to-end latency — not just Bedrock response time, but total time from user request to response delivered
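Custom metrics like these are published with CloudWatch's PutMetricData. A sketch of the request payload — the namespace and dimension names are examples, not conventions; pick your own:

```typescript
// Build a PutMetricData input for the tokens-per-request metric.
// "MyApp/Bedrock" and the "Feature" dimension are hypothetical names.
function buildTokenMetric(feature: string, tokensUsed: number) {
  return {
    Namespace: "MyApp/Bedrock",
    MetricData: [
      {
        MetricName: "TokensPerRequest",
        Dimensions: [{ Name: "Feature", Value: feature }],
        Unit: "Count" as const,
        Value: tokensUsed,
      },
    ],
  };
}

// Send with PutMetricDataCommand from @aws-sdk/client-cloudwatch, after
// each Bedrock call. Dimension by feature so cost spikes are traceable.
```

Publishing per-feature keeps this consistent with the cost-allocation-tag advice above: the same breakdown in your metrics and in your bill.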
Common Mistakes We See
Stuffing too much context into every prompt. Just because Claude can handle 200K tokens doesn't mean every request should include your entire knowledge base. More context means slower responses and higher costs. Send only what's relevant to the specific question.
Skipping evaluation before launch. Build a test set of 50-100 real examples before going live. Run them through your system and have a human grade the responses. This takes a day and prevents embarrassing failures in production.
Treating AI as a black box. Log your prompts and responses. Review them weekly. You'll find patterns — certain question types that consistently get poor answers, prompts that could be tightened, responses that need better formatting. This ongoing tuning is what separates good AI features from great ones.
Ignoring latency. Claude Opus might give better answers, but if it takes 8 seconds to respond in a user-facing feature, your users won't wait. Profile your latency requirements first, then choose the model that fits.
Building without fallbacks. Bedrock can have brief outages. Your application should handle this gracefully — return a cached response, fall back to a simpler model, or show a helpful message. Never let an AI service outage crash your application.
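The graceful-degradation pattern is a small wrapper. A sketch, assuming you supply both the primary Bedrock call and a fallback (a cached answer, a cheaper model, or a static message):

```typescript
// Run the primary model call; on any failure, fall back instead of
// surfacing the error to the user.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  try {
    return await primary();
  } catch {
    // e.g. return a cached response, or call a simpler model
    return await fallback();
  }
}
```

Wrap every user-facing Bedrock call this way; the wrapper costs nothing on the happy path and turns an outage into a degraded response rather than an error page.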
When to Build In-House vs. When to Get Help
Build in-house if: You have an AWS-experienced team, a clear use case, and time to iterate. The patterns above are straightforward for teams that already deploy on AWS.
Get help if: You need to move fast, you're in a regulated industry with compliance requirements, or this is your first production AI deployment. The difference between a proof of concept and a production system is significant — error handling, monitoring, cost controls, security hardening, and operational runbooks all take time to get right.
We've deployed Bedrock solutions across industries — from document processing pipelines that handle thousands of pages daily to customer-facing AI features serving real-time requests. The technology is ready. The question is whether your team has the bandwidth to do it well on the first try.
Getting Started This Week
Here's a practical plan for your first Bedrock deployment:
Day 1: Enable Bedrock model access, set up IAM permissions, make your first API call from a test script. Prove that the connection works.
Day 2-3: Build a minimal version of your use case. One Lambda, one prompt, one output format. Don't optimize — just get it working end to end.
Day 4: Add monitoring (CloudWatch dashboard) and basic error handling. Set a billing alarm.
Day 5: Run your test set through it. Grade the results. Identify the biggest quality gaps.
Week 2: Iterate on prompt quality, add caching, implement guardrails. By the end of week two, you should have something worth showing to stakeholders.
That's the path from "we should use AI" to "we have AI in production" — and it's shorter than most people expect.