Gabriel Caiana

Amazon Bedrock in practice: AI as part of the architecture, not as an external dependency


Table of contents
  1. What is Bedrock, exactly?
  2. Why I chose Bedrock instead of staying with OpenAI
  3. Haiku vs Sonnet: the decision that impacts cost the most
  4. The context: the product behind this decision
  5. The honest take

There’s a specific moment I still remember clearly.

It was late at night and I was trying to turn a PDF into structured JSON: specific fields extracted from the document, validated with Zod, clean output. I had the OpenAI calls working and the pipeline looked good. Then the monthly invoice arrived, and with it a question I didn’t want to answer: if this scales, how much is it going to cost?

That’s when I stopped and looked at what I had built. ECS, SQS, SNS, Cognito, RDS. Everything living inside AWS. And the AI piece, which was the core of the product, ran on an external API I didn’t control, with a key that could leak and a cost I couldn’t predict. It didn’t make sense.

What is Bedrock, exactly?

The most direct way to explain it: Bedrock is a managed gateway to AI models inside AWS. You don’t host any model, you don’t need GPUs, and you don’t manage any infrastructure. You call an API, pick a model, and get a response.

The fundamental difference compared to calling OpenAI directly is that you’re inside AWS. Bedrock authenticates requests with the same IAM role your ECS task already uses: the SDK signs each call with that role’s temporary credentials. No exposed API key. No secret to rotate. No OPENAI_API_KEY in your .env that someone might accidentally commit. It’s IAM permissions, the same mechanism you already use to access S3, SQS and RDS. You need bedrock:InvokeModel in the role’s policy (plus bedrock:InvokeModelWithResponseStream if you stream responses), and you’re done.
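As a sketch of the permission side, here is what the task role’s policy could look like, written as a TypeScript object so it can be serialized to JSON or fed to whatever IaC tool you use. The region, the model IDs and the helper names (foundationModelArn, bedrockInvokePolicy) are my assumptions for illustration, not taken from the article. The Converse API is authorized through the same bedrock:InvokeModel action, and streaming calls need bedrock:InvokeModelWithResponseStream.

```typescript
// Minimal shape of an IAM policy document (only the fields used here).
interface PolicyStatement {
  Sid?: string;
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// ARN of an on-demand foundation model; the account field is empty for foundation models.
function foundationModelArn(region: string, modelId: string): string {
  return `arn:aws:bedrock:${region}::foundation-model/${modelId}`;
}

// Hypothetical helper: least-privilege permissions for the ECS task role.
// Converse is authorized by bedrock:InvokeModel, ConverseStream by
// bedrock:InvokeModelWithResponseStream. Cross-region inference profiles would
// also need their inference-profile ARNs, which this sketch does not cover.
function bedrockInvokePolicy(region: string, modelIds: string[]): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "InvokeSelectedBedrockModels",
        Effect: "Allow",
        Action: ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
        Resource: modelIds.map((id) => foundationModelArn(region, id)),
      },
    ],
  };
}

// Assumed region and model IDs; attach the resulting JSON to the task role.
const policy = bedrockInvokePolicy("us-east-1", [
  "anthropic.claude-3-haiku-20240307-v1:0",
  "anthropic.claude-3-5-sonnet-20240620-v1:0",
]);
console.log(JSON.stringify(policy, null, 2));
```

Scoping Resource to specific model ARNs, rather than "*", means the worker can only call the models you deliberately chose.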

Inside Bedrock you have multiple models available: Claude from Anthropic, Llama from Meta, Titan from Amazon itself, Stable Diffusion from Stability AI, and the list keeps growing. You can switch models without changing your architecture or your auth setup, and without creating an account somewhere else. Since I’m building everything inside AWS on purpose, both as a learning lab and as a deliberate architectural choice, this kept the whole product in one ecosystem.
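To illustrate what switching models means in code, here is a minimal sketch of a request for the Bedrock Converse API, which gives different model families a common message format. The interfaces model only a subset of ConverseCommandInput from @aws-sdk/client-bedrock-runtime; buildExtractionRequest, the prompt and the model IDs are hypothetical. Not every model supports every Converse feature (system prompts, for instance), so it’s worth checking the supported-features table before swapping.

```typescript
// Subset of the Converse API request shape (ConverseCommandInput in
// @aws-sdk/client-bedrock-runtime); only the fields used below are modeled.
interface ContentBlock {
  text: string;
}

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

interface ConverseInput {
  modelId: string;
  system?: ContentBlock[];
  messages: Message[];
  inferenceConfig?: { maxTokens?: number; temperature?: number };
}

// Hypothetical helper: the same request for any model; only modelId changes.
function buildExtractionRequest(modelId: string, documentText: string): ConverseInput {
  return {
    modelId,
    system: [{ text: "Extract the requested fields and reply with JSON only." }],
    messages: [{ role: "user", content: [{ text: documentText }] }],
    inferenceConfig: { maxTokens: 1024, temperature: 0 },
  };
}

const sample = "Invoice 123, issued 2024-05-01, total 40.00 USD";
const viaClaude = buildExtractionRequest("anthropic.claude-3-haiku-20240307-v1:0", sample);
const viaLlama = buildExtractionRequest("meta.llama3-8b-instruct-v1:0", sample);

// In the worker, either object would be sent the same way (AWS SDK v3):
//   const client = new BedrockRuntimeClient({ region: "us-east-1" }); // no keys: task role
//   const res = await client.send(new ConverseCommand(viaClaude));
//   const text = res.output?.message?.content?.[0]?.text;
```

With plain InvokeModel each model family expects its own body format, so Converse is what turns a model swap into a one-string change.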

Why I chose Bedrock instead of staying with OpenAI

Three things drove this decision.

The first was control. With credentials coming from an IAM role, the worker running on ECS never needs a static key. The credentials are temporary and rotated automatically by AWS, so there is no long-lived key to leak: even if one were exfiltrated, it would expire on its own.
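One way to make sure the worker really is using the task role is a small startup check. This is a hypothetical helper, not something from the article. It relies on two facts: the ECS agent sets AWS_CONTAINER_CREDENTIALS_RELATIVE_URI in containers whose task has a role, and the AWS SDK’s default credential chain reads static keys from environment variables before it tries the container credentials endpoint, so a stray AWS_ACCESS_KEY_ID would quietly override the role.

```typescript
// Environment as a plain record so the check can run against process.env or a test fixture.
type Env = Record<string, string | undefined>;

type CredentialSource = "static-env-keys" | "ecs-task-role" | "other";

// Hypothetical helper. Static env keys are checked first because the SDK's default
// chain also gives them precedence over the container credentials endpoint.
// The ECS agent sets AWS_CONTAINER_CREDENTIALS_RELATIVE_URI when the task has a task role.
function detectCredentialSource(env: Env): CredentialSource {
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) return "static-env-keys";
  if (env.AWS_CONTAINER_CREDENTIALS_RELATIVE_URI) return "ecs-task-role";
  return "other";
}

// Fail fast if the worker would not use the task role (e.g. a key slipped into the
// task definition). Call it only in the ECS deployment: local development
// legitimately uses other credential sources.
function assertUsingTaskRole(env: Env): void {
  const source = detectCredentialSource(env);
  if (source !== "ecs-task-role") {
    throw new Error(`Expected ECS task role credentials, found: ${source}`);
  }
}

// At worker startup on ECS (Node.js): assertUsingTaskRole(process.env);
```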

The second was consistency. The product uses Haiku for simple tasks and Sonnet for complex reasoning, and the cost difference between them is roughly 10x. Having both behind the same gateway, with the same calling pattern and no second integration to manage, simplified operations significantly.

The third was billing. The entire infrastructure already ran on AWS, so consolidating AI into the same bill, with the same cost visibility as the rest of the stack, gave me predictability. No more surprise invoices from an external provider at the end of the month. To be fair, on a per-token basis OpenAI’s gpt-4o-mini is still cheaper than Haiku on Bedrock. But for structured tasks like document extraction and classification, Haiku’s cost is low enough that the difference doesn’t justify keeping an external dependency at the core of the product.

Haiku vs Sonnet: the decision that impacts cost the most

Early on, I put Sonnet on everything. My reasoning was straightforward: “it’s the more capable model, so it’s safer to use.” In practice, Sonnet is more expensive and slower, and for structured text extraction, where the output is predefined JSON, it doesn’t deliver proportionally better results than Haiku.

Today I split by task type. Everything that’s extraction and classification runs on Haiku: document to JSON, field categorization, normalization of unstructured data. Haiku handles this with high accuracy at low cost, roughly $0.001 per processed document.
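To put the per-document figure and the cost gap into arithmetic, here is a sketch that assumes Claude 3 Haiku and Claude 3.5 Sonnet at their on-demand list prices at the time of writing ($0.25 / $1.25 and $3 / $15 per million input / output tokens; check the Bedrock pricing page for current prices in your region) and a hypothetical document of 2,000 input tokens and 400 output tokens. The article doesn’t state which model versions or token counts it uses.

```typescript
// Assumed on-demand prices in USD per 1M tokens (Claude 3 Haiku and Claude 3.5 Sonnet
// list prices at the time of writing; verify against the Bedrock pricing page).
interface ModelPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

const PRICES: Record<"haiku" | "sonnet", ModelPrice> = {
  haiku: { inputPerMillion: 0.25, outputPerMillion: 1.25 },
  sonnet: { inputPerMillion: 3.0, outputPerMillion: 15.0 },
};

// Cost of one request in USD, given its input and output token counts.
function requestCostUsd(price: ModelPrice, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * price.inputPerMillion + outputTokens * price.outputPerMillion) / 1_000_000
  );
}

// Hypothetical document: ~2,000 tokens of extracted text in, ~400 tokens of JSON out.
const haikuCost = requestCostUsd(PRICES.haiku, 2_000, 400);
const sonnetCost = requestCostUsd(PRICES.sonnet, 2_000, 400);

console.log(haikuCost.toFixed(4)); // "0.0010"
console.log((sonnetCost / haikuCost).toFixed(1)); // "12.0"
```

For these two versions both input and output prices differ by 12x, so the ratio holds for any token mix; other Haiku/Sonnet pairs give a different ratio.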

Sonnet comes in when the task requires actual reasoning: analyses that cross-reference multiple data sources and need to prioritize results, content generation that synthesizes context from different inputs, and multi-turn conversations with contextualized feedback. These tasks need a more capable model, and that’s where Sonnet’s cost is justified.

With a cost gap of roughly 10x, using Sonnet where Haiku would do the job wastes money at scale. I always start with Haiku and move up only when it fails on a real task, not out of caution.
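The split described in this section can be encoded as a routing table. A minimal sketch, with hypothetical task names and assumed model IDs (the article doesn’t say which versions it uses): Sonnet is opt-in per task type, so anything not explicitly listed falls back to Haiku, which mirrors the “start with Haiku” rule.

```typescript
// Hypothetical task types mirroring the split described in this section.
type TaskType =
  | "extract-document"
  | "classify-field"
  | "normalize-data"
  | "cross-source-analysis"
  | "synthesize-content"
  | "conversation";

// Assumed model IDs; the article does not say which Haiku/Sonnet versions it uses.
const HAIKU = "anthropic.claude-3-haiku-20240307-v1:0";
const SONNET = "anthropic.claude-3-5-sonnet-20240620-v1:0";

// Sonnet is opt-in: only task types listed here get the more expensive model.
const NEEDS_REASONING: ReadonlySet<TaskType> = new Set<TaskType>([
  "cross-source-analysis",
  "synthesize-content",
  "conversation",
]);

// Everything not explicitly promoted falls back to Haiku.
function modelFor(task: TaskType): string {
  return NEEDS_REASONING.has(task) ? SONNET : HAIKU;
}

console.log(modelFor("extract-document")); // anthropic.claude-3-haiku-20240307-v1:0
console.log(modelFor("conversation")); // anthropic.claude-3-5-sonnet-20240620-v1:0
```

Promoting a task to Sonnet after Haiku fails on it is then a one-line change to the set.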

The context: the product behind this decision

I’m building a SaaS where AI isn’t a secondary feature; it’s the core of the product. The user submits a document, and the platform processes it, analyzes it and returns personalized results. All the intelligence runs through Bedrock, with no OpenAI API key and no external dependency beyond AWS.

Every model and architecture decision was made in this context: an entire product built inside AWS, with AI as the core rather than an accessory. If you’ve been following the build process, I’ve also written about how I use Spec-Driven Development to structure technical decisions with AI as a development partner.

The honest take

Bedrock is not the easiest platform to get started with. The documentation is good but scattered. LocalStack doesn’t emulate Bedrock. Getting temporary STS credentials for local development is a friction point. And granting an ECS task role the bedrock:InvokeModel permission requires understanding IAM trust policies (who may assume the role) alongside permission policies (what the role may do), and that has a real learning curve.
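For the trust-policy part of that curve, a sketch assuming a single AWS account (the account ID below is a placeholder and the helper name is hypothetical): the ecs-tasks.amazonaws.com service principal gets sts:AssumeRole, and an aws:SourceAccount condition, which AWS recommends for service-role trust policies to mitigate the confused-deputy problem, limits it to tasks in your own account.

```typescript
// Hypothetical helper: trust policy for an ECS task role.
// It answers "who may assume this role"; the permission policy answers "what it may do".
function ecsTaskRoleTrustPolicy(accountId: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        // The ECS tasks service principal assumes the role on behalf of your containers.
        Principal: { Service: "ecs-tasks.amazonaws.com" },
        Action: "sts:AssumeRole",
        // Restricts assumption to tasks running in your own account (confused-deputy guard).
        Condition: { StringEquals: { "aws:SourceAccount": accountId } },
      },
    ],
  };
}

// Placeholder account ID.
const trustPolicy = ecsTaskRoleTrustPolicy("123456789012");
console.log(JSON.stringify(trustPolicy, null, 2));
```

The role’s ARN goes in the task definition’s taskRoleArn, not executionRoleArn: the execution role is what ECS itself uses to pull images and write logs, and your application’s SDK calls don’t use it.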

But once you get past that curve, what you have is an environment where there’s no static API key to protect, billing is consolidated with the rest of your infrastructure, switching models doesn’t require changing authentication code, streaming works natively for interactive experiences, and Titan Embeddings produces vectors you can store directly in pgvector.
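On the embeddings side, here is a sketch assuming Titan Text Embeddings V2 (amazon.titan-embed-text-v2:0) called through InvokeModel, and a hypothetical documents table with a vector(1024) column. The request body takes inputText, dimensions (256, 512 or 1024) and normalize; the response body carries an embedding array. pgvector accepts vectors as text literals like '[0.1,0.2,0.3]', so the glue code is small. The helper names are mine.

```typescript
// Request/response bodies for Titan Text Embeddings V2 via InvokeModel
// (modelId "amazon.titan-embed-text-v2:0"). Only the fields used here are modeled.
interface TitanEmbedRequest {
  inputText: string;
  dimensions: 256 | 512 | 1024;
  normalize: boolean;
}

interface TitanEmbedResponse {
  embedding: number[];
  inputTextTokenCount: number;
}

// Hypothetical helper: JSON body to pass as InvokeModelCommand's `body`.
function titanEmbedBody(text: string, dimensions: TitanEmbedRequest["dimensions"] = 1024): string {
  const body: TitanEmbedRequest = { inputText: text, dimensions, normalize: true };
  return JSON.stringify(body);
}

// Hypothetical helper: parse the decoded response body and pull out the vector.
function parseTitanEmbedding(rawJson: string): number[] {
  const parsed = JSON.parse(rawJson) as TitanEmbedResponse;
  return parsed.embedding;
}

// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toPgvectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Sketch of the surrounding calls (AWS SDK v3 + node-postgres), shown as comments:
//   const res = await client.send(new InvokeModelCommand({
//     modelId: "amazon.titan-embed-text-v2:0",
//     contentType: "application/json",
//     accept: "application/json",
//     body: titanEmbedBody(text),
//   }));
//   const vector = parseTitanEmbedding(new TextDecoder().decode(res.body));
//   await pool.query("UPDATE documents SET embedding = $1::vector WHERE id = $2",
//     [toPgvectorLiteral(vector), docId]);
```

The dimensions value has to match the column’s declared size, so it’s worth fixing it in one place rather than relying on the default.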

If you’re building an entire product inside AWS, not just using AWS to host something, Bedrock makes sense as a strategic choice. You keep the ecosystem cohesive. You don’t add a critical external dependency to the core of your product. That was my choice, and so far I haven’t had a reason to reconsider.


In the next article, I’ll show how this pipeline works in practice: async processing with SQS, the problem of running Bedrock alongside LocalStack, embeddings with Titan, streaming with SSE, and the mistakes you only discover in production.