Audit trail via OTLP: every agent run as a trace

Terraform PR Agent June 9, 2026 · 15 min read

bedrock
pydantic-ai
audit
opentelemetry
s3

Prerequisites + catch-up download

Tooling and AWS access common to every post in this series.

Tooling

Terraform 1.x (install). Every post provisions infrastructure with Terraform.
uv for Python project management (install). Each post ships a runnable script you can invoke with uv run.
direnv (install) so terraform, uv run, and aws pick up AWS credentials automatically on cd. The project scaffold ships an .envrc that sources a gitignored .envrc.local.
(Optional) A coding agent such as Claude Code, Cursor, Codex, or Gemini CLI to consume the AgentPrompt blocks throughout the series. Not required (each prompt has a manual equivalent shown alongside it), but it skips the boilerplate.

Agent prompt: Check and install missing tooling

You are helping set up tooling for a tutorial project.

For each of `terraform`, `uv`, and `direnv`, run `command -v` to
check whether it is installed. If present, print the version and
continue.

For missing tools, detect the system package manager in this order:
`command -v brew`, `command -v dnf`, `command -v apt-get`. Use the
first one available:

  - Terraform: `brew tap hashicorp/tap && brew install hashicorp/tap/terraform`,
    dnf via the HashiCorp RPM repo, or apt via the HashiCorp deb repo.
  - uv: `brew install uv`, or the official installer
    `curl -LsSf https://astral.sh/uv/install.sh | sh`.
  - direnv: `brew install direnv`, `dnf install direnv`, or
    `apt-get install direnv`.

If no package manager is available or the install fails, stop and
link the manual install page so the developer can finish by hand:

  - Terraform: https://developer.hashicorp.com/terraform/install
  - uv: https://docs.astral.sh/uv/getting-started/installation/
  - direnv: https://direnv.net/docs/installation.html

After installing direnv, do not modify any shell rc files. Print the
hook line for the developer's shell (bash, zsh, or fish) and the path
to the relevant rc file, then wait for them to apply it themselves.

Report which tools were already present, which you installed, and
which need manual follow-up.

AWS access

A sandbox, test, or personal AWS account with permission to create, modify, and delete the resources discussed in each post. If you don’t have one, follow the official Create Your AWS Account walkthrough (about ten minutes; requires a credit card and a phone number for verification). Treat it as disposable - you can close it from the billing console after the series.
AWS credentials available locally via aws configure sso, aws configure, or whichever method matches your setup. You wire them into the project through .envrc.local in the next section, not your shell rc.

Anthropic First Time Use

Bedrock requires a one-time use-case form per account (or per AWS Organization management account) before Anthropic models can be invoked. Easiest path: open any Claude model in the Bedrock console playground and submit the form. Auto-subscription on first invoke can take up to 15 minutes to settle, so it is worth clearing this before post 1.

CLI alternative and verification

Programmatic equivalent (requires AWS CLI 2.27.42 or later):

aws bedrock put-use-case-for-model-access \
  --form-data "$(printf '{"companyName":"...","companyWebsite":"...","intendedUsers":"1","industryOption":"...","otherIndustryOption":"","useCases":"..."}' | base64)"

Verify:

aws bedrock get-foundation-model-availability \
  --model-id anthropic.claude-haiku-4-5-20251001-v1:0 \
  --region eu-west-1

Look for agreementAvailability.status: AVAILABLE. Expected output:

{
  "modelId": "anthropic.claude-haiku-4-5-20251001-v1",
  "agreementAvailability": { "status": "AVAILABLE" },
  "authorizationStatus": "AUTHORIZED",
  "entitlementAvailability": "AVAILABLE",
  "regionAvailability": "AVAILABLE"
}

If the form has not been submitted, only agreementAvailability.status flips to NOT_AVAILABLE. The other three fields stay green even when invocation would fail, so do not rely on them.
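
Because only agreementAvailability.status is trustworthy, a scripted check should key on that field alone. A minimal sketch, assuming the CLI output has already been parsed as JSON; the helper name anthropic_access_ready is hypothetical and not part of the project:

```python
import json


def anthropic_access_ready(availability: dict) -> bool:
    """Return True only when the use-case form has been accepted.

    Deliberately ignores authorizationStatus, entitlementAvailability and
    regionAvailability, which stay green even when invocation would fail.
    """
    agreement = availability.get("agreementAvailability") or {}
    return agreement.get("status") == "AVAILABLE"


# Sample shaped like the expected output shown above.
sample = json.loads("""
{
  "modelId": "anthropic.claude-haiku-4-5-20251001-v1",
  "agreementAvailability": {"status": "AVAILABLE"},
  "authorizationStatus": "AUTHORIZED",
  "entitlementAvailability": "AVAILABLE",
  "regionAvailability": "AVAILABLE"
}
""")
# Same response with only the agreement field flipped.
not_submitted = {**sample, "agreementAvailability": {"status": "NOT_AVAILABLE"}}

print(anthropic_access_ready(sample))
print(anthropic_access_ready(not_submitted))
```

The second call returns False even though the other three fields still look healthy, which is the case the warning above is about.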

Project scaffold

Download the cumulative checkpoint that matches the state at the start of this post:

mkdir -p ~/projects
cd ~/projects
curl -fsSL https://andreaslang.dev/terraform-pr-agent/terraform-pr-agent-01.tar.gz | tar xz

This contains everything through post 1. If you followed the previous post, your tree should already match; the curl above is for joining mid-series or recovering from drift. Tooling and AWS access from the sections above still apply.

The posts build on each other, so you may need artifacts created by previous posts to be able to run the examples.

Optional: Logfire for the live trace UI

Logfire is the live visualizer we screenshot in the querying section. It is optional: the span processor we set up below only enables the Logfire leg when a write token is present. You can finish the post without it and query the audit copy directly via Athena; the rest of the setup works either way.

If you want it, sign up at logfire.pydantic.dev (GitHub SSO works for the personal tier), create a project named terraform-pr-agent, then open Project settings -> Write tokens -> New write token. There is nothing finer-grained to pick: Logfire write tokens are project-scoped write keys by design. They cannot read telemetry, list other projects, delete the project, or impersonate users, so the blast radius if the token leaks is “someone can write garbage spans into this one project.” Name the token terraform-pr-agent-collector so it is obvious what it belongs to, and copy the value immediately (the UI shows it once).

Then add it to .envrc.local and run direnv reload:

export LOGFIRE_TOKEN="pylf_v1_..."

If you skip this section, leave LOGFIRE_TOKEN unset. The Terraform below is conditional on the token being non-empty, so the SSM parameter and the IAM statements that read it only materialise when the token is there.

What this post covers

Every agent run becomes an OpenTelemetry trace. We wire pydantic-ai’s instrumentation to an in-process span processor in the Lambda itself, fanning OTLP out to two destinations: an S3 bucket with Object Lock for the immutable audit copy (GOVERNANCE mode in this tutorial, COMPLIANCE mode in production), and Logfire as the live visualizer for debugging the same trace. Logfire is not the system of record. S3 is.

The final tree. + is new in post 2, ~ extends a post 1 file, blank carries unchanged. The download below fast-forwards to this state if you want to walk through the post against the finished code.

terraform-pr-agent/
  infra/
    + audit-bucket.tf
    + firehose.tf
    + kms.tf
    + lambda.tf
    + logfire.tf
    ~ variables.tf
      alerts.tf
      bedrock.tf
      cloudwatch.tf
      iam.tf
      main.tf
  agent/
    + handler.py
    + __init__.py
  scripts/
    + build-lambda.sh
    + queries.sql
    + traces.sql
      chat.py
  + pyproject.toml

Fast-forward to the final code of this post

Download the cumulative checkpoint that matches the state at the end of this post. Useful for landing on the finished tree without working through every step.

mkdir -p ~/projects
cd ~/projects
curl -fsSL https://andreaslang.dev/terraform-pr-agent/terraform-pr-agent-02.tar.gz | tar xz

Architecture

Post 1 was a local script calling Bedrock through an assumed role, with CloudWatch metrics as the only observation surface. Post 2 wraps that call in a Lambda, adds the dual-sink span pipeline (Logfire for the live UI, an in-memory buffer to Firehose for the audit copy), and stands up the S3-side query layer on top.

OTLP schema for audit

We do not design a schema ourselves. The OpenTelemetry GenAI semantic conventions already specify what an agent run looks like as a trace, and pydantic-ai’s Logfire instrumentation emits that shape for us. Reusing the convention means the same trace renders in Logfire, Jaeger, Tempo, or any OTLP-compatible backend, and attributes like gen_ai.usage.input_tokens mean the same thing everywhere.

A trace is a tree of spans sharing one trace_id. Each span has a parent, a start and end time, a status, and a bag of attributes. Per the OTel spec, span names follow {operation} {target}: the agent run becomes a root invoke_agent span, each LLM call becomes a chat span named after the model, and each tool becomes an execute_tool span. Pydantic-ai emits these names when you opt into the spec-compliant instrumentation with logfire.instrument_pydantic_ai(version=5). Rendered as a waterfall in a typical trace UI, one agent run looks like this:

invoke_agent terraform_pr_agent  ████████████████  4.1s
  chat claude-haiku-4-5          █████             0.9s
  execute_tool write_file             █            0.2s
  chat claude-haiku-4-5                ███         0.6s
  execute_tool terraform_validate         ██       0.3s

Each chat span carries the GenAI attributes that matter: gen_ai.request.model, gen_ai.usage.input_tokens and gen_ai.usage.output_tokens, gen_ai.input.messages and gen_ai.output.messages (when content recording is on), and gen_ai.response.finish_reasons. Each execute_tool span carries the tool name, arguments, and result. Concretely:

chat claude-haiku-4-5
  gen_ai.system                  aws.bedrock
  gen_ai.operation.name          chat
  gen_ai.request.model           claude-haiku-4-5
  gen_ai.usage.input_tokens      1247
  gen_ai.usage.output_tokens     312
  gen_ai.response.finish_reasons ["stop"]
  gen_ai.input.messages          [{role: user, parts: [...]}]
  gen_ai.output.messages         [{role: assistant, parts: [...]}]

execute_tool write_file
  gen_ai.operation.name          execute_tool
  gen_ai.tool.name               write_file
  gen_ai.tool.call.id            toolu_01ABC...
  gen_ai.tool.call.arguments     {path: main.tf, content: ...}
  gen_ai.tool.call.result        {ok: true, bytes_written: 412}

That is the free debugger. When a run misbehaves you have a complete record of what the agent saw, what it called, what the call returned, and what it decided next. The pydantic-ai Logfire integration docs list everything emitted by default.
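
As an example of what the shared attribute names buy you, token usage for a run can be totalled from the chat spans alone. A minimal sketch, assuming each span has been flattened to a dict with an "attributes" mapping (that shape and the total_tokens helper are assumptions for illustration; the second chat span's numbers are made up):

```python
def total_tokens(spans: list[dict]) -> dict[str, int]:
    """Sum input and output tokens over the chat spans of one trace."""
    totals = {"input": 0, "output": 0}
    for span in spans:
        attrs = span.get("attributes", {})
        # Count chat spans only, so any aggregate usage recorded on a
        # parent span is not counted twice.
        if attrs.get("gen_ai.operation.name") != "chat":
            continue
        totals["input"] += int(attrs.get("gen_ai.usage.input_tokens", 0))
        totals["output"] += int(attrs.get("gen_ai.usage.output_tokens", 0))
    return totals


trace = [
    {"attributes": {"gen_ai.operation.name": "chat",
                    "gen_ai.usage.input_tokens": 1247,
                    "gen_ai.usage.output_tokens": 312}},
    {"attributes": {"gen_ai.operation.name": "execute_tool",
                    "gen_ai.tool.name": "write_file"}},
    {"attributes": {"gen_ai.operation.name": "chat",
                    "gen_ai.usage.input_tokens": 1600,
                    "gen_ai.usage.output_tokens": 95}},
]
print(total_tokens(trace))
```

The same aggregation works against any backend that stores the spans, because the attribute keys come from the convention rather than from this project.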

Infra

Variables

The full variables.tf with post 1’s two carried through and three new ones for the audit pipeline (retention horizon, Parameters-and-Secrets layer ARN, and the Logfire write token):

variable "alert_email" {
  description = "Email address subscribed to the agent alerts SNS topic. Set via TF_VAR_alert_email."
  type        = string
}

variable "daily_token_alarm_threshold" {
  description = "Daily combined input + output token threshold. Crossing it sends an email via SNS."
  type        = number
  default     = 1000000
}

variable "audit_retention_days" {
  type        = number
  description = "Object Lock default retention in days. Tutorial default is 7 so the bucket is easy to clean up; production audit horizons are typically years (e.g. 2555 for SOX-style controls)."
  default     = 7
}

# arm64 build of the AWS Parameters and Secrets Lambda Extension, pinned to eu-west-1.
variable "secrets_extension_layer_arn" {
  type        = string
  description = "ARN of the AWS Parameters and Secrets Lambda Extension layer."
  default     = "arn:aws:lambda:eu-west-1:015030872274:layer:AWS-Parameters-and-Secrets-Lambda-Extension-Arm64:87"
}

variable "logfire_token" {
  type        = string
  description = "Logfire write token. Leave empty to skip the Logfire integration. Set via TF_VAR_logfire_token in .envrc.local."
  default     = ""
  sensitive   = true
}

The Lambda

Two things in lambda.tf are worth pointing out. First, the IAM policy mirrors post 1’s Bedrock invoke permissions, then layers on ssm:GetParameter and kms:Decrypt for the Logfire token (gated by dynamic blocks so they only attach when TF_VAR_logfire_token is set) and firehose:PutRecord on the audit stream:

# trivy:ignore:avd-aws-0057
# Bedrock foundation-model ARNs do not pin to the caller region (the inference
# profile fans out cross-region), and Marketplace subscription actions are
# global by design.
data "aws_iam_policy_document" "lambda_permissions" {
  # Bedrock invocation. Same shape as iam.tf's bedrock_invoke; copied
  # here so the Lambda role is self-contained and does not require
  # the user-role policy to also be attached to the Lambda role.
  statement {
    actions = [
      "bedrock:Converse",
      "bedrock:ConverseStream",
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream",
    ]
    # Bedrock foundation-model ARNs do not pin to the caller region; the
    # inference profile fans out cross-region, so the * region segment is required.
    #trivy:ignore:avd-aws-0057
    resources = [
      aws_bedrock_inference_profile.agent.arn,
      local.system_inference_profile_arn,
      "arn:aws:bedrock:*::foundation-model/${local.bedrock_model_id}",
    ]
  }

  statement {
    actions = [
      "aws-marketplace:Subscribe",
      "aws-marketplace:Unsubscribe",
      "aws-marketplace:ViewSubscriptions",
    ]
    # Marketplace subscription actions are global by design.
    #trivy:ignore:avd-aws-0057
    resources = ["*"]
  }

  # SSM SecureString read for the Logfire token, only when wired.
  dynamic "statement" {
    for_each = local.logfire_token_wired ? [1] : []
    content {
      actions   = ["ssm:GetParameter"]
      resources = [aws_ssm_parameter.logfire_token[0].arn]
    }
  }

  # KMS Decrypt on the AWS-managed SSM key, required to read a
  # SecureString value through the extension. Only when wired.
  dynamic "statement" {
    for_each = local.logfire_token_wired ? [1] : []
    content {
      actions = ["kms:Decrypt"]
      resources = [
        "arn:aws:kms:${data.aws_region.current.region}:${data.aws_caller_identity.current.account_id}:alias/aws/ssm",
      ]
    }
  }

  statement {
    actions = [
      "firehose:PutRecord",
      "firehose:PutRecordBatch",
    ]
    resources = [aws_kinesis_firehose_delivery_stream.audit.arn]
  }
}

Second, the function attaches the AWS Parameters and Secrets Lambda Extension as a layer. The extension caches SSM reads inside the Lambda execution environment, so we pay the SSM read cost at most once per cold start. The archive_file placeholder is a one-line stub that exists only to let Terraform create the function on first apply; the deploy flow below replaces it. lifecycle.ignore_changes keeps later terraform apply runs from clobbering deployed code.

resource "aws_lambda_function" "agent" {
  function_name = "terraform-pr-agent"
  role          = aws_iam_role.lambda.arn
  runtime       = "python3.13"
  architectures = ["arm64"]
  handler       = "agent.handler.handler"
  timeout       = 60
  memory_size   = 512

  filename         = data.archive_file.placeholder.output_path
  source_code_hash = data.archive_file.placeholder.output_base64sha256

  layers = [var.secrets_extension_layer_arn]

  tracing_config {
    mode = "Active"
  }

  environment {
    variables = merge(
      {
        BEDROCK_INFERENCE_PROFILE_ARN           = aws_bedrock_inference_profile.agent.arn
        BEDROCK_MODEL_ID                        = local.bedrock_model_id
        PARAMETERS_SECRETS_EXTENSION_CACHE_SIZE = "100"
        PARAMETERS_SECRETS_EXTENSION_HTTP_PORT  = "2773"
        FIREHOSE_DELIVERY_STREAM                = aws_kinesis_firehose_delivery_stream.audit.name
      },
      local.logfire_token_wired ? {
        LOGFIRE_TOKEN_PARAMETER = local.logfire_token_parameter_name
      } : {},
    )
  }

  lifecycle {
    ignore_changes = [
      filename,
      source_code_hash,
    ]
  }

  depends_on = [
    aws_iam_role_policy_attachment.lambda_basic_execution,
    aws_iam_role_policy.lambda_permissions,
  ]
}

KMS and the audit bucket

Our audit bucket gets a dedicated KMS key for SSE-KMS. This allows us to further restrict reads by only allowing certain IAM roles to decrypt the audit bucket’s contents. It also gives us another audit trail via CloudTrail: every Decrypt is logged with the IAM role that requested it. We could also add S3 access logging as another layer, but for this post we skip it to keep the moving parts down.

resource "aws_kms_key" "audit" {
  description             = "Encrypts the terraform-pr-agent audit bucket."
  enable_key_rotation     = true
  deletion_window_in_days = 7
  policy                  = data.aws_iam_policy_document.audit_kms_key_resource_policy.json
}

resource "aws_kms_alias" "audit" {
  name          = "alias/terraform-pr-agent-audit"
  target_key_id = aws_kms_key.audit.key_id
}

One common gotcha with KMS is that a key policy (resource policy) is required to enable IAM-based control within the AWS account of the key (or others if you choose). The key policy below does that. Notice the principal: arn:aws:iam::<account>:root is AWS’s shorthand for “this account, governed by IAM,” not the literal root user. It’s how you delegate the key’s authorization to IAM policies, so concrete grants can live on role policies instead of in the key policy itself.

data "aws_iam_policy_document" "audit_kms_key_resource_policy" {
  statement {
    sid       = "EnableIAMUserPermissions"
    actions   = ["kms:*"]
    resources = ["*"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }
}

The audit bucket follows a standard setup with versioning enabled and all public access blocked.

# Access logging would require a second bucket and is out of scope: the audit
# copy here is the system of record for what the agent did, not for who read it.
#trivy:ignore:avd-aws-0089
resource "aws_s3_bucket" "audit" {
  bucket              = local.audit_bucket_name
  object_lock_enabled = true
}

resource "aws_s3_bucket_versioning" "audit" {
  bucket = aws_s3_bucket.audit.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_public_access_block" "audit" {
  bucket                  = aws_s3_bucket.audit.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "audit" {
  bucket = aws_s3_bucket.audit.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.audit.arn
    }

    bucket_key_enabled = true
  }
}

S3 Object Lock is the actual audit primitive (sometimes called WORM, write-once-read-many). With it, S3 refuses to delete or overwrite a locked object version until its retention period ends. We start in GOVERNANCE mode while validating the pipeline; flip to COMPLIANCE once you’re confident. The difference matters: in GOVERNANCE a principal with s3:BypassGovernanceRetention can still delete a version or shorten its retention by sending the x-amz-bypass-governance-retention: true header, so mistakes are recoverable. In COMPLIANCE that escape hatch is gone: the lock holds regardless of IAM, and not even root can shorten it. Lifecycle expirations also defer until each object’s retention clock runs out (transitions to colder storage classes still work).

For high-risk systems under the EU AI Act, Article 12 requires automatic event logging over the system’s lifetime and Article 19 sets a six-month minimum retention. Object Lock enforces that retention at the storage layer rather than via IAM policy, which closes the most common gap in cloud audit trails.

Two things this setup does not give you. First, cryptographic chain-of-custody is not mandated by the Act, but if you ever want tamper-evidence stronger than “the bucket says no”, add signed or hash-chained logs on top. Second, GDPR right-to-erasure collides with COMPLIANCE-mode retention: anything personal you write to the bucket is locked in for the full window, so plan what you log about identifiable individuals before turning it on.

# GOVERNANCE leaves the bucket deletable for the tutorial; switch to "COMPLIANCE" in production.
resource "aws_s3_bucket_object_lock_configuration" "audit" {
  bucket = aws_s3_bucket.audit.id

  rule {
    default_retention {
      mode = "GOVERNANCE"
      days = var.audit_retention_days
    }
  }
}
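
To make the GOVERNANCE versus COMPLIANCE difference concrete, here is a toy Python model of the delete decision described above. It is a sketch of the rules as stated in this post, not of S3's real evaluation logic; LockedVersion, lock_on_write, and can_delete_version are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class LockedVersion:
    mode: str               # "GOVERNANCE" or "COMPLIANCE"
    retain_until: datetime  # fixed when the object version is written


def lock_on_write(written_at: datetime, mode: str, default_days: int) -> LockedVersion:
    """Stamp the bucket default retention onto a new version at write time."""
    return LockedVersion(mode, written_at + timedelta(days=default_days))


def can_delete_version(
    version: LockedVersion,
    now: datetime,
    has_bypass_permission: bool = False,
    sends_bypass_header: bool = False,
) -> bool:
    """Would a permanent delete of this locked version succeed?"""
    if now >= version.retain_until:
        return True  # retention has run out, the lock no longer applies
    if version.mode == "GOVERNANCE":
        # Both the IAM permission and the explicit header are required.
        return has_bypass_permission and sends_bypass_header
    return False  # COMPLIANCE: no escape hatch before retain_until


t0 = datetime(2026, 6, 9, tzinfo=timezone.utc)
gov = lock_on_write(t0, "GOVERNANCE", default_days=7)
comp = lock_on_write(t0, "COMPLIANCE", default_days=7)
day3 = t0 + timedelta(days=3)

print(can_delete_version(gov, day3))               # no bypass requested
print(can_delete_version(gov, day3, True, True))   # governance bypass
print(can_delete_version(comp, day3, True, True))  # compliance ignores bypass
```

Note that retain_until is computed once, in lock_on_write, which is why changing the bucket default later leaves existing versions untouched.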

Simple logic for the lifecycle policy: after the audit retention period ends the current version expires; S3 adds a delete marker, turning the version that held the data into a noncurrent version. One day later that noncurrent version is permanently deleted, so storage stops being billed.

There’s also a transition to Glacier IR at day 90, which only fires when audit_retention_days > 90.

resource "aws_s3_bucket_lifecycle_configuration" "audit" {
  bucket = aws_s3_bucket.audit.id

  rule {
    id     = "transition-to-glacier-ir"
    status = "Enabled"

    filter {}

    transition {
      days          = 90
      storage_class = "GLACIER_IR"
    }

    noncurrent_version_transition {
      noncurrent_days = 90
      storage_class   = "GLACIER_IR"
    }
  }

  rule {
    id     = "expire-after-retention"
    status = "Enabled"

    filter {}

    expiration {
      days = var.audit_retention_days
    }

    noncurrent_version_expiration {
      noncurrent_days = 1
    }
  }
}
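
The timeline these two rules produce can be worked out with simple arithmetic. A minimal sketch, assuming day counts relative to object creation (S3 runs lifecycle actions in daily batches, so real timing can lag by up to a day); lifecycle_plan is a hypothetical helper, not project code:

```python
def lifecycle_plan(audit_retention_days: int, transition_days: int = 90) -> dict:
    """Days after object creation at which each lifecycle action is due."""
    expire_day = audit_retention_days
    return {
        # Current version expires: S3 adds a delete marker.
        "delete_marker_day": expire_day,
        # noncurrent_days = 1: the old version is permanently deleted a day later.
        "permanent_delete_day": expire_day + 1,
        # The Glacier IR transition only matters if the object outlives day 90.
        "glacier_ir_day": transition_days if audit_retention_days > transition_days else None,
    }


print(lifecycle_plan(7))     # tutorial default
print(lifecycle_plan(2555))  # multi-year production horizon
```

With the tutorial default of 7 days the object is gone by day 8 and never reaches Glacier IR; with a multi-year horizon it spends most of its life there.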

Firehose

The agent runs in Lambda, so in principle it can scale to high concurrency. Firehose handles that cleanly: its default Direct PUT quota of 100k records/second or more (depending on region) is far more than we should ever need. It also handles batching: multiple traces (PutRecord calls) get combined into a single S3 object, avoiding the small-files problem on read.

resource "aws_kinesis_firehose_delivery_stream" "audit" {
  name        = "terraform-pr-agent-audit"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn            = aws_iam_role.firehose.arn
    bucket_arn          = aws_s3_bucket.audit.arn
    prefix              = "traces/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/"
    error_output_prefix = "errors/!{firehose:error-output-type}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"
    buffering_size      = 5
    buffering_interval  = 60
    compression_format  = "GZIP"

    cloudwatch_logging_options {
      enabled         = true
      log_group_name  = aws_cloudwatch_log_group.firehose.name
      log_stream_name = aws_cloudwatch_log_stream.firehose_s3.name
    }
  }
}
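
To see where a given trace ends up, the prefix template can be mirrored in Python. A minimal sketch, assuming Firehose evaluates the timestamp namespace in UTC (its default); audit_prefix is a hypothetical helper, and the object name Firehose appends after the prefix is not modelled:

```python
from datetime import datetime, timedelta, timezone


def audit_prefix(arrival: datetime) -> str:
    """Render the Hive-style S3 prefix for a record's arrival time."""
    utc = arrival.astimezone(timezone.utc)
    return utc.strftime("traces/year=%Y/month=%m/day=%d/hour=%H/")


# A record arriving at 14:05 UTC on the publication date.
print(audit_prefix(datetime(2026, 6, 9, 14, 5, tzinfo=timezone.utc)))

# The same logic for a local timestamp two hours ahead of UTC:
# 01:30 on June 10 local time is still 23:30 on June 9 in UTC.
local = datetime(2026, 6, 10, 1, 30, tzinfo=timezone(timedelta(hours=2)))
print(audit_prefix(local))
```

The year=/month=/day=/hour= layout is what lets a query engine prune partitions by time instead of scanning the whole bucket.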

Firehose needs IAM to write encrypted objects into the audit bucket. The only slightly surprising permission is s3:PutObjectRetention. S3 evaluates the lock policy on every PutObject and rejects writers that can’t set retention, even when the value being applied is just the bucket default. That detail surfaces another important fact: the lock period is persisted at write time, so changing the bucket default later only affects future objects, not existing ones. This is by design - anything else would undermine the lock.

data "aws_iam_policy_document" "firehose_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["firehose.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "firehose" {
  name               = "terraform-pr-agent-firehose"
  assume_role_policy = data.aws_iam_policy_document.firehose_assume.json
}

data "aws_iam_policy_document" "firehose_permissions" {
  statement {
    actions = [
      "s3:PutObject",
      "s3:PutObjectRetention",
      "s3:GetBucketLocation",
      "s3:ListBucket",
    ]
    # /* is how S3 IAM grants object-scoped actions; bounded to the audit bucket.
    #trivy:ignore:avd-aws-0057
    resources = [
      aws_s3_bucket.audit.arn,
      "${aws_s3_bucket.audit.arn}/*",
    ]
  }

  statement {
    actions = [
      "kms:GenerateDataKey",
      "kms:Decrypt",
    ]
    resources = [aws_kms_key.audit.arn]
  }

  statement {
    actions = ["logs:PutLogEvents"]
    # :* covers log streams inside the named Firehose log group only.
    #trivy:ignore:avd-aws-0057
    resources = ["${aws_cloudwatch_log_group.firehose.arn}:*"]
  }
}

resource "aws_iam_role_policy" "firehose_permissions" {
  name   = "terraform-pr-agent-firehose-permissions"
  role   = aws_iam_role.firehose.id
  policy = data.aws_iam_policy_document.firehose_permissions.json
}

Build and deploy

The toy chat.py was a PEP 723 inline script; for Lambda we switch to a proper pyproject.toml:

[project]
name = "terraform-pr-agent"
version = "0.1.0"
description = "The terraform-pr-agent Lambda handler."
requires-python = ">=3.13"
dependencies = [
    "pydantic-ai-slim[bedrock]>=1.106",
    "logfire>=4.35",
    "boto3>=1.35",
]

The build script is the Astral uv-on-Lambda flow: export the locked deps to a requirements.txt, install them into a target dir with the manylinux2014 platform pinned, drop agent/ alongside, zip.

#!/usr/bin/env bash
# Build a deployable zip for the terraform-pr-agent Lambda.
#
# Runs from anywhere; cd's to the project root (the dir holding
# pyproject.toml). Produces build/lambda.zip ready for:
#
#   aws lambda update-function-code \
#       --function-name terraform-pr-agent \
#       --zip-file fileb://build/lambda.zip
#
# Mirrors the pattern from
# https://docs.astral.sh/uv/guides/integration/aws-lambda/
set -euo pipefail

cd "$(dirname "$0")/.."

# Make sure the lock is in sync with pyproject.toml before exporting.
uv sync --quiet

rm -rf build
mkdir -p build/packages

# Export the locked dependency set so uv pip install can consume it
# without re-resolving.
uv export --frozen --no-dev --no-editable -o build/requirements.txt

# Install deps for Lambda's runtime. --python-platform forces wheels
# compatible with Amazon Linux 2023 (the OS under the python3.13 runtime)
# on arm64, matching the function's architectures = ["arm64"].
# --no-compile-bytecode keeps the zip small and avoids spending
# cold-start cycles on pyc creation.
uv pip install \
    --no-installer-metadata \
    --no-compile-bytecode \
    --python-platform aarch64-manylinux2014 \
    --python 3.13 \
    --target build/packages \
    -r build/requirements.txt

# Drop the handler package alongside the installed deps.
cp -r agent build/packages/

# Zip from inside the staging dir so paths sit at the zip root.
( cd build/packages && zip -qr ../lambda.zip . )

echo "built: $(pwd)/build/lambda.zip ($(du -h build/lambda.zip | cut -f1))"

Push the zip and smoke-test:

chmod +x scripts/build-lambda.sh
./scripts/build-lambda.sh
aws lambda update-function-code \
  --function-name terraform-pr-agent \
  --zip-file fileb://build/lambda.zip
aws lambda invoke \
  --function-name terraform-pr-agent \
  --payload '{"prompt": "Hello"}' \
  /tmp/out.json
cat /tmp/out.json

If TF_VAR_logfire_token was set, the trace lands in your Logfire project. If not, the call still succeeds: Logfire simply exports nothing, and the audit copy still goes to Firehose.

Wiring pydantic-ai to OTLP

We use the Logfire and OpenTelemetry SDKs to wire up the pydantic-ai instrumentation and export to two sinks:

S3 (for audit)
Logfire (for traces, optional)

Fetching the Logfire token at first invoke

Lambda secrets are often wired directly into environment variables. AWS recommends against this: env var values are visible to anyone with read access on the Lambda’s config. Instead, Lambdas should use the Parameters and Secrets Lambda Extension to read the token from SSM Parameter Store on the first invocation. The utility function below does that fetch; the layer itself is declared in infra/lambda.tf.

# Imports used by this excerpt (they live at the top of agent/handler.py).
import json
import os
import urllib.parse
import urllib.request


def _fetch_logfire_token() -> str | None:
    """Read the Logfire token from the Parameters and Secrets extension.

    Returns None when no token parameter is configured, so the function
    runs fine without the Logfire integration. The extension caches the
    value across invocations, so this is cheap on warm starts.
    """
    parameter_name = os.environ.get("LOGFIRE_TOKEN_PARAMETER")
    if not parameter_name:
        return None

    session_token = os.environ["AWS_SESSION_TOKEN"]

    # safe="" forces urllib to percent-encode the leading and embedded
    # slashes in a hierarchical parameter name (e.g. /a/b/c -> %2Fa%2Fb%2Fc).
    # The Parameters and Secrets extension rejects unencoded slashes with
    # HTTP 400; AWS' own Python sample shows the same %2F-encoded form.
    url = (
        "http://localhost:2773/systemsmanager/parameters/get"
        f"?name={urllib.parse.quote(parameter_name, safe='')}&withDecryption=true"
    )
    req = urllib.request.Request(
        url,
        headers={"X-Aws-Parameters-Secrets-Token": session_token},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        payload = json.load(resp)
    return payload["Parameter"]["Value"]
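
The safe="" detail is easy to check in isolation. A small sketch that builds the same URL without calling the extension; extension_url is a hypothetical helper and /terraform-pr-agent/logfire-token is a made-up parameter name used only for illustration:

```python
import urllib.parse


def extension_url(parameter_name: str, port: int = 2773) -> str:
    """Build the local extension URL with a fully percent-encoded name."""
    encoded = urllib.parse.quote(parameter_name, safe="")
    return (
        f"http://localhost:{port}/systemsmanager/parameters/get"
        f"?name={encoded}&withDecryption=true"
    )


# The default safe="/" leaves slashes alone; safe="" encodes them.
print(urllib.parse.quote("/a/b/c"))
print(urllib.parse.quote("/a/b/c", safe=""))
print(extension_url("/terraform-pr-agent/logfire-token"))
```

The first print shows why the default is not enough here: quote treats "/" as safe unless told otherwise.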

During the first invocation we fetch the token and instrument Logfire. This can’t happen at module load time, because the Parameters and Secrets Lambda Extension is not available during Lambda’s INIT phase (when the module is imported).

# Imports used by this excerpt (they live at the top of agent/handler.py).
import os
from functools import cache

import logfire
from logfire import SamplingOptions


@cache
def _init_logfire() -> None:
    """Wire Logfire and the audit span processor once per warm
    container, on the first INVOKE.

    The Parameters and Secrets extension is not ready to serve traffic
    during the Lambda INIT phase, so the token fetch (and the matching
    logfire setup) cannot run at module import time. @cache memoises
    on the empty argument tuple, so this runs exactly once per
    container and is a no-op on every subsequent invocation.
    """
    token = _fetch_logfire_token()
    if token:
        os.environ["LOGFIRE_TOKEN"] = token
    # head=1.0 and tail=None are today's Logfire defaults; pinned here
    # because this is an audit pipeline, so every trace must reach S3.
    # Volume is low (one trace per Lambda invocation) and the audit
    # requirement outweighs Logfire ingest cost. Splitting the rates
    # (e.g. 1% to Logfire, 100% to S3) is possible with a small extra
    # sampler; see the post.
    logfire.configure(
        send_to_logfire="if-token-present",
        sampling=SamplingOptions(head=1.0, tail=None),
        additional_span_processors=[PerTraceAuditProcessor(_ship_trace)],
    )
    # include_content=True is the pydantic-ai default; pinned because the
    # audit copy needs the actual prompts, tool args, and responses to be
    # useful for after-the-fact forensics. If that ever becomes a
    # compliance problem (PII, secrets in prompts), flip to False and
    # accept a metadata-only audit trail.
    logfire.instrument_pydantic_ai(version=5, include_content=True)
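
The once-per-container behaviour of @cache on a zero-argument function can be demonstrated without Lambda or Logfire. A minimal sketch with hypothetical names (init_once, handler, calls) standing in for the real setup:

```python
from functools import cache

calls: list[str] = []


@cache
def init_once() -> None:
    """Stand-in for the expensive first-invoke setup."""
    calls.append("configured")


def handler(event: dict, context: object) -> dict:
    # Safe to call on every invocation: only the first call does work.
    init_once()
    return {"configured_count": len(calls)}


print(handler({}, None))
print(handler({}, None))
```

One property worth knowing: functools.cache does not store a result when the function raises, so a failed setup (for example a token fetch that times out) is retried on the next invocation rather than being cached as a failure.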

The per-trace audit processor

OTel has no OnTraceComplete hook; the only signals are on_start and on_end per span. The processor builds the missing primitive from two pieces: span.parent is None identifies the root, and a buffer keyed by trace_id collects the spans that belong to it. Logfire’s tail sampler buffers spans by trace_id the same way internally. We do not use SimpleSpanProcessor + InMemorySpanExporter because the buffer would be module-scoped and might leak across invocations on a warm Lambda container.

class PerTraceAuditProcessor(SpanProcessor):
    """Buffer spans by trace_id, ship as one batch when the root ends.

    The OTel SDK has no `OnTraceComplete` hook, so this implements it
    against the only signal available: `on_end` fires synchronously and
    `span.parent is None` on a root. Late children (spans ended on a
    transport thread after the root has already shipped) are dropped,
    mirroring logfire's tail sampler. See pydantic/logfire#1034.
    """

    def __init__(
        self,
        on_trace_complete: Callable[[Sequence[ReadableSpan]], None],
    ) -> None:
        self._on_trace_complete = on_trace_complete
        self._buffers: dict[int, list[ReadableSpan]] = {}
        self._shipped: set[int] = set()
        self._lock = threading.Lock()

    def on_end(self, span: ReadableSpan) -> None:
        if not (span.context and span.context.trace_flags.sampled):
            return
        trace_id = span.context.trace_id
        with self._lock:
            if trace_id in self._shipped:
                return
            self._buffers.setdefault(trace_id, []).append(span)
            if span.parent is not None:
                return
            spans = self._buffers.pop(trace_id)
            self._shipped.add(trace_id)
        self._ship(spans)

    def force_flush(self, timeout_millis: int = 30000) -> bool:
        with self._lock:
            pending = list(self._buffers.values())
            self._shipped.update(self._buffers)
            self._buffers.clear()
        for spans in pending:
            self._ship(spans)
        return True

    def shutdown(self) -> None:
        self.force_flush()

    def _ship(self, spans: Sequence[ReadableSpan]) -> None:
        # Suppress instrumentation around the callback so an instrumented
        # boto3/requests client inside it does not emit a span that
        # re-enters on_end for a sibling trace.
        token = attach(set_value(_SUPPRESS_INSTRUMENTATION_KEY, True))
        try:
            self._on_trace_complete(spans)
        finally:
            detach(token)
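
The buffering rule is easier to follow with the OTel types stripped away. The sketch below is a simplified model, not the processor itself: FakeSpan and PerTraceBuffer are hypothetical names, locking and the sampled-flag check are omitted, and "shipping" just appends span names to a list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FakeSpan:
    """Hypothetical stand-in for ReadableSpan with only the fields used here."""

    trace_id: int
    name: str
    parent: str | None  # None marks a root span


class PerTraceBuffer:
    """Same buffering rule as PerTraceAuditProcessor, minus locking and OTel."""

    def __init__(self) -> None:
        self.buffers: dict[int, list[FakeSpan]] = {}
        self.shipped_ids: set[int] = set()
        self.batches: list[list[str]] = []  # what would have gone to Firehose

    def on_end(self, span: FakeSpan) -> None:
        if span.trace_id in self.shipped_ids:
            return  # late child: the trace already shipped, so drop it
        self.buffers.setdefault(span.trace_id, []).append(span)
        if span.parent is not None:
            return  # child span: keep buffering until the root ends
        batch = self.buffers.pop(span.trace_id)
        self.shipped_ids.add(span.trace_id)
        self.batches.append([s.name for s in batch])


buf = PerTraceBuffer()
# Children end before their parent, so the root arrives last.
buf.on_end(FakeSpan(1, "chat", parent="invoke_agent"))
buf.on_end(FakeSpan(1, "execute_tool", parent="invoke_agent"))
buf.on_end(FakeSpan(1, "invoke_agent", parent=None))
buf.on_end(FakeSpan(1, "late_child", parent="invoke_agent"))  # dropped

print(buf.batches)  # [['chat', 'execute_tool', 'invoke_agent']]
```

The fourth call shows the late-child rule from the docstring: once a trace_id is in the shipped set, further spans for it are ignored rather than starting a second buffer.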

Shipping one trace per Firehose record

Shipping the trace is one put_record call on Firehose.

_firehose = boto3.client("firehose")
_DELIVERY_STREAM = os.environ["FIREHOSE_DELIVERY_STREAM"]


def _ship_trace(spans: Sequence[ReadableSpan]) -> None:
    """Serialise one trace as OTLP-JSON and ship it as a single Firehose record."""
    payload = json_format.MessageToJson(encode_spans(spans), indent=None) + "\n"
    _firehose.put_record(
        DeliveryStreamName=_DELIVERY_STREAM,
        Record={"Data": payload.encode("utf-8")},
    )
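
The trailing "\n" in the payload is what makes the delivered objects readable as NDJSON. The stdlib-only sketch below imitates the round trip under one assumption: that the delivery stream concatenates buffered records byte-for-byte before compressing, which is Firehose's default behaviour. The two payloads are hypothetical and heavily trimmed.

```python
import gzip
import json

# Hypothetical, heavily trimmed OTLP-JSON payloads; real ones carry full spans.
traces = [
    {"resourceSpans": [{"scopeSpans": [{"spans": [{"name": "invoke_agent"}]}]}]},
    {"resourceSpans": [{"scopeSpans": [{"spans": [{"name": "chat"}]}]}]},
]

# Same framing as _ship_trace: compact JSON plus a trailing newline per record.
records = [
    (json.dumps(t, separators=(",", ":")) + "\n").encode("utf-8") for t in traces
]

# Imitate the delivery side: plain concatenation followed by gzip.
s3_object = gzip.compress(b"".join(records))

# Reading it back yields exactly one line per trace.
lines = gzip.decompress(s3_object).decode("utf-8").splitlines()
names = [
    json.loads(line)["resourceSpans"][0]["scopeSpans"][0]["spans"][0]["name"]
    for line in lines
]
print(names)  # ['invoke_agent', 'chat']
```

Without the newline the two JSON documents would run together on one line, and a line-oriented reader would fail to parse the object.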

Constructing the Bedrock model

The model config is simpler than in post 1: the agent uses the Lambda execution role directly, with no assume_role step as in chat.py.

_INFERENCE_PROFILE_ARN = os.environ["BEDROCK_INFERENCE_PROFILE_ARN"]
_MODEL_ID = os.environ["BEDROCK_MODEL_ID"]


def _build_model() -> BedrockConverseModel:
    return BedrockConverseModel(
        _MODEL_ID,
        settings={"bedrock_inference_profile": _INFERENCE_PROFILE_ARN},
    )

The handler entry

The handler does three things: ensure Logfire is initialised, pull the prompt off the event, and call agent.run_sync. Everything else (audit shipping, X-Ray tracing, error surfacing) runs through the per-trace processor and the Logfire SDK. There is no try/except in the handler: failures propagate as Lambda 5xx, and the audit copy ships from the processor’s on_end when the root span closes, before the exception unwinds.

class HandlerEvent(TypedDict):
    prompt: NotRequired[str]


class HandlerResponse(TypedDict):
    status: str
    output: str


def handler(event: HandlerEvent, context: object) -> HandlerResponse:
    """Lambda entry point.

    Falls back to a default prompt so the function can be smoke-tested
    with an empty payload.

    The audit copy ships from inside PerTraceAuditProcessor.on_end when
    the agent's root span closes, so the handler does not need a finally
    block: a Firehose failure raises on the same thread as agent.run_sync
    and propagates as a Lambda 5xx. A failed agent run also closes its
    root span (with status=ERROR) before the exception unwinds, so the
    partial trace still ships.
    """
    _init_logfire()
    prompt = event.get("prompt", "Say hello.")
    result = agent.run_sync(prompt)
    return {
        "status": "ok",
        "output": str(result.output),
    }
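
The ordering claim in the docstring (the root span closes, and the audit copy ships, before the exception reaches the Lambda runtime) follows from how context managers unwind. This is a stdlib-only sketch of that ordering; root_span, fake_handler, and the events list are hypothetical stand-ins, and no OTel code is involved.

```python
from contextlib import contextmanager

events: list[str] = []


@contextmanager
def root_span(name: str):
    """Hypothetical stand-in for the agent's root span."""
    status = "UNSET"
    try:
        yield
    except Exception:
        status = "ERROR"
        raise  # re-raise so the caller still sees the failure
    finally:
        # Runs while the exception is still unwinding: this is the point
        # where on_end (and therefore the Firehose put) would fire.
        events.append(f"shipped {name} status={status}")


def fake_handler() -> None:
    with root_span("invoke_agent"):
        raise RuntimeError("model call failed")


try:
    fake_handler()
except RuntimeError as exc:
    # Stand-in for the Lambda runtime turning the exception into a 5xx.
    events.append(f"runtime saw: {exc}")

print(events)
# ['shipped invoke_agent status=ERROR', 'runtime saw: model call failed']
```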

The full file:

"""AWS Lambda handler for the terraform-pr-agent.

First invocation:
- Reads the Logfire token from SSM via the Parameters and Secrets Lambda
  Extension at http://localhost:2773 when LOGFIRE_TOKEN_PARAMETER is set
  in the env; sets LOGFIRE_TOKEN so the logfire SDK picks it up.
- Configures logfire with send_to_logfire="if-token-present" and registers
  PerTraceAuditProcessor, a custom OTel SpanProcessor that buffers spans
  by trace_id and ships one OTLP-JSON Firehose record when the trace's
  root span ends.
- Instruments pydantic-ai with version=5 for the spec-compliant span
  names (invoke_agent, chat, execute_tool) and current GenAI semantic
  conventions. Pinned explicitly so the schema readers see in the
  audit copy stays stable across pydantic-ai releases.

The extension's HTTP server rejects requests during INIT with a
"not ready to serve traffic" 400, so this work runs on the first
INVOKE and is memoised with @cache for subsequent warm invocations.

Handler:
- Reads `prompt` from the invocation event.
- Runs the agent synchronously and returns the structured output.

The audit copy lands in Firehose from inside the processor's on_end
when the agent root span closes, so the handler has no flush logic.
A Firehose-side failure raises on the same thread as agent.run_sync
and propagates as a Lambda 5xx; the system-of-record copy is never
silently dropped.
"""

from __future__ import annotations

import json
import os
import threading
import urllib.parse
import urllib.request
from collections.abc import Callable, Sequence
from functools import cache
from typing import NotRequired, TypedDict

import boto3
import logfire
from google.protobuf import json_format
from logfire.sampling import SamplingOptions
from opentelemetry.context import (
    _SUPPRESS_INSTRUMENTATION_KEY,
    attach,
    detach,
    set_value,
)
from opentelemetry.exporter.otlp.proto.common._internal.trace_encoder import (
    encode_spans,
)
from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel


def _fetch_logfire_token() -> str | None:
    """Read the Logfire token from the Parameters and Secrets extension.

    Returns None when no token parameter is configured, so the function
    runs fine without the Logfire integration. The extension caches the
    value across invocations, so this is cheap on warm starts.
    """
    parameter_name = os.environ.get("LOGFIRE_TOKEN_PARAMETER")
    if not parameter_name:
        return None

    session_token = os.environ["AWS_SESSION_TOKEN"]

    # safe="" forces urllib to percent-encode the leading and embedded
    # slashes in a hierarchical parameter name (e.g. /a/b/c -> %2Fa%2Fb%2Fc).
    # The Parameters and Secrets extension rejects unencoded slashes with
    # HTTP 400; AWS' own Python sample shows the same %2F-encoded form.
    url = (
        "http://localhost:2773/systemsmanager/parameters/get"
        f"?name={urllib.parse.quote(parameter_name, safe='')}&withDecryption=true"
    )
    req = urllib.request.Request(
        url,
        headers={"X-Aws-Parameters-Secrets-Token": session_token},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        payload = json.load(resp)
    return payload["Parameter"]["Value"]


class PerTraceAuditProcessor(SpanProcessor):
    """Buffer spans by trace_id, ship as one batch when the root ends.

    The OTel SDK has no `OnTraceComplete` hook, so this implements it
    against the only signal available: `on_end` fires synchronously and
    `span.parent is None` on a root. Late children (spans ended on a
    transport thread after the root has already shipped) are dropped,
    mirroring logfire's tail sampler. See pydantic/logfire#1034.
    """

    def __init__(
        self,
        on_trace_complete: Callable[[Sequence[ReadableSpan]], None],
    ) -> None:
        self._on_trace_complete = on_trace_complete
        self._buffers: dict[int, list[ReadableSpan]] = {}
        self._shipped: set[int] = set()
        self._lock = threading.Lock()

    def on_end(self, span: ReadableSpan) -> None:
        if not (span.context and span.context.trace_flags.sampled):
            return
        trace_id = span.context.trace_id
        with self._lock:
            if trace_id in self._shipped:
                return
            self._buffers.setdefault(trace_id, []).append(span)
            if span.parent is not None:
                return
            spans = self._buffers.pop(trace_id)
            self._shipped.add(trace_id)
        self._ship(spans)

    def force_flush(self, timeout_millis: int = 30000) -> bool:
        with self._lock:
            pending = list(self._buffers.values())
            self._shipped.update(self._buffers)
            self._buffers.clear()
        for spans in pending:
            self._ship(spans)
        return True

    def shutdown(self) -> None:
        self.force_flush()

    def _ship(self, spans: Sequence[ReadableSpan]) -> None:
        # Suppress instrumentation around the callback so an instrumented
        # boto3/requests client inside it does not emit a span that
        # re-enters on_end for a sibling trace.
        token = attach(set_value(_SUPPRESS_INSTRUMENTATION_KEY, True))
        try:
            self._on_trace_complete(spans)
        finally:
            detach(token)


_firehose = boto3.client("firehose")
_DELIVERY_STREAM = os.environ["FIREHOSE_DELIVERY_STREAM"]


def _ship_trace(spans: Sequence[ReadableSpan]) -> None:
    """Serialise one trace as OTLP-JSON and ship it as a single Firehose record."""
    payload = json_format.MessageToJson(encode_spans(spans), indent=None) + "\n"
    _firehose.put_record(
        DeliveryStreamName=_DELIVERY_STREAM,
        Record={"Data": payload.encode("utf-8")},
    )


@cache
def _init_logfire() -> None:
    """Wire Logfire and the audit span processor once per warm
    container, on the first INVOKE.

    The Parameters and Secrets extension is not ready to serve traffic
    during the Lambda INIT phase, so the token fetch (and the matching
    logfire setup) cannot run at module import time. @cache memoises
    on the empty argument tuple, so this runs exactly once per
    container and is a no-op on every subsequent invocation.
    """
    token = _fetch_logfire_token()
    if token:
        os.environ["LOGFIRE_TOKEN"] = token
    # head=1.0 and tail=None are today's Logfire defaults; pinned here
    # because this is an audit pipeline, so every trace must reach S3.
    # Volume is low (one trace per Lambda invocation) and the audit
    # requirement outweighs Logfire ingest cost. Splitting the rates
    # (e.g. 1% to Logfire, 100% to S3) is possible with a small extra
    # sampler; see the post.
    logfire.configure(
        send_to_logfire="if-token-present",
        sampling=SamplingOptions(head=1.0, tail=None),
        additional_span_processors=[PerTraceAuditProcessor(_ship_trace)],
    )
    # include_content=True is the pydantic-ai default; pinned because the
    # audit copy needs the actual prompts, tool args, and responses to be
    # useful for after-the-fact forensics. If that ever becomes a
    # compliance problem (PII, secrets in prompts), flip to False and
    # accept a metadata-only audit trail.
    logfire.instrument_pydantic_ai(version=5, include_content=True)


_INFERENCE_PROFILE_ARN = os.environ["BEDROCK_INFERENCE_PROFILE_ARN"]
_MODEL_ID = os.environ["BEDROCK_MODEL_ID"]


def _build_model() -> BedrockConverseModel:
    return BedrockConverseModel(
        _MODEL_ID,
        settings={"bedrock_inference_profile": _INFERENCE_PROFILE_ARN},
    )


SYSTEM_PROMPT = (
    "You are the terraform-pr-agent. For now you are a placeholder; "
    "respond briefly to whatever prompt you are given."
)


agent = Agent(_build_model(), system_prompt=SYSTEM_PROMPT)


class HandlerEvent(TypedDict):
    prompt: NotRequired[str]


class HandlerResponse(TypedDict):
    status: str
    output: str


def handler(event: HandlerEvent, context: object) -> HandlerResponse:
    """Lambda entry point.

    Falls back to a default prompt so the function can be smoke-tested
    with an empty payload.

    The audit copy ships from inside PerTraceAuditProcessor.on_end when
    the agent's root span closes, so the handler does not need a finally
    block: a Firehose failure raises on the same thread as agent.run_sync
    and propagates as a Lambda 5xx. A failed agent run also closes its
    root span (with status=ERROR) before the exception unwinds, so the
    partial trace still ships.
    """
    _init_logfire()
    prompt = event.get("prompt", "Say hello.")
    result = agent.run_sync(prompt)
    return {
        "status": "ok",
        "output": str(result.output),
    }
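
One detail in _fetch_logfire_token is easy to miss: urllib.parse.quote treats "/" as safe by default, so a hierarchical parameter name passes through unencoded unless safe="" is given. A quick check, using a hypothetical parameter name:

```python
from urllib.parse import quote

name = "/terraform-pr-agent/logfire-token"  # hypothetical parameter name

print(quote(name))           # /terraform-pr-agent/logfire-token
print(quote(name, safe=""))  # %2Fterraform-pr-agent%2Flogfire-token
```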

Querying traces

DuckDB on your laptop

The audit copy is GZIP’d OTLP-JSON under a Hive-partitioned prefix. Anything that reads NDJSON over S3 can query it; for a single-laptop workflow the lightest option is DuckDB. Install it once (brew install duckdb on macOS; see the install guide for other platforms).

The view reads the bucket name from AUDIT_BUCKET in your shell (via DuckDB’s CLI-only getenv()), so park that alongside AWS_PROFILE in .envrc.local and let direnv export both on every cd:

# Bucket name for the audit copy. The DuckDB view in scripts/traces.sql
# reads it via the CLI-only getenv() function so the SQL stays free of
# account-specific values.
export AUDIT_BUCKET=terraform-pr-agent-audit-$(aws sts get-caller-identity --query Account --output text)-${AWS_REGION}

Then opening the local database file is a one-liner:

duckdb traces.duckdb

First session only, create a persistent S3 secret and a view that flattens the OTLP shape into the columns you actually want. Paste this in (or .read it from a saved file):

-- OR REPLACE keeps the script safe to .read again in later sessions;
-- without it, re-creating the already-persisted secret raises an error.
CREATE OR REPLACE PERSISTENT SECRET (
    TYPE s3,
    PROVIDER credential_chain,
    REFRESH auto
);
SET VARIABLE audit_bucket = getenv('AUDIT_BUCKET');

-- hive_partitioning = true reads year=YYYY/month=MM/day=DD/ from the
-- object path as virtual columns, so the partition predicate in a
-- query below prunes objects before any file is opened.
CREATE OR REPLACE VIEW traces AS
WITH spans AS (
    -- Flatten the OTLP-JSON envelope into one row per span, with the
    -- common span fields lifted out as named columns so downstream
    -- CTEs and ad-hoc queries can work against `name`, `trace_id`,
    -- `dur_ms`, etc. without re-doing the struct navigation each time.
    SELECT
        year, month, day,
        span.name                                                                AS name,
        lower(hex(from_base64(span.traceId::VARCHAR)))                           AS trace_id,
        lower(hex(from_base64(span.spanId::VARCHAR)))                            AS span_id,
        lower(hex(from_base64(span.parentSpanId::VARCHAR)))                      AS parent_span_id,
        make_timestamp_ns(span.startTimeUnixNano::BIGINT)                        AS started,
        make_timestamp_ns(span.endTimeUnixNano::BIGINT)                          AS ended,
        (span.endTimeUnixNano::BIGINT - span.startTimeUnixNano::BIGINT) / 1e6    AS dur_ms,
        span.status.code::VARCHAR                                                AS status_code,
        span.attributes                                                          AS attributes,
        data.filename                                                            AS source_file,
        -- Firehose names objects <stream>-<ver>-<YYYY-MM-DD-HH-MM-SS>-<uuid>.gz;
        -- stripping the trailing -<uuid>.gz collapses rows from the same flush
        -- batch onto a stable key for grouping.
        regexp_replace(
            split_part(data.filename, '/', -1),
            '-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.gz$',
            ''
        )                                                                        AS batch_key
    FROM read_ndjson(
        's3://' || getvariable('audit_bucket') || '/traces/**/*.gz',
        compression = 'gzip', hive_partitioning = true, filename = true) AS data
       , UNNEST(data.resourceSpans) AS u1(rs)
       , UNNEST(rs.scopeSpans)      AS u2(ss)
       , UNNEST(ss.spans)           AS u3(span)
),
roots AS (
    -- One row per trace: the invoke_agent root span pydantic-ai emits per run.
    SELECT * FROM spans
    WHERE parent_span_id IS NULL OR parent_span_id = ''
),
chats AS (
    -- Per-trace summary of the LLM call: pulls the GenAI semantic
    -- convention attributes off the chat span and exposes each as a
    -- named column.
    SELECT
        trace_id,
        list_filter(attributes, x -> x.key = 'gen_ai.system')[1].value.stringValue                     AS gen_ai_system,
        list_filter(attributes, x -> x.key = 'gen_ai.operation.name')[1].value.stringValue             AS operation,
        list_filter(attributes, x -> x.key = 'gen_ai.request.model')[1].value.stringValue              AS request_model,
        list_filter(attributes, x -> x.key = 'gen_ai.response.model')[1].value.stringValue             AS response_model,
        list_filter(attributes, x -> x.key = 'gen_ai.usage.input_tokens')[1].value.intValue::BIGINT    AS in_tokens,
        list_filter(attributes, x -> x.key = 'gen_ai.usage.output_tokens')[1].value.intValue::BIGINT   AS out_tokens,
        list_filter(attributes, x -> x.key = 'gen_ai.response.finish_reasons')[1]
            .value.arrayValue.values[1].stringValue                                                    AS finish,
        list_filter(attributes, x -> x.key = 'gen_ai.conversation.id')[1].value.stringValue            AS conversation_id,
        list_filter(attributes, x -> x.key = 'gen_ai.agent.name')[1].value.stringValue                 AS agent_name,
        list_filter(attributes, x -> x.key = 'gen_ai.agent.call.id')[1].value.stringValue              AS agent_call_id,
        list_filter(attributes, x -> x.key = 'gen_ai.input.messages')[1].value.stringValue             AS input_messages,
        list_filter(attributes, x -> x.key = 'gen_ai.output.messages')[1].value.stringValue            AS output_messages
    FROM spans
    WHERE name LIKE 'chat %'
)
SELECT
    roots.started,
    roots.year, roots.month, roots.day,
    roots.trace_id,
    substr(roots.trace_id, 1, 8)                       AS trace,
    roots.batch_key,
    roots.source_file,
    roots.dur_ms,
    regexp_extract(chats.request_model, '[^./]+$')     AS model,
    chats.in_tokens,
    chats.out_tokens,
    chats.finish,
    chats.agent_name,
    chats.conversation_id,
    -- Convenience columns for the common single-turn shape: system at
    -- input[0], user at input[1], assistant at output[0]. Multi-turn
    -- runs invalidate the indices, so reach for input_messages and
    -- output_messages directly for those.
    json_extract_string(chats.input_messages,  '$[0].parts[0].content')  AS system_prompt,
    json_extract_string(chats.input_messages,  '$[1].parts[0].content')  AS user_prompt,
    json_extract_string(chats.output_messages, '$[0].parts[0].content')  AS assistant_response,
    chats.input_messages,
    chats.output_messages,
    -- Per the OTel spec, instrumentation libraries leave status unset
    -- on success (only application code may set it to Ok). Every OTel
    -- backend treats unset as "no error reported"; we render the same.
    CASE roots.status_code
        WHEN 'STATUS_CODE_ERROR' THEN 'err'
        ELSE 'ok'
    END                                                AS status
FROM roots
JOIN chats USING (trace_id);
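
If you would rather inspect a single record outside DuckDB, the same flattening can be sketched in plain Python. The record below is hypothetical and trimmed to a handful of fields; it assumes the shape protobuf's json_format produces, where bytes fields such as traceId are base64 strings and 64-bit integers are JSON strings, which is also what the view's from_base64 and ::BIGINT casts rely on. The helper names attr and flatten are illustrative only.

```python
from __future__ import annotations

import base64
import json

# Hypothetical one-span record; real payloads carry many more fields.
line = json.dumps({
    "resourceSpans": [{
        "scopeSpans": [{
            "spans": [{
                "name": "chat example-model",
                "traceId": base64.b64encode(
                    bytes.fromhex("019ea8ee" + "00" * 12)
                ).decode(),
                "startTimeUnixNano": "1000000000",
                "endTimeUnixNano": "2500000000",
                "attributes": [
                    {"key": "gen_ai.usage.input_tokens", "value": {"intValue": "36"}},
                    {"key": "gen_ai.usage.output_tokens", "value": {"intValue": "59"}},
                ],
            }]
        }]
    }]
})


def attr(span: dict, key: str) -> dict | None:
    """Python analogue of list_filter(attributes, x -> x.key = ...)[1].value."""
    for item in span.get("attributes", []):
        if item["key"] == key:
            return item["value"]
    return None


def flatten(record: dict) -> list[dict]:
    """One output row per span, like the three UNNEST steps in the view."""
    rows = []
    for rs in record["resourceSpans"]:
        for ss in rs["scopeSpans"]:
            for span in ss["spans"]:
                start = int(span["startTimeUnixNano"])
                end = int(span["endTimeUnixNano"])
                tokens = attr(span, "gen_ai.usage.input_tokens")
                rows.append({
                    "name": span["name"],
                    "trace_id": base64.b64decode(span["traceId"]).hex(),
                    "dur_ms": (end - start) / 1e6,
                    "in_tokens": int(tokens["intValue"]) if tokens else None,
                })
    return rows


rows = flatten(json.loads(line))
print(rows[0]["trace_id"][:8], rows[0]["dur_ms"], rows[0]["in_tokens"])
# 019ea8ee 1500.0 36
```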

After that, every later duckdb traces.duckdb session lands you in a shell where the view already exists. One caveat: the view body references getvariable('audit_bucket'), and SET VARIABLE is session-scoped, so a fresh session needs the variable re-set before any SELECT against traces will run. The simplest fix is to .read scripts/traces.sql again at the top of each session: CREATE OR REPLACE VIEW is idempotent, and the same script sets the variable. Hive partitioning means year, month, and day are real columns that DuckDB prunes on before opening any object:

SELECT *
FROM traces
WHERE year = 2026
  AND month = 6
  AND day = 8
ORDER BY started DESC;

-- Example output (narrowed to 7 of the view's 21 columns; SELECT * also
-- returns trace_id, agent_name, conversation_id, system_prompt, user_prompt,
-- assistant_response, input_messages, output_messages, status, batch_key,
-- source_file, year/month/day):
--
-- ┌───────────────────────────────┬──────────┬────────────────────────────────┬─────────────┬───────────┬────────────┬────────┐
-- │            started            │  trace   │             model              │   dur_ms    │ in_tokens │ out_tokens │ finish │
-- ├───────────────────────────────┼──────────┼────────────────────────────────┼─────────────┼───────────┼────────────┼────────┤
-- │ 2026-06-08 20:31:06.044709547 │ 019ea8ee │ claude-haiku-4-5-20251001-v1:0 │ 6815.733257 │ 36        │ 59         │ stop   │
-- │ 2026-06-08 20:31:05.984812728 │ 019ea8ee │ claude-haiku-4-5-20251001-v1:0 │ 1459.186377 │ 36        │ 38         │ stop   │
-- │ 2026-06-08 20:31:05.964196348 │ 019ea8ee │ claude-haiku-4-5-20251001-v1:0 │ 2024.133412 │ 36        │ 123        │ stop   │
-- │ 2026-06-08 20:31:05.944446527 │ 019ea8ee │ claude-haiku-4-5-20251001-v1:0 │ 1581.984687 │ 36        │ 50         │ stop   │
-- │ 2026-06-08 20:31:05.926721468 │ 019ea8ee │ claude-haiku-4-5-20251001-v1:0 │ 1480.496671 │ 36        │ 43         │ stop   │
-- └───────────────────────────────┴──────────┴────────────────────────────────┴─────────────┴───────────┴────────────┴────────┘

SELECT day,
       count(*) AS runs
FROM traces
WHERE year = 2026
  AND month = 6
GROUP BY day
ORDER BY day;

-- Example output:
--
-- ┌─────┬──────┐
-- │ day │ runs │
-- ├─────┼──────┤
-- │ 04  │ 2    │
-- │ 08  │ 11   │
-- │ 09  │ 1    │
-- └─────┴──────┘

SELECT day,
       sum(in_tokens)  AS input_tokens,
       sum(out_tokens) AS output_tokens
FROM traces
WHERE year = 2026
  AND month = 6  -- without this, the same day number from different months would be summed together
GROUP BY day
ORDER BY day;

-- Example output:
--
-- ┌─────┬──────────────┬───────────────┐
-- │ day │ input_tokens │ output_tokens │
-- ├─────┼──────────────┼───────────────┤
-- │ 04  │ 80           │ 43            │
-- │ 08  │ 397          │ 605           │
-- │ 09  │ 38           │ 145           │
-- └─────┴──────────────┴───────────────┘

The view is the interface; everything downstream is normal SQL against a traces table that happens to live in S3. The partition predicates are the same shape the Athena section below uses.
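
Partition pruning and the batch_key column both work from the object key alone, before any bytes are read. The stdlib-only sketch below shows the idea; the keys are hypothetical (stream name, version, and UUIDs are made up), and partitions and batch_key are illustrative helpers rather than anything DuckDB or Firehose exposes.

```python
import re

# Hypothetical object keys following the layout described above.
keys = [
    "traces/year=2026/month=06/day=08/audit-1-2026-06-08-20-31-06-"
    "0f8fad5b-d9cb-469f-a165-70867728950e.gz",
    "traces/year=2026/month=06/day=09/audit-1-2026-06-09-07-02-11-"
    "7c9e6679-7425-40de-944b-e07fc1f90ae7.gz",
]

PARTITION = re.compile(r"(\w+)=([^/]+)/")
UUID_SUFFIX = re.compile(
    r"-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.gz$"
)


def partitions(key: str) -> dict[str, int]:
    """Read year=/month=/day= out of the path without opening the object."""
    return {name: int(value) for name, value in PARTITION.findall(key)}


def batch_key(key: str) -> str:
    """Same idea as the view's regexp_replace: drop the trailing -<uuid>.gz."""
    return UUID_SUFFIX.sub("", key.rsplit("/", 1)[-1])


# Pruning: decide from the key alone which objects a day = 8 query must read.
wanted = [k for k in keys if partitions(k) == {"year": 2026, "month": 6, "day": 8}]
print(len(wanted), batch_key(wanted[0]))
# 1 audit-1-2026-06-08-20-31-06
```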

The local traces.duckdb only holds the view DDL and the cached S3 secret. Trace data is streamed from the bucket on every SELECT via httpfs, not copied locally.

Athena/Snowflake

DuckDB is the simplest choice for the post, but every query pulls bytes to your laptop. Because the bucket is Hive-partitioned, the same query shape works in Athena or Snowflake with no schema rebuild.

Reference:

AWS docs: Athena partition projection for Firehose data. CREATE EXTERNAL TABLE with projection.* properties and a storage.location.template infers partitions from the year=YYYY/month=MM/day=DD/hour=HH/ path at query time, so there is no Glue crawler to schedule or partitions to register as new objects land.
Snowflake docs: creating an S3 stage. Create an external stage over the audit bucket that defines a JSON file format.

The same trace in Logfire

The same data can be viewed independently in the Logfire SaaS platform, provided you signed up and created a token. The screenshot shows that the agent’s system prompt, user prompt, and output have all been captured.

Logfire trace UI showing one terraform-pr-agent invocation: the invoke_agent root span with a nested chat span on claude-haiku-4-5, gen_ai.input.messages and gen_ai.output.messages attributes expanded.

End state

Every agent run lands in S3 as an immutable OTel trace, with full input/output messages, tool calls, retries, and token usage captured. Athena queries the audit copy when you need to answer compliance questions; Logfire renders the same trace live when you need to debug. The next four posts build retries, conventions, evals, and the PR flow on top of these traces; everything they do shows up in the trace tree without further audit plumbing.

Coming next: a workspace and a small toolkit so the agent can get to work.