
From Chaos to Clarity: Starting with Structured Logging

36 min read

Transform your Python logs from unreadable text dumps into powerful, queryable data streams. Perfect for developers ready to level up their debugging game with structlog.

Staff Machine Learning Engineer



About Code Examples

Quick Setup: Examples assume you have structlog installed:

bash
pip install structlog

Variables: Examples use placeholder variables (user_id, error, etc.) - replace with your actual data.

Full Examples: Find complete, runnable code at the end of this guide.

Prerequisites

Required:

  • Python basics (functions, dictionaries, try/except)
  • How to install packages with pip
  • Running Python scripts from command line

Helpful but Not Required (we explain these):

  • SQL basics - We provide all queries with explanations
  • Command-line tools (grep, curl) - All commands are annotated
  • Environment variables - Shown with examples
  • Web frameworks - Only for optional integration sections

The call comes at 3 AM. Production is down. You SSH into the server, and the first thing you see makes your heart sink:

ERROR: Failed to process
ERROR: Failed to process
ERROR: Failed to process
WARNING: Something went wrong
ERROR: Failed to process

Fifty gigabytes of logs (roughly 500 million lines of text). No context. No correlation IDs. Just "Failed to process" repeated thousands of times.

You start with grep:

bash
grep -i error app.log | tail -1000
#    └─ -i: Case-insensitive search (matches ERROR, Error, error, etc.)
#       app.log: The log file to search
#       | tail -1000: Show only the last 1000 matching lines

Same useless message repeated. You try to narrow it down by timestamp, but there are twelve microservices all logging to different files. Some use UTC, others use local time - making it impossible to correlate events across services.

Three hours in, you're building regex patterns that would make a Perl developer weep:

bash
# Search for error patterns in all log files
grep -E "(ERROR|FAIL|Exception).*2024-01-15T0[0-3]:.*payment" *.log | \
#    └─ -E: Extended regex pattern matching
#       (ERROR|FAIL|Exception): Find any of these three words
#       .*: Match any characters (the actual error message)
#       2024-01-15T0[0-3]: Match timestamps between midnight and 3:59 AM
#       .*payment: Match any text containing "payment"
#       *.log: Search in all files ending with .log
  awk '{print $1, $2, $NF}' | \
#  └─ Extract specific fields from each matching line
#     $1: First field (usually timestamp)
#     $2: Second field (usually log level)
#     $NF: Last field (NF = Number of Fields, so $NF = the last field)
  sort | uniq -c
#  └─ sort: Arrange output alphabetically
#     uniq -c: Count how many times each unique line appears

This command attempts to find payment-related errors between midnight and 4 AM, extract key fields, and count occurrences. The complexity shows why text-based log analysis doesn't scale. Finally, buried in service number seven, you spot it: timeout errors that started exactly when the third-party payment API changed their timeout from 10 seconds to 30 seconds without telling anyone.

This scenario represents a systemic failure in traditional logging approaches. When critical debugging depends on manual text parsing at scale, the cost--measured in both downtime and engineer burnout--becomes unsustainable.

The financial impact of these delays adds up quickly (see detailed cost analysis below). I learned this lesson the hard way during a payment processing outage where we lost $180,000 in just two hours--not from the bug itself, but from the time it took to find it.

There's a much better approach. Here's the same scenario with structured logging:

json
{
  "event": "payment_processing_failed",
  "service": "payment-gateway",
  "error_type": "timeout",
  "api_endpoint": "https://api.stripe.com/v1/charges",
  "timeout_ms": 30000,
  "expected_timeout_ms": 10000,
  "user_id": "usr_123",
  "amount": 299.99,
  "currency": "USD",
  "correlation_id": "req_abc123",
  "timestamp": "2024-01-15T03:23:45.123Z"
}

With structured logs, you'd run one query:

sql
SELECT * FROM logs 
-- SELECT *: Retrieve all columns/fields from matching records
WHERE error_type = 'timeout' 
-- WHERE: Filter records based on conditions
  AND service = 'payment-gateway'
-- AND: Both conditions must be true
  AND timestamp > '2024-01-15T00:00:00'
-- >: Greater than comparison - finds logs after this date/time

Root cause found in under a minute. Not four hours.

What This Guide Covers

This guide shows how to implement structured logging in Python, from basic concepts to production use.

What you'll learn:

  • Why traditional logging fails at scale (with real numbers) and the shift to structured events
  • How to use structlog effectively (and why it beats Loguru for production)
  • 5-minute quickstart plus production-ready configurations that scale
  • Integration patterns for FastAPI, Flask, Celery, and more
  • Performance optimization for high-throughput logging and correlation across distributed services
  • Real migration stories and lessons from when things went wrong

Quick Start: See the Difference

Python 3.6 Compatibility

If you're stuck on Python 3.6, use structlog 20.1.0 - the last version with 3.6 support. But seriously, upgrade Python. 3.6 hit end-of-life in December 2021.

Before we look at the problems, let's see structured logging in action:

python
import logging

# Sample data for this example
user_id = 12345
error = "Connection timeout"

# Traditional approach: Everything is baked into a string
logging.warning(f"User {user_id} payment failed: {error}")

# Output:
# WARNING:root:User 12345 payment failed: Connection timeout

# To find all payment failures, you need regex gymnastics:
# grep "payment failed" app.log | grep "timeout" | wc -l
# If someone changes "failed" to "failure", your query breaks entirely

The problem: This log line mixes data (user_id, error) with text ("payment failed"). To extract any specific information, you need to parse the entire string with regex - and hope the format never changes.

Understanding Logging: A Foundation for Excellence

Logging creates an audit trail through your application's execution path. When failures occur--and they will--these recorded events help you reconstruct what happened. Think of it as a flight recorder for your code. I've come to think of good logging as insurance: you hope you never need it, but when things go wrong, you'll be incredibly grateful it's there.

Debugging approaches evolve naturally as systems mature:

python
# The universal starting point
def process_order(order_id, amount):
    print(f"Processing order {order_id}")
    print(f"Amount: ${amount}")
    # ... code ...
    print("Order processed successfully!")

# Problems:
# - No severity levels
# - No timestamps
# - Hard to filter
# - Clutters stdout

Why We Need Logs

Logs serve multiple critical purposes:

  • Debug failures when your Discord bot crashes at 2 AM
  • Identify performance bottlenecks when response times spike
  • Answer business questions ("Why did order processing drop 50% yesterday?")
  • Provide audit trails for security incidents
  • Show what actually happened when customer support gets that angry email

The Problem We're Solving

Traditional logging treats logs as text meant for humans to read. But humans don't scale. Your eyeballs can't grep through 50GB of logs.

Modern systems need logs that machines can query. You need to search for "all errors for user_id=12345" and get instant results. You need to ask "How many payments failed per hour?" and get an answer with one query, not thirty minutes of grep and awk. When a request flows through your entire system, you need to track it--not guess where it went.

As applications mature, logging complexity grows exponentially--from simple print("User logged in") to complex formatted strings with dozens of variables. This is the unstructured logging paradox: the more useful you try to make your logs, the harder they become to actually use. Every developer formats logs differently, creating technical debt that compounds across microservices.

Text-only logs make debugging harder than it needs to be.

The Crisis: Why Traditional Logging Fails at Scale

Beyond Emergency Response: A Systematic Analysis

The scenario presented in our introduction isn't unusual--it's predictable. Traditional logging approaches contain fundamental flaws that guarantee failure at scale.

Across the industry, certain failure patterns show up consistently. These aren't isolated incidents--they're core problems with traditional logging:

The Invisible Error Pattern

The core issue? Unstructured logs are write-only data - easy to create, nearly impossible to analyze at scale. When you have 10K orders/minute, finding a specific failure requires complex grep chains that take minutes to run. We learned this after adding "helpful" context to our logs for six months straight.

Fundamental Failure Modes

Traditional logging has problems that get worse at scale:

Parsing Brittleness

Here's a real production regular expression (simplified for clarity):

python
import re

# The regex nightmare begins
log_pattern = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s+'
    r'(?P<level>\w+):\s+'
    r'(?:User\s+(?P<user_id>\d+)|user=(?P<user_id2>\d+))\s+'
    r'(?:payment\s+failed|Failed\s+to\s+process\s+payment):\s+'
    r'(?P<reason>.*?)(?:\s+after\s+(?P<duration>\d+)s)?$'
)

# Functions until the log format changes
logger.error(f"Payment failed for user {user_id}: {reason}")  # Original
logger.error(f"User {user_id} payment failure: {reason}")     # New intern's version
# The regex fails silently, returning None

This regex actually failed in production when someone changed "User" to "UserID" in the log format. The regex silently returned None, and our alerting system missed 3 hours of payment errors.
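The failure mode is easy to reproduce with a much simpler pattern (this regex is a stripped-down stand-in, not the production one above):

```python
import re

# Simplified log-parsing pattern, coupled to one exact wording
pattern = re.compile(r"Payment failed for user (?P<user_id>\d+): (?P<reason>.+)")

old_line = "Payment failed for user 42: card_declined"
new_line = "User 42 payment failure: card_declined"  # reworded by a well-meaning teammate

print(pattern.search(old_line))  # matches
print(pattern.search(new_line))  # None - the parser breaks without raising anything
```

No exception, no log, no alert: the `None` just flows downstream, which is exactly how monitoring built on log parsing goes dark.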

Performance at Scale

The performance hit from unstructured logging gets worse at scale. We measured this in production:

bash
# Real benchmark from our production system (100GB daily logs, ~1B lines)
# Running on AWS EC2 m5.xlarge instance (4 vCPUs, 16GB RAM)

# Unstructured approach with grep
$ time grep -i "timeout.*error" /var/log/app/*.log | wc -l
# 43,521 matches found
# real    2m17.332s  (137 seconds - enough time to lose a customer)
# CPU usage: 100% on single core (grep isn't parallelized)

# Structured approach with indexed Elasticsearch
GET /logs/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"error_type": "timeout"}},
        {"range": {"@timestamp": {"gte": "now-1d"}}}
      ]
    }
  }
}
# 43,521 matches found
# Took: 4.7 seconds  (29x faster - the difference between finding the bug now vs. during your lunch break)
# Elasticsearch version: 7.15.2 with 3 nodes, 2TB total storage

Cost of slow debugging at scale: see "The Real Cost of Unstructured Logging" below.

The difference is architectural: grep performs a linear scan (O(n) complexity), while structured logging systems use indexes (O(log n) complexity). Performance improvements of 10-100x are common.
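A toy illustration of that asymptotic gap, using a plain inverted index in Python (nothing like Elasticsearch's real internals, but the same idea: pay the indexing cost once at write time, then answer queries without rescanning):

```python
# 100k synthetic log lines, two of which mention "timeout"
logs = [f"line {i} ok" for i in range(100_000)]
logs[12_345] = "line 12345 ERROR timeout"
logs[67_890] = "line 67890 ERROR timeout"

# grep-style: linear scan, touches every line on every query - O(n)
scan_hits = [i for i, line in enumerate(logs) if "timeout" in line]

# index-style: build a term -> line-numbers map once, then lookups are dict hits
index = {}
for i, line in enumerate(logs):
    for term in line.split():
        index.setdefault(term, []).append(i)
index_hits = index.get("timeout", [])

print(scan_hits == index_hits == [12345, 67890])  # True
```

Both approaches find the same two lines; the difference is that the scan repeats all the work per query while the index answers each subsequent query in near-constant time.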

Context Loss

Traditional logging loses important context through string serialization (converting data objects into plain text):

python
# Traditional logging - important data gets lost
try:
    payment = process_payment(user, amount, card)
    logger.info(f"Payment {payment.id} succeeded for user {user.id}")
    # f-string: Python 3.6+ string formatting. Variables in {} are evaluated
except PaymentError as e:
    logger.error(f"Payment failed for user {user.id}: {str(e)}")
    # Missing: subscription tier, country, card type, error codes, retry info

String serialization means converting complex objects (like user, payment, error) into simple text using f-strings or .format(). This process discards the object's rich data structure, keeping only what you explicitly include in the string template.

You find yourself asking questions your logs can't answer. Which premium users had payment failures? Are European transactions failing more often? How many retries before this error? The data was there when the code ran, but string formatting threw it away.

python
# Structured logging keeps everything
try:
    payment = process_payment(user, amount, card)
    log.info("payment_processed",
        payment_id=payment.id,
        user_id=user.id,
        user_tier=user.subscription_tier,
        user_country=user.country,
        amount=amount,
        currency=payment.currency,
        card_type=card.type,
        processing_time_ms=payment.duration_ms
    )
except PaymentError as e:
    log.error("payment_failed",
        user_id=user.id,
        user_tier=user.subscription_tier,
        user_country=user.country,
        amount=amount,
        card_type=card.type,
        error_code=e.decline_code,
        error_message=str(e),
        retry_count=e.retry_count,
        gateway_response_time_ms=e.response_time,
        request_id=request.id,
        session_id=session.id
    )

Now this query works: user_tier="premium" AND error_code="insufficient_funds" AND user_country="DE"

We measured the difference:

  • Payment debugging went from 45 minutes to 3 minutes
  • We capture 25-30 fields instead of 3-5
  • Finding related logs is automatic with request_id instead of manual grep

Last month we had a checkout failure that touched 3 services. With traditional logs, an engineer spent 2 hours correlating 6 different log files. After switching to structured logging, the same investigation took 30 seconds with one query: request_id="abc-123" | sort timestamp.
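Conceptually, the query above is just a filter over dictionaries. A tiny sketch with in-memory events (a real log backend does the same thing over an index):

```python
events = [
    {"event": "payment_failed", "user_tier": "premium",
     "error_code": "insufficient_funds", "user_country": "DE"},
    {"event": "payment_failed", "user_tier": "free",
     "error_code": "card_declined", "user_country": "US"},
    {"event": "payment_processed", "user_tier": "premium",
     "error_code": None, "user_country": "DE"},
]

# user_tier="premium" AND error_code="insufficient_funds" AND user_country="DE"
matches = [
    e for e in events
    if e["user_tier"] == "premium"
    and e["error_code"] == "insufficient_funds"
    and e["user_country"] == "DE"
]
print(len(matches))  # 1
```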

Traditional vs Structured: The Complete Comparison

Here's exactly what changes when you switch to structured logging:

Here's the same error in both formats:

python
# Traditional: A string that needs parsing
"2024-01-15 10:30:45 ERROR [req-123] PaymentService: Failed to process payment for user 12345: Gateway timeout after 30s. Amount: $99.99"

# Structured: Immediately queryable data
{
    "timestamp": "2024-01-15T10:30:45.123Z",
    "level": "ERROR",
    "service": "PaymentService",
    "event": "payment_failed",
    "request_id": "req-123",
    "user_id": 12345,
    "error_type": "gateway_timeout",
    "timeout_seconds": 30,
    "amount": 99.99,
    "currency": "USD",
    "gateway": "stripe",
    "retry_count": 3
}

With structured logging, you can instantly:

  • Find all timeouts over 20 seconds: timeout_seconds > 20
  • Calculate revenue impact: SUM(amount) WHERE event="payment_failed"
  • Track user experience: COUNT(*) WHERE user_id=12345 AND level="ERROR"

The Real Cost of Unstructured Logging

Industry data makes the cost concrete:

Real Impact: According to the State of DevOps Report, high-performing teams that invest in observability (including structured logging) achieve up to 9x faster incident resolution than their peers. The time saved adds up--every hour not spent debugging is an hour spent building better products.

These numbers represent the hidden technical debt that accumulates every day you delay implementing structured logging.

The cost isn't just financial--it's measured in developer burnout and lost opportunities.


The Solution: Logs as Queryable Data Streams

The solution addresses these core issues. Instead of treating logs as unstructured text, we treat them as structured data records. This shift from text to data turns logging from a debugging afterthought into a powerful observability tool.

structlog emerges as the optimal balance of capability and maintainability--powerful enough for enterprise use, simple enough for quick productivity.

From Strings to Events: A Different Way of Thinking

The key change involves reconceptualizing "log messages" as "log events":

python
# Traditional approach: Writing messages for human consumption
logger.info(f"User {user_id} logged in successfully from {ip_address}")

# Structured approach: Recording events as data
log.info("user_login", user_id=user_id, ip_address=ip_address, success=True)

The traditional approach yields text. The structured approach generates queryable data:

json
{
  "event": "user_login",
  "user_id": 12345,
  "ip_address": "192.168.1.100",
  "success": true,
  "timestamp": "2024-01-15T10:30:45.123Z",
  "level": "info"
}

Log entries evolve from strings requiring parsing into data structures for direct processing.
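The practical difference in one snippet: extracting `user_id` from each form. The regex encodes a fragile assumption about the text's wording; the JSON needs no such guess:

```python
import json
import re

text_line = "2024-01-15 10:30:45 INFO User 12345 logged in from 192.168.1.100"
json_line = ('{"event": "user_login", "user_id": 12345, '
             '"ip_address": "192.168.1.100", "success": true}')

# Text: you must reverse-engineer the format, and hope it never changes
m = re.search(r"User (\d+) logged in from (\S+)", text_line)
user_id_from_text = int(m.group(1))

# Data: the field is just a key
record = json.loads(json_line)
user_id_from_json = record["user_id"]

print(user_id_from_text, user_id_from_json)  # 12345 12345
```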

Practical Implementation

The following examples show this change, starting simple and getting more advanced.

Example 1: Discord Bot Debugging

Say you have a Discord bot that crashes during specific commands. Here's how structured logging changes the debugging process:

python
# Traditional logging approach
import logging
import discord
from discord.ext import commands

# discord.py 2.x requires explicit intents
bot = commands.Bot(command_prefix='!', intents=discord.Intents.default())

@bot.command()
async def play_music(ctx, url):
    logging.info(f"User {ctx.author} requested {url}")
    try:
        # ... music playing logic ...
        logging.info(f"Playing {url} for {ctx.author}")
    except Exception as e:
        logging.error(f"Failed to play music: {e}")
        
# Real output:
# 2024-01-12 15:23:18 ERROR: Failed to play music: 'NoneType' object has no attribute 'play'
# No guild info, no voice channel state. Took 2 hours to realize bot wasn't connected.

Example 2: Web Authentication Systems

A more complex scenario: authentication failures hitting only premium users in European regions during peak traffic.

With structured logging (as shown in the comparison above), we can capture geo-location, user tier, and user agent as first-class queryable fields. This enables finding region-specific issues instantly instead of grep-ing through millions of lines.

The Real Impact

Structured logging enables answering previously intractable questions:

The transition from regular expressions to direct queries represents more than convenience--it greatly expands analytical capabilities. Questions requiring hours of custom scripting now resolve within seconds.

Example 3: Enterprise Payment Processing

Structured logging demonstrates its full potential when correlating events across distributed services.

Consider this payment processing implementation:

python
# Service A: Payment Gateway
log.info("payment_initiated", 
    transaction_id=txn_id,
    amount=99.99,
    currency="USD",
    trace_id=trace_id  # Consistent ID across all services
)

# Service B: Fraud Detection
log.warning("fraud_check_failed",
    transaction_id=txn_id,
    risk_score=0.89,
    flags=["velocity", "geo_mismatch"],
    trace_id=trace_id  # Consistent identifier
)

# Service C: Notification Service  
log.info("notification_sent",
    transaction_id=txn_id,
    channel="email",
    template="payment_declined",
    trace_id=trace_id  # Consistent identifier
)

With structured logs, you can instantly trace the entire transaction flow:

sql
SELECT * FROM logs 
WHERE trace_id = 'abc-123-def' 
ORDER BY timestamp;

This query reconstructs the complete transaction flow across all services, maintaining temporal ordering.
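The same reconstruction sketched over in-memory events (timestamps and IDs are illustrative; ISO-8601 strings sort correctly as plain text, which is one reason structured logs standardize on them):

```python
logs = [
    {"timestamp": "2024-01-15T03:23:47Z", "service": "fraud-detection",
     "event": "fraud_check_failed", "trace_id": "abc-123-def"},
    {"timestamp": "2024-01-15T03:23:45Z", "service": "payment-gateway",
     "event": "payment_initiated", "trace_id": "abc-123-def"},
    {"timestamp": "2024-01-15T03:23:48Z", "service": "notification",
     "event": "notification_sent", "trace_id": "other-trace"},
]

# WHERE trace_id = 'abc-123-def' ORDER BY timestamp
trace = sorted(
    (entry for entry in logs if entry["trace_id"] == "abc-123-def"),
    key=lambda entry: entry["timestamp"],
)
print([e["event"] for e in trace])  # ['payment_initiated', 'fraud_check_failed']
```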

We started with Discord bot debugging, moved to authentication systems, then distributed payment processing. The same ideas work from small projects to large systems.

Emergent Capabilities

When logs become structured data, new capabilities emerge beyond debugging:

Automatic Dashboards

Instead of parsing logs by hand and updating dashboards when formats change, structured logs enable self-building dashboards from your log fields.

Here's how it works:

python
# This single log line...
log.info("api_request_completed",
    endpoint="/api/v1/users",
    method="GET",
    status_code=200,
    response_time_ms=127,
    user_tier="premium"
)

# Automatically generates dashboards for:
# - API response time by endpoint
# - Error rate by status code
# - Request volume by user tier
# - P95 latency trends (95th percentile - the response time that 95% of requests are faster than)
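Those dashboard metrics fall straight out of the fields. For instance, a back-of-envelope P95 over `response_time_ms` (a naive nearest-rank calculation on toy data; real systems use histograms or sketches rather than sorting every event):

```python
import math

# Hypothetical structured events, one slow outlier among ten requests
events = [
    {"endpoint": "/api/v1/users", "response_time_ms": t}
    for t in (90, 100, 110, 120, 127, 130, 140, 150, 160, 900)
]

times = sorted(e["response_time_ms"] for e in events)
# Nearest-rank P95: the value 95% of requests are at or below
rank = math.ceil(0.95 * len(times)) - 1
p95 = times[rank]
print(p95)  # 900 - the tail is dominated by the one slow request
```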

Proactive Alerting

Quick story: Our old regex-based alerts broke so often we started ignoring them. Classic alert fatigue.

With structured logs? Different game entirely:

yaml
# Alert when premium users experience degraded performance
alert: PremiumUserLatency
expr: |
  histogram_quantile(0.95,
    rate(api_request_duration_ms{user_tier="premium"}[5m])
  ) > 500
# This is Prometheus alerting syntax (PromQL):
# - histogram_quantile(0.95, ...): Calculate 95th percentile
# - rate(...[5m]): Rate of change over 5-minute windows
# - {user_tier="premium"}: Filter to only premium users
# - > 500: Alert if P95 latency exceeds 500ms
for: 5m  # Alert must be true for 5 minutes before firing (prevents flapping)
annotations:
  summary: "Premium users experiencing high latency"
  query: 'event="api_request_completed" AND user_tier="premium" AND response_time_ms>500'

This alert automatically monitors your most valuable users' experience. Without structured logging, you'd need complex regex patterns that break whenever log formats change.

Business Intelligence

Logs become powerful business intelligence resources. Product teams gain autonomous analytical capabilities:

sql
-- What features do enterprise customers use most?
SELECT feature_name, COUNT(*) as usage_count
-- COUNT(*): Counts total number of rows/records
-- as usage_count: Gives the count a readable column name
FROM logs
WHERE event = 'feature_used' 
  AND customer_tier = 'enterprise'
  AND timestamp > NOW() - INTERVAL '30 days'
-- NOW(): Current date/time
-- INTERVAL '30 days': Subtracts 30 days from NOW()
-- Result: Only logs from the last 30 days
GROUP BY feature_name
-- GROUP BY: Combines rows with same feature_name
-- Required when using COUNT() with other columns
ORDER BY usage_count DESC;
-- ORDER BY: Sorts results
-- DESC: Descending order (highest count first)

-- What's our API adoption rate by customer segment?
SELECT 
  customer_segment,
  COUNT(DISTINCT customer_id) as api_users,
  -- COUNT(DISTINCT ...): Counts unique values only
  -- Ensures each customer counted once, even with multiple API calls
  COUNT(DISTINCT customer_id) * 100.0 / total_customers as adoption_rate
  -- * 100.0: Converts to percentage (the .0 ensures decimal math)
FROM logs
WHERE event = 'api_request'
GROUP BY customer_segment;
-- Note: This query assumes 'total_customers' exists as a column
-- In practice, you'd likely JOIN with a customers table

These features work together. Better logging becomes better monitoring, which helps you make better decisions about your systems.


Production Implementation: From Setup to Scale

Moving from learning about structured logging to using it in production has its own challenges. The ideas are simple, but the details matter.

Configuration Strategies

Here are configuration approaches for different needs, from quick setup to enterprise-grade:

python
# minimal_config.py - Foundation configuration
import structlog

# Basic configuration that just works
structlog.configure(
    # Processors: Functions that transform log data in sequence (like a pipeline)
    # Each processor receives the log event dict and returns a modified version
    # Order matters: each processor sees the output of the previous one
    processors=[  # Pipeline docs: https://www.structlog.org/en/stable/processors.html
        # Filter & enrich
        structlog.stdlib.filter_by_level,           # Honor log level settings
        structlog.stdlib.add_logger_name,           # Add module name
        structlog.stdlib.add_log_level,             # Add log level field
        structlog.stdlib.PositionalArgumentsFormatter(), # Handle log.info("user %s", user_id)
        
        # Add metadata
        structlog.processors.TimeStamper(fmt="iso"), # Add timestamp
        structlog.processors.StackInfoRenderer(),     # Add stack info if requested
        structlog.processors.format_exc_info,         # Format exceptions nicely
        structlog.processors.UnicodeDecoder(),        # Handle unicode properly
        
        # Output formatting (dict_tracebacks belongs in JSON configs; after
        # format_exc_info above it would be a no-op here)
        structlog.dev.ConsoleRenderer()               # Pretty colors for development
    ],
    wrapper_class=structlog.stdlib.BoundLogger,
    logger_factory=structlog.stdlib.LoggerFactory(),
    cache_logger_on_first_use=True,  # Improves performance by ~10x
)

# Usage
log = structlog.get_logger()
log.info("service_started", version="1.0.0", environment="development")

What this gives you:

  • ✅ Timestamps on every log
  • ✅ Log levels (info, warning, error)
  • ✅ Pretty colors in development
  • ✅ Exception formatting
  • ✅ Works with existing Python logging

What you see in the console:

2024-01-15 10:30:45 [info    ] service_started    environment=development version=1.0.0

Implementation Roadmap

Here's a week-by-week plan to get structured logging working in your codebase:

Day 1-2: Foundation Building

  • Day 1: Install structlog (pip install structlog), implement foundation configuration, convert your most painful debug spots
  • Day 2: Convert more logs, get comfortable with the syntax

Goal: Make logs queryable by field--no more text parsing.

Day 3-4: Enhanced Context

Now add the context that actually helps during debugging:

  • Implement request correlation - Just add UUIDs to requests. Nothing fancy.
  • Bind user context - log = log.bind(user_id=current_user.id)

Tip

We added request IDs on day 3 and immediately found a race condition we'd been hunting for weeks. Sometimes you get lucky.

Goal: Trace any request through your entire application.

Day 5-6: Production Hardening

Establish production-grade capabilities:

  • Deploy PII masking processor - Implement automatic sensitive data protection
  • Configure environment-specific formatting - Development uses console rendering, production outputs JSON
  • Construct analytical queries - Example: "Retrieve all errors for specific users within time windows"
  • Set up alerts - Configure alerts for important events (e.g., event="payment_failed" AND amount > 100)

Goal: Production-ready logs that meet compliance needs and stay queryable.

Day 7: Knowledge Transfer

Share what you've learned with your team:

  • Document logging patterns - Define standard events and naming conventions
  • Develop team reference guide - Compile common queries and implementation patterns
  • Conduct knowledge transfer - Demonstrate rapid debugging capabilities to team members
  • Plan migration strategy - Identify next services for structured logging adoption

Goal: Get your team excited about using structured logging effectively.

Next Steps

Structured logging takes some setup and training, but the payoff is huge. When you can fix issues in under an hour like DORA's elite teams, it's worth it. Debugging becomes querying instead of searching.

Start Today

Install structlog: pip install "structlog>=21.1.0"

Grab the Quick Start configuration from this guide and convert just 5 log statements. Next time something breaks, you'll see the difference.

For immediate implementation, here's a complete example:

Complete FastAPI Example

Complete FastAPI example with request correlation, performance timing, and automatic error context:

python
# app.py - Complete FastAPI example with structured logging
import os
import structlog
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import time
import uuid
from contextvars import ContextVar

# Context variable for request ID (use async endpoints to avoid context propagation issues)
request_id_var: ContextVar[str] = ContextVar("request_id", default="")

# Configure structured logging
# Production-tested configuration
def configure_logging():
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.StackInfoRenderer(),
            # (no format_exc_info here: dict_tracebacks below consumes exc_info itself)
            structlog.processors.CallsiteParameterAdder(
                parameters=[
                    structlog.processors.CallsiteParameter.FUNC_NAME,
                    structlog.processors.CallsiteParameter.LINENO,
                ]
            ),
            structlog.processors.dict_tracebacks,
            structlog.processors.JSONRenderer()
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        logger_factory=structlog.stdlib.LoggerFactory(),
        cache_logger_on_first_use=True,
    )

# Initialize logging
configure_logging()
log = structlog.get_logger()

# Create FastAPI app
app = FastAPI(title="Structured Logging Demo")

# Middleware to add request ID to all logs
@app.middleware("http")
async def add_request_id(request: Request, call_next):
    request_id = str(uuid.uuid4())
    request_id_var.set(request_id)
    # Bind the ID into structlog's context so merge_contextvars (configured above)
    # attaches it to every log line emitted during this request
    structlog.contextvars.clear_contextvars()
    structlog.contextvars.bind_contextvars(request_id=request_id)
    
    # Log the incoming request
    log.info("request_started",
        method=request.method,
        path=request.url.path,
        request_id=request_id
    )
    
    # Time the request
    start_time = time.time()
    
    try:
        response = await call_next(request)
        duration = time.time() - start_time
        
        # Log the response
        log.info("request_completed",
            method=request.method,
            path=request.url.path,
            status_code=response.status_code,
            duration_ms=round(duration * 1000, 2),
            request_id=request_id
        )
        
        # Add request ID to response headers
        response.headers["X-Request-ID"] = request_id
        return response
        
    except Exception as e:
        duration = time.time() - start_time
        log.error("request_failed",
            method=request.method,
            path=request.url.path,
            error=str(e),
            duration_ms=round(duration * 1000, 2),
            request_id=request_id,
            exc_info=True
        )
        return JSONResponse(
            status_code=500,
            content={"detail": "Internal server error", "request_id": request_id}
        )

# Example endpoints
@app.get("/users/{user_id}")
async def get_user(user_id: int):
    log.info("fetching_user", user_id=user_id)
    
    if user_id == 999:
        log.warning("user_not_found", user_id=user_id)
        return JSONResponse(status_code=404, content={"detail": "User not found"})
    
    user = {"id": user_id, "name": f"User {user_id}", "email": f"user{user_id}@example.com"}
    log.info("user_fetched", user_id=user_id, user_name=user["name"])
    return user

@app.post("/payments")
async def process_payment(amount: float, user_id: int):
    payment_id = str(uuid.uuid4())
    log.info("payment_initiated", payment_id=payment_id, user_id=user_id, amount=amount)
    
    if amount > 10000:
        log.error("payment_failed", payment_id=payment_id, user_id=user_id, amount=amount, reason="Amount exceeds limit")
        return JSONResponse(status_code=400, content={"detail": "Payment amount exceeds limit"})
    
    log.info("payment_completed", payment_id=payment_id, user_id=user_id, amount=amount, processing_time_ms=123)
    return {"payment_id": payment_id, "status": "completed", "amount": amount}

# Execution: uvicorn app:app --reload
# Testing: curl http://localhost:8000/users/123
# Output: JSON-formatted logs suitable for aggregation platforms

Running the Example:

  1. Install dependencies: pip install fastapi uvicorn structlog

  2. Save the code as app.py

  3. Run the server: uvicorn app:app --reload

    Windows users: If you get ModuleNotFoundError: No module named 'fcntl', use --workers 1 flag. The fcntl module is Unix-only.

  4. Test it:

    bash
    # Get a user
    curl http://localhost:8000/users/123
    
    # Process a payment (FastAPI treats these scalar parameters as query parameters)
    curl -X POST "http://localhost:8000/payments?amount=99.99&user_id=123"
    

What You'll See in the Logs:

json
{"event": "request_started", "method": "GET", "path": "/users/123", "request_id": "abc-123", "timestamp": "2024-01-15T10:30:45.123Z", "level": "info"}
{"event": "fetching_user", "user_id": 123, "request_id": "abc-123", "timestamp": "2024-01-15T10:30:45.124Z", "level": "info"}
{"event": "user_fetched", "user_id": 123, "user_name": "User 123", "request_id": "abc-123", "timestamp": "2024-01-15T10:30:45.125Z", "level": "info"}
{"event": "request_completed", "method": "GET", "path": "/users/123", "status_code": 200, "duration_ms": 2.5, "request_id": "abc-123", "timestamp": "2024-01-15T10:30:45.126Z", "level": "info"}

Consistent request_id values enable full request tracing throughout the system.

Continue Learning

For production-scale implementations, refer to our companion post: Structured Logging at Scale: Production Patterns & Advanced Techniques which explores:

  • Deep-dive into processor pipelines and performance optimization
  • Integration patterns for FastAPI, Flask, Django, and Celery
  • Advanced techniques: sampling, batching, and async logging
  • Real-world case studies from companies processing billions of events
  • Migration strategies for large codebases

Start with one log statement. When you're debugging at 2 AM and find the issue in minutes instead of hours, you'll be glad you made the switch.







References
Comprehensive resources for structured logging implementation with Python

Core Documentation
• structlog - Official documentation and performance guidelines
• Python Logging - Standard library docs and cookbook

Getting Started Guides
• Better Stack Team. (2024). A Complete Guide to Logging in Python
• Real Python. (2024). Logging in Python
• Twilio. (2018). Python Logging: An In-Depth Tutorial

Framework Integration
• FastAPI: https://fastapi.tiangolo.com/tutorial/debugging/
• Flask: https://flask.palletsprojects.com/en/2.3.x/logging/
• Django: https://docs.djangoproject.com/en/4.2/topics/logging/
• Celery: https://docs.celeryq.dev/en/stable/userguide/tasks.html#logging

Alternative Libraries
• Loguru - Simplified Python logging
• python-json-logger - JSON formatter

Log Management Platforms
• Elastic (ELK Stack): https://www.elastic.co/what-is/elk-stack
• Datadog: https://docs.datadoghq.com/logs/

Community & Resources
• structlog GitHub
• Python Discord
• Reddit r/Python
• Loggly benchmarks
• Honeycomb on sampling

Research & Industry Data
• IT Downtime Costs - EMA/BigPanda 2024 Report
• DORA Metrics - State of DevOps
• Cost of Downtime - Atlassian Analysis
• Context Switching - Atlassian Research
• Alert Fatigue - AI-Driven Monitoring Study
• Structured Logging Overhead - Tech Couch
• Regex Performance - Last9 Optimization Guide
• Logging Architecture - Netdata Academy
• Datadog Trace Correlation: https://docs.datadoghq.com/tracing/other_telemetry/connect_logs_and_traces/python/