Image Processing — Validation, Resizing and Cloud Storage

Raw image uploads from users need processing before storage: validating dimensions and format, stripping private EXIF metadata (GPS coordinates, device info), creating standard-size thumbnails, and converting to a consistent format (WebP for web delivery). The Pillow library handles all of this in Python. For production applications, images are typically uploaded to cloud storage (AWS S3, Cloudflare R2, or a compatible service) rather than the local filesystem — a server filesystem is ephemeral in containerised deployments and does not scale across multiple instances.

Image Validation and Processing with Pillow

pip install Pillow boto3   # boto3 for AWS S3

from PIL import Image, UnidentifiedImageError
from io import BytesIO
import uuid

MAX_IMAGE_SIZE    = 10 * 1024 * 1024   # 10 MB
MAX_WIDTH         = 4096               # px
MAX_HEIGHT        = 4096               # px
THUMBNAIL_SIZE    = (400, 400)         # profile avatar thumbnail
ALLOWED_FORMATS   = {"JPEG", "PNG", "WEBP", "GIF"}

def process_avatar(image_bytes: bytes) -> tuple[bytes, bytes]:
    """
    Validate, strip EXIF, resize, and convert an uploaded avatar.
    Returns (original_webp_bytes, thumbnail_webp_bytes).
    """
    # Validate it is an actual image (not a renamed script)
    try:
        img = Image.open(BytesIO(image_bytes))
        img.verify()   # checks for corruption
        img = Image.open(BytesIO(image_bytes))   # re-open after verify
    except Exception as e:   # UnidentifiedImageError, OSError, SyntaxError, ...
        raise ValueError(f"Invalid image: {e}") from e

    # Validate format
    if img.format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format: {img.format}. Allowed: {', '.join(sorted(ALLOWED_FORMATS))}")

    # Validate dimensions
    w, h = img.size
    if w > MAX_WIDTH or h > MAX_HEIGHT:
        raise ValueError(f"Image too large: {w}x{h} px (max {MAX_WIDTH}x{MAX_HEIGHT})")

    # Normalise the mode first: palette ("P") images such as GIFs would lose
    # their colours if the raw indices were copied without the palette.
    if img.mode not in ("RGB", "RGBA"):
        has_alpha = img.mode in ("LA", "PA") or "transparency" in img.info
        img = img.convert("RGBA" if has_alpha else "RGB")

    # Strip EXIF metadata (GPS, camera info, etc.) by pasting only the pixels
    # into a fresh image, which starts with an empty info dictionary.
    # This also drops the EXIF Orientation tag; call ImageOps.exif_transpose()
    # beforehand if rotated phone photos must stay upright.
    clean = Image.new(img.mode, img.size)
    clean.paste(img)
    img = clean

    # Save original as WebP (better compression than JPEG)
    orig_buf = BytesIO()
    img.save(orig_buf, format="WEBP", quality=85)
    original_bytes = orig_buf.getvalue()

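    # Create thumbnail: thumbnail() shrinks the copy in place to fit within
    # THUMBNAIL_SIZE, preserving aspect ratio. It neither crops nor pads, so a
    # non-square source yields a non-square thumbnail.
    thumb = img.copy()
    thumb.thumbnail(THUMBNAIL_SIZE, Image.LANCZOS)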
    thumb_buf = BytesIO()
    thumb.save(thumb_buf, format="WEBP", quality=80)
    thumbnail_bytes = thumb_buf.getvalue()

    return original_bytes, thumbnail_bytes
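
thumbnail() only shrinks an image to fit inside the requested box, so a 1600×900 upload becomes 400×225 rather than 400×400. If the avatar has to be exactly square, Pillow's ImageOps module offers a centre crop (fit) and a letterbox (pad). The following is a minimal sketch of both; the helper names square_crop and square_pad and the white padding colour are illustrative assumptions, not part of process_avatar.

```python
from PIL import Image, ImageOps

THUMBNAIL_SIZE = (400, 400)

def square_crop(img: Image.Image, size: tuple[int, int] = THUMBNAIL_SIZE) -> Image.Image:
    """Scale and centre-crop so the result fills `size` exactly."""
    return ImageOps.fit(img, size, method=Image.LANCZOS, centering=(0.5, 0.5))

def square_pad(img: Image.Image, size: tuple[int, int] = THUMBNAIL_SIZE) -> Image.Image:
    """Scale to fit inside `size`, then pad the short side with a solid colour."""
    return ImageOps.pad(img, size, method=Image.LANCZOS, color="white")

wide = Image.new("RGB", (1600, 900), "navy")      # stand-in for an uploaded photo

fitted = wide.copy()
fitted.thumbnail(THUMBNAIL_SIZE, Image.LANCZOS)   # aspect ratio kept, not square
cropped = square_crop(wide)                       # exact square, edges cut off
padded = square_pad(wide)                         # exact square, bars added

print(fitted.size, cropped.size, padded.size)
```

A centre crop usually suits avatars, since faces tend to sit near the middle of the frame; padding keeps the whole picture at the cost of visible bars.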

Note: img.verify() checks the file for structural damage without decoding the pixel data, and the image object must not be used afterwards. Re-open the image from the bytes before doing anything else: Image.open(BytesIO(data)).verify(), then img = Image.open(BytesIO(data)). This is the pattern Pillow's documentation prescribes. Skipping the re-open typically makes the next operation fail, with the exact exception depending on the image format and Pillow version.

Tip: WebP typically produces files 25–35% smaller than JPEG at comparable visual quality and, like PNG, supports transparency. Converting uploaded images to WebP on the server reduces storage costs and improves delivery speed for web clients. All current major browsers support WebP. If you must support older clients, also generate a JPEG fallback; for a new blog application targeting modern browsers, WebP-only is fine.

Warning: EXIF stripping matters for user privacy. GPS coordinates embedded in smartphone photos can reveal the user’s home address or workplace, and camera serial numbers can fingerprint devices. Saving the raw uploaded bytes preserves all of this metadata. Always strip EXIF before storing or serving user-uploaded images, unless your application explicitly needs it (a photography portfolio, GPS-tagged content).
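
One side effect of stripping EXIF is that the Orientation tag disappears too, so photos a phone stored sideways and marked "rotate to display" can end up rotated. A minimal sketch of one way to handle this is shown below: apply the rotation to the pixels with ImageOps.exif_transpose() first, then discard the metadata. The function name strip_exif_keep_orientation and the tiny generated test image are assumptions made for illustration.

```python
from io import BytesIO
from PIL import Image, ImageOps

ORIENTATION_TAG = 0x0112   # standard EXIF tag id for Orientation

def strip_exif_keep_orientation(image_bytes: bytes) -> Image.Image:
    """Apply the EXIF rotation to the pixels, then drop all metadata."""
    img = Image.open(BytesIO(image_bytes))
    img = ImageOps.exif_transpose(img)       # rotate/flip pixels per the tag
    if img.mode not in ("RGB", "RGBA"):
        img = img.convert("RGB")
    clean = Image.new(img.mode, img.size)    # fresh image: no metadata attached
    clean.paste(img)                         # copies pixels only
    return clean

# Build a small JPEG tagged with orientation 6 ("rotate 90 degrees clockwise
# to display"), standing in for a portrait photo from a phone.
exif = Image.Exif()
exif[ORIENTATION_TAG] = 6
src = BytesIO()
Image.new("RGB", (40, 20), "red").save(src, format="JPEG", exif=exif)

clean = strip_exif_keep_orientation(src.getvalue())

# Re-encode and re-open to check what a downloader would actually receive.
out = BytesIO()
clean.save(out, format="JPEG")
reopened = Image.open(BytesIO(out.getvalue()))

print(clean.size, len(reopened.getexif()))
```

Re-opening the encoded output, as in the last lines, is also a convenient check to put in a unit test so that a later refactor cannot silently start leaking metadata.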

Upload Endpoint with Image Processing

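import uuid

from fastapi import APIRouter, Depends, UploadFile, File, HTTPException
from sqlalchemy.orm import Session

# AvatarResponse, User, get_db, get_current_user, get_or_create_profile,
# UPLOAD_DIR, process_avatar and MAX_IMAGE_SIZE come from elsewhere in the app.

router = APIRouter()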

@router.post("/users/me/avatar", response_model=AvatarResponse)
async def upload_avatar(
    file:         UploadFile = File(..., description="Profile image (JPEG/PNG/WebP/GIF, max 10 MB)"),
    db:           Session    = Depends(get_db),
    current_user: User       = Depends(get_current_user),
):
    # Read and size-check
    content = await file.read()
    if len(content) > MAX_IMAGE_SIZE:
        raise HTTPException(413, "Image too large (max 10 MB)")

    # Process image
    try:
        original_bytes, thumb_bytes = process_avatar(content)
    except ValueError as e:
        raise HTTPException(422, str(e))

    # Generate paths
    uid           = str(uuid.uuid4())
    original_key  = f"avatars/{current_user.id}/{uid}.webp"
    thumbnail_key = f"avatars/{current_user.id}/{uid}_thumb.webp"

    # Save to disk (or upload to S3 — see below)
    save_path = UPLOAD_DIR / original_key
    save_path.parent.mkdir(parents=True, exist_ok=True)
    save_path.write_bytes(original_bytes)
    (UPLOAD_DIR / thumbnail_key).write_bytes(thumb_bytes)

    # Update profile
    profile = get_or_create_profile(db, current_user.id)
    profile.avatar_url       = f"/uploads/{original_key}"
    profile.avatar_thumb_url = f"/uploads/{thumbnail_key}"
    db.flush()   # flush only sends the UPDATE; a commit must follow (assumed to happen in get_db)

    return {
        "avatar_url":       profile.avatar_url,
        "avatar_thumb_url": profile.avatar_thumb_url,
    }
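
The endpoint calls file.read() with no limit, so the whole upload is held in memory before its size is checked. A minimal sketch of a bounded, chunked read follows. The names read_limited, UploadTooLarge, CHUNK_SIZE and FakeUpload are hypothetical; the helper only assumes an object with an async read(size) method, which FastAPI's UploadFile provides.

```python
import asyncio
from io import BytesIO

MAX_IMAGE_SIZE = 10 * 1024 * 1024   # 10 MB, same limit as above
CHUNK_SIZE = 64 * 1024              # assumed chunk size; tune as needed

class UploadTooLarge(Exception):
    """Raised as soon as the upload exceeds the allowed size."""

async def read_limited(upload, max_bytes: int = MAX_IMAGE_SIZE) -> bytes:
    """Read `upload` in chunks, aborting once more than `max_bytes` arrive."""
    buf = BytesIO()
    while chunk := await upload.read(CHUNK_SIZE):
        if buf.tell() + len(chunk) > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        buf.write(chunk)
    return buf.getvalue()

class FakeUpload:
    """Minimal stand-in for UploadFile, used only to exercise the helper."""
    def __init__(self, data: bytes):
        self._stream = BytesIO(data)

    async def read(self, size: int = -1) -> bytes:
        return self._stream.read(size)

small = asyncio.run(read_limited(FakeUpload(b"x" * 1000), max_bytes=4096))

try:
    asyncio.run(read_limited(FakeUpload(b"x" * 200_000), max_bytes=100_000))
    rejected = False
except UploadTooLarge:
    rejected = True

print(len(small), rejected)
```

In the endpoint this would replace the read-then-check pair with content = await read_limited(file), mapping UploadTooLarge to HTTPException(413, ...). By the time the handler runs, the framework has usually already received the request body, so this caps memory use in the handler rather than network transfer; rejecting oversized requests early is normally left to the reverse proxy.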

Uploading to AWS S3 (or Compatible)

import boto3
from botocore.exceptions import ClientError
from fastapi import HTTPException

s3 = boto3.client(
    "s3",
    aws_access_key_id     = settings.aws_access_key_id,
    aws_secret_access_key = settings.aws_secret_access_key,
    region_name           = settings.aws_region,
)

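def upload_to_s3(data: bytes, key: str, content_type: str = "image/webp") -> str:
    """Upload bytes to S3 and return the object's URL.

    The URL is only publicly reachable if the bucket policy allows public reads.
    """
    try:
        s3.put_object(
            Bucket      = settings.s3_bucket,
            Key         = key,
            Body        = data,
            ContentType = content_type,
        )
    except ClientError as e:
        # Do not echo the raw boto3 error to the client; log it server-side instead
        raise HTTPException(500, "Storage upload failed") from e
    # Virtual-hosted-style URL; this form is specific to AWS S3
    return f"https://{settings.s3_bucket}.s3.{settings.aws_region}.amazonaws.com/{key}"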

# In the upload endpoint, replace the disk save with:
url       = upload_to_s3(original_bytes,  original_key)
thumb_url = upload_to_s3(thumb_bytes,     thumbnail_key)
profile.avatar_url       = url
profile.avatar_thumb_url = thumb_url
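
boto3 is synchronous, so calling upload_to_s3 directly inside an async def endpoint blocks the event loop for the duration of each upload. One common way around this, sketched below assuming Python 3.9 or newer, is to hand the blocking calls to worker threads with asyncio.to_thread. Here blocking_upload is a hypothetical stand-in for upload_to_s3, and store_avatar and the example.invalid URLs are illustrative only.

```python
import asyncio
import time

def blocking_upload(data: bytes, key: str) -> str:
    """Hypothetical stand-in for upload_to_s3(): blocks while it 'uploads'."""
    time.sleep(0.05)                       # simulates network latency
    return f"https://example.invalid/{key}"

async def store_avatar(original: bytes, thumb: bytes, uid: str) -> tuple[str, str]:
    """Run both blocking uploads in worker threads, concurrently."""
    url, thumb_url = await asyncio.gather(
        asyncio.to_thread(blocking_upload, original, f"avatars/{uid}.webp"),
        asyncio.to_thread(blocking_upload, thumb, f"avatars/{uid}_thumb.webp"),
    )
    return url, thumb_url

urls = asyncio.run(store_avatar(b"full", b"thumb", "demo"))
print(urls)
```

The same wrapper suits process_avatar, which is CPU-bound. Another option is to declare the endpoint with plain def, in which case FastAPI runs it in a thread pool; the upload would then be read through the synchronous file.file object instead of await file.read().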

Common Mistakes

Mistake 1 — Forgetting to re-open image after verify()

❌ Wrong — closed file after verify:

img = Image.open(BytesIO(content))
img.verify()   # the image object is unusable after this call
img.resize(...)   # typically raises; the exact error depends on format and Pillow version

✅ Correct — re-open after verify:

Image.open(BytesIO(content)).verify()   # verify separately
img = Image.open(BytesIO(content))       # re-open for use ✓

Mistake 2 — Saving uploaded images without EXIF stripping

❌ Wrong — GPS coordinates preserved in stored image:

with open(dest, "wb") as f:
    f.write(raw_upload_bytes)   # stores all EXIF including GPS!

✅ Correct — process through Pillow to strip metadata.

Mistake 3 — Storing images on local filesystem in Docker/Kubernetes

❌ Wrong — files lost on container restart or across multiple pods:

Path("uploads/avatar.webp").write_bytes(data)   # ephemeral container storage!

✅ Correct — use cloud storage (S3, R2) or a persistent volume for production.

Quick Reference

Task                     Code
Open image from bytes    Image.open(BytesIO(content))
Validate image           Image.open(BytesIO(bytes)).verify() then re-open
Create thumbnail         img.thumbnail((400, 400), Image.LANCZOS)
Save to bytes            buf = BytesIO(); img.save(buf, "WEBP"); buf.getvalue()
Strip EXIF               Create new image, copy pixel data without metadata
Upload to S3             s3.put_object(Bucket=..., Key=..., Body=data)

🧠 Test Yourself

A user uploads a photo taken on their phone. After storing it, another user’s browser downloads it and sees the original GPS coordinates embedded in the file. What went wrong and how do you fix it?