AWS or DigitalOcean for a small team?

DigitalOcean (or Hostinger VPS) for predictable workloads under ~$200/month in spend. The operational simplicity is real: fewer services, simpler pricing, no surprise bills. Move to AWS once you need its breadth (Lambda, RDS, CloudFront, SQS) or you're already past ~$500/month in spend, where the per-service economics flip. Migrating from a VPS to AWS takes real work, but it's manageable if you design your app to be stateless (state in Postgres and S3, not on disk).

Do I really need Kubernetes?

Almost certainly not. Most products ship and scale fine on a single VM with Docker Compose, on AWS ECS Fargate for managed containers, or even on a serverless platform. K8s is the right call when you have 10+ services with complex orchestration needs, multiple teams, and someone whose actual job is to maintain the cluster. Adopting K8s without those conditions is one of the most common architectural mistakes I see: it adds far more operational complexity than the project needs.

What's the simplest production-grade CI/CD setup?

GitHub Actions with one workflow per deployment target (staging, production), explicit job dependencies, secrets in Actions secrets, and approval gates for production. For zero-downtime VM deployments, use blue-green via Nginx with health checks. That's the entire shape. Anything fancier — Argo, Spinnaker, custom Jenkins — solves problems you probably don't have. Get this minimal setup right first; you can always grow into more later.

How do I monitor a Node.js production app without enterprise tooling?

Three pieces: Sentry for error tracking (genuinely good signal-to-noise), structured JSON logs shipped to one central place (CloudWatch, Better Stack, or Datadog), and a basic metrics dashboard (CPU, memory, error rate, request rate, p95 latency). Alerts on error rate spikes, sustained high latency, deploy failures. That setup catches most production issues. Every alert must be actionable and have a runbook — alert fatigue is the real failure mode for small teams.

When should I add a CDN like CloudFront or Cloudflare?

Always for static assets (images, CSS, JS): the cost is negligible and the performance benefit is significant. For dynamic content, add one when you have users outside your origin region, when traffic spikes need absorbing, or when you want DDoS protection. CloudFront integrates cleanly with S3 and ELB; Cloudflare is often easier to set up and has a generous free tier. Don't wait for performance complaints; set this up at launch.

AWS, DigitalOcean & DevOps for Indie Engineers

Why is my AWS bill so high?

Usually one of three things: cross-AZ or cross-region data transfer (the silent killer), idle resources nobody remembered they spun up (NAT gateways, unattached EBS volumes, dev RDS instances), or oversized RDS instances bought during a usage spike that never came back down. Set up billing alerts (AWS Budgets or Cost Anomaly Detection), do a monthly cost review in Cost Explorer, and tag every resource so you know what it's for. The fix is almost always in those three areas, not in some exotic service.

Infra without a platform team.

Most cloud content is written for FAANG-scale engineering organisations with dedicated platform teams, full-time SRE staff, and budgets that don't have to be defended quarterly. That's not me. That's not most of the people reading this. This category is for the rest of us: engineers shipping real products without an SRE org, often as the only person on call, who need infrastructure that's robust enough to let us sleep at night but small enough to actually understand.

I've deployed on AWS, DigitalOcean, Hostinger VPS, Vercel, and a few smaller providers, across both serverless and traditional VM-based architectures, with budgets ranging from "personal credit card" to "small team's monthly burn." The posts here are the patterns I've trusted across that range. They optimise for understandability and operational simplicity first, performance and cost optimisation second — because the failure mode for small teams is almost always "we built something we don't understand any more," not "we couldn't get the last 5% of performance."

Why this category exists

Read enough DevOps content online and you'll come away thinking you need Kubernetes, service mesh, custom Helm charts, and a six-person SRE team to ship a CRUD app. You don't. You can ship serious products with a single EC2 instance, an Nginx config, and a GitHub Action. I've done it. I've seen it scale further than the K8s evangelists will admit.

This category is the corrective. It assumes you're running lean, with a team of one to ten people, and that operational simplicity is a feature, not a compromise. It covers the smallest amount of infra you can get away with for a given goal, and the specific points where you should and shouldn't add complexity. It also covers the bigger AWS topics (CloudFront, S3, Lambda, RDS) when they earn their place, with the honest cost analysis that most cloud tutorials skip.

The audience is the engineer who has to choose between "spend three weeks setting up Kubernetes" and "spend three days setting up Docker Compose on a single VPS." Most of the time, the second choice is correct. Posts here help you tell which times are which.

What you'll find here

The posts in this category cover five broad areas: AWS for small teams, the smaller-cloud alternatives (DigitalOcean, Hostinger, Vercel), CI/CD without complexity, observability that's actually useful, and cost optimisation that doesn't require a finance team.

AWS for small teams — EC2, S3, CloudFront, Route 53, Lambda, RDS — when each makes sense, with the configurations that won't require an AWS certification to maintain
DigitalOcean, Hostinger, Vercel — when they beat AWS, when they don't, real cost numbers across providers, and the operational story for each
CI/CD without complexity — GitHub Actions patterns that are easy to read, easy to debug, and easy to maintain — including zero-downtime deployment for traditional VM setups
Observability — logs, metrics, alerts, the minimal viable monitoring that catches the bugs that matter, and how to avoid alert fatigue
Cost optimisation — why your AWS bill is too high, the specific cost categories that pile up silently, and the audit patterns that catch them

AWS for small teams: the small set of services that matter

AWS has hundreds of services. You need maybe seven: EC2 for compute when you need full control, Lambda for event-driven workloads, RDS for managed databases, S3 for object storage, CloudFront for CDN, Route 53 for DNS, and one messaging service (SQS for queues or EventBridge for events) for asynchronous work. That's the working set for the vast majority of products. Everything else AWS sells is either a managed version of one of these (with operational tradeoffs to discuss) or solves a problem you don't yet have.

The configuration patterns I trust: EC2 instances behind an Application Load Balancer with a target group, autoscaling group with conservative scaling rules, instances replaced rather than mutated for upgrades. RDS in a private subnet with read replicas added only when query patterns justify them. CloudFront in front of both static assets (S3) and the load balancer (for dynamic content), with cache rules tuned per-route. Posts here cover the specific Terraform or CDK shapes for each of these patterns, with the tradeoffs explained.

The AWS pricing model is its own learning curve. Most small teams' AWS bills are dominated by one of three things: data transfer (especially cross-AZ and cross-region), idle resources nobody remembered they spun up, or oversized RDS instances bought during a usage spike that never came back down. Posts here walk through the audit process I use monthly to catch these.
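Part of a monthly audit like this can be mechanical. The sketch below is an illustration, not the audit from the posts: it assumes volume records shaped like the EC2 `DescribeVolumes` response (`VolumeId`, `State`, `Tags`), and the required tag names `project` and `owner` are a hypothetical tagging policy.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical tagging policy: every resource must carry these tag keys.
REQUIRED_TAGS = {"project", "owner"}


def tag_keys(resource: dict) -> Set[str]:
    """Return the lower-cased tag keys found on a resource record."""
    return {tag["Key"].lower() for tag in resource.get("Tags", [])}


def audit_volumes(volumes: Iterable[dict]) -> Dict[str, List[str]]:
    """Flag volumes that are unattached or missing required tags."""
    findings: Dict[str, List[str]] = {"unattached": [], "untagged": []}
    for volume in volumes:
        volume_id = volume["VolumeId"]
        # EC2 reports State == "available" for a volume attached to nothing.
        if volume.get("State") == "available":
            findings["unattached"].append(volume_id)
        # Subset test: every required key must be present on the volume.
        if not REQUIRED_TAGS <= tag_keys(volume):
            findings["untagged"].append(volume_id)
    return findings
```

Fed with the `Volumes` list from an EC2 client, the two lists become the starting point for the review: unattached volumes are candidates for deletion, untagged ones need an owner before anyone can decide.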

The IAM story is the hardest part of AWS for most teams. The principle of least privilege is correct but expensive in time. The pragmatic pattern: roles per service (not per user), MFA enforced on the root account, access keys rotated annually, CloudTrail enabled. That gets you 80% of the security benefit for 20% of the configuration effort. Posts here cover the specific role shapes that have held up across audits.

DigitalOcean and Hostinger: when they beat AWS

For a meaningful range of projects, DigitalOcean or Hostinger VPS beats AWS on cost, simplicity, and total time-to-deploy. Specifically: monolithic apps with predictable load, where you don't need the breadth of AWS services, and where you'd rather not pay the AWS pricing complexity tax.

A $20/month DigitalOcean droplet running Docker Compose with Nginx, Postgres, and your app will handle more traffic than most products ever see. The operational story is straightforward — SSH in, look at logs, restart services if needed. You won't get the managed-everything story AWS gives you, but you also won't get the AWS bill or the AWS pricing surprises. For early-stage products, this trade is almost always worth it.

Hostinger VPS is similar but cheaper and with marginally less polished tooling. I've used both extensively. The shape of project where each wins: DigitalOcean for SaaS where you want clean Spaces (S3-compatible) integration, Hostinger for cost-sensitive deployments where every dollar matters. Posts here cover the specific deployment patterns I've used on each.

Migrating from a VPS to AWS later takes real work, but it's manageable. If you architect your app to be reasonably stateless (state in Postgres and S3, not on disk), the migration is mostly about reproducing the network topology and re-pointing DNS. Posts here cover the specific migration playbooks I've used, with the gotchas that don't show up in tutorials.

CI/CD without becoming a full-time job

The CI/CD setup that works for most small teams: GitHub Actions. The patterns that keep it maintainable: one workflow per deployment target (staging, production), one job per concern (build, test, deploy), explicit dependencies between jobs, secrets in Actions secrets (not in code), and approval gates for production. That's the entire shape. Anything fancier — Argo, Spinnaker, custom Jenkins — is a problem you may grow into, but you almost certainly don't have today.

Zero-downtime deployment on a single VM is more achievable than people think. The pattern I use: blue-green via a reverse proxy (Nginx) with health checks, where the old version stays running until the new version reports healthy. For Docker-based deployments, this is a 30-line GitHub Actions workflow. For non-Docker, slightly more, but still tractable. Posts here cover the specific configurations with worked examples.
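The control flow of that blue-green switch is small enough to sketch. This is a minimal illustration under stated assumptions, not the workflow from the posts: `start_new`, `check`, `switch_traffic`, and `stop` are hypothetical callables standing in for whatever starts your containers, probes the health endpoint, and repoints the reverse proxy in your setup.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10,
                       delay_seconds: float = 2.0) -> bool:
    """Poll a health check until it passes or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


def blue_green_deploy(live: str,
                      start_new: Callable[[str], None],
                      check: Callable[[str], bool],
                      switch_traffic: Callable[[str], None],
                      stop: Callable[[str], None],
                      attempts: int = 10,
                      delay_seconds: float = 2.0) -> str:
    """Release onto the idle colour and return the colour now serving traffic."""
    idle = "green" if live == "blue" else "blue"
    start_new(idle)
    if not wait_until_healthy(lambda: check(idle), attempts, delay_seconds):
        # The new version never became healthy: tear it down and leave
        # the old version serving traffic untouched.
        stop(idle)
        return live
    # Only a healthy release receives traffic; the old one stops afterwards.
    switch_traffic(idle)
    stop(live)
    return idle
```

The property worth preserving in any real version is the ordering: traffic moves only after the health check passes, and the old version is stopped only after traffic has moved.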

The database migration story is where most CI/CD pipelines get hairy. The pattern that's worked across multiple teams: migrations run as a separate job, gated by manual approval for production, with explicit rollback procedures documented. Schema changes that aren't backwards-compatible are flagged and require a two-phase deploy (deploy code that works with both old and new schema, run migration, deploy code that only works with new schema). Posts here cover the specific patterns for both SQL (Postgres) and NoSQL (MongoDB) migrations.
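To make the two-phase deploy concrete, here is a minimal sketch of moving a `name` column to `full_name`. It uses Python's built-in `sqlite3` as a stand-in for Postgres, and the `users` table, column names, and function names are all hypothetical.

```python
import sqlite3
from typing import Optional, Set


def column_names(conn: sqlite3.Connection, table: str) -> Set[str]:
    """List a table's columns (the table name is hard-coded by callers, never user input)."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def read_display_name(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Phase 1 code: works against both the old and the new schema."""
    column = "full_name" if "full_name" in column_names(conn, "users") else "name"
    row = conn.execute(f"SELECT {column} FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def migrate(conn: sqlite3.Connection) -> None:
    """The migration job: add the new column and backfill it from the old one."""
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
    conn.execute("UPDATE users SET full_name = name")
    conn.commit()


def read_display_name_new_only(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Phase 2 code: deployed only after the migration has run everywhere."""
    row = conn.execute("SELECT full_name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None
```

The old `name` column is left in place here; dropping it would be a separate, later migration once nothing reads it. A real transitional deploy would also need to write both columns while old and new code overlap, which this read-only sketch leaves out.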

Testing in CI is its own topic. The pragmatic prioritisation I follow: unit tests for business logic (fast, ubiquitous), integration tests for the critical user paths (slower, focused), end-to-end tests for the truly user-facing flows (slowest, sparse). Don't try to E2E test everything — the maintenance burden will kill you. Posts here cover the specific Playwright and Vitest patterns I trust.

Observability that catches the bugs that matter

The minimum viable monitoring story: structured JSON logs shipped to a central place (CloudWatch, Datadog, or Better Stack), basic metrics (CPU, memory, error rate, request rate) on a dashboard, alerts on the few signals that mean "something is actually wrong" (high error rate, sustained high latency, deploys failing). That setup catches most production issues. Anything more elaborate should earn its place.
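For the structured-logs piece, the sketch below shows the general shape: one JSON object per line on stdout, which a log shipper can forward to a central place. It uses Python's standard `logging` module purely for illustration; the field names and the `context` key passed through `extra` are assumptions, not a required schema.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Anything passed as extra={"context": {...}} lands on the record.
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            for key, value in context.items():
                payload.setdefault(key, value)  # core fields win on a name clash
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def build_logger(name: str = "app", stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as `build_logger().info("order created", extra={"context": {"order_id": 42}})` then emits one searchable line carrying `order_id` as a top-level field, which is what makes filtering in the log backend cheap.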

The single highest-leverage observability investment for small teams: error tracking with Sentry (or Bugsnag, or Rollbar; they're all fine). The signal-to-noise ratio is dramatically better than log searching, and the ability to see error frequency over time often surfaces issues you didn't know you had. Posts here cover the specific Sentry setup patterns and the categorisation discipline that makes it actually useful.

Alert fatigue is the real failure mode for small teams. The discipline I've converged on: every alert must be actionable, every alert must have a runbook, alerts that fire more than once a week without being a real issue get deleted. The dashboard you check every Monday morning is more valuable than ten dashboards nobody looks at.
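The deletion rule can be applied mechanically during a weekly review. A minimal sketch, assuming you record each firing as an `(alert_name, was_real_issue)` pair; that record shape and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def noisy_alerts(firings: Iterable[Tuple[str, bool]],
                 max_false_per_week: int = 1) -> List[str]:
    """Return alerts that fired without a real issue more than the allowed count.

    `firings` holds one week of (alert_name, was_real_issue) pairs.
    """
    false_counts = Counter(name for name, was_real in firings if not was_real)
    return sorted(name for name, count in false_counts.items()
                  if count > max_false_per_week)
```

Anything this returns is a candidate for deletion or retuning; alerts that fired for real issues are never counted against themselves.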

Cost monitoring belongs in observability too. Setting AWS billing alerts, watching the daily cost trend, and doing a monthly cost review catches the runaway bills before they become emergencies. Posts here cover the specific patterns for AWS Cost Explorer and the third-party tools (Vantage, Cloud Custodian) that earn their place for slightly larger teams.

Secrets management: the boring layer that prevents the worst outages

Secrets in code, hardcoded API keys in seed scripts, AWS access keys with `NEXT_PUBLIC_` prefixes that quietly ship to every visitor — every team has done at least one of these, including teams that should have known better. The cost of getting this wrong is asymmetric: one leaked secret can mean a six-figure cloud bill from cryptocurrency miners overnight, or worse.

The pattern that works for small teams: secrets in environment variables, never in code; a single source of truth (`.env` for local dev, a managed secrets store for production); secret rotation as a documented process, not a "we'll figure it out when we need to" hope. AWS Secrets Manager, DigitalOcean Functions secrets, and Vercel env vars are all fine choices — pick one based on where the rest of your infrastructure lives.
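Two small checks cover much of this pattern: fail fast when a required secret is missing, and flag any variable that carries the `NEXT_PUBLIC_` prefix yet looks like a secret. The sketch below is illustrative; the marker list and function names are assumptions, and a name-based check will miss secrets with innocuous names.

```python
import os
from typing import Dict, Iterable, List, Mapping

# Hypothetical heuristics: substrings that suggest a variable holds a secret.
SENSITIVE_MARKERS = ("SECRET", "TOKEN", "PASSWORD", "PRIVATE", "ACCESS_KEY")
PUBLIC_PREFIX = "NEXT_PUBLIC_"


def load_required(names: Iterable[str],
                  env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read required secrets from the environment, failing fast on any gap."""
    wanted = list(names)
    missing = [name for name in wanted if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {name: env[name] for name in wanted}


def exposed_secret_names(env: Mapping[str, str] = os.environ) -> List[str]:
    """List variables that are public-prefixed yet named like secrets."""
    return sorted(
        name for name in env
        if name.startswith(PUBLIC_PREFIX)
        and any(marker in name.upper() for marker in SENSITIVE_MARKERS)
    )
```

Running `exposed_secret_names()` as a CI step and failing the build on a non-empty result is one way to catch the prefix mistake before it ships.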

The discipline that's hardest to maintain: rotating secrets after team members leave, after suspected exposures, and on a periodic schedule even when nothing has changed. This is the thing that gets skipped first when teams are under pressure to ship, and it's the thing that bites hardest when it bites. Posts here cover the specific rotation playbook I follow, including the dev-environment-doesn't-go-down patterns that make rotation actually feasible to schedule.

The audit trail piece matters too. CloudTrail (for AWS) and equivalents elsewhere give you a record of who used which credential when. Most small teams have these features enabled by default and never look at the logs. The pattern I use: a quarterly review of access patterns, looking for anything unusual, and a hard alert on any usage of access keys from unfamiliar IP ranges.
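The unfamiliar-IP check can be sketched with the standard `ipaddress` module. This assumes event dicts shaped like CloudTrail records, using their `sourceIPAddress`, `eventName`, and `userIdentity.accessKeyId` fields; the trusted CIDR list and the function name are hypothetical, and how you fetch the events is left out.

```python
import ipaddress
from typing import Iterable, List


def unfamiliar_key_usage(events: Iterable[dict],
                         trusted_cidrs: Iterable[str]) -> List[dict]:
    """Return access-key calls whose source IP is outside every trusted range."""
    networks = [ipaddress.ip_network(cidr) for cidr in trusted_cidrs]
    flagged = []
    for event in events:
        key_id = event.get("userIdentity", {}).get("accessKeyId")
        if not key_id:
            continue  # not an access-key call
        source = event.get("sourceIPAddress", "")
        try:
            address = ipaddress.ip_address(source)
        except ValueError:
            # Calls made by AWS services record a DNS name here, not an IP.
            continue
        if not any(address in network for network in networks):
            flagged.append({
                "accessKeyId": key_id,
                "sourceIPAddress": source,
                "eventName": event.get("eventName"),
            })
    return flagged
```

Skipping service-originated calls is a simplification that keeps the alert quiet; a stricter version might review those separately.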

Common mistakes I see (and have made)

The patterns repeat across cloud deployments:

Kubernetes when Docker Compose would do — adopting massive operational complexity to solve problems you don't have
Snowflake servers — VMs that have been hand-configured over time, impossible to reproduce, terrifying to touch
No backups — or backups that haven't been tested with a real restore
Secrets in code — finding API keys in git history during a security audit
No staging environment — testing in production because staging never got set up properly
Open security groups — RDS or Redis accessible from the world because the security group was opened "temporarily"
Resources without tags — six months in, nobody can remember what each EC2 instance is for
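Of the mistakes above, open security groups are among the easiest to check mechanically. The sketch below assumes group records shaped like the EC2 `DescribeSecurityGroups` response (`IpPermissions`, `IpRanges`, `Ipv6Ranges`); the port list is a hypothetical starting set.

```python
from typing import List

# Hypothetical watch list: datastore ports that should never face the internet.
SENSITIVE_PORTS = {5432: "Postgres", 3306: "MySQL", 6379: "Redis"}
WORLD = {"0.0.0.0/0", "::/0"}


def world_open_ports(group: dict) -> List[int]:
    """Return the sensitive ports a security group exposes to any address."""
    exposed = set()
    for rule in group.get("IpPermissions", []):
        cidrs = {entry.get("CidrIp") for entry in rule.get("IpRanges", [])}
        cidrs |= {entry.get("CidrIpv6") for entry in rule.get("Ipv6Ranges", [])}
        if not cidrs & WORLD:
            continue  # rule is scoped to specific ranges
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            exposed.update(SENSITIVE_PORTS)
            continue
        if protocol != "tcp":
            continue
        low, high = rule.get("FromPort"), rule.get("ToPort")
        if low is None or high is None:
            continue
        exposed.update(port for port in SENSITIVE_PORTS if low <= port <= high)
    return sorted(exposed)
```

Any non-empty result is worth a look, since the "temporary" rule that opened the port is usually the one nobody remembers adding.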

What's coming next in this category

The next few posts on the docket: a deep-dive into the specific AWS architecture I use for SaaS deployments (with CDK code and cost numbers), a comparison of DigitalOcean and Hostinger for early-stage products with real deployment configurations, a GitHub Actions workflow walkthrough for zero-downtime deployment, a post on the Sentry setup that's caught the most production bugs across multiple projects, and a long-overdue piece on AWS cost optimisation specifically for small teams.

If there's a cloud or DevOps problem you're stuck on, the contact form on the homepage works. Reader questions drive the queue here.
