status · page · live
public status page · status.markandrewmarquez.com
// cover · public status page at status.markandrewmarquez.com

Commercial uptime monitoring is a solved problem — Datadog, Pingdom, UptimeRobot, Better Stack. They all work. They also all cost $20–50/month by the time you add enough endpoints and a public status page, which is absurd for a personal setup where I just want to know if my portfolio site, a few dev-tool registries, and the cloud services I depend on are actually reachable. This project is the AWS-native alternative: one CloudWatch Synthetics canary, seven endpoints, a Terraform stack that deploys end-to-end, and a public status page at status.markandrewmarquez.com — all for about the price of a cup of coffee per year.

01 // The Challenge

The brief I set for myself: monitor the services I actually depend on, alert me when they break, and publish uptime history somewhere public — all without the monthly SaaS bill and without writing a single thing I couldn't rebuild from code. The hard constraint was budget: $1–2/month, total. That constraint drove almost every architectural decision that followed.

Two things had to be true for this to work. First, every resource needed to be defined in Terraform so the whole stack could be destroyed and re-applied without ceremony. Second, the monitoring architecture had to collapse cost aggressively — commercial tools charge per monitor, and AWS Synthetics charges per canary run, so the obvious first instinct of "one canary per endpoint" would blow the budget before the first alarm ever fired.

02 // Architecture & Stack

The entire stack lives in a single AWS region (us-east-1, because CloudFront requires its ACM certificate there anyway) and deploys with terraform apply:

03 // Architecture Overview

                 ┌──────────────────────────────┐
                 │  CloudWatch Synthetics Canary │
                 │     (1 canary · 30 min)      │
                 └──────────────┬───────────────┘
                                │  executeHttpStep × 7
                ┌───────────────┼───────────────┐
                ▼               ▼               ▼
        portfolio-site   github-status   ms-graph-api   …
                │               │               │
                └──────────┬────┴────┬──────────┘
                           ▼         ▼
                  SuccessPercent metric  (per step)
                           │
              ┌────────────┼────────────┐
              ▼                         ▼
      CloudWatch Alarms         Lambda (5 min schedule)
              │                         │
              ▼                         ▼
           SNS topic             status.json on S3
              │                         │
              ▼                         ▼
           Email inbox         CloudFront + ACM
                                        │
                                        ▼
                     status.markandrewmarquez.com

04 // The Single-Canary Trick

This is the piece of the architecture I'm most happy with. Synthetics bills per canary run, not per step. A canary that runs every 30 minutes costs roughly the same whether it checks one endpoint or ten. The catch is that a naive implementation — a single fetch() loop with a pass/fail return — gives you one metric for the whole canary, so you can't tell which endpoint broke.

The fix is executeHttpStep(). Each call emits its own SuccessPercent metric in CloudWatch, dimensioned by CanaryName and StepName. So one canary, seven steps, seven independent metrics, seven independent alarms — for the price of one canary run. The script iterates over the monitors list and calls executeHttpStep once per endpoint:

const monitors = ${jsonencode(monitors)};

exports.handler = async () => {
  for (const monitor of monitors) {
    const url = new URL(monitor.url);

    const requestOptions = {
      hostname: url.hostname,
      path:     url.pathname + url.search,
      port:     url.port || (url.protocol === 'https:' ? 443 : 80),
      protocol: url.protocol,
      method:   'GET',
      headers:  { 'User-Agent': 'Amazon-CloudWatch-Synthetics' },
    };

    const stepConfig = {
      includeRequestHeaders:  false,
      includeResponseHeaders: false,
      includeRequestBody:     false,
      includeResponseBody:    monitor.type === 'api',
      continueOnStepFailure:  true,
    };

    await synthetics.executeHttpStep(
      monitor.name,
      requestOptions,
      validateResponse(monitor),
      stepConfig
    );
  }
};

Two subtleties I only found after breaking things:

05 // Endpoint Selection

Seven endpoints, chosen to reflect the services I actually touch in a normal week — not a synthetic "demo" list:

The endpoint list is a Terraform variable. Adding or removing a monitored service is a one-line change and a terraform apply — the script template re-renders, the zip re-packages, and for_each creates or destroys the matching CloudWatch alarm automatically.

06 // Infrastructure as Code

The Terraform project is seven files, split by responsibility so each one does exactly one thing:

cloudwatch-monitor/
├── main.tf                  # Provider config (aws + archive)
├── variables.tf             # Endpoint list, interval, email
├── canary.tf                # S3 artifacts bucket, IAM role, canary
├── alarms.tf                # SNS topic + per-endpoint alarms
├── status-page.tf           # S3, CloudFront, Lambda, ACM cert
├── outputs.tf               # Console URLs, status page URL
└── canary-script/
    └── index.js.tftpl       # Node.js canary template
terraform · apply · output
Terraform apply updating the Synthetics canary resource
// terraform apply — updating the canary monitor resource in place

The alarms file is the piece that benefits most from for_each — one block generates every per-endpoint alarm from the shared monitors variable:

resource "aws_cloudwatch_metric_alarm" "endpoint_alarms" {
  for_each = { for m in var.monitors : m.name => m }

  alarm_name          = "$${var.project_name}-$${each.key}-failed"
  comparison_operator = "LessThanThreshold"
  threshold           = 100
  evaluation_periods  = 2
  datapoints_to_alarm = 2

  metric_name = "SuccessPercent"
  namespace   = "CloudWatchSynthetics"
  statistic   = "Average"
  period      = var.canary_interval_minutes * 60

  dimensions = {
    CanaryName = aws_synthetics_canary.monitor.name
    StepName   = each.key
  }

  treat_missing_data = "breaching"
  alarm_actions      = [aws_sns_topic.alarm_notifications.arn]
  ok_actions         = [aws_sns_topic.alarm_notifications.arn]
}

Two consecutive 100%-success-threshold breaches are required before the alarm fires — one transient network blip at 30-minute intervals isn't enough to wake me up, but an hour of sustained failure will. Missing data is treated as breaching, so if the canary itself stops running entirely, that counts as failure too. ok_actions is wired to the same SNS topic so I get a "back to normal" email when things recover.

cloudwatch · alarms · dashboard
CloudWatch Alarms dashboard — 7 endpoint alarms, all OK
// all seven per-endpoint alarms in OK state — one alarm per executeHttpStep, driven by Terraform for_each

07 // The Status Page Pipeline

The public status page is deliberately boring infrastructure: no server, no database, no framework. Just a JSON file on S3 and a static HTML page that fetches it.

CloudWatch metrics (SuccessPercent per step)
       │
       ▼
Lambda (Python · every 5 min)
  GetMetricData → build status.json
       │
       ▼
S3 bucket (status.markandrewmarquez.com)
       │
       ▼
CloudFront + ACM cert (us-east-1)
       │
       ▼
Browser ← fetch('/status.json') every 30s

The Lambda function is about 60 lines of Python. It reads the endpoint list from an environment variable, calls GetMetricData for each step's SuccessPercent over the last 24 hours, and writes a small JSON document with each endpoint's current status and recent uptime percentage. The static frontend polls that file every 30 seconds and re-renders — green dots for healthy, red for failing, a last-updated timestamp so it's obvious when the data is stale.

The DNS handoff is the only piece Terraform doesn't manage: GoDaddy hosts the zone for markandrewmarquez.com, so adding the CNAME for the status subdomain is a manual two-minute step in the GoDaddy console after terraform apply finishes. Terraform outputs the CloudFront distribution domain as a convenience so there's nothing to copy-paste from the console.

cloudwatch · synthetics · canary run
CloudWatch Synthetics canary run — passed, 15.6s duration, scheduled run
// canary run detail — 101% success across all steps, 15.6s total duration

08 // Results

▸ cost
~$1.73/monthSingle canary at 30-minute intervals, dominating the bill. Everything else sits in free-tier noise.
▸ coverage
7 endpoints · 1 canaryPer-step metrics mean every endpoint gets its own alarm without per-endpoint billing.
▸ deploy
One commandterraform apply stands up the entire stack from zero, including CloudFront and ACM.
▸ alerting
Email in under 2 minAlarm fires after 2 breaches × 30-minute period; SNS delivery is effectively instant.
▸ public
Status page livestatus.markandrewmarquez.com · static, cached at the edge, polls status.json every 30 s.
▸ state
Fully reproducibleDestroy & re-apply in under 10 minutes. Secrets stay out of the repo; state stays local.

09 // What I Took From It

The biggest lesson is the same one that keeps showing up across every build: constraints make architecture better. A $2/month budget eliminated every lazy option and forced the single-canary pattern, which turned out to be the cleanest design anyway.

10 // Try It

Open-source at github.com/marky224/cloudwatch-monitor. Live status page at status.markandrewmarquez.com. Full setup: an AWS account with CLI credentials configured, Terraform 1.5+, an email address to receive alerts, and a domain if you want the public status page. Clone, copy terraform.tfvars.example to terraform.tfvars, set your email, then terraform init && terraform apply. The endpoint list lives in variables.tf — swap in the services you actually care about and re-apply.