Claude Code Routines: How I Automate Real Work on a Schedule (2026)

How I use Claude Code routines to triage production Cloudflare Worker errors every 6 hours: the prompt, the API calls, and the trust boundaries that keep it safe.

Claude Code routines are scheduled cloud agents that run a prompt on a cron schedule without you in the loop. I use them for the boring, recurring work I used to do by hand. The one I lean on most runs every 6 hours, reads production exceptions from my Cloudflare Workers, diagnoses each one against the codebase, and then either opens a pull request with a fix, files a ClickUp task and tags me, or just posts an all-clear to Slack. This guide shows exactly how it works, the prompt behind it, and the rules that keep it from doing anything stupid.

The concept is simple, and so is the setup. Because Slack and ClickUp connect through Claude Connectors, wiring up a routine like this takes under five minutes: pick the connectors, paste the prompt, set the schedule. The value is entirely in the judgment you encode and the boundaries you set. That is what this post is about.

What are Claude Code routines?

Claude Code routines are scheduled cloud agents that run a defined prompt on a cron schedule, autonomously, with access to the tools and MCP servers you grant them. They run in an isolated cloud environment instead of your terminal, report back when finished, and need no human prompting to start. Use a routine when a task is recurring, rule-based, and worth doing whether or not you remember to do it.

If you have used Claude Code in your terminal, a routine is the same agent with two differences: it runs on a schedule you set instead of when you type, and it runs in the cloud instead of on your machine. You give it a prompt, a cadence, and a set of tools. It wakes up in a fresh session with zero prior context, does the work, and tells you what it did.

People also call these "Claude Code scheduled tasks" or "Claude routines". Same thing.

Why I automate triage first

I pick automation targets with one question: what do I do on a recurring basis that follows rules I can write down?

Log triage was the obvious first answer. Before the routine, my morning looked like this: open the Cloudflare dashboard, scan for exception spikes, cross-reference anything weird against open issues, decide whether it was real or noise, and only then start actual work. Twenty to thirty minutes of context-switching before writing a line of code, every day, and I still missed things that happened overnight.

That task fits a routine because the decisions are mostly mechanical:

  • Is this error new, or already tracked?
  • Is it a real code bug, a config issue, or just a bot probing a route?
  • Was it an unhandled crash, or an error we already caught?
  • Can I describe a safe fix, or do I need a human to look closer?

Those are rules. Rules are exactly what an agent can run on a schedule.

The triage routine, end to end

Here is the whole shape before the details. This runs against one of my production apps, a Cloudflare Workers project:

Every 6 hours (fresh cloud session, no prior context):
1. Compute a 360-minute window from `date -u`
2. Run a saved Observability query against the Cloudflare API
   (production workers only; skip anything -staging / -preview / -dev)
3. Dedupe within the run, then classify each error
4. Read the codebase to diagnose
5a. Isolated, testable code bug -> open ONE PR against `canary` (TDD, never main)
5b. Anything ambiguous -> create a ClickUp task and tag me
6. Post a summary to Slack #claude-logs (every run, even when clean)

Four moving parts: the query, the prompt, the schedule, and the output contract. The output contract is the part people skip, and it is the most important.

Step 1: A saved query against the Workers Observability API

I keep a saved query inside Cloudflare's Workers Observability and the routine calls it by id. Authentication is two environment variables ($CF_API_TOKEN and $CF_ACCOUNT_ID) and a plain curl. No wrangler, no installed tooling, because a fresh cloud session should stay minimal and predictable. The routine enumerates production workers and ignores anything ending in -staging, -preview, or -dev.
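A minimal sketch of that suffix rule, assuming the worker names have already been listed one per line. The function names `is_production_worker` and `filter_production_workers` are hypothetical and only illustrate the filter; how the names get enumerated is left out.

```shell
# Hypothetical helper: succeeds (exit 0) only for names that do NOT end in
# one of the non-production suffixes named in the text.
is_production_worker() {
  case "$1" in
    *-staging|*-preview|*-dev) return 1 ;;
    *) return 0 ;;
  esac
}

# Reads worker names on stdin (one per line) and prints only the
# production ones. Empty lines are skipped.
filter_production_workers() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    if is_production_worker "$name"; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `printf 'api\napi-staging\nbilling\n' | filter_production_workers` prints `api` and `billing`. Matching on the suffix rather than a fixed list keeps the filter working when new workers appear.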

The core call posts the saved query id and a time window to the telemetry endpoint, filtered to records that have an error, grouped by worker, error, and outcome:

curl -sS -X POST \
  "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/workers/observability/telemetry/query" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "queryId": "'"$CF_OBSERVABILITY_QUERY_ID"'",
    "timeframe": { "from": '"$FROM_MS"', "to": '"$TO_MS"' },
    "parameters": {
      "filters": [
        { "key": "$metadata.error", "operation": "exists", "type": "string" }
      ],
      "filterCombination": "and",
      "calculations": [{ "operator": "count", "alias": "Count" }],
      "groupBys": [
        { "type": "string", "value": "$workers.scriptName" },
        { "type": "string", "value": "$metadata.error" },
        { "type": "string", "value": "$workers.outcome" }
      ],
      "orderBy": { "value": "Count" },
      "limit": 100
    }
  }'

The key field is $workers.outcome. An outcome of exception is an unhandled crash; an outcome of ok with an error attached means the Worker caught and handled it. Both are worth a look, but a handled error is lower severity than a crash. For stack traces, the routine runs a second query per affected worker, grouped by the error stack.
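The outcome distinction can be sketched as a small lookup. The function name `severity_for_outcome` and the labels it prints are hypothetical; only the meaning of exception and ok comes from the text, and the catch-all branch for any other outcome value is an assumption.

```shell
# Hypothetical mapping from a $workers.outcome value to a triage label.
severity_for_outcome() {
  case "$1" in
    exception) echo "crash" ;;    # unhandled crash: higher severity
    ok)        echo "handled" ;;  # error was caught by the Worker: lower severity
    *)         echo "review" ;;   # assumption: anything else goes to a human
  esac
}
```

Calling `severity_for_outcome exception` prints `crash`, and `severity_for_outcome ok` prints `handled`.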

If the telemetry query shape gets rejected after a real attempt to fix it, the routine falls back to the GraphQL Analytics API (workersInvocationsAdaptive) and notes in Slack that full stack traces were unavailable. It never uses the Log Explorer API, which has no Workers dataset.

Step 2: The routine prompt

The prompt is where the judgment lives. Here is a faithful, trimmed version of mine:

# Role
You run every 6 hours, unattended, in a fresh cloud session with zero prior
context. Surface PRODUCTION Cloudflare Worker exceptions from the last 6 hours,
diagnose them against the codebase, and either propose a fix via PR or escalate
to ClickUp. Read AGENTS.md at the repo root first; it governs architecture, TDD,
and the multi-tenancy invariants you must follow for any code change.
# Tools & auth
- Query the Workers Observability telemetry API with curl, using $CF_API_TOKEN
and $CF_ACCOUNT_ID. Do NOT use wrangler or install tooling.
- If a token, account id, or query id is missing, or the API returns 401/403:
STOP and post the failure to Slack #claude-logs. No fallback, no guessing.
- Slack and ClickUp are reached via their MCP connectors.
# Scope (hard constraints)
- PRODUCTION ONLY. Ignore staging/preview/dev workers entirely.
- Window: exceptions from the last 360 minutes, computed from `date -u` at start.
- Enumerate every production worker from Cloudflare; do not assume a fixed list.
- Never push to main. Never deploy. Never touch infrastructure or secrets.
# Procedure
1. Compute [from, to] in epoch ms (now-360min .. now).
2. POST the saved telemetry query ($CF_OBSERVABILITY_QUERY_ID), filtered to
errors, grouped by scriptName + error + outcome. Pull stack traces with a
second query per affected worker.
3. Dedupe within the run: group by (worker, error type, normalized message, top
stack frame); collapse IDs, URLs, and timestamps first. The run is stateless.
4. Classify each group: CODE BUG / CONFIG-ENV / TRANSIENT / UNCLEAR.
# Act
- Open ONE PR against `canary` only for an isolated, well-understood code bug
with a safe, testable fix. Follow AGENTS.md TDD (failing test first); run
typecheck, lint, and format before opening. Bias toward escalation when unsure.
- Otherwise create a ClickUp task assigned to me, titled "[prod-triage] <worker>
— <issue>", with the exception, a stack trace, occurrence count, root-cause
analysis, and what needs verifying. Before creating, search for an open
"[prod-triage]" task for the same worker and issue; if one exists, comment the
new count instead of creating a duplicate.
# Report
Post to Slack #claude-logs every run: PRs opened, items needing verification with
ClickUp links, and transient/no-action notes. If zero exceptions, say so.

Notice how much of this is restraint. "Bias toward escalation when unsure." "Comment the new count instead of creating a duplicate." "STOP and post the failure to Slack #claude-logs. No fallback, no guessing." A routine that cannot stay quiet, or that improvises when it lacks data, becomes noise you will mute within a week.
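The "stop if anything is missing" rule in the prompt can be sketched as a preflight check. This assumes a POSIX shell, and the helper name `missing_env` is hypothetical. It only detects the problem; per the prompt, reporting to Slack goes through the MCP connector, which is outside this sketch.

```shell
# Hypothetical preflight helper. Takes variable NAMES as arguments and
# prints any that are unset or empty. Returns 0 when all are present,
# 1 otherwise.
missing_env() {
  missing=""
  for var in "$@"; do
    # Indirect lookup via eval keeps this POSIX-compatible; the names are
    # supplied by the caller, not by untrusted input.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    printf 'missing:%s\n' "$missing"
    return 1
  fi
  return 0
}

# Intended use (left commented so the sketch has no side effects):
#   missing_env CF_API_TOKEN CF_ACCOUNT_ID CF_OBSERVABILITY_QUERY_ID || exit 1
```

Failing before the first API call keeps a half-configured run from producing a misleading all-clear.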

Step 3: The schedule

Every 6 hours. In cron terms that is 0 */6 * * *, and the routine derives its 360-minute lookback window from date -u at the start of each run, so the schedule and the query window stay in sync. That cadence matches my traffic: frequent enough that nothing festers overnight, infrequent enough that runs are cheap and I am not drowning in updates. On a higher-traffic app I would tighten it to hourly. On a side project, once a day.

The cadence is a dial you tune to your traffic, not a fixed default. Set it to how fast a problem can hurt you before you would otherwise notice.
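A minimal sketch of the window arithmetic, producing the FROM_MS and TO_MS values that the telemetry call expects in epoch milliseconds. The helper name `lookback_from_ms` is hypothetical, and `date +%s` is assumed to be available (it is in GNU and BSD date, though not strictly required by POSIX).

```shell
# Hypothetical helper: given a window end in epoch ms ($1) and a lookback
# in minutes ($2), print the window start in epoch ms.
lookback_from_ms() {
  echo $(( $1 - $2 * 60 * 1000 ))
}

# Epoch seconds are timezone-independent; -u mirrors the prompt's wording.
TO_MS=$(( $(date -u +%s) * 1000 ))
FROM_MS=$(lookback_from_ms "$TO_MS" 360)
```

With a 360-minute lookback the two values differ by 21,600,000 ms, which matches the 6-hour cron interval. If the cadence changes, the lookback argument should change with it.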

Step 4: The output contract (the part that matters)

This is the difference between a useful routine and a liability:

  • Confident, isolated, testable code bug: open one PR against canary with a failing-test-first fix, after running typecheck, lint, and format. I review and merge.
  • Anything ambiguous or risky: create a ClickUp task with the evidence and tag me. Bias here is heavy; the agent escalates rather than guesses.
  • Clean run: post one line to Slack and stop.

I will say the important rule plainly: the routine cannot push to main and cannot deploy. Its most powerful action is opening a PR against canary, a branch that does not ship on its own, where a human gates the merge. Everything else is just writing things down for me to look at. That single boundary is what lets me point an autonomous agent at production logs and sleep fine.

The triage logic worth stealing

The mechanics are easy. These three rules are what make the output good instead of annoying.

Dedupe within a stateless run. Each run starts fresh with no memory of the last one, so the agent groups errors by worker, error type, normalized message, and top stack frame, collapsing high-cardinality noise like IDs and URLs before grouping. Without this, one ongoing incident shows up as fifty rows and you learn to ignore the report.

Treat a handled error and a crash differently. Cloudflare's outcome=exception is an unhandled crash; outcome=ok with an error means your code caught it. Both are worth seeing, but they are not the same severity, and folding that distinction into the prompt stops the agent from paging you over an error you already handle gracefully.

Make the default "escalate," not "act." The agent opens a PR only for an isolated bug with an obvious, testable fix. Everything else becomes a ClickUp task for a human. An agent that acts when uncertain is far more expensive than one that asks, so the prompt biases hard toward escalation.
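The first rule above, collapsing high-cardinality noise before grouping, can be sketched with sed and awk. This is an illustration under assumptions, not the routine's actual logic: the input is assumed to be one error per line (for example worker, a tab, then the message), and the patterns and placeholder tokens (`<url>`, `<uuid>`, `<ts>`, `<id>`) are hypothetical choices. Numbers shorter than four digits are left alone so HTTP status codes stay distinct.

```shell
# Replace URLs, UUIDs, ISO-8601 timestamps, and long numeric IDs with
# fixed placeholders so that otherwise identical errors compare equal.
normalize_message() {
  sed -E \
    -e 's#https?://[^[:space:]"]+#<url>#g' \
    -e 's/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}/<uuid>/g' \
    -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:.]+Z?/<ts>/g' \
    -e 's/[0-9]{4,}/<id>/g'
}

# Normalize each input line, count identical lines, and print
# "count<TAB>line" with the most frequent group first.
dedupe_groups() {
  normalize_message \
    | awk '{ count[$0]++ } END { for (k in count) printf "%d\t%s\n", count[k], k }' \
    | sort -rn
}
```

Two lines such as "User 12345 not found at https://x.example/u/12345" and "User 67890 not found at https://x.example/u/67890" for the same worker collapse into one group with a count of 2, which is the difference between one row and fifty in the report.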

What it can and cannot do

I think about routine permissions the way I think about a junior engineer's first week: lots of room to investigate and propose, no ability to break production.

When to give a routine write access:

  • Opening PRs against a non-deploying branch (a human gates the merge)
  • Creating issues or tasks
  • Adding labels and comments
  • Posting run summaries to a Slack channel

When to withhold it:

  • Pushing to main or any branch that deploys
  • Closing issues or resolving tasks (it lacks the full context)
  • Messaging customers or pinging the whole team
  • Changing infrastructure, billing, secrets, or auth config

If unsure: make the routine propose, not perform. A PR or a task is a proposal. A merge or a deploy is an action. Keep actions on the human side of the line, and make the routine fail loud (stop and post to Slack) the moment it lacks the access or data it needs.

What it has caught, and what I am still watching

Honest status: this routine is young, so I am reporting what it has done, not a year of triumphant metrics.

So far it has been most useful on the unglamorous stuff. It flagged a misconfiguration I would not have gone looking for, and it consistently surfaces bad bots, the probe-and-error pattern that never shows up in a quick dashboard glance but is obvious when something reads every production worker's exceptions on a schedule.

What I am watching for, because I have not hit these yet and I expect to:

  • Severity inflation: agents tend to think everything is important. The outcome-aware classification and the escalate-by-default bias are my hedges, and I will tighten them if it over-reports.
  • Log noise read as signal: a third-party outage can look like your bug. The dedupe and "classify transient, do not act" rules help, but this is the failure mode I trust least.
  • Cost creep: every 6 hours is cheap; hourly across more workers is not free. Worth metering before you scale the cadence.

I would rather tell you the guardrails I built for problems I am anticipating than invent war stories. If you run this, watch those three.

Other routines worth automating

Triage is one example. The same pattern (stable saved input, encoded judgment, propose-don't-perform output) fits plenty of recurring work:

  • Dependency triage: weekly, read the changelog and lockfile diff for outdated packages, open a PR for safe patch bumps, file a task for anything with breaking changes.
  • Daily SEO brief: every morning, pull ranking and traffic changes, summarize what moved and why, flag pages that dropped.
  • PR hygiene: a few times a day, find PRs that are stale, missing a description, or failing CI, and nudge with a comment.
  • Docs drift: weekly, diff recent code changes against the docs and open tasks where they disagree.

Each one follows the same contract: it gathers, it judges against rules you wrote, and it proposes rather than acts.

Routine vs script vs interactive agent

A routine is not always the right tool. Quick framework:

| Tool | Best for | Avoid when |
|---|---|---|
| Cron script | Deterministic work with no judgment (backups, syncs) | The task needs reasoning about context |
| Claude Code routine | Recurring work that needs judgment and can propose changes | The task is one-off, or needs your input mid-task |
| Interactive agent | One-off or exploratory work where you steer | The work is recurring and you keep redoing it |

If a plain script can do it, use the script. Reach for a routine when the task needs to read a situation and decide, not just execute fixed steps.

Quick Recommendation

Claude Code routines are best for:

  • Recurring work that follows rules you can write down (triage, audits, briefs)
  • Tasks worth doing on a schedule whether or not you remember
  • Anywhere a proposal (PR, task, Slack summary) is more useful than a raw alert

Skip routines if:

  • The task is one-off or needs you steering it mid-flight
  • A deterministic cron script already handles it
  • You are not willing to define a strict output contract (an unbounded routine becomes noise)

My pick: start with one routine that has no write access to the repository and only files tasks and posts to Slack, run it for a week, and grant PR access only once you trust its judgment. Earn the write access incrementally, exactly like you would with a new hire.

Frequently Asked Questions

What are Claude Code routines?
Claude Code routines are scheduled cloud agents that run a defined prompt on a cron schedule, autonomously, with access to the tools and MCP servers you grant them. They run in an isolated cloud environment rather than your terminal and report back when finished. Use one for recurring, rule-based work like log triage, dependency checks, or daily briefs.

How are Claude Code routines different from a cron job?
A cron job runs fixed, deterministic steps. A Claude Code routine runs an agent that reasons about context: it can read your logs and codebase, decide whether an error is a real bug or bot noise, and propose a fix as a pull request. Use a script when no judgment is needed and a routine when the task requires reading a situation and deciding.

How do I pull Cloudflare logs into a routine?
I query the Workers Observability telemetry API with curl, authenticating with two environment variables and calling a saved query by id so the data shape stays consistent every run. The GraphQL Analytics API is a fallback when the telemetry query is rejected. Avoid the Log Explorer API, which has no Workers dataset, and avoid wrangler so the cloud session stays minimal.

Is it safe to give a scheduled agent access to production logs?
Yes, if you constrain what it can do. My routine reads logs and opens PRs against a non-deploying branch (canary), but it cannot push to main, deploy, close tasks, message customers, or change infrastructure. It also fails loud, stopping and posting to Slack, if its auth or query id is missing rather than guessing. Keep proposals on the agent side and actions on the human side.

How often should a triage routine run?
Match the cadence to how fast a problem can hurt you. I run mine every 6 hours and compute a matching 360-minute lookback window, which catches overnight issues without flooding me. Higher-traffic apps benefit from hourly runs; side projects are fine once a day. Tighten the schedule only if problems slip through between runs.

What stops the routine from spamming me with false positives?
Three rules: dedupe within each stateless run by worker, error type, normalized message, and top stack frame; separate handled errors (outcome ok) from unhandled crashes (outcome exception) when setting severity; and default to escalating a ClickUp task rather than acting whenever a fix is not obvious and testable. A routine that cannot stay quiet on a clean run gets muted within a week.

Next steps

If you want the foundations first, read our Claude Code best practices guide for how we structure AGENTS.md rules, MCP servers, and subagents, then see how to build a SaaS with Claude Code for the interactive version of this workflow. If you are choosing where to run the app this triages, we cover that in best hosting for Next.js. Routines are what you graduate to once those patterns are stable.

MakerKit ships with the AGENTS.md rules and MCP setup that make routines like this reliable out of the box, because an agent is only as good as the codebase it reasons about. That is the same principle behind every automation here: encode good structure once, then let the schedule do the rest.

Set one up and it handles the unglamorous part of running production while you sleep. And if you are feeling very brave, point it at a frontier model like Opus 4.8 with its 1M-token context, drop the canary guardrail, and let it push straight to production on its own. I would not, and everything above is an argument for why. But the routine will happily match whatever risk appetite you give it. Use at your own risk.