What happens to requests that exceed the rate limit?

Rate-limited requests receive a 429 (Too Many Requests) status code with a Retry-After header indicating when the client can send the next request. The custom message you configured is included in the response body.
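On the client side, handling this response might look like the sketch below. It assumes only that the response carries a standard Retry-After header, which HTTP allows as either a number of seconds or an HTTP date. The function name retry_delay_seconds and the one-second fallback are illustrative choices, not part of OpenClaw.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(status, headers, now=None, default=1.0):
    """Return seconds to wait before retrying, or None if not rate-limited."""
    if status != 429:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    value = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    if value is None:
        return default  # assumed fallback when the header is missing
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "30"
    try:
        retry_at = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable value: fall back rather than crash
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())
```

A caller would sleep for the returned number of seconds before sending the request again, rather than retrying immediately.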

Can I set different limits for different user tiers?

Yes. You can define user tiers with different rate limits. For example, free-tier users get 10 requests per minute while premium users get 100. Tier assignment is based on the user's authentication token.
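As a rough illustration of tier-based limits, the sketch below maps a tier name to a per-minute limit using the numbers from the example above. The claim name "tier", the TIER_LIMITS table, and the fallback to the free tier are assumptions made for this example; they are not OpenClaw's actual token format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimit:
    requests_per_minute: int


# Limits taken from the example in the text; tier names are hypothetical keys.
TIER_LIMITS = {
    "free": TierLimit(requests_per_minute=10),
    "premium": TierLimit(requests_per_minute=100),
}


def limit_for(token_claims: dict) -> TierLimit:
    """Pick a limit from the (already verified) claims of an auth token."""
    tier = token_claims.get("tier", "free")
    # Unknown or missing tiers fall back to the most restrictive limit.
    return TIER_LIMITS.get(tier, TIER_LIMITS["free"])
```

Falling back to the free tier is the conservative choice here: a malformed or unexpected tier value never grants more quota than the lowest tier.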

Do rate limits apply to scheduled tasks?

Scheduled tasks run under a separate internal quota that does not count against user-facing rate limits. However, you can set a dedicated limit for scheduled tasks to control their token consumption.


Rate Limiting: Optimize API Usage

Set up rate limits on your OpenClaw agent to control costs, prevent abuse, and ensure fair usage across all users and endpoints.


What You Will Get

By the end of this guide, your OpenClaw agent will enforce rate limits that prevent excessive usage, control costs, and protect against abuse. You will configure limits at the global, per-user, and per-endpoint levels with custom responses for rate-limited requests.

Rate limiting is a critical production concern. Without it, a single user or a misconfigured integration can exhaust your token budget in minutes. Proper rate limiting ensures every user gets fair access and your costs stay predictable.

You will set up token-bucket and sliding-window rate limiters, configure burst allowances, customize rate-limit error messages, and monitor limit utilization. The result is a resilient agent that handles traffic spikes gracefully without degrading service for other users.

Step-by-Step Setup

Follow these steps to configure rate limiting.

Open the Rate Limiting Panel

Navigate to your agent's settings in the RunTheAgent dashboard and select the Rate Limiting tab. This panel shows any existing limits and provides controls for creating new ones. The overview section displays current utilization metrics.

Set a Global Rate Limit

Define the maximum number of requests your agent can process per minute across all users and endpoints. This limit is a safety net that keeps total usage within your budget. Start with a value equal to your expected peak traffic plus a 20% buffer.
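The buffer calculation is simple arithmetic. As a small worked example, assuming a hypothetical expected peak of 500 requests per minute (the helper name global_limit is made up for this sketch):

```python
def global_limit(expected_peak_rpm: int, buffer_percent: int = 20) -> int:
    """Expected peak plus a percentage buffer, rounded up to a whole request."""
    scaled = expected_peak_rpm * (100 + buffer_percent)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-scaled // 100)


print(global_limit(500))  # 500 * 1.20 = 600 requests per minute
```

Rounding up rather than down keeps the full buffer when the peak does not divide evenly.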

Configure Per-User Limits

Set a maximum number of requests each individual user can make within a time window. This prevents any single user from monopolizing the agent. Common settings are 20 requests per minute for chat users and 60 requests per minute for API integrations.
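To show what a per-user limit does conceptually, here is a minimal sliding-window sketch that keeps a log of recent request times for each user. It is an in-memory illustration under the assumption of a single process, not a description of how OpenClaw enforces limits; the class name and the injectable clock are choices made for this example.

```python
import time
from collections import defaultdict, deque


class PerUserSlidingWindow:
    """Allow at most max_requests per user in any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # user_id -> recent request times

    def allow(self, user_id):
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # this user is over the limit; others are unaffected
        hits.append(now)
        return True


chat_limiter = PerUserSlidingWindow(max_requests=20, window_seconds=60)
```

Because each user has a separate log, one user exhausting their 20 requests has no effect on anyone else's quota.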

Add Per-Endpoint Limits

If you have custom API endpoints, set limits for each one independently. A lightweight health-check endpoint can have a higher limit than a token-intensive generation endpoint. This granular control ensures expensive operations are throttled appropriately.
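A per-endpoint configuration can be pictured as a lookup table with a default. In the sketch below, the paths /health and /generate, the numbers, and the default of 60 are all hypothetical, and the counter covers a single window that the caller resets; it only illustrates that a cheap endpoint and an expensive one are throttled independently.

```python
from collections import Counter

# Hypothetical per-minute limits: cheap endpoints get more headroom.
ENDPOINT_LIMITS = {"/health": 600, "/generate": 30}
DEFAULT_ENDPOINT_LIMIT = 60


def endpoint_limit(path):
    """Return the configured limit for a path, or the default."""
    return ENDPOINT_LIMITS.get(path, DEFAULT_ENDPOINT_LIMIT)


class EndpointCounter:
    """Counts requests per endpoint within one window; reset() starts a new one."""

    def __init__(self):
        self.counts = Counter()

    def allow(self, path):
        if self.counts[path] >= endpoint_limit(path):
            return False
        self.counts[path] += 1
        return True

    def reset(self):
        self.counts.clear()
```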

Configure Burst Allowance

Allow short bursts of traffic above the sustained limit using a token-bucket algorithm. For example, allow a user to send 10 messages in quick succession even if their sustained limit is 5 per minute. In token-bucket terms, that corresponds to a bucket holding up to 10 tokens that refills at 5 tokens per minute. This accommodates natural conversation patterns without triggering spurious rate-limit errors.
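The burst example above can be sketched as a small token bucket: the capacity is the burst size (10) and the refill rate is the sustained limit (5 per minute). This is a generic, single-process illustration of the algorithm, with a class name and injectable clock chosen for the example; it does not claim to match OpenClaw's internals.

```python
import time


class TokenBucket:
    """Each request spends one token; tokens refill at a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)          # maximum burst size
        self.refill_per_second = refill_per_second  # sustained rate
        self.clock = clock
        self.tokens = float(capacity)            # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.updated
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Burst of 10, sustained 5 requests per minute.
bucket = TokenBucket(capacity=10, refill_per_second=5 / 60)
```

A user who has been idle can send 10 messages back to back; after that, requests are admitted at roughly one every 12 seconds until the bucket refills.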

Customize Rate-Limit Responses

Define the message users see when they are rate-limited. Include information about when they can retry, such as 'You have reached your request limit. Please wait 30 seconds before trying again.' A helpful message reduces frustration and prevents users from flooding the system with retries.
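One way to keep the message and the retry hint consistent is to build both from the same number. The sketch below assumes a hypothetical message template with a {seconds} placeholder and a response represented as a (status, headers, body) tuple; neither is OpenClaw's actual configuration format.

```python
import math

# Template wording taken from the example above; the placeholder is assumed.
DEFAULT_TEMPLATE = (
    "You have reached your request limit. "
    "Please wait {seconds} seconds before trying again."
)


def rate_limit_response(retry_after, template=DEFAULT_TEMPLATE):
    """Build a 429 response whose header and message agree on the wait time."""
    # Round up so a client that waits exactly this long is not early.
    seconds = max(1, math.ceil(retry_after))
    headers = {"Retry-After": str(seconds)}
    return 429, headers, template.format(seconds=seconds)
```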

Monitor and Adjust

Use the rate limiting analytics to track how often limits are hit, by which users, and on which endpoints. If legitimate users are frequently rate-limited, increase the limit. If you see abuse patterns, tighten the limits or block the offending source.
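If you export rejection events for your own analysis, a tally by user and by endpoint answers the two questions above. The event shape used here (dictionaries with "user" and "endpoint" keys) is an assumption for illustration, not a documented export format.

```python
from collections import Counter


def summarize_rejections(events):
    """Count rate-limit rejections per user and per endpoint."""
    by_user = Counter(event["user"] for event in events)
    by_endpoint = Counter(event["endpoint"] for event in events)
    return by_user, by_endpoint
```

Calling most_common(5) on either counter lists the five users or endpoints that hit limits most often, which is usually enough to tell a too-tight limit from an abusive source.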

Tips and Best Practices

Use Sliding Windows Over Fixed Windows

Sliding-window rate limiters distribute load more evenly than fixed windows. A fixed window resets every user's quota at the same instant, so traffic can spike at the start of each interval, and a client that straddles a window boundary can briefly send up to twice the limit.
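The difference is easiest to see with a burst that straddles a window boundary. The following sketch compares two generic counters on the same request timestamps; the function names and the sample timestamps are made up for the demonstration.

```python
from collections import deque


def fixed_window_allowed(timestamps, limit, window):
    """Count admitted requests when quotas reset at fixed boundaries."""
    counts = {}
    allowed = 0
    for t in timestamps:
        bucket = int(t // window)  # which fixed interval this request is in
        if counts.get(bucket, 0) < limit:
            counts[bucket] = counts.get(bucket, 0) + 1
            allowed += 1
    return allowed


def sliding_window_allowed(timestamps, limit, window):
    """Count admitted requests when the window trails each request."""
    recent = deque()
    allowed = 0
    for t in timestamps:
        while recent and t - recent[0] >= window:
            recent.popleft()
        if len(recent) < limit:
            recent.append(t)
            allowed += 1
    return allowed


# Five requests just before the minute boundary, five just after it.
burst = [59.0] * 5 + [60.0] * 5
print(fixed_window_allowed(burst, limit=5, window=60))    # 10
print(sliding_window_allowed(burst, limit=5, window=60))  # 5
```

With a limit of 5 per minute, the fixed window admits all 10 requests within about one second, while the sliding window admits only 5.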

Exempt Internal Services

If you have trusted internal services that call your agent, consider exempting them from rate limits or giving them a much higher quota. Verify their identity with API keys to prevent abuse of the exemption.

Combine Rate Limiting with Caching

Cache frequent responses so that repeated identical requests are served from the cache without consuming rate-limit quota. This raises your effective throughput for common queries without raising the limit itself.
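The key design point is ordering: the cache lookup happens before the limiter is consulted, so a hit costs no quota. Here is a minimal sketch of that ordering, assuming hypothetical allow and generate callables that you supply and an unbounded in-memory cache keyed on the exact prompt text.

```python
def make_cached_handler(allow, generate):
    """Wrap a generator so identical prompts skip the rate limiter."""
    cache = {}

    def handle(user_id, prompt):
        if prompt in cache:
            return 200, cache[prompt]  # cache hit: no quota consumed
        if not allow(user_id):
            return 429, "rate limited"
        cache[prompt] = generate(prompt)
        return 200, cache[prompt]

    return handle
```

A real deployment would also need an eviction policy and a decision about whether responses may be shared across users; both are left out of this sketch.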

Set Up Alerts for Threshold Breaches

Configure alerts that notify you when rate-limit utilization exceeds 80% of the configured limit. This early warning gives you time to scale up or optimize before users start experiencing rate-limit errors.
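The threshold check itself is a one-line comparison. As a sketch, assuming you can read the number of requests used in the current window and the configured limit (the function names are illustrative):

```python
def utilization_percent(used: int, limit: int) -> float:
    """Share of the limit consumed in the current window, as a percentage."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * used / limit


def should_alert(used: int, limit: int, threshold_percent: float = 80.0) -> bool:
    """True once utilization is strictly above the alert threshold."""
    return utilization_percent(used, limit) > threshold_percent
```

For a limit of 600 requests per minute, the alert would fire from the 481st request in a window onward.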

Frequently Asked Questions
