API Rate Limits

Learn about DocuPanda's REST API rate limits, how to avoid exceeding them, and what to do if you do exceed them.

DocuPanda API Rate Limits

To maintain a fair and stable environment for all developers, the DocuPanda REST API enforces rate limits. These limits govern the volume and pace of your requests, ensuring consistent platform performance for everyone.

Rate limits are enforced independently of your billing credits (the units you purchase or receive under your plan). While billing credits are tied to cost and monthly usage allowances, rate limit tokens are a free, internal measure to protect against abuse and excessive traffic. You may use up all your monthly billing credits without once hitting a rate limit, or you might hit rate limits even if you have ample billing credits. Understanding and respecting these two separate concepts will help you manage both your costs and your application’s performance.


Understanding Our Rate Limits

Token-Based Request Costs

Every API request consumes rate limit tokens from your allocated bucket. This rate-based accounting system is separate from billing credits. Rate limit tokens never cost money; they only limit how rapidly you can make calls. The cost in tokens depends on the HTTP method used:

  • GET requests: Cost 1 token each
  • POST requests: Cost 10 tokens each
  • DELETE requests: Cost 10 tokens each

We don’t differentiate between API endpoints for rate limits. Only the HTTP method affects the number of tokens consumed.
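
For illustration, these per-method costs translate directly into a simple lookup. The Python snippet below is a rough sketch (not part of any official SDK) that tallies the tokens a batch of calls would consume:

  # Hypothetical helper (not part of any official SDK) using the costs listed above.
  TOKEN_COST = {"GET": 1, "POST": 10, "DELETE": 10}

  def tokens_for(methods):
      """Total rate limit tokens consumed by a sequence of HTTP methods."""
      return sum(TOKEN_COST[m.upper()] for m in methods)

  # Example: 5 document uploads (POST) plus 20 status checks (GET)
  print(tokens_for(["POST"] * 5 + ["GET"] * 20))   # 5*10 + 20*1 = 70 tokens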

Steady-State and Bursting

Subscribed non-enterprise users receive a continuous replenishment of rate limit tokens at 1,200 tokens per minute. This steady influx supports:

  • About 2 POST/DELETE requests per second (2 * 10 tokens = 20 tokens per second, or 1,200 tokens per minute).
  • About 20 GET requests per second (20 * 1 token = 20 tokens per second, again coming to 1,200 tokens per minute).

To handle short-term spikes, we employ a leaky bucket algorithm with a maximum capacity of 5 times your per-minute refill rate. At 1,200 tokens/min, this equates to a bucket of 6,000 tokens at full capacity. This reservoir lets you exceed the steady-state rate for short periods until these tokens are used up, after which you must wait while tokens refill at the normal rate.


The Leaky Bucket Algorithm

Think of your rate limit like a bucket of tokens:

  1. Bucket Capacity: Holds up to 6,000 tokens.
  2. Refill Rate: Adds 1,200 tokens per minute continuously.
  3. Cost Per Request:
    • GET requests consume 1 token.
    • POST and DELETE requests consume 10 tokens.

If you have enough tokens, your request succeeds. If you repeatedly exceed the steady-state rate, you’ll eventually drain the bucket. Once empty, further requests fail with a rate limit error until tokens replenish.
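
The sketch below models this accounting in Python, assuming the standard 1,200 tokens/minute refill and 6,000-token ceiling described above. It is a simplified illustration, not DocuPanda’s actual implementation:

  import time

  class TokenBucket:
      """Simplified model of the rate limiter described above (illustrative only)."""

      def __init__(self, capacity=6000, refill_per_minute=1200):
          self.capacity = capacity
          self.refill_per_second = refill_per_minute / 60.0
          self.tokens = float(capacity)          # the bucket starts full
          self.last_refill = time.monotonic()

      def _refill(self):
          now = time.monotonic()
          elapsed = now - self.last_refill
          self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
          self.last_refill = now

      def try_spend(self, method):
          """Deduct the token cost of one request; return False if the bucket is empty."""
          cost = 1 if method.upper() == "GET" else 10   # GET = 1 token, POST/DELETE = 10
          self._refill()
          if self.tokens >= cost:
              self.tokens -= cost
              return True
          return False                           # the caller should back off, as with a 429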


Example Usage Scenario

  • Short burst: Need to issue 30 GET requests instantly? Even though this is more than the “per second” baseline, the bucket’s capacity allows it. Your burst is absorbed by the stored tokens.
  • Sustained load: Continually making more than the “steady-state” number of calls over time will drain your bucket. Once empty, you must slow down or pause until it refills (a worked example follows below).
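
To make the sustained-load case concrete, here is a back-of-the-envelope calculation using the standard limits; the request mix (5 POST requests per second) is purely hypothetical:

  # Hypothetical sustained load: 5 POST requests per second under the standard limits.
  consumption = 5 * 10               # 50 tokens consumed per second
  refill = 1200 / 60                 # 20 tokens refilled per second
  net_drain = consumption - refill   # 30 tokens lost per second
  print(6000 / net_drain)            # 200.0 -> a full bucket empties in about 200 seconds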

Rate Limit Headers

We provide helpful headers in each response so you can track and manage your rate limit usage in real-time.

  • X-RateLimit-Limit: Maximum size of your token bucket (e.g., 6000 means you can accumulate at most 6,000 tokens if you hold off making any requests for a while).
  • X-RateLimit-Remaining: Tokens still available after this request.
  • X-RateLimit-Reset: The UTC epoch timestamp when your bucket would be fully refilled if it were empty now. (In practice, tokens refill continuously, but this gives you a reference point.)
  • X-RateLimit-Used: How many tokens the request you just made consumed (currently this will always be 1 for GET, 10 for POST/DELETE).

Example Response Headers

  X-RateLimit-Limit: 6000
  X-RateLimit-Remaining: 5990
  X-RateLimit-Reset: 1700000000
  X-RateLimit-Used: 10

Here, the bucket can hold 6,000 tokens. After this request, 5,990 remain. This particular request cost you 10 tokens. The X-RateLimit-Reset indicates when a fully drained bucket would be back at full capacity.
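
For example, if your client uses Python’s requests library, you can read these headers after each call and slow down proactively before ever hitting a 429. The URL and API key header below are placeholders, not real endpoint details:

  import time
  import requests

  # Placeholder URL and auth header -- substitute your real DocuPanda endpoint and API key.
  response = requests.get(
      "https://api.example.com/some-docupanda-endpoint",
      headers={"X-API-Key": "your-api-key"},
  )

  remaining = int(response.headers.get("X-RateLimit-Remaining", "0"))
  limit = int(response.headers.get("X-RateLimit-Limit", "1"))

  # Slow down proactively when fewer than 5% of the bucket's tokens remain.
  if remaining < 0.05 * limit:
      time.sleep(5)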


Handling Rate Limit Errors

If you deplete your bucket’s tokens, you’ll see a 429 Too Many Requests error. At that point:

  1. Do not retry immediately.
  2. Wait for tokens to replenish. Wait a random interval between 10 and 60 seconds before retrying. The randomness avoids the thundering herd effect, where many requests synchronize and all retry at exactly the same time (see the retry sketch after this list).
  3. Understand why you are hitting the rate limit. The most common cause is an overly aggressive polling strategy for checking when results are available. For example, you may be checking every second for a result instead of using exponential backoff, or you may have an infinite loop that never gives up on failures.
  4. Consider using webhooks to reduce API consumption. You can avoid polling altogether by using webhooks and waiting for an event that indicates a document processing or standardization job has completed.
  5. Consider leveraging our Workflow abstraction. A workflow chains together common operations like "upload a document, then classify its contents, then standardize it using a specific schema depending on the classification result". This collapses multiple POST requests into one and avoids all polling.
  6. Reach out to Customer Support for increased limits. If your use case legitimately requires a very large volume of simultaneous document processing calls, contact customer support to request higher rate limits. Enterprise plans can scale up to 10K simultaneous document processing requests and beyond.
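
As a starting point for steps 1 and 2, a retry loop along these lines waits a randomized interval before retrying on a 429; the attempt count and delay bounds are illustrative, not prescribed values:

  import random
  import time
  import requests

  def request_with_backoff(method, url, max_attempts=5, **kwargs):
      """Retry on 429 with a randomized wait to avoid synchronized retries (sketch only)."""
      response = None
      for attempt in range(max_attempts):
          response = requests.request(method, url, **kwargs)
          if response.status_code != 429:
              return response
          # Wait a random 10-60 seconds, as recommended above, before retrying.
          time.sleep(random.uniform(10, 60))
      return response   # still rate limited after max_attempts tries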

Getting More Capacity

If your application requires higher throughput or more generous bursting capacity, consider upgrading to an enterprise plan. Enterprise customers can receive adjusted rate limit configurations. For more information, please contact our team by pressing the chat icon at the bottom right of this page, or submitting a request.


Summary

DocuPanda’s rate limits ensure that everyone shares a stable, responsive platform. By monitoring your token usage, adjusting your request strategy, and understanding the difference between billing credits and rate limit tokens, you can maintain smooth, uninterrupted access to the DocuPanda REST API—even under periods of high demand.