Many load balancers default to a 60s timeout. Our API timeouts are designed to work within these bounds.
Client -> Cloudflare -> NLB -> Traefik -> api-monolith
These are the timeouts that our API servers are restricted to:
- Cloudflare: 100s (source)
  - Behavior: Returns a 524
  - Cannot be configured unless paying for Cloudflare Enterprise
- AWS NAT Gateway: 350 seconds idle (without keepalive) (source)
  - Behavior: Connection drop
- AWS NLB: 350 seconds (source)
  - Behavior: Connection drop
- Traefik: 60s, 120s (source)
  - Behavior: Unknown
  - Unlike the other timeouts, this is configurable by us
  - 60s timeout for active requests before Traefik stops
  - 120s timeout for reading the body and writing the response
- ATS (Through Traefik): 15s (source)
  - Behavior: Unknown
We use long polling (i.e. watch_index) to implement real-time functionality. This means we need to be cautious about existing timeouts.
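As a rough illustration of why long polling interacts with timeouts, the sketch below waits for a change up to a deadline and reports "nothing changed" when the deadline passes, instead of letting an upstream proxy drop the connection. It uses only the standard library. `PollOutcome` and `long_poll` are hypothetical names for this example and are not taken from our codebase.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Outcome of one long-poll request (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum PollOutcome {
    /// A change arrived before the deadline; carries the new index.
    Changed(u64),
    /// Nothing changed before the deadline; the client should poll again.
    Unchanged,
    /// The producer side went away.
    Closed,
}

/// Waits for the next change, but never longer than `deadline`.
fn long_poll(changes: &mpsc::Receiver<u64>, deadline: Duration) -> PollOutcome {
    match changes.recv_timeout(deadline) {
        Ok(index) => PollOutcome::Changed(index),
        Err(RecvTimeoutError::Timeout) => PollOutcome::Unchanged,
        Err(RecvTimeoutError::Disconnected) => PollOutcome::Closed,
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(42).unwrap();
    // First call sees the queued change, second call hits the deadline.
    println!("{:?}", long_poll(&rx, Duration::from_millis(50)));
    println!("{:?}", long_poll(&rx, Duration::from_millis(50)));
}
```

In this sketch the deadline is chosen by the server, so it can be set below every timeout on the path between client and API.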
Current timeouts:
- api-helper: 50s (source)
  - Behavior: Returns API_REQUEST_TIMEOUT
  - Motivation: This gives a 10s budget for any other 60s timeout
- select_with_timeout!: 40s (source)
  - Behavior: Timeout handled by API endpoint, usually 200
  - Motivation: This gives a 10s budget for any requests before/after the select statement
- tail! and tail_all!: 40s (depending on TailAllConfig) (source)
  - Behavior: Timeout handled by API endpoint, usually 200
  - Motivation: This gives a 10s budget for any requests before/after the select statement
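The budget arithmetic behind these numbers can be sketched as a small check: each inner timeout has to fit inside the one that wraps it, and the difference is the headroom. The constant names and the `headroom` helper are made up for this example; only the durations come from this page.

```rust
use std::time::Duration;

// Durations taken from this page; the constant names are illustrative.
const UPSTREAM_TIMEOUT: Duration = Duration::from_secs(60);
const API_HELPER_TIMEOUT: Duration = Duration::from_secs(50);
const SELECT_TIMEOUT: Duration = Duration::from_secs(40);

/// Returns the headroom left when `inner` runs inside `outer`,
/// or `None` if the inner timeout does not fit at all.
fn headroom(outer: Duration, inner: Duration) -> Option<Duration> {
    outer.checked_sub(inner)
}

fn main() {
    // api-helper inside a 60s upstream timeout.
    println!("{:?}", headroom(UPSTREAM_TIMEOUT, API_HELPER_TIMEOUT));
    // select_with_timeout! inside the api-helper timeout.
    println!("{:?}", headroom(API_HELPER_TIMEOUT, SELECT_TIMEOUT));
}
```

A check like this could run in a unit test so that changing one timeout without adjusting its neighbours fails early.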
- idle_timeout is set to 3 minutes, which is less than the NAT Gateway timeout
- test_before_acquire is left as true to ensure we don't run into timeouts, even though this adds significant overhead
- We ping the database manually every 15 seconds
- Backoff retries are set to infinity to ensure that ConnectionManager always returns to a valid state no matter the connection issues
  - The current internal logic will cause the Redis connection to fail after 6 automatic disconnects, which will cause the cluster to fail if idle for too long
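To make the "retry forever" idea concrete, here is a minimal standard-library sketch of unbounded retries with capped exponential backoff. It does not show how ConnectionManager works internally. `backoff_delay` and `retry_forever` are hypothetical helpers, and the base and cap values are arbitrary.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 2^attempt, saturating so that very large attempt numbers cannot overflow.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}

/// Calls `connect` until it succeeds; there is deliberately no retry limit.
fn retry_forever<T, E>(
    mut connect: impl FnMut() -> Result<T, E>,
    base: Duration,
    max: Duration,
) -> T {
    let mut attempt = 0u32;
    loop {
        match connect() {
            Ok(conn) => return conn,
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt, base, max));
                attempt = attempt.saturating_add(1);
            }
        }
    }
}

fn main() {
    // Simulated connection that fails three times before succeeding.
    let mut failures_left = 3;
    let state = retry_forever(
        || {
            if failures_left > 0 {
                failures_left -= 1;
                Err("connection refused")
            } else {
                Ok("connected")
            }
        },
        Duration::from_millis(1),
        Duration::from_millis(4),
    );
    println!("{}", state);
}
```

Capping the delay keeps the time to recover bounded once the connection is reachable again, while the missing retry limit means the caller never ends up permanently failed.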
- Implementing long-running TCP Connections within VPC networking
- Introducing configurable Idle timeout for Connection tracking (this is intentionally not configured)