We're evaluating Cosmo Router for adoption in our organization, and the rate limiting behavior during Redis outages is a critical blocker for us. As described in #1555, when Redis is unavailable (scaled to zero, network partition, transient connectivity issue), the router returns HTTP 500 for every request that goes through rate limit evaluation. This effectively turns an optional feature (rate limiting) into a hard dependency that can cause a full service outage.
PR #1659 made an initial attempt at addressing this but was closed as stale (not sure why).
Let's revive this conversation and discuss the right approach before contributing a new implementation.
Right now there are two problems that fall into the rate-limit fail-open bucket:
If Redis is unreachable at startup, the router fails to start.
If Redis becomes temporarily unavailable, all rate-limited requests fail.
For us, rate limiting is a "best effort" safeguard; ideally, it should never be the reason the entire router goes down. A Redis blip causing a full outage is unacceptable in production.
Potential solution
A new configuration option that, when enabled:
Allows the router to start even if Redis is not reachable, logging a warning
Allows requests to proceed without rate limiting when Redis is unreachable at runtime (possibly logging a warning?)
Does the general approach align with your vision for this feature? Any suggestions?
I'd love to contribute the implementation for this if you're open to it.
Cosmo is the top choice for a federated GraphQL gateway and this is the last piece my org needs to move forward with adoption.
I'm ready to contribute a well-tested implementation as soon as there's alignment on the approach. Thanks!