
Golang http client timeout













The HTTP 502 status code, also known as Bad Gateway, indicates that a server, while acting as a gateway or proxy, received an invalid response from the upstream server. Typical problems causing it (summarized by Datadog folks):

- Reverse proxy is misconfigured (e.g., my favourite one - trying to call uWSGI over HTTP while it listens on the uwsgi protocol 🙈).

However, sometimes there seems to be no apparent reason for HTTP 502 responses while clients sporadically see them:

- NodeJs application behind Amazon ELB throws 502.
- Proxy persistent upstream: occasional 502.
- Sporadic 502 response only when running through traefik.

For me personally, it happened twice already - the first time, with a Node.js application running behind AWS ALB and the second time, with a Python (uWSGI) application running behind a Traefik reverse proxy.

Image: HTTP 502 response generated by a proxy after it tries to send data upstream to a partially closed connection

And it took quite some time for my team to debug it back in the day (we even involved premium AWS support to pin down the root cause). A detailed explanation of the problem, including some low-level TCP packet analysis, can be found in this lovely article. Long story short, it was a TCP race condition.
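To see where such a response can come from, here is a minimal Go sketch of the proxy side (the upstream address is a made-up placeholder, not from the original setup): net/http/httputil's ReverseProxy, when it fails to talk to the upstream, falls back to its default error handler, which logs the error and replies to the client with 502 Bad Gateway.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical upstream app server; replace with a real address.
	upstream, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}

	// If sending the request to the upstream fails (e.g., the upstream has
	// already closed the connection), ReverseProxy's default ErrorHandler
	// logs the error and responds with 502 Bad Gateway.
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	log.Fatal(http.ListenAndServe(":8000", proxy))
}
```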


And here, I'll just try to give a quick summary.

First off, any TCP connection is a distributed system. Well, it should make sense - the actual state of a connection is distributed between its endpoints. And as the CAP theorem dictates, any distributed system can be either available 100% of the time or consistent 100% of the time, but not both. Hence, the state of a TCP connection is only eventually consistent!

While this serverfault answer says that HTTP Keep-Alive should be used only for client-to-proxy communications, from my experience, proxies often keep the upstream connections alive too. This is the so-called connection pool pattern, where just a few connections are heavily reused to handle numerous requests. When a connection stays in the pool for too long without being reused, it's marked as idle. Normally, idle connections are closed after some period of inactivity. However, the server side (i.e., the upstream) also can drop idle connections.
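As a rough illustration of that pattern in Go (the numbers below are arbitrary examples, not recommendations), a client-side net/http Transport keeps a pool of idle connections and closes them after IdleConnTimeout of inactivity:

```go
package main

import (
	"net/http"
	"time"
)

// newPooledClient builds an http.Client whose Transport reuses upstream
// connections (the connection pool pattern) and drops the ones that sit
// idle for too long. All values are arbitrary, for illustration only.
func newPooledClient() *http.Client {
	transport := &http.Transport{
		MaxIdleConns:        100,              // total idle connections kept in the pool
		MaxIdleConnsPerHost: 10,               // idle connections kept per upstream host
		IdleConnTimeout:     90 * time.Second, // close a pooled connection after 90s of inactivity
	}
	return &http.Client{
		Transport: transport,
		Timeout:   10 * time.Second, // overall per-request timeout
	}
}

func main() {
	client := newPooledClient()
	if resp, err := client.Get("http://127.0.0.1:8080/"); err == nil { // hypothetical upstream
		resp.Body.Close()
	}
}
```

A reverse proxy typically exposes equivalent knobs (keep-alive and idle timeouts) for its own pool of upstream connections.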


So, what sometimes happens is that the proxy and the upstream have an unfortunate combination of idle connection timeout settings - more specifically, when the proxy and the upstream have exactly the same idle timeout durations, or the upstream drops connections sooner than the proxy. While one endpoint (the proxy) can still be thinking that the connection is totally fine, the other endpoint (the upstream) may have already started closing the connection.


So, the proxy starts writing a request into a connection that is being closed by the upstream. And instead of getting the TCP ACK for the sent bytes, the proxy gets a TCP FIN (or even RST) from the upstream. Oftentimes, the proxy just gives up on such a request and responds with an immediate HTTP 502 to the client. And this is completely invisible on the application side! So, no application logs will be helpful in such a case. HTTP Keep-Alive can cause TCP race conditions 🤯

In a (reverse_proxy + app_server) setup, the reverse proxy should be dropping connections more frequently than the app server. Otherwise, clients will see sporadic HTTP 502s.
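In Go terms, one way to follow that rule is to make the app server's idle timeout strictly longer than the proxy's. Here's a hedged sketch, assuming the proxy in front closes idle upstream connections after 60 seconds (a made-up number):

```go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	// Assumption: the reverse proxy closes idle upstream connections after
	// 60 seconds. Keeping the app server's IdleTimeout strictly larger means
	// the proxy always gives up on an idle connection first, so it never
	// writes a request into a connection the upstream is already closing.
	srv := &http.Server{
		Addr:        ":8080",
		Handler:     mux,
		IdleTimeout: 120 * time.Second, // must stay above the proxy's idle timeout
	}
	log.Fatal(srv.ListenAndServe())
}
```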













