nginx error analysis: Connection reset by peer
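When the request-handling timeouts of nginx and its upstream are configured inconsistently, nginx reports `connection reset by peer` errors, which show up as low-frequency 502 responses, as shown in the figure.
The main cause is that the client (nginx, when it proxies to the upstream) reads from or writes to the fd of an upstream keepalive connection at the very moment the upstream server has closed that fd. This produces `connection reset by peer`, so the upstream server should be kept from closing the connection first wherever possible.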
Fault description
According to the official Tomcat documentation, keepAliveTimeout defaults to the value of connectionTimeout, which is set to 20s in our configuration.
Parameter | Description |
---|---|
keepAliveTimeout | The number of milliseconds this Connector will wait for another HTTP request before closing the connection. The default value is to use the value that has been set for the connectionTimeout attribute. Use a value of -1 to indicate no (i.e. infinite) timeout. |
maxKeepAliveRequests | The maximum number of HTTP requests which can be pipelined until the connection is closed by the server. Setting this attribute to 1 will disable HTTP/1.0 keep-alive, as well as HTTP/1.1 keep-alive and pipelining. Setting this to -1 will allow an unlimited amount of pipelined or keep-alive HTTP requests. If not specified, this attribute is set to 100. |
connectionTimeout | The number of milliseconds this Connector will wait, after accepting a connection, for the request URI line to be presented. Use a value of -1 to indicate no (i.e. infinite) timeout. The default value is 60000 (i.e. 60 seconds) but note that the standard server.xml that ships with Tomcat sets this to 20000 (i.e. 20 seconds). Unless disableUploadTimeout is set to false, this timeout will also be used when reading the request body (if any). |
Official Tomcat documentation: https://tomcat.apache.org/tomcat-7.0-doc/config/http.html
Configuration in nginx (nginx-ingress-controller)
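Because the upstream keepalive timeout is not configured explicitly, nginx uses its default value of 60s. That is longer than Tomcat's 20s, so Tomcat closes idle connections while nginx still considers them reusable.
http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive_timeout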
Solution
With a rough diagnosis in hand, we tried lowering nginx's upstream keepalive timeout to 15s (it must be lower than Tomcat's timeout). After testing, the fault was resolved.
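
The adjustment described above can be sketched as a plain nginx configuration. This is only an illustration under assumptions: the upstream name `tomcat_backend`, the address `10.0.0.10:8080`, and the connection count `32` are hypothetical placeholders and do not come from the original setup. The 15s value is the one used in this article.

```nginx
# Minimal sketch, placed inside the http {} context.
# The upstream name and server address are hypothetical placeholders.
upstream tomcat_backend {
    server 10.0.0.10:8080;

    # Keep up to 32 idle connections to the upstream per worker process
    # (32 is an arbitrary illustrative value).
    keepalive 32;

    # Close idle upstream connections after 15s, before Tomcat's 20s
    # keepAliveTimeout expires, so nginx is the side that closes first.
    # The upstream-level keepalive_timeout directive exists since nginx 1.15.3.
    keepalive_timeout 15s;
}

server {
    listen 80;

    location / {
        proxy_pass http://tomcat_backend;

        # Upstream keepalive requires HTTP/1.1 and a cleared Connection header.
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```

With nginx-ingress-controller, nginx.conf is generated, so the same value would normally be set through the controller's ConfigMap (the `upstream-keepalive-timeout` key in ingress-nginx) rather than edited by hand. The exact key and its default should be checked against the documentation for the controller version in use.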