nginx错误分析 Connection reset by peer

nginx上下游针对请求处理的超时时间配置不合理,导致报connection reset by peer问题,即低频502,如图:
img.png

此类问题主要原因为,客户端在对上游长连接fd读写时,正好此fd被上游服务器关闭了,此时会报connection reset by peer,所以需要尽量避免上游服务器主动断开连接;

故障描述

根据tomcat官方文档说明,keepAliveTimeout默认等于connectionTimeout,我们这里配置的是20s。

参数 解释
keepAliveTimeout The number of milliseconds this Connector will wait for another HTTP request before closing the connection. The default value is to use the value that has been set for the connectionTimeout attribute. Use a value of -1 to indicate no (i.e. infinite) timeout.
maxKeepAliveRequests The maximum number of HTTP requests which can be pipelined until the connection is closed by the server. Setting this attribute to 1 will disable HTTP/1.0 keep-alive, as well as HTTP/1.1 keep-alive and pipelining. Setting this to -1 will allow an unlimited amount of pipelined or keep-alive HTTP requests. If not specified, this attribute is set to 100.
connectionTimeout The number of milliseconds this Connector will wait, after accepting a connection, for the request URI line to be presented. Use a value of -1 to indicate no (i.e. infinite) timeout. The default value is 60000 (i.e. 60 seconds) but note that the standard server.xml that ships with Tomcat sets this to 20000 (i.e. 20 seconds). Unless disableUploadTimeout is set to false, this timeout will also be used when reading the request body (if any).

tomcat官方文档 https://tomcat.apache.org/tomcat-7.0-doc/config/http.html
nginx(nginx-ingress-controller)中的配置

由于没有显示的配置,所以使用的是nginx的默认参数配置,默认是60s。

1
2
3
4
5
6
7
8
http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive_timeout            
Syntax: keepalive_timeout timeout;
Default:
keepalive_timeout 60s;
Context: upstream
This directive appeared in version 1.15.3.

Sets a timeout during which an idle keepalive connection to an upstream server will stay open.

解决

竟然有了大概的分析猜测,可以尝试调整nginx的keepalive timeout为15s(需要小于tomcat的超时时间),测试了之后,故障就这样得到了解决。

参考文章

评论