Google has indexed very little about this problem; only the following two links were of some help:
http://stackoverflow.com/questions/19995855/increase-max-open-files-for-
https://groups.google.com/g/scrapy-users/c/rArexq4tI7I
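The root cause is that my scrapyd runs as a service, which I set up that way to make it easier to manage and restart. A process started as a service does not inherit the limits of an interactive login shell, so once scrapyd is running it ignores the ulimit values configured for my user, and the usual advice online about raising the "open files" ulimit has no effect. For example, the open files limit on my system is already set high enough that, in principle, it should cope even with a highly concurrent Scrapy crawl: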
(boss) [chen@VM_0_2_centos product_1]$ ulimit -a
core file size          (blocks, -c) 4194304
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31143
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000001
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31143
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
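To see why a shell setting never reaches scrapyd, it helps to remember that resource limits are per process: each process receives a copy of its parent's limits when it is started, and changing them later in one shell does not affect processes launched elsewhere. The Python sketch below (Linux or another Unix, standard-library `resource` and `subprocess` only; the helper name `child_nofile` is hypothetical) starts a child interpreter with and without a lowered soft `RLIMIT_NOFILE` and reports what the child sees, while the parent's own limit stays unchanged.

```python
import resource
import subprocess
import sys

# Code run inside the child: print the child's own soft open-files limit.
CHILD_CODE = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"


def child_nofile(soft_limit=None):
    """Start a child Python process and return the soft RLIMIT_NOFILE it sees.

    If soft_limit is given, it is applied in the child just before exec,
    similar to running `ulimit -n` in a shell before starting a program.
    """
    def apply_limit():
        # Runs in the forked child only; the parent's limits are untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        preexec_fn=apply_limit if soft_limit is not None else None,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

For example, `child_nofile(256)` should return 256 while `resource.getrlimit(resource.RLIMIT_NOFILE)` in the parent stays the same (assuming the hard limit is at least 256). A daemon started by the service manager is in the same position as any process launched outside your shell: it only has the limits its own launcher gave it.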
Solution: set the open-file limit on the scrapyd process itself.
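Find the PID of the main scrapyd process (1011 in this example; it is the parent of the scrapyd.runner crawl jobs) and run the following command: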
sudo prlimit --pid 1011 --nofile=1000000:1000000
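The same change can be made from Python with the standard-library `resource.prlimit` function (Linux only, Python 3.4+), which wraps the Linux prlimit(2) system call. This is a minimal sketch; `set_nofile` and `get_nofile` are hypothetical helper names, not part of scrapyd.

```python
import resource


def set_nofile(pid, soft, hard):
    """Set RLIMIT_NOFILE of a running process, like `prlimit --pid PID --nofile=SOFT:HARD`.

    Returns the previous (soft, hard) pair. Raising the hard limit requires
    CAP_SYS_RESOURCE (typically root), which is why the command above uses sudo.
    A pid of 0 means the calling process.
    """
    return resource.prlimit(pid, resource.RLIMIT_NOFILE, (soft, hard))


def get_nofile(pid):
    """Read RLIMIT_NOFILE of a running process without changing it."""
    return resource.prlimit(pid, resource.RLIMIT_NOFILE)
```

For example, `set_nofile(1011, 1_000_000, 1_000_000)` run as root mirrors the prlimit command above and returns the limits scrapyd had before the change.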
To find the PID:
(boss) [chen@VM_0_2_centos product_1]$ ps -ef|grep scrapyd
chen 1011 1 1 Dec23 ? 00:39:50 /home/chen/anaconda3/envs/boss/bin/python /home/chen/anaconda3/envs/boss/bin/scrapyd
chen 4476 1011 19 Dec25 ? 00:23:58 /home/chen/anaconda3/envs/boss/bin/python -m scrapyd.runner crawl jd_comment -a _job=e78327f046c611eba6e5bd9b7c1c3b18
chen 10838 1011 23 Dec25 ? 01:41:02 /home/chen/anaconda3/envs/boss/bin/python -m scrapyd.runner crawl product_1 -a _job=45e566ae469b11eba6e5bd9b7c1c3b18
chen 14434 21260 0 01:38 pts/3 00:00:00 grep --color=auto scrapyd
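If you want to script the lookup, the sketch below parses `ps -ef` output (columns UID, PID, PPID, C, STIME, TTY, TIME, CMD) and returns the PID of the main scrapyd process plus the PIDs of its direct children, i.e. the `scrapyd.runner` crawl jobs. It assumes the master can be recognised by a command-line argument whose basename is `scrapyd`, as in the listing above; `find_scrapyd` is a hypothetical name.

```python
import os


def find_scrapyd(ps_output):
    """Return (master_pid, job_pids) parsed from `ps -ef` text.

    The master is the first process with an argument named `scrapyd`
    (e.g. .../bin/scrapyd); jobs are its direct children (PPID == master).
    """
    procs = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        pid, ppid, argv = int(fields[1]), int(fields[2]), fields[7].split()
        procs.append((pid, ppid, argv))

    master = None
    for pid, _, argv in procs:
        if os.path.basename(argv[0]) == "grep":
            continue  # ignore the `grep scrapyd` line itself
        if any(os.path.basename(arg) == "scrapyd" for arg in argv):
            master = pid
            break
    if master is None:
        return None, []

    jobs = [pid for pid, ppid, _ in procs if ppid == master]
    return master, jobs
```

In practice you would pass it `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`; for the listing above it returns `(1011, [4476, 10838])`.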
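Then check the limits of that process; "Max open files" should now show 1000000 for both the soft and the hard limit. Because the crawl jobs run as child processes of scrapyd (PPID 1011 in the ps output above), jobs launched after this change inherit the new limit, while jobs that were already running keep the limit they started with.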
(boss) [chen@VM_0_2_centos product_1]$ cat /proc/1011/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31143                31143                processes
Max open files            1000000              1000000              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31143                31143                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
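The check of `/proc/<pid>/limits` can also be scripted. The sketch below (Linux procfs; `parse_nofile` and `read_nofile` are hypothetical helper names) pulls the soft and hard values from the `Max open files` row, mapping `unlimited` to `resource.RLIM_INFINITY` so the result compares directly with `resource.getrlimit`.

```python
import resource


def parse_nofile(limits_text):
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits text."""
    def to_int(value):
        # procfs prints "unlimited" where getrlimit() would report RLIM_INFINITY.
        return resource.RLIM_INFINITY if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line[len("Max open files"):].split()[:2]
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' row found")


def read_nofile(pid):
    """Read the open-files limit of a running process (an int PID or 'self')."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_nofile(f.read())
```

Against the output above, `read_nofile(1011)` should return `(1000000, 1000000)`.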