Scrapy: refreshing the HTTP cache on retry

If Scrapy retries a request but you still never get usable data, the likely culprit is that you have HttpCache enabled.

This Stack Overflow post describes the situation clearly: "I crawl a webpage page by page, by following urls in pagination. In some pages, website detects that I use a bot and gives me an error in html. Since it is a successful request, it caches the page and when I run it again, I get the same error."

https://stackoverflow.com/questions/41743071/scrapy-how-to-remove-a-url-from-httpcache-or-prevent-adding-to-cache

Getting blocked by anti-bot measures is the most common problem in crawling. With HttpCache enabled, Scrapy caches requests and responses locally (in the .scrapy folder under the project directory) to speed up the crawl. The trouble is that when you are blocked and then retry, that cache entry is not updated, so the page you parse is still the blocked one and you never get the data. So you need to customize the cache middleware's behavior to refresh those stale entries, as sketched below.
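A minimal sketch of one way to do this: rather than rewriting HttpCacheMiddleware itself, subclass the cache policy it delegates to (scrapy.extensions.httpcache.DummyPolicy) so that responses which look like anti-bot pages are never written to the cache, and any already-cached blocked pages are treated as stale and re-downloaded. The class name RefreshBlockedPolicy and the BAN_MARKERS byte strings below are assumptions; replace the markers with whatever your target site actually returns when it blocks you.

```python
# policies.py -- hypothetical module; adjust the import path to your project.
from scrapy.extensions.httpcache import DummyPolicy


class RefreshBlockedPolicy(DummyPolicy):
    # Byte strings that only appear in the blocked/anti-bot page (assumed markers).
    BAN_MARKERS = [b"captcha", b"access denied"]

    def _looks_banned(self, response):
        body = (response.body or b"").lower()
        return any(marker in body for marker in self.BAN_MARKERS)

    def should_cache_response(self, response, request):
        # Never store a blocked page, so a later retry hits the site again
        # instead of replaying the cached error.
        if self._looks_banned(response):
            return False
        return super().should_cache_response(response, request)

    def is_cached_response_fresh(self, cachedresponse, request):
        # Treat an already-cached blocked page as stale, forcing a re-download.
        if self._looks_banned(cachedresponse):
            return False
        return super().is_cached_response_fresh(cachedresponse, request)

    def is_cached_response_valid(self, cachedresponse, response, request):
        # After the re-download, prefer the fresh response over the cached
        # blocked one so the spider actually parses the new page.
        if self._looks_banned(cachedresponse):
            return False
        return super().is_cached_response_valid(cachedresponse, response, request)
```

Then point the cache middleware at the policy in settings.py (the dotted path is an assumption based on where you put the class):

```python
HTTPCACHE_ENABLED = True
HTTPCACHE_POLICY = "myproject.policies.RefreshBlockedPolicy"
```

Note this sketch only keeps blocked pages out of the cache and forces them to be refetched; the old blocked entries still sit on disk until the site responds with a cacheable page again, at which point the fresh response replaces them on the next crawl.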