Robots.txt

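The Robots protocol (also called the crawler protocol or robot protocol) is formally known as the "Robots Exclusion Protocol". A website publishes a robots.txt file at its root to tell search engine crawlers which pages may be crawled and which may not. To block crawling of the entire site: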



User-agent: *
Disallow: /
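As a quick check of what these two lines do, here is a minimal sketch that feeds them to `urllib.robotparser` from the Python standard library. The helper name `is_allowed` and the test URL are assumptions made for illustration; they are not part of the original post.

```python
from urllib.robotparser import RobotFileParser

# The site-wide block from the post, as robots.txt content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Hypothetical helper: parses the rules from a string instead of
    downloading them, so the example runs offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Every crawler matches "*", so every path is disallowed.
    for agent in ("Googlebot", "Baiduspider"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "http://example.com/index.html"))
```

Note that robots.txt is advisory: a well-behaved crawler runs a check like this before fetching, but nothing in the file enforces it.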

Common User-agent values:

# robots.txt
User-agent: Baiduspider
User-agent: Sosospider
User-agent: sogou spider
User-agent: YodaoBot
User-agent: Googlebot
User-agent: Bingbot
User-agent: Slurp
User-agent: Teoma
User-agent: ia_archiver
User-agent: twiceler
User-agent: MSNBot
User-agent: Scrubby
User-agent: Robozilla
User-agent: Gigabot
User-agent: googlebot-image
User-agent: googlebot-mobile
User-agent: yahoo-mmcrawler
User-agent: psbot

Disallow specifies a path that crawlers must not fetch; "/" stands for all files.
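The User-agent names above can also be used to restrict only specific crawlers. The sketch below assumes a hypothetical rule set in which two named crawlers are kept out of a `/private/` directory while all others are unrestricted; the paths, URLs, and the `build_parser` helper are illustrative and do not come from the original post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: consecutive User-agent lines share the Disallow
# rule that follows them; an empty Disallow means "nothing is blocked".
GROUPED_RULES = """\
User-agent: Baiduspider
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content given as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(GROUPED_RULES)
    checks = [
        ("Baiduspider", "http://example.com/private/report.html"),
        ("Googlebot", "http://example.com/private/report.html"),
        ("Bingbot", "http://example.com/private/report.html"),
        ("Baiduspider", "http://example.com/index.html"),
    ]
    for agent, url in checks:
        print(f"{agent:12} {url} -> {rules.can_fetch(agent, url)}")
```

Matching is by path prefix, so `Disallow: /private/` covers everything under that directory, and a crawler that is not named in any group falls back to the `*` group.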


http://example.com/2017-11-27 robots-txt/

Author

csorz

Published

November 27, 2017
