小华生活 (Xiaohua's Life)
Words falling into the void

Blocking AI Spiders

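AI is a hot topic, but the AI systems on the web need content to be fed to them, which they then digest and train on through their own mechanisms. That creates a new problem: these AIs send spiders out everywhere to crawl all kinds of content, and a blog exposed on the internet cannot escape them either. Recently I came across a new project, https://github.com/ai-robots-txt/ai.robots.txt, which collects the User-Agent headers of the various AI crawlers so that we can block them. At the moment it lists: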
AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot
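
To make the idea concrete, here is a minimal Python sketch of what a User-Agent rule like this does: it checks whether the request's User-Agent header contains one of the listed names and rejects the request if so. This is not how 1Panel's WAF is implemented. The function names `is_ai_crawler` and `status_for` are made up for illustration, only a handful of names from the list are included, and case-insensitive matching is an assumption.

```python
import re

# A few names taken from the list above; extend with the rest as needed.
AI_BOT_NAMES = [
    "GPTBot",
    "ClaudeBot",
    "CCBot",
    "Bytespider",
    "PerplexityBot",
    "Kangaroo Bot",
]

# Escape each name so that regex metacharacters in it are matched literally,
# then join the names into one alternation, like the "|"-separated list above.
# Assumption: matching is case-insensitive.
AI_BOT_RE = re.compile(
    "|".join(re.escape(name) for name in AI_BOT_NAMES),
    re.IGNORECASE,
)


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a listed crawler name."""
    return AI_BOT_RE.search(user_agent or "") is not None


def status_for(user_agent: str) -> int:
    """Hypothetical helper: 403 for listed crawlers, 200 for everyone else."""
    return 403 if is_ai_crawler(user_agent) else 200
```

For example, `status_for("Mozilla/5.0 (compatible; GPTBot/1.2)")` gives 403, while an ordinary browser User-Agent gives 200. Note that this only catches crawlers that identify themselves honestly.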

I use the 1Panel panel, where a rule can be created in the WAF to block them.

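I can already see the blocking taking effect, and the rest behave much the same. How to put it: on the internet, measures like this are only a way of comforting yourself, but you do have to comfort yourself.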
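
For setups without a WAF, the same list of names can also go into a robots.txt file, which is what the project's name refers to. The sketch below builds such a file in Python. The function name `build_robots_txt` is hypothetical, and the output uses the standard robots.txt form of several `User-agent` lines sharing one `Disallow: /` rule. Keep in mind that robots.txt is purely advisory: a crawler has to choose to obey it.

```python
def build_robots_txt(names):
    """Build a robots.txt group that asks the named crawlers to stay away.

    Several User-agent lines may share the rules that follow them,
    so a single "Disallow: /" covers every name in the list.
    """
    lines = [f"User-agent: {name}" for name in names]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A few names from the list above, used here only as sample input.
    print(build_robots_txt(["GPTBot", "ClaudeBot", "CCBot"]), end="")
```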

Copyright notice: All articles on this site come from 小华生活 (https://blog.imhua.com). Comments are welcome.
Article title: "Blocking AI Spiders"
Article link: https://blog.imhua.com/2025/564.html
