chrome posted on 2020-12-15 21:53:16

Scraping the 理想生活实验室 (toodaylab) post list with the Chrome scraper extension Webscraper


1. Data fields

Article title
Article detail link
Article excerpt
Author
Publication date
Category
Tags
Location
Content

2. Sample result screenshot



3. Sitemap JSON

{"_id":"toodaylab","startUrl":["https://www.toodaylab.com/posts/page/"],"selectors":[{"id":"element","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.single-post","multiple":true,"delay":0},{"id":"title","type":"SelectorLink","parentSelectors":["element"],"selector":".title a","multiple":false,"delay":0},{"id":"intro","type":"SelectorText","parentSelectors":["element"],"selector":"p.excerpt","multiple":false,"regex":"","delay":0},{"id":"author","type":"SelectorText","parentSelectors":["element"],"selector":".left-infos a","multiple":false,"regex":"","delay":0},{"id":"date","type":"SelectorText","parentSelectors":["element"],"selector":".left-infos p","multiple":false,"regex":"(?<=(//)).*","delay":0},{"id":"category","type":"SelectorText","parentSelectors":["element"],"selector":"p:nth-of-type(3) a","multiple":false,"regex":"","delay":0},{"id":"tag","type":"SelectorText","parentSelectors":["element"],"selector":"p:nth-of-type(2) a","multiple":false,"regex":"","delay":0},{"id":"location","type":"SelectorText","parentSelectors":["element"],"selector":".right-infos p:nth-of-type(1) a","multiple":false,"regex":"","delay":0},{"id":"content","type":"SelectorText","parentSelectors":["title"],"selector":"div.post-content","multiple":false,"regex":"","delay":0}]}
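To check how the selectors nest before importing the sitemap, a small Python sketch like the one below may help. It parses the same JSON (line breaks added only for readability), prints the parent/child tree built from each selector's `parentSelectors`, and tries the `date` regex on a sample string. The helper names (`children_by_parent`, `print_tree`) and the sample text `"Some Author // 2020-12-15"` are assumptions made for illustration; the real text inside `.left-infos p` may look different. Python's `re` is also only a stand-in for the JavaScript regex engine the extension runs in.

```python
import json
import re

# Sitemap copied from section 3; line breaks were added for readability only.
SITEMAP_JSON = r'''
{"_id":"toodaylab","startUrl":["https://www.toodaylab.com/posts/page/"],"selectors":[
{"id":"element","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.single-post","multiple":true,"delay":0},
{"id":"title","type":"SelectorLink","parentSelectors":["element"],"selector":".title a","multiple":false,"delay":0},
{"id":"intro","type":"SelectorText","parentSelectors":["element"],"selector":"p.excerpt","multiple":false,"regex":"","delay":0},
{"id":"author","type":"SelectorText","parentSelectors":["element"],"selector":".left-infos a","multiple":false,"regex":"","delay":0},
{"id":"date","type":"SelectorText","parentSelectors":["element"],"selector":".left-infos p","multiple":false,"regex":"(?<=(//)).*","delay":0},
{"id":"category","type":"SelectorText","parentSelectors":["element"],"selector":"p:nth-of-type(3) a","multiple":false,"regex":"","delay":0},
{"id":"tag","type":"SelectorText","parentSelectors":["element"],"selector":"p:nth-of-type(2) a","multiple":false,"regex":"","delay":0},
{"id":"location","type":"SelectorText","parentSelectors":["element"],"selector":".right-infos p:nth-of-type(1) a","multiple":false,"regex":"","delay":0},
{"id":"content","type":"SelectorText","parentSelectors":["title"],"selector":"div.post-content","multiple":false,"regex":"","delay":0}
]}
'''


def children_by_parent(sitemap):
    """Group selector ids under each parent id listed in parentSelectors."""
    tree = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            tree.setdefault(parent, []).append(sel["id"])
    return tree


def print_tree(tree, node="_root", depth=0):
    """Print the selector hierarchy, indenting each level by two spaces."""
    print("  " * depth + node)
    for child in tree.get(node, []):
        print_tree(tree, child, depth + 1)


sitemap = json.loads(SITEMAP_JSON)
tree = children_by_parent(sitemap)
print_tree(tree)

# Try the "date" regex on a HYPOTHETICAL sample of the .left-infos text.
date_regex = next(s["regex"] for s in sitemap["selectors"] if s["id"] == "date")
sample_text = "Some Author // 2020-12-15"  # assumed format, not taken from the site
match = re.search(date_regex, sample_text)
extracted_date = match.group(0).strip() if match else None
print(extracted_date)
```

The tree shows that `content` hangs under the `title` link selector, so the post body is read from the detail page that each title link opens, while the other fields come from the list page itself.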

Author: iWebscraper