A simple scraper, built for scraping recipes but not limited to them.
It uses puppeteer to crawl with an actual browser and fires events that report the status of the crawl.
How to configure a crawler:
crawl
  startUrl: where the crawl should start
  linkExtractors: array of link extractors, each with
    linkExtraction: CSS selector for the links to follow
    shouldExtract: function that takes the page URL and decides whether to use this extractor
  detailExtractor
    details: key-value map; each key names a piece of extracted data, and its value is a CSS selector for the text to extract under that key
    shouldExtract: function that takes the page URL and decides whether to extract data
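
As a rough sketch of how the crawl options above might fit together, here is a plain object for an imaginary recipe site. The URL, the selectors, and the nesting under a top-level `crawl` key are assumptions made for illustration; only the option names come from the list above.

```javascript
// Hypothetical crawl configuration; example.com and all selectors are placeholders.
const config = {
  crawl: {
    startUrl: 'https://example.com/recipes',
    linkExtractors: [
      {
        // CSS selector for the links to follow
        linkExtraction: 'a.recipe-link',
        // Use this extractor only on listing pages
        shouldExtract: (url) => url.includes('/recipes'),
      },
    ],
    detailExtractor: {
      // Output key -> CSS selector whose text is extracted under that key
      details: {
        title: 'h1.recipe-title',
        ingredients: 'ul.ingredients',
      },
      // Extract data only on individual recipe pages
      shouldExtract: (url) => /\/recipe\/[^/]+$/.test(url),
    },
  },
};
```

Because both `shouldExtract` functions receive only the page URL, they can be checked without a browser, for example `config.crawl.detailExtractor.shouldExtract('https://example.com/recipe/pancakes')`.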
crawler
  launchConfig
    headless: boolean; whether the browser is shown or runs headless
  politeness: number; milliseconds to wait between pages (defaults to 1000, i.e. one second)
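
To make the politeness setting concrete, the sketch below shows one way a delay between pages could be applied. `withCrawlerDefaults`, `sleep`, and `visitPolitely` are hypothetical helpers written for this example, not functions of this package; only the option names and the 1000 ms default come from the text above.

```javascript
// Hypothetical helper: fills in the documented politeness default of 1000 ms.
function withCrawlerDefaults(crawler = {}) {
  return {
    launchConfig: { ...crawler.launchConfig },
    politeness: crawler.politeness ?? 1000,
  };
}

// Resolves after `ms` milliseconds; used to pause between page visits.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visits each URL in turn, waiting `politeness` ms between pages.
// `visit` stands in for whatever loads and processes a page.
async function visitPolitely(urls, visit, politeness) {
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(politeness);
    await visit(urls[i]);
  }
}

// In puppeteer, `headless: true` means no visible browser window
// (assuming launchConfig is handed to puppeteer's launch call).
const crawlerOptions = withCrawlerDefaults({ launchConfig: { headless: true } });
```

With no `politeness` given, `crawlerOptions.politeness` falls back to 1000, so the loop would wait one second between pages.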
Events fired during a crawl:
  data: fires with a record of extracted data
  start: fires when the crawl starts, with the start URL and the date
  crawled: fires with the URL of the crawled page
  info: fires with a tag and a message; miscellaneous information
  error: fires when something unexpected happens
  finish: fires at the end, with all crawled URLs and the date finished
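
The events above could be consumed roughly as follows. This sketch uses a plain Node.js `EventEmitter` as a stand-in for the crawler, so it runs without a browser. The payload shapes (an object with `startUrl` and `date`, separate `tag` and `message` arguments, and so on) are assumptions, since the list above names the information each event carries but not its exact form.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the crawler: a plain EventEmitter, used only to show listener wiring.
const emitter = new EventEmitter();
const log = [];

// Payload shapes below are assumed for illustration.
emitter.on('start', ({ startUrl }) => log.push(`start ${startUrl}`));
emitter.on('crawled', (url) => log.push(`crawled ${url}`));
emitter.on('data', (record) => log.push(`data ${JSON.stringify(record)}`));
emitter.on('info', (tag, message) => log.push(`info [${tag}] ${message}`));
emitter.on('error', (err) => log.push(`error ${err.message}`));
emitter.on('finish', ({ urls }) => log.push(`finish ${urls.length} urls`));

// Simulated crawl of a single page.
emitter.emit('start', { startUrl: 'https://example.com/recipes', date: new Date() });
emitter.emit('crawled', 'https://example.com/recipes');
emitter.emit('data', { title: 'Pancakes' });
emitter.emit('info', 'queue', '1 link found');
emitter.emit('error', new Error('timeout'));
emitter.emit('finish', { urls: ['https://example.com/recipes'], date: new Date() });
```

If the crawler is built on Node's `EventEmitter`, registering an `error` listener matters: an `error` event with no listener is thrown as an exception.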