Skip to content

A simple scraper uses puppeteer to scrape recipes and more from the web

Notifications You must be signed in to change notification settings

seanowenhayes/recipe-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Recipe Scraper

A simple scraper made for scraping recipes but not limited to such.

Idea

Use puppeteer to crawl using an actual browser and have events that inform of the status of the call.

Config

The way to configure a crawler.

  • crawl
    • startUrl where the crawl should start
    • linkExtractors array
      • linkExtraction css selector for links to follow
      • shouldExtract function taking the page url deciding if to use this extractor or not
    • detailExtractor
      • details key value map key is key of data extarcted and the value is a css selector to get the text to be extracted by that key
      • shouldExtract takes the page url and decides if to extract data or not
  • crawler
    • launchConfig
      • headless boolean - show brawser or not
    • politeness number - milliseconds to wait between each page
      • defaults to 1000 aka a second

Events

  • data fires witha record of data extracted.
  • start fires when crawl starts with start url and the date
  • crawled fires with the url of the crawled page
  • info tag and message, just misscelaneous info
  • error fires when something unexpected happens
  • finish fires at the end with all crawled urls and the date finished

About

A simple scraper uses puppeteer to scrape recipes and more from the web

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published