Using SCE
Run:
docker pull registry.gitlab.com/sparkler-crawl-environment/sparkler/sparkler:memex-dd
This step is optional, but pulling the image ahead of time speeds up your first run.
To create a new model, click on the Models button in the toolbar. Then go to New Model, enter your model name and click Create Model.
Enter your search term in the Search terms box on the left and press Go.
The search may take a few minutes because, under the hood, it's rendering each website and creating a screenshot.
To check that it's running, you can look at the log output from the API container or the Splash container, or run top (or a similar tool) to confirm that the CPU load is reasonably high.
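As a rough sketch of the log check described above, the helper below narrows a list of container names to the ones you probably want to inspect. The function name `filter_sce_containers` is hypothetical, and the patterns `api` and `splash` are assumptions about how the containers are named in your deployment; adjust them to match what `docker ps` actually shows.

```shell
# Hypothetical helper: keep only container names that look like the API or
# Splash containers. The "api" and "splash" patterns are assumptions, not
# names confirmed by the project.
filter_sce_containers() {
  grep -i -E 'api|splash'
}

# Usage (requires a running Docker daemon):
#   docker ps --format '{{.Names}}' | filter_sce_containers
# Then follow the logs of one of the listed containers:
#   docker logs --tail 50 -f <container-name>
```

If the logs keep scrolling and top shows sustained CPU use, the search is most likely still rendering pages.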
Eventually, the rendered screenshot previews will appear on the page.
From here, you can mark each preview as Highly Relevant, Relevant or Not Relevant. Once you are happy with your selection, press the Update Model button.
To upload seed URLs, select the Paste Seed URLs button and then insert your URLs, one URL per row. Press Save and it will update the index with the URLs you've requested.
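Since each row must hold exactly one URL, it can help to tidy a seed list before pasting it. The following is a minimal sketch, not part of SCE itself: `clean_seeds` is a hypothetical helper that trims whitespace, drops blank or non-HTTP(S) lines, and removes duplicates while keeping the original order.

```shell
# Hypothetical helper: read a seed list on stdin and print one clean URL per line.
clean_seeds() {
  tr -d '\r' |                                              # drop Windows line endings
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |     # trim surrounding whitespace
    grep -E '^https?://[^[:space:]]+$' |                    # keep only single http(s) URLs
    awk '!seen[$0]++'                                       # remove duplicates, keep order
}

# Usage (seeds.txt is an assumed file name):
#   clean_seeds < seeds.txt
```

The cleaned output can then be pasted directly into the seed URL box.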
Finally, you can run a Crawl by pressing the Start Crawler button. The duration of the crawl depends on how many pages it's attempting to index. You can run more crawls by pressing the button once more after it has completed a crawl.
You can also kill a crawl by pressing the Kill Crawl button.