Writing Data to Elasticsearch Storage Engine #224

Kefaun2601 · 2021-03-24T16:05:07Z

Task Description

This task adds Elasticsearch as a backend storage engine option for Sparkler. It builds on the Factory Pattern outlined in Issue 218, which abstracts out the storage engine-specific implementation.
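
For readers who have not seen Issue 218, here is a rough sketch of what a storage-engine factory could look like. This is not Sparkler's actual code: apart from `commitCrawlDb()`, which is mentioned later in this thread, every name here (`StorageProxy`, `StorageProxyFactory`, `engineName`, `commitCount`, the engine keys) is a hypothetical placeholder, and the proxies are in-memory stand-ins rather than real Solr or Elasticsearch clients.

```java
import java.util.Locale;

// Hypothetical abstraction. Only commitCrawlDb() is a name that appears in this thread.
interface StorageProxy {
    String engineName();
    void commitCrawlDb();
    int commitCount();
}

// In-memory stand-in: a real proxy would flush pending writes to its engine here.
abstract class InMemoryProxy implements StorageProxy {
    private int commits = 0;

    @Override
    public void commitCrawlDb() {
        commits++;
    }

    @Override
    public int commitCount() {
        return commits;
    }
}

final class SolrStorageProxy extends InMemoryProxy {
    @Override
    public String engineName() {
        return "solr";
    }
}

final class ElasticsearchStorageProxy extends InMemoryProxy {
    @Override
    public String engineName() {
        return "elasticsearch";
    }
}

// The factory is the only place that knows which concrete engine is in use.
final class StorageProxyFactory {
    private StorageProxyFactory() {}

    static StorageProxy create(String engine) {
        switch (engine.toLowerCase(Locale.ROOT)) {
            case "solr":
                return new SolrStorageProxy();
            case "elasticsearch":
                return new ElasticsearchStorageProxy();
            default:
                throw new IllegalArgumentException("Unknown storage engine: " + engine);
        }
    }
}
```

With a layout like this, calling code would only ever hold a `StorageProxy` obtained from `StorageProxyFactory.create(...)`, so switching engines becomes a configuration change rather than a code change.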

To reach the final goal of writing Sparkler data into the Elasticsearch storage engine, the team expects to follow these steps:

1. Make sure the Elasticsearch storage engine is set up appropriately and ready to accept data
2. Write simple data to Elasticsearch
   a. Perhaps a simple visualization to prove functionality
3. Reorganize Sparkler data into a format conducive to Elasticsearch indexing
4. Write data into Elasticsearch
5. Visualize data in Elasticsearch (this will likely be brought up in a future issue)
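
As one possible illustration of the "reorganize, then write" steps, the sketch below turns a flat record into an entry for Elasticsearch's `_bulk` endpoint, which expects newline-delimited JSON: an action line followed by the document source, each terminated by a newline. The index name `crawldb`, the field names, and the class and method names are assumptions made up for this example, not Sparkler's schema. A real implementation would more likely use a JSON library and an Elasticsearch client than hand-built strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds one entry of a _bulk request body (NDJSON).
final class BulkBodySketch {
    private BulkBodySketch() {}

    // Minimal JSON string quoting: escapes quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Serializes a flat string-to-string record as a single-line JSON object.
    static String toJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return sb.append('}').toString();
    }

    // Action line plus source line; the bulk format requires each to end with a newline.
    static String bulkIndexEntry(String index, String id, Map<String, String> doc) {
        String action = "{\"index\":{\"_index\":" + quote(index) + ",\"_id\":" + quote(id) + "}}";
        return action + "\n" + toJson(doc) + "\n";
    }
}
```

Entries built this way can be concatenated and sent in a single `POST /_bulk` request with the `Content-Type: application/x-ndjson` header.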

This is a WIP and updates will be posted here as we make progress.

slhsxcmy · 2021-03-26T19:00:14Z

@thammegowda @buggtb @lewismc We had a few questions about Crawler.scala while adding Elasticsearch:

1. How is the deep crawl different from a "normal" crawl? Is it correct that the deep crawl runs only when the -dc flag is enabled, while the normal crawl always runs?
2. What does the FairFetcher class do? Do we need to understand it, given that FairFetcher is not specific to Solr?
3. Why is "storageProxy.commitCrawlDb()" called before the crawl, after the deep crawl, and again after the normal crawl?

slhsxcmy mentioned this issue Mar 24, 2021

Writing Data to Elasticsearch Storage Engine #225

Merged
