Writing Data to Elasticsearch Storage Engine #224

Kefaun2601 opened this issue Mar 24, 2021 · 1 comment

Comments

@Kefaun2601
Contributor

Task Description

This task, currently in progress, will add Elasticsearch as a backend storage engine option for Sparkler. It builds on the factory pattern outlined in issue #218, which abstracts out storage-engine-specific implementations.
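
A minimal sketch of that factory idea follows. It is written in Python for brevity, even though Sparkler's crawler code (e.g. Crawler.scala) is Scala. Every name here (`StorageProxy`, `SolrProxy`, `ElasticsearchProxy`, `create_storage_proxy`, `add_resources`) is hypothetical; `commit_crawl_db` only mirrors the `storageProxy.commitCrawlDb()` call mentioned later in this thread. The engine-specific writes are stubbed out rather than talking to a real Solr or Elasticsearch instance.

```python
from abc import ABC, abstractmethod


class StorageProxy(ABC):
    """Hypothetical engine-agnostic interface that crawler code would call."""

    def __init__(self, url):
        self.url = url
        self.pending = []  # records staged but not yet committed

    def add_resources(self, docs):
        """Stage crawl records; nothing is written until commit_crawl_db()."""
        self.pending.extend(docs)

    def commit_crawl_db(self):
        """Write staged records to the backend and return how many were written."""
        count = self._write(self.pending)
        self.pending.clear()
        return count

    @abstractmethod
    def _write(self, docs):
        """Engine-specific write; each backend overrides only this method."""


class SolrProxy(StorageProxy):
    def _write(self, docs):
        # Stub: a real version would send docs to Solr's update handler.
        return len(docs)


class ElasticsearchProxy(StorageProxy):
    def _write(self, docs):
        # Stub: a real version would send docs through Elasticsearch's _bulk API.
        return len(docs)


# Hypothetical registry mapping a configured engine name to its proxy class.
_ENGINES = {"solr": SolrProxy, "elasticsearch": ElasticsearchProxy}


def create_storage_proxy(engine, url):
    """Factory: choose the backend from a configuration value."""
    try:
        proxy_cls = _ENGINES[engine.lower()]
    except KeyError:
        raise ValueError(f"unsupported storage engine: {engine!r}") from None
    return proxy_cls(url)


if __name__ == "__main__":
    proxy = create_storage_proxy("elasticsearch", "http://localhost:9200")
    proxy.add_resources([{"url": "http://example.com/"}])
    print(type(proxy).__name__, proxy.commit_crawl_db())
```

The point of the design is that the crawler only ever sees `StorageProxy`, so adding Elasticsearch means adding one subclass and one registry entry rather than touching the crawl logic.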

To reach the final goal of writing Sparkler data into the Elasticsearch storage engine, the team expects to follow these steps:

  1. Set up the Elasticsearch storage engine and verify that it is ready to accept data
  2. Write simple test data to Elasticsearch
    a. Possibly add a simple visualization to demonstrate that this works
  3. Reorganize Sparkler data into a format suited to Elasticsearch indexing
  4. Write the reorganized Sparkler data into Elasticsearch
  5. Visualize the data in Elasticsearch (this will likely be raised in a future issue)
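
For steps 3 and 4, one option is Elasticsearch's `_bulk` API, which accepts newline-delimited JSON: for each document, an action line naming the target index and document id, followed by a line with the document itself, and the whole body must end with a newline. The sketch below assumes a hypothetical flat crawl record with `id`, `url`, `status`, and `depth` keys and a hypothetical index name `crawldb`; Sparkler's real schema will differ.

```python
import json


def to_es_doc(record):
    """Map a hypothetical crawl record onto an Elasticsearch document.

    The field names are illustrative only, not Sparkler's actual schema.
    """
    return {
        "url": record["url"],
        "status": record.get("status"),
        "depth": int(record.get("depth", 0)),
    }


def to_bulk_ndjson(records, index="crawldb"):
    """Build a request body for Elasticsearch's _bulk endpoint.

    Each record becomes two lines: an "index" action line carrying the target
    index and document id, then the document source. The body ends with a
    trailing newline, as the _bulk API requires.
    """
    lines = []
    for record in records:
        action = {"index": {"_index": index, "_id": record["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(to_es_doc(record)))
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    sample = [{"id": "a1", "url": "http://example.com/", "status": "FETCHED", "depth": 1}]
    print(to_bulk_ndjson(sample), end="")
```

The resulting body could then be sent with something like `curl -X POST 'http://localhost:9200/_bulk' -H 'Content-Type: application/x-ndjson' --data-binary @crawldb.ndjson`, assuming a local cluster on the default port 9200. `--data-binary` matters here because plain `-d` strips the newlines the bulk format depends on.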

This is a work in progress; updates will be posted here as we make progress.

@slhsxcmy
Contributor

@thammegowda @buggtb @lewismc We have a few questions about Crawler.scala that came up while adding Elasticsearch support:

  1. How does the deep crawl differ from a "normal" crawl? Is it right that we only run the deep crawl when the -dc flag is enabled, but always run the normal crawl?
  2. What does the FairFetcher class do? Do we need to understand it, given that FairFetcher is not specific to Solr?
  3. Why is "storageProxy.commitCrawlDb()" called before the crawl, after the deep crawl, and again after the normal crawl?
