Skip to content

Latest commit

 

History

History
86 lines (53 loc) · 4 KB

faq.adoc

File metadata and controls

86 lines (53 loc) · 4 KB
layout permalink
vision
/faq

_guides/attributes.adoc = Frequently Asked Questions

What is your license ?

{project-name} is an Open Source project licensed under the [Apache License version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

Where can I get it ?

{project-name} is published in Maven Central, check out [which extensions](/extensions) you need and just import them in your pom.xml to get {project-name}. We recommend you start your {project-name} experience via our [Getting Started guides](/get-started).

{project-name} is beta ?

Yes, we consider {project-name} beta. However 95% of the features {project-name} apps use are running in production

What is a {project-name} extension ?

One of the main goals of {project-name} is ease of extensibility and to build a vibrant ecosystem. Think of {project-name} extensions as

I already use ELK, why would I need to use {project-name} ?

Well, at first one could say that that both stacks are overlapping, but the real purpose of the {project-name} framework is the abstraction of scalability of log aggregation. If you already have an ELK stack you’ll likely want to make it scale (without pain) in both volume and features ways. {project-name} will be used for this purpose as an EOM (Event Oriented Middleware) based on Kafka & Spark, where you can plug advanced features with ease. So you just have to route your logs from the Logstash (or Flume, or Collectd, …​) agents to Kafka topics and launch parsers and processors.

Do I need Hadoop to play with {project-name} ?

No, if your goal is simply to aggregate a massive amount of logs in an Elasticsearch cluster, and to define complex event processing rules to generate new events you definitely don’t need an Hadoop cluster. Kafka topics can be used as an high throughput log buffer for sliding-windows event processing. But if you need advanced batch analytics, it’s really easy to dump your logs into an hadoop cluster to build machine learning models. Docker container is the easiest way to setup a {project-name} job.

How do I make it scale ?

{project-name} is made for scalability, it relies on Spark and Kafka which are both scalable by essence, to scale {project-name} just have to add more kafka brokers and more Spark slaves. This is the manual way, but we’ve planned in further releases to provide auto-scaling either Docker Swarn support or Mesos Marathon.

What’s the difference between Apache NIFI and {project-name} ?

Apache NIFI is a powerful ETL very well suited to process incoming data such as logs file, process & enrich them and send them out to any datastore. You can do that as well with {project-name} but {project-name} is an event oriented framework designed to process huge amount of events in a Complex Event Processing manner not a Single Event Processing as NIFI does. {project-name} is not an ETL or a DataFlow, the main goal is to extract information from realtime data. Anyway you can use Apache NIFI to process your logs and send them to Kafka in order to be processed by {project-name}

Error : realpath not found

If you don’t have the realpath command on you system you may need to install it

brew install coreutils
sudo apt-get install coreutils
How to deploy {project-name} in an Hadoop cluster ?

When it comes to scale, you’ll need a cluster. {project-name} is just a framework that facilitates running sparks jobs over Kafka topics so if you already have a cluster you just have to get the latest {project-name} binaries and unzip them to a edge node of your hadoop cluster.

git clone [email protected]:Hurence/{project-name}.git
cd {project-name}-0.9.5
mvn clean install -DskipTests
# output file is logisland-assembly/target/logisland-*-bin.tar.gz
how to handle long running jobs ?

Please read this excellent article on spark long running job setup