BigData complex event processing middleware for log mining (based on Kafka & Spark)


lhubert/log-island

 
 


Log Island


LogIsland is an event mining platform based on Spark and Kafka, designed to handle large volumes of log files.

log-island architecture

You can start playing with LogIsland right away through the Docker image by following the getting started guide.

The documentation also explains how to build the source code in order to implement your own plugins.

Once you know how to run and build your own parsers and processors, you'll want to deploy and scale them.

Basic Workflow

  1. Raw log files are sent to Kafka topics by a NiFi / Logstash / Flume / Collectd (or any other) agent
  2. Logs in a Kafka topic are translated into Events and pushed back to another Kafka topic by a Spark streaming job
  3. Events in a Kafka topic are sent to Elasticsearch (or Solr, or any other backend) by a Spark streaming job for online analytics (Kibana or Banana)
  4. Log topics can also be dumped to HDFS (master dataset) for offline analytics
  5. Event processors perform time-window-based analytics on events to build new events
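
To make step 5 more concrete, here is a minimal sketch of one kind of time-window analytic: counting events per fixed (tumbling) window. It is plain Java rather than a Spark streaming job, and the class name `WindowCountSketch`, the method `countPerWindow`, and the choice of a simple count are all assumptions made for illustration; they are not part of LogIsland.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not a LogIsland event processor.
class WindowCountSketch {

    // Groups event timestamps (epoch milliseconds) into fixed-size windows
    // and returns a map from window start to number of events in that window.
    static Map<Long, Integer> countPerWindow(long[] timestampsMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            // Round the timestamp down to the start of its window.
            long windowStart = Math.floorDiv(ts, windowMs) * windowMs;
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }
}
```

Each entry of the returned map could then be emitted as a new aggregate event, for example "N requests during the window starting at T".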

Start a log parser

A log parser takes a log line as a String and computes an Event as a sequence of fields. Let's start a LogParser streaming job with a custom ApacheLogParser. This stream processes log entries as soon as they are queued in the li-apache-logs Kafka topic. Each log is parsed into an event, which is pushed back to Kafka in the li-apache-event topic.

$LOGISLAND_HOME/bin/log-parser \
    --kafka-brokers sandbox:9092 \
    --input-topics li-apache-logs \
    --output-topics li-apache-event \
    --max-rate-per-partition 10000 \
    --log-parser com.hurence.logisland.plugin.apache.ApacheLogParser
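
The idea of turning a log line into a sequence of fields can be sketched as follows. This is a standalone illustration that assumes the Apache Common Log Format; the class `ApacheLineSketch`, its `parse` method, and the field names are hypothetical and do not reproduce the actual `ApacheLogParser` plugin or the LogIsland Event API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the ApacheLogParser shipped with LogIsland.
class ApacheLineSketch {

    // Apache Common Log Format: host ident user [time] "request" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)");

    // Parses one log line into an ordered map of field name to raw string value.
    static Map<String, String> parse(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a Common Log Format line: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("host", m.group(1));
        fields.put("user", m.group(3));
        fields.put("timestamp", m.group(4));
        fields.put("request", m.group(5));
        fields.put("status", m.group(6));
        fields.put("bytes", m.group(7));
        return fields;
    }
}
```

A real parser would also convert the timestamp and numeric fields to typed values and decide how to handle lines that do not match.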

Start an event mapper

An event mapper takes an event and serializes it as an Elasticsearch document. Let's start an EventIndexer with a custom mapper. This stream processes events as soon as they are queued in the li-apache-event Kafka topic. Each event is sent to Elasticsearch in bulk.

$LOGISLAND_HOME/bin/event-indexer \
    --kafka-brokers sandbox:9092 \
    --es-host sandbox \
    --index-name li-apache \
    --input-topics li-apache-event \
    --max-rate-per-partition 10000 \
    --event-mapper com.hurence.logisland.plugin.apache.ApacheEventMapper
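
Since Elasticsearch documents are JSON, the serialization step can be sketched as a function from an event's fields to a JSON object string. The class `EventJsonSketch` and its methods are hypothetical, treat every field as a string, and do not reproduce the actual `ApacheEventMapper`; a real mapper would more likely rely on a JSON library or the Elasticsearch client.

```java
import java.util.Map;

// Hypothetical illustration only; not the ApacheEventMapper shipped with LogIsland.
class EventJsonSketch {

    // Escapes the characters that must not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-hex-digit escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Serializes the fields as a flat JSON object, keeping the map's iteration order.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }
}
```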

Start an event processor

//TODO

