Welcome to STREAMS!
Streams is a highly available, fast, low-resource, real-time log collection framework for terabytes of data.
Versions 0.2.0 and below are available from the Downloads link; for more up-to-date versions please use SourceForge:
http://sourceforge.net/projects/bigstreams/files/
- Kafka Log Collection
- Setup and Installation
- User Guides
- Streams and Data Integrity
- Production Operational Checks
- Java
- File Errors Recovery
Streams' main aims are to:
- Provide high availability for big data log import
- Maintain data correctness
- Scale to terabytes of data per day
- Integrate with Hadoop for importing data into HDFS
Streams is inspired by Chukwa, an Apache Hadoop project for importing Hadoop log data to monitor clusters. Streams aims to provide support for collecting application log data, i.e. not debug information but application logs such as ad-server logs, transactional logs for banking, etc.
These logs cannot afford any data loss, data corruption, or row duplication. Files normally total terabytes spread across a cluster of servers. Streams is used to import this data to a smaller cluster of 2-3 collector machines, and then import the compressed collector data into HDFS.
Collected logs are partitioned by date, hour, and size, allowing administrators to specify the chunk sizes of collected logs. For example, say we have log type A and want to use it on a Hadoop cluster with a 128MB block size. Streams can import all logs of type A partitioned by daydate and hour, in chunks of roughly 128MB each. This makes the files easier to process in MapReduce and allows non-splittable compression formats to be used.
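As a rough illustration of this date/hour/size chunking, here is a minimal Java sketch. The class name, directory layout, and file-naming scheme are hypothetical (this is not the actual Streams API): it writes log lines under type/daydate/hour directories and rolls to a new gzip chunk once roughly 128MB has been written.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not the actual Streams API.
public class ChunkedLogWriter {
    private static final long CHUNK_SIZE = 128L * 1024 * 1024; // target ~128MB (HDFS block size)

    private final Path baseDir;
    private final String logType;
    private OutputStream out;
    private long bytesWritten; // uncompressed bytes; a real collector would track compressed size
    private int chunkIndex;

    public ChunkedLogWriter(Path baseDir, String logType) {
        this.baseDir = baseDir;
        this.logType = logType;
    }

    // Writes one log line, rolling to a new chunk once the size target is reached.
    public synchronized void write(String line) throws IOException {
        byte[] data = (line + "\n").getBytes(StandardCharsets.UTF_8);
        if (out == null || bytesWritten + data.length > CHUNK_SIZE) {
            roll();
        }
        out.write(data);
        bytesWritten += data.length;
    }

    // Closes the current chunk and opens the next, partitioned by daydate and hour.
    private void roll() throws IOException {
        close();
        LocalDateTime now = LocalDateTime.now();
        String dayDate = now.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
        String hour = now.format(DateTimeFormatter.ofPattern("HH"));
        Path dir = baseDir.resolve(logType).resolve(dayDate).resolve(hour);
        Files.createDirectories(dir);
        // e.g. typeA/2011-06-01/13/typeA.2011-06-01.13.0.gz
        Path chunk = dir.resolve(String.format("%s.%s.%s.%d.gz",
                logType, dayDate, hour, chunkIndex++));
        // gzip is not splittable, but block-sized chunks keep each file one MapReduce input split
        out = new GZIPOutputStream(Files.newOutputStream(chunk, StandardOpenOption.CREATE_NEW));
        bytesWritten = 0;
    }

    public synchronized void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }

    public static void main(String[] args) throws IOException {
        ChunkedLogWriter writer = new ChunkedLogWriter(Paths.get("/tmp/streams-demo"), "typeA");
        writer.write("example log line");
        writer.close();
    }
}
```

Note the sketch counts uncompressed bytes, so the compressed chunks land somewhat under the 128MB target; that keeps each file within one block and one input split, which is what matters for MapReduce.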
Email: [email protected]
Twitter: @gerrit_jvv
Distributed under the Eclipse Public License, version 1.0.