diff --git a/README.rst b/README.rst
index f91d1cf8c..9ffdad01e 100755
--- a/README.rst
+++ b/README.rst
@@ -16,6 +16,8 @@ chat with us on `gitter `_

 **LogIsland is an event mining scalable platform designed to handle a high throughput of events.**

+It is heavily inspired by DataFlow programming tools such as Apache NiFi, but with a highly scalable architecture.
+
 Event mining Workflow
 ---------------------

@@ -49,18 +51,20 @@ to build from the source just clone source and package with maven

    git clone https://github.com/Hurence/logisland.git
    cd logisland
-   mvn install
+   mvn clean install

-the final package is available at `logisland-assembly/target/logisland-0.12.2-bin-hdp2.5.tar.gz`
+the final package is available at `logisland-assembly/target/logisland-0.13.0-bin-hdp2.5.tar.gz`

 You can also download the `latest release build `_

+Quick start
+-----------

 Local Setup
------------
-basically **logisland** depends on Kafka and Spark, you can deploy it on any linux server
++++++++++++
+Alternatively you can deploy **logisland** on any linux server on which Kafka and Spark are available

-.. code-block::
+.. code-block:: sh

    # install Kafka 0.10.0.0 & start a zookeeper node + a broker
    curl -s http://apache.crihan.fr/dist/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz | tar -xz -C /usr/local/
@@ -72,9 +76,9 @@ basically **logisland** depends on Kafka and Spark, you can deploy it on any lin
    curl -s http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz | tar -xz -C /usr/local/
    export SPARK_HOME=/usr/local/spark-2.1.0-bin-hadoop2.7

-   # install Logisland 0.12.2
-   curl -s https://github.com/Hurence/logisland/releases/download/v0.10.0/logisland-0.12.2-bin-hdp2.5.tar.gz | tar -xz -C /usr/local/
-   cd /usr/local/logisland-0.12.2
+   # install Logisland 0.13.0
+   curl -s https://github.com/Hurence/logisland/releases/download/v0.10.0/logisland-0.13.0-bin-hdp2.5.tar.gz | tar -xz -C /usr/local/
+   cd /usr/local/logisland-0.13.0

    # launch a logisland job
    bin/logisland.sh --conf conf/index-apache-logs.yml

@@ -82,6 +86,33 @@
 you can find some **logisland** job configuration samples under `$LOGISLAND_HOME/conf` folder

+Docker setup
++++++++++++++
+The easiest way to start is to launch a docker compose stack
+
+.. code-block:: sh
+
+    # launch logisland environment
+    cd /tmp
+    curl -s https://raw.githubusercontent.com/Hurence/logisland/master/logisland-framework/logisland-resources/src/main/resources/conf/docker-compose.yml > docker-compose.yml
+    docker-compose up
+
+    # sample execution of a logisland job
+    docker exec -i -t logisland bin/logisland.sh --conf conf/index-apache-logs.yml
+
+
+Hadoop distribution setup
+++++++++++++++++++++++++++
+Launching logisland streaming apps is as easy as unarchiving the logisland distribution on an edge node, editing a config with YARN parameters and submitting the job.
+
+.. code-block:: sh
+
+    # install Logisland 0.13.0
+    curl -s https://github.com/Hurence/logisland/releases/download/v0.10.0/logisland-0.13.0-bin-hdp2.5.tar.gz | tar -xz -C /usr/local/
+    cd /usr/local/logisland-0.13.0
+    bin/logisland.sh --conf conf/index-apache-logs.yml
+
+
 Start a stream processing job
 -----------------------------

@@ -99,7 +130,7 @@ The first part is the `ProcessingEngine` configuration (here a Spark streaming e

 ..
code-block:: yaml - version: 0.12.2 + version: 0.13.0 documentation: LogIsland job config file engine: component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine @@ -107,7 +138,7 @@ The first part is the `ProcessingEngine` configuration (here a Spark streaming e documentation: Index some apache logs with logisland configuration: spark.app.name: IndexApacheLogsDemo - spark.master: local[4] + spark.master: yarn-cluster spark.driver.memory: 1G spark.driver.cores: 1 spark.executor.memory: 2G @@ -204,7 +235,11 @@ Once you've edited your configuration file, you can submit it to execution engin .. code-block:: bash - bin/process-stream.sh -conf conf/job-configuration.yml + bin/logisland.sh -conf conf/job-configuration.yml + + +You should jump to the `tutorials section `_ of the documentation. +And then continue with `components documentation`_ Contributing ------------ diff --git a/ROADMAP.rst b/ROADMAP.rst index e1ae5ad72..fe19c04a1 100755 --- a/ROADMAP.rst +++ b/ROADMAP.rst @@ -1,27 +1,21 @@ -Log Island Roadmap and future work -==== +Logisland Roadmap and future work +================================= follow the roadmap through `github issues `_ too -GUI ----- - -- manage visualy the streams -- search kafka topics Engine ----- +------ -- Add KafkaStreamEngine -- Add autoscaler component -- move offsets management from Zookeeper to Kafka +- Dynamic config via REST API +- Autoscaler - whole integration test framework (file => kafka topic => process stream => es => query) Components ----- +---------- +- Alert & threshold managment - add EventField mutator based on EL -- add an HDFS bulk loader - add a generic parser that infers a Regexp from a list (Streaming Deep Learning) diff --git a/launch-tuto.sh b/launch-tuto.sh index 0f03c187a..6e2e62be1 100755 --- a/launch-tuto.sh +++ b/launch-tuto.sh @@ -1,4 +1,4 @@ #!/bin/bash -logisland-assembly/target/logisland-0.12.2-bin-hdp2.5/logisland-0.12.2/bin/logisland.sh \ +logisland-assembly/target/logisland-0.13.0-bin-hdp2.5/logisland-0.13.0/bin/logisland.sh \ --conf logisland-framework/logisland-resources/src/main/resources/conf/$1 diff --git a/logisland-api/pom.xml b/logisland-api/pom.xml index b42c6a132..97e856c71 100644 --- a/logisland-api/pom.xml +++ b/logisland-api/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland - 0.12.2 + 0.13.0 logisland-api jar diff --git a/logisland-api/src/main/java/com/hurence/logisland/component/AbstractPropertyValue.java b/logisland-api/src/main/java/com/hurence/logisland/component/AbstractPropertyValue.java index 3be217ed8..e03db2aac 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/component/AbstractPropertyValue.java +++ b/logisland-api/src/main/java/com/hurence/logisland/component/AbstractPropertyValue.java @@ -21,6 +21,9 @@ import com.hurence.logisland.record.Record; import com.hurence.logisland.record.StandardRecord; import com.hurence.logisland.registry.VariableRegistry; +import com.hurence.logisland.util.FormatUtils; + +import java.util.concurrent.TimeUnit; /** * Created by mathieu on 08/06/17. @@ -70,6 +73,12 @@ public Double asDouble() { return (getRawValue() == null) ? null : Double.parseDouble(getRawValue().trim()); } + @Override + public Long asTimePeriod(final TimeUnit timeUnit) { + return (rawValue == null) ? 
null : FormatUtils.getTimeDuration(rawValue.toString().trim(), timeUnit); + } + + @Override public boolean isSet() { return getRawValue() != null; diff --git a/logisland-api/src/main/java/com/hurence/logisland/component/PropertyValue.java b/logisland-api/src/main/java/com/hurence/logisland/component/PropertyValue.java index 2ae11e761..91d220ed0 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/component/PropertyValue.java +++ b/logisland-api/src/main/java/com/hurence/logisland/component/PropertyValue.java @@ -21,6 +21,7 @@ import com.hurence.logisland.record.Record; import java.io.Serializable; +import java.util.concurrent.TimeUnit; /** *

@@ -91,7 +92,7 @@ public interface PropertyValue extends Serializable { * in terms of the specified TimeUnit; if the property is not set, returns * null */ - // public Long asTimePeriod(TimeUnit timeUnit); + public Long asTimePeriod(TimeUnit timeUnit); diff --git a/logisland-api/src/main/java/com/hurence/logisland/config/DefaultConfigValues.java b/logisland-api/src/main/java/com/hurence/logisland/config/DefaultConfigValues.java index f48a1f94e..b6ae5e041 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/config/DefaultConfigValues.java +++ b/logisland-api/src/main/java/com/hurence/logisland/config/DefaultConfigValues.java @@ -20,6 +20,7 @@ */ public enum DefaultConfigValues { + REDIS_CONNECTION("sandbox:6379"), ES_HOSTS("sandbox:9300"), ES_CLUSTER_NAME("es-logisland"), KAFKA_BROKERS("sandbox:9092"), diff --git a/logisland-api/src/main/java/com/hurence/logisland/controller/AbstractControllerService.java b/logisland-api/src/main/java/com/hurence/logisland/controller/AbstractControllerService.java index 8809b7137..254ca0609 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/controller/AbstractControllerService.java +++ b/logisland-api/src/main/java/com/hurence/logisland/controller/AbstractControllerService.java @@ -17,6 +17,8 @@ package com.hurence.logisland.controller; +import com.hurence.logisland.annotation.lifecycle.OnDisabled; +import com.hurence.logisland.annotation.lifecycle.OnEnabled; import com.hurence.logisland.component.AbstractConfigurableComponent; import com.hurence.logisland.component.InitializationException; import com.hurence.logisland.logging.ComponentLog; @@ -26,6 +28,7 @@ public abstract class AbstractControllerService extends AbstractConfigurableComp private ControllerServiceLookup serviceLookup; private ComponentLog logger; + private volatile boolean enabled = true; @Override public final void initialize(final ControllerServiceInitializationContext context) throws InitializationException { @@ -62,4 +65,18 @@ protected ComponentLog getLogger() { return logger; } + + @OnEnabled + public final void enabled() { + this.enabled = true; + } + + @OnDisabled + public final void disabled() { + this.enabled = false; + } + + public boolean isEnabled() { + return this.enabled; + } } diff --git a/logisland-api/src/main/java/com/hurence/logisland/record/Field.java b/logisland-api/src/main/java/com/hurence/logisland/record/Field.java index 0173b9953..107a754ca 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/record/Field.java +++ b/logisland-api/src/main/java/com/hurence/logisland/record/Field.java @@ -17,10 +17,12 @@ import com.hurence.logisland.component.PropertyValue; import com.hurence.logisland.controller.ControllerService; +import com.hurence.logisland.util.FormatUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.Serializable; +import java.util.concurrent.TimeUnit; /** * Primitive Types @@ -208,6 +210,11 @@ public Double asDouble() { } } + @Override + public Long asTimePeriod(final TimeUnit timeUnit) { + return (rawValue == null) ? 
null : FormatUtils.getTimeDuration(rawValue.toString().trim(), timeUnit); + } + @Override public boolean isSet() { return rawValue != null; diff --git a/logisland-api/src/main/java/com/hurence/logisland/record/StandardRecord.java b/logisland-api/src/main/java/com/hurence/logisland/record/StandardRecord.java index 57d688d82..2d8c494f9 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/record/StandardRecord.java +++ b/logisland-api/src/main/java/com/hurence/logisland/record/StandardRecord.java @@ -65,6 +65,7 @@ public StandardRecord(String type) { } public StandardRecord(Record toClone) { + this(); this.setType(toClone.getType()); this.setTime(toClone.getTime()); this.setId(UUID.randomUUID().toString()); @@ -159,7 +160,9 @@ public Record addFields(Map fields) { @Override public Record setType(String type) { - this.setField(FieldDictionary.RECORD_TYPE, FieldType.STRING, type); + if (type != null) { + this.setField(FieldDictionary.RECORD_TYPE, FieldType.STRING, type); + } return this; } diff --git a/logisland-api/src/main/java/com/hurence/logisland/serializer/Deserializer.java b/logisland-api/src/main/java/com/hurence/logisland/serializer/Deserializer.java index 519da8203..8d8a9d3b3 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/serializer/Deserializer.java +++ b/logisland-api/src/main/java/com/hurence/logisland/serializer/Deserializer.java @@ -17,6 +17,7 @@ import java.io.IOException; +import java.io.InputStream; /** * Provides an interface for deserializing an array of bytes into an Object @@ -28,12 +29,12 @@ public interface Deserializer { /** * Deserializes the given byte array input an Object and returns that value. * - * @param input input + * @param objectDataInput input * @return returns deserialized value * @throws DeserializationException if a valid object cannot be deserialized * from the given byte array * @throws IOException ex */ - T deserialize(byte[] input) throws DeserializationException, IOException; + T deserialize(InputStream objectDataInput) throws DeserializationException, IOException; } diff --git a/logisland-api/src/main/java/com/hurence/logisland/serializer/RecordSerializer.java b/logisland-api/src/main/java/com/hurence/logisland/serializer/RecordSerializer.java index ff4424eab..d6b153826 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/serializer/RecordSerializer.java +++ b/logisland-api/src/main/java/com/hurence/logisland/serializer/RecordSerializer.java @@ -39,7 +39,7 @@ import java.io.Serializable; -public interface RecordSerializer extends Serializable { +public interface RecordSerializer extends Serializable, Serializer, Deserializer { void serialize(OutputStream objectDataOutput, Record record) throws RecordSerializationException; Record deserialize(InputStream objectDataInput) throws RecordSerializationException; diff --git a/logisland-api/src/main/java/com/hurence/logisland/serializer/Serializer.java b/logisland-api/src/main/java/com/hurence/logisland/serializer/Serializer.java index ed19bf527..47f1a77eb 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/serializer/Serializer.java +++ b/logisland-api/src/main/java/com/hurence/logisland/serializer/Serializer.java @@ -34,6 +34,6 @@ public interface Serializer { * @throws SerializationException If unable to serialize the given value * @throws IOException ex */ - void serialize(T value, OutputStream output) throws SerializationException, IOException; + void serialize(OutputStream output, T value) throws SerializationException, IOException; } diff --git 
a/logisland-api/src/main/java/com/hurence/logisland/util/FormatUtils.java b/logisland-api/src/main/java/com/hurence/logisland/util/FormatUtils.java new file mode 100644 index 000000000..0380e9dc8 --- /dev/null +++ b/logisland-api/src/main/java/com/hurence/logisland/util/FormatUtils.java @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.hurence.logisland.util; + +import java.text.NumberFormat; +import java.util.concurrent.TimeUnit; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + + +// this class is coming from Apache NIFI project +public class FormatUtils { + + private static final String UNION = "|"; + + // for Data Sizes + private static final double BYTES_IN_KILOBYTE = 1024; + private static final double BYTES_IN_MEGABYTE = BYTES_IN_KILOBYTE * 1024; + private static final double BYTES_IN_GIGABYTE = BYTES_IN_MEGABYTE * 1024; + private static final double BYTES_IN_TERABYTE = BYTES_IN_GIGABYTE * 1024; + + // for Time Durations + private static final String NANOS = join(UNION, "ns", "nano", "nanos", "nanosecond", "nanoseconds"); + private static final String MILLIS = join(UNION, "ms", "milli", "millis", "millisecond", "milliseconds"); + private static final String SECS = join(UNION, "s", "sec", "secs", "second", "seconds"); + private static final String MINS = join(UNION, "m", "min", "mins", "minute", "minutes"); + private static final String HOURS = join(UNION, "h", "hr", "hrs", "hour", "hours"); + private static final String DAYS = join(UNION, "d", "day", "days"); + private static final String WEEKS = join(UNION, "w", "wk", "wks", "week", "weeks"); + + private static final String VALID_TIME_UNITS = join(UNION, NANOS, MILLIS, SECS, MINS, HOURS, DAYS, WEEKS); + public static final String TIME_DURATION_REGEX = "(\\d+)\\s*(" + VALID_TIME_UNITS + ")"; + public static final Pattern TIME_DURATION_PATTERN = Pattern.compile(TIME_DURATION_REGEX); + + /** + * Formats the specified count by adding commas. + * + * @param count the value to add commas to + * @return the string representation of the given value with commas included + */ + public static String formatCount(final long count) { + return NumberFormat.getIntegerInstance().format(count); + } + + /** + * Formats the specified duration in 'mm:ss.SSS' format. 
+ * + * @param sourceDuration the duration to format + * @param sourceUnit the unit to interpret the duration + * @return representation of the given time data in minutes/seconds + */ + public static String formatMinutesSeconds(final long sourceDuration, final TimeUnit sourceUnit) { + final long millis = TimeUnit.MILLISECONDS.convert(sourceDuration, sourceUnit); + + final long millisInMinute = TimeUnit.MILLISECONDS.convert(1, TimeUnit.MINUTES); + final int minutes = (int) (millis / millisInMinute); + final long secondsMillisLeft = millis - minutes * millisInMinute; + + final long millisInSecond = TimeUnit.MILLISECONDS.convert(1, TimeUnit.SECONDS); + final int seconds = (int) (secondsMillisLeft / millisInSecond); + final long millisLeft = secondsMillisLeft - seconds * millisInSecond; + + return pad2Places(minutes) + ":" + pad2Places(seconds) + "." + pad3Places(millisLeft); + } + + /** + * Formats the specified duration in 'HH:mm:ss.SSS' format. + * + * @param sourceDuration the duration to format + * @param sourceUnit the unit to interpret the duration + * @return representation of the given time data in hours/minutes/seconds + */ + public static String formatHoursMinutesSeconds(final long sourceDuration, final TimeUnit sourceUnit) { + final long millis = TimeUnit.MILLISECONDS.convert(sourceDuration, sourceUnit); + + final long millisInHour = TimeUnit.MILLISECONDS.convert(1, TimeUnit.HOURS); + final int hours = (int) (millis / millisInHour); + final long minutesSecondsMillisLeft = millis - hours * millisInHour; + + return pad2Places(hours) + ":" + formatMinutesSeconds(minutesSecondsMillisLeft, TimeUnit.MILLISECONDS); + } + + private static String pad2Places(final long val) { + return (val < 10) ? "0" + val : String.valueOf(val); + } + + private static String pad3Places(final long val) { + return (val < 100) ? "0" + pad2Places(val) : String.valueOf(val); + } + + /** + * Formats the specified data size in human readable format. 
+ * + * @param dataSize Data size in bytes + * @return Human readable format + */ + public static String formatDataSize(final double dataSize) { + // initialize the formatter + final NumberFormat format = NumberFormat.getNumberInstance(); + format.setMaximumFractionDigits(2); + + // check terabytes + double dataSizeToFormat = dataSize / BYTES_IN_TERABYTE; + if (dataSizeToFormat > 1) { + return format.format(dataSizeToFormat) + " TB"; + } + + // check gigabytes + dataSizeToFormat = dataSize / BYTES_IN_GIGABYTE; + if (dataSizeToFormat > 1) { + return format.format(dataSizeToFormat) + " GB"; + } + + // check megabytes + dataSizeToFormat = dataSize / BYTES_IN_MEGABYTE; + if (dataSizeToFormat > 1) { + return format.format(dataSizeToFormat) + " MB"; + } + + // check kilobytes + dataSizeToFormat = dataSize / BYTES_IN_KILOBYTE; + if (dataSizeToFormat > 1) { + return format.format(dataSizeToFormat) + " KB"; + } + + // default to bytes + return format.format(dataSize) + " bytes"; + } + + public static long getTimeDuration(final String value, final TimeUnit desiredUnit) { + final Matcher matcher = TIME_DURATION_PATTERN.matcher(value.toLowerCase()); + if (!matcher.matches()) { + throw new IllegalArgumentException("Value '" + value + "' is not a valid Time Duration"); + } + + final String duration = matcher.group(1); + final String units = matcher.group(2); + TimeUnit specifiedTimeUnit = null; + switch (units.toLowerCase()) { + case "ns": + case "nano": + case "nanos": + case "nanoseconds": + specifiedTimeUnit = TimeUnit.NANOSECONDS; + break; + case "ms": + case "milli": + case "millis": + case "milliseconds": + specifiedTimeUnit = TimeUnit.MILLISECONDS; + break; + case "s": + case "sec": + case "secs": + case "second": + case "seconds": + specifiedTimeUnit = TimeUnit.SECONDS; + break; + case "m": + case "min": + case "mins": + case "minute": + case "minutes": + specifiedTimeUnit = TimeUnit.MINUTES; + break; + case "h": + case "hr": + case "hrs": + case "hour": + case "hours": + specifiedTimeUnit = TimeUnit.HOURS; + break; + case "d": + case "day": + case "days": + specifiedTimeUnit = TimeUnit.DAYS; + break; + case "w": + case "wk": + case "wks": + case "week": + case "weeks": + final long durationVal = Long.parseLong(duration); + return desiredUnit.convert(durationVal, TimeUnit.DAYS)*7; + } + + final long durationVal = Long.parseLong(duration); + return desiredUnit.convert(durationVal, specifiedTimeUnit); + } + + public static String formatUtilization(final double utilization) { + return utilization + "%"; + } + + private static String join(final String delimiter, final String... values) { + if (values.length == 0) { + return ""; + } else if (values.length == 1) { + return values[0]; + } + + final StringBuilder sb = new StringBuilder(); + sb.append(values[0]); + for (int i = 1; i < values.length; i++) { + sb.append(delimiter).append(values[i]); + } + + return sb.toString(); + } + + /** + * Formats nanoseconds in the format: + * 3 seconds, 8 millis, 3 nanos - if includeTotalNanos = false, + * 3 seconds, 8 millis, 3 nanos (3008000003 nanos) - if includeTotalNanos = true + * + * @param nanos the number of nanoseconds to format + * @param includeTotalNanos whether or not to include the total number of nanoseconds in parentheses in the returned value + * @return a human-readable String that is a formatted representation of the given number of nanoseconds. 
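The duration helpers above back the new `asTimePeriod(TimeUnit)` accessor that this patch adds to `PropertyValue`, `AbstractPropertyValue` and `Field`. As a rough editor's sketch (not part of the patch; the demo class and its input values are invented), this is how `FormatUtils.getTimeDuration` resolves the unit spellings matched by `TIME_DURATION_REGEX`:

.. code-block:: java

    import java.util.concurrent.TimeUnit;

    import com.hurence.logisland.util.FormatUtils;

    // Illustration only: exercises the duration parsing introduced by this patch.
    public class TimeDurationDemo {
        public static void main(String[] args) {
            System.out.println(FormatUtils.getTimeDuration("30 sec", TimeUnit.MILLISECONDS)); // 30000
            System.out.println(FormatUtils.getTimeDuration("5m", TimeUnit.SECONDS));          // 300
            // weeks are handled as days * 7 by the switch above
            System.out.println(FormatUtils.getTimeDuration("2 w", TimeUnit.DAYS));            // 14
        }
    }

The `TIME_PERIOD_VALIDATOR` added to `StandardValidators` later in this patch accepts the same syntax, so a property validated with it would typically be read back through `asTimePeriod`.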
+ */ + public static String formatNanos(final long nanos, final boolean includeTotalNanos) { + final StringBuilder sb = new StringBuilder(); + + final long seconds = nanos > 1000000000L ? nanos / 1000000000L : 0L; + long millis = nanos > 1000000L ? nanos / 1000000L : 0L; + final long nanosLeft = nanos % 1000000L; + + if (seconds > 0) { + sb.append(seconds).append(" seconds"); + } + if (millis > 0) { + if (seconds > 0) { + sb.append(", "); + millis -= seconds * 1000L; + } + + sb.append(millis).append(" millis"); + } + if (seconds > 0 || millis > 0) { + sb.append(", "); + } + sb.append(nanosLeft).append(" nanos"); + + if (includeTotalNanos) { + sb.append(" (").append(nanos).append(" nanos)"); + } + + return sb.toString(); + } +} diff --git a/logisland-api/src/main/java/com/hurence/logisland/util/Tuple.java b/logisland-api/src/main/java/com/hurence/logisland/util/Tuple.java new file mode 100644 index 000000000..271720567 --- /dev/null +++ b/logisland-api/src/main/java/com/hurence/logisland/util/Tuple.java @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.hurence.logisland.util; + +public class Tuple { + + final A key; + final B value; + + public Tuple(A key, B value) { + this.key = key; + this.value = value; + } + + public A getKey() { + return key; + } + + public B getValue() { + return value; + } + + @Override + public boolean equals(final Object other) { + if (other == null) { + return false; + } + if (other == this) { + return true; + } + if (!(other instanceof Tuple)) { + return false; + } + + final Tuple tuple = (Tuple) other; + if (key == null) { + if (tuple.key != null) { + return false; + } + } else { + if (!key.equals(tuple.key)) { + return false; + } + } + + if (value == null) { + if (tuple.value != null) { + return false; + } + } else { + if (!value.equals(tuple.value)) { + return false; + } + } + + return true; + } + + @Override + public int hashCode() { + return 581 + (this.key == null ? 0 : this.key.hashCode()) + (this.value == null ? 0 : this.value.hashCode()); + } +} diff --git a/logisland-api/src/main/java/com/hurence/logisland/validator/StandardValidators.java b/logisland-api/src/main/java/com/hurence/logisland/validator/StandardValidators.java index 812ee53c0..ac97a7459 100644 --- a/logisland-api/src/main/java/com/hurence/logisland/validator/StandardValidators.java +++ b/logisland-api/src/main/java/com/hurence/logisland/validator/StandardValidators.java @@ -1,12 +1,12 @@ /** * Copyright (C) 2016 Hurence (support@hurence.com) - * + *

* Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * + *

+ * http://www.apache.org/licenses/LICENSE-2.0 + *

* Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -16,21 +16,55 @@ package com.hurence.logisland.validator; +import com.hurence.logisland.util.FormatUtils; + import java.io.File; import java.net.URI; import java.nio.charset.Charset; import java.nio.charset.UnsupportedCharsetException; -import java.util.Arrays; -import java.util.Objects; -import java.util.TimeZone; import java.security.MessageDigest; import java.security.NoSuchAlgorithmException; +import java.util.Arrays; import java.util.Locale; +import java.util.Objects; +import java.util.TimeZone; import java.util.regex.Pattern; public class StandardValidators { + /** + * Validator for java class descending for a base class. + */ + private static final class TypeValidator implements Validator { + + private final Class clz; + + + public TypeValidator(Class clz) { + this.clz = clz; + } + + @Override + public ValidationResult validate(String subject, String input) { + String reason = null; + try { + Class c = Class.forName(input); + if (!clz.isAssignableFrom(c)) { + reason = c.getCanonicalName() + " does not inherit from " + input; + } + } catch (ClassNotFoundException e) { + reason = "Could not find class " + input; + } + return new ValidationResult.Builder().subject(subject).input(input).explanation(reason).valid(reason == null).build(); + } + } + + + public static final Validator TYPE_VALIDATOR(Class clz) { + return new TypeValidator(clz); + } + public static final Validator DOUBLE_VALIDATOR = new Validator() { @Override public ValidationResult validate(final String subject, final String value) { @@ -72,7 +106,7 @@ public ValidationResult validate(final String subject, final String value) { String reason = null; try { - if (value==null) { + if (value == null) { reason = "null is not a valid integer"; } else { final int intVal = Integer.parseInt(value); @@ -95,7 +129,7 @@ public ValidationResult validate(final String subject, final String value) { String reason = null; try { - if (value==null) { + if (value == null) { reason = "null is not a valid integer"; } else { final long longVal = Long.parseLong(value); @@ -138,7 +172,7 @@ public ValidationResult validate(final String subject, final String value) { String reason = null; try { - if (value==null) { + if (value == null) { reason = "null is not a valid integer"; } else { Integer.parseInt(value); @@ -158,7 +192,7 @@ public ValidationResult validate(final String subject, final String value) { String reason = null; try { - if (value==null) { + if (value == null) { reason = "null is not a valid long"; } else { Long.parseLong(value); @@ -178,7 +212,7 @@ public ValidationResult validate(final String subject, final String value) { String reason = null; try { - if (value==null) { + if (value == null) { reason = "null is not a valid integer"; } else { final int intVal = Integer.parseInt(value); @@ -255,7 +289,7 @@ public ValidationResult validate(final String subject, final String value) { String reason = String.format("'%s' is not a supported language tag", value); - for (String tag: Locale.getISOLanguages()) { + for (String tag : Locale.getISOLanguages()) { if (tag.equals(value)) reason = null; } return new ValidationResult.Builder().subject(subject).input(value).explanation(reason).valid(reason == null).build(); @@ -294,6 +328,48 @@ public ValidationResult validate(final String subject, final String value) { return new 
ValidationResult.Builder().subject(subject).input(value).explanation(reason).valid(reason == null).build(); } }; + + /** + * {@link Validator} that ensures that value has 1+ non-whitespace + * characters + */ + public static final Validator NON_BLANK_VALIDATOR = new Validator() { + @Override + public ValidationResult validate(final String subject, final String value) { + return new ValidationResult.Builder().subject(subject).input(value) + .valid(value != null && !value.trim().isEmpty()) + .explanation(subject + + " must contain at least one character that is not white space").build(); + } + }; + + public static final Validator TIME_PERIOD_VALIDATOR = new Validator() { + private final Pattern TIME_DURATION_PATTERN = Pattern.compile(FormatUtils.TIME_DURATION_REGEX); + + @Override + public ValidationResult validate(final String subject, final String input) { + /* if (context.isExpressionLanguageSupported(subject) && context.isExpressionLanguagePresent(input)) { + return new ValidationResult.Builder().subject(subject).input(input).explanation("Expression Language Present").valid(true).build(); + }*/ + + if (input == null) { + return new ValidationResult.Builder().subject(subject).input(input).valid(false).explanation("Time Period cannot be null").build(); + } + if (TIME_DURATION_PATTERN.matcher(input.toLowerCase()).matches()) { + return new ValidationResult.Builder().subject(subject).input(input).valid(true).build(); + } else { + return new ValidationResult.Builder() + .subject(subject) + .input(input) + .valid(false) + .explanation("Must be of format where is a " + + "non-negative integer and TimeUnit is a supported Time Unit, such " + + "as: nanos, millis, secs, mins, hrs, days") + .build(); + } + } + }; + // // // FACTORY METHODS FOR VALIDATORS @@ -318,8 +394,6 @@ public ValidationResult validate(final String subject, final String input) { } - - public static Validator createRegexMatchingValidator(final Pattern pattern) { return new Validator() { @Override @@ -338,7 +412,6 @@ public ValidationResult validate(final String subject, final String input) { } - public static Validator createLongValidator(final long minimum, final long maximum, final boolean inclusive) { return new Validator() { @Override @@ -362,8 +435,6 @@ public ValidationResult validate(final String subject, final String input) { } - - public static class StringLengthValidator implements Validator { private final int minimum; private final int maximum; @@ -377,17 +448,17 @@ public StringLengthValidator(int minimum, int maximum) { public ValidationResult validate(final String subject, final String value) { if (value.length() < minimum || value.length() > maximum) { return new ValidationResult.Builder() - .subject(subject) - .valid(false) - .input(value) - .explanation(String.format("String length invalid [min: %d, max: %d]", minimum, maximum)) - .build(); + .subject(subject) + .valid(false) + .input(value) + .explanation(String.format("String length invalid [min: %d, max: %d]", minimum, maximum)) + .build(); } else { return new ValidationResult.Builder() - .valid(true) - .input(value) - .subject(subject) - .build(); + .valid(true) + .input(value) + .subject(subject) + .build(); } } } @@ -404,7 +475,7 @@ public FileExistsValidator(final boolean allowExpressionLanguage) { } @Override - public ValidationResult validate(final String subject, final String value ) { + public ValidationResult validate(final String subject, final String value) { final String substituted = value; @@ -472,8 +543,7 @@ public ValidationResult 
validate(final String subject, final String value) { try { Enum.valueOf(this.enumClass, value); builder.valid(true); - } - catch(final Exception e) { + } catch (final Exception e) { builder.explanation(e.getLocalizedMessage()).valid(false); } diff --git a/logisland-assembly/pom.xml b/logisland-assembly/pom.xml index 2a5677775..90d470d94 100644 --- a/logisland-assembly/pom.xml +++ b/logisland-assembly/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland - 0.12.2 + 0.13.0 logisland-assembly pom @@ -154,6 +154,7 @@ com.hurence.logisland logisland-documentation + com.hurence.logisland logisland-elasticsearch_2_4_0-client-service @@ -174,6 +175,16 @@ com.hurence.logisland logisland-solr_6_4_2-chronix-client-service + + com.hurence.logisland + logisland-excel-plugin + + + + com.hurence.logisland + logisland-redis_4-client-service + + + + org.apache.spark + spark-core_${scala.binary.version} + + + org.apache.spark + spark-sql_${scala.binary.version} + + + org.apache.spark + spark-streaming_${scala.binary.version} + + + org.apache.spark + spark-streaming-kafka-0-10_${scala.binary.version} + + + org.apache.spark + spark-sql-kafka-0-10_${scala.binary.version} + + + + + + junit + junit + test + + + + + + + net.alchim31.maven + scala-maven-plugin + 3.2.2 + + + scala-compile-first + process-resources + + compile + + + + scala-test-compile-first + process-test-resources + + testCompile + + + + attach-scaladocs + verify + + doc-jar + + + + + ${scala.binary.version} + ${scala.version} + incremental + false + + -unchecked + -deprecation + + + -Xms64m + -Xms1024m + -Xmx1024m + -XX:MaxMetaspaceSize=${MaxPermGen} + + + -source + ${maven.compiler.source} + -target + ${maven.compiler.source} + + + + + org.apache.maven.plugins + maven-compiler-plugin + + 1.8 + 1.8 + + + + + + + diff --git a/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/converter/LogIslandRecordConverter.java b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/converter/LogIslandRecordConverter.java new file mode 100644 index 000000000..361b98c88 --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/converter/LogIslandRecordConverter.java @@ -0,0 +1,162 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
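The new `LogIslandRecordConverter` in this file bridges Kafka Connect data into serialized logisland `Record` payloads. A hedged usage sketch, not taken from the patch: the demo class is invented, and the Kryo serializer class name is assumed to exist under `com.hurence.logisland.serializer`:

.. code-block:: java

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.connect.data.Schema;

    import com.hurence.logisland.connect.converter.LogIslandRecordConverter;

    // Illustration only: configuring the converter by hand, the way a Kafka Connect worker would.
    public class ConverterDemo {
        public static void main(String[] args) {
            Map<String, Object> props = new HashMap<>();
            // serializer class name is an assumption; any logisland RecordSerializer implementation should fit
            props.put(LogIslandRecordConverter.PROPERTY_RECORD_SERIALIZER, "com.hurence.logisland.serializer.KryoSerializer");
            props.put(LogIslandRecordConverter.PROPERTY_RECORD_TYPE, "kafka_connect");

            LogIslandRecordConverter converter = new LogIslandRecordConverter();
            converter.configure(props, false);

            // yields a serialized logisland Record wrapping the payload under the record value field
            byte[] payload = converter.fromConnectData("my_topic", Schema.STRING_SCHEMA, "hello logisland");
            System.out.println(payload.length + " bytes");
        }
    }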
+ * + */ + +package com.hurence.logisland.connect.converter; + +import com.hurence.logisland.record.*; +import com.hurence.logisland.serializer.RecordSerializer; +import com.hurence.logisland.serializer.SerializerProvider; +import com.hurence.logisland.stream.StreamProperties; +import org.apache.kafka.connect.data.ConnectSchema; +import org.apache.kafka.connect.data.Schema; +import org.apache.kafka.connect.data.SchemaAndValue; +import org.apache.kafka.connect.data.Struct; +import org.apache.kafka.connect.errors.DataException; +import org.apache.kafka.connect.storage.Converter; + +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.Collection; +import java.util.HashMap; +import java.util.Map; +import java.util.stream.Collectors; + +public class LogIslandRecordConverter implements Converter { + + /** + * Record serializer class (instance of {@link com.hurence.logisland.serializer.RecordSerializer}) + */ + public static final String PROPERTY_RECORD_SERIALIZER = "record.serializer"; + /** + * Avro schema to use (only apply to {@link com.hurence.logisland.serializer.AvroSerializer}) + */ + public static final String PROPERTY_AVRO_SCHEMA = "avro.schema"; + + /** + * The record type to use. If not provided {@link LogIslandRecordConverter#PROPERTY_RECORD_TYPE} will be used. + */ + public static final String PROPERTY_RECORD_TYPE = StreamProperties.RECORD_TYPE().getName(); + + /** + * The default type for logisland {@link Record} created by this converter. + */ + private static final String DEFAULT_RECORD_TYPE = "kafka_connect"; + + private RecordSerializer recordSerializer; + private String recordType; + private boolean isKey; + + + @Override + public void configure(Map configs, boolean isKey) { + recordSerializer = SerializerProvider.getSerializer((String) configs.get(PROPERTY_RECORD_SERIALIZER), (String) configs.get(PROPERTY_AVRO_SCHEMA)); + recordType = ((Map) configs).getOrDefault(PROPERTY_RECORD_TYPE, DEFAULT_RECORD_TYPE).toString(); + this.isKey = isKey; + } + + @Override + public byte[] fromConnectData(String topic, Schema schema, Object value) { + try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) { + recordSerializer.serialize(baos, + new StandardRecord(recordType).setField(toFieldRecursive(FieldDictionary.RECORD_VALUE, schema, value, isKey))); + return baos.toByteArray(); + } catch (IOException ioe) { + throw new DataException("Unexpected IO Exception occurred while serializing data [topic " + topic + "]", ioe); + } + + } + + @Override + public SchemaAndValue toConnectData(String topic, byte[] value) { + throw new UnsupportedOperationException("Not yet implemented! 
Please try later on ;-)"); + } + + private Field toFieldRecursive(String name, Schema schema, Object value, boolean isKey) { + try { + if (value == null) { + return new Field(name, FieldType.NULL, null); + } + final Schema.Type schemaType; + if (schema == null) { + schemaType = ConnectSchema.schemaType(value.getClass()); + if (schemaType == null) + throw new DataException("Java class " + value.getClass() + " does not have corresponding schema type."); + } else { + schemaType = schema.type(); + } + switch (schemaType) { + case INT8: + case INT16: + case INT32: + return new Field(name, FieldType.INT, value); + case INT64: + return new Field(name, FieldType.LONG, value); + case FLOAT32: + return new Field(name, FieldType.FLOAT, value); + case FLOAT64: + return new Field(name, FieldType.DOUBLE, value); + case BOOLEAN: + return new Field(name, FieldType.BOOLEAN, value); + case STRING: + return new Field(name, FieldType.STRING, value); + case BYTES: + byte[] bytes = null; + if (value instanceof byte[]) { + bytes = (byte[]) value; + } else if (value instanceof ByteBuffer) { + bytes = ((ByteBuffer) value).array(); + } else { + throw new DataException("Invalid type for bytes type: " + value.getClass()); + } + return new Field(name, FieldType.BYTES, bytes); + case ARRAY: { + return new Field(name, FieldType.ARRAY, + ((Collection) value).stream().map(item -> { + Schema valueSchema = schema == null ? null : schema.valueSchema(); + return toFieldRecursive(FieldDictionary.RECORD_VALUE, valueSchema, item, true); + }) + .map(Field::getRawValue) + .collect(Collectors.toList())); + } + case MAP: { + return new Field(name, FieldType.MAP, value); + } + case STRUCT: { + Struct struct = (Struct) value; + + if (struct.schema() != schema) { + throw new DataException("Mismatching schema."); + } + if (isKey) { + Map ret = new HashMap<>(); + struct.schema().fields().forEach(field -> ret.put(field.name(), toFieldRecursive(field.name(), field.schema(), struct.get(field), true).getRawValue())); + return new Field(name, FieldType.MAP, ret); + } else { + Record ret = new StandardRecord(); + struct.schema().fields().forEach(field -> ret.setField(toFieldRecursive(field.name(), field.schema(), struct.get(field), true))); + return new Field(name, FieldType.RECORD, ret); + } + + } + } + throw new DataException("Couldn't convert " + value + " to a logisland Record."); + } catch (ClassCastException e) { + throw new DataException("Invalid type for " + schema.type() + ": " + value.getClass()); + } + } +} diff --git a/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/KafkaConnectStreamSource.java b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/KafkaConnectStreamSource.java new file mode 100644 index 000000000..449c234cf --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/KafkaConnectStreamSource.java @@ -0,0 +1,306 @@ +/* + * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.connect.source; + + +import com.hurence.logisland.stream.spark.StreamOptions; +import org.apache.kafka.connect.connector.ConnectorContext; +import org.apache.kafka.connect.errors.DataException; +import org.apache.kafka.connect.json.JsonConverter; +import org.apache.kafka.connect.source.SourceConnector; +import org.apache.kafka.connect.source.SourceTask; +import org.apache.kafka.connect.storage.Converter; +import org.apache.kafka.connect.storage.OffsetBackingStore; +import org.apache.kafka.connect.storage.OffsetStorageReaderImpl; +import org.apache.kafka.connect.storage.OffsetStorageWriter; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; +import org.apache.spark.sql.SQLContext; +import org.apache.spark.sql.catalyst.expressions.GenericRow; +import org.apache.spark.sql.execution.streaming.Offset; +import org.apache.spark.sql.execution.streaming.Source; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.sql.types.Metadata; +import org.apache.spark.sql.types.StructField; +import org.apache.spark.sql.types.StructType; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import scala.Option; + +import java.util.*; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.stream.Collectors; + +/** + * Kafka connect to spark sql streaming bridge. + * + * @author amarziali + */ +public class KafkaConnectStreamSource implements Source { + + private final static Logger LOGGER = LoggerFactory.getLogger(KafkaConnectStreamSource.class); + + /** + * The Schema used for this source. + */ + public final static StructType SCHEMA = new StructType(new StructField[]{ + new StructField(StreamOptions.KAFKA_CONNECT_CONNECTOR_PROPERTIES().getName(), + DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType), false, Metadata.empty()), + new StructField(StreamOptions.KAFKA_CONNECT_KEY_CONVERTER().getName(), + DataTypes.StringType, false, Metadata.empty()), + new StructField(StreamOptions.KAFKA_CONNECT_KEY_CONVERTER_PROPERTIES().getName(), + DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType), false, Metadata.empty()), + new StructField(StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER().getName(), + DataTypes.StringType, false, Metadata.empty()), + new StructField(StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER_PROPERTIES().getName(), + DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType), false, Metadata.empty()), + new StructField(StreamOptions.KAFKA_CONNECT_MAX_TASKS().getName(), + DataTypes.createMapType(DataTypes.IntegerType, DataTypes.StringType), false, Metadata.empty()) + }); + /** + * The schema used to represent the outgoing dataframe. 
+ */ + public final static StructType DATA_SCHEMA = new StructType(new StructField[]{ + new StructField("topic", DataTypes.StringType, false, Metadata.empty()), + new StructField("partition", DataTypes.IntegerType, true, Metadata.empty()), + new StructField("key", DataTypes.BinaryType, true, Metadata.empty()), + new StructField("value", DataTypes.BinaryType, false, Metadata.empty()) + + }); + + + private final SourceConnector connector; + private ExecutorService executorService; + private final List sourceThreads = new ArrayList<>(); + private final OffsetBackingStore offsetBackingStore; + private final AtomicBoolean startWatch = new AtomicBoolean(false); + + private final SharedSourceTaskContext sharedSourceTaskContext; + private final SQLContext sqlContext; + private final Converter keyConverter; + private final Converter valueConverter; + private final int maxTasks; + + + /** + * Base constructor. Should be called by {@link KafkaConnectStreamSourceProvider} + * + * @param sqlContext the spark sql context. + * @param connectorProperties the connector related properties. + * @param keyConverter the converter for the data key + * @param valueConverter the converter for the data body + * @param offsetBackingStore the backing store implementation (can be in-memory, file based, kafka based, etc...) + * @param maxTasks the maximum theoretical number of tasks this source should spawn. + * @param connectorClass the class of kafka connect source connector to wrap. + * = + */ + public KafkaConnectStreamSource(SQLContext sqlContext, + Map connectorProperties, + Converter keyConverter, + Converter valueConverter, + OffsetBackingStore offsetBackingStore, + int maxTasks, + Class connectorClass) { + try { + this.sqlContext = sqlContext; + this.maxTasks = maxTasks; + //instantiate connector + connector = connectorClass.newInstance(); + //create converters + this.keyConverter = keyConverter; + this.valueConverter = valueConverter; + final Converter internalConverter = createInternalConverter(); + + //Create the connector context + final ConnectorContext connectorContext = new ConnectorContext() { + @Override + public void requestTaskReconfiguration() { + try { + stopAllThreads(); + startAllThreads(); + } catch (Throwable t) { + LOGGER.error("Unable to reconfigure tasks for connector " + connectorName(), t); + } + } + + @Override + public void raiseError(Exception e) { + LOGGER.error("Connector " + connectorName() + " raised error : " + e.getMessage(), e); + } + }; + + LOGGER.info("Starting connector {}", connectorClass.getCanonicalName()); + connector.initialize(connectorContext); + connector.start(connectorProperties); + this.offsetBackingStore = offsetBackingStore; + offsetBackingStore.start(); + sharedSourceTaskContext = new SharedSourceTaskContext( + new OffsetStorageReaderImpl(offsetBackingStore, connectorClass.getCanonicalName(), internalConverter, internalConverter), + new OffsetStorageWriter(offsetBackingStore, connectorClass.getCanonicalName(), internalConverter, internalConverter)); + + //create and start tasks + startAllThreads(); + } catch (IllegalAccessException | InstantiationException e) { + try { + stopAllThreads(); + } catch (Throwable t) { + LOGGER.error("Unable to properly stop threads of connector " + connectorName(), t); + } + throw new DataException("Unable to create connector " + connectorName(), e); + } + + } + + /** + * Create all the {@link Runnable} workers needed to host the source threads. + * + * @return + * @throws IllegalAccessException if task instantiation fails. 
+ * @throws InstantiationException if task instantiation fails. + */ + private List createThreadTasks() throws IllegalAccessException, InstantiationException { + Class taskClass = (Class) connector.taskClass(); + List> configs = connector.taskConfigs(maxTasks); + List ret = new ArrayList<>(); + LOGGER.info("Creating {} tasks for connector {}", configs.size(), connectorName()); + for (Map conf : configs) { + //create the task + final SourceThread t = new SourceThread(taskClass, conf, sharedSourceTaskContext); + ret.add(t); + } + return ret; + } + + /** + * Start all threads. + * + * @throws IllegalAccessException if task instantiation fails. + * @throws InstantiationException if task instantiation fails. + */ + private void startAllThreads() throws IllegalAccessException, InstantiationException { + if (!startWatch.compareAndSet(false, true)) { + throw new IllegalStateException("Connector is already started"); + } + //Give a meaningful name to thread belonging to this connector + final ThreadGroup threadGroup = new ThreadGroup(connector.getClass().getSimpleName()); + final List sourceThreads = createThreadTasks(); + //Configure a new executor service ] + executorService = Executors.newFixedThreadPool(sourceThreads.size(), r -> { + Thread t = new Thread(threadGroup, r); + t.setDaemon(true); + return t; + }); + createThreadTasks().forEach(st -> { + executorService.execute(st.start()); + sourceThreads.add(st); + }); + } + + + /** + * Create a converter to be used to translate internal data. + * Child classes can override this method to provide alternative converters. + * + * @return an instance of {@link Converter} + */ + protected Converter createInternalConverter() { + JsonConverter internalConverter = new JsonConverter(); + internalConverter.configure(Collections.singletonMap("schemas.enable", "false"), false); + return internalConverter; + } + + /** + * Gets the connector name used by this stream source. + * + * @return + */ + private String connectorName() { + return connector.getClass().getCanonicalName(); + } + + + @Override + public StructType schema() { + return SCHEMA; + } + + @Override + public Option getOffset() { + Optional offset = sharedSourceTaskContext.lastOffset(); + return Option.apply(offset.orElse(null)); + + } + + + @Override + public Dataset getBatch(Option start, Offset end) { + return sqlContext.createDataFrame( + sharedSourceTaskContext.read(start.isDefined() ? Optional.of(start.get()) : Optional.empty(), end) + .stream() + .map(record -> new GenericRow(new Object[]{ + record.topic(), + record.kafkaPartition(), + keyConverter.fromConnectData(record.topic(), record.keySchema(), record.key()), + valueConverter.fromConnectData(record.topic(), record.valueSchema(), record.value()) + })).collect(Collectors.toList()), + DATA_SCHEMA); + } + + @Override + public void commit(Offset end) { + sharedSourceTaskContext.commit(end); + } + + /** + * Stops every threads running and serving for this connector. 
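For orientation, the `getOffset` / `getBatch` / `commit` methods implemented above are the hooks that Spark's micro-batch engine drives on every trigger. A minimal sketch of that polling contract, assuming an already constructed `Source` (the helper class and method below are invented for illustration):

.. code-block:: java

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.execution.streaming.Offset;
    import org.apache.spark.sql.execution.streaming.Source;
    import scala.Option;

    // Illustration only: one micro-batch cycle as Spark would run it against KafkaConnectStreamSource.
    public class SourcePollingSketch {
        static void pollOnce(Source source, Option<Offset> lastCommitted) {
            Option<Offset> latest = source.getOffset();
            if (latest.isDefined()) {
                Dataset<Row> batch = source.getBatch(lastCommitted, latest.get());
                batch.show();                // downstream processing would happen here
                source.commit(latest.get()); // lets the source flush its Kafka Connect offsets
            }
        }
    }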
+ */ + private void stopAllThreads() { + LOGGER.info("Stopping every threads for connector {}", connectorName()); + while (!sourceThreads.isEmpty()) { + try { + sourceThreads.remove(0).stop(); + } catch (Throwable t) { + LOGGER.warn("Error occurring while stopping a thread of connector " + connectorName(), t); + } + } + } + + @Override + public void stop() { + if (!startWatch.compareAndSet(true, false)) { + throw new IllegalStateException("Connector is not started"); + } + LOGGER.info("Stopping connector {}", connectorName()); + stopAllThreads(); + sharedSourceTaskContext.clean(); + offsetBackingStore.stop(); + connector.stop(); + } + + /** + * Check the stream source state. + * + * @return + */ + public boolean isRunning() { + return startWatch.get(); + } +} + diff --git a/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/KafkaConnectStreamSourceProvider.java b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/KafkaConnectStreamSourceProvider.java new file mode 100644 index 000000000..36b8e3417 --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/KafkaConnectStreamSourceProvider.java @@ -0,0 +1,149 @@ +/* + * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.connect.source; + +import com.hurence.logisland.stream.spark.StreamOptions; +import org.apache.kafka.common.config.ConfigDef; +import org.apache.kafka.connect.runtime.WorkerConfig; +import org.apache.kafka.connect.runtime.distributed.DistributedConfig; +import org.apache.kafka.connect.runtime.standalone.StandaloneConfig; +import org.apache.kafka.connect.source.SourceConnector; +import org.apache.kafka.connect.storage.*; +import org.apache.spark.sql.SQLContext; +import org.apache.spark.sql.execution.streaming.Source; +import org.apache.spark.sql.sources.StreamSourceProvider; +import org.apache.spark.sql.types.StructType; +import scala.Option; +import scala.Tuple2; +import scala.collection.immutable.Map; + +import java.io.IOException; +import java.io.StringReader; +import java.util.Properties; +import java.util.stream.Collectors; + +/** + * A {@link StreamSourceProvider} capable of creating spark {@link com.hurence.logisland.stream.spark.structured.StructuredStream} + * enabled kafka sources. 
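The provider below can wire up three different Kafka Connect offset stores (memory, file, kafka), each guarded by its own `WorkerConfig`. A hedged sketch of the distinction; the store classes are standard `org.apache.kafka.connect.storage` classes, while the selector method and its string keys are invented for illustration (the real selection keys come from `StreamOptions`):

.. code-block:: java

    import org.apache.kafka.connect.storage.FileOffsetBackingStore;
    import org.apache.kafka.connect.storage.KafkaOffsetBackingStore;
    import org.apache.kafka.connect.storage.MemoryOffsetBackingStore;
    import org.apache.kafka.connect.storage.OffsetBackingStore;

    // Illustration only: the three offset store flavours the provider can instantiate.
    public class BackingStoreChoices {
        static OffsetBackingStore pick(String flavour) {
            switch (flavour) {
                case "memory": return new MemoryOffsetBackingStore(); // volatile, offsets lost on restart
                case "file":   return new FileOffsetBackingStore();   // persisted, needs offset.storage.file.filename
                case "kafka":  return new KafkaOffsetBackingStore();  // shared, needs bootstrap.servers and offset.storage.topic
                default: throw new IllegalArgumentException("unknown offset backing store: " + flavour);
            }
        }
    }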
+ * + * @author amarziali + */ +public class KafkaConnectStreamSourceProvider implements StreamSourceProvider { + + /** + * Configuration definition for {@link MemoryOffsetBackingStore} + */ + private static class MemoryConfig extends WorkerConfig { + public MemoryConfig(java.util.Map props) { + super(new ConfigDef(), props); + } + } + + /** + * Configuration definition for {@link FileOffsetBackingStore} + */ + private static class FileConfig extends WorkerConfig { + public FileConfig(java.util.Map props) { + super(new ConfigDef() + .define(StandaloneConfig.OFFSET_STORAGE_FILE_FILENAME_CONFIG, + ConfigDef.Type.STRING, + ConfigDef.Importance.HIGH, + "file to store offset data in") + , props); + } + } + + /** + * Configuration definition for {@link KafkaOffsetBackingStore} + */ + private static class KafkaConfig extends WorkerConfig { + public KafkaConfig(java.util.Map props) { + super(new ConfigDef() + .define(BOOTSTRAP_SERVERS_CONFIG, + ConfigDef.Type.LIST, + BOOTSTRAP_SERVERS_DEFAULT, + ConfigDef.Importance.HIGH, + BOOTSTRAP_SERVERS_DOC) + .define(DistributedConfig.OFFSET_STORAGE_TOPIC_CONFIG, + ConfigDef.Type.STRING, + ConfigDef.Importance.HIGH, + "kafka topic to store connector offsets in") + , props); + } + } + + private Converter createConverter(Map parameters, String classKey, String propertyKey, boolean isKey) + throws ClassNotFoundException, IllegalAccessException, InstantiationException, IOException { + Converter ret = (Converter) Class.forName(parameters.get(classKey).get()).newInstance(); + ret.configure(propertiesToMap(parameters.get(propertyKey).get()), isKey); + return ret; + } + + private java.util.Map propertiesToMap(String propertiesAsString) throws IOException { + Properties props = new Properties(); + props.load(new StringReader(propertiesAsString)); + return props.entrySet().stream().collect(Collectors.toMap(e -> e.getKey().toString(), e -> e.getValue().toString())); + } + + @Override + public Source createSource(SQLContext sqlContext, String metadataPath, Option schema, String providerName, Map parameters) { + try { + Converter keyConverter = createConverter(parameters, StreamOptions.KAFKA_CONNECT_KEY_CONVERTER().getName(), + StreamOptions.KAFKA_CONNECT_KEY_CONVERTER_PROPERTIES().getName(), true); + Converter valueConverter = createConverter(parameters, StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER().getName(), + StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER_PROPERTIES().getName(), false); + //create the right backing store + OffsetBackingStore offsetBackingStore = null; + WorkerConfig workerConfig = null; + java.util.Map offsetBackingStoreProperties = + propertiesToMap(parameters.get(StreamOptions.KAFKA_CONNECT_OFFSET_BACKING_STORE_PROPERTIES().getName()).get()); + String bs = parameters.get(StreamOptions.KAFKA_CONNECT_OFFSET_BACKING_STORE().getName()).get(); + if (StreamOptions.FILE_BACKING_STORE().getValue().equals(bs)) { + offsetBackingStore = new FileOffsetBackingStore(); + workerConfig = new FileConfig(offsetBackingStoreProperties); + } else if (StreamOptions.MEMORY_BACKING_STORE().getValue().equals(bs)) { + offsetBackingStore = new MemoryOffsetBackingStore(); + workerConfig = new MemoryConfig(offsetBackingStoreProperties); + } else if (StreamOptions.KAFKA_BACKING_STORE().getValue().equals(bs)) { + offsetBackingStore = new KafkaOffsetBackingStore(); + workerConfig = new KafkaConfig(offsetBackingStoreProperties); + } else { + throw new IllegalArgumentException(StreamOptions.KAFKA_CONNECT_OFFSET_BACKING_STORE().getName() + + " must be set!"); + } + + 
offsetBackingStore.configure(workerConfig); + return new KafkaConnectStreamSource(sqlContext, + propertiesToMap(parameters.get(StreamOptions.KAFKA_CONNECT_CONNECTOR_PROPERTIES().getName()).get()), + keyConverter, + valueConverter, + offsetBackingStore, + Integer.parseInt(parameters.get(StreamOptions.KAFKA_CONNECT_MAX_TASKS().getName()).get()), + (Class) Class.forName(parameters.get(StreamOptions.KAFKA_CONNECT_CONNECTOR_CLASS().getName()).get())); + } catch (Exception e) { + throw new IllegalArgumentException("Unable to create kafka connect stream source: " + e.getMessage(), e); + } + + + } + + @Override + public Tuple2 sourceSchema(SQLContext sqlContext, Option schema, String providerName, Map parameters) { + return Tuple2.apply(providerName, KafkaConnectStreamSource.DATA_SCHEMA); + } +} diff --git a/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/SharedSourceTaskContext.java b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/SharedSourceTaskContext.java new file mode 100644 index 000000000..51ff13f8d --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/SharedSourceTaskContext.java @@ -0,0 +1,170 @@ +/* + * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.connect.source; + +import org.apache.kafka.connect.source.SourceRecord; +import org.apache.kafka.connect.source.SourceTask; +import org.apache.kafka.connect.source.SourceTaskContext; +import org.apache.kafka.connect.storage.OffsetStorageReader; +import org.apache.kafka.connect.storage.OffsetStorageWriter; +import org.apache.spark.sql.execution.streaming.Offset; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import scala.Tuple3; + +import java.util.*; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReadWriteLock; +import java.util.concurrent.locks.ReentrantReadWriteLock; + +/** + * A {@link SourceTaskContext} shared among all task spawned by a connector. + *
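+ * Source tasks offer records (each paired with an offset) into a shared buffer; the structured streaming engine + * reads up to a given offset and the context then commits the processed offsets through the offset storage writer.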

+ * An instance of this class is regularly polled by spark structured stream engine. + */ +public class SharedSourceTaskContext implements SourceTaskContext { + + private static final Logger LOGGER = LoggerFactory.getLogger(SharedSourceTaskContext.class); + + + private final OffsetStorageReader offsetStorageReader; + private final OffsetStorageWriter offsetStorageWriter; + private final Deque> buffer = new LinkedList<>(); + private final ReadWriteLock rwLock = new ReentrantReadWriteLock(); + + + /** + * Create a new instance. + * + * @param offsetStorageReader the offset reader (managed by creating class). + * @param offsetStorageWriter the offset writer (managed by the creating class) + */ + public SharedSourceTaskContext(OffsetStorageReader offsetStorageReader, OffsetStorageWriter offsetStorageWriter) { + this.offsetStorageReader = offsetStorageReader; + this.offsetStorageWriter = offsetStorageWriter; + + } + + @Override + public OffsetStorageReader offsetStorageReader() { + return offsetStorageReader; + } + + /** + * Fetch last offset available. + * + * @return the last available offset if any. + */ + public Optional lastOffset() { + Lock lock = rwLock.readLock(); + try { + lock.lock(); + return Optional.ofNullable(buffer.isEmpty() ? null : buffer.getLast()._2()); + } finally { + lock.unlock(); + } + } + + /** + * Read the received data according to provided offsets. + * + * @param from the optional starting offset. If missing data will be fetched since the beginning of available one. + * @param to the mandatory ending offset. + * @return the {@link SourceRecord} that have been read. + */ + public Collection read(Optional from, Offset to) { + Lock lock = rwLock.readLock(); + try { + lock.lock(); + Collection ret = new ArrayList<>(); + while (!buffer.isEmpty()) { + Tuple3 current = buffer.removeFirst(); + ret.add(current._1()); + try { + if (current._3() != null) { + current._3().commitRecord(current._1()); + } + offsetStorageWriter.offset(current._1().sourcePartition(), current._1().sourceOffset()); + } catch (Throwable t) { + LOGGER.warn("Unable to properly commit offset " + current._2(), t); + } + if (to.equals(current._2())) { + break; + } + } + return ret; + } finally { + lock.unlock(); + + } + } + + /** + * Enqueue a new record emitted by a {@link SourceTask} + * + * @param record the {@link SourceRecord} coming from the connector + * @param offset the corresponding {@link Offset} + * @param emitter the record emitter. + */ + public void offer(SourceRecord record, Offset offset, SourceTask emitter) { + Lock lock = rwLock.writeLock(); + try { + lock.lock(); + buffer.addLast(Tuple3.apply(record, offset, emitter)); + } finally { + lock.unlock(); + } + } + + /** + * Confirms that data read since offset endOffset has been successfully handled by the streaming engine. + * + * @param endOffset the last offset read and committed by the spark engine. + */ + public void commit(Offset endOffset) { + try { + + if (offsetStorageWriter.beginFlush()) { + offsetStorageWriter.doFlush((error, result) -> { + if (error == null) { + LOGGER.info("Flushing till offset {} with result {}", endOffset, result); + } else { + LOGGER.error("Unable to commit records till source offset " + endOffset, error); + + } + }).get(30, TimeUnit.SECONDS); + } + } catch (Exception e) { + LOGGER.error("Unable to commit records till source offset " + endOffset, e); + } + } + + /** + * Clean up buffered data. 
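+ * Typically invoked by the owning {@code KafkaConnectStreamSource} once every source thread has been stopped, + * so that no buffered record survives a restart of the source.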
+ */ + public void clean() { + Lock lock = rwLock.writeLock(); + try { + lock.lock(); + buffer.clear(); + } finally { + lock.unlock(); + } + } +} diff --git a/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/SourceThread.java b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/SourceThread.java new file mode 100644 index 000000000..3dd370f60 --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/main/java/com/hurence/logisland/connect/source/SourceThread.java @@ -0,0 +1,107 @@ +/* + * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.connect.source; + +import org.apache.commons.lang3.RandomUtils; +import org.apache.kafka.connect.source.SourceRecord; +import org.apache.kafka.connect.source.SourceTask; +import org.apache.spark.sql.execution.streaming.LongOffset; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.UUID; +import java.util.concurrent.atomic.AtomicBoolean; + +/** + * Source polling thread. + */ +class SourceThread implements Runnable { + + private static final Logger LOGGER = LoggerFactory.getLogger(SourceTask.class); + + private final SourceTask task; + private final Map config; + private final SharedSourceTaskContext sharedSourceTaskContext; + private final AtomicBoolean running = new AtomicBoolean(false); + + + /** + * Construct a new instance. + * + * @param taskClass The task to execute. + * @param config the task configuration + * @param sharedSourceTaskContext the shared task context. + */ + public SourceThread(Class taskClass, Map config, SharedSourceTaskContext sharedSourceTaskContext) throws IllegalAccessException, InstantiationException { + this.task = taskClass.newInstance(); + this.config = Collections.unmodifiableMap(config); + this.sharedSourceTaskContext = sharedSourceTaskContext; + task.initialize(sharedSourceTaskContext); + } + + @Override + public void run() { + while (running.get()) { + try { + List records = task.poll(); + if (records != null) { + + records.forEach(sourceRecord -> sharedSourceTaskContext.offer(sourceRecord, + LongOffset.apply(sourceRecord.sourceOffset() == null || sourceRecord.sourceOffset().isEmpty() ? UUID.randomUUID().hashCode() :sourceRecord.sourceOffset().hashCode()), + task)); + } + } catch (InterruptedException ie) { + break; + } catch (Exception e) { + LOGGER.warn("Unexpected error occurred while polling task " + task.getClass().getCanonicalName(), e); + } + } + } + + /** + * Start the worker. 
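+ * If the underlying task fails to start, the error is logged, the task is stopped again and the thread is left + * in a non-running state, so the polling loop exits immediately.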
+ * + * @return itself + */ + public SourceThread start() { + try { + task.start(config); + running.set(true); + } catch (Throwable t) { + LOGGER.error("Unable to start task " + task.getClass().getCanonicalName(), t); + try { + task.stop(); + } catch (Throwable tt) { + //swallow + } + } + + return this; + } + + /** + * Tell the work loop to end any activity ASAP. + */ + public void stop() { + running.set(false); + + } +} diff --git a/logisland-connect/logisland-connect-spark/src/main/scala/com/hurence/logisland/stream/spark/provider/KafkaConnectStructuredProviderService.scala b/logisland-connect/logisland-connect-spark/src/main/scala/com/hurence/logisland/stream/spark/provider/KafkaConnectStructuredProviderService.scala new file mode 100644 index 000000000..bfeb7a4fb --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/main/scala/com/hurence/logisland/stream/spark/provider/KafkaConnectStructuredProviderService.scala @@ -0,0 +1,136 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.stream.spark.provider + +import java.util +import java.util.Collections + +import com.hurence.logisland.annotation.lifecycle.OnEnabled +import com.hurence.logisland.component.{InitializationException, PropertyDescriptor} +import com.hurence.logisland.controller.{AbstractControllerService, ControllerServiceInitializationContext} +import com.hurence.logisland.record.{FieldDictionary, FieldType, Record, StandardRecord} +import com.hurence.logisland.stream.StreamContext +import com.hurence.logisland.stream.spark.StreamOptions +import com.hurence.logisland.stream.spark.structured.provider.StructuredStreamProviderService +import org.apache.spark.sql.{Dataset, SparkSession} + +class KafkaConnectStructuredProviderService extends AbstractControllerService with StructuredStreamProviderService { + + var connectorProperties = "" + var keyConverter = "" + var valueConverter = "" + var keyConverterProperties = "" + var valueConverterProperties = "" + var maxConfigurations = 1 + var delegateConnectorClass = "" + var offsetBackingStore = "" + var offsetBackingStoreProperties = "" + + + @OnEnabled + @throws[InitializationException] + override def init(context: ControllerServiceInitializationContext): Unit = { + this.synchronized { + try { + delegateConnectorClass = context.getPropertyValue(StreamOptions.KAFKA_CONNECT_CONNECTOR_CLASS).asString() + connectorProperties = context.getPropertyValue(StreamOptions.KAFKA_CONNECT_CONNECTOR_PROPERTIES).asString() + valueConverter = context.getPropertyValue(StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER).asString() + valueConverterProperties = context.getPropertyValue(StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER_PROPERTIES).asString() + keyConverter = context.getPropertyValue(StreamOptions.KAFKA_CONNECT_KEY_CONVERTER).asString() + keyConverterProperties = context.getPropertyValue(StreamOptions.KAFKA_CONNECT_KEY_CONVERTER_PROPERTIES).asString() + maxConfigurations = (context 
getPropertyValue StreamOptions.KAFKA_CONNECT_MAX_TASKS).asInteger() + offsetBackingStore = (context getPropertyValue StreamOptions.KAFKA_CONNECT_OFFSET_BACKING_STORE).asString() + offsetBackingStoreProperties = context.getPropertyValue(StreamOptions.KAFKA_CONNECT_OFFSET_BACKING_STORE_PROPERTIES).asString() + } catch { + case e: Exception => + throw new InitializationException(e) + } + } + } + + + /** + * Allows subclasses to register which property descriptor objects are + * supported. + * + * @return PropertyDescriptor objects this processor currently supports + */ + override def getSupportedPropertyDescriptors() = { + val descriptors: util.List[PropertyDescriptor] = new util.ArrayList[PropertyDescriptor] + descriptors.add(StreamOptions.KAFKA_CONNECT_CONNECTOR_CLASS) + descriptors.add(StreamOptions.KAFKA_CONNECT_CONNECTOR_PROPERTIES) + descriptors.add(StreamOptions.KAFKA_CONNECT_KEY_CONVERTER) + descriptors.add(StreamOptions.KAFKA_CONNECT_KEY_CONVERTER_PROPERTIES) + descriptors.add(StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER) + descriptors.add(StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER_PROPERTIES) + descriptors.add(StreamOptions.KAFKA_CONNECT_MAX_TASKS) + descriptors.add(StreamOptions.KAFKA_CONNECT_OFFSET_BACKING_STORE) + descriptors.add(StreamOptions.KAFKA_CONNECT_OFFSET_BACKING_STORE_PROPERTIES) + Collections.unmodifiableList(descriptors) + } + + + + /** + * create a streaming DataFrame that represents data received + * + * @param spark + * @param streamContext + * @return DataFrame currently loaded + */ + override def read(spark: SparkSession, streamContext: StreamContext) = { + + import spark.implicits._ + implicit val myObjEncoder = org.apache.spark.sql.Encoders.kryo[Record] + + getLogger.info(s"Connecting kafka-connect source $delegateConnectorClass") + spark.readStream + .format("com.hurence.logisland.connect.source.KafkaConnectStreamSourceProvider") + .option(StreamOptions.KAFKA_CONNECT_CONNECTOR_PROPERTIES.getName, connectorProperties) + .option(StreamOptions.KAFKA_CONNECT_KEY_CONVERTER.getName, keyConverter) + .option(StreamOptions.KAFKA_CONNECT_KEY_CONVERTER_PROPERTIES.getName, keyConverterProperties) + .option(StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER.getName, valueConverter) + .option(StreamOptions.KAFKA_CONNECT_VALUE_CONVERTER_PROPERTIES.getName, valueConverterProperties) + .option(StreamOptions.KAFKA_CONNECT_MAX_TASKS.getName, maxConfigurations) + .option(StreamOptions.KAFKA_CONNECT_CONNECTOR_CLASS.getName, delegateConnectorClass) + .option(StreamOptions.KAFKA_CONNECT_OFFSET_BACKING_STORE.getName, offsetBackingStore) + .option(StreamOptions.KAFKA_CONNECT_OFFSET_BACKING_STORE_PROPERTIES.getName, offsetBackingStoreProperties) + + + .load() + //Topic, Partition, Key, Value + .as[(String, Int, Array[Byte], Array[Byte])] + .map(r => + new StandardRecord("kafka_connect") + .setField(FieldDictionary.RECORD_KEY, FieldType.BYTES, r._3) + .setField(FieldDictionary.RECORD_VALUE, FieldType.BYTES, r._4)) + } + + + /** + * create a streaming DataFrame that represents data to be written + * + * @param streamContext + * @return DataFrame currently loaded + */ + override def write(df: Dataset[Record], streamContext: StreamContext) = { + //TODO: Add sink support + df + } + +} diff --git a/logisland-connect/logisland-connect-spark/src/main/scala/com/hurence/logisland/stream/spark/provider/package.scala b/logisland-connect/logisland-connect-spark/src/main/scala/com/hurence/logisland/stream/spark/provider/package.scala new file mode 100644 index 000000000..0efa12304 --- /dev/null +++ 
b/logisland-connect/logisland-connect-spark/src/main/scala/com/hurence/logisland/stream/spark/provider/package.scala @@ -0,0 +1,120 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.stream.spark + +import com.hurence.logisland.component.{AllowableValue, PropertyDescriptor} +import com.hurence.logisland.validator.StandardValidators +import org.apache.kafka.connect.connector.Connector +import org.apache.kafka.connect.runtime.standalone.StandaloneConfig +import org.apache.kafka.connect.storage.Converter + +object StreamOptions { + + val MEMORY_BACKING_STORE = new AllowableValue("memory", "In memory backing store", + "Standalone in memory offset backing store. Not suitable for clustered deployments unless source is unique or stateless") + + val FILE_BACKING_STORE = new AllowableValue("file", "File based backing store", + "Standalone filesystem based offset backing store. " + + "You have to specify the property " + StandaloneConfig.OFFSET_STORAGE_FILE_FILENAME_CONFIG + " for the file path." + + "Not suitable for clustered deployments unless source is unique or standalone") + + val KAFKA_BACKING_STORE = new AllowableValue("kafka", "Kafka topic based backing store", + "Distributed kafka topic based offset backing store. " + + "See the javadoc of class org.apache.kafka.connect.storage.KafkaOffsetBackingStore for the configuration options." 
+ + "This backing store is well suited for distributed deployments.") + + + ////////////////////////////////////// + // Kafka Connect options + ////////////////////////////////////// + + + val KAFKA_CONNECT_CONNECTOR_CLASS = new PropertyDescriptor.Builder() + .name("kc.connector.class") + .description("The class canonical name of the kafka connector to use.") + .required(true) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .addValidator(StandardValidators.TYPE_VALIDATOR(classOf[Connector])) + .build + + val KAFKA_CONNECT_CONNECTOR_PROPERTIES = new PropertyDescriptor.Builder() + .name("kc.connector.properties") + .description("The properties (key=value) for the connector.") + .required(false) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .build + + val KAFKA_CONNECT_MAX_TASKS = new PropertyDescriptor.Builder() + .name("kc.worker.tasks.max") + .description("Max number of threads for this connector") + .required(true) + .defaultValue("1") + .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR) + .build + + val KAFKA_CONNECT_KEY_CONVERTER = new PropertyDescriptor.Builder() + .name("kc.data.key.converter") + .description("Key converter class") + .required(true) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .addValidator(StandardValidators.TYPE_VALIDATOR(classOf[Converter])) + .build + + val KAFKA_CONNECT_VALUE_CONVERTER = new PropertyDescriptor.Builder() + .name("kc.data.value.converter") + .description("Value converter class") + .required(true) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .addValidator(StandardValidators.TYPE_VALIDATOR(classOf[Converter])) + .build + + val KAFKA_CONNECT_KEY_CONVERTER_PROPERTIES = new PropertyDescriptor.Builder() + .name("kc.data.key.converter.properties") + .description("Key converter properties") + .required(false) + .defaultValue("") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .build + + val KAFKA_CONNECT_VALUE_CONVERTER_PROPERTIES = new PropertyDescriptor.Builder() + .name("kc.data.value.converter.properties") + .description("Value converter properties") + .required(false) + .defaultValue("") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .build + + + val KAFKA_CONNECT_OFFSET_BACKING_STORE = new PropertyDescriptor.Builder() + .name("kc.connector.offset.backing.store") + .required(false) + .description("The underlying backing store to be used.") + .defaultValue(MEMORY_BACKING_STORE.getValue) + .allowableValues(MEMORY_BACKING_STORE, FILE_BACKING_STORE, KAFKA_BACKING_STORE) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .build() + + val KAFKA_CONNECT_OFFSET_BACKING_STORE_PROPERTIES = new PropertyDescriptor.Builder() + .name("kc.connector.offset.backing.store.properties") + .description("Properties to configure the offset backing store") + .required(false) + .defaultValue("") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .build + +} diff --git a/logisland-connect/logisland-connect-spark/src/test/java/com/hurence/logisland/connect/KafkaConnectTest.java b/logisland-connect/logisland-connect-spark/src/test/java/com/hurence/logisland/connect/KafkaConnectTest.java new file mode 100644 index 000000000..19d5fbf4f --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/test/java/com/hurence/logisland/connect/KafkaConnectTest.java @@ -0,0 +1,83 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ +package com.hurence.logisland.connect; + +import com.hurence.logisland.component.ComponentFactory; +import com.hurence.logisland.config.ConfigReader; +import com.hurence.logisland.config.LogislandConfiguration; +import com.hurence.logisland.engine.EngineContext; +import com.hurence.logisland.util.runner.TestRunner; +import org.junit.Assert; +import org.junit.Ignore; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Optional; + + +/** + * End to end test. + */ +public class KafkaConnectTest { + private static Logger logger = LoggerFactory.getLogger(KafkaConnectTest.class); + + private static final String JOB_CONF_FILE = "/conf/kafka-connect-stream.yml"; + + @Test + @Ignore + public void remoteTest() { + + + logger.info("starting StreamProcessingRunner"); + + Optional engineInstance = Optional.empty(); + try { + + String configFile = KafkaConnectTest.class.getResource(JOB_CONF_FILE).getPath(); + + // load the YAML config + LogislandConfiguration sessionConf = ConfigReader.loadConfig(configFile); + + // instantiate engine and all the processor from the config + engineInstance = ComponentFactory.getEngineContext(sessionConf.getEngine()); + assert engineInstance.isPresent(); + assert engineInstance.get().isValid(); + + logger.info("starting Logisland session version {}", sessionConf.getVersion()); + logger.info(sessionConf.getDocumentation()); + } catch (Exception e) { + logger.error("unable to launch runner : {}", e); + } + + try { + // start the engine + EngineContext engineContext = engineInstance.get(); + engineInstance.get().getEngine().start(engineContext); + } catch (Exception e) { + Assert.fail("something went bad while running the job : " + e); + + } + + + + + + + } + +} diff --git a/logisland-connect/logisland-connect-spark/src/test/java/com/hurence/logisland/connect/converter/LogIslandRecordConverterTest.java b/logisland-connect/logisland-connect-spark/src/test/java/com/hurence/logisland/connect/converter/LogIslandRecordConverterTest.java new file mode 100644 index 000000000..3228df6e0 --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/test/java/com/hurence/logisland/connect/converter/LogIslandRecordConverterTest.java @@ -0,0 +1,123 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + */ + +package com.hurence.logisland.connect.converter; + +import com.hurence.logisland.record.Field; +import com.hurence.logisland.record.FieldDictionary; +import com.hurence.logisland.record.Record; +import com.hurence.logisland.serializer.BytesArraySerializer; +import com.hurence.logisland.serializer.KryoSerializer; +import com.hurence.logisland.serializer.RecordSerializer; +import com.hurence.logisland.serializer.SerializerProvider; +import org.apache.kafka.connect.data.Schema; +import org.apache.kafka.connect.data.SchemaBuilder; +import org.apache.kafka.connect.data.Struct; +import org.junit.Test; + +import java.io.ByteArrayInputStream; +import java.util.*; + +import static org.junit.Assert.*; + +public class LogIslandRecordConverterTest { + + private LogIslandRecordConverter setupInstance(Class serializerClass, boolean isKey) { + final LogIslandRecordConverter instance = new LogIslandRecordConverter(); + instance.configure( + Collections.singletonMap(LogIslandRecordConverter.PROPERTY_RECORD_SERIALIZER, serializerClass.getCanonicalName()), + isKey); + return instance; + } + + private void assertFieldEquals(Record record, String fieldName, Object expected) { + Field field = record.getField(fieldName); + assertNotNull(field); + assertEquals(expected, record.getField(fieldName).getRawValue()); + } + + private void assertFieldEquals(Record record, String fieldName, byte[] expected) { + Field field = record.getField(fieldName); + assertNotNull(field); + assertArrayEquals(expected, (byte[]) record.getField(fieldName).getRawValue()); + } + + + @Test + public void testBytesSchema() { + byte[] data = new byte[16]; + new Random().nextBytes(data); + RecordSerializer serializer = new BytesArraySerializer(); + LogIslandRecordConverter instance = setupInstance(serializer.getClass(), false); + byte[] serialized = instance.fromConnectData("", Schema.BYTES_SCHEMA, data); + Record record = serializer.deserialize(new ByteArrayInputStream(serialized)); + assertNotNull(record); + assertFieldEquals(record, FieldDictionary.RECORD_VALUE, data); + } + + @Test + public void testComplexSchema() { + //our schema + + final Schema complexSchema = SchemaBuilder + .struct() + .field("f1", SchemaBuilder.bool()) + .field("f2", SchemaBuilder.string()) + .field("f3", SchemaBuilder.int8()) + .field("f4", SchemaBuilder.int16()) + .field("f5", SchemaBuilder.string().optional()) + .field("f6", SchemaBuilder.float32()) + .field("arr", SchemaBuilder.array(SchemaBuilder.int32())) + .field("map", SchemaBuilder.map(SchemaBuilder.string(), SchemaBuilder.string())) + .field("struct", SchemaBuilder.struct() + .field("child", SchemaBuilder.string()).build()) + .build(); + + //setup converters + LogIslandRecordConverter instance = setupInstance(KryoSerializer.class, false); + RecordSerializer serializer = SerializerProvider.getSerializer(KryoSerializer.class.getName(), null); + Struct complex = new Struct(complexSchema) + .put("f1", true) + .put("f2", "test") + .put("f3", (byte) 0) + .put("f4", (short) 1) + .put("f5", null) + .put("f6", 3.1415f) + .put("arr", new ArrayList<>(Arrays.asList(0, 1, 2))) + .put("map", new HashMap<>(Collections.singletonMap("key", "value"))) + .put("struct", + new Struct(complexSchema.field("struct").schema()) + .put("child", "child")); + + Record record = serializer.deserialize(new ByteArrayInputStream(instance.fromConnectData(null, complexSchema, complex))); + System.out.println(record); + //assertions + assertNotNull(record); + Record extracted = 
record.getField(FieldDictionary.RECORD_VALUE).asRecord(); + assertNotNull(extracted); + assertFieldEquals(extracted, "f1", true); + assertFieldEquals(extracted, "f2", "test"); + assertFieldEquals(extracted, "f3", (byte) 0); + assertFieldEquals(extracted, "f4", (short) 1); + assertFieldEquals(extracted, "f5", null); + assertFieldEquals(extracted, "f6", (float) 3.1415); + assertFieldEquals(extracted, "arr", new ArrayList<>(Arrays.asList(0, 1, 2))); + assertFieldEquals(extracted, "map", new HashMap<>(Collections.singletonMap("key", "value"))); + //assertFieldEquals(((Map)extracted.getField("struct").getRawValue()).get("child"), "child", "child"); + + } +} diff --git a/logisland-connect/logisland-connect-spark/src/test/java/com/hurence/logisland/connect/fake/FakeConnector.java b/logisland-connect/logisland-connect-spark/src/test/java/com/hurence/logisland/connect/fake/FakeConnector.java new file mode 100644 index 000000000..eae6b060d --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/test/java/com/hurence/logisland/connect/fake/FakeConnector.java @@ -0,0 +1,106 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.connect.fake; + +import org.apache.commons.lang3.RandomStringUtils; +import org.apache.kafka.common.config.ConfigDef; +import org.apache.kafka.connect.connector.Task; +import org.apache.kafka.connect.data.Schema; +import org.apache.kafka.connect.source.SourceConnector; +import org.apache.kafka.connect.source.SourceRecord; +import org.apache.kafka.connect.source.SourceTask; + +import java.util.*; +import java.util.concurrent.SynchronousQueue; + +public class FakeConnector extends SourceConnector { + + + public static class FakeTask extends SourceTask { + + private SynchronousQueue queue = new SynchronousQueue<>(); + private final Timer timer = new Timer(); + + + @Override + public void start(Map props) { + timer.scheduleAtFixedRate(new TimerTask() { + @Override + public void run() { + try { + queue.put(RandomStringUtils.randomAscii(30)); + } catch (InterruptedException e) { + e.printStackTrace(); + } + } + }, 0, 500); + + } + + @Override + public List poll() throws InterruptedException { + + return Collections.singletonList(new SourceRecord(null , + Collections.singletonMap("offset", System.currentTimeMillis()), + "", + null, + Schema.STRING_SCHEMA, + queue.take())); + } + + + @Override + public void stop() { + timer.cancel(); + } + + @Override + public String version() { + return "1.0"; + } + } + + @Override + public String version() { + return "1.0"; + } + + @Override + public void start(Map props) { + } + + @Override + public Class taskClass() { + return FakeTask.class; + } + + @Override + public List> taskConfigs(int maxTasks) { + return Collections.singletonList(Collections.emptyMap()); + } + + @Override + public void stop() { + + } + + @Override + public ConfigDef config() { + return new ConfigDef(); + } +} diff --git 
a/logisland-connect/logisland-connect-spark/src/test/resources/conf/kafka-connect-stream.yml b/logisland-connect/logisland-connect-spark/src/test/resources/conf/kafka-connect-stream.yml new file mode 100644 index 000000000..f3df717de --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/test/resources/conf/kafka-connect-stream.yml @@ -0,0 +1,119 @@ +version: 0.13.0 +documentation: LogIsland future factory job + +engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Index some apache logs with logisland + configuration: + spark.app.name: ConnectTest + spark.master: local[*] + spark.driver.memory: 512M + spark.driver.cores: 1 + spark.executor.memory: 512M + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.yarn.queue: default + spark.yarn.maxAppAttempts: 4 + spark.yarn.am.attemptFailuresValidityInterval: 1h + spark.yarn.max.executor.failures: 20 + spark.yarn.executor.failuresValidityInterval: 1h + spark.task.maxFailures: 8 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 2000 + spark.streaming.backpressure.enabled: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 10000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4040 + + controllerServiceConfigurations: + + - controllerService: kc_source_service + component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredProviderService + configuration: + kc.data.value.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + kc.data.value.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + kc.data.key.converter.properties: | + schemas.enable=false + kc.data.key.converter: org.apache.kafka.connect.storage.StringConverter + kc.worker.tasks.max: 1 + kc.connector.class: com.hurence.logisland.connect.fake.FakeConnector + kc.connector.offset.backing.store: memory + kc.connector.properties: | + foo=bar + dummy=a long string + + - controllerService: kafka_out_service + component: com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService + configuration: + kafka.output.topics: logisland_raw + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + + streamConfigurations: + ################ indexing stream ############### + - stream: indexing_stream + component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing + type: stream + documentation: a processor that converts raw excel file content into structured log records + configuration: + kafka.input.topics: logisland_raw + kafka.output.topics: logisland_events + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: 
com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + processorConfigurations: + # do something useful here + - processor: stream_debugger + component: com.hurence.logisland.processor.DebugStream + type: processor + documentation: debug records + configuration: + event.serializer: json + + + ######### parsing stream ############## + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.StringSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.StringSerializer + write.topics.client.service: kafka_out_service + processorConfigurations: + - processor: stream_debugger2 + component: com.hurence.logisland.processor.DebugStream + type: processor + documentation: debug records + configuration: + event.serializer: json + - processor: flatten + component: com.hurence.logisland.processor.FlatMap + type: processor + documentation: "extract from root record" + configuration: + keep.root.record: false + copy.root.record.fields: true diff --git a/logisland-connect/logisland-connect-spark/src/test/resources/logback.xml b/logisland-connect/logisland-connect-spark/src/test/resources/logback.xml new file mode 100644 index 000000000..20a6c7a28 --- /dev/null +++ b/logisland-connect/logisland-connect-spark/src/test/resources/logback.xml @@ -0,0 +1,56 @@ + + + + + + + + %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/logisland-connect/logisland-connectors-bundle/pom.xml b/logisland-connect/logisland-connectors-bundle/pom.xml new file mode 100644 index 000000000..a6a819001 --- /dev/null +++ b/logisland-connect/logisland-connectors-bundle/pom.xml @@ -0,0 +1,268 @@ + + + 4.0.0 + + com.hurence.logisland + logisland-connect + 0.13.0 + + jar + + logisland-connectors-bundle + + + + includeOpcDaConnector + + true + + !skipDefaultConnectors + + + + + + com.hurence.logisland + logisland-connector-opcda + + + + + + + includeConnectFtp + + false + + withConnectFtp + + + + + + com.eneco + kafka-connect-ftp + 0.1.4 + + + + + + org.immutables.tools + maven-shade-plugin + 4 + + + package-ftp-source + package + + shade + + + true + + + com.eneco:kafka-connect-ftp + commons-net:commons-net + commons-compress:commons-compress + + + + + *:* + + META-INF/license/** + META-INF/* + META-INF/maven/** + LICENSE + NOTICE + /*.txt + build.properties + + + + + + + + + org.apache.commons.net + ${logisland.shade.packageName}.kc.ftp.org.apache.commons.net + + + + + + + + + + + + + + includeConnectSimulator + + true + + !skipDefaultConnectors + + + + + + + com.github.jcustenborder.kafka.connect + kafka-connect-simulator + 0.1.118 + + + + + + org.immutables.tools + maven-shade-plugin + 4 + + + package-simulator-plugin + package + + shade + + + true + + + com.github.jcustenborder.kafka.connect:kafka-connect-simulator + + io.codearte.jfairy:jfairy + com.google.guava:guava + com.google.inject:guice + com.google.inject.extensions:guice-assistedinject + org.yaml:snakeyaml + 
+ + + + *:* + + META-INF/license/** + META-INF/* + META-INF/maven/** + LICENSE + NOTICE + /*.txt + build.properties + + + + + + + + + io.codearte.jfairy + + ${logisland.shade.packageName}.kc.simulator.io.codearte.jfairy + + + + com.google + ${logisland.shade.packageName}.kc.simulator.com.google + + + + org.yaml + ${logisland.shade.packageName}.kc.simulator.org.yaml + + + + + + + + + + + + + + includeConnectBlockchain + + false + + withConnectBlockchain + + + + + + com.datamountaineer + kafka-connect-blockchain + 1.0.0 + + + + + + org.immutables.tools + maven-shade-plugin + 4 + + + package-blockchain-source + package + + shade + + + true + + + com.datamountaineer:kafka-connect-blockchain + org.codehaus.jackson:* + + + + + *:* + + META-INF/license/** + META-INF/* + META-INF/maven/** + LICENSE + NOTICE + /*.txt + build.properties + + + + + + + + + org.codehaus.jackson + + ${logisland.shade.packageName}.kc.blockchain.org.codehaus.jackson + + + + + + + + + + + + + + + diff --git a/logisland-connect/logisland-connectors-bundle/src/main/resources/git.properties b/logisland-connect/logisland-connectors-bundle/src/main/resources/git.properties new file mode 100644 index 000000000..e69de29bb diff --git a/logisland-connect/logisland-connectors/logisland-connector-opcda/pom.xml b/logisland-connect/logisland-connectors/logisland-connector-opcda/pom.xml new file mode 100644 index 000000000..bfaae5149 --- /dev/null +++ b/logisland-connect/logisland-connectors/logisland-connector-opcda/pom.xml @@ -0,0 +1,45 @@ + + + 4.0.0 + + com.hurence.logisland + logisland-connectors + 0.13.0 + + jar + + logisland-connector-opcda + + + + com.github.Hurence + opc-simple + 1.1.2 + + + com.hurence.logisland + logisland-api + + + + + + + + org.apache.maven.plugins + maven-jar-plugin + + + + true + true + + + + + + + + diff --git a/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/OpcDaFields.java b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/OpcDaFields.java new file mode 100644 index 000000000..34d8f0cd9 --- /dev/null +++ b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/OpcDaFields.java @@ -0,0 +1,62 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.connect.opcda; + +import com.hurence.logisland.record.FieldDictionary; + +public interface OpcDaFields { + + /** + * The update period in milliseconds. + */ + String UPDATE_PERIOD = "update_period_millis"; + /** + * The timestamp when the OPC server acquired data. + */ + String TIMESTAMP = "tag_timestamp"; + /** + * The fully qualified tag name (with group). + */ + String TAG_NAME = FieldDictionary.RECORD_NAME; + /** + * The quality of the measurement (in case server caching is used). + * The value is managed by the OPC server. 
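+ * Quality codes typically follow the OPC-DA convention (for instance 0xC0 for GOOD).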
+ */ + String QUALITY = "quality"; + /** + * The record value. Can be missing in case an error occurred. + */ + String VALUE = FieldDictionary.RECORD_VALUE; + /** + * The OPC server error code in case the tag reading is in error. + */ + String ERROR_CODE = "error_code"; + /** + * The OPC server host generating the event. + */ + String OPC_SERVER_HOST = "server"; + /** + * The OPC server domain generating the event. + */ + String OPC_SERVER_DOMAIN = "domain"; + /** + * The tag group. + */ + String TAG_GROUP = "group"; + +} diff --git a/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/OpcDaSourceConnector.java b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/OpcDaSourceConnector.java new file mode 100644 index 000000000..d8aaaf631 --- /dev/null +++ b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/OpcDaSourceConnector.java @@ -0,0 +1,150 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.connect.opcda; + +import org.apache.kafka.common.config.ConfigDef; +import org.apache.kafka.common.config.ConfigException; +import org.apache.kafka.common.config.ConfigValue; +import org.apache.kafka.common.utils.Utils; +import org.apache.kafka.connect.connector.Task; +import org.apache.kafka.connect.source.SourceConnector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.*; +import java.util.function.Function; +import java.util.regex.Matcher; +import java.util.regex.Pattern; +import java.util.stream.Collectors; +import java.util.stream.IntStream; + +/** + * OPC-DA Connector. 
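+ * <p> + * A hypothetical configuration sketch (host, credentials and tag names below are placeholders; the keys match + * the {@code PROPERTY_*} constants defined in this class): + * <pre> + *   host=192.168.0.10 + *   domain=OPC + *   user=opc-user + *   password=secret + *   progId=Matrikon.OPC.Simulation + *   tags=Random.Real8:1000,Random.Int4 + * </pre>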
+ * + * @author amarziali + */ +public class OpcDaSourceConnector extends SourceConnector { + + private static final Logger logger = LoggerFactory.getLogger(OpcDaSourceConnector.class); + + private Map configValues; + + public static final String PROPERTY_HOST = "host"; + public static final String PROPERTY_PORT = "port"; + public static final String PROPERTY_DOMAIN = "domain"; + public static final String PROPERTY_USER = "user"; + public static final String PROPERTY_PASSWORD = "password"; + public static final String PROPERTY_CLSID = "clsId"; + public static final String PROPERTY_PROGID = "progId"; + public static final String PROPERTY_TAGS = "tags"; + public static final String PROPERTY_SOCKET_TIMEOUT = "socketTimeoutMillis"; + public static final String PROPERTY_DEFAULT_REFRESH_PERIOD = "defaultRefreshPeriodMillis"; + public static final String PROPERTY_DIRECT_READ = "directReadFromDevice"; + + private static final Pattern TAG_FORMAT_MATCHER = Pattern.compile("^([^:]+)(:(\\d+))?$"); + + public static Map.Entry parseTag(String t, Long defaultRefreshPeriod) { + Matcher matcher = TAG_FORMAT_MATCHER.matcher(t); + if (matcher.matches()) { + String tagName = matcher.group(1); + String refresh = matcher.groupCount() == 3 ? matcher.group(3) : null; + return new AbstractMap.SimpleEntry<>(tagName, refresh != null ? Long.parseLong(refresh) : defaultRefreshPeriod); + } + throw new IllegalArgumentException("" + t + " does not match"); + } + + + /** + * The configuration. + */ + private static final ConfigDef CONFIG = new ConfigDef() + .define(PROPERTY_HOST, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "The OPC-DA server host") + .define(PROPERTY_PORT, ConfigDef.Type.INT, ConfigDef.Importance.LOW, "The OPC-DA server port") + .define(PROPERTY_DOMAIN, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "The logon domain") + .define(PROPERTY_USER, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "The logon user") + .define(PROPERTY_PASSWORD, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "The logon password") + .define(PROPERTY_CLSID, ConfigDef.Type.STRING, ConfigDef.Importance.MEDIUM, "The CLSID of the OPC server COM component") + .define(PROPERTY_PROGID, ConfigDef.Type.STRING, ConfigDef.Importance.MEDIUM, "The Program ID of the OPC server COM component") + .define(PROPERTY_TAGS, ConfigDef.Type.LIST, Collections.emptyList(), (name, value) -> { + if (value == null) { + throw new ConfigException("Cannot be null"); + } + List list = (List) value; + for (String s : list) { + if (!TAG_FORMAT_MATCHER.matcher(s).matches()) { + throw new ConfigException("Tag list should be like [tag_name]:[refresh_period_millis] with optional refresh period"); + } + } + }, ConfigDef.Importance.HIGH, "The tags to subscribe to following format tagname:refresh_period_millis. E.g. 
myTag:1000") + .define(PROPERTY_SOCKET_TIMEOUT, ConfigDef.Type.LONG, ConfigDef.Importance.LOW, "The socket timeout") + .define(PROPERTY_DEFAULT_REFRESH_PERIOD, ConfigDef.Type.LONG, 1000, ConfigDef.Importance.LOW, "The default data refresh period in milliseconds") + .define(PROPERTY_DIRECT_READ, ConfigDef.Type.BOOLEAN, false, ConfigDef.Importance.LOW, "Use server cache or read directly from device"); + + + @Override + public String version() { + return getClass().getPackage().getImplementationVersion(); + } + + @Override + public void start(Map props) { + //shallow copy + configValues = config().validate(props).stream().collect(Collectors.toMap(ConfigValue::name, Function.identity())); + logger.info("Starting OPC-DA connector (version {}) on server {} reading tags {}", version(), + configValues.get(PROPERTY_HOST).value(), configValues.get(PROPERTY_TAGS).value()); + } + + @Override + public Class taskClass() { + return OpcDaSourceTask.class; + } + + @Override + public List> taskConfigs(int maxTasks) { + Long defaultRefreshPeriod = (Long) configValues.get(PROPERTY_DEFAULT_REFRESH_PERIOD).value(); + //first partition tags per refresh period + Map> tagPartitions = ((List) configValues.get(PROPERTY_TAGS).value()) + .stream().collect(Collectors.groupingBy(tag -> parseTag(tag, defaultRefreshPeriod).getValue())); + List> tags = new ArrayList<>(tagPartitions.values()); + int maxPartitions = Math.min(maxTasks, tags.size()); + int batchSize = (int) Math.ceil((double) tags.size() / maxPartitions); + //then find the ideal partition size and flatten tag list into a comma separated string (since config is a map of string,string) + return IntStream.range(0, maxPartitions) + .mapToObj(i -> tags.subList(i * batchSize, Math.min((i + 1) * batchSize, tags.size()))) + .map(l -> { + Map ret = configValues.entrySet().stream() + .filter(a -> a.getValue().value() != null) + .collect(Collectors.toMap(a -> a.getKey(), a -> a.getValue().value().toString())); + ret.put(PROPERTY_TAGS, Utils.join(l.stream().flatMap(List::stream).collect(Collectors.toList()), ",")); + return ret; + }) + .collect(Collectors.toList()); + + + } + + @Override + public void stop() { + logger.info("Stopping OPC-DA connector (version {}) on server {}", version(), configValues.get(PROPERTY_HOST).value()); + } + + @Override + public ConfigDef config() { + return CONFIG; + } +} diff --git a/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/OpcDaSourceTask.java b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/OpcDaSourceTask.java new file mode 100644 index 000000000..1f0d67109 --- /dev/null +++ b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/OpcDaSourceTask.java @@ -0,0 +1,346 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + */ + +package com.hurence.logisland.connect.opcda; + +import com.hurence.opc.da.OpcDaConnectionProfile; +import com.hurence.opc.da.OpcDaOperations; +import com.hurence.opc.da.OpcDaSession; +import com.hurence.opc.da.OpcDaSessionProfile; +import org.apache.kafka.connect.data.Schema; +import org.apache.kafka.connect.data.SchemaAndValue; +import org.apache.kafka.connect.data.SchemaBuilder; +import org.apache.kafka.connect.data.Struct; +import org.apache.kafka.connect.errors.ConnectException; +import org.apache.kafka.connect.errors.SchemaBuilderException; +import org.apache.kafka.connect.source.SourceRecord; +import org.apache.kafka.connect.source.SourceTask; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.math.BigDecimal; +import java.time.Duration; +import java.time.Instant; +import java.util.*; +import java.util.concurrent.*; +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; +import java.util.function.Function; +import java.util.stream.Collectors; + +/** + * OPC-DA Worker task. + */ +public class OpcDaSourceTask extends SourceTask { + + private static class TagInfo { + final String group; + final String name; + final Long refreshPeriodMillis; + + public TagInfo(String raw, long defaultRefreshPeriod) { + Map.Entry parsed = OpcDaSourceConnector.parseTag(raw, defaultRefreshPeriod); + String tag = parsed.getKey(); + this.refreshPeriodMillis = parsed.getValue(); + int idx = tag.lastIndexOf('.'); + if (idx > 0) { + this.group = tag.substring(0, idx); + } else { + this.group = ""; + } + this.name = tag; + } + } + + private static final Logger logger = LoggerFactory.getLogger(OpcDaSourceTask.class); + + private SmartOpcOperations opcOperations; + private TransferQueue transferQueue; + private Lock lock = new ReentrantLock(); + private String tags[]; + private Map sessions; + private Map tagInfoMap; + private Set tagReadingQueue; + private ScheduledExecutorService executorService; + private String host; + private String domain; + private boolean directRead; + private long defaultRefreshPeriodMillis; + private long minWaitTime; + private volatile boolean running = false; + + + private synchronized void createSessionsIfNeeded() { + if (opcOperations != null && opcOperations.resetStale()) { + sessions = new HashMap<>(); + tagInfoMap.entrySet().stream().collect(Collectors.groupingBy(entry -> entry.getValue().refreshPeriodMillis)) + .forEach((a, b) -> { + OpcDaSessionProfile sessionProfile = new OpcDaSessionProfile().withDirectRead(directRead) + .withRefreshPeriodMillis(a); + OpcDaSession session = opcOperations.createSession(sessionProfile); + b.forEach(c -> sessions.put(c.getKey(), session)); + }); + } + } + + + private OpcDaConnectionProfile propertiesToConnectionProfile(Map properties) { + OpcDaConnectionProfile ret = new OpcDaConnectionProfile(); + ret.setHost(properties.get(OpcDaSourceConnector.PROPERTY_HOST)); + ret.setComClsId(properties.get(OpcDaSourceConnector.PROPERTY_CLSID)); + ret.setComProgId(properties.get(OpcDaSourceConnector.PROPERTY_PROGID)); + if (properties.containsKey(OpcDaSourceConnector.PROPERTY_PORT)) { + ret.setPort(Integer.parseInt(properties.get(OpcDaSourceConnector.PROPERTY_PORT))); + } + ret.setUser(properties.get(OpcDaSourceConnector.PROPERTY_USER)); + ret.setPassword(properties.get(OpcDaSourceConnector.PROPERTY_PASSWORD)); + ret.setDomain(properties.get(OpcDaSourceConnector.PROPERTY_DOMAIN)); + + if (properties.containsKey(OpcDaSourceConnector.PROPERTY_SOCKET_TIMEOUT)) { + 
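+ // the socket timeout is optional: propagate it to the connection profile only when explicitly configured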
ret.setSocketTimeout(Duration.ofMillis(Long.parseLong(properties.get(OpcDaSourceConnector.PROPERTY_SOCKET_TIMEOUT)))); + } + return ret; + } + + private SchemaAndValue convertToNativeType(final Object value) { + + Class cls = value != null ? value.getClass() : Void.class; + final ArrayList objs = new ArrayList<>(); + + if (cls.isArray()) { + final Object[] array = (Object[]) value; + + Schema arraySchema = null; + + for (final Object element : array) { + SchemaAndValue tmp = convertToNativeType(element); + if (arraySchema == null) { + arraySchema = tmp.schema(); + } + objs.add(tmp.value()); + } + + return new SchemaAndValue(SchemaBuilder.array(arraySchema), objs); + } + + if (cls.isAssignableFrom(Void.class)) { + return SchemaAndValue.NULL; + } else if (cls.isAssignableFrom(String.class)) { + return new SchemaAndValue(SchemaBuilder.string().optional(), value); + } else if (cls.isAssignableFrom(Short.class)) { + return new SchemaAndValue(SchemaBuilder.int16().optional(), value); + } else if (cls.isAssignableFrom(Integer.class)) { + + return new SchemaAndValue(SchemaBuilder.int32().optional(), value); + } else if (cls.isAssignableFrom(Long.class)) { + + return new SchemaAndValue(SchemaBuilder.int64().optional(), value); + } else if (cls.isAssignableFrom(Byte.class)) { + return new SchemaAndValue(SchemaBuilder.int8().optional(), value); + } else if (cls.isAssignableFrom(Character.class)) { + return new SchemaAndValue(SchemaBuilder.int32().optional(), value == null ? null : new Integer(((char) value))); + } else if (cls.isAssignableFrom(Boolean.class)) { + return new SchemaAndValue(SchemaBuilder.bool().optional(), value); + } else if (cls.isAssignableFrom(Float.class)) { + return new SchemaAndValue(SchemaBuilder.float32().optional(), value); + } else if (cls.isAssignableFrom(BigDecimal.class)) { + return new SchemaAndValue(SchemaBuilder.float64().optional(), value == null ? null : ((BigDecimal) value).doubleValue()); + } else if (cls.isAssignableFrom(Double.class)) { + return new SchemaAndValue(SchemaBuilder.float64().optional(), value); + } else if (cls.isAssignableFrom(Instant.class)) { + return new SchemaAndValue(SchemaBuilder.int64().optional(), value == null ? null : ((Instant) value).toEpochMilli()); + + } + throw new SchemaBuilderException("Unknown type presented (" + cls + ")"); + + } + + private Schema buildSchema(Schema valueSchema) { + SchemaBuilder ret = SchemaBuilder.struct() + .field(OpcDaFields.TAG_NAME, SchemaBuilder.string()) + .field(OpcDaFields.TIMESTAMP, SchemaBuilder.int64()) + .field(OpcDaFields.QUALITY, SchemaBuilder.int32().optional()) + .field(OpcDaFields.UPDATE_PERIOD, SchemaBuilder.int64().optional()) + .field(OpcDaFields.TAG_GROUP, SchemaBuilder.string().optional()) + .field(OpcDaFields.OPC_SERVER_DOMAIN, SchemaBuilder.string().optional()) + .field(OpcDaFields.OPC_SERVER_HOST, SchemaBuilder.string()); + + if (valueSchema != null) { + ret = ret.field(OpcDaFields.VALUE, valueSchema); + } else { + ret = ret.field(OpcDaFields.ERROR_CODE, SchemaBuilder.int32().optional()); + } + return ret; + } + + + @Override + public void start(Map props) { + transferQueue = new LinkedTransferQueue<>(); + opcOperations = new SmartOpcOperations<>(new OpcDaOperations()); + OpcDaConnectionProfile connectionProfile = propertiesToConnectionProfile(props); + tags = props.get(OpcDaSourceConnector.PROPERTY_TAGS).split(","); + host = connectionProfile.getHost(); + domain = connectionProfile.getDomain() != null ? 
connectionProfile.getDomain() : ""; + defaultRefreshPeriodMillis = Long.parseLong(props.get(OpcDaSourceConnector.PROPERTY_DEFAULT_REFRESH_PERIOD)); + directRead = Boolean.parseBoolean(props.get(OpcDaSourceConnector.PROPERTY_DIRECT_READ)); + tagInfoMap = Arrays.stream(tags).map(t -> new TagInfo(t, defaultRefreshPeriodMillis)) + .collect(Collectors.toMap(t -> t.name, Function.identity())); + opcOperations.connect(connectionProfile); + if (!opcOperations.awaitConnected()) { + throw new ConnectException("Unable to connect"); + } + logger.info("Started OPC-DA task for tags {}", (Object) tags); + minWaitTime = Math.max(10, gcd(tagInfoMap.values().stream().mapToLong(t -> t.refreshPeriodMillis).toArray())); + tagReadingQueue = new HashSet<>(); + running = true; + executorService = Executors.newSingleThreadScheduledExecutor(); + tagInfoMap.forEach((k, v) -> executorService.scheduleAtFixedRate(() -> { + try { + lock.lock(); + tagReadingQueue.add(k); + } finally { + lock.unlock(); + } + }, 0, v.refreshPeriodMillis, TimeUnit.MILLISECONDS)); + + executorService.scheduleAtFixedRate(() -> { + try { + Set tagsToRead; + try { + lock.lock(); + tagsToRead = new HashSet<>(tagReadingQueue); + tagReadingQueue.clear(); + } finally { + lock.unlock(); + } + if (tagsToRead.isEmpty()) { + return; + } + createSessionsIfNeeded(); + Map> sessionTags = + tagsToRead.stream().collect(Collectors.groupingBy(sessions::get)); + logger.debug("Reading {}", sessionTags); + sessionTags.entrySet().parallelStream() + .map(entry -> entry.getKey().read(entry.getValue().toArray(new String[entry.getValue().size()]))) + .flatMap(Collection::stream) + .map(opcData -> { + SchemaAndValue tmp = convertToNativeType(opcData.getValue()); + Schema valueSchema = buildSchema(tmp.schema()); + TagInfo meta = tagInfoMap.get(opcData.getTag()); + Struct value = new Struct(valueSchema) + .put(OpcDaFields.TIMESTAMP, opcData.getTimestamp().toEpochMilli()) + .put(OpcDaFields.TAG_NAME, opcData.getTag()) + .put(OpcDaFields.QUALITY, opcData.getQuality()) + .put(OpcDaFields.UPDATE_PERIOD, meta.refreshPeriodMillis) + .put(OpcDaFields.TAG_GROUP, meta.group) + .put(OpcDaFields.OPC_SERVER_HOST, host) + .put(OpcDaFields.OPC_SERVER_DOMAIN, domain); + + if (tmp.value() != null) { + value = value.put(OpcDaFields.VALUE, tmp.value()); + } + if (opcData.getErrorCode().isPresent()) { + value.put(OpcDaFields.ERROR_CODE, opcData.getErrorCode().get()); + } + + Map partition = new HashMap<>(); + partition.put(OpcDaFields.TAG_NAME, opcData.getTag()); + partition.put(OpcDaFields.OPC_SERVER_DOMAIN, domain); + partition.put(OpcDaFields.OPC_SERVER_HOST, host); + + + return new SourceRecord( + partition, + Collections.singletonMap(OpcDaFields.TIMESTAMP, opcData.getTimestamp().toEpochMilli()), + "", + SchemaBuilder.STRING_SCHEMA, + domain + "|" + host + "|" + opcData.getTag(), + valueSchema, + value); + } + ).forEach(sourceRecord -> { + try { + transferQueue.put(sourceRecord); + } catch (InterruptedException e) { + throw new RuntimeException("Interrupted", e); + } + }); + } catch (Exception e) { + logger.error("Got exception while reading tags", e); + } + }, 0L, minWaitTime, TimeUnit.MILLISECONDS); + } + + @Override + public List poll() throws InterruptedException { + if (transferQueue.isEmpty()) { + Thread.sleep(minWaitTime); + } + List ret = new ArrayList<>(); + transferQueue.drainTo(ret); + return ret; + } + + @Override + public void stop() { + running = false; + if (executorService != null) { + executorService.shutdown(); + executorService = null; + } + + if (opcOperations != 
null) { + opcOperations.disconnect(); + opcOperations.awaitDisconnected(); + } + //session are automatically cleaned up and detached when the connection is closed. + sessions = null; + transferQueue = null; + tagReadingQueue = null; + tagInfoMap = null; + logger.info("Stopped OPC-DA task for tags {}", (Object) tags); + } + + @Override + public String version() { + return getClass().getPackage().getImplementationVersion(); + } + + /** + * GCD recursive version + * + * @param x dividend + * @param y divisor + * @return + */ + private static long gcdInternal(long x, long y) { + return (y == 0) ? x : gcdInternal(y, x % y); + } + + /** + * Great common divisor (An elegant way to do it with a lambda). + * + * @param numbers list of number + * @return the GCD. + */ + private static long gcd(long... numbers) { + return Arrays.stream(numbers).reduce(0, (x, y) -> (y == 0) ? x : gcdInternal(y, x % y)); + } + + +} diff --git a/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/SmartOpcOperations.java b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/SmartOpcOperations.java new file mode 100644 index 000000000..c67f7190b --- /dev/null +++ b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/main/java/com/hurence/logisland/connect/opcda/SmartOpcOperations.java @@ -0,0 +1,78 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.connect.opcda; + +import com.hurence.opc.ConnectionProfile; +import com.hurence.opc.OpcOperations; +import com.hurence.opc.OpcSession; +import com.hurence.opc.SessionProfile; +import com.hurence.opc.util.AutoReconnectOpcOperations; + +import java.util.concurrent.atomic.AtomicBoolean; + +/** + * A 'smart' version of {@link AutoReconnectOpcOperations}. + * It tracks a stale flag becoming true if the connection has been interrupted and recreated. + * The stale flag can be reset upon call of method {@link SmartOpcOperations#resetStale()}. + * + * @author amarziali + */ +public class SmartOpcOperations, T extends SessionProfile, U extends OpcSession> + extends AutoReconnectOpcOperations { + + private final AtomicBoolean stale = new AtomicBoolean(); + + /** + * Construct an instance. + * + * @param delegate the deletegate {@link OpcOperations}. + */ + public SmartOpcOperations(OpcOperations delegate) { + super(delegate); + } + + @Override + public void connect(S connectionProfile) { + stale.set(true); + super.connect(connectionProfile); + awaitConnected(); + } + + @Override + public void disconnect() { + super.disconnect(); + } + + /** + * Reset the connection stale flag and return previous state. + * + * @return the stale flag. 
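+     * Note that this method first waits for the connection to be (re)established via awaitConnected().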
+ */ + public synchronized boolean resetStale() { + awaitConnected(); + return stale.getAndSet(false); + } + + @Override + public String toString() { + return "SmartOpcOperations{" + + "stale=" + stale + + "} " + super.toString(); + } +} + diff --git a/logisland-connect/logisland-connectors/logisland-connector-opcda/src/test/java/com/hurence/logisland/connect/opcda/OpcDaSourceConnectorTest.java b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/test/java/com/hurence/logisland/connect/opcda/OpcDaSourceConnectorTest.java new file mode 100644 index 000000000..afe0c215e --- /dev/null +++ b/logisland-connect/logisland-connectors/logisland-connector-opcda/src/test/java/com/hurence/logisland/connect/opcda/OpcDaSourceConnectorTest.java @@ -0,0 +1,130 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.connect.opcda; + +import com.hurence.opc.OpcTagInfo; +import com.hurence.opc.da.OpcDaConnectionProfile; +import com.hurence.opc.da.OpcDaOperations; +import org.junit.Assert; +import org.junit.Ignore; +import org.junit.Test; + +import java.time.Duration; +import java.time.temporal.ChronoUnit; +import java.util.*; +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.stream.Collectors; + +public class OpcDaSourceConnectorTest { + + @Test(expected = IllegalArgumentException.class) + public void parseFailureTest() { + OpcDaSourceConnector.parseTag("test1:2aj", 500L); + } + + @Test + public void tagParseTest() { + Map.Entry toTest = OpcDaSourceConnector.parseTag("test1:1000", 500L); + Assert.assertEquals("test1", toTest.getKey()); + Assert.assertEquals(new Long(1000), toTest.getValue()); + + toTest = OpcDaSourceConnector.parseTag("test2", 500L); + Assert.assertEquals("test2", toTest.getKey()); + Assert.assertEquals(new Long(500), toTest.getValue()); + } + + @Test + public void configParseAndPartitionTest() { + OpcDaSourceConnector connector = new OpcDaSourceConnector(); + Map properties = new HashMap<>(); + properties.put(OpcDaSourceConnector.PROPERTY_DOMAIN, "domain"); + properties.put(OpcDaSourceConnector.PROPERTY_SOCKET_TIMEOUT, "2000"); + properties.put(OpcDaSourceConnector.PROPERTY_PASSWORD, "password"); + properties.put(OpcDaSourceConnector.PROPERTY_USER, "user"); + properties.put(OpcDaSourceConnector.PROPERTY_HOST, "host"); + properties.put(OpcDaSourceConnector.PROPERTY_CLSID, "clsId"); + properties.put(OpcDaSourceConnector.PROPERTY_TAGS, "tag1:1000,tag2,tag3:3000,tag4:3000"); + connector.start(properties); + List> configs = connector.taskConfigs(2); + Assert.assertEquals(2, configs.size()); + System.out.println(configs); + configs.stream().map(m -> m.get(OpcDaSourceConnector.PROPERTY_TAGS)) + .map(s -> s.split(",")).forEach(a -> Assert.assertEquals(2, a.length)); + } + + @Test + @Ignore + public void e2eTest() throws 
Exception { + AtomicInteger atomicInteger = new AtomicInteger(5); + Random r = new Random(); + OpcDaSourceConnector connector = new OpcDaSourceConnector(); + Map properties = new HashMap<>(); + properties.put(OpcDaSourceConnector.PROPERTY_DOMAIN, "OPC-9167C0D9342"); + properties.put(OpcDaSourceConnector.PROPERTY_SOCKET_TIMEOUT, "2000"); + properties.put(OpcDaSourceConnector.PROPERTY_PASSWORD, "opc"); + properties.put(OpcDaSourceConnector.PROPERTY_USER, "OPC"); + properties.put(OpcDaSourceConnector.PROPERTY_HOST, "192.168.56.101"); + properties.put(OpcDaSourceConnector.PROPERTY_CLSID, "F8582CF2-88FB-11D0-B850-00C0F0104305"); + properties.put(OpcDaSourceConnector.PROPERTY_TAGS, listAllTags().stream() + .map(s -> s + ":" + atomicInteger.getAndAdd(r.nextInt(130))) + .collect(Collectors.joining(",")) + ); + //"Read Error.Int4:1000,Square Waves.Real8,Random.ArrayOfString:100" + connector.start(properties); + OpcDaSourceTask task = new OpcDaSourceTask(); + task.start(connector.taskConfigs(1).get(0)); + ScheduledExecutorService es = Executors.newSingleThreadScheduledExecutor(); + es.scheduleAtFixedRate(() -> { + try { + task.poll().forEach(System.out::println); + } catch (InterruptedException e) { + //do nothing + } + }, 0, 10, TimeUnit.MILLISECONDS); + + Thread.sleep(10000); + task.stop(); + es.shutdown(); + connector.stop(); + } + + private Collection listAllTags() throws Exception { + //create a connection profile + OpcDaConnectionProfile connectionProfile = new OpcDaConnectionProfile() + .withComClsId("F8582CF2-88FB-11D0-B850-00C0F0104305") + .withDomain("OPC-9167C0D9342") + .withUser("OPC") + .withPassword("opc") + .withHost("192.168.56.101") + .withSocketTimeout(Duration.of(1, ChronoUnit.SECONDS)); + + //Create an instance of a da operations + try (OpcDaOperations opcDaOperations = new OpcDaOperations()) { + //connect using our profile + opcDaOperations.connect(connectionProfile); + if (!opcDaOperations.awaitConnected()) { + throw new IllegalStateException("Unable to connect"); + } + return opcDaOperations.browseTags().stream().map(OpcTagInfo::getName).collect(Collectors.toList()); + } + } + +} diff --git a/logisland-connect/logisland-connectors/pom.xml b/logisland-connect/logisland-connectors/pom.xml new file mode 100644 index 000000000..9232401bf --- /dev/null +++ b/logisland-connect/logisland-connectors/pom.xml @@ -0,0 +1,25 @@ + + + 4.0.0 + + com.hurence.logisland + logisland-connect + 0.13.0 + + pom + + logisland-connectors + + + org.apache.kafka + connect-api + + + + logisland-connector-opcda + + + + diff --git a/logisland-connect/pom.xml b/logisland-connect/pom.xml new file mode 100644 index 000000000..c122ef751 --- /dev/null +++ b/logisland-connect/pom.xml @@ -0,0 +1,41 @@ + + + 4.0.0 + + com.hurence.logisland + logisland + 0.13.0 + + pom + + logisland-connect + Kafka Connect Logisland Modules + + + logisland-connect-spark + logisland-connectors-bundle + logisland-connectors + + + + + org.apache.kafka + connect-api + ${kafka.version} + + + org.apache.kafka + connect-runtime + ${kafka.version} + + + org.apache.kafka + connect-json + ${kafka.version} + + + + + diff --git a/logisland-docker/full-container/Dockerfile b/logisland-docker/full-container/Dockerfile index e8952a8e0..42b0f9184 100644 --- a/logisland-docker/full-container/Dockerfile +++ b/logisland-docker/full-container/Dockerfile @@ -7,11 +7,13 @@ USER root COPY logisland-*.tar.gz /usr/local/ RUN cd /usr/local; \ tar -xzf logisland-*.tar.gz; \ - ln -s /usr/local/logisland-0.12.2 /usr/local/logisland; \ + ln -s 
/usr/local/logisland-0.13.0 /usr/local/logisland; \ mkdir /usr/local/logisland/log; \ rm -f /usr/local/*.gz ENV LOGISLAND_HOME /usr/local/logisland RUN mv /usr/local/logisland/conf/log4j.properties /usr/local/spark/conf +ENV PATH $PATH:$LOGISLAND_HOME/bin +WORKDIR $LOGISLAND_HOME/ # update boot script diff --git a/logisland-docker/full-container/README.rst b/logisland-docker/full-container/README.rst index 97c5f7610..51dc8ffb2 100644 --- a/logisland-docker/full-container/README.rst +++ b/logisland-docker/full-container/README.rst @@ -7,7 +7,7 @@ Small standalone Hadoop distribution for development and testing purpose : - Elasticsearch 2.3.3 - Kibana 4.5.1 - Kafka 0.9.0.1 -- Logisland 0.12.2 +- Logisland 0.13.0 This repository contains a Docker file to build a Docker image with Apache Spark, HBase, Flume & Zeppelin. @@ -32,14 +32,14 @@ Building the image # build logisland mvn clean install - cp logisland-assembly/target/logisland-0.12.2-bin.tar.gz logisland-docker + cp logisland-assembly/target/logisland-0.13.0-bin.tar.gz logisland-docker The archive is generated under dist directory, you have to copy this file into your Dockerfile directory you can now issue .. code-block:: sh - docker build --rm -t hurence/logisland:0.12.2 . + docker build --rm -t hurence/logisland:0.13.0 . Running the image @@ -64,13 +64,13 @@ Running the image -p 4040-4060:4040-4060 \ --name logisland \ -h sandbox \ - hurence/logisland-hdp2.4:0.12.2 bash + hurence/logisland-hdp2.4:0.13.0 bash or .. code-block:: - docker run -d -h sandbox hurence/logisland-hdp2.4:0.12.2 -d + docker run -d -h sandbox hurence/logisland-hdp2.4:0.13.0 -d if you want to mount a directory from your host, add the following option : diff --git a/logisland-docker/lightweight-container/Dockerfile b/logisland-docker/lightweight-container/Dockerfile index bb82fd80e..400038973 100644 --- a/logisland-docker/lightweight-container/Dockerfile +++ b/logisland-docker/lightweight-container/Dockerfile @@ -1,6 +1,6 @@ FROM anapsix/alpine-java -ARG kafka_version=0.12.2.1 +ARG kafka_version=0.13.0.1 ARG scala_version=2.11 MAINTAINER wurstmeister diff --git a/logisland-docker/pom.xml b/logisland-docker/pom.xml index 900171ae1..bcd8042c0 100644 --- a/logisland-docker/pom.xml +++ b/logisland-docker/pom.xml @@ -7,10 +7,13 @@ com.hurence.logisland logisland - 0.12.2 + 0.13.0 pom logisland-docker + + false + @@ -62,10 +66,10 @@ --> - - - - - - + + + + + + diff --git a/logisland-docker/src/it/resources/data/all-tutorials.yml b/logisland-docker/src/it/resources/data/all-tutorials.yml index 20d1c0927..cbedd52db 100644 --- a/logisland-docker/src/it/resources/data/all-tutorials.yml +++ b/logisland-docker/src/it/resources/data/all-tutorials.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. 
Put here every engine or component config ######################################################################################################### diff --git a/logisland-documentation/_static/kibana-blockchain-dashboard.png b/logisland-documentation/_static/kibana-blockchain-dashboard.png new file mode 100644 index 000000000..03f6422df Binary files /dev/null and b/logisland-documentation/_static/kibana-blockchain-dashboard.png differ diff --git a/logisland-documentation/_static/kibana-blockchain-records.png b/logisland-documentation/_static/kibana-blockchain-records.png new file mode 100644 index 000000000..9c164cb61 Binary files /dev/null and b/logisland-documentation/_static/kibana-blockchain-records.png differ diff --git a/logisland-documentation/_static/kibana-excel-logs.png b/logisland-documentation/_static/kibana-excel-logs.png new file mode 100644 index 000000000..159c7d61c Binary files /dev/null and b/logisland-documentation/_static/kibana-excel-logs.png differ diff --git a/logisland-documentation/api.rst b/logisland-documentation/api.rst index b835bd2c8..9eee9c16c 100644 --- a/logisland-documentation/api.rst +++ b/logisland-documentation/api.rst @@ -409,7 +409,7 @@ You can then start to generate the source code from the swgger yaml file swagger-codegen generate \ --group-id com.hurence.logisland \ --artifact-id logisland-agent \ - --artifact-version 0.12.2 \ + --artifact-version 0.13.0 \ --api-package com.hurence.logisland.agent.rest.api \ --model-package com.hurence.logisland.agent.rest.model \ -o logisland-framework/logisland-agent \ diff --git a/logisland-documentation/changes.rst b/logisland-documentation/changes.rst index 5873217d2..cdeae8308 100644 --- a/logisland-documentation/changes.rst +++ b/logisland-documentation/changes.rst @@ -3,7 +3,7 @@ What's new in logisland ? -v0.12.2 +v0.13.0 ------- - add support for SOLR diff --git a/logisland-documentation/components.rst b/logisland-documentation/components.rst index 2640e4734..b8e778012 100644 --- a/logisland-documentation/components.rst +++ b/logisland-documentation/components.rst @@ -429,6 +429,38 @@ Dynamic Properties allow the user to specify both the name and value of a proper ---------- +.. _com.hurence.logisland.processor.excel.ExcelExtract: + +ExcelExtract +------------ +Consumes a Microsoft Excel document and converts each worksheet's line to a structured record. The processor is assuming to receive raw excel file as input record. + +Class +_____ +com.hurence.logisland.processor.excel.ExcelExtract + +Tags +____ +excel, processor, poi + +Properties +__________ +In the list below, the names of required properties appear in **bold**. Any other properties (not in bold) are considered optional. The table also indicates any default values +. + +.. csv-table:: allowable-values + :header: "Name","Description","Allowable Values","Default Value","Sensitive","EL" + :widths: 20,60,30,20,10,10 + + "Sheets to Extract", "Comma separated list of Excel document sheet names that should be extracted from the excel document. If this property is left blank then all of the sheets will be extracted from the Excel document. You can specify regular expressions. Any sheets not specified in this value will be ignored.", "", "", "", "" + "Columns To Skip", "Comma delimited list of column numbers to skip. Use the columns number and not the letter designation. 
Use this to skip over columns anywhere in your worksheet that you don't want extracted as part of the record.", "", "", "", "" + "Field names mapping", "The comma separated list representing the names of columns of extracted cells. Order matters! You should use either field.names either field.row.header but not both together.", "", "null", "", "" + "Number of Rows to Skip", "The row number of the first row to start processing.Use this to skip over rows of data at the top of your worksheet that are not part of the dataset.Empty rows of data anywhere in the spreadsheet will always be skipped, no matter what this value is set to.", "", "0", "", "" + "record.type", "Default type of record", "", "excel_record", "", "" + "Use a row header as field names mapping", "If set, field names mapping will be extracted from the specified row number. You should use either field.names either field.row.header but not both together.", "", "null", "", "" + +---------- + .. _com.hurence.logisland.processor.hbase.FetchHBaseRow: FetchHBaseRow @@ -887,7 +919,7 @@ In the list below, the names of required properties appear in **bold**. Any othe :header: "Name","Description","Allowable Values","Default Value","Sensitive","EL" :widths: 20,60,30,20,10,10 - "**conflict.resolution.policy**", "waht to do when a field with the same name already exists ?", "nothing to do (leave record as it was), overwrite existing field (if field already exist), keep only old field and delete the other (keep only old field and delete the other), keep old field and new one (creates an alias for the new field)", "do_nothing", "", "" + "**conflict.resolution.policy**", "what to do when a field with the same name already exists ?", "nothing to do (leave record as it was), overwrite existing field (if field already exist), keep only old field and delete the other (keep only old field and delete the other), keep old field and new one (creates an alias for the new field)", "do_nothing", "", "" Dynamic Properties __________________ diff --git a/logisland-documentation/conf.py b/logisland-documentation/conf.py index e484b608a..00338dfd8 100644 --- a/logisland-documentation/conf.py +++ b/logisland-documentation/conf.py @@ -21,7 +21,7 @@ #sys.path.insert(0, os.path.abspath('.')) -from recommonmark.parser import CommonMarkParser +#from recommonmark.parser import CommonMarkParser source_parsers = { '.md': CommonMarkParser, @@ -71,9 +71,9 @@ # built documents. # # The short X.Y version. -version = '0.12.2' +version = '0.13.0' # The full version, including alpha/beta/rc tags. -release = '0.12.2' +release = '0.13.0' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/logisland-documentation/connectors.rst b/logisland-documentation/connectors.rst new file mode 100644 index 000000000..8159c69f0 --- /dev/null +++ b/logisland-documentation/connectors.rst @@ -0,0 +1,180 @@ + +Connectors +========== + +In this chapter we will present you how to integrate kafka connect connectors into logisland. + +.. contents:: Table of Contents + + +Introduction +------------ + +Logisland features the integration between `kafka connect `_ world and the spark structured streaming engine. + +In order to seamlessy integrate both world, we just wrapped out the kafka connectors interfaces (unplugging them from kafka) and let the run in a logisland spark managed container. 
Hence the name *"Logisland Connect"* :-) + + +This allows you to leverage the existing kafka connectors library to import data into a logisland pipeline without having the need to make use of any another middleware or ETL system. + +Scope & Roadmap +--------------- + +Today only kafka-connect sources are available. + +Sinks will be probably supported in future relases of logisland. + +.. note:: + Please note that kafka connect requires at least kafka 0.10.0.0. Logisland build for hadoop 2.4 / spark 1.6 is hence not supporting this feature. + + +Building +-------- + +Logisland comes with a connectors bundle but those connectors are not bundled by default. You are required to build logisland from sources in order to package the connectors you need into logisland uber jar. + +Actually when building with maven you need to pass some java properties depending on the connector(s) you would like to include. + +Please refer to the following table for the details: + + ++--------------------------+----------------------------------------------------------------------------------+------------------------------+ +| Connector | URL | Build flag | ++==========================+=========================+========================================================+==============================+ +| Simulator | https://github.com/jcustenborder/kafka-connect-simulator | None (Built in) | ++--------------------------+-------------------------+--------------------------------------------------------+------------------------------+ +| OPC-DA (IIoT) | https://github.com/Hurence/logisland | None (Built in) | ++--------------------------+-------------------------+--------------------------------------------------------+------------------------------+ +| FTP | https://github.com/Eneco/kafka-connect-ftp | -DwithConnectFtp | ++--------------------------+----------------------------------------------------------------------------------+------------------------------+ +| Blockchain | https://github.com/Landoop/stream-reactor/tree/master/kafka-connect-blockchain | -DwithConnectBlockchain | ++--------------------------+----------------------------------------------------------------------------------+------------------------------+ + + +Configuring +----------- + +Once you have bundled the connectors you need, you are now ready to use them. + +Let's do it step by step. + +First of all we need to declare a *KafkaConnectStructuredProviderService* that will manage our connector in Logisland. +Along with this we need to put some configuration (In general you can always refer to kafka connect documentation to better understand the underlying architecture and how to configure a connector): + + ++-------------------------------------------------+----------------------------------------------------------+ +| Property | Description | ++=================================================+==========================================================+ +| kc.connector.class | The class of the connector (Fully qualified name) | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.data.key.converter | The class of the converter to be used for the key. 
| +| | Please refer to `Choosing the right converter`_ section | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.data.key.converter.properties | The properties to be provided to the key converter | +| | | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.data.value.converter | The class of the converter to be used for the key. | +| | Please refer to `Choosing the right converter`_ section | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.data.value.converter.properties | The properties to be provided to the key converter | +| | | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.connector.properties | The properties to be provided to the connector and | +| | specific to the connector itself. | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.worker.tasks.max | How many concurrent threads to spawn for a connector | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.connector.offset.backing.store | The offset backing store to use. Choose among: | +| | | +| | * **memory** : standalone in memory | +| | * **file** : standalone file based. | +| | * **kafka** : distributed kafka topic based | +| | | +| | | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.connector.offset.backing.store.properties | Specific properties to configure the chosen backing | +| | store. | ++-------------------------------------------------+----------------------------------------------------------+ + +.. note:: Please refer to `Kafka connect guide `_ for further information about offset backing store and how to configure them. + + +Choosing the right converter +---------------------------- + +Choosing the right converter is perhaps one of the most important part. In fact we're going to adapt what is coming from kafka connect to what is flowing into our logisland pipeline. +This means that we have to know how the source is managing its data. + +In order to simplify your choice, we recommend you to follow this simple approach (the same applies for both keys and values): + + ++----------------------------+-----------------------------------+-----------------------------------+ +| Source data | Kafka Converter | Logisland Encoder | ++============================+===================================+===================================+ +| String | StringConverter | StringEncoder | ++----------------------------+-----------------------------------+-----------------------------------+ +| Raw Bytes | ByteArrayConverter | BytesArraySerialiser | ++----------------------------+-----------------------------------+-----------------------------------+ +| Structured | LogIslandRecordConverter | The serializer used by the record | +| | | converter (*) | ++----------------------------+-----------------------------------+-----------------------------------+ + + +.. note:: + (*)In case you deal with structured data, the LogIslandRecordConverter will embed the structured object in a logisland record. In order to do this you have to specify the serializer to be used to convert your data (the serializer property **record.serializer**). 
Generally the *KryoSerialiser* is a good choice to start with. + + + +Putting all together +-------------------- + +In the previous two sections we explained how to configure a connector and how to choose the right serializer for it. + +The recap we can examine the following configuration example: + + +.. code-block:: yaml + + # Our source service + - controllerService: kc_source_service + component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredProviderService + documentation: A kafka source connector provider reading from its own source and providing structured streaming to the underlying layer + configuration: + # We will use the logisland record converter for both key and value + kc.data.value.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.value.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + kc.data.key.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.key.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + # Only one task to handle source input (unique) + kc.worker.tasks.max: 1 + # The kafka source connector to wrap (here we're using a simulator source) + kc.connector.class: com.github.jcustenborder.kafka.connect.simulator.SimulatorSourceConnector + # The properties for the connector (as per connector documentation) + kc.connector.properties: | + key.schema.fields=email + topic=simulator + value.schema.fields=email,firstName,middleName,lastName,telephoneNumber,dateOfBirth + # We are using a standalone source for testing. We can store processed offsets in memory + kc.connector.offset.backing.store: memory + + + + +In the example both key and value provided by the connector are structured objects. + +For this reason we use for that the converter *LogIslandRecordConverter*. +As well, we provide the serializer to be used for both key and value converter specifying +*record.serializer=com.hurence.logisland.serializer.KryoSerializer* among the related converter properties. + + +Going further +------------- + + +Please do not hesitate to take a look to our kafka connect tutorials for more details and practical use cases. + + diff --git a/logisland-documentation/developer.rst b/logisland-documentation/developer.rst index 647b68d1b..76ffccd5c 100644 --- a/logisland-documentation/developer.rst +++ b/logisland-documentation/developer.rst @@ -204,14 +204,12 @@ to release artifacts (if you're allowed to), follow this guide `release to OSS S .. 
code-block:: sh - mvn versions:set -DnewVersion=0.12.2 + ./update-version.sh -o 0.13.0 -n 14.4 mvn license:format mvn test - mvn -DperformRelease=true clean deploy + mvn -DperformRelease=true clean deploy -Phdp2.5 mvn versions:commit - git tag -a v0.12.2 -m "new logisland release 0.12.2" - git push origin v0.12.2 follow the staging procedure in `oss.sonatype.org `_ or read `Sonatype book `_ @@ -224,7 +222,7 @@ Publish release assets to github please refer to `https://developer.github.com/v3/repos/releases `_ -curl -XPOST https://uploads.github.com/repos/Hurence/logisland/releases/8905079/assets?name=logisland-0.12.2-bin-hdp2.5.tar.gz -v --data-binary @logisland-assembly/target/logisland-0.10.3-bin-hdp2.5.tar.gz --user oalam -H 'Content-Type: application/gzip' +curl -XPOST https://uploads.github.com/repos/Hurence/logisland/releases/8905079/assets?name=logisland-0.13.0-bin-hdp2.5.tar.gz -v --data-binary @logisland-assembly/target/logisland-0.10.3-bin-hdp2.5.tar.gz --user oalam -H 'Content-Type: application/gzip' @@ -235,7 +233,7 @@ Building the image .. code-block:: sh # build logisland - mvn clean install -DskipTests -Pdocker -Dhdp=2.4 + mvn clean install -DskipTests -Pdocker -Dhdp2.5 # verify image build docker images diff --git a/logisland-documentation/monitoring.rst b/logisland-documentation/monitoring.rst index 809001ed9..0927fe3a0 100644 --- a/logisland-documentation/monitoring.rst +++ b/logisland-documentation/monitoring.rst @@ -63,8 +63,8 @@ Manual mode : # download the latest build of Node Exporter cd /opt - wget https://github.com/prometheus/node_exporter/releases/download/0.12.2/node_exporter-0.12.2.linux-amd64.tar.gz -O /tmp/node_exporter-0.12.2.linux-amd64.tar.gz - sudo tar -xvzf /tmp/node_exporter-0.12.2.linux-amd64.tar.gz + wget https://github.com/prometheus/node_exporter/releases/download/0.13.0/node_exporter-0.13.0.linux-amd64.tar.gz -O /tmp/node_exporter-0.13.0.linux-amd64.tar.gz + sudo tar -xvzf /tmp/node_exporter-0.13.0.linux-amd64.tar.gz # Create a soft link to the node_exporter binary in /usr/bin. sudo ln -s /opt/node_exporter /usr/bin diff --git a/logisland-documentation/overview-slides.md b/logisland-documentation/overview-slides.md index 1c2f2e41e..ed33b565a 100644 --- a/logisland-documentation/overview-slides.md +++ b/logisland-documentation/overview-slides.md @@ -369,7 +369,7 @@ you configure here your Spark job parameters Download the latest release from [github](https://github.com/Hurence/logisland/releases) - tar -xzf logisland-0.12.2-bin.tar.gz + tar -xzf logisland-0.13.0-bin.tar.gz Create a job configuration diff --git a/logisland-documentation/plugins_old.rst b/logisland-documentation/plugins_old.rst index 03e55da19..116df25e7 100644 --- a/logisland-documentation/plugins_old.rst +++ b/logisland-documentation/plugins_old.rst @@ -60,7 +60,7 @@ Write your a custom LogParser for your super-plugin in ``/src/main/java/com/hure Our parser will analyze some Proxy Log String in the following form : - "Thu Jan 02 08:43:39 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.12.226/Images/RightJauge.gif 724 409 false false" + "Thu Jan 02 08:43:39 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.13.026/Images/RightJauge.gif 724 409 false false" .. 
code-block:: java diff --git a/logisland-documentation/pom.xml b/logisland-documentation/pom.xml index d3f57ed49..b03a9286c 100644 --- a/logisland-documentation/pom.xml +++ b/logisland-documentation/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland - 0.12.2 + 0.13.0 logisland-documentation @@ -109,9 +109,16 @@ com.hurence.logisland logisland-useragent-plugin - - + + com.hurence.logisland + logisland-excel-plugin + + + com.hurence.logisland + logisland-redis_4-client-service + + diff --git a/logisland-documentation/tutorials/index-blockchain-transactions.rst b/logisland-documentation/tutorials/index-blockchain-transactions.rst new file mode 100644 index 000000000..8c4e6e396 --- /dev/null +++ b/logisland-documentation/tutorials/index-blockchain-transactions.rst @@ -0,0 +1,274 @@ +Index blockchain transactions +============================= + +In the following getting started tutorial, we'll explain you how to leverage logisland connectors flexibility +in order process in real time every transaction emitted by the bitcoin blockchain platform and index each record +into an elasticsearch platform. + +This will allow us to run some dashboarding and visual data analysis as well. + + +.. note:: + + Be sure to know of to launch a logisland Docker environment by reading the `prerequisites <./prerequisites.html>`_ section + + For kafka connect related information please follow as well the `connectors <../connectors.html>`_ section. + +1. Logisland job setup +---------------------- + +.. note:: + + To run this tutorial you have to package the blockchain connector into the logisland deployable jar. + You can do this simply by building logisland from sources passing the option *-DwithConnectBlockchain* to maven. + +The logisland job for this tutorial is already packaged in the tar.gz assembly and you can find it here for ElasticSearch : + +.. code-block:: sh + + vim conf/index-blockchain-transactions.yml + + + +We will start by explaining each part of the config file. + +========== +The engine +========== + +The first section configures the Spark engine (we will use a `KafkaStreamProcessingEngine <../plugins.html#kafkastreamprocessingengine>`_) to run in local mode. + +.. code-block:: yaml + + engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Index some blockchain transactions with logisland + configuration: + spark.app.name: BlockchainTest + spark.master: local[*] + spark.driver.memory: 512M + spark.driver.cores: 1 + spark.executor.memory: 512M + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.yarn.queue: default + spark.yarn.maxAppAttempts: 4 + spark.yarn.am.attemptFailuresValidityInterval: 1h + spark.yarn.max.executor.failures: 20 + spark.yarn.executor.failuresValidityInterval: 1h + spark.task.maxFailures: 8 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 2000 + spark.streaming.backpressure.enabled: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 10000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4040 + + The `controllerServiceConfigurations` part is here to define all services that be shared by processors within the whole job. 
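+
+Every service declared in this section can then be referenced by its name from any processor in the job. Here is a shortened sketch, adapted from the Elasticsearch service and the indexing processor detailed later in this tutorial:
+
+.. code-block:: yaml
+
+    # declared once under controllerServiceConfigurations...
+    - controllerService: elasticsearch_service
+      component: com.hurence.logisland.service.elasticsearch.Elasticsearch_5_4_0_ClientService
+      type: service
+      configuration:
+        hosts: sandbox:9300
+        cluster.name: es-logisland
+
+    # ...then simply referenced by name from a processor configuration
+    - processor: es_publisher
+      component: com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch
+      type: processor
+      configuration:
+        elasticsearch.client.service: elasticsearch_service
+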
+ + ================== + The parsing stream + ================== + + Here we are going to use a special processor (``KafkaConnectStructuredProviderService``) to use the kafka connect source as input for the structured stream defined below. + + For this example, we are going to use the source *com.datamountaineer.streamreactor.connect.blockchain.source.BlockchainSourceConnector* + that opens a secure websocket connections to the blockchain subscribing to any transaction update stream. + + + .. code-block:: yaml + + ControllerServiceConfigurations: + - controllerService: kc_source_service + component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredProviderService + configuration: + kc.data.value.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + kc.data.value.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + kc.data.key.converter.properties: | + schemas.enable=false + kc.data.key.converter: org.apache.kafka.connect.storage.StringConverter + kc.worker.tasks.max: 1 + kc.connector.class: com.datamountaineer.streamreactor.connect.blockchain.source.BlockchainSourceConnector + kc.connector.offset.backing.store: memory + kc.connector.properties: | + connect.blockchain.source.url=wss://ws.blockchain.info/inv + connect.blockchain.source.kafka.topic=blockchain + + + +.. note:: Our source is providing structured value hence we convert with LogInslandRecordConverter serializing with Kryo + + +.. code-block:: yaml + + # Kafka sink configuration + - controllerService: kafka_out_service + component: com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService + configuration: + kafka.output.topics: logisland_raw + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + +So that, we can now define the *parsing stream* using those source and sink + +.. code-block:: yaml + + ######### parsing stream ############## + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + documentation: "Takes records from the kafka source and distributes related partitions over a kafka topic. Records are then handed off to the indexing stream" + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.client.service: kafka_out_service + + +Within this stream, a ``FlatMap`` processor takes out the value and key (required when using *StructuredStream* as source of records) + +.. 
code-block:: yaml + + processorConfigurations: + - processor: flatten + component: com.hurence.logisland.processor.FlatMap + type: processor + documentation: "Takes out data from record_value" + configuration: + keep.root.record: false + copy.root.record.fields: true + +=================== +The indexing stream +=================== + + +Inside this engine, you will run a Kafka stream of processing, so we set up input/output topics and Kafka/Zookeeper hosts. +Here the stream will read all the logs sent in ``logisland_raw`` topic and push the processing output into ``logisland_events`` topic. + +.. note:: + + We want to specify an Avro output schema to validate our output records (and force their types accordingly). + It's really for other streams to rely on a schema when processing records from a topic. + +We can define some serializers to marshall all records from and to a topic. + +.. code-block:: yaml + + + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + documentation: "Takes records from the kafka source and distributes related partitions over a kafka topic. Records are then handed off to the indexing stream" + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.client.service: kafka_out_service + + +Within this stream, a ``BulkAddElasticsearch`` takes care of indexing a ``Record`` sending it to elasticsearch. + +.. code-block:: yaml + + - processor: es_publisher + component: com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch + type: processor + documentation: a processor that indexes processed events in elasticsearch + configuration: + elasticsearch.client.service: elasticsearch_service + default.index: logisland + default.type: event + timebased.index: yesterday + es.index.field: search_index + es.type.field: record_type + + +In details, this processor makes use of a ``Elasticsearch_5_4_0_ClientService`` controller service to interact with our Elasticsearch 5.X backend +running locally (and started as part of the docker compose configuration we mentioned above). + +Here below its configuration: + +.. code-block:: yaml + + - controllerService: elasticsearch_service + component: com.hurence.logisland.service.elasticsearch.Elasticsearch_5_4_0_ClientService + type: service + documentation: elasticsearch service + configuration: + hosts: sandbox:9300 + cluster.name: es-logisland + batch.size: 5000 + + +2. Launch the script +-------------------- +Connect a shell to your logisland container to launch the following streaming jobs. + +.. code-block:: sh + + bin/logisland.sh --conf conf/index-blockchain-transactions.yml + + +3. Do some insights and visualizations +-------------------------------------- + +With ElasticSearch, you can use Kibana. + +Open up your browser and go to http://sandbox:5601/app/kibana#/ and you should be able to explore the blockchain transactions. + + +Configure a new index pattern with ``logisland.*`` as the pattern name and ``@timestamp`` as the time value field. + +.. 
image:: /_static/kibana-configure-index.png + +Then if you go to Explore panel for the latest 15' time window you'll only see logisland process_metrics events which give you +insights about the processing bandwidth of your streams. + + +.. image:: /_static/kibana-blockchain-records.png + + +You can try as well to create some basic visualization in order to draw the total satoshi transacted amount (aggregating sums of ``out.value`` field). + +Below a nice example: + +.. image:: /_static/kibana-blockchain-dashboard.png + + +Ready to discover which addresses received most of the money? Give it a try ;-) + + +4. Monitor your spark jobs and Kafka topics +------------------------------------------- +Now go to `http://sandbox:4050/streaming/ `_ to see how fast Spark can process +your data + +.. image:: /_static/spark-job-monitoring.png + + +Another tool can help you to tweak and monitor your processing `http://sandbox:9000/ `_ + +.. image:: /_static/kafka-mgr.png + + diff --git a/logisland-documentation/tutorials/index-excel-spreadsheet.rst b/logisland-documentation/tutorials/index-excel-spreadsheet.rst new file mode 100644 index 000000000..be2a55ddb --- /dev/null +++ b/logisland-documentation/tutorials/index-excel-spreadsheet.rst @@ -0,0 +1,192 @@ +Extract Records from Excel File +=============================== + +In the following getting started tutorial we'll drive you through the process of extracting data from any Excel file with LogIsland platform. + +Both XLSX and old XLS file format are supported. + +.. note:: + + Be sure to know of to launch a logisland Docker environment by reading the `prerequisites <./prerequisites.html>`_ section + +Note, it is possible to store data in different datastores. In this tutorial, we will see the case of ElasticSearch only. + +1. Logisland job setup +---------------------- +The logisland job for this tutorial is already packaged in the tar.gz assembly and you can find it here for ElasticSearch : + +.. code-block:: sh + + docker exec -i -t logisland vim conf/index-excel-spreadsheet.yml + +We will start by explaining each part of the config file. + +An Engine is needed to handle the stream processing. This ``conf/extract-excel-data.yml`` configuration file defines a stream processing job setup. +The first section configures the Spark engine (we will use a `KafkaStreamProcessingEngine <../plugins.html#kafkastreamprocessingengine>`_) to run in local mode with 2 cpu cores and 2G of RAM. + +.. 
code-block:: yaml + + engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Index records of an excel file with LogIsland + configuration: + spark.app.name: IndexExcelDemo + spark.master: local[4] + spark.driver.memory: 1G + spark.driver.cores: 1 + spark.executor.memory: 2G + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.yarn.queue: default + spark.yarn.maxAppAttempts: 4 + spark.yarn.am.attemptFailuresValidityInterval: 1h + spark.yarn.max.executor.failures: 20 + spark.yarn.executor.failuresValidityInterval: 1h + spark.task.maxFailures: 8 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 1000 + spark.streaming.backpressure.enabled: false + spark.streaming.unpersist: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 3000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4050 + +The `controllerServiceConfigurations` part is here to define all services that be shared by processors within the whole job, here an Elasticsearch service that will be used later in the ``BulkAddElasticsearch`` processor. + +.. code-block:: yaml + + - controllerService: elasticsearch_service + component: com.hurence.logisland.service.elasticsearch.Elasticsearch_5_4_0_ClientService + type: service + documentation: elasticsearch service + configuration: + hosts: sandbox:9300 + cluster.name: es-logisland + batch.size: 5000 + + +Inside this engine you will run a Kafka stream of processing, so we setup input/output topics and Kafka/Zookeeper hosts. +Here the stream will read all the logs sent in ``logisland_raw`` topic and push the processing output into ``logisland_events`` topic. + + +We can define some serializers to marshall all records from and to a topic. +We assume that the stream will be serializing the input file as a byte array in a single record. Reason why we will use a ByteArraySerialiser in the configuration below. + +.. code-block:: yaml + + # main processing stream + - stream: parsing_stream + component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing + type: stream + documentation: a processor that converts raw excel file content into structured log records + configuration: + kafka.input.topics: logisland_raw + kafka.output.topics: logisland_events + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.BytesArraySerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + +Within this stream, an ``ExcelExtract`` processor takes a byte array excel file content and computes a list of ``Record``. + +.. 
code-block:: yaml + + # parse excel cells into records + - processor: excel_parser + component: com.hurence.logisland.processor.excel.ExcelExtract + type: parser + documentation: a parser that produce events from an excel file + configuration: + record.type: excel_record + skip.rows: 1 + field.names: segment,country,product,discount_band,units_sold,manufacturing,sale_price,gross_sales,discounts,sales,cogs,profit,record_time,month_number,month_name,year + + +This stream will process log entries as soon as they will be queued into `logisland_raw` Kafka topics, each log will +be parsed as an event which will be pushed back to Kafka in the ``logisland_events`` topic. + +.. note:: + + Please note that we are mapping the excel column *Date* to be the timestamp of the produced record (*record_time* field) in order to use this as time reference in elasticsearch/kibana (see below). + +The second processor will handle ``Records`` produced by the ``ExcelExtract`` to index them into elasticsearch + +.. code-block:: yaml + + # add to elasticsearch + - processor: es_publisher + component: com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch + type: processor + documentation: a processor that trace the processed events + configuration: + elasticsearch.client.service: elasticsearch_service + default.index: logisland + default.type: event + timebased.index: yesterday + es.index.field: search_index + es.type.field: record_type + + +2. Launch the script +-------------------- +For this tutorial we will handle an excel file. We will process it with an ExcelExtract that will produce a bunch of Records and we'll send them to Elastiscearch +Connect a shell to your logisland container to launch the following streaming jobs. + +For ElasticSearch : + +.. code-block:: sh + + docker exec -i -t logisland bin/logisland.sh --conf conf/index-excel-spreadsheet.yml + +3. Inject an excel file into the system +--------------------------------------- +Now we're going to send a file to ``logisland_raw`` Kafka topic. + +For testing purposes, we will use `kafkacat `_, +a *generic command line non-JVM Apache Kafka producer and consumer* which can be easily installed. + +.. note:: + + Sending raw files through kafka is not recommended for production use since kafka is designed for high throughput and not big message size. + + +The configuration above is suited to work with the example file *Financial Sample.xlsx*. + +Let's send this file in a single message to LogIsland with kafkacat to ``logisland_raw`` Kafka topic + +.. code-block:: sh + + kafkacat -P -t logisland_raw -v -b sandbox:9092 ./Financial\ Sample.xlsx + + +5. Inspect the logs +--------------------------------- + +Kibana +"""""" + +With ElasticSearch, you can use Kibana. + +Open up your browser and go to `http://sandbox:5601/ `_ and you should be able to explore your excel records. + +Configure a new index pattern with ``logisland.*`` as the pattern name and ``@timestamp`` as the time value field. + +.. image:: /_static/kibana-configure-index.png + +Then if you go to Explore panel for the latest 5 years time window. You are now able to play with the indexed data. + +.. image:: /_static/kibana-excel-logs.png + + +*Thanks logisland! 
:-)* \ No newline at end of file diff --git a/logisland-documentation/tutorials/index.rst b/logisland-documentation/tutorials/index.rst index 35d5c0ffe..a9f443d19 100644 --- a/logisland-documentation/tutorials/index.rst +++ b/logisland-documentation/tutorials/index.rst @@ -24,6 +24,7 @@ Contents: prerequisites index-apache-logs + store-to-redis match-queries aggregate-events enrich-apache-logs @@ -31,4 +32,8 @@ Contents: indexing-bro-events indexing-netflow-events indexing-network-packets - + generate_unique_ids + index-blockchain-transactions + index-excel-spreadsheets + mqtt-to-historian + integrate-kafka-connect diff --git a/logisland-documentation/tutorials/integrate-kafka-connect.rst b/logisland-documentation/tutorials/integrate-kafka-connect.rst new file mode 100644 index 000000000..4a8a108b5 --- /dev/null +++ b/logisland-documentation/tutorials/integrate-kafka-connect.rst @@ -0,0 +1,259 @@ +Integrate Kafka Connect Sources & Sinks +======================================= + +In the following getting started tutorial, we'll focus on how to seamlessly integrate Kafka connect sources and sinks in logisland. + +We can call this functionality *Logisland connect*. + +.. note:: + + Be sure to know of to launch a logisland Docker environment by reading the `prerequisites <./prerequisites.html>`_ section + +1. Logisland job setup +---------------------- +The logisland job for this tutorial is already packaged in the tar.gz assembly and you can find it here for ElasticSearch : + +.. code-block:: sh + + docker exec -i -t logisland vim conf/logisland-kafka-connect.yml + + + +We will start by explaining each part of the config file. + +========== +The engine +========== + +The first section configures the Spark engine (we will use a `KafkaStreamProcessingEngine <../plugins.html#kafkastreamprocessingengine>`_) to run in local mode. + +.. code-block:: yaml + + engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Use Kafka connectors with logisland + configuration: + spark.app.name: LogislandConnect + spark.master: local[2] + spark.driver.memory: 1G + spark.driver.cores: 1 + spark.executor.memory: 2G + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.yarn.queue: default + spark.yarn.maxAppAttempts: 4 + spark.yarn.am.attemptFailuresValidityInterval: 1h + spark.yarn.max.executor.failures: 20 + spark.yarn.executor.failuresValidityInterval: 1h + spark.task.maxFailures: 8 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 1000 + spark.streaming.backpressure.enabled: false + spark.streaming.unpersist: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 3000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4050 + +The `controllerServiceConfigurations` part is here to define all services that be shared by processors within the whole job. + +================== +The parsing stream +================== + +Here we are going to use a special processor (``KafkaConnectStructuredProviderService``) to use the kafka connect source as input for the structured stream defined below. + +For this example, we are going to use the source *com.github.jcustenborder.kafka.connect.simulator.SimulatorSourceConnector* that generates records containing fake personal data at rate of 100 messages/s. + + +.. 
code-block:: yaml + + # Our source service + - controllerService: kc_source_service + component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredProviderService + documentation: A kafka source connector provider reading from its own source and providing structured streaming to the underlying layer + configuration: + # We will use the logisland record converter for both key and value + kc.data.value.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.value.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + + kc.data.key.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.key.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + # Only one task to handle source input (unique) + kc.worker.tasks.max: 1 + # The kafka source connector to wrap (here we're using a simulator source) + kc.connector.class: com.github.jcustenborder.kafka.connect.simulator.SimulatorSourceConnector + # The properties for the connector (as per connector documentation) + kc.connector.properties: | + key.schema.fields=email + topic=simulator + value.schema.fields=email,firstName,middleName,lastName,telephoneNumber,dateOfBirth + # We are using a standalone source for testing. We can store processed offsets in memory + kc.connector.offset.backing.store: memory + +.. note:: + + The parameter **kc.connector.properties** contains the connector properties as you would have defined if you were using vanilla kafka connect. + + As well, we are using a *memory* offset backing store. In a distributed scenario, you may have chosen a *kafka* topic based one. + +Since each stream can be read and written, we are going to define as well a Kafka topic sink (``KafkaStructuredStreamProviderService``) that will be used as output for the structured stream defined below. + +.. code-block:: yaml + + # Kafka sink configuration + - controllerService: kafka_out_service + component: com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService + configuration: + kafka.output.topics: logisland_raw + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + +So that, we can now define the *parsing stream* using those source and sink + +.. code-block:: yaml + + ######### parsing stream ############## + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + documentation: "Takes records from the kafka source and distributes related partitions over a kafka topic. 
Records are then handed off to the indexing stream" + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.client.service: kafka_out_service + + +Within this stream, a ``FlatMap`` processor takes out the value and key (required when using *StructuredStream* as source of records) + +.. code-block:: yaml + + processorConfigurations: + - processor: flatten + component: com.hurence.logisland.processor.FlatMap + type: processor + documentation: "Takes out data from record_value" + configuration: + keep.root.record: false + copy.root.record.fields: true + +=================== +The indexing stream +=================== + + +Inside this engine, you will run a Kafka stream of processing, so we set up input/output topics and Kafka/Zookeeper hosts. +Here the stream will read all the logs sent in ``logisland_raw`` topic and push the processing output into ``logisland_events`` topic. + +.. note:: + + We want to specify an Avro output schema to validate our output records (and force their types accordingly). + It's really for other streams to rely on a schema when processing records from a topic. + +We can define some serializers to marshall all records from and to a topic. + +.. code-block:: yaml + + + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + documentation: "Takes records from the kafka source and distributes related partitions over a kafka topic. Records are then handed off to the indexing stream" + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.client.service: kafka_out_service + + +Within this stream, a ``DebugStream`` processor takes a log line as a String and computes a ``Record`` as a sequence of fields. + +.. code-block:: yaml + + processorConfigurations: + # We just print the received records (but you may do something more interesting!) + - processor: stream_debugger + component: com.hurence.logisland.processor.DebugStream + type: processor + documentation: debug records + configuration: + event.serializer: json + +This stream will process log entries as soon as they will be queued into `logisland_raw` Kafka topics, each log will be printed in the console and pushed back to Kafka in the ``logisland_events`` topic. + + + +2. Launch the script +-------------------- +Connect a shell to your logisland container to launch the following streaming jobs. + +.. code-block:: sh + + docker exec -i -t logisland bin/logisland.sh --conf conf/logisland-kafka-connect.yml + + +3. Examine your console output +------------------------------ + +Since we put a *DebugStream* processor, messages produced by our source connectors are then output to the console in json. + +.. 
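+You can also check that the connector output really reaches the ``logisland_raw`` Kafka topic. The records are Kryo-serialized there, so kafkacat will mostly print binary payloads; this optional check only confirms that traffic is flowing (kafkacat usage is covered in the other tutorials):
+
+.. code-block:: sh
+
+    # read a handful of freshly produced messages from the sink topic
+    kafkacat -C -b sandbox:9092 -t logisland_raw -o end -c 5
+
+Back to the console: a record printed by the ``DebugStream`` processor looks like the following:
+
+..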
code-block:: json + + 18/04/06 11:17:06 INFO DebugStream: { + "id" : "9b17a9ac-97c4-44ef-9168-d298e8c53d42", + "type" : "kafka_connect", + "creationDate" : 1523006216376, + "fields" : { + "record_id" : "9b17a9ac-97c4-44ef-9168-d298e8c53d42", + "firstName" : "London", + "lastName" : "Marks", + "telephoneNumber" : "005-694-4540", + "record_key" : { + "email" : "londonmarks@fake.com" + }, + "middleName" : "Anna", + "dateOfBirth" : 836179200000, + "record_time" : 1523006216376, + "record_type" : "kafka_connect", + "email" : "londonmarks@fake.com" + } + } + + + +4. Monitor your spark jobs and Kafka topics +------------------------------------------- +Now go to `http://sandbox:4050/streaming/ `_ to see how fast Spark can process +your data + +.. image:: /_static/spark-job-monitoring.png + + +Another tool can help you to tweak and monitor your processing `http://sandbox:9000/ `_ + +.. image:: /_static/kafka-mgr.png + + diff --git a/logisland-documentation/tutorials/prerequisites.rst b/logisland-documentation/tutorials/prerequisites.rst index 543bd9ac9..61c9995ae 100644 --- a/logisland-documentation/tutorials/prerequisites.rst +++ b/logisland-documentation/tutorials/prerequisites.rst @@ -15,72 +15,83 @@ To facilitate integration testing and to easily run tutorials, you can create a .. code-block:: yaml - # Zookeeper container 172.17.0.1 - zookeeper: - image: hurence/zookeeper - hostname: zookeeper - container_name: zookeeper - ports: - - "2181:2181" - - # Kafka container - kafka: - image: hurence/kafka - hostname: kafka - container_name: kafka - links: - - zookeeper - ports: - - "9092:9092" - environment: - KAFKA_ADVERTISED_PORT: 9092 - KAFKA_ADVERTISED_HOST_NAME: sandbox - KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181 - KAFKA_JMX_PORT: 7071 - - # ES container - elasticsearch: - environment: - - ES_JAVA_OPT="-Xms1G -Xmx1G" - - cluster.name=es-logisland - - http.host=0.0.0.0 - - transport.host=0.0.0.0 - - xpack.security.enabled=false - hostname: elasticsearch - container_name: elasticsearch - image: 'docker.elastic.co/elasticsearch/elasticsearch:5.4.0' - ports: - - '9200:9200' - - '9300:9300' - - # Kibana container - kibana: - environment: - - 'ELASTICSEARCH_URL=http://elasticsearch:9200' - image: 'docker.elastic.co/kibana/kibana:5.4.0' - container_name: kibana - links: - - elasticsearch - ports: - - '5601:5601' - - # Logisland container : does nothing but launching - logisland: - image: hurence/logisland - command: tail -f bin/logisland.sh - #command: bin/logisland.sh --conf /conf/index-apache-logs.yml - links: - - zookeeper - - kafka - - elasticsearch - ports: - - "4050:4050" - volumes: - - ./conf/logisland:/conf - - ./data/logisland:/data - container_name: logisland - extra_hosts: - - "sandbox:172.17.0.1" + version: "2" + services: + + zookeeper: + container_name: zookeeper + image: hurence/zookeeper + hostname: zookeeper + ports: + - "2181:2181" + + kafka: + container_name: kafka + image: hurence/kafka + hostname: kafka + links: + - zookeeper + ports: + - "9092:9092" + environment: + KAFKA_ADVERTISED_PORT: 9092 + KAFKA_ADVERTISED_HOST_NAME: sandbox + KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181 + KAFKA_JMX_PORT: 7071 + + # ES container + elasticsearch: + container_name: elasticsearch + environment: + - ES_JAVA_OPT="-Xms1G -Xmx1G" + - cluster.name=es-logisland + - http.host=0.0.0.0 + - transport.host=0.0.0.0 + - xpack.security.enabled=false + hostname: elasticsearch + container_name: elasticsearch + image: 'docker.elastic.co/elasticsearch/elasticsearch:5.4.0' + ports: + - '9200:9200' + - '9300:9300' + + # 
Kibana container + kibana: + container_name: kibana + environment: + - 'ELASTICSEARCH_URL=http://elasticsearch:9200' + image: 'docker.elastic.co/kibana/kibana:5.4.0' + container_name: kibana + links: + - elasticsearch + ports: + - '5601:5601' + + # Logisland container : does nothing but launching + logisland: + container_name: logisland + image: hurence/logisland:0.13.0 + command: tail -f bin/logisland.sh + #command: bin/logisland.sh --conf /conf/index-apache-logs.yml + links: + - zookeeper + - kafka + - elasticsearch + - redis + ports: + - "4050:4050" + volumes: + - ./conf/logisland:/conf + - ./data/logisland:/data + container_name: logisland + extra_hosts: + - "sandbox:172.17.0.1" + + redis: + container_name: redis + image: 'redis:latest' + ports: + - '6379:6379' Once you have this file you can run a `docker-compose` command to launch all the needed services (zookeeper, kafka, es, kibana and logisland) @@ -115,10 +126,10 @@ From an edge node of your cluster : .. code-block:: sh cd /opt - sudo wget https://github.com/Hurence/logisland/releases/download/v0.12.2/logisland-0.12.2-bin-hdp2.5.tar.gz + sudo wget https://github.com/Hurence/logisland/releases/download/v0.13.0/logisland-0.13.0-bin-hdp2.5.tar.gz export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/ export HADOOP_CONF_DIR=$SPARK_HOME/conf - sudo /opt/logisland-0.12.2/bin/logisland.sh --conf /home/hurence/tom/logisland-conf/v0.10.0/future-factory.yml + sudo /opt/logisland-0.13.0/bin/logisland.sh --conf /home/hurence/tom/logisland-conf/v0.10.0/future-factory.yml diff --git a/logisland-documentation/tutorials/store-to-redis.rst b/logisland-documentation/tutorials/store-to-redis.rst new file mode 100644 index 000000000..f7e69291a --- /dev/null +++ b/logisland-documentation/tutorials/store-to-redis.rst @@ -0,0 +1,180 @@ +Store Apache logs to Redis K/V store +==================================== + +In the following getting started tutorial we'll drive you through the process of Apache log mining with LogIsland platform. + +.. note:: + + Be sure to know of to launch a logisland Docker environment by reading the `prerequisites <./prerequisites.html>`_ section + +Note, it is possible to store data in different datastores. In this tutorial, we will see the case of Redis, if you need more in-depth explanations you can read the previous tutorial on indexing apache logs to elasticsearch or solr : `index-apache-logs.html`_ . + +1. Logisland job setup +---------------------- +The logisland job for this tutorial is already packaged in the tar.gz assembly and you can find it here : + +.. code-block:: sh + + docker exec -i -t logisland vim conf/store-to-redis.yml + +We will start by explaining each part of the config file. + +The `controllerServiceConfigurations` part is here to define all services that be shared by processors within the whole job, here a Redis KV cache service that will be used later in the ``BulkPut`` processor. + +.. 
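+Before going through the configuration, you can optionally check that the Redis container started by the prerequisites stack is reachable (it is named ``redis`` in the prerequisites docker-compose file):
+
+.. code-block:: sh
+
+    # the Redis server from the docker-compose stack should answer PONG
+    docker exec -i -t redis redis-cli ping
+
+The Redis controller service itself is declared as follows:
+
+..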
code-block:: yaml
+
+    - controllerService: datastore_service
+      component: com.hurence.logisland.redis.service.RedisKeyValueCacheService
+      type: service
+      documentation: redis datastore service
+      configuration:
+        connection.string: localhost:6379
+        redis.mode: standalone
+        database.index: 0
+        communication.timeout: 10 seconds
+        pool.max.total: 8
+        pool.max.idle: 8
+        pool.min.idle: 0
+        pool.block.when.exhausted: true
+        pool.max.wait.time: 10 seconds
+        pool.min.evictable.idle.time: 60 seconds
+        pool.time.between.eviction.runs: 30 seconds
+        pool.num.tests.per.eviction.run: -1
+        pool.test.on.create: false
+        pool.test.on.borrow: false
+        pool.test.on.return: false
+        pool.test.while.idle: true
+        record.recordSerializer: com.hurence.logisland.serializer.JsonSerializer
+
+
+Here the stream will read all the logs sent to the ``logisland_raw`` topic and push the processing output into the ``logisland_events`` topic.
+
+.. note::
+
+    We want to specify an Avro output schema to validate our output records (and force their types accordingly).
+    It also lets other streams rely on a schema when processing records from the topic.
+
+We can define some serializers to marshal all records to and from a topic.
+
+.. code-block:: yaml
+
+    - stream: parsing_stream
+      component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing
+      type: stream
+      documentation: a processor that converts raw apache logs into structured log records
+      configuration:
+        kafka.input.topics: logisland_raw
+        kafka.output.topics: logisland_events
+        kafka.error.topics: logisland_errors
+        kafka.input.topics.serializer: none
+        kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer
+        kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer
+        kafka.metadata.broker.list: sandbox:9092
+        kafka.zookeeper.quorum: sandbox:2181
+        kafka.topic.autoCreate: true
+        kafka.topic.default.partitions: 4
+        kafka.topic.default.replicationFactor: 1
+
+Within this stream, a ``SplitText`` processor takes a log line as a String and computes a ``Record`` as a sequence of fields.
+
+.. code-block:: yaml
+
+    # parse apache logs
+    - processor: apache_parser
+      component: com.hurence.logisland.processor.SplitText
+      type: parser
+      documentation: a parser that produces events from an apache log REGEX
+      configuration:
+        value.regex: (\S+)\s+(\S+)\s+(\S+)\s+\[([\w:\/]+\s[+\-]\d{4})\]\s+"(\S+)\s+(\S+)\s*(\S*)"\s+(\S+)\s+(\S+)
+        value.fields: src_ip,identd,user,record_time,http_method,http_query,http_version,http_status,bytes_out
+
+This stream will process log entries as soon as they are queued into the ``logisland_raw`` Kafka topic; each log will
+be parsed as an event which will be pushed back to Kafka in the ``logisland_events`` topic.
+
+The second processor will handle the ``Records`` produced by the ``SplitText`` and index them into the datastore previously defined (Redis):
+
+.. code-block:: yaml
+
+    # all the parsed records are added to datastore by bulk
+    - processor: datastore_publisher
+      component: com.hurence.logisland.processor.datastore.BulkPut
+      type: processor
+      documentation: "indexes processed events in datastore"
+      configuration:
+        datastore.client.service: datastore_service
+
+
+
+2. Launch the script
+--------------------
+For this tutorial we will parse some apache logs with a ``SplitText`` parser and store the resulting records in Redis.
+Connect a shell to your logisland container to launch the streaming job; the submit command follows the short aside below.
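+Once submitted, the job can be watched through the ``logisland`` container output or the Spark UI exposed on port 4050. A quick, optional way to follow the startup from the host (assuming the container name from the prerequisites stack):
+
+.. code-block:: sh
+
+    # tail the logisland container output while the streaming job starts
+    docker logs -f logisland
+
+To submit the job:
+
+..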
code-block:: sh + + docker exec -i -t logisland bin/logisland.sh --conf conf/store-to-redis.yml + + +3. Inject some Apache logs into the system +------------------------------------------ +Now we're going to send some logs to ``logisland_raw`` Kafka topic. + +We could setup a logstash or flume agent to load some apache logs into a kafka topic +but there's a super useful tool in the Kafka ecosystem : `kafkacat `_, +a *generic command line non-JVM Apache Kafka producer and consumer* which can be easily installed. + + +If you don't have your own httpd logs available, you can use some freely available log files from +`NASA-HTTP `_ web site access: + +- `Jul 01 to Jul 31, ASCII format, 20.7 MB gzip compressed `_ +- `Aug 04 to Aug 31, ASCII format, 21.8 MB gzip compressed `_ + +Let's send the first 500000 lines of NASA http access over July 1995 to LogIsland with kafkacat to ``logisland_raw`` Kafka topic + +.. code-block:: sh + + cd /tmp + wget ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz + gunzip NASA_access_log_Jul95.gz + head -500000 NASA_access_log_Jul95 | kafkacat -b sandbox:9092 -t logisland_raw + + + +4. Inspect the logs +------------------- + +For this part of the tutorial we will use `redis-py a Python client for Redis `_. You can install it by following instructions given on `redis-py `_. + +To install redis-py, simply: + +.. code-block:: sh + + $ sudo pip install redis + + +Getting Started, check if you can connect with Redis + +.. code-block:: python + + >>> import redis + >>> r = redis.StrictRedis(host='localhost', port=6379, db=0) + >>> r.set('foo', 'bar') + >>> r.get('foo') + +Then we want to grab some logs that have been collected to Redis. We first find some keys with a pattern and get the json content of one + +.. code-block:: python + + >>> r.keys('1234*') +['123493eb-93df-4e57-a1c1-4a8e844fa92c', '123457d5-8ccc-4f0f-b4ba-d70967aa48eb', '12345e06-6d72-4ce8-8254-a7cc4bab5e31'] + + >>> r.get('123493eb-93df-4e57-a1c1-4a8e844fa92c') +'{\n "id" : "123493eb-93df-4e57-a1c1-4a8e844fa92c",\n "type" : "apache_log",\n "creationDate" : 804574829000,\n "fields" : {\n "src_ip" : "204.191.209.4",\n "record_id" : "123493eb-93df-4e57-a1c1-4a8e844fa92c",\n "http_method" : "GET",\n "http_query" : "/images/WORLD-logosmall.gif",\n "bytes_out" : "669",\n "identd" : "-",\n "http_version" : "HTTP/1.0",\n "record_raw_value" : "204.191.209.4 - - [01/Jul/1995:01:00:29 -0400] \\"GET /images/WORLD-logosmall.gif HTTP/1.0\\" 200 669",\n "http_status" : "200",\n "record_time" : 804574829000,\n "user" : "-",\n "record_type" : "apache_log"\n }\n}' + + >>> import json + >>> record = json.loads(r.get('123493eb-93df-4e57-a1c1-4a8e844fa92c')) + >>> record['fields']['bytes_out'] + diff --git a/logisland-engines/logisland-spark_1_6-engine/pom.xml b/logisland-engines/logisland-spark_1_6-engine/pom.xml index cb5ebac07..50b24b778 100644 --- a/logisland-engines/logisland-spark_1_6-engine/pom.xml +++ b/logisland-engines/logisland-spark_1_6-engine/pom.xml @@ -23,7 +23,7 @@ http://www.w3.org/2001/XMLSchema-instance "> com.hurence.logisland logisland-engines - 0.12.2 + 0.13.0 logisland-spark_1_6-engine_${scala.binary.version} jar diff --git a/logisland-engines/logisland-spark_1_6-engine/src/main/java/com/hurence/logisland/util/spark/ProcessorMetrics.java b/logisland-engines/logisland-spark_1_6-engine/src/main/java/com/hurence/logisland/util/spark/ProcessorMetrics.java index 36f7c1d62..14b2447b4 100644 --- 
a/logisland-engines/logisland-spark_1_6-engine/src/main/java/com/hurence/logisland/util/spark/ProcessorMetrics.java +++ b/logisland-engines/logisland-spark_1_6-engine/src/main/java/com/hurence/logisland/util/spark/ProcessorMetrics.java @@ -16,14 +16,16 @@ package com.hurence.logisland.util.spark; +import com.hurence.logisland.metrics.Names; import com.hurence.logisland.record.FieldDictionary; import com.hurence.logisland.record.Record; import org.apache.spark.groupon.metrics.UserMetricsSystem; import org.slf4j.Logger; import org.slf4j.LoggerFactory; - -import java.util.*; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; /** @@ -34,24 +36,23 @@ public class ProcessorMetrics { public synchronized static void resetMetrics(final String metricPrefix) { - logger.info("reseting metrics " +metricPrefix ); - UserMetricsSystem.gauge(metricPrefix + "incoming_messages").set(0); - UserMetricsSystem.gauge(metricPrefix + "incoming_records").set(0); - UserMetricsSystem.gauge(metricPrefix + "outgoing_records").set(0); - UserMetricsSystem.gauge(metricPrefix + "errors").set(0); - UserMetricsSystem.gauge(metricPrefix + "bytes_per_field_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "bytes_per_record_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "records_per_second_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "processed_bytes").set(0); - UserMetricsSystem.gauge(metricPrefix + "processed_fields").set(0); - UserMetricsSystem.gauge(metricPrefix + "error_percentage").set(0); - UserMetricsSystem.gauge(metricPrefix + "fields_per_record_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "bytes_per_second_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "processing_time_ms").set(0); + logger.info("reseting metrics " + metricPrefix); + UserMetricsSystem.gauge(metricPrefix + Names.INCOMING_MESSAGES).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.INCOMING_RECORDS).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.OUTGOING_RECORDS).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.ERRORS).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_FIELD_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_RECORD_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.RECORDS_PER_SECOND_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSED_BYTES).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSED_FIELDS).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.ERROR_PERCENTAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.FIELDS_PER_RECORD_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_SECOND_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSING_TIME_MS).set(0); } - /** * publish * @@ -73,12 +74,12 @@ public synchronized static void computeMetrics( if ((outgoingEvents != null) && (outgoingEvents.size() != 0)) { - UserMetricsSystem.gauge(metricPrefix + "incoming_messages").set(untilOffset - fromOffset); - UserMetricsSystem.gauge(metricPrefix + "incoming_records").set(incomingEvents.size()); - UserMetricsSystem.gauge(metricPrefix + "outgoing_records").set(outgoingEvents.size()); + UserMetricsSystem.gauge(metricPrefix + Names.INCOMING_MESSAGES).set(untilOffset - fromOffset); + UserMetricsSystem.gauge(metricPrefix + Names.INCOMING_RECORDS).set(incomingEvents.size()); + UserMetricsSystem.gauge(metricPrefix + Names.OUTGOING_RECORDS).set(outgoingEvents.size()); long errorCount = 
outgoingEvents.stream().filter(r -> r.hasField(FieldDictionary.RECORD_ERRORS)).count(); - UserMetricsSystem.gauge(metricPrefix + "errors").set(errorCount); + UserMetricsSystem.gauge(metricPrefix + Names.ERRORS).set(errorCount); if (outgoingEvents.size() != 0) { final List recordSizesInBytes = new ArrayList<>(); final List recordNumberOfFields = new ArrayList<>(); @@ -92,32 +93,32 @@ public synchronized static void computeMetrics( final int numberOfProcessedFields = recordNumberOfFields.stream().mapToInt(Integer::intValue).sum(); if (numberOfProcessedFields != 0) { - UserMetricsSystem.gauge(metricPrefix + "bytes_per_field_average").set(numberOfProcessedBytes / numberOfProcessedFields); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_FIELD_AVERAGE).set(numberOfProcessedBytes / numberOfProcessedFields); } else { - UserMetricsSystem.gauge(metricPrefix + "bytes_per_field_average").set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_FIELD_AVERAGE).set(0); } if (processingDurationInMillis != 0) { - UserMetricsSystem.gauge(metricPrefix + "bytes_per_second_average").set(numberOfProcessedBytes * 1000 / processingDurationInMillis); - UserMetricsSystem.gauge(metricPrefix + "records_per_second_average").set(outgoingEvents.size() * 1000 / processingDurationInMillis); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_SECOND_AVERAGE).set(numberOfProcessedBytes * 1000 / processingDurationInMillis); + UserMetricsSystem.gauge(metricPrefix + Names.RECORDS_PER_SECOND_AVERAGE).set(outgoingEvents.size() * 1000 / processingDurationInMillis); } else { - UserMetricsSystem.gauge(metricPrefix + "bytes_per_second_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "records_per_second_average").set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_SECOND_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.RECORDS_PER_SECOND_AVERAGE).set(0); } - UserMetricsSystem.gauge(metricPrefix + "processed_bytes").set(numberOfProcessedBytes); - UserMetricsSystem.gauge(metricPrefix + "processed_fields").set(numberOfProcessedFields); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSED_BYTES).set(numberOfProcessedBytes); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSED_FIELDS).set(numberOfProcessedFields); - UserMetricsSystem.gauge(metricPrefix + "error_percentage").set((long) (100.0f * errorCount / outgoingEvents.size())); - UserMetricsSystem.gauge(metricPrefix + "fields_per_record_average").set(numberOfProcessedFields / outgoingEvents.size()); - UserMetricsSystem.gauge(metricPrefix + "bytes_per_record_average").set(numberOfProcessedBytes / outgoingEvents.size()); + UserMetricsSystem.gauge(metricPrefix + Names.ERROR_PERCENTAGE).set((long) (100.0f * errorCount / outgoingEvents.size())); + UserMetricsSystem.gauge(metricPrefix + Names.FIELDS_PER_RECORD_AVERAGE).set(numberOfProcessedFields / outgoingEvents.size()); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_RECORD_AVERAGE).set(numberOfProcessedBytes / outgoingEvents.size()); } else if (errorCount > 0) - UserMetricsSystem.gauge(metricPrefix + "error_percentage").set(100L); + UserMetricsSystem.gauge(metricPrefix + Names.ERROR_PERCENTAGE).set(100L); else - UserMetricsSystem.gauge(metricPrefix + "error_percentage").set(0L); + UserMetricsSystem.gauge(metricPrefix + Names.ERROR_PERCENTAGE).set(0L); - UserMetricsSystem.gauge(metricPrefix + "processing_time_ms").set(processingDurationInMillis); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSING_TIME_MS).set(processingDurationInMillis); } } diff --git 
a/logisland-engines/logisland-spark_2_1-engine/pom.xml b/logisland-engines/logisland-spark_2_1-engine/pom.xml index 192d14b38..ad40007d0 100644 --- a/logisland-engines/logisland-spark_2_1-engine/pom.xml +++ b/logisland-engines/logisland-spark_2_1-engine/pom.xml @@ -23,7 +23,7 @@ http://www.w3.org/2001/XMLSchema-instance "> com.hurence.logisland logisland-engines - 0.12.2 + 0.13.0 logisland-spark_2_1-engine_${scala.binary.version} jar @@ -57,10 +57,7 @@ http://www.w3.org/2001/XMLSchema-instance "> org.slf4j slf4j-api - - ch.qos.logback - logback-classic - + com.groupon.dse spark-metrics @@ -123,6 +120,7 @@ http://www.w3.org/2001/XMLSchema-instance "> junit + org.apache.kafka kafka_${scala.binary.version} diff --git a/logisland-engines/logisland-spark_2_1-engine/src/main/java/com/hurence/logisland/util/spark/ProcessorMetrics.java b/logisland-engines/logisland-spark_2_1-engine/src/main/java/com/hurence/logisland/util/spark/ProcessorMetrics.java index 1f54c4644..febb64205 100644 --- a/logisland-engines/logisland-spark_2_1-engine/src/main/java/com/hurence/logisland/util/spark/ProcessorMetrics.java +++ b/logisland-engines/logisland-spark_2_1-engine/src/main/java/com/hurence/logisland/util/spark/ProcessorMetrics.java @@ -16,6 +16,7 @@ package com.hurence.logisland.util.spark; +import com.hurence.logisland.metrics.Names; import com.hurence.logisland.record.FieldDictionary; import com.hurence.logisland.record.Record; import org.apache.spark.groupon.metrics.UserMetricsSystem; @@ -34,18 +35,18 @@ public class ProcessorMetrics { private static Logger logger = LoggerFactory.getLogger(ProcessorMetrics.class.getName()); public synchronized static void resetMetrics(final String metricPrefix) { - UserMetricsSystem.gauge(metricPrefix + "incoming_messages").set(0); - UserMetricsSystem.gauge(metricPrefix + "incoming_records").set(0); - UserMetricsSystem.gauge(metricPrefix + "outgoing_records").set(0); - UserMetricsSystem.gauge(metricPrefix + "errors").set(0); - UserMetricsSystem.gauge(metricPrefix + "bytes_per_field_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "bytes_per_record_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "records_per_second_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "processed_bytes").set(0); - UserMetricsSystem.gauge(metricPrefix + "processed_fields").set(0); - UserMetricsSystem.gauge(metricPrefix + "error_percentage").set(0); - UserMetricsSystem.gauge(metricPrefix + "fields_per_record_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "bytes_per_second_average").set(0); + UserMetricsSystem.gauge(metricPrefix + Names.INCOMING_MESSAGES).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.INCOMING_RECORDS).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.OUTGOING_RECORDS).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.ERRORS).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_FIELD_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_RECORD_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.RECORDS_PER_SECOND_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSED_BYTES).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSED_FIELDS).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.ERROR_PERCENTAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.FIELDS_PER_RECORD_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_SECOND_AVERAGE).set(0); //UserMetricsSystem.gauge(metricPrefix + "processing_time_ms").set(0); } @@ 
-71,9 +72,9 @@ public synchronized static void computeMetrics( if ((outgoingEvents != null) && (outgoingEvents.size() != 0)) { - UserMetricsSystem.gauge(metricPrefix + "incoming_messages").set(untilOffset - fromOffset); - UserMetricsSystem.gauge(metricPrefix + "incoming_records").set(incomingEvents.size()); - UserMetricsSystem.gauge(metricPrefix + "outgoing_records").set(outgoingEvents.size()); + UserMetricsSystem.gauge(metricPrefix + Names.INCOMING_MESSAGES).set(untilOffset - fromOffset); + UserMetricsSystem.gauge(metricPrefix + Names.INCOMING_RECORDS).set(incomingEvents.size()); + UserMetricsSystem.gauge(metricPrefix + Names.OUTGOING_RECORDS).set(outgoingEvents.size()); long errorCount = outgoingEvents.stream().filter(r -> r.hasField(FieldDictionary.RECORD_ERRORS)).count(); UserMetricsSystem.gauge(metricPrefix + "errors").set(errorCount); @@ -90,29 +91,29 @@ public synchronized static void computeMetrics( final int numberOfProcessedFields = recordNumberOfFields.stream().mapToInt(Integer::intValue).sum(); if (numberOfProcessedFields != 0) { - UserMetricsSystem.gauge(metricPrefix + "bytes_per_field_average").set(numberOfProcessedBytes / numberOfProcessedFields); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_FIELD_AVERAGE).set(numberOfProcessedBytes / numberOfProcessedFields); } else { - UserMetricsSystem.gauge(metricPrefix + "bytes_per_field_average").set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_FIELD_AVERAGE).set(0); } if (processingDurationInMillis != 0) { - UserMetricsSystem.gauge(metricPrefix + "bytes_per_second_average").set(numberOfProcessedBytes * 1000 / processingDurationInMillis); - UserMetricsSystem.gauge(metricPrefix + "records_per_second_average").set(outgoingEvents.size() * 1000 / processingDurationInMillis); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_SECOND_AVERAGE).set(numberOfProcessedBytes * 1000 / processingDurationInMillis); + UserMetricsSystem.gauge(metricPrefix + Names.RECORDS_PER_SECOND_AVERAGE).set(outgoingEvents.size() * 1000 / processingDurationInMillis); } else { - UserMetricsSystem.gauge(metricPrefix + "bytes_per_second_average").set(0); - UserMetricsSystem.gauge(metricPrefix + "records_per_second_average").set(0); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_SECOND_AVERAGE).set(0); + UserMetricsSystem.gauge(metricPrefix + Names.RECORDS_PER_SECOND_AVERAGE).set(0); } - UserMetricsSystem.gauge(metricPrefix + "processed_bytes").set(numberOfProcessedBytes); - UserMetricsSystem.gauge(metricPrefix + "processed_fields").set(numberOfProcessedFields); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSED_BYTES).set(numberOfProcessedBytes); + UserMetricsSystem.gauge(metricPrefix + Names.PROCESSED_FIELDS).set(numberOfProcessedFields); - UserMetricsSystem.gauge(metricPrefix + "error_percentage").set((long) (100.0f * errorCount / outgoingEvents.size())); - UserMetricsSystem.gauge(metricPrefix + "fields_per_record_average").set(numberOfProcessedFields / outgoingEvents.size()); - UserMetricsSystem.gauge(metricPrefix + "bytes_per_record_average").set(numberOfProcessedBytes / outgoingEvents.size()); + UserMetricsSystem.gauge(metricPrefix + Names.ERROR_PERCENTAGE).set((long) (100.0f * errorCount / outgoingEvents.size())); + UserMetricsSystem.gauge(metricPrefix + Names.FIELDS_PER_RECORD_AVERAGE).set(numberOfProcessedFields / outgoingEvents.size()); + UserMetricsSystem.gauge(metricPrefix + Names.BYTES_PER_RECORD_AVERAGE).set(numberOfProcessedBytes / outgoingEvents.size()); } else if (errorCount > 0) - 
UserMetricsSystem.gauge(metricPrefix + "error_percentage").set(100L); + UserMetricsSystem.gauge(metricPrefix + Names.ERROR_PERCENTAGE).set(100L); else - UserMetricsSystem.gauge(metricPrefix + "error_percentage").set(0L); + UserMetricsSystem.gauge(metricPrefix + Names.ERROR_PERCENTAGE).set(0L); // UserMetricsSystem.gauge(metricPrefix + "processing_time_ms").set(processingDurationInMillis); diff --git a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/package.scala b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/package.scala index 5b13b878c..d07879bd9 100644 --- a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/package.scala +++ b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/package.scala @@ -270,97 +270,97 @@ object StreamProperties { .build - ////////////////////////////////////// - // MQTT options - ////////////////////////////////////// - - val MQTT_BROKER_URL: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.broker.url") - .description("brokerUrl A url MqttClient connects to. Set this or path as the url of the Mqtt Server. e.g. tcp://localhost:1883") - .addValidator(StandardValidators.URL_VALIDATOR) - .defaultValue("tcp://localhost:1883") - .required(false) - .build - - val MQTT_PERSISTENCE: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.persistence") - .description("persistence By default it is used for storing incoming messages on disk. " + - "If memory is provided as value for this option, then recovery on restart is not supported.") - .defaultValue("memory") - .required(false) - .build - - val MQTT_TOPIC: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.topic") - .description("Topic MqttClient subscribes to.") - .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) - .required(true) - .build - - val MQTT_CLIENTID: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.client.id") - .description("clientID this client is associated. Provide the same value to recover a stopped client.") - .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) - .required(true) - .build - - val MQTT_QOS: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.qos") - .description(" QoS The maximum quality of service to subscribe each topic at.Messages published at a lower " + - "quality of service will be received at the published QoS.Messages published at a higher quality of " + - "service will be received using the QoS specified on the subscribe") - .addValidator(StandardValidators.INTEGER_VALIDATOR) - .defaultValue("0") - .required(false) - .build - - val MQTT_USERNAME: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.username") - .description(" username Sets the user name to use for the connection to Mqtt Server. " + - "Do not set it, if server does not need this. Setting it empty will lead to errors.") - .required(false) - .build - - val MQTT_PASSWORD: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.password") - .description("password Sets the password to use for the connection") - .required(false) - .build - - val MQTT_CLEAN_SESSION: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.clean.session") - .description("cleanSession Setting it true starts a clean session, removes all checkpointed messages by " + - "a previous run of this source. 
This is set to false by default.") - .addValidator(StandardValidators.BOOLEAN_VALIDATOR) - .defaultValue("true") - .required(false) - .build - - val MQTT_CONNECTION_TIMEOUT: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.connection.timeout") - .description("connectionTimeout Sets the connection timeout, a value of 0 is interpreted as " + - "wait until client connects. See MqttConnectOptions.setConnectionTimeout for more information") - .addValidator(StandardValidators.INTEGER_VALIDATOR) - .defaultValue("5000") - .required(false) - .build - - val MQTT_KEEP_ALIVE: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.keep.alive") - .description("keepAlive Same as MqttConnectOptions.setKeepAliveInterval.") - .addValidator(StandardValidators.INTEGER_VALIDATOR) - .defaultValue("5000") - .required(false) - .build - - - val MQTT_VERSION: PropertyDescriptor = new PropertyDescriptor.Builder() - .name("mqtt.version") - .description("mqttVersion Same as MqttConnectOptions.setMqttVersion") - .addValidator(StandardValidators.INTEGER_VALIDATOR) - .defaultValue("5000") - .required(false) - .build + ////////////////////////////////////// + // MQTT options + ////////////////////////////////////// + + val MQTT_BROKER_URL: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.broker.url") + .description("brokerUrl A url MqttClient connects to. Set this or path as the url of the Mqtt Server. e.g. tcp://localhost:1883") + .addValidator(StandardValidators.URL_VALIDATOR) + .defaultValue("tcp://localhost:1883") + .required(false) + .build + + val MQTT_PERSISTENCE: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.persistence") + .description("persistence By default it is used for storing incoming messages on disk. " + + "If memory is provided as value for this option, then recovery on restart is not supported.") + .defaultValue("memory") + .required(false) + .build + + val MQTT_TOPIC: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.topic") + .description("Topic MqttClient subscribes to.") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .required(true) + .build + + val MQTT_CLIENTID: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.client.id") + .description("clientID this client is associated. Provide the same value to recover a stopped client.") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .required(true) + .build + + val MQTT_QOS: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.qos") + .description(" QoS The maximum quality of service to subscribe each topic at.Messages published at a lower " + + "quality of service will be received at the published QoS.Messages published at a higher quality of " + + "service will be received using the QoS specified on the subscribe") + .addValidator(StandardValidators.INTEGER_VALIDATOR) + .defaultValue("0") + .required(false) + .build + + val MQTT_USERNAME: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.username") + .description(" username Sets the user name to use for the connection to Mqtt Server. " + + "Do not set it, if server does not need this. 
Setting it empty will lead to errors.") + .required(false) + .build + + val MQTT_PASSWORD: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.password") + .description("password Sets the password to use for the connection") + .required(false) + .build + + val MQTT_CLEAN_SESSION: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.clean.session") + .description("cleanSession Setting it true starts a clean session, removes all checkpointed messages by " + + "a previous run of this source. This is set to false by default.") + .addValidator(StandardValidators.BOOLEAN_VALIDATOR) + .defaultValue("true") + .required(false) + .build + + val MQTT_CONNECTION_TIMEOUT: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.connection.timeout") + .description("connectionTimeout Sets the connection timeout, a value of 0 is interpreted as " + + "wait until client connects. See MqttConnectOptions.setConnectionTimeout for more information") + .addValidator(StandardValidators.INTEGER_VALIDATOR) + .defaultValue("5000") + .required(false) + .build + + val MQTT_KEEP_ALIVE: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.keep.alive") + .description("keepAlive Same as MqttConnectOptions.setKeepAliveInterval.") + .addValidator(StandardValidators.INTEGER_VALIDATOR) + .defaultValue("5000") + .required(false) + .build + + + val MQTT_VERSION: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("mqtt.version") + .description("mqttVersion Same as MqttConnectOptions.setMqttVersion") + .addValidator(StandardValidators.INTEGER_VALIDATOR) + .defaultValue("5000") + .required(false) + .build val READ_TOPICS: PropertyDescriptor = new PropertyDescriptor.Builder() .name("read.topics") @@ -374,7 +374,16 @@ object StreamProperties { .description("the serializer to use") .required(true) .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) - .allowableValues(KRYO_SERIALIZER, JSON_SERIALIZER, AVRO_SERIALIZER, BYTESARRAY_SERIALIZER, NO_SERIALIZER,KURA_PROTOCOL_BUFFER_SERIALIZER) + .allowableValues(KRYO_SERIALIZER, JSON_SERIALIZER, AVRO_SERIALIZER, BYTESARRAY_SERIALIZER, NO_SERIALIZER, KURA_PROTOCOL_BUFFER_SERIALIZER) + .defaultValue(NO_SERIALIZER.getValue) + .build + + val READ_TOPICS_KEY_SERIALIZER: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("read.topics.key.serializer") + .description("The key serializer to use") + .required(true) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .allowableValues(KRYO_SERIALIZER, JSON_SERIALIZER, NO_SERIALIZER) .defaultValue(NO_SERIALIZER.getValue) .build @@ -402,6 +411,15 @@ object StreamProperties { .defaultValue(NO_SERIALIZER.getValue) .build + val WRITE_TOPICS_KEY_SERIALIZER: PropertyDescriptor = new PropertyDescriptor.Builder() + .name("write.topics.key.serializer") + .description("The key serializer to use") + .required(true) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .allowableValues(KRYO_SERIALIZER, JSON_SERIALIZER, AVRO_SERIALIZER, BYTESARRAY_SERIALIZER, NO_SERIALIZER, KURA_PROTOCOL_BUFFER_SERIALIZER) + .defaultValue(NO_SERIALIZER.getValue) + .build + val WRITE_TOPICS_CLIENT_SERVICE: PropertyDescriptor = new PropertyDescriptor.Builder() .name("write.topics.client.service") .description("the controller service that gives connection information") @@ -410,11 +428,6 @@ object StreamProperties { .build - - - - - ////////////////////////////////////// // HDFS options ////////////////////////////////////// @@ -489,4 +502,7 @@ object StreamProperties { 
.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) .defaultValue("aggregation") .build + + + } diff --git a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/StructuredStream.scala b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/StructuredStream.scala index a35034980..b968a4499 100644 --- a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/StructuredStream.scala +++ b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/StructuredStream.scala @@ -29,6 +29,7 @@ import com.hurence.logisland.stream.spark.structured.provider.StructuredStreamPr import com.hurence.logisland.stream.{AbstractRecordStream, StreamContext} import com.hurence.logisland.util.spark._ import org.apache.spark.broadcast.Broadcast +import org.apache.spark.groupon.metrics.UserMetricsSystem import org.apache.spark.sql.SparkSession import org.apache.spark.streaming.StreamingContext @@ -58,9 +59,11 @@ class StructuredStream extends AbstractRecordStream with SparkRecordStream { descriptors.add(READ_TOPICS) descriptors.add(READ_TOPICS_CLIENT_SERVICE) descriptors.add(READ_TOPICS_SERIALIZER) + descriptors.add(READ_TOPICS_KEY_SERIALIZER) descriptors.add(WRITE_TOPICS) descriptors.add(WRITE_TOPICS_CLIENT_SERVICE) descriptors.add(WRITE_TOPICS_SERIALIZER) + descriptors.add(WRITE_TOPICS_KEY_SERIALIZER) descriptors.add(LOGISLAND_AGENT_HOST) descriptors.add(LOGISLAND_AGENT_PULL_THROTTLING) @@ -86,6 +89,10 @@ class StructuredStream extends AbstractRecordStream with SparkRecordStream { val agentQuorum = streamContext.getPropertyValue(LOGISLAND_AGENT_HOST).asString val throttling = streamContext.getPropertyValue(LOGISLAND_AGENT_PULL_THROTTLING).asInteger() + val pipelineMetricPrefix = streamContext.getIdentifier /*+ ".partition" + partitionId*/ + "." 
+ val pipelineTimerContext = UserMetricsSystem.timer(pipelineMetricPrefix + "Pipeline.processing_time_ms").time() + + restApiSink = ssc.sparkContext.broadcast(RestJobsApiClientSink(agentQuorum)) controllerServiceLookupSink = ssc.sparkContext.broadcast( ControllerServiceLookupSink(engineContext.getControllerServiceConfigurations) @@ -108,9 +115,9 @@ class StructuredStream extends AbstractRecordStream with SparkRecordStream { val readDF = readStreamService.load(spark, controllerServiceLookupSink, streamContext) - // store current configuration version currentJobVersion = restApiSink.value.getJobApiClient.getJobVersion(appName) + updateConfigFromAgent(agentQuorum, throttling) // apply windowing /*val windowedDF:Dataset[Record] = if (streamContext.getPropertyValue(WINDOW_DURATION).isSet) { @@ -136,10 +143,10 @@ class StructuredStream extends AbstractRecordStream with SparkRecordStream { // Write key-value data from a DataFrame to a specific Kafka topic specified in an option val ds = writeStreamService.save(readDF, streamContext) + pipelineTimerContext.stop() } catch { case ex: Throwable => - ex.printStackTrace() logger.error("something bad happened, please check Kafka or Zookeeper health : {}", ex) } } diff --git a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/ConsoleStructuredStreamProviderService.scala b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/ConsoleStructuredStreamProviderService.scala index 3f3fd5655..31265036e 100644 --- a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/ConsoleStructuredStreamProviderService.scala +++ b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/ConsoleStructuredStreamProviderService.scala @@ -84,8 +84,7 @@ class ConsoleStructuredStreamProviderService extends AbstractControllerService w val df2 = df - .map(r => (r.getField(FieldDictionary.RECORD_NAME).asString(), r.getField(FieldDictionary.RECORD_VALUE).asString())) - .toDF("metric", "value") + .writeStream .format("console") .start() diff --git a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/KafkaStructuredStreamProviderService.scala b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/KafkaStructuredStreamProviderService.scala index 700104aa9..07d726bca 100644 --- a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/KafkaStructuredStreamProviderService.scala +++ b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/KafkaStructuredStreamProviderService.scala @@ -1,3 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + package com.hurence.logisland.stream.spark.structured.provider import java.util @@ -10,50 +27,28 @@ import com.hurence.logisland.record.{FieldDictionary, FieldType, Record, Standar import com.hurence.logisland.stream.StreamContext import com.hurence.logisland.stream.StreamProperties._ import com.hurence.logisland.util.kafka.KafkaSink -import com.hurence.logisland.util.spark.{ControllerServiceLookupSink, RestJobsApiClientSink} import kafka.admin.AdminUtils import kafka.utils.ZkUtils import org.apache.kafka.clients.consumer.ConsumerConfig import org.apache.kafka.clients.producer.ProducerConfig import org.apache.kafka.common.security.JaasUtils import org.apache.kafka.common.serialization.{ByteArrayDeserializer, ByteArraySerializer} -import org.apache.spark.broadcast.Broadcast -import org.apache.spark.sql.{DataFrame, Dataset, SparkSession} -import org.slf4j.LoggerFactory +import org.apache.spark.sql.{Dataset, ForeachWriter, SparkSession} + -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ class KafkaStructuredStreamProviderService() extends AbstractControllerService with StructuredStreamProviderService { - // private val logger = LoggerFactory.getLogger(this.getClass) + // private val logger = LoggerFactory.getLogger(this.getClass) var appName = "" - var kafkaSink: Broadcast[KafkaSink] = _ - var restApiSink: Broadcast[RestJobsApiClientSink] = _ - var controllerServiceLookupSink: Broadcast[ControllerServiceLookupSink] = _ - - - + var kafkaSinkParams: Map[String, Object] = _ + var kafkaParams: Map[String, Object] = _ // Define the Kafka parameters, broker list must be specified var inputTopics = Set[String]() - var outputTopics = Set[String]() - var errorTopics = Set[String]() - var metricsTopics = Set[String]() + var outputTopics = Set[String]() + var errorTopics = Set[String]() + var metricsTopics = Set[String]() var topicAutocreate = true var topicDefaultPartitions = 3 var topicDefaultReplicationFactor = 1 @@ -85,11 +80,11 @@ class KafkaStructuredStreamProviderService() extends AbstractControllerService w kafkaBatchSize = context.getPropertyValue(KAFKA_BATCH_SIZE).asString kafkaLingerMs = context.getPropertyValue(KAFKA_LINGER_MS).asString - kafkaAcks = context.getPropertyValue(KAFKA_ACKS).asString - kafkaOffset = context.getPropertyValue(KAFKA_MANUAL_OFFSET_RESET).asString + kafkaAcks = context.getPropertyValue(KAFKA_ACKS).asString + kafkaOffset = context.getPropertyValue(KAFKA_MANUAL_OFFSET_RESET).asString - val kafkaSinkParams = Map( + kafkaSinkParams = Map( ProducerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokerList, ProducerConfig.CLIENT_ID_CONFIG -> appName, ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG -> classOf[ByteArraySerializer].getCanonicalName, @@ -101,8 +96,6 @@ class KafkaStructuredStreamProviderService() extends AbstractControllerService w ProducerConfig.RETRY_BACKOFF_MS_CONFIG -> "1000", ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG -> "1000") - // kafkaSink = spark.sparkContext.broadcast(KafkaSink(kafkaSinkParams)) - // TODO deprecate topic creation here (must be done through the agent) if (topicAutocreate) { @@ -114,7 +107,7 @@ class KafkaStructuredStreamProviderService() extends AbstractControllerService w } - val kafkaParams = Map[String, Object]( + kafkaParams = Map[String, Object]( ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokerList, ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[ByteArrayDeserializer], ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[ByteArrayDeserializer], @@ -134,6 +127,7 @@ class KafkaStructuredStreamProviderService() extends AbstractControllerService w } } } + /** * create a streaming DataFrame that represents data received * @@ -157,12 +151,10 @@ class KafkaStructuredStreamProviderService() extends AbstractControllerService w .load() .as[(String, String)] .map(r => { - new StandardRecord("kura_metric") + new StandardRecord(inputTopics.head) .setField(FieldDictionary.RECORD_KEY, FieldType.BYTES, r._1) .setField(FieldDictionary.RECORD_VALUE, FieldType.BYTES, r._2) - } ) - /* df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)") - .as[(String, String)]*/ + }) df } @@ -229,8 +221,31 @@ class KafkaStructuredStreamProviderService() extends AbstractControllerService w * @return DataFrame currently loaded */ override def write(df: Dataset[Record], streamContext: StreamContext) = { + //implicit val myObjEncoder = org.apache.spark.sql.Encoders.tuple(Encoders.BINARY, Encoders.BINARY) + val writer = new ForeachWriter[Record] { + var sender: KafkaSink = _ + + override def open(partitionId: Long, version: Long) = { + 
sender = KafkaSink(kafkaSinkParams) + true + } + override def process(value: Record) = { + sender.send(outputTopics.mkString(","), + value.getField(FieldDictionary.RECORD_KEY).getRawValue().asInstanceOf[Array[Byte]], + value.getField(FieldDictionary.RECORD_VALUE).getRawValue().asInstanceOf[Array[Byte]]) + } + + override def close(errorOrNull: Throwable) = { + if (errorOrNull != null) { + logger.error("Error occurred", errorOrNull) + } + sender.producer.close(); + } + } - df + df.writeStream + .foreach(writer) + .start() } } diff --git a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/StructuredStreamProviderService.scala b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/StructuredStreamProviderService.scala index b40c75d48..e5c9a6c6e 100644 --- a/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/StructuredStreamProviderService.scala +++ b/logisland-engines/logisland-spark_2_1-engine/src/main/scala/com/hurence/logisland/stream/spark/structured/provider/StructuredStreamProviderService.scala @@ -4,11 +4,11 @@ import java.io.{ByteArrayInputStream, ByteArrayOutputStream} import java.util import com.hurence.logisland.controller.ControllerService -import com.hurence.logisland.record.{FieldDictionary, Record} +import com.hurence.logisland.record._ import com.hurence.logisland.serializer.{RecordSerializer, SerializerProvider} import com.hurence.logisland.stream.StreamContext import com.hurence.logisland.stream.StreamProperties._ -import com.hurence.logisland.util.spark.ControllerServiceLookupSink +import com.hurence.logisland.util.spark.{ControllerServiceLookupSink, ProcessorMetrics} import org.apache.spark.broadcast.Broadcast import org.apache.spark.groupon.metrics.UserMetricsSystem import org.apache.spark.sql.{Dataset, SparkSession} @@ -79,15 +79,20 @@ trait StructuredStreamProviderService extends ControllerService { streamContext.getPropertyValue(READ_TOPICS_SERIALIZER).asString, streamContext.getPropertyValue(AVRO_INPUT_SCHEMA).asString) + val keySerializer = SerializerProvider.getSerializer( + streamContext.getPropertyValue(READ_TOPICS_KEY_SERIALIZER).asString, + null) + val pipelineMetricPrefix = streamContext.getIdentifier /*+ ".partition" + partitionId*/ + "." 
- val pipelineTimerContext = UserMetricsSystem.timer(pipelineMetricPrefix + "Pipeline.processing_time_ms").time() // convert to logisland records - val incomingEvents = iterator.toList + val inEvents = iterator.toList + + val incomingEvents = inEvents .flatMap(r => { - var processingRecords: util.Collection[Record] = deserializeRecords(serializer, r).toList + var processingRecords: util.Collection[Record] = deserializeRecords(serializer, keySerializer, r).toList // loop over processor chain streamContext.getProcessContexts.foreach(processorContext => { @@ -111,13 +116,13 @@ trait StructuredStreamProviderService extends ControllerService { processingRecords = processor.process(processorContext, processingRecords) // compute metrics - /* ProcessorMetrics.computeMetrics( - pipelineMetricPrefix + processorContext.getName + ".", - incomingEvents.toList, - processingRecords, - 0, - 0, - System.currentTimeMillis() - startTime)*/ + ProcessorMetrics.computeMetrics( + pipelineMetricPrefix + processorContext.getName + ".", + inEvents, + processingRecords, + 0, + inEvents.size, + System.currentTimeMillis() - startTime) processorTimerContext.stop() @@ -145,62 +150,89 @@ trait StructuredStreamProviderService extends ControllerService { // make sure controller service lookup won't be serialized !! streamContext.addControllerServiceLookup(null) - /* val spark = SparkSession.builder().getOrCreate() - import spark.implicits._ - implicit val myObjEncoder = org.apache.spark.sql.Encoders.kryo[Record] + // create the value serializer + val serializer = SerializerProvider.getSerializer( + streamContext.getPropertyValue(WRITE_TOPICS_SERIALIZER).asString, + streamContext.getPropertyValue(AVRO_OUTPUT_SCHEMA).asString) + // create the key serializer + val keySerializer = SerializerProvider.getSerializer( + streamContext.getPropertyValue(WRITE_TOPICS_KEY_SERIALIZER).asString, null) + // do the parallel processing + implicit val myObjEncoder = org.apache.spark.sql.Encoders.kryo[Record] + val df2 = df.
+ mapPartitions(record => + record.map(record => serializeRecords(serializer, keySerializer, record))) + write(df2, streamContext) - // do the parallel processing - df.mapPartitions(partition => { + } - // create serializer - val serializer = SerializerProvider.getSerializer( - streamContext.getPropertyValue(WRITE_TOPICS_SERIALIZER).asString, - streamContext.getPropertyValue(AVRO_OUTPUT_SCHEMA).asString) - partition.toList - .flatMap(r => serializeRecords(serializer, r)) - .iterator - })*/ + protected def serializeRecords(valueSerializer: RecordSerializer, keySerializer: RecordSerializer, record: Record) = { - write(df, streamContext) + try { + val ret = new StandardRecord() + .setField(FieldDictionary.RECORD_VALUE, FieldType.BYTES, doSerialize(valueSerializer, record)) + val fieldKey = record.getField(FieldDictionary.RECORD_KEY); + if (fieldKey != null) { + ret.setField(FieldDictionary.RECORD_KEY, FieldType.BYTES, doSerialize(keySerializer, new StandardRecord().setField(fieldKey))) + } else { + ret.setField(FieldDictionary.RECORD_KEY, FieldType.NULL, null) - } + } + ret + } catch { + case t: Throwable => + logger.error(s"exception while serializing events ${t.getMessage}") + null + } - } + } - protected def serializeRecords(serializer: RecordSerializer, record: Record): Array[Byte] = { + - // messages are serialized with kryo first + private def doSerialize(serializer: RecordSerializer, record: Record): Array[Byte] = { val baos: ByteArrayOutputStream = new ByteArrayOutputStream serializer.serialize(baos, record) - - // and then converted to KeyedMessage - val key = if (record.hasField(FieldDictionary.RECORD_ID)) - record.getField(FieldDictionary.RECORD_ID).asString() - else - "" - val bytes = baos.toByteArray baos.close() - bytes + + } - // TODO handle key also - protected def deserializeRecords(serializer: RecordSerializer, r: Record) = { + private def doDeserialize(serializer: RecordSerializer, field: Field): Record = { + val f = field.getRawValue + val s = if (f.isInstanceOf[String]) f.asInstanceOf[String].getBytes else f; + val bais = new ByteArrayInputStream(s.asInstanceOf[Array[Byte]]) try { - val bais = new ByteArrayInputStream(r.getField(FieldDictionary.RECORD_VALUE).getRawValue.asInstanceOf[Array[Byte]]) - val deserialized = serializer.deserialize(bais) + serializer.deserialize(bais) + } finally { bais.close() + } + } + protected def deserializeRecords(serializer: RecordSerializer, keySerializer: RecordSerializer, r: Record) = { + try { + val deserialized = doDeserialize(serializer, r.getField(FieldDictionary.RECORD_VALUE)) // copy root record field - if(r.hasField(FieldDictionary.RECORD_NAME)) + if (r.hasField(FieldDictionary.RECORD_NAME)) deserialized.setField(r.getField(FieldDictionary.RECORD_NAME)) + if (r.hasField(FieldDictionary.RECORD_KEY) && r.getField(FieldDictionary.RECORD_KEY).getRawValue != null) { + val deserializedKey = doDeserialize(keySerializer, r.getField(FieldDictionary.RECORD_KEY)).asInstanceOf[Record] + if (deserializedKey.hasField(FieldDictionary.RECORD_VALUE) && deserializedKey.getField(FieldDictionary.RECORD_VALUE).getRawValue != null) { + val f = deserializedKey.getField(FieldDictionary.RECORD_VALUE) + deserialized.setField(FieldDictionary.RECORD_KEY, f.getType, f.getRawValue) + } else { + logger.warn(s"Unable to deserialize key for record $r with serializer $keySerializer") + } + } + Some(deserialized) + } catch { case t: Throwable => logger.error(s"exception while deserializing events ${t.getMessage}") diff --git
a/logisland-engines/logisland-spark_2_1-engine/src/test/resources/conf/structured-stream.yml b/logisland-engines/logisland-spark_2_1-engine/src/test/resources/conf/structured-stream.yml index f4bed8147..cb502d7e0 100644 --- a/logisland-engines/logisland-spark_2_1-engine/src/test/resources/conf/structured-stream.yml +++ b/logisland-engines/logisland-spark_2_1-engine/src/test/resources/conf/structured-stream.yml @@ -1,4 +1,4 @@ -version: 0.12.2 +version: 0.13.0 documentation: LogIsland future factory job engine: diff --git a/logisland-engines/pom.xml b/logisland-engines/pom.xml index 92f30f251..1ed791fb0 100644 --- a/logisland-engines/pom.xml +++ b/logisland-engines/pom.xml @@ -6,7 +6,7 @@ com.hurence.logisland logisland - 0.12.2 + 0.13.0 pom diff --git a/logisland-framework/logisland-agent/README.md b/logisland-framework/logisland-agent/README.md index 9d6f4ade7..d56677ded 100644 --- a/logisland-framework/logisland-agent/README.md +++ b/logisland-framework/logisland-agent/README.md @@ -32,5 +32,5 @@ swagger-ui will be directed to that host and not localhost! ``` -swagger-codegen generate --group-id com.hurence.logisland --artifact-id logisland-agent --artifact-version 0.12.2 --api-package com.hurence.logisland.agent.rest.api --model-package com.hurence.logisland.agent.rest.model -o logisland-framework/logisland-agent -l jaxrs --template-dir logisland-framework/logisland-agent/src/main/raml/templates -i logisland-framework/logisland-agent/src/main/raml/api-swagger.yaml +swagger-codegen generate --group-id com.hurence.logisland --artifact-id logisland-agent --artifact-version 0.13.0 --api-package com.hurence.logisland.agent.rest.api --model-package com.hurence.logisland.agent.rest.model -o logisland-framework/logisland-agent -l jaxrs --template-dir logisland-framework/logisland-agent/src/main/raml/templates -i logisland-framework/logisland-agent/src/main/raml/api-swagger.yaml ``` diff --git a/logisland-framework/logisland-agent/pom.xml b/logisland-framework/logisland-agent/pom.xml index b6250965d..ab064734f 100644 --- a/logisland-framework/logisland-agent/pom.xml +++ b/logisland-framework/logisland-agent/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-framework - 0.12.2 + 0.13.0 logisland-agent jar diff --git a/logisland-framework/logisland-agent/src/main/resources/components.json b/logisland-framework/logisland-agent/src/main/resources/components.json index cce92436a..2e45c727e 100644 --- a/logisland-framework/logisland-agent/src/main/resources/components.json +++ b/logisland-framework/logisland-agent/src/main/resources/components.json @@ -10,6 +10,7 @@ {"name":"EnrichRecords","description":"Enrich input records with content indexed in datastore using multiget queries.\nEach incoming record must be possibly enriched with information stored in datastore. \nThe plugin properties are :\n- es.index (String) : Name of the datastore index on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.\n- record.key (String) : Name of the field in the input record containing the id to lookup document in elastic search. This field is mandatory.\n- es.key (String) : Name of the datastore key on which the multiget query will be performed. This field is mandatory.\n- includes (ArrayList) : List of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.\n- excludes (ArrayList) : List of patterns to filter out (exclude) fields to retrieve. 
Supports wildcards. This field is not mandatory.\n\nEach outcoming record holds at least the input record plus potentially one or more fields coming from of one datastore document.","component":"com.hurence.logisland.processor.datastore.EnrichRecords","type":"processor","tags":["datastore","enricher"],"properties":[{"name":"datastore.client.service","isRequired":true,"description":"The instance of the Controller Service to use for accessing datastore.","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"record.key","isRequired":false,"description":"The name of field in the input record containing the document id to use in ES multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"includes.field","isRequired":false,"description":"The name of the ES fields to include in the record.","defaultValue":"*","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"excludes.field","isRequired":false,"description":"The name of the ES fields to exclude.","defaultValue":"N/A","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"type.name","isRequired":false,"description":"The typle of record to look for","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"collection.name","isRequired":false,"description":"The name of the collection to look for","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true}]}, {"name":"EnrichRecordsElasticsearch","description":"Enrich input records with content indexed in elasticsearch using multiget queries.\nEach incoming record must be possibly enriched with information stored in elasticsearch. \nThe plugin properties are :\n- es.index (String) : Name of the elasticsearch index on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.\n- record.key (String) : Name of the field in the input record containing the id to lookup document in elastic search. This field is mandatory.\n- es.key (String) : Name of the elasticsearch key on which the multiget query will be performed. This field is mandatory.\n- includes (ArrayList) : List of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.\n- excludes (ArrayList) : List of patterns to filter out (exclude) fields to retrieve. Supports wildcards. This field is not mandatory.\n\nEach outcoming record holds at least the input record plus potentially one or more fields coming from of one elasticsearch document.","component":"com.hurence.logisland.processor.elasticsearch.EnrichRecordsElasticsearch","type":"processor","tags":["elasticsearch"],"properties":[{"name":"elasticsearch.client.service","isRequired":true,"description":"The instance of the Controller Service to use for accessing Elasticsearch.","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"record.key","isRequired":true,"description":"The name of field in the input record containing the document id to use in ES multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"es.index","isRequired":true,"description":"The name of the ES index to use in multiget query. 
","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"es.type","isRequired":false,"description":"The name of the ES type to use in multiget query.","defaultValue":"default","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"es.includes.field","isRequired":false,"description":"The name of the ES fields to include in the record.","defaultValue":"*","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"es.excludes.field","isRequired":false,"description":"The name of the ES fields to exclude.","defaultValue":"N/A","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, {"name":"EvaluateJsonPath","description":"Evaluates one or more JsonPath expressions against the content of a FlowFile. The results of those expressions are assigned to Records Fields depending on configuration of the Processor. JsonPaths are entered by adding user-defined properties; the name of the property maps to the Field Name into which the result will be placed. The value of the property must be a valid JsonPath expression. A Return Type of 'auto-detect' will make a determination based off the configured destination. If the JsonPath evaluates to a JSON array or JSON object and the Return Type is set to 'scalar' the Record will be routed to error. A Return Type of JSON can return scalar values if the provided JsonPath evaluates to the specified value. If the expression matches nothing, Fields will be created with empty strings as the value ","component":"com.hurence.logisland.processor.EvaluateJsonPath","type":"processor","tags":["JSON","evaluate","JsonPath"],"dynamicProperties":[{"name":"A Record field","value":"A JsonPath expression","description":"will be set to any JSON objects that match the JsonPath. ","isExpressionLanguageSupported":false}]}, +{"name":"ExcelExtract","description":"Consumes a Microsoft Excel document and converts each worksheet's line to a structured record. The processor is assuming to receive raw excel file as input record.","component":"com.hurence.logisland.processor.excel.ExcelExtract","type":"processor","tags":["excel","processor","poi"],"properties":[{"name":"sheets","isRequired":false,"description":"Comma separated list of Excel document sheet names that should be extracted from the excel document. If this property is left blank then all of the sheets will be extracted from the Excel document. You can specify regular expressions. Any sheets not specified in this value will be ignored.","defaultValue":"","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"skip.columns","isRequired":false,"description":"Comma delimited list of column numbers to skip. Use the columns number and not the letter designation. Use this to skip over columns anywhere in your worksheet that you don't want extracted as part of the record.","defaultValue":"","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"field.names","isRequired":false,"description":"The comma separated list representing the names of columns of extracted cells. Order matters! 
You should use either field.names either field.row.header but not both together.","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"skip.rows","isRequired":false,"description":"The row number of the first row to start processing.Use this to skip over rows of data at the top of your worksheet that are not part of the dataset.Empty rows of data anywhere in the spreadsheet will always be skipped, no matter what this value is set to.","defaultValue":"0","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"record.type","isRequired":false,"description":"Default type of record","defaultValue":"excel_record","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"field.row.header","isRequired":false,"description":"If set, field names mapping will be extracted from the specified row number. You should use either field.names either field.row.header but not both together.","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, {"name":"FetchHBaseRow","description":"Fetches a row from an HBase table. The Destination property controls whether the cells are added as flow file attributes, or the row is written to the flow file content as JSON. This processor may be used to fetch a fixed row on a interval by specifying the table and row id directly in the processor, or it may be used to dynamically fetch rows by referencing the table and row id from incoming flow files.","component":"com.hurence.logisland.processor.hbase.FetchHBaseRow","type":"processor","tags":["hbase","scan","fetch","get","enrich"],"properties":[{"name":"hbase.client.service","isRequired":true,"description":"The instance of the Controller Service to use for accessing HBase.","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"table.name.field","isRequired":true,"description":"The field containing the name of the HBase Table to fetch from.","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"row.identifier.field","isRequired":true,"description":"The field containing the identifier of the row to fetch.","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"columns.field","isRequired":false,"description":"The field containing an optional comma-separated list of \":\" pairs to fetch. 
To return all columns for a given family, leave off the qualifier such as \",\".","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":true},{"name":"record.serializer","isRequired":false,"description":"the serializer needed to i/o the record in the HBase row","kryo serialization":"serialize events as json blocs","json serialization":"serialize events as json blocs","avro serialization":"serialize events as avro blocs","no serialization":"send events as bytes","defaultValue":"com.hurence.logisland.serializer.KryoSerializer","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"record.schema","isRequired":false,"description":"the avro schema definition for the Avro serialization","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"table.name.default","isRequired":false,"description":"The table table to use if table name field is not set","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, {"name":"FilterRecords","description":"Keep only records based on a given field value","component":"com.hurence.logisland.processor.FilterRecords","type":"processor","tags":["record","fields","remove","delete"],"properties":[{"name":"field.name","isRequired":true,"description":"the field name","defaultValue":"record_id","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"field.value","isRequired":true,"description":"the field value to keep","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, {"name":"FlatMap","description":"Converts each field records into a single flatten record\n...","component":"com.hurence.logisland.processor.FlatMap","type":"processor","tags":["record","fields","flatmap","flatten"],"properties":[{"name":"keep.root.record","isRequired":false,"description":"do we add the original record in","defaultValue":"true","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"copy.root.record.fields","isRequired":false,"description":"do we copy the original record fields into the flattened records","defaultValue":"true","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"leaf.record.type","isRequired":false,"description":"the new type for the flattened records if present","defaultValue":"","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"concat.fields","isRequired":false,"description":"comma separated list of fields to apply concatenation ex : $rootField/$leaffield","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"concat.separator","isRequired":false,"description":"returns $rootField/$leaf/field","defaultValue":"/","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"include.position","isRequired":false,"description":"do we add the original record position in","defaultValue":"true","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, @@ -21,7 +22,7 @@ {"name":"ModifyId","description":"modify id of records or generate it following defined rules","component":"com.hurence.logisland.processor.ModifyId","type":"processor","tags":["record","id","idempotent","generate","modify"],"properties":[{"name":"id.generation.strategy","isRequired":true,"description":"the strategy to generate new Id","generate a random uid":"generate a randomUid using 
java library","generate a hash from fields":"generate a hash from fields","generate a string from java pattern and fields":"generate a string from java pattern and fields","generate a concatenation of type, time and a hash from fields":"generate a concatenation of type, time and a hash from fields (as for generate_hash strategy)","defaultValue":"randomUuid","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"fields.to.hash","isRequired":true,"description":"the comma separated list of field names (e.g. : 'policyid,date_raw'","defaultValue":"record_raw_value","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"hash.charset","isRequired":true,"description":"the charset to use to hash id string (e.g. 'UTF-8')","defaultValue":"UTF-8","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"hash.algorithm","isRequired":true,"description":"the algorithme to use to hash id string (e.g. 'SHA-256'","SHA-384":null,"SHA-224":null,"SHA-256":null,"MD2":null,"SHA":null,"SHA-512":null,"MD5":null,"defaultValue":"SHA-256","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"java.formatter.string","isRequired":false,"description":"the format to use to build id string (e.g. '%4$2s %3$2s %2$2s %1$2s' (see java Formatter)","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"language.tag","isRequired":true,"description":"the language to use to format numbers in string","aa":null,"ab":null,"ae":null,"af":null,"ak":null,"am":null,"an":null,"ar":null,"as":null,"av":null,"ay":null,"az":null,"ba":null,"be":null,"bg":null,"bh":null,"bi":null,"bm":null,"bn":null,"bo":null,"br":null,"bs":null,"ca":null,"ce":null,"ch":null,"co":null,"cr":null,"cs":null,"cu":null,"cv":null,"cy":null,"da":null,"de":null,"dv":null,"dz":null,"ee":null,"el":null,"en":null,"eo":null,"es":null,"et":null,"eu":null,"fa":null,"ff":null,"fi":null,"fj":null,"fo":null,"fr":null,"fy":null,"ga":null,"gd":null,"gl":null,"gn":null,"gu":null,"gv":null,"ha":null,"he":null,"hi":null,"ho":null,"hr":null,"ht":null,"hu":null,"hy":null,"hz":null,"ia":null,"id":null,"ie":null,"ig":null,"ii":null,"ik":null,"in":null,"io":null,"is":null,"it":null,"iu":null,"iw":null,"ja":null,"ji":null,"jv":null,"ka":null,"kg":null,"ki":null,"kj":null,"kk":null,"kl":null,"km":null,"kn":null,"ko":null,"kr":null,"ks":null,"ku":null,"kv":null,"kw":null,"ky":null,"la":null,"lb":null,"lg":null,"li":null,"ln":null,"lo":null,"lt":null,"lu":null,"lv":null,"mg":null,"mh":null,"mi":null,"mk":null,"ml":null,"mn":null,"mo":null,"mr":null,"ms":null,"mt":null,"my":null,"na":null,"nb":null,"nd":null,"ne":null,"ng":null,"nl":null,"nn":null,"no":null,"nr":null,"nv":null,"ny":null,"oc":null,"oj":null,"om":null,"or":null,"os":null,"pa":null,"pi":null,"pl":null,"ps":null,"pt":null,"qu":null,"rm":null,"rn":null,"ro":null,"ru":null,"rw":null,"sa":null,"sc":null,"sd":null,"se":null,"sg":null,"si":null,"sk":null,"sl":null,"sm":null,"sn":null,"so":null,"sq":null,"sr":null,"ss":null,"st":null,"su":null,"sv":null,"sw":null,"ta":null,"te":null,"tg":null,"th":null,"ti":null,"tk":null,"tl":null,"tn":null,"to":null,"tr":null,"ts":null,"tt":null,"tw":null,"ty":null,"ug":null,"uk":null,"ur":null,"uz":null,"ve":null,"vi":null,"vo":null,"wa":null,"wo":null,"xh":null,"yi":null,"yo":null,"za":null,"zh":null,"zu":null,"defaultValue":"en","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, 
{"name":"MultiGet","description":"Retrieves a content from datastore using datastore multiget queries.\nEach incoming record contains information regarding the datastore multiget query that will be performed. This information is stored in record fields whose names are configured in the plugin properties (see below) :\n- collection (String) : name of the datastore collection on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.\n- type (String) : name of the datastore type on which the multiget query will be performed. This field is not mandatory.\n- ids (String) : comma separated list of document ids to fetch. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.\n- includes (String) : comma separated list of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.\n- excludes (String) : comma separated list of patterns to filter out (exclude) fields to retrieve. Supports wildcards. This field is not mandatory.\n\nEach outcoming record holds data of one datastore retrieved document. This data is stored in these fields :\n- collection (same field name as the incoming record) : name of the datastore collection.\n- type (same field name as the incoming record) : name of the datastore type.\n- id (same field name as the incoming record) : retrieved document id.\n- a list of String fields containing :\n * field name : the retrieved field name\n * field value : the retrieved field value","component":"com.hurence.logisland.processor.datastore.MultiGet","type":"processor","tags":["datastore","get","multiget"],"properties":[{"name":"datastore.client.service","isRequired":true,"description":"The instance of the Controller Service to use for accessing datastore.","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"collection.field","isRequired":true,"description":"the name of the incoming records field containing es collection name to use in multiget query. ","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"type.field","isRequired":true,"description":"the name of the incoming records field containing es type name to use in multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"ids.field","isRequired":true,"description":"the name of the incoming records field containing es document Ids to use in multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"includes.field","isRequired":true,"description":"the name of the incoming records field containing es includes to use in multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"excludes.field","isRequired":true,"description":"the name of the incoming records field containing es excludes to use in multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, {"name":"MultiGetElasticsearch","description":"Retrieves a content indexed in elasticsearch using elasticsearch multiget queries.\nEach incoming record contains information regarding the elasticsearch multiget query that will be performed. 
This information is stored in record fields whose names are configured in the plugin properties (see below) :\n- index (String) : name of the elasticsearch index on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.\n- type (String) : name of the elasticsearch type on which the multiget query will be performed. This field is not mandatory.\n- ids (String) : comma separated list of document ids to fetch. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.\n- includes (String) : comma separated list of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.\n- excludes (String) : comma separated list of patterns to filter out (exclude) fields to retrieve. Supports wildcards. This field is not mandatory.\n\nEach outcoming record holds data of one elasticsearch retrieved document. This data is stored in these fields :\n- index (same field name as the incoming record) : name of the elasticsearch index.\n- type (same field name as the incoming record) : name of the elasticsearch type.\n- id (same field name as the incoming record) : retrieved document id.\n- a list of String fields containing :\n * field name : the retrieved field name\n * field value : the retrieved field value","component":"com.hurence.logisland.processor.elasticsearch.MultiGetElasticsearch","type":"processor","tags":["elasticsearch"],"properties":[{"name":"elasticsearch.client.service","isRequired":true,"description":"The instance of the Controller Service to use for accessing Elasticsearch.","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"es.index.field","isRequired":true,"description":"the name of the incoming records field containing es index name to use in multiget query. 
","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"es.type.field","isRequired":true,"description":"the name of the incoming records field containing es type name to use in multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"es.ids.field","isRequired":true,"description":"the name of the incoming records field containing es document Ids to use in multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"es.includes.field","isRequired":true,"description":"the name of the incoming records field containing es includes to use in multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"es.excludes.field","isRequired":true,"description":"the name of the incoming records field containing es excludes to use in multiget query","defaultValue":null,"isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, -{"name":"NormalizeFields","description":"Changes the name of a field according to a provided name mapping\n...","component":"com.hurence.logisland.processor.NormalizeFields","type":"processor","tags":["record","fields","normalizer"],"properties":[{"name":"conflict.resolution.policy","isRequired":true,"description":"waht to do when a field with the same name already exists ?","nothing to do":"leave record as it was","overwrite existing field":"if field already exist","keep only old field and delete the other":"keep only old field and delete the other","keep old field and new one":"creates an alias for the new field","defaultValue":"do_nothing","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}],"dynamicProperties":[{"name":"alternative mapping","value":"a comma separated list of possible field name","description":"when a field has a name contained in the list it will be renamed with this property field name","isExpressionLanguageSupported":true}]}, +{"name":"NormalizeFields","description":"Changes the name of a field according to a provided name mapping\n...","component":"com.hurence.logisland.processor.NormalizeFields","type":"processor","tags":["record","fields","normalizer"],"properties":[{"name":"conflict.resolution.policy","isRequired":true,"description":"what to do when a field with the same name already exists ?","nothing to do":"leave record as it was","overwrite existing field":"if field already exist","keep only old field and delete the other":"keep only old field and delete the other","keep old field and new one":"creates an alias for the new field","defaultValue":"do_nothing","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}],"dynamicProperties":[{"name":"alternative mapping","value":"a comma separated list of possible field name","description":"when a field has a name contained in the list it will be renamed with this property field name","isExpressionLanguageSupported":true}]}, {"name":"ParseBroEvent","description":"The ParseBroEvent processor is the Logisland entry point to get and process `Bro `_ events. The `Bro-Kafka plugin `_ should be used and configured in order to have Bro events sent to Kafka. See the `Bro/Logisland tutorial `_ for an example of usage for this processor. 
The ParseBroEvent processor does some minor pre-processing on incoming Bro events from the Bro-Kafka plugin to adapt them to Logisland.\n\nBasically the events coming from the Bro-Kafka plugin are JSON documents with a first level field indicating the type of the event. The ParseBroEvent processor takes the incoming JSON document, sets the event type in a record_type field and sets the original sub-fields of the JSON event as first level fields in the record. Also any dot in a field name is transformed into an underscore. Thus, for instance, the field id.orig_h becomes id_orig_h. The next processors in the stream can then process the Bro events generated by this ParseBroEvent processor.\n\nAs an example here is an incoming event from Bro:\n\n{\n\n \"conn\": {\n\n \"id.resp_p\": 9092,\n\n \"resp_pkts\": 0,\n\n \"resp_ip_bytes\": 0,\n\n \"local_orig\": true,\n\n \"orig_ip_bytes\": 0,\n\n \"orig_pkts\": 0,\n\n \"missed_bytes\": 0,\n\n \"history\": \"Cc\",\n\n \"tunnel_parents\": [],\n\n \"id.orig_p\": 56762,\n\n \"local_resp\": true,\n\n \"uid\": \"Ct3Ms01I3Yc6pmMZx7\",\n\n \"conn_state\": \"OTH\",\n\n \"id.orig_h\": \"172.17.0.2\",\n\n \"proto\": \"tcp\",\n\n \"id.resp_h\": \"172.17.0.3\",\n\n \"ts\": 1487596886.953917\n\n }\n\n }\n\nIt gets processed and transformed into the following Logisland record by the ParseBroEvent processor:\n\n\"@timestamp\": \"2017-02-20T13:36:32Z\"\n\n\"record_id\": \"6361f80a-c5c9-4a16-9045-4bb51736333d\"\n\n\"record_time\": 1487597792782\n\n\"record_type\": \"conn\"\n\n\"id_resp_p\": 9092\n\n\"resp_pkts\": 0\n\n\"resp_ip_bytes\": 0\n\n\"local_orig\": true\n\n\"orig_ip_bytes\": 0\n\n\"orig_pkts\": 0\n\n\"missed_bytes\": 0\n\n\"history\": \"Cc\"\n\n\"tunnel_parents\": []\n\n\"id_orig_p\": 56762\n\n\"local_resp\": true\n\n\"uid\": \"Ct3Ms01I3Yc6pmMZx7\"\n\n\"conn_state\": \"OTH\"\n\n\"id_orig_h\": \"172.17.0.2\"\n\n\"proto\": \"tcp\"\n\n\"id_resp_h\": \"172.17.0.3\"\n\n\"ts\": 1487596886.953917","component":"com.hurence.logisland.processor.bro.ParseBroEvent","type":"processor","tags":["bro","security","IDS","NIDS"],"properties":[{"name":"debug","isRequired":false,"description":"Enable debug. If enabled, the original JSON string is embedded in the record_value field of the record.","defaultValue":"false","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, {"name":"ParseGitlabLog","description":"The Gitlab logs processor is the Logisland entry point to get and process `Gitlab `_ logs. This allows for instance to monitor activities in your Gitlab server. The expected input of this processor are records from the production_json.log log file of Gitlab which contains JSON records. You can for instance use the `kafkacat `_ command to inject those logs into kafka and thus Logisland.","component":"com.hurence.logisland.processor.commonlogs.gitlab.ParseGitlabLog","type":"processor","tags":["logs","gitlab"],"properties":[{"name":"debug","isRequired":false,"description":"Enable debug. If enabled, the original JSON string is embedded in the record_value field of the record.","defaultValue":"false","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, {"name":"ParseNetflowEvent","description":"The `Netflow V5 `_ processor is the Logisland entry point to process Netflow (V5) events. 
NetFlow is a feature introduced on Cisco routers that provides the ability to collect IP network traffic.We can distinguish 2 components:\n\n\t-Flow exporter: aggregates packets into flows and exports flow records (binary format) towards one or more flow collectors\n\n\t-Flow collector: responsible for reception, storage and pre-processing of flow data received from a flow exporter\nThe collected data are then available for analysis purpose (intrusion detection, traffic analysis...)\nNetflow are sent to kafka in order to be processed by logisland.\nIn the tutorial we will simulate Netflow traffic using `nfgen `_. this traffic will be sent to port 2055. The we rely on nifi to listen of that port for incoming netflow (V5) traffic and send them to a kafka topic. The Netflow processor could thus treat these events and generate corresponding logisland records. The following processors in the stream can then process the Netflow records generated by this processor.","component":"com.hurence.logisland.processor.netflow.ParseNetflowEvent","type":"processor","tags":["netflow","security"],"properties":[{"name":"debug","isRequired":false,"description":"Enable debug. If enabled, the original JSON string is embedded in the record_value field of the record.","defaultValue":"false","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"output.record.type","isRequired":false,"description":"the output type of the record","defaultValue":"netflowevent","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false},{"name":"enrich.record","isRequired":false,"description":"Enrich data. If enabledthe netflow record is enriched with inferred data","defaultValue":"false","isDynamic":false,"isSensitive":false,"isExpressionLanguageSupported":false}]}, diff --git a/logisland-framework/logisland-bootstrap/pom.xml b/logisland-framework/logisland-bootstrap/pom.xml index 458592dc8..f293eed5e 100644 --- a/logisland-framework/logisland-bootstrap/pom.xml +++ b/logisland-framework/logisland-bootstrap/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-framework - 0.12.2 + 0.13.0 logisland-bootstrap jar diff --git a/logisland-framework/logisland-bootstrap/src/main/java/com/hurence/logisland/runner/SparkJobLauncher.java b/logisland-framework/logisland-bootstrap/src/main/java/com/hurence/logisland/runner/SparkJobLauncher.java index 234e7f777..a290d2caa 100644 --- a/logisland-framework/logisland-bootstrap/src/main/java/com/hurence/logisland/runner/SparkJobLauncher.java +++ b/logisland-framework/logisland-bootstrap/src/main/java/com/hurence/logisland/runner/SparkJobLauncher.java @@ -72,7 +72,7 @@ public static void main(String[] args) { "██║ ██║ ██║██║ ███╗ ██║███████╗██║ ███████║██╔██╗ ██║██║ ██║\n" + "██║ ██║ ██║██║ ██║ ██║╚════██║██║ ██╔══██║██║╚██╗██║██║ ██║\n" + "███████╗╚██████╔╝╚██████╔╝ ██║███████║███████╗██║ ██║██║ ╚████║██████╔╝\n" + - "╚══════╝ ╚═════╝ ╚═════╝ ╚═╝╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝╚═════╝ v0.12.2\n\n\n"; + "╚══════╝ ╚═════╝ ╚═════╝ ╚═╝╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝╚═════╝ v0.13.0\n\n\n"; System.out.println(logisland); Optional engineInstance = Optional.empty(); diff --git a/logisland-framework/logisland-bootstrap/src/main/java/com/hurence/logisland/runner/StreamProcessingRunner.java b/logisland-framework/logisland-bootstrap/src/main/java/com/hurence/logisland/runner/StreamProcessingRunner.java index 72afcb97b..35f44da4f 100644 --- a/logisland-framework/logisland-bootstrap/src/main/java/com/hurence/logisland/runner/StreamProcessingRunner.java +++ 
b/logisland-framework/logisland-bootstrap/src/main/java/com/hurence/logisland/runner/StreamProcessingRunner.java @@ -65,7 +65,7 @@ public static void main(String[] args) { "██║ ██║ ██║██║ ███╗ ██║███████╗██║ ███████║██╔██╗ ██║██║ ██║\n" + "██║ ██║ ██║██║ ██║ ██║╚════██║██║ ██╔══██║██║╚██╗██║██║ ██║\n" + "███████╗╚██████╔╝╚██████╔╝ ██║███████║███████╗██║ ██║██║ ╚████║██████╔╝\n" + - "╚══════╝ ╚═════╝ ╚═════╝ ╚═╝╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝╚═════╝ v0.12.2\n\n\n"; + "╚══════╝ ╚═════╝ ╚═════╝ ╚═╝╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝╚═════╝ v0.13.0\n\n\n"; System.out.println(logisland); Optional engineInstance = Optional.empty(); diff --git a/logisland-framework/logisland-hadoop-utils/pom.xml b/logisland-framework/logisland-hadoop-utils/pom.xml index afd0d3a6d..30201ede1 100644 --- a/logisland-framework/logisland-hadoop-utils/pom.xml +++ b/logisland-framework/logisland-hadoop-utils/pom.xml @@ -19,7 +19,7 @@ com.hurence.logisland logisland-framework - 0.12.2 + 0.13.0 logisland-hadoop-utils jar diff --git a/logisland-framework/logisland-resources/pom.xml b/logisland-framework/logisland-resources/pom.xml index aa99ba7d6..724663aa0 100644 --- a/logisland-framework/logisland-resources/pom.xml +++ b/logisland-framework/logisland-resources/pom.xml @@ -21,7 +21,7 @@ com.hurence.logisland logisland-framework - 0.12.2 + 0.13.0 logisland-resources pom diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/Financial Sample.xlsx b/logisland-framework/logisland-resources/src/main/resources/conf/Financial Sample.xlsx new file mode 100644 index 000000000..f049f345b Binary files /dev/null and b/logisland-framework/logisland-resources/src/main/resources/conf/Financial Sample.xlsx differ diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/aggregate-events.yml b/logisland-framework/logisland-resources/src/main/resources/conf/aggregate-events.yml index 2229a8e25..3151dc58c 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/aggregate-events.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/aggregate-events.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/configuration-template.yml b/logisland-framework/logisland-resources/src/main/resources/conf/configuration-template.yml index 4da8d353b..0c7b0b733 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/configuration-template.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/configuration-template.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. 
Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/docker-compose.yml b/logisland-framework/logisland-resources/src/main/resources/conf/docker-compose.yml index 8ca45bb2a..8f71646e6 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/docker-compose.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/docker-compose.yml @@ -1,9 +1,15 @@ -zookeeper: +version: "2" +services: + + zookeeper: + container_name: zookeeper image: hurence/zookeeper hostname: zookeeper ports: - "2181:2181" -kafka: + + kafka: + container_name: kafka image: hurence/kafka hostname: kafka links: @@ -14,4 +20,58 @@ kafka: KAFKA_ADVERTISED_PORT: 9092 KAFKA_ADVERTISED_HOST_NAME: sandbox KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181 - KAFKA_JMX_PORT: 7071 \ No newline at end of file + KAFKA_JMX_PORT: 7071 + + # ES container + elasticsearch: + container_name: elasticsearch + environment: + - "ES_JAVA_OPTS=-Xms1G -Xmx1G" + - cluster.name=es-logisland + - http.host=0.0.0.0 + - transport.host=0.0.0.0 + - xpack.security.enabled=false + hostname: elasticsearch + image: 'docker.elastic.co/elasticsearch/elasticsearch:5.4.0' + ports: + - '9200:9200' + - '9300:9300' + + # Kibana container + kibana: + container_name: kibana + environment: + - 'ELASTICSEARCH_URL=http://elasticsearch:9200' + image: 'docker.elastic.co/kibana/kibana:5.4.0' + links: + - elasticsearch + ports: + - '5601:5601' + + # Logisland container : does nothing but stay up so jobs can be launched inside it with docker exec + logisland: + container_name: logisland + image: hurence/logisland:0.13.0 + command: tail -f bin/logisland.sh + #command: bin/logisland.sh --conf /conf/index-apache-logs.yml + links: + - zookeeper + - kafka + - elasticsearch + - redis + ports: + - "4050:4050" + volumes: + - ./conf/logisland:/conf + - ./data/logisland:/data + extra_hosts: + - "sandbox:172.17.0.1" + + redis: + container_name: redis + image: 'redis:latest' + ports: + - '6379:6379' \ No newline at end of file diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/enrich-apache-logs.yml b/logisland-framework/logisland-resources/src/main/resources/conf/enrich-apache-logs.yml index 1e0a766fc..2709fd956 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/enrich-apache-logs.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/enrich-apache-logs.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file.
Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/future-factory-indexer.yml b/logisland-framework/logisland-resources/src/main/resources/conf/future-factory-indexer.yml index e43482c23..3eed1698f 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/future-factory-indexer.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/future-factory-indexer.yml @@ -2,7 +2,7 @@ # Logisland configuration for future factory project ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland future factory job ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/index-apache-logs-solr.yml b/logisland-framework/logisland-resources/src/main/resources/conf/index-apache-logs-solr.yml index 550299a67..f167b2d03 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/index-apache-logs-solr.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/index-apache-logs-solr.yml @@ -2,7 +2,7 @@ # Logisland configuration script tempate ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/index-apache-logs.yml b/logisland-framework/logisland-resources/src/main/resources/conf/index-apache-logs.yml index ce24a782d..a156aa9b0 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/index-apache-logs.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/index-apache-logs.yml @@ -2,7 +2,7 @@ # Logisland configuration script tempate ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. 
Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/index-blockchain-transactions.yml b/logisland-framework/logisland-resources/src/main/resources/conf/index-blockchain-transactions.yml new file mode 100644 index 000000000..97d01bcde --- /dev/null +++ b/logisland-framework/logisland-resources/src/main/resources/conf/index-blockchain-transactions.yml @@ -0,0 +1,127 @@ +version: 0.13.0 +documentation: LogIsland future factory job + +engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Index blockchain transactions with logisland + configuration: + spark.app.name: BlockchainTest + spark.master: local[*] + spark.driver.memory: 512M + spark.driver.cores: 1 + spark.executor.memory: 512M + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.yarn.queue: default + spark.yarn.maxAppAttempts: 4 + spark.yarn.am.attemptFailuresValidityInterval: 1h + spark.yarn.max.executor.failures: 20 + spark.yarn.executor.failuresValidityInterval: 1h + spark.task.maxFailures: 8 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 2000 + spark.streaming.backpressure.enabled: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 10000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4040 + + controllerServiceConfigurations: + + - controllerService: kc_source_service + component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredProviderService + configuration: + kc.data.value.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + kc.data.value.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + kc.data.key.converter.properties: | + schemas.enable=false + kc.data.key.converter: org.apache.kafka.connect.storage.StringConverter + kc.worker.tasks.max: 1 + kc.connector.class: com.datamountaineer.streamreactor.connect.blockchain.source.BlockchainSourceConnector + kc.connector.offset.backing.store: memory + kc.connector.properties: | + connect.blockchain.source.url=wss://ws.blockchain.info/inv + connect.blockchain.source.kafka.topic=blockchain + + - controllerService: kafka_out_service + component: com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService + configuration: + kafka.output.topics: logisland_raw + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + + - controllerService: elasticsearch_service + component: com.hurence.logisland.service.elasticsearch.Elasticsearch_5_4_0_ClientService + type: service + documentation: elasticsearch service + configuration: + hosts: sandbox:9300 + cluster.name: es-logisland + batch.size: 5000 + + streamConfigurations: + ################ indexing stream ############### + - 
stream: indexing_stream + component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing + type: stream + documentation: a stream that indexes the parsed blockchain transaction records into elasticsearch + configuration: + kafka.input.topics: logisland_raw + kafka.output.topics: none + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + processorConfigurations: + # all the parsed records are added to elasticsearch by bulk + - processor: es_publisher + component: com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch + type: processor + documentation: a processor that indexes processed events in elasticsearch + configuration: + elasticsearch.client.service: elasticsearch_service + default.index: logisland + default.type: event + timebased.index: yesterday + es.index.field: search_index + es.type.field: record_type + + + ######### parsing stream ############## + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.StringSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.StringSerializer + write.topics.client.service: kafka_out_service + processorConfigurations: + - processor: flatten + component: com.hurence.logisland.processor.FlatMap + type: processor + documentation: "extract from root record" + configuration: + keep.root.record: false + copy.root.record.fields: true diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/index-excel-spreadsheet.yml b/logisland-framework/logisland-resources/src/main/resources/conf/index-excel-spreadsheet.yml new file mode 100644 index 000000000..c2841e872 --- /dev/null +++ b/logisland-framework/logisland-resources/src/main/resources/conf/index-excel-spreadsheet.yml @@ -0,0 +1,94 @@ +######################################################################################################### +# Logisland configuration script template +######################################################################################################### + +version: 0.13.0 +documentation: LogIsland analytics main config file.
Put here every engine or component config + +######################################################################################################### +# engine +engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Index records of an excel file with LogIsland + configuration: + spark.app.name: IndexExcelDemo + spark.master: local[4] + spark.driver.memory: 1G + spark.driver.cores: 1 + spark.executor.memory: 2G + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.yarn.queue: default + spark.yarn.maxAppAttempts: 4 + spark.yarn.am.attemptFailuresValidityInterval: 1h + spark.yarn.max.executor.failures: 20 + spark.yarn.executor.failuresValidityInterval: 1h + spark.task.maxFailures: 8 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 1000 + spark.streaming.backpressure.enabled: false + spark.streaming.unpersist: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 3000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4050 + + controllerServiceConfigurations: + + - controllerService: elasticsearch_service + component: com.hurence.logisland.service.elasticsearch.Elasticsearch_5_4_0_ClientService + type: service + documentation: elasticsearch service + configuration: + hosts: sandbox:9300 + cluster.name: es-logisland + batch.size: 5000 + + streamConfigurations: + + # main processing stream + - stream: parsing_stream + component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing + type: stream + documentation: a processor that converts raw excel file content into structured log records + configuration: + kafka.input.topics: logisland_raw + kafka.output.topics: logisland_events + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.BytesArraySerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + processorConfigurations: + + # parse excel cells into records + - processor: excel_parser + component: com.hurence.logisland.processor.excel.ExcelExtract + type: parser + documentation: a parser that produce events from an excel file + configuration: + record.type: excel_record + skip.rows: 1 + field.names: segment,country,product,discount_band,units_sold,manufacturing,sale_price,gross_sales,discounts,sales,cogs,profit,record_time,month_number,month_name,year + + # all the parsed records are added to elasticsearch by bulk + - processor: es_publisher + component: com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch + type: processor + documentation: a processor that indexes processed events in elasticsearch + configuration: + elasticsearch.client.service: elasticsearch_service + default.index: logisland + default.type: event + timebased.index: yesterday + es.index.field: search_index + es.type.field: record_type diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/index-network-packets.yml b/logisland-framework/logisland-resources/src/main/resources/conf/index-network-packets.yml 
index 7dc019042..023f86fb0 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/index-network-packets.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/index-network-packets.yml @@ -2,7 +2,7 @@ # Logisland configuration script example: parse network packets and display them in Kibana ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/logisland-kafka-connect.yml b/logisland-framework/logisland-resources/src/main/resources/conf/logisland-kafka-connect.yml new file mode 100644 index 000000000..ede884068 --- /dev/null +++ b/logisland-framework/logisland-resources/src/main/resources/conf/logisland-kafka-connect.yml @@ -0,0 +1,128 @@ +version: 0.13.0 +documentation: LogIsland Kafka Connect Integration + +engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Use Kafka connectors with logisland + configuration: + spark.app.name: LogislandConnect + spark.master: local[*] + spark.driver.memory: 512M + spark.driver.cores: 1 + spark.executor.memory: 512M + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.yarn.queue: default + spark.yarn.maxAppAttempts: 4 + spark.yarn.am.attemptFailuresValidityInterval: 1h + spark.yarn.max.executor.failures: 20 + spark.yarn.executor.failuresValidityInterval: 1h + spark.task.maxFailures: 8 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 2000 + spark.streaming.backpressure.enabled: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 10000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4040 + + controllerServiceConfigurations: + + # Our source service + - controllerService: kc_source_service + component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredProviderService + documentation: A kafka source connector provider reading from its own source and providing structured streaming to the underlying layer + configuration: + # We will use the logisland record converter for both key and value + kc.data.value.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.value.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + + kc.data.key.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.key.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + # Only one task to handle source input (unique) + kc.worker.tasks.max: 1 + # The kafka source connector to wrap (here we're using a simulator source) + kc.connector.class: com.github.jcustenborder.kafka.connect.simulator.SimulatorSourceConnector + # The properties for the connector (as per connector documentation) + kc.connector.properties: | + key.schema.fields=email + topic=simulator + 
value.schema.fields=email,firstName,middleName,lastName,telephoneNumber,dateOfBirth + # We are using a standalone source for testing. We can store processed offsets in memory + kc.connector.offset.backing.store: memory + + # Kafka sink configuration + - controllerService: kafka_out_service + component: com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService + configuration: + kafka.output.topics: logisland_raw + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + + streamConfigurations: + ################ Indexing stream ############### + - stream: indexing_stream + component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing + type: stream + documentation: "Concurrently process source incoming records. Source -> Kafka -> here" + configuration: + kafka.input.topics: logisland_raw + kafka.output.topics: logisland_events + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + processorConfigurations: + # We just print the received records (but you may do something more interesting!) + - processor: stream_debugger + component: com.hurence.logisland.processor.DebugStream + type: processor + documentation: debug records + configuration: + event.serializer: json + + + ######### parsing stream ############## + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + documentation: "Takes records from the kafka source and distributes related partitions over a kafka topic. 
Records are then handed off to the indexing stream" + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.client.service: kafka_out_service + processorConfigurations: + - processor: flatten + component: com.hurence.logisland.processor.FlatMap + type: processor + documentation: "Takes out data from record_value" + configuration: + keep.root.record: false + copy.root.record.fields: true + + diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/match-queries.yml b/logisland-framework/logisland-resources/src/main/resources/conf/match-queries.yml index 76b05de18..8b023410b 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/match-queries.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/match-queries.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/mqtt-to-historian.yml b/logisland-framework/logisland-resources/src/main/resources/conf/mqtt-to-historian.yml index 640539dd9..eafcf87c5 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/mqtt-to-historian.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/mqtt-to-historian.yml @@ -1,4 +1,4 @@ -version: 0.12.2 +version: 0.13.0 documentation: LogIsland future factory job engine: diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/outlier-detection.yml b/logisland-framework/logisland-resources/src/main/resources/conf/outlier-detection.yml index 6637b81bc..372d27c70 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/outlier-detection.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/outlier-detection.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/python-processing.yml b/logisland-framework/logisland-resources/src/main/resources/conf/python-processing.yml index d600914f7..d433a75e6 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/python-processing.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/python-processing.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. 
Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/retrieve-data-from-elasticsearch.yml b/logisland-framework/logisland-resources/src/main/resources/conf/retrieve-data-from-elasticsearch.yml index 2a52c68fa..a4cde65a6 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/retrieve-data-from-elasticsearch.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/retrieve-data-from-elasticsearch.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/save-to-hdfs.yml b/logisland-framework/logisland-resources/src/main/resources/conf/save-to-hdfs.yml index ac2bf2a1e..9060002c3 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/save-to-hdfs.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/save-to-hdfs.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/send-apache-logs-to-hbase.yml b/logisland-framework/logisland-resources/src/main/resources/conf/send-apache-logs-to-hbase.yml index 4f213b356..9a8252375 100644 --- a/logisland-framework/logisland-resources/src/main/resources/conf/send-apache-logs-to-hbase.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/send-apache-logs-to-hbase.yml @@ -2,7 +2,7 @@ # Logisland configuration script template ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: This tutorial job sends apache logs to an HBase table ######################################################################################################### diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/store-to-redis.yml b/logisland-framework/logisland-resources/src/main/resources/conf/store-to-redis.yml new file mode 100644 index 000000000..4e2efeaca --- /dev/null +++ b/logisland-framework/logisland-resources/src/main/resources/conf/store-to-redis.yml @@ -0,0 +1,97 @@ +######################################################################################################### +# Logisland configuration script tempate +######################################################################################################### + +version: 0.13.0 +documentation: LogIsland analytics main config file. 
Put here every engine or component config + +######################################################################################################### +# engine +engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Index some apache logs with logisland + configuration: + spark.app.name: StoreToRedisDemo + spark.master: local[2] + spark.driver.memory: 1G + spark.driver.cores: 1 + spark.executor.memory: 2G + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 1000 + spark.streaming.backpressure.enabled: false + spark.streaming.unpersist: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 3000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4050 + + controllerServiceConfigurations: + + - controllerService: datastore_service + component: com.hurence.logisland.redis.service.RedisKeyValueCacheService + type: service + documentation: redis datastore service + configuration: + connection.string: ${REDIS_CONNECTION} + redis.mode: standalone + database.index: 0 + communication.timeout: 10 seconds + pool.max.total: 8 + pool.max.idle: 8 + pool.min.idle: 0 + pool.block.when.exhausted: true + pool.max.wait.time: 10 seconds + pool.min.evictable.idle.time: 60 seconds + pool.time.between.eviction.runs: 30 seconds + pool.num.tests.per.eviction.run: -1 + pool.test.on.create: false + pool.test.on.borrow: false + pool.test.on.return: false + pool.test.while.idle: true + record.recordSerializer: com.hurence.logisland.serializer.JsonSerializer + + streamConfigurations: + + # main processing stream + - stream: parsing_stream + component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing + type: stream + documentation: a processor that converts raw apache logs into structured log records + configuration: + kafka.input.topics: logisland_raw + kafka.output.topics: logisland_events + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: none + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: ${KAFKA_BROKERS} + kafka.zookeeper.quorum: ${ZK_QUORUM} + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + processorConfigurations: + + # parse apache logs into logisland records + - processor: apache_parser + component: com.hurence.logisland.processor.SplitText + type: parser + documentation: a parser that produce events from an apache log REGEX + configuration: + record.type: apache_log + value.regex: (\S+)\s+(\S+)\s+(\S+)\s+\[([\w:\/]+\s[+\-]\d{4})\]\s+"(\S+)\s+(\S+)\s*(\S*)"\s+(\S+)\s+(\S+) + value.fields: src_ip,identd,user,record_time,http_method,http_query,http_version,http_status,bytes_out + + # all the parsed records are added to datastore by bulk + - processor: datastore_publisher + component: com.hurence.logisland.processor.datastore.BulkPut + type: processor + documentation: "indexes processed events in datastore" + configuration: + datastore.client.service: datastore_service diff --git a/logisland-framework/logisland-resources/src/main/resources/conf/timeseries-parsing.yml 
b/logisland-framework/logisland-resources/src/main/resources/conf/timeseries-parsing.yml index 56f62673c..5b5e11dd7 100755 --- a/logisland-framework/logisland-resources/src/main/resources/conf/timeseries-parsing.yml +++ b/logisland-framework/logisland-resources/src/main/resources/conf/timeseries-parsing.yml @@ -55,7 +55,7 @@ engine: documentation: "store an in-memory cache coming from CSV" configuration: csv.format: excel_fr - csv.file.path: "logisland-assembly/target/logisland-0.12.2-bin-hdp2.5/logisland-0.12.2/conf/timeseries-lookup.csv" + csv.file.path: "logisland-assembly/target/logisland-0.13.0-bin-hdp2.5/logisland-0.13.0/conf/timeseries-lookup.csv" first.line.header: true row.key: tagname encoding.charset: ISO-8859-1 diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/_static/kibana-blockchain-dashboard.png b/logisland-framework/logisland-resources/src/main/resources/docs/_static/kibana-blockchain-dashboard.png new file mode 100644 index 000000000..03f6422df Binary files /dev/null and b/logisland-framework/logisland-resources/src/main/resources/docs/_static/kibana-blockchain-dashboard.png differ diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/_static/kibana-blockchain-records.png b/logisland-framework/logisland-resources/src/main/resources/docs/_static/kibana-blockchain-records.png new file mode 100644 index 000000000..9c164cb61 Binary files /dev/null and b/logisland-framework/logisland-resources/src/main/resources/docs/_static/kibana-blockchain-records.png differ diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/_static/kibana-excel-logs.png b/logisland-framework/logisland-resources/src/main/resources/docs/_static/kibana-excel-logs.png new file mode 100644 index 000000000..159c7d61c Binary files /dev/null and b/logisland-framework/logisland-resources/src/main/resources/docs/_static/kibana-excel-logs.png differ diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/api.rst b/logisland-framework/logisland-resources/src/main/resources/docs/api.rst index b835bd2c8..9eee9c16c 100644 --- a/logisland-framework/logisland-resources/src/main/resources/docs/api.rst +++ b/logisland-framework/logisland-resources/src/main/resources/docs/api.rst @@ -409,7 +409,7 @@ You can then start to generate the source code from the swgger yaml file swagger-codegen generate \ --group-id com.hurence.logisland \ --artifact-id logisland-agent \ - --artifact-version 0.12.2 \ + --artifact-version 0.13.0 \ --api-package com.hurence.logisland.agent.rest.api \ --model-package com.hurence.logisland.agent.rest.model \ -o logisland-framework/logisland-agent \ diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/changes.rst b/logisland-framework/logisland-resources/src/main/resources/docs/changes.rst index 5873217d2..cdeae8308 100644 --- a/logisland-framework/logisland-resources/src/main/resources/docs/changes.rst +++ b/logisland-framework/logisland-resources/src/main/resources/docs/changes.rst @@ -3,7 +3,7 @@ What's new in logisland ? 
-v0.12.2 +v0.13.0 ------- - add support for SOLR diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/components.rst b/logisland-framework/logisland-resources/src/main/resources/docs/components.rst index 2640e4734..b8e778012 100644 --- a/logisland-framework/logisland-resources/src/main/resources/docs/components.rst +++ b/logisland-framework/logisland-resources/src/main/resources/docs/components.rst @@ -429,6 +429,38 @@ Dynamic Properties allow the user to specify both the name and value of a proper ---------- +.. _com.hurence.logisland.processor.excel.ExcelExtract: + +ExcelExtract +------------ +Consumes a Microsoft Excel document and converts each worksheet's line to a structured record. The processor is assuming to receive raw excel file as input record. + +Class +_____ +com.hurence.logisland.processor.excel.ExcelExtract + +Tags +____ +excel, processor, poi + +Properties +__________ +In the list below, the names of required properties appear in **bold**. Any other properties (not in bold) are considered optional. The table also indicates any default values +. + +.. csv-table:: allowable-values + :header: "Name","Description","Allowable Values","Default Value","Sensitive","EL" + :widths: 20,60,30,20,10,10 + + "Sheets to Extract", "Comma separated list of Excel document sheet names that should be extracted from the excel document. If this property is left blank then all of the sheets will be extracted from the Excel document. You can specify regular expressions. Any sheets not specified in this value will be ignored.", "", "", "", "" + "Columns To Skip", "Comma delimited list of column numbers to skip. Use the columns number and not the letter designation. Use this to skip over columns anywhere in your worksheet that you don't want extracted as part of the record.", "", "", "", "" + "Field names mapping", "The comma separated list representing the names of columns of extracted cells. Order matters! You should use either field.names either field.row.header but not both together.", "", "null", "", "" + "Number of Rows to Skip", "The row number of the first row to start processing.Use this to skip over rows of data at the top of your worksheet that are not part of the dataset.Empty rows of data anywhere in the spreadsheet will always be skipped, no matter what this value is set to.", "", "0", "", "" + "record.type", "Default type of record", "", "excel_record", "", "" + "Use a row header as field names mapping", "If set, field names mapping will be extracted from the specified row number. You should use either field.names either field.row.header but not both together.", "", "null", "", "" + +---------- + .. _com.hurence.logisland.processor.hbase.FetchHBaseRow: FetchHBaseRow @@ -887,7 +919,7 @@ In the list below, the names of required properties appear in **bold**. 
Any other properties (not in bold) are considered optional.
   :header: "Name","Description","Allowable Values","Default Value","Sensitive","EL"
   :widths: 20,60,30,20,10,10
-   "**conflict.resolution.policy**", "waht to do when a field with the same name already exists ?", "nothing to do (leave record as it was), overwrite existing field (if field already exist), keep only old field and delete the other (keep only old field and delete the other), keep old field and new one (creates an alias for the new field)", "do_nothing", "", ""
+   "**conflict.resolution.policy**", "what to do when a field with the same name already exists ?", "nothing to do (leave record as it was), overwrite existing field (if field already exist), keep only old field and delete the other (keep only old field and delete the other), keep old field and new one (creates an alias for the new field)", "do_nothing", "", ""
 Dynamic Properties
 __________________
diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/connectors.rst b/logisland-framework/logisland-resources/src/main/resources/docs/connectors.rst
new file mode 100644
index 000000000..8159c69f0
--- /dev/null
+++ b/logisland-framework/logisland-resources/src/main/resources/docs/connectors.rst
@@ -0,0 +1,180 @@
+
+Connectors
+==========
+
+In this chapter we will show you how to integrate kafka connect connectors into logisland.
+
+.. contents:: Table of Contents
+
+
+Introduction
+------------
+
+Logisland features an integration between the `kafka connect `_ world and the spark structured streaming engine.
+
+In order to seamlessly integrate both worlds, we simply wrapped the kafka connector interfaces (unplugging them from kafka) and let them run in a logisland spark managed container. Hence the name *"Logisland Connect"* :-)
+
+
+This allows you to leverage the existing kafka connectors library to import data into a logisland pipeline without needing any other middleware or ETL system.
+
+Scope & Roadmap
+---------------
+
+Today only kafka-connect sources are available.
+
+Sinks will probably be supported in future releases of logisland.
+
+.. note::
+    Please note that kafka connect requires at least kafka 0.10.0.0. The logisland build for hadoop 2.4 / spark 1.6 hence does not support this feature.
+
+
+Building
+--------
+
+Logisland provides a set of connectors, but they are not bundled by default. You have to build logisland from sources in order to package the connectors you need into the logisland uber jar.
+
+When building with maven, you need to pass some java properties depending on the connector(s) you would like to include.
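+For example, a build that bundles both the FTP and Blockchain connectors listed in the table below might look like the following (a sketch only; combine the build flags you actually need):
+
+.. code-block:: sh
+
+    # build logisland from sources and package the FTP and Blockchain connectors into the uber jar
+    mvn clean install -DskipTests -DwithConnectFtp -DwithConnectBlockchain
+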
+ +Please refer to the following table for the details: + + ++--------------------------+----------------------------------------------------------------------------------+------------------------------+ +| Connector | URL | Build flag | ++==========================+=========================+========================================================+==============================+ +| Simulator | https://github.com/jcustenborder/kafka-connect-simulator | None (Built in) | ++--------------------------+-------------------------+--------------------------------------------------------+------------------------------+ +| OPC-DA (IIoT) | https://github.com/Hurence/logisland | None (Built in) | ++--------------------------+-------------------------+--------------------------------------------------------+------------------------------+ +| FTP | https://github.com/Eneco/kafka-connect-ftp | -DwithConnectFtp | ++--------------------------+----------------------------------------------------------------------------------+------------------------------+ +| Blockchain | https://github.com/Landoop/stream-reactor/tree/master/kafka-connect-blockchain | -DwithConnectBlockchain | ++--------------------------+----------------------------------------------------------------------------------+------------------------------+ + + +Configuring +----------- + +Once you have bundled the connectors you need, you are now ready to use them. + +Let's do it step by step. + +First of all we need to declare a *KafkaConnectStructuredProviderService* that will manage our connector in Logisland. +Along with this we need to put some configuration (In general you can always refer to kafka connect documentation to better understand the underlying architecture and how to configure a connector): + + ++-------------------------------------------------+----------------------------------------------------------+ +| Property | Description | ++=================================================+==========================================================+ +| kc.connector.class | The class of the connector (Fully qualified name) | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.data.key.converter | The class of the converter to be used for the key. | +| | Please refer to `Choosing the right converter`_ section | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.data.key.converter.properties | The properties to be provided to the key converter | +| | | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.data.value.converter | The class of the converter to be used for the key. | +| | Please refer to `Choosing the right converter`_ section | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.data.value.converter.properties | The properties to be provided to the key converter | +| | | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.connector.properties | The properties to be provided to the connector and | +| | specific to the connector itself. 
| ++-------------------------------------------------+----------------------------------------------------------+ +| kc.worker.tasks.max | How many concurrent threads to spawn for a connector | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.connector.offset.backing.store | The offset backing store to use. Choose among: | +| | | +| | * **memory** : standalone in memory | +| | * **file** : standalone file based. | +| | * **kafka** : distributed kafka topic based | +| | | +| | | ++-------------------------------------------------+----------------------------------------------------------+ +| kc.connector.offset.backing.store.properties | Specific properties to configure the chosen backing | +| | store. | ++-------------------------------------------------+----------------------------------------------------------+ + +.. note:: Please refer to `Kafka connect guide `_ for further information about offset backing store and how to configure them. + + +Choosing the right converter +---------------------------- + +Choosing the right converter is perhaps one of the most important part. In fact we're going to adapt what is coming from kafka connect to what is flowing into our logisland pipeline. +This means that we have to know how the source is managing its data. + +In order to simplify your choice, we recommend you to follow this simple approach (the same applies for both keys and values): + + ++----------------------------+-----------------------------------+-----------------------------------+ +| Source data | Kafka Converter | Logisland Encoder | ++============================+===================================+===================================+ +| String | StringConverter | StringEncoder | ++----------------------------+-----------------------------------+-----------------------------------+ +| Raw Bytes | ByteArrayConverter | BytesArraySerialiser | ++----------------------------+-----------------------------------+-----------------------------------+ +| Structured | LogIslandRecordConverter | The serializer used by the record | +| | | converter (*) | ++----------------------------+-----------------------------------+-----------------------------------+ + + +.. note:: + (*)In case you deal with structured data, the LogIslandRecordConverter will embed the structured object in a logisland record. In order to do this you have to specify the serializer to be used to convert your data (the serializer property **record.serializer**). Generally the *KryoSerialiser* is a good choice to start with. + + + +Putting all together +-------------------- + +In the previous two sections we explained how to configure a connector and how to choose the right serializer for it. + +The recap we can examine the following configuration example: + + +.. 
code-block:: yaml + + # Our source service + - controllerService: kc_source_service + component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredProviderService + documentation: A kafka source connector provider reading from its own source and providing structured streaming to the underlying layer + configuration: + # We will use the logisland record converter for both key and value + kc.data.value.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.value.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + kc.data.key.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.key.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + # Only one task to handle source input (unique) + kc.worker.tasks.max: 1 + # The kafka source connector to wrap (here we're using a simulator source) + kc.connector.class: com.github.jcustenborder.kafka.connect.simulator.SimulatorSourceConnector + # The properties for the connector (as per connector documentation) + kc.connector.properties: | + key.schema.fields=email + topic=simulator + value.schema.fields=email,firstName,middleName,lastName,telephoneNumber,dateOfBirth + # We are using a standalone source for testing. We can store processed offsets in memory + kc.connector.offset.backing.store: memory + + + + +In the example both key and value provided by the connector are structured objects. + +For this reason we use for that the converter *LogIslandRecordConverter*. +As well, we provide the serializer to be used for both key and value converter specifying +*record.serializer=com.hurence.logisland.serializer.KryoSerializer* among the related converter properties. + + +Going further +------------- + + +Please do not hesitate to take a look to our kafka connect tutorials for more details and practical use cases. + + diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/developer.rst b/logisland-framework/logisland-resources/src/main/resources/docs/developer.rst index 647b68d1b..76ffccd5c 100644 --- a/logisland-framework/logisland-resources/src/main/resources/docs/developer.rst +++ b/logisland-framework/logisland-resources/src/main/resources/docs/developer.rst @@ -204,14 +204,12 @@ to release artifacts (if you're allowed to), follow this guide `release to OSS S .. 
code-block:: sh - mvn versions:set -DnewVersion=0.12.2 + ./update-version.sh -o 0.13.0 -n 14.4 mvn license:format mvn test - mvn -DperformRelease=true clean deploy + mvn -DperformRelease=true clean deploy -Phdp2.5 mvn versions:commit - git tag -a v0.12.2 -m "new logisland release 0.12.2" - git push origin v0.12.2 follow the staging procedure in `oss.sonatype.org `_ or read `Sonatype book `_ @@ -224,7 +222,7 @@ Publish release assets to github please refer to `https://developer.github.com/v3/repos/releases `_ -curl -XPOST https://uploads.github.com/repos/Hurence/logisland/releases/8905079/assets?name=logisland-0.12.2-bin-hdp2.5.tar.gz -v --data-binary @logisland-assembly/target/logisland-0.10.3-bin-hdp2.5.tar.gz --user oalam -H 'Content-Type: application/gzip' +curl -XPOST https://uploads.github.com/repos/Hurence/logisland/releases/8905079/assets?name=logisland-0.13.0-bin-hdp2.5.tar.gz -v --data-binary @logisland-assembly/target/logisland-0.10.3-bin-hdp2.5.tar.gz --user oalam -H 'Content-Type: application/gzip' @@ -235,7 +233,7 @@ Building the image .. code-block:: sh # build logisland - mvn clean install -DskipTests -Pdocker -Dhdp=2.4 + mvn clean install -DskipTests -Pdocker -Dhdp2.5 # verify image build docker images diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/monitoring.rst b/logisland-framework/logisland-resources/src/main/resources/docs/monitoring.rst index 809001ed9..0927fe3a0 100644 --- a/logisland-framework/logisland-resources/src/main/resources/docs/monitoring.rst +++ b/logisland-framework/logisland-resources/src/main/resources/docs/monitoring.rst @@ -63,8 +63,8 @@ Manual mode : # download the latest build of Node Exporter cd /opt - wget https://github.com/prometheus/node_exporter/releases/download/0.12.2/node_exporter-0.12.2.linux-amd64.tar.gz -O /tmp/node_exporter-0.12.2.linux-amd64.tar.gz - sudo tar -xvzf /tmp/node_exporter-0.12.2.linux-amd64.tar.gz + wget https://github.com/prometheus/node_exporter/releases/download/0.13.0/node_exporter-0.13.0.linux-amd64.tar.gz -O /tmp/node_exporter-0.13.0.linux-amd64.tar.gz + sudo tar -xvzf /tmp/node_exporter-0.13.0.linux-amd64.tar.gz # Create a soft link to the node_exporter binary in /usr/bin. sudo ln -s /opt/node_exporter /usr/bin diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/plugins_old.rst b/logisland-framework/logisland-resources/src/main/resources/docs/plugins_old.rst index 03e55da19..116df25e7 100644 --- a/logisland-framework/logisland-resources/src/main/resources/docs/plugins_old.rst +++ b/logisland-framework/logisland-resources/src/main/resources/docs/plugins_old.rst @@ -60,7 +60,7 @@ Write your a custom LogParser for your super-plugin in ``/src/main/java/com/hure Our parser will analyze some Proxy Log String in the following form : - "Thu Jan 02 08:43:39 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.12.226/Images/RightJauge.gif 724 409 false false" + "Thu Jan 02 08:43:39 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.13.026/Images/RightJauge.gif 724 409 false false" .. 
code-block:: java

diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index-blockchain-transactions.rst b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index-blockchain-transactions.rst
new file mode 100644
index 000000000..8c4e6e396
--- /dev/null
+++ b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index-blockchain-transactions.rst
@@ -0,0 +1,274 @@
+Index blockchain transactions
+=============================
+
+In the following getting started tutorial, we'll explain how to leverage the flexibility of logisland connectors
+in order to process, in real time, every transaction emitted by the bitcoin blockchain platform and to index each record
+into an elasticsearch platform.
+
+This will allow us to run some dashboarding and visual data analysis as well.
+
+
+.. note::
+
+    Be sure to know how to launch a logisland Docker environment by reading the `prerequisites <./prerequisites.html>`_ section
+
+    For kafka connect related information, please also read the `connectors <../connectors.html>`_ section.
+
+1. Logisland job setup
+----------------------
+
+.. note::
+
+    To run this tutorial you have to package the blockchain connector into the logisland deployable jar.
+    You can do this simply by building logisland from sources, passing the option *-DwithConnectBlockchain* to maven.
+
+The logisland job for this tutorial is already packaged in the tar.gz assembly and you can find it here for ElasticSearch :
+
+.. code-block:: sh
+
+    vim conf/index-blockchain-transactions.yml
+
+
+
+We will start by explaining each part of the config file.
+
+==========
+The engine
+==========
+
+The first section configures the Spark engine (we will use a `KafkaStreamProcessingEngine <../plugins.html#kafkastreamprocessingengine>`_) to run in local mode.
+
+.. code-block:: yaml
+
+    engine:
+      component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine
+      type: engine
+      documentation: Index some blockchain transactions with logisland
+      configuration:
+        spark.app.name: BlockchainTest
+        spark.master: local[*]
+        spark.driver.memory: 512M
+        spark.driver.cores: 1
+        spark.executor.memory: 512M
+        spark.executor.instances: 4
+        spark.executor.cores: 2
+        spark.yarn.queue: default
+        spark.yarn.maxAppAttempts: 4
+        spark.yarn.am.attemptFailuresValidityInterval: 1h
+        spark.yarn.max.executor.failures: 20
+        spark.yarn.executor.failuresValidityInterval: 1h
+        spark.task.maxFailures: 8
+        spark.serializer: org.apache.spark.serializer.KryoSerializer
+        spark.streaming.batchDuration: 2000
+        spark.streaming.backpressure.enabled: false
+        spark.streaming.blockInterval: 500
+        spark.streaming.kafka.maxRatePerPartition: 10000
+        spark.streaming.timeout: -1
+        spark.streaming.unpersist: false
+        spark.streaming.kafka.maxRetries: 3
+        spark.streaming.ui.retainedBatches: 200
+        spark.streaming.receiver.writeAheadLog.enable: false
+        spark.ui.port: 4040
+
+ The `controllerServiceConfigurations` part is here to define all services that will be shared by processors within the whole job.
+
+ ==================
+ The parsing stream
+ ==================
+
+ Here we are going to use a special controller service (``KafkaConnectStructuredProviderService``) to use the kafka connect source as input for the structured stream defined below.
+ + For this example, we are going to use the source *com.datamountaineer.streamreactor.connect.blockchain.source.BlockchainSourceConnector* + that opens a secure websocket connections to the blockchain subscribing to any transaction update stream. + + + .. code-block:: yaml + + ControllerServiceConfigurations: + - controllerService: kc_source_service + component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredProviderService + configuration: + kc.data.value.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + kc.data.value.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + kc.data.key.converter.properties: | + schemas.enable=false + kc.data.key.converter: org.apache.kafka.connect.storage.StringConverter + kc.worker.tasks.max: 1 + kc.connector.class: com.datamountaineer.streamreactor.connect.blockchain.source.BlockchainSourceConnector + kc.connector.offset.backing.store: memory + kc.connector.properties: | + connect.blockchain.source.url=wss://ws.blockchain.info/inv + connect.blockchain.source.kafka.topic=blockchain + + + +.. note:: Our source is providing structured value hence we convert with LogInslandRecordConverter serializing with Kryo + + +.. code-block:: yaml + + # Kafka sink configuration + - controllerService: kafka_out_service + component: com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService + configuration: + kafka.output.topics: logisland_raw + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + +So that, we can now define the *parsing stream* using those source and sink + +.. code-block:: yaml + + ######### parsing stream ############## + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + documentation: "Takes records from the kafka source and distributes related partitions over a kafka topic. Records are then handed off to the indexing stream" + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.client.service: kafka_out_service + + +Within this stream, a ``FlatMap`` processor takes out the value and key (required when using *StructuredStream* as source of records) + +.. code-block:: yaml + + processorConfigurations: + - processor: flatten + component: com.hurence.logisland.processor.FlatMap + type: processor + documentation: "Takes out data from record_value" + configuration: + keep.root.record: false + copy.root.record.fields: true + +=================== +The indexing stream +=================== + + +Inside this engine, you will run a Kafka stream of processing, so we set up input/output topics and Kafka/Zookeeper hosts. 
+Here the stream will read all the logs sent in ``logisland_raw`` topic and push the processing output into ``logisland_events`` topic. + +.. note:: + + We want to specify an Avro output schema to validate our output records (and force their types accordingly). + It's really for other streams to rely on a schema when processing records from a topic. + +We can define some serializers to marshall all records from and to a topic. + +.. code-block:: yaml + + + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + documentation: "Takes records from the kafka source and distributes related partitions over a kafka topic. Records are then handed off to the indexing stream" + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.client.service: kafka_out_service + + +Within this stream, a ``BulkAddElasticsearch`` takes care of indexing a ``Record`` sending it to elasticsearch. + +.. code-block:: yaml + + - processor: es_publisher + component: com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch + type: processor + documentation: a processor that indexes processed events in elasticsearch + configuration: + elasticsearch.client.service: elasticsearch_service + default.index: logisland + default.type: event + timebased.index: yesterday + es.index.field: search_index + es.type.field: record_type + + +In details, this processor makes use of a ``Elasticsearch_5_4_0_ClientService`` controller service to interact with our Elasticsearch 5.X backend +running locally (and started as part of the docker compose configuration we mentioned above). + +Here below its configuration: + +.. code-block:: yaml + + - controllerService: elasticsearch_service + component: com.hurence.logisland.service.elasticsearch.Elasticsearch_5_4_0_ClientService + type: service + documentation: elasticsearch service + configuration: + hosts: sandbox:9300 + cluster.name: es-logisland + batch.size: 5000 + + +2. Launch the script +-------------------- +Connect a shell to your logisland container to launch the following streaming jobs. + +.. code-block:: sh + + bin/logisland.sh --conf conf/index-blockchain-transactions.yml + + +3. Do some insights and visualizations +-------------------------------------- + +With ElasticSearch, you can use Kibana. + +Open up your browser and go to http://sandbox:5601/app/kibana#/ and you should be able to explore the blockchain transactions. + + +Configure a new index pattern with ``logisland.*`` as the pattern name and ``@timestamp`` as the time value field. + +.. image:: /_static/kibana-configure-index.png + +Then if you go to Explore panel for the latest 15' time window you'll only see logisland process_metrics events which give you +insights about the processing bandwidth of your streams. + + +.. image:: /_static/kibana-blockchain-records.png + + +You can try as well to create some basic visualization in order to draw the total satoshi transacted amount (aggregating sums of ``out.value`` field). + +Below a nice example: + +.. image:: /_static/kibana-blockchain-dashboard.png + + +Ready to discover which addresses received most of the money? 
Give it a try ;-) + + +4. Monitor your spark jobs and Kafka topics +------------------------------------------- +Now go to `http://sandbox:4050/streaming/ `_ to see how fast Spark can process +your data + +.. image:: /_static/spark-job-monitoring.png + + +Another tool can help you to tweak and monitor your processing `http://sandbox:9000/ `_ + +.. image:: /_static/kafka-mgr.png + + diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index-excel-spreadsheet.rst b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index-excel-spreadsheet.rst new file mode 100644 index 000000000..be2a55ddb --- /dev/null +++ b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index-excel-spreadsheet.rst @@ -0,0 +1,192 @@ +Extract Records from Excel File +=============================== + +In the following getting started tutorial we'll drive you through the process of extracting data from any Excel file with LogIsland platform. + +Both XLSX and old XLS file format are supported. + +.. note:: + + Be sure to know of to launch a logisland Docker environment by reading the `prerequisites <./prerequisites.html>`_ section + +Note, it is possible to store data in different datastores. In this tutorial, we will see the case of ElasticSearch only. + +1. Logisland job setup +---------------------- +The logisland job for this tutorial is already packaged in the tar.gz assembly and you can find it here for ElasticSearch : + +.. code-block:: sh + + docker exec -i -t logisland vim conf/index-excel-spreadsheet.yml + +We will start by explaining each part of the config file. + +An Engine is needed to handle the stream processing. This ``conf/extract-excel-data.yml`` configuration file defines a stream processing job setup. +The first section configures the Spark engine (we will use a `KafkaStreamProcessingEngine <../plugins.html#kafkastreamprocessingengine>`_) to run in local mode with 2 cpu cores and 2G of RAM. + +.. code-block:: yaml + + engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Index records of an excel file with LogIsland + configuration: + spark.app.name: IndexExcelDemo + spark.master: local[4] + spark.driver.memory: 1G + spark.driver.cores: 1 + spark.executor.memory: 2G + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.yarn.queue: default + spark.yarn.maxAppAttempts: 4 + spark.yarn.am.attemptFailuresValidityInterval: 1h + spark.yarn.max.executor.failures: 20 + spark.yarn.executor.failuresValidityInterval: 1h + spark.task.maxFailures: 8 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 1000 + spark.streaming.backpressure.enabled: false + spark.streaming.unpersist: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 3000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4050 + +The `controllerServiceConfigurations` part is here to define all services that be shared by processors within the whole job, here an Elasticsearch service that will be used later in the ``BulkAddElasticsearch`` processor. + +.. 
code-block:: yaml + + - controllerService: elasticsearch_service + component: com.hurence.logisland.service.elasticsearch.Elasticsearch_5_4_0_ClientService + type: service + documentation: elasticsearch service + configuration: + hosts: sandbox:9300 + cluster.name: es-logisland + batch.size: 5000 + + +Inside this engine you will run a Kafka stream of processing, so we setup input/output topics and Kafka/Zookeeper hosts. +Here the stream will read all the logs sent in ``logisland_raw`` topic and push the processing output into ``logisland_events`` topic. + + +We can define some serializers to marshall all records from and to a topic. +We assume that the stream will be serializing the input file as a byte array in a single record. Reason why we will use a ByteArraySerialiser in the configuration below. + +.. code-block:: yaml + + # main processing stream + - stream: parsing_stream + component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing + type: stream + documentation: a processor that converts raw excel file content into structured log records + configuration: + kafka.input.topics: logisland_raw + kafka.output.topics: logisland_events + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.BytesArraySerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + +Within this stream, an ``ExcelExtract`` processor takes a byte array excel file content and computes a list of ``Record``. + +.. code-block:: yaml + + # parse excel cells into records + - processor: excel_parser + component: com.hurence.logisland.processor.excel.ExcelExtract + type: parser + documentation: a parser that produce events from an excel file + configuration: + record.type: excel_record + skip.rows: 1 + field.names: segment,country,product,discount_band,units_sold,manufacturing,sale_price,gross_sales,discounts,sales,cogs,profit,record_time,month_number,month_name,year + + +This stream will process log entries as soon as they will be queued into `logisland_raw` Kafka topics, each log will +be parsed as an event which will be pushed back to Kafka in the ``logisland_events`` topic. + +.. note:: + + Please note that we are mapping the excel column *Date* to be the timestamp of the produced record (*record_time* field) in order to use this as time reference in elasticsearch/kibana (see below). + +The second processor will handle ``Records`` produced by the ``ExcelExtract`` to index them into elasticsearch + +.. code-block:: yaml + + # add to elasticsearch + - processor: es_publisher + component: com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch + type: processor + documentation: a processor that trace the processed events + configuration: + elasticsearch.client.service: elasticsearch_service + default.index: logisland + default.type: event + timebased.index: yesterday + es.index.field: search_index + es.type.field: record_type + + +2. Launch the script +-------------------- +For this tutorial we will handle an excel file. We will process it with an ExcelExtract that will produce a bunch of Records and we'll send them to Elastiscearch +Connect a shell to your logisland container to launch the following streaming jobs. 
+
+For ElasticSearch :
+
+.. code-block:: sh
+
+    docker exec -i -t logisland bin/logisland.sh --conf conf/index-excel-spreadsheet.yml
+
+3. Inject an excel file into the system
+---------------------------------------
+Now we're going to send a file to the ``logisland_raw`` Kafka topic.
+
+For testing purposes, we will use `kafkacat `_,
+a *generic command line non-JVM Apache Kafka producer and consumer* which can be easily installed.
+
+.. note::
+
+    Sending raw files through kafka is not recommended for production use since kafka is designed for high throughput and not for big message sizes.
+
+
+The configuration above is suited to work with the example file *Financial Sample.xlsx*.
+
+Let's send this file in a single message to LogIsland with kafkacat to the ``logisland_raw`` Kafka topic
+
+.. code-block:: sh
+
+    kafkacat -P -t logisland_raw -v -b sandbox:9092 ./Financial\ Sample.xlsx
+
+
+4. Inspect the logs
+-------------------
+
+Kibana
+""""""
+
+With ElasticSearch, you can use Kibana.
+
+Open up your browser and go to `http://sandbox:5601/ `_ and you should be able to explore your excel records.
+
+Configure a new index pattern with ``logisland.*`` as the pattern name and ``@timestamp`` as the time value field.
+
+.. image:: /_static/kibana-configure-index.png
+
+Then if you go to the Explore panel for the latest 5 years time window, you are now able to play with the indexed data.
+
+.. image:: /_static/kibana-excel-logs.png
+
+
+*Thanks logisland! :-)*
\ No newline at end of file
diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index.rst b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index.rst
index 35d5c0ffe..a9f443d19 100644
--- a/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index.rst
+++ b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/index.rst
@@ -24,6 +24,7 @@ Contents:
    prerequisites
    index-apache-logs
+   store-to-redis
    match-queries
    aggregate-events
    enrich-apache-logs
@@ -31,4 +32,8 @@ Contents:
    indexing-bro-events
    indexing-netflow-events
    indexing-network-packets
-
+   generate_unique_ids
+   index-blockchain-transactions
+   index-excel-spreadsheet
+   mqtt-to-historian
+   integrate-kafka-connect
diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/integrate-kafka-connect.rst b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/integrate-kafka-connect.rst
new file mode 100644
index 000000000..4a8a108b5
--- /dev/null
+++ b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/integrate-kafka-connect.rst
@@ -0,0 +1,259 @@
+Integrate Kafka Connect Sources & Sinks
+=======================================
+
+In the following getting started tutorial, we'll focus on how to seamlessly integrate Kafka connect sources and sinks in logisland.
+
+We can call this functionality *Logisland connect*.
+
+.. note::
+
+    Be sure to know how to launch a logisland Docker environment by reading the `prerequisites <./prerequisites.html>`_ section
+
+1. Logisland job setup
+----------------------
+The logisland job for this tutorial is already packaged in the tar.gz assembly and you can find it here for ElasticSearch :
+
+.. code-block:: sh
+
+    docker exec -i -t logisland vim conf/logisland-kafka-connect.yml
+
+
+
+We will start by explaining each part of the config file.
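+
+.. note::
+
+    Once the job is running, a quick way to check that the simulator source is actually feeding the ``logisland_raw`` topic is to consume a few records with ``kafkacat`` (assuming, as in the other tutorials, that the docker-compose kafka broker is reachable at sandbox:9092). The records are Kryo-serialized, so the output will not be human readable, but it confirms that data is flowing.
+
+    .. code-block:: sh
+
+        # read 10 records from the topic fed by the kafka connect source, then exit
+        kafkacat -C -b sandbox:9092 -t logisland_raw -c 10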
+ +========== +The engine +========== + +The first section configures the Spark engine (we will use a `KafkaStreamProcessingEngine <../plugins.html#kafkastreamprocessingengine>`_) to run in local mode. + +.. code-block:: yaml + + engine: + component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine + type: engine + documentation: Use Kafka connectors with logisland + configuration: + spark.app.name: LogislandConnect + spark.master: local[2] + spark.driver.memory: 1G + spark.driver.cores: 1 + spark.executor.memory: 2G + spark.executor.instances: 4 + spark.executor.cores: 2 + spark.yarn.queue: default + spark.yarn.maxAppAttempts: 4 + spark.yarn.am.attemptFailuresValidityInterval: 1h + spark.yarn.max.executor.failures: 20 + spark.yarn.executor.failuresValidityInterval: 1h + spark.task.maxFailures: 8 + spark.serializer: org.apache.spark.serializer.KryoSerializer + spark.streaming.batchDuration: 1000 + spark.streaming.backpressure.enabled: false + spark.streaming.unpersist: false + spark.streaming.blockInterval: 500 + spark.streaming.kafka.maxRatePerPartition: 3000 + spark.streaming.timeout: -1 + spark.streaming.unpersist: false + spark.streaming.kafka.maxRetries: 3 + spark.streaming.ui.retainedBatches: 200 + spark.streaming.receiver.writeAheadLog.enable: false + spark.ui.port: 4050 + +The `controllerServiceConfigurations` part is here to define all services that be shared by processors within the whole job. + +================== +The parsing stream +================== + +Here we are going to use a special processor (``KafkaConnectStructuredProviderService``) to use the kafka connect source as input for the structured stream defined below. + +For this example, we are going to use the source *com.github.jcustenborder.kafka.connect.simulator.SimulatorSourceConnector* that generates records containing fake personal data at rate of 100 messages/s. + + +.. code-block:: yaml + + # Our source service + - controllerService: kc_source_service + component: com.hurence.logisland.stream.spark.provider.KafkaConnectStructuredProviderService + documentation: A kafka source connector provider reading from its own source and providing structured streaming to the underlying layer + configuration: + # We will use the logisland record converter for both key and value + kc.data.value.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.value.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + + kc.data.key.converter: com.hurence.logisland.connect.converter.LogIslandRecordConverter + # Use kryo to serialize the inner data + kc.data.key.converter.properties: | + record.serializer=com.hurence.logisland.serializer.KryoSerializer + # Only one task to handle source input (unique) + kc.worker.tasks.max: 1 + # The kafka source connector to wrap (here we're using a simulator source) + kc.connector.class: com.github.jcustenborder.kafka.connect.simulator.SimulatorSourceConnector + # The properties for the connector (as per connector documentation) + kc.connector.properties: | + key.schema.fields=email + topic=simulator + value.schema.fields=email,firstName,middleName,lastName,telephoneNumber,dateOfBirth + # We are using a standalone source for testing. We can store processed offsets in memory + kc.connector.offset.backing.store: memory + +.. note:: + + The parameter **kc.connector.properties** contains the connector properties as you would have defined if you were using vanilla kafka connect. 
+ + As well, we are using a *memory* offset backing store. In a distributed scenario, you may have chosen a *kafka* topic based one. + +Since each stream can be read and written, we are going to define as well a Kafka topic sink (``KafkaStructuredStreamProviderService``) that will be used as output for the structured stream defined below. + +.. code-block:: yaml + + # Kafka sink configuration + - controllerService: kafka_out_service + component: com.hurence.logisland.stream.spark.structured.provider.KafkaStructuredStreamProviderService + configuration: + kafka.output.topics: logisland_raw + kafka.error.topics: logisland_errors + kafka.input.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer + kafka.metadata.broker.list: sandbox:9092 + kafka.zookeeper.quorum: sandbox:2181 + kafka.topic.autoCreate: true + kafka.topic.default.partitions: 4 + kafka.topic.default.replicationFactor: 1 + +So that, we can now define the *parsing stream* using those source and sink + +.. code-block:: yaml + + ######### parsing stream ############## + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + documentation: "Takes records from the kafka source and distributes related partitions over a kafka topic. Records are then handed off to the indexing stream" + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.client.service: kafka_out_service + + +Within this stream, a ``FlatMap`` processor takes out the value and key (required when using *StructuredStream* as source of records) + +.. code-block:: yaml + + processorConfigurations: + - processor: flatten + component: com.hurence.logisland.processor.FlatMap + type: processor + documentation: "Takes out data from record_value" + configuration: + keep.root.record: false + copy.root.record.fields: true + +=================== +The indexing stream +=================== + + +Inside this engine, you will run a Kafka stream of processing, so we set up input/output topics and Kafka/Zookeeper hosts. +Here the stream will read all the logs sent in ``logisland_raw`` topic and push the processing output into ``logisland_events`` topic. + +.. note:: + + We want to specify an Avro output schema to validate our output records (and force their types accordingly). + It's really for other streams to rely on a schema when processing records from a topic. + +We can define some serializers to marshall all records from and to a topic. + +.. code-block:: yaml + + + - stream: parsing_stream_source + component: com.hurence.logisland.stream.spark.structured.StructuredStream + documentation: "Takes records from the kafka source and distributes related partitions over a kafka topic. 
Records are then handed off to the indexing stream" + configuration: + read.topics: /a/in + read.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + read.topics.client.service: kc_source_service + write.topics: logisland_raw + write.topics.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.key.serializer: com.hurence.logisland.serializer.KryoSerializer + write.topics.client.service: kafka_out_service + + +Within this stream, a ``DebugStream`` processor takes a log line as a String and computes a ``Record`` as a sequence of fields. + +.. code-block:: yaml + + processorConfigurations: + # We just print the received records (but you may do something more interesting!) + - processor: stream_debugger + component: com.hurence.logisland.processor.DebugStream + type: processor + documentation: debug records + configuration: + event.serializer: json + +This stream will process log entries as soon as they will be queued into `logisland_raw` Kafka topics, each log will be printed in the console and pushed back to Kafka in the ``logisland_events`` topic. + + + +2. Launch the script +-------------------- +Connect a shell to your logisland container to launch the following streaming jobs. + +.. code-block:: sh + + docker exec -i -t logisland bin/logisland.sh --conf conf/logisland-kafka-connect.yml + + +3. Examine your console output +------------------------------ + +Since we put a *DebugStream* processor, messages produced by our source connectors are then output to the console in json. + +.. code-block:: json + + 18/04/06 11:17:06 INFO DebugStream: { + "id" : "9b17a9ac-97c4-44ef-9168-d298e8c53d42", + "type" : "kafka_connect", + "creationDate" : 1523006216376, + "fields" : { + "record_id" : "9b17a9ac-97c4-44ef-9168-d298e8c53d42", + "firstName" : "London", + "lastName" : "Marks", + "telephoneNumber" : "005-694-4540", + "record_key" : { + "email" : "londonmarks@fake.com" + }, + "middleName" : "Anna", + "dateOfBirth" : 836179200000, + "record_time" : 1523006216376, + "record_type" : "kafka_connect", + "email" : "londonmarks@fake.com" + } + } + + + +4. Monitor your spark jobs and Kafka topics +------------------------------------------- +Now go to `http://sandbox:4050/streaming/ `_ to see how fast Spark can process +your data + +.. image:: /_static/spark-job-monitoring.png + + +Another tool can help you to tweak and monitor your processing `http://sandbox:9000/ `_ + +.. image:: /_static/kafka-mgr.png + + diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/prerequisites.rst b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/prerequisites.rst index 543bd9ac9..61c9995ae 100644 --- a/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/prerequisites.rst +++ b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/prerequisites.rst @@ -15,72 +15,83 @@ To facilitate integration testing and to easily run tutorials, you can create a .. 
code-block:: yaml - # Zookeeper container 172.17.0.1 - zookeeper: - image: hurence/zookeeper - hostname: zookeeper - container_name: zookeeper - ports: - - "2181:2181" - - # Kafka container - kafka: - image: hurence/kafka - hostname: kafka - container_name: kafka - links: - - zookeeper - ports: - - "9092:9092" - environment: - KAFKA_ADVERTISED_PORT: 9092 - KAFKA_ADVERTISED_HOST_NAME: sandbox - KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181 - KAFKA_JMX_PORT: 7071 - - # ES container - elasticsearch: - environment: - - ES_JAVA_OPT="-Xms1G -Xmx1G" - - cluster.name=es-logisland - - http.host=0.0.0.0 - - transport.host=0.0.0.0 - - xpack.security.enabled=false - hostname: elasticsearch - container_name: elasticsearch - image: 'docker.elastic.co/elasticsearch/elasticsearch:5.4.0' - ports: - - '9200:9200' - - '9300:9300' - - # Kibana container - kibana: - environment: - - 'ELASTICSEARCH_URL=http://elasticsearch:9200' - image: 'docker.elastic.co/kibana/kibana:5.4.0' - container_name: kibana - links: - - elasticsearch - ports: - - '5601:5601' - - # Logisland container : does nothing but launching - logisland: - image: hurence/logisland - command: tail -f bin/logisland.sh - #command: bin/logisland.sh --conf /conf/index-apache-logs.yml - links: - - zookeeper - - kafka - - elasticsearch - ports: - - "4050:4050" - volumes: - - ./conf/logisland:/conf - - ./data/logisland:/data - container_name: logisland - extra_hosts: - - "sandbox:172.17.0.1" + version: "2" + services: + + zookeeper: + container_name: zookeeper + image: hurence/zookeeper + hostname: zookeeper + ports: + - "2181:2181" + + kafka: + container_name: kafka + image: hurence/kafka + hostname: kafka + links: + - zookeeper + ports: + - "9092:9092" + environment: + KAFKA_ADVERTISED_PORT: 9092 + KAFKA_ADVERTISED_HOST_NAME: sandbox + KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181 + KAFKA_JMX_PORT: 7071 + + # ES container + elasticsearch: + container_name: elasticsearch + environment: + - ES_JAVA_OPT="-Xms1G -Xmx1G" + - cluster.name=es-logisland + - http.host=0.0.0.0 + - transport.host=0.0.0.0 + - xpack.security.enabled=false + hostname: elasticsearch + container_name: elasticsearch + image: 'docker.elastic.co/elasticsearch/elasticsearch:5.4.0' + ports: + - '9200:9200' + - '9300:9300' + + # Kibana container + kibana: + container_name: kibana + environment: + - 'ELASTICSEARCH_URL=http://elasticsearch:9200' + image: 'docker.elastic.co/kibana/kibana:5.4.0' + container_name: kibana + links: + - elasticsearch + ports: + - '5601:5601' + + # Logisland container : does nothing but launching + logisland: + container_name: logisland + image: hurence/logisland:0.13.0 + command: tail -f bin/logisland.sh + #command: bin/logisland.sh --conf /conf/index-apache-logs.yml + links: + - zookeeper + - kafka + - elasticsearch + - redis + ports: + - "4050:4050" + volumes: + - ./conf/logisland:/conf + - ./data/logisland:/data + container_name: logisland + extra_hosts: + - "sandbox:172.17.0.1" + + redis: + container_name: redis + image: 'redis:latest' + ports: + - '6379:6379' Once you have this file you can run a `docker-compose` command to launch all the needed services (zookeeper, kafka, es, kibana and logisland) @@ -115,10 +126,10 @@ From an edge node of your cluster : .. 
code-block:: sh cd /opt - sudo wget https://github.com/Hurence/logisland/releases/download/v0.12.2/logisland-0.12.2-bin-hdp2.5.tar.gz + sudo wget https://github.com/Hurence/logisland/releases/download/v0.13.0/logisland-0.13.0-bin-hdp2.5.tar.gz export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/ export HADOOP_CONF_DIR=$SPARK_HOME/conf - sudo /opt/logisland-0.12.2/bin/logisland.sh --conf /home/hurence/tom/logisland-conf/v0.10.0/future-factory.yml + sudo /opt/logisland-0.13.0/bin/logisland.sh --conf /home/hurence/tom/logisland-conf/v0.10.0/future-factory.yml diff --git a/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/store-to-redis.rst b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/store-to-redis.rst new file mode 100644 index 000000000..f7e69291a --- /dev/null +++ b/logisland-framework/logisland-resources/src/main/resources/docs/tutorials/store-to-redis.rst @@ -0,0 +1,180 @@ +Store Apache logs to Redis K/V store +==================================== + +In the following getting started tutorial we'll drive you through the process of Apache log mining with LogIsland platform. + +.. note:: + + Be sure to know of to launch a logisland Docker environment by reading the `prerequisites <./prerequisites.html>`_ section + +Note, it is possible to store data in different datastores. In this tutorial, we will see the case of Redis, if you need more in-depth explanations you can read the previous tutorial on indexing apache logs to elasticsearch or solr : `index-apache-logs.html`_ . + +1. Logisland job setup +---------------------- +The logisland job for this tutorial is already packaged in the tar.gz assembly and you can find it here : + +.. code-block:: sh + + docker exec -i -t logisland vim conf/store-to-redis.yml + +We will start by explaining each part of the config file. + +The `controllerServiceConfigurations` part is here to define all services that be shared by processors within the whole job, here a Redis KV cache service that will be used later in the ``BulkPut`` processor. + +.. code-block:: yaml + + - controllerService: datastore_service + component: com.hurence.logisland.redis.service.RedisKeyValueCacheService + type: service + documentation: redis datastore service + configuration: + connection.string: localhost:6379 + redis.mode: standalone + database.index: 0 + communication.timeout: 10 seconds + pool.max.total: 8 + pool.max.idle: 8 + pool.min.idle: 0 + pool.block.when.exhausted: true + pool.max.wait.time: 10 seconds + pool.min.evictable.idle.time: 60 seconds + pool.time.between.eviction.runs: 30 seconds + pool.num.tests.per.eviction.run: -1 + pool.test.on.create: false + pool.test.on.borrow: false + pool.test.on.return: false + pool.test.while.idle: true + record.recordSerializer: com.hurence.logisland.serializer.JsonSerializer + + +Here the stream will read all the logs sent in ``logisland_raw`` topic and push the processing output into ``logisland_events`` topic. + +.. note:: + + We want to specify an Avro output schema to validate our ouput records (and force their types accordingly). + It's really for other streams to rely on a schema when processing records from a topic. + +We can define some serializers to marshall all records from and to a topic. + +.. 
code-block:: yaml
+
+    - stream: parsing_stream
+      component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing
+      type: stream
+      documentation: a processor that converts raw apache logs into structured log records
+      configuration:
+        kafka.input.topics: logisland_raw
+        kafka.output.topics: logisland_events
+        kafka.error.topics: logisland_errors
+        kafka.input.topics.serializer: none
+        kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer
+        kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer
+        kafka.metadata.broker.list: sandbox:9092
+        kafka.zookeeper.quorum: sandbox:2181
+        kafka.topic.autoCreate: true
+        kafka.topic.default.partitions: 4
+        kafka.topic.default.replicationFactor: 1
+
+Within this stream, a ``SplitText`` processor takes a log line as a String and computes a ``Record`` as a sequence of fields.
+
+.. code-block:: yaml
+
+    # parse apache logs
+    - processor: apache_parser
+      component: com.hurence.logisland.processor.SplitText
+      type: parser
+      documentation: a parser that produces events from an apache log REGEX
+      configuration:
+        value.regex: (\S+)\s+(\S+)\s+(\S+)\s+\[([\w:\/]+\s[+\-]\d{4})\]\s+"(\S+)\s+(\S+)\s*(\S*)"\s+(\S+)\s+(\S+)
+        value.fields: src_ip,identd,user,record_time,http_method,http_query,http_version,http_status,bytes_out
+
+This stream will process log entries as soon as they are queued into the `logisland_raw` Kafka topic; each log will
+be parsed as an event and pushed back to Kafka in the ``logisland_events`` topic.
+
+The second processor will handle the ``Records`` produced by the ``SplitText`` processor and index them into the datastore previously defined (Redis).
+
+.. code-block:: yaml
+
+    # all the parsed records are added to datastore by bulk
+    - processor: datastore_publisher
+      component: com.hurence.logisland.processor.datastore.BulkPut
+      type: processor
+      documentation: "indexes processed events in datastore"
+      configuration:
+        datastore.client.service: datastore_service
+
+
+
+2. Launch the script
+--------------------
+For this tutorial we will parse some apache logs with a ``SplitText`` parser and send them to Redis.
+Connect a shell to your logisland container to launch the following streaming jobs.
+
+For Redis :
+
+.. code-block:: sh
+
+    docker exec -i -t logisland bin/logisland.sh --conf conf/store-to-redis.yml
+
+
+3. Inject some Apache logs into the system
+------------------------------------------
+Now we're going to send some logs to the ``logisland_raw`` Kafka topic.
+
+We could set up a Logstash or Flume agent to load some apache logs into a kafka topic,
+but there's a super useful tool in the Kafka ecosystem : `kafkacat `_,
+a *generic command line non-JVM Apache Kafka producer and consumer* which can be easily installed.
+
+
+If you don't have your own httpd logs available, you can use some freely available log files from
+`NASA-HTTP `_ web site access:
+
+- `Jul 01 to Jul 31, ASCII format, 20.7 MB gzip compressed `_
+- `Aug 04 to Aug 31, ASCII format, 21.8 MB gzip compressed `_
+
+Let's send the first 500000 lines of NASA http access over July 1995 to LogIsland with kafkacat to the ``logisland_raw`` Kafka topic.
+
+.. code-block:: sh
+
+    cd /tmp
+    wget ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz
+    gunzip NASA_access_log_Jul95.gz
+    head -500000 NASA_access_log_Jul95 | kafkacat -b sandbox:9092 -t logisland_raw
+
+
+
+4. Inspect the logs
+-------------------
+
+For this part of the tutorial we will use `redis-py, a Python client for Redis `_. 
You can install it by following instructions given on `redis-py `_. + +To install redis-py, simply: + +.. code-block:: sh + + $ sudo pip install redis + + +Getting Started, check if you can connect with Redis + +.. code-block:: python + + >>> import redis + >>> r = redis.StrictRedis(host='localhost', port=6379, db=0) + >>> r.set('foo', 'bar') + >>> r.get('foo') + +Then we want to grab some logs that have been collected to Redis. We first find some keys with a pattern and get the json content of one + +.. code-block:: python + + >>> r.keys('1234*') +['123493eb-93df-4e57-a1c1-4a8e844fa92c', '123457d5-8ccc-4f0f-b4ba-d70967aa48eb', '12345e06-6d72-4ce8-8254-a7cc4bab5e31'] + + >>> r.get('123493eb-93df-4e57-a1c1-4a8e844fa92c') +'{\n "id" : "123493eb-93df-4e57-a1c1-4a8e844fa92c",\n "type" : "apache_log",\n "creationDate" : 804574829000,\n "fields" : {\n "src_ip" : "204.191.209.4",\n "record_id" : "123493eb-93df-4e57-a1c1-4a8e844fa92c",\n "http_method" : "GET",\n "http_query" : "/images/WORLD-logosmall.gif",\n "bytes_out" : "669",\n "identd" : "-",\n "http_version" : "HTTP/1.0",\n "record_raw_value" : "204.191.209.4 - - [01/Jul/1995:01:00:29 -0400] \\"GET /images/WORLD-logosmall.gif HTTP/1.0\\" 200 669",\n "http_status" : "200",\n "record_time" : 804574829000,\n "user" : "-",\n "record_type" : "apache_log"\n }\n}' + + >>> import json + >>> record = json.loads(r.get('123493eb-93df-4e57-a1c1-4a8e844fa92c')) + >>> record['fields']['bytes_out'] + diff --git a/logisland-framework/logisland-scripting/logisland-scripting-base/pom.xml b/logisland-framework/logisland-scripting/logisland-scripting-base/pom.xml index 9744fb95c..becdef7ee 100644 --- a/logisland-framework/logisland-scripting/logisland-scripting-base/pom.xml +++ b/logisland-framework/logisland-scripting/logisland-scripting-base/pom.xml @@ -4,7 +4,7 @@ com.hurence.logisland logisland-scripting - 0.12.2 + 0.13.0 logisland-scripting-base diff --git a/logisland-framework/logisland-scripting/logisland-scripting-mvel/pom.xml b/logisland-framework/logisland-scripting/logisland-scripting-mvel/pom.xml index fb95f6ffb..3f62c6f25 100644 --- a/logisland-framework/logisland-scripting/logisland-scripting-mvel/pom.xml +++ b/logisland-framework/logisland-scripting/logisland-scripting-mvel/pom.xml @@ -4,7 +4,7 @@ com.hurence.logisland logisland-scripting - 0.12.2 + 0.13.0 logisland-scripting-mvel diff --git a/logisland-framework/logisland-scripting/pom.xml b/logisland-framework/logisland-scripting/pom.xml index 43c6d7f2e..9b329529a 100644 --- a/logisland-framework/logisland-scripting/pom.xml +++ b/logisland-framework/logisland-scripting/pom.xml @@ -21,7 +21,7 @@ com.hurence.logisland logisland-framework - 0.12.2 + 0.13.0 logisland-scripting diff --git a/logisland-framework/logisland-utils/pom.xml b/logisland-framework/logisland-utils/pom.xml index 70b915466..82686174b 100644 --- a/logisland-framework/logisland-utils/pom.xml +++ b/logisland-framework/logisland-utils/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-framework - 0.12.2 + 0.13.0 logisland-utils jar @@ -104,6 +104,7 @@ junit junit + com.101tec zkclient diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/controller/StandardControllerServiceLookup.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/controller/StandardControllerServiceLookup.java index 22ff5182d..3b3c7fc99 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/controller/StandardControllerServiceLookup.java +++ 
b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/controller/StandardControllerServiceLookup.java @@ -92,7 +92,6 @@ public ControllerService getControllerService(String serviceIdentifier) { } } - logger.debug("getting controller service {}", new Object[]{serviceIdentifier}); return controllerServiceMap.get(serviceIdentifier); } diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/metrics/Names.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/metrics/Names.java new file mode 100644 index 000000000..24cb8ccf2 --- /dev/null +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/metrics/Names.java @@ -0,0 +1,38 @@ +/* + * * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.metrics; + +public interface Names { + + String INCOMING_MESSAGES = "incoming_messages"; + String INCOMING_RECORDS = "incoming_records"; + String OUTGOING_RECORDS = "outgoing_records"; + String ERRORS = "errors"; + String BYTES_PER_FIELD_AVERAGE = "bytes_per_field_average"; + String BYTES_PER_RECORD_AVERAGE = "bytes_per_record_average"; + String RECORDS_PER_SECOND_AVERAGE = "records_per_second_average"; + String PROCESSED_BYTES = "processed_bytes"; + String PROCESSED_FIELDS = "processed_fields"; + String ERROR_PERCENTAGE = "error_percentage"; + String FIELDS_PER_RECORD_AVERAGE = "fields_per_record_average"; + String BYTES_PER_SECOND_AVERAGE = "bytes_per_second_average"; + String PROCESSING_TIME_MS = "processing_time_ms"; + + +} + diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/BytesArraySerializer.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/BytesArraySerializer.java index 1e17a5034..7675f6f94 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/BytesArraySerializer.java +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/BytesArraySerializer.java @@ -31,20 +31,24 @@ package com.hurence.logisland.serializer; -import com.hurence.logisland.record.FieldDictionary; -import com.hurence.logisland.record.FieldType; -import com.hurence.logisland.record.Record; -import com.hurence.logisland.record.StandardRecord; +import com.hurence.logisland.record.*; import org.apache.commons.io.IOUtils; +import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; public class BytesArraySerializer implements RecordSerializer { - /* TODO */ public void serialize(OutputStream objectDataOutput, Record record) { - throw new RuntimeException("BytesArraySerializer serialize method not implemented yet"); + Field f = record.getField(FieldDictionary.RECORD_VALUE); + if (f != null && f.isSet() && f.getType().equals(FieldType.BYTES)) { + try { + objectDataOutput.write((byte[])record.getField(FieldDictionary.RECORD_VALUE).getRawValue()); + } catch (IOException 
ioe) { + throw new RecordSerializationException(ioe.getMessage(), ioe.getCause()); + } + } } public Record deserialize(InputStream objectDataInput) { @@ -54,7 +58,6 @@ public Record deserialize(InputStream objectDataInput) { record.setField(FieldDictionary.RECORD_VALUE, FieldType.BYTES, bytes); return record; } catch (Throwable t) { - // t.printStackTrace(); throw new RecordSerializationException(t.getMessage(), t.getCause()); } } diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/JsonSerializer.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/JsonSerializer.java index d7fa0d535..9260ce163 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/JsonSerializer.java +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/JsonSerializer.java @@ -84,7 +84,7 @@ public void serialize(Record record, JsonGenerator jgen, com.fasterxml.jackson.d // retrieve event field String fieldName = entry.getKey(); Field field = entry.getValue(); - Object fieldValue = field.getRawValue(); + // Object fieldValue = field.getRawValue(); String fieldType = field.getType().toString(); // dump event field as record attribute @@ -92,25 +92,26 @@ public void serialize(Record record, JsonGenerator jgen, com.fasterxml.jackson.d try { switch (fieldType.toLowerCase()) { case "string": - jgen.writeStringField(fieldName, (String) fieldValue); + jgen.writeStringField(fieldName, field.asString()); break; case "integer": - jgen.writeNumberField(fieldName, (int) fieldValue); + case "int": + jgen.writeNumberField(fieldName, field.asInteger()); break; case "long": - jgen.writeNumberField(fieldName, (long) fieldValue); + jgen.writeNumberField(fieldName, field.asLong()); break; case "float": - jgen.writeNumberField(fieldName, (float) fieldValue); + jgen.writeNumberField(fieldName, field.asFloat()); break; case "double": - jgen.writeNumberField(fieldName, (double) fieldValue); + jgen.writeNumberField(fieldName, field.asDouble()); break; case "boolean": - jgen.writeBooleanField(fieldName, (boolean) fieldValue); + jgen.writeBooleanField(fieldName, field.asBoolean()); break; default: - jgen.writeObjectField(fieldName, fieldValue); + jgen.writeObjectField(fieldName, field.asString()); break; } } catch (Exception ex) { @@ -201,9 +202,11 @@ public Record deserialize(JsonParser jp, DeserializationContext ctxt) throws IOE case VALUE_NUMBER_FLOAT: try { - fields.put(jp.getCurrentName(), new Field(jp.getCurrentName(), FieldType.FLOAT, jp.getFloatValue())); - } catch (JsonParseException ex) { + fields.put(jp.getCurrentName(), new Field(jp.getCurrentName(), FieldType.DOUBLE, jp.getDoubleValue())); + } catch (JsonParseException ex) { + + fields.put(jp.getCurrentName(), new Field(jp.getCurrentName(), FieldType.FLOAT, jp.getFloatValue())); } break; case VALUE_FALSE: @@ -246,11 +249,17 @@ public Record deserialize(JsonParser jp, DeserializationContext ctxt) throws IOE } } - Record record = new StandardRecord(type); - record.setId(id); - record.setType(type); - record.setTime(creationDate); - record.setFields(fields); + Record record = new StandardRecord(); + if (id != null) { + record.setId(id); + } + if (type != null) { + record.setType(type); + } + if (creationDate != null) { + record.setTime(creationDate); + } + record.addFields(fields); return record; diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/KryoSerializer.java 
b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/KryoSerializer.java index 11ee1fa24..ca011e2cc 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/KryoSerializer.java +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/KryoSerializer.java @@ -39,6 +39,7 @@ import com.hurence.logisland.record.Field; import java.io.ByteArrayOutputStream; +import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; import java.util.zip.DeflaterOutputStream; @@ -117,4 +118,5 @@ public Record deserialize(InputStream objectDataInput) { throw new RecordSerializationException(t.getMessage(), t.getCause()); } } + } \ No newline at end of file diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/SerializerProvider.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/SerializerProvider.java index a83eb7196..05daccd3a 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/SerializerProvider.java +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/SerializerProvider.java @@ -28,6 +28,7 @@ public class SerializerProvider { private static String JSON_SERIALIZER = JsonSerializer.class.getName(); private static String KRYO_SERIALIZER = KryoSerializer.class.getName(); private static String BYTES_ARRAY_SERIALIZER = BytesArraySerializer.class.getName(); + private static String STRING_SERIALIZER = StringSerializer.class.getName(); private static String NOOP_SERIALIZER = NoopSerializer.class.getName(); private static String KURA_PROTOBUF_SERIALIZER = KuraProtobufSerializer.class.getName(); @@ -53,6 +54,8 @@ public static RecordSerializer getSerializer(final String inSerializerClass, fin return new BytesArraySerializer(); } else if (inSerializerClass.equals(KURA_PROTOBUF_SERIALIZER)) { return new KuraProtobufSerializer(); + } else if (inSerializerClass.equals(STRING_SERIALIZER)) { + return new StringSerializer(); } else { return new NoopSerializer(); } diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/StringSerializer.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/StringSerializer.java index 6c82b8311..bee772dc4 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/StringSerializer.java +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/serializer/StringSerializer.java @@ -1,12 +1,12 @@ /** * Copyright (C) 2016 Hurence (support@hurence.com) - * + *

* Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * + *

+ * http://www.apache.org/licenses/LICENSE-2.0 + *

* Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -31,20 +31,18 @@ package com.hurence.logisland.serializer; +import com.hurence.logisland.record.FieldDictionary; +import com.hurence.logisland.record.FieldType; import com.hurence.logisland.record.Record; import com.hurence.logisland.record.StandardRecord; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; +import org.apache.commons.io.IOUtils; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; public class StringSerializer implements RecordSerializer { - - private static Logger logger = LoggerFactory.getLogger(StringSerializer.class); - - + @Override public void serialize(OutputStream out, Record record) throws RecordSerializationException { @@ -60,8 +58,11 @@ public void serialize(OutputStream out, Record record) throws RecordSerializatio @Override public Record deserialize(InputStream in) throws RecordSerializationException { - - throw new RuntimeException("not implemented yet"); + try { + return new StandardRecord().setField(FieldDictionary.RECORD_VALUE, FieldType.STRING, IOUtils.toString(in)); + } catch (IOException ioe) { + throw new RecordSerializationException(ioe.getMessage(), ioe); + } } } \ No newline at end of file diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/MockPropertyValue.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/MockPropertyValue.java index edf58a57b..2bf5500bd 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/MockPropertyValue.java +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/MockPropertyValue.java @@ -24,6 +24,7 @@ import com.hurence.logisland.record.Record; import com.hurence.logisland.record.StandardRecord; import com.hurence.logisland.registry.VariableRegistry; +import com.hurence.logisland.util.FormatUtils; import java.util.Map; import java.util.concurrent.TimeUnit; @@ -121,7 +122,10 @@ public Double asDouble() { return stdPropValue.asDouble(); } - + @Override + public Long asTimePeriod(final TimeUnit timeUnit) { + return (rawValue == null) ? 
null : FormatUtils.getTimeDuration(rawValue.trim(), timeUnit); + } private void markEvaluated() { if (Boolean.FALSE.equals(expectExpressions)) { diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/MockRecord.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/MockRecord.java index 6bb30f2e9..83135afa9 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/MockRecord.java +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/MockRecord.java @@ -89,6 +89,14 @@ public void assertFieldEquals(final String fieldName, final byte[] expectedValue // assertedFields.add(fieldName); } + public void assertNullField(final String fieldName) { + Assert.assertNull(getField(fieldName).getRawValue()); + } + + public void assertNotNullField(final String fieldName) { + Assert.assertNotNull(getField(fieldName).getRawValue()); + } + public void assertFieldNotEquals(final String fieldName, final String expectedValue) { Assert.assertNotSame(expectedValue, getField(fieldName).asString()); //assertedFields.add(fieldName); diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/StandardProcessorTestRunner.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/StandardProcessorTestRunner.java index 8c4df2341..e73511fd6 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/StandardProcessorTestRunner.java +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/runner/StandardProcessorTestRunner.java @@ -321,8 +321,17 @@ public void enableControllerService(final ControllerService service) { } try { - final ControllerServiceInitializationContext configContext = new MockConfigurationContext(service, configuration.getProperties(), context, variableRegistry); - ReflectionUtils.invokeMethodsWithAnnotation(OnEnabled.class, service, configContext); + // final ControllerServiceInitializationContext configContext = new MockConfigurationContext(service, configuration.getProperties(), context, variableRegistry); + + + final MockControllerServiceInitializationContext initContext = new MockControllerServiceInitializationContext(requireNonNull(service), requireNonNull(service.getIdentifier())); + initContext.addControllerServices(context); + + for(PropertyDescriptor prop : context.getProperties().keySet()) { + initContext.setProperty(prop.getName(), context.getPropertyValue(prop.getName()).asString()); + } + + ReflectionUtils.invokeMethodsWithAnnotation(OnEnabled.class, service, initContext); } catch (final InvocationTargetException ite) { ite.getCause().printStackTrace(); Assert.fail("Failed to enable Controller Service " + service + " due to " + ite.getCause()); diff --git a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/stream/io/StreamUtils.java b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/stream/io/StreamUtils.java index bd5493fdb..75c1f0968 100644 --- a/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/stream/io/StreamUtils.java +++ b/logisland-framework/logisland-utils/src/main/java/com/hurence/logisland/util/stream/io/StreamUtils.java @@ -1,12 +1,12 @@ /** * Copyright (C) 2016 Hurence (support@hurence.com) - * + *

* Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * + *

+ * http://www.apache.org/licenses/LICENSE-2.0 + *

* Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -16,13 +16,15 @@ package com.hurence.logisland.util.stream.io; - import java.io.EOFException; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; import java.util.ArrayList; +import java.util.Iterator; import java.util.List; +import java.util.stream.Stream; +import java.util.stream.StreamSupport; public class StreamUtils { @@ -40,9 +42,9 @@ public static long copy(final InputStream source, final OutputStream destination /** * Copies numBytes from source to destination. If numBytes are not available from source, throws EOFException * - * @param source the source of bytes to copy + * @param source the source of bytes to copy * @param destination the destination to copy bytes to - * @param numBytes the number of bytes to copy + * @param numBytes the number of bytes to copy * @throws IOException if any issues occur while copying */ public static void copy(final InputStream source, final OutputStream destination, final long numBytes) throws IOException { @@ -62,7 +64,7 @@ public static void copy(final InputStream source, final OutputStream destination /** * Reads data from the given input stream, copying it to the destination byte array. If the InputStream has less data than the given byte array, throws an EOFException * - * @param source the source to copy bytes from + * @param source the source to copy bytes from * @param destination the destination to fill * @throws IOException if any issues occur reading bytes */ @@ -74,8 +76,8 @@ public static void fillBuffer(final InputStream source, final byte[] destination * Reads data from the given input stream, copying it to the destination byte array. If the InputStream has less data than the given byte array, throws an EOFException if * ensureCapacity is true and otherwise returns the number of bytes copied * - * @param source the source to read bytes from - * @param destination the destination to fill + * @param source the source to read bytes from + * @param destination the destination to fill * @param ensureCapacity whether or not to enforce that the InputStream have at least as much data as the capacity of the destination byte array * @return the number of bytes actually filled * @throws IOException if unable to read from the underlying stream @@ -103,8 +105,8 @@ public static int fillBuffer(final InputStream source, final byte[] destination, * Copies data from in to out until either we are out of data (returns null) or we hit one of the byte patterns identified by the stoppers parameter (returns the byte pattern * matched). The bytes in the stopper will be copied. * - * @param in the source to read bytes from - * @param out the destination to write bytes to + * @param in the source to read bytes from + * @param out the destination to write bytes to * @param maxBytes the max bytes to copy * @param stoppers patterns of bytes which if seen will cause the copy to stop * @return the byte array matched, or null if end of stream was reached @@ -143,8 +145,8 @@ public static byte[] copyInclusive(final InputStream in, final OutputStream out, * Copies data from in to out until either we are out of data (returns null) or we hit one of the byte patterns identified by the stoppers parameter (returns the byte pattern * matched). 
The byte pattern matched will NOT be copied to the output and will be un-read from the input. * - * @param in the source to read bytes from - * @param out the destination to write bytes to + * @param in the source to read bytes from + * @param out the destination to write bytes to * @param maxBytes the maximum number of bytes to copy * @param stoppers byte patterns which will cause the copy to stop if found * @return the byte array matched, or null if end of stream was reached @@ -203,10 +205,10 @@ public static byte[] copyExclusive(final InputStream in, final OutputStream out, /** * Skips the specified number of bytes from the InputStream - * + *

* If unable to skip that number of bytes, throws EOFException * - * @param stream the stream to skip over + * @param stream the stream to skip over * @param bytesToSkip the number of bytes to skip * @throws IOException if any issues reading or skipping underlying stream */ @@ -240,4 +242,30 @@ public static void skip(final InputStream stream, final long bytesToSkip) throws throw new EOFException(); } } + + + /** + * Converts an {@link Iterator} to a java {@link Stream} + * + * @param sourceIterator the iterator + * @param the data type of the stream + * @return the {@link Stream} + */ + public static Stream asStream(Iterator sourceIterator) { + return asStream(sourceIterator, false); + } + + /** + * Converts an {@link Iterator} to a java {@link Stream} + * + * @param sourceIterator the iterator + * @param parallel parallelize the stream + * @param the data type of the stream + * @return the {@link Stream} + */ + public static Stream asStream(Iterator sourceIterator, boolean parallel) { + Iterable iterable = () -> sourceIterator; + return StreamSupport.stream(iterable.spliterator(), parallel); + + } } diff --git a/logisland-framework/logisland-utils/src/test/java/com/hurence/logisland/serializer/JsonSerializerTest.java b/logisland-framework/logisland-utils/src/test/java/com/hurence/logisland/serializer/JsonSerializerTest.java index 55cdd10f3..ccc597ac0 100755 --- a/logisland-framework/logisland-utils/src/test/java/com/hurence/logisland/serializer/JsonSerializerTest.java +++ b/logisland-framework/logisland-utils/src/test/java/com/hurence/logisland/serializer/JsonSerializerTest.java @@ -29,6 +29,7 @@ import java.io.IOException; import java.util.Date; +import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; /** @@ -70,7 +71,7 @@ public void validateJsonSerialization() throws IOException { ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray()); Record deserializedRecord = serializer.deserialize(bais); - assertTrue(deserializedRecord.equals(record)); + assertEquals(record,deserializedRecord); } diff --git a/logisland-framework/logisland-utils/src/test/resources/configuration-templatev2.yml b/logisland-framework/logisland-utils/src/test/resources/configuration-templatev2.yml index 1c78bcd2a..4e63607c3 100644 --- a/logisland-framework/logisland-utils/src/test/resources/configuration-templatev2.yml +++ b/logisland-framework/logisland-utils/src/test/resources/configuration-templatev2.yml @@ -2,7 +2,7 @@ # Logisland configuration script tempate ######################################################################################################### -version: 0.12.2 +version: 0.13.0 documentation: LogIsland analytics main config file. 
Put here every engine or component config ######################################################################################################### diff --git a/logisland-framework/pom.xml b/logisland-framework/pom.xml index a2287a765..56b8ead06 100644 --- a/logisland-framework/pom.xml +++ b/logisland-framework/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland - 0.12.2 + 0.13.0 logisland-framework @@ -38,13 +38,6 @@ hdp2.5 - - true - - hdp - 2.5 - - logisland-agent diff --git a/logisland-plugins/logisland-botsearch-plugin/pom.xml b/logisland-plugins/logisland-botsearch-plugin/pom.xml index ab56941ac..cb3c255bc 100644 --- a/logisland-plugins/logisland-botsearch-plugin/pom.xml +++ b/logisland-plugins/logisland-botsearch-plugin/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-botsearch-plugin diff --git a/logisland-plugins/logisland-botsearch-plugin/src/main/java/com/hurence/logisland/math/FlowDistanceMeasure.java b/logisland-plugins/logisland-botsearch-plugin/src/main/java/com/hurence/logisland/math/FlowDistanceMeasure.java index 327a48d9d..721b470e8 100755 --- a/logisland-plugins/logisland-botsearch-plugin/src/main/java/com/hurence/logisland/math/FlowDistanceMeasure.java +++ b/logisland-plugins/logisland-botsearch-plugin/src/main/java/com/hurence/logisland/math/FlowDistanceMeasure.java @@ -116,7 +116,7 @@ private static double dn(HttpFlow e1, HttpFlow e2) { * * We define dv(rk, rh) to be equal to the normalized Levenshtein distance * between strings obtained by concatenating the parameter values (e.g., - * 0.12.2US). + * 0.13.0US). */ private static double dv(HttpFlow e1, HttpFlow e2) { List v1 = e1.getUrlQueryValues(); diff --git a/logisland-plugins/logisland-botsearch-plugin/src/test/java/com/hurence/logisland/botsearch/TraceTest.java b/logisland-plugins/logisland-botsearch-plugin/src/test/java/com/hurence/logisland/botsearch/TraceTest.java index eac6486fb..f484be549 100755 --- a/logisland-plugins/logisland-botsearch-plugin/src/test/java/com/hurence/logisland/botsearch/TraceTest.java +++ b/logisland-plugins/logisland-botsearch-plugin/src/test/java/com/hurence/logisland/botsearch/TraceTest.java @@ -37,12 +37,12 @@ private static Trace getSampleTrace() { String[] flows - = {"Thu Jan 02 08:43:39 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.12.226/Images/RightJauge.gif 724 409 false false", - "Thu Jan 02 08:43:40 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.12.226/Images/fondJauge.gif 723 402 false false", + = {"Thu Jan 02 08:43:39 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.13.026/Images/RightJauge.gif 724 409 false false", + "Thu Jan 02 08:43:40 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.13.026/Images/fondJauge.gif 723 402 false false", "Thu Jan 02 08:43:42 CET 2014 GET 10.118.32.164 193.252.23.209 http static1.lecloud.wanadoo.fr 80 /home/fr_FR/20131202100641/img/sprite-icons.pn 495 92518 false false", "Thu Jan 02 08:43:43 CET 2014 GET 10.118.32.164 173.194.66.94 https www.google.fr 443 /complete/search 736 812 false false", - "Thu Jan 02 08:43:45 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.12.226/Images/digiposte/archiver-btn.png 736 2179 false false", - "Thu Jan 02 08:43:49 CET 2014 GET 10.118.32.164 193.251.214.117 http 
webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.12.226/Images/picto_trash.gif 725 544 false false"}; + "Thu Jan 02 08:43:45 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.13.026/Images/digiposte/archiver-btn.png 736 2179 false false", + "Thu Jan 02 08:43:49 CET 2014 GET 10.118.32.164 193.251.214.117 http webmail.laposte.net 80 /webmail/fr_FR/Images/Images-2013090.13.026/Images/picto_trash.gif 725 544 false false"}; for (String flowString : flows) { String[] split = flowString.split("\t"); diff --git a/logisland-plugins/logisland-botsearch-plugin/src/test/resources/data/TracesAnalysis_samples.txt b/logisland-plugins/logisland-botsearch-plugin/src/test/resources/data/TracesAnalysis_samples.txt index 0e5774336..e56790b5f 100755 --- a/logisland-plugins/logisland-botsearch-plugin/src/test/resources/data/TracesAnalysis_samples.txt +++ b/logisland-plugins/logisland-botsearch-plugin/src/test/resources/data/TracesAnalysis_samples.txt @@ -75,8 +75,8 @@ 2012-10-19T10:12:00.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 1942 MICROSOFT BITS/6.7 false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null 2012-10-19T10:12:06.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 472 MICROSOFT BITS/6.7 false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null 2012-10-19T10:12:19.000 GMT 10.112.123.187 CONNECT TCP_MISS_SSL/200 tunnel dl.google.com / 443 - 0 39 MICROSOFT BITS/6.7 false false false false false null 173.194.34.224 9q9hyebw8m76 37.41920471191406 -122.05740356445312 United States 43 null -2012-10-19T10:12:20.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null -2012-10-19T10:12:25.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 472 - false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null +2012-10-19T10.13.00.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null +2012-10-19T10.13.05.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 472 - false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null 2012-10-19T10:12:37.000 GMT 10.112.123.187 CONNECT TCP_MISS_SSL/200 tunnel dl.google.com / 443 - 0 39 - false false false false false null 173.194.34.224 9q9hyebw8m76 37.41920471191406 -122.05740356445312 United States 43 null 2012-10-19T10:17:45.000 GMT 10.112.123.187 POST TCP_DENIED/407 http tools.google.com /service/update2 80 - 0 1942 GOOGLE UPDATE/1.3.21.99,WINHTTP false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null 2012-10-19T10:17:51.000 GMT 10.112.123.187 POST TCP_DENIED/407 http tools.google.com /service/update2 80 - 0 472 GOOGLE UPDATE/1.3.21.99,WINHTTP false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null @@ -444,4 +444,4 @@ 2012-10-25T11:34:45.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel talk.google.com / 5222 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 44 null 2012-10-25T11:34:45.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel talkx.l.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 44 null 2012-10-25T11:34:45.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel talkx.l.google.com / 
5222 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 44 null -2012-10-25T13:04:46.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel talk.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 \ No newline at end of file +2012-10-25T13:04:46.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel talk.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 diff --git a/logisland-plugins/logisland-common-logs-plugin/pom.xml b/logisland-plugins/logisland-common-logs-plugin/pom.xml index bac8edb72..3cfcf050a 100644 --- a/logisland-plugins/logisland-common-logs-plugin/pom.xml +++ b/logisland-plugins/logisland-common-logs-plugin/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-common-logs-plugin diff --git a/logisland-plugins/logisland-common-logs-plugin/src/main/java/com/hurence/logisland/processor/commonlogs/gitlab/ParseGitlabLog.java b/logisland-plugins/logisland-common-logs-plugin/src/main/java/com/hurence/logisland/processor/commonlogs/gitlab/ParseGitlabLog.java index 632383dcf..7ab2a916f 100644 --- a/logisland-plugins/logisland-common-logs-plugin/src/main/java/com/hurence/logisland/processor/commonlogs/gitlab/ParseGitlabLog.java +++ b/logisland-plugins/logisland-common-logs-plugin/src/main/java/com/hurence/logisland/processor/commonlogs/gitlab/ParseGitlabLog.java @@ -129,6 +129,9 @@ public Collection process(ProcessContext context, Collection rec // Normalize the map key values (Some special characters like '.' are not possible when indexing in ES) normalizeFields(jsonGitlabLog, null); + + // Explode the params field if any and set its values as first level fields + explodeParams(jsonGitlabLog); // Set every first level fields of the Gitlab log as first level fields of the record for easier processing // in processors following in the current processors stream. @@ -141,7 +144,7 @@ public Collection process(ProcessContext context, Collection rec } return records; } - + /** * Sets the first level fields of the passed Gitlab log as first level fields in the passed Logisland record. * @param gitlabLog Gitlab log. @@ -190,6 +193,101 @@ else if (value instanceof String) } } + private static final String PARAMS = "params"; + private static final String PARAMS_KEY = "key"; + private static final String PARAMS_VALUE = "value"; + private static final String PARAMS_SEP = "_"; + /** + * Explodes the params field (if any) and set its values as first level fields. This helps having simple queries + * in ES once documents are indexed for instance. 
+ * Example: + * + * Input: + * + * "params": [ + * { + * "key": "utf8", + * "value": "✓" + * }, + * { + * "key": "authenticity_token", + * "value": "[FILTERED]" + * }, + * { + * "key": "user", + * "value": { + * "login": "mathieu.rossignol@hurence.com", + * "password": "[FILTERED]", + * "remember_me": "0" + * } + * } + * ], + * + * Output: + * + * "params_utf8": "✓", + * "params_authenticity_token": "[FILTERED]", + * "params_user_login" : "mathieu.rossignol@hurence.com", + * "params_user_password": "[FILTERED]" + * + * @param gitlabLog + */ + private void explodeParams(Map gitlabLog) { + + Object params = gitlabLog.get(PARAMS); + if (params != null) + { + gitlabLog.remove(PARAMS); + + addFlatParams(PARAMS, gitlabLog, params); + } + } + + /** + * See explodeParams + * @param prefix Current prefix tu use for the attributes to add + * @param gitlabLog The Map where to add parsed attributes + * @param params The current params to parse + */ + private void addFlatParams(String prefix, Map gitlabLog, Object params) { + + + if (params == null) + { + // Handle null params values + gitlabLog.put(prefix, null); + return; + } + + if (params instanceof ArrayList) + { + // This is a list of parameters, recall the method for each of them + ArrayList paramsArray = (ArrayList)params; + paramsArray.forEach(param -> addFlatParams(prefix, gitlabLog, param)); + } else if (params instanceof Map) + { + // This is a map, is there a key field ? + Map paramsMap = (Map)params; + Object paramKey = paramsMap.get(PARAMS_KEY); + if (paramKey != null) + { + // There is a key, recall the method with the associated value + String newPrefix = prefix + PARAMS_SEP + paramKey.toString(); + Object paramValue = paramsMap.get(PARAMS_VALUE); + addFlatParams(newPrefix, gitlabLog, paramValue); + } else + { + // There is no key field field. So this is a final map. Explode it by + // recalling the method for eaf internal field + + paramsMap.forEach( (field, value) -> addFlatParams(prefix + PARAMS_SEP + field, gitlabLog, value) ); + } + } else { + // This is a simple type, just add the new field with the passed computed prefix + gitlabLog.put(prefix, params); + } + } + /** * Deeply clones the passed map regarding keys (so that one can modify keys of the original map without changing * the clone). 
diff --git a/logisland-plugins/logisland-common-logs-plugin/src/test/java/com/hurence/logisland/processor/commonlogs/gitlab/ParseGitlabLogTest.java b/logisland-plugins/logisland-common-logs-plugin/src/test/java/com/hurence/logisland/processor/commonlogs/gitlab/ParseGitlabLogTest.java index ee8bda18a..7004e1f3b 100644 --- a/logisland-plugins/logisland-common-logs-plugin/src/test/java/com/hurence/logisland/processor/commonlogs/gitlab/ParseGitlabLogTest.java +++ b/logisland-plugins/logisland-common-logs-plugin/src/test/java/com/hurence/logisland/processor/commonlogs/gitlab/ParseGitlabLogTest.java @@ -23,6 +23,7 @@ import com.hurence.logisland.util.runner.TestRunners; import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNull; import java.util.Arrays; import java.util.List; @@ -39,8 +40,8 @@ public class ParseGitlabLogTest { private static Logger logger = LoggerFactory.getLogger(ParseGitlabLogTest.class); - // Bro conn input event - private static final String GITLAB_BRO_EVENT = + // Gitlab input event + private static final String GITLAB_EVENT = "{" + "\"view\": 94.68," + "\"method\": \"GET\"," + @@ -64,6 +65,45 @@ public class ParseGitlabLogTest { "\"long\": 32345678910" + "}"; + + // Gitlab input event with param + private static final String GITLAB_EVENT_WITH_PARAMS = + "{" + + "\"view\": 94.68," + + "\"method\": \"GET\"," + + "\"path\": \"/dashboard/issues\"," + + "\"params\": [" + + "{" + + "\"key\": \"utf8\"," + + "\"value\": \"✓\"" + + "}," + + "{" + + "\"key\": \"authenticity_token\"," + + "\"value\": \"[FILTERED]\"" + + "}," + + "{" + + "\"key\": \"user\"," + + "\"value\": {" + + "\"login\": \"mathieu.rossignol@hurence.com\"," + + "\"password\": \"[FILTERED]\"," + + "\"remember_me\": \"0\"" + + "}" + + "}," + + "{" + + "\"key\": \"empty\"," + + "\"value\": \"\"" + + "}," + + "{" + + "\"key\": \"null\"," + + "\"value\": null" + + "}," + + "{" + + "\"key\": \"integer\"," + + "\"value\": 7" + + "}" + + "]" + + "}"; + /** * Test fields renaming if deep JSON and also some types */ @@ -71,7 +111,7 @@ public class ParseGitlabLogTest { public void testFakeDeepEvent() { final TestRunner testRunner = TestRunners.newTestRunner(new ParseGitlabLog()); testRunner.assertValid(); - Record record = new StandardRecord("bro_event"); + Record record = new StandardRecord("gitlab_event"); record.setStringField(FieldDictionary.RECORD_VALUE, FAKE_DEEP_EVENT); testRunner.enqueue(record); testRunner.clearQueues(); @@ -126,8 +166,8 @@ public void testFakeDeepEvent() { public void testGitlabLog() { final TestRunner testRunner = TestRunners.newTestRunner(new ParseGitlabLog()); testRunner.assertValid(); - Record record = new StandardRecord("bro_event"); - record.setStringField(FieldDictionary.RECORD_VALUE, GITLAB_BRO_EVENT); + Record record = new StandardRecord("gitlab_event"); + record.setStringField(FieldDictionary.RECORD_VALUE, GITLAB_EVENT); testRunner.enqueue(record); testRunner.clearQueues(); testRunner.run(); @@ -147,4 +187,56 @@ public void testGitlabLog() { out.assertFieldExists("path"); out.assertFieldEquals("path", "/dashboard/issues"); } + + /** + * Test that the special params field as been exploded and replaced with first level fields + */ + @Test + public void testFlatParams() { + final TestRunner testRunner = TestRunners.newTestRunner(new ParseGitlabLog()); + testRunner.assertValid(); + Record record = new StandardRecord("gitlab_event"); + record.setStringField(FieldDictionary.RECORD_VALUE, GITLAB_EVENT_WITH_PARAMS); + testRunner.enqueue(record); + 
testRunner.clearQueues(); + testRunner.run(); + testRunner.assertAllInputRecordsProcessed(); + testRunner.assertOutputRecordsCount(1); + + MockRecord out = testRunner.getOutputRecords().get(0); + + out.assertFieldExists(FieldDictionary.RECORD_TYPE); + + out.assertFieldExists("view"); + out.assertFieldEquals("view", (float)94.68); + + out.assertFieldExists("method"); + out.assertFieldEquals("method", "GET"); + + out.assertFieldExists("path"); + out.assertFieldEquals("path", "/dashboard/issues"); + + out.assertFieldExists("params_utf8"); + out.assertFieldEquals("params_utf8", "✓"); + + out.assertFieldExists("params_authenticity_token"); + out.assertFieldEquals("params_authenticity_token", "[FILTERED]"); + + out.assertFieldExists("params_user_login"); + out.assertFieldEquals("params_user_login", "mathieu.rossignol@hurence.com"); + + out.assertFieldExists("params_user_password"); + out.assertFieldEquals("params_user_password", "[FILTERED]"); + + out.assertFieldExists("params_empty"); + out.assertFieldEquals("params_empty", ""); + + out.assertFieldExists("params_null"); + out.assertNullField("params_null"); + + out.assertFieldExists("params_integer"); + out.assertFieldEquals("params_integer", 7); + + System.out.println(out); + } } diff --git a/logisland-plugins/logisland-common-processors-plugin/pom.xml b/logisland-plugins/logisland-common-processors-plugin/pom.xml index 9f4fa8b2a..b6f5b528e 100644 --- a/logisland-plugins/logisland-common-processors-plugin/pom.xml +++ b/logisland-plugins/logisland-common-processors-plugin/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-common-processors-plugin diff --git a/logisland-plugins/logisland-common-processors-plugin/src/main/java/com/hurence/logisland/processor/NormalizeFields.java b/logisland-plugins/logisland-common-processors-plugin/src/main/java/com/hurence/logisland/processor/NormalizeFields.java index 762871fe5..f2a81c45c 100644 --- a/logisland-plugins/logisland-common-processors-plugin/src/main/java/com/hurence/logisland/processor/NormalizeFields.java +++ b/logisland-plugins/logisland-common-processors-plugin/src/main/java/com/hurence/logisland/processor/NormalizeFields.java @@ -53,7 +53,7 @@ public class NormalizeFields extends AbstractProcessor { public static final PropertyDescriptor CONFLICT_RESOLUTION_POLICY = new PropertyDescriptor.Builder() .name("conflict.resolution.policy") - .description("waht to do when a field with the same name already exists ?") + .description("what to do when a field with the same name already exists ?") .required(true) .defaultValue(DO_NOTHING.getValue()) .allowableValues(DO_NOTHING, OVERWRITE_EXISTING, KEEP_ONLY_OLD_FIELD, KEEP_BOTH_FIELDS) diff --git a/logisland-plugins/logisland-common-processors-plugin/src/main/java/com/hurence/logisland/processor/datastore/AbstractDatastoreProcessor.java b/logisland-plugins/logisland-common-processors-plugin/src/main/java/com/hurence/logisland/processor/datastore/AbstractDatastoreProcessor.java index 97e17aa25..8a4d2598e 100644 --- a/logisland-plugins/logisland-common-processors-plugin/src/main/java/com/hurence/logisland/processor/datastore/AbstractDatastoreProcessor.java +++ b/logisland-plugins/logisland-common-processors-plugin/src/main/java/com/hurence/logisland/processor/datastore/AbstractDatastoreProcessor.java @@ -45,7 +45,6 @@ public boolean hasControllerService() { @Override public void init(final ProcessContext context) { - logger.info("Datastore client service initialization"); datastoreClientService = 
context.getPropertyValue(DATASTORE_CLIENT_SERVICE).asControllerService(DatastoreClientService.class); if (datastoreClientService == null) { logger.error("Datastore client service is not initialized!"); diff --git a/logisland-plugins/logisland-common-processors-plugin/src/test/resources/data/TracesAnalysis_samples.txt b/logisland-plugins/logisland-common-processors-plugin/src/test/resources/data/TracesAnalysis_samples.txt index 17da1ef5d..cb32805bf 100644 --- a/logisland-plugins/logisland-common-processors-plugin/src/test/resources/data/TracesAnalysis_samples.txt +++ b/logisland-plugins/logisland-common-processors-plugin/src/test/resources/data/TracesAnalysis_samples.txt @@ -59,8 +59,8 @@ 2012-10-19T10:12:00.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 1942 MICROSOFT BITS/6.7 false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null 2012-10-19T10:12:06.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 472 MICROSOFT BITS/6.7 false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null 2012-10-19T10:12:19.000 GMT 10.112.123.187 CONNECT TCP_MISS_SSL/200 tunnel dl.google.com / 443 - 0 39 MICROSOFT BITS/6.7 false false false false false null 173.194.34.224 9q9hyebw8m76 37.41920471191406 -122.05740356445312 United States 43 null -2012-10-19T10:12:20.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null -2012-10-19T10:12:25.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 472 - false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null +2012-10-19T10.13.00.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null +2012-10-19T10.13.05.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel dl.google.com / 443 - 0 472 - false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null 2012-10-19T10:12:37.000 GMT 10.112.123.187 CONNECT TCP_MISS_SSL/200 tunnel dl.google.com / 443 - 0 39 - false false false false false null 173.194.34.224 9q9hyebw8m76 37.41920471191406 -122.05740356445312 United States 43 null 2012-10-19T10:17:45.000 GMT 10.112.123.187 POST TCP_DENIED/407 http tools.google.com /service/update2 80 - 0 1942 GOOGLE UPDATE/1.3.21.99,WINHTTP false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null 2012-10-19T10:17:51.000 GMT 10.112.123.187 POST TCP_DENIED/407 http tools.google.com /service/update2 80 - 0 472 GOOGLE UPDATE/1.3.21.99,WINHTTP false false false false false null 255.255.255.255 null 0.0 0.0 null 43 null @@ -428,4 +428,4 @@ 2012-10-25T11:34:45.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel talk.google.com / 5222 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 44 null 2012-10-25T11:34:45.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel talkx.l.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 44 null 2012-10-25T11:34:45.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel talkx.l.google.com / 5222 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 null 44 null -2012-10-25T13:04:46.000 GMT 10.112.123.187 CONNECT TCP_DENIED/407 tunnel talk.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 \ No newline at end of file +2012-10-25T13:04:46.000 GMT 10.112.123.187 
CONNECT TCP_DENIED/407 tunnel talk.google.com / 443 - 0 1942 - false false false false false null 255.255.255.255 null 0.0 0.0 diff --git a/logisland-plugins/logisland-cyber-security-plugin/pom.xml b/logisland-plugins/logisland-cyber-security-plugin/pom.xml index 17c57baf7..66fad828c 100644 --- a/logisland-plugins/logisland-cyber-security-plugin/pom.xml +++ b/logisland-plugins/logisland-cyber-security-plugin/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-cyber-security-plugin diff --git a/logisland-plugins/logisland-elasticsearch-plugin/pom.xml b/logisland-plugins/logisland-elasticsearch-plugin/pom.xml index ae682bac5..4cfae185d 100644 --- a/logisland-plugins/logisland-elasticsearch-plugin/pom.xml +++ b/logisland-plugins/logisland-elasticsearch-plugin/pom.xml @@ -23,7 +23,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-elasticsearch-plugin diff --git a/logisland-plugins/logisland-enrichment-plugin/pom.xml b/logisland-plugins/logisland-enrichment-plugin/pom.xml index eaca1c2b5..8ed56d372 100644 --- a/logisland-plugins/logisland-enrichment-plugin/pom.xml +++ b/logisland-plugins/logisland-enrichment-plugin/pom.xml @@ -7,7 +7,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-enrichment-plugin diff --git a/logisland-plugins/logisland-excel-plugin/pom.xml b/logisland-plugins/logisland-excel-plugin/pom.xml new file mode 100644 index 000000000..04bf696b2 --- /dev/null +++ b/logisland-plugins/logisland-excel-plugin/pom.xml @@ -0,0 +1,65 @@ + + + + 4.0.0 + + + com.hurence.logisland + logisland-plugins + 0.13.0 + + + logisland-excel-plugin + jar + + + 3.17 + + + + + com.hurence.logisland + logisland-api + + + com.hurence.logisland + logisland-utils + + + org.apache.poi + poi + ${poi.version} + + + org.apache.poi + poi-ooxml + ${poi.version} + + + org.slf4j + slf4j-simple + test + + + junit + junit + test + + + diff --git a/logisland-plugins/logisland-excel-plugin/src/main/java/com/hurence/logisland/processor/excel/ExcelExtract.java b/logisland-plugins/logisland-excel-plugin/src/main/java/com/hurence/logisland/processor/excel/ExcelExtract.java new file mode 100644 index 000000000..8fb4b8c12 --- /dev/null +++ b/logisland-plugins/logisland-excel-plugin/src/main/java/com/hurence/logisland/processor/excel/ExcelExtract.java @@ -0,0 +1,273 @@ +/* + * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + */ +package com.hurence.logisland.processor.excel; + +import com.hurence.logisland.annotation.documentation.CapabilityDescription; +import com.hurence.logisland.annotation.documentation.Tags; +import com.hurence.logisland.component.PropertyDescriptor; +import com.hurence.logisland.processor.AbstractProcessor; +import com.hurence.logisland.processor.ProcessContext; +import com.hurence.logisland.processor.ProcessError; +import com.hurence.logisland.record.*; +import com.hurence.logisland.util.stream.io.StreamUtils; +import com.hurence.logisland.validator.ValidationContext; +import com.hurence.logisland.validator.ValidationResult; +import org.apache.commons.io.IOUtils; +import org.apache.poi.openxml4j.exceptions.InvalidFormatException; +import org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException; +import org.apache.poi.ss.usermodel.*; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.util.*; +import java.util.regex.Pattern; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +/** + * Consumes a Microsoft Excel document and converts each spreadsheet row to a {@link Record}. + */ +@Tags({"excel", "processor", "poi"}) +@CapabilityDescription("Consumes a Microsoft Excel document and converts each worksheet's line to a structured " + + "record. The processor is assuming to receive raw excel file as input record.") +public class ExcelExtract extends AbstractProcessor { + + private static final Logger LOGGER = LoggerFactory.getLogger(ExcelExtractProperties.class); + + /** + * The configuration. + */ + private ExcelExtractProperties.Configuration configuration; + + + @Override + public void init(ProcessContext context) { + super.init(context); + configuration = new ExcelExtractProperties.Configuration(context); + LOGGER.info("ExcelExtract successfully initialized"); + } + + + @Override + protected Collection customValidate(ValidationContext context) { + ValidationResult.Builder ret = new ValidationResult.Builder().valid(true); + if (!(context.getPropertyValue(ExcelExtractProperties.FIELD_NAMES).isSet() ^ + context.getPropertyValue(ExcelExtractProperties.HEADER_ROW_NB).isSet())) { + ret.explanation(String.format("You must set exactly one of %s or %s.", + ExcelExtractProperties.FIELD_NAMES.getName(), ExcelExtractProperties.HEADER_ROW_NB.getName())) + .subject(getIdentifier()) + .valid(false); + } + return Collections.singletonList(ret.build()); + } + + @Override + public List getSupportedPropertyDescriptors() { + final List descriptors = new ArrayList<>(); + descriptors.add(ExcelExtractProperties.DESIRED_SHEETS); + descriptors.add(ExcelExtractProperties.COLUMNS_TO_SKIP); + descriptors.add(ExcelExtractProperties.FIELD_NAMES); + descriptors.add(ExcelExtractProperties.ROWS_TO_SKIP); + descriptors.add(ExcelExtractProperties.RECORD_TYPE); + descriptors.add(ExcelExtractProperties.HEADER_ROW_NB); + return Collections.unmodifiableList(descriptors); + } + + @Override + public Collection process(ProcessContext context, Collection records) { + final Collection ret = new ArrayList<>(); + for (Record record : records) { + //Extract source input stream + InputStream is = extractRawContent(record); + //process + ret.addAll(handleExcelStream(is) + //enrich + .map(current -> enrichWithMetadata(current, record)) + //collect and add to global results + .collect(Collectors.toList())); + } + return ret; + } + + + private final Record enrichWithMetadata(Record current, 
Record source) { + if (source.hasField(Fields.SOURCE_FILE_NAME)) { + current.setField(source.getField(Fields.SOURCE_FILE_NAME)); + } + current.setField(Fields.recordType(configuration.getRecordType())); + return current; + } + + + /** + * Extract the raw byte XLS content from the input record. + * + * @param record + * @return A byte array inputstream (never null). + * @throws IllegalStateException in case of malformed record. + */ + private InputStream extractRawContent(Record record) { + if (!record.hasField(FieldDictionary.RECORD_VALUE)) { + throw new IllegalStateException("Received a record not carrying information on field " + FieldDictionary.RECORD_VALUE); + } + Field field = record.getField(FieldDictionary.RECORD_VALUE); + if (field == null || !FieldType.BYTES.equals(field.getType())) { + throw new IllegalStateException("Unexpected content received. We expect to handle field content with raw byte data."); + } + return new ByteArrayInputStream((byte[]) field.getRawValue()); + } + + /** + * Extract every matching sheet from the raw excel input stream. + * + * @param inputStream an inputstream that will be closed once consumed. + * @return a stream of {@link Record} each containing the stream raw data. + */ + private Stream handleExcelStream(InputStream inputStream) { + List ret = new ArrayList<>(); + try { + try (Workbook workbook = WorkbookFactory.create(inputStream)) { + Iterator iter = workbook.sheetIterator(); + while (iter.hasNext()) { + String sheetName = "unknown"; + List headerNames = null; + + try { + Sheet sheet = iter.next(); + sheetName = sheet.getSheetName(); + if (toBeSkipped(sheetName)) { + LOGGER.info("Skipped sheet {}", sheetName); + continue; + } + LOGGER.info("Extracting sheet {}", sheetName); + int count = 0; + for (Row row : sheet) { + if (row == null) { + continue; + } + if (configuration.getHeaderRowNumber() != null && + configuration.getHeaderRowNumber().equals(row.getRowNum())) { + headerNames = extractFieldNamesFromRow(row); + + } + if (count++ < configuration.getRowsToSkip()) { + continue; + } + Record current = handleRow(row, headerNames); + current.setField(Fields.rowNumber(row.getRowNum())) + .setField(Fields.sheetName(sheetName)); + ret.add(current); + } + + } catch (Exception e) { + LOGGER.error("Unrecoverable exception occurred while processing excel sheet", e); + ret.add(new StandardRecord().addError(ProcessError.RECORD_CONVERSION_ERROR.getName(), + String.format("Unable to parse sheet %s: %s", sheetName, e.getMessage()))); + } + } + } + } catch (InvalidFormatException | NotOfficeXmlFileException ife) { + LOGGER.error("Wrong or unsupported file format.", ife); + ret.add(new StandardRecord().addError(ProcessError.INVALID_FILE_FORMAT_ERROR.getName(), ife.getMessage())); + } catch (IOException ioe) { + LOGGER.error("I/O Exception occurred while processing excel file", ioe); + ret.add(new StandardRecord().addError(ProcessError.RUNTIME_ERROR.getName(), ioe.getMessage())); + + } finally { + IOUtils.closeQuietly(inputStream); + } + return ret.stream(); + } + + + /** + * Handle row content and transform it into a {@link Record} + * + * @param row the {@link Row} + * @return the transformed {@link Record} + */ + private Record handleRow(Row row, List header) { + Record ret = new StandardRecord().setTime(new Date()); + int index = 0; + for (Cell cell : row) { + if (configuration.getFieldNames() != null && index >= configuration.getFieldNames().size()) { + //we've reached the end of mapping. Go to next row. 
+ break; + } + if (configuration.getColumnsToSkip().contains(cell.getColumnIndex())) { + //skip this cell. + continue; + } + String fieldName = header != null ? header.get(cell.getColumnIndex()) : + configuration.getFieldNames().get(index++); + Field field; + // Alternatively, get the value and format it yourself + switch (cell.getCellTypeEnum()) { + case STRING: + field = new Field(fieldName, FieldType.STRING, cell.getStringCellValue()); + break; + case NUMERIC: + if (DateUtil.isCellDateFormatted(cell)) { + field = new Field(fieldName, FieldType.LONG, cell.getDateCellValue().getTime()); + } else { + field = new Field(fieldName, FieldType.DOUBLE, cell.getNumericCellValue()); + } + break; + case BOOLEAN: + field = new Field(fieldName, FieldType.BOOLEAN, cell.getBooleanCellValue()); + break; + case FORMULA: + field = new Field(fieldName, FieldType.STRING, cell.getCellFormula()); + break; + default: + //blank or unknown + field = new Field(fieldName, FieldType.NULL, null); + break; + } + ret.setField(field); + } + return ret; + } + + private List extractFieldNamesFromRow(Row row) { + return StreamUtils.asStream(row.cellIterator()) + .map(Cell::getStringCellValue) + .map(s -> s.replaceAll("\\s+", "_")) + .collect(Collectors.toList()); + } + + + /** + * Looks if a sheet should be extracted or not according to the configuration. + * + * @param sheet the name of the sheet + * @return true if the current sheet has to be skipped. False otherwise. + */ + private boolean toBeSkipped(String sheet) { + for (Pattern pattern : configuration.getSheetsToExtract()) { + if (pattern.matcher(sheet).matches()) { + return false; + } + } + return !configuration.getSheetsToExtract().isEmpty(); + } + + +} diff --git a/logisland-plugins/logisland-excel-plugin/src/main/java/com/hurence/logisland/processor/excel/ExcelExtractProperties.java b/logisland-plugins/logisland-excel-plugin/src/main/java/com/hurence/logisland/processor/excel/ExcelExtractProperties.java new file mode 100644 index 000000000..29a107d7f --- /dev/null +++ b/logisland-plugins/logisland-excel-plugin/src/main/java/com/hurence/logisland/processor/excel/ExcelExtractProperties.java @@ -0,0 +1,238 @@ +/* + * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package com.hurence.logisland.processor.excel; + +import com.hurence.logisland.component.PropertyDescriptor; +import com.hurence.logisland.processor.ProcessContext; +import com.hurence.logisland.validator.StandardValidators; +import org.apache.commons.lang3.StringUtils; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.List; +import java.util.regex.Pattern; +import java.util.stream.Collectors; + +/** + * Common options for {@link ExcelExtract} processor. 
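+ * <p>
+ * Available properties are {@code sheets} (sheet names or regular expressions to extract), {@code skip.rows},
+ * {@code skip.columns}, {@code record.type}, and exactly one of {@code field.names} or {@code field.row.header}
+ * to define the field name mapping.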
+ */
+public class ExcelExtractProperties implements Serializable {
+
+    public static final PropertyDescriptor RECORD_TYPE = new PropertyDescriptor.Builder()
+            .name("record.type")
+            .description("Default type of record")
+            .required(false)
+            .defaultValue("excel_record")
+            .build();
+
+    public static final PropertyDescriptor DESIRED_SHEETS = new PropertyDescriptor
+            .Builder().name("sheets")
+            .displayName("Sheets to Extract")
+            .description("Comma separated list of Excel document sheet names that should be extracted from the excel document. If this property" +
+                    " is left blank then all of the sheets will be extracted from the Excel document. You can specify regular expressions." +
+                    " Any sheets not specified in this value will be ignored.")
+            .required(false)
+            .defaultValue("")
+            .addValidator(StandardValidators.COMMA_SEPARATED_LIST_VALIDATOR)
+            .build();
+
+    /**
+     * The number of rows to skip. Useful if you want to skip first row (usually the table header).
+     */
+    public static final PropertyDescriptor ROWS_TO_SKIP = new PropertyDescriptor
+            .Builder().name("skip.rows")
+            .displayName("Number of Rows to Skip")
+            .description("The row number of the first row to start processing." +
+                    " Use this to skip over rows of data at the top of your worksheet that are not part of the dataset." +
+                    " Empty rows of data anywhere in the spreadsheet will always be skipped, no matter what this value is set to.")
+            .required(false)
+            .defaultValue("0")
+            .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+            .build();
+
+    /**
+     * List of column numbers to skip. Empty means include anything.
+     */
+    public static final PropertyDescriptor COLUMNS_TO_SKIP = new PropertyDescriptor
+            .Builder().name("skip.columns")
+            .displayName("Columns To Skip")
+            .description("Comma delimited list of column numbers to skip. Use the column number and not the letter designation. " +
+                    "Use this to skip over columns anywhere in your worksheet that you don't want extracted as part of the record.")
+            .required(false)
+            .defaultValue("")
+            .addValidator(StandardValidators.COMMA_SEPARATED_LIST_VALIDATOR)
+            .build();
+
+
+    /**
+     * Mapping between column extracted and field names in a record.
+     */
+    public static final PropertyDescriptor FIELD_NAMES = new PropertyDescriptor
+            .Builder().name("field.names")
+            .displayName("Field names mapping")
+            .description("The comma separated list representing the names of columns of extracted cells. Order matters!" +
+                    " You should use either field.names or field.row.header but not both.")
+            .required(false)
+            .addValidator(StandardValidators.COMMA_SEPARATED_LIST_VALIDATOR)
+            .build();
+
+    /**
+     * The row number to use to extract field name mapping.
+     */
+    public static final PropertyDescriptor HEADER_ROW_NB = new PropertyDescriptor
+            .Builder().name("field.row.header")
+            .displayName("Use a row header as field names mapping")
+            .description("If set, field names mapping will be extracted from the specified row number." +
+                    " You should use either field.names or field.row.header but not both.")
+            .required(false)
+            .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+            .build();
+
+    public static class Configuration {
+
+        /**
+         * The list of patterns matching sheets to extract. Empty means everything.
+         */
+        private final List sheetsToExtract;
+        /**
+         * List of column numbers to skip. Empty means include anything.
+         */
+        private final List columnsToSkip;
+        /**
+         * The number of rows to skip.
Useful if you want to skip first row (usually the table header). + */ + private final int rowsToSkip; + + /** + * The prefix to use when defining fields' name in output records. + */ + private final List fieldNames; + + /** + * The record type. + */ + private final String recordType; + + /** + * The row number to use to extract field name mapping. + */ + private final Integer headerRowNumber; + + + /** + * Creates a configuration POJO from the {@link ProcessContext} + * + * @param context the current context. + */ + public Configuration(ProcessContext context) { + sheetsToExtract = Arrays.stream(context.getPropertyValue(DESIRED_SHEETS).asString().split(",")) + .map(String::trim) + .filter(StringUtils::isNotBlank) + .map(Pattern::compile) + .collect(Collectors.toList()); + + if (context.getPropertyValue(HEADER_ROW_NB).isSet()) { + headerRowNumber = context.getPropertyValue(HEADER_ROW_NB).asInteger(); + } else { + headerRowNumber = null; + } + + columnsToSkip = Arrays.stream(context.getPropertyValue(COLUMNS_TO_SKIP).asString().split(",")) + .filter(StringUtils::isNotBlank) + .map(Integer::parseInt) + .collect(Collectors.toList()); + rowsToSkip = context.getPropertyValue(ROWS_TO_SKIP).asInteger(); + + if (context.getPropertyValue(FIELD_NAMES).isSet()) { + fieldNames = Arrays.stream(context.getPropertyValue(FIELD_NAMES).asString().split(",")) + .filter(StringUtils::isNotBlank) + .map(String::trim) + .collect(Collectors.toList()); + } else { + fieldNames = null; + } + recordType = context.getPropertyValue(RECORD_TYPE).asString(); + } + + + /** + * The list of patterns matching sheet to extract. Empty means everything. + * + * @return a never null list. + */ + public List getSheetsToExtract() { + return sheetsToExtract; + } + + + /** + * List of column numbers to skip. Empty means include anything. + * + * @return a never null {@link List} + */ + public List getColumnsToSkip() { + return columnsToSkip; + } + + /** + * The number of rows to skip. Useful if you want to skip first row (usually the table header). + * + * @return + */ + public int getRowsToSkip() { + return rowsToSkip; + } + + + /** + * Mapping between column extracted and field names in a record. + * + * @return + */ + public List getFieldNames() { + return fieldNames; + } + + /** + * The record type. + * + * @return + */ + public String getRecordType() { + return recordType; + } + + + public Integer getHeaderRowNumber() { + return headerRowNumber; + } + + @Override + public String toString() { + return "Configuration{" + + "sheetsToExtract=" + sheetsToExtract + + ", columnsToSkip=" + columnsToSkip + + ", rowsToSkip=" + rowsToSkip + + ", fieldNames=" + fieldNames + + ", recordType='" + recordType + '\'' + + ", headerRowNumber=" + headerRowNumber + + '}'; + } + } + +} diff --git a/logisland-plugins/logisland-excel-plugin/src/main/java/com/hurence/logisland/processor/excel/Fields.java b/logisland-plugins/logisland-excel-plugin/src/main/java/com/hurence/logisland/processor/excel/Fields.java new file mode 100644 index 000000000..7368ff682 --- /dev/null +++ b/logisland-plugins/logisland-excel-plugin/src/main/java/com/hurence/logisland/processor/excel/Fields.java @@ -0,0 +1,66 @@ +/* + * Copyright (C) 2018 Hurence (support@hurence.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
 * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package com.hurence.logisland.processor.excel;
+
+import com.hurence.logisland.record.Field;
+import com.hurence.logisland.record.FieldDictionary;
+import com.hurence.logisland.record.FieldType;
+
+/**
+ * Encapsulates the fields used by the records of this processor.
+ */
+public class Fields {
+
+    public static final String SHEET_NAME = "excel_extract_sheet_name";
+    public static final String SOURCE_FILE_NAME = "source_file_name";
+    public static final String ROW_NUMBER = "excel_extract_row_number";
+    public static final String RECORD_TYPE = FieldDictionary.RECORD_TYPE;
+
+    /**
+     * Creates a field for the sheet name.
+     *
+     * @param name the sheet name
+     * @return the sheet name field
+     */
+    public static Field sheetName(String name) {
+        return new Field(SHEET_NAME, FieldType.STRING, name);
+    }
+
+
+    /**
+     * Creates a field for the extracted record row number.
+     *
+     * @param number the row number
+     * @return the row number field
+     */
+    public static Field rowNumber(long number) {
+        return new Field(ROW_NUMBER, FieldType.LONG, number);
+    }
+
+    /**
+     * Creates a field for the record type.
+     *
+     * @param recordType the record type
+     * @return the record type field
+     */
+    public static Field recordType(String recordType) {
+        return new Field(RECORD_TYPE, FieldType.STRING, recordType);
+    }
+
+
+}
diff --git a/logisland-plugins/logisland-excel-plugin/src/test/java/com/hurence/logisland/processor/excel/ExcelExtractTest.java b/logisland-plugins/logisland-excel-plugin/src/test/java/com/hurence/logisland/processor/excel/ExcelExtractTest.java
new file mode 100644
index 000000000..3beda2d87
--- /dev/null
+++ b/logisland-plugins/logisland-excel-plugin/src/test/java/com/hurence/logisland/processor/excel/ExcelExtractTest.java
@@ -0,0 +1,130 @@
+/*
+ * Copyright (C) 2018 Hurence (support@hurence.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * + */ + +package com.hurence.logisland.processor.excel; + +import com.hurence.logisland.record.FieldDictionary; +import com.hurence.logisland.record.FieldType; +import com.hurence.logisland.util.runner.MockRecord; +import com.hurence.logisland.util.runner.TestRunner; +import com.hurence.logisland.util.runner.TestRunners; +import org.apache.commons.io.IOUtils; +import org.apache.poi.extractor.ExtractorFactory; +import org.junit.Test; + +import java.io.IOException; +import java.io.InputStream; +import java.util.Collection; + +public class ExcelExtractTest { + + private byte[] resolveClassPathResource(String name) throws IOException { + try (InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream(name)) { + return IOUtils.toByteArray(is); + } + } + + private TestRunner initTestRunner(TestRunner testRunner, Integer rowHeaderNumber) { + if (rowHeaderNumber != null) { + testRunner.setProperty(ExcelExtractProperties.HEADER_ROW_NB, rowHeaderNumber.toString()); + } else { + testRunner.setProperty(ExcelExtractProperties.FIELD_NAMES, "Product,Date"); + } + testRunner.setProperty(ExcelExtractProperties.ROWS_TO_SKIP, "1"); + testRunner.setProperty(ExcelExtractProperties.COLUMNS_TO_SKIP, "0,1,3,4,5,6,7,8,9,10,11"); + return testRunner; + } + + private void assertRecordValid(Collection records) { + records.forEach(record -> { + record.assertFieldExists("Product"); + record.assertFieldExists("Date"); + record.assertFieldTypeEquals("Product", FieldType.STRING); + record.assertFieldTypeEquals("Date", FieldType.LONG); + record.assertFieldExists(Fields.SHEET_NAME); + record.assertFieldExists(Fields.ROW_NUMBER); + }); + } + + @Test(expected = AssertionError.class) + public void testConfigurationValidationErrorWithoutFieldMapping() throws Exception { + final TestRunner testRunner = TestRunners.newTestRunner(new ExcelExtract()); + testRunner.assertValid(); + } + + @Test(expected = AssertionError.class) + public void testConfigurationValidationErrorWithBothHeaderAndFieldMappingSet() throws Exception { + final TestRunner testRunner = initTestRunner(TestRunners.newTestRunner(new ExcelExtract()), null); + testRunner.setProperty(ExcelExtractProperties.HEADER_ROW_NB, "0"); + testRunner.assertValid(); + } + + @Test() + public void testThrowsExceptionWhenFormatInvalid() throws Exception { + final TestRunner testRunner = initTestRunner(TestRunners.newTestRunner(new ExcelExtract()), null); + testRunner.enqueue(FieldDictionary.RECORD_VALUE.getBytes("UTF-8"), + new String("I'm a fake excel file :)").getBytes("UTF-8")); + testRunner.run(); + testRunner.assertOutputErrorCount(1); + } + + + @Test + public void testExtractAllSheets() throws Exception { + final TestRunner testRunner = initTestRunner(TestRunners.newTestRunner(new ExcelExtract()), null); + testRunner.enqueue(FieldDictionary.RECORD_VALUE.getBytes("UTF-8"), + resolveClassPathResource("Financial Sample.xlsx")); + testRunner.assertValid(); + testRunner.run(); + testRunner.assertOutputRecordsCount(700); + assertRecordValid(testRunner.getOutputRecords()); + } + + @Test + public void testExtractNothing() throws Exception { + final TestRunner testRunner = initTestRunner(TestRunners.newTestRunner(new ExcelExtract()), null); + testRunner.enqueue(FieldDictionary.RECORD_VALUE.getBytes("UTF-8"), + resolveClassPathResource("Financial Sample.xlsx")); + testRunner.setProperty(ExcelExtractProperties.DESIRED_SHEETS, "Sheet2,Sheet3"); + testRunner.assertValid(); + testRunner.run(); + testRunner.assertOutputRecordsCount(0); + } + + @Test + public void 
testExtractSelected() throws Exception { + final TestRunner testRunner = initTestRunner(TestRunners.newTestRunner(new ExcelExtract()), null); + testRunner.enqueue(FieldDictionary.RECORD_VALUE.getBytes("UTF-8"), + resolveClassPathResource("Financial Sample.xlsx")); + testRunner.setProperty(ExcelExtractProperties.DESIRED_SHEETS, "(?i)sheet.*"); + testRunner.assertValid(); + testRunner.run(); + testRunner.assertOutputRecordsCount(700); + assertRecordValid(testRunner.getOutputRecords()); + } + + @Test + public void testExtractWithDynamicMapping() throws Exception { + final TestRunner testRunner = initTestRunner(TestRunners.newTestRunner(new ExcelExtract()), 0); + testRunner.enqueue(FieldDictionary.RECORD_VALUE.getBytes("UTF-8"), + resolveClassPathResource("Financial Sample.xlsx")); + testRunner.assertValid(); + testRunner.run(); + testRunner.assertOutputRecordsCount(700); + assertRecordValid(testRunner.getOutputRecords()); + } +} \ No newline at end of file diff --git a/logisland-plugins/logisland-excel-plugin/src/test/resources/Financial Sample.xlsx b/logisland-plugins/logisland-excel-plugin/src/test/resources/Financial Sample.xlsx new file mode 100644 index 000000000..f049f345b Binary files /dev/null and b/logisland-plugins/logisland-excel-plugin/src/test/resources/Financial Sample.xlsx differ diff --git a/logisland-plugins/logisland-excel-plugin/src/test/resources/log4j.properties b/logisland-plugins/logisland-excel-plugin/src/test/resources/log4j.properties new file mode 100644 index 000000000..6e15ede10 --- /dev/null +++ b/logisland-plugins/logisland-excel-plugin/src/test/resources/log4j.properties @@ -0,0 +1,37 @@ + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# Set everything to be logged to the console +log4j.rootCategory=WARN, console +log4j.appender.console=org.apache.log4j.ConsoleAppender +log4j.appender.console.target=System.err +log4j.appender.console.layout=org.apache.log4j.PatternLayout +log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n + +#org.apache.zookeeper.server.ZooKeeperServer +log4j.logger.org.apache.spark=WARN +log4j.logger.org.apache.spark.scheduler=WARN +log4j.logger.org.apache.spark.history=WARN +log4j.logger.org.spark-project.jetty=WARN +log4j.logger.io.netty=WARN +log4j.logger.org.apache.zookeeper=WARN +log4j.logger.org.apache.hadoop.ipc.Client=WARN +log4j.logger.org.apache.hadoop=WARN +log4j.logger.org.apache.hadoop.ipc.ProtobufRpcEngine=WARN +log4j.logger.parquet.hadoop=WARN +log4j.logger.com.hurence=DEBUG \ No newline at end of file diff --git a/logisland-plugins/logisland-hbase-plugin/pom.xml b/logisland-plugins/logisland-hbase-plugin/pom.xml index 5aa1c0e84..0f5b41f9d 100644 --- a/logisland-plugins/logisland-hbase-plugin/pom.xml +++ b/logisland-plugins/logisland-hbase-plugin/pom.xml @@ -5,7 +5,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-hbase-plugin Support for interacting with HBase diff --git a/logisland-plugins/logisland-hbase-plugin/src/main/java/com/hurence/logisland/processor/hbase/util/ObjectSerDe.java b/logisland-plugins/logisland-hbase-plugin/src/main/java/com/hurence/logisland/processor/hbase/util/ObjectSerDe.java index d3d054345..62264a028 100644 --- a/logisland-plugins/logisland-hbase-plugin/src/main/java/com/hurence/logisland/processor/hbase/util/ObjectSerDe.java +++ b/logisland-plugins/logisland-hbase-plugin/src/main/java/com/hurence/logisland/processor/hbase/util/ObjectSerDe.java @@ -20,18 +20,19 @@ import com.hurence.logisland.serializer.Deserializer; import com.hurence.logisland.serializer.SerializationException; import com.hurence.logisland.serializer.Serializer; +import org.apache.commons.io.IOUtils; -import java.io.ByteArrayInputStream; -import java.io.ByteArrayOutputStream; -import java.io.IOException; -import java.io.ObjectInputStream; -import java.io.ObjectOutputStream; -import java.io.OutputStream; +import java.io.*; public class ObjectSerDe implements Serializer, Deserializer { @Override - public Object deserialize(byte[] input) throws DeserializationException, IOException { + public Object deserialize(InputStream is) throws DeserializationException, IOException { + if (is == null) { + return null; + } + + byte[] input = IOUtils.toByteArray(is); if (input == null || input.length == 0) { return null; } @@ -45,7 +46,7 @@ public Object deserialize(byte[] input) throws DeserializationException, IOExcep } @Override - public void serialize(Object value, OutputStream output) throws SerializationException, IOException { + public void serialize(OutputStream output, Object value) throws SerializationException, IOException { try (final ByteArrayOutputStream bOut = new ByteArrayOutputStream(); final ObjectOutputStream objOut = new ObjectOutputStream(bOut)) { objOut.writeObject(value); diff --git a/logisland-plugins/logisland-hbase-plugin/src/main/java/com/hurence/logisland/processor/hbase/util/StringSerDe.java b/logisland-plugins/logisland-hbase-plugin/src/main/java/com/hurence/logisland/processor/hbase/util/StringSerDe.java index dd4e0e1b2..4ddb9f0e7 100644 --- a/logisland-plugins/logisland-hbase-plugin/src/main/java/com/hurence/logisland/processor/hbase/util/StringSerDe.java +++ 
b/logisland-plugins/logisland-hbase-plugin/src/main/java/com/hurence/logisland/processor/hbase/util/StringSerDe.java @@ -22,15 +22,18 @@ import com.hurence.logisland.serializer.Deserializer; import com.hurence.logisland.serializer.SerializationException; import com.hurence.logisland.serializer.Serializer; +import org.apache.commons.io.IOUtils; import java.io.IOException; +import java.io.InputStream; import java.io.OutputStream; import java.nio.charset.StandardCharsets; public class StringSerDe implements Serializer, Deserializer { @Override - public String deserialize(final byte[] value) throws DeserializationException, IOException { + public String deserialize(final InputStream input) throws DeserializationException, IOException { + byte[] value = IOUtils.toByteArray(input); if ( value == null ) { return null; } @@ -39,7 +42,7 @@ public String deserialize(final byte[] value) throws DeserializationException, I } @Override - public void serialize(final String value, final OutputStream out) throws SerializationException, IOException { + public void serialize(final OutputStream out, final String value) throws SerializationException, IOException { out.write(value.getBytes(StandardCharsets.UTF_8)); } diff --git a/logisland-plugins/logisland-hbase-plugin/src/test/java/com/hurence/logisland/processor/hbase/util/TestObjectSerDe.java b/logisland-plugins/logisland-hbase-plugin/src/test/java/com/hurence/logisland/processor/hbase/util/TestObjectSerDe.java index 6404aaff9..e521bb56c 100644 --- a/logisland-plugins/logisland-hbase-plugin/src/test/java/com/hurence/logisland/processor/hbase/util/TestObjectSerDe.java +++ b/logisland-plugins/logisland-hbase-plugin/src/test/java/com/hurence/logisland/processor/hbase/util/TestObjectSerDe.java @@ -1,12 +1,12 @@ /** * Copyright (C) 2016 Hurence (support@hurence.com) - * + *

* Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * + *

+ * http://www.apache.org/licenses/LICENSE-2.0 + *

* Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -19,31 +19,28 @@ import org.junit.Assert; import org.junit.Test; -import java.io.ByteArrayInputStream; -import java.io.ByteArrayOutputStream; -import java.io.IOException; -import java.io.ObjectInputStream; -import java.io.ObjectOutputStream; +import java.io.*; public class TestObjectSerDe { - @Test + @Test public void testDeserializeSuccessful() throws IOException { - final ObjectSerDe serDe = new ObjectSerDe(); + final ObjectSerDe serDe = new ObjectSerDe(); - final String myObject = "myObject"; - final ByteArrayOutputStream bOut = new ByteArrayOutputStream(); - final ObjectOutputStream out = new ObjectOutputStream(bOut); - out.writeObject(myObject); + final String myObject = "myObject"; + final ByteArrayOutputStream bOut = new ByteArrayOutputStream(); + final ObjectOutputStream out = new ObjectOutputStream(bOut); + out.writeObject(myObject); - byte[] myObjectBytes = bOut.toByteArray(); - Assert.assertNotNull(myObjectBytes); - Assert.assertTrue(myObjectBytes.length > 0); + byte[] myObjectBytes = bOut.toByteArray(); + Assert.assertNotNull(myObjectBytes); + Assert.assertTrue(myObjectBytes.length > 0); - final Object deserialized = serDe.deserialize(myObjectBytes); - Assert.assertTrue(deserialized instanceof String); - Assert.assertEquals(myObject, deserialized); - } + InputStream input = new ByteArrayInputStream(myObjectBytes); + final Object deserialized = serDe.deserialize(input); + Assert.assertTrue(deserialized instanceof String); + Assert.assertEquals(myObject, deserialized); + } @Test public void testDeserializeNull() throws IOException { @@ -58,7 +55,7 @@ public void testSerialize() throws IOException, ClassNotFoundException { final String myObject = "myObject"; final ObjectSerDe serDe = new ObjectSerDe(); - serDe.serialize(myObject, out); + serDe.serialize(out, myObject); final ByteArrayInputStream bIn = new ByteArrayInputStream(out.toByteArray()); final ObjectInputStream in = new ObjectInputStream(bIn); diff --git a/logisland-plugins/logisland-outlier-detection-plugin/pom.xml b/logisland-plugins/logisland-outlier-detection-plugin/pom.xml index 1b74322fa..8c988ff78 100644 --- a/logisland-plugins/logisland-outlier-detection-plugin/pom.xml +++ b/logisland-plugins/logisland-outlier-detection-plugin/pom.xml @@ -25,7 +25,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 jar diff --git a/logisland-plugins/logisland-querymatcher-plugin/pom.xml b/logisland-plugins/logisland-querymatcher-plugin/pom.xml index 3bba613f0..fb16567c7 100644 --- a/logisland-plugins/logisland-querymatcher-plugin/pom.xml +++ b/logisland-plugins/logisland-querymatcher-plugin/pom.xml @@ -26,7 +26,7 @@ http://www.w3.org/2001/XMLSchema-instance "> com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 jar diff --git a/logisland-plugins/logisland-sampling-plugin/pom.xml b/logisland-plugins/logisland-sampling-plugin/pom.xml index 5c8610378..4d7ffc6ab 100644 --- a/logisland-plugins/logisland-sampling-plugin/pom.xml +++ b/logisland-plugins/logisland-sampling-plugin/pom.xml @@ -26,7 +26,7 @@ http://www.w3.org/2001/XMLSchema-instance "> com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 jar diff --git a/logisland-plugins/logisland-sampling-plugin/src/test/resources/data/raw-data1.txt b/logisland-plugins/logisland-sampling-plugin/src/test/resources/data/raw-data1.txt index ab32b658c..19fdcbd6c 
100644 --- a/logisland-plugins/logisland-sampling-plugin/src/test/resources/data/raw-data1.txt +++ b/logisland-plugins/logisland-sampling-plugin/src/test/resources/data/raw-data1.txt @@ -76,7 +76,7 @@ 1370551800000,0.10000 1370552100000,0.10400 1370552400000,0.11250 -1370552700000,0.12.20 +1370552700000,0.13.00 1370553000000,0.11450 1370553300000,0.10950 1370553600000,0.10500 @@ -210,14 +210,14 @@ 1370592000000,0.10200 1370592300000,0.10300 1370592600000,0.10950 -1370592900000,0.12.20 -1370593200000,0.12.20 -1370593500000,0.12.20 +1370592900000,0.13.00 +1370593200000,0.13.00 +1370593500000,0.13.00 1370593800000,0.11350 1370594100000,0.11350 -1370594400000,0.12.20 +1370594400000,0.13.00 1370594700000,0.11750 -1370595000000,0.12.20 +1370595000000,0.13.00 1370595300000,0.11850 1370595600000,0.11850 1370595900000,0.12100 @@ -229,26 +229,26 @@ 1370597700000,0.12400 1370598000000,0.12350 1370598300000,0.12050 -1370598600000,0.12.20 +1370598600000,0.13.00 1370598900000,0.11550 1370599200000,0.11550 -1370599500000,0.12.20 -1370599800000,0.12.20 -1370600100000,0.12.20 -1370600400000,0.12.20 -1370600700000,0.12.20 -1370601000000,0.12.20 +1370599500000,0.13.00 +1370599800000,0.13.00 +1370600100000,0.13.00 +1370600400000,0.13.00 +1370600700000,0.13.00 +1370601000000,0.13.00 1370601300000,0.11250 1370601600000,0.11250 -1370601900000,0.12.20 +1370601900000,0.13.00 1370602200000,0.11350 1370602500000,0.11750 -1370602800000,0.12.20 +1370602800000,0.13.00 1370603100000,0.11250 1370603400000,0.10900 1370603700000,0.10800 1370604000000,0.11050 -1370604300000,0.12.20 +1370604300000,0.13.00 1370604600000,0.10900 1370604900000,0.10900 1370605200000,0.10600 @@ -271,9 +271,9 @@ 1370610300000,0.09150 1370610600000,0.09200 1370610900000,0.09150 -1370.12.20000,0.09500 -1370.12.20000,0.09550 -1370.12.20000,0.09300 +1370.13.00000,0.09500 +1370.13.00000,0.09550 +1370.13.00000,0.09300 1370612100000,0.09200 1370612400000,0.09250 1370612700000,0.09400 @@ -438,14 +438,14 @@ 1371144900000,0.11050 1371145200000,0.11250 1371145500000,0.11250 -1371145800000,0.12.20 +1371145800000,0.13.00 1371146100000,0.11250 1371146400000,0.11150 1371146700000,0.11250 -1371147000000,0.12.20 +1371147000000,0.13.00 1371147300000,0.11250 1371147600000,0.11550 -1371147900000,0.12.20 +1371147900000,0.13.00 1371148200000,0.12050 1371148500000,0.12750 1371148800000,0.13500 @@ -495,8 +495,8 @@ 1371162000000,0.11850 1371162300000,0.11850 1371162600000,0.11350 -1371162900000,0.12.20 -1371163200000,0.12.20 +1371162900000,0.13.00 +1371163200000,0.13.00 1371163500000,0.11050 1371163800000,0.10850 1371164100000,0.10650 @@ -611,28 +611,28 @@ 1371196800000,0.10650 1371197100000,0.10550 1371197400000,0.10900 -1371197700000,0.12.20 +1371197700000,0.13.00 1371198000000,0.11450 -1371198300000,0.12.20 +1371198300000,0.13.00 1371198600000,0.11050 1371198900000,0.11050 1371199200000,0.10900 -1371199500000,0.12.20 -1371199800000,0.12.20 -1371200100000,0.12.20 -1371200400000,0.12.20 +1371199500000,0.13.00 +1371199800000,0.13.00 +1371200100000,0.13.00 +1371200400000,0.13.00 1371200700000,0.11150 1371201000000,0.11550 -1371201300000,0.12.20 +1371201300000,0.13.00 1371201600000,0.11550 1371201900000,0.11550 1371202200000,0.11450 1371202500000,0.11550 -1371202800000,0.12.20 +1371202800000,0.13.00 1371203100000,0.11350 -1371203400000,0.12.20 -1371203700000,0.12.20 -1371204000000,0.12.20 +1371203400000,0.13.00 +1371203700000,0.13.00 +1371204000000,0.13.00 1371204300000,0.11150 1371204600000,0.11250 1371204900000,0.11250 @@ -912,30 +912,30 @@ 1371287100000,0.10350 
1371287400000,0.10700 1371287700000,0.10950 -1371288000000,0.12.20 -1371288300000,0.12.20 -1371288600000,0.12.20 +1371288000000,0.13.00 +1371288300000,0.13.00 +1371288600000,0.13.00 1371288900000,0.11550 -1371289200000,0.12.20 -1371289500000,0.12.20 -1371289800000,0.12.20 +1371289200000,0.13.00 +1371289500000,0.13.00 +1371289800000,0.13.00 1371290100000,0.11050 1371290400000,0.11350 -1371290700000,0.12.20 -1371291000000,0.12.20 +1371290700000,0.13.00 +1371291000000,0.13.00 1371291300000,0.11650 -1371291600000,0.12.20 +1371291600000,0.13.00 1371291900000,0.11650 -1371292200000,0.12.20 -1371292500000,0.12.20 +1371292200000,0.13.00 +1371292500000,0.13.00 1371292800000,0.11650 -1371293100000,0.12.20 +1371293100000,0.13.00 1371293400000,0.11850 -1371293700000,0.12.20 +1371293700000,0.13.00 1371294000000,0.11850 -1371294300000,0.12.20 +1371294300000,0.13.00 1371294600000,0.11750 -1371294900000,0.12.20 +1371294900000,0.13.00 1371295200000,0.12000 1371295500000,0.12150 1371295800000,0.12000 @@ -944,28 +944,28 @@ 1371296700000,0.11950 1371297000000,0.12100 1371297300000,0.11950 -1371297600000,0.12.20 +1371297600000,0.13.00 1371297900000,0.11350 -1371298200000,0.12.20 +1371298200000,0.13.00 1371298500000,0.11150 -1371298800000,0.12.20 -1371299100000,0.12.20 -1371299400000,0.12.20 -1371299700000,0.12.20 +1371298800000,0.13.00 +1371299100000,0.13.00 +1371299400000,0.13.00 +1371299700000,0.13.00 1371300000000,0.11350 1371300300000,0.11750 -1371300600000,0.12.20 -1371300900000,0.12.20 -1371301200000,0.12.20 -1371301500000,0.12.20 +1371300600000,0.13.00 +1371300900000,0.13.00 +1371301200000,0.13.00 +1371301500000,0.13.00 1371301800000,0.11450 -1371302100000,0.12.20 +1371302100000,0.13.00 1371302400000,0.11150 -1371302700000,0.12.20 +1371302700000,0.13.00 1371303000000,0.11250 -1371303300000,0.12.20 +1371303300000,0.13.00 1371303600000,0.11050 -1371303900000,0.12.20 +1371303900000,0.13.00 1371304200000,0.11050 1371304500000,0.11050 1371304800000,0.10950 @@ -980,7 +980,7 @@ 1371307500000,0.10500 1371307800000,0.10900 1371308100000,0.11150 -1371308400000,0.12.20 +1371308400000,0.13.00 1371308700000,0.10900 1371309000000,0.10650 1371309300000,0.10400 @@ -1216,55 +1216,55 @@ 1371378300000,0.10200 1371378600000,0.10600 1371378900000,0.10900 -1371379200000,0.12.20 -1371379500000,0.12.20 -1371379800000,0.12.20 -1371380100000,0.12.20 -1371380400000,0.12.20 -1371380700000,0.12.20 +1371379200000,0.13.00 +1371379500000,0.13.00 +1371379800000,0.13.00 +1371380100000,0.13.00 +1371380400000,0.13.00 +1371380700000,0.13.00 1371381000000,0.11750 -1371381300000,0.12.20 -1371381600000,0.12.20 -1371381900000,0.12.20 -1371382200000,0.12.20 +1371381300000,0.13.00 +1371381600000,0.13.00 +1371381900000,0.13.00 +1371382200000,0.13.00 1371382500000,0.11650 -1371382800000,0.12.20 -1371383100000,0.12.20 +1371382800000,0.13.00 +1371383100000,0.13.00 1371383400000,0.11450 -1371383700000,0.12.20 -1371384000000,0.12.20 +1371383700000,0.13.00 +1371384000000,0.13.00 1371384300000,0.11250 1371384600000,0.11650 1371384900000,0.11750 1371385200000,0.11750 1371385500000,0.11650 -1371385800000,0.12.20 +1371385800000,0.13.00 1371386100000,0.11650 1371386400000,0.12050 1371386700000,0.12300 1371387000000,0.12200 1371387300000,0.12050 1371387600000,0.12000 -1371387900000,0.12.20 -1371388200000,0.12.20 -1371388500000,0.12.20 +1371387900000,0.13.00 +1371388200000,0.13.00 +1371388500000,0.13.00 1371388800000,0.11450 1371389100000,0.11350 1371389400000,0.11450 1371389700000,0.11550 1371390000000,0.11750 1371390300000,0.11650 -1371390600000,0.12.20 
-1371390900000,0.12.20 +1371390600000,0.13.00 +1371390900000,0.13.00 1371391200000,0.11350 -1371391500000,0.12.20 -1371391800000,0.12.20 +1371391500000,0.13.00 +1371391800000,0.13.00 1371392100000,0.11350 -1371392400000,0.12.20 +1371392400000,0.13.00 1371392700000,0.11250 -1371393000000,0.12.20 -1371393300000,0.12.20 -1371393600000,0.12.20 +1371393000000,0.13.00 +1371393300000,0.13.00 +1371393600000,0.13.00 1371393900000,0.11050 1371394200000,0.10800 1371394500000,0.10550 @@ -1563,7 +1563,7 @@ 1371482400000,0.12450 1371482700000,0.12000 1371483000000,0.11550 -1371483300000,0.12.20 +1371483300000,0.13.00 1371483600000,0.11450 1371483900000,0.10950 1371484200000,0.10600 @@ -1765,34 +1765,34 @@ 1371543000000,0.10550 1371543300000,0.10850 1371543600000,0.11150 -1371543900000,0.12.20 +1371543900000,0.13.00 1371544200000,0.11450 -1371544500000,0.12.20 -1371544800000,0.12.20 +1371544500000,0.13.00 +1371544800000,0.13.00 1371545100000,0.11350 -1371545400000,0.12.20 +1371545400000,0.13.00 1371545700000,0.11450 -1371546000000,0.12.20 +1371546000000,0.13.00 1371546300000,0.11450 1371546600000,0.11150 1371546900000,0.10850 1371547200000,0.10750 1371547500000,0.10800 -1371547800000,0.12.20 -1371548100000,0.12.20 +1371547800000,0.13.00 +1371548100000,0.13.00 1371548400000,0.11550 1371548700000,0.11250 -1371549000000,0.12.20 +1371549000000,0.13.00 1371549300000,0.10900 -1371549600000,0.12.20 +1371549600000,0.13.00 1371549900000,0.10950 -1371550200000,0.12.20 -1371550500000,0.12.20 -1371550800000,0.12.20 +1371550200000,0.13.00 +1371550500000,0.13.00 +1371550800000,0.13.00 1371551100000,0.11550 1371551400000,0.11450 1371551700000,0.11250 -1371552000000,0.12.20 +1371552000000,0.13.00 1371552300000,0.10850 1371552600000,0.10550 1371552900000,0.10400 @@ -2046,31 +2046,31 @@ 1371627300000,0.10650 1371627600000,0.10700 1371627900000,0.10950 -1371628200000,0.12.20 +1371628200000,0.13.00 1371628500000,0.11050 -1371628800000,0.12.20 -1371629100000,0.12.20 -1371629400000,0.12.20 -1371629700000,0.12.20 -1371630000000,0.12.20 +1371628800000,0.13.00 +1371629100000,0.13.00 +1371629400000,0.13.00 +1371629700000,0.13.00 +1371630000000,0.13.00 1371630300000,0.10850 1371630600000,0.11150 -1371630900000,0.12.20 +1371630900000,0.13.00 1371631200000,0.11250 -1371631500000,0.12.20 -1371631800000,0.12.20 +1371631500000,0.13.00 +1371631800000,0.13.00 1371632100000,0.10850 1371632400000,0.11050 1371632700000,0.11050 1371633000000,0.11050 -1371633300000,0.12.20 -1371633600000,0.12.20 +1371633300000,0.13.00 +1371633600000,0.13.00 1371633900000,0.11350 1371634200000,0.11150 1371634500000,0.11050 1371634800000,0.10850 -1371635100000,0.12.20 -1371635400000,0.12.20 +1371635100000,0.13.00 +1371635400000,0.13.00 1371635700000,0.10900 1371636000000,0.10750 1371636300000,0.10800 @@ -2338,31 +2338,31 @@ 1371714900000,0.10250 1371715200000,0.10400 1371715500000,0.10700 -1371715800000,0.12.20 +1371715800000,0.13.00 1371716100000,0.10850 1371716400000,0.10800 1371716700000,0.10750 1371717000000,0.10900 1371717300000,0.10800 -1371717600000,0.12.20 -1371717900000,0.12.20 -1371718200000,0.12.20 -1371718500000,0.12.20 -1371718800000,0.12.20 -1371719100000,0.12.20 -1371719400000,0.12.20 -1371719700000,0.12.20 +1371717600000,0.13.00 +1371717900000,0.13.00 +1371718200000,0.13.00 +1371718500000,0.13.00 +1371718800000,0.13.00 +1371719100000,0.13.00 +1371719400000,0.13.00 +1371719700000,0.13.00 1371720000000,0.11150 1371720300000,0.11050 -1371720600000,0.12.20 -1371720900000,0.12.20 +1371720600000,0.13.00 +1371720900000,0.13.00 1371721200000,0.10850 
1371721500000,0.10950 -1371721800000,0.12.20 -1371722100000,0.12.20 +1371721800000,0.13.00 +1371722100000,0.13.00 1371722400000,0.11350 1371722700000,0.11550 -1371723000000,0.12.20 +1371723000000,0.13.00 1371723300000,0.10950 1371723600000,0.10800 1371723900000,0.10750 @@ -2422,13 +2422,13 @@ 1371740100000,0.12600 1371740400000,0.12450 1371740700000,0.12100 -1371741000000,0.12.20 +1371741000000,0.13.00 1371741300000,0.11950 1371741600000,0.11850 -1371741900000,0.12.20 -1371742200000,0.12.20 +1371741900000,0.13.00 +1371742200000,0.13.00 1371742500000,0.11350 -1371742800000,0.12.20 +1371742800000,0.13.00 1371743100000,0.12250 1371743400000,0.12750 1371743700000,0.12800 @@ -2438,7 +2438,7 @@ 1371744900000,0.12300 1371745200000,0.12300 1371745500000,0.12150 -1371745800000,0.12.20 +1371745800000,0.13.00 1371746100000,0.11150 1371746400000,0.10900 1371746700000,0.10800 @@ -2532,8 +2532,8 @@ 1371773100000,0.10300 1371773400000,0.10350 1371773700000,0.10600 -1371774000000,0.12.20 -1371774300000,0.12.20 +1371774000000,0.13.00 +1371774300000,0.13.00 1371774600000,0.12300 1371774900000,0.12950 1371775200000,0.13300 @@ -2561,7 +2561,7 @@ 1371781800000,0.12650 1371782100000,0.12450 1371782400000,0.12100 -1371782700000,0.12.20 +1371782700000,0.13.00 1371783000000,0.11750 1371783300000,0.11650 1371783600000,0.11150 @@ -2630,33 +2630,33 @@ 1371802500000,0.10150 1371802800000,0.10000 1371803100000,0.10650 -1371803400000,0.12.20 +1371803400000,0.13.00 1371803700000,0.11250 -1371804000000,0.12.20 -1371804300000,0.12.20 +1371804000000,0.13.00 +1371804300000,0.13.00 1371804600000,0.11250 1371804900000,0.11250 1371805200000,0.11450 -1371805500000,0.12.20 +1371805500000,0.13.00 1371805800000,0.11550 -1371806100000,0.12.20 -1371806400000,0.12.20 +1371806100000,0.13.00 +1371806400000,0.13.00 1371806700000,0.11450 -1371807000000,0.12.20 -1371807300000,0.12.20 -1371807600000,0.12.20 -1371807900000,0.12.20 +1371807000000,0.13.00 +1371807300000,0.13.00 +1371807600000,0.13.00 +1371807900000,0.13.00 1371808200000,0.11350 -1371808500000,0.12.20 -1371808800000,0.12.20 -1371809100000,0.12.20 -1371809400000,0.12.20 +1371808500000,0.13.00 +1371808800000,0.13.00 +1371809100000,0.13.00 +1371809400000,0.13.00 1371809700000,0.11250 1371810000000,0.11450 1371810300000,0.11550 -1371810600000,0.12.20 -1371810900000,0.12.20 -1371811200000,0.12.20 +1371810600000,0.13.00 +1371810900000,0.13.00 +1371811200000,0.13.00 1371811500000,0.10900 1371811800000,0.10750 1371812100000,0.10600 @@ -2713,7 +2713,7 @@ 1371827400000,0.10800 1371827700000,0.11050 1371828000000,0.11150 -1371828300000,0.12.20 +1371828300000,0.13.00 1371828600000,0.11150 1371828900000,0.10850 1371829200000,0.10700 @@ -2735,13 +2735,13 @@ 1371834000000,0.09900 1371834300000,0.10800 1371834600000,0.11050 -1371834900000,0.12.20 -1371835200000,0.12.20 +1371834900000,0.13.00 +1371835200000,0.13.00 1371835500000,0.11150 -1371835800000,0.12.20 -1371836100000,0.12.20 -1371836400000,0.12.20 -1371836700000,0.12.20 +1371835800000,0.13.00 +1371836100000,0.13.00 +1371836400000,0.13.00 +1371836700000,0.13.00 1371837000000,0.12350 1371837300000,0.12700 1371837600000,0.12800 @@ -2754,10 +2754,10 @@ 1371839700000,0.12750 1371840000000,0.12350 1371840300000,0.12250 -1371840600000,0.12.20 -1371840900000,0.12.20 -1371841200000,0.12.20 -1371841500000,0.12.20 +1371840600000,0.13.00 +1371840900000,0.13.00 +1371841200000,0.13.00 +1371841500000,0.13.00 1371841800000,0.11350 1371842100000,0.11150 1371842400000,0.11050 @@ -2935,22 +2935,22 @@ 1371894000000,0.10600 1371894300000,0.10950 
1371894600000,0.11050 -1371894900000,0.12.20 +1371894900000,0.13.00 1371895200000,0.11350 -1371895500000,0.12.20 +1371895500000,0.13.00 1371895800000,0.12300 1371896100000,0.11050 1371896400000,0.10900 -1371896700000,0.12.20 -1371897000000,0.12.20 -1371897300000,0.12.20 -1371897600000,0.12.20 -1371897900000,0.12.20 +1371896700000,0.13.00 +1371897000000,0.13.00 +1371897300000,0.13.00 +1371897600000,0.13.00 +1371897900000,0.13.00 1371898200000,0.11750 1371898500000,0.11450 1371898800000,0.11350 -1371899100000,0.12.20 -1371899400000,0.12.20 +1371899100000,0.13.00 +1371899400000,0.13.00 1371899700000,0.11250 1371900000000,0.12100 1371900300000,0.12100 @@ -2964,16 +2964,16 @@ 1371902700000,0.12150 1371903000000,0.12050 1371903300000,0.11950 -1371903600000,0.12.20 +1371903600000,0.13.00 1371903900000,0.11750 -1371904200000,0.12.20 +1371904200000,0.13.00 1371904500000,0.11550 1371904800000,0.12000 -1371905100000,0.12.20 -1371905400000,0.12.20 +1371905100000,0.13.00 +1371905400000,0.13.00 1371905700000,0.11350 1371906000000,0.11250 -1371906300000,0.12.20 +1371906300000,0.13.00 1371906600000,0.11050 1371906900000,0.10950 1371907200000,0.10950 @@ -3226,13 +3226,13 @@ 1371981300000,0.11050 1371981600000,0.11450 1371981900000,0.11350 -1371982200000,0.12.20 -1371982500000,0.12.20 +1371982200000,0.13.00 +1371982500000,0.13.00 1371982800000,0.11250 1371983100000,0.11550 1371983400000,0.11450 1371983700000,0.11250 -1371984000000,0.12.20 +1371984000000,0.13.00 1371984300000,0.11450 1371984600000,0.11350 1371984900000,0.11650 @@ -3246,8 +3246,8 @@ 1371987300000,0.12150 1371987600000,0.12100 1371987900000,0.11950 -1371988200000,0.12.20 -1371988500000,0.12.20 +1371988200000,0.13.00 +1371988500000,0.13.00 1371988800000,0.11850 1371989100000,0.11850 1371989400000,0.11950 @@ -3265,23 +3265,23 @@ 1371993000000,0.12350 1371993300000,0.12400 1371993600000,0.11950 -1371993900000,0.12.20 -1371994200000,0.12.20 +1371993900000,0.13.00 +1371994200000,0.13.00 1371994500000,0.11750 1371994800000,0.12050 1371995100000,0.12200 1371995400000,0.11950 -1371995700000,0.12.20 -1371996000000,0.12.20 +1371995700000,0.13.00 +1371996000000,0.13.00 1371996300000,0.11650 -1371996600000,0.12.20 -1371996900000,0.12.20 -1371997200000,0.12.20 +1371996600000,0.13.00 +1371996900000,0.13.00 +1371997200000,0.13.00 1371997500000,0.11550 -1371997800000,0.12.20 -1371998100000,0.12.20 -1371998400000,0.12.20 -1371998700000,0.12.20 +1371997800000,0.13.00 +1371998100000,0.13.00 +1371998400000,0.13.00 +1371998700000,0.13.00 1371999000000,0.11650 1371999300000,0.10750 1371999600000,0.10450 @@ -3289,7 +3289,7 @@ 1372000200000,0.10150 1372000500000,0.10050 1372000800000,0.10000 -13720.12.2000,0.09900 +13720.13.0000,0.09900 1372001400000,0.10100 1372001700000,0.10200 1372002000000,0.10050 @@ -3389,7 +3389,7 @@ 1372030200000,0.09200 1372030500000,0.09350 1372030800000,0.09400 -13720.12.2000,0.09350 +13720.13.0000,0.09350 1372031400000,0.09350 1372031700000,0.09300 1372032000000,0.09050 @@ -3489,7 +3489,7 @@ 1372060200000,0.09700 1372060500000,0.09600 1372060800000,0.09650 -13720.12.2000,0.09950 +13720.13.0000,0.09950 1372061400000,0.09900 1372061700000,0.09900 1372062000000,0.10000 @@ -3500,23 +3500,23 @@ 1372063500000,0.10950 1372063800000,0.11150 1372064100000,0.11150 -1372064400000,0.12.20 +1372064400000,0.13.00 1372064700000,0.11350 -1372065000000,0.12.20 +1372065000000,0.13.00 1372065300000,0.11250 -1372065600000,0.12.20 -1372065900000,0.12.20 -1372066200000,0.12.20 -1372066500000,0.12.20 +1372065600000,0.13.00 +1372065900000,0.13.00 
+1372066200000,0.13.00 +1372066500000,0.13.00 1372066800000,0.11650 -1372067100000,0.12.20 +1372067100000,0.13.00 1372067400000,0.11450 1372067700000,0.11150 -1372068000000,0.12.20 -1372068300000,0.12.20 +1372068000000,0.13.00 +1372068300000,0.13.00 1372068600000,0.10950 1372068900000,0.10850 -1372069200000,0.12.20 +1372069200000,0.13.00 1372069500000,0.10900 1372069800000,0.10850 1372070100000,0.11050 @@ -3589,7 +3589,7 @@ 1372090200000,0.08600 1372090500000,0.08700 1372090800000,0.08500 -13720.12.2000,0.08350 +13720.13.0000,0.08350 1372091400000,0.08250 1372091700000,0.08350 1372092000000,0.08550 @@ -3794,8 +3794,8 @@ 1372151700000,0.10900 1372152000000,0.10950 1372152300000,0.10900 -1372152600000,0.12.20 -1372152900000,0.12.20 +1372152600000,0.13.00 +1372152900000,0.13.00 1372153200000,0.11350 1372153500000,0.11150 1372153800000,0.10800 @@ -4082,7 +4082,7 @@ 1372238100000,0.11250 1372238400000,0.11150 1372238700000,0.10950 -1372239000000,0.12.20 +1372239000000,0.13.00 1372239300000,0.10900 1372239600000,0.10650 1372239900000,0.10450 @@ -4348,33 +4348,33 @@ 1372317900000,0.10550 1372318200000,0.10550 1372318500000,0.10950 -1372318800000,0.12.20 -1372319100000,0.12.20 +1372318800000,0.13.00 +1372319100000,0.13.00 1372319400000,0.11350 -1372319700000,0.12.20 -1372320000000,0.12.20 +1372319700000,0.13.00 +1372320000000,0.13.00 1372320300000,0.10700 1372320600000,0.10700 1372320900000,0.10900 1372321200000,0.10850 -1372321500000,0.12.20 -1372321800000,0.12.20 +1372321500000,0.13.00 +1372321800000,0.13.00 1372322100000,0.11150 -1372322400000,0.12.20 +1372322400000,0.13.00 1372322700000,0.11450 1372323000000,0.11350 -1372323300000,0.12.20 -1372323600000,0.12.20 -1372323900000,0.12.20 -1372324200000,0.12.20 +1372323300000,0.13.00 +1372323600000,0.13.00 +1372323900000,0.13.00 +1372324200000,0.13.00 1372324500000,0.11050 -1372324800000,0.12.20 +1372324800000,0.13.00 1372325100000,0.11050 1372325400000,0.10850 1372325700000,0.10650 1372326000000,0.10700 1372326300000,0.11050 -1372326600000,0.12.20 +1372326600000,0.13.00 1372326900000,0.11250 1372327200000,0.11250 1372327500000,0.10700 @@ -4702,13 +4702,13 @@ 1372424100000,0.12100 1372424400000,0.12100 1372424700000,0.11850 -1372425000000,0.12.20 -1372425300000,0.12.20 +1372425000000,0.13.00 +1372425300000,0.13.00 1372425600000,0.12050 1372425900000,0.12150 1372426200000,0.11750 -1372426500000,0.12.20 -1372426800000,0.12.20 +1372426500000,0.13.00 +1372426800000,0.13.00 1372427100000,0.10900 1372427400000,0.10950 1372427700000,0.10600 @@ -4845,30 +4845,30 @@ 1372467000000,0.10800 1372467300000,0.10950 1372467600000,0.10950 -1372467900000,0.12.20 +1372467900000,0.13.00 1372468200000,0.11750 1372468500000,0.11550 -1372468800000,0.12.20 +1372468800000,0.13.00 1372469100000,0.11750 -1372469400000,0.12.20 -1372469700000,0.12.20 -1372470000000,0.12.20 -1372470300000,0.12.20 -1372470600000,0.12.20 +1372469400000,0.13.00 +1372469700000,0.13.00 +1372470000000,0.13.00 +1372470300000,0.13.00 +1372470600000,0.13.00 1372470900000,0.11950 1372471200000,0.11950 -1372471500000,0.12.20 +1372471500000,0.13.00 1372471800000,0.11550 1372472100000,0.11450 -1372472400000,0.12.20 +1372472400000,0.13.00 1372472700000,0.11450 1372473000000,0.11450 1372473300000,0.11450 -1372473600000,0.12.20 +1372473600000,0.13.00 1372473900000,0.11250 1372474200000,0.11250 -1372474500000,0.12.20 -1372474800000,0.12.20 +1372474500000,0.13.00 +1372474800000,0.13.00 1372475100000,0.13050 1372475400000,0.13600 1372475700000,0.13800 @@ -4915,8 +4915,8 @@ 1372488000000,0.11850 
1372488300000,0.11450 1372488600000,0.11250 -1372488900000,0.12.20 -1372489200000,0.12.20 +1372488900000,0.13.00 +1372489200000,0.13.00 1372489500000,0.10450 1372489800000,0.10350 1372490100000,0.10000 @@ -4951,52 +4951,52 @@ 1372498800000,0.10450 1372499100000,0.10550 1372499400000,0.10850 -1372499700000,0.12.20 -1372500000000,0.12.20 -1372500300000,0.12.20 -1372500600000,0.12.20 -1372500900000,0.12.20 -1372501200000,0.12.20 +1372499700000,0.13.00 +1372500000000,0.13.00 +1372500300000,0.13.00 +1372500600000,0.13.00 +1372500900000,0.13.00 +1372501200000,0.13.00 1372501500000,0.11350 1372501800000,0.11550 -1372502100000,0.12.20 +1372502100000,0.13.00 1372502400000,0.11650 1372502700000,0.11750 1372503000000,0.11750 -1372503300000,0.12.20 -1372503600000,0.12.20 +1372503300000,0.13.00 +1372503600000,0.13.00 1372503900000,0.11550 1372504200000,0.11650 -1372504500000,0.12.20 -1372504800000,0.12.20 +1372504500000,0.13.00 +1372504800000,0.13.00 1372505100000,0.11650 1372505400000,0.11850 1372505700000,0.12050 -1372506000000,0.12.20 -1372506300000,0.12.20 -1372506600000,0.12.20 +1372506000000,0.13.00 +1372506300000,0.13.00 +1372506600000,0.13.00 1372506900000,0.11450 1372507200000,0.11450 -1372507500000,0.12.20 -1372507800000,0.12.20 +1372507500000,0.13.00 +1372507800000,0.13.00 1372508100000,0.11450 1372508400000,0.11450 -1372508700000,0.12.20 -1372509000000,0.12.20 -1372509300000,0.12.20 +1372508700000,0.13.00 +1372509000000,0.13.00 +1372509300000,0.13.00 1372509600000,0.11850 1372509900000,0.12050 -1372510200000,0.12.20 -1372510500000,0.12.20 -1372510800000,0.12.20 +1372510200000,0.13.00 +1372510500000,0.13.00 +1372510800000,0.13.00 1372511100000,0.11150 1372511400000,0.10950 -1372511700000,0.12.20 +1372511700000,0.13.00 1372512000000,0.10900 -1372512300000,0.12.20 +1372512300000,0.13.00 1372512600000,0.11150 -1372512900000,0.12.20 -1372513200000,0.12.20 +1372512900000,0.13.00 +1372513200000,0.13.00 1372513500000,0.10900 1372513800000,0.10900 1372514100000,0.10850 @@ -5248,62 +5248,62 @@ 1372587900000,0.10900 1372588200000,0.10950 1372588500000,0.11050 -1372588800000,0.12.20 +1372588800000,0.13.00 1372589100000,0.11250 1372589400000,0.11450 -1372589700000,0.12.20 -1372590000000,0.12.20 +1372589700000,0.13.00 +1372590000000,0.13.00 1372590300000,0.11750 -1372590600000,0.12.20 +1372590600000,0.13.00 1372590900000,0.11850 -1372591200000,0.12.20 -1372591500000,0.12.20 +1372591200000,0.13.00 +1372591500000,0.13.00 1372591800000,0.11650 -1372592100000,0.12.20 +1372592100000,0.13.00 1372592400000,0.12000 1372592700000,0.12250 1372593000000,0.12000 1372593300000,0.12000 1372593600000,0.12000 -1372593900000,0.12.20 +1372593900000,0.13.00 1372594200000,0.11850 1372594500000,0.11850 -1372594800000,0.12.20 -1372595100000,0.12.20 +1372594800000,0.13.00 +1372595100000,0.13.00 1372595400000,0.11850 -1372595700000,0.12.20 +1372595700000,0.13.00 1372596000000,0.11450 1372596300000,0.11450 -1372596600000,0.12.20 +1372596600000,0.13.00 1372596900000,0.11650 1372597200000,0.11650 1372597500000,0.11750 1372597800000,0.12050 1372598100000,0.11450 -1372598400000,0.12.20 -1372598700000,0.12.20 -1372599000000,0.12.20 -1372599300000,0.12.20 -1372599600000,0.12.20 -1372599900000,0.12.20 -1372600200000,0.12.20 +1372598400000,0.13.00 +1372598700000,0.13.00 +1372599000000,0.13.00 +1372599300000,0.13.00 +1372599600000,0.13.00 +1372599900000,0.13.00 +1372600200000,0.13.00 1372600500000,0.11450 1372600800000,0.11450 1372601100000,0.11450 -1372601400000,0.12.20 -1372601700000,0.12.20 -1372602000000,0.12.20 
-1372602300000,0.12.20 +1372601400000,0.13.00 +1372601700000,0.13.00 +1372602000000,0.13.00 +1372602300000,0.13.00 1372602600000,0.11250 -1372602900000,0.12.20 -1372603200000,0.12.20 -1372603500000,0.12.20 +1372602900000,0.13.00 +1372603200000,0.13.00 +1372603500000,0.13.00 1372603800000,0.10850 1372604100000,0.10500 1372604400000,0.10500 1372604700000,0.10650 -1372605000000,0.12.20 -1372605300000,0.12.20 +1372605000000,0.13.00 +1372605300000,0.13.00 1372605600000,0.10800 1372605900000,0.10450 1372606200000,0.10250 @@ -5506,4 +5506,4 @@ 1372665300000,0.09950 1372665600000,0.09950 1372665900000,0.09950 -1372666200000,0.12.20 +1372666200000,0.13.00 diff --git a/logisland-plugins/logisland-sampling-plugin/src/test/resources/data/raw-data2.txt b/logisland-plugins/logisland-sampling-plugin/src/test/resources/data/raw-data2.txt index 682efb01a..8c6769d81 100644 --- a/logisland-plugins/logisland-sampling-plugin/src/test/resources/data/raw-data2.txt +++ b/logisland-plugins/logisland-sampling-plugin/src/test/resources/data/raw-data2.txt @@ -75,7 +75,7 @@ 1370551800000,0.10000 1370552100000,0.10400 1370552400000,0.11250 -1370552700000,0.12.20 +1370552700000,0.13.00 1370553000000,0.11450 1370553300000,0.10950 1370553600000,0.10500 @@ -209,14 +209,14 @@ 1370592000000,0.10200 1370592300000,0.10300 1370592600000,0.10950 -1370592900000,0.12.20 -1370593200000,0.12.20 -1370593500000,0.12.20 +1370592900000,0.13.00 +1370593200000,0.13.00 +1370593500000,0.13.00 1370593800000,0.11350 1370594100000,0.11350 -1370594400000,0.12.20 +1370594400000,0.13.00 1370594700000,0.11750 -1370595000000,0.12.20 +1370595000000,0.13.00 1370595300000,0.11850 1370595600000,0.11850 1370595900000,0.12100 @@ -228,26 +228,26 @@ 1370597700000,0.12400 1370598000000,0.12350 1370598300000,0.12050 -1370598600000,0.12.20 +1370598600000,0.13.00 1370598900000,0.11550 1370599200000,0.11550 -1370599500000,0.12.20 -1370599800000,0.12.20 -1370600100000,0.12.20 -1370600400000,0.12.20 -1370600700000,0.12.20 -1370601000000,0.12.20 +1370599500000,0.13.00 +1370599800000,0.13.00 +1370600100000,0.13.00 +1370600400000,0.13.00 +1370600700000,0.13.00 +1370601000000,0.13.00 1370601300000,0.11250 1370601600000,0.11250 -1370601900000,0.12.20 +1370601900000,0.13.00 1370602200000,0.11350 1370602500000,0.11750 -1370602800000,0.12.20 +1370602800000,0.13.00 1370603100000,0.11250 1370603400000,0.10900 1370603700000,0.10800 1370604000000,0.11050 -1370604300000,0.12.20 +1370604300000,0.13.00 1370604600000,0.10900 1370604900000,0.10900 1370605200000,0.10600 @@ -270,9 +270,9 @@ 1370610300000,0.09150 1370610600000,0.09200 1370610900000,0.09150 -1370.12.20000,0.09500 -1370.12.20000,0.09550 -1370.12.20000,0.09300 +1370.13.00000,0.09500 +1370.13.00000,0.09550 +1370.13.00000,0.09300 1370612100000,0.09200 1370612400000,0.09250 1370612700000,0.09400 @@ -514,51 +514,51 @@ 1370683500000,0.10200 1370683800000,0.10250 1370684100000,0.10800 -1370684400000,0.12.20 +1370684400000,0.13.00 1370684700000,0.11950 1370685000000,0.12350 1370685300000,0.12450 1370685600000,0.12150 -1370685900000,0.12.20 -1370686200000,0.12.20 +1370685900000,0.13.00 +1370686200000,0.13.00 1370686500000,0.11750 1370686800000,0.11550 1370687100000,0.11750 1370687400000,0.11750 1370687700000,0.11550 -1370688000000,0.12.20 -1370688300000,0.12.20 +1370688000000,0.13.00 +1370688300000,0.13.00 1370688600000,0.11750 1370688900000,0.11750 1370689200000,0.11650 -1370689500000,0.12.20 -1370689800000,0.12.20 +1370689500000,0.13.00 +1370689800000,0.13.00 1370690100000,0.11750 -1370690400000,0.12.20 
-1370690700000,0.12.20 +1370690400000,0.13.00 +1370690700000,0.13.00 1370691000000,0.11850 1370691300000,0.11750 1370691600000,0.11750 -1370691900000,0.12.20 +1370691900000,0.13.00 1370692200000,0.11550 -1370692500000,0.12.20 +1370692500000,0.13.00 1370692800000,0.11450 -1370693100000,0.12.20 -1370693400000,0.12.20 -1370693700000,0.12.20 -1370694000000,0.12.20 +1370693100000,0.13.00 +1370693400000,0.13.00 +1370693700000,0.13.00 +1370694000000,0.13.00 1370694300000,0.11550 -1370694600000,0.12.20 +1370694600000,0.13.00 1370694900000,0.11250 -1370695200000,0.12.20 +1370695200000,0.13.00 1370695500000,0.11650 -1370695800000,0.12.20 +1370695800000,0.13.00 1370696100000,0.11250 1370696400000,0.11150 1370696700000,0.11150 -1370697000000,0.12.20 +1370697000000,0.13.00 1370697300000,0.11050 -1370697600000,0.12.20 +1370697600000,0.13.00 1370697900000,0.10850 1370698200000,0.10750 1370698500000,0.10500 @@ -603,9 +603,9 @@ 1370710200000,0.09600 1370710500000,0.09850 1370710800000,0.09800 -1370.12.20000,0.09550 -1370.12.20000,0.09550 -1370.12.20000,0.09500 +1370.13.00000,0.09550 +1370.13.00000,0.09550 +1370.13.00000,0.09500 1370712000000,0.09550 1370712300000,0.09550 1370712600000,0.09800 @@ -620,8 +620,8 @@ 1370715300000,0.10150 1370715600000,0.10450 1370715900000,0.10800 -1370716200000,0.12.20 -1370716500000,0.12.20 +1370716200000,0.13.00 +1370716500000,0.13.00 1370716800000,0.11450 1370717100000,0.11150 1370717400000,0.11050 @@ -765,9 +765,9 @@ 1370758800000,0.06500 1370759100000,0.07500 1370759400000,0.10850 -1370759700000,0.12.20 -1370760000000,0.12.20 -1370760300000,0.12.20 +1370759700000,0.13.00 +1370760000000,0.13.00 +1370760300000,0.13.00 1370760600000,0.10950 1370760900000,0.10750 1370761200000,0.10450 @@ -801,7 +801,7 @@ 1370769600000,0.09250 1370769900000,0.09950 1370770200000,0.10750 -1370770500000,0.12.20 +1370770500000,0.13.00 1370770800000,0.12450 1370771100000,0.12950 1370771400000,0.13450 @@ -827,7 +827,7 @@ 1370777400000,0.12200 1370777700000,0.12100 1370778000000,0.11850 -1370778300000,0.12.20 +1370778300000,0.13.00 1370778600000,0.12050 1370778900000,0.12050 1370779200000,0.12100 @@ -842,39 +842,39 @@ 1370781900000,0.12650 1370782200000,0.12300 1370782500000,0.11950 -1370782800000,0.12.20 +1370782800000,0.13.00 1370783100000,0.11750 1370783400000,0.11850 1370783700000,0.11750 -1370784000000,0.12.20 +1370784000000,0.13.00 1370784300000,0.11450 -1370784600000,0.12.20 +1370784600000,0.13.00 1370784900000,0.11650 1370785200000,0.11750 1370785500000,0.11750 1370785800000,0.12100 -1370786100000,0.12.20 -1370786400000,0.12.20 -1370786700000,0.12.20 -1370787000000,0.12.20 +1370786100000,0.13.00 +1370786400000,0.13.00 +1370786700000,0.13.00 +1370787000000,0.13.00 1370787300000,0.11150 -1370787600000,0.12.20 -1370787900000,0.12.20 -1370788200000,0.12.20 -1370788500000,0.12.20 -1370788800000,0.12.20 +1370787600000,0.13.00 +1370787900000,0.13.00 +1370788200000,0.13.00 +1370788500000,0.13.00 +1370788800000,0.13.00 1370789100000,0.11550 1370789400000,0.11450 -1370789700000,0.12.20 +1370789700000,0.13.00 1370790000000,0.11450 1370790300000,0.11450 1370790600000,0.11350 -1370790900000,0.12.20 -1370791200000,0.12.20 +1370790900000,0.13.00 +1370791200000,0.13.00 1370791500000,0.11150 -1370791800000,0.12.20 +1370791800000,0.13.00 1370792100000,0.11250 -1370792400000,0.12.20 +1370792400000,0.13.00 1370792700000,0.10850 1370793000000,0.10750 1370793300000,0.10500 @@ -931,15 +931,15 @@ 1370808600000,0.11050 1370808900000,0.10950 1370809200000,0.11050 -1370809500000,0.12.20 +1370809500000,0.13.00 
1370809800000,0.10900 1370810100000,0.10600 1370810400000,0.10550 1370810700000,0.10500 -1370.12.20000,0.10600 -1370.12.20000,0.10850 -1370.12.20000,0.10700 -1370.12.20000,0.10750 +1370.13.00000,0.10600 +1370.13.00000,0.10850 +1370.13.00000,0.10700 +1370.13.00000,0.10750 1370812200000,0.10800 1370812500000,0.10750 1370812800000,0.10500 @@ -1065,51 +1065,51 @@ 1370848800000,0.09600 1370849100000,0.09850 1370849400000,0.10100 -1370849700000,0.12.20 -1370850000000,0.12.20 +1370849700000,0.13.00 +1370850000000,0.13.00 1370850300000,0.11950 -1370850600000,0.12.20 -1370850900000,0.12.20 +1370850600000,0.13.00 +1370850900000,0.13.00 1370851200000,0.10900 1370851500000,0.10550 1370851800000,0.10550 1370852100000,0.10800 -1370852400000,0.12.20 -1370852700000,0.12.20 +1370852400000,0.13.00 +1370852700000,0.13.00 1370853000000,0.11850 1370853300000,0.11950 1370853600000,0.11750 1370853900000,0.11650 -1370854200000,0.12.20 +1370854200000,0.13.00 1370854500000,0.11650 -1370854800000,0.12.20 +1370854800000,0.13.00 1370855100000,0.11650 -1370855400000,0.12.20 +1370855400000,0.13.00 1370855700000,0.11550 -1370856000000,0.12.20 -1370856300000,0.12.20 +1370856000000,0.13.00 +1370856300000,0.13.00 1370856600000,0.11550 1370856900000,0.11750 1370857200000,0.11750 1370857500000,0.11850 1370857800000,0.12100 1370858100000,0.12050 -1370858400000,0.12.20 -1370858700000,0.12.20 -1370859000000,0.12.20 -1370859300000,0.12.20 +1370858400000,0.13.00 +1370858700000,0.13.00 +1370859000000,0.13.00 +1370859300000,0.13.00 1370859600000,0.11350 -1370859900000,0.12.20 +1370859900000,0.13.00 1370860200000,0.11250 -1370860500000,0.12.20 +1370860500000,0.13.00 1370860800000,0.10900 1370861100000,0.10850 1370861400000,0.11150 1370861700000,0.11050 -1370862000000,0.12.20 +1370862000000,0.13.00 1370862300000,0.10900 -1370862600000,0.12.20 -1370862900000,0.12.20 +1370862600000,0.13.00 +1370862900000,0.13.00 1370863200000,0.10850 1370863500000,0.10900 1370863800000,0.10750 @@ -1119,7 +1119,7 @@ 1370865000000,0.10100 1370865300000,0.10700 1370865600000,0.11050 -1370865900000,0.12.20 +1370865900000,0.13.00 1370866200000,0.10700 1370866500000,0.10650 1370866800000,0.10500 @@ -1270,9 +1270,9 @@ 1370910300000,0.08650 1370910600000,0.08450 1370910900000,0.08450 -1370.12.20000,0.08450 -1370.12.20000,0.08350 -1370.12.20000,0.08250 +1370.13.00000,0.08450 +1370.13.00000,0.08350 +1370.13.00000,0.08250 1370912100000,0.08100 1370912400000,0.08000 1370912700000,0.07800 @@ -1360,32 +1360,32 @@ 1370937300000,0.10200 1370937600000,0.10800 1370937900000,0.11050 -1370938200000,0.12.20 +1370938200000,0.13.00 1370938500000,0.11350 1370938800000,0.11550 -1370939100000,0.12.20 -1370939400000,0.12.20 +1370939100000,0.13.00 +1370939400000,0.13.00 1370939700000,0.11750 -1370940000000,0.12.20 +1370940000000,0.13.00 1370940300000,0.12000 1370940600000,0.11850 1370940900000,0.11650 -1370941200000,0.12.20 -1370941500000,0.12.20 -1370941800000,0.12.20 +1370941200000,0.13.00 +1370941500000,0.13.00 +1370941800000,0.13.00 1370942100000,0.11350 1370942400000,0.11250 1370942700000,0.11050 -1370943000000,0.12.20 +1370943000000,0.13.00 1370943300000,0.11150 -1370943600000,0.12.20 +1370943600000,0.13.00 1370943900000,0.11450 1370944200000,0.11550 1370944500000,0.11550 -1370944800000,0.12.20 +1370944800000,0.13.00 1370945100000,0.11350 -1370945400000,0.12.20 -1370945700000,0.12.20 +1370945400000,0.13.00 +1370945700000,0.13.00 1370946000000,0.11150 1370946300000,0.10950 1370946600000,0.10750 @@ -1395,10 +1395,10 @@ 1370947800000,0.10800 1370948100000,0.10850 
1370948400000,0.10800 -1370948700000,0.12.20 -1370949000000,0.12.20 +1370948700000,0.13.00 +1370949000000,0.13.00 1370949300000,0.11050 -1370949600000,0.12.20 +1370949600000,0.13.00 1370949900000,0.10750 1370950200000,0.10550 1370950500000,0.10550 @@ -1603,7 +1603,7 @@ 1371010200000,0.05100 1371010500000,0.04800 1371010800000,0.04700 -13710.12.2000,0.04850 +13710.13.0000,0.04850 1371011400000,0.04900 1371011700000,0.04800 1371012000000,0.04750 @@ -1649,34 +1649,34 @@ 1371024000000,0.10550 1371024300000,0.10700 1371024600000,0.10850 -1371024900000,0.12.20 +1371024900000,0.13.00 1371025200000,0.11550 1371025500000,0.11950 1371025800000,0.12150 -1371026100000,0.12.20 +1371026100000,0.13.00 1371026400000,0.11750 1371026700000,0.11750 -1371027000000,0.12.20 +1371027000000,0.13.00 1371027300000,0.11750 -1371027600000,0.12.20 +1371027600000,0.13.00 1371027900000,0.11650 -1371028200000,0.12.20 -1371028500000,0.12.20 +1371028200000,0.13.00 +1371028500000,0.13.00 1371028800000,0.11950 -1371029100000,0.12.20 -1371029400000,0.12.20 +1371029100000,0.13.00 +1371029400000,0.13.00 1371029700000,0.11850 -1371030000000,0.12.20 -1371030300000,0.12.20 -1371030600000,0.12.20 -1371030900000,0.12.20 +1371030000000,0.13.00 +1371030300000,0.13.00 +1371030600000,0.13.00 +1371030900000,0.13.00 1371031200000,0.11750 -1371031500000,0.12.20 -1371031800000,0.12.20 -1371032100000,0.12.20 -1371032400000,0.12.20 +1371031500000,0.13.00 +1371031800000,0.13.00 +1371032100000,0.13.00 +1371032400000,0.13.00 1371032700000,0.11550 -1371033000000,0.12.20 +1371033000000,0.13.00 1371033300000,0.11250 1371033600000,0.10900 1371033900000,0.10500 @@ -1703,7 +1703,7 @@ 1371040200000,0.09900 1371040500000,0.09750 1371040800000,0.09700 -13710.12.2000,0.09700 +13710.13.0000,0.09700 1371041400000,0.09650 1371041700000,0.09800 1371042000000,0.09550 @@ -1803,7 +1803,7 @@ 1371070200000,0.10050 1371070500000,0.09900 1371070800000,0.09850 -13710.12.2000,0.09800 +13710.13.0000,0.09800 1371071400000,0.09800 1371071700000,0.09600 1371072000000,0.09450 @@ -1931,42 +1931,42 @@ 1371108600000,0.10450 1371108900000,0.10750 1371109200000,0.10850 -1371109500000,0.12.20 -1371109800000,0.12.20 +1371109500000,0.13.00 +1371109800000,0.13.00 1371110100000,0.10800 1371110400000,0.10700 1371110700000,0.11050 1371111000000,0.11250 -1371111300000,0.12.20 +1371111300000,0.13.00 1371111600000,0.11050 -1371111900000,0.12.20 -1371112200000,0.12.20 +1371111900000,0.13.00 +1371112200000,0.13.00 1371112500000,0.12150 1371112800000,0.12000 1371113100000,0.11950 -1371113400000,0.12.20 -1371113700000,0.12.20 +1371113400000,0.13.00 +1371113700000,0.13.00 1371114000000,0.11850 1371114300000,0.12150 1371114600000,0.12150 1371114900000,0.12050 1371115200000,0.11950 -1371115500000,0.12.20 -1371115800000,0.12.20 +1371115500000,0.13.00 +1371115800000,0.13.00 1371116100000,0.11450 -1371116400000,0.12.20 +1371116400000,0.13.00 1371116700000,0.11350 -1371117000000,0.12.20 +1371117000000,0.13.00 1371117300000,0.11150 1371117600000,0.11350 1371117900000,0.11650 1371118200000,0.11550 1371118500000,0.11450 -1371118800000,0.12.20 -1371119100000,0.12.20 +1371118800000,0.13.00 +1371119100000,0.13.00 1371119400000,0.11350 1371119700000,0.11350 -1371120000000,0.12.20 +1371120000000,0.13.00 1371120300000,0.10900 1371120600000,0.10950 1371120900000,0.10950 @@ -1978,8 +1978,8 @@ 1371122700000,0.10500 1371123000000,0.10500 1371123300000,0.10800 -1371123600000,0.12.20 -1371123900000,0.12.20 +1371123600000,0.13.00 +1371123900000,0.13.00 1371124200000,0.10950 1371124500000,0.10650 
1371124800000,0.10200 @@ -2052,14 +2052,14 @@ 1371144900000,0.11050 1371145200000,0.11250 1371145500000,0.11250 -1371145800000,0.12.20 +1371145800000,0.13.00 1371146100000,0.11250 1371146400000,0.11150 1371146700000,0.11250 -1371147000000,0.12.20 +1371147000000,0.13.00 1371147300000,0.11250 1371147600000,0.11550 -1371147900000,0.12.20 +1371147900000,0.13.00 1371148200000,0.12050 1371148500000,0.12750 1371148800000,0.13500 @@ -2109,8 +2109,8 @@ 1371162000000,0.11850 1371162300000,0.11850 1371162600000,0.11350 -1371162900000,0.12.20 -1371163200000,0.12.20 +1371162900000,0.13.00 +1371163200000,0.13.00 1371163500000,0.11050 1371163800000,0.10850 1371164100000,0.10650 @@ -2225,28 +2225,28 @@ 1371196800000,0.10650 1371197100000,0.10550 1371197400000,0.10900 -1371197700000,0.12.20 +1371197700000,0.13.00 1371198000000,0.11450 -1371198300000,0.12.20 +1371198300000,0.13.00 1371198600000,0.11050 1371198900000,0.11050 1371199200000,0.10900 -1371199500000,0.12.20 -1371199800000,0.12.20 -1371200100000,0.12.20 -1371200400000,0.12.20 +1371199500000,0.13.00 +1371199800000,0.13.00 +1371200100000,0.13.00 +1371200400000,0.13.00 1371200700000,0.11150 1371201000000,0.11550 -1371201300000,0.12.20 +1371201300000,0.13.00 1371201600000,0.11550 1371201900000,0.11550 1371202200000,0.11450 1371202500000,0.11550 -1371202800000,0.12.20 +1371202800000,0.13.00 1371203100000,0.11350 -1371203400000,0.12.20 -1371203700000,0.12.20 -1371204000000,0.12.20 +1371203400000,0.13.00 +1371203700000,0.13.00 +1371204000000,0.13.00 1371204300000,0.11150 1371204600000,0.11250 1371204900000,0.11250 @@ -2526,30 +2526,30 @@ 1371287100000,0.10350 1371287400000,0.10700 1371287700000,0.10950 -1371288000000,0.12.20 -1371288300000,0.12.20 -1371288600000,0.12.20 +1371288000000,0.13.00 +1371288300000,0.13.00 +1371288600000,0.13.00 1371288900000,0.11550 -1371289200000,0.12.20 -1371289500000,0.12.20 -1371289800000,0.12.20 +1371289200000,0.13.00 +1371289500000,0.13.00 +1371289800000,0.13.00 1371290100000,0.11050 1371290400000,0.11350 -1371290700000,0.12.20 -1371291000000,0.12.20 +1371290700000,0.13.00 +1371291000000,0.13.00 1371291300000,0.11650 -1371291600000,0.12.20 +1371291600000,0.13.00 1371291900000,0.11650 -1371292200000,0.12.20 -1371292500000,0.12.20 +1371292200000,0.13.00 +1371292500000,0.13.00 1371292800000,0.11650 -1371293100000,0.12.20 +1371293100000,0.13.00 1371293400000,0.11850 -1371293700000,0.12.20 +1371293700000,0.13.00 1371294000000,0.11850 -1371294300000,0.12.20 +1371294300000,0.13.00 1371294600000,0.11750 -1371294900000,0.12.20 +1371294900000,0.13.00 1371295200000,0.12000 1371295500000,0.12150 1371295800000,0.12000 @@ -2558,28 +2558,28 @@ 1371296700000,0.11950 1371297000000,0.12100 1371297300000,0.11950 -1371297600000,0.12.20 +1371297600000,0.13.00 1371297900000,0.11350 -1371298200000,0.12.20 +1371298200000,0.13.00 1371298500000,0.11150 -1371298800000,0.12.20 -1371299100000,0.12.20 -1371299400000,0.12.20 -1371299700000,0.12.20 +1371298800000,0.13.00 +1371299100000,0.13.00 +1371299400000,0.13.00 +1371299700000,0.13.00 1371300000000,0.11350 1371300300000,0.11750 -1371300600000,0.12.20 -1371300900000,0.12.20 -1371301200000,0.12.20 -1371301500000,0.12.20 +1371300600000,0.13.00 +1371300900000,0.13.00 +1371301200000,0.13.00 +1371301500000,0.13.00 1371301800000,0.11450 -1371302100000,0.12.20 +1371302100000,0.13.00 1371302400000,0.11150 -1371302700000,0.12.20 +1371302700000,0.13.00 1371303000000,0.11250 -1371303300000,0.12.20 +1371303300000,0.13.00 1371303600000,0.11050 -1371303900000,0.12.20 +1371303900000,0.13.00 1371304200000,0.11050 
1371304500000,0.11050 1371304800000,0.10950 @@ -2594,7 +2594,7 @@ 1371307500000,0.10500 1371307800000,0.10900 1371308100000,0.11150 -1371308400000,0.12.20 +1371308400000,0.13.00 1371308700000,0.10900 1371309000000,0.10650 1371309300000,0.10400 @@ -2830,55 +2830,55 @@ 1371378300000,0.10200 1371378600000,0.10600 1371378900000,0.10900 -1371379200000,0.12.20 -1371379500000,0.12.20 -1371379800000,0.12.20 -1371380100000,0.12.20 -1371380400000,0.12.20 -1371380700000,0.12.20 +1371379200000,0.13.00 +1371379500000,0.13.00 +1371379800000,0.13.00 +1371380100000,0.13.00 +1371380400000,0.13.00 +1371380700000,0.13.00 1371381000000,0.11750 -1371381300000,0.12.20 -1371381600000,0.12.20 -1371381900000,0.12.20 -1371382200000,0.12.20 +1371381300000,0.13.00 +1371381600000,0.13.00 +1371381900000,0.13.00 +1371382200000,0.13.00 1371382500000,0.11650 -1371382800000,0.12.20 -1371383100000,0.12.20 +1371382800000,0.13.00 +1371383100000,0.13.00 1371383400000,0.11450 -1371383700000,0.12.20 -1371384000000,0.12.20 +1371383700000,0.13.00 +1371384000000,0.13.00 1371384300000,0.11250 1371384600000,0.11650 1371384900000,0.11750 1371385200000,0.11750 1371385500000,0.11650 -1371385800000,0.12.20 +1371385800000,0.13.00 1371386100000,0.11650 1371386400000,0.12050 1371386700000,0.12300 1371387000000,0.12200 1371387300000,0.12050 1371387600000,0.12000 -1371387900000,0.12.20 -1371388200000,0.12.20 -1371388500000,0.12.20 +1371387900000,0.13.00 +1371388200000,0.13.00 +1371388500000,0.13.00 1371388800000,0.11450 1371389100000,0.11350 1371389400000,0.11450 1371389700000,0.11550 1371390000000,0.11750 1371390300000,0.11650 -1371390600000,0.12.20 -1371390900000,0.12.20 +1371390600000,0.13.00 +1371390900000,0.13.00 1371391200000,0.11350 -1371391500000,0.12.20 -1371391800000,0.12.20 +1371391500000,0.13.00 +1371391800000,0.13.00 1371392100000,0.11350 -1371392400000,0.12.20 +1371392400000,0.13.00 1371392700000,0.11250 -1371393000000,0.12.20 -1371393300000,0.12.20 -1371393600000,0.12.20 +1371393000000,0.13.00 +1371393300000,0.13.00 +1371393600000,0.13.00 1371393900000,0.11050 1371394200000,0.10800 1371394500000,0.10550 @@ -3177,7 +3177,7 @@ 1371482400000,0.12450 1371482700000,0.12000 1371483000000,0.11550 -1371483300000,0.12.20 +1371483300000,0.13.00 1371483600000,0.11450 1371483900000,0.10950 1371484200000,0.10600 @@ -3379,34 +3379,34 @@ 1371543000000,0.10550 1371543300000,0.10850 1371543600000,0.11150 -1371543900000,0.12.20 +1371543900000,0.13.00 1371544200000,0.11450 -1371544500000,0.12.20 -1371544800000,0.12.20 +1371544500000,0.13.00 +1371544800000,0.13.00 1371545100000,0.11350 -1371545400000,0.12.20 +1371545400000,0.13.00 1371545700000,0.11450 -1371546000000,0.12.20 +1371546000000,0.13.00 1371546300000,0.11450 1371546600000,0.11150 1371546900000,0.10850 1371547200000,0.10750 1371547500000,0.10800 -1371547800000,0.12.20 -1371548100000,0.12.20 +1371547800000,0.13.00 +1371548100000,0.13.00 1371548400000,0.11550 1371548700000,0.11250 -1371549000000,0.12.20 +1371549000000,0.13.00 1371549300000,0.10900 -1371549600000,0.12.20 +1371549600000,0.13.00 1371549900000,0.10950 -1371550200000,0.12.20 -1371550500000,0.12.20 -1371550800000,0.12.20 +1371550200000,0.13.00 +1371550500000,0.13.00 +1371550800000,0.13.00 1371551100000,0.11550 1371551400000,0.11450 1371551700000,0.11250 -1371552000000,0.12.20 +1371552000000,0.13.00 1371552300000,0.10850 1371552600000,0.10550 1371552900000,0.10400 @@ -3660,31 +3660,31 @@ 1371627300000,0.10650 1371627600000,0.10700 1371627900000,0.10950 -1371628200000,0.12.20 +1371628200000,0.13.00 1371628500000,0.11050 
-1371628800000,0.12.20 -1371629100000,0.12.20 -1371629400000,0.12.20 -1371629700000,0.12.20 -1371630000000,0.12.20 +1371628800000,0.13.00 +1371629100000,0.13.00 +1371629400000,0.13.00 +1371629700000,0.13.00 +1371630000000,0.13.00 1371630300000,0.10850 1371630600000,0.11150 -1371630900000,0.12.20 +1371630900000,0.13.00 1371631200000,0.11250 -1371631500000,0.12.20 -1371631800000,0.12.20 +1371631500000,0.13.00 +1371631800000,0.13.00 1371632100000,0.10850 1371632400000,0.11050 1371632700000,0.11050 1371633000000,0.11050 -1371633300000,0.12.20 -1371633600000,0.12.20 +1371633300000,0.13.00 +1371633600000,0.13.00 1371633900000,0.11350 1371634200000,0.11150 1371634500000,0.11050 1371634800000,0.10850 -1371635100000,0.12.20 -1371635400000,0.12.20 +1371635100000,0.13.00 +1371635400000,0.13.00 1371635700000,0.10900 1371636000000,0.10750 1371636300000,0.10800 @@ -3952,31 +3952,31 @@ 1371714900000,0.10250 1371715200000,0.10400 1371715500000,0.10700 -1371715800000,0.12.20 +1371715800000,0.13.00 1371716100000,0.10850 1371716400000,0.10800 1371716700000,0.10750 1371717000000,0.10900 1371717300000,0.10800 -1371717600000,0.12.20 -1371717900000,0.12.20 -1371718200000,0.12.20 -1371718500000,0.12.20 -1371718800000,0.12.20 -1371719100000,0.12.20 -1371719400000,0.12.20 -1371719700000,0.12.20 +1371717600000,0.13.00 +1371717900000,0.13.00 +1371718200000,0.13.00 +1371718500000,0.13.00 +1371718800000,0.13.00 +1371719100000,0.13.00 +1371719400000,0.13.00 +1371719700000,0.13.00 1371720000000,0.11150 1371720300000,0.11050 -1371720600000,0.12.20 -1371720900000,0.12.20 +1371720600000,0.13.00 +1371720900000,0.13.00 1371721200000,0.10850 1371721500000,0.10950 -1371721800000,0.12.20 -1371722100000,0.12.20 +1371721800000,0.13.00 +1371722100000,0.13.00 1371722400000,0.11350 1371722700000,0.11550 -1371723000000,0.12.20 +1371723000000,0.13.00 1371723300000,0.10950 1371723600000,0.10800 1371723900000,0.10750 @@ -4036,13 +4036,13 @@ 1371740100000,0.12600 1371740400000,0.12450 1371740700000,0.12100 -1371741000000,0.12.20 +1371741000000,0.13.00 1371741300000,0.11950 1371741600000,0.11850 -1371741900000,0.12.20 -1371742200000,0.12.20 +1371741900000,0.13.00 +1371742200000,0.13.00 1371742500000,0.11350 -1371742800000,0.12.20 +1371742800000,0.13.00 1371743100000,0.12250 1371743400000,0.12750 1371743700000,0.12800 @@ -4052,7 +4052,7 @@ 1371744900000,0.12300 1371745200000,0.12300 1371745500000,0.12150 -1371745800000,0.12.20 +1371745800000,0.13.00 1371746100000,0.11150 1371746400000,0.10900 1371746700000,0.10800 @@ -4146,8 +4146,8 @@ 1371773100000,0.10300 1371773400000,0.10350 1371773700000,0.10600 -1371774000000,0.12.20 -1371774300000,0.12.20 +1371774000000,0.13.00 +1371774300000,0.13.00 1371774600000,0.12300 1371774900000,0.12950 1371775200000,0.13300 @@ -4175,7 +4175,7 @@ 1371781800000,0.12650 1371782100000,0.12450 1371782400000,0.12100 -1371782700000,0.12.20 +1371782700000,0.13.00 1371783000000,0.11750 1371783300000,0.11650 1371783600000,0.11150 @@ -4244,33 +4244,33 @@ 1371802500000,0.10150 1371802800000,0.10000 1371803100000,0.10650 -1371803400000,0.12.20 +1371803400000,0.13.00 1371803700000,0.11250 -1371804000000,0.12.20 -1371804300000,0.12.20 +1371804000000,0.13.00 +1371804300000,0.13.00 1371804600000,0.11250 1371804900000,0.11250 1371805200000,0.11450 -1371805500000,0.12.20 +1371805500000,0.13.00 1371805800000,0.11550 -1371806100000,0.12.20 -1371806400000,0.12.20 +1371806100000,0.13.00 +1371806400000,0.13.00 1371806700000,0.11450 -1371807000000,0.12.20 -1371807300000,0.12.20 -1371807600000,0.12.20 -1371807900000,0.12.20 
+1371807000000,0.13.00 +1371807300000,0.13.00 +1371807600000,0.13.00 +1371807900000,0.13.00 1371808200000,0.11350 -1371808500000,0.12.20 -1371808800000,0.12.20 -1371809100000,0.12.20 -1371809400000,0.12.20 +1371808500000,0.13.00 +1371808800000,0.13.00 +1371809100000,0.13.00 +1371809400000,0.13.00 1371809700000,0.11250 1371810000000,0.11450 1371810300000,0.11550 -1371810600000,0.12.20 -1371810900000,0.12.20 -1371811200000,0.12.20 +1371810600000,0.13.00 +1371810900000,0.13.00 +1371811200000,0.13.00 1371811500000,0.10900 1371811800000,0.10750 1371812100000,0.10600 @@ -4327,7 +4327,7 @@ 1371827400000,0.10800 1371827700000,0.11050 1371828000000,0.11150 -1371828300000,0.12.20 +1371828300000,0.13.00 1371828600000,0.11150 1371828900000,0.10850 1371829200000,0.10700 @@ -4349,13 +4349,13 @@ 1371834000000,0.09900 1371834300000,0.10800 1371834600000,0.11050 -1371834900000,0.12.20 -1371835200000,0.12.20 +1371834900000,0.13.00 +1371835200000,0.13.00 1371835500000,0.11150 -1371835800000,0.12.20 -1371836100000,0.12.20 -1371836400000,0.12.20 -1371836700000,0.12.20 +1371835800000,0.13.00 +1371836100000,0.13.00 +1371836400000,0.13.00 +1371836700000,0.13.00 1371837000000,0.12350 1371837300000,0.12700 1371837600000,0.12800 @@ -4368,10 +4368,10 @@ 1371839700000,0.12750 1371840000000,0.12350 1371840300000,0.12250 -1371840600000,0.12.20 -1371840900000,0.12.20 -1371841200000,0.12.20 -1371841500000,0.12.20 +1371840600000,0.13.00 +1371840900000,0.13.00 +1371841200000,0.13.00 +1371841500000,0.13.00 1371841800000,0.11350 1371842100000,0.11150 1371842400000,0.11050 @@ -4549,22 +4549,22 @@ 1371894000000,0.10600 1371894300000,0.10950 1371894600000,0.11050 -1371894900000,0.12.20 +1371894900000,0.13.00 1371895200000,0.11350 -1371895500000,0.12.20 +1371895500000,0.13.00 1371895800000,0.12300 1371896100000,0.11050 1371896400000,0.10900 -1371896700000,0.12.20 -1371897000000,0.12.20 -1371897300000,0.12.20 -1371897600000,0.12.20 -1371897900000,0.12.20 +1371896700000,0.13.00 +1371897000000,0.13.00 +1371897300000,0.13.00 +1371897600000,0.13.00 +1371897900000,0.13.00 1371898200000,0.11750 1371898500000,0.11450 1371898800000,0.11350 -1371899100000,0.12.20 -1371899400000,0.12.20 +1371899100000,0.13.00 +1371899400000,0.13.00 1371899700000,0.11250 1371900000000,0.12100 1371900300000,0.12100 @@ -4578,16 +4578,16 @@ 1371902700000,0.12150 1371903000000,0.12050 1371903300000,0.11950 -1371903600000,0.12.20 +1371903600000,0.13.00 1371903900000,0.11750 -1371904200000,0.12.20 +1371904200000,0.13.00 1371904500000,0.11550 1371904800000,0.12000 -1371905100000,0.12.20 -1371905400000,0.12.20 +1371905100000,0.13.00 +1371905400000,0.13.00 1371905700000,0.11350 1371906000000,0.11250 -1371906300000,0.12.20 +1371906300000,0.13.00 1371906600000,0.11050 1371906900000,0.10950 1371907200000,0.10950 @@ -4840,13 +4840,13 @@ 1371981300000,0.11050 1371981600000,0.11450 1371981900000,0.11350 -1371982200000,0.12.20 -1371982500000,0.12.20 +1371982200000,0.13.00 +1371982500000,0.13.00 1371982800000,0.11250 1371983100000,0.11550 1371983400000,0.11450 1371983700000,0.11250 -1371984000000,0.12.20 +1371984000000,0.13.00 1371984300000,0.11450 1371984600000,0.11350 1371984900000,0.11650 @@ -4860,8 +4860,8 @@ 1371987300000,0.12150 1371987600000,0.12100 1371987900000,0.11950 -1371988200000,0.12.20 -1371988500000,0.12.20 +1371988200000,0.13.00 +1371988500000,0.13.00 1371988800000,0.11850 1371989100000,0.11850 1371989400000,0.11950 @@ -4879,23 +4879,23 @@ 1371993000000,0.12350 1371993300000,0.12400 1371993600000,0.11950 -1371993900000,0.12.20 -1371994200000,0.12.20 
+1371993900000,0.13.00 +1371994200000,0.13.00 1371994500000,0.11750 1371994800000,0.12050 1371995100000,0.12200 1371995400000,0.11950 -1371995700000,0.12.20 -1371996000000,0.12.20 +1371995700000,0.13.00 +1371996000000,0.13.00 1371996300000,0.11650 -1371996600000,0.12.20 -1371996900000,0.12.20 -1371997200000,0.12.20 +1371996600000,0.13.00 +1371996900000,0.13.00 +1371997200000,0.13.00 1371997500000,0.11550 -1371997800000,0.12.20 -1371998100000,0.12.20 -1371998400000,0.12.20 -1371998700000,0.12.20 +1371997800000,0.13.00 +1371998100000,0.13.00 +1371998400000,0.13.00 +1371998700000,0.13.00 1371999000000,0.11650 1371999300000,0.10750 1371999600000,0.10450 @@ -4903,7 +4903,7 @@ 1372000200000,0.10150 1372000500000,0.10050 1372000800000,0.10000 -13720.12.2000,0.09900 +13720.13.0000,0.09900 1372001400000,0.10100 1372001700000,0.10200 1372002000000,0.10050 @@ -5003,7 +5003,7 @@ 1372030200000,0.09200 1372030500000,0.09350 1372030800000,0.09400 -13720.12.2000,0.09350 +13720.13.0000,0.09350 1372031400000,0.09350 1372031700000,0.09300 1372032000000,0.09050 @@ -5103,7 +5103,7 @@ 1372060200000,0.09700 1372060500000,0.09600 1372060800000,0.09650 -13720.12.2000,0.09950 +13720.13.0000,0.09950 1372061400000,0.09900 1372061700000,0.09900 1372062000000,0.10000 @@ -5114,23 +5114,23 @@ 1372063500000,0.10950 1372063800000,0.11150 1372064100000,0.11150 -1372064400000,0.12.20 +1372064400000,0.13.00 1372064700000,0.11350 -1372065000000,0.12.20 +1372065000000,0.13.00 1372065300000,0.11250 -1372065600000,0.12.20 -1372065900000,0.12.20 -1372066200000,0.12.20 -1372066500000,0.12.20 +1372065600000,0.13.00 +1372065900000,0.13.00 +1372066200000,0.13.00 +1372066500000,0.13.00 1372066800000,0.11650 -1372067100000,0.12.20 +1372067100000,0.13.00 1372067400000,0.11450 1372067700000,0.11150 -1372068000000,0.12.20 -1372068300000,0.12.20 +1372068000000,0.13.00 +1372068300000,0.13.00 1372068600000,0.10950 1372068900000,0.10850 -1372069200000,0.12.20 +1372069200000,0.13.00 1372069500000,0.10900 1372069800000,0.10850 1372070100000,0.11050 @@ -5203,7 +5203,7 @@ 1372090200000,0.08600 1372090500000,0.08700 1372090800000,0.08500 -13720.12.2000,0.08350 +13720.13.0000,0.08350 1372091400000,0.08250 1372091700000,0.08350 1372092000000,0.08550 @@ -5408,8 +5408,8 @@ 1372151700000,0.10900 1372152000000,0.10950 1372152300000,0.10900 -1372152600000,0.12.20 -1372152900000,0.12.20 +1372152600000,0.13.00 +1372152900000,0.13.00 1372153200000,0.11350 1372153500000,0.11150 1372153800000,0.10800 @@ -5696,7 +5696,7 @@ 1372238100000,0.11250 1372238400000,0.11150 1372238700000,0.10950 -1372239000000,0.12.20 +1372239000000,0.13.00 1372239300000,0.10900 1372239600000,0.10650 1372239900000,0.10450 @@ -5962,33 +5962,33 @@ 1372317900000,0.10550 1372318200000,0.10550 1372318500000,0.10950 -1372318800000,0.12.20 -1372319100000,0.12.20 +1372318800000,0.13.00 +1372319100000,0.13.00 1372319400000,0.11350 -1372319700000,0.12.20 -1372320000000,0.12.20 +1372319700000,0.13.00 +1372320000000,0.13.00 1372320300000,0.10700 1372320600000,0.10700 1372320900000,0.10900 1372321200000,0.10850 -1372321500000,0.12.20 -1372321800000,0.12.20 +1372321500000,0.13.00 +1372321800000,0.13.00 1372322100000,0.11150 -1372322400000,0.12.20 +1372322400000,0.13.00 1372322700000,0.11450 1372323000000,0.11350 -1372323300000,0.12.20 -1372323600000,0.12.20 -1372323900000,0.12.20 -1372324200000,0.12.20 +1372323300000,0.13.00 +1372323600000,0.13.00 +1372323900000,0.13.00 +1372324200000,0.13.00 1372324500000,0.11050 -1372324800000,0.12.20 +1372324800000,0.13.00 1372325100000,0.11050 
1372325400000,0.10850 1372325700000,0.10650 1372326000000,0.10700 1372326300000,0.11050 -1372326600000,0.12.20 +1372326600000,0.13.00 1372326900000,0.11250 1372327200000,0.11250 1372327500000,0.10700 @@ -6316,13 +6316,13 @@ 1372424100000,0.12100 1372424400000,0.12100 1372424700000,0.11850 -1372425000000,0.12.20 -1372425300000,0.12.20 +1372425000000,0.13.00 +1372425300000,0.13.00 1372425600000,0.12050 1372425900000,0.12150 1372426200000,0.11750 -1372426500000,0.12.20 -1372426800000,0.12.20 +1372426500000,0.13.00 +1372426800000,0.13.00 1372427100000,0.10900 1372427400000,0.10950 1372427700000,0.10600 @@ -6459,30 +6459,30 @@ 1372467000000,0.10800 1372467300000,0.10950 1372467600000,0.10950 -1372467900000,0.12.20 +1372467900000,0.13.00 1372468200000,0.11750 1372468500000,0.11550 -1372468800000,0.12.20 +1372468800000,0.13.00 1372469100000,0.11750 -1372469400000,0.12.20 -1372469700000,0.12.20 -1372470000000,0.12.20 -1372470300000,0.12.20 -1372470600000,0.12.20 +1372469400000,0.13.00 +1372469700000,0.13.00 +1372470000000,0.13.00 +1372470300000,0.13.00 +1372470600000,0.13.00 1372470900000,0.11950 1372471200000,0.11950 -1372471500000,0.12.20 +1372471500000,0.13.00 1372471800000,0.11550 1372472100000,0.11450 -1372472400000,0.12.20 +1372472400000,0.13.00 1372472700000,0.11450 1372473000000,0.11450 1372473300000,0.11450 -1372473600000,0.12.20 +1372473600000,0.13.00 1372473900000,0.11250 1372474200000,0.11250 -1372474500000,0.12.20 -1372474800000,0.12.20 +1372474500000,0.13.00 +1372474800000,0.13.00 1372475100000,0.13050 1372475400000,0.13600 1372475700000,0.13800 @@ -6529,8 +6529,8 @@ 1372488000000,0.11850 1372488300000,0.11450 1372488600000,0.11250 -1372488900000,0.12.20 -1372489200000,0.12.20 +1372488900000,0.13.00 +1372489200000,0.13.00 1372489500000,0.10450 1372489800000,0.10350 1372490100000,0.10000 @@ -6565,52 +6565,52 @@ 1372498800000,0.10450 1372499100000,0.10550 1372499400000,0.10850 -1372499700000,0.12.20 -1372500000000,0.12.20 -1372500300000,0.12.20 -1372500600000,0.12.20 -1372500900000,0.12.20 -1372501200000,0.12.20 +1372499700000,0.13.00 +1372500000000,0.13.00 +1372500300000,0.13.00 +1372500600000,0.13.00 +1372500900000,0.13.00 +1372501200000,0.13.00 1372501500000,0.11350 1372501800000,0.11550 -1372502100000,0.12.20 +1372502100000,0.13.00 1372502400000,0.11650 1372502700000,0.11750 1372503000000,0.11750 -1372503300000,0.12.20 -1372503600000,0.12.20 +1372503300000,0.13.00 +1372503600000,0.13.00 1372503900000,0.11550 1372504200000,0.11650 -1372504500000,0.12.20 -1372504800000,0.12.20 +1372504500000,0.13.00 +1372504800000,0.13.00 1372505100000,0.11650 1372505400000,0.11850 1372505700000,0.12050 -1372506000000,0.12.20 -1372506300000,0.12.20 -1372506600000,0.12.20 +1372506000000,0.13.00 +1372506300000,0.13.00 +1372506600000,0.13.00 1372506900000,0.11450 1372507200000,0.11450 -1372507500000,0.12.20 -1372507800000,0.12.20 +1372507500000,0.13.00 +1372507800000,0.13.00 1372508100000,0.11450 1372508400000,0.11450 -1372508700000,0.12.20 -1372509000000,0.12.20 -1372509300000,0.12.20 +1372508700000,0.13.00 +1372509000000,0.13.00 +1372509300000,0.13.00 1372509600000,0.11850 1372509900000,0.12050 -1372510200000,0.12.20 -1372510500000,0.12.20 -1372510800000,0.12.20 +1372510200000,0.13.00 +1372510500000,0.13.00 +1372510800000,0.13.00 1372511100000,0.11150 1372511400000,0.10950 -1372511700000,0.12.20 +1372511700000,0.13.00 1372512000000,0.10900 -1372512300000,0.12.20 +1372512300000,0.13.00 1372512600000,0.11150 -1372512900000,0.12.20 -1372513200000,0.12.20 +1372512900000,0.13.00 +1372513200000,0.13.00 
1372513500000,0.10900 1372513800000,0.10900 1372514100000,0.10850 @@ -6862,62 +6862,62 @@ 1372587900000,0.10900 1372588200000,0.10950 1372588500000,0.11050 -1372588800000,0.12.20 +1372588800000,0.13.00 1372589100000,0.11250 1372589400000,0.11450 -1372589700000,0.12.20 -1372590000000,0.12.20 +1372589700000,0.13.00 +1372590000000,0.13.00 1372590300000,0.11750 -1372590600000,0.12.20 +1372590600000,0.13.00 1372590900000,0.11850 -1372591200000,0.12.20 -1372591500000,0.12.20 +1372591200000,0.13.00 +1372591500000,0.13.00 1372591800000,0.11650 -1372592100000,0.12.20 +1372592100000,0.13.00 1372592400000,0.12000 1372592700000,0.12250 1372593000000,0.12000 1372593300000,0.12000 1372593600000,0.12000 -1372593900000,0.12.20 +1372593900000,0.13.00 1372594200000,0.11850 1372594500000,0.11850 -1372594800000,0.12.20 -1372595100000,0.12.20 +1372594800000,0.13.00 +1372595100000,0.13.00 1372595400000,0.11850 -1372595700000,0.12.20 +1372595700000,0.13.00 1372596000000,0.11450 1372596300000,0.11450 -1372596600000,0.12.20 +1372596600000,0.13.00 1372596900000,0.11650 1372597200000,0.11650 1372597500000,0.11750 1372597800000,0.12050 1372598100000,0.11450 -1372598400000,0.12.20 -1372598700000,0.12.20 -1372599000000,0.12.20 -1372599300000,0.12.20 -1372599600000,0.12.20 -1372599900000,0.12.20 -1372600200000,0.12.20 +1372598400000,0.13.00 +1372598700000,0.13.00 +1372599000000,0.13.00 +1372599300000,0.13.00 +1372599600000,0.13.00 +1372599900000,0.13.00 +1372600200000,0.13.00 1372600500000,0.11450 1372600800000,0.11450 1372601100000,0.11450 -1372601400000,0.12.20 -1372601700000,0.12.20 -1372602000000,0.12.20 -1372602300000,0.12.20 +1372601400000,0.13.00 +1372601700000,0.13.00 +1372602000000,0.13.00 +1372602300000,0.13.00 1372602600000,0.11250 -1372602900000,0.12.20 -1372603200000,0.12.20 -1372603500000,0.12.20 +1372602900000,0.13.00 +1372603200000,0.13.00 +1372603500000,0.13.00 1372603800000,0.10850 1372604100000,0.10500 1372604400000,0.10500 1372604700000,0.10650 -1372605000000,0.12.20 -1372605300000,0.12.20 +1372605000000,0.13.00 +1372605300000,0.13.00 1372605600000,0.10800 1372605900000,0.10450 1372606200000,0.10250 @@ -7120,4 +7120,4 @@ 1372665300000,0.09950 1372665600000,0.09950 1372665900000,0.09950 -1372666200000,0.12.20 +1372666200000,0.13.00 diff --git a/logisland-plugins/logisland-scripting-plugin/pom.xml b/logisland-plugins/logisland-scripting-plugin/pom.xml index 807759929..8924f9fb1 100644 --- a/logisland-plugins/logisland-scripting-plugin/pom.xml +++ b/logisland-plugins/logisland-scripting-plugin/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-scripting-plugin diff --git a/logisland-plugins/logisland-useragent-plugin/pom.xml b/logisland-plugins/logisland-useragent-plugin/pom.xml index edd99e753..ba3989c8f 100644 --- a/logisland-plugins/logisland-useragent-plugin/pom.xml +++ b/logisland-plugins/logisland-useragent-plugin/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-useragent-plugin diff --git a/logisland-plugins/logisland-web-analytics-plugin/pom.xml b/logisland-plugins/logisland-web-analytics-plugin/pom.xml index e6b883ce5..7602985d1 100644 --- a/logisland-plugins/logisland-web-analytics-plugin/pom.xml +++ b/logisland-plugins/logisland-web-analytics-plugin/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland-plugins - 0.12.2 + 0.13.0 logisland-web-analytics-plugin diff --git a/logisland-plugins/pom.xml b/logisland-plugins/pom.xml index f7ad205de..5c8517bb1 100644 --- a/logisland-plugins/pom.xml +++ 
b/logisland-plugins/pom.xml @@ -22,7 +22,7 @@ com.hurence.logisland logisland - 0.12.2 + 0.13.0 logisland-plugins @@ -41,5 +41,6 @@ logisland-enrichment-plugin logisland-elasticsearch-plugin logisland-hbase-plugin + logisland-excel-plugin diff --git a/logisland-services/logisland-cache_key_value-service-api/pom.xml b/logisland-services/logisland-cache_key_value-service-api/pom.xml index f855ea29b..79f81b088 100644 --- a/logisland-services/logisland-cache_key_value-service-api/pom.xml +++ b/logisland-services/logisland-cache_key_value-service-api/pom.xml @@ -7,7 +7,7 @@ logisland-services com.hurence.logisland - 0.12.2 + 0.13.0 logisland-cache_key_value-service-api diff --git a/logisland-services/logisland-elasticsearch-client-service-api/pom.xml b/logisland-services/logisland-elasticsearch-client-service-api/pom.xml index 74cdb21cd..fd3453783 100644 --- a/logisland-services/logisland-elasticsearch-client-service-api/pom.xml +++ b/logisland-services/logisland-elasticsearch-client-service-api/pom.xml @@ -21,7 +21,7 @@ com.hurence.logisland logisland-services - 0.12.2 + 0.13.0 logisland-elasticsearch-client-service-api diff --git a/logisland-services/logisland-elasticsearch_2_4_0-client-service/pom.xml b/logisland-services/logisland-elasticsearch_2_4_0-client-service/pom.xml index 16409a3dc..2f45d6e60 100644 --- a/logisland-services/logisland-elasticsearch_2_4_0-client-service/pom.xml +++ b/logisland-services/logisland-elasticsearch_2_4_0-client-service/pom.xml @@ -7,7 +7,7 @@ com.hurence.logisland logisland-services - 0.12.2 + 0.13.0 logisland-elasticsearch_2_4_0-client-service diff --git a/logisland-services/logisland-elasticsearch_5_4_0-client-service/pom.xml b/logisland-services/logisland-elasticsearch_5_4_0-client-service/pom.xml index 18608f4f8..9e60bb364 100644 --- a/logisland-services/logisland-elasticsearch_5_4_0-client-service/pom.xml +++ b/logisland-services/logisland-elasticsearch_5_4_0-client-service/pom.xml @@ -7,7 +7,7 @@ com.hurence.logisland logisland-services - 0.12.2 + 0.13.0 logisland-elasticsearch_5_4_0-client-service diff --git a/logisland-services/logisland-elasticsearch_5_4_0-client-service/src/main/java/com/hurence/logisland/service/elasticsearch/ElasticsearchRecordConverter.java b/logisland-services/logisland-elasticsearch_5_4_0-client-service/src/main/java/com/hurence/logisland/service/elasticsearch/ElasticsearchRecordConverter.java index 86eb9f628..1f0561c42 100644 --- a/logisland-services/logisland-elasticsearch_5_4_0-client-service/src/main/java/com/hurence/logisland/service/elasticsearch/ElasticsearchRecordConverter.java +++ b/logisland-services/logisland-elasticsearch_5_4_0-client-service/src/main/java/com/hurence/logisland/service/elasticsearch/ElasticsearchRecordConverter.java @@ -35,7 +35,7 @@ class ElasticsearchRecordConverter { /** * Converts an Event into an Elasticsearch document * to be indexed later - * + * * @param record to convert * @return the json converted record */ diff --git a/logisland-services/logisland-hbase-client-service-api/pom.xml b/logisland-services/logisland-hbase-client-service-api/pom.xml index e2bf9519c..ef7987127 100644 --- a/logisland-services/logisland-hbase-client-service-api/pom.xml +++ b/logisland-services/logisland-hbase-client-service-api/pom.xml @@ -21,7 +21,7 @@ com.hurence.logisland logisland-services - 0.12.2 + 0.13.0 logisland-hbase-client-service-api diff --git a/logisland-services/logisland-hbase_1_1_2-client-service/pom.xml b/logisland-services/logisland-hbase_1_1_2-client-service/pom.xml index 3604add8b..afd5fb6d3
100644 --- a/logisland-services/logisland-hbase_1_1_2-client-service/pom.xml +++ b/logisland-services/logisland-hbase_1_1_2-client-service/pom.xml @@ -19,7 +19,7 @@ com.hurence.logisland logisland-services - 0.12.2 + 0.13.0 logisland-hbase_1_1_2-client-service diff --git a/logisland-services/logisland-ip-to-geo-service-api/pom.xml b/logisland-services/logisland-ip-to-geo-service-api/pom.xml index 9218d5f10..6a6cbc4ca 100644 --- a/logisland-services/logisland-ip-to-geo-service-api/pom.xml +++ b/logisland-services/logisland-ip-to-geo-service-api/pom.xml @@ -5,7 +5,7 @@ com.hurence.logisland logisland-services - 0.12.2 + 0.13.0 4.0.0 diff --git a/logisland-services/logisland-ip-to-geo-service-maxmind/pom.xml b/logisland-services/logisland-ip-to-geo-service-maxmind/pom.xml index e0b1ac332..1697252b1 100644 --- a/logisland-services/logisland-ip-to-geo-service-maxmind/pom.xml +++ b/logisland-services/logisland-ip-to-geo-service-maxmind/pom.xml @@ -5,7 +5,7 @@ com.hurence.logisland logisland-services - 0.12.2 + 0.13.0 diff --git a/logisland-services/logisland-redis_4-client-service/pom.xml b/logisland-services/logisland-redis_4-client-service/pom.xml new file mode 100644 index 000000000..07b8c7156 --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/pom.xml @@ -0,0 +1,151 @@ + + + 4.0.0 + + + com.hurence.logisland + logisland-services + 0.13.0 + + + logisland-redis_4-client-service + jar + + + + 1.8.11.RELEASE + 2.9.0 + + + ${logisland.shade.packageName}.redis409 + + + UTF-8 + UTF-8 + + + + + org.slf4j + slf4j-api + + + com.hurence.logisland + logisland-api + + + com.hurence.logisland + logisland-cache_key_value-service-api + + + com.hurence.logisland + logisland-utils + + + com.esotericsoftware.kryo + kryo + + + + + + org.springframework.data + spring-data-redis + ${spring.data.redis.version} + + + redis.clients + jedis + ${jedis.version} + + + + com.github.kstyrc + embedded-redis + 0.6 + test + + + + + + + org.apache.maven.plugins + maven-surefire-plugin + + -Dtests.security.manager=false + + + + org.immutables.tools + maven-shade-plugin + 4 + + + package + + shade + + + true + + + + redis.clients:* + org.springframework.data:* + org.springframework:* + + + + + *:* + + META-INF/license/** + META-INF/* + META-INF/maven/** + LICENSE + NOTICE + /*.txt + build.properties + + + + + + + + + + GIT commit ID + ${maven.build.timestamp} + + + + + + + org.springframework.data + ${shaded.package}.org.springframework.data + + + + + redis.clients + ${shaded.package}.redis.clients + + + + + + + + + + + + + diff --git a/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/RedisConnectionPool.java b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/RedisConnectionPool.java new file mode 100644 index 000000000..9a22913c5 --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/RedisConnectionPool.java @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.hurence.logisland.redis; + +import org.springframework.data.redis.connection.RedisConnection; + +/** + * A service that provides connections to Redis using spring-data-redis. + */ +public interface RedisConnectionPool { + + /** + * Obtains a RedisConnection instance from the pool. + * + * NOTE: Clients are responsible for ensuring the close() method of the connection is called to return it to the pool. + * + * @return a RedisConnection instance + */ + RedisConnection getConnection(); + + /** + * Some Redis operations are only supported in a specific mode. Clients should use this method to ensure + * the connection pool they are using supports their required operations. + * + * @return the type of Redis instance (i.e. standalone, clustered, sentinel) + */ + RedisType getRedisType(); + +} diff --git a/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/RedisType.java b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/RedisType.java new file mode 100644 index 000000000..b6a75c4c0 --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/RedisType.java @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.hurence.logisland.redis; + +/** + * Possible types of Redis instances. + */ +public enum RedisType { + + STANDALONE("standalone", "A single standalone Redis instance."), + + SENTINEL("sentinel", "Redis Sentinel which provides high-availability. Described further at https://redis.io/topics/sentinel"), + + CLUSTER("cluster", "Clustered Redis which provides sharding and replication. 
Described further at https://redis.io/topics/cluster-spec"); + + private final String displayName; + private final String description; + + RedisType(final String displayName, final String description) { + this.displayName = displayName; + this.description = description; + } + + public String getDisplayName() { + return displayName; + } + + public String getDescription() { + return description; + } + + public static RedisType fromDisplayName(final String displayName) { + for (RedisType redisType : values()) { + if (redisType.getDisplayName().equals(displayName)) { + return redisType; + } + } + + throw new IllegalArgumentException("Unknown RedisType: " + displayName); + } + +} diff --git a/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/service/RedisConnectionPool.java b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/service/RedisConnectionPool.java new file mode 100644 index 000000000..dd68e0ec1 --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/service/RedisConnectionPool.java @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package com.hurence.logisland.redis.service; + + +import com.hurence.logisland.controller.ConfigurationContext; +import com.hurence.logisland.controller.ControllerServiceInitializationContext; +import com.hurence.logisland.redis.RedisType; +import com.hurence.logisland.redis.util.RedisUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.springframework.data.redis.connection.RedisConnection; +import org.springframework.data.redis.connection.jedis.JedisConnectionFactory; + + +public class RedisConnectionPool { + + private volatile ControllerServiceInitializationContext context; + private volatile RedisType redisType; + private volatile JedisConnectionFactory connectionFactory; + + private static Logger logger = LoggerFactory.getLogger(RedisConnectionPool.class); + + + public void init(final ControllerServiceInitializationContext context) { + this.context = context; + + final String redisMode = context.getPropertyValue(RedisUtils.REDIS_MODE).asString(); + this.redisType = RedisType.fromDisplayName(redisMode); + } + + + public void close() { + if (connectionFactory != null) { + connectionFactory.destroy(); + connectionFactory = null; + redisType = null; + context = null; + } + } + + public RedisType getRedisType() { + return redisType; + } + + public RedisConnection getConnection() { + if (connectionFactory == null) { + synchronized (this) { + if (connectionFactory == null) { + logger.info("creating Redis connection factory"); + connectionFactory = RedisUtils.createConnectionFactory(context); + } + } + } + + return connectionFactory.getConnection(); + } + + +} diff --git a/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/service/RedisKeyValueCacheService.java b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/service/RedisKeyValueCacheService.java new file mode 100644 index 000000000..7d9939951 --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/service/RedisKeyValueCacheService.java @@ -0,0 +1,440 @@ +/** + * Copyright (C) 2016 Hurence (support@hurence.com) + *

+ * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + *

+ * http://www.apache.org/licenses/LICENSE-2.0 + *

+ * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.hurence.logisland.redis.service; + +import com.hurence.logisland.annotation.documentation.CapabilityDescription; +import com.hurence.logisland.annotation.documentation.Tags; +import com.hurence.logisland.annotation.lifecycle.OnEnabled; +import com.hurence.logisland.component.AllowableValue; +import com.hurence.logisland.component.InitializationException; +import com.hurence.logisland.component.PropertyDescriptor; +import com.hurence.logisland.controller.AbstractControllerService; +import com.hurence.logisland.controller.ControllerServiceInitializationContext; +import com.hurence.logisland.record.Record; +import com.hurence.logisland.redis.util.RedisAction; +import com.hurence.logisland.redis.util.RedisUtils; +import com.hurence.logisland.serializer.*; +import com.hurence.logisland.service.cache.CacheService; +import com.hurence.logisland.service.cache.model.Cache; +import com.hurence.logisland.service.cache.model.LRUCache; +import com.hurence.logisland.service.datastore.DatastoreClientService; +import com.hurence.logisland.service.datastore.DatastoreClientServiceException; +import com.hurence.logisland.service.datastore.MultiGetQueryRecord; +import com.hurence.logisland.service.datastore.MultiGetResponseRecord; +import com.hurence.logisland.util.Tuple; +import com.hurence.logisland.validator.StandardValidators; +import com.hurence.logisland.validator.ValidationContext; +import com.hurence.logisland.validator.ValidationResult; +import org.apache.avro.Schema; +import org.apache.commons.io.IOUtils; +import org.springframework.data.redis.connection.RedisConnection; +import org.springframework.data.redis.core.Cursor; +import org.springframework.data.redis.core.ScanOptions; + +import java.io.*; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; + + +/** + * Created by oalam on 23/05/2018. + *

+ *

+ * This is an implementation of a high-performance cache backed by Redis (and therefore distributed whenever
+ * the underlying Redis deployment is clustered or sentinel-managed). Every item is cached automatically by the
+ * put method; you only need the get method to retrieve a cached object.
+ *
+ * A default TTL can be specified.
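+ *
+ * Minimal usage sketch (variable names are illustrative; the service must first be configured and enabled
+ * as a controller service):
+ * <pre>
+ *   cache.set(record.getId(), record);          // serializes the record and stores it under its id
+ *   Record cached = cache.get(record.getId());  // fetches and deserializes it back
+ * </pre>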

+ */ +@Tags({"cache", "service", "key", "value", "pair", "redis"}) +@CapabilityDescription("A controller service for caching records by key value pair with LRU (last recently used) strategy. using LinkedHashMap") +public class RedisKeyValueCacheService extends AbstractControllerService implements DatastoreClientService, CacheService { + + private volatile RecordSerializer recordSerializer; + private final Serializer stringSerializer = new StringSerializer(); + private volatile RedisConnectionPool redisConnectionPool; + + + public static final AllowableValue AVRO_SERIALIZER = new AllowableValue(AvroSerializer.class.getName(), + "avro serialization", "serialize events as avro blocs"); + public static final AllowableValue JSON_SERIALIZER = new AllowableValue(JsonSerializer.class.getName(), + "avro serialization", "serialize events as json blocs"); + public static final AllowableValue KRYO_SERIALIZER = new AllowableValue(KryoSerializer.class.getName(), + "kryo serialization", "serialize events as json blocs"); + public static final AllowableValue BYTESARRAY_SERIALIZER = new AllowableValue(BytesArraySerializer.class.getName(), + "byte array serialization", "serialize events as byte arrays"); + public static final AllowableValue KURA_PROTOCOL_BUFFER_SERIALIZER = new AllowableValue(KuraProtobufSerializer.class.getName(), + "Kura Protobuf serialization", "serialize events as Kura protocol buffer"); + public static final AllowableValue NO_SERIALIZER = new AllowableValue("none", "no serialization", "send events as bytes"); + + + public static final PropertyDescriptor RECORD_SERIALIZER = new PropertyDescriptor.Builder() + .name("record.recordSerializer") + .description("the way to serialize/deserialize the record") + .required(true) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .allowableValues(KRYO_SERIALIZER, JSON_SERIALIZER, AVRO_SERIALIZER, BYTESARRAY_SERIALIZER, KURA_PROTOCOL_BUFFER_SERIALIZER, NO_SERIALIZER) + .defaultValue(JSON_SERIALIZER.getValue()) + .build(); + + + public static final PropertyDescriptor AVRO_SCHEMA = new PropertyDescriptor.Builder() + .name("record.avro.schema") + .description("the avro schema definition") + .required(false) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .build(); + + @Override + @OnEnabled + public void init(ControllerServiceInitializationContext context) throws InitializationException { + try { + this.redisConnectionPool = new RedisConnectionPool(); + this.redisConnectionPool.init(context); + this.recordSerializer = getSerializer( + context.getPropertyValue(RECORD_SERIALIZER).asString(), + context.getPropertyValue(AVRO_SCHEMA).asString()); + } catch (Exception e) { + throw new InitializationException(e); + } + } + + + @Override + public List getSupportedPropertyDescriptors() { + + List properties = new ArrayList<>(RedisUtils.REDIS_CONNECTION_PROPERTY_DESCRIPTORS); + properties.add(RECORD_SERIALIZER); + + return properties; + } + + @Override + protected Collection customValidate(ValidationContext validationContext) { + return RedisUtils.validate(validationContext); + } + + @Override + public Record get(String key) { + try { + return get(key, stringSerializer, (Deserializer) recordSerializer); + } catch (IOException e) { + e.printStackTrace(); + return null; + } + } + + @Override + public void set(String key, Record value) { + try { + put(key, value,stringSerializer, (Serializer) recordSerializer); + } catch (IOException e) { + e.printStackTrace(); + } + } + + + protected Cache createCache(final ControllerServiceInitializationContext 
context) throws IOException, InterruptedException { + final int capacity = context.getPropertyValue(CACHE_SIZE).asInteger(); + return new LRUCache(capacity); + } + + + public boolean putIfAbsent(final String key, final Record value, final Serializer keySerializer, final Serializer valueSerializer) throws IOException { + return withConnection(redisConnection -> { + final Tuple kv = serialize(key, value, keySerializer, valueSerializer); + return redisConnection.setNX(kv.getKey(), kv.getValue()); + }); + } + + public Record getAndPutIfAbsent(final String key, final Record value, final Serializer keySerializer, final Serializer valueSerializer, final Deserializer valueDeserializer) throws IOException { + return withConnection(redisConnection -> { + final Tuple kv = serialize(key, value, keySerializer, valueSerializer); + do { + // start a watch on the key and retrieve the current value + redisConnection.watch(kv.getKey()); + final byte[] existingValue = redisConnection.get(kv.getKey()); + + // start a transaction and perform the put-if-absent + redisConnection.multi(); + redisConnection.setNX(kv.getKey(), kv.getValue()); + + // execute the transaction + final List results = redisConnection.exec(); + + // if the results list was empty, then the transaction failed (i.e. key was modified after we started watching), so keep looping to retry + // if the results list has results, then the transaction succeeded and it should have the result of the setNX operation + if (results.size() > 0) { + final Object firstResult = results.get(0); + if (firstResult instanceof Boolean) { + final Boolean absent = (Boolean) firstResult; + + if(absent){ + return null; + }else { + InputStream input = new ByteArrayInputStream(existingValue); + return valueDeserializer.deserialize(input); + } + } else { + // this shouldn't really happen, but just in case there is a non-boolean result then bounce out of the loop + throw new IOException("Unexpected result from Redis transaction: Expected Boolean result, but got " + + firstResult.getClass().getName() + " with value " + firstResult.toString()); + } + } + } while (isEnabled()); + + return null; + }); + } + + + public boolean containsKey(final String key, final Serializer keySerializer) throws IOException { + return withConnection(redisConnection -> { + final byte[] k = serialize(key, keySerializer); + return redisConnection.exists(k); + }); + } + + public void put(final String key, final Record value, final Serializer keySerializer, final Serializer valueSerializer) throws IOException { + withConnection(redisConnection -> { + final Tuple kv = serialize(key, value, keySerializer, valueSerializer); + redisConnection.set(kv.getKey(), kv.getValue()); + return null; + }); + } + + + public Record get(final String key, final Serializer keySerializer, final Deserializer valueDeserializer) throws IOException { + return withConnection(redisConnection -> { + final byte[] k = serialize(key, keySerializer); + final byte[] v = redisConnection.get(k); + InputStream input = new ByteArrayInputStream(v); + return valueDeserializer.deserialize(input); + }); + } + + public void close() throws IOException { + try { + if (this.redisConnectionPool != null) + this.redisConnectionPool.close(); + } catch (Exception e) { + throw new IOException(e); + } + } + + public boolean remove(final String key, final Serializer keySerializer) throws IOException { + return withConnection(redisConnection -> { + final byte[] k = serialize(key, keySerializer); + final long numRemoved = redisConnection.del(k); + 
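// DEL returns the number of keys actually removed, so a positive count means the key existed and was deleted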
return numRemoved > 0; + }); + } + + public long removeByPattern(final java.lang.String regex) throws IOException { + return withConnection(redisConnection -> { + long deletedCount = 0; + final List batchKeys = new ArrayList<>(); + + // delete keys in batches of 1000 using the cursor + final Cursor cursor = redisConnection.scan(ScanOptions.scanOptions().count(100).match(regex).build()); + while (cursor.hasNext()) { + batchKeys.add(cursor.next()); + + if (batchKeys.size() == 1000) { + deletedCount += redisConnection.del(getKeys(batchKeys)); + batchKeys.clear(); + } + } + + // delete any left-over keys if some were added to the batch but never reached 1000 + if (batchKeys.size() > 0) { + deletedCount += redisConnection.del(getKeys(batchKeys)); + batchKeys.clear(); + } + + return deletedCount; + }); + } + + /** + * Convert the list of all keys to an array. + */ + private byte[][] getKeys(final List keys) { + final byte[][] allKeysArray = new byte[keys.size()][]; + for (int i = 0; i < keys.size(); i++) { + allKeysArray[i] = keys.get(i); + } + return allKeysArray; + } + + + private Tuple serialize(final K key, final Record value, final Serializer keySerializer, final Serializer valueSerializer) throws IOException { + final ByteArrayOutputStream out = new ByteArrayOutputStream(); + + keySerializer.serialize(out, key); + final byte[] k = out.toByteArray(); + + out.reset(); + + valueSerializer.serialize(out, value); + final byte[] v = out.toByteArray(); + + return new Tuple<>(k, v); + } + + private byte[] serialize(final K key, final Serializer keySerializer) throws IOException { + final ByteArrayOutputStream out = new ByteArrayOutputStream(); + + keySerializer.serialize(out, key); + return out.toByteArray(); + } + + private T withConnection(final RedisAction action) throws IOException { + RedisConnection redisConnection = null; + try { + redisConnection = redisConnectionPool.getConnection(); + return action.execute(redisConnection); + } finally { + if (redisConnection != null) { + try { + redisConnection.close(); + } catch (Exception e) { + getLogger().warn("Error closing connection: " + e.getMessage(), e); + } + } + } + } + + /** + * build a recordSerializer + * + * @param inSerializerClass the recordSerializer type + * @param schemaContent an Avro schema + * @return the recordSerializer + */ + private RecordSerializer getSerializer(String inSerializerClass, String schemaContent) { + + if (inSerializerClass.equals(AVRO_SERIALIZER.getValue())) { + Schema.Parser parser = new Schema.Parser(); + Schema inSchema = parser.parse(schemaContent); + new AvroSerializer(inSchema); + } else if (inSerializerClass.equals(JSON_SERIALIZER.getValue())) { + return new JsonSerializer(); + } else if (inSerializerClass.equals(BYTESARRAY_SERIALIZER.getValue())) { + return new BytesArraySerializer(); + } else if (inSerializerClass.equals(KURA_PROTOCOL_BUFFER_SERIALIZER.getValue())) { + return new KuraProtobufSerializer(); + } + return new KryoSerializer(true); + + } + + @Override + public void createCollection(String name, int partitionsCount, int replicationFactor) throws DatastoreClientServiceException { + + } + + @Override + public void dropCollection(String name) throws DatastoreClientServiceException { + + } + + @Override + public long countCollection(String name) throws DatastoreClientServiceException { + return 0; + } + + @Override + public boolean existsCollection(String name) throws DatastoreClientServiceException { + return false; + } + + @Override + public void refreshCollection(String name) throws 
DatastoreClientServiceException { + + } + + @Override + public void copyCollection(String reindexScrollTimeout, String src, String dst) throws DatastoreClientServiceException { + + } + + @Override + public void createAlias(String collection, String alias) throws DatastoreClientServiceException { + + } + + @Override + public boolean putMapping(String indexName, String doctype, String mappingAsJsonString) throws DatastoreClientServiceException { + return false; + } + + @Override + public void bulkFlush() throws DatastoreClientServiceException { + + } + + @Override + public void bulkPut(String collectionName, Record record) throws DatastoreClientServiceException { + set(record.getId(),record); + } + + @Override + public void put(String collectionName, Record record, boolean asynchronous) throws DatastoreClientServiceException { + set(record.getId(),record); + } + + @Override + public List multiGet(List multiGetQueryRecords) throws DatastoreClientServiceException { + return null; + } + + @Override + public Record get(String collectionName, Record record) throws DatastoreClientServiceException { + return get(record.getId()); + } + + @Override + public Collection query(String query) { + return null; + } + + @Override + public long queryCount(String query) { + return 0; + } + + private static class StringSerializer implements Serializer { + @Override + public void serialize(OutputStream output, String value) throws SerializationException, IOException { + if (value != null) { + output.write(value.getBytes(StandardCharsets.UTF_8)); + } + } + } + + private static class StringDeserializer implements Deserializer { + @Override + public String deserialize(InputStream input) throws DeserializationException, IOException { + byte[] bytes = IOUtils.toByteArray(input); + return input == null ? null : new String(bytes, StandardCharsets.UTF_8); + } + } +} + + + diff --git a/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/util/RedisAction.java b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/util/RedisAction.java new file mode 100644 index 000000000..03cb1852e --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/util/RedisAction.java @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.hurence.logisland.redis.util; + +import org.springframework.data.redis.connection.RedisConnection; + +import java.io.IOException; + +/** + * An action to be executed with a RedisConnection. 
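+ * Implementations are run against a connection borrowed from the pool and are not responsible for closing it.
+ * Example (sketch, with a hypothetical byte[] key): {@code RedisAction<Boolean> exists = conn -> conn.exists(key);}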
+ */ +public interface RedisAction { + + T execute(RedisConnection redisConnection) throws IOException; + +} diff --git a/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/util/RedisUtils.java b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/util/RedisUtils.java new file mode 100644 index 000000000..0cc336506 --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/src/main/java/com/hurence/logisland/redis/util/RedisUtils.java @@ -0,0 +1,448 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.hurence.logisland.redis.util; + + +import com.hurence.logisland.component.AllowableValue; +import com.hurence.logisland.component.PropertyDescriptor; +import com.hurence.logisland.controller.ControllerServiceInitializationContext; +import com.hurence.logisland.redis.RedisType; +import com.hurence.logisland.util.string.StringUtils; +import com.hurence.logisland.validator.StandardValidators; +import com.hurence.logisland.validator.ValidationContext; +import com.hurence.logisland.validator.ValidationResult; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.springframework.data.redis.connection.RedisClusterConfiguration; +import org.springframework.data.redis.connection.RedisSentinelConfiguration; +import org.springframework.data.redis.connection.jedis.JedisConnectionFactory; +import redis.clients.jedis.JedisPoolConfig; +import redis.clients.jedis.JedisShardInfo; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashSet; +import java.util.List; +import java.util.concurrent.TimeUnit; + +public class RedisUtils { + + private static Logger logger = LoggerFactory.getLogger(RedisUtils.class); + + + // These properties are shared between the connection pool controller service and the state provider, the name + // is purposely set to be more human-readable since that will be referenced in state-management.xml + + public static final AllowableValue REDIS_MODE_STANDALONE = new AllowableValue(RedisType.STANDALONE.getDisplayName(), RedisType.STANDALONE.getDisplayName(), RedisType.STANDALONE.getDescription()); + public static final AllowableValue REDIS_MODE_SENTINEL = new AllowableValue(RedisType.SENTINEL.getDisplayName(), RedisType.SENTINEL.getDisplayName(), RedisType.SENTINEL.getDescription()); + public static final AllowableValue REDIS_MODE_CLUSTER = new AllowableValue(RedisType.CLUSTER.getDisplayName(), RedisType.CLUSTER.getDisplayName(), RedisType.CLUSTER.getDescription()); + + public static final PropertyDescriptor REDIS_MODE = new PropertyDescriptor.Builder() + .name("redis.mode") + .displayName("Redis Mode") + .description("The type of Redis being communicated with - standalone, sentinel, or 
clustered.") + .allowableValues(REDIS_MODE_STANDALONE, REDIS_MODE_SENTINEL, REDIS_MODE_CLUSTER) + .defaultValue(REDIS_MODE_STANDALONE.getValue()) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .required(true) + .build(); + + public static final PropertyDescriptor CONNECTION_STRING = new PropertyDescriptor.Builder() + .name("connection.string") + .displayName("Connection String") + .description("The connection string for Redis. In a standalone instance this value will be of the form hostname:port. " + + "In a sentinel instance this value will be the comma-separated list of sentinels, such as host1:port1,host2:port2,host3:port3. " + + "In a clustered instance this value will be the comma-separated list of cluster masters, such as host1:port,host2:port,host3:port.") + .required(true) + .addValidator(StandardValidators.NON_BLANK_VALIDATOR) + // .expressionLanguageSupported(true) + .build(); + + public static final PropertyDescriptor DATABASE = new PropertyDescriptor.Builder() + .name("database.index") + .displayName("Database Index") + .description("The database index to be used by connections created from this connection pool. " + + "See the databases property in redis.conf, by default databases 0-15 will be available.") + .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR) + .defaultValue("0") + // .expressionLanguageSupported(true) + .required(true) + .build(); + + public static final PropertyDescriptor COMMUNICATION_TIMEOUT = new PropertyDescriptor.Builder() + .name("communication.timeout") + .displayName("Communication Timeout") + .description("The timeout to use when attempting to communicate with Redis.") + .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR) + .defaultValue("10 seconds") + .required(true) + .build(); + + public static final PropertyDescriptor CLUSTER_MAX_REDIRECTS = new PropertyDescriptor.Builder() + .name("cluster.max.redirects") + .displayName("Cluster Max Redirects") + .description("The maximum number of redirects that can be performed when clustered.") + .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR) + .defaultValue("5") + .required(true) + .build(); + + public static final PropertyDescriptor SENTINEL_MASTER = new PropertyDescriptor.Builder() + .name("sentinel.master") + .displayName("Sentinel Master") + .description("The name of the sentinel master, require when Mode is set to Sentinel") + .addValidator(StandardValidators.NON_BLANK_VALIDATOR) + // .expressionLanguageSupported(true) + .build(); + + public static final PropertyDescriptor PASSWORD = new PropertyDescriptor.Builder() + .name("password") + .displayName("Password") + .description("The password used to authenticate to the Redis server. See the requirepass property in redis.conf.") + .addValidator(StandardValidators.NON_BLANK_VALIDATOR) + // .expressionLanguageSupported(true) + .sensitive(true) + .build(); + + public static final PropertyDescriptor POOL_MAX_TOTAL = new PropertyDescriptor.Builder() + .name("pool.max.total") + .displayName("Pool - Max Total") + .description("The maximum number of connections that can be allocated by the pool (checked out to clients, or idle awaiting checkout). 
" + + "A negative value indicates that there is no limit.") + .addValidator(StandardValidators.INTEGER_VALIDATOR) + .defaultValue("8") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_MAX_IDLE = new PropertyDescriptor.Builder() + .name("pool.max.idle") + .displayName("Pool - Max Idle") + .description("The maximum number of idle connections that can be held in the pool, or a negative value if there is no limit.") + .addValidator(StandardValidators.INTEGER_VALIDATOR) + .defaultValue("8") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_MIN_IDLE = new PropertyDescriptor.Builder() + .name("pool.min.idle") + .displayName("Pool - Min Idle") + .description("The target for the minimum number of idle connections to maintain in the pool. If the configured value of Min Idle is " + + "greater than the configured value for Max Idle, then the value of Max Idle will be used instead.") + .addValidator(StandardValidators.INTEGER_VALIDATOR) + .defaultValue("0") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_BLOCK_WHEN_EXHAUSTED = new PropertyDescriptor.Builder() + .name("pool.block.when.exhausted") + .displayName("Pool - Block When Exhausted") + .description("Whether or not clients should block and wait when trying to obtain a connection from the pool when the pool has no available connections. " + + "Setting this to false means an error will occur immediately when a client requests a connection and none are available.") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .allowableValues("true", "false") + .defaultValue("true") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_MAX_WAIT_TIME = new PropertyDescriptor.Builder() + .name("pool.max.wait.time") + .displayName("Pool - Max Wait Time") + .description("The amount of time to wait for an available connection when Block When Exhausted is set to true.") + .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR) + .defaultValue("10 seconds") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_MIN_EVICTABLE_IDLE_TIME = new PropertyDescriptor.Builder() + .name("pool.min.evictable.idle.time") + .displayName("Pool - Min Evictable Idle Time") + .description("The minimum amount of time an object may sit idle in the pool before it is eligible for eviction.") + .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR) + .defaultValue("60 seconds") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_TIME_BETWEEN_EVICTION_RUNS = new PropertyDescriptor.Builder() + .name("pool.time.between.eviction.runs") + .displayName("Pool - Time Between Eviction Runs") + .description("The amount of time between attempting to evict idle connections from the pool.") + .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR) + .defaultValue("30 seconds") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_NUM_TESTS_PER_EVICTION_RUN = new PropertyDescriptor.Builder() + .name("pool.num.tests.per.eviction.run") + .displayName("Pool - Num Tests Per Eviction Run") + .description("The number of connections to tests per eviction attempt. 
A negative value indicates to test all connections.") + .addValidator(StandardValidators.INTEGER_VALIDATOR) + .defaultValue("-1") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_TEST_ON_CREATE = new PropertyDescriptor.Builder() + .name("pool.test.on.create") + .displayName("Pool - Test On Create") + .description("Whether or not connections should be tested upon creation.") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .allowableValues("true", "false") + .defaultValue("false") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_TEST_ON_BORROW = new PropertyDescriptor.Builder() + .name("pool.test.on.borrow") + .displayName("Pool - Test On Borrow") + .description("Whether or not connections should be tested upon borrowing from the pool.") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .allowableValues("true", "false") + .defaultValue("false") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_TEST_ON_RETURN = new PropertyDescriptor.Builder() + .name("pool.test.on.return") + .displayName("Pool - Test On Return") + .description("Whether or not connections should be tested upon returning to the pool.") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .allowableValues("true", "false") + .defaultValue("false") + .required(true) + .build(); + + public static final PropertyDescriptor POOL_TEST_WHILE_IDLE = new PropertyDescriptor.Builder() + .name("pool.test.while.idle") + .displayName("Pool - Test While Idle") + .description("Whether or not connections should be tested while idle.") + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .allowableValues("true", "false") + .defaultValue("true") + .required(true) + .build(); + + + + + + + + + + + + + + + public static final List REDIS_CONNECTION_PROPERTY_DESCRIPTORS; + + static { + final List props = new ArrayList<>(); + props.add(RedisUtils.REDIS_MODE); + props.add(RedisUtils.CONNECTION_STRING); + props.add(RedisUtils.DATABASE); + props.add(RedisUtils.COMMUNICATION_TIMEOUT); + props.add(RedisUtils.CLUSTER_MAX_REDIRECTS); + props.add(RedisUtils.SENTINEL_MASTER); + props.add(RedisUtils.PASSWORD); + props.add(RedisUtils.POOL_MAX_TOTAL); + props.add(RedisUtils.POOL_MAX_IDLE); + props.add(RedisUtils.POOL_MIN_IDLE); + props.add(RedisUtils.POOL_BLOCK_WHEN_EXHAUSTED); + props.add(RedisUtils.POOL_MAX_WAIT_TIME); + props.add(RedisUtils.POOL_MIN_EVICTABLE_IDLE_TIME); + props.add(RedisUtils.POOL_TIME_BETWEEN_EVICTION_RUNS); + props.add(RedisUtils.POOL_NUM_TESTS_PER_EVICTION_RUN); + props.add(RedisUtils.POOL_TEST_ON_CREATE); + props.add(RedisUtils.POOL_TEST_ON_BORROW); + props.add(RedisUtils.POOL_TEST_ON_RETURN); + props.add(RedisUtils.POOL_TEST_WHILE_IDLE); + REDIS_CONNECTION_PROPERTY_DESCRIPTORS = Collections.unmodifiableList(props); + } + + + public static JedisConnectionFactory createConnectionFactory(final ControllerServiceInitializationContext context) { + final String redisMode = context.getPropertyValue(RedisUtils.REDIS_MODE).asString(); + final String connectionString = context.getPropertyValue(RedisUtils.CONNECTION_STRING).asString(); + final Integer dbIndex = context.getPropertyValue(RedisUtils.DATABASE).asInteger(); + final String password = context.getPropertyValue(RedisUtils.PASSWORD).asString(); + final Integer timeout = context.getPropertyValue(RedisUtils.COMMUNICATION_TIMEOUT).asTimePeriod(TimeUnit.MILLISECONDS).intValue(); + final JedisPoolConfig poolConfig = createJedisPoolConfig(context); + + JedisConnectionFactory 
connectionFactory; + + if (RedisUtils.REDIS_MODE_STANDALONE.getValue().equals(redisMode)) { + final JedisShardInfo jedisShardInfo = createJedisShardInfo(connectionString, timeout, password); + + logger.info("Connecting to Redis in standalone mode at " + connectionString); + connectionFactory = new JedisConnectionFactory(jedisShardInfo); + + } else if (RedisUtils.REDIS_MODE_SENTINEL.getValue().equals(redisMode)) { + final String[] sentinels = connectionString.split("[,]"); + final String sentinelMaster = context.getPropertyValue(RedisUtils.SENTINEL_MASTER).asString(); + final RedisSentinelConfiguration sentinelConfiguration = new RedisSentinelConfiguration(sentinelMaster, new HashSet<>(getTrimmedValues(sentinels))); + final JedisShardInfo jedisShardInfo = createJedisShardInfo(sentinels[0], timeout, password); + + logger.info("Connecting to Redis in sentinel mode..."); + logger.info("Redis master = " + sentinelMaster); + + for (final String sentinel : sentinels) { + logger.info("Redis sentinel at " + sentinel); + } + + connectionFactory = new JedisConnectionFactory(sentinelConfiguration, poolConfig); + connectionFactory.setShardInfo(jedisShardInfo); + + } else { + final String[] clusterNodes = connectionString.split("[,]"); + final Integer maxRedirects = context.getPropertyValue(RedisUtils.CLUSTER_MAX_REDIRECTS).asInteger(); + + final RedisClusterConfiguration clusterConfiguration = new RedisClusterConfiguration(getTrimmedValues(clusterNodes)); + clusterConfiguration.setMaxRedirects(maxRedirects); + + logger.info("Connecting to Redis in clustered mode..."); + for (final String clusterNode : clusterNodes) { + logger.info("Redis cluster node at " + clusterNode); + } + + connectionFactory = new JedisConnectionFactory(clusterConfiguration, poolConfig); + } + + connectionFactory.setUsePool(true); + connectionFactory.setPoolConfig(poolConfig); + connectionFactory.setDatabase(dbIndex); + connectionFactory.setTimeout(timeout); + + if (!StringUtils.isBlank(password)) { + connectionFactory.setPassword(password); + } + + // need to call this to initialize the pool/connections + connectionFactory.afterPropertiesSet(); + logger.info("done creating Connection factory"); + return connectionFactory; + } + + private static List getTrimmedValues(final String[] values) { + final List trimmedValues = new ArrayList<>(); + for (final String value : values) { + trimmedValues.add(value.trim()); + } + return trimmedValues; + } + + private static JedisShardInfo createJedisShardInfo(final String hostAndPort, final Integer timeout, final String password) { + final String[] hostAndPortSplit = hostAndPort.split("[:]"); + final String host = hostAndPortSplit[0].trim(); + final Integer port = Integer.parseInt(hostAndPortSplit[1].trim()); + + final JedisShardInfo jedisShardInfo = new JedisShardInfo(host, port); + jedisShardInfo.setConnectionTimeout(timeout); + jedisShardInfo.setSoTimeout(timeout); + + if (!StringUtils.isEmpty(password)) { + jedisShardInfo.setPassword(password); + } + + return jedisShardInfo; + } + + private static JedisPoolConfig createJedisPoolConfig(final ControllerServiceInitializationContext context) { + final JedisPoolConfig poolConfig = new JedisPoolConfig(); + poolConfig.setMaxTotal(context.getPropertyValue(RedisUtils.POOL_MAX_TOTAL).asInteger()); + poolConfig.setMaxIdle(context.getPropertyValue(RedisUtils.POOL_MAX_IDLE).asInteger()); + poolConfig.setMinIdle(context.getPropertyValue(RedisUtils.POOL_MIN_IDLE).asInteger()); + 
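// the remaining pool settings are read from the service configuration; time-based properties such as
// "10 seconds" are converted to milliseconds via asTimePeriod(TimeUnit.MILLISECONDS)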
poolConfig.setBlockWhenExhausted(context.getPropertyValue(RedisUtils.POOL_BLOCK_WHEN_EXHAUSTED).asBoolean()); + poolConfig.setMaxWaitMillis(context.getPropertyValue(RedisUtils.POOL_MAX_WAIT_TIME).asTimePeriod(TimeUnit.MILLISECONDS)); + poolConfig.setMinEvictableIdleTimeMillis(context.getPropertyValue(RedisUtils.POOL_MIN_EVICTABLE_IDLE_TIME).asTimePeriod(TimeUnit.MILLISECONDS)); + poolConfig.setTimeBetweenEvictionRunsMillis(context.getPropertyValue(RedisUtils.POOL_TIME_BETWEEN_EVICTION_RUNS).asTimePeriod(TimeUnit.MILLISECONDS)); + poolConfig.setNumTestsPerEvictionRun(context.getPropertyValue(RedisUtils.POOL_NUM_TESTS_PER_EVICTION_RUN).asInteger()); + poolConfig.setTestOnCreate(context.getPropertyValue(RedisUtils.POOL_TEST_ON_CREATE).asBoolean()); + poolConfig.setTestOnBorrow(context.getPropertyValue(RedisUtils.POOL_TEST_ON_BORROW).asBoolean()); + poolConfig.setTestOnReturn(context.getPropertyValue(RedisUtils.POOL_TEST_ON_RETURN).asBoolean()); + poolConfig.setTestWhileIdle(context.getPropertyValue(RedisUtils.POOL_TEST_WHILE_IDLE).asBoolean()); + return poolConfig; + } + + public static List validate(ValidationContext validationContext) { + final List results = new ArrayList<>(); + + final String redisMode = validationContext.getPropertyValue(RedisUtils.REDIS_MODE).asString(); + final String connectionString = validationContext.getPropertyValue(RedisUtils.CONNECTION_STRING).asString(); + final Integer dbIndex = validationContext.getPropertyValue(RedisUtils.DATABASE).asInteger(); + + if (StringUtils.isBlank(connectionString)) { + results.add(new ValidationResult.Builder() + .subject(RedisUtils.CONNECTION_STRING.getDisplayName()) + .valid(false) + .explanation("Connection String cannot be blank") + .build()); + } else if (RedisUtils.REDIS_MODE_STANDALONE.getValue().equals(redisMode)) { + final String[] hostAndPort = connectionString.split("[:]"); + if (hostAndPort == null || hostAndPort.length != 2 || StringUtils.isBlank(hostAndPort[0]) || StringUtils.isBlank(hostAndPort[1]) || !isInteger(hostAndPort[1])) { + results.add(new ValidationResult.Builder() + .subject(RedisUtils.CONNECTION_STRING.getDisplayName()) + .input(connectionString) + .valid(false) + .explanation("Standalone Connection String must be in the form host:port") + .build()); + } + } else { + for (final String connection : connectionString.split("[,]")) { + final String[] hostAndPort = connection.split("[:]"); + if (hostAndPort == null || hostAndPort.length != 2 || StringUtils.isBlank(hostAndPort[0]) || StringUtils.isBlank(hostAndPort[1]) || !isInteger(hostAndPort[1])) { + results.add(new ValidationResult.Builder() + .subject(RedisUtils.CONNECTION_STRING.getDisplayName()) + .input(connection) + .valid(false) + .explanation("Connection String must be in the form host:port,host:port,host:port,etc.") + .build()); + } + } + } + + if (RedisUtils.REDIS_MODE_CLUSTER.getValue().equals(redisMode) && dbIndex > 0) { + results.add(new ValidationResult.Builder() + .subject(RedisUtils.DATABASE.getDisplayName()) + .valid(false) + .explanation("Database Index must be 0 when using clustered Redis") + .build()); + } + + if (RedisUtils.REDIS_MODE_SENTINEL.getValue().equals(redisMode)) { + final String sentinelMaster = validationContext.getPropertyValue(RedisUtils.SENTINEL_MASTER).asString(); + if (StringUtils.isEmpty(sentinelMaster)) { + results.add(new ValidationResult.Builder() + .subject(RedisUtils.SENTINEL_MASTER.getDisplayName()) + .valid(false) + .explanation("Sentinel Master must be provided when Mode is Sentinel") + .build()); + } + } + + 
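// an empty result list means the Redis connection-related properties passed validation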
return results; + } + + private static boolean isInteger(final String number) { + try { + Integer.parseInt(number); + return true; + } catch (Exception e) { + return false; + } + } + +} diff --git a/logisland-services/logisland-redis_4-client-service/src/test/java/com/hurence/logisland/redis/service/FakeRedisProcessor.java b/logisland-services/logisland-redis_4-client-service/src/test/java/com/hurence/logisland/redis/service/FakeRedisProcessor.java new file mode 100644 index 000000000..943120ffc --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/src/test/java/com/hurence/logisland/redis/service/FakeRedisProcessor.java @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.hurence.logisland.redis.service; + + +import com.hurence.logisland.component.PropertyDescriptor; +import com.hurence.logisland.processor.AbstractProcessor; +import com.hurence.logisland.processor.ProcessContext; +import com.hurence.logisland.record.Record; +import com.hurence.logisland.validator.StandardValidators; + +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +/** + * Fake processor used for testing RedisConnectionPoolService. + */ +public class FakeRedisProcessor extends AbstractProcessor { + + public static final PropertyDescriptor REDIS_SERVICE = new PropertyDescriptor.Builder() + .name("redis-service") + .displayName("Redis Service") + .identifiesControllerService(RedisKeyValueCacheService.class) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .required(true) + .build(); + + @Override + public List getSupportedPropertyDescriptors() { + return Collections.singletonList(REDIS_SERVICE); + } + + + @Override + public Collection process(ProcessContext context, Collection records) { + return null; + } +} diff --git a/logisland-services/logisland-redis_4-client-service/src/test/java/com/hurence/logisland/redis/service/ITRedisKeyValueCacheClientService.java b/logisland-services/logisland-redis_4-client-service/src/test/java/com/hurence/logisland/redis/service/ITRedisKeyValueCacheClientService.java new file mode 100644 index 000000000..fcc8151d5 --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/src/test/java/com/hurence/logisland/redis/service/ITRedisKeyValueCacheClientService.java @@ -0,0 +1,314 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.hurence.logisland.redis.service; + + +import com.hurence.logisland.component.InitializationException; +import com.hurence.logisland.component.PropertyDescriptor; +import com.hurence.logisland.processor.AbstractProcessor; +import com.hurence.logisland.processor.ProcessContext; +import com.hurence.logisland.record.*; +import com.hurence.logisland.redis.util.RedisUtils; +import com.hurence.logisland.serializer.DeserializationException; +import com.hurence.logisland.serializer.Deserializer; +import com.hurence.logisland.serializer.SerializationException; +import com.hurence.logisland.serializer.Serializer; +import com.hurence.logisland.util.runner.TestRunner; +import com.hurence.logisland.util.runner.TestRunners; +import com.hurence.logisland.validator.StandardValidators; +import org.apache.commons.io.IOUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import redis.embedded.RedisServer; + +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.net.InetSocketAddress; +import java.net.StandardSocketOptions; +import java.nio.channels.SocketChannel; +import java.nio.charset.StandardCharsets; +import java.util.*; + +/** + * This is an integration test that is meant to be run against a real Redis instance. 
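+ * Here setup() starts an embedded Redis server (embedded-redis) on a free local port and teardown() stops it,
+ * so the test is self-contained.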
+ */ +public class ITRedisKeyValueCacheClientService { + + public static final String SERVICE_IDENTIFIER = "redis-map-cache-client"; + private TestRedisProcessor proc; + private TestRunner testRunner; + private RedisServer redisServer; + private RedisKeyValueCacheService redisMapCacheClientService; + private int redisPort; + + @Before + public void setup() throws IOException { + this.redisPort = getAvailablePort(); + + this.redisServer = new RedisServer(redisPort); + redisServer.start(); + + proc = new TestRedisProcessor(); + testRunner = TestRunners.newTestRunner(proc); + } + + private int getAvailablePort() throws IOException { + try (SocketChannel socket = SocketChannel.open()) { + socket.setOption(StandardSocketOptions.SO_REUSEADDR, true); + socket.bind(new InetSocketAddress("localhost", 0)); + return socket.socket().getLocalPort(); + } + } + + @After + public void teardown() throws IOException { + if (redisServer != null) { + redisServer.stop(); + } + } + + @Test + public void testStandaloneRedis() throws InitializationException, IOException, InterruptedException { + try { + // create, configure, and enable the RedisConnectionPool service + redisMapCacheClientService = new RedisKeyValueCacheService(); + redisMapCacheClientService.setIdentifier(SERVICE_IDENTIFIER); + + + testRunner.setProperty(RedisUtils.CONNECTION_STRING, "localhost:" + redisPort); + testRunner.setProperty(RedisUtils.REDIS_MODE, RedisUtils.REDIS_MODE_STANDALONE); + testRunner.setProperty(RedisUtils.DATABASE, "0"); + testRunner.setProperty(RedisUtils.COMMUNICATION_TIMEOUT, "10 seconds"); + + testRunner.setProperty(RedisUtils.POOL_MAX_TOTAL, "8"); + testRunner.setProperty(RedisUtils.POOL_MAX_IDLE, "8"); + testRunner.setProperty(RedisUtils.POOL_MIN_IDLE, "0"); + testRunner.setProperty(RedisUtils.POOL_BLOCK_WHEN_EXHAUSTED, "true"); + testRunner.setProperty(RedisUtils.POOL_MAX_WAIT_TIME, "10 seconds"); + testRunner.setProperty(RedisUtils.POOL_MIN_EVICTABLE_IDLE_TIME, "60 seconds"); + testRunner.setProperty(RedisUtils.POOL_TIME_BETWEEN_EVICTION_RUNS, "30 seconds"); + testRunner.setProperty(RedisUtils.POOL_NUM_TESTS_PER_EVICTION_RUN, "-1"); + testRunner.setProperty(RedisUtils.POOL_TEST_ON_CREATE, "false"); + testRunner.setProperty(RedisUtils.POOL_TEST_ON_BORROW, "false"); + testRunner.setProperty(RedisUtils.POOL_TEST_ON_RETURN, "false"); + testRunner.setProperty(RedisUtils.POOL_TEST_WHILE_IDLE, "true"); + + testRunner.setProperty(RedisKeyValueCacheService.RECORD_SERIALIZER, "com.hurence.logisland.serializer.JsonSerializer"); + testRunner.addControllerService(SERVICE_IDENTIFIER, redisMapCacheClientService); + + // uncomment this to test using a different database index than the default 0 + //testRunner.setProperty(redisConnectionPool, RedisUtils.DATABASE, "1"); + + // uncomment this to test using a password to authenticate to redis + //testRunner.setProperty(redisConnectionPool, RedisUtils.PASSWORD, "foobared"); + + testRunner.enableControllerService(redisMapCacheClientService); + + setupRedisMapCacheClientService(); + executeProcessor(); + } finally { + if (redisMapCacheClientService != null) { + redisMapCacheClientService.close(); + } + } + } + + private void setupRedisMapCacheClientService() throws InitializationException { + // create, configure, and enable the RedisDistributedMapCacheClient service + redisMapCacheClientService = new RedisKeyValueCacheService(); + redisMapCacheClientService.setIdentifier(SERVICE_IDENTIFIER); + + testRunner.addControllerService(SERVICE_IDENTIFIER, redisMapCacheClientService); + // 
testRunner.setProperty(redisMapCacheClientService, RedisKeyValueCacheService.REDIS_CONNECTION_POOL, "redis-connection-pool"); + testRunner.enableControllerService(redisMapCacheClientService); + testRunner.setProperty(TestRedisProcessor.REDIS_MAP_CACHE, "redis-map-cache-client"); + } + + + private Collection getRandomMetrics(int size) throws InterruptedException { + + List records = new ArrayList<>(); + Random rnd = new Random(); + long now = System.currentTimeMillis(); + + String[] metricsType = {"disk.io", "cpu.wait", "io.wait"}; + String[] hosts = {"host1", "host2", "host3"}; + for (int i = 0; i < size; i++) { + records.add(new StandardRecord(RecordDictionary.METRIC) + .setStringField(FieldDictionary.RECORD_NAME, metricsType[rnd.nextInt(3)]) + .setStringField("host", hosts[rnd.nextInt(3)]) + .setField(FieldDictionary.RECORD_TIME, FieldType.LONG, new Date().getTime()) + .setField(FieldDictionary.RECORD_VALUE, FieldType.DOUBLE, 100.0 * Math.random()) + .setTime(now) + ); + now += rnd.nextInt(500); + } + + return records; + } + + + private void executeProcessor() throws InterruptedException { + // queue a flow file to trigger the processor and executeProcessor it + testRunner.enqueue(getRandomMetrics(10)); + testRunner.run(); + testRunner.assertAllInputRecordsProcessed(); + } + + /** + * Test processor that exercises RedisDistributedMapCacheClient. + */ + private static class TestRedisProcessor extends AbstractProcessor { + + public static final PropertyDescriptor REDIS_MAP_CACHE = new PropertyDescriptor.Builder() + .name("redis-map-cache") + .displayName("Redis Map Cache") + .identifiesControllerService(RedisKeyValueCacheService.class) + .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) + .required(true) + .build(); + + + @Override + public List getSupportedPropertyDescriptors() { + return Collections.singletonList(REDIS_MAP_CACHE); + } + + + @Override + public Collection process(ProcessContext context, Collection records) { + if (records.isEmpty()) { + return records; + } + + final ByteArrayOutputStream out = new ByteArrayOutputStream(); + final Serializer stringSerializer = new StringSerializer(); + final Deserializer stringDeserializer = new StringDeserializer(); + + final RedisKeyValueCacheService cacheClient = context.getPropertyValue(REDIS_MAP_CACHE).asControllerService(RedisKeyValueCacheService.class); + + try { + final long timestamp = System.currentTimeMillis(); + final String key = "test-redis-processor-" + timestamp; + final String value = "the time is " + timestamp; + + // verify the key doesn't exists, put the key/value, then verify it exists + Assert.assertFalse(cacheClient.containsKey(key, stringSerializer)); + cacheClient.put(key, value, stringSerializer, stringSerializer); + Assert.assertTrue(cacheClient.containsKey(key, stringSerializer)); + + // verify get returns the expected value we set above + final String retrievedValue = cacheClient.get(key, stringSerializer, stringDeserializer); + Assert.assertEquals(value, retrievedValue); + + // verify remove removes the entry and contains key returns false after + Assert.assertTrue(cacheClient.remove(key, stringSerializer)); + Assert.assertFalse(cacheClient.containsKey(key, stringSerializer)); + + // verify putIfAbsent works the first time and returns false the second time + Assert.assertTrue(cacheClient.putIfAbsent(key, value, stringSerializer, stringSerializer)); + Assert.assertFalse(cacheClient.putIfAbsent(key, "some other value", stringSerializer, stringSerializer)); + Assert.assertEquals(value, cacheClient.get(key, 
stringSerializer, stringDeserializer)); + + // verify that getAndPutIfAbsent returns the existing value and doesn't modify it in the cache + final String getAndPutIfAbsentResult = cacheClient.getAndPutIfAbsent(key, value, stringSerializer, stringSerializer, stringDeserializer); + Assert.assertEquals(value, getAndPutIfAbsentResult); + Assert.assertEquals(value, cacheClient.get(key, stringSerializer, stringDeserializer)); + + // verify that getAndPutIfAbsent on a key that doesn't exist returns null + final String keyThatDoesntExist = key + "_DOES_NOT_EXIST"; + Assert.assertFalse(cacheClient.containsKey(keyThatDoesntExist, stringSerializer)); + final String getAndPutIfAbsentResultWhenDoesntExist = cacheClient.getAndPutIfAbsent(keyThatDoesntExist, value, stringSerializer, stringSerializer, stringDeserializer); + Assert.assertEquals(null, getAndPutIfAbsentResultWhenDoesntExist); + Assert.assertEquals(value, cacheClient.get(keyThatDoesntExist, stringSerializer, stringDeserializer)); + + + // get/set checks with serializer + for (Record record : records) { + String recordKey = record.getId(); + cacheClient.set(recordKey, record); + Assert.assertTrue(cacheClient.containsKey(recordKey, stringSerializer)); + Record storedRecord = cacheClient.get(recordKey); + Assert.assertEquals(record,storedRecord); + cacheClient.remove(recordKey, stringSerializer); + Assert.assertFalse(cacheClient.containsKey(recordKey, stringSerializer)); + } + + /* + // verify atomic fetch returns the correct entry + final AtomicCacheEntry entry = cacheClient.fetch(key, stringSerializer, stringDeserializer); + Assert.assertEquals(key, entry.getKey()); + Assert.assertEquals(value, entry.getValue()); + Assert.assertTrue(Arrays.equals(value.getBytes(StandardCharsets.UTF_8), entry.getRevision().orElse(null))); + + final AtomicCacheEntry notLatestEntry = new AtomicCacheEntry<>(entry.getKey(), entry.getValue(), "not previous".getBytes(StandardCharsets.UTF_8)); + + // verify atomic replace does not replace when previous value is not equal + Assert.assertFalse(cacheClient.replace(notLatestEntry, stringSerializer, stringSerializer)); + Assert.assertEquals(value, cacheClient.get(key, stringSerializer, stringDeserializer)); + + // verify atomic replace does replace when previous value is equal + final String replacementValue = "this value has been replaced"; + entry.setValue(replacementValue); + Assert.assertTrue(cacheClient.replace(entry, stringSerializer, stringSerializer)); + Assert.assertEquals(replacementValue, cacheClient.get(key, stringSerializer, stringDeserializer)); + + // verify atomic replace does replace no value previous existed + final String replaceKeyDoesntExist = key + "_REPLACE_DOES_NOT_EXIST"; + final AtomicCacheEntry entryDoesNotExist = new AtomicCacheEntry<>(replaceKeyDoesntExist, replacementValue, null); + Assert.assertTrue(cacheClient.replace(entryDoesNotExist, stringSerializer, stringSerializer)); + Assert.assertEquals(replacementValue, cacheClient.get(replaceKeyDoesntExist, stringSerializer, stringDeserializer)); +*/ + final int numToDelete = 2000; + for (int i = 0; i < numToDelete; i++) { + cacheClient.put(key + "-" + i, value, stringSerializer, stringSerializer); + } + + Assert.assertTrue(cacheClient.removeByPattern("test-redis-processor-*") >= numToDelete); + Assert.assertFalse(cacheClient.containsKey(key, stringSerializer)); + + + } catch (final Exception e) { + getLogger().error("Routing to failure due to: " + e.getMessage(), e); + + } + return Collections.emptyList(); + } + + } + + private static class 
StringSerializer implements Serializer<String> { + @Override + public void serialize(OutputStream output, String value) throws SerializationException, IOException { + if (value != null) { + output.write(value.getBytes(StandardCharsets.UTF_8)); + } + } + } + + private static class StringDeserializer implements Deserializer<String> { + @Override + public String deserialize(InputStream input) throws DeserializationException, IOException { + // guard against a null stream before reading from it + if (input == null) { + return null; + } + byte[] bytes = IOUtils.toByteArray(input); + return new String(bytes, StandardCharsets.UTF_8); + } + } +} diff --git a/logisland-services/logisland-redis_4-client-service/src/test/java/com/hurence/logisland/redis/service/TestRedisConnectionPoolService.java b/logisland-services/logisland-redis_4-client-service/src/test/java/com/hurence/logisland/redis/service/TestRedisConnectionPoolService.java new file mode 100644 index 000000000..100f74e38 --- /dev/null +++ b/logisland-services/logisland-redis_4-client-service/src/test/java/com/hurence/logisland/redis/service/TestRedisConnectionPoolService.java @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ +package com.hurence.logisland.redis.service; + + +import com.hurence.logisland.component.InitializationException; +import com.hurence.logisland.redis.util.RedisUtils; +import com.hurence.logisland.util.runner.TestRunner; +import com.hurence.logisland.util.runner.TestRunners; +import org.junit.Before; +import org.junit.Test; + +public class TestRedisConnectionPoolService { + + private TestRunner testRunner; + private FakeRedisProcessor proc; + private RedisKeyValueCacheService redisService; + + @Before + public void setup() throws InitializationException { + proc = new FakeRedisProcessor(); + testRunner = TestRunners.newTestRunner(proc); + + redisService = new RedisKeyValueCacheService(); + + redisService.setIdentifier("redis-service"); + testRunner.setProperty(RedisUtils.REDIS_MODE, RedisUtils.REDIS_MODE_STANDALONE); + testRunner.setProperty(RedisUtils.DATABASE, "0"); + testRunner.setProperty(RedisUtils.COMMUNICATION_TIMEOUT, "10 seconds"); + + testRunner.setProperty(RedisUtils.POOL_MAX_TOTAL, "8"); + testRunner.setProperty(RedisUtils.POOL_MAX_IDLE, "8"); + testRunner.setProperty(RedisUtils.POOL_MIN_IDLE, "0"); + testRunner.setProperty(RedisUtils.POOL_BLOCK_WHEN_EXHAUSTED, "true"); + testRunner.setProperty(RedisUtils.POOL_MAX_WAIT_TIME, "10 seconds"); + testRunner.setProperty(RedisUtils.POOL_MIN_EVICTABLE_IDLE_TIME, "60 seconds"); + testRunner.setProperty(RedisUtils.POOL_TIME_BETWEEN_EVICTION_RUNS, "30 seconds"); + testRunner.setProperty(RedisUtils.POOL_NUM_TESTS_PER_EVICTION_RUN, "-1"); + testRunner.setProperty(RedisUtils.POOL_TEST_ON_CREATE, "false"); + testRunner.setProperty(RedisUtils.POOL_TEST_ON_BORROW, "false"); + testRunner.setProperty(RedisUtils.POOL_TEST_ON_RETURN, "false"); + testRunner.setProperty(RedisUtils.POOL_TEST_WHILE_IDLE, "true"); + testRunner.setProperty(RedisKeyValueCacheService.RECORD_SERIALIZER, "com.hurence.logisland.serializer.JsonSerializer"); + testRunner.addControllerService("redis-service", redisService); + } + + @Test + public void testValidateConnectionString() { + testRunner.assertNotValid(redisService); + + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, " "); + testRunner.assertNotValid(redisService); + + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "${redis.connection}"); + testRunner.assertNotValid(redisService); + + /* testRunner.setProperty("redis.connection", "localhost:6379"); + testRunner.assertValid(redisService);*/ + + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "localhost"); + testRunner.assertNotValid(redisService); + + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "localhost:a"); + testRunner.assertNotValid(redisService); + + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "localhost:6379"); + testRunner.assertValid(redisService); + + // standalone can only have one host:port pair + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "localhost:6379,localhost:6378"); + testRunner.assertNotValid(redisService); + + // cluster can have multiple host:port pairs + testRunner.setProperty(redisService, RedisUtils.REDIS_MODE, RedisUtils.REDIS_MODE_CLUSTER.getValue()); + testRunner.assertValid(redisService); + + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "localhost:6379,localhost"); + testRunner.assertNotValid(redisService); + + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "local:host:6379,localhost:6378"); + testRunner.assertNotValid(redisService); + + 
testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "localhost:a,localhost:b"); + testRunner.assertNotValid(redisService); + + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "localhost :6379, localhost :6378, localhost:6377"); + testRunner.assertValid(redisService); + } + + @Test + public void testValidateSentinelMasterRequiredInSentinelMode() { + testRunner.setProperty(redisService, RedisUtils.REDIS_MODE, RedisUtils.REDIS_MODE_SENTINEL.getValue()); + testRunner.setProperty(redisService, RedisUtils.CONNECTION_STRING, "localhost:6379,localhost:6378"); + testRunner.assertNotValid(redisService); + + testRunner.setProperty(redisService, RedisUtils.SENTINEL_MASTER, "mymaster"); + testRunner.assertValid(redisService); + } + +} diff --git a/logisland-services/logisland-solr-client-service/logisland-solr-client-service-api/pom.xml b/logisland-services/logisland-solr-client-service/logisland-solr-client-service-api/pom.xml index 3d2fd1ff1..ae5569b89 100644 --- a/logisland-services/logisland-solr-client-service/logisland-solr-client-service-api/pom.xml +++ b/logisland-services/logisland-solr-client-service/logisland-solr-client-service-api/pom.xml @@ -7,7 +7,7 @@ com.hurence.logisland logisland-solr-client-service - 0.12.2 + 0.13.0 logisland-solr-client-service-api diff --git a/logisland-services/logisland-solr-client-service/logisland-solr-client-service-test/pom.xml b/logisland-services/logisland-solr-client-service/logisland-solr-client-service-test/pom.xml index d4d491172..43d9c24e1 100644 --- a/logisland-services/logisland-solr-client-service/logisland-solr-client-service-test/pom.xml +++ b/logisland-services/logisland-solr-client-service/logisland-solr-client-service-test/pom.xml @@ -7,7 +7,7 @@ com.hurence.logisland logisland-solr-client-service - 0.12.2 + 0.13.0 logisland-solr-client-service-test @@ -29,7 +29,7 @@ com.hurence.logisland logisland-solr-client-service-api - 0.12.2 + 0.13.0 org.apache.solr diff --git a/logisland-services/logisland-solr-client-service/logisland-solr_5_5_5-client-service/pom.xml b/logisland-services/logisland-solr-client-service/logisland-solr_5_5_5-client-service/pom.xml index 521d6de95..a7b73c5b7 100644 --- a/logisland-services/logisland-solr-client-service/logisland-solr_5_5_5-client-service/pom.xml +++ b/logisland-services/logisland-solr-client-service/logisland-solr_5_5_5-client-service/pom.xml @@ -7,7 +7,7 @@ com.hurence.logisland logisland-solr-client-service - 0.12.2 + 0.13.0 logisland-solr_5_5_5-client-service @@ -33,7 +33,7 @@ com.hurence.logisland logisland-solr-client-service-test - 0.12.2 + 0.13.0 test diff --git a/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/pom.xml b/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/pom.xml index c638a4572..7afe8120b 100644 --- a/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/pom.xml +++ b/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/pom.xml @@ -7,7 +7,7 @@ com.hurence.logisland logisland-solr-client-service - 0.12.2 + 0.13.0 logisland-solr_6_4_2-chronix-client-service diff --git a/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/main/java/com/hurence/logisland/service/solr/ChronixUpdater.java 
b/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/main/java/com/hurence/logisland/service/solr/ChronixUpdater.java index f9a377411..583b730cc 100644 --- a/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/main/java/com/hurence/logisland/service/solr/ChronixUpdater.java +++ b/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/main/java/com/hurence/logisland/service/solr/ChronixUpdater.java @@ -25,17 +25,15 @@ import de.qaware.chronix.timeseries.MetricTimeSeries; import org.apache.solr.client.solrj.SolrClient; import org.apache.solr.client.solrj.SolrServerException; -import org.apache.solr.client.solrj.request.UpdateRequest; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; -import java.util.ArrayList; -import java.util.List; -import java.util.concurrent.ArrayBlockingQueue; +import java.util.*; import java.util.concurrent.BlockingQueue; import java.util.function.BinaryOperator; import java.util.function.Function; +import java.util.stream.Collectors; public class ChronixUpdater implements Runnable { @@ -45,6 +43,7 @@ public class ChronixUpdater implements Runnable { private final long flushInterval; private volatile int batchedUpdates = 0; private volatile long lastTS = 0; + private final Map fieldToMetricTypeMapping = new HashMap<>(); private static volatile int threadCount = 0; protected static Function groupBy = MetricTimeSeries::getName; @@ -52,10 +51,11 @@ public class ChronixUpdater implements Runnable { private Logger logger = LoggerFactory.getLogger(ChronixUpdater.class.getName() + threadCount); - MetricTimeSeriesConverter converter = null; - ChronixSolrStorage storage = null; + private MetricTimeSeriesConverter converter = null; + private ChronixSolrStorage storage = null; - public ChronixUpdater(SolrClient solr, BlockingQueue records, int batchSize, long flushInterval) { + public ChronixUpdater(SolrClient solr, BlockingQueue records, Map fieldToMetricTypeMapping, + int batchSize, long flushInterval) { this.solr = solr; this.records = records; this.batchSize = batchSize; @@ -63,118 +63,126 @@ public ChronixUpdater(SolrClient solr, BlockingQueue records, int batchS this.lastTS = System.nanoTime(); // far in the future ... 
converter = new MetricTimeSeriesConverter(); storage = new ChronixSolrStorage<>(batchSize, groupBy, reduce); + if (fieldToMetricTypeMapping != null) { + this.fieldToMetricTypeMapping.putAll(fieldToMetricTypeMapping); + } + //add the defaults + this.fieldToMetricTypeMapping.put(FieldDictionary.RECORD_VALUE, RecordDictionary.METRIC); threadCount++; } @Override public void run() { + List batchBuffer = new ArrayList<>(); + while (true) { // process record if one try { Record record = records.take(); if (record != null) { - - try { - MetricTimeSeries metric = convertToMetric(record); - - List timeSeries = new ArrayList<>(); - timeSeries.add(metric); - storage.add(converter, timeSeries, solr); - - - } catch (DatastoreClientServiceException ex) { - logger.error(ex.toString() + " for record " + record.toString()); - } - + batchBuffer.add(record); batchedUpdates++; } } catch (InterruptedException e) { - e.printStackTrace(); + //here we should exit the loop + logger.warn("Interrupted while waiting", e); + break; } // try { long currentTS = System.nanoTime(); - if ((currentTS - lastTS) >= flushInterval * 1000000 || batchedUpdates >= batchSize) { - - logger.debug("commiting " + batchedUpdates + " records to Chronix after " + (currentTS - lastTS) + " ns"); + if ((currentTS - lastTS) >= flushInterval * 1000000 || batchedUpdates >= batchSize) { + //use moustache operator to avoid composing strings when not needed + logger.debug("committing {} records to Chronix after {} ns", batchedUpdates, (currentTS - lastTS)); + batchBuffer.stream().collect(Collectors.groupingBy(r -> r.getField(FieldDictionary.RECORD_NAME).asString())) + .values().forEach(list -> { + storage.add(converter, convertToMetric(list.stream().sorted(Comparator.comparing(Record::getTime)).collect(Collectors.toList())), solr); + }); solr.commit(); lastTS = currentTS; + batchBuffer = new ArrayList<>(); batchedUpdates = 0; } // Thread.sleep(10); } catch (IOException | SolrServerException e) { - e.printStackTrace(); + logger.error("Unexpected I/O exception", e); } } } - MetricTimeSeries convertToMetric(Record record) throws DatastoreClientServiceException { - - try { - long recordTS = record.getTime().getTime(); - - MetricTimeSeries.Builder builder = new MetricTimeSeries.Builder( - record.getField(FieldDictionary.RECORD_NAME).asString(), RecordDictionary.METRIC) - .start(recordTS) - .end(recordTS + 10) - .attribute("id", record.getId()) - .point(recordTS, record.getField(FieldDictionary.RECORD_VALUE).asDouble()); - - - // add all other records - record.getAllFieldsSorted().forEach(field -> { - try { - // cleanup invalid es fields characters like '.' 
- String fieldName = field.getName() - .replaceAll("\\.", "_"); - - if (!fieldName.equals(FieldDictionary.RECORD_TIME) && - !fieldName.equals(FieldDictionary.RECORD_NAME) && - !fieldName.equals(FieldDictionary.RECORD_VALUE) && - !fieldName.equals(FieldDictionary.RECORD_ID) && - !fieldName.equals(FieldDictionary.RECORD_TYPE)) - - - switch (field.getType()) { - - case STRING: - builder.attribute(fieldName, field.asString()); - break; - case INT: - builder.attribute(fieldName, field.asInteger()); - break; - case LONG: - builder.attribute(fieldName, field.asLong()); - break; - case FLOAT: - builder.attribute(fieldName, field.asFloat()); - break; - case DOUBLE: - builder.attribute(fieldName, field.asDouble()); - break; - case BOOLEAN: - builder.attribute(fieldName, field.asBoolean()); - break; - default: - builder.attribute(fieldName, field.getRawValue()); - break; + List<MetricTimeSeries> convertToMetric(List<Record> records) throws DatastoreClientServiceException { + + + Record first = records.get(0); + String batchUID = UUID.randomUUID().toString(); + final long firstTS = records.get(0).getTime().getTime(); + long tmp = records.get(records.size() - 1).getTime().getTime(); + // make sure the series end differs from its start when all records share the same timestamp + final long lastTS = tmp == firstTS ? firstTS + 1 : tmp; + + + //extract meta + String metricName = first.getField(FieldDictionary.RECORD_NAME).asString(); + Map<String, Object> attributes = first.getAllFieldsSorted().stream() + .filter(field -> !fieldToMetricTypeMapping.containsKey(field.getName())) + .filter(field -> !field.getName().equals(FieldDictionary.RECORD_TIME) && + !field.getName().equals(FieldDictionary.RECORD_NAME) && + !field.getName().equals(FieldDictionary.RECORD_VALUE) && + !field.getName().equals(FieldDictionary.RECORD_ID) && + !field.getName().equals(FieldDictionary.RECORD_TYPE) + ) + .collect(Collectors.toMap(field -> field.getName().replaceAll("\\.", "_"), + field -> { + try { + switch (field.getType()) { + case STRING: + return field.asString(); + case INT: + return field.asInteger(); + case LONG: + return field.asLong(); + case FLOAT: + return field.asFloat(); + case DOUBLE: + return field.asDouble(); + case BOOLEAN: + return field.asBoolean(); + default: + return field.getRawValue(); + } + } catch (Exception e) { + logger.error("Unable to process field " + field, e); + return null; + } } - - } catch (Throwable ex) { - logger.error("unable to process a field in record : {}, {}", record, ex.toString()); - } - - - }); - - return builder.build(); - } catch (Exception ex) { - throw new DatastoreClientServiceException("bad record : " + ex.toString()); - } + )); + + return fieldToMetricTypeMapping.entrySet().stream() + .map(entry -> { + MetricTimeSeries.Builder builder = new MetricTimeSeries.Builder(metricName, entry.getValue()); + List<AbstractMap.SimpleEntry<Long, Double>> points = records.stream() + .filter(record -> record.hasField(entry.getKey()) && record.getField(entry.getKey()).isSet()) + .map(record -> + new AbstractMap.SimpleEntry<>(record.getTime().getTime(), + record.getField(entry.getKey()).asDouble()) + ).collect(Collectors.toList()); + if (points.isEmpty()) { + return null; + } + points.stream().forEach(kv -> builder.point(kv.getKey(), kv.getValue())); + + return builder + .start(firstTS) + .end(lastTS) + .attributes(attributes) + .attribute("id", batchUID) + .build(); + } + ).filter(a -> a != null) + .collect(Collectors.toList()); } } diff --git a/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/main/java/com/hurence/logisland/service/solr/Solr_6_4_2_ChronixClientService.java
b/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/main/java/com/hurence/logisland/service/solr/Solr_6_4_2_ChronixClientService.java index c6066490b..cc7d16299 100644 --- a/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/main/java/com/hurence/logisland/service/solr/Solr_6_4_2_ChronixClientService.java +++ b/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/main/java/com/hurence/logisland/service/solr/Solr_6_4_2_ChronixClientService.java @@ -29,6 +29,7 @@ import com.hurence.logisland.service.datastore.MultiGetQueryRecord; import com.hurence.logisland.service.datastore.MultiGetResponseRecord; import com.hurence.logisland.validator.StandardValidators; +import org.apache.commons.lang3.StringUtils; import org.apache.solr.client.solrj.SolrClient; import org.apache.solr.client.solrj.SolrQuery; import org.apache.solr.client.solrj.SolrServerException; @@ -39,12 +40,12 @@ import org.slf4j.LoggerFactory; import java.io.IOException; -import java.util.ArrayList; -import java.util.Collection; -import java.util.Collections; -import java.util.List; +import java.util.*; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.BlockingQueue; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.stream.Collectors; @Tags({"solr", "client"}) @CapabilityDescription("Implementation of ChronixClientService for Solr 6 4 2") @@ -52,11 +53,11 @@ public class Solr_6_4_2_ChronixClientService extends AbstractControllerService i private static Logger logger = LoggerFactory.getLogger(Solr_6_4_2_ChronixClientService.class); protected volatile SolrClient solr; - - List updaters = null; + private ExecutorService executorService = Executors.newSingleThreadExecutor(); + private ChronixUpdater updater; final BlockingQueue queue = new ArrayBlockingQueue<>(1000000); - PropertyDescriptor SOLR_CLOUD = new PropertyDescriptor.Builder() + public static final PropertyDescriptor SOLR_CLOUD = new PropertyDescriptor.Builder() .name("solr.cloud") .description("is slor cloud enabled") .required(true) @@ -64,7 +65,7 @@ public class Solr_6_4_2_ChronixClientService extends AbstractControllerService i .defaultValue("false") .build(); - PropertyDescriptor SOLR_CONNECTION_STRING = new PropertyDescriptor.Builder() + public static final PropertyDescriptor SOLR_CONNECTION_STRING = new PropertyDescriptor.Builder() .name("solr.connection.string") .description("zookeeper quorum host1:2181,host2:2181 for solr cloud or http address of a solr core ") .required(true) @@ -72,22 +73,15 @@ public class Solr_6_4_2_ChronixClientService extends AbstractControllerService i .defaultValue("localhost:8983/solr") .build(); - PropertyDescriptor SOLR_COLLECTION = new PropertyDescriptor.Builder() + public static final PropertyDescriptor SOLR_COLLECTION = new PropertyDescriptor.Builder() .name("solr.collection") .description("name of the collection to use") .required(true) .addValidator(StandardValidators.NON_EMPTY_VALIDATOR) .build(); - PropertyDescriptor CONCURRENT_REQUESTS = new PropertyDescriptor.Builder() - .name("solr.concurrent.requests") - .description("setConcurrentRequests") - .required(false) - .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR) - .defaultValue("2") - .build(); - PropertyDescriptor FLUSH_INTERVAL = new PropertyDescriptor.Builder() + public static final PropertyDescriptor FLUSH_INTERVAL = new PropertyDescriptor.Builder() 
.name("flush.interval") .description("flush interval in ms") .required(false) @@ -95,19 +89,28 @@ public class Solr_6_4_2_ChronixClientService extends AbstractControllerService i .defaultValue("500") .build(); + public static final PropertyDescriptor METRICS_TYPE_MAPPING = new PropertyDescriptor.Builder() + .name("metrics.type.mapping") + .description("The mapping between record field name and chronix metric type. " + + "This is a comma separated list. E.g. record_value:metric,quality:quality") + .required(false) + .addValidator(StandardValidators.COMMA_SEPARATED_LIST_VALIDATOR) + .defaultValue("") + .build(); + + + @Override public List getSupportedPropertyDescriptors() { - List props = new ArrayList<>(); props.add(BATCH_SIZE); props.add(BULK_SIZE); props.add(SOLR_CLOUD); props.add(SOLR_COLLECTION); props.add(SOLR_CONNECTION_STRING); - props.add(CONCURRENT_REQUESTS); props.add(FLUSH_INTERVAL); - + props.add(METRICS_TYPE_MAPPING); return Collections.unmodifiableList(props); } @@ -119,71 +122,68 @@ public void init(ControllerServiceInitializationContext context) throws Initiali createSolrClient(context); createChronixStorage(context); } catch (Exception e) { - throw new InitializationException(e); + throw new InitializationException("Error while instantiating ChronixClientService. " + + "Please check your configuration!", e); } } } + + private Map createMetricsTypeMapping(ControllerServiceInitializationContext context) { + return Arrays.stream(context.getPropertyValue(METRICS_TYPE_MAPPING).asString() + .split(",")) + .filter(StringUtils::isNotBlank) + .map(s -> s.split(":")) + .collect(Collectors.toMap(a -> a[0], a -> a[1])); + } + /** - * Instantiate ElasticSearch Client. This chould be called by subclasses' @OnScheduled method to create a client + * Instantiate Chronix Client. This should be called by subclasses' @OnScheduled method to create a client * if one does not yet exist. If called when scheduled, closeClient() should be called by the subclasses' @OnStopped * method so the client will be destroyed when the processor is stopped. 
* * @param context The context for this processor - * @throws ProcessException if an error occurs while creating an Elasticsearch client + * @throws ProcessException if an error occurs while creating an Chronix client */ protected void createSolrClient(ControllerServiceInitializationContext context) throws ProcessException { if (solr != null) { return; } - try { - // create a solr client - final boolean isCloud = context.getPropertyValue(SOLR_CLOUD).asBoolean(); - final String connectionString = context.getPropertyValue(SOLR_CONNECTION_STRING).asString(); - final String collection = context.getPropertyValue(SOLR_COLLECTION).asString(); - - - if (isCloud) { - //logInfo("creating solrCloudClient on $solrUrl for collection $collection"); - CloudSolrClient cloudSolrClient = new CloudSolrClient.Builder().withZkHost(connectionString).build(); - cloudSolrClient.setDefaultCollection(collection); - cloudSolrClient.setZkClientTimeout(30000); - cloudSolrClient.setZkConnectTimeout(30000); - solr = cloudSolrClient; - } else { - // logInfo(s"creating HttpSolrClient on $solrUrl for collection $collection") - solr = new HttpSolrClient.Builder(connectionString + "/" + collection).build(); - } + + // create a solr client + final boolean isCloud = context.getPropertyValue(SOLR_CLOUD).asBoolean(); + final String connectionString = context.getPropertyValue(SOLR_CONNECTION_STRING).asString(); + final String collection = context.getPropertyValue(SOLR_COLLECTION).asString(); - } catch (Exception ex) { - logger.error(ex.toString()); + if (isCloud) { + //logInfo("creating solrCloudClient on $solrUrl for collection $collection"); + CloudSolrClient cloudSolrClient = new CloudSolrClient.Builder().withZkHost(connectionString).build(); + cloudSolrClient.setDefaultCollection(collection); + cloudSolrClient.setZkClientTimeout(30000); + cloudSolrClient.setZkConnectTimeout(30000); + solr = cloudSolrClient; + } else { + // logInfo(s"creating HttpSolrClient on $solrUrl for collection $collection") + solr = new HttpSolrClient.Builder(connectionString + "/" + collection).build(); } + + } protected void createChronixStorage(ControllerServiceInitializationContext context) throws ProcessException { - if (updaters != null) { + if (updater != null) { return; } - try { - // setup a thread pool of solr updaters - int batchSize = context.getPropertyValue(BATCH_SIZE).asInteger(); - int numConcurrentRequests = context.getPropertyValue(CONCURRENT_REQUESTS).asInteger(); - long flushInterval = context.getPropertyValue(FLUSH_INTERVAL).asLong(); - updaters = new ArrayList<>(numConcurrentRequests); - for (int i = 0; i < numConcurrentRequests; i++) { - ChronixUpdater updater = new ChronixUpdater(solr, queue, batchSize, flushInterval); - new Thread(updater).start(); - updaters.add(updater); - } + // setup a thread pool of solr updaters + int batchSize = context.getPropertyValue(BATCH_SIZE).asInteger(); + long flushInterval = context.getPropertyValue(FLUSH_INTERVAL).asLong(); + updater = new ChronixUpdater(solr, queue, createMetricsTypeMapping(context), batchSize, flushInterval); + executorService.execute(updater); - - } catch (Exception ex) { - logger.error(ex.toString()); - } } @Override diff --git a/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/test/java/com/hurence/logisland/service/solr/ChronixClientServiceTest.java b/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/test/java/com/hurence/logisland/service/solr/ChronixClientServiceTest.java index 
a254a76b6..966819c2b 100644 --- a/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/test/java/com/hurence/logisland/service/solr/ChronixClientServiceTest.java +++ b/logisland-services/logisland-solr-client-service/logisland-solr_6_4_2-chronix-client-service/src/test/java/com/hurence/logisland/service/solr/ChronixClientServiceTest.java @@ -60,31 +60,6 @@ protected void createSolrClient(ControllerServiceInitializationContext context) solr = solrRule.getClient(); } -// @Override -// protected void createChronixStorage(ControllerServiceInitializationContext context) throws ProcessException { -// if (storage != null) { -// return; -// } -// try { -// -// converter = new MetricTimeSeriesConverter(); -// storage = new ChronixSolrStorage<>(20, groupBy, reduce); -// -// -// } catch (Exception ex) { -// logger.error(ex.toString()); -// } -// } - - - @Override - public List getSupportedPropertyDescriptors() { - - List props = new ArrayList<>(); - - return Collections.unmodifiableList(props); - } - } private DatastoreClientService configureClientService(final TestRunner runner) throws InitializationException { @@ -92,9 +67,16 @@ private DatastoreClientService configureClientService(final TestRunner runner) t runner.setProperty(TestProcessor.SOLR_CLIENT_SERVICE, "service"); + runner.setProperty("solr.collection", "chronix"); + + //shouldn't be automatic?? + service.getSupportedPropertyDescriptors().stream() + .filter(p->p.getDefaultValue() != null) + .forEach(p->runner.setProperty(p, p.getDefaultValue())); + runner.addControllerService("service", service); runner.enableControllerService(service); - runner.assertValid(service); + //runner.assertValid(service); return service; } @@ -104,6 +86,7 @@ private Collection getRandomMetrics(int size) throws InterruptedExceptio List records = new ArrayList<>(); Random rnd = new Random(); + long now = System.currentTimeMillis(); String[] metricsType = {"disk.io", "cpu.wait", "io.wait"}; String[] hosts = {"host1", "host2", "host3"}; @@ -113,8 +96,9 @@ private Collection getRandomMetrics(int size) throws InterruptedExceptio .setStringField("host", hosts[rnd.nextInt(3)]) .setField(FieldDictionary.RECORD_TIME, FieldType.LONG, new Date().getTime()) .setField(FieldDictionary.RECORD_VALUE, FieldType.FLOAT, 100.0 * Math.random()) + .setTime(now) ); - Thread.sleep(rnd.nextInt(500)); + now+=rnd.nextInt(500); } return records; @@ -122,7 +106,7 @@ private Collection getRandomMetrics(int size) throws InterruptedExceptio @Test - public void testConvertion() throws InterruptedException { + public void testConversion() throws InterruptedException { final Date now = new Date(); final Record record = new StandardRecord(RecordDictionary.METRIC) @@ -132,8 +116,8 @@ public void testConvertion() throws InterruptedException { final BlockingQueue queue = new ArrayBlockingQueue<>(1000000); - final ChronixUpdater service = new ChronixUpdater(solrRule.getClient(), queue, 10, 1000); - MetricTimeSeries metric = service.convertToMetric(record); + final ChronixUpdater service = new ChronixUpdater(solrRule.getClient(), queue, Collections.emptyMap(), 10, 1000); + MetricTimeSeries metric = service.convertToMetric(Collections.singletonList(record)).get(0); assertTrue(metric.getName().equals("cpu.wait")); assertTrue(metric.getType().equals("metric")); diff --git a/logisland-services/logisland-solr-client-service/logisland-solr_6_6_2-client-service/pom.xml b/logisland-services/logisland-solr-client-service/logisland-solr_6_6_2-client-service/pom.xml 
index 2828dfaad..4c5adbcf0 100644 --- a/logisland-services/logisland-solr-client-service/logisland-solr_6_6_2-client-service/pom.xml +++ b/logisland-services/logisland-solr-client-service/logisland-solr_6_6_2-client-service/pom.xml @@ -7,7 +7,7 @@ com.hurence.logisland logisland-solr-client-service - 0.12.2 + 0.13.0 @@ -30,12 +30,12 @@ com.hurence.logisland logisland-solr-client-service-api - 0.12.2 + 0.13.0 com.hurence.logisland logisland-solr-client-service-test - 0.12.2 + 0.13.0 test diff --git a/logisland-services/logisland-solr-client-service/pom.xml b/logisland-services/logisland-solr-client-service/pom.xml index 739aaf241..5d880e12a 100644 --- a/logisland-services/logisland-solr-client-service/pom.xml +++ b/logisland-services/logisland-solr-client-service/pom.xml @@ -7,7 +7,7 @@ com.hurence.logisland logisland-services - 0.12.2 + 0.13.0 logisland-solr-client-service diff --git a/logisland-services/pom.xml b/logisland-services/pom.xml index 0a910c17b..4700be5ee 100644 --- a/logisland-services/pom.xml +++ b/logisland-services/pom.xml @@ -6,7 +6,7 @@ com.hurence.logisland logisland - 0.12.2 + 0.13.0 pom @@ -22,5 +22,6 @@ logisland-cache_key_value-service-api logisland-ip-to-geo-service-maxmind logisland-ip-to-geo-service-api + logisland-redis_4-client-service diff --git a/pom.xml b/pom.xml index 930f1e44e..03de3b3b3 100644 --- a/pom.xml +++ b/pom.xml @@ -22,7 +22,7 @@ 4.0.0 com.hurence.logisland logisland - 0.12.2 + 0.13.0 pom LogIsland is an event mining platform based on Kafka to handle a huge amount of data in realtime. @@ -123,7 +123,7 @@ central - Maven Repository https://repo1.maven.org/maven2 @@ -175,6 +175,18 @@ Confluent http://packages.confluent.io/maven/ + + + jitpack.io + https://jitpack.io + + + openscada + http://neutronium.openscada.org/maven/ + + true + + @@ -763,6 +775,16 @@ logisland-solr_6_4_2-chronix-client-service ${project.version} + + com.hurence.logisland + logisland-excel-plugin + ${project.version} + + + com.hurence.logisland + logisland-redis_4-client-service + ${project.version} + com.hurence.logisland logisland-documentation @@ -1246,6 +1268,29 @@ 2.6.6 2.5 + + logisland-connect + + + + + + com.hurence.logisland + logisland-connect-spark + ${project.version} + + + com.hurence.logisland + logisland-connector-opcda + ${project.version} + + + com.hurence.logisland + logisland-connectors-bundle + ${project.version} + + + @@ -1366,11 +1411,11 @@ - contrib-check