My adding interface and implemention #12

Open: wants to merge 8 commits into base: master
110 changes: 50 additions & 60 deletions README.md
@@ -1,60 +1,50 @@
# JavaReedSolomon

This is a simple and efficient Reed-Solomon implementation in Java,
which was originally built at [Backblaze](https://www.backblaze.com).
There is an overview of how the algorithm works in my [blog
post](https://www.backblaze.com/blog/reed-solomon/).

The ReedSolomon class does the encoding and decoding, and is supported
by Matrix, which does matrix arithmetic, and Galois, which is a finite
field over 8-bit values.
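To make "a finite field over 8-bit values" concrete, here is a minimal sketch of GF(2^8) arithmetic with log/exp tables: addition is XOR, and a nonzero product is exp[log a + log b]. The class name `Gf256` is hypothetical and this is not the library's `Galois` class. The primitive polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is an assumption, so check `Galois.java` for the one the library actually uses. A slow shift-and-XOR multiply is included to cross-check the tables.

```java
// Hypothetical GF(2^8) helper, NOT the library's Galois class.
final class Gf256 {
    // Assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1.
    static final int POLY = 0x11D;
    static final int[] EXP = new int[512];
    static final int[] LOG = new int[256];

    static {
        int x = 1;
        for (int i = 0; i < 255; i++) {
            EXP[i] = x;
            LOG[x] = i;
            x <<= 1;                 // multiply by the generator 2
            if ((x & 0x100) != 0) {
                x ^= POLY;           // reduce modulo the field polynomial
            }
        }
        // Duplicate the table so LOG[a] + LOG[b] (at most 508) needs no modulo.
        for (int i = 255; i < 512; i++) {
            EXP[i] = EXP[i - 255];
        }
    }

    // Addition (and subtraction) in GF(2^8) is XOR.
    static int add(int a, int b) {
        return (a ^ b) & 0xFF;
    }

    // Table-driven multiply: exp(log a + log b), with 0 handled separately.
    static int mul(int a, int b) {
        a &= 0xFF;
        b &= 0xFF;
        if (a == 0 || b == 0) {
            return 0;
        }
        return EXP[LOG[a] + LOG[b]];
    }

    // Reference shift-and-XOR multiply, used only to cross-check the tables.
    static int slowMul(int a, int b) {
        int result = 0;
        a &= 0xFF;
        b &= 0xFF;
        while (b != 0) {
            if ((b & 1) != 0) {
                result ^= a;
            }
            a <<= 1;
            if ((a & 0x100) != 0) {
                a ^= POLY;
            }
            b >>>= 1;
        }
        return result;
    }
}
```

The tables trade 768 ints of memory for replacing the bit-by-bit loop with two lookups and an add, which is the same trade-off the `Exp` and `Table` variants in the benchmark below are exploring.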

For examples of how to use ReedSolomon, take a look at SampleEncoder
and SampleDecoder. They show, in a very simple way, how to break a
file into shards and encode parity, and then how to take a subset of
the shards and reconstruct the original file.
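The first step of the encoder, breaking a file into equal-sized data shards, can be sketched without the library. This sketch is modelled on the approach SampleEncoder takes (a 4-byte length prefix, then zero padding up to a multiple of the shard count), but the class and method names here are hypothetical, and the parity shards are simply left zeroed for `ReedSolomon.encodeParity` to fill.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the shard layout; not a class in this repository.
final class ShardLayout {
    static final int BYTES_IN_INT = 4;

    // Splits payload into dataShards equal shards. The first 4 bytes hold the
    // payload length so that a decoder can strip the zero padding at the end.
    static byte[][] split(byte[] payload, int dataShards, int parityShards) {
        int storedSize = payload.length + BYTES_IN_INT;
        int shardSize = (storedSize + dataShards - 1) / dataShards; // round up
        byte[] all = new byte[shardSize * dataShards];               // zero-padded
        ByteBuffer.wrap(all).putInt(payload.length);
        System.arraycopy(payload, 0, all, BYTES_IN_INT, payload.length);

        byte[][] shards = new byte[dataShards + parityShards][shardSize];
        for (int i = 0; i < dataShards; i++) {
            System.arraycopy(all, i * shardSize, shards[i], 0, shardSize);
        }
        // shards[dataShards..] stay zeroed; a parity encoder would fill them.
        return shards;
    }

    // Reverses split(): concatenates the data shards, then drops prefix and padding.
    static byte[] join(byte[][] shards, int dataShards) {
        int shardSize = shards[0].length;
        byte[] all = new byte[shardSize * dataShards];
        for (int i = 0; i < dataShards; i++) {
            System.arraycopy(shards[i], 0, all, i * shardSize, shardSize);
        }
        int length = ByteBuffer.wrap(all).getInt();
        byte[] payload = new byte[length];
        System.arraycopy(all, BYTES_IN_INT, payload, 0, length);
        return payload;
    }
}
```

Storing the length inside the shards means the decoder needs nothing beyond the shards themselves to recover the exact original size.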

There is a Gradle build file to make a jar and run the tests. Running
it is simple. Just type: `gradle build`

We would like to send out a special thanks to James Plank at the
University of Tennessee at Knoxville for his useful papers on erasure
coding. If you'd like an introduction to how it all works, take a look at
[this introductory paper](http://web.eecs.utk.edu/~plank/plank/papers/SPE-9-97.html).

This project is limited to a pure Java implementation. If you need
more speed, and can handle some assembly-language programming,
you may be interested in using the Intel SIMD instructions to speed
up the Galois field multiplication. You can read more about that
in the paper on [Screaming Fast Galois Field Arithmetic](http://www.kaymgee.com/Kevin_Greenan/Publications_files/plank-fast2013.pdf).
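The SIMD technique rests on a property that can be shown in pure Java: multiplication by a constant c distributes over XOR, so c·x = c·(x & 0x0F) XOR c·(x & 0xF0). Two 16-entry tables per constant therefore suffice, which is what lets a byte-shuffle instruction (such as SSSE3 `PSHUFB`) perform 16 lookups at once. The sketch below is a scalar illustration under that reading of the paper, not code from this library or the paper; `SplitTableMul` is hypothetical and the field polynomial 0x11D is assumed.

```java
// Hypothetical scalar illustration of split (4-bit) multiplication tables.
final class SplitTableMul {
    static final int POLY = 0x11D; // assumed field polynomial

    // Plain shift-and-XOR GF(2^8) multiply, used to build the small tables.
    static int gfMul(int a, int b) {
        int r = 0;
        a &= 0xFF;
        b &= 0xFF;
        while (b != 0) {
            if ((b & 1) != 0) {
                r ^= a;
            }
            a <<= 1;
            if ((a & 0x100) != 0) {
                a ^= POLY;
            }
            b >>>= 1;
        }
        return r;
    }

    // Multiplies every byte of in by the constant c, writing into out.
    static void mulRegion(int c, byte[] in, byte[] out) {
        int[] low = new int[16];   // c * n for n = 0..15
        int[] high = new int[16];  // c * (n << 4) for n = 0..15
        for (int n = 0; n < 16; n++) {
            low[n] = gfMul(c, n);
            high[n] = gfMul(c, n << 4);
        }
        for (int i = 0; i < in.length; i++) {
            int x = in[i] & 0xFF;
            // Distributivity over XOR: c*x = c*(low nibble) ^ c*(high nibble).
            out[i] = (byte) (low[x & 0x0F] ^ high[x >>> 4]);
        }
    }
}
```

In a SIMD version the two 16-entry tables fit in vector registers, and the per-byte loop becomes a handful of shuffle, shift, and XOR instructions over 16 or more bytes at a time.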

## Performance Notes

The performance of the inner loop depends on the specific processor
you're running on. There are twelve different permutations of the
loop in this library, and the ReedSolomonBenchmark class will tell
you which one is fastest for your particular application. The number
of parity and data shards in the benchmark, as well as the buffer
sizes, match the usage at Backblaze. You can set the parameters of
the benchmark to match your specific use before choosing a loop
implementation.
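The class names appear to encode the nesting order of three loops (over input shards, output shards, and byte positions) and how each product is computed (`Exp` for log/exp lookups, `Table` for a precomputed multiplication table). The sketch below illustrates two such orderings with a full 256x256 table and checks that they compute the same parity; it is a hypothetical illustration of the idea, not the library's coding-loop classes, and it assumes the field polynomial 0x11D.

```java
// Hypothetical sketch of two loop orderings for parity = matrix x inputs over GF(2^8).
final class LoopOrderSketch {
    static final int POLY = 0x11D;                  // assumed field polynomial
    static final byte[][] MUL = new byte[256][256]; // full multiplication table

    static {
        for (int a = 0; a < 256; a++) {
            for (int b = 0; b < 256; b++) {
                int r = 0, x = a, y = b;
                while (y != 0) {
                    if ((y & 1) != 0) r ^= x;
                    x <<= 1;
                    if ((x & 0x100) != 0) x ^= POLY;
                    y >>>= 1;
                }
                MUL[a][b] = (byte) r;
            }
        }
    }

    // "Input, Output, Byte": outputs[o][k] ^= matrix[o][i] * inputs[i][k].
    static void codeInputOutputByte(byte[][] matrix, byte[][] inputs, byte[][] outputs, int byteCount) {
        for (byte[] out : outputs) java.util.Arrays.fill(out, 0, byteCount, (byte) 0);
        for (int i = 0; i < inputs.length; i++) {          // outermost: input shard
            byte[] in = inputs[i];
            for (int o = 0; o < outputs.length; o++) {     // middle: output shard
                byte[] row = MUL[matrix[o][i] & 0xFF];     // table row for one coefficient
                byte[] out = outputs[o];
                for (int k = 0; k < byteCount; k++) {      // innermost: byte position
                    out[k] ^= row[in[k] & 0xFF];
                }
            }
        }
    }

    // "Output, Byte, Input": accumulate each output byte across all inputs.
    static void codeOutputByteInput(byte[][] matrix, byte[][] inputs, byte[][] outputs, int byteCount) {
        for (int o = 0; o < outputs.length; o++) {
            for (int k = 0; k < byteCount; k++) {
                int acc = 0;
                for (int i = 0; i < inputs.length; i++) {
                    acc ^= MUL[matrix[o][i] & 0xFF][inputs[i][k] & 0xFF];
                }
                outputs[o][k] = (byte) acc;
            }
        }
    }
}
```

Both orderings do the same arithmetic; they differ only in memory access patterns, which is why the ranking can change from one processor to another.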

These are the speeds I got running the benchmark on a Backblaze
storage pod:

```
ByteInputOutputExpCodingLoop 95.2 MB/s
ByteInputOutputTableCodingLoop 107.0 MB/s
ByteOutputInputExpCodingLoop 130.3 MB/s
ByteOutputInputTableCodingLoop 181.4 MB/s
InputByteOutputExpCodingLoop 94.4 MB/s
InputByteOutputTableCodingLoop 138.3 MB/s
InputOutputByteExpCodingLoop 200.4 MB/s
InputOutputByteTableCodingLoop 525.7 MB/s
OutputByteInputExpCodingLoop 143.7 MB/s
OutputByteInputTableCodingLoop 209.5 MB/s
OutputInputByteExpCodingLoop 217.6 MB/s
OutputInputByteTableCodingLoop 515.7 MB/s
```

![Bar Chart of Benchmark Results](notes/benchmark_on_storage_pod.png)

## My added interface and implementation
After reading and studying the code of this excellent project, I built on it by adding an interface and implementation that work directly on byte arrays, so that Reed-Solomon erasure coding can be applied to in-memory byte-array data. One application is erasure protection for data sent over a network.
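The new interface operates on a single flat byte array: slice i occupies bytes [i × sliceLength, (i + 1) × sliceLength), with the data slices first and the parity slices after them, and an erasure map of sliceCount + fecSliceCount booleans marks which slices arrived. The hypothetical helpers below (not part of this PR) show that addressing and simulate losing a slice the way the test code does by hand.

```java
// Hypothetical helpers for the flat slice layout; not part of this PR.
final class SliceBuffer {
    // Byte offset of slice sliceIndex in the flat buffer.
    static int offsetOf(int sliceIndex, int sliceLength) {
        return sliceIndex * sliceLength;
    }

    // Builds an erasure map with every slice marked present (true).
    static boolean[] allPresent(int sliceCount, int fecSliceCount) {
        boolean[] present = new boolean[sliceCount + fecSliceCount];
        java.util.Arrays.fill(present, true);
        return present;
    }

    // Simulates losing slice index: zeroes its bytes and marks it erased (false).
    static void erase(byte[] buffer, int sliceLength, int index, boolean[] present) {
        java.util.Arrays.fill(buffer, offsetOf(index, sliceLength),
                offsetOf(index + 1, sliceLength), (byte) 0);
        present[index] = false;
    }
}
```

Zeroing the bytes is only for the demonstration; a decoder relies on the erasure map, not on the contents, to know which slices to rebuild.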

## Example
package com.backblaze.erasure.robinliew.dealbytesinterface;

import java.util.Arrays;

/**
 * Round-trip test: encode 4 data slices plus 2 parity slices, lose one slice, decode.
 *
 * @author RobinLiew 2017.9.21
 */
public class test {
    public static void main(String[] args) {
        IRSErasureCorrection rsProcessor = new RSErasureCorrectionImpl();

        // 1000 bytes of sample data: the first 500 vary, the rest are 1.
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = 1;
        }
        for (int i = 0; i < 500; i++) {
            data[i] = (byte) (16 + i);
        }

        int sliceCount = 4;    // the data is split into 4 slices
        int fecSliceCount = 2; // plus 2 erasure-redundancy (parity) slices
        int sliceLength = data.length / sliceCount;
        byte[] en_data = rsProcessor.rs_Encoder(data, sliceLength, sliceCount, fecSliceCount);

        // Test only: lose the second slice (index 1) by zeroing its bytes.
        Arrays.fill(en_data, sliceLength, 2 * sliceLength, (byte) 0);

        // true = slice received intact, false = slice erased
        boolean[] eraserFlag = new boolean[sliceCount + fecSliceCount];
        Arrays.fill(eraserFlag, true);
        eraserFlag[1] = false;

        int result = rsProcessor.rs_Decoder(en_data, sliceLength, sliceCount, fecSliceCount, eraserFlag);

        // On success the recovered data slices are written back to the start of en_data.
        boolean recovered = result == 0
                && Arrays.equals(Arrays.copyOf(en_data, data.length), data);
        System.out.println(recovered ? "complete test: data recovered" : "complete test: recovery FAILED");
    }
}
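The example follows the full erasure cycle: encode, lose a slice, mark it in the erasure map, decode, and compare with the original. Because Reed-Solomon itself needs the library, here is a self-contained analogue of that cycle using a single XOR parity slice. This is a swapped-in technique, not Reed-Solomon: it can repair only one lost slice (one parity slice), but it uses the same flat layout and the same 0/1 return convention as `rs_Decoder`. `XorParityDemo` is hypothetical.

```java
// Illustrative only: a single-parity XOR erasure code, NOT Reed-Solomon.
// Same flat layout as rs_Encoder: data slices first, then the parity slice.
final class XorParityDemo {
    // Returns sliceCount data slices followed by one XOR parity slice.
    static byte[] encode(byte[] data, int sliceLength, int sliceCount) {
        if (data.length != sliceLength * sliceCount) {
            throw new IllegalArgumentException("data must fill sliceCount slices exactly");
        }
        byte[] out = new byte[sliceLength * (sliceCount + 1)];
        System.arraycopy(data, 0, out, 0, data.length);
        int parity = sliceLength * sliceCount;
        for (int s = 0; s < sliceCount; s++) {
            for (int k = 0; k < sliceLength; k++) {
                out[parity + k] ^= data[s * sliceLength + k];
            }
        }
        return out;
    }

    // Rebuilds at most one missing slice in place; 0 = success, 1 = too many erasures.
    static int decode(byte[] buf, int sliceLength, int sliceCount, boolean[] present) {
        if (present.length != sliceCount + 1) {
            throw new IllegalArgumentException("present must have sliceCount + 1 entries");
        }
        int missing = -1;
        for (int s = 0; s <= sliceCount; s++) {
            if (!present[s]) {
                if (missing >= 0) return 1;  // two or more erasures: beyond one parity slice
                missing = s;
            }
        }
        if (missing < 0) return 0;           // nothing to repair
        int dst = missing * sliceLength;
        java.util.Arrays.fill(buf, dst, dst + sliceLength, (byte) 0);
        for (int s = 0; s <= sliceCount; s++) {
            if (s == missing) continue;
            for (int k = 0; k < sliceLength; k++) {
                buf[dst + k] ^= buf[s * sliceLength + k]; // XOR of survivors = lost slice
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i < 500 ? 16 + i : 1);
        byte[] encoded = encode(data, 250, 4);
        java.util.Arrays.fill(encoded, 250, 500, (byte) 0); // lose slice 1
        boolean[] present = {true, false, true, true, true};
        int result = decode(encoded, 250, 4, present);
        boolean ok = result == 0
                && java.util.Arrays.equals(java.util.Arrays.copyOf(encoded, 1000), data);
        System.out.println(ok ? "recovered" : "failed");
    }
}
```

Reed-Solomon generalizes this: instead of one XOR row it uses M independent rows of a GF(2^8) matrix, so any M lost slices can be rebuilt.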

@@ -0,0 +1,33 @@
package com.backblaze.erasure.robinliew.dealbytesinterface;

/**
 * Codec interface for Reed-Solomon (RS) erasure coding of byte arrays.
 *
 * @author RobinLiew 2017.9.21
 */
public interface IRSErasureCorrection {
    /**
     * Encodes.
     *
     * @param srcBuffer     original data to be erasure-encoded
     * @param sliceLength   length of each slice in the block (all slices have the same length)
     * @param sliceCount    number of data slices in the block
     * @param fecSliceCount number of erasure-check (parity) slices in the block
     * @return the encoded block: the sliceCount data slices followed by the
     *         fecSliceCount parity slices, each sliceLength bytes long
     */
    byte[] rs_Encoder(byte[] srcBuffer, int sliceLength, int sliceCount, int fecSliceCount);

    /**
     * Decodes.
     *
     * @param srcEraseBuff received block (data slices followed by parity slices)
     * @param sliceLen     length of each slice in the block
     * @param sliceCount   number of data slices in the block
     * @param rsSliceCount number of RS erasure-check (parity) slices in the block
     * @param eraserFlag   erasure map of length sliceCount + rsSliceCount; true means the
     *                     slice was received intact, false means it was erased
     * @return 0 on success, meaning the block arrived intact or the number of erased
     *         slices was within the allowed range; in that case the recovered original
     *         data is written to the start of srcEraseBuff. A nonzero value means
     *         failure: more slices were erased than RS can correct.
     */
    int rs_Decoder(byte[] srcEraseBuff, int sliceLen, int sliceCount, int rsSliceCount, boolean[] eraserFlag);
}
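The decoder's return contract comes down to a counting rule: with sliceCount data slices, Reed-Solomon can rebuild the data from any sliceCount surviving slices, so it tolerates up to rsSliceCount erasures. The hypothetical helper below captures only that check and the erasure-map length requirement; it is a sketch of the contract, not part of the PR's code.

```java
// Hypothetical helper mirroring the rs_Decoder return contract; not part of this PR.
final class ErasureCheck {
    // Returns 0 when the erasures are within RS capacity, 1 otherwise.
    static int check(boolean[] eraserFlag, int sliceCount, int rsSliceCount) {
        if (eraserFlag == null || eraserFlag.length != sliceCount + rsSliceCount) {
            throw new IllegalArgumentException("eraserFlag must have sliceCount + rsSliceCount entries");
        }
        int present = 0;
        for (boolean flag : eraserFlag) {
            if (flag) {
                present++; // true means the slice arrived intact
            }
        }
        // Any sliceCount surviving slices are enough to rebuild the data.
        return present >= sliceCount ? 0 : 1;
    }
}
```

With 4 data slices and 2 parity slices, losing any two slices is recoverable and losing three is not.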
@@ -0,0 +1,155 @@
package com.backblaze.erasure.robinliew.dealbytesinterface;

import com.backblaze.erasure.ReedSolomon;

/**
 * Implementation of the RS codec interface, built on {@link ReedSolomon}.
 * <p>
 * Encoding: takes byte[] data holding N data slices and produces an array of
 * N + M slices, where M is the number of erasure-check (parity) slices.
 * <p>
 * Decoding: takes the encoded byte[] data together with the data-slice count,
 * the parity-slice count, and the erasure map recording which slices were lost.
 *
 * @author RobinLiew 2017.9.21
 */
public class RSErasureCorrectionImpl implements IRSErasureCorrection {

    @Override
    public byte[] rs_Encoder(byte[] srcBuffer, int sliceLength, int sliceCount,
            int fecSliceCount) {
        final int dataShards = sliceCount;
        final int parityShards = fecSliceCount;
        final int totalShards = dataShards + parityShards;
        final int shardSize = sliceLength;

        // The payload must fit in the data slices; any remainder is zero-padded.
        if (srcBuffer.length > shardSize * dataShards) {
            throw new IllegalArgumentException(
                    "srcBuffer is longer than sliceLength * sliceCount");
        }

        // Make the buffers to hold the shards and fill in the data shards.
        byte[][] shards = new byte[totalShards][shardSize];
        for (int i = 0; i < dataShards; i++) {
            int start = i * shardSize;
            int n = Math.min(shardSize, srcBuffer.length - start);
            if (n > 0) {
                System.arraycopy(srcBuffer, start, shards[i], 0, n);
            }
        }

        // Use Reed-Solomon to calculate the parity shards.
        ReedSolomon reedSolomon = ReedSolomon.create(dataShards, parityShards);
        reedSolomon.encodeParity(shards, 0, shardSize);

        // Concatenate all shards: data slices first, then parity slices.
        byte[] rsData = new byte[totalShards * shardSize];
        int index = 0;
        for (int i = 0; i < totalShards; i++) {
            for (int j = 0; j < shards[i].length; j++) {
                rsData[index] = shards[i][j];
                index++;
            }
        }
        return rsData;
    }

    @Override
    public int rs_Decoder(byte[] srcEraseBuff, int sliceLen, int sliceCount,
            int rsSliceCount, boolean[] eraserFlag) {
        // eraserFlag records which slices were lost (false = erased).
        final int dataShards = sliceCount;
        final int parityShards = rsSliceCount;
        final int totalShards = dataShards + parityShards;
        final int shardSize = sliceLen;

        if (eraserFlag.length != totalShards
                || srcEraseBuff.length < totalShards * shardSize) {
            throw new IllegalArgumentException(
                    "eraserFlag or srcEraseBuff does not match the slice counts");
        }

        // Split the received buffer into shards and count the ones that are present.
        // A lost slice keeps its zero-filled buffer as a placeholder.
        final byte[][] shards = new byte[totalShards][shardSize];
        int shardCount = 0;
        for (int i = 0; i < totalShards; i++) {
            if (eraserFlag[i]) {
                System.arraycopy(srcEraseBuff, i * shardSize, shards[i], 0, shardSize);
                shardCount++;
            }
        }

        // We need at least dataShards slices to be able to reconstruct the data.
        if (shardCount < dataShards) {
            System.out.println("Too many slices lost: beyond the erasure capability of the RS algorithm!");
            return 1;
        }

        // Use Reed-Solomon to fill in the missing shards.
        ReedSolomon reedSolomon = ReedSolomon.create(dataShards, parityShards);
        reedSolomon.decodeMissing(shards, eraserFlag, 0, shardSize);

        // Write the recovered data shards back to the start of srcEraseBuff.
        for (int i = 0; i < dataShards; i++) {
            System.arraycopy(shards[i], 0, srcEraseBuff, shardSize * i, shardSize);
        }
        return 0; // 0 means erasure correction succeeded
    }
}
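`rs_Encoder` concatenates the shards into one array, and `rs_Decoder` splits that array back into shards. Both directions can be written with one bulk `System.arraycopy` call per shard. The hypothetical helpers below sketch that, assuming all shards share one length.

```java
// Hypothetical bulk-copy helpers for the flat shard layout; not part of this PR.
final class ShardFlattening {
    // Concatenates equal-length shards into one array, in index order.
    static byte[] flatten(byte[][] shards) {
        int shardSize = shards[0].length;
        byte[] flat = new byte[shards.length * shardSize];
        for (int i = 0; i < shards.length; i++) {
            System.arraycopy(shards[i], 0, flat, i * shardSize, shardSize);
        }
        return flat;
    }

    // Splits a flat array back into shardCount shards of shardSize bytes each.
    static byte[][] split(byte[] flat, int shardCount, int shardSize) {
        if (flat.length < shardCount * shardSize) {
            throw new IllegalArgumentException("buffer too short for " + shardCount + " shards");
        }
        byte[][] shards = new byte[shardCount][shardSize];
        for (int i = 0; i < shardCount; i++) {
            System.arraycopy(flat, i * shardSize, shards[i], 0, shardSize);
        }
        return shards;
    }
}
```

A per-shard `System.arraycopy` avoids a Java-level loop over every byte, which matters little at 1000 bytes but adds up for large network payloads.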
@@ -0,0 +1,43 @@
package com.backblaze.erasure.robinliew.dealbytesinterface;

import java.util.Arrays;

/**
 * Round-trip test: encode 4 data slices plus 2 parity slices, lose one slice, decode.
 *
 * @author RobinLiew 2017.9.21
 */
public class test {
    public static void main(String[] args) {
        IRSErasureCorrection rsProcessor = new RSErasureCorrectionImpl();

        // 1000 bytes of sample data: the first 500 vary, the rest are 1.
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = 1;
        }
        for (int i = 0; i < 500; i++) {
            data[i] = (byte) (16 + i);
        }

        int sliceCount = 4;    // the data is split into 4 slices
        int fecSliceCount = 2; // plus 2 erasure-redundancy (parity) slices
        int sliceLength = data.length / sliceCount;
        byte[] en_data = rsProcessor.rs_Encoder(data, sliceLength, sliceCount, fecSliceCount);

        // Test only: lose the second slice (index 1) by zeroing its bytes.
        Arrays.fill(en_data, sliceLength, 2 * sliceLength, (byte) 0);

        // true = slice received intact, false = slice erased
        boolean[] eraserFlag = new boolean[sliceCount + fecSliceCount];
        Arrays.fill(eraserFlag, true);
        eraserFlag[1] = false;

        int result = rsProcessor.rs_Decoder(en_data, sliceLength, sliceCount, fecSliceCount, eraserFlag);

        // On success the recovered data slices are written back to the start of en_data.
        boolean recovered = result == 0
                && Arrays.equals(Arrays.copyOf(en_data, data.length), data);
        System.out.println(recovered ? "complete test: data recovered" : "complete test: recovery FAILED");
    }
}