MIPr provides the ability to process images in Hadoop.
MIPr includes:
- Writable Wrappers for images
- InputFormat and OutputFormat for images
- Several Jobs for image processing
- OpenCV and OpenIMAJ support
Requirements:
- Java 7 (preferably Oracle)
- Maven 3.2.5
Clone the repository with the MIPr sources:
git clone https://github.com/sozykin/mipr.git
Build the package using Apache Maven.
To build the full package with OpenIMAJ and OpenCV support, run:
mvn package
Note that the full package is larger than a separately built one.
To build a separate package, run:
mvn package -pl [desired_package] -am
Where desired_package is one of the following (see the example after the list):
- core_package
- includes_OpenCV (includes core with OpenCV support)
- includes_OpenIMAJ (includes core with OpenIMAJ support)
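For example, to build only the package with OpenCV support:
mvn package -pl includes_OpenCV -am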
Maven builds a jar file named ...-jar-with-dependencies.jar and places it in the target folder.
Copy image files to HDFS:
$ hadoop fs -copyFromLocal local_image_folder hdfs_image_folder
Run the test MIPr job, which converts color images to grayscale:
$ hadoop jar mipr-core-0.1-jar-with-dependencies.jar experiments.Img2Gray hdfs_image_folder hdfs_output_folder
Copy processed images back from HDFS to the local filesystem:
$ hadoop fs -copyToLocal hdfs_output_folder local_output_folder
Check that the images were converted correctly.
To process images in your own way, you only need to create one class. As an example, let's create a job that converts color images to grayscale using OpenCV. MIPr already contains this class, located at includes_OpenCV/src/main/java/experiments/Img2Gray_opencv.
Create a public class that extends Configured and implements the Tool interface.

public class Img2Gray_opencv extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Img2Gray_opencv(), args);
        System.exit(res);
    }
Create a run method inside your class and fill it in according to the library you will use.

public int run(String[] args) throws Exception {
    String input = args[0];
    String output = args[1];

    // Job template preconfigured for OpenCV-based processing
    Job job = MiprMain.getOpenCVJobTemplate();
    job.setJarByClass(Img2Gray_opencv.class);
    job.setMapperClass(Img2Gray_opencvMapper.class);
    job.setInputFormatClass(MatImageInputFormat.class);
    job.setOutputFormatClass(MatImageOutputFormat.class);

    Path outputPath = new Path(output);
    FileInputFormat.setInputPaths(job, input);
    FileOutputFormat.setOutputPath(job, outputPath);

    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(MatImageWritable.class);

    return job.waitForCompletion(true) ? 0 : 1;
}
The most important configuration settings are listed below; a consolidated example follows the list.
job.setInputFormatClass([InputFormat].class)
Where [InputFormat] is one of the following:
- Java 2D: BufferedImageInputFormat
- OpenIMAJ: MBFImageInputFormat
- OpenCV: MatImageInputFormat or CombineMatImageInputFormat
job.setOutputFormatClass([OutputFormat].class)
Where [OutputFormat] matches the chosen [InputFormat] (for example, MatImageOutputFormat for OpenCV).
job.setMapperClass([MapperClass].class)
Where [MapperClass] is your Mapper implementation containing the map method.
job.setOutputKeyClass(NullWritable.class)
In most image processing cases a key is not necessary, so you can use Hadoop's special NullWritable class, which contains nothing.
job.setOutputValueClass([Value].class)
Where [Value] depends on the library you are going to use:
- Java 2D: BufferedImageWritable
- OpenIMAJ: MBFImageWritable
- OpenCV: MatImageWritable
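For instance, these settings put together for a Java 2D job might look like the following sketch. MyJava2DJob and MyJava2DMapper are hypothetical names, BufferedImageOutputFormat is assumed by analogy with MatImageOutputFormat, and a plain Hadoop Job is used since Java 2D requires no native libraries:

Job job = Job.getInstance(getConf(), "img2gray-java2d");
job.setJarByClass(MyJava2DJob.class);                      // hypothetical job class
job.setMapperClass(MyJava2DMapper.class);                  // hypothetical Mapper implementation
job.setInputFormatClass(BufferedImageInputFormat.class);
job.setOutputFormatClass(BufferedImageOutputFormat.class); // assumed analogous to MatImageOutputFormat
job.setOutputKeyClass(NullWritable.class);                 // no key is needed
job.setOutputValueClass(BufferedImageWritable.class);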
Create a Mapper class. Your class should extend the OpenCVMapper superclass, which makes the OpenCV library available to the parallel map tasks. The map method contains the image processing algorithm.

public static class Img2Gray_opencvMapper
        extends OpenCVMapper<NullWritable, MatImageWritable, NullWritable, MatImageWritable> {

    @Override
    protected void map(NullWritable key, MatImageWritable value, Context context)
            throws IOException, InterruptedException {
        Mat image = value.getImage();
        Mat result = new Mat(image.height(), image.width(), CvType.CV_8UC3);
        if (image.type() == CvType.CV_8UC3) {
            // Convert a 3-channel color image to grayscale
            Imgproc.cvtColor(image, result, Imgproc.COLOR_RGB2GRAY);
        } else {
            // Pass images of other types through unchanged
            result = image;
        }
        context.write(NullWritable.get(),
                new MatImageWritable(result, value.getFileName(), value.getFormat()));
    }
}
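For comparison, a sketch of the same grayscale conversion with Java 2D might look like this. It assumes that BufferedImageWritable exposes getImage(), getFileName(), and getFormat() analogous to MatImageWritable, and that a plain Hadoop Mapper suffices because Java 2D needs no native library setup:

// Sketch only: assumes BufferedImageWritable mirrors MatImageWritable's accessors.
public static class Img2GrayJava2DMapper
        extends Mapper<NullWritable, BufferedImageWritable, NullWritable, BufferedImageWritable> {

    @Override
    protected void map(NullWritable key, BufferedImageWritable value, Context context)
            throws IOException, InterruptedException {
        BufferedImage image = value.getImage();
        // Standard Java 2D color-space conversion to grayscale
        ColorConvertOp toGray = new ColorConvertOp(
                ColorSpace.getInstance(ColorSpace.CS_GRAY), null);
        BufferedImage result = toGray.filter(image, null);
        context.write(NullWritable.get(),
                new BufferedImageWritable(result, value.getFileName(), value.getFormat()));
    }
}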
Return to the build and run steps above and rebuild the package, including your own Hadoop job.
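For example, after rebuilding, a job like the one above could be run the same way as the test job (the jar name below is a placeholder for the actual ...-jar-with-dependencies.jar produced by your build):
$ hadoop jar <your_package>-jar-with-dependencies.jar experiments.Img2Gray_opencv hdfs_image_folder hdfs_output_folder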