forked from richardxia/BLB
-
Notifications
You must be signed in to change notification settings - Fork 1
davidhoward/BLB
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Thank you for using the UC Berkeley BLB Specializer. For more information on this specializer and the ASP project, please visit http://www.sejits.org. This document describes the basic steps necessary to install and use the UC Berkeley BLB Specializer. For information on using this specializer, please see https://github.com/davidhoward/BLB/wiki/Home . 1. Prerequisites 2. Installation and Setup 3. Verifying your installation 4. Running your first specialized program 1. PREREQUISITES The UC Berkeley BLB Specializer relies upon several other software packages in order to function. Please verify that these are installed and working correctly on your machine before attempting to install this specializer. 0) A Unix based operating system See sejits.org for information about using ASP on a non unix-y platform. i) Python 2.6.x ASP SEJITS is designed to work with the 2.6 series of python releases. This specializer and the libraries it relies upon may not function with other versions of python. See http://www.python.org for information and to download python. ii) The ASP Framework This package contains functions common to all ASP specializers, and is needed by this specializer to function. See sejits.org for information on installing the ASP framework. iii) Gnu Scientific Library The Gnu Scientific Library may or may not be used for certain specialized operations, but is necessary for the specializer to function properly. This specializer was tested against GSL 1.15. For information and to get GSL, please visit http://www.gnu.org/s/gsl. iv) Spark Computing Cluster (Optional for Distributed Version of Specializer) See spark-project.org for information on this open source cluster computing system (developed in UC Berkeley's AMP Research Laboratory) and how to install. 2. INSTALLATION AND SETUP Preparing the UC Berkeley BLB Specializer for use involves three steps: Acquiring the source files, altering the appropriate environment variables, and writing the configuration files. For the distributed version of the specializer, one must get up and running a Spark computing cluster. See the final section here for additional details on this and what to do once the Spark cluster is up. ACQUIRING THE SOURCE The most recent relase build of this specializer is available at https://github.com/shoaibkamil/asp/tree/master/specializers/BLB This is a git repository, and may be cloned as such. SETTING ENVIRONMENT VARIABLES For easy use of this specializer, please ensure that the directory containing the compiled GSL binary is on your LD_LIBRARY_PATH, and that the root directory of your specializer installation (the one containing this document) is on your PYTHONPATH. WRITING THE CONFIGURATION FILE The UC Berkeley BLB Specializer requires some information about your system to function properly. You should create a file in your root BLB directory named blb_setup.py with the following contents: - gslroot = "/path/to/gsl/header_files" The specializer needs to know where to find the gsl header files. - cache_dir = "/path/to/cgen/cache" The specializer caches modules it has already compiled in this directory, so that it does not have to re-compile for cases it has already seen. An example of a properly formatted blb_setup.py is located in blb_setup_EXAMPLE.txt. DISTRIBUTED VERSION SETUP To setup the distributed version, first get up and running a Spark cluster. Inside the master node on the cluster, run the setup script located here: https://github.com/davidhoward/BLB/blob/master/blb_spark_setup.sh Then, it is advisable to adjust the level of parallelism for your cluster. To do this, open the file ~/sejits/asp/specializers/blb/blb_core_parallel.scala and go to line 88. There, one can adjust the level of parallelism that is ideal for the their cluster (likely the number of cores in the cluster). 3. VERIFYING YOUR INSTALLATION To verify a proper installation of the distributed version, open the BLB folder (~/BLB) and run ./run_dist_tests.sh If this works, congrats! You have just run the BLB (bag of little bootstraps) algorithm on your cluster. The default behavior is to compute a classifier's estimated accuracy on a subset of the Enron email corpus. 4. RUNNING YOUR FIRST SPECIALIZED PROGRAM For the distributed version, to have gotten here, you must have already actually ran the test BLB SEJITS program when verifying your installation. To customize this test and perform alternative calculations, open the file on your cluster ~/BLB/tests/spark_test.py and edit the three input functions to the BLB algorithm (compute_estimate, reduce_bootstraps, and average). An easy change would be to calculate the standard deviation instead of the mean in the reduce_bootstraps function. There is already a commented out standard deviation function written. Past this customization, one can create their own distributed specializer instance as done in the bottom of spark_test.py file. For more information, see the wiki for this repo, specifically the section page titled "Using the Specializer."
About
An ASP specializer for statistical bootstrapping
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- Python 89.3%
- Scala 7.2%
- Shell 3.5%