An implementation that uses Hadoop to process big data and visualize it in a web UI. Our first planned visualization transforms Nasdaq ITCH data into VWAP (Volume Weighted Average Price); see 'Contribute to VWAP Visualization' below.
Demo: /web
Prior to running the deploy steps, please complete 'Prerequisites'.
```
cd api
export AWS_PROFILE=big-data-trends-visualizer
mvn package && serverless deploy
```
- Install nvm, see https://github.com/nvm-sh/nvm
- Install Node.js, run `nvm use v12`
- Install serverless, run `npm install -g serverless`
- Install serverless plugins, run `npm install -g serverless-plugin-scripts`
- Install Maven, see http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html
- Install AWS CLI, see https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html
  e.g. on Mac, run `brew install awscli`
- Run `aws configure` and enter your AWS credentials. The `aws_access_key_id` and `aws_secret_access_key` are secret; reach out to get access.
Edit `~/.aws/credentials` and ensure the profile is named `big-data-trends-visualizer`:

```
[big-data-trends-visualizer]
aws_access_key_id = ********************
aws_secret_access_key = ****************************************
```
The web front end is composed of two views: `web/home/index.html` and `web/visualizer/index.html`.
```
cd web
npm install
npm start
```
- Open http://127.0.0.1:1643/home/ in a web browser
```
cd web
npm install
npx bower install <package-name>
```

See https://bower.io/ for more.

- IMPORTANT: We are using GitHub Pages to host the webpage, so please check in all `bower_components` files.
Our planned prototype targets a VWAP visualization: take the Nasdaq ITCH data and generate a Volume Weighted Average Price (VWAP) visualization from it.
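For reference, VWAP is the total traded dollar value divided by the total volume. A minimal sketch of the computation; the `(price, volume)` input layout here is an illustration, not the ITCH message schema:

```python
def vwap(trades):
    """Compute the Volume Weighted Average Price.

    `trades` is an iterable of (price, volume) pairs; this pair layout
    is an assumption for illustration, not the ITCH schema.
    """
    total_value = 0.0
    total_volume = 0
    for price, volume in trades:
        total_value += price * volume
        total_volume += volume
    if total_volume == 0:
        raise ValueError("no volume traded")
    return total_value / total_volume

# Three trades in one symbol: (1000 + 2100 + 950) / 400
print(vwap([(10.0, 100), (10.5, 200), (9.5, 100)]))  # 10.125
```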
- Add the front-end visualization to `web/visualizer/index.html`.
  Use D3 (https://d3js.org/) or another charting library.
- Split the NASDAQ data.
  Determine the schema of the NASDAQ dataset in the S3 bucket: https://console.aws.amazon.com/s3/buckets/nasdaq-itch/?region=us-east-1
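As a starting point for the split, raw ITCH files commonly frame each binary message with a 2-byte big-endian length prefix. A hedged sketch under that assumption; verify the framing against the actual dataset in the bucket before relying on it:

```python
import io
import struct

def split_itch(stream):
    """Yield raw ITCH messages from a binary file object.

    Assumes the common raw-file framing where each message is preceded
    by a 2-byte big-endian length; this is an assumption to validate
    against the dataset's real schema.
    """
    while True:
        header = stream.read(2)
        if len(header) < 2:
            break  # end of file
        (length,) = struct.unpack(">H", header)
        message = stream.read(length)
        if len(message) < length:
            break  # truncated file
        yield message

# Two framed dummy messages: lengths 1 and 2
data = b"\x00\x01A" + b"\x00\x02Pz"
print(list(split_itch(io.BytesIO(data))))  # [b'A', b'Pz']
```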
- Write the Hadoop MapReduce algorithm.
  Transform ITCH to VWAP on a cluster of Hadoop machines.
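One possible approach is Hadoop Streaming, where the mapper and reducer are plain stdin/stdout scripts. A sketch assuming the split step emits tab-separated symbol/price/volume lines; that input layout is an assumption, not the decided format:

```python
from itertools import groupby

def mapper(lines):
    """Map step: emit (symbol, price*volume, volume) per trade line.

    Assumes tab-separated symbol/price/volume input, which is an
    assumption about the output of the earlier split step.
    """
    for line in lines:
        symbol, price, volume = line.rstrip("\n").split("\t")
        yield symbol, float(price) * int(volume), int(volume)

def reducer(records):
    """Reduce step: emit (symbol, VWAP). `records` must be sorted by
    symbol, which Hadoop's shuffle/sort phase guarantees."""
    for symbol, group in groupby(records, key=lambda r: r[0]):
        total_value = total_volume = 0
        for _, value, volume in group:
            total_value += value
            total_volume += volume
        yield symbol, total_value / total_volume

# In a real Hadoop Streaming job, mapper and reducer run as separate
# scripts over stdin/stdout; they are chained in-process here.
sample = ["AAPL\t10.0\t100", "MSFT\t20.0\t50", "AAPL\t11.0\t100"]
for symbol, vwap in reducer(sorted(mapper(sample))):
    print(f"{symbol}\t{vwap}")  # AAPL 10.5, MSFT 20.0
```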
- Connect the input and output of the Hadoop result to the API.
  Integrate our on-demand API with the Hadoop cluster. This requires delivering the input to the Hadoop cluster, storing the output somewhere it can be retrieved, and wiring it up to an API that returns the result to the customer.