minhthong582000/my-data-stack

Data Stack

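A data stack built with Docker Compose. It covers file storage on Hadoop (HDFS), Spark processing from a Jupyter notebook, and a database.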

Getting Started

Prerequisites

  • Docker CE
  • Docker Compose
  • MySQL client

Installation

Step 1

Clone the repository.

Step 2

Edit the hosts file on your machine:

sudo vim /etc/hosts

[screenshot: docker ps]

Step 3

Run:

docker-compose up -d

Wait a moment for all services to start.
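Rather than guessing how long "a moment" is, you can poll a service port until it accepts connections. The following is a minimal sketch, not part of this repository: `wait_for_port` is a hypothetical helper, and the only port taken from this README is Jupyter's 8888. An open port does not guarantee the service has finished initializing, so treat it as a rough readiness signal.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8888, timeout=120)` blocks for up to two minutes and reports whether the Jupyter port opened in that time.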

Step 4

Run the following command to send a sample data file to Hadoop:

sh scripts/send-file.sh
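One way to confirm that the file arrived is to list the target directory through the WebHDFS REST API. This sketch makes several assumptions that the README does not state: that WebHDFS is enabled, that the NameNode web port is 9870 (the Hadoop 3 default), and that the host name and HDFS path you pass in match what `scripts/send-file.sh` uses. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def liststatus_url(namenode: str, hdfs_path: str, port: int = 9870) -> str:
    """Build a WebHDFS LISTSTATUS URL for an absolute HDFS path."""
    quoted = urllib.parse.quote(hdfs_path)  # keeps "/" and escapes spaces etc.
    return f"http://{namenode}:{port}/webhdfs/v1{quoted}?op=LISTSTATUS"


def list_hdfs_dir(namenode: str, hdfs_path: str, port: int = 9870) -> list[str]:
    """Return the entry names under hdfs_path as reported by WebHDFS."""
    url = liststatus_url(namenode, hdfs_path, port)
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    # WebHDFS wraps the listing as {"FileStatuses": {"FileStatus": [...]}}.
    return [entry["pathSuffix"] for entry in payload["FileStatuses"]["FileStatus"]]
```

Calling `list_hdfs_dir` with your NameNode host and the upload directory should include the sample file's name if the upload succeeded.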

Step 5

Run the following command to create tables in the database:

sh scripts/create-table.sh
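To check that the tables exist, you could list them with the MySQL client from the prerequisites. The sketch below only builds and runs a `mysql` command line. It assumes the database is MySQL and reachable from your machine; the host, port, user, and database name are placeholders you must supply, since this README does not state them.

```python
import subprocess


def show_tables_cmd(host: str, port: int, user: str, database: str) -> list[str]:
    """Build a mysql client invocation that lists the tables in a database."""
    # A bare "-p" makes the client prompt for the password interactively.
    return ["mysql", "-h", host, "-P", str(port), "-u", user, "-p",
            "-e", "SHOW TABLES;", database]


def show_tables(host: str, port: int, user: str, database: str) -> None:
    """Run the command; raises CalledProcessError if the client exits non-zero."""
    subprocess.run(show_tables_cmd(host, port, user, database), check=True)
```

Prompting for the password keeps it out of the process list and shell history, which is why the sketch avoids passing it on the command line.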

Spark code - Reading and Writing Data from HDFS

Open Jupyter in your browser at http://localhost:8888

Then open and run the notebook spark.ipynb.
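For orientation, here is a rough sketch of the kind of job such a notebook might contain: read a CSV file from HDFS and append it to a MySQL table over JDBC. It is not the content of spark.ipynb. The helper names, the CSV format, the MySQL target, and every host, port, path, and table you pass in are assumptions, and the MySQL Connector/J jar must be available to Spark for the JDBC write to work.

```python
def hdfs_uri(namenode: str, port: int, path: str) -> str:
    """Build an hdfs:// URI from a NameNode address and an HDFS path."""
    return f"hdfs://{namenode}:{port}/{path.lstrip('/')}"


def mysql_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a JDBC URL for a MySQL database."""
    return f"jdbc:mysql://{host}:{port}/{database}"


def copy_csv_to_mysql(source_uri: str, jdbc_url: str, table: str,
                      user: str, password: str) -> int:
    """Read a CSV from HDFS, append it to a table, and return the row count read."""
    # Imported lazily so the URL helpers above work without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-to-mysql-sketch").getOrCreate()
    df = spark.read.csv(source_uri, header=True, inferSchema=True)
    (
        df.write.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("driver", "com.mysql.cj.jdbc.Driver")  # assumes Connector/J 8+
        .mode("append")  # add rows without dropping the existing table
        .save()
    )
    return df.count()
```

Inside Jupyter, `getOrCreate()` reuses a session if the notebook already started one, so the function can be called from a cell without extra setup.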

Demo

  1. Running containers (docker ps):

[screenshot]

  2. Send data file to Hadoop:

[screenshot]

  3. Create tables in the database:

[screenshot]

  4. Check the file on Hadoop:

[screenshot]

  5. Read and write data to the database:

[screenshot]

  6. Verify in the database:

[screenshot]