Ansible collection to deploy the components of TDP
- hadoop: deploys the Hadoop TDP Release (HDFS + YARN + MapReduce)
- hbase: deploys the HBase TDP Release (HBase Master + HBase RegionServer), Phoenix and Phoenix Query Server
- hive: deploys the Hive TDP Release (HiveServer2 + Tez)
- knox: deploys the Knox TDP Release (Knox Gateway)
- ranger: deploys the Ranger TDP Release (Ranger Admin + Ranger plugins)
- spark: deploys the Spark TDP Release (Spark Client + Spark History Server)
- zookeeper: deploys the Apache ZooKeeper Release
The best way to get started with TDP and the Ansible roles is to go through the Getting Started repository.
Ansible 2.9 cannot install a collection from a Git repository with ansible-galaxy. Instead, clone the repository into the correct folder.
For example, set the collections_paths property in your ansible.cfg:
[defaults]
collections_paths=collections
Then create the folder structure and clone the repository:
mkdir -p collections/ansible_collections/tosit
git clone https://github.com/TOSIT-FR/ansible-tdp-roles collections/ansible_collections/tosit/tdp
The project structure should look like this:
.
├── ansible.cfg
├── collections
│   └── ansible_collections
│       └── tosit
│           └── tdp
│               ├── galaxy.yml
│               ├── README.md
│               └── roles
│                   ├── hadoop
│                   ├── hive
│                   ├── ranger
│                   ├── spark
│                   ├── ...
│                   └── zookeeper
├── roles
└── test.yml
Note that the top-level roles folder does not contain the roles from this collection; it holds any other roles the project has. The collections folder is the one set in ansible.cfg.
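As a quick sanity check of the layout above, the following Python sketch builds the path where Ansible is expected to find the collection and tests for galaxy.yml and the roles folder. The helper names collection_dir and is_installed are hypothetical and not part of the collection; the default folder name assumes collections_paths=collections as in the ansible.cfg example.

```python
from pathlib import Path


def collection_dir(project_root, collections_path="collections"):
    """Return the expected location of the tosit.tdp collection.

    Assumption: collections_path matches the collections_paths value
    set in ansible.cfg (here "collections").
    """
    return Path(project_root) / collections_path / "ansible_collections" / "tosit" / "tdp"


def is_installed(project_root, collections_path="collections"):
    """Return True if galaxy.yml and the roles folder exist at the expected location."""
    root = collection_dir(project_root, collections_path)
    return (root / "galaxy.yml").is_file() and (root / "roles").is_dir()
```

Calling is_installed(".") from the project root should return True once the clone step has completed.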
The collection is compatible with Mitogen 0.2.
In order to activate Mitogen, follow the Mitogen installation guide.
Note: We use custom plugins which are incompatible with Mitogen. For this reason, we added strategy: linear to some of our playbooks (e.g. hbase_hdfs_init.yml) to avoid issues in Ansible environments configured with Mitogen.
Using ansible-galaxy: TBD
hdfs_file module: file and directory handling in HDFS
Example usage:
- name: Add directory for spark logs
  delegate_to: "{{ groups['hdfs_nn'][0] }}"
  tosit.tdp.hdfs_file:
    hdfs_conf: "{{ hadoop_conf_dir }}"
    path: "{{ item.path }}"
    state: "{{ item.state | default(omit) }}"
    owner: "{{ item.owner | default(omit) }}"
    group: "{{ item.group | default(omit) }}"
    mode: "{{ item.mode | default(omit) }}"
  become: yes
  become_user: "{{ hdfs_user }}"
  loop:
    - path: /spark-logs
      state: directory
      owner: "{{ spark_user }}"
      group: "{{ hadoop_group }}"
      mode: '777'
access_fqdn filter plugin: returns access_fqdn, or access_sn + domain, or inventory_hostname + domain (checking whether each variable exists for the host, in this order)
Example usage:
- debug:
    msg: "{{ groups['hdfs_nn'][0] | access_fqdn(hostvars) }}"
- debug:
    msg: "{{ groups['hdfs_jn'] | map('access_fqdn', hostvars) | list }}"
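The fallback order described above can be sketched in Python, the language Ansible filter plugins are written in. This is a minimal illustration and not the collection's actual source: joining the short name and the domain with a dot is an assumption, and the function body is inferred only from the description in this README.

```python
def access_fqdn(hostname, hostvars):
    """Return the name used to reach `hostname`.

    Checks the host variables in this order:
    access_fqdn, then access_sn + domain, then inventory_hostname + domain.
    """
    host = hostvars[hostname]
    if "access_fqdn" in host:
        return host["access_fqdn"]
    if "access_sn" in host:
        short_name = host["access_sn"]
    else:
        short_name = host["inventory_hostname"]
    # Assumption: the short name and the domain are joined with a dot.
    return short_name + "." + host["domain"]


class FilterModule(object):
    """Standard entry point that exposes filters to Ansible."""

    def filters(self):
        return {"access_fqdn": access_fqdn}
```

The first argument is the value piped into the filter (a host name) and the second is hostvars, which matches the calls in the example usage.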
The best way to use the roles from the collection is to import the related file from the collection's playbooks directory into another playbook.
Examples:
- name: Deploy ZooKeeper
  ansible.builtin.import_playbook: ansible_roles/collections/ansible_collections/tosit/tdp/playbooks/zookeeper.yml
- name: Deploy Hadoop
  ansible.builtin.import_playbook: ansible_roles/collections/ansible_collections/tosit/tdp/playbooks/hadoop.yml
- name: Deploy Hive
  ansible.builtin.import_playbook: ansible_roles/collections/ansible_collections/tosit/tdp/playbooks/hive.yml
- Python >= 3.6 with virtual env package (i.e. python3-venv)
Please follow the contributing guidelines and respect the code of conduct.