Learning high-level reasoning of human interactions in a social greeting using deep reinforcement learning

michal-stolarz/deep_reinforcement_abstract_lfd

 
 


Deep Reinforcement Learning of Abstract Reasoning from Demonstrations (deep_reinforcement_abstract_lfd)

Author: Madison Clark-Turner

The following repository contains a Deep Recurrent Q-Network (DRQN) architecture for learning high-level human interactions from demonstrations (LfD). We designed this network specifically for use with an Applied Behavioral Analysis (ABA) styled social greeting behavioral intervention (BI). Our system collects demonstrations of the BI using a tele-operated robot and then extracts relevant features of the interaction in order to perform the intervention autonomously. The DRQN structure was designed using TensorFlow and integrated on a NAO humanoid robot using ROS. For privacy reasons we have excluded the raw demonstration data, but we include the tools needed to generate additional demonstration files.

A description of our implementation, along with both simulated and live-system results, is currently under review for presentation at HRI 2018.

Installation

Once the dependencies have been installed, download the contents of the GitHub repository into your "~/catkin_ws/src" directory. In addition to the files in the repository, you will also need to download the InceptionResNetV2.ckpt file from the given link in order to use the provided software. Move the downloaded checkpoint file to "~/catkin_ws/src/deep_reinforcement_abstract_lfd/src/dqn/." or an equivalent directory.

Finally, run the following commands to create the storage directories:

mkdir ~/bag
cd ~/catkin_ws/src/deep_reinforcement_abstract_lfd
mkdir tfrecords tfrecords/train tfrecords/test

Usage

Usage of the system occurs in three steps:

  1. Collection of training data using a tele-operated robot as it delivers the desired BI
  2. Learning of the BI via the DRQN
  3. Execution of the learned BI using an autonomous system

Our implementation is designed for use with a social greeting BI, which proceeds in the following manner:

  1. The therapist/robot delivers a Discriminative Stimulus (the robot says "hello" and waves)
  2. The participant provides a response that is either compliant (responding to the robot) or non-compliant (refusing to acknowledge the robot's command)
  3. The robot reacts to the participant's response:
    • Compliant: the robot delivers a reward congratulating the participant on following the intervention. The BI then continues to step 4.
    • Non-compliant: the robot delivers a prompt instructing the participant to respond in a compliant manner (saying ", say hello" and waving). The BI then returns to step 2 or, if several prompts have failed to elicit a compliant response, proceeds to step 4.
  4. The robot ends the BI by saying "Good Bye"
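
The protocol above can be read as a small decision rule. The following Python sketch is illustrative only and is not taken from the repository: the function names `next_action` and `run_session`, the constant `MAX_PROMPTS`, and its value of 2 are assumptions (the text says only "several prompts"). The action labels SD, PMT, REW, and END follow the abbreviations used for the interface buttons.

```python
# Illustrative sketch of the social greeting BI as a decision rule.
# MAX_PROMPTS is an assumed value; the README only says "several prompts".
MAX_PROMPTS = 2


def next_action(last_action, compliant, prompts_given):
    """Return the next robot action: 'SD', 'PMT', 'REW', or 'END'."""
    if last_action is None:
        return "SD"   # step 1: deliver the Discriminative Stimulus
    if last_action in ("REW", "END"):
        return "END"  # step 4: say goodbye
    # last_action is 'SD' or 'PMT': react to the response (step 3)
    if compliant:
        return "REW"
    if prompts_given < MAX_PROMPTS:
        return "PMT"
    return "END"


def run_session(responses):
    """Simulate one session from a list of booleans (True = compliant)."""
    actions = ["SD"]
    prompts = 0
    for compliant in responses:
        action = next_action(actions[-1], compliant, prompts)
        actions.append(action)
        if action == "PMT":
            prompts += 1
        elif action == "REW":
            actions.append("END")  # a reward is followed by the goodbye
            break
        elif action == "END":
            break
    return actions
```

For example, `run_session([False, True])` models a participant who ignores the greeting, receives one prompt, and then responds. In the real system the compliant/non-compliant decision is what the DRQN learns from demonstrations; here it is simply given as input.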

Data Collection

Data collection is performed using a tele-operated NAO humanoid robot. Demonstrations are first recorded as rosbags and later converted into TFRecords for training the DRQN.

The robot is operated through our provided interface, which can be opened by running the following commands in separate terminals:

roslaunch nao_bringup nao_full_py.launch
roslaunch deep_reinforcement_abstract_lfd deep_reinforcement_abstract_lfd.launch

Robot Interface

The interface buttons perform the following operations:

System Functions (Red)

  • Start: places the robot into a position from which it can deliver the BI and stiffens the robot's joints
  • Shut Down: closes the robot interface and stops listening to ROS topics

Action Functions (Blue)

  • Command: delivers the Discriminative Stimulus (SD)
  • Prompt: executes the Prompt (PMT) action
  • Reward: executes the Reward (REW) action
  • Abort: executes the End Session (END) action

Recording Functions (Green)

  • Start Record: starts recording observations of the BI
  • Stop Record: stops recording observations of the BI and outputs the generated rosbag to the "~/bag" directory

Stance Functions (Magenta)

  • Stand: places the robot in a standing stance
  • Rest: places the robot in a resting/crouching stance

Utility Functions (Yellow)

  • Angle Head: angles the robot's head down so that the camera is focused on the participant
  • Toggle Life: disables autonomous life

Autonomous Functions (Black)

  • Run: has the robot deliver the learned BI autonomously.

When delivering the SD and PMT, the robot greets the participant by the name listed in the textbox. The interface displays a live feed of what the robot observes, along with a clock showing the current time (minutes and seconds) for operations that the system operator must time. When the system first starts, the "DQN Running" indicator is red and the 'Run' button is disabled. Once the DRQN has initialized its variables and is functioning, the indicator turns green and the 'Run' button is enabled.

To record training examples, begin by selecting 'Start Record', then perform the desired functions (e.g. the SD action followed by an observation period of several seconds, the REW action, and then the END action). Once the entire interaction has been observed, select the 'Stop Record' button to generate a rosbag file (extension .bag) in the "~/bag" directory described in the installation instructions.

Once a collection of rosbag demonstrations has been recorded, the files can be converted to TFRecords using

python generate_tfrecord_from_rosbag.py

Training the DRQN

To train the DRQN we must first alter several inputs in a parameter file. The file 'src/dqn/params_file.txt' contains several parameters that influence model generation.

  • CHECKPOINT_DIRECTORY = a prefix for the name of the directory in which to save the DRQN's internal values. If the directory does not exist the model will generate a new directory. If the directory does exist then the contents of the directory will be overwritten.
  • CHECKPOINT_NAME = the name for the checkpoint file to be saved
  • RESTORE_CHECKPOINT = the directory and name of the checkpoint file to be restored. If left blank the network will use randomized values to initialize the DRQN. When first training the network this option should be left blank.
  • LOG_DIR = A directory to store information about the DRQN network (most of this functionality is currently disabled)
  • TRAIN_DIR = the directory containing the TFRecord demonstrations to train the system. The path is relative to the current directory.
  • TEST_DIR = the directory containing the TFRecord demonstrations to evaluate the system. The path is relative to the current directory.
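
As a rough illustration of how these parameters fit together, the sketch below parses a parameter file in Python. It assumes one `KEY = value` pair per line, which may not match the actual layout of 'src/dqn/params_file.txt'. The helpers `parse_params` and `is_fresh_run` are hypothetical and are not part of the repository, and the example values are placeholders.

```python
# Sketch of reading a parameter file, ASSUMING one "KEY = value" pair per line.
# The real layout of src/dqn/params_file.txt may differ.
EXPECTED_KEYS = (
    "CHECKPOINT_DIRECTORY", "CHECKPOINT_NAME", "RESTORE_CHECKPOINT",
    "LOG_DIR", "TRAIN_DIR", "TEST_DIR",
)


def parse_params(text):
    """Parse 'KEY = value' lines into a dict; missing keys map to ''."""
    params = {key: "" for key in EXPECTED_KEYS}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        key = key.strip()
        if key in params:
            params[key] = value.strip()
    return params


def is_fresh_run(params):
    """True when RESTORE_CHECKPOINT is blank, i.e. random initialization."""
    return params["RESTORE_CHECKPOINT"] == ""


# Placeholder values for illustration only.
example = """CHECKPOINT_DIRECTORY = my_model
CHECKPOINT_NAME = model.ckpt
RESTORE_CHECKPOINT =
TRAIN_DIR = tfrecords/train/
TEST_DIR = tfrecords/test/
"""
params = parse_params(example)
```

Here `is_fresh_run(params)` is true because RESTORE_CHECKPOINT is blank, which is the configuration recommended for a first training run.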

When first training the network, make sure that CHECKPOINT_DIRECTORY, CHECKPOINT_NAME, and TRAIN_DIR are defined and that RESTORE_CHECKPOINT is left blank. Then execute

python model_trainer.py

The trainer will begin optimizing the network and will output timing information at every iteration. The accuracy of the model on the test set is printed every 10 iterations, and the values of the current model are saved to the listed checkpoint file every 1,000 iterations. Every 10,000 iterations the suffix of the given checkpoint directory changes. The finalized network values are written to the directory suffixed by "_final".
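
The schedule described above can be summarized in a few lines of Python. This is a sketch for orientation, not the code in model_trainer.py: the names `events_at` and `checkpoint_dir` are hypothetical, the numeric suffix scheme ("_0", "_1", ...) is an assumption (only "_final" is stated in the text), and the iteration counter is assumed to start at 1.

```python
# Sketch of the reporting and saving schedule described above.
# The numeric suffix scheme is an ASSUMPTION; only "_final" is documented.
TEST_EVERY = 10        # print test accuracy
SAVE_EVERY = 1000      # save checkpoint
ROTATE_EVERY = 10000   # change checkpoint directory suffix


def checkpoint_dir(prefix, iteration, final=False):
    """Return the directory name used at a given iteration."""
    if final:
        return prefix + "_final"
    return "{}_{}".format(prefix, iteration // ROTATE_EVERY)


def events_at(iteration):
    """List what the trainer reports or saves at a given iteration."""
    events = ["print timing"]  # happens at every iteration
    if iteration % TEST_EVERY == 0:
        events.append("print test accuracy")
    if iteration % SAVE_EVERY == 0:
        events.append("save checkpoint")
    return events
```

Under these assumptions, iteration 1,000 triggers all three events, and a run with prefix "my_model" would write its last values to "my_model_final".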

Once the network has been trained, the system can be evaluated by defining TEST_DIR and setting RESTORE_CHECKPOINT to the directory and checkpoint name used when training the model. Executing

python evaluator.py

provides simulated results of the system's accuracy.

Execution of Autonomous System

The automated system can then be run on the robot using the same commands listed in the Data Collection section. Once the interface is open and the "DQN Running" indicator is lit, the automated system can be executed either by selecting the 'Run' button or by pressing the NAO's left foot bumper.

Dependencies

The following libraries are used by this application:

Acknowledgements

We borrowed code from several sources for this project:
