Roberto edited this page Jan 26, 2015 · 28 revisions

Idomaar components

Idomaar consists of four components: the orchestrator, the evaluator, the data container, and the computing environment.

The computing environment

The computing environment contains the code that serves recommendation requests; it implements the recommendation algorithm under test. In practice, the computing environment is a virtual machine running that code. It communicates with the orchestrator through two different publish-subscribe channels:

  • A data streaming channel, implemented as a Kafka connector, dedicated to receiving the entities and relations that form the available knowledge (e.g., the data used to train the recommendation algorithm). It is a unidirectional channel: data are streamed from the orchestrator to the computing environment.
  • A control channel, implemented as a ZeroMQ connector or, alternatively, an HTTP connector, dedicated to the exchange of control messages, for example: the orchestrator sends a recommendation request, the computing environment notifies that it is ready to serve recommendations, or the computing environment replies to a recommendation request. Unlike the data streaming channel, the control channel is bidirectional.
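
The three control messages listed above could be modelled roughly as follows. This is only a sketch: the message names, the JSON encoding, and the `ControlMessage` class are assumptions made for illustration, not Idomaar's actual protocol, and the transport (ZeroMQ or HTTP) is deliberately left out.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class ControlMessageType(str, Enum):
    # Hypothetical names; the text above only describes these messages in prose.
    READY = "ready"                    # computing environment -> orchestrator
    RECOMMEND = "recommend"            # orchestrator -> computing environment
    RECOMMENDATION = "recommendation"  # computing environment -> orchestrator


@dataclass
class ControlMessage:
    type: ControlMessageType
    payload: dict = field(default_factory=dict)

    def encode(self) -> bytes:
        """Serialize to bytes so the message can travel over any transport."""
        body = {"type": self.type.value, "payload": self.payload}
        return json.dumps(body).encode("utf-8")

    @staticmethod
    def decode(raw: bytes) -> "ControlMessage":
        """Rebuild a message from its serialized form."""
        body = json.loads(raw.decode("utf-8"))
        return ControlMessage(ControlMessageType(body["type"]), body["payload"])
```

Because the control channel is bidirectional, the same envelope is used in both directions; only the message type tells the receiver who is speaking.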

The orchestrator

The orchestrator controls the stream of data sent to the computing environment in order to handle batch bootstrapping, on-line data streaming, and on-line recommendation requests.

The evaluator

The evaluator consists of a splitting module and an evaluation module. The splitting module separates the data into training and test sets. The evaluation module computes quality and performance metrics over the experiment output. Both modules will be described in more detail later.

The data container

The data container stores the dataset in accordance with the data model described in !!TBD!!. In addition, the data container can optionally store the split data prepared by the evaluator.

Idomaar evaluation process

Idomaar evaluates recommendation tasks through a three-phase process: data splitting, data streaming, and result evaluation.

Data splitting

Data splitting creates the data required to run the experiment. An input dataset is processed and split into two main sets: batch training data and on-line streaming data.

Batch training data is the set of entities and relations used to bootstrap the computing environment. It consists of all the data available to the computing environment for the initial training of the recommendation algorithm.

On-line streaming data consists of a set of additional entities and relations, together with a list of recommendation requests to be tested. The additional entities and relations are provided to the computing environment later, as they become available (e.g., the injection of new items). Each recommendation request consists of the message to be sent to the computing environment (to request a recommendation) and the expected output (i.e., the ground truth).
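
One possible way to produce the two sets is a split on timestamps, sketched below. The `Relation` and `RecommendationRequest` types, the `split_by_time` function, and the choice of one request per user are all assumptions for illustration; the source does not prescribe a splitting strategy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    # Hypothetical minimal relation: a user interacting with an item.
    timestamp: int
    user: str
    item: str


@dataclass
class RecommendationRequest:
    timestamp: int           # when the request should be sent
    user: str                # who the recommendation is for
    ground_truth: List[str]  # expected output, used later by the evaluator


def split_by_time(
    relations: List[Relation], split_time: int
) -> Tuple[List[Relation], List[Relation], List[RecommendationRequest]]:
    """Relations before split_time form the batch training data; the rest
    form the on-line streaming data, from which requests are derived."""
    ordered = sorted(relations, key=lambda r: r.timestamp)
    batch = [r for r in ordered if r.timestamp < split_time]
    online = [r for r in ordered if r.timestamp >= split_time]

    # One request per user, timestamped at the user's first on-line relation.
    # The ground truth is every item that user touches in the on-line period.
    first_seen = {}
    truth = {}
    for r in online:
        first_seen.setdefault(r.user, r.timestamp)
        truth.setdefault(r.user, []).append(r.item)
    requests = [
        RecommendationRequest(first_seen[u], u, truth[u]) for u in first_seen
    ]
    return batch, online, requests
```

Each request carries the timestamp of the user's first on-line relation, so it can be issued before the relations it is scored against are streamed.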

Once this phase completes, the orchestrator starts the computing environment and waits for its acknowledgement (through the control channel) that the environment has been set up.

Data streaming

Data streaming consists of two sequential tasks: computing environment bootstrapping and on-line evaluation.

In the first task, computing environment bootstrapping, the orchestrator reads the batch training data and sends it to the computing environment (through the data streaming channel). When all batch data has been sent, the orchestrator notifies the computing environment (through the control channel) that the bootstrapping data is complete. The task completes when the computing environment has bootstrapped the recommendation algorithms and is ready to serve recommendation requests; at that point, the computing environment sends an acknowledgement message to the orchestrator.

During the second task, on-line evaluation, the orchestrator reads the additional entities and relations and streams them (through the data streaming channel) to the computing environment, which can incrementally update the recommendation algorithms if it supports doing so. Following the recommendation requests created by the splitting module (during the data splitting phase) and the streaming strategy, the orchestrator sends the recommendation requests to the computing environment (through the control channel). The requests are served by the recommendation algorithm, whose output is sent back to the orchestrator (through the control channel) and stored.

The recommendation output contains all the information required later by the evaluator to compute both quality and performance metrics.

When all on-line streaming data have been streamed to the computing environment, the orchestrator notifies the computing environment (through the control channel) that the experiment has completed and waits for its acknowledgement (through the control channel) to conclude the data streaming task.

At the end of this phase, the computing environment can be shut down.
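
The two tasks of this phase can be illustrated with a small in-process simulation. Everything in it is an assumption made for the sketch: the channels are replaced by plain method calls (`on_data` for the data streaming channel, `on_control` for the control channel), the message names are invented, and the computing environment is a toy popularity recommender rather than a virtual machine.

```python
import time
from collections import Counter


class ToyComputingEnvironment:
    """Hypothetical stand-in for the virtual machine: a popularity recommender."""

    def __init__(self):
        self.popularity = Counter()

    def on_data(self, relation):
        # Data streaming channel: one-way, orchestrator -> environment.
        _user, item = relation
        self.popularity[item] += 1  # incremental model update

    def on_control(self, message):
        # Control channel: two-way, every message gets a reply.
        if message["type"] == "recommend":
            ranked = self.popularity.most_common(message["limit"])
            return {"type": "recommendation", "items": [i for i, _ in ranked]}
        return {"type": "ack", "of": message["type"]}


def expect_ack(env, message_type):
    """Send a control notification and fail if it is not acknowledged."""
    reply = env.on_control({"type": message_type})
    if reply.get("type") != "ack":
        raise RuntimeError(f"no acknowledgement for {message_type!r}")


def run_data_streaming(env, batch, online_events):
    """Run bootstrapping, then on-line evaluation; return the stored output."""
    output = []
    # Task 1: bootstrapping with the batch training data.
    for relation in batch:
        env.on_data(relation)
    expect_ack(env, "bootstrap_done")
    # Task 2: on-line evaluation, interleaving new data and requests.
    for kind, body in online_events:
        if kind == "data":
            env.on_data(body)
        elif kind == "request":
            started = time.perf_counter()
            reply = env.on_control({"type": "recommend", **body})
            elapsed = time.perf_counter() - started
            output.append(
                {"request": body, "items": reply["items"], "seconds": elapsed}
            )
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    expect_ack(env, "experiment_done")
    return output
```

The stored output keeps both the recommended items and the elapsed time per request, which is the information the evaluator needs for quality and performance metrics.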

Result evaluation

The last phase evaluates the output of the computing environment, in terms of both quality metrics (e.g., precision/recall, RMSE) and performance metrics (e.g., response time). This phase processes the output of the computing environment stored by the orchestrator and generates a set of experiment results.
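
As a rough illustration of this phase, the sketch below computes precision, recall, and mean response time over stored results. The record layout (`items`, `ground_truth`, `seconds`) and the function names are assumptions for the example, not the evaluator's actual interface.

```python
from statistics import mean


def precision_recall(recommended, ground_truth):
    """Precision and recall of one recommendation list against its ground truth.
    Assumes the recommended list contains no duplicate items."""
    hits = len(set(recommended) & set(ground_truth))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


def evaluate(results):
    """Average quality and performance metrics over all stored results.

    Each result is a dict with hypothetical keys:
      "items"        - the recommended item ids
      "ground_truth" - the expected item ids from the splitting phase
      "seconds"      - the measured response time
    """
    if not results:
        raise ValueError("no results to evaluate")
    scores = [precision_recall(r["items"], r["ground_truth"]) for r in results]
    return {
        "precision": mean([p for p, _ in scores]),
        "recall": mean([r for _, r in scores]),
        "mean_response_seconds": mean([r["seconds"] for r in results]),
    }
```

Averaging per request is only one option; a real evaluation might also report metrics at a fixed cutoff or response-time percentiles.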