Home
Idomaar consists of four components: the orchestrator, the evaluator, the data container, and the computing environment.
The computing environment consists of the code able to serve recommendation requests; it implements the recommendation algorithm under evaluation. The only requirement on the computing environment is that it can connect to two publish-subscribe channels:
- A data streaming channel, consisting of a Kafka connector, dedicated to receiving the entities and relations that form the available knowledge (e.g., the data used to train the recommendation algorithm). It is a unidirectional channel: data is streamed from the orchestrator to the computing environment.
- A control channel, consisting of a ZeroMQ connector or, alternatively, an HTTP connector, dedicated to the exchange of control messages: the orchestrator sends a recommendation request, the computing environment notifies that it is ready to serve recommendations, or the computing environment replies to a recommendation request. Unlike the data streaming channel, the control channel is bidirectional.
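As a concrete illustration, below is a minimal sketch of a computing environment connected to both channels, written in Python with the kafka-python and pyzmq libraries. The topic name, socket address, and message strings (`READY`, `RECOMMEND ...`) are assumptions made for the example, not the actual Idomaar protocol.

```python
# Minimal sketch of a computing environment (assumed topic name, address,
# and message strings; not the actual Idomaar protocol).
import zmq
from kafka import KafkaConsumer

def serve(kafka_brokers="localhost:9092", control_addr="tcp://*:5560"):
    # Data streaming channel (unidirectional): entities and relations
    # streamed by the orchestrator, e.g. to train the recommender.
    consumer = KafkaConsumer("data", bootstrap_servers=kafka_brokers)

    # Control channel (bidirectional): recommendation requests and replies.
    context = zmq.Context()
    control = context.socket(zmq.REP)
    control.bind(control_addr)
    poller = zmq.Poller()
    poller.register(control, zmq.POLLIN)

    model = []  # placeholder for the recommendation algorithm's state

    while True:
        # Drain any training data currently available on the data channel.
        for batch in consumer.poll(timeout_ms=500).values():
            for record in batch:
                model.append(record.value)  # e.g. (incremental) model update

        # Answer one control message, if any is pending.
        if poller.poll(timeout=100):
            request = control.recv_string()
            if request.startswith("RECOMMEND"):
                control.send_string("item_1,item_2,item_3")  # dummy reply
            else:
                control.send_string("READY")  # readiness handshake
```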
The orchestrator is in charge of controlling the stream of data sent to the computing environment in order to handle batch bootstrapping, on-line data streaming, and on-line recommendation requests.
The evaluator consists of a splitting module and an evaluation module. The splitting module separates the data into training and test sets. The evaluation module computes quality and performance metrics over the experiment output. Both modules will be described in more detail later.
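As a rough picture of the evaluation module, the sketch below computes recall@N over the experiment output. It assumes, for this example only, that the output is a list of (request, recommended items) pairs and that each request carries its groundtruth; the actual output format is described elsewhere.

```python
# Sketch of a quality metric (recall@N) over the experiment output.
# The layout of the output records is an assumption for this example.
def recall_at_n(results, n=10):
    """results: list of (request, recommended) pairs; request["groundtruth"]
    holds the expected items, recommended is an ordered list of item ids."""
    hits, total = 0, 0
    for request, recommended in results:
        groundtruth = set(request["groundtruth"])
        hits += len(groundtruth & set(recommended[:n]))
        total += len(groundtruth)
    return hits / total if total else 0.0
```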
The data container stores the dataset in accordance with the data model described in !!TBD!!. In addition, the data container can optionally store the split data prepared by the evaluator.
Idomaar allows evaluating recommendation tasks via a three-phase process: data splitting, data streaming, and result evaluation.
Data splitting creates the data required to run the experiment. An input dataset is processed and split into two main sets: batch training data and on-line streaming data. The batch training data is the set of entities and relations used to bootstrap the computing environment; it consists of all the data available to the computing environment for the initial training of the recommendation algorithm. The on-line streaming data consists of additional entities and relations, together with a list of recommendation requests to be tested: these entities and relations are provided to the computing environment only later, as they become available (e.g., the injection of new items). Each recommendation request consists of the message to be sent to the computing environment (to request a recommendation) and the expected output (i.e., the groundtruth). A sketch of a possible splitting policy is shown below.
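The following Python sketch illustrates one possible splitting policy (a fixed timestamp threshold). The event field names and the one-request-per-user policy are assumptions made for the example, not the splitter actually shipped with Idomaar.

```python
# Illustrative time-based split into batch training data, on-line streaming
# data, and recommendation requests with groundtruth. Field names and the
# split policy are assumptions for this example.
def split(events, split_time):
    """events: iterable of dicts with 'user', 'item' and 'timestamp' keys."""
    batch_training, online_stream, requests = [], [], []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["timestamp"] < split_time:
            batch_training.append(event)   # used to bootstrap the algorithm
        else:
            online_stream.append(event)    # streamed later to the computing environment

    # One recommendation request per user active in the test period; the items
    # the user actually interacted with form the groundtruth for that request.
    for user in {e["user"] for e in online_stream}:
        groundtruth = [e["item"] for e in online_stream if e["user"] == user]
        requests.append({"user": user, "groundtruth": groundtruth})
    return batch_training, online_stream, requests
```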
Data streaming consists of two main sequential tasks: computing environment bootstrapping and on-line evaluation. During computing environment bootstrapping, the orchestrator reads the batch training data, sends it to the computing environment, and waits for the computing environment to be ready to serve recommendation requests. During on-line evaluation, the orchestrator reads the additional entities and relations and streams them to the computing environment, which has the opportunity to incrementally update the recommendation algorithm (if it has the right capabilities). On a channel separate from the one used for entities and relations, the orchestrator sends the recommendation requests to the computing environment and collects the returned recommendations, which are later compared against the groundtruth by the evaluator.
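The Python sketch below summarizes the orchestrator side of these two tasks, mirroring the computing environment sketch above. Again, the topic name, address, and message strings are assumptions made for illustration.

```python
# Orchestrator-side sketch of bootstrapping and on-line evaluation
# (assumed topic name, address, and message strings).
import json
import zmq
from kafka import KafkaProducer

def run_experiment(batch_training, online_stream, requests,
                   kafka_brokers="localhost:9092",
                   control_addr="tcp://localhost:5560"):
    producer = KafkaProducer(bootstrap_servers=kafka_brokers,
                             value_serializer=lambda v: json.dumps(v).encode())
    context = zmq.Context()
    control = context.socket(zmq.REQ)
    control.connect(control_addr)

    # 1. Bootstrapping: stream the batch training data, then wait until the
    #    computing environment declares it is ready to serve recommendations.
    for event in batch_training:
        producer.send("data", event)
    producer.flush()
    control.send_string("HELLO")
    assert control.recv_string() == "READY"

    # 2. On-line evaluation: stream the additional entities and relations,
    #    then send the recommendation requests on the control channel and
    #    collect the replies for the evaluator.
    for event in online_stream:
        producer.send("data", event)
    producer.flush()
    results = []
    for request in requests:
        control.send_string("RECOMMEND " + json.dumps({"user": request["user"]}))
        results.append((request, control.recv_string()))
    return results
```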