Processing Flows

This diagram depicts the “happy-path” flow through the HRI for a single batch.

[Diagram: core-architecture]

Steps

1. The Data Integrator creates a new batch.
2. The Management API writes a batch notification message to the associated notification topic.
3. The Data Consumer receives the batch notification message.
4. The Data Integrator writes the data to the correct Kafka *.in topic.
5. The Data Consumer may now begin reading the data from the Kafka topic, but can choose to wait until step 8 before reading.
6. The Data Integrator finishes writing all data contained in this batch, then signals to the Management API that it has finished sending the data for the batch.
7. The Management API writes a batch notification message to the associated notification topic.
8. The Data Consumer receives the batch notification message.
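
The steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: Python lists stand in for the Kafka topics, and `ManagementApi`, `create_batch`, `send_complete`, the topic names, and the status strings are all hypothetical names chosen for this sketch, not the HRI's actual API.

```python
import uuid
from collections import defaultdict

# In-memory stand-in for Kafka: topic name -> ordered list of messages.
topics = defaultdict(list)


class ManagementApi:
    """Hypothetical stand-in for the Management API (not the real HRI interface)."""

    def __init__(self, notification_topic):
        self.notification_topic = notification_topic
        self.batches = {}

    def _notify(self, batch_id):
        # Steps 2 and 7: write a batch notification message (a snapshot copy).
        topics[self.notification_topic].append(dict(self.batches[batch_id]))

    def create_batch(self, data_topic):
        # Step 1: the Data Integrator creates a new batch.
        batch_id = str(uuid.uuid4())
        self.batches[batch_id] = {"id": batch_id, "topic": data_topic, "status": "started"}
        self._notify(batch_id)
        return batch_id

    def send_complete(self, batch_id):
        # Step 6: the Data Integrator signals that all data has been sent.
        self.batches[batch_id]["status"] = "completed"
        self._notify(batch_id)


def run_happy_path(records):
    """Run steps 1-8 for one batch; return what the Data Consumer observed."""
    api = ManagementApi("example.notification")       # assumed topic name
    batch_id = api.create_batch("example.in")         # steps 1-2
    first = topics["example.notification"][-1]        # step 3
    for value in records:                             # step 4
        topics["example.in"].append({"batchId": batch_id, "value": value})
    api.send_complete(batch_id)                       # steps 6-7
    last = topics["example.notification"][-1]         # step 8
    # Step 5, deferred: this consumer reads only after the second notification.
    data = [r["value"] for r in topics["example.in"] if r["batchId"] == batch_id]
    return first["status"], last["status"], data
```

In this sketch the consumer takes the option described in step 5 and waits for the second notification before reading; a consumer that streams records as they arrive would read from the data topic right after step 3 instead.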

Alternate Flows

Batch Termination

If the Data Integrator encounters an error after creating a batch in step 1, it may send a request to the Management API to terminate the batch. The Management API then writes a batch notification message to the associated notification topic, and the Data Consumer receives it.
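
As a rough sketch of the termination flow, a batch can be modelled as a small state machine. The `Batch` class and the status names (`started`, `completed`, `terminated`) are assumptions made for illustration; the text above does not specify them, nor which transitions the Management API rejects.

```python
from dataclasses import dataclass, field

# Assumed transitions: a started batch can be completed or terminated;
# both end states are treated as final in this sketch.
ALLOWED = {"started": {"completed", "terminated"}}


@dataclass
class Batch:
    batch_id: str
    status: str = "started"
    # Stand-in for the notification topic the Data Consumer reads from.
    notifications: list = field(default_factory=list)

    def _transition(self, new_status):
        if new_status not in ALLOWED.get(self.status, set()):
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status
        # Each status change in this sketch emits a batch notification message.
        self.notifications.append({"id": self.batch_id, "status": new_status})

    def complete(self):
        self._transition("completed")

    def terminate(self):
        self._transition("terminated")
```

Calling `Batch("batch-1").terminate()` moves the batch to `terminated` and appends one notification. A second `terminate()` or a later `complete()` raises `ValueError`, which reflects this sketch's assumption that terminated batches are final.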

Interleaved Batches

The HRI does not prevent the Data Integrator from writing multiple batches into the same topic at the same time. Every record has a header value that specifies the “batchId”, which is returned from the Management API (see api-spec/management-api/management.yml), so the Data Consumer can distinguish the batches from one another.

In practice, the Data Integrator may write only one batch at a time. If necessary, additional input topics can be created to prevent the interleaving of batches or data types. Note, however, that Kafka generally performs better with a small number of large topics.
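
To illustrate how a Data Consumer might separate interleaved batches using the “batchId” header, here is a minimal sketch. It assumes each record arrives as a `(headers, value)` pair whose headers are a list of `(key, bytes)` tuples, which resembles what common Kafka clients expose. The `group_by_batch` helper and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_batch(records):
    """Group record values by the batchId found in each record's headers.

    Each record is assumed to be a (headers, value) pair, where headers is a
    list of (key, bytes) tuples.
    """
    batches = defaultdict(list)
    for headers, value in records:
        header_map = dict(headers)
        # Raises KeyError if a record lacks the batchId header.
        batch_id = header_map["batchId"].decode("utf-8")
        batches[batch_id].append(value)
    return dict(batches)


# Two batches interleaved on the same topic (made-up sample data).
interleaved = [
    ([("batchId", b"batch-A")], "a1"),
    ([("batchId", b"batch-B")], "b1"),
    ([("batchId", b"batch-A")], "a2"),
]
```

Grouping preserves the order in which records appear within each batch, so a consumer reading a single partition would see each batch's records in their original write order.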