
HRI API Specification

The HRI consists of two separate APIs: the Management API and Apache Kafka.

Management API Specification

The Management API is defined using the OpenAPI 3.0 specification: management.yml. You can open the file directly or use a program such as Swagger UI to view it.

HRI Tenants & Elasticsearch Indices

HRI is designed with a multi-tenant cloud architecture. The API mainly contains methods for managing tenants: creating, getting, and deleting them. Each of these calls takes a tenantId. The tenantId, with the suffix -batches appended, becomes the name of the Elasticsearch index where all of that tenant's batch metadata is stored; for example, tenantId tenant1 maps to the index tenant1-batches.

A GET call without a tenantId returns a list of all tenants. A GET call with a tenantId returns information about that tenant's Elasticsearch index. Below is a table of the fields returned by this call:

| Field | Description |
| ----- | ----------- |
| health | health of the Elasticsearch cluster |
| status | status of the index; can be open or closed |
| index | the name of the index, which will be the tenantId with -batches appended to it |
| uuid | universally unique identifier |
| pri | number of primary shards |
| rep | number of replicas |
| docs.count | number of batch documents stored in the index |
| docs.deleted | number of batch documents deleted from the index |
| store.size | store size taken by primary and replica shards |
| pri.store.size | store size taken only by primary shards |
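
For illustration, a response for a single tenant might look like the sketch below; it assumes the JSON keys match the field names in the table above, and all values are made up.

```json
{
  "health": "green",
  "status": "open",
  "index": "tenant1-batches",
  "uuid": "vT9cRAkmTHSxXobkGnPbtg",
  "pri": "1",
  "rep": "1",
  "docs.count": "42",
  "docs.deleted": "0",
  "store.size": "108.8kb",
  "pri.store.size": "54.4kb"
}
```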

Batches

The API contains methods for managing batches like creating, getting, and updating. Below is a table of the fields:

| Field | Description |
| ----- | ----------- |
| id | auto-generated unique ID |
| name | name of the batch, provided by the Data Integrator |
| integratorId | unique ID of the Data Integrator that created this batch |
| topic | Event Streams (Kafka) topic that contains the data, provided by the Data Integrator |
| dataType | the type of data, provided by the Data Integrator |
| status | status of the batch: [ started, completed, terminated ] |
| startDate | the date and time the batch was started |
| endDate | the date and time the batch was completed or terminated |
| recordCount | the number of records in the batch, provided by the Data Integrator when completed |
| metadata | optional custom JSON value |

Only the name, topic, and dataType fields are required when creating a batch.

The metadata field is optional and allows the Data Integrator to include any additional information about the batch that Data Consumers might request. This information will be included in all notification messages.

The recordCount is provided by the Data Integrator when the batch is completed, and thus not always present.

All other fields are generated by the API.
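
To illustrate, a batch-creation request body might look like the sketch below. It assumes the JSON property names match the field names in the table above; the topic name and metadata values are placeholders.

```json
{
  "name": "claims-2021-07-01",
  "topic": "ingest.example.in",
  "dataType": "claims",
  "metadata": {
    "sourceSystem": "exampleEHR"
  }
}
```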

Streams

The API also contains methods for managing Event Streams topics like creating, getting, and deleting. Below is a table of the fields:

| Field | Description |
| ----- | ----------- |
| id | stream ID, consisting of a data integrator and optional qualifier, delimited by '.' |
| numPartitions | the number of partitions on the topic |
| retentionMs | length of time in milliseconds before log segments are automatically discarded from a partition |
| retentionBytes | optional maximum size in bytes that a partition can grow to before discarding log segments |
| cleanupPolicy | optional retention policy on old log segments |
| segmentMs | optional time in milliseconds after which Kafka will force the log to roll even if the segment file isn't full |
| segmentBytes | optional log segment file size in bytes |
| segmentIndexBytes | optional size in bytes of the index that maps offsets to file positions |

Only the numPartitions and retentionMs fields are required when creating a stream. The rest of the topic configurations (retentionBytes, cleanupPolicy, segmentMs, segmentBytes, and segmentIndexBytes) are optional. Below is a table of the default values and acceptable ranges for these optional fields:

| Field | Default value | Acceptable values/ranges |
| ----- | ------------- | ------------------------ |
| retentionBytes | 1073741824 | [10485760..1073741824] |
| cleanupPolicy | delete | [ delete, compact ] |
| segmentMs | nil | [300000..2592000000] |
| segmentBytes | 536870912 | [10485760..536870912] |
| segmentIndexBytes | nil | [102400..104857600] |

If the cleanupPolicy field is set to compact, it will disable deletion based on time, ignoring the value set for the field retentionMs.
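
A minimal stream-creation request body might look like the sketch below. It assumes the JSON property names match the field names in the table above and that the stream id is supplied separately (for example, in the request path); only the two required fields plus one optional field are shown, and the values are illustrative.

```json
{
  "numPartitions": 1,
  "retentionMs": 86400000,
  "cleanupPolicy": "delete"
}
```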

Event Streams is an IBM Cloud managed version of Apache Kafka. Standard Kafka libraries are used to read and write data to its topics. See the IBM documentation for details on connection parameters.

Apache Kafka

Apache Kafka has its own API and clients are available for most languages. If using IBM Event Streams, see their documentation for details on connection parameters. Below are the requirements on the records written to and read from Kafka.

Health Input Data - FHIR Model

HRI does not impose any requirements on the format of the content of the Health (data) records written to Kafka, although Alvearie has selected FHIR as the preferred data model for all Health Data. See their FHIR implementation guide for more details. Data Integrators and Data Consumers must work together to agree on the specifics of the input data such as format and frequency.

HRI-Specific Requirements

The HRI does have the following requirements and recommendations:

  • Batch ID Header - every record must have a header entry with the key batchId containing the batch ID. Data Integrators may include any additional header values, which will be passed downstream to consumers (see the producer sketch following this list).

  • Zstd Compression - use zstd compression when writing to Kafka by setting the compression.type producer configuration. Event Streams throttles network usage and limits Kafka messages to 1 MB. Using compression will help prevent an Event Streams bottleneck.

  • 1 MB Message Limit - Event Streams limits messages to 1 MB. There is no way to directly cap the size of a message after compression in the Kafka producer; the max.request.size producer configuration is applied before compression. The batch.size producer configuration can be set to limit how many records are batched together, but it can also affect performance. We recommend performance testing to determine appropriate values for your data. For records over 1 MB compressed, there are two strategies:

    1. External References - for records that have large binary attachments such as images or PDFs, include a reference to the large resource in the message rather than the (large) resource itself. For example, the message could contain a COS (Cloud Object Storage) URL, or some other external data store URL and key, in place of the attachment.

    2. Splitting up Records - records can be split into smaller parts, sent through the HRI, and re-assembled by downstream consumers.
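
The sketch below pulls these points together using the standard Java Kafka producer client (zstd compression requires client version 2.1 or later). The bootstrap server, topic name, batch ID, and payload are placeholders, and the SASL/TLS settings required to connect to Event Streams are omitted; see the IBM documentation for those.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HriProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-0.example.com:9093"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Recommended: compress records with zstd to help stay under the 1 MB Event Streams limit
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");
        // Optional: limit how many bytes are batched together (applied before compression);
        // tune this value with performance testing on your own data
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 262144);
        // Event Streams SASL/TLS connection settings omitted; see the IBM documentation

        String topic = "ingest.example.in"; // placeholder; the topic is provided when the batch is created
        String batchId = "batch12345";      // ID returned by the Management API when the batch is created

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>(topic, null, "{ \"resourceType\": \"Bundle\" }"); // placeholder payload
            // Required: every record must carry the batch ID in a header with the key "batchId"
            record.headers().add("batchId", batchId.getBytes(StandardCharsets.UTF_8));
            producer.send(record);
        }
    }
}
```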

Notification Messages

The notification messages are JSON-encoded batches. They match the schema returned by the Management API described above, which is also defined here: batchNotification.json.
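
For illustration, a notification message for a completed batch might look like the sketch below; it assumes the JSON property names match the batch fields described above, and all values are made up.

```json
{
  "id": "batch12345",
  "name": "claims-2021-07-01",
  "integratorId": "data-integrator-1",
  "topic": "ingest.example.in",
  "dataType": "claims",
  "status": "completed",
  "startDate": "2021-07-01T12:00:00Z",
  "endDate": "2021-07-01T12:30:00Z",
  "recordCount": 100,
  "metadata": {
    "sourceSystem": "exampleEHR"
  }
}
```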