
HRI API Specification

The HRI consists of two separate APIs: the Management API and Apache Kafka.

Management API Specification

The Management API is defined using the OpenAPI 3.0 specification: management.yml. You can open the file directly or use a program such as IntelliJ or Swagger UI to view it.

HRI Tenants & Elasticsearch Indices

HRI has been designed with a multi-tenant cloud architecture. The API contains methods for managing tenants like creating, getting, and deleting. Each of these calls takes the tenantId. The suffix -batches is appended to the ID to form the name of the Elasticsearch index where all the batch metadata is stored.

A GET call without a tenantId returns a list of all tenants. When given a tenantId, the GET call returns information about that tenant's Elasticsearch index. Below is a table of the fields returned by this call:

Field Description
health health of the Elasticsearch cluster
status status of the index, can be open or closed
index the name of the index, which will be the tenantId with -batches appended to it
uuid universally unique identifier
pri number of primary shards
rep number of replicas
docs.count number of batch documents stored in the index
docs.deleted number of batch documents deleted from the index
store.size store size taken by primary and replica shards
pri.store.size store size taken only by primary shards
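
For example, here is a minimal sketch of calling these endpoints with Python's requests library. The base URL, bearer token, and endpoint paths (GET /tenants and GET /tenants/{tenantId}) are assumptions for illustration; consult management.yml for the authoritative paths and required headers.

```python
# Minimal sketch: look up tenants via the Management API.
# The base URL, token, and endpoint paths are assumptions; see management.yml.
import requests

HRI_BASE_URL = "https://example.com/hri"              # placeholder base URL
HEADERS = {"Authorization": "Bearer <access-token>"}  # placeholder credentials

def list_tenants() -> dict:
    """Return the list of all tenants."""
    resp = requests.get(f"{HRI_BASE_URL}/tenants", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

def get_tenant(tenant_id: str) -> dict:
    """Return the index information (health, status, docs.count, ...) for one tenant."""
    resp = requests.get(f"{HRI_BASE_URL}/tenants/{tenant_id}", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()
```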

Batches

The API contains methods for managing batches like creating, getting, and updating. Below is a table of the fields:

Field Description
id auto-generated unique ID
name name of the batch, provided by the Data Integrator
integratorId unique ID of the Data Integrator that created this batch
topic Kafka topic that contains the data, provided by the Data Integrator
dataType the type of data, provided by the Data Integrator
status status of the batch: [ started, sendCompleted, completed, terminated, failed ]
startDate the date and time the batch was started
endDate the date and time the batch was completed, terminated, or failed
expectedRecordCount the number of records in the batch, provided by the Data Integrator when calling ‘sendComplete’
recordCount (deprecated) the number of records in the batch, provided by the Data Integrator when calling ‘sendComplete’. Replaced by expectedRecordCount and deprecated in v2.0.0
actualRecordCount the number of records received, calculated by validation processing
invalidThreshold the number of invalid records allowed in this batch before the batch fails validation, provided by the Data Integrator; defaults to -1 (infinite)
invalidRecordCount the number of invalid records, calculated by validation processing
metadata custom json value, optional

Only the name, topic, and dataType fields are required when creating a batch.
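
For illustration, a minimal sketch of creating a batch with Python's requests library. The endpoint path (POST /tenants/{tenantId}/batches), base URL, credentials, and the example topic name are assumptions; only the field names come from the table above.

```python
# Minimal sketch: create a batch with the three required fields plus optional metadata.
# Endpoint path, base URL, token, and topic name are assumptions; see management.yml.
import requests

HRI_BASE_URL = "https://example.com/hri"              # placeholder base URL
HEADERS = {"Authorization": "Bearer <access-token>"}  # placeholder credentials

def create_batch(tenant_id: str) -> str:
    body = {
        "name": "claims-2023-06-01",           # required
        "topic": "ingest.tenant1.claims.in",   # required; placeholder Kafka topic
        "dataType": "claims",                  # required
        "metadata": {"compression": "gzip"},   # optional custom json value
    }
    resp = requests.post(f"{HRI_BASE_URL}/tenants/{tenant_id}/batches",
                         json=body, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["id"]                   # the auto-generated batch ID
```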

The invalidThreshold field is used by validation processing: once this many invalid records are encountered, the HRI determines that the entire batch has failed validation.

The expectedRecordCount is provided by the Data Integrator when calling the ‘sendComplete’ endpoint and is thus not always present. The recordCount field is identical and provides backward compatibility with older versions of the HRI. recordCount was deprecated in release v2.0.0 and will be removed in a later release.
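
A corresponding sketch of the ‘sendComplete’ call, where the Data Integrator supplies expectedRecordCount. The endpoint path and body shape shown here are assumptions; check management.yml for the exact action endpoints.

```python
# Minimal sketch: signal that all records for a batch have been sent.
# Endpoint path, base URL, and token are assumptions; see management.yml.
import requests

HRI_BASE_URL = "https://example.com/hri"              # placeholder base URL
HEADERS = {"Authorization": "Bearer <access-token>"}  # placeholder credentials

def send_complete(tenant_id: str, batch_id: str, record_count: int) -> None:
    resp = requests.put(
        f"{HRI_BASE_URL}/tenants/{tenant_id}/batches/{batch_id}/action/sendComplete",
        json={"expectedRecordCount": record_count},
        headers=HEADERS,
    )
    resp.raise_for_status()
```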

The metadata field is optional and allows the Data Integrator to include any additional information about the batch that Data Consumers might request. This information will be included in all notification messages.

All other fields are generated by the API.

Streams

The API also contains methods for managing Kafka topics like creating, getting, and deleting. Below is a table of the fields:

Field Description
id stream ID, consisting of a data integrator ID and an optional qualifier, delimited by ‘.’
numPartitions the number of partitions on the topic
retentionMs length of time in milliseconds before log segments are automatically discarded from a partition
retentionBytes optional maximum size in bytes that a partition can grow before discarding log segments
cleanupPolicy optional retention policy on old log segments
segmentMs optional time in milliseconds after which Kafka will force the log to roll even if the segment file isn’t full
segmentBytes optional log segment file size in bytes
segmentIndexBytes optional size in bytes of the index that maps offsets to file positions

Only the numPartitions and retentionMs fields are required when creating a stream. The rest of the topic configurations (retentionBytes, cleanupPolicy, segmentMs, segmentBytes, and segmentIndexBytes) are optional. Below is a table of the default values and acceptable ranges for these optional fields:

Field Default value Acceptable values/ranges
retentionBytes 1073741824 [10485760..1073741824]
cleanupPolicy delete [ delete, compact ]
segmentMs nil [300000..2592000000]
segmentBytes 536870912 [10485760..536870912]
segmentIndexBytes nil [102400..104857600]

If the cleanupPolicy field is set to compact, it will disable deletion based on time, ignoring the value set for the field retentionMs.
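
For example, a minimal sketch of creating a stream with the two required fields and one optional override, again using Python's requests library. The endpoint path (POST /tenants/{tenantId}/streams/{streamId}), base URL, and credentials are assumptions; see management.yml.

```python
# Minimal sketch: create a stream with the required fields and one optional setting.
# Endpoint path, base URL, and token are assumptions; see management.yml.
import requests

HRI_BASE_URL = "https://example.com/hri"              # placeholder base URL
HEADERS = {"Authorization": "Bearer <access-token>"}  # placeholder credentials

def create_stream(tenant_id: str, stream_id: str) -> None:
    body = {
        "numPartitions": 1,          # required
        "retentionMs": 86400000,     # required: retain data for 24 hours
        "cleanupPolicy": "delete",   # optional; 'compact' disables time-based deletion
    }
    resp = requests.post(f"{HRI_BASE_URL}/tenants/{tenant_id}/streams/{stream_id}",
                         json=body, headers=HEADERS)
    resp.raise_for_status()
```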

Apache Kafka

Apache Kafka has its own API and clients are available for most languages. If using IBM Event Streams, see their documentation for details on connection parameters. Below are the requirements on the records written to and read from Kafka.

Health Input Data - FHIR Model

HRI does not impose any requirements on the format of the health data records written to Kafka, although Alvearie has selected FHIR as the preferred data model for all health data. See the Alvearie FHIR implementation guide for more details. Data Integrators and Data Consumers must work together to agree on the specifics of the input data, such as format and frequency.

HRI-Specific Requirements

The HRI does have the following requirements and recommendations:

  • Batch ID Header - every record must have a header entry with the key batchId containing the batch ID. Data Integrators may include any additional header values, which will get passed downstream to consumers.

  • Zstd Compression - use zstd compression when writing to Kafka by setting the compression.type producer configuration. Event Streams throttles network usage and limits Kafka messages to 1 MB, so using compression helps prevent an Event Streams bottleneck (see the producer sketch after this list).

  • 1 MB Message Limit - Event Streams limits messages to 1 MB. There is no way to directly cap the message size after compression is applied in the Kafka producer; the max.request.size producer configuration is applied before compression. The batch.size producer configuration can be set to limit the batching of records, but it can also affect performance. We recommend performance testing to determine appropriate values for your data. For records over 1 MB compressed, there are two strategies:

    1. External References - for records that have large binary attachments like images or PDFs, you may provide a reference to the resource in the message, rather than the (large) resource itself. For example, you could put a COS Object URL, or some other external data store URL and key, into the message.

    2. Splitting up Records - records can be split into smaller parts, sent through the HRI, and reassembled by downstream consumers.
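
Below is a minimal producer sketch tying these requirements together, assuming the kafka-python client. Broker addresses, security settings (Event Streams also requires SASL/TLS configuration not shown here), and the topic name are placeholders.

```python
# Minimal sketch: write one record with the required batchId header and zstd
# compression. kafka-python is an assumed client choice; connection settings,
# security configuration, and topic names are placeholders.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker:9093",   # placeholder; add SASL/TLS settings for Event Streams
    compression_type="zstd",           # compress to stay under the 1 MB message limit
    batch_size=16384,                  # tune via performance testing for your data
)

def send_record(topic: str, batch_id: str, record_bytes: bytes) -> None:
    producer.send(
        topic,
        value=record_bytes,
        headers=[("batchId", batch_id.encode("utf-8"))],  # required header entry
    )

# ... send all records, then flush before calling 'sendComplete'
producer.flush()
```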

Notification Messages

The notification messages are json-encoded batches. They match the schema returned by the Management API described above, which is also defined here: batchNotification.json.
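
As an illustrative sketch, a consumer that reads these notifications and decodes them as JSON. The client library (kafka-python), connection settings, and the notification topic name are assumptions.

```python
# Minimal sketch: consume batch notification messages and decode the JSON payload.
# kafka-python, the topic name, and connection settings are assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ingest.tenant1.claims.notification",  # placeholder notification topic
    bootstrap_servers="broker:9093",       # placeholder connection settings
    value_deserializer=json.loads,         # each message is a JSON-encoded batch
)

for message in consumer:
    batch = message.value                  # fields match the Batches table above
    print(batch["id"], batch["status"])
```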

Invalid Record Notifications

When validation encounters an invalid record, an invalid record notification is written to the *.invalid topic. It contains a failure message, the batchId, and a pointer to the original record. Below is a table of the fields, and the json schema is defined here: invalidRecord.json.

Field Description
batchId ID of the batch that the original record belongs to
failure the description of why the original record was invalid
topic the topic of the original record
partition the partition of the original record
offset the offset of the original record
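
As an illustration of how the pointer fields can be used, here is a hedged sketch that re-reads the original record an invalid record notification points to. The kafka-python client and connection settings are assumptions.

```python
# Minimal sketch: follow the topic/partition/offset pointer in an invalid record
# notification back to the original Kafka record. kafka-python and the connection
# settings are assumptions.
import json
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="broker:9093")   # placeholder settings

def fetch_original_record(invalid_record_bytes: bytes):
    notice = json.loads(invalid_record_bytes)                # batchId, failure, topic, partition, offset
    tp = TopicPartition(notice["topic"], notice["partition"])
    consumer.assign([tp])
    consumer.seek(tp, notice["offset"])
    # poll() returns {TopicPartition: [ConsumerRecord, ...]}
    for records in consumer.poll(timeout_ms=5000).values():
        for record in records:
            if record.offset == notice["offset"]:
                return record                                # the original record
    return None
```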