HRI API Specification

The HRI consists of two separate APIs: the Management API and Apache Kafka.

Management API Specification

The Management API is defined using the OpenAPI 3.0 specification: management.yml. You can open the file directly or use a program such as IntelliJ or Swagger UI to view it.

HRI Tenants & Elasticsearch Indices

HRI is designed with a multi-tenant cloud architecture. The API provides methods for creating, getting, and deleting tenants. Each of these calls takes a tenantId. The suffix -batches is appended to the ID to form the name of the Elasticsearch index where all batch metadata for that tenant is stored.

A Get call without a tenantId returns a list of all tenants. A Get call with a tenantId returns information about that tenant's Elasticsearch index. Below is a table of the fields returned by this call:

Field	Description
health	health of the Elastic cluster
status	status of the index, can be open or closed
index	the name of the index, which will be the tenantId with `-batches` appended to it
uuid	universally unique identifier
pri	number of primary shards
rep	number of replicas
docs.count	number of batch documents stored in the index
docs.deleted	number of batch documents deleted from the index
store.size	store size taken by primary and replica shards
pri.store.size	store size taken only by primary shards
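
As a small illustration of the index naming rule described above, the index name can be derived from a tenantId as follows. The function name is hypothetical and not part of the HRI.

```python
def batches_index_name(tenant_id: str) -> str:
    """Return the Elasticsearch index name that holds a tenant's batch metadata."""
    if not tenant_id:
        raise ValueError("tenant_id must not be empty")
    # The HRI appends the suffix "-batches" to the tenantId.
    return f"{tenant_id}-batches"


print(batches_index_name("tenant123"))  # prints "tenant123-batches"
```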

Batches

The API contains methods for managing batches like creating, getting, and updating. Below is a table of the fields:

Field	Description
id	auto generated unique ID
name	name of the batch, provided by the Data Integrator
integratorId	unique ID of the Data Integrator that created this batch
topic	Kafka topic that contains the data, provided by the Data Integrator
dataType	the type of data, provided by the Data Integrator
status	status of the batch: [ started, sendCompleted, completed, terminated, failed ]
startDate	the date and time the batch was started
endDate	the date and time the batch was completed, terminated, or failed
expectedRecordCount	the number of records in the batch, provided by the Data Integrator when calling ‘sendComplete’
recordCount (deprecated)	the number of records in the batch, provided by the Data Integrator when calling ‘sendComplete’. Replaced by `expectedRecordCount` and deprecated in v2.0.0
actualRecordCount	the number of records received, calculated by validation processing
invalidThreshold	the number of invalid records allowed in this batch before the batch fails validation, provided by the Data Integrator; defaults to -1 (infinite)
invalidRecordCount	the number of invalid records, calculated by validation processing
metadata	custom JSON value, optional

Only the name, topic, and dataType fields are required when creating a batch.

The invalidThreshold field is used by validation processing: once this many invalid records are encountered, the HRI determines that the entire batch has failed validation. The invalidRecordCount field reports the number of invalid records found.

The expectedRecordCount field is provided by the Data Integrator when calling the ‘sendComplete’ endpoint, so it is not always present. The recordCount field is identical and provides backward compatibility with older versions of HRI. recordCount is deprecated as of release v2.0.0 and will be removed in a later release.

The metadata field is optional and allows the Data Integrator to include any additional information about the batch that Data Consumers might request. This information will be included in all notification messages.

All other fields are generated by the API.
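
The rules above can be sketched as a helper that builds the JSON body for creating a batch. This is a minimal sketch, not the HRI client: the function name, the example topic, and the example metadata are hypothetical, and management.yml remains the authoritative definition of the request.

```python
import json
from typing import Any, Dict, Optional


def build_create_batch_body(
    name: str,
    topic: str,
    data_type: str,
    invalid_threshold: Optional[int] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> str:
    """Build a JSON request body containing only fields the Data Integrator provides."""
    body: Dict[str, Any] = {"name": name, "topic": topic, "dataType": data_type}
    # name, topic, and dataType are the only required fields.
    for field, value in body.items():
        if not value:
            raise ValueError(f"{field} is required")
    # Optional fields are omitted when not set; the API then applies its
    # defaults (for example, invalidThreshold defaults to -1).
    if invalid_threshold is not None:
        body["invalidThreshold"] = invalid_threshold
    if metadata is not None:
        body["metadata"] = metadata
    return json.dumps(body)


# Hypothetical values, for illustration only.
print(build_create_batch_body("batch-1", "example.topic", "claims", metadata={"source": "demo"}))
```

Fields such as id, status, and startDate are left out on purpose, because the API generates them.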

Streams

The API also contains methods for managing Kafka topics like creating, getting, and deleting. Below is a table of the fields:

Field	Description
id	stream ID, consisting of a data integrator and optional qualifier, delimited by ‘.’
numPartitions	the number of partitions on the topic
retentionMs	length of time in milliseconds before log segments are automatically discarded from a partition
retentionBytes	optional maximum size in bytes that a partition can grow before discarding log segments
cleanupPolicy	optional retention policy on old log segments
segmentMs	optional time in milliseconds after which Kafka will force the log to roll even if the segment file isn’t full
segmentBytes	optional log segment file size in bytes
segmentIndexBytes	optional size in bytes of the index that maps offsets to file positions

Only the numPartitions and retentionMs fields are required when creating a stream. The rest of the topic configurations (retentionBytes, cleanupPolicy, segmentMs, segmentBytes, and segmentIndexBytes) are optional. Below is a table of the default values and acceptable ranges for these optional fields:

Field	Default value	Acceptable values/ranges
retentionBytes	1073741824	[10485760..1073741824]
cleanupPolicy	delete	[ delete, compact ]
segmentMs	nil	[300000..2592000000]
segmentBytes	536870912	[10485760..536870912]
segmentIndexBytes	nil	[102400..104857600]

If the cleanupPolicy field is set to compact, it will disable deletion based on time, ignoring the value set for the field retentionMs.
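
To illustrate the table above, here is a minimal sketch of a client-side check for a stream configuration before it is sent to the API. The function name and the error messages are hypothetical, and the sketch assumes the ranges in the table are inclusive at both ends.

```python
from typing import Any, Dict, List

# Ranges copied from the table above (assumed inclusive).
RANGES = {
    "retentionBytes": (10485760, 1073741824),
    "segmentMs": (300000, 2592000000),
    "segmentBytes": (10485760, 536870912),
    "segmentIndexBytes": (102400, 104857600),
}
CLEANUP_POLICIES = ("delete", "compact")
REQUIRED = ("numPartitions", "retentionMs")


def validate_stream_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a stream configuration; empty if none."""
    errors = []
    for field in REQUIRED:
        if field not in config:
            errors.append(f"{field} is required")
    # Optional numeric fields are only checked when present.
    for field, (low, high) in RANGES.items():
        if field in config and not low <= config[field] <= high:
            errors.append(f"{field} must be in [{low}..{high}]")
    if config.get("cleanupPolicy", "delete") not in CLEANUP_POLICIES:
        errors.append("cleanupPolicy must be delete or compact")
    return errors


print(validate_stream_config({"numPartitions": 1, "retentionMs": 3600000}))
print(validate_stream_config({"numPartitions": 1, "segmentBytes": 1}))
```

Checking locally only gives earlier feedback; the API remains the authority on what it accepts.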

Apache Kafka

Apache Kafka has its own API, and clients are available for most languages. If using IBM Event Streams, see their documentation for details on connection parameters. Below are the requirements on the records written to and read from Kafka.

Health Input Data - FHIR Model

HRI does not impose any requirements on the format of the content of the Health (data) records written to Kafka, although Alvearie has selected FHIR as the preferred data model for all Health Data. See their FHIR implementation guide for more details. Data Integrators and Data Consumers must work together to agree on the specifics of the input data such as format and frequency.

HRI-Specific Requirements

The HRI does have the following requirements and recommendations:

Batch ID Header - every record must have a header entry with the batch ID that uses the key batchId. Data Integrators may include any additional header values, which will get passed downstream to consumers.
Zstd Compression - use zstd compression when writing to Kafka by setting the compression.type producer configuration. Event Streams throttles network usage and limits Kafka messages to 1 MB. Using compression will help prevent an Event Streams bottleneck.
1 MB Message Limit - Event Streams limits messages to 1 MB. There is no way to directly set the maximum message size after compression is applied in the Kafka producer. The max.request.size producer configuration is applied before compression. The batch.size producer configuration can be set to limit the batching of records, but it can also affect performance. We recommend doing performance testing to determine appropriate values based on your data. For records over 1 MB compressed, there are two strategies:
1. External References - for records that have large binary attachments like images or PDFs, you may provide a reference to the resource in the message, rather than the (large) resource itself. For example, you could put a COS Object URL, or some other external data store URL, and key into the message.
2. Splitting up Records - records can be split into smaller parts, sent through the HRI, and re-assembled by downstream consumers.
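
The record-splitting strategy above can be sketched as follows. This is one possible approach, not something the HRI prescribes: the recordId, partIndex, and partCount header names and both function names are hypothetical, and only the batchId header and the compression.type setting come from the requirements above. No Kafka client library is used, so the messages are plain (headers, value) pairs.

```python
from typing import Dict, List, Tuple

Headers = Dict[str, bytes]
Message = Tuple[Headers, bytes]

# Producer setting named in the requirements above; it would be passed to
# whichever Kafka client library you use.
PRODUCER_CONFIG = {"compression.type": "zstd"}


def split_record(record_id: str, batch_id: str, payload: bytes, max_part_bytes: int) -> List[Message]:
    """Split one payload into parts, each carrying the required batchId header."""
    if max_part_bytes <= 0:
        raise ValueError("max_part_bytes must be positive")
    parts = [payload[i:i + max_part_bytes] for i in range(0, len(payload), max_part_bytes)] or [b""]
    return [
        (
            {
                "batchId": batch_id.encode(),            # required by the HRI
                "recordId": record_id.encode(),          # hypothetical header
                "partIndex": str(index).encode(),        # hypothetical header
                "partCount": str(len(parts)).encode(),   # hypothetical header
            },
            part,
        )
        for index, part in enumerate(parts)
    ]


def reassemble(messages: List[Message]) -> bytes:
    """Rebuild the original payload from the parts of a single record, in any order."""
    ordered = sorted(messages, key=lambda message: int(message[0]["partIndex"]))
    expected = int(ordered[0][0]["partCount"])
    if len(ordered) != expected:
        raise ValueError(f"expected {expected} parts, got {len(ordered)}")
    return b"".join(value for _, value in ordered)


messages = split_record("rec-1", "batch-1", b"x" * 2500, max_part_bytes=1000)
print(len(messages), reassemble(messages) == b"x" * 2500)
```

The part size here is measured before compression, which is a conservative stand-in for the 1 MB limit on compressed messages; testing with your own data is still needed to pick a value.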

Notification Messages

The notification messages are JSON-encoded batches. They match the batch schema returned by the Management API described above, which is also defined here: batchNotification.json.

Invalid Record Notifications

When validation encounters an invalid record, an invalid record notification is written to the *.invalid topic. It contains a failure message, the batchId, and a pointer to the original record. Below is a table of the fields; the JSON schema is defined here: invalidRecord.json.

Field	Description
batchId	ID of the batch that the original record belongs to
failure	the description of why the original record was invalid
topic	the topic of the original record
partition	the partition of the original record
offset	the offset of the original record
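
A consumer of the *.invalid topic might decode these notifications as in the sketch below. The class and function names are hypothetical, the sample message is made up for illustration, and partition and offset are assumed to be integers; invalidRecord.json is the authoritative schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class InvalidRecordNotification:
    batch_id: str
    failure: str
    topic: str      # topic of the original record
    partition: int  # partition of the original record
    offset: int     # offset of the original record


def parse_invalid_record(raw: bytes) -> InvalidRecordNotification:
    """Decode one JSON-encoded invalid record notification."""
    doc = json.loads(raw)
    return InvalidRecordNotification(
        batch_id=doc["batchId"],
        failure=doc["failure"],
        topic=doc["topic"],
        partition=int(doc["partition"]),
        offset=int(doc["offset"]),
    )


# Made-up sample message, for illustration only.
sample = b'{"batchId": "batch-1", "failure": "missing header", "topic": "example.topic", "partition": 0, "offset": 42}'
notification = parse_invalid_record(sample)
print(notification.topic, notification.partition, notification.offset)
```

The topic, partition, and offset together identify the original record, so a consumer can seek to that position to inspect it.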