Analytics and Machine Learning with HFQL

Use case

A hospital's quality improvement team needs to build a dashboard that tracks key clinical metrics: average time from admission to first lab result, the distribution of HbA1c values across their diabetic population, and medication adherence rates by department. The clinical data lives in a FHIR server, but their analytics tools — Python notebooks, BI dashboards, and ML pipelines — expect tabular data: rows and columns, not nested JSON documents.

Extracting data from FHIR's JSON format into a flat table is traditionally painful. A single Patient resource has nested arrays for names, identifiers, addresses, and telecoms. An Observation has polymorphic value types (valueQuantity, valueCodeableConcept, valueString). Flattening these into columns requires custom code for every resource type.

HFQL (HAPI FHIR Query Language) solves this by providing a SQL-like query interface that returns results in tabular CSV format — ready for import into pandas, Excel, DBeaver, or any tool that reads tables.

Other scenarios this recipe applies to

ML model training: A data science team needs to extract thousands of lab results with timestamps, patient demographics, and diagnosis codes to train a prediction model for sepsis risk. HFQL can produce the training dataset as a flat table.
Regulatory reporting: A compliance officer needs to generate a quarterly report of all prescriptions grouped by medication and prescribing practitioner. HFQL's GROUP BY and COUNT operations produce the summary directly.

What you will build

Prerequisites

Fire Arrow Server is running. See Getting Started.
Authentication is enabled and you have a bearer token.
Basic familiarity with SQL syntax.

Enabling HFQL

HFQL is disabled by default because it is an experimental feature (beta). Performance may be lower than standard FHIR REST searches for large datasets, and the SQL syntax has known limitations. To enable it, set the following in your configuration:

application.yaml
fire-arrow:
  hfql:
    enabled: true

Or set the environment variable:

HFQL_ENABLED=true

After restarting the server, the $hfql-execute operation appears in the CapabilityStatement, and the HFQL link appears in the web UI header.

See the Server Configuration documentation for the full list of configuration options.

How HFQL enforces authorization

A common concern with SQL-like access is whether it bypasses the authorization layer. It does not. HFQL in Fire Arrow Server enforces the same authorization rules as regular FHIR REST searches, through a mechanism called search narrowing.

Why search narrowing is necessary

HFQL works by translating your SQL query into internal DAO (database access object) calls. These internal calls do not pass through the normal FHIR REST interceptor chain, which means the standard SearchNarrowingInterceptor that restricts REST searches would not apply. Without additional measures, an HFQL query could potentially return data outside the user's authorized compartment.

How Fire Arrow solves this

Fire Arrow wraps the HFQL executor in a narrowing decorator that injects compartment-based restrictions into every query before it reaches the database. Here is what happens step by step:

You submit a query: SELECT id, code.coding[0].display FROM Observation WHERE status = 'final'
The decorator identifies the resource type from the FROM clause: Observation.
It evaluates your authorization rules to determine which compartments you can access. For example, if you are a practitioner with LegitimateInterest access, the decorator resolves which patients you have a relationship with.
It appends restriction clauses to the query: ... WHERE status = 'final' AND (subject.reference = 'Patient/123' OR subject.reference = 'Patient/456').
The narrowed query is executed against the database. You only see results for patients you are authorized to access.

This narrowing applies to both initial queries and continuation requests (pagination). When HFQL returns paginated results, the client sends back a continuation token that includes the parsed query. Fire Arrow re-applies the narrowing to each continuation request, preventing a tampered continuation from bypassing restrictions.

info

The authorization enforcement for HFQL uses the same validator pipeline as REST searches. If your rules use LegitimateInterest, PatientCompartment, CareTeam, or any other validator, those same validators determine what data HFQL queries can access.

HFQL syntax overview

HFQL uses a SQL-like syntax adapted for FHIR's data model. The key difference from regular SQL is that column expressions are FHIRPath-like element paths rather than database column names.

Basic query structure

SELECT
    column1,
    column2 AS alias
FROM
    ResourceType
WHERE
    condition
ORDER BY
    column
LIMIT
    count

Path expressions

FHIR resources have nested, repeating elements. HFQL uses bracket notation to access specific items in arrays and dot notation for nested elements:

Expression	Meaning
`id`	Resource ID
`name[0].family`	First name entry's family name
`name[0].given[0]`	First given name of the first name entry
`code.coding[0].display`	Display text of the first coding
`value.Quantity.value`	Value from a polymorphic `value[x]` when it is a Quantity
`identifier.where(system='http://example.org').value`	Value of a specific identifier (FHIRPath `where()` function)

Operators

Category	Operators
Comparison	`=`, `!=`, `<`, `>`, `<=`, `>=`
Pattern matching	`LIKE '%pattern%'`
Set membership	`IN ('value1', 'value2')`
Null checks	`IS NULL`, `IS NOT NULL`
Logical	`AND`, `OR`, `NOT`

Aggregation

HFQL supports COUNT(*), GROUP BY, and HAVING:

SELECT code.coding[0].code, COUNT(*)
FROM Observation
GROUP BY code.coding[0].code
HAVING COUNT(*) > 10

Using HFQL from the Web UI

The quickest way to explore HFQL is through the built-in web UI. Navigate to the HFQL page (visible in the header when HFQL is enabled). The editor provides:

Sample queries — load pre-built examples for patients, observations, and conditions from the dropdown menu.
Syntax documentation — click the "Docs" button to see the quick reference alongside the editor.
Tabular results — results display in a table view, or switch to raw CSV mode.
Keyboard shortcut — press Ctrl+Enter (or Cmd+Enter on macOS) to execute.

See Developer Tools for a screenshot and additional details.

Using HFQL from the command line

HFQL queries are submitted via POST to the $hfql-execute operation with a FHIR Parameters resource. The response is CSV.

Basic patient listing

curl -s -X POST "http://localhost:8080/fhir/\$hfql-execute" \
  -H "Content-Type: application/fhir+json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "resourceType": "Parameters",
    "parameter": [
      {"name": "action", "valueCode": "search"},
      {"name": "query", "valueString": "SELECT id, name[0].family, name[0].given[0], birthDate, gender FROM Patient LIMIT 10"}
    ]
  }'

The response is CSV text. The first two rows are protocol metadata (version and search continuation info); the data rows follow.

Observations with values

curl -s -X POST "http://localhost:8080/fhir/\$hfql-execute" \
  -H "Content-Type: application/fhir+json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "resourceType": "Parameters",
    "parameter": [
      {"name": "action", "valueCode": "search"},
      {"name": "query", "valueString": "SELECT id, code.coding[0].display, value.Quantity.value, value.Quantity.unit, effectiveDateTime FROM Observation WHERE status = '\''final'\'' LIMIT 20"}
    ]
  }'

Aggregation: count conditions by clinical status

curl -s -X POST "http://localhost:8080/fhir/\$hfql-execute" \
  -H "Content-Type: application/fhir+json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "resourceType": "Parameters",
    "parameter": [
      {"name": "action", "valueCode": "search"},
      {"name": "query", "valueString": "SELECT clinicalStatus.coding[0].code, COUNT(*) FROM Condition GROUP BY clinicalStatus.coding[0].code"}
    ]
  }'

Using HFQL from Python

For analytics and machine learning workflows, Python is the most common tool. Here is how to query HFQL and load results directly into a pandas DataFrame.

Querying HFQL and loading into pandas

import requests
import pandas as pd
from io import StringIO

FHIR_BASE = "http://localhost:8080/fhir"
TOKEN = "<your-token>"

def hfql_query(query: str) -> pd.DataFrame:
    """Execute an HFQL query and return results as a DataFrame."""
    resp = requests.post(
        f"{FHIR_BASE}/$hfql-execute",
        json={
            "resourceType": "Parameters",
            "parameter": [
                {"name": "action", "valueCode": "search"},
                {"name": "query", "valueString": query},
            ],
        },
        headers={
            "Content-Type": "application/fhir+json",
            "Authorization": f"Bearer {TOKEN}",
        },
    )
    resp.raise_for_status()

    # The response is CSV. The first row is the protocol version line,
    # the second row is the search continuation metadata.
    # Data starts from the third row, where the first column is the
    # row offset (integer) followed by the actual data columns.
    lines = resp.text.strip().split("\n")
    if len(lines) < 3:
        return pd.DataFrame()

    # Skip the two metadata rows; parse the rest as CSV
    data_csv = "\n".join(lines[2:])
    df = pd.read_csv(StringIO(data_csv), header=None)

    # The first column is the row offset — drop it.
    # Extract column names from the query (or assign manually).
    df = df.iloc[:, 1:]
    return df

Example: Extract lab results for analysis

# Extract HbA1c results with patient demographics
df = hfql_query("""
    SELECT
        subject.reference,
        code.coding[0].display,
        value.Quantity.value,
        value.Quantity.unit,
        effectiveDateTime
    FROM Observation
    WHERE code.coding[0].code = '4548-4'
""")

df.columns = ["patient_ref", "test_name", "value", "unit", "date"]
df["value"] = pd.to_numeric(df["value"], errors="coerce")
df["date"] = pd.to_datetime(df["date"], errors="coerce")

print(f"Total HbA1c results: {len(df)}")
print(f"Mean HbA1c: {df['value'].mean():.1f}%")
print(f"Patients with HbA1c > 7%: {(df['value'] > 7).sum()}")

Example: Build a training dataset for ML

# Extract a feature matrix for a diabetes risk model
patients = hfql_query("""
    SELECT id, birthDate, gender
    FROM Patient
    WHERE active = true
""")
patients.columns = ["patient_id", "birth_date", "gender"]

conditions = hfql_query("""
    SELECT
        subject.reference,
        code.coding[0].code,
        clinicalStatus.coding[0].code
    FROM Condition
    WHERE clinicalStatus.coding[0].code = 'active'
""")
conditions.columns = ["patient_ref", "condition_code", "status"]

medications = hfql_query("""
    SELECT
        subject.reference,
        medicationCodeableConcept.coding[0].code,
        status
    FROM MedicationRequest
    WHERE status = 'active'
""")
medications.columns = ["patient_ref", "medication_code", "status"]

# Join and pivot into a feature matrix
# (downstream ML pipeline continues from here)
print(f"Patients: {len(patients)}")
print(f"Active conditions: {len(conditions)}")
print(f"Active medications: {len(medications)}")

caution

HFQL is a beta feature. For large datasets (tens of thousands of resources), query performance may be significantly slower than equivalent FHIR REST searches. The HFQL executor loads each candidate resource and evaluates WHERE clauses in memory using FHIRPath, rather than using database indexes for all filter conditions. Plan accordingly for production analytics workloads.

Using HFQL with JDBC (DBeaver)

HFQL also provides a JDBC driver, allowing you to connect tools like DBeaver directly to your FHIR server as if it were a SQL database.

Setup

Download the JDBC driver JAR from the HAPI FHIR releases page. Look for hapi-fhir-jpaserver-hfql-*.jar.
In DBeaver, go to Database → Driver Manager → New.
Configure the driver:

Setting	Value
Class Name	`ca.uhn.fhir.jpa.fql.jdbc.JdbcDriver`
URL Template	`jdbc:hapifhirql:{host}`

Add the downloaded JAR as a library.
Create a new connection using this driver. Set the URL to your Fire Arrow Server's FHIR base URL:

jdbc:hapifhirql:http://localhost:8080/fhir

If authentication is required, set the username and password fields — they will be sent as HTTP Basic Authentication headers.

Once connected, you can browse FHIR resource types as "tables" and write HFQL queries directly in DBeaver's SQL editor.

Practical recipes

Recipe 1: Patient demographics report

SELECT
    id,
    name[0].family,
    name[0].given[0],
    birthDate,
    gender,
    address[0].city,
    address[0].state
FROM Patient
WHERE active = true
ORDER BY name[0].family
LIMIT 100

Recipe 2: Lab results summary with aggregation

SELECT
    code.coding[0].display,
    COUNT(*)
FROM Observation
WHERE category.coding[0].code = 'laboratory'
GROUP BY code.coding[0].display
ORDER BY COUNT(*) DESC
LIMIT 20

Recipe 3: Active medications by patient

SELECT
    subject.reference,
    medicationCodeableConcept.coding[0].display,
    authoredOn,
    dosageInstruction[0].text
FROM MedicationRequest
WHERE status = 'active'
ORDER BY subject.reference

Recipe 4: Condition prevalence

SELECT
    code.coding[0].code,
    code.coding[0].display,
    COUNT(*)
FROM Condition
WHERE clinicalStatus.coding[0].code = 'active'
GROUP BY code.coding[0].code, code.coding[0].display
ORDER BY COUNT(*) DESC
LIMIT 25

Recipe 5: Encounter duration analysis

SELECT
    id,
    class.code,
    type[0].coding[0].display,
    period.start,
    period.end,
    subject.reference
FROM Encounter
WHERE status = 'finished'
ORDER BY period.start DESC
LIMIT 50

Limitations and workarounds

HFQL is a beta feature with known limitations. Understanding these helps you decide when to use HFQL versus standard FHIR REST queries.

Limitation	Impact	Workaround
WHERE clauses are evaluated in memory	Only `id` and `meta.lastUpdated` are converted to efficient database search parameters. All other WHERE conditions are evaluated as FHIRPath expressions against each candidate resource in memory.	Keep WHERE clauses simple and use `LIMIT` to cap result sets. For performance-critical queries, use FHIR REST search parameters which use database indexes.
No JOINs	You cannot join `Patient` and `Observation` in a single query. Each query targets one resource type.	Run separate queries and join in your analytics tool (pandas, DBeaver, etc.).
No INSERT/UPDATE/DELETE	HFQL is read-only.	Use standard FHIR REST operations for writes.
Limited function support	`COUNT()`, `EXISTS()`, `COALESCE()`, and basic FHIRPath functions are supported. Complex aggregations (AVG, SUM, MEDIAN) are not.	Extract raw data with HFQL and compute aggregations in Python/pandas.
Polymorphic value access	Accessing `value.Quantity.value` works but `value[x]` syntax does not. You must know the concrete type.	Check the resource definition to determine which `value[x]` type is used for your data.

When to use HFQL vs. FHIR REST

Use HFQL when...	Use FHIR REST when...
You need tabular output for analytics	You need full FHIR JSON resources
Exploratory data analysis (ad-hoc queries)	Production application data access
Building training datasets for ML	Real-time clinical workflows
Generating simple reports	High-performance, high-volume queries
Prototyping data extraction logic	Queries that need `_include` / `_revinclude`

Cross-references

Developer Tools — HFQL Query Tool — using the built-in web UI editor
Authorization Concepts — how rules apply to HFQL queries
LegitimateInterest Validator — compartment-based access enforcement
Custom Operations — the $hfql-execute operation reference
Server Configuration — enabling and configuring HFQL

Use case​

Other scenarios this recipe applies to​

What you will build​

Prerequisites​

Enabling HFQL​

How HFQL enforces authorization​

Why search narrowing is necessary​

How Fire Arrow solves this​

HFQL syntax overview​

Basic query structure​

Path expressions​

Operators​

Aggregation​

Using HFQL from the Web UI​

Using HFQL from the command line​

Basic patient listing​

Observations with values​

Aggregation: count conditions by clinical status​

Using HFQL from Python​

Querying HFQL and loading into pandas​

Example: Extract lab results for analysis​

Example: Build a training dataset for ML​

Using HFQL with JDBC (DBeaver)​

Setup​

Practical recipes​

Recipe 1: Patient demographics report​

Recipe 2: Lab results summary with aggregation​

Recipe 3: Active medications by patient​

Recipe 4: Condition prevalence​

Recipe 5: Encounter duration analysis​

Limitations and workarounds​

When to use HFQL vs. FHIR REST​

Cross-references​