Skip to main content

Analytics and Machine Learning with HFQL

Use case

A hospital's quality improvement team needs to build a dashboard that tracks key clinical metrics: average time from admission to first lab result, the distribution of HbA1c values across their diabetic population, and medication adherence rates by department. The clinical data lives in a FHIR server, but their analytics tools — Python notebooks, BI dashboards, and ML pipelines — expect tabular data: rows and columns, not nested JSON documents.

Extracting data from FHIR's JSON format into a flat table is traditionally painful. A single Patient resource has nested arrays for names, identifiers, addresses, and telecoms. An Observation has polymorphic value types (valueQuantity, valueCodeableConcept, valueString). Flattening these into columns requires custom code for every resource type.

HFQL (HAPI FHIR Query Language) solves this by providing a SQL-like query interface that returns results in tabular CSV format — ready for import into pandas, Excel, DBeaver, or any tool that reads tables.

Other scenarios this recipe applies to

  • ML model training: A data science team needs to extract thousands of lab results with timestamps, patient demographics, and diagnosis codes to train a prediction model for sepsis risk. HFQL can produce the training dataset as a flat table.
  • Regulatory reporting: A compliance officer needs to generate a quarterly report of all prescriptions grouped by medication and prescribing practitioner. HFQL's GROUP BY and COUNT operations produce the summary directly.

What you will build

Prerequisites

Enabling HFQL

HFQL is disabled by default because it is an experimental feature (beta). Performance may be lower than standard FHIR REST searches for large datasets, and the SQL syntax has known limitations. To enable it, set the following in your configuration:

application.yaml
fire-arrow:
hfql:
enabled: true

Or set the environment variable:

HFQL_ENABLED=true

After restarting the server, the $hfql-execute operation appears in the CapabilityStatement, and the HFQL link appears in the web UI header.

See the Server Configuration documentation for the full list of configuration options.

How HFQL enforces authorization

A common concern with SQL-like access is whether it bypasses the authorization layer. It does not. HFQL in Fire Arrow Server enforces the same authorization rules as regular FHIR REST searches, through a mechanism called search narrowing.

Why search narrowing is necessary

HFQL works by translating your SQL query into internal DAO (database access object) calls. These internal calls do not pass through the normal FHIR REST interceptor chain, which means the standard SearchNarrowingInterceptor that restricts REST searches would not apply. Without additional measures, an HFQL query could potentially return data outside the user's authorized compartment.

How Fire Arrow solves this

Fire Arrow wraps the HFQL executor in a narrowing decorator that injects compartment-based restrictions into every query before it reaches the database. Here is what happens step by step:

  1. You submit a query: SELECT id, code.coding[0].display FROM Observation WHERE status = 'final'
  2. The decorator identifies the resource type from the FROM clause: Observation.
  3. It evaluates your authorization rules to determine which compartments you can access. For example, if you are a practitioner with LegitimateInterest access, the decorator resolves which patients you have a relationship with.
  4. It appends restriction clauses to the query: ... WHERE status = 'final' AND (subject.reference = 'Patient/123' OR subject.reference = 'Patient/456').
  5. The narrowed query is executed against the database. You only see results for patients you are authorized to access.

This narrowing applies to both initial queries and continuation requests (pagination). When HFQL returns paginated results, the client sends back a continuation token that includes the parsed query. Fire Arrow re-applies the narrowing to each continuation request, preventing a tampered continuation from bypassing restrictions.

info

The authorization enforcement for HFQL uses the same validator pipeline as REST searches. If your rules use LegitimateInterest, PatientCompartment, CareTeam, or any other validator, those same validators determine what data HFQL queries can access.

HFQL syntax overview

HFQL uses a SQL-like syntax adapted for FHIR's data model. The key difference from regular SQL is that column expressions are FHIRPath-like element paths rather than database column names.

Basic query structure

SELECT
column1,
column2 AS alias
FROM
ResourceType
WHERE
condition
ORDER BY
column
LIMIT
count

Path expressions

FHIR resources have nested, repeating elements. HFQL uses bracket notation to access specific items in arrays and dot notation for nested elements:

ExpressionMeaning
idResource ID
name[0].familyFirst name entry's family name
name[0].given[0]First given name of the first name entry
code.coding[0].displayDisplay text of the first coding
value.Quantity.valueValue from a polymorphic value[x] when it is a Quantity
identifier.where(system='http://example.org').valueValue of a specific identifier (FHIRPath where() function)

Operators

CategoryOperators
Comparison=, !=, <, >, <=, >=
Pattern matchingLIKE '%pattern%'
Set membershipIN ('value1', 'value2')
Null checksIS NULL, IS NOT NULL
LogicalAND, OR, NOT

Aggregation

HFQL supports COUNT(*), GROUP BY, and HAVING:

SELECT code.coding[0].code, COUNT(*)
FROM Observation
GROUP BY code.coding[0].code
HAVING COUNT(*) > 10

Using HFQL from the Web UI

The quickest way to explore HFQL is through the built-in web UI. Navigate to the HFQL page (visible in the header when HFQL is enabled). The editor provides:

  • Sample queries — load pre-built examples for patients, observations, and conditions from the dropdown menu.
  • Syntax documentation — click the "Docs" button to see the quick reference alongside the editor.
  • Tabular results — results display in a table view, or switch to raw CSV mode.
  • Keyboard shortcut — press Ctrl+Enter (or Cmd+Enter on macOS) to execute.

See Developer Tools for a screenshot and additional details.

Using HFQL from the command line

HFQL queries are submitted via POST to the $hfql-execute operation with a FHIR Parameters resource. The response is CSV.

Basic patient listing

curl -s -X POST "http://localhost:8080/fhir/\$hfql-execute" \
-H "Content-Type: application/fhir+json" \
-H "Authorization: Bearer <your-token>" \
-d '{
"resourceType": "Parameters",
"parameter": [
{"name": "action", "valueCode": "search"},
{"name": "query", "valueString": "SELECT id, name[0].family, name[0].given[0], birthDate, gender FROM Patient LIMIT 10"}
]
}'

The response is CSV text. The first two rows are protocol metadata (version and search continuation info); the data rows follow.

Observations with values

curl -s -X POST "http://localhost:8080/fhir/\$hfql-execute" \
-H "Content-Type: application/fhir+json" \
-H "Authorization: Bearer <your-token>" \
-d '{
"resourceType": "Parameters",
"parameter": [
{"name": "action", "valueCode": "search"},
{"name": "query", "valueString": "SELECT id, code.coding[0].display, value.Quantity.value, value.Quantity.unit, effectiveDateTime FROM Observation WHERE status = '\''final'\'' LIMIT 20"}
]
}'

Aggregation: count conditions by clinical status

curl -s -X POST "http://localhost:8080/fhir/\$hfql-execute" \
-H "Content-Type: application/fhir+json" \
-H "Authorization: Bearer <your-token>" \
-d '{
"resourceType": "Parameters",
"parameter": [
{"name": "action", "valueCode": "search"},
{"name": "query", "valueString": "SELECT clinicalStatus.coding[0].code, COUNT(*) FROM Condition GROUP BY clinicalStatus.coding[0].code"}
]
}'

Using HFQL from Python

For analytics and machine learning workflows, Python is the most common tool. Here is how to query HFQL and load results directly into a pandas DataFrame.

Querying HFQL and loading into pandas

import requests
import pandas as pd
from io import StringIO

FHIR_BASE = "http://localhost:8080/fhir"
TOKEN = "<your-token>"

def hfql_query(query: str) -> pd.DataFrame:
"""Execute an HFQL query and return results as a DataFrame."""
resp = requests.post(
f"{FHIR_BASE}/$hfql-execute",
json={
"resourceType": "Parameters",
"parameter": [
{"name": "action", "valueCode": "search"},
{"name": "query", "valueString": query},
],
},
headers={
"Content-Type": "application/fhir+json",
"Authorization": f"Bearer {TOKEN}",
},
)
resp.raise_for_status()

# The response is CSV. The first row is the protocol version line,
# the second row is the search continuation metadata.
# Data starts from the third row, where the first column is the
# row offset (integer) followed by the actual data columns.
lines = resp.text.strip().split("\n")
if len(lines) < 3:
return pd.DataFrame()

# Skip the two metadata rows; parse the rest as CSV
data_csv = "\n".join(lines[2:])
df = pd.read_csv(StringIO(data_csv), header=None)

# The first column is the row offset — drop it.
# Extract column names from the query (or assign manually).
df = df.iloc[:, 1:]
return df

Example: Extract lab results for analysis

# Extract HbA1c results with patient demographics
df = hfql_query("""
SELECT
subject.reference,
code.coding[0].display,
value.Quantity.value,
value.Quantity.unit,
effectiveDateTime
FROM Observation
WHERE code.coding[0].code = '4548-4'
""")

df.columns = ["patient_ref", "test_name", "value", "unit", "date"]
df["value"] = pd.to_numeric(df["value"], errors="coerce")
df["date"] = pd.to_datetime(df["date"], errors="coerce")

print(f"Total HbA1c results: {len(df)}")
print(f"Mean HbA1c: {df['value'].mean():.1f}%")
print(f"Patients with HbA1c > 7%: {(df['value'] > 7).sum()}")

Example: Build a training dataset for ML

# Extract a feature matrix for a diabetes risk model
patients = hfql_query("""
SELECT id, birthDate, gender
FROM Patient
WHERE active = true
""")
patients.columns = ["patient_id", "birth_date", "gender"]

conditions = hfql_query("""
SELECT
subject.reference,
code.coding[0].code,
clinicalStatus.coding[0].code
FROM Condition
WHERE clinicalStatus.coding[0].code = 'active'
""")
conditions.columns = ["patient_ref", "condition_code", "status"]

medications = hfql_query("""
SELECT
subject.reference,
medicationCodeableConcept.coding[0].code,
status
FROM MedicationRequest
WHERE status = 'active'
""")
medications.columns = ["patient_ref", "medication_code", "status"]

# Join and pivot into a feature matrix
# (downstream ML pipeline continues from here)
print(f"Patients: {len(patients)}")
print(f"Active conditions: {len(conditions)}")
print(f"Active medications: {len(medications)}")
caution

HFQL is a beta feature. For large datasets (tens of thousands of resources), query performance may be significantly slower than equivalent FHIR REST searches. The HFQL executor loads each candidate resource and evaluates WHERE clauses in memory using FHIRPath, rather than using database indexes for all filter conditions. Plan accordingly for production analytics workloads.

Using HFQL with JDBC (DBeaver)

HFQL also provides a JDBC driver, allowing you to connect tools like DBeaver directly to your FHIR server as if it were a SQL database.

Setup

  1. Download the JDBC driver JAR from the HAPI FHIR releases page. Look for hapi-fhir-jpaserver-hfql-*.jar.
  2. In DBeaver, go to Database → Driver Manager → New.
  3. Configure the driver:
SettingValue
Class Nameca.uhn.fhir.jpa.fql.jdbc.JdbcDriver
URL Templatejdbc:hapifhirql:{host}
  1. Add the downloaded JAR as a library.
  2. Create a new connection using this driver. Set the URL to your Fire Arrow Server's FHIR base URL:
jdbc:hapifhirql:http://localhost:8080/fhir
  1. If authentication is required, set the username and password fields — they will be sent as HTTP Basic Authentication headers.

Once connected, you can browse FHIR resource types as "tables" and write HFQL queries directly in DBeaver's SQL editor.

Practical recipes

Recipe 1: Patient demographics report

SELECT
id,
name[0].family,
name[0].given[0],
birthDate,
gender,
address[0].city,
address[0].state
FROM Patient
WHERE active = true
ORDER BY name[0].family
LIMIT 100

Recipe 2: Lab results summary with aggregation

SELECT
code.coding[0].display,
COUNT(*)
FROM Observation
WHERE category.coding[0].code = 'laboratory'
GROUP BY code.coding[0].display
ORDER BY COUNT(*) DESC
LIMIT 20

Recipe 3: Active medications by patient

SELECT
subject.reference,
medicationCodeableConcept.coding[0].display,
authoredOn,
dosageInstruction[0].text
FROM MedicationRequest
WHERE status = 'active'
ORDER BY subject.reference

Recipe 4: Condition prevalence

SELECT
code.coding[0].code,
code.coding[0].display,
COUNT(*)
FROM Condition
WHERE clinicalStatus.coding[0].code = 'active'
GROUP BY code.coding[0].code, code.coding[0].display
ORDER BY COUNT(*) DESC
LIMIT 25

Recipe 5: Encounter duration analysis

SELECT
id,
class.code,
type[0].coding[0].display,
period.start,
period.end,
subject.reference
FROM Encounter
WHERE status = 'finished'
ORDER BY period.start DESC
LIMIT 50

Limitations and workarounds

HFQL is a beta feature with known limitations. Understanding these helps you decide when to use HFQL versus standard FHIR REST queries.

LimitationImpactWorkaround
WHERE clauses are evaluated in memoryOnly id and meta.lastUpdated are converted to efficient database search parameters. All other WHERE conditions are evaluated as FHIRPath expressions against each candidate resource in memory.Keep WHERE clauses simple and use LIMIT to cap result sets. For performance-critical queries, use FHIR REST search parameters which use database indexes.
No JOINsYou cannot join Patient and Observation in a single query. Each query targets one resource type.Run separate queries and join in your analytics tool (pandas, DBeaver, etc.).
No INSERT/UPDATE/DELETEHFQL is read-only.Use standard FHIR REST operations for writes.
Limited function supportCOUNT(), EXISTS(), COALESCE(), and basic FHIRPath functions are supported. Complex aggregations (AVG, SUM, MEDIAN) are not.Extract raw data with HFQL and compute aggregations in Python/pandas.
Polymorphic value accessAccessing value.Quantity.value works but value[x] syntax does not. You must know the concrete type.Check the resource definition to determine which value[x] type is used for your data.

When to use HFQL vs. FHIR REST

Use HFQL when...Use FHIR REST when...
You need tabular output for analyticsYou need full FHIR JSON resources
Exploratory data analysis (ad-hoc queries)Production application data access
Building training datasets for MLReal-time clinical workflows
Generating simple reportsHigh-performance, high-volume queries
Prototyping data extraction logicQueries that need _include / _revinclude

Cross-references