Generating Synthetic Test Data with Synthea

Use case

A development team is building a patient-facing mobile app on top of Fire Arrow Server. Before connecting real clinical data, they need a realistic test environment with hundreds of patients, each having a plausible medical history: conditions, medications, lab results, encounters, care plans, and immunizations. Creating this data by hand would take weeks, and using real patient data in a development environment typically violates privacy and compliance requirements.

Synthea is an open-source synthetic patient generator that produces complete, clinically realistic FHIR patient records. It simulates patient lifecycles from birth through death, including disease progression, medication prescriptions, lab values, and care encounters — all as valid FHIR R4 resources. The data is entirely synthetic: it does not contain or derive from real patient information, making it safe to use in any environment.

Other scenarios this recipe applies to

  • Demo environments: A sales team needs a Fire Arrow Server instance loaded with realistic data to demonstrate the platform's capabilities — dashboards, authorization rules, CarePlan materialization — all with plausible clinical content.
  • Integration testing: A CI/CD pipeline needs to seed a fresh Fire Arrow Server with a known patient population before running automated tests against the API.
  • Training and education: A medical informatics course needs students to explore FHIR resources on a server with rich, diverse clinical data without any privacy concerns.

What you will build

Prerequisites

  • Fire Arrow Server is running. See Getting Started.
  • Java JDK 17 or newer is installed (required by Synthea).
  • You have administrative access or the bundle-uploader feature enabled in the Web UI.

Understanding Synthea's output

What Synthea generates

Synthea simulates complete patient lifecycles. For each synthetic patient, it produces FHIR resources covering:

Category | FHIR resources
Demographics | Patient (name, gender, birth date, address, identifiers)
Encounters | Encounter (primary care visits, ER visits, specialist referrals)
Conditions | Condition (diagnoses, onset, resolution)
Medications | MedicationRequest (prescriptions, dosage, duration)
Procedures | Procedure (surgeries, imaging, therapeutic procedures)
Observations | Observation (vital signs, lab results, social history)
Immunizations | Immunization (vaccines administered)
Care plans | CarePlan (treatment goals and activities)
Allergies | AllergyIntolerance (drug and food allergies)
Claims | Claim, ExplanationOfBenefit (insurance and billing data)

The clinical content is based on published epidemiological data and clinical guidelines, so the resulting patient populations have realistic disease prevalence, medication patterns, and lab value distributions.

Output file structure

When configured for transaction bundles, Synthea produces three types of files in output/fhir/:

File | Contents | Purpose
hospitalInformation*.json | Organization and Location resources | Defines the hospitals and clinics where care takes place.
practitionerInformation*.json | Practitioner resources | Defines the healthcare providers who deliver care.
Individual patient files (e.g., Aaron_Baumbach_*.json) | All clinical resources for one patient | The complete medical record for a single synthetic patient.

Why import order matters

When transaction bundles are enabled, Synthea deliberately excludes Practitioner, Organization, and Location resources from individual patient bundles to avoid creating duplicates. Instead, patient bundles reference these resources using FHIR conditional references (query-based references), for example:

Practitioner?identifier=http://hl7.org/fhir/sid/us-npi|999999647

When Fire Arrow Server processes the transaction bundle, it resolves this reference by searching for a Practitioner with that identifier. If the Practitioner does not exist yet, the reference cannot be resolved and the transaction fails.

This means you must upload the files in this order:

  1. Hospital information (Organizations and Locations) — first
  2. Practitioner information — second
  3. Patient bundles — last

The hospital and practitioner bundles use ifNoneExist preconditions, so re-uploading them is safe — existing resources will not be duplicated.
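
To see which resources a patient bundle expects to exist already, you can list its conditional references before uploading. The following is a minimal sketch, assuming the bundle has been loaded into a Python dict. The helper name find_conditional_references and the trimmed sample bundle are illustrative and are not part of Synthea or Fire Arrow Server.

```python
from typing import Any


def find_conditional_references(node: Any) -> set[str]:
    """Collect every 'reference' value that is a search query (contains '?')."""
    found: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str) and "?" in value:
                found.add(value)
            else:
                found |= find_conditional_references(value)
    elif isinstance(node, list):
        for item in node:
            found |= find_conditional_references(item)
    return found


# Hypothetical, heavily trimmed patient bundle for illustration only.
sample_bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {
            "resource": {
                "resourceType": "Encounter",
                "subject": {"reference": "urn:uuid:1234"},
                "participant": [
                    {
                        "individual": {
                            "reference": "Practitioner?identifier=http://hl7.org/fhir/sid/us-npi|999999647"
                        }
                    }
                ],
            }
        }
    ],
}

for ref in sorted(find_conditional_references(sample_bundle)):
    print(ref)
```

Every reference printed this way must be resolvable on the server before the patient bundle is posted, which is why the hospital and practitioner bundles go first.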

Why transaction bundles?

Synthea can produce two bundle types: collection bundles (the default) and transaction bundles. For importing into Fire Arrow Server, transaction bundles are strongly recommended:

Aspect | Collection bundles | Transaction bundles
Atomicity | None: each resource is processed independently | All-or-nothing: if any resource fails, the entire bundle is rolled back
Reference resolution | References may break if a dependent resource fails | All references within the bundle are guaranteed to resolve
Duplicate handling | No built-in protection | ifNoneExist prevents duplicate Organizations, Locations, and Practitioners
Performance | Multiple database transactions | Single database transaction per bundle, which is significantly faster
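
Before importing, it can help to confirm that a generated file really is a transaction bundle and to see how many entries carry an ifNoneExist precondition. This is a minimal sketch that assumes the standard FHIR R4 Bundle layout (type, entry[].request.ifNoneExist). The function describe_bundle, the sample resources, and the example.org identifier are hypothetical.

```python
def describe_bundle(bundle: dict) -> dict:
    """Return the bundle type and how many entries use a conditional create."""
    entries = bundle.get("entry", [])
    conditional = sum(
        1 for entry in entries if entry.get("request", {}).get("ifNoneExist")
    )
    return {
        "type": bundle.get("type"),
        "entries": len(entries),
        "conditional_creates": conditional,
    }


# Hypothetical hospital bundle with one conditional and one plain create.
sample_hospital_bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {
            "resource": {"resourceType": "Organization", "name": "Example Hospital"},
            "request": {
                "method": "POST",
                "url": "Organization",
                "ifNoneExist": "identifier=http://example.org/ids|org-1",
            },
        },
        {
            "resource": {"resourceType": "Location", "name": "Example Hospital"},
            "request": {"method": "POST", "url": "Location"},
        },
    ],
}

print(describe_bundle(sample_hospital_bundle))
```

A file whose type is reported as "collection" was generated without the transaction bundle setting and will not give you the atomicity described in the table.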

Step 1: Install Synthea

Clone the Synthea repository and build it:

git clone https://github.com/synthetichealth/synthea.git
cd synthea
./gradlew build check test

Alternatively, download a pre-built release JAR from the Synthea releases page.

Step 2: Configure Synthea for transaction bundles

Create a configuration file that enables FHIR R4 transaction bundles along with the separate hospital and practitioner exports:

fire-arrow-config.properties
# Enable FHIR R4 export (enabled by default, but be explicit)
exporter.fhir.export = true

# Use transaction bundles instead of collection bundles
exporter.fhir.transaction_bundle = true

# Export hospital (Organization + Location) and Practitioner bundles separately
exporter.hospital.fhir.export = true
exporter.practitioner.fhir.export = true

# Include full patient history (0 = no cutoff)
exporter.years_of_history = 0

Optional: Adjusting the population

You can customize additional settings depending on your needs:

fire-arrow-config.properties (continued)
# Generate 100 living patients (default is 1)
generate.default_population = 100
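
A forgotten setting only shows up later as a failed import, so a quick check of the configuration file can save a generation run. The sketch below assumes the file uses plain key = value lines as shown above. The parser is deliberately simplified (it does not handle the full Java properties syntax, such as line continuations), and the names parse_properties and missing_settings are illustrative.

```python
# Settings this recipe relies on, taken from the configuration above.
REQUIRED = {
    "exporter.fhir.export": "true",
    "exporter.fhir.transaction_bundle": "true",
    "exporter.hospital.fhir.export": "true",
    "exporter.practitioner.fhir.export": "true",
}


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_settings(props: dict[str, str]) -> list[str]:
    """List required keys that are absent or set to a different value."""
    return [key for key, value in REQUIRED.items() if props.get(key) != value]


# Example input with the practitioner export left out on purpose.
config_text = """
# Use transaction bundles instead of collection bundles
exporter.fhir.export = true
exporter.fhir.transaction_bundle = true
exporter.hospital.fhir.export = true
"""

print(missing_settings(parse_properties(config_text)))
```

To check a real file, read it with open("fire-arrow-config.properties").read() and pass the text to parse_properties.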

Step 3: Generate the synthetic population

Run Synthea with your configuration file. You can optionally specify a geographic region:

# Generate 100 patients using the config file
./run_synthea -c fire-arrow-config.properties -p 100

# Or generate patients for a specific US state and city
./run_synthea -c fire-arrow-config.properties -p 100 Massachusetts Boston

Synthea will output progress as it generates each patient. When complete, the output/fhir/ directory will contain:

output/fhir/
├── hospitalInformation1713812345678.json ← Organizations + Locations
├── practitionerInformation1713812345678.json ← Practitioners
├── Aaron_Baumbach_a1b2c3d4-....json ← Patient bundle
├── Abby_Connelly_e5f6g7h8-....json ← Patient bundle
├── ... (one file per patient)
└── Zoe_Williams_i9j0k1l2-....json ← Patient bundle
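
The listing above mixes three kinds of files in one directory, and the import order depends on telling them apart. As a minimal sketch, assuming the file name prefixes shown in the listing, the files can be sorted into upload order like this. The function names and the example file names are hypothetical.

```python
from pathlib import Path


def import_rank(path: Path) -> int:
    """0 = hospital bundle, 1 = practitioner bundle, 2 = patient bundle."""
    if path.name.startswith("hospitalInformation"):
        return 0
    if path.name.startswith("practitionerInformation"):
        return 1
    return 2


def order_for_import(paths: list[Path]) -> list[Path]:
    """Sort files so hospital and practitioner bundles precede patient bundles."""
    return sorted(paths, key=lambda p: (import_rank(p), p.name))


# Illustrative file names only; real names carry timestamps and UUIDs.
files = [
    Path("output/fhir/Zoe_Williams_example.json"),
    Path("output/fhir/practitionerInformation123.json"),
    Path("output/fhir/Aaron_Baumbach_example.json"),
    Path("output/fhir/hospitalInformation123.json"),
]

for path in order_for_import(files):
    print(path.name)
```

Against a real run, order_for_import(list(Path("output/fhir").glob("*.json"))) gives the sequence to upload.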

You can verify the output with a quick check:

# Count the generated files
ls output/fhir/*.json | wc -l

# Inspect a patient bundle's resource types
cat output/fhir/Aaron_Baumbach_*.json | python3 -c "
import json, sys, collections
bundle = json.load(sys.stdin)
types = collections.Counter(e['resource']['resourceType'] for e in bundle.get('entry', []))
for t, c in types.most_common():
    print(f'  {t}: {c}')
"

Step 4: Import into Fire Arrow Server

Option A: Using the Bundle Uploader (Web UI)

The Bundle Uploader provides a visual interface for importing bundles with progress tracking and error reporting.

  1. Open the Fire Arrow Web UI and navigate to Tools > Bundle Uploader in the sidebar.
  2. Upload the hospital information bundle first. Drag and drop the hospitalInformation*.json file onto the upload area, or click to browse. Click Start Upload and wait for it to complete successfully.
  3. Upload the practitioner information bundle second. Repeat the process with the practitionerInformation*.json file.
  4. Upload patient bundles last. You can select multiple patient JSON files at once. The uploader processes them sequentially to avoid overwhelming the server.

Each bundle shows its processing result. For transaction bundles, you will see either all entries succeeding (201 Created) or the entire bundle failing with an error message. If a patient bundle fails with reference resolution errors, verify that the hospital and practitioner bundles were uploaded first.
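
When you post a bundle through the API instead, the same per-entry results come back as a transaction-response Bundle. The sketch below tallies the status strings, assuming the standard FHIR R4 response layout (entry[].response.status). The function name summarize_response and the sample response are hypothetical.

```python
from collections import Counter


def summarize_response(response_bundle: dict) -> Counter:
    """Count entries by the status string in entry.response.status."""
    return Counter(
        entry.get("response", {}).get("status", "unknown")
        for entry in response_bundle.get("entry", [])
    )


# Hypothetical transaction-response for illustration only.
sample_response = {
    "resourceType": "Bundle",
    "type": "transaction-response",
    "entry": [
        {"response": {"status": "201 Created", "location": "Patient/1/_history/1"}},
        {"response": {"status": "201 Created", "location": "Encounter/2/_history/1"}},
        {"response": {"status": "200 OK"}},
    ],
}

for status, count in summarize_response(sample_response).most_common():
    print(f"{status}: {count}")
```

In FHIR, a conditional create that matches an existing resource is reported as 200 OK rather than 201 Created, so a mix of the two is expected when hospital or practitioner bundles are re-uploaded.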

Option B: Using curl (for scripting and automation)

For larger datasets or CI/CD pipelines, script the import with curl:

FHIR_BASE="http://localhost:8080/fhir"
TOKEN="<your-admin-token>"

# Step 1: Upload hospital information (Organizations + Locations)
# Loop over the glob: written as "@output/...*.json", the "@" becomes part of
# the pattern, the wildcard matches nothing, and curl cannot find the file.
# --data-binary sends the file unchanged (-d would strip newlines).
echo "Uploading hospital information..."
for file in output/fhir/hospitalInformation*.json; do
  curl -s -X POST "$FHIR_BASE" \
    -H "Content-Type: application/fhir+json" \
    -H "Authorization: Bearer $TOKEN" \
    --data-binary @"$file"
done

# Step 2: Upload practitioner information
echo "Uploading practitioner information..."
for file in output/fhir/practitionerInformation*.json; do
  curl -s -X POST "$FHIR_BASE" \
    -H "Content-Type: application/fhir+json" \
    -H "Authorization: Bearer $TOKEN" \
    --data-binary @"$file"
done

# Step 3: Upload each patient bundle
echo "Uploading patient bundles..."
count=0
for file in output/fhir/*.json; do
  # Skip the hospital and practitioner files
  case "$file" in
    *hospitalInformation*|*practitionerInformation*) continue ;;
  esac

  echo "  Uploading: $(basename "$file")"
  # --fail makes curl exit non-zero on HTTP errors so failures are not counted
  if curl -s --fail -X POST "$FHIR_BASE" \
    -H "Content-Type: application/fhir+json" \
    -H "Authorization: Bearer $TOKEN" \
    --data-binary @"$file" > /dev/null; then
    count=$((count + 1))
  else
    echo "  FAILED: $(basename "$file")"
  fi
done

echo "Done. Uploaded $count patient bundles."

Option C: Using Python (for progress tracking and error handling)

import glob
import json
import sys

import requests

FHIR_BASE = "http://localhost:8080/fhir"
TOKEN = "<your-admin-token>"
HEADERS = {
    "Content-Type": "application/fhir+json",
    "Authorization": f"Bearer {TOKEN}",
}


def upload_bundle(filepath: str) -> bool:
    """POST one transaction bundle; return True if the server accepted it."""
    with open(filepath, encoding="utf-8") as f:
        bundle = json.load(f)

    # The timeout value is an arbitrary, generous choice for large bundles
    resp = requests.post(FHIR_BASE, json=bundle, headers=HEADERS, timeout=300)
    if resp.status_code >= 400:
        print(f"  FAILED ({resp.status_code}): {filepath}")
        # The start of the response body usually explains the failure
        print(f"    {resp.text[:500]}")
        return False
    return True


# Step 1: Hospital information (patient bundles reference these resources)
print("Uploading hospital information...")
for f in glob.glob("output/fhir/hospitalInformation*.json"):
    if not upload_bundle(f):
        sys.exit("Hospital bundle failed; stopping before patient bundles.")

# Step 2: Practitioner information
print("Uploading practitioner information...")
for f in glob.glob("output/fhir/practitionerInformation*.json"):
    if not upload_bundle(f):
        sys.exit("Practitioner bundle failed; stopping before patient bundles.")

# Step 3: Patient bundles (everything that is not a hospital or practitioner file)
patient_files = sorted(
    f for f in glob.glob("output/fhir/*.json")
    if "hospitalInformation" not in f
    and "practitionerInformation" not in f
)

print(f"Uploading {len(patient_files)} patient bundles...")
success = 0
for i, f in enumerate(patient_files, 1):
    if upload_bundle(f):
        success += 1
    if i % 10 == 0:
        print(f"  Progress: {i}/{len(patient_files)}")

print(f"Done. {success}/{len(patient_files)} patients uploaded successfully.")

What to expect after import

After importing a population of 100 patients, your Fire Arrow Server will typically contain:

Resource type | Approximate count | Description
Patient | 100 | One per synthetic person
Encounter | 2,000–5,000 | Multiple visits per patient over their lifetime
Condition | 500–1,500 | Diagnoses across the population
Observation | 10,000–30,000 | Vital signs, lab results, social history
MedicationRequest | 500–2,000 | Prescriptions
Procedure | 500–2,000 | Surgeries and therapeutic procedures
Immunization | 500–1,500 | Vaccines
CarePlan | 200–800 | Active and completed care plans
Organization | 5–20 | Hospitals and clinics
Practitioner | 20–50 | Healthcare providers

You can verify the import by opening the Patient Dashboard in the Web UI, which should now show all the imported patients with their clinical data.
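
The counts can also be checked from a script. The sketch below uses the standard FHIR search parameter _summary=count, which returns a Bundle whose total field holds the number of matches. Whether Fire Arrow Server supports this parameter for every resource type is an assumption here, and the helper names fetch_json and count_resources are illustrative. The demonstration at the end runs offline against a stand-in for the server.

```python
import json
import urllib.request
from typing import Callable


def fetch_json(url: str, token: str) -> dict:
    """GET a URL with a bearer token and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/fhir+json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def count_resources(
    base: str, types: list[str], fetch: Callable[[str], dict]
) -> dict[str, int]:
    """Ask the server for the total of each resource type via _summary=count."""
    return {
        resource_type: fetch(f"{base}/{resource_type}?_summary=count").get("total", 0)
        for resource_type in types
    }


# Offline demonstration: a fake fetch function stands in for the server.
def fake_fetch(url: str) -> dict:
    return {"resourceType": "Bundle", "type": "searchset", "total": 100}


print(count_resources("http://localhost:8080/fhir", ["Patient"], fake_fetch))
```

Against a live server, pass lambda url: fetch_json(url, TOKEN) as the fetch argument and compare the results with the table above.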

Tips and troubleshooting

Generating larger populations

For populations larger than a few hundred patients, consider:

  • Increase JVM memory: ./run_synthea may need more heap space for very large runs. Export _JAVA_OPTIONS="-Xmx4g" as an environment variable in your shell before running.
  • Use the Python upload script with error handling rather than the Web UI, which is designed for interactive use.
  • Split into batches: Generate in batches of 500–1,000 patients if you encounter memory issues.
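
The batch-splitting suggestion can be sketched as a small planning helper. It only computes batch sizes and builds the corresponding command lines from the -c and -p options used earlier in this recipe; it does not run Synthea. The function names are hypothetical, and how to keep the output of successive runs apart is left open.

```python
def plan_batches(total: int, batch_size: int) -> list[int]:
    """Split a target population into batch sizes no larger than batch_size."""
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")
    full, rest = divmod(total, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def batch_commands(total: int, batch_size: int, config: str) -> list[str]:
    """Build one run_synthea command line per batch."""
    return [
        f"./run_synthea -c {config} -p {size}"
        for size in plan_batches(total, batch_size)
    ]


for command in batch_commands(2500, 1000, "fire-arrow-config.properties"):
    print(command)
```

For 2,500 patients in batches of 1,000 this yields three runs: two of 1,000 and one of 500.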

Common import errors

Error | Cause | Fix
Could not resolve reference: Practitioner?identifier=... | Patient bundle uploaded before the practitioner bundle | Upload practitionerInformation*.json first, then retry
Could not resolve reference: Organization?identifier=... | Patient bundle uploaded before the hospital bundle | Upload hospitalInformation*.json first, then retry
HTTP 413 Request Entity Too Large | A patient bundle exceeds the server's request size limit | Increase server.max-http-post-size in application.yaml or adjust your reverse proxy settings
HTTP 403 Forbidden | The authentication token does not have permission to create resources | Use an admin token or ensure your role has write access to all the resource types Synthea produces
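
When an upload is scripted, the error text in the table usually arrives inside a FHIR OperationOutcome in the response body. The sketch below pulls the readable parts out of such a resource, assuming the standard R4 layout (issue[].severity, issue[].code, issue[].diagnostics). That Fire Arrow Server reports failures this way is an assumption, and extract_issues and the sample outcome are hypothetical.

```python
def extract_issues(outcome: dict) -> list[str]:
    """Format each OperationOutcome issue as 'severity/code: text'."""
    if outcome.get("resourceType") != "OperationOutcome":
        return []
    lines = []
    for issue in outcome.get("issue", []):
        severity = issue.get("severity", "?")
        code = issue.get("code", "?")
        # Prefer diagnostics; fall back to the coded details text if present.
        text = issue.get("diagnostics") or issue.get("details", {}).get("text", "")
        lines.append(f"{severity}/{code}: {text}")
    return lines


# Hypothetical error body modelled on the first row of the table above.
sample_outcome = {
    "resourceType": "OperationOutcome",
    "issue": [
        {
            "severity": "error",
            "code": "not-found",
            "diagnostics": "Could not resolve reference: Practitioner?identifier=...",
        }
    ],
}

for line in extract_issues(sample_outcome):
    print(line)
```

Printing these lines for each failed bundle makes it easier to match a failure to the cause and fix listed in the table.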

Resetting the database

If you want to start fresh with a new synthetic population, you can use the $expunge operation to clear all data. See Custom Operations for details.

Cross-references