Generating Synthetic Test Data with Synthea
Use case
A development team is building a patient-facing mobile app on top of Fire Arrow Server. Before connecting real clinical data, they need a realistic test environment with hundreds of patients, each having a plausible medical history — conditions, medications, lab results, encounters, care plans, and immunizations. Manually creating this data would take weeks, and using real patient data in a development environment is a compliance violation.
Synthea is an open-source synthetic patient generator that produces complete, clinically realistic FHIR patient records. It simulates patient lifecycles from birth through death, including disease progression, medication prescriptions, lab values, and care encounters — all as valid FHIR R4 resources. The data is entirely synthetic: it does not contain or derive from real patient information, making it safe to use in any environment.
Other scenarios this recipe applies to
- Demo environments: A sales team needs a Fire Arrow Server instance loaded with realistic data to demonstrate the platform's capabilities — dashboards, authorization rules, CarePlan materialization — all with plausible clinical content.
- Integration testing: A CI/CD pipeline needs to seed a fresh Fire Arrow Server with a known patient population before running automated tests against the API.
- Training and education: A medical informatics course needs students to explore FHIR resources on a server with rich, diverse clinical data without any privacy concerns.
What you will build
Prerequisites
- Fire Arrow Server is running. See Getting Started.
- Java JDK 17 or newer is installed (required by Synthea).
- You have administrative access or the bundle-uploader feature enabled in the Web UI.
Understanding Synthea's output
What Synthea generates
Synthea simulates complete patient lifecycles. For each synthetic patient, it produces FHIR resources covering:
| Category | FHIR resources |
|---|---|
| Demographics | Patient (name, gender, birth date, address, identifiers) |
| Encounters | Encounter (primary care visits, ER visits, specialist referrals) |
| Conditions | Condition (diagnoses, onset, resolution) |
| Medications | MedicationRequest (prescriptions, dosage, duration) |
| Procedures | Procedure (surgeries, imaging, therapeutic procedures) |
| Observations | Observation (vital signs, lab results, social history) |
| Immunizations | Immunization (vaccines administered) |
| Care plans | CarePlan (treatment goals and activities) |
| Allergies | AllergyIntolerance (drug and food allergies) |
| Claims | Claim, ExplanationOfBenefit (insurance and billing data) |
The clinical content is based on published epidemiological data and clinical guidelines, so the resulting patient populations have realistic disease prevalence, medication patterns, and lab value distributions.
Output file structure
When configured for transaction bundles, Synthea produces three types of files in output/fhir/:
| File | Contents | Purpose |
|---|---|---|
| hospitalInformation*.json | Organization and Location resources | Defines the hospitals and clinics where care takes place. |
| practitionerInformation*.json | Practitioner resources | Defines the healthcare providers who deliver care. |
| Individual patient files (e.g., Aaron_Baumbach_*.json) | All clinical resources for one patient | The complete medical record for a single synthetic patient. |
Why import order matters
When transaction bundles are enabled, Synthea deliberately excludes Practitioner, Organization, and Location resources from individual patient bundles to avoid creating duplicates. Instead, patient bundles reference these resources using FHIR conditional references (query-based references), for example:
Practitioner?identifier=http://hl7.org/fhir/sid/us-npi|999999647
When Fire Arrow Server processes the transaction bundle, it resolves this reference by searching for a Practitioner with that identifier. If the Practitioner does not exist yet, the reference cannot be resolved and the transaction fails.
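To see which resources a patient bundle expects to find on the server, you can list its conditional references before uploading. The following is a minimal sketch, not part of Synthea or Fire Arrow: the helper names and the hand-written sample bundle are illustrative assumptions, and it simply treats any reference string containing a "?" as a conditional reference.

```python
from typing import Any, Iterator


def iter_references(node: Any) -> Iterator[str]:
    """Yield every string stored under a 'reference' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


def conditional_references(bundle: dict) -> set:
    """Return the references that are search queries (they contain '?')."""
    return {ref for ref in iter_references(bundle) if "?" in ref}


# Tiny hand-written stand-in for a patient bundle (not real Synthea output).
sample_bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {
            "resource": {
                "resourceType": "Encounter",
                "subject": {"reference": "urn:uuid:p1"},
                "participant": [
                    {
                        "individual": {
                            "reference": "Practitioner?identifier="
                            "http://hl7.org/fhir/sid/us-npi|999999647"
                        }
                    }
                ],
            }
        },
    ],
}

for ref in sorted(conditional_references(sample_bundle)):
    print(ref)
```

For a real file, load it with json.load and pass the resulting dictionary to conditional_references. Every reference it prints must match a resource that already exists on the server when the bundle is processed.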
This means you must upload the files in this order:
- Hospital information (Organizations and Locations) — first
- Practitioner information — second
- Patient bundles — last
The hospital and practitioner bundles use ifNoneExist preconditions, so re-uploading them is safe — existing resources will not be duplicated.
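The required order can be enforced mechanically by sorting on the file name prefix. This is a small sketch under the assumption that the file names follow the patterns described above (hospitalInformation*, practitionerInformation*, everything else is a patient bundle); the function names are hypothetical.

```python
from pathlib import Path


def upload_rank(path: str) -> int:
    """0 for hospital bundles, 1 for practitioner bundles, 2 for patients."""
    name = Path(path).name
    if name.startswith("hospitalInformation"):
        return 0
    if name.startswith("practitionerInformation"):
        return 1
    return 2


def in_upload_order(paths: list) -> list:
    """Sort paths so that prerequisite bundles come before patient bundles."""
    return sorted(paths, key=lambda p: (upload_rank(p), Path(p).name))


files = [
    "output/fhir/Zoe_Williams_x.json",
    "output/fhir/practitionerInformation1.json",
    "output/fhir/Aaron_Baumbach_y.json",
    "output/fhir/hospitalInformation1.json",
]
for path in in_upload_order(files):
    print(path)
```

Sorting by rank first and name second keeps the order deterministic, which helps when comparing upload logs between runs.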
Why transaction bundles?
Synthea can produce two bundle types: collection bundles (the default) and transaction bundles. For importing into Fire Arrow Server, transaction bundles are strongly recommended:
| Aspect | Collection bundles | Transaction bundles |
|---|---|---|
| Atomicity | None — each resource is processed independently | All-or-nothing — if any resource fails, the entire bundle is rolled back |
| Reference resolution | References may break if a dependent resource fails | All references within the bundle are guaranteed to resolve |
| Duplicate handling | No built-in protection | ifNoneExist prevents duplicate Organizations, Locations, and Practitioners |
| Performance | Multiple database transactions | Single database transaction per bundle — significantly faster |
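Before importing, it can be worth confirming that a generated file really is a transaction bundle and seeing how many entries carry an ifNoneExist precondition. The sketch below assumes the standard FHIR R4 layout (Bundle.type and Bundle.entry.request.ifNoneExist); the function name and the sample data are made up for illustration.

```python
def bundle_summary(bundle: dict) -> dict:
    """Report the bundle type, entry count, and number of conditional creates."""
    entries = bundle.get("entry", [])
    return {
        "type": bundle.get("type"),
        "entries": len(entries),
        "conditional_creates": sum(
            1 for entry in entries if "ifNoneExist" in entry.get("request", {})
        ),
    }


# Hand-written example shaped like a hospital bundle (not real Synthea output).
sample = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {
            "resource": {"resourceType": "Organization"},
            "request": {
                "method": "POST",
                "url": "Organization",
                "ifNoneExist": "identifier=https://example.org/ids|org-1",
            },
        },
        {
            "resource": {"resourceType": "Location"},
            "request": {"method": "POST", "url": "Location"},
        },
    ],
}

print(bundle_summary(sample))
```

If the reported type is "collection", the configuration in Step 2 was probably not applied to that run.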
Step 1: Install Synthea
Clone the Synthea repository and build it:
git clone https://github.com/synthetichealth/synthea.git
cd synthea
./gradlew build check test
Alternatively, download a pre-built release JAR from the Synthea releases page.
Step 2: Configure Synthea for transaction bundles
Create a configuration file, for example fire-arrow-config.properties (the name used in the commands below), that enables FHIR R4 transaction bundles along with the separate hospital and practitioner exports:
# Enable FHIR R4 export (enabled by default, but be explicit)
exporter.fhir.export = true
# Use transaction bundles instead of collection bundles
exporter.fhir.transaction_bundle = true
# Export hospital (Organization + Location) and Practitioner bundles separately
exporter.hospital.fhir.export = true
exporter.practitioner.fhir.export = true
# Include full patient history (0 = no cutoff)
exporter.years_of_history = 0
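A typo in one of these keys is easy to miss, because Synthea would then fall back to its defaults. The following sketch checks a properties file for the four export settings listed above. It is an illustration only: the parser handles just the simple "key = value" form with "#" comments used in this recipe, not the full Java properties syntax, and the function names are hypothetical.

```python
REQUIRED = {
    "exporter.fhir.export": "true",
    "exporter.fhir.transaction_bundle": "true",
    "exporter.hospital.fhir.export": "true",
    "exporter.practitioner.fhir.export": "true",
}


def parse_properties(text: str) -> dict:
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def wrong_settings(text: str) -> dict:
    """Map each required key to its actual value (None if absent) when it differs."""
    actual = parse_properties(text)
    return {
        key: actual.get(key)
        for key, expected in REQUIRED.items()
        if actual.get(key) != expected
    }


example = """
# Enable FHIR R4 export
exporter.fhir.export = true
exporter.fhir.transaction_bundle = false
exporter.practitioner.fhir.export = true
"""
print(wrong_settings(example))
```

An empty dictionary means all four settings are present with the expected values.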
Optional: Adjusting the population
You can customize additional settings depending on your needs:
# Generate 100 living patients (default is 1)
generate.default_population = 100
Step 3: Generate the synthetic population
Run Synthea with your configuration file. You can optionally specify a geographic region:
# Generate 100 patients using the config file
./run_synthea -c fire-arrow-config.properties -p 100
# Or generate patients for a specific US state and city
./run_synthea -c fire-arrow-config.properties -p 100 Massachusetts Boston
Synthea will output progress as it generates each patient. When complete, the output/fhir/ directory will contain:
output/fhir/
├── hospitalInformation1713812345678.json ← Organizations + Locations
├── practitionerInformation1713812345678.json ← Practitioners
├── Aaron_Baumbach_a1b2c3d4-....json ← Patient bundle
├── Abby_Connelly_e5f6g7h8-....json ← Patient bundle
├── ... (one file per patient)
└── Zoe_Williams_i9j0k1l2-....json ← Patient bundle
You can verify the output with a quick check:
# Count the generated files
ls output/fhir/*.json | wc -l
# Inspect a patient bundle's resource types
cat output/fhir/Aaron_Baumbach_*.json | python3 -c "
import json, sys, collections
bundle = json.load(sys.stdin)
types = collections.Counter(e['resource']['resourceType'] for e in bundle.get('entry', []))
for t, c in types.most_common():
    print(f' {t}: {c}')
"
Step 4: Import into Fire Arrow Server
Option A: Using the Web UI (recommended for smaller datasets)
The Bundle Uploader provides a visual interface for importing bundles with progress tracking and error reporting.
- Open the Fire Arrow Web UI and navigate to Tools > Bundle Uploader in the sidebar.
- Upload the hospital information bundle first. Drag and drop the hospitalInformation*.json file onto the upload area, or click to browse. Click Start Upload and wait for it to complete successfully.
- Upload the practitioner information bundle second. Repeat the process with the practitionerInformation*.json file.
- Upload patient bundles last. You can select multiple patient JSON files at once. The uploader processes them sequentially to avoid overwhelming the server.
Each bundle shows its processing result. For transaction bundles, you will see either all entries succeeding (201 Created) or the entire bundle failing with an error message. If a patient bundle fails with reference resolution errors, verify that the hospital and practitioner bundles were uploaded first.
Option B: Using curl (for scripting and automation)
For larger datasets or CI/CD pipelines, script the import with curl:
FHIR_BASE="http://localhost:8080/fhir"
TOKEN="<your-admin-token>"
# Step 1: Upload hospital information (Organizations + Locations)
# curl does not expand wildcards after "@", so let the shell loop over the matches.
# --data-binary sends the file unmodified (-d would strip newlines).
echo "Uploading hospital information..."
for file in output/fhir/hospitalInformation*.json; do
  curl -s -X POST "$FHIR_BASE" \
    -H "Content-Type: application/fhir+json" \
    -H "Authorization: Bearer $TOKEN" \
    --data-binary @"$file"
done
# Step 2: Upload practitioner information
echo "Uploading practitioner information..."
for file in output/fhir/practitionerInformation*.json; do
  curl -s -X POST "$FHIR_BASE" \
    -H "Content-Type: application/fhir+json" \
    -H "Authorization: Bearer $TOKEN" \
    --data-binary @"$file"
done
# Step 3: Upload each patient bundle
echo "Uploading patient bundles..."
for file in output/fhir/*.json; do
  # Skip the hospital and practitioner files
  case "$file" in
    *hospitalInformation*|*practitionerInformation*) continue ;;
  esac
  echo "  Uploading: $(basename "$file")"
  curl -s -X POST "$FHIR_BASE" \
    -H "Content-Type: application/fhir+json" \
    -H "Authorization: Bearer $TOKEN" \
    --data-binary @"$file" > /dev/null
done
echo "Done. Sent $(ls output/fhir/*.json | grep -v -e hospitalInformation -e practitionerInformation | wc -l) patient bundles."
Option C: Using Python (for progress tracking and error handling)
import glob
import json
import sys

import requests

FHIR_BASE = "http://localhost:8080/fhir"
TOKEN = "<your-admin-token>"
HEADERS = {
    "Content-Type": "application/fhir+json",
    "Authorization": f"Bearer {TOKEN}",
}


def upload_bundle(filepath: str) -> bool:
    """POST one bundle file to the FHIR base URL; return True on success."""
    with open(filepath, encoding="utf-8") as f:
        bundle = json.load(f)
    resp = requests.post(FHIR_BASE, json=bundle, headers=HEADERS)
    if resp.status_code >= 400:
        print(f" FAILED ({resp.status_code}): {filepath}")
        return False
    return True


# Step 1: Hospital information
print("Uploading hospital information...")
for f in glob.glob("output/fhir/hospitalInformation*.json"):
    if not upload_bundle(f):
        # Patient bundles reference these resources, so stop here.
        sys.exit("Hospital information upload failed; aborting.")

# Step 2: Practitioner information
print("Uploading practitioner information...")
for f in glob.glob("output/fhir/practitionerInformation*.json"):
    if not upload_bundle(f):
        sys.exit("Practitioner information upload failed; aborting.")

# Step 3: Patient bundles
patient_files = [
    f for f in glob.glob("output/fhir/*.json")
    if "hospitalInformation" not in f
    and "practitionerInformation" not in f
]
print(f"Uploading {len(patient_files)} patient bundles...")
success = 0
for i, f in enumerate(patient_files, 1):
    if upload_bundle(f):
        success += 1
    if i % 10 == 0:
        print(f" Progress: {i}/{len(patient_files)}")
print(f"Done. {success}/{len(patient_files)} patients uploaded successfully.")
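The script above only looks at the HTTP status code. For more detail, the response body can be summarized as well. The sketch below assumes standard FHIR R4 behavior: a successful transaction returns a Bundle whose entries each carry a response.status string such as "201 Created", and a rejected one returns an OperationOutcome. Whether Fire Arrow Server's responses follow exactly this shape has not been verified here, and summarize_response is a hypothetical helper name.

```python
from collections import Counter


def summarize_response(body: dict) -> dict:
    """Summarize the JSON body returned for a transaction bundle POST."""
    if body.get("resourceType") == "OperationOutcome":
        # A rejected transaction: collect the human-readable messages.
        messages = [
            issue.get("diagnostics") or issue.get("details", {}).get("text", "")
            for issue in body.get("issue", [])
        ]
        return {"ok": False, "messages": messages}
    # A transaction-response Bundle: count the per-entry status strings.
    statuses = Counter(
        entry.get("response", {}).get("status", "unknown")
        for entry in body.get("entry", [])
    )
    return {"ok": True, "statuses": dict(statuses)}


# Hand-written example bodies (illustrative, not captured server output).
ok_body = {
    "resourceType": "Bundle",
    "type": "transaction-response",
    "entry": [
        {"response": {"status": "201 Created"}},
        {"response": {"status": "201 Created"}},
        {"response": {"status": "200 OK"}},
    ],
}
failed_body = {
    "resourceType": "OperationOutcome",
    "issue": [
        {
            "severity": "error",
            "code": "processing",
            "diagnostics": "Could not resolve reference",
        }
    ],
}

print(summarize_response(ok_body))
print(summarize_response(failed_body))
```

Inside upload_bundle, this could be called as summarize_response(resp.json()) to print the reason alongside the file name when an upload fails.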
What to expect after import
After importing a population of 100 patients, your Fire Arrow Server will typically contain:
| Resource type | Approximate count | Description |
|---|---|---|
| Patient | 100 | One per synthetic person |
| Encounter | 2,000–5,000 | Multiple visits per patient over their lifetime |
| Condition | 500–1,500 | Diagnoses across the population |
| Observation | 10,000–30,000 | Vital signs, lab results, social history |
| MedicationRequest | 500–2,000 | Prescriptions |
| Procedure | 500–2,000 | Surgeries and therapeutic procedures |
| Immunization | 500–1,500 | Vaccines |
| CarePlan | 200–800 | Active and completed care plans |
| Organization | 5–20 | Hospitals and clinics |
| Practitioner | 20–50 | Healthcare providers |
You can verify the import by opening the Patient Dashboard in the Web UI, which should now show all the imported patients with their clinical data.
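The counts can also be checked from a script. The sketch below assumes the server supports the standard FHIR search parameter _summary=count, which returns a Bundle whose total field holds the number of matches; this has not been confirmed for Fire Arrow Server specifically. The helper names are hypothetical, and the base URL and token are the same placeholders used in the upload scripts.

```python
import json
import urllib.request
from urllib.parse import quote


def count_url(base: str, resource_type: str) -> str:
    """Build a search URL that asks only for the number of matches."""
    return f"{base.rstrip('/')}/{quote(resource_type)}?_summary=count"


def total_from_bundle(bundle: dict) -> int:
    """Read Bundle.total from a search result (0 if the field is absent)."""
    return int(bundle.get("total", 0))


def fetch_total(base: str, resource_type: str, token: str) -> int:
    """Query the server for the number of resources of one type."""
    request = urllib.request.Request(
        count_url(base, resource_type),
        headers={
            "Accept": "application/fhir+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return total_from_bundle(json.load(response))


print(count_url("http://localhost:8080/fhir", "Patient"))
```

Calling fetch_total("http://localhost:8080/fhir", "Patient", "<your-admin-token>") after importing 100 patients should then return a value close to the Patient row in the table above.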
Tips and troubleshooting
Generating larger populations
For populations larger than a few hundred patients, consider:
- Increase JVM memory: ./run_synthea may need more heap space for very large runs. Set _JAVA_OPTIONS="-Xmx4g" before running.
- Use the Python upload script with error handling rather than the Web UI, which is designed for interactive use.
- Split into batches: Generate in batches of 500–1,000 patients if you encounter memory issues.
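Batch generation can be planned with a few lines of code. The sketch below builds one run_synthea command per batch and gives each batch its own seed through Synthea's -s option, so that batches are less likely to repeat the same patients. The function name, the default config file name, and the seed numbering are assumptions made for illustration.

```python
def batch_commands(
    total: int,
    batch_size: int,
    first_seed: int = 1,
    config: str = "fire-arrow-config.properties",
) -> list:
    """Return one run_synthea argument list per batch."""
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")
    commands = []
    remaining = total
    seed = first_seed
    while remaining > 0:
        size = min(batch_size, remaining)
        commands.append(
            ["./run_synthea", "-c", config, "-s", str(seed), "-p", str(size)]
        )
        remaining -= size
        seed += 1
    return commands


for command in batch_commands(1200, 500):
    print(" ".join(command))
```

Each argument list can be passed to subprocess.run(command, check=True) from the Synthea directory. Every run writes its own hospital and practitioner files, which remain safe to upload repeatedly because of the ifNoneExist preconditions described earlier.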
Common import errors
| Error | Cause | Fix |
|---|---|---|
| Could not resolve reference: Practitioner?identifier=... | Patient bundle uploaded before the practitioner bundle | Upload practitionerInformation*.json first, then retry |
| Could not resolve reference: Organization?identifier=... | Patient bundle uploaded before the hospital bundle | Upload hospitalInformation*.json first, then retry |
| HTTP 413 Request Entity Too Large | A patient bundle exceeds the server's request size limit | Increase server.max-http-post-size in application.yaml or adjust your reverse proxy settings |
| HTTP 403 Forbidden | The authentication token does not have permission to create resources | Use an admin token or ensure your role has write access to all the resource types Synthea produces |
Resetting the database
If you want to start fresh with a new synthetic population, you can use the $expunge operation to clear all data. See Custom Operations for details.
Cross-references
- Bundle Uploader — Web UI for uploading FHIR bundles
- Patient Dashboard — browsing imported patient data
- Custom Operations — $expunge for clearing data
- Synthea on GitHub — source code and documentation
- Synthea Wiki: FHIR Transaction Bundles — detailed configuration reference