Add FHIR R4 Healthcare Intelligence connector example by fivetran-davidmillman · Pull Request #573 · fivetran/connector_sdk

fivetran-davidmillman · 2026-04-20T19:28:46Z

Summary

Adds a new connector under all_things_ai/tutorials/databricks-fm-fhir-healthcare-intelligence
Syncs Patient, Condition, Observation, and MedicationRequest resources from any FHIR R4-compliant server (default: HAPI FHIR public test server, no auth required)
Implements the Hybrid (Discovery + Debate) AI pattern using Databricks ai_query(): a Discovery phase for population health risk stratification, and a Debate phase where a Clinical Risk Analyst and Resource Allocation Analyst debate per-patient intervention priorities, producing a consensus with a disagreement flag
Produces 8 destination tables: patients, conditions, observations, medications, population_insights, clinical_assessments, resource_assessments, debate_consensus
Follows the Hybrid pattern established by PRs Add Databricks NOAA Weather Risk Intelligence connector to AI tutorials #570 and Add Databricks FDA FAERS Pharmacovigilance Intelligence connector to AI tutorials #571
Updates main README.md with entry for new connector

Test plan

fivetran debug run locally — SYNC SUCCEEDED
Deployed to Fivetran and triggered production sync — SUCCEEDED
black --line-length 99 passes
flake8 --extend-ignore=E203,E501,B008 passes
All configuration.json values are angle-bracket placeholders
No credentials in submitted files

`fivetran debug` results

SDK Version: 2.8.1
Sync: SUCCEEDED (00:02:24)
Records: 3 patients, 2 AI-enriched (max_patients=3, max_enrichments=2)

Table	Records
patients	3
population_insights	1
clinical_assessments	2
resource_assessments	2
debate_consensus	2
Total upserts	10

Raw fivetran debug terminal output

20-Apr 13:59:11.030 INFO sdk fivetran-connector-sdk 2.8.1 is available. (a newer release exists)
20-Apr 13:59:11.030 INFO sdk run 'pip install --upgrade fivetran-connector-sdk' to update
20-Apr 13:59:11.433 INFO sdk debugging connector at: .../databricks-fm-fhir-healthcare-intelligence
20-Apr 13:59:11.730 INFO sdk starting connector tester
20-Apr 13:59:11.730 INFO debugger connector tester version: 2.26.0410.001
20-Apr 13:59:13.126 INFO debugger previous state: {"last_sync": "2026-04-20T18:49:43.854397+00:00"}
20-Apr 13:59:14.223 INFO sdk calling schema()
20-Apr 13:59:14.233 INFO debugger schema change detected: tester.patients
20-Apr 13:59:14.233 INFO debugger schema change detected: tester.conditions
20-Apr 13:59:14.234 INFO debugger schema change detected: tester.observations
20-Apr 13:59:14.234 INFO debugger schema change detected: tester.medications
20-Apr 13:59:14.234 INFO debugger schema change detected: tester.population_insights
20-Apr 13:59:14.234 INFO debugger schema change detected: tester.clinical_assessments
20-Apr 13:59:14.234 INFO debugger schema change detected: tester.resource_assessments
20-Apr 13:59:14.234 INFO debugger schema change detected: tester.debate_consensus
20-Apr 13:59:14.238 INFO sdk calling update()
20-Apr 13:59:14.238 WARNING Example: all_things_ai/tutorials : databricks-fm-fhir-healthcare-intelligence
20-Apr 13:59:14.238 INFO FHIR base URL: https://hapi.fhir.org/baseR4
20-Apr 13:59:14.238 INFO Max patients: 3, condition filter: none
20-Apr 13:59:14.238 INFO Hybrid analysis ENABLED: model=databricks-claude-sonnet-4-6
20-Apr 13:59:14.239 INFO Phase 1 (MOVE): Fetching clinical data from FHIR
20-Apr 13:59:14.239 INFO Fetching patients (condition filter: none, max: 3)
20-Apr 13:59:14.699 INFO Fetched 3 patient resources
20-Apr 13:59:23.511 INFO Phase 1 complete: 3 patients fetched
20-Apr 13:59:23.512 INFO Phase 2 (DISCOVERY): AI population health analysis
20-Apr 13:59:23.512 INFO Calling ai_query() for population health discovery
20-Apr 13:59:23.578 INFO debugger checkpoint recorded: {"last_sync": "2026-04-20T18:49:43.854397+00:00"}
20-Apr 13:59:57.487 INFO Discovery phase complete
20-Apr 13:59:57.487 INFO Phase 3 (DEBATE): Clinical vs Resource debate per patient
20-Apr 13:59:57.487 INFO Starting debate for 2 patients (3 ai_query() calls each)
20-Apr 13:59:57.513 INFO debugger table created: tester.population_insights
20-Apr 13:59:57.526 INFO debugger checkpoint recorded: {"last_sync": "2026-04-20T18:49:43.854397+00:00"}
20-Apr 14:01:00.335 INFO debugger table created: tester.clinical_assessments
20-Apr 14:01:00.352 INFO debugger table created: tester.resource_assessments
20-Apr 14:01:00.368 INFO debugger table created: tester.debate_consensus
20-Apr 14:01:00.381 INFO debugger checkpoint recorded: {"last_sync": "2026-04-20T18:49:43.854397+00:00"}
20-Apr 14:01:37.568 INFO Debate complete: 2 patients debated, 1 with disagreement flags
20-Apr 14:01:37.568 INFO Sync complete: 3 patients, all clinical phases done
20-Apr 14:01:37.611 INFO debugger checkpoint recorded: {"last_sync": "2026-04-20T18:49:43.854397+00:00"}
20-Apr 14:01:37.611 INFO debugger checkpoint recorded: {"last_sync": "2026-04-20T19:01:37.568371+00:00"}
20-Apr 14:01:37.612 INFO debugger Final checkpoint: committing any remaining operations
20-Apr 14:01:37.612 INFO debugger SYNC SUCCEEDED — total elapsed 00:02:24
Operation       | Counts
----------------+------------
Upserts         | 10
Updates         | 0
Deletes         | 0
Truncates       | 0
Schema changes  | 8
Checkpoints     | 5

Adds a new AI-powered connector under all_things_ai/tutorials that syncs clinical data from a FHIR R4 server and enriches it with Databricks ai_query() using the Hybrid (Discovery + Debate) pattern for population health risk stratification and per-patient intervention recommendations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cla-assistant · 2026-04-20T19:29:09Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

David.Millman seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

cla-assistant · 2026-04-20T19:29:11Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

David.Millman seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

Copilot

Pull request overview

Adds a new Databricks AI tutorial connector example that syncs FHIR R4 clinical resources and optionally enriches them using a Hybrid (Discovery + Debate) ai_query() workflow, plus updates the repo root README to list the new tutorial.

Changes:

Added a new connector implementation (connector.py) for fetching FHIR Patient/Condition/Observation/MedicationRequest and producing AI enrichment tables.
Added tutorial docs (README.md) and a template configuration.json for the new connector.
Updated the repository README.md to include the new tutorial link.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File	Description
all_things_ai/tutorials/databricks-fm-fhir-healthcare-intelligence/connector.py	Implements FHIR fetch, Databricks ai_query() enrichment, and optional Genie Space creation.
all_things_ai/tutorials/databricks-fm-fhir-healthcare-intelligence/configuration.json	Adds a configuration template for FHIR + Databricks settings.
all_things_ai/tutorials/databricks-fm-fhir-healthcare-intelligence/README.md	Documents connector features, configuration, and produced tables.
README.md	Adds an entry pointing to the new tutorial connector.

Copilot · 2026-04-24T02:10:11Z

+    session = requests.Session()
+    session.headers.update({"Accept": __FHIR_ACCEPT_HEADER})
+    return session


create_session() sets a global Accept: application/fhir+json header on the session, but the same session is later used for Databricks calls (which may not honor/accept that media type). To avoid unexpected 406/negotiation issues, either use a separate session for Databricks or explicitly set an appropriate Accept header (e.g., application/json) on Databricks requests.

Fixed in commit be739142 — call_ai_query() now adds "Accept": "application/json" directly to its own headers dict for both the initial POST and all poll GETs. The create_session() still sets Accept: application/fhir+json globally for FHIR calls; the Databricks calls override it locally so the two don't conflict.

Copilot · 2026-04-24T02:10:11Z

+    record = {"patient_id": patient_id, "assessment_type": assessment_type}
+    record.update(flatten_dict(assessment))
+
+    # The 'upsert' operation is used to insert or update data in the destination table.
+    # The first argument is the name of the destination table.
+    # The second argument is a dictionary containing the record to be upserted.
+    op.upsert(table=table_name, data=record)
+    return True


upsert_assessment() builds record with patient_id/assessment_type and then does record.update(flatten_dict(assessment)), which allows LLM-returned keys (e.g., "patient_id" or "assessment_type") to overwrite the stable identifiers. To keep primary key fields safe, merge in the opposite order (flatten first, then set patient_id/assessment_type) or drop/rename conflicting keys.

Fixed in commit be739142 — upsert_assessment() now flattens the assessment first (record = flatten_dict(assessment) if assessment else {}), then stamps patient_id and assessment_type on top. LLM-returned keys can no longer overwrite the stable identifiers. Regression test: TestUpsertAssessment.test_patient_id_not_overwritten_by_llm_key.

Copilot · 2026-04-24T02:10:12Z

+
+- Fetches Patient, Condition, Observation, and MedicationRequest resources from any FHIR R4-compliant server
+- Supports optional ICD-10 code prefix filtering to target a specific patient cohort (e.g., `E11` for diabetes)
+- Supports incremental sync using FHIR `_lastUpdated` filtering based on the previous sync timestamp


The README claims the connector “supports incremental sync using FHIR _lastUpdated filtering based on the previous sync timestamp”, but the connector code only sorts by -_lastUpdated and never uses state["last_sync"] (or any _lastUpdated filter). Either implement incremental filtering in the FHIR queries using the stored state timestamp, or adjust the README to reflect the current behavior (full/limited sample sync).

Suggested change

- Supports incremental sync using FHIR `_lastUpdated` filtering based on the previous sync timestamp

- Retrieves the most recently updated FHIR resources ordered by `_lastUpdated`; the current sample sync does not apply previous-state incremental `_lastUpdated` filtering

Fixed in commit be739142 — both parts addressed: (1) run_move_phase() now actually reads state.get("last_sync") and adds _lastUpdated=gt{...} to the FHIR patient query; (2) the README Features section already stated incremental sync support, which now accurately reflects the implementation.

Copilot · 2026-04-24T02:10:12Z

+
+FHIR R4 servers return resources as paginated Bundle resources. The connector follows `Bundle.link` entries with `relation=next` to retrieve subsequent pages until no next link is present or the configured `max_patients` limit is reached. The next-page URL is used directly as provided by the server; query parameters are only passed on the initial request.
+
+Databricks SQL Statement API results may be paginated via `next_chunk_internal_link`. The connector follows these links to retrieve all rows from large `ai_query()` results.


The README states the connector follows Databricks SQL Statement API pagination via next_chunk_internal_link, but call_ai_query() only reads result.data_array[0][0] and does not handle chunk pagination. Please either add chunk-pagination support or remove/adjust this statement to avoid misleading users.

Suggested change

Databricks SQL Statement API results may be paginated via `next_chunk_internal_link`. The connector follows these links to retrieve all rows from large `ai_query()` results.

Databricks SQL Statement API responses can be paginated via `next_chunk_internal_link`, but this tutorial connector currently reads the immediate `ai_query()` result only and does not follow chunk-pagination links. The included AI queries are therefore expected to return a single result value rather than a large multi-row result set.

Fixed in commit be739142 — README Pagination section updated to: 'Databricks SQL Statement API responses can be paginated via next_chunk_internal_link, but this tutorial connector reads the immediate ai_query() result only and does not follow chunk-pagination links. The included AI queries return a single result value rather than a large multi-row result set.'

Copilot · 2026-04-24T02:10:12Z

+
+## Additional considerations
+
+This connector was built by David Millman (david.millman@fivetran.com) during a working session with Kelly Kohlleffel. It follows the Hybrid (Discovery + Debate) pattern established by the NOAA Weather Risk Intelligence connector (PR #570) and the FDA FAERS Pharmacovigilance Intelligence connector (PR #571). The HAPI FHIR public test server (`https://hapi.fhir.org/baseR4`) is used as the default data source and contains synthetic clinical data suitable for demonstration purposes.


README includes a personal employee email address. Repos like this typically avoid publishing personal contact info; please remove the email or replace it with a generic support alias/link (e.g., Support team / docs link).

Suggested change

This connector was built by David Millman (david.millman@fivetran.com) during a working session with Kelly Kohlleffel. It follows the Hybrid (Discovery + Debate) pattern established by the NOAA Weather Risk Intelligence connector (PR #570) and the FDA FAERS Pharmacovigilance Intelligence connector (PR #571). The HAPI FHIR public test server (`https://hapi.fhir.org/baseR4`) is used as the default data source and contains synthetic clinical data suitable for demonstration purposes.

This connector was built by David Millman during a working session with Kelly Kohlleffel. It follows the Hybrid (Discovery + Debate) pattern established by the NOAA Weather Risk Intelligence connector (PR #570) and the FDA FAERS Pharmacovigilance Intelligence connector (PR #571). The HAPI FHIR public test server (`https://hapi.fhir.org/baseR4`) is used as the default data source and contains synthetic clinical data suitable for demonstration purposes.

Fixed in commit be739142 — personal email address removed from README Additional considerations. Attribution now reads: 'This connector was built by David Millman during a working session with Kelly Kohlleffel.'

Copilot · 2026-04-24T02:10:12Z

+  "enable_enrichment": "<TRUE_OR_FALSE>",
+  "enable_discovery": "<TRUE_OR_FALSE>",
+  "enable_genie_space": "<TRUE_OR_FALSE>",
+  "genie_table_identifier": "<CATALOG.SCHEMA.TABLE>",


configuration.json uses very generic placeholders for booleans ("<TRUE_OR_FALSE>") and other fields. The repo’s configuration templates typically use descriptive placeholders that communicate intent and defaults (e.g., "<TRUE_OR_FALSE_DEFAULT_TRUE>" / "<TRUE_OR_FALSE_DEFAULT_FALSE>") to reduce misconfiguration.

Suggested change

"enable_enrichment": "<TRUE_OR_FALSE>",

"enable_discovery": "<TRUE_OR_FALSE>",

"enable_genie_space": "<TRUE_OR_FALSE>",

"genie_table_identifier": "<CATALOG.SCHEMA.TABLE>",

"enable_enrichment": "<TRUE_OR_FALSE_DEFAULT_FALSE>",

"enable_discovery": "<TRUE_OR_FALSE_DEFAULT_FALSE>",

"enable_genie_space": "<TRUE_OR_FALSE_DEFAULT_FALSE>",

"genie_table_identifier": "<YOUR_DATABRICKS_GENIE_TABLE_IDENTIFIER_CATALOG_SCHEMA_TABLE>",

Fixed in commit be739142 — configuration.json boolean placeholders now communicate defaults: <TRUE_OR_FALSE_DEFAULT_TRUE> for enable_enrichment and enable_discovery, <TRUE_OR_FALSE_DEFAULT_FALSE> for enable_genie_space. Defaults are also documented in the README configuration table.

Copilot · 2026-04-24T02:10:13Z

+    resources = []
+    next_url = url
+    current_params = params
+
+    while next_url:
+        try:
+            data = fetch_data_with_retry(session, next_url, params=current_params)
+        except RuntimeError as e:
+            log.warning(f"Failed to fetch FHIR bundle page: {e}")
+            break
+
+        # Only pass params on the first request; subsequent requests use the full next URL
+        current_params = None
+
+        entries = data.get("entry", [])
+        for entry in entries:
+            resource = entry.get("resource", {})
+            if resource:
+                resources.append(resource)
+


fetch_fhir_bundle() accumulates all resources from all pages into an in-memory list. In run_move_phase(), this is used for Conditions/Observations/MedicationRequests without any max_results cap, so a single patient with large history can cause unbounded memory growth. Consider streaming page-by-page (upsert as you go) and only retaining a small bounded sample per patient for prompt construction (e.g., latest N conditions/observations/meds).

Fixed in commit be739142 — added constant __MAX_RESOURCES_PER_PATIENT = 100 and passed max_results=__MAX_RESOURCES_PER_PATIENT to each of the three per-patient fetch_fhir_bundle() calls (Condition, Observation, MedicationRequest). fetch_fhir_bundle() already supports max_results and truncates after pagination, so this cap is enforced end-to-end.

Copilot · 2026-04-24T02:10:13Z

+    # Validate Databricks credentials when enrichment is enabled
+    is_enrichment = _parse_bool(configuration.get("enable_enrichment"), default=True)
+    is_discovery = _parse_bool(configuration.get("enable_discovery"), default=True)
+    is_genie = _parse_bool(configuration.get("enable_genie_space"), default=False)
+
+    if is_enrichment or is_discovery or is_genie:
+        for key in [
+            "databricks_workspace_url",
+            "databricks_token",
+            "databricks_warehouse_id",
+        ]:
+            if _is_placeholder(configuration.get(key)):
+                raise ValueError(f"Missing required Databricks config: {key}")
+


validate_configuration() currently requires Databricks credentials when enable_discovery is true, even if enable_enrichment is false (in update(), discovery is only run when both are true). This makes configs like enable_enrichment=false, enable_discovery=true fail validation unnecessarily. Consider gating Databricks credential requirements on the actual execution conditions (e.g., require creds when enable_enrichment is true, or when enable_genie_space is true).

Fixed in commit be739142 — validate_configuration() condition changed from if is_enrichment or is_discovery or is_genie: to if is_enrichment or is_genie:. Discovery-only mode (enable_enrichment=false, enable_discovery=true) no longer requires Databricks credentials, matching the actual runtime behavior in update() where discovery only executes when both enrichment AND discovery are enabled. Regression test: TestValidateConfiguration.test_databricks_creds_not_required_when_enrichment_disabled.

Copilot · 2026-04-24T02:10:13Z

+    try:
+        response = session.post(url, headers=headers, json=payload, timeout=timeout_seconds)
+        response.raise_for_status()
+


call_ai_query() does not implement retry/backoff for transient Databricks API failures (e.g., 429/5xx), so intermittent issues will silently skip enrichment for patients. Since this is an external API call, add bounded retries with exponential backoff (and fail fast on auth/4xx) similar to fetch_data_with_retry().

Fixed in commit be739142 — call_ai_query() now wraps the initial POST in a retry loop (up to __MAX_RETRIES=3 attempts, exponential backoff for status codes in __RETRYABLE_STATUS_CODES). Auth errors (401/403) still fail fast. The same retry constants used by fetch_data_with_retry() are reused for consistency. README Error handling section updated to document the new behavior.

fivetran-anushkaparashar

I’ve left a few comments. Please also review Copilot’s suggestions and address them if they’re valid.

fivetran-anushkaparashar · 2026-04-24T02:03:29Z

+                op.checkpoint(state=state)
+
+        # Final checkpoint
+        state["last_sync"] = datetime.now(timezone.utc).isoformat()


last_sync is written but never read back, and that's the missing piece for incremental sync.

Fixed in commit be739142 — run_move_phase() now reads state.get("last_sync") and adds _lastUpdated=gt{last_sync} to the FHIR patient query params when a prior sync timestamp is present. On the first sync (no last_sync in state) no filter is added, so all patients are fetched. A regression test in tests/test_connector.py (category 5: incremental sync) verifies both paths.

fivetran-anushkaparashar · 2026-04-24T02:08:53Z

+        return
+
+    condition_label = (condition_filter or "all_conditions").replace(" ", "_")
+    insight_id = f"insight_{condition_label}_{len(patient_records)}"


If patient count changes between syncs (e.g., 15 → 20), a new row is inserted instead of updating the existing one. Shouldn't insight should represent one row per condition configuration, not one row per patient count?

Fixed in commit be739142 — insight_id is now f"insight_{condition_label}" (e.g., insight_E11 or insight_all_conditions). The patient count suffix has been removed, so the key is stable across syncs and upserts update the same row rather than creating new ones.

fivetran-anushkaparashar · 2026-04-24T02:15:33Z

+        '  "winner_rationale": "...",\n'
+        '  "agreement_areas": ["..."],\n'
+        '  "disagreement_areas": ["..."],\n'
+        '  "disagreement_flag": true,\n'


The prompt has disagreement_flag hardcoded to true, which biases the LLM to always report disagreement. Is it intentional?

Fixed in commit be739142 — build_consensus_prompt() prompt template now reads 'disagreement_flag': true|false instead of true. The LLM will now evaluate disagreement independently rather than being nudged toward always returning true.

Fixes all 12 unanswered reviewer threads from Apr 24 (fivetran-anushkaparashar + Copilot) plus Level 3 retrofit patterns from NOAA/FDA/FAERS/CPSC/SEC EDGAR: connector.py: - Fix 1: Add _lastUpdated=gt{last_sync} incremental filter to FHIR patient query - Fix 2: Stable insight_id (drop patient count suffix to prevent PK churn) - Fix 3: Add Accept: application/json to Databricks call_ai_query() headers - Fix 4: Reverse upsert_assessment() merge order — flatten first, then set patient_id/assessment_type so LLM keys cannot overwrite stable identifiers - Fix 5: Cap per-patient FHIR resource fetches at __MAX_RESOURCES_PER_PATIENT=100 - Fix 6: validate_configuration() only requires Databricks creds for enrichment or Genie — discovery-only mode no longer triggers credential requirement - Fix 7: Retry initial ai_query() POST up to 3× with exponential backoff - Fix 8: disagreement_flag prompt bias removed (true → true|false) - Fix 9: Genie API serialization — question/content were lists, must be strings - Fix 12: statement_id guard before polling loop (FDA Drug Label lesson) - Fix 14: Genie Space timeout uses databricks_timeout config, not hardcoded 60 README.md: - Pagination section: correct ai_query() chunk-pagination claim (not followed) - Error handling section: document new retry behavior for ai_query() POST - Additional considerations: remove personal email address - Configuration file section: add standard requirements.txt note configuration.json: - Boolean placeholders now communicate defaults (TRUE_OR_FALSE_DEFAULT_TRUE / TRUE_OR_FALSE_DEFAULT_FALSE) New files: - requirements.txt: fivetran_connector_sdk>=2.0.0 - requirements-test.txt: pytest>=7.0.0 - tests/test_connector.py: 60 tests — 5 mandatory categories (config validation, record builders, helper functions, phase logic, incremental sync regression) All 24 submission gate checks pass. 60/60 pytest green. Black + flake8 clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kellykohlleffel · 2026-05-18T17:05:49Z

PR #573 — All 12 reviewer threads addressed (commit `be739142`)

Hi @fivetran-anushkaparashar @fivetran-sahilkhirwal — David is unavailable for a few days so I stepped in as collaborator to address the Apr 24 feedback and bring this connector up to Level 3 standard. Here's a summary of everything that landed in be739142:

12 reviewer issues fixed

Issue	Thread	Fix
`last_sync` never read back in FHIR query	3135048225	`run_move_phase()` now adds `_lastUpdated=gt{last_sync}` when cursor present
`insight_id` includes patient count (PK churn)	3135061809	Stable `f"insight_{condition_label}"` — no patient count suffix
Shared session `Accept` header breaks Databricks	3135065318	`call_ai_query()` adds `Accept: application/json` locally in its own headers dict
`upsert_assessment()` key clobbering	3135065338	Flatten first, then stamp `patient_id`/`assessment_type` on top
README incremental claim didn't match code	3135065370	Code now actually filters by `_lastUpdated`; README claim is accurate
README chunk-pagination claim	3135065386	Updated to state that `ai_query()` results are not chunk-paginated
Personal email in README	3135065395	Removed
Generic boolean placeholders	3135065408	Now `<TRUE_OR_FALSE_DEFAULT_TRUE>` / `<TRUE_OR_FALSE_DEFAULT_FALSE>`
Unbounded per-patient resource fetches	3135065421	`__MAX_RESOURCES_PER_PATIENT = 100` passed to all three per-patient fetches
Validation requires Databricks for discovery-only	3135065438	Condition changed to `if is_enrichment or is_genie:`
No retry/backoff in `call_ai_query()`	3135065449	Retry loop (3×, exponential backoff) added to initial POST
`disagreement_flag` hardcoded `true` in prompt	3135078835	Changed to `true\|false`

Level 3 retrofit additions (learned from NOAA/FDA/FAERS/CPSC)

statement_id guard before polling loop (prevents poll_url = ".../None" on timeout)
Genie Space API strings fixed ("question": q, "content": str — not lists)
Genie Space timeout uses databricks_timeout config instead of hardcoded 60

New files

requirements.txt — fivetran_connector_sdk>=2.0.0
requirements-test.txt — pytest>=7.0.0
tests/test_connector.py — 60 tests covering all 5 mandatory categories (config validation, record builders, helper functions, phase logic w/ mocked Databricks, incremental sync regression)

Gate results

24/24 submission gate checks: ✓
60/60 pytest: ✓
Black + flake8 (from repo root): ✓

Ready for re-review. Please let me know if anything needs further adjustment.

kellykohlleffel · 2026-05-18T19:14:09Z

@cla-assistant recheck

kellykohlleffel · 2026-05-18T20:09:11Z

@cla-assistant recheck

kellykohlleffel · 2026-05-19T15:01:50Z

@fivetran-anushkaparashar — could you drop a @cla-assistant recheck comment here? David has signed the CLA (screenshot confirmed) but the bot isn't refreshing for us since we're not repo members. One comment from you should clear it. Thanks!

fivetran-anushkaparashar · 2026-05-20T10:40:27Z

@cla-assistant recheck

fivetran-anushkaparashar · 2026-05-21T08:44:54Z

@kellykohlleffel @fivetran-davidmillman Can you please re-check if CLA has been signed already?

fivetran-anushkaparashar · 2026-05-21T08:55:05Z

@@ -0,0 +1,578 @@
+"""


NIT: Test file can be removed.

Removed in commit 5e8bc69f. The tests/test_connector.py file has been deleted from the connector directory.

Removing test file to match the pattern of other tutorial connectors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fivetran-sahilkhirwal

Changes look good
Can you please resolve the merge conflict

fivetran-anushkaparashar · 2026-05-26T06:28:28Z

@@ -0,0 +1 @@
+pytest>=7.0.0


Remove this file. Please add your deps in requirement.txt

Fixed in commit 83c5b855 — removed requirements-test.txt. No test files are committed on this PR, so pytest is not needed as a dependency.

fivetran-dejantucakov

Approved with changes to READMEs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fivetran-anushkaparashar

LGTM!

Please resolve the conflicts.

Keep FHIR connector link entry; drop the yield Note block that upstream removed from the AI tutorials section.

kellykohlleffel · 2026-05-26T19:55:26Z

Resolved the merge conflict with upstream/main in README.md (commit 3005c940). The FHIR connector link is preserved; the yield Note block that upstream had already removed is dropped. PR should now show as mergeable.

fivetran-JenasVimal · 2026-06-02T17:10:43Z

@fivetran-davidmillman @kellykohlleffel , is this PR raised from a forked repository? If so, please avoid using forks for changes to this repo . Instead, clone the repository, create a separate branch, push your changes there, and raise the PR from that branch.

This might also help resolve the CLA issue you're seeing, as I've run into a similar issue before when working from a fork.

Thanks!

CC : @fivetran-anushkaparashar @fivetran-sahilkhirwal

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

@@ -0,0 +1 @@
+fivetran_connector_sdk>=2.0.0


+            if status_code in (401, 403):
+                msg = f"HTTP {status_code}: Check your FHIR server credentials. URL: {url}"
+                log.severe(msg)
+                raise RuntimeError(msg) from e


+    except (requests.exceptions.RequestException, ValueError, RuntimeError) as e:
+        log.severe(f"Unexpected error during sync: {str(e)}")
+        raise


+    """
+    Define the schema function which lets you configure the schema your connector delivers.
+    See the technical reference documentation for more details on the schema function:
+    https://fivetran.com/docs/connectors/connector-sdk/technical-reference/connector-sdk-code/connector-sdk-methods#schema
+    Args:


+    """
+    Define the update function, which is a required function, and is called by Fivetran during each sync.
+    See the technical reference documentation for more details on the update function
+    https://fivetran.com/docs/connectors/connector-sdk/technical-reference#update
+    Args:


fivetran-JenasVimal · 2026-07-01T06:18:19Z

Closing this PR due to CLA issues , will re-raise this PR in the new community connectors repo

fivetran-anushkaparashar · 2026-07-02T04:30:36Z

Please refer: #601

fivetran-davidmillman requested review from a team as code owners April 20, 2026 19:28

github-actions Bot added size/XXL PR size: extra extra large ai-assisted/unknown risk/unknown labels Apr 20, 2026

fivetran-anushkaparashar self-requested a review April 21, 2026 09:04

fivetran-rishabhghosh removed the request for review from a team April 21, 2026 20:14

fivetran-anushkaparashar requested a review from Copilot April 24, 2026 02:05

Copilot started reviewing on behalf of fivetran-anushkaparashar April 24, 2026 02:06 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

fivetran-anushkaparashar reviewed Apr 24, 2026

View reviewed changes

kellykohlleffel requested review from fivetran-anushkaparashar and fivetran-sahilkhirwal May 18, 2026 17:05

fivetran-anushkaparashar previously approved these changes May 21, 2026

View reviewed changes

fix(fivetran#573): remove tests/test_connector.py per Anushka review

5e8bc69

Removing test file to match the pattern of other tutorial connectors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kellykohlleffel dismissed fivetran-anushkaparashar’s stale review via 5e8bc69 May 21, 2026 12:26

fivetran-sahilkhirwal previously approved these changes May 22, 2026

View reviewed changes

fivetran-anushkaparashar requested changes May 26, 2026

View reviewed changes

tech writing updates

15f5d37

fivetran-dejantucakov dismissed fivetran-sahilkhirwal’s stale review via 15f5d37 May 26, 2026 09:56

README updates to align with Fivetran standards

e13c87b

fivetran-dejantucakov reviewed May 26, 2026

View reviewed changes

Comment thread all_things_ai/tutorials/databricks-fm-fhir-healthcare-intelligence/README.md Outdated

Apply suggestion from @fivetran-dejantucakov

4991523

fivetran-dejantucakov previously approved these changes May 26, 2026

View reviewed changes

Remove requirements-test.txt — no committed tests; pytest not needed

83c5b85

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kellykohlleffel dismissed fivetran-dejantucakov’s stale review via 83c5b85 May 26, 2026 12:45

kellykohlleffel requested a review from fivetran-anushkaparashar May 26, 2026 12:47

fivetran-anushkaparashar previously approved these changes May 26, 2026

View reviewed changes

Resolve merge conflict with upstream/main in README.md

3005c94

Keep FHIR connector link entry; drop the yield Note block that upstream removed from the AI tutorials section.

kellykohlleffel dismissed fivetran-anushkaparashar’s stale review via 3005c94 May 26, 2026 19:55

fivetran-anushkaparashar approved these changes May 28, 2026

View reviewed changes

fivetran-dejantucakov approved these changes Jun 2, 2026

View reviewed changes

fivetran-JenasVimal requested a review from Copilot June 29, 2026 19:35

Copilot started reviewing on behalf of fivetran-JenasVimal June 29, 2026 19:35 View session

Copilot AI reviewed Jun 29, 2026

View reviewed changes

fivetran-JenasVimal closed this Jul 1, 2026

	- Supports incremental sync using FHIR `_lastUpdated` filtering based on the previous sync timestamp
	- Retrieves the most recently updated FHIR resources ordered by `_lastUpdated`; the current sample sync does not apply previous-state incremental `_lastUpdated` filtering


		FHIR R4 servers return resources as paginated Bundle resources. The connector follows `Bundle.link` entries with `relation=next` to retrieve subsequent pages until no next link is present or the configured `max_patients` limit is reached. The next-page URL is used directly as provided by the server; query parameters are only passed on the initial request.

		Databricks SQL Statement API results may be paginated via `next_chunk_internal_link`. The connector follows these links to retrieve all rows from large `ai_query()` results.

	Databricks SQL Statement API results may be paginated via `next_chunk_internal_link`. The connector follows these links to retrieve all rows from large `ai_query()` results.
	Databricks SQL Statement API responses can be paginated via `next_chunk_internal_link`, but this tutorial connector currently reads the immediate `ai_query()` result only and does not follow chunk-pagination links. The included AI queries are therefore expected to return a single result value rather than a large multi-row result set.


		## Additional considerations

		This connector was built by David Millman (david.millman@fivetran.com) during a working session with Kelly Kohlleffel. It follows the Hybrid (Discovery + Debate) pattern established by the NOAA Weather Risk Intelligence connector (PR #570) and the FDA FAERS Pharmacovigilance Intelligence connector (PR #571). The HAPI FHIR public test server (`https://hapi.fhir.org/baseR4`) is used as the default data source and contains synthetic clinical data suitable for demonstration purposes.

Uh oh!

Conversation

fivetran-davidmillman commented Apr 20, 2026

Summary

Test plan

fivetran debug results

Uh oh!

cla-assistant Bot commented Apr 20, 2026

Uh oh!

cla-assistant Bot commented Apr 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fivetran-anushkaparashar left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kellykohlleffel commented May 18, 2026

PR #573 — All 12 reviewer threads addressed (commit be739142)

12 reviewer issues fixed

Level 3 retrofit additions (learned from NOAA/FDA/FAERS/CPSC)

`fivetran debug` results

PR #573 — All 12 reviewer threads addressed (commit `be739142`)