A simple tool to convert basic biobank data to the BBMRI-ERIC Locator FHIR profiles. This tool was built by providing a custom source to the BBMRI Federated Platform Converter (bbmri-fp-etl). The BBMRI-FP-ETL generator has a MIABIS model implemented as a middle layer and currently two possible destinations, that is, FHIR JSON and OMOP CDM CSV.
It is, essentially, an ETL script that:
- Extracts/reads data from a file in csv/xlsx format.
- Transforms data first into a MIABIS-compliant structure, then into FHIR JSON bundles based on project-specific profiles.
- Optionally loads the FHIR JSON bundles into the selected Blaze FHIR server.
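The three steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the column names (`patient_id`, `sex`), the intermediate dict, and the function names are all hypothetical, and the real converter goes through a MIABIS model and project-specific profiles rather than a bare `Patient` resource.

```python
import csv
import io
import json


def extract(csv_text):
    """Extract: read rows from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_intermediate(row):
    """Transform, step 1: map a raw row to a neutral intermediate record.

    The real tool uses a MIABIS model here; this dict only stands in for it.
    Column names are hypothetical.
    """
    return {"donor_id": row["patient_id"].strip(), "sex": row["sex"].strip().lower()}


def to_fhir_bundle(records):
    """Transform, step 2: wrap records as Patient resources in a transaction Bundle."""
    entries = []
    for rec in records:
        patient = {"resourceType": "Patient", "id": rec["donor_id"], "gender": rec["sex"]}
        entries.append({
            "resource": patient,
            "request": {"method": "PUT", "url": f"Patient/{rec['donor_id']}"},
        })
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


# Two made-up rows standing in for a minimal dataset file.
sample = "patient_id,sex\nP001,Female\nP002,male\n"
bundle = to_fhir_bundle([to_intermediate(r) for r in extract(sample)])
print(json.dumps(bundle, indent=2))
```

Keeping the intermediate record separate from the FHIR mapping mirrors the middle-layer idea described above: a second destination format would only need another `to_...` function over the same records.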
NOTICE: The current version creates FHIR resources that follow the BBMRI Locator DE-version Profiles. The BBMRI-ERIC ecosystem is transitioning towards a different set of FHIR Profiles (MIABIS on FHIR), which this toolkit does not yet support. If you are part of BBMRI-ERIC, please align with the Federated Platform Task Force or HQ-CS IT before using this application to create new interoperable Federated Platform datasets.
The input files needed by the code are:
- minimal dataset in csv or xlsx format. See xlsx tabs for information about the fields and allowed values.
- biobank information in a yaml file
- collection information in a yaml file
- optional FHIR server url in a yaml file
When it finishes, the tool prints a table summarizing the resources by type. In particular, it displays:
- the resources read from input
- the FHIR resources created
- (when upload is enabled) the FHIR resources available in the server
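A per-type summary of this kind could be computed along these lines. The sketch below uses only the standard library and made-up counts; the tool's own table layout and the way it counts input rows are not shown in this document, so `count_by_type` and `format_summary` are hypothetical helpers.

```python
from collections import Counter


def count_by_type(bundle):
    """Count the resources in a FHIR Bundle, grouped by resourceType."""
    return Counter(e["resource"]["resourceType"] for e in bundle.get("entry", []))


def format_summary(read_counts, created_counts):
    """Render a plain-text table with one row per resource type."""
    types = sorted(set(read_counts) | set(created_counts))
    lines = [f"{'type':<12}{'read':>8}{'created':>10}"]
    for t in types:
        lines.append(f"{t:<12}{read_counts.get(t, 0):>8}{created_counts.get(t, 0):>10}")
    return "\n".join(lines)


# Illustrative data only.
bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "P001"}},
        {"resource": {"resourceType": "Patient", "id": "P002"}},
        {"resource": {"resourceType": "Specimen", "id": "S001"}},
    ],
}
created = count_by_type(bundle)
read = {"Patient": 2, "Specimen": 1}
print(format_summary(read, created))
```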
The JSON files containing the FHIR resources are found in the specified output directories. The default directories are:
- output_organizations: the biobank and the collection resources
- output_patients: the Patient related resources (Patient, Condition, Specimen, SampleDiagnosis)
The files are ready to be uploaded to the Blaze server, unless this was already done by enabling the "--upload_to_blaze" flag.
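For a manual upload, one common approach with FHIR servers is to POST a transaction Bundle to the server's base URL. The sketch below assumes the generated files are transaction or batch Bundles and uses `http://localhost:8080/fhir` purely as a placeholder base URL; neither detail is confirmed by this document. The helper names are hypothetical.

```python
import json
import urllib.request


def build_upload_request(base_url, bundle):
    """Build (but do not send) a POST of a Bundle to the FHIR base URL."""
    body = json.dumps(bundle).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/"),
        data=body,
        headers={"Content-Type": "application/fhir+json"},
        method="POST",
    )


def upload(base_url, bundle):
    """Send the Bundle and return the HTTP status code (requires a running server)."""
    with urllib.request.urlopen(build_upload_request(base_url, bundle)) as resp:
        return resp.status


# Placeholder URL and an empty Bundle, for illustration only; nothing is sent here.
bundle = {"resourceType": "Bundle", "type": "transaction", "entry": []}
request = build_upload_request("http://localhost:8080/fhir/", bundle)
print(request.get_method(), request.full_url)
```

Splitting request construction from sending makes it possible to inspect what would be uploaded without a server running.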
Clone the repository
git clone https://github.com/crs4/Biobank2Locator.git
or, if you use SSH:
git clone git@github.com:crs4/Biobank2Locator.git
Create a Python virtual environment using your favourite environment manager (venv, virtualenv, Anaconda, Mamba, Poetry, etc.).
For instance, get micromamba:
curl -kL https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj -C ./ --strip-components=1 bin/micromamba
Create a new environment:
micromamba create -n bb2loc python=3.14
micromamba activate bb2loc
Install the dependencies:
micromamba install fhirclient openpyxl pandas python-dateutil pydantic pytz PyYAML Requests tabulate uvicorn
If some of them fail to install, install those with pip instead:
pip install --upgrade pip
pip install packagethatfailed
If all of them fail:
pip install -r requirements.txt
To see all the available flags:
python bims_source.py --help
| flag | meaning | default |
|---|---|---|
| inputdir | input dir for all files | input |
| organizations_outputdir | output dir for organizations files | output_organizations |
| patients_outputdir | output dir for patients files | output_patients |
| minimal_dataset_input_file | filename for minimal dataset | minimal-dataset-with-consent-template.csv |
| minimal_dataset_type_of_input | type of file for minimal dataset | csv |
| biobank_yaml_config_file | biobank config filename | biobank_config.yaml |
| collection_yaml_config_file | collection config filename | collection_config.yaml |
| upload_to_blaze | whether to upload the created resources to the configured Blaze FHIR server | False |
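For orientation, the flags and defaults in the table could be declared with `argparse` roughly as below. This is a sketch reconstructed from the table, not the parser in `bims_source.py`; in particular, restricting the input type to `csv` and `excel` is an assumption based on the usage examples in this document.

```python
import argparse


def build_parser():
    """Declare the flags from the table above with their documented defaults."""
    parser = argparse.ArgumentParser(description="Convert biobank data to FHIR JSON")
    parser.add_argument("--inputdir", default="input")
    parser.add_argument("--organizations_outputdir", default="output_organizations")
    parser.add_argument("--patients_outputdir", default="output_patients")
    parser.add_argument(
        "--minimal_dataset_input_file",
        default="minimal-dataset-with-consent-template.csv",
    )
    # Assumption: only these two values are accepted.
    parser.add_argument(
        "--minimal_dataset_type_of_input", default="csv", choices=["csv", "excel"]
    )
    parser.add_argument("--biobank_yaml_config_file", default="biobank_config.yaml")
    parser.add_argument("--collection_yaml_config_file", default="collection_config.yaml")
    # A bare switch: absent means False, present means True.
    parser.add_argument("--upload_to_blaze", action="store_true")
    return parser


defaults = build_parser().parse_args([])
excel_run = build_parser().parse_args(
    ["--minimal_dataset_input_file", "minimal-dataset-template.xlsx",
     "--minimal_dataset_type_of_input", "excel"]
)
print(defaults.upload_to_blaze, excel_run.minimal_dataset_type_of_input)
```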
To read the minimal dataset from an Excel file without uploading the created FHIR resources to a server:
python bims_source.py --minimal_dataset_input_file minimal-dataset-template.xlsx --minimal_dataset_type_of_input excel
To read from a CSV file and upload to the FHIR server:
python bims_source.py --minimal_dataset_input_file minimal-dataset-template.csv --minimal_dataset_type_of_input csv --upload_to_blaze
This work has been partially supported by the Projects: Strengthening BBMRI.it, funded by Next Generation EU _ Italian NRRP IR0000031 _ CUP B53C22001820006; ToPMa (G.A. RC_CRP_077) and XDATA funded by the Sardinian Regional Authority.