IgM assay analyser

IgM binding array processing pipeline for the Binder lab, from raw input CSV files to quantification of feature importance, together with the code for the analysis presented in the paper Deroissart et al., XXXX

IgM processing pipeline

Running the pipeline

On HPC

To perform a test run, clone the repository and run the following command from an environment with Snakemake installed (tested with Snakemake 8).

snakemake --profile <your_HPC_profile.sm>

On PC

Coming later.

Setup

To prepare the run you need to:

specify the project config path and adjust the cluster settings to your HPC in config/pipeline_config.yml
specify the paths and the run-specific parameters in the project config file. An example config file with explanations is provided in test.

The pipeline can run in two modes, with or without preprocessing (from individual samples with technical replicates to QN, i.e. quantile-normalised, files), controlled by the flag skip_preprocessing. Set the flag to True if the only goal is to repeat the downstream analysis with different parameters or to reproduce the downstream analysis on two cohorts merged together. To provide a custom QN file (for example, one batch-corrected from two runs), set the flag use_custom_QN and give the path to the QN file in custom_QN_file.

By default the pipeline separates samples by sex for the downstream analysis. Make sure that the metadata file contains a Sex column that specifies Male or Female for each patient. If other value groups are used, please modify this at line 117 of the snakemake file.
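
The flag rules above can be checked before a run is submitted. The following is a minimal sketch, not part of the pipeline: it assumes the project config has already been loaded into a Python dict with skip_preprocessing, use_custom_QN and custom_QN_file as top-level keys, and that the metadata rows are dicts keyed by column name. The function names check_run_config and check_sex_column are hypothetical.

```python
def check_run_config(cfg):
    """Return a list of problems with the run-mode flags of a project config.

    Assumption: the three keys below sit at the top level of the config.
    """
    problems = []
    for flag in ("skip_preprocessing", "use_custom_QN"):
        if not isinstance(cfg.get(flag, False), bool):
            problems.append(f"{flag} should be True or False")
    # A custom QN file is only used when use_custom_QN is switched on.
    if cfg.get("use_custom_QN") is True and not cfg.get("custom_QN_file"):
        problems.append("use_custom_QN is set but custom_QN_file is empty")
    return problems


def check_sex_column(rows, allowed=("Male", "Female")):
    """Return the values in the Sex column that are neither Male nor Female.

    Empty cells are reported too, since they would not fall into either group.
    """
    if rows and "Sex" not in rows[0]:
        raise KeyError("metadata has no Sex column")
    return sorted({row["Sex"] for row in rows} - set(allowed))
```

An empty list from both functions means the flags are consistent and every sample falls into one of the two expected groups.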

Metadata description

Annotation table: a comma-separated list of samples (identical to the sample names) with columns providing information about the tissue and organism (not used in the pipeline, but relevant for potential downstream analysis). An example annotation table is provided in test/config
Metadata table: the metadata table should start with six fixed columns of sample annotation: Sample_full_ID (the sample ID, same as in the file names and the annotation table), followed by columns describing the patient's enrollment in the cohort: Record ID, Event Name, IRB Protocol, Date of Study Enrollment, Age at Consent. This information is not used downstream; however, the pipeline skips these columns, so it is important to keep them (even as empty columns). The remaining columns contain numeric and categorical metadata. If no information is available for a given sample, leave that field empty. Make sure that the column names are R-friendly and do not contain any special characters. The file should be comma-separated. An example is provided in test/meta
Peptide annotation table: for each peptide coordinate, prepare a tab-separated file containing the following information:

row, column (see the input data description)
Coordinate (row_column)
sequence (unique amino acid sequence of each peptide)
library (which peptide library is profiled - here we expect to have two peptide libraries)
is_duplicate (yes/no column. It is recommended to include several peptides more than once as controls. If a peptide is added a second time, mark it with yes; the first instance of each peptide should be marked no)
sequence_label (used for plotting)

An example is provided in test/meta.
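
The table rules above lend themselves to a quick check before a run. The following is a minimal sketch under stated assumptions, not part of the pipeline: the peptide table headers are assumed to be spelled exactly as in the list above, and "R-friendly" is approximated as a name that starts with a letter and contains only letters, digits, dots and underscores. The function names are hypothetical.

```python
import csv
import io
import re

# Number of fixed leading metadata columns; later columns hold the metadata.
FIXED_COLUMNS = 6
# Assumed approximation of an "R-friendly" column name.
R_FRIENDLY = re.compile(r"^[A-Za-z][A-Za-z0-9._]*$")
# Assumed header spelling for the peptide annotation table.
PEPTIDE_COLUMNS = ["row", "column", "Coordinate", "sequence",
                   "library", "is_duplicate", "sequence_label"]


def check_metadata_header(header):
    """Return problems found in the metadata column names."""
    problems = []
    if len(header) < FIXED_COLUMNS or header[0] != "Sample_full_ID":
        problems.append("expected Sample_full_ID followed by five enrollment columns")
    # Only the columns after the fixed block need R-friendly names.
    for name in header[FIXED_COLUMNS:]:
        if not R_FRIENDLY.match(name):
            problems.append(f"column name is not R-friendly: {name!r}")
    return problems


def check_peptide_annotation(handle):
    """Return problems found in a tab-separated peptide annotation table."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in PEPTIDE_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    seen = set()
    for line_no, rec in enumerate(reader, start=2):  # line 1 is the header
        expected = f"{rec['row']}_{rec['column']}"
        if rec["Coordinate"] != expected:
            problems.append(f"line {line_no}: Coordinate should be {expected}")
        # The first occurrence of a sequence must be "no", repeats must be "yes".
        wanted = "yes" if rec["sequence"] in seen else "no"
        if rec["is_duplicate"] != wanted:
            problems.append(f"line {line_no}: is_duplicate should be {wanted}")
        seen.add(rec["sequence"])
    return problems
```

Both functions return an empty list when nothing is wrong, so they can be called on the header of the metadata file and on an open handle to the peptide annotation file before the pipeline is started.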

Input data description

CSV files named .csv are the processed output from the ImageJ software. Each CSV file contains two technical replicates, with rows and columns corresponding to the coordinates of the plate in the assay. Example data is provided in test/data

Analysis steps

Quality control and assessment of technical replicates
Quantile normalisation
Quantification of relationships between the data and clinical parameters (correlations, confounding analysis)
Differential analysis based on the metadata column
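
For readers unfamiliar with the quantile normalisation step, the following is a minimal sketch of the textbook procedure in plain Python. It illustrates the general technique only and is not the pipeline's implementation; the function name quantile_normalise and the input layout (one list of intensities per sample, all of equal length) are assumptions.

```python
def quantile_normalise(columns):
    """Quantile-normalise samples given as equally long lists of intensities.

    Every sample ends up with the same value distribution: the k-th smallest
    value of each sample is replaced by the mean of the k-th smallest values
    across all samples. Ties are broken by position rather than averaged.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples need the same number of features")
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalised = []
    for col in columns:
        # Feature indices ordered from the smallest to the largest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised
```

After normalisation each sample keeps the rank order of its features, while differences in overall intensity distribution between samples are removed.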

Downstream data analysis

In the paper, three separate cohorts were used: the CAVA cohort, profiled in two batches, and the BIKE cohorts for plasma and aorta plaque samples (matched). The two batches of the CAVA cohort were processed independently, then the quantile-normalised files were corrected for batch effect as shown in downstream_analysis/XX.ipynb. The resulting file was re-analysed within the pipeline using the custom_QN_file option. For some downstream steps of the BIKE cohort, we repeated the quantile normalisation on the combined array across tissues: downstream_analysis/BIKE_normalization_QN_across_tissues.ipynb
