From 84f2c122aba58d26845f90c3bbc4f70863361ea3 Mon Sep 17 00:00:00 2001
From: Matthias Bernt
Date: Fri, 1 Jul 2022 11:29:26 +0200
Subject: [PATCH 1/6] add documentation how to avoid data modification by tools

---
 doc/source/admin/production.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/doc/source/admin/production.md b/doc/source/admin/production.md
index 3ba2f84e7123..176c8e9d93d1 100644
--- a/doc/source/admin/production.md
+++ b/doc/source/admin/production.md
@@ -152,6 +152,25 @@ To get started with setting up local data, please see [Data Integration](https:/
 
 File sizes have grown very large thanks to rapidly advancing sequencer technology, and it is not always practical to upload these files through the browser. Thankfully, a simple solution is to allow Galaxy users to upload them via FTP and import those files in to their histories. Configuration for FTP is explained on the [File Upload via FTP](special_topics/ftp.md) page.
 
+### Protect Galaxy against data loss due to misbehaving tools
+
+Tools have access to the paths of input and output data sets which are stored in
+``file_path`` and by default the credentials used for running tools are the same
+as for running Galaxy. Thus its possible that tools modify data in Galaxy's
+``file_path``. Examples for such changes are:
+
+- Addition of additional files, e.g. indices, which is a problem for cleaning up data, because Galaxy does not know about these files.
+- Removal of input or output files of the tools. This will create problems with other tools using these data sets (note that most tool repositories use CI tests to to avoid this, but the problem may still occur).
+
+Note that the tool only knows the paths to inputs and outputs, but if using the default configuration for other paths (e.g. configuration directory) also these paths are easily accessible.
+
+There are two approaches to protect Galaxy against this:
+
+- Use different credentials for running tools. This can be configured using the ``real_system_username`` config variable.
+- Configure Galaxy to run jobs in a container and enable ``outputs_to_working_directory``. Then the tool will in an environment that allows write access only for the job working dir. All other paths will be accessible read only.
+
+For both more information can be found in the [job configuration](jobs.md) documentatiion and see also [using a compute cluster](cluster.md).
+
 ## Advanced configuration
 
 ### Load balancing and web application scaling

From d4473a9939bc2156c97bb2ab7522df4d66fc6f23 Mon Sep 17 00:00:00 2001
From: M Bernt
Date: Fri, 1 Jul 2022 11:38:00 +0200
Subject: [PATCH 2/6] Update doc/source/admin/production.md

Co-authored-by: Marius van den Beek
---
 doc/source/admin/production.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/source/admin/production.md b/doc/source/admin/production.md
index 176c8e9d93d1..a1eb91edf7ea 100644
--- a/doc/source/admin/production.md
+++ b/doc/source/admin/production.md
@@ -164,10 +164,11 @@ as for running Galaxy. Thus its possible that tools modify data in Galaxy's
 
 Note that the tool only knows the paths to inputs and outputs, but if using the default configuration for other paths (e.g. configuration directory) also these paths are easily accessible.
 
-There are two approaches to protect Galaxy against this:
+There are three approaches to protect Galaxy against this:
 
 - Use different credentials for running tools. This can be configured using the ``real_system_username`` config variable.
 - Configure Galaxy to run jobs in a container and enable ``outputs_to_working_directory``. Then the tool will in an environment that allows write access only for the job working dir. All other paths will be accessible read only.
+- Use pulsar to stage inputs and outputs
 
 For both more information can be found in the [job configuration](jobs.md) documentatiion and see also [using a compute cluster](cluster.md).
 
From 99935b64efd8ec587e08b7946963952d237750f9 Mon Sep 17 00:00:00 2001
From: M Bernt
Date: Fri, 1 Jul 2022 12:44:12 +0200
Subject: [PATCH 3/6] Apply suggestions from code review

Co-authored-by: Helena
---
 doc/source/admin/production.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/source/admin/production.md b/doc/source/admin/production.md
index a1eb91edf7ea..cdd1a80ee664 100644
--- a/doc/source/admin/production.md
+++ b/doc/source/admin/production.md
@@ -159,7 +159,7 @@ Tools have access to the paths of input and output data sets which are stored in
 as for running Galaxy. Thus its possible that tools modify data in Galaxy's
 ``file_path``. Examples for such changes are:
 
-- Addition of additional files, e.g. indices, which is a problem for cleaning up data, because Galaxy does not know about these files.
+- Creation of additional files, e.g. indices, which is a problem for cleaning up data, because Galaxy does not know about these files.
 - Removal of input or output files of the tools. This will create problems with other tools using these data sets (note that most tool repositories use CI tests to to avoid this, but the problem may still occur).
 
 Note that the tool only knows the paths to inputs and outputs, but if using the default configuration for other paths (e.g. configuration directory) also these paths are easily accessible.
@@ -170,7 +170,7 @@ There are three approaches to protect Galaxy against this:
 - Configure Galaxy to run jobs in a container and enable ``outputs_to_working_directory``. Then the tool will in an environment that allows write access only for the job working dir. All other paths will be accessible read only.
 - Use pulsar to stage inputs and outputs
 
-For both more information can be found in the [job configuration](jobs.md) documentatiion and see also [using a compute cluster](cluster.md).
+More information on pulsar configuration can be found in the [job configuration](jobs.md) documentation, and the other two are explained in [using a compute cluster](cluster.md).
 
 ## Advanced configuration
 

From fe10f76aba446a12ac237d671a39ee260a581bc7 Mon Sep 17 00:00:00 2001
From: M Bernt
Date: Fri, 1 Jul 2022 14:07:36 +0200
Subject: [PATCH 4/6] Update doc/source/admin/production.md

Co-authored-by: Martin Cech
---
 doc/source/admin/production.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/admin/production.md b/doc/source/admin/production.md
index cdd1a80ee664..f40eee952b74 100644
--- a/doc/source/admin/production.md
+++ b/doc/source/admin/production.md
@@ -167,7 +167,7 @@ Note that the tool only knows the paths to inputs and outputs, but if using the
 There are three approaches to protect Galaxy against this:
 
 - Use different credentials for running tools. This can be configured using the ``real_system_username`` config variable.
-- Configure Galaxy to run jobs in a container and enable ``outputs_to_working_directory``. Then the tool will in an environment that allows write access only for the job working dir. All other paths will be accessible read only.
+- Configure Galaxy to run jobs in a container and enable ``outputs_to_working_directory``. Then the tool will execute in an environment that allows write access only for the job working dir. All other paths will be accessible read only.
 - Use pulsar to stage inputs and outputs
 
 More information on pulsar configuration can be found in the [job configuration](jobs.md) documentation, and the other two are explained in [using a compute cluster](cluster.md).
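The revised bullet in [PATCH 4/6] says that a containerized job with ``outputs_to_working_directory`` enabled may write only to its job working directory, while all other paths are read only. Below is a minimal Python sketch of how an admin might spot-check that property from inside a test job. The helpers `writable_paths` and `isolation_ok` are hypothetical (they are not part of Galaxy), and the sketch assumes the job's current directory is its working directory. `os.access` returns `False` for missing paths and for paths on a read-only mount; because root bypasses ordinary permission bits, a job running as root will only see read-only mounts as non-writable.

```python
import os
from typing import Dict, Iterable


def writable_paths(paths: Iterable[str]) -> Dict[str, bool]:
    """Map each path to whether the current process may write to it.

    os.access() checks the real UID/GID and returns False for paths that
    do not exist or that live on a read-only mount (e.g. an ``:ro`` bind mount).
    """
    return {path: os.access(path, os.W_OK) for path in paths}


def isolation_ok(working_dir: str, protected: Iterable[str]) -> bool:
    """True if working_dir is writable and none of the protected paths is."""
    # Materialize the iterable once and ignore the working dir if it was listed.
    others = [p for p in protected if p != working_dir]
    status = writable_paths([working_dir, *others])
    return status[working_dir] and not any(status[p] for p in others)
```

For example, a test job could evaluate `isolation_ok(os.getcwd(), ["/path/to/file_path"])`, where the placeholder stands for the configured ``file_path``; a `True` result means that, among the paths checked, only the working directory was writable.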
From a75e17e081e3c50007921c13abb4840e90f1165e Mon Sep 17 00:00:00 2001
From: Matthias Bernt
Date: Fri, 1 Jul 2022 17:33:13 +0200
Subject: [PATCH 5/6] move to new top level category

---
 doc/source/admin/index.rst     |  1 +
 doc/source/admin/production.md | 20 --------------------
 doc/source/admin/security.md   | 21 +++++++++++++++++++++
 3 files changed, 22 insertions(+), 20 deletions(-)
 create mode 100644 doc/source/admin/security.md

diff --git a/doc/source/admin/index.rst b/doc/source/admin/index.rst
index efeebf35776f..3f8c582cb4bf 100644
--- a/doc/source/admin/index.rst
+++ b/doc/source/admin/index.rst
@@ -11,6 +11,7 @@ This documentation is in the midst of being ported and unified based on resource
    config
    config_logging
    production
+   security
    nginx
    apache
    scaling
diff --git a/doc/source/admin/production.md b/doc/source/admin/production.md
index f40eee952b74..3ba2f84e7123 100644
--- a/doc/source/admin/production.md
+++ b/doc/source/admin/production.md
@@ -152,26 +152,6 @@ To get started with setting up local data, please see [Data Integration](https:/
 
 File sizes have grown very large thanks to rapidly advancing sequencer technology, and it is not always practical to upload these files through the browser. Thankfully, a simple solution is to allow Galaxy users to upload them via FTP and import those files in to their histories. Configuration for FTP is explained on the [File Upload via FTP](special_topics/ftp.md) page.
 
-### Protect Galaxy against data loss due to misbehaving tools
-
-Tools have access to the paths of input and output data sets which are stored in
-``file_path`` and by default the credentials used for running tools are the same
-as for running Galaxy. Thus its possible that tools modify data in Galaxy's
-``file_path``. Examples for such changes are:
-
-- Creation of additional files, e.g. indices, which is a problem for cleaning up data, because Galaxy does not know about these files.
-- Removal of input or output files of the tools. This will create problems with other tools using these data sets (note that most tool repositories use CI tests to to avoid this, but the problem may still occur).
-
-Note that the tool only knows the paths to inputs and outputs, but if using the default configuration for other paths (e.g. configuration directory) also these paths are easily accessible.
-
-There are three approaches to protect Galaxy against this:
-
-- Use different credentials for running tools. This can be configured using the ``real_system_username`` config variable.
-- Configure Galaxy to run jobs in a container and enable ``outputs_to_working_directory``. Then the tool will execute in an environment that allows write access only for the job working dir. All other paths will be accessible read only.
-- Use pulsar to stage inputs and outputs
-
-More information on pulsar configuration can be found in the [job configuration](jobs.md) documentation, and the other two are explained in [using a compute cluster](cluster.md).
-
 ## Advanced configuration
 
 ### Load balancing and web application scaling
diff --git a/doc/source/admin/security.md b/doc/source/admin/security.md
new file mode 100644
index 000000000000..7ce27dc80013
--- /dev/null
+++ b/doc/source/admin/security.md
@@ -0,0 +1,21 @@
+# Security considerations
+
+### Protect Galaxy against data loss due to misbehaving tools
+
+Tools have access to the paths of input and output data sets which are stored in
+``file_path`` and by default the credentials used for running tools are the same
+as for running Galaxy. Thus its possible that tools modify data in Galaxy's
+``file_path``. Examples for such changes are:
+
+- Creation of additional files, e.g. indices, which is a problem for cleaning up data, because Galaxy does not know about these files.
+- Removal of input or output files of the tools. This will create problems with other tools using these data sets (note that most tool repositories use CI tests to to avoid this, but the problem may still occur).
+
+Note that the tool only knows the paths to inputs and outputs, but if using the default configuration for other paths (e.g. configuration directory) also these paths are easily accessible.
+
+There are three approaches to protect Galaxy against this:
+
+- Use different credentials for running tools. This can be configured using the ``real_system_username`` config variable.
+- Configure Galaxy to run jobs in a container and enable ``outputs_to_working_directory``. Then the tool will execute in an environment that allows write access only for the job working dir. All other paths will be accessible read only.
+- Use pulsar to stage inputs and outputs
+
+More information on pulsar configuration can be found in the [job configuration](jobs.md) documentation, and the other two are explained in [using a compute cluster](cluster.md).

From ed5bdcefec1cf7f6e128a9d55dc75d0dbc32545e Mon Sep 17 00:00:00 2001
From: M Bernt
Date: Mon, 4 Jul 2022 14:27:29 +0200
Subject: [PATCH 6/6] Apply suggestions from code review

Co-authored-by: Nicola Soranzo
---
 doc/source/admin/security.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/source/admin/security.md b/doc/source/admin/security.md
index 7ce27dc80013..564cd590f88d 100644
--- a/doc/source/admin/security.md
+++ b/doc/source/admin/security.md
@@ -4,18 +4,18 @@
 
 Tools have access to the paths of input and output data sets which are stored in
 ``file_path`` and by default the credentials used for running tools are the same
-as for running Galaxy. Thus its possible that tools modify data in Galaxy's
-``file_path``. Examples for such changes are:
+as for running Galaxy. Thus it is possible that a tool modifies data in Galaxy's
+``file_path``. Examples of such potential changes are:
 
 - Creation of additional files, e.g. indices, which is a problem for cleaning up data, because Galaxy does not know about these files.
-- Removal of input or output files of the tools. This will create problems with other tools using these data sets (note that most tool repositories use CI tests to to avoid this, but the problem may still occur).
+- Removal of tool input or output files. This will create problems with other tools using these datasets (note that most tool repositories use CI tests to to avoid this, but the problem may still occur).
 
-Note that the tool only knows the paths to inputs and outputs, but if using the default configuration for other paths (e.g. configuration directory) also these paths are easily accessible.
+Note that a tool only knows the paths to its inputs and outputs, but if using the default configuration for other paths (e.g. the configuration directory) also these paths can be calculated and accessed.
 
-There are three approaches to protect Galaxy against this:
+There are three approaches to protect Galaxy against these risks:
 
 - Use different credentials for running tools. This can be configured using the ``real_system_username`` config variable.
 - Configure Galaxy to run jobs in a container and enable ``outputs_to_working_directory``. Then the tool will execute in an environment that allows write access only for the job working dir. All other paths will be accessible read only.
-- Use pulsar to stage inputs and outputs
+- Use [Pulsar](https://pulsar.readthedocs.io/) to stage inputs and outputs.
 
 More information on pulsar configuration can be found in the [job configuration](jobs.md) documentation, and the other two are explained in [using a compute cluster](cluster.md).
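The final text of ``security.md`` notes that a tool may create additional files, e.g. indices, in ``file_path`` that Galaxy does not know about. One rough way an admin might look for such leftovers is sketched below in Python. It is a heuristic under stated assumptions, not a Galaxy feature: it assumes the default disk object store naming, where a dataset is stored as ``dataset_<id>.dat`` (or ``dataset_<uuid>.dat``) and its extra files live in a ``dataset_<id>_files`` directory, and it assumes ``_metadata_files`` holds metadata that should be skipped. The function name `find_unexpected_files` and the skip list are hypothetical; adjust both to the object store actually configured.

```python
import os
import re
from typing import List

# Assumed naming of Galaxy's default disk object store (adjust if different):
# datasets are dataset_<id or uuid>.dat, extra files live in dataset_<id or uuid>_files/.
DATASET_FILE = re.compile(r"^dataset_[0-9A-Za-z-]+\.dat$")
EXTRA_FILES_DIR = re.compile(r"^dataset_[0-9A-Za-z-]+_files$")
# Hypothetical skip list: directory names never descended into, at any depth.
SKIP_DIRS = {"_metadata_files"}


def find_unexpected_files(file_path: str) -> List[str]:
    """Return files under file_path whose names do not match the dataset pattern."""
    unexpected = []
    for root, dirs, files in os.walk(file_path):
        # Prune in place so os.walk does not descend into extra-files
        # directories or skipped trees, whose contents are not plain datasets.
        dirs[:] = [d for d in dirs if not EXTRA_FILES_DIR.match(d) and d not in SKIP_DIRS]
        for name in files:
            if not DATASET_FILE.match(name):
                unexpected.append(os.path.join(root, name))
    return sorted(unexpected)
```

A BAM index written next to its input by a tool, such as ``dataset_1.dat.bai``, would be reported, while ``dataset_1.dat`` itself would not. Anything reported should still be cross-checked against Galaxy's database before deletion, since an unexpected name does not prove a file is orphaned.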