Closed
Changes from 3 commits
4 changes: 4 additions & 0 deletions .gitignore
@@ -10,3 +10,7 @@ target
docker-squash.iml
**/image.tar
**/tox.tar

.cursor/*

*.tar
82 changes: 82 additions & 0 deletions README.rst
@@ -216,3 +216,85 @@ Let's confirm the image structure now:
6ee235cf4473 3 weeks ago /bin/sh -c #(nop) LABEL name=CentOS Base Imag 0 B
474c2ee77fa3 3 weeks ago /bin/sh -c #(nop) ADD file:72852fc7626d233343 196.6 MB
1544084fad81 6 months ago /bin/sh -c #(nop) MAINTAINER The CentOS Proje 0 B

Working without Docker daemon
-----------------------------

Sometimes you may want to squash an image without direct access to Docker daemon (e.g., in CI/CD pipelines,
air-gapped environments, or when Docker is not running). The ``--input-tar`` parameter allows you to process
Docker images exported as tar files without requiring a Docker daemon connection.

**Step 1**: Export the image to a tar file using ``docker save``:

::

$ docker save -o source.tar jboss/wildfly:latest

**Step 2**: Squash the image from the tar file. Let's squash the last 8 layers:

::

$ python -m docker_squash.cli --input-tar source.tar --tag jboss/wildfly:squashed -f 8 --output-path squashed.tar --load-image false

Collaborator

I think this should be run without the -f parameter, as the log output below shows the squashed image as larger than the original, which is a confusing result for a README. Also, if both the Docker squash and the tar squash examples show the same result, IMHO it's more intuitive.

Author

@vulyon vulyon Aug 20, 2025

That is because the jboss/wildfly:latest image has changed:

(base) root@master:~# docker pull jboss/wildfly:latest
latest: Pulling from jboss/wildfly
f87ff222252e: Pull complete
8116b2f7ca5a: Pull complete
0b43aea4eeb1: Pull complete
13776e8da872: Pull complete
f26d32e28c29: Pull complete
Digest: sha256:35320abafdec6d360559b411aff466514d5741c3c527221445f48246350fdfe5
Status: Downloaded newer image for jboss/wildfly:latest
docker.io/jboss/wildfly:latest

(base) root@master:~# docker history jboss/wildfly:latest
IMAGE CREATED CREATED BY SIZE COMMENT
35320abafdec 3 years ago /bin/sh -c #(nop) CMD ["/opt/jboss/wildfly/… 0B
3 years ago /bin/sh -c #(nop) EXPOSE 8080 0B
3 years ago /bin/sh -c #(nop) USER jboss 0B
3 years ago /bin/sh -c #(nop) ENV LAUNCH_JBOSS_IN_BACKG… 0B
3 years ago /bin/sh -c cd $HOME && curl -L -O https:… 270MB
3 years ago /bin/sh -c #(nop) USER root 0B
3 years ago /bin/sh -c #(nop) ENV JBOSS_HOME=/opt/jboss… 0B
3 years ago /bin/sh -c #(nop) ENV WILDFLY_SHA1=238e67f4… 0B
3 years ago /bin/sh -c #(nop) ENV WILDFLY_VERSION=25.0.… 0B
4 years ago /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/jv… 0B
4 years ago /bin/sh -c #(nop) USER jboss 0B
4 years ago /bin/sh -c yum -y install java-11-openjdk-de… 239MB
4 years ago /bin/sh -c #(nop) USER root 0B
4 years ago /bin/sh -c #(nop) MAINTAINER Marek Goldmann… 0B
4 years ago /bin/sh -c #(nop) USER jboss 0B
4 years ago /bin/sh -c #(nop) WORKDIR /opt/jboss 0B
4 years ago /bin/sh -c groupadd -r jboss -g 1000 && user… 406kB
4 years ago /bin/sh -c yum update -y && yum -y install x… 33.5MB
4 years ago /bin/sh -c #(nop) MAINTAINER Marek Goldmann… 0B
5 years ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
5 years ago /bin/sh -c #(nop) LABEL org.label-schema.sc… 0B
5 years ago /bin/sh -c #(nop) ADD file:61908381d3142ffba… 222MB

Author

I will fix README.rst.

2025-07-04 06:14:01,649 tar_image.py:83 INFO Extracting tar image from source.tar
2025-07-04 06:14:01,918 tar_image.py:102 INFO Detected OCI format image
2025-07-04 06:14:01,919 tar_image.py:254 INFO Preparing for squashing...
2025-07-04 06:14:01,919 tar_image.py:259 INFO Old image has 22 layers
2025-07-04 06:14:01,919 tar_image.py:305 INFO Will squash 8 layers
2025-07-04 06:14:01,919 tar_image.py:313 INFO Starting squashing process...
2025-07-04 06:14:01,919 image.py:750 INFO Starting squashing for /tmp/docker-squash-1strl2rh/new/squashed/layer.tar...
2025-07-04 06:14:04,001 image.py:775 INFO Squashing file '/tmp/docker-squash-1strl2rh/old/blobs/sha256/f26d32e28c292aba76defcdd67c267000d31a6ac3ebdab5c850aba90ef834927'...
2025-07-04 06:14:05,284 image.py:923 INFO Squashing finished!
2025-07-04 06:14:06,202 tar_image.py:632 WARNING OCI output format not fully implemented - creating Docker format
2025-07-04 06:14:06,202 tar_image.py:558 INFO Using user-specified tag: jboss/wildfly:squashed
2025-07-04 06:14:06,277 tar_image.py:352 INFO Squashing completed successfully
2025-07-04 06:14:06,277 tar_image.py:362 INFO Original image size: 382.24 MB
2025-07-04 06:14:06,277 tar_image.py:363 INFO Squashed image size: 421.60 MB
2025-07-04 06:14:06,277 tar_image.py:366 INFO If the squashed image is larger than original it means that there were no meaningful files to squash and it just added metadata. Are you sure you specified correct parameters?
2025-07-04 06:14:06,277 cli.py:179 INFO New squashed image ID is sha256:dbde9a2e59a3975663b55773510f36c14b5046f4ef26a84f84445d406124772d
2025-07-04 06:14:06,277 tar_image.py:732 INFO Exporting squashed image to squashed.tar
2025-07-04 06:14:07,544 tar_image.py:742 INFO Export completed successfully
2025-07-04 06:14:07,544 cli.py:195 INFO Done

**Step 3**: Load the squashed image back into Docker:

::

$ docker load -i squashed.tar
Loaded image: jboss/wildfly:squashed

Now you can verify the squashed image structure:

::

$ docker history jboss/wildfly:squashed
IMAGE CREATED CREATED BY SIZE COMMENT
9d47ef6da59f 41 seconds ago 270MB Squashed layers
<missing> 3 years ago /bin/sh -c #(nop) ENV WILDFLY_VERSION=25.0.… 0B
<missing> 4 years ago /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/jv… 0B
<missing> 4 years ago /bin/sh -c #(nop) USER jboss 0B
<missing> 4 years ago /bin/sh -c yum -y install java-11-openjdk-de… 239MB
<missing> 4 years ago /bin/sh -c #(nop) USER root 0B
<missing> 4 years ago /bin/sh -c #(nop) MAINTAINER Marek Goldmann… 0B
<missing> 4 years ago /bin/sh -c #(nop) USER jboss 0B
<missing> 4 years ago /bin/sh -c #(nop) WORKDIR /opt/jboss 0B
<missing> 4 years ago /bin/sh -c groupadd -r jboss -g 1000 && user… 406kB
<missing> 4 years ago /bin/sh -c yum update -y && yum -y install x… 33.5MB
<missing> 4 years ago /bin/sh -c #(nop) MAINTAINER Marek Goldmann… 0B
<missing> 4 years ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 4 years ago /bin/sh -c #(nop) LABEL org.label-schema.sc… 0B
<missing> 4 years ago /bin/sh -c #(nop) ADD file:61908381d3142ffba… 222MB

**Key advantages of tar mode:**

- No Docker daemon required during squashing
- Works in CI/CD pipelines and restricted environments
- Supports both Docker format and OCI format images
- Maintains complete layer history compatibility
- Can process images on systems where Docker is not installed

Collaborator

I would imagine that it's helpful when working with Podman as well.

Author

Absolutely! That's a great point. The --input-tar feature is indeed very helpful for Podman users.

Since Podman uses podman save to export images in the same tar format as docker save, users can now:

# Export image with Podman
podman save myimage:latest -o image.tar

# Squash with docker-squash (no Docker daemon required)
docker-squash --input-tar image.tar --tag myimage:squashed --output-path squashed.tar

# Import back to Podman
podman load -i squashed.tar

This workflow is particularly valuable in environments where:

  • Only Podman is available (no Docker daemon)
  • Running in CI/CD pipelines with Podman
  • Working in rootless containers or restricted environments
  • Processing images offline without any container runtime

Should I add a Podman example to the documentation to highlight this use case?


**Important notes:**

- Always use ``--tag`` parameter to avoid overwriting the original image name
- Set ``--load-image false`` if you only want to export the squashed tar file
- Use ``--output-path`` to specify where the squashed tar should be saved
- The tool automatically detects image format (Docker vs OCI) from the input tar
96 changes: 83 additions & 13 deletions docker_squash/cli.py
@@ -70,7 +70,13 @@ def run(self):
             "--version", action="version", help="Show version and exit", version=version
         )

-        parser.add_argument("image", help="Image to be squashed")
+        parser.add_argument("image", nargs="?", help="Image to be squashed")
+
+        parser.add_argument(
+            "--input-tar",
+            help="Path to tar file created by 'docker save'. Process tar file directly without requiring Docker daemon.",
+        )

Collaborator

I think we should investigate using mutually exclusive groups in argparse, as that has built-in support for requiring either the --input-tar or the image option and would avoid the manual checks below.
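
For illustration, the mutually exclusive group suggestion might look roughly like the sketch below. It is not the PR's code: `build_parser` is a hypothetical helper, and only the two arguments under discussion are included. It relies on argparse allowing a positional in such a group when it is declared with `nargs="?"`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: exactly one of `image` or `--input-tar` is accepted."""
    parser = argparse.ArgumentParser(prog="docker-squash")

    # required=True rejects the case where neither source is given;
    # the group itself rejects the case where both are given.
    source = parser.add_mutually_exclusive_group(required=True)

    # A positional can only join a mutually exclusive group if it is optional,
    # which is what nargs="?" provides.
    source.add_argument("image", nargs="?", help="Image to be squashed")
    source.add_argument(
        "--input-tar", help="Path to tar file created by 'docker save'"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--input-tar", "source.tar"])
    print(args.image, args.input_tar)
```

With this shape, argparse reports both error cases itself, so the two `parser.error(...)` checks after `parse_args()` would no longer be needed.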

Collaborator

Also, I think it's valid for output-path to be the same as input-tar (?). Should this be the default in tar mode?
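
If the output did default to the input tar, one cautious way to handle it is sketched below. Both helper names are hypothetical, and `export` stands in for any callable that writes a tar to a given path (for example something like `tar_image.export_tar_archive` from this diff). The assumption is that the export should go to a temporary file first, so a failed run never damages the input tar.

```python
import os
import tempfile
from typing import Callable, Optional


def resolve_output_path(input_tar: str, output_path: Optional[str]) -> str:
    """Fall back to the input tar when no explicit output path is given."""
    return output_path if output_path else input_tar


def export_safely(export: Callable[[str], None], target: str) -> None:
    """Write to a temporary file next to `target`, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(suffix=".tar", dir=directory)
    os.close(fd)
    try:
        export(tmp_path)
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, target)
    except BaseException:
        # Leave the existing target untouched and drop the partial file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call site could then read `export_safely(tar_image.export_tar_archive, resolve_output_path(args.input_tar, args.output_path))`, assuming the export method accepts a single path argument as it does in this diff.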

Author

Great! I have made the code changes.


         parser.add_argument(
             "-f",
             "--from-layer",
@@ -79,7 +85,7 @@ def run(self):
         parser.add_argument(
             "-t",
             "--tag",
-            help="Specify the tag to be used for the new image. If not specified no tag will be applied",
+            help="Specify the tag to be used for the squashed image (recommended). Without this, the squashed image will have no repository tags to avoid overwriting the original image.",
         )
         parser.add_argument(
             "-m",
@@ -112,24 +118,27 @@ def run(self):

         args = parser.parse_args()

+        if not args.input_tar and not args.image:
+            parser.error("Either 'image' or '--input-tar' must be specified")
+
+        if args.input_tar and args.image:
+            parser.error(
+                "Cannot specify both 'image' and '--input-tar' at the same time"
+            )

         if args.verbose:
             self.log.setLevel(logging.DEBUG)
         else:
             self.log.setLevel(logging.INFO)

         self.log.debug("Running version %s", version)

         try:
-            squash.Squash(
-                log=self.log,
-                image=args.image,
-                from_layer=args.from_layer,
-                tag=args.tag,
-                comment=args.message,
-                output_path=args.output_path,
-                load_image=args.load_image,
-                tmp_dir=args.tmp_dir,
-                cleanup=args.cleanup,
-            ).run()
+            if args.input_tar:
+                self._run_tar_mode(args)
+            else:
+                self._run_image_mode(args)

         except KeyboardInterrupt:
             self.log.error("Program interrupted by user, exiting...")
             sys.exit(1)
@@ -150,6 +159,67 @@ def run(self):

             sys.exit(1)

+    def _run_tar_mode(self, args):
+        from docker_squash.tar_image import TarImage
+
+        # Provide helpful guidance about --tag parameter
+        if not args.tag:
+            self.log.info(
+                "💡 Tip: Consider using --tag to specify a name for your squashed image"
+            )
+            self.log.info(" Example: --tag myimage:squashed")

Collaborator

Does a tag make sense for an output tar? It is probably only relevant if --load-image has been specified.

Author

I respectfully disagree with this assessment. The --tag parameter is meaningful for output tar files regardless of the --load-image setting, here's why:

Tag is part of image metadata in tar format:

  • Docker/Podman tar format stores tags in manifest.json under RepoTags field
  • This metadata becomes part of the squashed tar file

Tag is useful in all scenarios:

  1. --load-image true: Image gets loaded with the specified tag
  2. --load-image false + --output-path: The output tar contains tag metadata, so when someone later runs docker load -i squashed.tar, the image will have the proper tag
  3. Distribution: Tagged tar files are more useful when shared with others

Without --tag, the consequences are significant:

# Without tag - image loads but has no name
$ docker load -i squashed.tar
Loaded image ID: sha256:abc123...
$ docker images
REPOSITORY TAG IMAGE ID
<none> <none> sha256:abc123... # Hard to identify!

# With tag - much more usable
$ docker load -i squashed.tar
Loaded image: myapp:squashed
$ docker images
REPOSITORY TAG IMAGE ID
myapp squashed sha256:abc123... # Clear identification

The tip message encourages good practices for tar-based workflows, not just --load-image scenarios. The tag becomes part of the portable tar artifact.
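
To check which tags a tar actually carries without loading it, a small script along these lines could be used. It is a sketch that assumes the archive has a top-level `manifest.json` in the `docker save` layout (a JSON list whose entries have a `RepoTags` field); `read_repo_tags` is a hypothetical helper, not part of docker-squash.

```python
import json
import tarfile
from typing import List


def read_repo_tags(tar_path: str) -> List[str]:
    """Collect every RepoTags entry from manifest.json inside the archive."""
    with tarfile.open(tar_path) as archive:
        # Raises KeyError if the archive has no top-level manifest.json.
        member = archive.extractfile("manifest.json")
        if member is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(member)

    tags: List[str] = []
    for entry in manifest:
        # RepoTags can be null for untagged images, so treat that as empty.
        tags.extend(entry.get("RepoTags") or [])
    return tags
```

An empty result for `read_repo_tags("squashed.tar")` would correspond to the untagged case shown above, where the image loads as `<none>`.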


+        tar_image = TarImage(
+            log=self.log,
+            tar_path=args.input_tar,
+            from_layer=args.from_layer,
+            tmp_dir=args.tmp_dir,
+            tag=args.tag,
+            comment=args.message,
+        )
+
+        try:
+            new_image_id = tar_image.squash()
+            self.log.info("New squashed image ID is %s" % new_image_id)
+
+            if args.output_path:
+                tar_image.export_tar_archive(args.output_path)
+
+            if args.load_image:
+                tar_image.load_squashed_image()
+
+            if not args.output_path and not args.load_image:
+                import os
+                import tempfile
+
+                temp_output = os.path.join(
+                    tempfile.gettempdir(), f"squashed-{new_image_id[:12]}.tar"
+                )
+                tar_image.export_tar_archive(temp_output)
+                self.log.info(
+                    "Since no output path was specified and loading to Docker was disabled, "
+                    f"the squashed image has been saved to: {temp_output}"
+                )
+
+            self.log.info("Done")
+
+        finally:
+            if not args.tmp_dir:
+                tar_image.cleanup()
+
+    def _run_image_mode(self, args):
+        squash.Squash(
+            log=self.log,
+            image=args.image,
+            from_layer=args.from_layer,
+            tag=args.tag,
+            comment=args.message,
+            output_path=args.output_path,
+            load_image=args.load_image,
+            tmp_dir=args.tmp_dir,
+            cleanup=args.cleanup,
+        ).run()
+

 def run():
     cli = CLI()