LPIPS metric #1474
- diffsynth/models/lpips.py: AlexNet/VGG16/SqueezeNet1.1 backbones, ScalingLayer, NetLinLayer, LPIPSModel, LPIPSCompute
- diffsynth/metrics/lpips.py: LPIPSMetric.from_pretrained(net='alex'|'vgg'|'squeeze') with auto-derived ModelConfig
- examples/image_quality_metric/lpips.py: img-vs-img and dir-vs-dir examples
- Register 3 entries in image_metrics_series + identity state_dict converter
- Numerically bit-exact with the official lpips package (verified on PerceptualSimilarity/imgs/ex_dir{0,1})
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an LPIPS (Learned Perceptual Image Patch Similarity) image-quality metric to diffsynth.metrics, including model registrations and an example script.
Changes:
- Introduce LPIPSModel + LPIPSCompute to run LPIPS on single images or stem-matched directory pairs.
- Add LPIPSMetric.from_pretrained(...) and export it from diffsynth.metrics.
- Register three LPIPS backbones (alex/vgg/squeeze) in model_configs, plus an example script.
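The PR does not show how the directory pairing is implemented, so the following is only a minimal sketch of what "stem-matched" could mean: files are paired when their names agree up to the extension. The function name `pair_by_stem`, the `IMAGE_SUFFIXES` set, and the error on an empty match are all assumptions for illustration, not the PR's `_pair_directories_by_stem`.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumed set of image extensions; the PR may accept a different set.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def pair_by_stem(dir_a, dir_b) -> List[Tuple[Path, Path]]:
    """Pair files from two directories whose names match up to the extension."""

    def index(directory) -> dict:
        # Map each image's stem ("0" for "0.png") to its full path.
        return {
            p.stem: p
            for p in sorted(Path(directory).iterdir())
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        }

    files_a, files_b = index(dir_a), index(dir_b)
    common = sorted(files_a.keys() & files_b.keys())
    if not common:
        raise ValueError(f"No stem-matched images between {dir_a} and {dir_b}")
    return [(files_a[stem], files_b[stem]) for stem in common]


# Tiny demonstration on empty placeholder files (only the names matter here).
with tempfile.TemporaryDirectory() as root:
    dir_a, dir_b = Path(root, "a"), Path(root, "b")
    dir_a.mkdir()
    dir_b.mkdir()
    for name in ("0.png", "1.png", "only_in_a.png"):
        (dir_a / name).touch()
    for name in ("0.jpg", "1.png", "notes.txt"):
        (dir_b / name).touch()
    paired_names = [(a.name, b.name) for a, b in pair_by_stem(dir_a, dir_b)]
```

In this sketch, files without a counterpart are silently skipped; whether the real helper skips, warns, or raises in that case is not stated in the PR.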
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| examples/image_quality_metric/lpips.py | Example usage downloading a dataset and computing LPIPS for image-vs-image and dir-vs-dir. |
| diffsynth/utils/state_dict_converters/image_metrics.py | Adds an LPIPS state-dict converter for ImageMetrics weights. |
| diffsynth/models/lpips.py | Implements LPIPS backbones, scaling/linear layers, and compute wrapper handling files/dirs and resizing. |
| diffsynth/metrics/lpips.py | Adds LPIPSMetric with from_pretrained integration into the model download/load flow. |
| `diffsynth/metrics/__init__.py` | Exports LPIPSMetric in the package API. |
| diffsynth/configs/model_configs.py | Registers three LPIPS model entries (alex/vgg/squeeze) with hashes and extra kwargs. |
| PR_LPIPS.md | PR documentation describing API/behavior/weights/test plan. |
| def _open_rgb(image: ImageInput) -> Image.Image: | ||
| if isinstance(image, (str, os.PathLike)): | ||
| image = Image.open(image) | ||
| if not isinstance(image, Image.Image): | ||
| raise TypeError(f"LPIPS expects PIL images or image paths, got {type(image)}.") | ||
| return image.convert("RGB") |
| def _compute_pairs(self, pairs, do_resize: bool) -> float: | ||
| scores = [] | ||
| batch_size = max(1, self.batch_size) | ||
| for start in range(0, len(pairs), batch_size): | ||
| chunk = pairs[start : start + batch_size] | ||
| xs0 = torch.stack([self._to_tensor(_open_rgb(a), do_resize) for a, _ in chunk]).to(self.device) | ||
| xs1 = torch.stack([self._to_tensor(_open_rgb(b), do_resize) for _, b in chunk]).to(self.device) | ||
| scores.append(self.model(xs0, xs1).detach().cpu()) |
| def __init__( | ||
| self, | ||
| model: LPIPSModel, | ||
| device: Union[str, torch.device] = "cpu", | ||
| batch_size: int = 16, | ||
| num_workers: int = 0, | ||
| target_size: int = 512, | ||
| ): |
| def ImageMetricsLPIPSStateDictConverter(state_dict): | ||
| return {key: state_dict[key] for key in state_dict} |
| from ..core import ModelConfig | ||
| from ..core.device.npu_compatible_device import get_device_type | ||
| from ..models.lpips import LPIPSModel, LPIPS_NET_CHOICES, LPIPSCompute |
| dataset_snapshot_download( | ||
| "DiffSynth-Studio/diffsynth_example_dataset", | ||
| allow_file_pattern=["flux/FLUX.1-dev/*", "flux2/FLUX.2-dev/*"], | ||
| local_dir="./data/diffsynth_example_dataset", | ||
| ) | ||
| metric = LPIPSMetric.from_pretrained( | ||
| net="alex", | ||
| device="cuda", | ||
| ) |
| if a_is_dir: | ||
| pairs = _pair_directories_by_stem(image_a, image_b) | ||
| sizes = set() | ||
| for path_a, path_b in pairs: | ||
| with Image.open(path_a) as ia, Image.open(path_b) as ib: | ||
| sizes.add(ia.size) | ||
| sizes.add(ib.size) | ||
| do_resize = len(sizes) > 1 | ||
| return self._compute_pairs(pairs, do_resize=do_resize) |
Code Review
This pull request introduces the LPIPS (Learned Perceptual Image Patch Similarity) image-quality metric to diffsynth.metrics, supporting AlexNet, VGG, and SqueezeNet backbones. Key feedback includes a critical issue where the model is not set to evaluation mode (.eval()), which causes non-deterministic scores due to active dropout layers during inference. Additionally, the num_workers parameter is currently unused, and it is recommended to add a force_resize option to optimize directory comparisons by skipping the expensive image size-checking loop.
| self._raw_transform = transforms.ToTensor() | ||
| self.to(device) |
The model is not set to evaluation mode (.eval()). Since LPIPSModel contains _NetLinLayer which uses nn.Dropout, running the metric in default training mode will cause dropout to randomly zero out activations during inference. This makes the LPIPS score non-deterministic and incorrect. Setting the module to evaluation mode disables dropout.
Suggested change, before:
| self._raw_transform = transforms.ToTensor() | |
| self.to(device) | |
Suggested change, after:
| self._raw_transform = transforms.ToTensor() | |
| self.to(device) | |
| self.eval() | |
| model: LPIPSModel, | ||
| device: Union[str, torch.device] = "cpu", | ||
| batch_size: int = 16, | ||
| num_workers: int = 0, |
The num_workers parameter is accepted in the constructor but is completely unused. The batch loop in _compute_pairs loads and processes images sequentially on the main thread. If parallel loading is not planned, consider removing this parameter to avoid confusion, or document that it is currently unused.
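If parallel loading is wanted rather than removing the parameter, one possible direction is a thread pool around the per-image load. This is a sketch only: `load_batch` and `load_one` are hypothetical names, and the PR's loader may need a different mechanism (for example a `DataLoader`).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def load_batch(
    items: Sequence[T],
    load_one: Callable[[T], R],
    num_workers: int = 0,
) -> List[R]:
    """Apply load_one to every item, optionally on a pool of worker threads."""
    if num_workers <= 0:
        # Same behavior as the current sequential loop.
        return [load_one(item) for item in items]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in input order, so pairs stay aligned.
        return list(pool.map(load_one, items))


# Demonstration with a trivial stand-in for "open and preprocess an image".
names = ["a.png", "b.png", "c.png"]
sequential = load_batch(names, str.upper, num_workers=0)
threaded = load_batch(names, str.upper, num_workers=2)
```

How much this helps depends on how much of the per-image cost is file I/O versus Python-level work, so it would need to be measured before being adopted.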
| def __init__( | ||
| self, | ||
| model: LPIPSModel, | ||
| device: Union[str, torch.device] = "cpu", | ||
| batch_size: int = 16, | ||
| num_workers: int = 0, | ||
| target_size: int = 512, | ||
| ): |
Comparing large directories requires opening every image file twice (once to check sizes, and once to load pixel data), which can be a significant performance bottleneck. Adding a force_resize parameter would allow users to skip the size-checking loop entirely when they already know they want to resize all images to target_size.
Suggested change, before:
| def __init__( | |
| self, | |
| model: LPIPSModel, | |
| device: Union[str, torch.device] = "cpu", | |
| batch_size: int = 16, | |
| num_workers: int = 0, | |
| target_size: int = 512, | |
| ): | |
Suggested change, after:
| def __init__( | |
| self, | |
| model: LPIPSModel, | |
| device: Union[str, torch.device] = "cpu", | |
| batch_size: int = 16, | |
| num_workers: int = 0, | |
| target_size: int = 512, | |
| force_resize: bool = False, | |
| ): | |
| if a_is_dir: | ||
| pairs = _pair_directories_by_stem(image_a, image_b) | ||
| sizes = set() | ||
| for path_a, path_b in pairs: | ||
| with Image.open(path_a) as ia, Image.open(path_b) as ib: | ||
| sizes.add(ia.size) | ||
| sizes.add(ib.size) | ||
| do_resize = len(sizes) > 1 | ||
| return self._compute_pairs(pairs, do_resize=do_resize) |
Use the force_resize attribute to skip the expensive size-checking loop when comparing directories.
Suggested change, before:
| if a_is_dir: | |
| pairs = _pair_directories_by_stem(image_a, image_b) | |
| sizes = set() | |
| for path_a, path_b in pairs: | |
| with Image.open(path_a) as ia, Image.open(path_b) as ib: | |
| sizes.add(ia.size) | |
| sizes.add(ib.size) | |
| do_resize = len(sizes) > 1 | |
| return self._compute_pairs(pairs, do_resize=do_resize) | |
Suggested change, after:
| if a_is_dir: | |
| pairs = _pair_directories_by_stem(image_a, image_b) | |
| if self.force_resize: | |
| do_resize = True | |
| else: | |
| sizes = set() | |
| for path_a, path_b in pairs: | |
| with Image.open(path_a) as ia, Image.open(path_b) as ib: | |
| sizes.add(ia.size) | |
| sizes.add(ib.size) | |
| do_resize = len(sizes) > 1 | |
| return self._compute_pairs(pairs, do_resize=do_resize) | |
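A complementary option, not part of the suggestion above, is to stop the size scan at the first mismatch instead of collecting every size. The sketch below keeps the `len(sizes) > 1` semantics (resize only when more than one distinct size exists); `needs_resize` and `read_size` are hypothetical names, and the fake size table stands in for opening real images.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Size = Tuple[int, int]


def needs_resize(
    paths: Iterable[str],
    read_size: Callable[[str], Size],
    force_resize: bool = False,
) -> bool:
    """Return True if images must be resized to a common size."""
    if force_resize:
        # Caller already decided to resize: no file needs to be opened.
        return True
    first: Optional[Size] = None
    for path in paths:
        size = read_size(path)
        if first is None:
            first = size
        elif size != first:
            # Two distinct sizes found: the answer is known, stop scanning.
            return True
    return False


# Demonstration with a fake size table instead of real image files.
fake_sizes: Dict[str, Size] = {"a": (512, 512), "b": (256, 256), "c": (512, 512)}
reads: List[str] = []


def read_size(path: str) -> Size:
    reads.append(path)  # record which "files" were opened
    return fake_sizes[path]


mixed = needs_resize(["a", "b", "c"], read_size)
reads_for_mixed = list(reads)

reads.clear()
forced = needs_resize(["a", "b", "c"], read_size, force_resize=True)
reads_for_forced = list(reads)

uniform = needs_resize(["a", "c"], read_size)
```

The early exit only saves work when sizes actually differ; a directory of uniformly sized images is still scanned in full, which is the case `force_resize` addresses.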
| def from_pretrained( | ||
| cls, | ||
| net: str = "alex", | ||
| model_config: ModelConfig = None, | ||
| device: torch.device = get_device_type(), | ||
| batch_size: int = 16, | ||
| num_workers: int = 0, | ||
| target_size: int = 512, | ||
| vram_limit: float = None, | ||
| ): |
Expose the force_resize parameter in from_pretrained to allow skipping the expensive size-checking loop.
Suggested change:
def from_pretrained(
cls,
net: str = "alex",
model_config: ModelConfig = None,
device: torch.device = get_device_type(),
batch_size: int = 16,
num_workers: int = 0,
target_size: int = 512,
vram_limit: float = None,
force_resize: bool = False,
):
| compute_model = LPIPSCompute( | ||
| model=backbone, | ||
| device=device, | ||
| batch_size=batch_size, | ||
| num_workers=num_workers, | ||
| target_size=target_size, | ||
| ) |
Pass force_resize to LPIPSCompute.
Suggested change, before:
| compute_model = LPIPSCompute( | |
| model=backbone, | |
| device=device, | |
| batch_size=batch_size, | |
| num_workers=num_workers, | |
| target_size=target_size, | |
| ) | |
Suggested change, after:
| compute_model = LPIPSCompute( | |
| model=backbone, | |
| device=device, | |
| batch_size=batch_size, | |
| num_workers=num_workers, | |
| target_size=target_size, | |
| force_resize=force_resize, | |
| ) | |
"feat(metrics): add LPIPS image quality metric
diffsynth/models/lpips.py: AlexNet/VGG16/SqueezeNet1.1 backbones, ScalingLayer, NetLinLayer, LPIPSModel, LPIPSCompute
diffsynth/metrics/lpips.py: LPIPSMetric.from_pretrained(net='alex'|'vgg'|'squeeze') with auto-derived ModelConfig
examples/image_quality_metric/lpips.py: img-vs-img and dir-vs-dir examples
Register 3 entries in image_metrics_series + identity state_dict converter
Numerically bit-exact with the official lpips package (verified on PerceptualSimilarity/imgs/ex_dir{0,1})"