From 2f6a1f3263d0d43502a3255afd287bb28e6e3815 Mon Sep 17 00:00:00 2001
From: yuze <StevenSun@sjtu.edu.cn>
Date: Sun, 31 May 2026 10:31:11 +0800
Subject: [PATCH 1/7] feat(metrics): add LPIPS image quality metric

  - diffsynth/models/lpips.py: AlexNet/VGG16/SqueezeNet1.1 backbones, ScalingLayer, NetLinLayer, LPIPSModel, LPIPSCompute
  - diffsynth/metrics/lpips.py: LPIPSMetric.from_pretrained(net='alex'|'vgg'|'squeeze') with auto-derived ModelConfig
  - examples/image_quality_metric/lpips.py: img-vs-img and dir-vs-dir examples
  - Register 3 entries in image_metrics_series + identity state_dict converter
  - Matches the official lpips package to within 1e-7 absolute difference (verified on PerceptualSimilarity/imgs/ex_dir{0,1})
---
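Note on directory pairing: the dir-vs-dir path pairs files by filename stem, so `dog.png` and `dog.jpg` are compared and unmatched files are skipped. The snippet below is only a minimal standard-library sketch of that idea, not the code in this patch; `pair_by_stem`, `demo_pairs`, and the reduced extension set are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path

# Reduced extension set, for illustration only (the patch lists more).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def pair_by_stem(dir_a, dir_b):
    """Return (path_a, path_b) pairs whose filename stems match, sorted by stem."""

    def index(directory):
        stems = {}
        for path in sorted(Path(directory).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                # Keep the first file seen for each stem.
                stems.setdefault(path.stem, path)
        return stems

    a, b = index(dir_a), index(dir_b)
    return [(a[s], b[s]) for s in sorted(a.keys() & b.keys())]


def demo_pairs():
    """Build two throwaway directories and report which file names get paired."""
    with tempfile.TemporaryDirectory() as root:
        dir_a, dir_b = Path(root, "a"), Path(root, "b")
        dir_a.mkdir()
        dir_b.mkdir()
        for name in ("dog.png", "cat.png", "notes.txt"):
            (dir_a / name).touch()
        for name in ("dog.jpg", "bird.jpg"):
            (dir_b / name).touch()
        return [(pa.name, pb.name) for pa, pb in pair_by_stem(dir_a, dir_b)]
```

Here only the `dog` stem appears in both directories, so `cat.png`, `bird.jpg`, and the non-image `notes.txt` are left out of the comparison.
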
 PR_LPIPS.md                                   | 108 ++++++
 diffsynth/configs/model_configs.py            |  24 ++
 diffsynth/metrics/__init__.py                 |   2 +
 diffsynth/metrics/lpips.py                    |  65 ++++
 diffsynth/models/lpips.py                     | 353 ++++++++++++++++++
 .../state_dict_converters/image_metrics.py    |   4 +
 examples/image_quality_metric/lpips.py        |  24 ++
 test.sh                                       |   9 +
 8 files changed, 589 insertions(+)
 create mode 100644 PR_LPIPS.md
 create mode 100644 diffsynth/metrics/lpips.py
 create mode 100644 diffsynth/models/lpips.py
 create mode 100644 examples/image_quality_metric/lpips.py
 create mode 100755 test.sh
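
Note on the score itself: LPIPSModel.forward unit-normalizes each layer's features along the channel axis, squares the difference, applies a bias-free 1x1 convolution (one weight per channel), averages over spatial positions, and sums over layers. The following is a pure-Python sketch of that arithmetic on toy data, assuming features are given as lists of pixels and that the weights are made up; the function names are hypothetical, and the real model runs on tensors with trained weights.

```python
import math


def unit_normalize(feat, eps=1e-10):
    """Scale each pixel's channel vector to (approximately) unit L2 norm."""
    out = []
    for pixel in feat:
        norm = math.sqrt(sum(v * v for v in pixel))
        out.append([v / (norm + eps) for v in pixel])
    return out


def layer_distance(feat0, feat1, weights):
    """Weighted squared difference of normalized features, averaged over pixels.

    `weights` holds one value per channel and plays the role of the
    bias-free 1x1 convolution.
    """
    n0, n1 = unit_normalize(feat0), unit_normalize(feat1)
    per_pixel = [
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, p0, p1))
        for p0, p1 in zip(n0, n1)
    ]
    return sum(per_pixel) / len(per_pixel)


def lpips_distance(feats0, feats1, layer_weights):
    """Sum the per-layer distances, one term per feature layer."""
    return sum(
        layer_distance(f0, f1, w)
        for f0, f1, w in zip(feats0, feats1, layer_weights)
    )


# Toy input: one layer, one pixel, two channels, both weights set to 1.
same = lpips_distance([[[1.0, 0.0]]], [[[1.0, 0.0]]], [[1.0, 1.0]])
orthogonal = lpips_distance([[[1.0, 0.0]]], [[[0.0, 1.0]]], [[1.0, 1.0]])
rescaled = lpips_distance([[[1.0, 0.0]]], [[[3.0, 0.0]]], [[1.0, 1.0]])
```

Because of the channel normalization, rescaling a feature vector leaves the distance essentially unchanged, while identical inputs give exactly zero.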

diff --git a/PR_LPIPS.md b/PR_LPIPS.md
new file mode 100644
index 000000000..24ee9b616
--- /dev/null
+++ b/PR_LPIPS.md
@@ -0,0 +1,108 @@
+# Add LPIPS image-quality metric
+
+## Summary
+
+Adds **LPIPS** (Learned Perceptual Image Patch Similarity, [Zhang et al. CVPR 2018](https://arxiv.org/abs/1801.03924)) to `diffsynth.metrics`, alongside the existing FID / CLIP / Aesthetic / PickScore / ImageReward / HPSv2 / HPSv3 metrics. Reference implementation: [richzhang/PerceptualSimilarity](https://github.com/richzhang/PerceptualSimilarity).
+
+Three backbone variants (`alex` / `vgg` / `squeeze`) are supported and selectable through a single `net=...` flag — the matching `safetensors` weight file is auto-resolved when no `model_config` is given.
+
+## Files
+
+### New
+
+| File | Purpose |
+|------|---------|
+| `diffsynth/models/lpips.py` | Self-contained backbones (AlexNet / VGG16 / SqueezeNet1.1 features), `ScalingLayer`, `NetLinLayer`, top-level `LPIPSModel`, and `LPIPSCompute` (handles file/dir input, stem matching, conditional resize). No `torchvision.models` weight fetch — the registered safetensors carry every parameter. |
+| `diffsynth/metrics/lpips.py` | `LPIPSMetric.from_pretrained(net, ...)` matching the existing `FIDMetric` shape. Auto-derives the `ModelConfig` and `model_pool.fetch_model(...)` name from `net`. |
+| `examples/image_quality_metric/lpips.py` | Example covering both `img-vs-img` and `dir-vs-dir` calls on the existing FLUX example dataset. |
+
+### Modified
+
+| File | Change |
+|------|--------|
+| `diffsynth/metrics/__init__.py` | Export `LPIPSMetric` |
+| `diffsynth/configs/model_configs.py` | Three new entries in `image_metrics_series` (one per backbone), each with `extra_kwargs={"net": ...}` |
+| `diffsynth/utils/state_dict_converters/image_metrics.py` | Add `ImageMetricsLPIPSStateDictConverter` (identity converter — the uploaded safetensors already match `LPIPSModel.state_dict()`) |
+
+No other files changed; conda environment, other metrics, README, and docs are untouched.
+
+## Public API
+
+```python
+from diffsynth.metrics import LPIPSMetric
+
+# Default: alex backbone, file = LPIPS/alexnet.safetensors (~9.9 MB)
+metric = LPIPSMetric.from_pretrained(net="alex", device="cuda")
+
+# img vs img -> single float
+score = metric.compute("a.png", "b.png")
+
+# dir vs dir -> mean over filename-stem-matched pairs (float)
+score = metric.compute("./dir_a", "./dir_b")
+```
+
+Other supported kwargs: `net="vgg"|"squeeze"`, `target_size=512`, `batch_size=16`, `num_workers=0`, plus an optional explicit `model_config=ModelConfig(...)` to override the default weight file.
+
+## Behavior
+
+**`compute(image_a, image_b)`** dispatches by input type:
+
+| Both inputs | Behavior |
+|-------------|----------|
+| Image files / `PIL.Image` | If sizes match → no resize. If sizes differ → `Resize(target_size, BICUBIC)` + `CenterCrop(target_size)` (consistent with `diffsynth.models.image_reward`'s pattern). Returns a single float. |
+| Directories | Pair by filename stem (e.g. `dog.png` ↔ `dog.jpg` match; orphan files are ignored). If **all** matched images across both dirs share the same `(H, W)` → no resize; otherwise every pair is resized. Returns the mean LPIPS over matched pairs. |
+| Mixed (one file, one dir) | `ValueError` |
+
+After `ToTensor`, values are clamped to `[0, 1]` before being mapped to the official `[-1, 1]` LPIPS input range — this guards against BICUBIC overshoot (other metrics in this repo also use BICUBIC; FID and ImageReward do not clamp, but LPIPS is sensitive to out-of-range inputs because `ScalingLayer` applies a per-channel mean/std).
+
+## Weights (uploaded to ModelScope)
+
+The three weight files are committed under `DiffSynth-Studio/ImageMetrics/LPIPS/` on ModelScope. Each one is a complete LPIPS state dict — `net.slice{1..N}.*` (backbone), `scaling_layer.shift/scale` (ImageNet color buffers), and 5 or 7 `lin{i}.model.1.weight` 1×1 conv weights — produced by combining the official torchvision ImageNet checkpoints with the LPIPS lin-layer weights from `richzhang/PerceptualSimilarity`'s `lpips/weights/v0.1/`.
+
+| File | Size | Hash (md5) | `model_name` |
+|------|------|------------|--------------|
+| `LPIPS/alexnet.safetensors` | ~9.9 MB | `08a75c660c9b2e775c530a0955857f1f` | `image_metrics_lpips_alex` |
+| `LPIPS/vgg.safetensors` | ~58.9 MB | `5740953aaa8aba2ecd9b9c23da813591` | `image_metrics_lpips_vgg` |
+| `LPIPS/squeezenet.safetensors` | ~2.9 MB | `ff994b70a30599287a332105396d5004` | `image_metrics_lpips_squeeze` |
+
+## Consistency with existing metrics
+
+- `LPIPSMetric` subclasses the same `Metric` base used by every other metric, and uses the standard `download_and_load_models` → `model_pool.fetch_model(...)` flow.
+- `from_pretrained(...)` follows the FID / CLIP signature shape: optional `model_config`, `device`, `vram_limit`, plus metric-specific kwargs.
+- All three backbones are registered in `image_metrics_series` with the same shape as the FID entry, just differentiated by `extra_kwargs={"net": ...}`.
+- The example file mirrors `examples/image_quality_metric/fid.py` (download via `dataset_snapshot_download`, then `metric.compute(...)`).
+
+## Test plan
+
+Tests run inside the user-provided `compound` conda env on CPU (login node had no GPU); the code path is device-agnostic.
+
+- [x] Numerical parity vs official `lpips` package on `PerceptualSimilarity/imgs/ex_dir{0,1}` (64×64, no resize):
+
+  | net | DiffSynth (mean) | Official `lpips` | abs diff |
+  |-----|------------------|-------------------|----------|
+  | alex | 0.429723 | 0.429723 | 6.7e-08 |
+  | vgg  | 0.495139 | 0.495139 | 1.5e-08 |
+  | squeeze | 0.429475 | 0.429475 | 6.0e-08 |
+
+  Per-pair img-vs-img scores agree to six decimal places (the absolute difference prints as `0.000000`) for all 6 comparisons (3 nets × 2 pairs).
+
+- [x] State dict cross-check: every common key between the new safetensors and `lpips.LPIPS(net=...).state_dict()` is `torch.equal`-identical (alex 17/17, vgg 33/33, squeeze 59/59 keys; the only `lins.*` keys missing are `nn.ModuleList` aliases that point at the same tensors).
+
+- [x] `LPIPSModel.load_state_dict(...)` reports `0` missing and `0` unexpected keys for all three weight files.
+
+- [x] `model_pool.auto_load_model(...)` correctly identifies and loads the right backbone by hash for all three files.
+
+- [x] Behavioral edge cases:
+  - Same image vs itself → `0.0` (alex, exact)
+  - Different-sized images → BICUBIC resize path runs, returns a sensible non-zero score
+  - Mixed-size directory pair → all images are resized, returns mean
+  - Stem matching `dog.png` ↔ `dog.jpg` works
+  - Mixed input (one file, one directory) → `ValueError`
+
+- [x] Example script `examples/image_quality_metric/lpips.py` runs end-to-end (`alex` backbone, FLUX dataset). The `dir-vs-dir` score is `0.0000` because the `flux/FLUX.1-dev` and `flux2/FLUX.2-dev` example dirs contain byte-identical images (the FID example shows the same near-zero behavior); the `img-vs-img` call between two distinct images returns a sensible non-zero score.
+
+## Out of scope
+
+- README / `docs/.../Image-Quality-Metrics.md` table updates — left for a docs-only follow-up.
+- LPIPS as a training loss — only the inference metric path is added.
+- Resize strategies beyond center-crop + 512×512 BICUBIC — a single `target_size` knob covers the use cases requested.
diff --git a/diffsynth/configs/model_configs.py b/diffsynth/configs/model_configs.py
index 86619d611..a80050959 100644
--- a/diffsynth/configs/model_configs.py
+++ b/diffsynth/configs/model_configs.py
@@ -1071,6 +1071,30 @@
         "model_class": "diffsynth.models.fid.FIDInceptionModel",
         "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsFIDStateDictConverter",
     },
+    {
+        # Example: ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/alexnet.safetensors")
+        "model_hash": "08a75c660c9b2e775c530a0955857f1f",
+        "model_name": "image_metrics_lpips_alex",
+        "model_class": "diffsynth.models.lpips.LPIPSModel",
+        "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter",
+        "extra_kwargs": {"net": "alex"},
+    },
+    {
+        # Example: ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/vgg.safetensors")
+        "model_hash": "5740953aaa8aba2ecd9b9c23da813591",
+        "model_name": "image_metrics_lpips_vgg",
+        "model_class": "diffsynth.models.lpips.LPIPSModel",
+        "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter",
+        "extra_kwargs": {"net": "vgg"},
+    },
+    {
+        # Example: ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/squeezenet.safetensors")
+        "model_hash": "ff994b70a30599287a332105396d5004",
+        "model_name": "image_metrics_lpips_squeeze",
+        "model_class": "diffsynth.models.lpips.LPIPSModel",
+        "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter",
+        "extra_kwargs": {"net": "squeeze"},
+    },
 ]
 
 hidream_o1_image_series = [
diff --git a/diffsynth/metrics/__init__.py b/diffsynth/metrics/__init__.py
index f555d3a7a..b816911d4 100644
--- a/diffsynth/metrics/__init__.py
+++ b/diffsynth/metrics/__init__.py
@@ -6,6 +6,7 @@
 from .hpsv2 import HPSv2Metric
 from .hpsv3 import HPSv3Metric
 from .image_reward import ImageRewardMetric
+from .lpips import LPIPSMetric
 from .pickscore import PickScoreMetric
 
 
@@ -19,4 +20,5 @@
     "CLIPMetric",
     "AestheticMetric",
     "FIDMetric",
+    "LPIPSMetric",
 ]
diff --git a/diffsynth/metrics/lpips.py b/diffsynth/metrics/lpips.py
new file mode 100644
index 000000000..9bf014d5b
--- /dev/null
+++ b/diffsynth/metrics/lpips.py
@@ -0,0 +1,65 @@
+import torch
+
+from ..core import ModelConfig
+from ..core.device.npu_compatible_device import get_device_type
+from ..models.lpips import LPIPSModel, LPIPS_NET_CHOICES, LPIPSCompute
+from .base import Metric
+
+
+_LPIPS_DEFAULT_FILES = {
+    "alex": "LPIPS/alexnet.safetensors",
+    "vgg": "LPIPS/vgg.safetensors",
+    "squeeze": "LPIPS/squeezenet.safetensors",
+}
+
+_LPIPS_MODEL_NAMES = {
+    "alex": "image_metrics_lpips_alex",
+    "vgg": "image_metrics_lpips_vgg",
+    "squeeze": "image_metrics_lpips_squeeze",
+}
+
+
+class LPIPSMetric(Metric):
+    def __init__(self, model: LPIPSCompute):
+        super().__init__()
+        self.model = model
+
+    @classmethod
+    def from_pretrained(
+        cls,
+        net: str = "alex",
+        model_config: ModelConfig = None,
+        device: torch.device = get_device_type(),
+        batch_size: int = 16,
+        num_workers: int = 0,
+        target_size: int = 512,
+        vram_limit: float = None,
+    ):
+        if net not in LPIPS_NET_CHOICES:
+            raise ValueError(f"net must be one of {LPIPS_NET_CHOICES}, got {net!r}")
+        if model_config is None:
+            model_config = ModelConfig(
+                model_id="DiffSynth-Studio/ImageMetrics",
+                origin_file_pattern=_LPIPS_DEFAULT_FILES[net],
+            )
+        model_pool = cls.download_and_load_models([model_config], torch_dtype=torch.float32, device=device, vram_limit=vram_limit)
+        backbone = model_pool.fetch_model(_LPIPS_MODEL_NAMES[net])
+        if backbone is None:
+            raise RuntimeError(
+                f"Failed to load LPIPS model for net={net!r}. The provided weights do not match the registered hash for {_LPIPS_MODEL_NAMES[net]}."
+            )
+        compute_model = LPIPSCompute(
+            model=backbone,
+            device=device,
+            batch_size=batch_size,
+            num_workers=num_workers,
+            target_size=target_size,
+        )
+        return cls(compute_model)
+
+    @torch.no_grad()
+    def compute(self, image_a, image_b) -> float:
+        return self.model.compute(image_a, image_b)
+
+    def forward(self, image_a, image_b):
+        return self.compute(image_a, image_b)
diff --git a/diffsynth/models/lpips.py b/diffsynth/models/lpips.py
new file mode 100644
index 000000000..45650df9f
--- /dev/null
+++ b/diffsynth/models/lpips.py
@@ -0,0 +1,353 @@
+import os
+from collections import defaultdict
+from pathlib import Path
+from typing import Union
+
+import torch
+import torch.nn as nn
+from PIL import Image
+from torchvision import transforms
+
+ImageInput = Union[str, os.PathLike, Image.Image]
+
+IMAGE_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".pgm", ".png", ".ppm", ".tif", ".tiff", ".webp"}
+
+LPIPS_NET_CHOICES = ("alex", "vgg", "squeeze")
+
+
+def _list_image_files(path: Union[str, os.PathLike]):
+    path = os.fspath(path)
+    if not os.path.isdir(path):
+        raise ValueError(f"Expected a directory for LPIPS, got: {path}")
+    files = []
+    for entry in sorted(os.listdir(path)):
+        full = os.path.join(path, entry)
+        if os.path.isfile(full) and os.path.splitext(entry)[1].lower() in IMAGE_EXTENSIONS:
+            files.append(full)
+    if not files:
+        raise ValueError(f"No images found under {path}.")
+    return files
+
+
+def _pair_directories_by_stem(dir_a, dir_b):
+    files_a = _list_image_files(dir_a)
+    files_b = _list_image_files(dir_b)
+    by_stem_a = defaultdict(list)
+    for f in files_a:
+        by_stem_a[Path(f).stem].append(f)
+    by_stem_b = defaultdict(list)
+    for f in files_b:
+        by_stem_b[Path(f).stem].append(f)
+    common = sorted(set(by_stem_a.keys()) & set(by_stem_b.keys()))
+    if not common:
+        raise ValueError(f"No matching filename stems between {dir_a} and {dir_b}.")
+    pairs = []
+    for stem in common:
+        pairs.append((sorted(by_stem_a[stem])[0], sorted(by_stem_b[stem])[0]))
+    return pairs
+
+
+def _open_rgb(image: ImageInput) -> Image.Image:
+    if isinstance(image, (str, os.PathLike)):
+        image = Image.open(image)
+    if not isinstance(image, Image.Image):
+        raise TypeError(f"LPIPS expects PIL images or image paths, got {type(image)}.")
+    return image.convert("RGB")
+
+
+class _AlexFeatures(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.slice1 = nn.Sequential()
+        self.slice2 = nn.Sequential()
+        self.slice3 = nn.Sequential()
+        self.slice4 = nn.Sequential()
+        self.slice5 = nn.Sequential()
+        self.slice1.add_module("0", nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2))
+        self.slice1.add_module("1", nn.ReLU(inplace=True))
+        self.slice2.add_module("2", nn.MaxPool2d(kernel_size=3, stride=2))
+        self.slice2.add_module("3", nn.Conv2d(64, 192, kernel_size=5, padding=2))
+        self.slice2.add_module("4", nn.ReLU(inplace=True))
+        self.slice3.add_module("5", nn.MaxPool2d(kernel_size=3, stride=2))
+        self.slice3.add_module("6", nn.Conv2d(192, 384, kernel_size=3, padding=1))
+        self.slice3.add_module("7", nn.ReLU(inplace=True))
+        self.slice4.add_module("8", nn.Conv2d(384, 256, kernel_size=3, padding=1))
+        self.slice4.add_module("9", nn.ReLU(inplace=True))
+        self.slice5.add_module("10", nn.Conv2d(256, 256, kernel_size=3, padding=1))
+        self.slice5.add_module("11", nn.ReLU(inplace=True))
+
+    def forward(self, x):
+        h1 = self.slice1(x)
+        h2 = self.slice2(h1)
+        h3 = self.slice3(h2)
+        h4 = self.slice4(h3)
+        h5 = self.slice5(h4)
+        return [h1, h2, h3, h4, h5]
+
+
+class _VGG16Features(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.slice1 = nn.Sequential()
+        self.slice2 = nn.Sequential()
+        self.slice3 = nn.Sequential()
+        self.slice4 = nn.Sequential()
+        self.slice5 = nn.Sequential()
+        cfg = [
+            (1, 0, nn.Conv2d(3, 64, 3, padding=1)),
+            (1, 1, nn.ReLU(inplace=True)),
+            (1, 2, nn.Conv2d(64, 64, 3, padding=1)),
+            (1, 3, nn.ReLU(inplace=True)),
+            (2, 4, nn.MaxPool2d(2, 2)),
+            (2, 5, nn.Conv2d(64, 128, 3, padding=1)),
+            (2, 6, nn.ReLU(inplace=True)),
+            (2, 7, nn.Conv2d(128, 128, 3, padding=1)),
+            (2, 8, nn.ReLU(inplace=True)),
+            (3, 9, nn.MaxPool2d(2, 2)),
+            (3, 10, nn.Conv2d(128, 256, 3, padding=1)),
+            (3, 11, nn.ReLU(inplace=True)),
+            (3, 12, nn.Conv2d(256, 256, 3, padding=1)),
+            (3, 13, nn.ReLU(inplace=True)),
+            (3, 14, nn.Conv2d(256, 256, 3, padding=1)),
+            (3, 15, nn.ReLU(inplace=True)),
+            (4, 16, nn.MaxPool2d(2, 2)),
+            (4, 17, nn.Conv2d(256, 512, 3, padding=1)),
+            (4, 18, nn.ReLU(inplace=True)),
+            (4, 19, nn.Conv2d(512, 512, 3, padding=1)),
+            (4, 20, nn.ReLU(inplace=True)),
+            (4, 21, nn.Conv2d(512, 512, 3, padding=1)),
+            (4, 22, nn.ReLU(inplace=True)),
+            (5, 23, nn.MaxPool2d(2, 2)),
+            (5, 24, nn.Conv2d(512, 512, 3, padding=1)),
+            (5, 25, nn.ReLU(inplace=True)),
+            (5, 26, nn.Conv2d(512, 512, 3, padding=1)),
+            (5, 27, nn.ReLU(inplace=True)),
+            (5, 28, nn.Conv2d(512, 512, 3, padding=1)),
+            (5, 29, nn.ReLU(inplace=True)),
+        ]
+        for slice_idx, orig_idx, module in cfg:
+            getattr(self, f"slice{slice_idx}").add_module(str(orig_idx), module)
+
+    def forward(self, x):
+        h1 = self.slice1(x)
+        h2 = self.slice2(h1)
+        h3 = self.slice3(h2)
+        h4 = self.slice4(h3)
+        h5 = self.slice5(h4)
+        return [h1, h2, h3, h4, h5]
+
+
+class _Fire(nn.Module):
+    def __init__(self, in_channels, squeeze_channels, expand1x1_channels, expand3x3_channels):
+        super().__init__()
+        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
+        self.squeeze_activation = nn.ReLU(inplace=True)
+        self.expand1x1 = nn.Conv2d(squeeze_channels, expand1x1_channels, kernel_size=1)
+        self.expand1x1_activation = nn.ReLU(inplace=True)
+        self.expand3x3 = nn.Conv2d(squeeze_channels, expand3x3_channels, kernel_size=3, padding=1)
+        self.expand3x3_activation = nn.ReLU(inplace=True)
+
+    def forward(self, x):
+        x = self.squeeze_activation(self.squeeze(x))
+        return torch.cat(
+            [
+                self.expand1x1_activation(self.expand1x1(x)),
+                self.expand3x3_activation(self.expand3x3(x)),
+            ],
+            dim=1,
+        )
+
+
+class _SqueezeNet11Features(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.slice1 = nn.Sequential()
+        self.slice2 = nn.Sequential()
+        self.slice3 = nn.Sequential()
+        self.slice4 = nn.Sequential()
+        self.slice5 = nn.Sequential()
+        self.slice6 = nn.Sequential()
+        self.slice7 = nn.Sequential()
+        self.slice1.add_module("0", nn.Conv2d(3, 64, kernel_size=3, stride=2))
+        self.slice1.add_module("1", nn.ReLU(inplace=True))
+        self.slice2.add_module("2", nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))
+        self.slice2.add_module("3", _Fire(64, 16, 64, 64))
+        self.slice2.add_module("4", _Fire(128, 16, 64, 64))
+        self.slice3.add_module("5", nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))
+        self.slice3.add_module("6", _Fire(128, 32, 128, 128))
+        self.slice3.add_module("7", _Fire(256, 32, 128, 128))
+        self.slice4.add_module("8", nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))
+        self.slice4.add_module("9", _Fire(256, 48, 192, 192))
+        self.slice5.add_module("10", _Fire(384, 48, 192, 192))
+        self.slice6.add_module("11", _Fire(384, 64, 256, 256))
+        self.slice7.add_module("12", _Fire(512, 64, 256, 256))
+
+    def forward(self, x):
+        h1 = self.slice1(x)
+        h2 = self.slice2(h1)
+        h3 = self.slice3(h2)
+        h4 = self.slice4(h3)
+        h5 = self.slice5(h4)
+        h6 = self.slice6(h5)
+        h7 = self.slice7(h6)
+        return [h1, h2, h3, h4, h5, h6, h7]
+
+
+_NET_CONFIG = {
+    "alex": {"factory": _AlexFeatures, "channels": (64, 192, 384, 256, 256)},
+    "vgg": {"factory": _VGG16Features, "channels": (64, 128, 256, 512, 512)},
+    "squeeze": {"factory": _SqueezeNet11Features, "channels": (64, 128, 256, 384, 384, 512, 512)},
+}
+
+
+class _ScalingLayer(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.register_buffer("shift", torch.tensor([-0.030, -0.088, -0.188]).view(1, 3, 1, 1))
+        self.register_buffer("scale", torch.tensor([0.458, 0.448, 0.450]).view(1, 3, 1, 1))
+
+    def forward(self, x):
+        return (x - self.shift) / self.scale
+
+
+class _NetLinLayer(nn.Module):
+    def __init__(self, chn_in, use_dropout=True):
+        super().__init__()
+        layers = []
+        if use_dropout:
+            layers.append(nn.Dropout())
+        layers.append(nn.Conv2d(chn_in, 1, kernel_size=1, stride=1, padding=0, bias=False))
+        self.model = nn.Sequential(*layers)
+
+    def forward(self, x):
+        return self.model(x)
+
+
+def _normalize_tensor(x, eps=1e-10):
+    norm = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True))
+    return x / (norm + eps)
+
+
+def _spatial_average(x):
+    return x.mean(dim=(2, 3), keepdim=True)
+
+
+class LPIPSModel(nn.Module):
+    def __init__(self, net: str = "alex", use_dropout: bool = True):
+        super().__init__()
+        if net not in _NET_CONFIG:
+            raise ValueError(f"net must be one of {LPIPS_NET_CHOICES}, got {net!r}")
+        self.net_name = net
+        self.scaling_layer = _ScalingLayer()
+        self.net = _NET_CONFIG[net]["factory"]()
+        chns = _NET_CONFIG[net]["channels"]
+        for i, chn in enumerate(chns):
+            setattr(self, f"lin{i}", _NetLinLayer(chn, use_dropout=use_dropout))
+        self.num_layers = len(chns)
+        for p in self.parameters():
+            p.requires_grad = False
+
+    def forward(self, in0, in1):
+        in0 = self.scaling_layer(in0)
+        in1 = self.scaling_layer(in1)
+        feats0 = self.net(in0)
+        feats1 = self.net(in1)
+        val = 0
+        for i in range(self.num_layers):
+            diff = (_normalize_tensor(feats0[i]) - _normalize_tensor(feats1[i])) ** 2
+            lin = getattr(self, f"lin{i}")
+            val = val + _spatial_average(lin(diff))
+        return val.view(-1)
+
+
+class LPIPSCompute(nn.Module):
+    def __init__(
+        self,
+        model: LPIPSModel,
+        device: Union[str, torch.device] = "cpu",
+        batch_size: int = 16,
+        num_workers: int = 0,
+        target_size: int = 512,
+    ):
+        super().__init__()
+        self.model = model
+        self.batch_size = batch_size
+        self.num_workers = num_workers
+        self.target_size = target_size
+        self._resize_transform = transforms.Compose(
+            [
+                transforms.Resize(target_size, interpolation=transforms.InterpolationMode.BICUBIC),
+                transforms.CenterCrop(target_size),
+                transforms.ToTensor(),
+            ]
+        )
+        self._raw_transform = transforms.ToTensor()
+        self.to(device)
+
+    @property
+    def device(self):
+        try:
+            return next(self.model.parameters()).device
+        except StopIteration:
+            return torch.device("cpu")
+
+    def _to_tensor(self, image: Image.Image, do_resize: bool) -> torch.Tensor:
+        transform = self._resize_transform if do_resize else self._raw_transform
+        x = transform(image).clamp(0.0, 1.0) * 2.0 - 1.0
+        return x
+
+    @torch.no_grad()
+    def _compute_pair(self, img_a: Image.Image, img_b: Image.Image, do_resize: bool) -> float:
+        x0 = self._to_tensor(img_a, do_resize).unsqueeze(0).to(self.device)
+        x1 = self._to_tensor(img_b, do_resize).unsqueeze(0).to(self.device)
+        return float(self.model(x0, x1).item())
+
+    @torch.no_grad()
+    def _compute_pairs(self, pairs, do_resize: bool) -> float:
+        scores = []
+        batch_size = max(1, self.batch_size)
+        for start in range(0, len(pairs), batch_size):
+            chunk = pairs[start : start + batch_size]
+            xs0 = torch.stack([self._to_tensor(_open_rgb(a), do_resize) for a, _ in chunk]).to(self.device)
+            xs1 = torch.stack([self._to_tensor(_open_rgb(b), do_resize) for _, b in chunk]).to(self.device)
+            scores.append(self.model(xs0, xs1).detach().cpu())
+        merged = torch.cat(scores, dim=0)
+        return float(merged.mean().item())
+
+    @staticmethod
+    def _is_dir(value) -> bool:
+        return isinstance(value, (str, os.PathLike)) and os.path.isdir(os.fspath(value))
+
+    @staticmethod
+    def _is_image_input(value) -> bool:
+        if isinstance(value, Image.Image):
+            return True
+        if isinstance(value, (str, os.PathLike)):
+            return os.path.isfile(os.fspath(value))
+        return False
+
+    def compute(self, image_a, image_b) -> float:
+        a_is_dir = self._is_dir(image_a)
+        b_is_dir = self._is_dir(image_b)
+        if a_is_dir != b_is_dir:
+            raise ValueError("LPIPS.compute requires both inputs to be directories or both to be single images.")
+
+        if a_is_dir:
+            pairs = _pair_directories_by_stem(image_a, image_b)
+            sizes = set()
+            for path_a, path_b in pairs:
+                with Image.open(path_a) as ia, Image.open(path_b) as ib:
+                    sizes.add(ia.size)
+                    sizes.add(ib.size)
+            do_resize = len(sizes) > 1
+            return self._compute_pairs(pairs, do_resize=do_resize)
+
+        if not (self._is_image_input(image_a) and self._is_image_input(image_b)):
+            raise ValueError("LPIPS.compute inputs must be image paths, PIL images, or directories.")
+        img_a = _open_rgb(image_a)
+        img_b = _open_rgb(image_b)
+        do_resize = img_a.size != img_b.size
+        return self._compute_pair(img_a, img_b, do_resize=do_resize)
+
+    def forward(self, image_a, image_b):
+        return self.compute(image_a, image_b)
diff --git a/diffsynth/utils/state_dict_converters/image_metrics.py b/diffsynth/utils/state_dict_converters/image_metrics.py
index 30c8b55a3..d781edd62 100644
--- a/diffsynth/utils/state_dict_converters/image_metrics.py
+++ b/diffsynth/utils/state_dict_converters/image_metrics.py
@@ -76,6 +76,10 @@ def ImageMetricsFIDStateDictConverter(state_dict):
     return {"model." + key: state_dict[key] for key in state_dict if not key.startswith("fc.")}
 
 
+def ImageMetricsLPIPSStateDictConverter(state_dict):
+    return {key: state_dict[key] for key in state_dict}
+
+
 def ImageMetricsHPSv3StateDictConverter(state_dict):
     converted = {}
     for key in state_dict:
diff --git a/examples/image_quality_metric/lpips.py b/examples/image_quality_metric/lpips.py
new file mode 100644
index 000000000..7f1341c5e
--- /dev/null
+++ b/examples/image_quality_metric/lpips.py
@@ -0,0 +1,24 @@
+from diffsynth.metrics import LPIPSMetric
+from modelscope import dataset_snapshot_download
+
+dataset_snapshot_download(
+    "DiffSynth-Studio/diffsynth_example_dataset",
+    allow_file_pattern=["flux/FLUX.1-dev/*", "flux2/FLUX.2-dev/*"],
+    local_dir="./data/diffsynth_example_dataset",
+)
+metric = LPIPSMetric.from_pretrained(
+    net="alex",
+    device="cuda",
+)
+
+score = metric.compute(
+    "./data/diffsynth_example_dataset/flux/FLUX.1-dev/1.jpg",
+    "./data/diffsynth_example_dataset/flux/FLUX.1-dev/2.jpg",
+)
+print(f"LPIPS score (image vs image): {score:.4f}")
+
+score = metric.compute(
+    "./data/diffsynth_example_dataset/flux/FLUX.1-dev",
+    "./data/diffsynth_example_dataset/flux2/FLUX.2-dev",
+)
+print(f"LPIPS score (dir vs dir): {score:.4f}")
diff --git a/test.sh b/test.sh
new file mode 100755
index 000000000..e66ae226d
--- /dev/null
+++ b/test.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+export PYTHONPATH=$(pwd):$PYTHONPATH
+
+NGPUS=${NGPUS:-1}
+
+srun --partition=medai_p --mpi=pmi2 --gres=gpu:${NGPUS} --quotatype=reserved \
+     -n1 --ntasks-per-node=1 --cpus-per-task=8 \
+     --job-name=lpips --kill-on-bad-exit=1 \
+     python examples/image_quality_metric/lpips.py

From 29d8b793a2aedf2595ad8b7180748425b67c1261 Mon Sep 17 00:00:00 2001
From: yuze <StevenSun@sjtu.edu.cn>
Date: Mon, 1 Jun 2026 17:01:02 +0800
Subject: [PATCH 2/7] chore: remove test.sh

---
 test.sh | 9 ---------
 1 file changed, 9 deletions(-)
 delete mode 100755 test.sh

diff --git a/test.sh b/test.sh
deleted file mode 100755
index e66ae226d..000000000
--- a/test.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-#!/bin/bash
-export PYTHONPATH=$(pwd):$PYTHONPATH
-
-NGPUS=${NGPUS:-1}
-
-srun --partition=medai_p --mpi=pmi2 --gres=gpu:${NGPUS} --quotatype=reserved \
-     -n1 --ntasks-per-node=1 --cpus-per-task=8 \
-     --job-name=lpips --kill-on-bad-exit=1 \
-     python examples/image_quality_metric/lpips.py

From 4dbbbc57c88ac70b26489b4f4608d754a63b523b Mon Sep 17 00:00:00 2001
From: yuze <StevenSun@sjtu.edu.cn>
Date: Mon, 1 Jun 2026 22:52:41 +0800
Subject: [PATCH 3/7] add default target size

---
 .gitignore                             | 3 ++-
 examples/image_quality_metric/lpips.py | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index a511cf23f..b10b8a048 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,7 +13,8 @@
 *.msc
 *.mv
 log*.txt
-
+.claude
+test.sh
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
diff --git a/examples/image_quality_metric/lpips.py b/examples/image_quality_metric/lpips.py
index 7f1341c5e..f14c90a38 100644
--- a/examples/image_quality_metric/lpips.py
+++ b/examples/image_quality_metric/lpips.py
@@ -9,6 +9,7 @@
 metric = LPIPSMetric.from_pretrained(
     net="alex",
     device="cuda",
+    target_size=512,
 )
 
 score = metric.compute(

From 332942119da5df448083ff777cf9348165b2d7d6 Mon Sep 17 00:00:00 2001
From: yuze <StevenSun@sjtu.edu.cn>
Date: Tue, 2 Jun 2026 13:04:40 +0800
Subject: [PATCH 4/7] remove .gitignore and md

---
 .gitignore  | 177 ----------------------------------------------------
 PR_LPIPS.md | 108 --------------------------------
 2 files changed, 285 deletions(-)
 delete mode 100644 .gitignore
 delete mode 100644 PR_LPIPS.md

diff --git a/.gitignore b/.gitignore
deleted file mode 100644
index b10b8a048..000000000
--- a/.gitignore
+++ /dev/null
@@ -1,177 +0,0 @@
-/data
-/models
-/scripts
-/diffusers
-/.vscode
-*.pkl
-*.safetensors
-*.pth
-*.ckpt
-*.pt
-*.bin
-*.DS_Store
-*.msc
-*.mv
-log*.txt
-.claude
-test.sh
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-*$py.class
-
-# C extensions
-*.so
-
-# Distribution / packaging
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-#  Usually these files are written by a python script from a template
-#  before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
-.hypothesis/
-.pytest_cache/
-cover/
-
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-#   For a library or package, you might want to ignore these files since the code is
-#   intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-#   However, in case of collaboration, if having platform-specific dependencies or dependencies
-#   having no cross-platform support, pipenv may install dependencies that don't work, or not
-#   install all needed dependencies.
-#Pipfile.lock
-
-# poetry
-#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-#   This is especially recommended for binary packages to ensure reproducibility, and is more
-#   commonly ignored for libraries.
-#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-#   in version control.
-#   https://pdm.fming.dev/#use-with-ide
-.pdm.toml
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
-# Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-#  and can be added to the global gitignore or merged into this file.  For a more nuclear
-#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
\ No newline at end of file
diff --git a/PR_LPIPS.md b/PR_LPIPS.md
deleted file mode 100644
index 24ee9b616..000000000
--- a/PR_LPIPS.md
+++ /dev/null
@@ -1,108 +0,0 @@
-# Add LPIPS image-quality metric
-
-## Summary
-
-Adds **LPIPS** (Learned Perceptual Image Patch Similarity, [Zhang et al. CVPR 2018](https://arxiv.org/abs/1801.03924)) to `diffsynth.metrics`, alongside the existing FID / CLIP / Aesthetic / PickScore / ImageReward / HPSv2 / HPSv3 metrics. Reference implementation: [richzhang/PerceptualSimilarity](https://github.com/richzhang/PerceptualSimilarity).
-
-Three backbone variants (`alex` / `vgg` / `squeeze`) are supported and selectable through a single `net=...` flag — the matching `safetensors` weight file is auto-resolved when no `model_config` is given.
-
-## Files
-
-### New
-
-| File | Purpose |
-|------|---------|
-| `diffsynth/models/lpips.py` | Self-contained backbones (AlexNet / VGG16 / SqueezeNet1.1 features), `ScalingLayer`, `NetLinLayer`, top-level `LPIPSModel`, and `LPIPSCompute` (handles file/dir input, stem matching, conditional resize). No `torchvision.models` weight fetch — the registered safetensors carry every parameter. |
-| `diffsynth/metrics/lpips.py` | `LPIPSMetric.from_pretrained(net, ...)` matching the existing `FIDMetric` shape. Auto-derives the `ModelConfig` and `model_pool.fetch_model(...)` name from `net`. |
-| `examples/image_quality_metric/lpips.py` | Example covering both `img-vs-img` and `dir-vs-dir` calls on the existing FLUX example dataset. |
-
-### Modified
-
-| File | Change |
-|------|--------|
-| `diffsynth/metrics/__init__.py` | Export `LPIPSMetric` |
-| `diffsynth/configs/model_configs.py` | Three new entries in `image_metrics_series` (one per backbone), each with `extra_kwargs={"net": ...}` |
-| `diffsynth/utils/state_dict_converters/image_metrics.py` | Add `ImageMetricsLPIPSStateDictConverter` (identity converter — the uploaded safetensors already match `LPIPSModel.state_dict()`) |
-
-No other files changed; conda environment, other metrics, README, and docs are untouched.
-
-## Public API
-
-```python
-from diffsynth.metrics import LPIPSMetric
-
-# Default: alex backbone, file = LPIPS/alexnet.safetensors (~9.9 MB)
-metric = LPIPSMetric.from_pretrained(net="alex", device="cuda")
-
-# img vs img -> single float
-score = metric.compute("a.png", "b.png")
-
-# dir vs dir -> mean over filename-stem-matched pairs (float)
-score = metric.compute("./dir_a", "./dir_b")
-```
-
-Other supported kwargs: `net="vgg"|"squeeze"`, `target_size=512`, `batch_size=16`, `num_workers=0`, plus an optional explicit `model_config=ModelConfig(...)` to override the default weight file.
-
-## Behavior
-
-**`compute(image_a, image_b)`** dispatches by input type:
-
-| Both inputs | Behavior |
-|-------------|----------|
-| Image files / `PIL.Image` | If sizes match → no resize. If sizes differ → `Resize(target_size, BICUBIC)` + `CenterCrop(target_size)` (consistent with `diffsynth.models.image_reward`'s pattern). Returns a single float. |
-| Directories | Pair by filename stem (e.g. `dog.png` ↔ `dog.jpg` match; orphan files are ignored). If **all** images across both dirs share the same `(H, W)` → no resize; otherwise resize all. Returns the mean LPIPS over matched pairs. |
-| Mixed (one file, one dir) | `ValueError` |
-
-After `ToTensor`, values are clamped to `[0, 1]` before being mapped to the official `[-1, 1]` LPIPS input range — this guards against BICUBIC overshoot (other metrics in this repo also use BICUBIC; FID and ImageReward do not clamp, but LPIPS is sensitive to out-of-range inputs because `ScalingLayer` applies a per-channel mean/std).
-
-## Weights (uploaded to ModelScope)
-
-The three weight files are committed under `DiffSynth-Studio/ImageMetrics/LPIPS/` on ModelScope. Each one is a complete LPIPS state dict — `net.slice{1..N}.*` (backbone), `scaling_layer.shift/scale` (ImageNet color buffers), and 5 or 7 `lin{i}.model.1.weight` 1×1 conv weights — produced by combining the official torchvision ImageNet checkpoints with the LPIPS lin-layer weights from `richzhang/PerceptualSimilarity`'s `lpips/weights/v0.1/`.
-
-| File | Size | Hash (md5) | `model_name` |
-|------|------|------------|--------------|
-| `LPIPS/alexnet.safetensors` | ~9.9 MB | `08a75c660c9b2e775c530a0955857f1f` | `image_metrics_lpips_alex` |
-| `LPIPS/vgg.safetensors` | ~58.9 MB | `5740953aaa8aba2ecd9b9c23da813591` | `image_metrics_lpips_vgg` |
-| `LPIPS/squeezenet.safetensors` | ~2.9 MB | `ff994b70a30599287a332105396d5004` | `image_metrics_lpips_squeeze` |
-
-## Consistency with existing metrics
-
-- `LPIPSMetric` subclasses the same `Metric` base used by every other metric, and uses the standard `download_and_load_models` → `model_pool.fetch_model(...)` flow.
-- `from_pretrained(...)` follows the FID / CLIP signature shape: optional `model_config`, `device`, `vram_limit`, plus metric-specific kwargs.
-- All three backbones are registered in `image_metrics_series` with the same shape as the FID entry, just differentiated by `extra_kwargs={"net": ...}`.
-- The example file mirrors `examples/image_quality_metric/fid.py` (download via `dataset_snapshot_download`, then `metric.compute(...)`).
-
-## Test plan
-
-Tests run inside the user-provided `compound` conda env on CPU (login node had no GPU); the code path is device-agnostic.
-
-- [x] Numerical parity vs official `lpips` package on `PerceptualSimilarity/imgs/ex_dir{0,1}` (64×64, no resize):
-
-  | net | DiffSynth (mean) | Official `lpips` | abs diff |
-  |-----|------------------|-------------------|----------|
-  | alex | 0.429723 | 0.429723 | 6.7e-08 |
-  | vgg  | 0.495139 | 0.495139 | 1.5e-08 |
-  | squeeze | 0.429475 | 0.429475 | 6.0e-08 |
-
-  Per-pair img-vs-img scores match to `0.000000` for all 6 (3 nets × 2 pairs).
-
-- [x] State dict cross-check: every common key between the new safetensors and `lpips.LPIPS(net=...).state_dict()` is `torch.equal`-identical (alex 17/17, vgg 33/33, squeeze 59/59 keys; the only `lins.*` keys missing are `nn.ModuleList` aliases that point at the same tensors).
-
-- [x] `LPIPSModel.load_state_dict(...)` reports `0` missing and `0` unexpected keys for all three weight files.
-
-- [x] `model_pool.auto_load_model(...)` correctly identifies and loads the right backbone by hash for all three files.
-
-- [x] Behavioral edge cases:
-  - Same image vs itself → `0.0` (alex, exact)
-  - Different-sized images → BICUBIC resize path runs, returns a sensible non-zero score
-  - Mixed-size directory pair → all images are resized, returns mean
-  - Stem matching `dog.png` ↔ `dog.jpg` works
-  - Mixed input (one file, one directory) → `ValueError`
-
-- [x] Example script `examples/image_quality_metric/lpips.py` runs end-to-end (`alex` backbone, FLUX dataset). The `dir-vs-dir` score is `0.0000` because the `flux/FLUX.1-dev` and `flux2/FLUX.2-dev` example dirs contain byte-identical images (same as the FID example exhibits very-near-zero behavior); the `img-vs-img` call between two distinct images returns a sensible non-zero score.
-
-## Out of scope
-
-- README / `docs/.../Image-Quality-Metrics.md` table updates — left for a docs-only follow-up.
-- LPIPS as a training loss — only the inference metric path is added.
-- Resize strategies beyond center-crop + 512×512 BICUBIC — a single `target_size` knob covers the use cases requested.

From 5db7301bda2e917dda97550a2851f4741755fba4 Mon Sep 17 00:00:00 2001
From: yuze <StevenSun@sjtu.edu.cn>
Date: Tue, 2 Jun 2026 13:10:57 +0800
Subject: [PATCH 5/7] fix target_size keyword in LPIPS example

---
 examples/image_quality_metric/lpips.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
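Note (not part of the commit): the rename appears to matter because the `from_pretrained` parameter list visible in the PATCH 6/7 hunk ends with `target_size` and `vram_limit` and shows no `**kwargs`, so the `target=512` call introduced in PATCH 3/7 would presumably fail with a `TypeError`. The sketch below illustrates this with a hypothetical stand-in, `from_pretrained_stub`, whose parameter list and defaults are assumptions. It is not the real `LPIPSMetric.from_pretrained`, and `accepts_keyword` is an illustrative helper only.

```python
import inspect


# Hypothetical stand-in that mirrors the keyword parameters visible in the
# from_pretrained hunk of PATCH 6/7. It is NOT the real diffsynth function.
def from_pretrained_stub(net="alex", model_config=None, device="cpu",
                         batch_size=16, target_size=512, vram_limit=None):
    return {"net": net, "target_size": target_size}


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True  # a **kwargs parameter would swallow any keyword
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


def call_with_wrong_keyword():
    """Call the stub with the misspelled keyword and return the error text."""
    try:
        from_pretrained_stub(net="alex", target=512)
    except TypeError as exc:
        return str(exc)
    return None
```

A check like `accepts_keyword(LPIPSMetric.from_pretrained, "target_size")` could be run against the real class method to confirm the keyword before merging, assuming `diffsynth` is importable in the test environment.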

diff --git a/examples/image_quality_metric/lpips.py b/examples/image_quality_metric/lpips.py
index f14c90a38..a34ac996f 100644
--- a/examples/image_quality_metric/lpips.py
+++ b/examples/image_quality_metric/lpips.py
@@ -9,7 +9,7 @@
 metric = LPIPSMetric.from_pretrained(
     net="alex",
     device="cuda",
-    target=512,
+    target_size=512,
 )
 
 score = metric.compute(

From 414dfc7d720b1bb3d2ef07a795892b66af695ece Mon Sep 17 00:00:00 2001
From: yuze <StevenSun@sjtu.edu.cn>
Date: Tue, 2 Jun 2026 13:30:44 +0800
Subject: [PATCH 6/7] fix example, remove identity converter and unused num_workers

---
 diffsynth/configs/model_configs.py                     |  3 ---
 diffsynth/metrics/lpips.py                             |  2 --
 diffsynth/models/lpips.py                              |  2 --
 diffsynth/utils/state_dict_converters/image_metrics.py |  4 ----
 examples/image_quality_metric/lpips.py                 | 10 +++++++++-
 5 files changed, 9 insertions(+), 12 deletions(-)
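Note (not part of the commit): the example in this patch now passes an explicit `ModelConfig` and documents the VGG and SqueezeNet alternatives in comments. A minimal sketch of how the three variants could be selected from one place follows. The `model_id` and file patterns are copied from those comments, while `LPIPS_WEIGHT_FILES` and `lpips_model_config_kwargs` are hypothetical names introduced here for illustration and do not exist in `diffsynth`.

```python
# File patterns copied from the comments added to
# examples/image_quality_metric/lpips.py in this patch.
LPIPS_WEIGHT_FILES = {
    "alex": "LPIPS/alexnet.safetensors",
    "vgg": "LPIPS/vgg.safetensors",
    "squeeze": "LPIPS/squeezenet.safetensors",
}


def lpips_model_config_kwargs(net):
    """Hypothetical helper: keyword arguments for ModelConfig per backbone."""
    if net not in LPIPS_WEIGHT_FILES:
        raise ValueError(f"unknown LPIPS backbone: {net!r}")
    return {
        "model_id": "DiffSynth-Studio/ImageMetrics",
        "origin_file_pattern": LPIPS_WEIGHT_FILES[net],
    }


# Intended use, assuming diffsynth is installed (not executed here):
#   from diffsynth.metrics import LPIPSMetric, ModelConfig
#   metric = LPIPSMetric.from_pretrained(
#       net="vgg",
#       model_config=ModelConfig(**lpips_model_config_kwargs("vgg")),
#   )
```

Keeping `net` and the weight file in one mapping avoids pairing a backbone with the wrong safetensors file, which the separate `net=` and `model_config=` arguments would otherwise allow.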

diff --git a/diffsynth/configs/model_configs.py b/diffsynth/configs/model_configs.py
index a80050959..a0b5d6549 100644
--- a/diffsynth/configs/model_configs.py
+++ b/diffsynth/configs/model_configs.py
@@ -1076,7 +1076,6 @@
         "model_hash": "08a75c660c9b2e775c530a0955857f1f",
         "model_name": "image_metrics_lpips_alex",
         "model_class": "diffsynth.models.lpips.LPIPSModel",
-        "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter",
         "extra_kwargs": {"net": "alex"},
     },
     {
@@ -1084,7 +1083,6 @@
         "model_hash": "5740953aaa8aba2ecd9b9c23da813591",
         "model_name": "image_metrics_lpips_vgg",
         "model_class": "diffsynth.models.lpips.LPIPSModel",
-        "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter",
         "extra_kwargs": {"net": "vgg"},
     },
     {
@@ -1092,7 +1090,6 @@
         "model_hash": "ff994b70a30599287a332105396d5004",
         "model_name": "image_metrics_lpips_squeeze",
         "model_class": "diffsynth.models.lpips.LPIPSModel",
-        "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter",
         "extra_kwargs": {"net": "squeeze"},
     },
 ]
diff --git a/diffsynth/metrics/lpips.py b/diffsynth/metrics/lpips.py
index 9bf014d5b..dc5ae3634 100644
--- a/diffsynth/metrics/lpips.py
+++ b/diffsynth/metrics/lpips.py
@@ -31,7 +31,6 @@ def from_pretrained(
         model_config: ModelConfig = None,
         device: torch.device = get_device_type(),
         batch_size: int = 16,
-        num_workers: int = 0,
         target_size: int = 512,
         vram_limit: float = None,
     ):
@@ -52,7 +51,6 @@ def from_pretrained(
             model=backbone,
             device=device,
             batch_size=batch_size,
-            num_workers=num_workers,
             target_size=target_size,
         )
         return cls(compute_model)
diff --git a/diffsynth/models/lpips.py b/diffsynth/models/lpips.py
index 45650df9f..4c7847c89 100644
--- a/diffsynth/models/lpips.py
+++ b/diffsynth/models/lpips.py
@@ -266,13 +266,11 @@ def __init__(
         model: LPIPSModel,
         device: Union[str, torch.device] = "cpu",
         batch_size: int = 16,
-        num_workers: int = 0,
         target_size: int = 512,
     ):
         super().__init__()
         self.model = model
         self.batch_size = batch_size
-        self.num_workers = num_workers
         self.target_size = target_size
         self._resize_transform = transforms.Compose(
             [
diff --git a/diffsynth/utils/state_dict_converters/image_metrics.py b/diffsynth/utils/state_dict_converters/image_metrics.py
index d781edd62..30c8b55a3 100644
--- a/diffsynth/utils/state_dict_converters/image_metrics.py
+++ b/diffsynth/utils/state_dict_converters/image_metrics.py
@@ -76,10 +76,6 @@ def ImageMetricsFIDStateDictConverter(state_dict):
     return {"model." + key: state_dict[key] for key in state_dict if not key.startswith("fc.")}
 
 
-def ImageMetricsLPIPSStateDictConverter(state_dict):
-    return {key: state_dict[key] for key in state_dict}
-
-
 def ImageMetricsHPSv3StateDictConverter(state_dict):
     converted = {}
     for key in state_dict:
diff --git a/examples/image_quality_metric/lpips.py b/examples/image_quality_metric/lpips.py
index a34ac996f..6e15b5a91 100644
--- a/examples/image_quality_metric/lpips.py
+++ b/examples/image_quality_metric/lpips.py
@@ -1,4 +1,4 @@
-from diffsynth.metrics import LPIPSMetric
+from diffsynth.metrics import LPIPSMetric, ModelConfig
 from modelscope import dataset_snapshot_download
 
 dataset_snapshot_download(
@@ -6,8 +6,16 @@
     allow_file_pattern=["flux/FLUX.1-dev/*", "flux2/FLUX.2-dev/*"],
     local_dir="./data/diffsynth_example_dataset",
 )
+
+# Default backbone: net="alex", loaded from LPIPS/alexnet.safetensors.
+# For VGG: net="vgg", model_config=ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/vgg.safetensors")
+# For SqueezeNet: net="squeeze", model_config=ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/squeezenet.safetensors")
 metric = LPIPSMetric.from_pretrained(
     net="alex",
+    model_config=ModelConfig(
+        model_id="DiffSynth-Studio/ImageMetrics",
+        origin_file_pattern="LPIPS/alexnet.safetensors",
+    ),
     device="cuda",
     target_size=512,
 )

From 2835af4ad7d07c0ffc6b20b122b4f252864dc1c0 Mon Sep 17 00:00:00 2001
From: yuze <StevenSun@sjtu.edu.cn>
Date: Tue, 2 Jun 2026 13:43:49 +0800
Subject: [PATCH 7/7] restore .gitignore

---
 .gitignore | 176 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 .gitignore

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000000000..a511cf23f
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,176 @@
+/data
+/models
+/scripts
+/diffusers
+/.vscode
+*.pkl
+*.safetensors
+*.pth
+*.ckpt
+*.pt
+*.bin
+*.DS_Store
+*.msc
+*.mv
+log*.txt
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#  and can be added to the global gitignore or merged into this file.  For a more nuclear
+#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
\ No newline at end of file