From 2f6a1f3263d0d43502a3255afd287bb28e6e3815 Mon Sep 17 00:00:00 2001
From: yuze
Date: Sun, 31 May 2026 10:31:11 +0800
Subject: [PATCH 1/7] feat(metrics): add LPIPS image quality metric

- diffsynth/models/lpips.py: AlexNet/VGG16/SqueezeNet1.1 backbones, ScalingLayer, NetLinLayer, LPIPSModel, LPIPSCompute
- diffsynth/metrics/lpips.py: LPIPSMetric.from_pretrained(net='alex'|'vgg'|'squeeze') with auto-derived ModelConfig
- examples/image_quality_metric/lpips.py: img-vs-img and dir-vs-dir examples
- Register 3 entries in image_metrics_series + identity state_dict converter
- Numerically matches the official lpips package (abs diff < 1e-7, verified on PerceptualSimilarity/imgs/ex_dir{0,1})
---
 PR_LPIPS.md                                | 108 ++++++
 diffsynth/configs/model_configs.py         |  24 ++
 diffsynth/metrics/__init__.py              |   2 +
 diffsynth/metrics/lpips.py                 |  65 ++++
 diffsynth/models/lpips.py                  | 353 ++++++++++++++++++
 .../state_dict_converters/image_metrics.py |   4 +
 examples/image_quality_metric/lpips.py     |  24 ++
 test.sh                                    |   9 +
 8 files changed, 589 insertions(+)
 create mode 100644 PR_LPIPS.md
 create mode 100644 diffsynth/metrics/lpips.py
 create mode 100644 diffsynth/models/lpips.py
 create mode 100644 examples/image_quality_metric/lpips.py
 create mode 100755 test.sh

diff --git a/PR_LPIPS.md b/PR_LPIPS.md
new file mode 100644
index 000000000..24ee9b616
--- /dev/null
+++ b/PR_LPIPS.md
@@ -0,0 +1,108 @@
+# Add LPIPS image-quality metric
+
+## Summary
+
+Adds **LPIPS** (Learned Perceptual Image Patch Similarity, [Zhang et al. CVPR 2018](https://arxiv.org/abs/1801.03924)) to `diffsynth.metrics`, alongside the existing FID / CLIP / Aesthetic / PickScore / ImageReward / HPSv2 / HPSv3 metrics. Reference implementation: [richzhang/PerceptualSimilarity](https://github.com/richzhang/PerceptualSimilarity).
+
+Three backbone variants (`alex` / `vgg` / `squeeze`) are supported and selectable through a single `net=...` flag — the matching `safetensors` weight file is auto-resolved when no `model_config` is given.
+
+## Files
+
+### New
+
+| File | Purpose |
+|------|---------|
+| `diffsynth/models/lpips.py` | Self-contained backbones (AlexNet / VGG16 / SqueezeNet1.1 features), `ScalingLayer`, `NetLinLayer`, top-level `LPIPSModel`, and `LPIPSCompute` (handles file/dir input, stem matching, conditional resize). No `torchvision.models` weight fetch — the registered safetensors carry every parameter. |
+| `diffsynth/metrics/lpips.py` | `LPIPSMetric.from_pretrained(net, ...)` matching the existing `FIDMetric` shape. Auto-derives the `ModelConfig` and `model_pool.fetch_model(...)` name from `net`. |
+| `examples/image_quality_metric/lpips.py` | Example covering both `img-vs-img` and `dir-vs-dir` calls on the existing FLUX example dataset. |
+
+### Modified
+
+| File | Change |
+|------|--------|
+| `diffsynth/metrics/__init__.py` | Export `LPIPSMetric` |
+| `diffsynth/configs/model_configs.py` | Three new entries in `image_metrics_series` (one per backbone), each with `extra_kwargs={"net": ...}` |
+| `diffsynth/utils/state_dict_converters/image_metrics.py` | Add `ImageMetricsLPIPSStateDictConverter` (identity converter — the uploaded safetensors already match `LPIPSModel.state_dict()`) |
+
+No other files changed; conda environment, other metrics, README, and docs are untouched.
+
+## Public API
+
+```python
+from diffsynth.metrics import LPIPSMetric
+
+# Default: alex backbone, file = LPIPS/alexnet.safetensors (~9.9 MB)
+metric = LPIPSMetric.from_pretrained(net="alex", device="cuda")
+
+# img vs img -> single float
+score = metric.compute("a.png", "b.png")
+
+# dir vs dir -> mean over filename-stem-matched pairs (float)
+score = metric.compute("./dir_a", "./dir_b")
+```
+
+Other supported kwargs: `net="vgg"|"squeeze"`, `target_size=512`, `batch_size=16`, `num_workers=0`, plus an optional explicit `model_config=ModelConfig(...)` to override the default weight file.
+
+## Behavior
+
+**`compute(image_a, image_b)`** dispatches by input type:
+
+| Both inputs | Behavior |
+|-------------|----------|
+| Image files / `PIL.Image` | If sizes match → no resize. If sizes differ → `Resize(target_size, BICUBIC)` + `CenterCrop(target_size)` (consistent with `diffsynth.models.image_reward`'s pattern). Returns a single float. |
+| Directories | Pair by filename stem (e.g. `dog.png` ↔ `dog.jpg` match; orphan files are ignored). If **all** images across both dirs share the same `(H, W)` → no resize; otherwise resize all. Returns the mean LPIPS over matched pairs. |
+| Mixed (one file, one dir) | `ValueError` |
+
+After `ToTensor`, values are clamped to `[0, 1]` before being mapped to the official `[-1, 1]` LPIPS input range — this guards against BICUBIC overshoot (other metrics in this repo also use BICUBIC; FID and ImageReward do not clamp, but LPIPS is sensitive to out-of-range inputs because `ScalingLayer` applies a per-channel mean/std).
+
+## Weights (uploaded to ModelScope)
+
+The three weight files are committed under `DiffSynth-Studio/ImageMetrics/LPIPS/` on ModelScope. Each one is a complete LPIPS state dict — `net.slice{1..N}.*` (backbone), `scaling_layer.shift/scale` (ImageNet color buffers), and 5 or 7 `lin{i}.model.1.weight` 1×1 conv weights — produced by combining the official torchvision ImageNet checkpoints with the LPIPS lin-layer weights from `richzhang/PerceptualSimilarity`'s `lpips/weights/v0.1/`.
+
+| File | Size | Hash (md5) | `model_name` |
+|------|------|------------|--------------|
+| `LPIPS/alexnet.safetensors` | ~9.9 MB | `08a75c660c9b2e775c530a0955857f1f` | `image_metrics_lpips_alex` |
+| `LPIPS/vgg.safetensors` | ~58.9 MB | `5740953aaa8aba2ecd9b9c23da813591` | `image_metrics_lpips_vgg` |
+| `LPIPS/squeezenet.safetensors` | ~2.9 MB | `ff994b70a30599287a332105396d5004` | `image_metrics_lpips_squeeze` |
+
+## Consistency with existing metrics
+
+- `LPIPSMetric` subclasses the same `Metric` base used by every other metric, and uses the standard `download_and_load_models` → `model_pool.fetch_model(...)` flow.
+- `from_pretrained(...)` follows the FID / CLIP signature shape: optional `model_config`, `device`, `vram_limit`, plus metric-specific kwargs.
+- All three backbones are registered in `image_metrics_series` with the same shape as the FID entry, just differentiated by `extra_kwargs={"net": ...}`.
+- The example file mirrors `examples/image_quality_metric/fid.py` (download via `dataset_snapshot_download`, then `metric.compute(...)`).
+
+## Test plan
+
+Tests run inside the user-provided `compound` conda env on CPU (login node had no GPU); the code path is device-agnostic.
+
+- [x] Numerical parity vs official `lpips` package on `PerceptualSimilarity/imgs/ex_dir{0,1}` (64×64, no resize):
+
+  | net | DiffSynth (mean) | Official `lpips` | abs diff |
+  |-----|------------------|-------------------|----------|
+  | alex | 0.429723 | 0.429723 | 6.7e-08 |
+  | vgg | 0.495139 | 0.495139 | 1.5e-08 |
+  | squeeze | 0.429475 | 0.429475 | 6.0e-08 |
+
+  Per-pair img-vs-img scores match to `0.000000` for all 6 (3 nets × 2 pairs).
+
+- [x] State dict cross-check: every common key between the new safetensors and `lpips.LPIPS(net=...).state_dict()` is `torch.equal`-identical (alex 17/17, vgg 33/33, squeeze 59/59 keys; the only `lins.*` keys missing are `nn.ModuleList` aliases that point at the same tensors).
+
+- [x] `LPIPSModel.load_state_dict(...)` reports `0` missing and `0` unexpected keys for all three weight files.
+
+- [x] `model_pool.auto_load_model(...)` correctly identifies and loads the right backbone by hash for all three files.
+
+- [x] Behavioral edge cases:
+  - Same image vs itself → `0.0` (alex, exact)
+  - Different-sized images → BICUBIC resize path runs, returns a sensible non-zero score
+  - Mixed-size directory pair → all images are resized, returns mean
+  - Stem matching `dog.png` ↔ `dog.jpg` works
+  - Mixed input (one file, one directory) → `ValueError`
+
+- [x] Example script `examples/image_quality_metric/lpips.py` runs end-to-end (`alex` backbone, FLUX dataset). The `dir-vs-dir` score is `0.0000` because the `flux/FLUX.1-dev` and `flux2/FLUX.2-dev` example dirs contain byte-identical images (same as the FID example exhibits very-near-zero behavior); the `img-vs-img` call between two distinct images returns a sensible non-zero score.
+
+## Out of scope
+
+- README / `docs/.../Image-Quality-Metrics.md` table updates — left for a docs-only follow-up.
+- LPIPS as a training loss — only the inference metric path is added.
+- Resize strategies beyond center-crop + 512×512 BICUBIC — a single `target_size` knob covers the use cases requested.
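
For review, the arithmetic inside `LPIPSModel.forward` can be checked against a small dependency-free sketch. It assumes the backbone features are already extracted and given as nested `[C][H][W]` lists, and it omits the `ScalingLayer`, the backbone, and dropout (which is inactive in eval mode). The helper names `normalize_channels`, `layer_distance`, and `lpips_from_features` are hypothetical and do not exist in the patch. Per layer it unit-normalizes each spatial feature vector across channels (as `_normalize_tensor` does), squares the difference, takes a channel-weighted sum (what a bias-free `lin{i}` 1x1 conv with one output channel computes), averages over space (`_spatial_average`), and sums the layers.

```python
import math
from typing import List, Sequence

# Hypothetical helper type: one feature map as nested lists indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def normalize_channels(feat: FeatureMap, eps: float = 1e-10) -> FeatureMap:
    """Divide every spatial feature vector by its L2 norm across channels."""
    num_c, num_h, num_w = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * num_w for _ in range(num_h)] for _ in range(num_c)]
    for h in range(num_h):
        for w in range(num_w):
            norm = math.sqrt(sum(feat[c][h][w] ** 2 for c in range(num_c)))
            for c in range(num_c):
                out[c][h][w] = feat[c][h][w] / (norm + eps)
    return out


def layer_distance(f0: FeatureMap, f1: FeatureMap, lin_weights: Sequence[float]) -> float:
    """Channel-weighted squared difference of normalized features, averaged over space."""
    n0, n1 = normalize_channels(f0), normalize_channels(f1)
    num_c, num_h, num_w = len(n0), len(n0[0]), len(n0[0][0])
    total = 0.0
    for h in range(num_h):
        for w in range(num_w):
            # Same result as a 1x1 conv with one output channel and no bias.
            total += sum(lin_weights[c] * (n0[c][h][w] - n1[c][h][w]) ** 2 for c in range(num_c))
    return total / (num_h * num_w)


def lpips_from_features(feats0, feats1, lin_weights_per_layer) -> float:
    """Sum the per-layer distances, one entry per backbone slice."""
    return sum(layer_distance(a, b, w) for a, b, w in zip(feats0, feats1, lin_weights_per_layer))


# Toy input: two layers of shape [2][1][2] with uniform lin weights.
layer_a = [[[1.0, 0.5]], [[0.0, 0.5]]]
layer_b = [[[0.0, 0.5]], [[1.0, 0.5]]]
feats_x = [layer_a, layer_a]
feats_y = [layer_b, layer_a]
weights = [[1.0, 1.0], [1.0, 1.0]]
print(lpips_from_features(feats_x, feats_y, weights))  # ≈ 1.0
```

Because each feature vector is unit-normalized before differencing, multiplying a layer's activations by a positive constant leaves its contribution unchanged up to the `eps` term; only the direction of each feature vector matters.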
diff --git a/diffsynth/configs/model_configs.py b/diffsynth/configs/model_configs.py
index 86619d611..a80050959 100644
--- a/diffsynth/configs/model_configs.py
+++ b/diffsynth/configs/model_configs.py
@@ -1071,6 +1071,30 @@
         "model_class": "diffsynth.models.fid.FIDInceptionModel",
         "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsFIDStateDictConverter",
     },
+    {
+        # Example: ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/alexnet.safetensors")
+        "model_hash": "08a75c660c9b2e775c530a0955857f1f",
+        "model_name": "image_metrics_lpips_alex",
+        "model_class": "diffsynth.models.lpips.LPIPSModel",
+        "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter",
+        "extra_kwargs": {"net": "alex"},
+    },
+    {
+        # Example: ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/vgg.safetensors")
+        "model_hash": "5740953aaa8aba2ecd9b9c23da813591",
+        "model_name": "image_metrics_lpips_vgg",
+        "model_class": "diffsynth.models.lpips.LPIPSModel",
+        "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter",
+        "extra_kwargs": {"net": "vgg"},
+    },
+    {
+        # Example: ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/squeezenet.safetensors")
+        "model_hash": "ff994b70a30599287a332105396d5004",
+        "model_name": "image_metrics_lpips_squeeze",
+        "model_class": "diffsynth.models.lpips.LPIPSModel",
+        "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter",
+        "extra_kwargs": {"net": "squeeze"},
+    },
 ]
 
 hidream_o1_image_series = [
diff --git a/diffsynth/metrics/__init__.py b/diffsynth/metrics/__init__.py
index f555d3a7a..b816911d4 100644
--- a/diffsynth/metrics/__init__.py
+++ b/diffsynth/metrics/__init__.py
@@ -6,6 +6,7 @@
 from .hpsv2 import HPSv2Metric
 from .hpsv3 import HPSv3Metric
 from .image_reward import ImageRewardMetric
+from .lpips import LPIPSMetric
 from .pickscore import PickScoreMetric
 
 
@@ -19,4 +20,5 @@
     "CLIPMetric",
     "AestheticMetric",
     "FIDMetric",
+    "LPIPSMetric",
 ]
diff --git a/diffsynth/metrics/lpips.py b/diffsynth/metrics/lpips.py
new file mode 100644
index 000000000..9bf014d5b
--- /dev/null
+++ b/diffsynth/metrics/lpips.py
@@ -0,0 +1,65 @@
+import torch
+
+from ..core import ModelConfig
+from ..core.device.npu_compatible_device import get_device_type
+from ..models.lpips import LPIPSModel, LPIPS_NET_CHOICES, LPIPSCompute
+from .base import Metric
+
+
+_LPIPS_DEFAULT_FILES = {
+    "alex": "LPIPS/alexnet.safetensors",
+    "vgg": "LPIPS/vgg.safetensors",
+    "squeeze": "LPIPS/squeezenet.safetensors",
+}
+
+_LPIPS_MODEL_NAMES = {
+    "alex": "image_metrics_lpips_alex",
+    "vgg": "image_metrics_lpips_vgg",
+    "squeeze": "image_metrics_lpips_squeeze",
+}
+
+
+class LPIPSMetric(Metric):
+    def __init__(self, model: LPIPSCompute):
+        super().__init__()
+        self.model = model
+
+    @classmethod
+    def from_pretrained(
+        cls,
+        net: str = "alex",
+        model_config: ModelConfig = None,
+        device: torch.device = get_device_type(),
+        batch_size: int = 16,
+        num_workers: int = 0,
+        target_size: int = 512,
+        vram_limit: float = None,
+    ):
+        if net not in LPIPS_NET_CHOICES:
+            raise ValueError(f"net must be one of {LPIPS_NET_CHOICES}, got {net!r}")
+        if model_config is None:
+            model_config = ModelConfig(
+                model_id="DiffSynth-Studio/ImageMetrics",
+                origin_file_pattern=_LPIPS_DEFAULT_FILES[net],
+            )
+        model_pool = cls.download_and_load_models([model_config], torch_dtype=torch.float32, device=device, vram_limit=vram_limit)
+        backbone = model_pool.fetch_model(_LPIPS_MODEL_NAMES[net])
+        if backbone is None:
+            raise RuntimeError(
+                f"Failed to load LPIPS model for net={net!r}. The provided weights do not match the registered hash for {_LPIPS_MODEL_NAMES[net]}."
+            )
+        compute_model = LPIPSCompute(
+            model=backbone,
+            device=device,
+            batch_size=batch_size,
+            num_workers=num_workers,
+            target_size=target_size,
+        )
+        return cls(compute_model)
+
+    @torch.no_grad()
+    def compute(self, image_a, image_b) -> float:
+        return self.model.compute(image_a, image_b)
+
+    def forward(self, image_a, image_b):
+        return self.compute(image_a, image_b)
diff --git a/diffsynth/models/lpips.py b/diffsynth/models/lpips.py
new file mode 100644
index 000000000..45650df9f
--- /dev/null
+++ b/diffsynth/models/lpips.py
@@ -0,0 +1,353 @@
+import os
+from collections import defaultdict
+from pathlib import Path
+from typing import Union
+
+import torch
+import torch.nn as nn
+from PIL import Image
+from torchvision import transforms
+
+ImageInput = Union[str, os.PathLike, Image.Image]
+
+IMAGE_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".pgm", ".png", ".ppm", ".tif", ".tiff", ".webp"}
+
+LPIPS_NET_CHOICES = ("alex", "vgg", "squeeze")
+
+
+def _list_image_files(path: Union[str, os.PathLike]):
+    path = os.fspath(path)
+    if not os.path.isdir(path):
+        raise ValueError(f"Expected a directory for LPIPS, got: {path}")
+    files = []
+    for entry in sorted(os.listdir(path)):
+        full = os.path.join(path, entry)
+        if os.path.isfile(full) and os.path.splitext(entry)[1].lower() in IMAGE_EXTENSIONS:
+            files.append(full)
+    if not files:
+        raise ValueError(f"No images found under {path}.")
+    return files
+
+
+def _pair_directories_by_stem(dir_a, dir_b):
+    files_a = _list_image_files(dir_a)
+    files_b = _list_image_files(dir_b)
+    by_stem_a = defaultdict(list)
+    for f in files_a:
+        by_stem_a[Path(f).stem].append(f)
+    by_stem_b = defaultdict(list)
+    for f in files_b:
+        by_stem_b[Path(f).stem].append(f)
+    common = sorted(set(by_stem_a.keys()) & set(by_stem_b.keys()))
+    if not common:
+        raise ValueError(f"No matching filename stems between {dir_a} and {dir_b}.")
+    pairs = []
+    for stem in common:
+        pairs.append((sorted(by_stem_a[stem])[0], sorted(by_stem_b[stem])[0]))
+    return pairs
+
+
+def _open_rgb(image: ImageInput) -> Image.Image:
+    if isinstance(image, (str, os.PathLike)):
+        image = Image.open(image)
+    if not isinstance(image, Image.Image):
+        raise TypeError(f"LPIPS expects PIL images or image paths, got {type(image)}.")
+    return image.convert("RGB")
+
+
+class _AlexFeatures(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.slice1 = nn.Sequential()
+        self.slice2 = nn.Sequential()
+        self.slice3 = nn.Sequential()
+        self.slice4 = nn.Sequential()
+        self.slice5 = nn.Sequential()
+        self.slice1.add_module("0", nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2))
+        self.slice1.add_module("1", nn.ReLU(inplace=True))
+        self.slice2.add_module("2", nn.MaxPool2d(kernel_size=3, stride=2))
+        self.slice2.add_module("3", nn.Conv2d(64, 192, kernel_size=5, padding=2))
+        self.slice2.add_module("4", nn.ReLU(inplace=True))
+        self.slice3.add_module("5", nn.MaxPool2d(kernel_size=3, stride=2))
+        self.slice3.add_module("6", nn.Conv2d(192, 384, kernel_size=3, padding=1))
+        self.slice3.add_module("7", nn.ReLU(inplace=True))
+        self.slice4.add_module("8", nn.Conv2d(384, 256, kernel_size=3, padding=1))
+        self.slice4.add_module("9", nn.ReLU(inplace=True))
+        self.slice5.add_module("10", nn.Conv2d(256, 256, kernel_size=3, padding=1))
+        self.slice5.add_module("11", nn.ReLU(inplace=True))
+
+    def forward(self, x):
+        h1 = self.slice1(x)
+        h2 = self.slice2(h1)
+        h3 = self.slice3(h2)
+        h4 = self.slice4(h3)
+        h5 = self.slice5(h4)
+        return [h1, h2, h3, h4, h5]
+
+
+class _VGG16Features(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.slice1 = nn.Sequential()
+        self.slice2 = nn.Sequential()
+        self.slice3 = nn.Sequential()
+        self.slice4 = nn.Sequential()
+        self.slice5 = nn.Sequential()
+        cfg = [
+            (1, 0, nn.Conv2d(3, 64, 3, padding=1)),
+            (1, 1, nn.ReLU(inplace=True)),
+            (1, 2, nn.Conv2d(64, 64, 3, padding=1)),
+            (1, 3, nn.ReLU(inplace=True)),
+            (2, 4, nn.MaxPool2d(2, 2)),
+            (2, 5, nn.Conv2d(64, 128, 3, padding=1)),
+            (2, 6, nn.ReLU(inplace=True)),
+            (2, 7, nn.Conv2d(128, 128, 3, padding=1)),
+            (2, 8, nn.ReLU(inplace=True)),
+            (3, 9, nn.MaxPool2d(2, 2)),
+            (3, 10, nn.Conv2d(128, 256, 3, padding=1)),
+            (3, 11, nn.ReLU(inplace=True)),
+            (3, 12, nn.Conv2d(256, 256, 3, padding=1)),
+            (3, 13, nn.ReLU(inplace=True)),
+            (3, 14, nn.Conv2d(256, 256, 3, padding=1)),
+            (3, 15, nn.ReLU(inplace=True)),
+            (4, 16, nn.MaxPool2d(2, 2)),
+            (4, 17, nn.Conv2d(256, 512, 3, padding=1)),
+            (4, 18, nn.ReLU(inplace=True)),
+            (4, 19, nn.Conv2d(512, 512, 3, padding=1)),
+            (4, 20, nn.ReLU(inplace=True)),
+            (4, 21, nn.Conv2d(512, 512, 3, padding=1)),
+            (4, 22, nn.ReLU(inplace=True)),
+            (5, 23, nn.MaxPool2d(2, 2)),
+            (5, 24, nn.Conv2d(512, 512, 3, padding=1)),
+            (5, 25, nn.ReLU(inplace=True)),
+            (5, 26, nn.Conv2d(512, 512, 3, padding=1)),
+            (5, 27, nn.ReLU(inplace=True)),
+            (5, 28, nn.Conv2d(512, 512, 3, padding=1)),
+            (5, 29, nn.ReLU(inplace=True)),
+        ]
+        for slice_idx, orig_idx, module in cfg:
+            getattr(self, f"slice{slice_idx}").add_module(str(orig_idx), module)
+
+    def forward(self, x):
+        h1 = self.slice1(x)
+        h2 = self.slice2(h1)
+        h3 = self.slice3(h2)
+        h4 = self.slice4(h3)
+        h5 = self.slice5(h4)
+        return [h1, h2, h3, h4, h5]
+
+
+class _Fire(nn.Module):
+    def __init__(self, in_channels, squeeze_channels, expand1x1_channels, expand3x3_channels):
+        super().__init__()
+        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
+        self.squeeze_activation = nn.ReLU(inplace=True)
+        self.expand1x1 = nn.Conv2d(squeeze_channels, expand1x1_channels, kernel_size=1)
+        self.expand1x1_activation = nn.ReLU(inplace=True)
+        self.expand3x3 = nn.Conv2d(squeeze_channels, expand3x3_channels, kernel_size=3, padding=1)
+        self.expand3x3_activation = nn.ReLU(inplace=True)
+
+    def forward(self, x):
+        x = self.squeeze_activation(self.squeeze(x))
+        return torch.cat(
+            [
+                self.expand1x1_activation(self.expand1x1(x)),
+                self.expand3x3_activation(self.expand3x3(x)),
+            ],
+            dim=1,
+        )
+
+
+class _SqueezeNet11Features(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.slice1 = nn.Sequential()
+        self.slice2 = nn.Sequential()
+        self.slice3 = nn.Sequential()
+        self.slice4 = nn.Sequential()
+        self.slice5 = nn.Sequential()
+        self.slice6 = nn.Sequential()
+        self.slice7 = nn.Sequential()
+        self.slice1.add_module("0", nn.Conv2d(3, 64, kernel_size=3, stride=2))
+        self.slice1.add_module("1", nn.ReLU(inplace=True))
+        self.slice2.add_module("2", nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))
+        self.slice2.add_module("3", _Fire(64, 16, 64, 64))
+        self.slice2.add_module("4", _Fire(128, 16, 64, 64))
+        self.slice3.add_module("5", nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))
+        self.slice3.add_module("6", _Fire(128, 32, 128, 128))
+        self.slice3.add_module("7", _Fire(256, 32, 128, 128))
+        self.slice4.add_module("8", nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))
+        self.slice4.add_module("9", _Fire(256, 48, 192, 192))
+        self.slice5.add_module("10", _Fire(384, 48, 192, 192))
+        self.slice6.add_module("11", _Fire(384, 64, 256, 256))
+        self.slice7.add_module("12", _Fire(512, 64, 256, 256))
+
+    def forward(self, x):
+        h1 = self.slice1(x)
+        h2 = self.slice2(h1)
+        h3 = self.slice3(h2)
+        h4 = self.slice4(h3)
+        h5 = self.slice5(h4)
+        h6 = self.slice6(h5)
+        h7 = self.slice7(h6)
+        return [h1, h2, h3, h4, h5, h6, h7]
+
+
+_NET_CONFIG = {
+    "alex": {"factory": _AlexFeatures, "channels": (64, 192, 384, 256, 256)},
+    "vgg": {"factory": _VGG16Features, "channels": (64, 128, 256, 512, 512)},
+    "squeeze": {"factory": _SqueezeNet11Features, "channels": (64, 128, 256, 384, 384, 512, 512)},
+}
+
+
+class _ScalingLayer(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.register_buffer("shift", torch.tensor([-0.030, -0.088, -0.188]).view(1, 3, 1, 1))
+        self.register_buffer("scale", torch.tensor([0.458, 0.448, 0.450]).view(1, 3, 1, 1))
+
+    def forward(self, x):
+        return (x - self.shift) / self.scale
+
+
+class _NetLinLayer(nn.Module):
+    def __init__(self, chn_in, use_dropout=True):
+        super().__init__()
+        layers = []
+        if use_dropout:
+            layers.append(nn.Dropout())
+        layers.append(nn.Conv2d(chn_in, 1, kernel_size=1, stride=1, padding=0, bias=False))
+        self.model = nn.Sequential(*layers)
+
+    def forward(self, x):
+        return self.model(x)
+
+
+def _normalize_tensor(x, eps=1e-10):
+    norm = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True))
+    return x / (norm + eps)
+
+
+def _spatial_average(x):
+    return x.mean(dim=(2, 3), keepdim=True)
+
+
+class LPIPSModel(nn.Module):
+    def __init__(self, net: str = "alex", use_dropout: bool = True):
+        super().__init__()
+        if net not in _NET_CONFIG:
+            raise ValueError(f"net must be one of {LPIPS_NET_CHOICES}, got {net!r}")
+        self.net_name = net
+        self.scaling_layer = _ScalingLayer()
+        self.net = _NET_CONFIG[net]["factory"]()
+        chns = _NET_CONFIG[net]["channels"]
+        for i, chn in enumerate(chns):
+            setattr(self, f"lin{i}", _NetLinLayer(chn, use_dropout=use_dropout))
+        self.num_layers = len(chns)
+        for p in self.parameters():
+            p.requires_grad = False
+
+    def forward(self, in0, in1):
+        in0 = self.scaling_layer(in0)
+        in1 = self.scaling_layer(in1)
+        feats0 = self.net(in0)
+        feats1 = self.net(in1)
+        val = 0
+        for i in range(self.num_layers):
+            diff = (_normalize_tensor(feats0[i]) - _normalize_tensor(feats1[i])) ** 2
+            lin = getattr(self, f"lin{i}")
+            val = val + _spatial_average(lin(diff))
+        return val.view(-1)
+
+
+class LPIPSCompute(nn.Module):
+    def __init__(
+        self,
+        model: LPIPSModel,
+        device: Union[str, torch.device] = "cpu",
+        batch_size: int = 16,
+        num_workers: int = 0,
+        target_size: int = 512,
+    ):
+        super().__init__()
+        self.model = model
+        self.batch_size = batch_size
+        self.num_workers = num_workers
+        self.target_size = target_size
+        self._resize_transform = transforms.Compose(
+            [
+                transforms.Resize(target_size, interpolation=transforms.InterpolationMode.BICUBIC),
+                transforms.CenterCrop(target_size),
+                transforms.ToTensor(),
+            ]
+        )
+        self._raw_transform = transforms.ToTensor()
+        self.to(device)
+
+    @property
+    def device(self):
+        try:
+            return next(self.model.parameters()).device
+        except StopIteration:
+            return torch.device("cpu")
+
+    def _to_tensor(self, image: Image.Image, do_resize: bool) -> torch.Tensor:
+        transform = self._resize_transform if do_resize else self._raw_transform
+        x = transform(image).clamp(0.0, 1.0) * 2.0 - 1.0
+        return x
+
+    @torch.no_grad()
+    def _compute_pair(self, img_a: Image.Image, img_b: Image.Image, do_resize: bool) -> float:
+        x0 = self._to_tensor(img_a, do_resize).unsqueeze(0).to(self.device)
+        x1 = self._to_tensor(img_b, do_resize).unsqueeze(0).to(self.device)
+        return float(self.model(x0, x1).item())
+
+    @torch.no_grad()
+    def _compute_pairs(self, pairs, do_resize: bool) -> float:
+        scores = []
+        batch_size = max(1, self.batch_size)
+        for start in range(0, len(pairs), batch_size):
+            chunk = pairs[start : start + batch_size]
+            xs0 = torch.stack([self._to_tensor(_open_rgb(a), do_resize) for a, _ in chunk]).to(self.device)
+            xs1 = torch.stack([self._to_tensor(_open_rgb(b), do_resize) for _, b in chunk]).to(self.device)
+            scores.append(self.model(xs0, xs1).detach().cpu())
+        merged = torch.cat(scores, dim=0)
+        return float(merged.mean().item())
+
+    @staticmethod
+    def _is_dir(value) -> bool:
+        return isinstance(value, (str, os.PathLike)) and os.path.isdir(os.fspath(value))
+
+    @staticmethod
+    def _is_image_input(value) -> bool:
+        if isinstance(value, Image.Image):
+            return True
+        if isinstance(value, (str, os.PathLike)):
+            return os.path.isfile(os.fspath(value))
+        return False
+
+    def compute(self, image_a, image_b) -> float:
+        a_is_dir = self._is_dir(image_a)
+        b_is_dir = self._is_dir(image_b)
+        if a_is_dir != b_is_dir:
+            raise ValueError("LPIPS.compute requires both inputs to be directories or both to be single images.")
+
+        if a_is_dir:
+            pairs = _pair_directories_by_stem(image_a, image_b)
+            sizes = set()
+            for path_a, path_b in pairs:
+                with Image.open(path_a) as ia, Image.open(path_b) as ib:
+                    sizes.add(ia.size)
+                    sizes.add(ib.size)
+            do_resize = len(sizes) > 1
+            return self._compute_pairs(pairs, do_resize=do_resize)
+
+        if not (self._is_image_input(image_a) and self._is_image_input(image_b)):
+            raise ValueError("LPIPS.compute inputs must be image paths, PIL images, or directories.")
+        img_a = _open_rgb(image_a)
+        img_b = _open_rgb(image_b)
+        do_resize = img_a.size != img_b.size
+        return self._compute_pair(img_a, img_b, do_resize=do_resize)
+
+    def forward(self, image_a, image_b):
+        return self.compute(image_a, image_b)
diff --git a/diffsynth/utils/state_dict_converters/image_metrics.py b/diffsynth/utils/state_dict_converters/image_metrics.py
index 30c8b55a3..d781edd62 100644
--- a/diffsynth/utils/state_dict_converters/image_metrics.py
+++ b/diffsynth/utils/state_dict_converters/image_metrics.py
@@ -76,6 +76,10 @@ def ImageMetricsFIDStateDictConverter(state_dict):
     return {"model." + key: state_dict[key] for key in state_dict if not key.startswith("fc.")}
 
 
+def ImageMetricsLPIPSStateDictConverter(state_dict):
+    return {key: state_dict[key] for key in state_dict}
+
+
 def ImageMetricsHPSv3StateDictConverter(state_dict):
     converted = {}
     for key in state_dict:
diff --git a/examples/image_quality_metric/lpips.py b/examples/image_quality_metric/lpips.py
new file mode 100644
index 000000000..7f1341c5e
--- /dev/null
+++ b/examples/image_quality_metric/lpips.py
@@ -0,0 +1,24 @@
+from diffsynth.metrics import LPIPSMetric
+from modelscope import dataset_snapshot_download
+
+dataset_snapshot_download(
+    "DiffSynth-Studio/diffsynth_example_dataset",
+    allow_file_pattern=["flux/FLUX.1-dev/*", "flux2/FLUX.2-dev/*"],
+    local_dir="./data/diffsynth_example_dataset",
+)
+metric = LPIPSMetric.from_pretrained(
+    net="alex",
+    device="cuda",
+)
+
+score = metric.compute(
+    "./data/diffsynth_example_dataset/flux/FLUX.1-dev/1.jpg",
+    "./data/diffsynth_example_dataset/flux/FLUX.1-dev/2.jpg",
+)
+print(f"LPIPS score (image vs image): {score:.4f}")
+
+score = metric.compute(
+    "./data/diffsynth_example_dataset/flux/FLUX.1-dev",
+    "./data/diffsynth_example_dataset/flux2/FLUX.2-dev",
+)
+print(f"LPIPS score (dir vs dir): {score:.4f}")
diff --git a/test.sh b/test.sh
new file mode 100755
index 000000000..e66ae226d
--- /dev/null
+++ b/test.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+export PYTHONPATH=$(pwd):$PYTHONPATH
+
+NGPUS=${NGPUS:-1}
+
+srun --partition=medai_p --mpi=pmi2 --gres=gpu:${NGPUS} --quotatype=reserved \
+    -n1 --ntasks-per-node=1 --cpus-per-task=8 \
+    --job-name=lpips --kill-on-bad-exit=1 \
+    python examples/image_quality_metric/lpips.py
From 29d8b793a2aedf2595ad8b7180748425b67c1261 Mon Sep 17 00:00:00 2001
From: yuze
Date: Mon, 1 Jun 2026 17:01:02 +0800
Subject: [PATCH 2/7] try

---
 test.sh | 9 ---------
 1 file changed, 9 deletions(-)
 delete mode 100755 test.sh

diff --git a/test.sh b/test.sh
deleted file mode 100755
index e66ae226d..000000000
--- a/test.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-#!/bin/bash
-export PYTHONPATH=$(pwd):$PYTHONPATH
-
-NGPUS=${NGPUS:-1}
-
-srun --partition=medai_p --mpi=pmi2 --gres=gpu:${NGPUS} --quotatype=reserved \
-    -n1 --ntasks-per-node=1 --cpus-per-task=8 \
-    --job-name=lpips --kill-on-bad-exit=1 \
-    python examples/image_quality_metric/lpips.py
From 4dbbbc57c88ac70b26489b4f4608d754a63b523b Mon Sep 17 00:00:00 2001
From: yuze
Date: Mon, 1 Jun 2026 22:52:41 +0800
Subject: [PATCH 3/7] add default target size

---
 .gitignore                             | 3 ++-
 examples/image_quality_metric/lpips.py | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index a511cf23f..b10b8a048 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,7 +13,8 @@
 *.msc
 *.mv
 log*.txt
-
+.claude
+test.sh
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
diff --git a/examples/image_quality_metric/lpips.py b/examples/image_quality_metric/lpips.py
index 7f1341c5e..f14c90a38 100644
--- a/examples/image_quality_metric/lpips.py
+++ b/examples/image_quality_metric/lpips.py
@@ -9,6 +9,7 @@
 metric = LPIPSMetric.from_pretrained(
     net="alex",
     device="cuda",
+    target_size=512,
 )
 
 score = metric.compute(
From 332942119da5df448083ff777cf9348165b2d7d6 Mon Sep 17 00:00:00 2001
From: yuze
Date: Tue, 2 Jun 2026 13:04:40 +0800
Subject: [PATCH 4/7] remove .gitignore and md

---
 .gitignore  | 177 ----------------------------------------------------
 PR_LPIPS.md | 108 --------------------------------
 2 files changed, 285 deletions(-)
 delete mode 100644 .gitignore
 delete mode 100644 PR_LPIPS.md

diff --git a/.gitignore b/.gitignore
deleted file mode 100644
index b10b8a048..000000000
--- a/.gitignore
+++ /dev/null
@@ -1,177 +0,0 @@
-/data
-/models
-/scripts
-/diffusers
-/.vscode
-*.pkl
-*.safetensors
-*.pth
-*.ckpt
-*.pt
-*.bin
-*.DS_Store
-*.msc
-*.mv
-log*.txt
-.claude
-test.sh
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-*$py.class
-
-# C extensions
-*.so
-
-# Distribution / packaging
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
-.hypothesis/
-.pytest_cache/
-cover/
-
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-# For a library or package, you might want to ignore these files since the code is
-# intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# poetry
-# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-# in version control.
-# https://pdm.fming.dev/#use-with-ide
-.pdm.toml
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
-# Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-# and can be added to the global gitignore or merged into this file. For a more nuclear
-# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
\ No newline at end of file
diff --git a/PR_LPIPS.md b/PR_LPIPS.md
deleted file mode 100644
index 24ee9b616..000000000
--- a/PR_LPIPS.md
+++ /dev/null
@@ -1,108 +0,0 @@
-# Add LPIPS image-quality metric
-
-## Summary
-
-Adds **LPIPS** (Learned Perceptual Image Patch Similarity, [Zhang et al. CVPR 2018](https://arxiv.org/abs/1801.03924)) to `diffsynth.metrics`, alongside the existing FID / CLIP / Aesthetic / PickScore / ImageReward / HPSv2 / HPSv3 metrics. Reference implementation: [richzhang/PerceptualSimilarity](https://github.com/richzhang/PerceptualSimilarity).
-
-Three backbone variants (`alex` / `vgg` / `squeeze`) are supported and selectable through a single `net=...` flag — the matching `safetensors` weight file is auto-resolved when no `model_config` is given.
- -## Files - -### New - -| File | Purpose | -|------|---------| -| `diffsynth/models/lpips.py` | Self-contained backbones (AlexNet / VGG16 / SqueezeNet1.1 features), `ScalingLayer`, `NetLinLayer`, top-level `LPIPSModel`, and `LPIPSCompute` (handles file/dir input, stem matching, conditional resize). No `torchvision.models` weight fetch — the registered safetensors carry every parameter. | -| `diffsynth/metrics/lpips.py` | `LPIPSMetric.from_pretrained(net, ...)` matching the existing `FIDMetric` shape. Auto-derives the `ModelConfig` and `model_pool.fetch_model(...)` name from `net`. | -| `examples/image_quality_metric/lpips.py` | Example covering both `img-vs-img` and `dir-vs-dir` calls on the existing FLUX example dataset. | - -### Modified - -| File | Change | -|------|--------| -| `diffsynth/metrics/__init__.py` | Export `LPIPSMetric` | -| `diffsynth/configs/model_configs.py` | Three new entries in `image_metrics_series` (one per backbone), each with `extra_kwargs={"net": ...}` | -| `diffsynth/utils/state_dict_converters/image_metrics.py` | Add `ImageMetricsLPIPSStateDictConverter` (identity converter — the uploaded safetensors already match `LPIPSModel.state_dict()`) | - -No other files changed; conda environment, other metrics, README, and docs are untouched. - -## Public API - -```python -from diffsynth.metrics import LPIPSMetric - -# Default: alex backbone, file = LPIPS/alexnet.safetensors (~9.9 MB) -metric = LPIPSMetric.from_pretrained(net="alex", device="cuda") - -# img vs img -> single float -score = metric.compute("a.png", "b.png") - -# dir vs dir -> mean over filename-stem-matched pairs (float) -score = metric.compute("./dir_a", "./dir_b") -``` - -Other supported kwargs: `net="vgg"|"squeeze"`, `target_size=512`, `batch_size=16`, `num_workers=0`, plus an optional explicit `model_config=ModelConfig(...)` to override the default weight file. 
- -## Behavior - -**`compute(image_a, image_b)`** dispatches by input type: - -| Both inputs | Behavior | -|-------------|----------| -| Image files / `PIL.Image` | If sizes match → no resize. If sizes differ → `Resize(target_size, BICUBIC)` + `CenterCrop(target_size)` (consistent with `diffsynth.models.image_reward`'s pattern). Returns a single float. | -| Directories | Pair by filename stem (e.g. `dog.png` ↔ `dog.jpg` match; orphan files are ignored). If **all** images across both dirs share the same `(H, W)` → no resize; otherwise resize all. Returns the mean LPIPS over matched pairs. | -| Mixed (one file, one dir) | `ValueError` | - -After `ToTensor`, values are clamped to `[0, 1]` before being mapped to the official `[-1, 1]` LPIPS input range — this guards against BICUBIC overshoot (other metrics in this repo also use BICUBIC; FID and ImageReward do not clamp, but LPIPS is sensitive to out-of-range inputs because `ScalingLayer` applies a per-channel mean/std). - -## Weights (uploaded to ModelScope) - -The three weight files are committed under `DiffSynth-Studio/ImageMetrics/LPIPS/` on ModelScope. Each one is a complete LPIPS state dict — `net.slice{1..N}.*` (backbone), `scaling_layer.shift/scale` (ImageNet color buffers), and 5 or 7 `lin{i}.model.1.weight` 1×1 conv weights — produced by combining the official torchvision ImageNet checkpoints with the LPIPS lin-layer weights from `richzhang/PerceptualSimilarity`'s `lpips/weights/v0.1/`. 
- -| File | Size | Hash (md5) | `model_name` | -|------|------|------------|--------------| -| `LPIPS/alexnet.safetensors` | ~9.9 MB | `08a75c660c9b2e775c530a0955857f1f` | `image_metrics_lpips_alex` | -| `LPIPS/vgg.safetensors` | ~58.9 MB | `5740953aaa8aba2ecd9b9c23da813591` | `image_metrics_lpips_vgg` | -| `LPIPS/squeezenet.safetensors` | ~2.9 MB | `ff994b70a30599287a332105396d5004` | `image_metrics_lpips_squeeze` | - -## Consistency with existing metrics - -- `LPIPSMetric` subclasses the same `Metric` base used by every other metric, and uses the standard `download_and_load_models` → `model_pool.fetch_model(...)` flow. -- `from_pretrained(...)` follows the FID / CLIP signature shape: optional `model_config`, `device`, `vram_limit`, plus metric-specific kwargs. -- All three backbones are registered in `image_metrics_series` with the same shape as the FID entry, just differentiated by `extra_kwargs={"net": ...}`. -- The example file mirrors `examples/image_quality_metric/fid.py` (download via `dataset_snapshot_download`, then `metric.compute(...)`). - -## Test plan - -Tests run inside the user-provided `compound` conda env on CPU (login node had no GPU); the code path is device-agnostic. - -- [x] Numerical parity vs official `lpips` package on `PerceptualSimilarity/imgs/ex_dir{0,1}` (64×64, no resize): - - | net | DiffSynth (mean) | Official `lpips` | abs diff | - |-----|------------------|-------------------|----------| - | alex | 0.429723 | 0.429723 | 6.7e-08 | - | vgg | 0.495139 | 0.495139 | 1.5e-08 | - | squeeze | 0.429475 | 0.429475 | 6.0e-08 | - - Per-pair img-vs-img scores match to `0.000000` for all 6 (3 nets × 2 pairs). - -- [x] State dict cross-check: every common key between the new safetensors and `lpips.LPIPS(net=...).state_dict()` is `torch.equal`-identical (alex 17/17, vgg 33/33, squeeze 59/59 keys; the only `lins.*` keys missing are `nn.ModuleList` aliases that point at the same tensors). 
- -- [x] `LPIPSModel.load_state_dict(...)` reports `0` missing and `0` unexpected keys for all three weight files. - -- [x] `model_pool.auto_load_model(...)` correctly identifies and loads the right backbone by hash for all three files. - -- [x] Behavioral edge cases: - - Same image vs itself → `0.0` (alex, exact) - - Different-sized images → BICUBIC resize path runs, returns a sensible non-zero score - - Mixed-size directory pair → all images are resized, returns mean - - Stem matching `dog.png` ↔ `dog.jpg` works - - Mixed input (one file, one directory) → `ValueError` - -- [x] Example script `examples/image_quality_metric/lpips.py` runs end-to-end (`alex` backbone, FLUX dataset). The `dir-vs-dir` score is `0.0000` because the `flux/FLUX.1-dev` and `flux2/FLUX.2-dev` example dirs contain byte-identical images (same as the FID example exhibits very-near-zero behavior); the `img-vs-img` call between two distinct images returns a sensible non-zero score. - -## Out of scope - -- README / `docs/.../Image-Quality-Metrics.md` table updates — left for a docs-only follow-up. -- LPIPS as a training loss — only the inference metric path is added. -- Resize strategies beyond center-crop + 512×512 BICUBIC — a single `target_size` knob covers the use cases requested. 
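The Behavior section of the `PR_LPIPS.md` description removed above defines four input rules:

- stem pairing for directories,
- resizing only when sizes differ,
- clamping to `[0, 1]` before mapping to `[-1, 1]`,
- a `ValueError` for mixed file/directory input.

The stdlib-only sketch below shows one way to express those rules. It is a hypothetical illustration, not the `LPIPSCompute` code. The helper names (`match_by_stem`, `needs_resize`, `to_lpips_range`, `pair_inputs`) and the `IMAGE_EXTS` whitelist are assumptions. The sketch handles only path inputs, not `PIL.Image`, and it decides *whether* to resize without performing the actual `Resize` + `CenterCrop`.

```python
from pathlib import Path
from typing import Dict, List, Sequence, Tuple

# Hypothetical extension whitelist; the real LPIPSCompute may accept other formats.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def match_by_stem(names_a: Sequence[str], names_b: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair file names from two directories by stem (dog.png <-> dog.jpg); orphans are dropped."""

    def index(names: Sequence[str]) -> Dict[str, str]:
        by_stem: Dict[str, str] = {}
        for name in sorted(names):
            p = Path(name)
            if p.suffix.lower() in IMAGE_EXTS:
                by_stem.setdefault(p.stem, name)  # first file per stem wins (arbitrary tie-break)
        return by_stem

    a, b = index(names_a), index(names_b)
    # Only stems present in both directories produce a pair.
    return [(a[stem], b[stem]) for stem in sorted(a.keys() & b.keys())]


def needs_resize(sizes: Sequence[Tuple[int, int]]) -> bool:
    """Resize every image unless all of them share a single (H, W)."""
    return len(set(sizes)) > 1


def to_lpips_range(x: float) -> float:
    """Clamp a ToTensor-style value to [0, 1] (guards BICUBIC overshoot), then map it to [-1, 1]."""
    x = min(max(x, 0.0), 1.0)
    return 2.0 * x - 1.0


def pair_inputs(path_a: str, path_b: str) -> List[Tuple[str, str]]:
    """Dispatch on input type: file+file -> one pair, dir+dir -> stem-matched pairs, mixed -> error."""
    a, b = Path(path_a), Path(path_b)
    if a.is_dir() and b.is_dir():
        pairs = match_by_stem([p.name for p in a.iterdir()], [p.name for p in b.iterdir()])
        return [(str(a / na), str(b / nb)) for na, nb in pairs]
    if a.is_dir() or b.is_dir():
        raise ValueError("expected two image files or two directories, got one of each")
    return [(path_a, path_b)]


if __name__ == "__main__":
    print(match_by_stem(["dog.png", "cat.png", "notes.txt"], ["dog.jpg", "bird.png"]))
    print(needs_resize([(64, 64), (64, 64)]), needs_resize([(64, 64), (512, 512)]))
    print(to_lpips_range(1.02), to_lpips_range(-0.01), to_lpips_range(0.5))
```

Checking sizes through `set(sizes)` mirrors the all-or-nothing rule: one odd-sized image forces every pair through the resize path. Keeping the first file per stem is only a tie-break chosen for this sketch, for a directory that holds both `dog.png` and `dog.jpg`. The actual implementation may handle that case differently.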
From 5db7301bda2e917dda97550a2851f4741755fba4 Mon Sep 17 00:00:00 2001 From: yuze Date: Tue, 2 Jun 2026 13:10:57 +0800 Subject: [PATCH 5/7] fix(examples): pass target_size instead of target in LPIPS example --- examples/image_quality_metric/lpips.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/image_quality_metric/lpips.py b/examples/image_quality_metric/lpips.py index f14c90a38..a34ac996f 100644 --- a/examples/image_quality_metric/lpips.py +++ b/examples/image_quality_metric/lpips.py @@ -9,7 +9,7 @@ metric = LPIPSMetric.from_pretrained( net="alex", device="cuda", - target=512, + target_size=512, ) score = metric.compute( From 414dfc7d720b1bb3d2ef07a795892b66af695ece Mon Sep 17 00:00:00 2001 From: yuze Date: Tue, 2 Jun 2026 13:30:44 +0800 Subject: [PATCH 6/7] fix(metrics): pass explicit model_config in LPIPS example, drop identity state_dict converter and unused num_workers --- diffsynth/configs/model_configs.py | 3 --- diffsynth/metrics/lpips.py | 2 -- diffsynth/models/lpips.py | 2 -- diffsynth/utils/state_dict_converters/image_metrics.py | 4 ---- examples/image_quality_metric/lpips.py | 10 +++++++++- 5 files changed, 9 insertions(+), 12 deletions(-) diff --git a/diffsynth/configs/model_configs.py b/diffsynth/configs/model_configs.py index a80050959..a0b5d6549 100644 --- a/diffsynth/configs/model_configs.py +++ b/diffsynth/configs/model_configs.py @@ -1076,7 +1076,6 @@ "model_hash": "08a75c660c9b2e775c530a0955857f1f", "model_name": "image_metrics_lpips_alex", "model_class": "diffsynth.models.lpips.LPIPSModel", - "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter", "extra_kwargs": {"net": "alex"}, }, { @@ -1084,7 +1083,6 @@ "model_hash": "5740953aaa8aba2ecd9b9c23da813591", "model_name": "image_metrics_lpips_vgg", "model_class": "diffsynth.models.lpips.LPIPSModel", - "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter", "extra_kwargs": {"net": "vgg"}, }, { @@ -1092,7 +1090,6 @@ "model_hash": "ff994b70a30599287a332105396d5004", "model_name":
"image_metrics_lpips_squeeze", "model_class": "diffsynth.models.lpips.LPIPSModel", - "state_dict_converter": "diffsynth.utils.state_dict_converters.image_metrics.ImageMetricsLPIPSStateDictConverter", "extra_kwargs": {"net": "squeeze"}, }, ] diff --git a/diffsynth/metrics/lpips.py b/diffsynth/metrics/lpips.py index 9bf014d5b..dc5ae3634 100644 --- a/diffsynth/metrics/lpips.py +++ b/diffsynth/metrics/lpips.py @@ -31,7 +31,6 @@ def from_pretrained( model_config: ModelConfig = None, device: torch.device = get_device_type(), batch_size: int = 16, - num_workers: int = 0, target_size: int = 512, vram_limit: float = None, ): @@ -52,7 +51,6 @@ def from_pretrained( model=backbone, device=device, batch_size=batch_size, - num_workers=num_workers, target_size=target_size, ) return cls(compute_model) diff --git a/diffsynth/models/lpips.py b/diffsynth/models/lpips.py index 45650df9f..4c7847c89 100644 --- a/diffsynth/models/lpips.py +++ b/diffsynth/models/lpips.py @@ -266,13 +266,11 @@ def __init__( model: LPIPSModel, device: Union[str, torch.device] = "cpu", batch_size: int = 16, - num_workers: int = 0, target_size: int = 512, ): super().__init__() self.model = model self.batch_size = batch_size - self.num_workers = num_workers self.target_size = target_size self._resize_transform = transforms.Compose( [ diff --git a/diffsynth/utils/state_dict_converters/image_metrics.py b/diffsynth/utils/state_dict_converters/image_metrics.py index d781edd62..30c8b55a3 100644 --- a/diffsynth/utils/state_dict_converters/image_metrics.py +++ b/diffsynth/utils/state_dict_converters/image_metrics.py @@ -76,10 +76,6 @@ def ImageMetricsFIDStateDictConverter(state_dict): return {"model." 
+ key: state_dict[key] for key in state_dict if not key.startswith("fc.")} -def ImageMetricsLPIPSStateDictConverter(state_dict): - return {key: state_dict[key] for key in state_dict} - - def ImageMetricsHPSv3StateDictConverter(state_dict): converted = {} for key in state_dict: diff --git a/examples/image_quality_metric/lpips.py b/examples/image_quality_metric/lpips.py index a34ac996f..6e15b5a91 100644 --- a/examples/image_quality_metric/lpips.py +++ b/examples/image_quality_metric/lpips.py @@ -1,4 +1,4 @@ -from diffsynth.metrics import LPIPSMetric +from diffsynth.metrics import LPIPSMetric, ModelConfig from modelscope import dataset_snapshot_download dataset_snapshot_download( @@ -6,8 +6,16 @@ allow_file_pattern=["flux/FLUX.1-dev/*", "flux2/FLUX.2-dev/*"], local_dir="./data/diffsynth_example_dataset", ) + +# net="alex" with LPIPS/alexnet.safetensors (default) +# For VGG: net="vgg", model_config=ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/vgg.safetensors") +# For SqueezeNet: net="squeeze", model_config=ModelConfig(model_id="DiffSynth-Studio/ImageMetrics", origin_file_pattern="LPIPS/squeezenet.safetensors") metric = LPIPSMetric.from_pretrained( net="alex", + model_config=ModelConfig( + model_id="DiffSynth-Studio/ImageMetrics", + origin_file_pattern="LPIPS/alexnet.safetensors", + ), device="cuda", target_size=512, ) From 2835af4ad7d07c0ffc6b20b122b4f252864dc1c0 Mon Sep 17 00:00:00 2001 From: yuze Date: Tue, 2 Jun 2026 13:43:49 +0800 Subject: [PATCH 7/7] .gitignore --- .gitignore | 176 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 176 insertions(+) create mode 100644 .gitignore diff --git a/.gitignore b/.gitignore new file mode 100644 index 000000000..a511cf23f --- /dev/null +++ b/.gitignore @@ -0,0 +1,176 @@ +/data +/models +/scripts +/diffusers +/.vscode +*.pkl +*.safetensors +*.pth +*.ckpt +*.pt +*.bin +*.DS_Store +*.msc +*.mv +log*.txt + +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] 
+*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +*.py,cover +.hypothesis/ +.pytest_cache/ +cover/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 +db.sqlite3-journal + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +.pybuilder/ +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# IPython +profile_default/ +ipython_config.py + +# pyenv +# For a library or package, you might want to ignore these files since the code is +# intended to run in multiple environments; otherwise, check them in: +# .python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. +# However, in case of collaboration, if having platform-specific dependencies or dependencies +# having no cross-platform support, pipenv may install dependencies that don't work, or not +# install all needed dependencies. +#Pipfile.lock + +# poetry +# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. +# This is especially recommended for binary packages to ensure reproducibility, and is more +# commonly ignored for libraries. 
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control +#poetry.lock + +# pdm +# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. +#pdm.lock +# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it +# in version control. +# https://pdm.fming.dev/#use-with-ide +.pdm.toml + +# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm +__pypackages__/ + +# Celery stuff +celerybeat-schedule +celerybeat.pid + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ + +# pytype static type analyzer +.pytype/ + +# Cython debug symbols +cython_debug/ + +# PyCharm +# JetBrains specific template is maintained in a separate JetBrains.gitignore that can +# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore +# and can be added to the global gitignore or merged into this file. For a more nuclear +# option (not recommended) you can uncomment the following to ignore the entire idea folder. +#.idea/ \ No newline at end of file