5 changes: 4 additions & 1 deletion .gitignore
@@ -107,4 +107,7 @@ venv.bak/
.vscode

# experiments
/experiments
/experiments

# Test output
/test/
241 changes: 241 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,241 @@
# CLAUDE.md - AI Assistant Guide for RAVEN

## Project Overview

RAVEN (**R**elational and **A**nalogical **V**isual r**E**aso**N**ing) is a dataset and benchmarking framework for visual reasoning research, presented at CVPR 2019. The project generates and evaluates Raven's Progressive Matrices (RPM): visual IQ-test puzzles in which a model must infer the rules governing a 3x3 panel matrix and pick the missing panel from eight candidates.

**Paper**: "RAVEN: A Dataset for Relational and Analogical Visual rEasoNing" (Zhang et al., CVPR 2019)

## Repository Structure

```
RAVEN/
├── src/
│ ├── dataset/ # Dataset generation code
│ │ ├── main.py # Entry point for dataset generation
│ │ ├── AoT.py # And-Or Tree (grammar) implementation
│ │ ├── Rule.py # Rule definitions (Constant, Progression, etc.)
│ │ ├── Attribute.py # Entity attributes (Type, Size, Color, etc.)
│ │ ├── build_tree.py # Tree structure builders for configurations
│ │ ├── rendering.py # Image rendering utilities
│ │ ├── sampling.py # Attribute/rule sampling logic
│ │ ├── solver.py # Rule-based answer solver
│ │ ├── serialize.py # Serialization to NPZ/XML formats
│ │ ├── constraints.py # Rule constraint logic
│ │ ├── api.py # Utility APIs (e.g., RLE decoding)
│ │ └── const.py # Constants and configuration values
│ │
│ └── model/ # Neural network benchmarking code
│ ├── main.py # Entry point for training/evaluation
│ ├── basic_model.py # Base model class with train/validate/test
│ ├── resnet18.py # ResNet18 + MLP model (Resnet18_MLP)
│ ├── cnn_mlp.py # CNN + MLP model (CNN_MLP)
│ ├── cnn_lstm.py # CNN + LSTM model (CNN_LSTM)
│ ├── fc_tree_net.py # FC Tree Network for structure reasoning
│ ├── const/ # Model constants
│ └── utility/ # Dataset loading utilities
├── assets/
│ ├── embedding.npy # Pre-trained GloVe word embeddings
│ └── README.md # Dataset format documentation
├── requirements.txt # Python dependencies
├── LICENSE # GPL-3.0 license
└── README.md # Project documentation
```

## Key Concepts

### And-Or Tree (AoT) Grammar

The dataset uses an attributed stochastic image grammar based on And-Or Trees; a toy sampling sketch follows the list:

- **Root** → **Structure** → **Component** → **Layout** → **Entity**
- **And-nodes**: All children must be included
- **Or-nodes**: One child is sampled
- **Leaf nodes**: Entities with attributes (Type, Size, Color, Angle)
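
As a rough sketch of how sampling from such a grammar works (the node class and field names here are hypothetical; the real implementation lives in `src/dataset/AoT.py`):

```python
import random

class Node(object):
    """Minimal And-Or tree node; illustrative, not the real AoT classes."""
    def __init__(self, name, node_type, children=None, attributes=None):
        self.name = name
        self.node_type = node_type      # "and", "or", or "leaf"
        self.children = children or []
        self.attributes = attributes or {}

def sample(node):
    """Recursively sample one concrete parse from the grammar."""
    if node.node_type == "leaf":
        # Leaf entities carry attribute values (Type, Size, Color, Angle)
        return {node.name: node.attributes}
    if node.node_type == "and":
        # And-node: every child is included
        return {node.name: [sample(c) for c in node.children]}
    # Or-node: exactly one child is sampled
    return {node.name: sample(random.choice(node.children))}

# Example: an Or-node layout choosing between two leaf entities
layout = Node("layout", "or", children=[
    Node("entity_a", "leaf", attributes={"Type": "square", "Size": 0.6}),
    Node("entity_b", "leaf", attributes={"Type": "circle", "Size": 0.4}),
])
print(sample(layout))
```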

### Figure Configurations (7 types)

| Config Name | Description |
|-------------|-------------|
| `center_single` | Single object in center |
| `distribute_four` | 2x2 grid layout |
| `distribute_nine` | 3x3 grid layout |
| `left_center_single_right_center_single` | Left-Right split |
| `up_center_single_down_center_single` | Up-Down split |
| `in_center_single_out_center_single` | Inner-Outer (center) |
| `in_distribute_four_out_center_single` | Inner-Outer (grid) |

### Rule Types

Rules define how attributes change across each row of the 3x3 matrix; a toy example follows the table:

| Rule | Description |
|------|-------------|
| `Constant` | Attribute remains unchanged |
| `Progression` | Attribute changes by fixed increment (+1, +2, -1, -2) |
| `Arithmetic` | Third panel = First ± Second |
| `Distribute_Three` | Three distinct values across each row |
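
As a toy illustration of these row semantics, operating on integer value indices rather than the real `Rule` classes:

```python
# Toy row semantics over attribute value indices; the real logic
# lives in src/dataset/Rule.py.
def progression(first, increment):
    # Each panel shifts the attribute index by a fixed increment
    return [first, first + increment, first + 2 * increment]

def arithmetic(first, second, sign=1):
    # Third panel's value index is first + second or first - second
    return [first, second, first + sign * second]

print(progression(2, 1))      # [2, 3, 4]
print(arithmetic(3, 2, -1))   # [3, 2, 1]
```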

### Attributes

- **Number**: 1-9 entities per panel
- **Position**: Entity placement on grid
- **Type**: Shape (triangle, square, pentagon, hexagon, circle)
- **Size**: 6 levels (0.4 to 0.9 scale)
- **Color**: 10 grayscale levels (0-255)
- **Angle**: 8 rotation values (-135° to 180°)
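
The authoritative value tables live in `src/dataset/const.py`. As a hedged sketch, the ranges above restated as constants, with assumed names and evenly spaced values (illustrative, not copied from the file):

```python
# Illustrative attribute tables inferred from the ranges above;
# consult src/dataset/const.py for the authoritative values.
TYPE_VALUES = ["triangle", "square", "pentagon", "hexagon", "circle"]
SIZE_VALUES = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]            # 6 scale levels
COLOR_VALUES = [int(255 * i / 9) for i in range(10)]    # 10 grayscale levels in 0-255
ANGLE_VALUES = [-135, -90, -45, 0, 45, 90, 135, 180]    # 8 rotations in degrees
NUM_VALUES = list(range(1, 10))                         # 1-9 entities per panel
```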

## Development Environment

### Requirements

- **Python 3.7+** (dataset generation) / Python 2.7 (model training - legacy)
- OpenCV (`opencv-contrib-python`)
- PyTorch with CUDA support (for model training)
- NumPy, SciPy, Matplotlib, Pillow, scikit-image, tqdm

### Installation

```bash
pip install -r requirements.txt
```

### Important Notes

- Dataset generation code (`src/dataset/`) is Python 3 compatible
- Model training code (`src/model/`) uses Python 2 syntax and may require porting
- CUDA is expected for model training
- The `embedding.npy` file must be placed in the dataset directory for training

## Common Commands

### Dataset Generation

```bash
# Generate dataset with default settings (20,000 samples per configuration)
python src/dataset/main.py --num-samples 20000 --save-dir /path/to/dataset

# Key arguments:
# --num-samples: Samples per configuration (default: 20000)
# --save-dir: Output directory
# --seed: Random seed (default: 1234)
# --fuse: Fuse configurations (0/1, default: 0)
# --val: Validation set proportion (default: 2 = 20%)
# --test: Test set proportion (default: 2 = 20%)
```

### Model Training

```bash
# Train a model
python src/model/main.py --model Resnet18_MLP --path /path/to/dataset

# Key arguments:
# --model: Model name (Resnet18_MLP, CNN_MLP, CNN_LSTM)
# --path: Path to dataset
# --epochs: Training epochs (default: 200)
# --batch_size: Batch size (default: 32)
# --lr: Learning rate (default: 1e-4)
# --device: CUDA device ID (default: 0)
# --resume: Resume from checkpoint (True/False)
# --save: Checkpoint save directory
# --meta_alpha, --meta_beta: Auxiliary loss weights
```

## Dataset Format

### NPZ Files

Each `.npz` file contains (a loading sketch follows the list):
- `image`: (16, 160, 160) array - 8 context panels + 8 answer choices
- `target`: Correct answer index (0-7)
- `structure`: Serialized tree structure
- `meta_matrix`: Rule encoding matrix
- `meta_target`: Bitwise-OR of meta_matrix rows
- `meta_structure`: Structure encoding
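
A minimal loading sketch with NumPy; the path assumes a sample generated under `dataset/center_single/`:

```python
import numpy as np

sample = np.load("dataset/center_single/RAVEN_0_train.npz")
images = sample["image"]                   # (16, 160, 160): 8 context panels + 8 choices
context, choices = images[:8], images[8:]
answer = int(sample["target"])             # correct choice index, 0-7

# meta_target is documented as the bitwise OR of the meta_matrix rows
meta = np.bitwise_or.reduce(sample["meta_matrix"], axis=0)
assert np.array_equal(meta, sample["meta_target"])
```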

### XML Files

Companion XML files provide detailed annotations (a parsing sketch follows the list):
- Panel decomposition (Struct → Component → Layout → Entity)
- Entity attributes with value indices
- Rule groups per component
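
A hedged parsing sketch using the standard library; the tag names are assumptions drawn from the hierarchy above, so inspect a generated `.xml` before relying on them:

```python
import xml.etree.ElementTree as ET

tree = ET.parse("dataset/center_single/RAVEN_0_train.xml")

# "Panel" and "Entity" tag names are assumed from the documented
# Struct -> Component -> Layout -> Entity hierarchy, not verified.
for panel in tree.getroot().iter("Panel"):
    for entity in panel.iter("Entity"):
        print(entity.attrib)   # value indices for Type, Size, Color, Angle
```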

## Code Patterns

### Model Architecture Pattern

All models inherit from `BasicModel`:

```python
import torch.nn.functional as F
import torch.optim as optim

class CustomModel(BasicModel):
    def __init__(self, args):
        super(CustomModel, self).__init__(args)
        # Define layers here, then build the optimizer over their parameters
        self.optimizer = optim.Adam(self.parameters(), lr=args.lr)

    def compute_loss(self, output, target, meta_target, meta_structure):
        # Return a scalar loss tensor; auxiliary meta losses are weighted
        # by args.meta_alpha / args.meta_beta in the shipped models
        return F.cross_entropy(output, target)

    def forward(self, x, embedding, indicator):
        # Return (prediction, meta_target_pred, meta_struct_pred)
        raise NotImplementedError
```

### Rule Implementation Pattern

Rules inherit from the `Rule` base class:

```python
import copy

class CustomRule(Rule):
    def __init__(self, name, attr, param, component_idx):
        super(CustomRule, self).__init__(name, attr, param, component_idx)

    def apply_rule(self, aot, in_aot=None):
        # Work on a deep copy so the source panel's tree stays untouched
        modified_aot = copy.deepcopy(aot)
        # ... transform attributes on modified_aot according to the rule ...
        return modified_aot
```

## Testing and Validation

- The dataset includes a built-in solver (`src/dataset/solver.py`) that achieves 100% accuracy
- Human baseline: ~84% accuracy
- Best model (ResNet+DRT): ~60% accuracy

## Important Files to Understand

1. **`src/dataset/AoT.py`**: Core grammar implementation - understand the tree structure
2. **`src/dataset/Rule.py`**: Rule logic - understand how patterns are generated
3. **`src/dataset/const.py`**: All constant values and attribute ranges
4. **`src/model/basic_model.py`**: Base training loop interface
5. **`src/model/fc_tree_net.py`**: Novel DRT (Dynamic Residual Tree) module

## Conventions

### Naming Conventions

- Classes: `PascalCase` (e.g., `AoTNode`, `BasicModel`)
- Functions/methods: `snake_case` (e.g., `apply_rule`, `compute_loss`)
- Private methods: Prefixed with `_` (e.g., `_sample`, `_resample`)
- Constants: `UPPER_SNAKE_CASE` (e.g., `IMAGE_SIZE`, `COLOR_VALUES`)

### Code Style

- Python 2.7-compatible syntax in the model code; the dataset code targets Python 3
- NumPy for array operations
- PyTorch for neural network operations
- Deep copying used extensively for tree manipulation (see the sketch below)
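
The deep-copy convention matters because panels share tree structure: a rule must mutate a copy, never the source tree. A minimal illustration with plain dicts standing in for AoT nodes:

```python
import copy

# A dict standing in for an AoT panel; the real code copies AoT objects
panel = {"layout": {"entities": [{"type": 3, "size": 2}]}}

next_panel = copy.deepcopy(panel)                   # independent tree, safe to mutate
next_panel["layout"]["entities"][0]["size"] += 1    # e.g., one Progression step

assert panel["layout"]["entities"][0]["size"] == 2  # source panel untouched
```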

## Common Issues

1. **Python version split**: dataset generation (`src/dataset/`) runs on Python 3.7+, while model training (`src/model/`) uses Python 2 syntax and may need porting
2. **`scipy.misc.imresize`**: removed in SciPy 1.3; use `skimage.transform.resize` instead
3. **CUDA required**: models expect GPU acceleration

## External Resources

- [Project Page](http://wellyzhang.github.io/project/raven.html)
- [Paper PDF](http://wellyzhang.github.io/attach/cvpr19zhang.pdf)
- [WReN Implementation](https://github.com/Fen9/WReN) (alternative model)
129 changes: 129 additions & 0 deletions dataset/DATASET_GENERATION.md
@@ -0,0 +1,129 @@
# Dataset Generation Report

## Overview

This document describes the process used to generate the RAVEN dataset samples.

## Generation Command

```bash
python src/dataset/main.py --num-samples 50 --save-dir ./dataset --seed 42
```

## Parameters Used

| Parameter | Value | Description |
|-----------|-------|-------------|
| `--num-samples` | 50 | Number of samples per configuration |
| `--save-dir` | `./dataset` | Output directory |
| `--seed` | 42 | Random seed for reproducibility |

## Dataset Statistics

- **Total Samples**: 350 (50 samples x 7 configurations)
- **Files per Sample**: 2 (one .npz, one .xml)
- **Total Files**: 700

### Samples per Configuration

| Configuration | Samples | Train | Val | Test |
|--------------|---------|-------|-----|------|
| center_single | 50 | 30 | 10 | 10 |
| distribute_four | 50 | 30 | 10 | 10 |
| distribute_nine | 50 | 30 | 10 | 10 |
| left_center_single_right_center_single | 50 | 30 | 10 | 10 |
| up_center_single_down_center_single | 50 | 30 | 10 | 10 |
| in_center_single_out_center_single | 50 | 30 | 10 | 10 |
| in_distribute_four_out_center_single | 50 | 30 | 10 | 10 |

## Solver Accuracy

The built-in solver achieved **100% accuracy** on all configurations, verifying that all generated puzzles have valid solutions.

## File Format

### NPZ Files

Each `.npz` file contains (a verification sketch follows the list):
- `image`: (16, 160, 160) uint8 array - 8 context panels + 8 answer choices
- `target`: int - Correct answer index (0-7)
- `predict`: int - Solver's prediction (should match target)
- `meta_matrix`: (8, 9) uint8 array - Rule encoding matrix
- `meta_target`: (9,) uint8 array - Bitwise-OR of meta_matrix rows
- `structure`: str - Serialized tree structure
- `meta_structure`: (21,) uint8 array - Structure encoding
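
Since each NPZ stores both `target` and the solver's `predict`, the solver-accuracy claim above can be re-checked with a short script (paths follow the directory structure shown later in this document):

```python
import glob
import numpy as np

# Every stored prediction should equal the ground-truth target
mismatches = 0
paths = glob.glob("dataset/*/RAVEN_*.npz")
for path in paths:
    sample = np.load(path)
    if int(sample["predict"]) != int(sample["target"]):
        mismatches += 1
print("checked %d files, %d mismatches" % (len(paths), mismatches))
```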

### XML Files

Each `.xml` file contains detailed annotations:
- Panel decomposition (Struct -> Component -> Layout -> Entity)
- Entity attributes with value indices (Type, Size, Color, Angle)
- Bounding boxes (original and real)
- RLE-encoded masks
- Rule groups with names and attributes

## Generation Process

1. **Tree Construction**: For each configuration, an And-Or Tree (AoT) is built representing the grammatical structure of the puzzle.

2. **Rule Sampling**: Rules are randomly sampled for each component:
- Number/Position rules (Constant, Progression, Arithmetic, Distribute_Three)
- Type rules (Constant, Progression, Distribute_Three)
- Size rules (Constant, Progression, Arithmetic, Distribute_Three)
- Color rules (Constant, Progression, Arithmetic, Distribute_Three)

3. **Constraint Pruning**: The AoT is pruned to satisfy all rule constraints.

4. **Panel Generation**:
   - Row 1: an initial panel is sampled and the rules are applied to produce the remaining two panels
   - Rows 2 and 3: each row starts from a freshly sampled panel, and the same rules are applied across it
   - The final panel of row 3 is withheld as the correct answer; the other eight panels form the context

5. **Answer Generation**:
   - The correct answer is the final panel produced by rule application
   - 7 distractors are generated by perturbing attributes of the correct answer

6. **Rendering**: Each panel is rendered as a 160x160 grayscale image using OpenCV.

7. **Serialization**: Data saved as NPZ (numpy) and XML (annotations).

## Python 3 Compatibility

This dataset was generated using Python 3.11.14. The codebase was refactored from Python 2 to Python 3 with the following changes (an illustrative before/after follows the list):

- `scipy.misc.comb` -> `scipy.special.comb`
- Integer division `/` -> `//` where needed
- `print` statements -> `print()` functions
- `dict.keys()` wrapped in `list()` for compatibility
- `range()` wrapped in `list()` where list operations are needed
- XML serialization uses `encoding='unicode'`
- Binary mask generation fixed for RLE encoding
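
Two of these changes, as they read after porting (the `comb(9, 3)` call is only an illustrative computation):

```python
from scipy.special import comb   # was: from scipy.misc import comb

n_positions = int(comb(9, 3))    # e.g., ways to place 3 entities on a 3x3 grid
half = 10 // 2                   # '/' became '//' wherever an int is required
print(n_positions, half)         # 84 5
```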

## Reproducibility

Using the same seed (42) and parameters will produce identical results. The random number generators used are:
- `random.seed(42)`
- `numpy.random.seed(42)`

## Directory Structure

```
dataset/
├── DATASET_GENERATION.md # This file
├── center_single/
│ ├── RAVEN_0_train.npz
│ ├── RAVEN_0_train.xml
│ └── ... (100 files total)
├── distribute_four/
│ └── ... (100 files)
├── distribute_nine/
│ └── ... (100 files)
├── left_center_single_right_center_single/
│ └── ... (100 files)
├── up_center_single_down_center_single/
│ └── ... (100 files)
├── in_center_single_out_center_single/
│ └── ... (100 files)
└── in_distribute_four_out_center_single/
└── ... (100 files)
```
Binary `.npz` files and companion single-line `.xml` annotation files added under `dataset/center_single/` (`RAVEN_0` through `RAVEN_24` shown before the listing truncates; binary contents and large diffs are not rendered).