LoRA-Squeeze¶
LoRA-Squeeze finds the optimal LoRA rank for your trained adapter using memory-efficient SVD. Train at high rank, then squeeze to find the smallest rank that retains the information you need.
Based on LoRA-Squeeze (arXiv 2602.10993): "it is better to first learn an expressive, higher-rank solution and then compress it."
How it works¶
- Load the trained LoRA adapter (lora_A and lora_B matrices for each layer)
- For each layer, compute QR decompositions of A and B (never forms the full m x n product)
- SVD on the small r x r core matrix to get singular values
- Measure cumulative variance at each target rank:
V(k) = sum(s^2[:k]) / sum(s^2) - Recommend the smallest rank where mean variance across layers >= threshold
The algorithm operates on the small r x r core matrix, so it's fast and memory-efficient even for large models.
Two ways to use it¶
1. Auto-squeeze in campaigns (recommended)¶
Add [squeeze] to your campaign TOML. After the first training run completes, retrain automatically analyzes the adapter and reports the optimal rank — no manual step needed.
[campaign]
seeds = [42, 101, 202, 303]
max_steps = 50
[[campaign.conditions]]
advantage_mode = "grpo"
transform_mode = "none"
[model]
lora_rank = 128 # train at high rank
[squeeze]
min_variance_retention = 0.95
[logging]
wandb_project = "my-project"
The campaign will:
- Run the first training job (e.g.,
grpo+none_s42) - Analyze its final adapter via SVD
- Print the variance table and recommended rank
- Log results to wandb (if enabled)
- Save the recommendation to
manifest.json - Continue the remaining campaign runs
2. Standalone squeeze¶
Analyze any saved adapter directly:
# squeeze.toml
[squeeze]
adapter_path = "logs/campaign_.../runs/grpo+none_s42/final"
min_variance_retention = 0.95
[model]
lora_rank = 128 # source rank fallback (if not detectable from weights)
For Tinker adapters, use the tinker:// path directly:
Compression¶
To also compress the adapter to the recommended (or a specific) rank:
[squeeze]
adapter_path = "logs/.../final"
min_variance_retention = 0.95
output_path = "logs/squeezed" # save compressed adapter here
compress_to = 0 # 0 = use recommended rank, or set a specific rank
The compressed adapter is a valid PEFT adapter with scaled lora_alpha:
Example output¶
============================================================
Auto-squeeze: analyzing tinker://run-abc123/weights/final
source_rank=128, min_variance_retention=0.95
Source rank: 128
Layers analyzed: 196
Rank Mean Var% Min Var% Max Var%
-------------------------------------------
1 12.47% 5.23% 22.81%
2 23.15% 12.08% 38.42%
4 42.31% 28.67% 61.05%
8 68.94% 52.13% 82.47%
16 85.72% 74.30% 93.18%
32 95.48% 91.22% 98.15% <--
64 99.12% 97.84% 99.73%
128 100.00% 100.00% 100.00%
Recommended rank: 32 (>= 95% variance retained)
============================================================
Read the table as: "rank 32 retains 95.48% of the mean variance across all 196 LoRA layers, with the worst layer still retaining 91.22%."
Configuration reference¶
[squeeze] section¶
| Key | Type | Default | Description |
|---|---|---|---|
adapter_path |
str | "" |
Path to adapter directory. Required for standalone, auto-filled in campaigns |
min_variance_retention |
float | 0.95 |
Minimum variance fraction to retain (0.95 = 95%) |
source_rank |
int | 0 |
Expected source rank. 0 = detect from [model].lora_rank |
target_ranks |
list[int] | [] |
Ranks to evaluate. [] = auto power-of-2 sequence |
output_path |
str | "" |
Directory to save compressed adapter. Empty = analyze only |
compress_to |
int | 0 |
Target rank for compression. 0 = use recommended rank |
device |
str | "cpu" |
Torch device for SVD computation |
wandb integration¶
When used in a campaign with wandb_project set, squeeze creates a dedicated "squeeze-analysis" run in the same project with:
| What | Details |
|---|---|
| Variance table | wandb Table with columns: rank, mean_variance, min_variance, max_variance, recommended |
| Variance curves | Per-rank line chart data (squeeze/mean_variance, squeeze/min_variance, squeeze/max_variance) |
| Summary metrics | squeeze/recommended_rank, squeeze/source_rank, squeeze/min_variance_retention, squeeze/num_layers |
The squeeze run appears alongside training runs in your wandb project, tagged with squeeze and rank-{N}.
Recommended workflow¶
- Train at high rank — use
lora_rank = 64or128in your campaign - Add
[squeeze]— setmin_variance_retentionto your desired threshold (0.95 is a good default) - Run the campaign — squeeze runs automatically after the first training run
- Check the recommendation — in the terminal output,
manifest.json, or wandb - Use the recommended rank — for future production training runs, set
lora_rankto the recommended value
Choosing min_variance_retention¶
| Threshold | Use case |
|---|---|
0.99 |
Conservative — minimal information loss, higher rank |
0.95 |
Balanced — good default for most experiments |
0.90 |
Aggressive — smaller adapter, some information loss acceptable |
0.80 |
Very aggressive — only for quick iteration or memory-constrained setups |
Python API¶
For programmatic use:
from retrain.squeeze import analyze_adapter, compress_adapter
# Analyze
analysis = analyze_adapter(
adapter_path="logs/.../final",
min_variance_retention=0.95,
)
print(f"Recommended rank: {analysis.recommended_rank}")
print(f"Variance at rank 32: {analysis.mean_variance[32]:.2%}")
# Compress
compress_adapter(
adapter_path="logs/.../final",
output_path="logs/squeezed",
target_rank=analysis.recommended_rank,
)