sparselearning.funcs
sparselearning.funcs.decay
Implements decay functions (cosine, linear, iterative pruning).
class sparselearning.funcs.decay.CosineDecay(prune_rate: float = 0.3, T_max: int = 1000, eta_min: float = 0.0, last_epoch: int = -1)
Bases: sparselearning.funcs.decay.Decay
Decays the pruning rate according to a cosine schedule. A thin wrapper around PyTorch’s CosineAnnealingLR.
- Parameters
prune_rate (float) – α (alpha) described in RigL’s paper; the initial prune rate (default 0.3)
T_max (int) – Max mask-update steps (default 1000)
eta_min (float) – final prune rate (default 0.0)
last_epoch (int) – epoch to reset annealing. If -1, doesn’t reset (default -1).
get_dr()
step(step: int = -1)
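A minimal usage sketch, assuming get_dr() returns the current annealed prune rate and step() advances the schedule:

    from sparselearning.funcs.decay import CosineDecay

    decay = CosineDecay(prune_rate=0.3, T_max=1000)
    for t in range(1000):
        rate = decay.get_dr()  # current prune rate, annealed from 0.3 towards eta_min
        # ... prune `rate` of the active weights at mask-update steps ...
        decay.step()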
class sparselearning.funcs.decay.LinearDecay(prune_rate: float = 0.3, T_max: int = 1000)
Bases: sparselearning.funcs.decay.Decay
Anneals the pruning rate linearly with each step.
- Parameters
prune_rate (float) – Initial prune rate (default 0.3)
T_max (int) – Max mask-update steps (default 1000)
get_dr()
step(step: int = -1)
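Presumably the rate falls linearly from prune_rate at step 0 to zero at T_max; a sketch of the implied formula (an assumption, not the library’s exact code):

    def linear_prune_rate(t, prune_rate=0.3, T_max=1000):
        # Linear anneal: prune_rate at t=0, zero at t=T_max.
        return prune_rate * max(0.0, 1.0 - t / T_max)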
class sparselearning.funcs.decay.MagnitudePruneDecay(initial_sparsity: float = 0.0, final_sparsity: float = 0.3, T_max: int = 30000, T_start: int = 350, interval: int = 100)
Bases: sparselearning.funcs.decay.Decay
Anneals the pruning rate according to Zhu and Gupta 2018, “To prune or not to prune”. We implement cumulative sparsity and take a finite difference to obtain sparsity(t); the amount to prune at each step is that sparsity difference.
T_max: int = 30000
T_start: int = 350
cumulative_sparsity(step)
final_sparsity: float = 0.3
get_dr()
initial_sparsity: float = 0.0
interval: int = 100
step(step: int = -1, current_sparsity=-1)
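Zhu and Gupta’s schedule raises cumulative sparsity cubically from initial_sparsity to final_sparsity between T_start and T_max. A sketch of that curve (the library’s exact handling of interval and boundary steps may differ):

    def zhu_gupta_sparsity(t, s_i=0.0, s_f=0.3, T_start=350, T_max=30000):
        # s(t) = s_f + (s_i - s_f) * (1 - (t - T_start) / (T_max - T_start))**3
        if t < T_start:
            return s_i
        if t >= T_max:
            return s_f
        progress = (t - T_start) / (T_max - T_start)
        return s_f + (s_i - s_f) * (1 - progress) ** 3

The per-step prune amount is then the finite difference s(t) - s(t - interval), evaluated every interval steps.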
sparselearning.funcs.grow
Implements growth functions.
Modifies the binary mask to enable gradient flow. New weights are zero by default; this can be changed within the growth function.
Functions have access to the masking object enabling greater flexibility in designing custom growth modes.
Signature: <func>(masking, name, total_regrowth, weight)
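To illustrate this signature, here is a hypothetical custom growth function that revives inactive weights at random; the masking.masks[name] layout is an assumption for illustration:

    import torch

    def my_random_growth(masking, name, total_regrowth, weight):
        # Hypothetical plugin; assumes `masking.masks[name]` holds the layer's boolean mask.
        mask = masking.masks[name]
        inactive = (~mask.flatten().bool()).nonzero().squeeze(1)
        chosen = inactive[torch.randperm(len(inactive))[:total_regrowth]]
        new_mask = mask.flatten().clone()
        new_mask[chosen] = 1
        return new_mask.view_as(mask)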
sparselearning.funcs.grow.abs_grad_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor
Grows weights where abs(grad) is largest, among the currently zeroed weights.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
name (str) – layer name
total_regrowth (int) – amount to re-grow
weight (torch.Tensor) – layer weight
- Returns
New boolean mask
- Return type
torch.Tensor
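For intuition, the selection amounts to a top-k over gradient magnitudes at inactive positions, roughly as below (a sketch, not the library’s code):

    import torch

    def topk_grad_positions(grad, mask, total_regrowth):
        scores = grad.abs().flatten().clone()
        scores[mask.flatten().bool()] = -float("inf")  # exclude already-active weights
        _, idx = torch.topk(scores, total_regrowth)
        new_mask = mask.flatten().clone()
        new_mask[idx] = 1
        return new_mask.view_as(mask)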
sparselearning.funcs.grow.momentum_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor
Grows weights in places where the momentum is largest.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
name (str) – layer name
total_regrowth (int) – amount to re-grow
weight (torch.Tensor) – layer weight
- Returns
New boolean mask
- Return type
torch.Tensor
sparselearning.funcs.grow.no_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor
No growth.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
name (str) – layer name
total_regrowth (int) – amount to re-grow
weight (torch.Tensor) – layer weight
- Returns
New boolean mask
- Return type
torch.Tensor
sparselearning.funcs.grow.random_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor
Random growth.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
name (str) – layer name
total_regrowth (int) – amount to re-grow
weight (torch.Tensor) – layer weight
- Returns
New boolean mask
- Return type
torch.Tensor
sparselearning.funcs.grow.struct_abs_grad_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor, criterion: Callable = torch.mean)
Performs absolute-gradient growth channel-wise.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
name (str) – layer name
total_regrowth (int) – amount to re-grow
weight (torch.Tensor) – layer weight
criterion (Callable) – callable to perform reduction
- Returns
New boolean mask
- Return type
torch.Tensor
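The reduction presumably collapses each output channel’s gradient magnitudes into a single score before whole channels are regrown, along these lines (a sketch; the actual channel bookkeeping may differ):

    import torch

    def channel_scores(grad, criterion=torch.mean):
        # For a conv weight of shape (C_out, C_in, h, w): one score per output channel.
        return criterion(grad.abs().view(grad.shape[0], -1), dim=1)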
sparselearning.funcs.init_scheme
sparselearning.funcs.init_scheme.erdos_renyi_init(masking: Masking, is_kernel: bool = True, **kwargs)
sparselearning.funcs.init_scheme.get_erdos_renyi_dist(masking: Masking, is_kernel: bool = True) → Dict[str, float]
Get layer-wise densities distributed according to ER or ERK (Erdos-Renyi or Erdos-Renyi-Kernel).
Ensures the resulting density does not exceed 1 for any layer.
- Parameters
masking – Masking instance
is_kernel – use ERK if True, ER if False
- Returns
Layer-wise density dict
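For reference, RigL-style ER/ERK assigns each layer a raw density scale of (fan_in + fan_out) / (fan_in · fan_out), with ERK adding the kernel dimensions to the numerator. A sketch of the per-layer scale (the layer-shape handling here is an assumption):

    # Raw ER/ERK density scale for one layer (cf. Evci et al., RigL).
    def er_scale(shape, is_kernel=True):
        if is_kernel and len(shape) == 4:      # conv weight: (C_out, C_in, h, w)
            c_out, c_in, h, w = shape
            return (c_in + c_out + h + w) / (c_in * c_out * h * w)
        n_out, n_in = shape[0], shape[1]       # linear weight: (out, in)
        return (n_in + n_out) / (n_in * n_out)

These scales are then normalized to meet the overall parameter budget, with any layer whose density would exceed 1 capped and the remainder re-normalized.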
sparselearning.funcs.init_scheme.lottery_ticket_init(masking: Masking, lottery_mask_path: Path, shuffle: bool = False)
If shuffle is True, use the lottery ticket’s layer-wise densities but not its exact mask.
sparselearning.funcs.init_scheme.random_init(masking: Masking, **kwargs)
sparselearning.funcs.init_scheme.resume_init(masking: Masking, **kwargs)
sparselearning.funcs.init_scheme.struct_erdos_renyi_init(masking: Masking, is_kernel: bool = True, **kwargs)
sparselearning.funcs.init_scheme.struct_random_init(masking: Masking, **kwargs)
sparselearning.funcs.prune
Implements pruning functions.
Modifies the binary mask to prevent gradient flow.
Functions have access to the masking object enabling greater flexibility in designing custom prune modes.
Signature: <func>(masking, mask, weight, name)
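To illustrate this signature, a hypothetical magnitude-style prune function; the prune-count computation via masking.prune_rate is an assumption:

    import torch

    def my_magnitude_prune(masking, mask, weight, name):
        # Hypothetical plugin; drops the k smallest-magnitude active weights.
        k = int(masking.prune_rate * mask.sum().item())  # assumed attribute
        scores = weight.abs().flatten().clone()
        scores[~mask.flatten().bool()] = float("inf")    # keep inactive weights out of the bottom-k
        _, idx = torch.topk(scores, k, largest=False)
        new_mask = mask.flatten().clone()
        new_mask[idx] = 0
        return new_mask.view_as(mask)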
sparselearning.funcs.prune.global_magnitude_prune(masking: Masking) → int
Global magnitude (L1) pruning. Modifies masks in-place.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
- Returns
number of weights removed
- Return type
int
sparselearning.funcs.prune.magnitude_prune(masking: Masking, mask: Tensor, weight: Tensor, name: str) → Tensor
Prunes the weights with the smallest magnitudes.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
mask (torch.Tensor) – layer mask
weight (torch.Tensor) – layer weight
name (str) – layer name
- Returns
pruned mask
- Return type
torch.Tensor
sparselearning.funcs.prune.struct_magnitude_prune(masking: Masking, mask: Tensor, weight: Tensor, name: str, criterion: Callable = torch.mean) → Tensor
Prunes weights channel-wise, removing the channels whose reduced magnitude is smallest.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
mask (torch.Tensor) – layer mask
weight (torch.Tensor) – layer weight
name (str) – layer name
criterion (Callable) – reduction function; reduces a kernel to a single statistic (e.g. mean/max/min).
- Returns
pruned mask
- Return type
torch.Tensor
sparselearning.funcs.redistribute
Implements redistribution functions.
Modifies layer-wise sparsity during mask updates.
Functions have access to the masking object enabling greater flexibility in designing custom redistribution modes.
The Masking class applies the returned statistics in a valid manner, ensuring no layer exceeds its capacity.
Signature: <func>(masking, name, weight, mask)
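To illustrate, a hypothetical statistic that allocates density in proportion to the mean gradient magnitude over a layer’s active weights (gradient access via weight.grad is an assumption):

    def my_grad_redistribution(masking, name, weight, mask):
        # Unnormalized layer statistic; the Masking class normalizes across layers.
        return weight.grad[mask.bool()].abs().mean().item()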
sparselearning.funcs.redistribute.grad_redistribution(masking, name, weight, mask)
Calculates gradient redistribution statistics.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
name (str) – layer name
weight (torch.Tensor) – layer weight
mask (torch.Tensor) – layer mask
- Returns
Layer Statistic—unnormalized layer statistics for the layer. Normalizing across layers gives the density distribution.
- Return type
float
sparselearning.funcs.redistribute.momentum_redistribution(masking, name, weight, mask) → float
Calculates momentum redistribution statistics.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
name (str) – layer name
weight (torch.Tensor) – layer weight
mask (torch.Tensor) – layer mask
- Returns
Layer Statistic—unnormalized layer statistics for the layer. Normalizing across layers gives the density distribution.
- Return type
float
sparselearning.funcs.redistribute.nonzero_redistribution(masking, name, weight, mask)
Calculates non-zero redistribution statistics. Ideally, this preserves the existing distribution, up to numerical error. In practice, we prefer to skip redistribution when non-zero is chosen.
- Parameters
masking (sparselearning.core.Masking) – Masking instance
name (str) – layer name
weight (torch.Tensor) – layer weight
mask (torch.Tensor) – layer mask
- Returns
Layer Statistic—unnormalized layer statistics for the layer. Normalizing across layers gives the density distribution.
- Return type
float