sparselearning.funcs
sparselearning.funcs.decay
Implements decay functions (cosine, linear, iterative pruning).
class sparselearning.funcs.decay.CosineDecay(prune_rate: float = 0.3, T_max: int = 1000, eta_min: float = 0.0, last_epoch: int = -1)
Bases: sparselearning.funcs.decay.Decay

Decays the pruning rate according to a cosine schedule; a thin wrapper around PyTorch's CosineAnnealingLR.

Parameters:
    prune_rate (float) – alpha in the RigL paper, the initial prune rate (default 0.3)
    T_max (int) – maximum number of mask-update steps (default 1000)
    eta_min (float) – final prune rate (default 0.0)
    last_epoch (int) – epoch at which to reset annealing; if -1, never resets (default -1)
get_dr()
step(step: int = -1)
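A minimal usage sketch of the decay classes, assuming the interface documented above (step() advances the schedule, get_dr() returns the current prune rate); illustrative only, not the library's canonical training loop:

    from sparselearning.funcs.decay import CosineDecay

    # Anneal the prune rate from 0.3 toward eta_min over 1000 mask updates.
    decay = CosineDecay(prune_rate=0.3, T_max=1000, eta_min=0.0)
    for mask_update in range(1000):
        decay.step()                 # advance the cosine schedule
        prune_rate = decay.get_dr()  # current prune rate for this update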
class sparselearning.funcs.decay.LinearDecay(prune_rate: float = 0.3, T_max: int = 1000)
Bases: sparselearning.funcs.decay.Decay

Anneals the pruning rate linearly with each step.

Parameters:
    prune_rate (float) – initial prune rate (default 0.3)
    T_max (int) – maximum number of mask-update steps (default 1000)
get_dr()
step(step: int = -1)
class sparselearning.funcs.decay.MagnitudePruneDecay(initial_sparsity: float = 0.0, final_sparsity: float = 0.3, T_max: int = 30000, T_start: int = 350, interval: int = 100)
Bases: sparselearning.funcs.decay.Decay

Anneals sparsity according to Zhu and Gupta 2018, "To prune, or not to prune". We implement the cumulative sparsity schedule and take a finite difference to obtain sparsity(t); the amount to prune at a given step is this sparsity increment.
T_max: int = 30000
T_start: int = 350
cumulative_sparsity(step)
final_sparsity: float = 0.3
get_dr()
initial_sparsity: float = 0.0
interval: int = 100
step(step: int = -1, current_sparsity=-1)
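For reference, Zhu and Gupta's cumulative sparsity schedule is cubic in training progress. A standalone sketch of cumulative_sparsity under that paper's formula (parameter names mirror the attributes above; not necessarily the class's exact code, and the interval granularity is omitted):

    def cumulative_sparsity(step: int, initial_sparsity: float = 0.0,
                            final_sparsity: float = 0.3, T_start: int = 350,
                            T_max: int = 30000) -> float:
        # Before T_start, stay at the initial sparsity; after T_max, saturate.
        if step < T_start:
            return initial_sparsity
        if step >= T_max:
            return final_sparsity
        progress = (step - T_start) / (T_max - T_start)
        # s_t = s_f + (s_i - s_f) * (1 - progress)^3
        return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3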
sparselearning.funcs.grow
Implements growth functions.

Modifies the binary mask to enable gradient flow. Newly grown weights are zero by default; this can be changed within the growth function. Functions have access to the masking object, enabling greater flexibility when designing custom growth modes (see the sketch after abs_grad_growth below).

Signature: <func>(masking, name, total_regrowth, weight)
sparselearning.funcs.grow.abs_grad_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor
Grows weights where abs(grad) is largest, among currently zeroed weights.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    name (str) – layer name
    total_regrowth (int) – number of weights to re-grow
    weight (torch.Tensor) – layer weight

Returns: new boolean mask
Return type: torch.Tensor
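The selection step presumably reduces to a top-k over gradient magnitudes at pruned positions. A hedged sketch of that core logic (not the library's exact code; momentum_growth below is analogous, with momentum magnitudes as scores):

    import torch

    def grow_by_abs_grad(mask: torch.Tensor, grad: torch.Tensor,
                         total_regrowth: int) -> torch.Tensor:
        scores = grad.abs()
        scores[mask.bool()] = -float("inf")  # exclude already-active weights
        _, idx = torch.topk(scores.flatten(), k=total_regrowth)
        new_mask = mask.bool().flatten().clone()
        new_mask[idx] = True
        return new_mask.view_as(mask)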
sparselearning.funcs.grow.momentum_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor
Grows weights where the momentum is largest.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    name (str) – layer name
    total_regrowth (int) – number of weights to re-grow
    weight (torch.Tensor) – layer weight

Returns: new boolean mask
Return type: torch.Tensor
sparselearning.funcs.grow.no_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor
Performs no growth; the mask is returned unchanged.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    name (str) – layer name
    total_regrowth (int) – number of weights to re-grow
    weight (torch.Tensor) – layer weight

Returns: new boolean mask
Return type: torch.Tensor
sparselearning.funcs.grow.random_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor
Grows weights at randomly selected zeroed positions.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    name (str) – layer name
    total_regrowth (int) – number of weights to re-grow
    weight (torch.Tensor) – layer weight

Returns: new boolean mask
Return type: torch.Tensor
sparselearning.funcs.grow.struct_abs_grad_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor, criterion: Callable = torch.mean)
Performs absolute-gradient growth channel-wise.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    name (str) – layer name
    total_regrowth (int) – number of weights to re-grow
    weight (torch.Tensor) – layer weight
    criterion (Callable) – callable that performs the channel-wise reduction (default torch.mean)

Returns: new boolean mask
Return type: torch.Tensor
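A hedged sketch of how channel-wise scoring with criterion might look (an illustration of the idea, not the library's exact code): reduce |grad| over each output channel, then re-grow whole channels with the highest score.

    import torch

    def channel_scores(grad: torch.Tensor, criterion=torch.mean) -> torch.Tensor:
        # grad: (out_channels, in_channels, kH, kW) for a conv layer.
        # One score per output channel, e.g. the mean absolute gradient.
        return criterion(grad.abs().flatten(start_dim=1), dim=1)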
sparselearning.funcs.init_scheme

sparselearning.funcs.init_scheme.erdos_renyi_init(masking: Masking, is_kernel: bool = True, **kwargs)
sparselearning.funcs.init_scheme.get_erdos_renyi_dist(masking: Masking, is_kernel: bool = True) → Dict[str, float]
Gets layer-wise densities distributed according to ER or ERK (Erdos-Renyi or Erdos-Renyi-Kernel). Ensures that the resulting density does not exceed 1 for any layer.

Parameters:
    masking – Masking instance
    is_kernel – use ERK if True, ER if False

Returns: layer-wise density dict
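For intuition, the raw (unnormalized) per-layer ER/ERK scores are typically proportional to the sum of a layer's dimensions over their product, as in the RigL paper (Evci et al.). A hedged sketch, not the library's exact code:

    def erdos_renyi_score(shape: tuple, is_kernel: bool = True) -> float:
        # shape: (n_out, n_in) for linear, (n_out, n_in, kH, kW) for conv.
        if is_kernel and len(shape) == 4:
            n_out, n_in, kh, kw = shape
            return (n_out + n_in + kh + kw) / (n_out * n_in * kh * kw)
        n_out, n_in = shape[0], shape[1]
        return (n_out + n_in) / (n_out * n_in)

Scores are then scaled globally so the network hits the target density, clamping any layer whose density would exceed 1.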
sparselearning.funcs.init_scheme.lottery_ticket_init(masking: Masking, lottery_mask_path: Path, shuffle: bool = False)
If shuffle is True, use the lottery mask's layer-wise densities, but not the exact mask.
sparselearning.funcs.init_scheme.random_init(masking: Masking, **kwargs)

sparselearning.funcs.init_scheme.resume_init(masking: Masking, **kwargs)

sparselearning.funcs.init_scheme.struct_erdos_renyi_init(masking: Masking, is_kernel: bool = True, **kwargs)

sparselearning.funcs.init_scheme.struct_random_init(masking: Masking, **kwargs)
sparselearning.funcs.prune
Implements pruning functions.

Modifies the binary mask to prevent gradient flow. Functions have access to the masking object, enabling greater flexibility when designing custom prune modes (a sketch of the core magnitude-pruning logic follows magnitude_prune below).

Signature: <func>(masking, mask, weight, name)
sparselearning.funcs.prune.global_magnitude_prune(masking: Masking) → int
Global magnitude (L1) pruning across all layers. Modifies the masks in place.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance

Returns: number of weights removed
Return type: int
sparselearning.funcs.prune.magnitude_prune(masking: Masking, mask: Tensor, weight: Tensor, name: str) → Tensor
Prunes the weights with the smallest magnitude.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    mask (torch.Tensor) – layer mask
    weight (torch.Tensor) – layer weight
    name (str) – layer name

Returns: pruned mask
Return type: torch.Tensor
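A hedged sketch of the core magnitude-pruning step (not the library's exact code): zero out the smallest-magnitude active weights.

    import torch

    def prune_by_magnitude(mask: torch.Tensor, weight: torch.Tensor,
                           num_remove: int) -> torch.Tensor:
        scores = weight.abs()
        scores[~mask.bool()] = float("inf")  # skip already-pruned weights
        _, idx = torch.topk(scores.flatten(), k=num_remove, largest=False)
        new_mask = mask.bool().flatten().clone()
        new_mask[idx] = False
        return new_mask.view_as(mask)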
sparselearning.funcs.prune.struct_magnitude_prune(masking: Masking, mask: Tensor, weight: Tensor, name: str, criterion: Callable = torch.mean) → Tensor
Prunes weights channel-wise, removing the channels whose reduced magnitude is smallest.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    mask (torch.Tensor) – layer mask
    weight (torch.Tensor) – layer weight
    name (str) – layer name
    criterion (Callable) – reduction function that collapses each kernel to a single statistic, e.g. mean/max/min (default torch.mean)

Returns: pruned mask
Return type: torch.Tensor
sparselearning.funcs.redistribute
Implements redistribution functions.

Modifies layer-wise sparsity during a mask update. Functions have access to the masking object, enabling greater flexibility when designing custom redistribution modes. The Masking class applies the returned statistics in a valid manner, ensuring no layer exceeds its capacity (a density of 1).

Signature: <func>(masking, name, weight, mask)
sparselearning.funcs.redistribute.grad_redistribution(masking, name, weight, mask)
Calculates the gradient-based redistribution statistic.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    name (str) – layer name
    weight (torch.Tensor) – layer weight
    mask (torch.Tensor) – layer mask

Returns: unnormalized statistic for the layer; normalizing across layers gives the density distribution
Return type: float
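A hedged sketch of what such a statistic might compute (an illustration, not the library's exact code): the mean absolute gradient over a layer's active weights, assuming weight.grad is populated.

    import torch

    def grad_statistic(weight: torch.Tensor, mask: torch.Tensor) -> float:
        # A larger mean |grad| over active weights means the layer
        # receives a larger share of density after normalization.
        return weight.grad[mask.bool()].abs().mean().item()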
sparselearning.funcs.redistribute.momentum_redistribution(masking, name, weight, mask) → float
Calculates the momentum-based redistribution statistic.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    name (str) – layer name
    weight (torch.Tensor) – layer weight
    mask (torch.Tensor) – layer mask

Returns: unnormalized statistic for the layer; normalizing across layers gives the density distribution
Return type: float
sparselearning.funcs.redistribute.nonzero_redistribution(masking, name, weight, mask)
Calculates the nonzero-count redistribution statistic. Ideally, this simply preserves the existing distribution, up to numerical error. In practice, we prefer to skip redistribution entirely when nonzero is chosen.

Parameters:
    masking (sparselearning.core.Masking) – Masking instance
    name (str) – layer name
    weight (torch.Tensor) – layer weight
    mask (torch.Tensor) – layer mask

Returns: unnormalized statistic for the layer; normalizing across layers gives the density distribution
Return type: float