sparselearning.funcs

sparselearning.funcs.decay

Implements decay functions (cosine, linear, iterative pruning).

class sparselearning.funcs.decay.CosineDecay(prune_rate: float = 0.3, T_max: int = 1000, eta_min: float = 0.0, last_epoch: int = -1)

Bases: sparselearning.funcs.decay.Decay

Decays a pruning rate according to a cosine schedule. Just a wrapper around PyTorch’s CosineAnnealingLR.

Parameters
  • prune_rate (float) – alpha (α) as described in the RigL paper; the initial prune rate (default 0.3)

  • T_max (int) – Max mask-update steps (default 1000)

  • eta_min (float) – final prune rate (default 0.0)

  • last_epoch (int) – epoch to reset annealing. If -1, doesn’t reset (default -1).

get_dr()
step(step: int = -1)
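
A minimal usage sketch, assuming a RigL-style training loop where the prune rate is queried once per mask update (the loop body is a placeholder):

    from sparselearning.funcs.decay import CosineDecay

    # Anneal the prune rate from 0.3 towards 0.0 over 1000 mask updates.
    decay = CosineDecay(prune_rate=0.3, T_max=1000, eta_min=0.0)

    for mask_update in range(1000):
        prune_rate = decay.get_dr()  # prune rate to use for this mask update
        # ... prune `prune_rate` of the active weights, then regrow ...
        decay.step()                 # advance the cosine schedule by one step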
class sparselearning.funcs.decay.Decay

Bases: object

Template decay class

get_dr()
step()
class sparselearning.funcs.decay.LinearDecay(prune_rate: float = 0.3, T_max: int = 1000)

Bases: sparselearning.funcs.decay.Decay

Anneals the pruning rate linearly with each step.

Parameters
  • prune_rate (float) – Initial prune rate (default 0.3)

  • T_max (int) – Max mask-update steps (default 1000)

get_dr()
step(step: int = -1)
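
Usage mirrors CosineDecay. As a rough sketch, the returned prune rate is expected to fall off linearly with the step count (an assumed form; check the source for the exact schedule):

    from sparselearning.funcs.decay import LinearDecay

    decay = LinearDecay(prune_rate=0.3, T_max=1000)
    for _ in range(5):
        print(decay.get_dr())  # roughly prune_rate * (1 - step / T_max)
        decay.step()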
class sparselearning.funcs.decay.MagnitudePruneDecay(initial_sparsity: float = 0.0, final_sparsity: float = 0.3, T_max: int = 30000, T_start: int = 350, interval: int = 100)

Bases: sparselearning.funcs.decay.Decay

Anneals the pruning schedule according to Zhu and Gupta 2018, “To prune, or not to prune”. We implement the cumulative sparsity curve and take a finite difference to obtain sparsity(t).

Amount to prune = sparsity.

T_max: int = 30000
T_start: int = 350
cumulative_sparsity(step)
final_sparsity: float = 0.3
get_dr()
initial_sparsity: float = 0.0
interval: int = 100
step(step: int = -1, current_sparsity=-1)
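
For intuition, a sketch of the Zhu and Gupta cumulative-sparsity curve this class follows. The mapping of T_start/T_max onto the paper's symbols is an assumption; check cumulative_sparsity for the exact form:

    def zhu_gupta_cumulative_sparsity(
        step: int,
        initial_sparsity: float = 0.0,
        final_sparsity: float = 0.3,
        T_start: int = 350,
        T_max: int = 30000,
    ) -> float:
        """Cubic sparsity ramp from initial_sparsity to final_sparsity."""
        if step < T_start:
            return initial_sparsity
        if step >= T_max:
            return final_sparsity
        progress = (step - T_start) / (T_max - T_start)
        return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3

    # step() takes a finite difference of this curve against the current
    # sparsity to decide how much to prune at each mask update.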

sparselearning.funcs.grow

Implements growth functions.

Growth modifies the binary mask to enable gradient flow. Newly grown weights are initialized to 0 by default; this can be changed within the function.

Functions have access to the masking object, enabling greater flexibility in designing custom growth modes.

Signature: <func>(masking, name, total_regrowth, weight)
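
A hedged sketch of a custom growth function with this signature. It regrows uniformly at random among inactive weights; the per-layer mask access via masking.masks[name] is an assumption about the Masking internals:

    import torch

    def my_random_growth(masking, name, total_regrowth, weight):
        """Grow `total_regrowth` connections at random among inactive weights."""
        new_mask = masking.masks[name].data.bool()  # assumed per-layer mask dict
        n_inactive = int((~new_mask).sum())
        if total_regrowth <= 0 or n_inactive == 0:
            return new_mask
        # Random scores on inactive positions only; active positions score 0.
        scores = torch.rand_like(weight) * (~new_mask).float()
        _, idx = torch.topk(scores.flatten(), k=min(total_regrowth, n_inactive))
        new_mask.view(-1)[idx] = True
        return new_mask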

sparselearning.funcs.grow.abs_grad_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor

Grows weights at the positions where abs(grad) is largest (among the currently zeroed weights).

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

Returns

New boolean mask

Return type

torch.Tensor
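
The selection step can be pictured roughly as below; this is a sketch of the idea only, not the library's implementation:

    import torch

    def abs_grad_growth_sketch(mask: torch.Tensor, grad: torch.Tensor, total_regrowth: int) -> torch.Tensor:
        """Regrow where |grad| is largest among currently inactive positions."""
        new_mask = mask.clone().bool()
        scores = grad.abs() * (~new_mask).float()  # active positions are excluded
        _, idx = torch.topk(scores.flatten(), k=total_regrowth)
        new_mask.view(-1)[idx] = True
        return new_mask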

sparselearning.funcs.grow.momentum_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor

Grows weights in places where the momentum is largest.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

Returns

New boolean mask

Return type

torch.Tensor

sparselearning.funcs.grow.no_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor

No growth.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

Returns

New boolean mask

Return type

torch.Tensor

sparselearning.funcs.grow.random_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor

Random growth.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

Returns

New boolean mask

Return type

torch.Tensor

sparselearning.funcs.grow.struct_abs_grad_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor, criterion: Callable = torch.mean)

Performs absolute gradient growth channel-wise.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

  • criterion (Callable) – callable that performs the channel-wise reduction (e.g. mean)

Returns

New boolean mask

Return type

torch.Tensor

sparselearning.funcs.init_scheme

sparselearning.funcs.init_scheme.erdos_renyi_init(masking: Masking, is_kernel: bool = True, **kwargs)
sparselearning.funcs.init_scheme.get_erdos_renyi_dist(masking: Masking, is_kernel: bool = True) → Dict[str, float]

Get layer-wise densities distributed according to ER or ERK (Erdos-Renyi or Erdos-Renyi-Kernel).

Ensures that the resulting density does not exceed 1 for any layer.

Parameters
  • masking – Masking instance

  • is_kernel – use ERK if True, ER if False

Returns

Layer-wise density dict
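
A rough sketch of the Erdos-Renyi scaling behind this distribution. The library additionally rescales these values so that the overall density matches the target and caps any layer at density 1; the exact formula below is an illustrative assumption:

    from math import prod

    def er_scale(shape, is_kernel=True):
        """Unnormalized ER/ERK scale: layers with a larger scale get a higher density."""
        if is_kernel:
            # ERK: include kernel dims, e.g. (out_ch, in_ch, k, k) for a conv layer.
            return sum(shape) / prod(shape)
        # ER: only fan-out and fan-in.
        n_out, n_in = shape[0], shape[1]
        return (n_out + n_in) / (n_out * n_in)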

sparselearning.funcs.init_scheme.lottery_ticket_init(masking: Masking, lottery_mask_path: Path, shuffle: bool = False)

Shuffle: use the lottery mask’s layer-wise densities, but not its exact mask.

sparselearning.funcs.init_scheme.random_init(masking: Masking, **kwargs)
sparselearning.funcs.init_scheme.resume_init(masking: Masking, **kwargs)
sparselearning.funcs.init_scheme.struct_erdos_renyi_init(masking: Masking, is_kernel: bool = True, **kwargs)
sparselearning.funcs.init_scheme.struct_random_init(masking: Masking, **kwargs)

sparselearning.funcs.prune

Implements pruning functions.

Pruning modifies the binary mask to prevent gradient flow.

Functions have access to the masking object, enabling greater flexibility in designing custom prune modes.

Signature: <func>(masking, mask, weight, name)

sparselearning.funcs.prune.global_magnitude_prune(masking: Masking) → int

Global Magnitude (L1) pruning. Modifies in-place.

Parameters

masking (sparselearning.core.Masking) – Masking instance

Returns

number of weights removed

Return type

int

sparselearning.funcs.prune.magnitude_prune(masking: Masking, mask: Tensor, weight: Tensor, name: str) → Tensor

Prunes the weights with smallest magnitude.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • mask (torch.Tensor) – layer mask

  • weight (torch.Tensor) – layer weight

  • name (str) – layer name

Returns

pruned mask

Return type

torch.Tensor
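
A minimal sketch of the layer-wise magnitude criterion (not the library's exact implementation; the number of weights to drop would normally come from the decay schedule):

    import torch

    def magnitude_prune_sketch(mask: torch.Tensor, weight: torch.Tensor, num_to_prune: int) -> torch.Tensor:
        """Zero out the `num_to_prune` active weights with the smallest magnitude."""
        new_mask = mask.clone().bool()
        # Inactive weights get an infinite score so they can never be selected.
        scores = torch.where(new_mask, weight.abs(), torch.full_like(weight, float("inf")))
        _, idx = torch.topk(scores.flatten(), k=num_to_prune, largest=False)
        new_mask.view(-1)[idx] = False
        return new_mask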

sparselearning.funcs.prune.struct_magnitude_prune(masking: Masking, mask: Tensor, weight: Tensor, name: str, criterion: Callable = torch.mean) → Tensor

Prunes weights channel-wise, removing the channels whose reduced magnitude is smallest (per criterion).

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • mask (torch.Tensor) – layer mask

  • weight (torch.Tensor) – layer weight

  • name (str) – layer name

  • criterion (Callable) – reduction function that collapses a kernel to a single statistic (e.g. mean/max/min).

Returns

pruned mask

Return type

torch.Tensor

sparselearning.funcs.redistribute

Implements redistribution functions.

Redistribution modifies layer-wise sparsity during a mask update.

Functions have access to the masking object, enabling greater flexibility in designing custom redistribution modes.

The Masking class applies the output redistribution in a valid manner, ensuring no layer is assigned more weights than it can hold.

Signature: <func>(masking, name, weight, mask)

sparselearning.funcs.redistribute.grad_redistribution(masking, name, weight, mask)

Calculates gradient redistribution statistics.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • weight (torch.Tensor) – layer weight

  • mask (torch.Tensor) – layer mask

Returns

Layer statistic: an unnormalized statistic for the layer. Normalizing across layers gives the density distribution.

Return type

float
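
A sketch of one way such a statistic can be computed, namely the mean absolute gradient over active weights; treat this as an assumption rather than the library's exact formula:

    import torch

    def grad_statistic(weight: torch.Tensor, mask: torch.Tensor) -> float:
        """Mean |grad| over active weights: an unnormalized layer statistic."""
        grad = weight.grad  # assumes backward() has already been called
        return grad[mask.bool()].abs().mean().item()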

sparselearning.funcs.redistribute.momentum_redistribution(masking, name, weight, mask) → float

Calculates momentum redistribution statistics.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • weight (torch.Tensor) – layer weight

  • mask (torch.Tensor) – layer mask

Returns

Layer statistic: an unnormalized statistic for the layer. Normalizing across layers gives the density distribution.

Return type

float

sparselearning.funcs.redistribute.nonzero_redistribution(masking, name, weight, mask)

Calculates non-zero redistribution statistics. Ideally, this just preserves the existing distribution, up to numerical error. In practice, we prefer to skip redistribution when non-zero is chosen.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • weight (torch.Tensor) – layer weight

  • mask (torch.Tensor) – layer mask

Returns

Layer statistic: an unnormalized statistic for the layer. Normalizing across layers gives the density distribution.

Return type

float