sparselearning.funcs

sparselearning.funcs.decay

Implements decay functions (cosine, linear, iterative pruning).

class sparselearning.funcs.decay.CosineDecay(prune_rate: float = 0.3, T_max: int = 1000, eta_min: float = 0.0, last_epoch: int = -1)

Bases: sparselearning.funcs.decay.Decay

Decays a pruning rate according to a cosine schedule. Just a wrapper around PyTorch’s CosineAnnealingLR.

Parameters
  • prune_rate (float) – alpha (α) as described in the RigL paper; the initial prune rate (default 0.3)

  • T_max (int) – Max mask-update steps (default 1000)

  • eta_min (float) – final prune rate (default 0.0)

  • last_epoch (int) – epoch to reset annealing. If -1, doesn’t reset (default -1).

get_dr()
step(step: int = -1)
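
A minimal usage sketch, assuming a RigL-style training loop where the prune rate is queried once per mask update (the loop body is a placeholder):

    from sparselearning.funcs.decay import CosineDecay

    # Anneal the prune rate from 0.3 towards 0.0 over 1000 mask updates.
    decay = CosineDecay(prune_rate=0.3, T_max=1000, eta_min=0.0)

    for mask_update in range(1000):
        prune_rate = decay.get_dr()  # prune rate to use for this mask update
        # ... prune `prune_rate` of the active weights, then regrow ...
        decay.step()                 # advance the cosine schedule by one step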
class sparselearning.funcs.decay.Decay

Bases: object

Template decay class

get_dr()
step()
class sparselearning.funcs.decay.LinearDecay(prune_rate: float = 0.3, T_max: int = 1000)

Bases: sparselearning.funcs.decay.Decay

Anneals the pruning rate linearly with each step.

Parameters
  • prune_rate (float) – Initial prune rate (default 0.3)

  • T_max (int) – Max mask-update steps (default 1000)

get_dr()
step(step: int = -1)
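
Usage mirrors CosineDecay. As a rough sketch, the returned prune rate is expected to fall off linearly with the step count (an assumed form; check the source for the exact schedule):

    from sparselearning.funcs.decay import LinearDecay

    decay = LinearDecay(prune_rate=0.3, T_max=1000)
    for _ in range(5):
        print(decay.get_dr())  # roughly prune_rate * (1 - step / T_max)
        decay.step()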
class sparselearning.funcs.decay.MagnitudePruneDecay(initial_sparsity: float = 0.0, final_sparsity: float = 0.3, T_max: int = 30000, T_start: int = 350, interval: int = 100)

Bases: sparselearning.funcs.decay.Decay

Anneals the pruning schedule according to Zhu and Gupta 2018, “To prune, or not to prune”. We implement the cumulative sparsity curve and take a finite difference to obtain sparsity(t).

Amount to prune = sparsity.

T_max: int = 30000
T_start: int = 350
cumulative_sparsity(step)
final_sparsity: float = 0.3
get_dr()
initial_sparsity: float = 0.0
interval: int = 100
step(step: int = -1, current_sparsity=-1)
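
For intuition, a sketch of the Zhu and Gupta cumulative-sparsity curve this class follows. The mapping of T_start/T_max onto the paper's symbols is an assumption; check cumulative_sparsity for the exact form:

    def zhu_gupta_cumulative_sparsity(
        step: int,
        initial_sparsity: float = 0.0,
        final_sparsity: float = 0.3,
        T_start: int = 350,
        T_max: int = 30000,
    ) -> float:
        """Cubic sparsity ramp from initial_sparsity to final_sparsity."""
        if step < T_start:
            return initial_sparsity
        if step >= T_max:
            return final_sparsity
        progress = (step - T_start) / (T_max - T_start)
        return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3

    # step() takes a finite difference of this curve against the current
    # sparsity to decide how much to prune at each mask update.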

sparselearning.funcs.grow

Implements growth functions.

Growth modifies the binary mask to enable gradient flow. Newly grown weights are initialized to 0 by default; this can be changed within the function.

Functions have access to the masking object, enabling greater flexibility in designing custom growth modes.

Signature: <func>(masking, name, total_regrowth, weight)
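
A hedged sketch of a custom growth function with this signature. It regrows uniformly at random among inactive weights; the per-layer mask access via masking.masks[name] is an assumption about the Masking internals:

    import torch

    def my_random_growth(masking, name, total_regrowth, weight):
        """Grow `total_regrowth` connections at random among inactive weights."""
        new_mask = masking.masks[name].data.bool()  # assumed per-layer mask dict
        n_inactive = int((~new_mask).sum())
        if total_regrowth <= 0 or n_inactive == 0:
            return new_mask
        # Random scores on inactive positions only; active positions score 0.
        scores = torch.rand_like(weight) * (~new_mask).float()
        _, idx = torch.topk(scores.flatten(), k=min(total_regrowth, n_inactive))
        new_mask.view(-1)[idx] = True
        return new_mask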

sparselearning.funcs.grow.abs_grad_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor

Grows weights at the positions where abs(grad) is largest (among the currently zeroed weights).

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

Returns

New boolean mask

Return type

torch.Tensor
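
The selection step can be pictured roughly as below; this is a sketch of the idea only, not the library's implementation:

    import torch

    def abs_grad_growth_sketch(mask: torch.Tensor, grad: torch.Tensor, total_regrowth: int) -> torch.Tensor:
        """Regrow where |grad| is largest among currently inactive positions."""
        new_mask = mask.clone().bool()
        scores = grad.abs() * (~new_mask).float()  # active positions are excluded
        _, idx = torch.topk(scores.flatten(), k=total_regrowth)
        new_mask.view(-1)[idx] = True
        return new_mask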

sparselearning.funcs.grow.momentum_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor

Grows weights in places where the momentum is largest.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

Returns

New boolean mask

Return type

torch.Tensor

sparselearning.funcs.grow.no_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor

No growth.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

Returns

New boolean mask

Return type

torch.Tensor

sparselearning.funcs.grow.random_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor) → Tensor

Random growth.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

Returns

New boolean mask

Return type

torch.Tensor

sparselearning.funcs.grow.struct_abs_grad_growth(masking: Masking, name: str, total_regrowth: int, weight: Tensor, criterion: Callable = torch.mean)

Performs absolute gradient growth channel-wise.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • total_regrowth (int) – amount to re-grow

  • weight (torch.Tensor) – layer weight

  • criterion (Callable) – callable that performs the channel-wise reduction (e.g. mean)

Returns

New boolean mask

Return type

torch.Tensor

sparselearning.funcs.init_scheme

sparselearning.funcs.init_scheme.erdos_renyi_init(masking: Masking, is_kernel: bool = True, **kwargs)
sparselearning.funcs.init_scheme.get_erdos_renyi_dist(masking: Masking, is_kernel: bool = True) → Dict[str, float]

Get layer-wise densities distributed according to ER or ERK (Erdos-Renyi or Erdos-Renyi-Kernel).

Ensures that the resulting density does not exceed 1 for any layer.

Parameters
  • masking – Masking instance

  • is_kernel – use ERK if True, ER if False

Returns

Layer-wise density dict
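
A rough sketch of the Erdos-Renyi scaling behind this distribution. The library additionally rescales these values so that the overall density matches the target and caps any layer at density 1; the exact formula below is an illustrative assumption:

    from math import prod

    def er_scale(shape, is_kernel=True):
        """Unnormalized ER/ERK scale: layers with a larger scale get a higher density."""
        if is_kernel:
            # ERK: include kernel dims, e.g. (out_ch, in_ch, k, k) for a conv layer.
            return sum(shape) / prod(shape)
        # ER: only fan-out and fan-in.
        n_out, n_in = shape[0], shape[1]
        return (n_out + n_in) / (n_out * n_in)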

sparselearning.funcs.init_scheme.lottery_ticket_init(masking: Masking, lottery_mask_path: Path, shuffle: bool = False)

Shuffle: use the lottery mask’s layer-wise densities, but not its exact mask.

sparselearning.funcs.init_scheme.random_init(masking: Masking, **kwargs)
sparselearning.funcs.init_scheme.resume_init(masking: Masking, **kwargs)
sparselearning.funcs.init_scheme.struct_erdos_renyi_init(masking: Masking, is_kernel: bool = True, **kwargs)
sparselearning.funcs.init_scheme.struct_random_init(masking: Masking, **kwargs)

sparselearning.funcs.prune

Implements pruning functions.

Pruning modifies the binary mask to prevent gradient flow.

Functions have access to the masking object, enabling greater flexibility in designing custom prune modes.

Signature: <func>(masking, mask, weight, name)

sparselearning.funcs.prune.global_magnitude_prune(masking: Masking) → int

Global Magnitude (L1) pruning. Modifies in-place.

Parameters

masking (sparselearning.core.Masking) – Masking instance

Returns

number of weights removed

Return type

int

sparselearning.funcs.prune.magnitude_prune(masking: Masking, mask: Tensor, weight: Tensor, name: str) → Tensor

Prunes the weights with smallest magnitude.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • mask (torch.Tensor) – layer mask

  • weight (torch.Tensor) – layer weight

  • name (str) – layer name

Returns

pruned mask

Return type

torch.Tensor
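
A minimal sketch of the layer-wise magnitude criterion (not the library's exact implementation; the number of weights to drop would normally come from the decay schedule):

    import torch

    def magnitude_prune_sketch(mask: torch.Tensor, weight: torch.Tensor, num_to_prune: int) -> torch.Tensor:
        """Zero out the `num_to_prune` active weights with the smallest magnitude."""
        new_mask = mask.clone().bool()
        # Inactive weights get an infinite score so they can never be selected.
        scores = torch.where(new_mask, weight.abs(), torch.full_like(weight, float("inf")))
        _, idx = torch.topk(scores.flatten(), k=num_to_prune, largest=False)
        new_mask.view(-1)[idx] = False
        return new_mask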

sparselearning.funcs.prune.struct_magnitude_prune(masking: Masking, mask: Tensor, weight: Tensor, name: str, criterion: Callable = torch.mean) → Tensor

Prunes weights channel-wise, removing the channels whose reduced magnitude is smallest (per criterion).

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • mask (torch.Tensor) – layer mask

  • weight (torch.Tensor) – layer weight

  • name (str) – layer name

  • criterion (Callable) – reduction function that collapses a kernel to a single statistic (e.g. mean/max/min).

Returns

pruned mask

Return type

torch.Tensor

sparselearning.funcs.redistribute

Implements redistribution functions.

Redistribution modifies layer-wise sparsity during a mask update.

Functions have access to the masking object, enabling greater flexibility in designing custom redistribution modes.

The Masking class applies the output redistribution in a valid manner, ensuring no layer is assigned more weights than it can hold.

Signature: <func>(masking, name, weight, mask)

sparselearning.funcs.redistribute.grad_redistribution(masking, name, weight, mask)

Calculates gradient redistribution statistics.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • weight (torch.Tensor) – layer weight

  • mask (torch.Tensor) – layer mask

Returns

Layer statistic: an unnormalized statistic for the layer. Normalizing across layers gives the density distribution.

Return type

float
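
A sketch of one way such a statistic can be computed, namely the mean absolute gradient over active weights; treat this as an assumption rather than the library's exact formula:

    import torch

    def grad_statistic(weight: torch.Tensor, mask: torch.Tensor) -> float:
        """Mean |grad| over active weights: an unnormalized layer statistic."""
        grad = weight.grad  # assumes backward() has already been called
        return grad[mask.bool()].abs().mean().item()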

sparselearning.funcs.redistribute.momentum_redistribution(masking, name, weight, mask) → float

Calculates momentum redistribution statistics.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • weight (torch.Tensor) – layer weight

  • mask (torch.Tensor) – layer mask

Returns

Layer statistic: an unnormalized statistic for the layer. Normalizing across layers gives the density distribution.

Return type

float

sparselearning.funcs.redistribute.nonzero_redistribution(masking, name, weight, mask)

Calculates non-zero redistribution statistics. Ideally, this just preserves the existing distribution, up to numerical error. In practice, we prefer to skip redistribution when non-zero is chosen.

Parameters
  • masking (sparselearning.core.Masking) – Masking instance

  • name (str) – layer name

  • weight (torch.Tensor) – layer weight

  • mask (torch.Tensor) – layer mask

Returns

Layer statistic: an unnormalized statistic for the layer. Normalizing across layers gives the density distribution.

Return type

float