sparselearning package

sparselearning.core

Wraps PyTorch model parameters with a boolean mask to simulate unstructured sparsity.

Example usage:

    model = MyModel()
    optimizer = torch.optim.SGD(model.parameters(), lr=args.lr)
    decay = CosineDecay(args.prune_rate, len(train_loader) * args.epochs)
    mask = Masking(optimizer, prune_rate_decay=decay)
    mask.add_module(model)

    # Wrapped optimizer step
    mask.step()

    # Mask update step
    mask.update_connections()

class sparselearning.core.LayerStats(variance_dict: Dict[str, float] = <factory>, zeros_dict: Dict[str, int] = <factory>, nonzeros_dict: Dict[str, int] = <factory>, removed_dict: Dict[str, int] = <factory>, total_variance: float = 0, total_zero: int = 0, total_nonzero: int = 0, total_removed: int = 0)

Bases: object

Layer-wise statistics

load_state_dict(*initial_data, **kwargs)
nonzeros_dict: Dict[str, int]
removed_dict: Dict[str, int]
state_dict()
property total_density
total_nonzero: int = 0
total_removed: int = 0
total_variance: float = 0
total_zero: int = 0
variance_dict: Dict[str, float]
zeros_dict: Dict[str, int]
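
The fields above are plain dataclass members, so a LayerStats object can be populated and checkpointed directly. A minimal sketch, assuming total_density is derived from total_nonzero and total_zero and that load_state_dict() accepts the saved dict positionally (both assumptions, not statements from the source):

    from sparselearning.core import LayerStats

    stats = LayerStats()

    # Hypothetical per-layer counts; the field names come from the signature above.
    stats.zeros_dict["conv1.weight"] = 900
    stats.nonzeros_dict["conv1.weight"] = 100
    stats.total_zero += 900
    stats.total_nonzero += 100

    # Assumed to be total_nonzero / (total_nonzero + total_zero), i.e. ~0.1 here.
    print(stats.total_density)

    # Round-trip through the documented state_dict() / load_state_dict() pair.
    restored = LayerStats()
    restored.load_state_dict(stats.state_dict())
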
class sparselearning.core.Masking(optimizer: optim, prune_rate_decay: Decay, density: float = 0.1, sparse_init: str = 'random', dense_gradients: bool = False, prune_mode: str = 'magnitude', growth_mode: str = 'momentum', redistribution_mode: str = 'momentum', prune_threshold: float = 0.001, growth_threshold: float = 0.001, growth_increment: float = 0.2, increment: float = 0.2, tolerance: float = 1e-06, mask_dict: Dict[str, Tensor] = <factory>, module: nn.Module = None, mask_step: int = 0)

Bases: object

Wraps PyTorch model parameters with a sparse mask.

Creates a mask for each parameter tensor contained in the model. When apply_mask() is called, it applies the sparsity pattern to the parameters.

Basic usage:

    model = MyModel()
    optimizer = torch.optim.SGD(model.parameters(), lr=args.lr)
    decay = CosineDecay(args.prune_rate, len(train_loader) * args.epochs)
    mask = Masking(optimizer, prune_rate_decay=decay)
    mask.add_module(model)

Removing layers: Layers can be removed individually, by type, or by partial name match (see the sketch after this list).

  • mask.remove_weight(name) requires the exact name of a parameter.

  • mask.remove_weight_partial_name(partial_name=name) removes all parameters whose names contain the partial name. For example, ‘conv’ removes all layers with ‘conv’ in their name.

  • mask.remove_type(type) removes all layers of a certain type. For example, mask.remove_type(torch.nn.BatchNorm2d) removes all 2D batch norm layers.
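
Continuing the basic usage example above (and assuming torch has been imported), the three removal methods can be used as follows; the parameter name in the first call is hypothetical:

    # Exclude a single parameter by its exact name (hypothetical name).
    mask.remove_weight("fc.weight")

    # Exclude every parameter whose name contains 'conv'.
    mask.remove_weight_partial_name(partial_name="conv")

    # Exclude all 2D batch norm layers by type.
    mask.remove_type(torch.nn.BatchNorm2d)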

add_module(module, lottery_mask_path: Path = None)

Store dict of parameters to mask

Parameters
  • module – to mask

  • lottery_mask_path – initialize from an existing model’s mask.

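For lottery-ticket style initialization (see init() below), the mask can be seeded from a previously saved mask file. A sketch with a hypothetical checkpoint path:

    from pathlib import Path

    mask.add_module(model, lottery_mask_path=Path("runs/lottery/mask.pth"))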

adjust_prune_rate()

Modify prune rate for layers with low sparsity

apply_mask()

Applies the boolean mask to the module's parameters

apply_mask_gradients()

Applies the boolean mask to the module's gradients
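
Conceptually, both methods element-wise multiply a tensor by its boolean mask. A standalone sketch of that idea (an illustration, not the library's implementation):

    import torch

    param = torch.nn.Parameter(torch.randn(4, 4))
    param.grad = torch.randn(4, 4)
    bool_mask = torch.rand(4, 4) < 0.1  # keep roughly 10% of the entries

    # What apply_mask() conceptually does to a parameter ...
    param.data *= bool_mask

    # ... and what apply_mask_gradients() conceptually does to its gradient.
    param.grad *= bool_mask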

property avg_inference_FLOPs
Returns

running average of inference FLOPs

Return type

float

calc_redistributed_densities()

Computes layer-wise density given a redistribution scheme.

Ensures that layer-wise densities are valid (i.e. 0 <= density <= 1).

Returns

Layer-wise valid densities.

Return type

Dict[str, float]
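
Why the validity check matters: a redistribution scheme can assign a layer more nonzeros than it has weights, in which case its density must be capped at 1 and the excess moved elsewhere. A hedged sketch of that situation (hypothetical numbers, not the library's code):

    layer_sizes = {"conv1": 100, "fc": 10_000}   # hypothetical parameter counts
    proportions = {"conv1": 0.5, "fc": 0.5}      # hypothetical redistribution shares
    total_nonzeros = 1_000                       # global budget at density 0.1

    densities = {}
    surplus = 0.0
    for name, share in proportions.items():
        target = share * total_nonzeros
        allowed = min(target, layer_sizes[name])  # density cannot exceed 1
        surplus += target - allowed
        densities[name] = allowed / layer_sizes[name]

    # The 400-weight surplus would have to go to layers with headroom;
    # the real method ensures the final densities are valid.
    print(densities, surplus)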

property dense_FLOPs

Calculates dense inference FLOPs of the model

Returns

dense FLOPs

Return type

int

dense_gradients = False
density = 0.1
gather_statistics()

Gather layer-wise & global stats. Typically performed before each mask update.

get_momentum_for_weight(weight: str) → Tensor

Return momentum from optimizer (SGD or Adam)

Parameters

weight (str) – weight name

Returns

Momentum buffer for layer

Return type

torch.Tensor
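
A sketch of where such momentum buffers live in torch.optim state; the exact lookup performed by get_momentum_for_weight() is an assumption, but the state keys ('momentum_buffer' for SGD, 'exp_avg' for Adam) are standard PyTorch ones:

    import torch

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    # Run one step so the momentum buffer exists.
    model(torch.randn(1, 4)).sum().backward()
    optimizer.step()

    state = optimizer.state[model.weight]
    momentum = state.get("momentum_buffer")   # SGD momentum buffer
    # For Adam, the analogous buffer is state.get("exp_avg").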

property global_prune
property growth_func
growth_increment = 0.2
growth_mode = 'momentum'
growth_threshold = 0.001
increment = 0.2
property inference_FLOPs

Calculates inference FLOPs of the model under the current sparsity

Returns

inference FLOPs

Return type

float
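
Continuing the usage example above, the three FLOPs properties compose into a simple compression report:

    # Relative compute of the sparse model vs. its dense counterpart.
    flop_ratio = mask.inference_FLOPs / mask.dense_FLOPs
    print(f"Sparse model uses {flop_ratio:.1%} of dense FLOPs")
    print(f"Running average: {mask.avg_inference_FLOPs:.3e} FLOPs")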

init(lottery_mask_path: Path)

Sparsity initialization

Parameters

lottery_mask_path (Path) – Mask path, if using Lottery Ticket Hypothesis (Frankle & Carbin 2018).

load_state_dict(*initial_data, **kwargs)
mask_step = 0
module = None
print_nonzero_counts()
property prune_func

Calls prune func from the registry.

We use @property, so that it is always synced with prune_mode

prune_mode = 'magnitude'
property prune_rate

Get prune rate from the decay object

prune_threshold = 0.001
property redistribution_func

Calls redistribution func from the registry.

We use @property, so that it is always synced with redistribution_mode

redistribution_mode = 'momentum'
remove_type(nn_type)

Remove layers by type (e.g., nn.Linear, nn.Conv2d, etc.)

Parameters

nn_type (nn.Module) – type of layer

remove_weight(name)

Remove layer by complete name

Parameters

name (str) – layer name

remove_weight_partial_name(partial_name: str)

Remove layers by partial name match (e.g., ‘conv’).

Parameters

partial_name (str) – partial layer name

reset_momentum()

Applies the boolean masks to the optimizer's momentum buffers

sparse_init = 'random'
sparsify(**kwargs)

Call sparsity init func (see sparselearning/funcs/init_scheme.py)

state_dict() → Dict
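
state_dict() and load_state_dict() allow mask state to be checkpointed alongside the model and optimizer. A sketch, assuming the returned dict is serializable by torch.save and that load_state_dict() accepts it positionally:

    # Save mask state next to the model and optimizer.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "mask": mask.state_dict(),
        },
        "checkpoint.pth",
    )

    # Restore later.
    ckpt = torch.load("checkpoint.pth")
    mask.load_state_dict(ckpt["mask"])
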
step()

Performs an optimizer step (i.e., no update to mask topology).

to_module_device_()

Sends the masks to the module's device

tolerance = 1e-06
truncate_weights()

Perform grow / prune / redistribution step

update_connections()

Performs a mask update (i.e., an update to the mask topology).
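
Putting step() and update_connections() together, a typical sparse-training loop might look like the sketch below; the update interval and its name are illustrative assumptions, not part of the API above:

    MASK_UPDATE_INTERVAL = 100   # hypothetical: how often to update the topology

    for step_idx, (inputs, targets) in enumerate(train_loader):
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()

        # Wrapped optimizer step: weights change, the mask topology does not.
        mask.step()

        # Periodic mask update: prune / grow / redistribute connections.
        if step_idx and step_idx % MASK_UPDATE_INTERVAL == 0:
            mask.update_connections()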