sparselearning package

sparselearning.core

Wraps PyTorch model parameters with a boolean mask to simulate unstructured sparsity.

Example usage:

    model = MyModel()
    optimizer = torch.optim.SGD(model.parameters(), lr=args.lr)
    decay = CosineDecay(args.prune_rate, len(train_loader) * args.epochs)
    mask = Masking(optimizer, prune_rate_decay=decay)
    mask.add_module(model)

    # Wrapped optimizer step
    mask.step()

    # Mask update step
    mask.update_connections()

class sparselearning.core.LayerStats(variance_dict: Dict[str, float] = <factory>, zeros_dict: Dict[str, int] = <factory>, nonzeros_dict: Dict[str, int] = <factory>, removed_dict: Dict[str, int] = <factory>, total_variance: float = 0, total_zero: int = 0, total_nonzero: int = 0, total_removed: int = 0)

Bases: object

Layer-wise statistics

load_state_dict(*initial_data, **kwargs)
nonzeros_dict: Dict[str, int]
removed_dict: Dict[str, int]
state_dict()
property total_density
total_nonzero: int = 0
total_removed: int = 0
total_variance: float = 0
total_zero: int = 0
variance_dict: Dict[str, float]
zeros_dict: Dict[str, int]
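
The fields above are plain dataclass members, so a LayerStats object can be populated and checkpointed directly. A minimal sketch, assuming total_density is derived from total_nonzero and total_zero and that load_state_dict() accepts the saved dict positionally (both assumptions, not statements from the source):

    from sparselearning.core import LayerStats

    stats = LayerStats()

    # Hypothetical per-layer counts; the field names come from the signature above.
    stats.zeros_dict["conv1.weight"] = 900
    stats.nonzeros_dict["conv1.weight"] = 100
    stats.total_zero += 900
    stats.total_nonzero += 100

    # Assumed to be total_nonzero / (total_nonzero + total_zero), i.e. ~0.1 here.
    print(stats.total_density)

    # Round-trip through the documented state_dict() / load_state_dict() pair.
    restored = LayerStats()
    restored.load_state_dict(stats.state_dict())
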
class sparselearning.core.Masking(optimizer: optim, prune_rate_decay: Decay, density: float = 0.1, sparse_init: str = 'random', dense_gradients: bool = False, prune_mode: str = 'magnitude', growth_mode: str = 'momentum', redistribution_mode: str = 'momentum', prune_threshold: float = 0.001, growth_threshold: float = 0.001, growth_increment: float = 0.2, increment: float = 0.2, tolerance: float = 1e-06, mask_dict: Dict[str, Tensor] = <factory>, module: nn.Module = None, mask_step: int = 0)

Bases: object

Wraps PyTorch model parameters with a sparse mask.

Creates a mask for each parameter tensor contained in the model. When apply_mask() is called, it applies the sparsity pattern to the parameters.

Basic usage:

    model = MyModel()
    optimizer = torch.optim.SGD(model.parameters(), lr=args.lr)
    decay = CosineDecay(args.prune_rate, len(train_loader) * args.epochs)
    mask = Masking(optimizer, prune_rate_decay=decay)
    mask.add_module(model)

Removing layers: Layers can be removed individually, by type, or by partial name match (see the sketch after this list).

  • mask.remove_weight(name) requires the exact name of a parameter.

  • mask.remove_weight_partial_name(partial_name=name) removes all parameters whose names contain the partial name. For example, ‘conv’ removes all layers with ‘conv’ in their name.

  • mask.remove_type(type) removes all layers of a certain type. For example, mask.remove_type(torch.nn.BatchNorm2d) removes all 2D batch norm layers.
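
Continuing the basic usage example above (and assuming torch has been imported), the three removal methods can be used as follows; the parameter name in the first call is hypothetical:

    # Exclude a single parameter by its exact name (hypothetical name).
    mask.remove_weight("fc.weight")

    # Exclude every parameter whose name contains 'conv'.
    mask.remove_weight_partial_name(partial_name="conv")

    # Exclude all 2D batch norm layers by type.
    mask.remove_type(torch.nn.BatchNorm2d)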

add_module(module, lottery_mask_path: Path = None)

Store dict of parameters to mask

Parameters
  • module – to mask

  • lottery_mask_path – initialize from an existing model’s mask.

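For lottery-ticket style initialization (see init() below), the mask can be seeded from a previously saved mask file. A sketch with a hypothetical checkpoint path:

    from pathlib import Path

    mask.add_module(model, lottery_mask_path=Path("runs/lottery/mask.pth"))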

adjust_prune_rate()

Modify prune rate for layers with low sparsity

apply_mask()

Applies the boolean mask to the module's parameters

apply_mask_gradients()

Applies the boolean mask to the module's gradients
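
Conceptually, both methods element-wise multiply a tensor by its boolean mask. A standalone sketch of that idea (an illustration, not the library's implementation):

    import torch

    param = torch.nn.Parameter(torch.randn(4, 4))
    param.grad = torch.randn(4, 4)
    bool_mask = torch.rand(4, 4) < 0.1  # keep roughly 10% of the entries

    # What apply_mask() conceptually does to a parameter ...
    param.data *= bool_mask

    # ... and what apply_mask_gradients() conceptually does to its gradient.
    param.grad *= bool_mask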

property avg_inference_FLOPs
Returns

running average of inference FLOPs

Return type

float

calc_redistributed_densities()

Computes layer-wise density given a redistribution scheme.

Ensures that layer-wise densities are valid (i.e. 0 <= density <= 1).

Returns

Layer-wise valid densities.

Return type

Dict[str, float]
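
Why the validity check matters: a redistribution scheme can assign a layer more nonzeros than it has weights, in which case its density must be capped at 1 and the excess moved elsewhere. A hedged sketch of that situation (hypothetical numbers, not the library's code):

    layer_sizes = {"conv1": 100, "fc": 10_000}   # hypothetical parameter counts
    proportions = {"conv1": 0.5, "fc": 0.5}      # hypothetical redistribution shares
    total_nonzeros = 1_000                       # global budget at density 0.1

    densities = {}
    surplus = 0.0
    for name, share in proportions.items():
        target = share * total_nonzeros
        allowed = min(target, layer_sizes[name])  # density cannot exceed 1
        surplus += target - allowed
        densities[name] = allowed / layer_sizes[name]

    # The 400-weight surplus would have to go to layers with headroom;
    # the real method ensures the final densities are valid.
    print(densities, surplus)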

property dense_FLOPs

Calculates dense inference FLOPs of the model

Returns

dense FLOPs

Return type

int

dense_gradients = False
density = 0.1
gather_statistics()

Gather layer-wise & global stats. Typically performed before each mask update.

get_momentum_for_weight(weight: str) → Tensor

Return momentum from optimizer (SGD or Adam)

Parameters

weight (str) – weight name

Returns

Momentum buffer for layer

Return type

torch.Tensor
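
A sketch of where such momentum buffers live in torch.optim state; the exact lookup performed by get_momentum_for_weight() is an assumption, but the state keys ('momentum_buffer' for SGD, 'exp_avg' for Adam) are standard PyTorch ones:

    import torch

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    # Run one step so the momentum buffer exists.
    model(torch.randn(1, 4)).sum().backward()
    optimizer.step()

    state = optimizer.state[model.weight]
    momentum = state.get("momentum_buffer")   # SGD momentum buffer
    # For Adam, the analogous buffer is state.get("exp_avg").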

property global_prune
property growth_func
growth_increment = 0.2
growth_mode = 'momentum'
growth_threshold = 0.001
increment = 0.2
property inference_FLOPs

Calculates inference FLOPs of the model under the current sparsity

Returns

inference FLOPs

Return type

float
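
Continuing the usage example above, the three FLOPs properties compose into a simple compression report:

    # Relative compute of the sparse model vs. its dense counterpart.
    flop_ratio = mask.inference_FLOPs / mask.dense_FLOPs
    print(f"Sparse model uses {flop_ratio:.1%} of dense FLOPs")
    print(f"Running average: {mask.avg_inference_FLOPs:.3e} FLOPs")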

init(lottery_mask_path: Path)

Sparsity initialization

Parameters

lottery_mask_path (Path) – Mask path, if using Lottery Ticket Hypothesis (Frankle & Carbin 2018).

load_state_dict(*initial_data, **kwargs)
mask_step = 0
module = None
print_nonzero_counts()
property prune_func

Calls prune func from the registry.

We use @property, so that it is always synced with prune_mode

prune_mode = 'magnitude'
property prune_rate

Get prune rate from the decay object

prune_threshold = 0.001
property redistribution_func

Calls redistribution func from the registry.

We use @property, so that it is always synced with redistribution_mode

redistribution_mode = 'momentum'
remove_type(nn_type)

Remove layers by type (e.g., nn.Linear, nn.Conv2d, etc.)

Parameters

nn_type (nn.Module) – type of layer

remove_weight(name)

Remove layer by complete name

Parameters

name (str) – layer name

remove_weight_partial_name(partial_name: str)

Remove layers by partial name match (e.g., ‘conv’).

Parameters

partial_name (str) – partial layer name

reset_momentum()

Applies the boolean masks to the optimizer's momentum buffers

sparse_init = 'random'
sparsify(**kwargs)

Call sparsity init func (see sparselearning/funcs/init_scheme.py)

state_dict() → Dict
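
state_dict() and load_state_dict() allow mask state to be checkpointed alongside the model and optimizer. A sketch, assuming the returned dict is serializable by torch.save and that load_state_dict() accepts it positionally:

    # Save mask state next to the model and optimizer.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "mask": mask.state_dict(),
        },
        "checkpoint.pth",
    )

    # Restore later.
    ckpt = torch.load("checkpoint.pth")
    mask.load_state_dict(ckpt["mask"])
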
step()

Performs an optimizer step (i.e., no update to mask topology).

to_module_device_()

Sends the masks to the module's device

tolerance = 1e-06
truncate_weights()

Perform grow / prune / redistribution step

update_connections()

Performs a mask update (i.e., an update to the mask topology).
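
Putting step() and update_connections() together, a typical sparse-training loop might look like the sketch below; the update interval and its name are illustrative assumptions, not part of the API above:

    MASK_UPDATE_INTERVAL = 100   # hypothetical: how often to update the topology

    for step_idx, (inputs, targets) in enumerate(train_loader):
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()

        # Wrapped optimizer step: weights change, the mask topology does not.
        mask.step()

        # Periodic mask update: prune / grow / redistribute connections.
        if step_idx and step_idx % MASK_UPDATE_INTERVAL == 0:
            mask.update_connections()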