sparselearning package
sparselearning.core
Wraps PyTorch model parameters with a boolean mask to simulate unstructured sparsity.
- Example usage:

      model = MyModel()
      optimizer = torch.optim.SGD(model.parameters(), lr=args.lr)
      decay = CosineDecay(args.prune_rate, len(train_loader) * args.epochs)
      mask = Masking(optimizer, prune_rate_decay=decay)
      mask.add_module(model)

      # Wrapped optimizer step
      mask.step()

      # Mask update step
      mask.update_connections()
- class sparselearning.core.LayerStats(variance_dict: Dict[str, float] = <factory>, zeros_dict: Dict[str, int] = <factory>, nonzeros_dict: Dict[str, int] = <factory>, removed_dict: Dict[str, int] = <factory>, total_variance: float = 0, total_zero: int = 0, total_nonzero: int = 0, total_removed: int = 0)

  Bases: object

  Layer-wise statistics.
  - load_state_dict(*initial_data, **kwargs)
  - nonzeros_dict: Dict[str, int]
  - removed_dict: Dict[str, int]
  - state_dict()
  - property total_density
  - total_nonzero: int = 0
  - total_removed: int = 0
  - total_variance: float = 0
  - total_zero: int = 0
  - variance_dict: Dict[str, float]
  - zeros_dict: Dict[str, int]
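  Example (a hedged sketch of how these fields relate; it assumes total_density is the fraction of retained weights, which the signature above does not state explicitly):

      stats = LayerStats()
      stats.nonzeros_dict["conv1.weight"] = 900   # hypothetical layer counts
      stats.zeros_dict["conv1.weight"] = 100
      stats.total_nonzero = 900
      stats.total_zero = 100

      # Assumed semantics: density = nonzero / (nonzero + zero) = 0.9
      print(stats.total_density)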
- class sparselearning.core.Masking(optimizer: optim, prune_rate_decay: Decay, density: float = 0.1, sparse_init: str = 'random', dense_gradients: bool = False, prune_mode: str = 'magnitude', growth_mode: str = 'momentum', redistribution_mode: str = 'momentum', prune_threshold: float = 0.001, growth_threshold: float = 0.001, growth_increment: float = 0.2, increment: float = 0.2, tolerance: float = 1e-06, mask_dict: Dict[str, Tensor] = <factory>, module: nn.Module = None, mask_step: int = 0)

  Bases: object

  Wraps PyTorch model parameters with a sparse mask.

  Creates a mask for each parameter tensor contained in the model. When apply_mask() is called, it applies the sparsity pattern to the parameters.
  Basic usage:

      model = MyModel()
      optimizer = torch.optim.SGD(model.parameters(), lr=args.lr)
      decay = CosineDecay(args.prune_rate, len(train_loader) * args.epochs)
      mask = Masking(optimizer, prune_rate_decay=decay)
      mask.add_module(model)
  Removing layers: layers can be removed individually, by type, or by a partial match on their name (see the example below).

  - mask.remove_weight(name) requires the exact name of a parameter.
  - mask.remove_weight_partial_name(partial_name=name) removes all parameters whose name contains the partial name. For example, 'conv' removes all layers with 'conv' in their name.
  - mask.remove_type(type) removes all layers of a given type. For example, mask.remove_type(torch.nn.BatchNorm2d) removes all 2D batch-norm layers.
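  Example (a sketch of the removal calls above; the layer name is hypothetical and depends on the wrapped model):

      # Exclude a single parameter by its exact name
      mask.remove_weight("classifier.weight")

      # Exclude every parameter whose name contains "conv"
      mask.remove_weight_partial_name(partial_name="conv")

      # Exclude all 2D batch-norm layers by type
      mask.remove_type(torch.nn.BatchNorm2d)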
  - add_module(module, lottery_mask_path: Path = None)
    Store dict of parameters to mask.
    Parameters:
      module – module to mask
      lottery_mask_path – initialize from an existing model's mask.
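    Example (a minimal sketch; the mask path is hypothetical):

      from pathlib import Path

      # Plain registration
      mask.add_module(model)

      # Or initialize the mask from a previously saved lottery-ticket mask
      mask.add_module(model, lottery_mask_path=Path("runs/seed_0/mask.pth"))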
  - adjust_prune_rate()
    Modify the prune rate for layers with low sparsity.

  - apply_mask()
    Applies the boolean mask to the module's parameters.

  - apply_mask_gradients()
    Applies the boolean mask to the module's gradients.

  - property avg_inference_FLOPs
    Returns:
      running average of inference FLOPs
    Return type:
      float

  - calc_redistributed_densities()
    Computes layer-wise densities given a redistribution scheme.
    Ensures that layer-wise densities are valid (i.e., 0 <= density <= 1).
    Returns:
      Layer-wise valid densities.
    Return type:
      Dict[str, float]
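    Example (a rough sketch of the idea, not the exact scheme used here; layer names, counts, and scores are hypothetical):

      def redistribute(layers, total_nonzero):
          """Split a global nonzero budget across layers in proportion to a
          per-layer score, then clamp each density into the valid [0, 1] range."""
          total_score = sum(score for _, score in layers.values())
          densities = {}
          for name, (numel, score) in layers.items():
              share = total_nonzero * score / total_score
              densities[name] = min(max(share / numel, 0.0), 1.0)
          return densities

      redistribute({"conv1.weight": (1000, 3.0), "fc.weight": (100, 1.0)},
                   total_nonzero=500)
      # -> {"conv1.weight": 0.375, "fc.weight": 1.0}  (fc clamped from 1.25)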
  - property dense_FLOPs
    Calculates dense inference FLOPs of the model.
    Returns:
      dense FLOPs
    Return type:
      int

  - dense_gradients = False

  - density = 0.1

  - gather_statistics()
    Gather layer-wise and global statistics. Typically performed before each mask update.

  - get_momentum_for_weight(weight: str) → Tensor
    Return momentum from the optimizer (SGD or Adam).
    Parameters:
      weight (str) – weight name
    Returns:
      Momentum buffer for the layer
    Return type:
      torch.Tensor
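    Example (a sketch of the general PyTorch mechanism, not this method's exact implementation; note that get_momentum_for_weight itself takes a weight name):

      def momentum_for(optimizer, param):
          """Fetch the first-moment buffer that optimizers keep per parameter."""
          state = optimizer.state[param]
          if "momentum_buffer" in state:   # torch.optim.SGD (momentum > 0)
              return state["momentum_buffer"]
          return state["exp_avg"]          # torch.optim.Adam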
  - property global_prune

  - property growth_func

  - growth_increment = 0.2

  - growth_mode = 'momentum'

  - growth_threshold = 0.001

  - increment = 0.2
  - property inference_FLOPs
    Calculates inference FLOPs of the sparse model.
    Returns:
      inference FLOPs
    Return type:
      float
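    Example (usage sketch; assumes `mask` is an initialized Masking instance):

      dense = mask.dense_FLOPs           # int, FLOPs of the fully dense model
      sparse = mask.inference_FLOPs      # float, FLOPs under the current mask
      print(f"FLOP density: {sparse / dense:.3f}")
      print(f"Running average: {mask.avg_inference_FLOPs:.3e} FLOPs")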
  - init(lottery_mask_path: Path)
    Sparsity initialization.
    Parameters:
      lottery_mask_path (Path) – Mask path, if using the Lottery Ticket Hypothesis (Frankle & Carbin 2018).

  - load_state_dict(*initial_data, **kwargs)

  - mask_step = 0

  - module = None

  - print_nonzero_counts()

  - property prune_func
    Calls the prune function from the registry.
    Implemented as a @property so that it always stays synced with prune_mode.

  - prune_mode = 'magnitude'

  - property prune_rate
    Get the prune rate from the decay object.
  - prune_threshold = 0.001

  - property redistribution_func
    Calls the redistribution function from the registry.
    Implemented as a @property so that it always stays synced with redistribution_mode.

  - redistribution_mode = 'momentum'

  - remove_type(nn_type)
    Remove layers by type (e.g., nn.Linear, nn.Conv2d).
    Parameters:
      nn_type (nn.Module) – type of layer

  - remove_weight(name)
    Remove a layer by its complete name.
    Parameters:
      name (str) – layer name

  - remove_weight_partial_name(partial_name: str)
    Remove modules by partial name (e.g., 'conv').
    Parameters:
      partial_name (str) – partial layer name

  - reset_momentum()
    Mask the momentum buffers.

  - sparse_init = 'random'

  - sparsify(**kwargs)
    Call the sparsity initialization function (see sparselearning/funcs/init_scheme.py).

  - state_dict() → Dict
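    Example (a hedged checkpointing sketch; the file name is illustrative, and it assumes load_state_dict() accepts the dict produced by state_dict()):

      import torch

      # Save model weights and mask state together
      torch.save({"model": model.state_dict(), "mask": mask.state_dict()},
                 "checkpoint.pth")

      # Restore
      ckpt = torch.load("checkpoint.pth")
      model.load_state_dict(ckpt["model"])
      mask.load_state_dict(ckpt["mask"])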
  - step()
    Performs an optimizer step (i.e., no update to mask topology).

  - to_module_device_()
    Send the mask to the module's device.

  - tolerance = 1e-06

  - truncate_weights()
    Perform a grow / prune / redistribution step.

  - update_connections()
    Performs a mask update (i.e., an update to mask topology).
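  Example training loop (a minimal sketch tying step() and update_connections() together; the update interval and loss function are assumptions, not part of the API):

      import torch.nn.functional as F

      update_interval = 100  # assumed mask-update frequency, in optimizer steps

      for step_idx, (x, y) in enumerate(train_loader):
          optimizer.zero_grad()
          loss = F.cross_entropy(model(x), y)
          loss.backward()

          # Masked optimizer step: weights change, mask topology does not
          mask.step()

          # Periodically prune / grow connections (mask topology update)
          if step_idx % update_interval == 0:
              mask.update_connections()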