sparselearning package¶
sparselearning.core¶
Wraps PyTorch model parameters with a boolean mask to simulate unstructured sparsity.
- Example usage:
    model = MyModel()
    optimizer = torch.optim.SGD(model.parameters(), lr=args.lr)
    decay = CosineDecay(args.prune_rate, len(train_loader) * args.epochs)
    mask = Masking(optimizer, prune_rate_decay=decay)
    mask.add_module(model)

    # Wrapped optimizer step
    mask.step()

    # Mask update step
    mask.update_connections()
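To show where these calls sit during training, here is a minimal loop sketch. `train_loader`, `criterion`, `device`, and the `args.update_interval` used to schedule topology updates are assumptions of this example, not part of the package.

    # Minimal training-loop sketch; the update schedule is an assumption.
    for epoch in range(args.epochs):
        for step_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)

            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()

            # Wrapped optimizer step: updates weights, keeps them masked
            mask.step()

            # Periodically prune / grow connections
            if step_idx % args.update_interval == 0:
                mask.update_connections()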
-
class sparselearning.core.LayerStats(variance_dict: Dict[str, float] = <factory>, zeros_dict: Dict[str, int] = <factory>, nonzeros_dict: Dict[str, int] = <factory>, removed_dict: Dict[str, int] = <factory>, total_variance: float = 0, total_zero: int = 0, total_nonzero: int = 0, total_removed: int = 0)¶
Bases: object
Layer-wise statistics
-
load_state_dict(*initial_data, **kwargs)¶
-
nonzeros_dict: Dict[str, int]¶
-
removed_dict: Dict[str, int]¶
-
state_dict()¶
-
property
total_density¶
-
total_nonzero: int = 0¶
-
total_removed: int = 0¶
-
total_variance: float = 0¶
-
total_zero: int = 0¶
-
variance_dict: Dict[str, float]¶
-
zeros_dict: Dict[str, int]¶
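A brief sketch of how this container could be read; the exact meaning of total_density (taken here as the global fraction of non-zero weights) and the argument accepted by load_state_dict are assumptions.

    from sparselearning.core import LayerStats

    # Illustrative values; in practice these are filled in by
    # Masking.gather_statistics() before each mask update.
    stats = LayerStats()
    stats.nonzeros_dict["conv1.weight"] = 250
    stats.zeros_dict["conv1.weight"] = 750
    stats.total_nonzero = 250
    stats.total_zero = 750

    # Assumed to report the global fraction of non-zero weights (0.25 here)
    print(stats.total_density)

    # Round-trip through state_dict / load_state_dict (argument assumed)
    restored = LayerStats()
    restored.load_state_dict(stats.state_dict())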
-
class sparselearning.core.Masking(optimizer: optim, prune_rate_decay: Decay, density: float = 0.1, sparse_init: str = 'random', dense_gradients: bool = False, prune_mode: str = 'magnitude', growth_mode: str = 'momentum', redistribution_mode: str = 'momentum', prune_threshold: float = 0.001, growth_threshold: float = 0.001, growth_increment: float = 0.2, increment: float = 0.2, tolerance: float = 1e-06, mask_dict: Dict[str, Tensor] = <factory>, module: nn.Module = None, mask_step: int = 0)¶
Bases: object
Wraps PyTorch model parameters with a sparse mask.
Creates a mask for each parameter tensor contained in the model. When apply_mask() is called, it applies the sparsity pattern to the parameters.
- Basic usage:
    model = MyModel()
    optimizer = torch.optim.SGD(model.parameters(), lr=args.lr)
    decay = CosineDecay(args.prune_rate, len(train_loader) * args.epochs)
    mask = Masking(optimizer, prune_rate_decay=decay)
    mask.add_module(model)
Removing layers: Layers can be removed individually, by type, or by partial match of their name (see the sketch after this list).
- mask.remove_weight(name) requires the exact name of a parameter.
- mask.remove_weight_partial_name(partial_name=name) removes all parameters whose name contains the partial name. For example, ‘conv’ would remove all layers with ‘conv’ in their name.
- mask.remove_type(type) removes all layers of a certain type. For example, mask.remove_type(torch.nn.BatchNorm2d) removes all 2D batch norm layers.
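For instance (the layer names below are illustrative, not part of the API):

    # Exclude selected layers from masking
    mask.remove_weight("fc.weight")                       # exact parameter name
    mask.remove_weight_partial_name(partial_name="conv")  # any name containing 'conv'
    mask.remove_type(torch.nn.BatchNorm2d)                # all layers of this type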
-
add_module(module, lottery_mask_path: Path = None)¶ Store dict of parameters to mask
- Parameters
module – module whose parameters will be masked
lottery_mask_path – initialize from an existing model’s mask.
- Returns
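As a sketch, the mask of a previously trained run can be supplied at registration time (the checkpoint path below is illustrative):

    from pathlib import Path

    # Initialize the sparsity pattern from an existing mask (Lottery Ticket style)
    mask.add_module(model, lottery_mask_path=Path("runs/resnet50/mask.pth"))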
-
adjust_prune_rate()¶ Modify prune rate for layers with low sparsity
-
apply_mask()¶ Applies boolean mask to the module's parameters
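A small sketch of when one might call this directly; `dense_state_dict` is an assumed, externally provided set of weights:

    # Re-impose the stored sparsity pattern after overwriting the weights,
    # e.g. after loading dense weights into the masked model
    model.load_state_dict(dense_state_dict)
    mask.apply_mask()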
-
apply_mask_gradients()¶ Applies boolean mask to the module's gradients
-
property
avg_inference_FLOPs¶
- Returns
running average of inference FLOPs
- Return type
float
-
calc_redistributed_densities()¶ Computes layer-wise density given a redistribution scheme.
Ensures that layer-wise densities are valid (i.e. 0 <= density <= 1).
- Returns
Layer-wise valid densities.
- Return type
Dict[str, float]
-
property
dense_FLOPs¶ Calculates dense inference FLOPs of the model
- Returns
dense FLOPs
- Return type
int
-
dense_gradients = False¶
-
density = 0.1¶
-
gather_statistics()¶ Gather layer-wise & global stats. Typically performed before each mask update.
-
get_momentum_for_weight(weight: str) → Tensor¶ Return momentum from optimizer (SGD or Adam)
- Parameters
weight (str) – weight name
- Returns
Momentum buffer for layer
- Return type
torch.Tensor
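For example, the momentum buffer of a single layer can be inspected like this (the parameter name is illustrative):

    # Fetch the optimizer's momentum buffer for one masked parameter
    momentum = mask.get_momentum_for_weight("layer1.0.conv1.weight")
    print(momentum.shape)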
-
property
global_prune¶
-
property
growth_func¶
-
growth_increment = 0.2¶
-
growth_mode = 'momentum'¶
-
growth_threshold = 0.001¶
-
increment = 0.2¶
-
property
inference_FLOPs¶ Calculates inference FLOPs of the model under the current mask
- Returns
inference FLOPs
- Return type
float
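Together with dense_FLOPs, this can be used to report the theoretical FLOP saving; a minimal sketch:

    # Fraction of dense FLOPs needed by the sparse model
    flop_density = mask.inference_FLOPs / mask.dense_FLOPs
    print(f"Inference FLOPs: {mask.inference_FLOPs:.3g} ({flop_density:.2%} of dense)")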
-
init(lottery_mask_path: Path)¶ Sparsity initialization
- Parameters
lottery_mask_path (Path) – Mask path, if using Lottery Ticket Hypothesis (Frankle & Carbin 2018).
-
load_state_dict(*initial_data, **kwargs)¶
-
mask_step = 0¶
-
module = None¶
-
print_nonzero_counts()¶
-
property
prune_func¶ Calls the prune function from the registry.
We use @property so that it is always in sync with prune_mode
-
prune_mode = 'magnitude'¶
-
property
prune_rate¶ Get prune rate from the decay object
-
prune_threshold = 0.001¶
-
property
redistribution_func¶ Calls the redistribution function from the registry.
We use @property so that it is always in sync with redistribution_mode
-
redistribution_mode = 'momentum'¶
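The prune / growth / redistribution behaviour is selected through these string attributes at construction time. A sketch using the mode names shown in the signature above (other registry entries may exist):

    mask = Masking(
        optimizer,
        prune_rate_decay=decay,
        density=0.2,                      # overall density of the sparse model
        sparse_init="random",
        prune_mode="magnitude",
        growth_mode="momentum",
        redistribution_mode="momentum",
    )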
-
remove_type(nn_type)¶ Remove layer by type (e.g., nn.Linear, nn.Conv2d, etc.)
- Parameters
nn_type (nn.Module) – type of layer
-
remove_weight(name)¶ Remove layer by complete name
- Parameters
name (str) – layer name
-
remove_weight_partial_name(partial_name: str)¶ Remove module by partial name (e.g., conv).
- Parameters
partial_name (str) – partial layer name
-
reset_momentum()¶ Mask momentum buffers
-
sparse_init = 'random'¶
-
sparsify(**kwargs)¶ Call sparsity init func (see sparselearning/funcs/init_scheme.py)
-
state_dict() → Dict¶
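A checkpointing sketch, assuming load_state_dict accepts the dict produced by state_dict (the file name is illustrative):

    # Save the mask alongside the model and optimizer
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "mask": mask.state_dict(),
        },
        "checkpoint.pth",
    )

    # Later, after rebuilding model, optimizer and mask:
    ckpt = torch.load("checkpoint.pth")
    mask.load_state_dict(ckpt["mask"])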
-
step()¶ Performs an optimizer step (i.e., no update to mask topology).
-
to_module_device_()¶ Send to module’s device
-
tolerance = 1e-06¶
-
truncate_weights()¶ Perform grow / prune / redistribution step
-
update_connections()¶ Performs a mask update (i.e., update to mask topology).
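A topology update is typically followed by a look at the resulting layer-wise sparsity; for example:

    # Prune and regrow connections, then report per-layer non-zero counts
    mask.update_connections()
    mask.print_nonzero_counts()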