rai_toolbox.optim.L1qNormedGradientOptim
- class rai_toolbox.optim.L1qNormedGradientOptim(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, q=<required parameter>, dq=0.0, param_ndim=-1, grad_scale=1.0, grad_bias=0.0, defaults=None, div_by_zero_eps=1.1754943508222875e-38, generator=<torch._C.Generator object>, **inner_opt_kwargs)[source]
A gradient-transforming optimizer that sparsifies a parameter's gradient and normalizes the gradient to have an \(L^1\)-norm of `grad_scale`, prior to updating the parameter using `InnerOpt.step`. The sparsification process retains only the signs (i.e., \(\pm 1\)) of the gradient's elements. The transformation is applied to the gradient in accordance with `param_ndim`.
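For orientation, here is a minimal sketch (not taken from the toolbox's documentation) of how this optimizer might slot into an ordinary training loop, using torch.optim.Adam as the inner optimizer; the model, data, and hyperparameters below are placeholders.

import torch as tr
from rai_toolbox.optim import L1qNormedGradientOptim

# Placeholder model and data -- purely illustrative
model = tr.nn.Linear(4, 2)
optim = L1qNormedGradientOptim(
    list(model.parameters()),
    q=0.9,                   # sparsity factor: most gradient elements are zeroed; only the largest (by magnitude) survive
    InnerOpt=tr.optim.Adam,  # the optimizer that performs the actual parameter update
    lr=0.1,                  # forwarded to Adam via **inner_opt_kwargs
)

loss = model(tr.randn(8, 4)).sum()

optim.zero_grad()
loss.backward()
optim.step()  # gradients are sign-sparsified and L1-normalized in-place, then Adam updates the parameters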
- __init__(params, InnerOpt=<class 'torch.optim.sgd.SGD'>, *, q=<required parameter>, dq=0.0, param_ndim=-1, grad_scale=1.0, grad_bias=0.0, defaults=None, div_by_zero_eps=1.1754943508222875e-38, generator=<torch._C.Generator object>, **inner_opt_kwargs)[source]
- Parameters:
- params : Sequence[Tensor] | Iterable[Mapping[str, Any]]
Iterable of parameters or dicts defining parameter groups.
- InnerOpt : Type[Optimizer] | Partial[Optimizer], optional (default=`torch.optim.SGD`)
The optimizer that updates the parameters after their gradients have been transformed.
- q : float
Specifies the (fractional) percentile of absolute-largest gradient elements to retain when sparsifying the gradient. E.g., q=0.9 means that only the gradient elements at or above the 90th percentile (by magnitude) are retained; the rest are zeroed out. Must be within [0.0, 1.0]. The sparsification is applied to the gradient in accordance with param_ndim.
- dq : float, optional (default=0.0)
If specified, the sparsity factor for each gradient transformation will be drawn from a uniform distribution over \([q - dq, q + dq]\), clipped to \([0.0, 1.0]\).
- param_ndim : Union[int, None], optional (default=-1)
Determines how a parameter and its gradient are temporarily reshaped prior to being passed to both _pre_step_transform_ and _post_step_transform_. By default, the transformation broadcasts over the tensor's first dimension in a batch-like style. This can be specified per param-group (see the sketch after this parameter list).
- A positive number determines the dimensionality of the tensor that the transformation will act on.
- A negative number indicates the ‘offset’ from the dimensionality of the tensor (see “Notes” for examples).
- None means that the transformation will be applied directly to the tensor without any broadcasting.
See ParamTransformingOptimizer for more details and examples.
- grad_scale : float, optional (default=1.0)
Multiplies each gradient in-place after the in-place transformation is performed. This can be specified per param-group.
- grad_bias : float, optional (default=0.0)
Added to each gradient in-place after the in-place transformation is performed. This can be specified per param-group.
- defaults : Optional[Dict[str, Any]]
Specifies default parameters for all parameter groups.
- div_by_zero_eps : float, optional (default=`torch.finfo(torch.float32).tiny`)
A lower bound used to clamp the normalization factor to prevent div-by-zero.
- generator : torch.Generator, optional (default=`torch.default_generator`)
Controls the RNG source.
- **inner_opt_kwargs : Any
Named arguments used to initialize InnerOpt.
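To make the param_ndim behavior concrete, here is a small sketch (not from the original documentation; the exact per-row values are not asserted): with the default param_ndim=-1, a 2D parameter's gradient is treated as a batch of 1D rows, so each row is sparsified and normalized independently, whereas param_ndim=None (as in the example below) transforms the whole tensor at once.

import torch as tr
from rai_toolbox.optim import L1qNormedGradientOptim

w = tr.ones(2, 3, requires_grad=True)

# Default param_ndim=-1: the transform broadcasts over the leading dimension,
# so each 1D row of the gradient is sparsified/normalized on its own.
optim = L1qNormedGradientOptim([w], q=0.30, grad_scale=1.0, InnerOpt=tr.optim.SGD, lr=1.0)

w.backward(gradient=tr.tensor([[0.0, 1.0, 2.0], [3.0, 0.0, 1.0]]))
optim.step()
print(w.grad)  # each row should now be sparse, with an L1-norm of (roughly) grad_scale=1.0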
Examples
Let’s use L1qNormedGradientOptim along with a standard SGD step with a learning rate of 1.0. We’ll sparsify the gradient to retain the top 70% of its elements, and we’ll normalize the sparse gradient to have an \(L^1\)-norm of 1.8.

>>> import torch as tr
>>> from rai_toolbox.optim import L1qNormedGradientOptim
Creating a parameter for our optimizer to update, and our optimizer. We specify param_ndim=None so that the sparsification/normalization occurs on the gradient without any broadcasting.

>>> x = tr.tensor([1.0, 1.0, 1.0], requires_grad=True)
>>> optim = L1qNormedGradientOptim(
...     [x],
...     q=0.30,
...     grad_scale=1.8,
...     InnerOpt=tr.optim.SGD,
...     lr=1.0,
...     param_ndim=None,
... )
Performing backprop through x to create a gradient.

>>> x.backward(gradient=tr.tensor([0.0, 1.0, 2.0]))
>>> x.grad  # the original gradient
tensor([0., 1., 2.])
Performing a step with our optimizer sparsifies and normalizes the gradient in-place, and then updates the parameter using SGD([x], lr=1.0).step().

>>> optim.step()
>>> x.grad  # the signed, sparsified, and normalized gradient
tensor([0.0000, 0.9000, 0.9000])
>>> x  # the updated parameter
tensor([1.0000, 0.1000, 0.1000], requires_grad=True)
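To see why the transformed gradient comes out as [0.0, 0.9, 0.9], here is a rough reconstruction of the transform in plain torch. This is only a sketch: the selection/rounding rule used here is an assumption, not taken from the toolbox's source, but its result matches the example above.

>>> g = tr.tensor([0.0, 1.0, 2.0])
>>> k = round(0.30 * g.numel())                   # number of elements to drop (assumed rounding rule)
>>> keep = g.abs().argsort(descending=True)[: g.numel() - k]
>>> sparse = tr.zeros_like(g)
>>> sparse[keep] = tr.sign(g[keep])               # retain only the signs of the kept elements
>>> 1.8 * sparse / sparse.abs().sum()             # normalize to an L1-norm of grad_scale=1.8
tensor([0.0000, 0.9000, 0.9000])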
Methods

__init__(params[, InnerOpt, q, dq, ...])