rai_toolbox.optim.L2FrankWolfe
- class rai_toolbox.optim.L2FrankWolfe(params, *, epsilon, lr=2.0, use_default_lr_schedule=True, param_ndim=-1, div_by_zero_eps=1.1754943508222875e-38)
A Frank-Wolfe [1] optimizer that constrains each updated parameter to fall within an \(\epsilon\)-sized ball in \(L^2\) space, centered on the origin.
This parameter-transforming optimizer is useful for producing error counterfactuals and performing visual concept probing [2].
Notes
The method L2NormedGradientOptim._pre_step_transform_ is responsible for computing the negative linear minimization oracle (LMO) for a parameter and storing it on param.grad.
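For intuition, the linear minimization oracle over an \(\epsilon\)-radius \(L^2\) ball has the closed form \(-\epsilon \nabla / \lVert \nabla \rVert\), and a Frank-Wolfe step moves the parameter to a convex combination of its current value and that oracle. The following is a minimal sketch of this math in plain PyTorch, not the optimizer's internal code path; it reproduces the result shown in the Examples section below.
>>> import torch as tr
>>> g = tr.tensor([1.0, 2.0])             # gradient of the parameter
>>> epsilon, lr, k = 1.8, 2.0, 0          # ball radius, lr, and per-parameter update index
>>> s = -epsilon * g / g.norm()           # LMO over the radius-epsilon L2 ball
>>> lr_hat = lr / (lr + k)                # default schedule: 2.0 / (2.0 + 0) = 1.0
>>> (1 - lr_hat) * tr.tensor([1.0, 1.0]) + lr_hat * s  # convex-combination update
tensor([-0.8050, -1.6100])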
References
[2] Roberts, Jay, and Theodoros Tsiligkaridis. Controllably Sparse Perturbations of Robust Classifiers for Explaining Predictions and Probing Learned Concepts. (2021).
- __init__(params, *, epsilon, lr=2.0, use_default_lr_schedule=True, param_ndim=-1, div_by_zero_eps=1.1754943508222875e-38)
- Parameters:
- params : Sequence[Tensor] | Iterable[Mapping[str, Any]]
Iterable of parameters or dicts defining parameter groups.
- epsilon : float
The radius of the L2 ball to which each updated parameter will be constrained. Can be specified per parameter-group (see the sketch following this list).
- lr : float, optional (default=2.0)
Indicates the weight with which the LMO contributes to the parameter update. See use_default_lr_schedule for additional details. If use_default_lr_schedule=False then lr must be in the domain [0, 1].
- use_default_lr_schedule : bool, optional (default=True)
If True, then the per-parameter "learning rate" is scaled by \(\hat{l_r} = l_r / (l_r + k)\), where k is the update index for that parameter, which starts at 0. A numeric illustration follows this list.
- param_ndim : Union[int, None], optional (default=-1)
Determines how a parameter and its gradient are temporarily reshaped prior to being passed to both _pre_step_transform_ and _post_step_transform_. By default, the transformation broadcasts over the tensor's first dimension in a batch-like style. This can be specified per param-group.
A positive number determines the dimensionality of the tensor that the transformation will act on.
A negative number indicates the 'offset' from the dimensionality of the tensor (see "Notes" for examples).
None means that the transformation will be applied directly to the tensor without any broadcasting.
See ParamTransformingOptimizer for more details and examples.
- div_by_zero_eps : float, optional (default=torch.finfo(torch.float32).tiny)
Prevents a div-by-zero error in the learning rate schedule.
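To make the default schedule concrete, here is a small numeric illustration (plain Python, not toolbox code) of \(\hat{l_r} = l_r / (l_r + k)\) with the default lr=2.0:
>>> lr = 2.0
>>> [lr / (lr + k) for k in range(4)]  # effective step size at update indices k = 0, 1, 2, 3
[1.0, 0.6666666666666666, 0.5, 0.4]
And since epsilon can be specified per parameter-group, different parameters can be constrained to balls of different radii. This sketch assumes the standard PyTorch parameter-group dict format and that a group-level epsilon overrides the value passed to the constructor; it is an illustration, not an excerpt from the toolbox's documentation.
>>> import torch as tr
>>> from rai_toolbox.optim import L2FrankWolfe
>>> x1 = tr.zeros(2, requires_grad=True)
>>> x2 = tr.zeros(2, requires_grad=True)
>>> optim = L2FrankWolfe(
...     [{"params": [x1]}, {"params": [x2], "epsilon": 0.5}],  # assumed per-group override
...     epsilon=1.8,
... )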
Examples
Using L2FrankWolfe, we'll constrain the updated parameter to fall within an \(L^2\)-ball of radius 1.8.
>>> import torch as tr
>>> from rai_toolbox.optim import L2FrankWolfe
Creating a parameter for our optimizer to update, and our optimizer. We specify param_ndim=None so that the constraint is applied to the parameter without any broadcasting.
>>> x = tr.tensor([1.0, 1.0], requires_grad=True)
>>> optim = L2FrankWolfe([x], epsilon=1.8, param_ndim=None)
Performing a simple calculation with x and backpropagating to create a gradient.
>>> (tr.tensor([1.0, 2.0]) * x).sum().backward()
Performing a step with our optimizer uses the Frank-Wolfe algorithm to update its parameters. Note that the updated parameter falls within/on the \(L^2\)-ball of radius 1.8.
>>> optim.step()
>>> x
tensor([-0.8050, -1.6100], requires_grad=True)
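As an extra check (not part of the original example), the updated parameter sits on the boundary of the radius-1.8 ball:
>>> x.detach().norm()
tensor(1.8000)
Finally, a hedged sketch of the default param_ndim=-1 behavior: assuming the batch-style broadcasting described under "Parameters", each row of a 2D parameter is constrained to the \(L^2\)-ball independently. The tensors below are illustrative assumptions rather than an excerpt from the library's documentation.
>>> W = tr.ones(3, 2, requires_grad=True)
>>> optim = L2FrankWolfe([W], epsilon=1.8)  # param_ndim=-1 is the default
>>> (tr.arange(6.0).view(3, 2) * W).sum().backward()
>>> optim.step()
>>> W.detach().norm(dim=1)  # each row is expected to lie on/within the radius-1.8 ball
tensor([1.8000, 1.8000, 1.8000])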
Methods
__init__(params, *, epsilon[, lr, ...])