rai_toolbox.optim.FrankWolfe
- class rai_toolbox.optim.FrankWolfe(params, *, lr=2.0, lmo_scaling_factor=1.0, use_default_lr_schedule=True, div_by_zero_eps=1.1754943508222875e-38)[source]
Implements the Frank-Wolfe minimization algorithm [1].
\[w_{k+1} = (1 - l_r)\, w_k + l_r\, s_k\]

where \(s_k\) is the linear minimization oracle (LMO).
It is critical to note that this optimizer assumes that the `grad` attribute of each parameter has been modified so as to store the negative of the LMO for that parameter, and not the gradient itself.

References

[1] Frank, M. and Wolfe, P. (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2), 95-110.
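Because `step()` consumes the negative LMO through each parameter's `grad` attribute, a typical loop computes the LMO by hand and overwrites `.grad` before stepping. Below is a minimal sketch, assuming an L2-ball constraint and a toy quadratic objective; the objective, `radius`, and the choice of LMO are illustrative assumptions, not part of this API.

```python
import torch

from rai_toolbox.optim import FrankWolfe

# Minimize f(w) = ||w - target||^2 over the L2 ball {w : ||w||_2 <= radius}.
# (The objective and `radius` are illustrative choices.)
radius = 1.0
target = torch.tensor([0.5, -0.25, 0.1])
w = torch.zeros(3, requires_grad=True)

optim = FrankWolfe([w])  # default lr=2.0 with the default schedule

for _ in range(20):
    optim.zero_grad()
    loss = ((w - target) ** 2).sum()
    loss.backward()

    with torch.no_grad():
        # For the L2 ball, the LMO is s_k = -radius * grad / ||grad||.
        # This optimizer expects `.grad` to hold the *negative* LMO,
        # i.e. radius * grad / ||grad||.
        w.grad = radius * w.grad / w.grad.norm().clamp_min(1e-12)

    optim.step()
```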
- __init__(params, *, lr=2.0, lmo_scaling_factor=1.0, use_default_lr_schedule=True, div_by_zero_eps=1.1754943508222875e-38)[source]
- Parameters:
- params : Iterable
Iterable of tensor parameters to optimize or dicts defining parameter groups.
- lr : float, optional (default=2.0)
Indicates the weight with which the LMO contributes to the parameter update. See `use_default_lr_schedule` for additional details. If `use_default_lr_schedule=False` then `lr` must be in the domain [0, 1].
- lmo_scaling_factor : float, optional (default=1.0)
A scaling factor applied to \(s_k\) prior to each step.
- use_default_lr_schedule : bool, optional (default=True)
If `True`, then the per-parameter “learning rate” follows the schedule \(\hat{l_r} = l_r / (l_r + k)\), where \(k\) is the update index for that parameter (see the sketch following this parameter list).
- div_by_zero_eps : float, optional (default=`torch.finfo(torch.float32).tiny`)
Prevents div-by-zero error in learning rate schedule.
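As a concrete illustration (the arithmetic below is mine, computed straight from the schedule formula), the default `lr=2.0` reproduces the classic Frank-Wolfe step size \(2 / (k + 2)\):

```python
# Effective step size under the default schedule, lr / (lr + k),
# where k counts the updates already applied to the parameter.
lr = 2.0
print([round(lr / (lr + k), 4) for k in range(5)])
# [1.0, 0.6667, 0.5, 0.4, 0.3333]
```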
Methods
__init__(params, *[, lr, ...])