Optimal learning strategy for non-differentiable transformation choices in computation graph synthesis

Ascertain whether gradient descent is an optimal learning strategy for selecting sequences of non-differentiable mathematical transformations when synthesizing computation graphs for math word problem solvers.

Background

The authors note that solving a math word problem involves applying a series of mathematical transformations, a process that can be framed as synthesizing a computation graph. They observe that conventional learning via gradient descent relies on differentiability and iterative error reduction, which may be ill-suited when the action choices (which transformation to apply) are discrete and non-differentiable.
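To see why gradient descent struggles here, consider a toy illustration (assumed for exposition, not taken from the paper): if an operation is selected by a hard argmax over logits, the loss is piecewise constant in those logits, so its gradient is zero almost everywhere and gradient descent receives no learning signal.

```python
import numpy as np

def loss(logits, a=3.0, b=4.0, target=12.0):
    """Loss of a one-step 'computation graph' whose single op is chosen by argmax."""
    ops = [lambda: a + b, lambda: a - b, lambda: a * b]
    choice = int(np.argmax(logits))  # hard, non-differentiable selection
    return (ops[choice]() - target) ** 2

logits = np.array([1.0, 0.0, 0.0])
eps = 1e-4
# Central-difference numerical gradient of the loss w.r.t. each logit.
num_grad = np.array([
    (loss(logits + eps * e) - loss(logits - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(num_grad)  # all zeros: no gradient flows through the discrete choice
```

Small perturbations of the logits never change which operation wins the argmax, so the loss surface is flat around any such point and the gradient vanishes.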

They highlight uncertainty about the optimality of gradient descent in this setting and suggest reinforcement learning as a potentially better-suited paradigm, given the exponential search space and non-differentiable decisions inherent in constructing computation graphs.
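The reinforcement-learning alternative can be sketched with a minimal score-function (REINFORCE-style) example; this is an illustrative toy setup assumed here, not the paper's implementation. Instead of differentiating through the discrete choice, it optimizes the expected reward of sampled choices, which requires only the gradient of the policy's log-probability.

```python
import numpy as np

rng = np.random.default_rng(0)
OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b}
names = list(OPS)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy problem: learn to pick the operation mapping (3, 4) -> 12, i.e. "mul".
a, b, target = 3.0, 4.0, 12.0
theta = np.zeros(len(names))  # logits over the discrete transformation choices
lr = 0.5

for _ in range(300):
    probs = softmax(theta)
    i = rng.choice(len(names), p=probs)            # sample a transformation
    reward = 1.0 if OPS[names[i]](a, b) == target else 0.0
    grad_logp = -probs                              # d log pi(i) / d theta
    grad_logp[i] += 1.0
    theta += lr * reward * grad_logp                # score-function update

print(names[int(np.argmax(softmax(theta)))])
```

Because the update weights the log-probability gradient by the observed reward, no gradient ever needs to pass through the non-differentiable operation itself; the same idea scales (with much more machinery) to sequences of choices that build a full computation graph.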

References

The choices of these mathematical transformations are not differentiable, and hence it is unclear whether gradient descent is the optimal strategy.

Towards Tractable Mathematical Reasoning: Challenges, Strategies, and Opportunities for Solving Math Word Problems (Faldu et al., 2021, arXiv:2111.05364), Section: Reinforcement Learning