Add Wolfe line search to Laplace approximation #3250
base: develop
…lues for W, B, etc. are used
Jenkins Console Log Machine informationNo LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focalCPU: G++: Clang: |
… use across the stan math library. Adds docs for laplace helper functions. clean up control logic in reverse mode autodiff laplace approximation.
WardBrian
left a comment
Some low-hanging fruit
I think it would be better to do what is done for e.g. sigmaz.hpp and just convert the test data to hpp files with a const member, rather than include a csv reader into the tests for these functions only
| } else { | ||
| static_assert( | ||
| sizeof(std::decay_t<output_i_t>*) == 0, | ||
| "INTERNAL ERROR:(laplace_marginal_lpdf) set_zero_adjoints was " |
Since we've moved this out of the laplace code, I think this string should be updated
| * @param[in, out] output The output whose adjoints will be set to zero | ||
| */ | ||
| template <typename Output> | ||
| inline void set_zero_adjoint(Output&& output) { |
Function name and file name should match: adjoints
Actually this code was not even used so I can delete it
| } else { | ||
| static_assert( | ||
| sizeof(std::decay_t<output_i_t>*) == 0, | ||
| "INTERNAL ERROR:(laplace_marginal_lpdf) collect_adjoints was " |
Similar, if these will live in core they shouldn't mention laplace functions (applies to all the functions in this file)
| }, | ||
| std::forward<Args>(args)...); | ||
| } | ||
|
|
q: should the internal namespace end here? deep/shallow copy look much more like normal functions than conditional_copy_and_promote does
It's only used in other internal functions so I think it is better to have in internal
…to fix/wolfe-zoom1
The current version seems to be more robust than before. I have tested with integrated LOO and leave-one-group-out cross-validation.
In both cases, laplace_marginal_tol() is fast enough that it would be possible to use it in the model block as well, but at this point I've focused on using them in the generated quantities block. I'll continue experimenting with other models.
Awesome, thank you!
WardBrian
left a comment
C++ is complicated but manageable -- but someone else (@charlesm93) is gonna need to review the actual algorithmic pieces for correctness
| partial_parm, ll_args_filter); | ||
| if (options.solver == 1) { | ||
| using stan::math::internal::ZeroOut; | ||
| if (md_est.solver_used == 1) { |
With fallthrough of the solvers, do we actually only calculate the gradient for the final solver used, or do we need to account for the previous solvers that did not get picked?
Related: should we hard restart when we switch to a new solver?
| if constexpr (is_any_var_scalar_v<scalar_type_t<CovarArgs>>) { | ||
| [&covar_args_refs, &covar_args_adj, &md_est, &R, &s2, | ||
| &covariance_function, &msgs]() mutable { | ||
| const nested_rev_autodiff nested; |
Steve note to self: double check that not having a nested here is fine
| namespace internal { | ||
|
|
||
| template <std::size_t N, typename Tuple, typename CheckType> | ||
| inline constexpr bool is_tuple_type_v |
Move to prim/meta / re-use existing programs there
| template <typename Ops> | ||
| inline constexpr auto tuple_to_laplace_options(Ops&& ops) { |
Please clean up the manual typechecking here -- I'm not even sure if it is necessary, since the std::gets will raise a compiler error if they're wrong in the happy path
| * @note This helper is currently unused in the Laplace solvers in this file. | ||
| */ | ||
| template <typename WRootMat> | ||
| inline void block_matrix_chol_L(WRootMat& W_root, |
Unused function
| * @warning The vectors must have identical size. Non-finite inputs yield the | ||
| * safe fallback. | ||
| */ | ||
| inline double barzilai_borwein_step_size(const Eigen::VectorXd& s, |
@charlesm93 the C++ looks fine to me for this function but I'd appreciate someone else putting eyes on the math
| * The routine assumes a 1-D bracket [x_left, x_right], together with function | ||
| * values and directional derivatives at both endpoints. Internally it: | ||
| * | ||
| * 1. Normalizes the interval to s ∈ [0, 1] via |
q: Is doxygen happy with the unicode etc?
| struct Candidate { | ||
| Scalar s_; | ||
| Scalar value_; | ||
| }; | ||
| Candidate best{0.5, eval(0.5)}; // Start from bisection. |
I think this can just be two variables, s_best and value_best
| auto assign_step | ||
| = [](WolfeData& out, WolfeData& buf, auto&& e) { out.update(buf, e); }; |
Inline
| auto armijo_ok = [&prev, &opt](const Eval& eval) -> bool { | ||
| return check_armijo(eval, prev, opt); | ||
| }; | ||
| auto wolfe_ok = [&prev, &opt](const Eval& eval) -> bool { | ||
| return check_wolfe(eval, prev, opt); | ||
| }; |
delete
Summary
This PR makes the following changes for the laplace approximation:
theta started the model in the tail of the distribution. The quick line search we did, which only tested half of a Newton step, was not robust enough for this model to reach convergence. This PR adds a full wolfe line search to the Newton solver used in the laplace approximation to improve convergence in such cases.
The graphic below shows the difference in estimates of the log likelihood for laplace relative to integrate_1d on the roach test data, plotted along the mu and sigma estimates. There is still a bias relative to integrate_1d as mu becomes negative and sigma becomes larger, but it is much nicer than before.
laplace_marginal_density_est is expensive, as it requires calculating either a diagonal hessian or a block diagonal hessian with 2nd order autodiff. The wolfe line search only requires the gradients of the likelihood with respect to theta. With that in mind, the wolfe line search tries pretty aggressively to get the best step size. If our initial step size is successful, we keep doubling until we hit a step size where the strong wolfe conditions fail, and then return the information for the step right before that failure. If our initial step size does not satisfy strong wolfe, then we do a bracketed zoom with cubic interpolation until we find a step size that satisfies the strong wolfe conditions.
Tests for the wolfe line search are added to test/unit/math/laplace/wolfe_line_search.hpp.
In the last iteration of the laplace approximation we were returning the negative block diagonal hessian and derived matrices from the previous search. This is fine if the line search in that last step failed. But if the line search succeeds, then we need to go back and recalculate the negative block diagonal hessian and its derived quantities.
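The doubling phase of the line search described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names LineEval, armijo_holds, strong_wolfe_holds, and expand_step are hypothetical, the constants c1 and c2 are assumed defaults, and the sketch uses a minimization convention on a 1-D function phi(a) = f(x + a * p). The bracketed zoom with cubic interpolation is left out; the sketch just returns the starting point when the initial step fails.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 1-D view of a line search: phi(a) = f(x + a * p).
struct LineEval {
  double alpha;  // step size
  double value;  // phi(alpha)
  double slope;  // phi'(alpha), the directional derivative
};

// Sufficient decrease (Armijo): phi(a) <= phi(0) + c1 * a * phi'(0).
inline bool armijo_holds(const LineEval& cur, const LineEval& start,
                         double c1) {
  return cur.value <= start.value + c1 * cur.alpha * start.slope;
}

// Strong wolfe: Armijo plus the curvature bound |phi'(a)| <= c2 * |phi'(0)|.
inline bool strong_wolfe_holds(const LineEval& cur, const LineEval& start,
                               double c1, double c2) {
  return armijo_holds(cur, start, c1)
         && std::fabs(cur.slope) <= c2 * std::fabs(start.slope);
}

// Doubling phase only (assumed constants c1, c2). If the initial step
// satisfies strong wolfe, keep doubling and return the last step that still
// satisfied it. Otherwise return the start (alpha == 0) to signal that a
// bracketed zoom would be needed; the zoom is not shown here.
template <typename Phi, typename DPhi>
inline LineEval expand_step(Phi&& phi, DPhi&& dphi, double alpha0,
                            double c1 = 1e-4, double c2 = 0.9,
                            int max_doublings = 20) {
  assert(0.0 < c1 && c1 < c2 && c2 < 1.0);
  const LineEval start{0.0, phi(0.0), dphi(0.0)};
  LineEval best{alpha0, phi(alpha0), dphi(alpha0)};
  if (!strong_wolfe_holds(best, start, c1, c2)) {
    return start;
  }
  for (int i = 0; i < max_doublings; ++i) {
    const double a = 2.0 * best.alpha;
    const LineEval trial{a, phi(a), dphi(a)};
    if (!strong_wolfe_holds(trial, start, c1, c2)) {
      break;  // return the step right before the failure
    }
    best = trial;
  }
  return best;
}
```

Only phi and its directional derivative are evaluated here, which is the reason the description gives for searching aggressively: gradient evaluations are cheap compared with the hessian.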
Previously we had one block_hessian function that calculated either the block hessian or the diagonal hessian, chosen at runtime. But this function is only used in places where we know at compile time whether we want a block or diagonal hessian, so I split it into two functions to avoid unnecessary runtime branching.
For an initial step size estimate before each line search, we use the Barzilai-Borwein method.
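As a rough illustration of the Barzilai-Borwein idea, the sketch below computes the "long" BB step, alpha = (s's) / (s'y), where s is the difference between the last two iterates and y is the difference between the last two gradients. The function name bb_step_size, the use of std::vector instead of Eigen, the choice of BB variant, and the fallback value are all assumptions made for this example; the function in this PR may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a Barzilai-Borwein initial step size.
//   s = x_k - x_{k-1}   (difference of iterates)
//   y = g_k - g_{k-1}   (difference of gradients)
// Returns s's / s'y, or `fallback` when the result is non-finite or
// non-positive (for example when the curvature estimate s'y is <= 0).
inline double bb_step_size(const std::vector<double>& s,
                           const std::vector<double>& y,
                           double fallback = 1.0) {
  assert(s.size() == y.size());
  double ss = 0.0;
  double sy = 0.0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    ss += s[i] * s[i];
    sy += s[i] * y[i];
  }
  const double alpha = ss / sy;
  if (!std::isfinite(alpha) || alpha <= 0.0) {
    return fallback;
  }
  return alpha;
}
```

For a quadratic objective with hessian 2 * I, y equals 2 * s, so the estimate is 0.5, which is the inverse of the curvature along the last step.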
Previously we calculated them eagerly in each laplace iteration. But they are not needed within the inner loop, so we wait until the inner search finishes and then calculate their adjoints once afterwards.
We were calculating the covariance matrix inside laplace_density_est, but this required us to return it from that function, which imo looked weird. So I pulled it out, and now laplace_marginal_density_est is passed the covariance matrix.
There were a few places where we could use log_sum_exp etc., so I made those changes.
The finite difference method in Stan was previously using a step size optimized for a 2nd order method, but the code is a 6th order method. I modified finite_diff_stepsize to use epsilon^(1/7) instead of cbrt(epsilon). With this change, all of the laplace tests pass with a much tighter precision tolerance.
Tests
All the AD tests now have a tighter tolerance for the laplace approximation.
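The tighter tolerance ties back to the finite difference step size change in the summary. For a 6th order central difference the truncation error scales like h^6 while the rounding error scales like epsilon / h, so balancing the two gives h proportional to epsilon^(1/7). The sketch below illustrates this with a standard six-point central stencil. The names sixth_order_stepsize and central_diff6 are hypothetical, and scaling the step by max(1, |x|) is an assumption, not a statement about what finite_diff_stepsize does.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical step size for a 6th order scheme: epsilon^(1/7), scaled by
// max(1, |x|) (the scaling is an assumption made for this sketch).
inline double sixth_order_stepsize(double x) {
  static const double eps_seventh_root
      = std::pow(std::numeric_limits<double>::epsilon(), 1.0 / 7.0);
  return eps_seventh_root * std::fmax(1.0, std::fabs(x));
}

// Six-point central difference for f'(x), with truncation error O(h^6).
template <typename F>
inline double central_diff6(F&& f, double x, double h) {
  assert(h > 0.0);
  return (-f(x - 3.0 * h) + 9.0 * f(x - 2.0 * h) - 45.0 * f(x - h)
          + 45.0 * f(x + h) - 9.0 * f(x + 2.0 * h) + f(x + 3.0 * h))
         / (60.0 * h);
}
```

In double precision, epsilon^(1/7) is roughly 6e-3, while cbrt(epsilon) is roughly 6e-6. With the smaller step the truncation error is negligible but the rounding error, which grows like epsilon / h, dominates, so the larger step gives a more accurate derivative for this stencil.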
There are also tests for the wolfe line search in test/unit/math/laplace/wolfe_line_search.hpp.
Release notes
Improve laplace approximation with wolfe line search and bug fixes.
Checklist
Copyright holder: Steve Bronder
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- the basic tests are passing
  - ./runTests.py test/unit
  - make test-headers
  - make test-math-dependencies
  - make doxygen
  - make cpplint
- the code is written in idiomatic C++ and changes are documented in the doxygen
- the new changes are tested