Stochastic Optimization: the knowledge gradient

Today in the Princeton 544 course we learned the knowledge gradient, a one-period lookahead that maximizes the value of information: V_{x}^{KG,n} = E\{\max_{y} F(y, B^{n+1}(x)) \mid B^{n}\} - \max_{y} F(y, B^{n}), where

  • B^{n} is the current belief state, and B^{n+1}(x) is the updated belief (parameter estimates) after running experiment x;
  • \max_{y} F(y, B^{n}) chooses the best design given what we currently know;
  • x is the proposed experiment;
  • the expectation averages over the possible outcomes of the experiment (and over our uncertainty about the parameters); a minimal numerical sketch follows this list.
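
To make the formula concrete, here is a minimal Monte Carlo sketch for the classic ranking-and-selection setting with independent normal beliefs and known measurement noise. The names (kg_value, sigma2_W, n_samples) and the test numbers are my own illustrative assumptions, not notation from the course.

```python
import numpy as np

def kg_value(x, mu, sigma2, sigma2_W, n_samples=10_000, rng=None):
    """Monte Carlo estimate of V_x^{KG,n} = E[max_y mu^{n+1}_y | x] - max_y mu^n_y."""
    rng = np.random.default_rng(rng)
    # Predictive distribution of the next observation of alternative x:
    # W ~ N(mu[x], sigma2[x] + sigma2_W) under the current belief.
    W = rng.normal(mu[x], np.sqrt(sigma2[x] + sigma2_W), size=n_samples)
    # Conjugate normal update of the posterior mean (precision-weighted average).
    beta_n, beta_W = 1.0 / sigma2[x], 1.0 / sigma2_W
    mu_x_next = (beta_n * mu[x] + beta_W * W) / (beta_n + beta_W)
    # Experiment x leaves the beliefs about the other alternatives unchanged.
    best_other = np.max(np.delete(mu, x))
    value_next = np.maximum(mu_x_next, best_other)
    return value_next.mean() - np.max(mu)

# Pick the experiment with the highest knowledge gradient.
mu = np.array([1.0, 1.2, 0.8])       # current posterior means
sigma2 = np.array([0.5, 0.1, 1.0])   # current posterior variances
kg = [kg_value(x, mu, sigma2, sigma2_W=0.25, rng=0) for x in range(len(mu))]
print(kg, "-> measure x =", int(np.argmax(kg)))
```

(For this Gaussian case the knowledge gradient also has a closed-form expression, so the sampling above is only for illustration.)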

The knowledge gradient is new to me, and less applicable to my current project, but I would like to sketch the general picture below: every policy in stochastic optimization falls into one of four classes:

  1. Policy function approximation
  2. Parametric cost function approximation
  3. Value function approximation
  4. Direct lookaheads

Classes (1) and (2) require tuning their parameters, but they are simpler and easier to implement: the policy is a function X^{\pi}(S, \theta), and the work reduces to estimating \theta. Classes (3) and (4) avoid this tuning but require more complex modeling. A hedged sketch of such a tunable parametric policy follows.
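
As an illustration of a tunable parametric policy, here is a small sketch of an interval-estimation-style rule X^{\pi}(S, \theta) = \arg\max_{x} (\mu_x + \theta \sigma_x), with \theta chosen by a crude grid search over simulated runs. The function names, the test problem (truth), and all the numbers are hypothetical assumptions of mine, not course code.

```python
import numpy as np

def X_pi(S, theta):
    """Parametric policy: S = (posterior means, posterior stds)."""
    mu, sigma = S
    return int(np.argmax(mu + theta * sigma))

def run_policy(theta, truth, sigma_W, N=50, rng=None):
    """Run N experiments under X_pi; return the true value of the final choice."""
    rng = np.random.default_rng(rng)
    mu = np.zeros(len(truth))
    beta = np.full(len(truth), 1e-6)   # near-zero prior precision (vague prior)
    beta_W = 1.0 / sigma_W**2
    for _ in range(N):
        x = X_pi((mu, 1.0 / np.sqrt(beta)), theta)
        W = rng.normal(truth[x], sigma_W)          # noisy observation of x
        mu[x] = (beta[x] * mu[x] + beta_W * W) / (beta[x] + beta_W)
        beta[x] += beta_W
    return truth[int(np.argmax(mu))]

truth = np.array([1.0, 1.3, 0.7, 1.1])
for theta in (0.0, 1.0, 2.0, 4.0):                 # "tuning" = searching over theta
    avg = np.mean([run_policy(theta, truth, 0.5, rng=i) for i in range(200)])
    print(f"theta={theta}: average value of final design = {avg:.3f}")
```

Too small a \theta leaves the policy greedy (it never explores past its first guesses); too large a \theta keeps exploring and wastes the experiment budget, which is exactly why \theta must be tuned.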
