Stochastic Optimization: the knowledge gradient

Today in the Princeton 544 course we learned about the knowledge gradient, a one-period lookahead policy that maximizes the value of information from a single experiment: V_{x}^{KG,n} = E\{\max_{y} F(y, B^{n+1}(x))\}-\max_{y}F(y,B^{n}), where

  • B^{n} is the current belief state, and B^{n+1}(x) is the updated belief state (the updated parameter estimates) after running experiment x;
  • \max_{y}F(y,B^{n}) is the value of choosing the best design given what we know now;
  • x is the proposed experiment;
  • the expectation averages over the possible outcomes of the experiment (and over our different beliefs about the parameters).
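
To make the formula concrete, here is a minimal sketch in Python. It rests on assumptions that are mine and not from the lecture: the designs y and experiments x are the same finite set of alternatives, the belief about each alternative is an independent normal with mean mu[x] and standard deviation sigma[x], the measurement noise is normal with known standard deviation noise_sd, and F(y, B) is simply the believed mean of y. Under those assumptions the expectation has a well-known closed form, and the sketch also estimates it by sampling so the two can be compared. All function names are hypothetical.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_closed_form(mu, sigma, noise_sd, x):
    """KG value of experiment x, assuming independent normal beliefs."""
    # Std. dev. of the change in the believed mean of x after one observation.
    sigma_tilde = sigma[x] ** 2 / math.sqrt(sigma[x] ** 2 + noise_sd ** 2)
    # Best believed mean among the other alternatives.
    best_other = max(m for y, m in enumerate(mu) if y != x)
    zeta = -abs(mu[x] - best_other) / sigma_tilde
    return sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta))


def kg_monte_carlo(mu, sigma, noise_sd, x, n_samples=100_000, seed=0):
    """Estimate E{max_y F(y, B^{n+1}(x))} - max_y F(y, B^n) by sampling."""
    rng = random.Random(seed)
    sigma_tilde = sigma[x] ** 2 / math.sqrt(sigma[x] ** 2 + noise_sd ** 2)
    best_other = max(m for y, m in enumerate(mu) if y != x)
    best_now = max(mu)  # max_y F(y, B^n)
    total = 0.0
    for _ in range(n_samples):
        # Only the belief about x changes; its new mean is mu[x] + sigma_tilde * Z.
        mu_x_next = mu[x] + sigma_tilde * rng.gauss(0.0, 1.0)
        total += max(best_other, mu_x_next)  # max_y F(y, B^{n+1}(x))
    return total / n_samples - best_now


if __name__ == "__main__":
    mu = [1.0, 1.5, 1.2]      # hypothetical believed means
    sigma = [1.0, 0.5, 2.0]   # hypothetical belief standard deviations
    for x in range(len(mu)):
        print(x, kg_closed_form(mu, sigma, 1.0, x), kg_monte_carlo(mu, sigma, 1.0, x))
```

The knowledge gradient policy would then run the experiment x with the largest value. The sketch needs at least two alternatives and sigma[x] > 0 for the experiment being evaluated.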

This is new to me and less applicable to my current project, but I would like to summarize the general framework below. All of stochastic optimization can be organized into four classes of policies:

  1. Policy function approximation
  2. Parametric cost function approximation
  3. Value function approximation
  4. Direct lookaheads

(1) and (2) require tuning parameters, but they are simpler and easier to implement: designing them amounts to parameter estimation, that is, determining \theta in X^{\pi}(S, \theta). (3) and (4) require more complex modeling.
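
As an illustration of what "determining \theta" can look like, here is a rough sketch of tuning a parametric cost function approximation by simulation. The specific policy is my own choice for the example, not something from the lecture: an interval-estimation rule that picks the alternative maximizing mu_x + \theta \sigma_x. The belief model (independent normals with known noise), the grid of \theta values, and every function name are also assumptions.

```python
import math
import random


def interval_estimation_policy(mu, sigma, theta):
    """X^pi(S, theta): pick the alternative maximizing mu_x + theta * sigma_x."""
    return max(range(len(mu)), key=lambda x: mu[x] + theta * sigma[x])


def simulate(theta, prior_mu, prior_sigma, noise_sd, budget, rng):
    """Run one simulated learning campaign and return the true value of the final choice."""
    # Sample a hypothetical truth from the prior beliefs.
    truth = [rng.gauss(m, s) for m, s in zip(prior_mu, prior_sigma)]
    mu, sigma = list(prior_mu), list(prior_sigma)
    for _ in range(budget):
        x = interval_estimation_policy(mu, sigma, theta)
        w = rng.gauss(truth[x], noise_sd)  # noisy observation of alternative x
        # Conjugate normal update of the belief about x.
        precision = 1.0 / sigma[x] ** 2 + 1.0 / noise_sd ** 2
        mu[x] = (mu[x] / sigma[x] ** 2 + w / noise_sd ** 2) / precision
        sigma[x] = math.sqrt(1.0 / precision)
    best = max(range(len(mu)), key=lambda y: mu[y])
    return truth[best]


def tune_theta(thetas, prior_mu, prior_sigma, noise_sd=1.0, budget=20,
               n_reps=200, seed=0):
    """Grid search: average simulated performance for each candidate theta."""
    results = {}
    for theta in thetas:
        rng = random.Random(seed)  # same seed per theta to reduce comparison noise
        total = sum(simulate(theta, prior_mu, prior_sigma, noise_sd, budget, rng)
                    for _ in range(n_reps))
        results[theta] = total / n_reps
    best_theta = max(results, key=results.get)
    return best_theta, results


if __name__ == "__main__":
    thetas = [0.0, 0.5, 1.0, 2.0, 4.0]
    best_theta, results = tune_theta(thetas, [0.0] * 5, [1.0] * 5)
    print(best_theta, results)
```

The policy itself is one line, which is the sense in which classes (1) and (2) are simple to implement. The effort goes into the simulation loop used to choose \theta, and the tuned value depends on the assumed prior, noise level, and experiment budget.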
