Statistical Learning Theory
- Learning with a finite dictionary
Recall from the end of last lecture our setup: we are working with a finite dictionary $H = \{h_1, \ldots, h_M\}$ of estimators, and we would like to understand the scaling of this problem with respect to $M$ and the sample size $n$. Given $H$, one idea is simply to minimize the empirical risk based on the samples, and so we define the empirical risk minimizer, $\hat{h}^{\mathrm{erm}}$, by
\[
\hat{h}^{\mathrm{erm}} \in \operatorname*{argmin}_{h \in H} \hat{R}_n(h).
\]
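As a concrete illustration (a minimal sketch, not part of the lecture; the dictionary, loss, and data below are hypothetical toy choices), minimizing the empirical risk over a finite dictionary amounts to an exhaustive search for the smallest average loss:

import numpy as np

def empirical_risk(h, X, y, loss):
    # Average loss of h over the n observed samples, i.e. R_n(h).
    return np.mean([loss(h(x), yi) for x, yi in zip(X, y)])

def erm(H, X, y, loss):
    # Return a dictionary element attaining the minimal empirical risk.
    risks = [empirical_risk(h, X, y, loss) for h in H]
    return H[int(np.argmin(risks))]

# Toy usage: two constant classifiers plus one thresholding rule, 0-1 loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)
H = [lambda x: 0, lambda x: 1, lambda x: int(x[0] > 0)]
zero_one = lambda pred, truth: float(pred != truth)
h_hat = erm(H, X, y, zero_one)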
In what follows, we will simply write $\hat{h}$ instead of $\hat{h}^{\mathrm{erm}}$ when possible. Also recall the definition of the oracle, $\bar{h}$, which (somehow) minimizes the true risk and is defined by
\[
\bar{h} \in \operatorname*{argmin}_{h \in H} R(h).
\]
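Since the true risk $R$ is unknown, the oracle cannot be computed from data; in a simulation where we control the data-generating process, however, it can be approximated by Monte Carlo on a large fresh sample. A small self-contained sketch (the sampler and dictionary are hypothetical stand-ins, matching the toy example above):

import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    # Hypothetical data-generating process used only for this simulation.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, y

def true_risk(h, n_mc=100_000):
    # Monte Carlo approximation of R(h) under 0-1 loss.
    X, y = sample(n_mc)
    return np.mean([float(h(x) != yi) for x, yi in zip(X, y)])

H = [lambda x: 0, lambda x: 1, lambda x: int(x[0] > 0)]
h_bar = min(H, key=true_risk)  # approximate oracle over the dictionary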
The following theorem shows that, although $\hat{h}$ cannot hope to do better than $\bar{h}$ in general, the difference should not be too large as long as the sample size $n$ is not too small compared to $M$.
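To see where such a result can come from (a standard argument sketched here only as orientation; the lecture's precise statement and constants may differ): for a loss bounded in $[0,1]$, Hoeffding's inequality combined with a union bound over the $M$ elements of $H$ gives, with probability at least $1-\delta$,
\[
\sup_{h \in H} \bigl| \hat{R}_n(h) - R(h) \bigr| \le \sqrt{\frac{\log(2M/\delta)}{2n}},
\]
and combining this with $\hat{R}_n(\hat{h}) \le \hat{R}_n(\bar{h})$ yields
\[
R(\hat{h}) \le R(\bar{h}) + \sqrt{\frac{2\log(2M/\delta)}{n}},
\]
so the excess risk grows only logarithmically in $M$ and shrinks like $1/\sqrt{n}$.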