No Free Lunch Theorem
Any 2 algorithms are equivalent when their performance is averaged across all possible problems.
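A toy check of this statement, as a minimal sketch under stated assumptions: the "problems" are all 256 Boolean target functions on 3-bit inputs, 5 inputs are used for training, and performance is accuracy on the 3 remaining inputs. The two learners (`always_zero`, `majority_vote`) and the helper name are hypothetical and chosen only for illustration.

```python
from itertools import product

N_POINTS = 8   # all 3-bit inputs
N_TRAIN = 5    # inputs whose labels the learner gets to see

def always_zero(train_labels):
    # Ignores the training data entirely.
    return 0

def majority_vote(train_labels):
    # Predicts the most common training label for every unseen input.
    return int(2 * sum(train_labels) > len(train_labels))

def mean_off_training_accuracy(learner):
    """Accuracy on unseen inputs, averaged over every possible target function."""
    correct = 0
    total = 0
    for labels in product([0, 1], repeat=N_POINTS):  # one labeling = one problem
        train_labels = labels[:N_TRAIN]
        test_labels = labels[N_TRAIN:]
        prediction = learner(train_labels)
        correct += sum(prediction == y for y in test_labels)
        total += len(test_labels)
    return correct / total

acc_zero = mean_off_training_accuracy(always_zero)
acc_majority = mean_off_training_accuracy(majority_vote)
print(acc_zero, acc_majority)
```

Both averages come out identical because, for any fixed set of training labels, the unseen labels range uniformly over all combinations. A learner only gains an advantage when the set of problems is restricted.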
Bias-Variance tradeoff
Bias = \(\mathbb{E}[\hat{\beta} - \beta]\)
\(\beta\) = true parameter being estimated
Systematic error of the model. Difference between the expected prediction and the true value.
Variance = \(\mathbb{E}[(\hat{\beta} - \mathbb{E}[\hat{\beta}])^{2}]\)
\(\hat{\beta}\) = estimator
\(\mathbb{E}[\hat{\beta}]\) = expected value of the estimator
Variability/sensitivity of model's predictions if we repeat the learning process many times with small perturbations in the training data.
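The two definitions can be estimated by simulation. The following is a rough sketch, not a definitive implementation: it assumes the parameter \(\beta\) is the mean of a normal distribution and the estimator \(\hat{\beta}\) is a shrunken sample mean, `shrink * mean(sample)`. The function name, the shrinkage factor, and all default values are assumptions made for illustration.

```python
import random
import statistics

def simulate_estimator(shrink, beta=2.0, sigma=1.0, n=20, trials=5000, seed=0):
    """Monte Carlo estimate of (bias, variance) for beta_hat = shrink * sample mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Repeat the "learning process" on a fresh training sample each time.
        sample = [rng.gauss(beta, sigma) for _ in range(n)]
        estimates.append(shrink * statistics.fmean(sample))
    mean_estimate = statistics.fmean(estimates)            # approximates E[beta_hat]
    bias = mean_estimate - beta                            # E[beta_hat] - beta
    variance = statistics.fmean(
        [(e - mean_estimate) ** 2 for e in estimates]      # E[(beta_hat - E[beta_hat])^2]
    )
    return bias, variance

bias_plain, var_plain = simulate_estimator(shrink=1.0)
bias_shrunk, var_shrunk = simulate_estimator(shrink=0.8)
print(f"plain mean:  bias={bias_plain:.3f}, variance={var_plain:.4f}")
print(f"shrunk mean: bias={bias_shrunk:.3f}, variance={var_shrunk:.4f}")
```

Under these assumptions the shrunken estimator has lower variance but a nonzero bias of about \((0.8 - 1)\beta\), which is the tradeoff in miniature.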
Number of hidden neurons - rule of thumb
\(N_{h} = \frac{N_{s}}{a(N_{i} + N_{o})}\)
\(N_{h}\) = number of hidden neurons
\(N_{i}\) = number of input neurons
\(N_{o}\) = number of output neurons
\(N_{s}\) = number of samples in training data
\(a\) = arbitrary scaling factor, usually 2-10
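The rule of thumb is simple arithmetic, sketched below. The function name and the example numbers are hypothetical; only the formula itself comes from the notes above.

```python
def hidden_neurons_rule_of_thumb(n_samples, n_inputs, n_outputs, a=2):
    """N_h = N_s / (a * (N_i + N_o)), with a as the scaling factor (usually 2-10)."""
    if a <= 0:
        raise ValueError("scaling factor a must be positive")
    if n_inputs + n_outputs <= 0:
        raise ValueError("need at least one input or output neuron")
    return n_samples / (a * (n_inputs + n_outputs))

# Example: 1000 training samples, 8 input neurons, 2 output neurons, a = 5.
print(hidden_neurons_rule_of_thumb(1000, 8, 2, a=5))  # 20.0

# Larger a gives a smaller, more conservative network.
for a in (2, 5, 10):
    print(a, round(hidden_neurons_rule_of_thumb(10_000, 20, 1, a)))
```

Treat the result as a starting point for tuning rather than a fixed answer, since \(a\) is chosen arbitrarily.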
| node | label | shape | fillcolor | 
|---|---|---|---|
| Sstart | observed: data y | ellipse | green | 
| Send | unobserved: model parameters θ | ellipse | red | 
| from | to | label | 
|---|---|---|
| Sstart | Send | statistical inference | 
| Send | Sstart | probability |
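
The two arrows in the diagram can be illustrated with a coin-flip model, as a minimal sketch. It assumes θ is the probability of heads and the data y are independent flips; the function names, the seed, and the use of the sample proportion as the estimate are assumptions for illustration, not a method stated in the notes.

```python
import random

def simulate_data(theta, n, seed=0):
    """Probability direction: known parameter theta -> observed data y."""
    rng = random.Random(seed)
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def estimate_theta(y):
    """Statistical inference direction: observed data y -> estimate of theta.

    Uses the sample proportion, which is the maximum likelihood estimate
    for independent Bernoulli observations.
    """
    return sum(y) / len(y)

true_theta = 0.3
y = simulate_data(true_theta, n=10_000)
theta_hat = estimate_theta(y)
print(f"true theta = {true_theta}, estimated theta = {theta_hat:.3f}")
```

In practice only y is available, so the estimate cannot be compared against the true θ as it is here.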