Ideas regarding PhD topic
By kirk86.

Ideas:

Kernel methods run out of memory on large datasets, while NNs are compact function classes; between them there is a trade-off between storage and training-time computation.
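A back-of-the-envelope sketch of that trade-off (all sizes below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Illustrative comparison (all numbers are assumptions, not from the text):
# a kernel method stores an n x n Gram matrix; a small MLP stores fixed weights.
n = 100_000                      # training set size
gram_bytes = n * n * 8           # float64 Gram matrix
print(f"Gram matrix: {gram_bytes / 1e9:.0f} GB")      # 80 GB -> "runs out of memory"

d, h = 1_000, 512                # input dim, hidden width (hypothetical)
mlp_params = d * h + h + h * 1 + 1
print(f"MLP params:  {mlp_params * 8 / 1e6:.1f} MB")  # a few MB, independent of n
```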

Exploit the strengths of both methods and combine them for nonparametric statistical tests, generative models, message passing, bandit algorithms, and other things that need good statistical analysis and flexible models.
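For the nonparametric-test direction, here is a minimal sketch of a kernel two-sample test via the (biased) MMD statistic, using an RBF kernel; the inputs could just as well be features produced by a network. Kernel choice, bandwidth, and the toy data are assumptions, and in practice one would calibrate the statistic with a permutation test:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Biased estimate of the squared Maximum Mean Discrepancy between samples X, Y.
    Kxx, Kyy, Kxy = rbf_kernel(X, X, sigma), rbf_kernel(Y, Y, sigma), rbf_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 5))   # sample from P
Y = rng.normal(0.5, 1.0, size=(200, 5))   # sample from Q (shifted mean)
print(mmd2(X, Y))  # noticeably larger than for two samples from the same P
```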

A problem that still remains to be solved: how do we incorporate model decompositions efficiently into deep learning?

deep learning + spectral methods ==> How to combine them?

This can be done using some of the objective functions from graphical models, e.g. Conditional Random Fields, structured losses, or anything similar.
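As a concrete instance, a minimal sketch of a linear-chain CRF negative log-likelihood, where the per-position emission scores would come from a neural network; the shapes and random inputs here are placeholders:

```python
import numpy as np
from scipy.special import logsumexp

def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of a tag sequence under a linear-chain CRF.

    emissions:   (T, K) per-position scores, e.g. the output of a neural net
    transitions: (K, K) score of moving from tag i to tag j
    tags:        (T,) gold tag sequence
    """
    T, K = emissions.shape
    # Score of the gold path.
    gold = emissions[0, tags[0]]
    for t in range(1, T):
        gold += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # Log-partition function via the forward algorithm.
    alpha = emissions[0]
    for t in range(1, T):
        alpha = emissions[t] + logsumexp(alpha[:, None] + transitions, axis=0)
    return logsumexp(alpha) - gold

rng = np.random.default_rng(0)
nll = crf_nll(rng.normal(size=(6, 3)), rng.normal(size=(3, 3)),
              np.array([0, 1, 1, 2, 0, 2]))
print(nll)
```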

Differences between graphical models and deep learning:

  • graphical models are good if you've got a lot of variables and want to know how they depend on each other. This view explains a lot about clustering, topic models, Bayesian nonparametrics, causality, and message passing (a toy sum-product sketch follows this list)
  • deep learning is about understanding how to use these models efficiently and what their limitations are. Statistical learning theory is necessary here to prove theorems about whether your algorithm works or not. You want a guarantee that what you're doing won't go wrong, but you don't really want to use the theorems for parameter tuning.
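A toy sum-product (belief propagation) sketch on a three-node chain of binary variables; the potentials are made-up numbers, just to show the message-passing mechanics:

```python
import numpy as np

# Chain MRF x1 - x2 - x3 over binary variables; values are arbitrary assumptions.
phi = [np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]  # unaries
psi = np.array([[2.0, 1.0], [1.0, 2.0]])  # same pairwise potential on both edges

# Forward messages (left to right) and backward messages (right to left).
m12 = psi.T @ phi[0]             # message from x1 to x2
m23 = psi.T @ (phi[1] * m12)     # message from x2 to x3
m32 = psi @ phi[2]               # message from x3 to x2
m21 = psi @ (phi[1] * m32)       # message from x2 to x1

def normalize(p):
    return p / p.sum()

print(normalize(phi[0] * m21))        # marginal of x1
print(normalize(phi[1] * m12 * m32))  # marginal of x2
print(normalize(phi[2] * m23))        # marginal of x3
```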

LSTMs are latent-variable autoregressive models with some fine-tuning (gating) to deal with vanishing gradients.
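A minimal numpy sketch of a single LSTM step, showing the gates and the additive cell-state path that mitigates vanishing gradients; the sizes and initialization are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W maps [x; h] to the four gate pre-activations;
    the cell state c is updated additively, which is what helps gradients flow."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)                       # input, forget, output, candidate
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g                                 # additive update of the cell state
    h = o * np.tanh(c)
    return h, c

d, k = 8, 16                                          # input and hidden sizes (hypothetical)
rng = np.random.default_rng(0)
W, b = rng.normal(scale=0.1, size=(4 * k, d + k)), np.zeros(4 * k)
h = c = np.zeros(k)
for x in rng.normal(size=(5, d)):                     # a length-5 input sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```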

Adversarial environments are hard to handle.

The Master Algorithm by P. Domingos:

At the time of writing, five different tribes (schools/paradigms) have been identified within the machine learning community, each distinguished by its preferred methodology or algorithm.

How do computers discover new knowledge?

  1. Fill in gaps in existing knowledge
  2. Emulate the brain
  3. Simulate evolution
  4. Systematically reduce uncertainty
  5. Notice similarities between old and new
Tribe            Origins                Master Algorithm          People
Symbolists       Logic, philosophy      Inverse deduction         Tom Mitchell, Steve Muggleton, Ross Quinlan
Connectionists   Neuroscience           Backpropagation           LeCun, Hinton, Bengio
Evolutionaries   Evolutionary biology   Genetic programming       John Koza, John Holland, Hod Lipson
Bayesians        Statistics             Probabilistic inference   David Heckerman, Judea Pearl, Michael Jordan
Analogizers      Psychology             Kernel machines           Peter Hart, V. Vapnik, Douglas Hofstadter

Putting pieces together:

  • Representation
    • Probabilistic logic (e.g. Markov logic networks)
    • Weighted formulas –> Distribution over states
  • Evaluation
    • Posterior probability
    • User defined objective function
  • Optimization
    • Formula discovery: Genetic programming
    • Weight learning: Backpropagation
  • Towards a universal learner
    • New ideas and tribes are needed ==> ?

Grand unifying theory => unify all 5 learning tribes

Unifying representations => start from the symbolists and Bayesians (logic and graphical models => already done => Markov Logic Networks)

  1. Start with first-order logic (FOL) rules of the form if…then…
  2. Give each rule a weight depending on how strongly you believe it
  3. Evaluation function: find the candidate in the hypothesis space that maximizes or minimizes the evaluation function; in this case that's just the posterior that Bayesians use. It shouldn't be part of the algorithm: the objective function to optimize should be provided by the user.
  4. Optimization: once you have your formulas, you have to come up with weights for them, i.e. weight learning, e.g. via backprop, with formula discovery via genetic programming (a toy Markov-logic sketch follows this list).
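A toy sketch of steps 1-2 and the resulting distribution over states, P(world) proportional to exp(sum_i w_i n_i(world)), on a three-atom world; the predicates, formulas, and weights are all hypothetical:

```python
import itertools
import numpy as np

# Toy Markov logic network over three ground atoms (hypothetical example):
# Smokes(A), Smokes(B), Friends(A, B). Each formula returns n_i(world),
# the number of its satisfied groundings in that world.
def f1(sa, sb, fr):
    # Friends(A, B) => (Smokes(A) <=> Smokes(B))
    return 1 if (not fr) or (sa == sb) else 0

def f2(sa, sb, fr):
    # Smokes(x): counts smokers; given a negative weight below, it
    # encodes "people tend not to smoke".
    return sa + sb

formulas = [(1.5, f1), (-0.5, f2)]  # assumed weights

worlds = list(itertools.product([0, 1], repeat=3))
scores = np.array([sum(w * f(*world) for w, f in formulas) for world in worlds])
probs = np.exp(scores) / np.exp(scores).sum()  # softmax over all 8 possible worlds

for world, p in zip(worlds, probs):
    print(world, round(float(p), 3))
```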

Different projects:

Project 1: Methods for Semi-supervised Learning and Active Labeling
How can we exploit unlabeled data for a supervised learning problem, and how can we identify the most informative subset of examples to be annotated by an expert?
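A minimal sketch of the active-labeling half: uncertainty sampling with a logistic-regression learner. The synthetic data, pool sizes, and query batch size are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Uncertainty sampling: fit on the labeled pool, then ask the expert to label
# the unlabeled points the model is least sure about (highest predictive entropy).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)

labeled = list(range(20))            # a small initial labeled set
unlabeled = list(range(20, 500))     # pool awaiting annotation

clf = LogisticRegression().fit(X[labeled], y[labeled])
p = clf.predict_proba(X[unlabeled])
entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
query = [unlabeled[i] for i in np.argsort(entropy)[-10:]]  # 10 most informative
print("query these indices for labels:", query)
```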

Project 2: Methods for Robust Feature Learning
How can we learn robust features that remain maximally predictive even if the distribution of the test data is very different from the distribution of the training data?
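One classical baseline here (not the only notion of robust features) is importance weighting under covariate shift: estimate the density ratio p_test(x)/p_train(x) with a domain classifier and reweight the training loss. A minimal sketch; the synthetic shift is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(1000, 5))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(1.0, 1.0, size=(1000, 5))   # shifted inputs, labels unseen

# Train a "domain" classifier to separate train (0) from test (1) inputs;
# its odds estimate the density ratio w(x) = p_test(x) / p_train(x).
domain = LogisticRegression().fit(
    np.vstack([X_train, X_test]),
    np.r_[np.zeros(len(X_train)), np.ones(len(X_test))],
)
p_test = domain.predict_proba(X_train)[:, 1]
w = p_test / (1.0 - p_test + 1e-12)

# Reweighted training focuses the model on the region the test data occupies.
model = LogisticRegression().fit(X_train, y_train, sample_weight=w)
print(model.score(X_test, (X_test[:, 0] > 0).astype(int)))
```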

Project 3: Calibrated Uncertainty Estimation
How can we provide reliable confidence intervals for deep neural network predictions?
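One simple post-hoc piece of this is temperature scaling (Guo et al., 2017), which calibrates softmax confidences by fitting a single scalar on validation logits; proper intervals would need something more, e.g. ensembles or conformal prediction. A minimal sketch on synthetic, deliberately overconfident logits:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import softmax

def nll(T, logits, labels):
    # Negative log-likelihood of the labels under a temperature-scaled softmax.
    p = softmax(logits / T, axis=1)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

# Hypothetical validation logits from an overconfident network:
# very peaked at the true class, but wrong on a sizeable fraction of points.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=500)
logits = 4.0 * np.eye(3)[labels] + rng.normal(size=(500, 3))
mask = rng.random(500) < 0.3
logits[mask] = rng.normal(size=(mask.sum(), 3))

res = minimize_scalar(nll, bounds=(0.05, 20.0), args=(logits, labels), method="bounded")
print("learned temperature:", round(res.x, 2))  # typically > 1: softens overconfidence
```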