Meta-Learning Learning Note - 4
- optimization-based meta-learning
- non-parametric few-shot learning
- properties of meta-learning algorithms
Recap: Optimization-Based Meta-Learning
- Fine-tuning => MAML: the pre-trained parameters are fine-tuned at test time, so the pre-trained initialization and the meta-learned parameters θ are the same variable.
Probabilistic Interpretation of Optimization-Based Inference
- The meta-parameters θ serve as a prior over the task parameters φ.
- If we use θ to initialize φ and fine-tune, meta-training can be read as empirical Bayes: max_θ log Π_i p(D_i | θ), where p(D_i | θ) = ∫ p(D_i | φ_i) p(φ_i | θ) dφ_i, and fine-tuning approximates MAP inference on φ_i.

- Optimization-Based Inference
Acquire φ through optimization: meta-learn an initialization for fine-tuning.
gradient descent + early stopping == MAML (truncated optimization keeps φ close to the initialization θ, acting like a prior)
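The inner fine-tuning step and the meta-gradient through it can be sketched on a toy problem. This is a minimal illustration under assumed conditions, not code from the lecture: linear-regression tasks with squared error and a single inner gradient step, so the derivative through the inner update is available in closed form.

```python
# Minimal one-step MAML sketch on toy linear-regression tasks (hypothetical
# setup: squared-error loss, one inner gradient step, closed-form Hessian).
import numpy as np

def loss_grad(w, X, y):
    """Gradient of the mean squared error L(w) = mean((Xw - y)^2)."""
    return 2.0 / len(y) * X.T @ (X @ w - y)

def loss_hessian(X):
    """Hessian of the MSE above (constant for a linear model)."""
    return 2.0 / len(X) * X.T @ X

def maml_meta_gradient(theta, task, alpha=0.1):
    """One-step MAML: phi = theta - alpha * grad_train(theta), then
    meta-grad = (I - alpha * H_train) @ grad_test(phi), i.e. the test
    gradient backpropagated through the inner update."""
    Xtr, ytr, Xte, yte = task
    phi = theta - alpha * loss_grad(theta, Xtr, ytr)
    jac = np.eye(len(theta)) - alpha * loss_hessian(Xtr)  # d phi / d theta
    return jac @ loss_grad(phi, Xte, yte)

# Outer loop: sample a task, take a meta-gradient step on theta.
rng = np.random.default_rng(0)
theta = np.zeros(3)
for step in range(500):
    w_true = rng.normal(size=3)          # each task is a random linear map
    X = rng.normal(size=(20, 3))
    y = X @ w_true
    task = (X[:10], y[:10], X[10:], y[10:])  # support / query split
    theta -= 0.01 * maml_meta_gradient(theta, task)
```

With more inner steps, `jac` becomes a product of per-step Jacobians, which is exactly the compute/memory cost discussed under Challenges below.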
Challenges:
- Choosing an architecture that works well with the inner gradient step matters; neural architecture search can help (e.g., Auto-Meta).
- Bi-level optimization can exhibit instabilities -> learn inner per-parameter (vector) learning rates (Meta-SGD) or tune them automatically (Alpha MAML).
- Backpropagating through many inner gradient-descent steps is compute- & memory-intensive.
Can we compute the meta-gradient without differentiating through the optimization path?
Idea: derive the meta-gradient using the implicit function theorem (implicit MAML). This also allows second-order optimizers in the inner loop.
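The implicit-gradient idea can be sketched on the same kind of toy quadratic problem. This is an assumed setup in the style of implicit MAML, not the paper's implementation: the inner objective is the training loss plus a proximal term (lam/2)·||phi - theta||^2, and the inner solve here is exact, so the implicit function theorem applies at the stationary point regardless of which inner optimizer found it.

```python
# Sketch of an implicit (iMAML-style) meta-gradient on a toy quadratic
# (hypothetical setup: MSE inner loss + proximal regularizer toward theta).
import numpy as np

def inner_solve(theta, X, y, lam):
    """Exactly minimize mean((X phi - y)^2) + (lam/2)||phi - theta||^2.
    Any inner optimizer could be used; the IFT does not care how phi* is found."""
    n, d = X.shape
    A = 2.0 / n * X.T @ X + lam * np.eye(d)
    b = 2.0 / n * X.T @ y + lam * theta
    return np.linalg.solve(A, b)

def implicit_meta_gradient(theta, task, lam=1.0):
    """Implicit function theorem at the inner stationary point:
      grad L_tr(phi*) + lam (phi* - theta) = 0
      =>  d phi*/d theta = lam (H_tr + lam I)^{-1}
    so meta-grad = lam (H_tr + lam I)^{-1} grad L_te(phi*),
    with no backprop through the optimization path."""
    Xtr, ytr, Xte, yte = task
    phi = inner_solve(theta, Xtr, ytr, lam)
    n, d = Xtr.shape
    H = 2.0 / n * Xtr.T @ Xtr
    g_test = 2.0 / len(yte) * Xte.T @ (Xte @ phi - yte)
    return lam * np.linalg.solve(H + lam * np.eye(d), g_test)
```

In high dimensions the linear solve is done approximately (e.g. conjugate gradient), which only needs Hessian-vector products.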
Non-Parametric Few-Shot Learning
In low-data regimes, non-parametric methods are simple and work well.

Learn how to compare: use the meta-training tasks to learn a comparison/similarity function, then apply it to new tasks.
- Siamese network: predict whether or not two images belong to the same class.

But this creates a mismatch: meta-training is binary classification (same class or not), while meta-testing requires N-way classification (compare the query against support examples from N classes).
With this approach we do not need a meta-learner that outputs task-specific parameters: there is no inner optimization at meta-test time, only comparisons against the support set.
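A minimal sketch of this comparison-based prediction rule. Everything here is illustrative: the "embedding" is a fixed random projection standing in for a meta-trained network, and cosine-similarity voting over the support set stands in for the learned Siamese comparison.

```python
# Toy comparison-based N-way prediction (hypothetical embedding: a fixed
# random projection; in practice the embedding f is meta-trained).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))      # stand-in for learned embedding weights

def embed(x):
    """f(x): R^8 -> R^16 (illustrative fixed network)."""
    return np.tanh(x @ W.T)

def predict(query, support_x, support_y, n_way):
    """Classify a query by cosine similarity to each support example,
    averaged per class - a Siamese/matching-style prediction rule with
    no test-time optimization and no task-specific parameters."""
    q = embed(query)
    s = embed(support_x)
    sims = (s @ q) / (np.linalg.norm(s, axis=1) * np.linalg.norm(q) + 1e-8)
    scores = [sims[support_y == c].mean() for c in range(n_way)]
    return int(np.argmax(scores))
```

Note the pairwise comparator trained for binary "same/different" is reused unchanged for N-way prediction, which is exactly the train/test mismatch mentioned above.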

- idea: learn a non-linear relation module on the embeddings (relation networks).
- idea: learn an infinite mixture of prototypes.
- idea: propagate relationships between examples as nodes in a graph (graph neural networks).
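The prototype idea in its basic single-prototype-per-class (ProtoNet-style) form can be sketched as follows; an infinite mixture would instead let the number of prototypes per class grow with the data. Embeddings are taken as given here, and all names are illustrative.

```python
# Prototype-based few-shot classification sketch: class prototypes are mean
# support embeddings; queries go to the nearest prototype.
import numpy as np

def prototypes(emb_support, labels, n_way):
    """c_k = mean of the support embeddings belonging to class k."""
    return np.stack([emb_support[labels == k].mean(axis=0)
                     for k in range(n_way)])

def classify(emb_query, protos):
    """Predict the class whose prototype is nearest in squared Euclidean
    distance (equivalent to a softmax over logits -d^2)."""
    d2 = ((emb_query[None, :] - protos) ** 2).sum(axis=1)
    return int(np.argmin(d2))
```

Averaging support embeddings into a prototype is what makes this non-parametric: adding a class or a shot changes the prototypes, not any learned parameters.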
Properties of Meta-Learning Algorithms
- Comparison of approaches
- Computation graph perspective: we can mix & match components, e.g. learned embedding + gradient-based meta-training, or MAML + prototypes.
- Algorithmic properties perspective:
- Expressive power: the ability of f to represent a range of learning procedures.
- Consistency: the learned learning procedure will solve the task given enough data; reduces reliance on meta-training tasks. (Black-box methods lack this inductive bias, unlike optimizing from a learned initialization.)
- Uncertainty awareness: the ability to reason about ambiguity during learning; enables principled Bayesian approaches.