| Date | Reading | Notes |
| --- | --- | --- |
| 06/07/2022 | Principles of Deep Learning Theory, Chapters 1 & 2 | Chapter 1, Chapter 2 |
| 20/07/2022 | PoDLT, Chapter 3 | Chapter 3, Chapter 4 |
| 27/07/2022 | We decided to change our approach and read papers instead | |
| 03/08/2022 | Neural Tangent Kernel | https://rajatvd.github.io/NTK/, Notes |
| 17/08/2022 | Multilayer Feedforward Networks are Universal Approximators | Notes |
| 31/08/2022 | Explaining Neural Scaling Laws | |
| 22/09/2022 | Git Re-Basin | |
| 05/10/2022 | Monte Carlo Gradient Estimation in Machine Learning, Section 4 | Notes |
| 12/10/2022 | Monte Carlo Gradient Estimation in Machine Learning, Section 5 | Notes |
| 19/10/2022 | Monte Carlo Gradient Estimation in Machine Learning, Section 7… | Notes |
| 26/10/2022 | Toy Models of Superposition, Sections 1, 2 & 3 | Notes |
| 16/11/2022 | Toy Models of Superposition, Sections 4, 5 & 6 | |
| 23/11/2022 | Toy Models of Superposition, Sections 7, 8, 9 & 10 | |
| 30/11/2022 | Gradient Estimation with Discrete Stein Operators | |
| 07/12/2022 | Exact learning dynamics of deep linear networks with prior knowledge | |
| 04/01/2023 | Neural networks and physical systems with emergent collective computational abilities (Hopfield Networks) | Notes |
| 11/01/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 1 | |
| 18/01/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 2 | |
| 11/01/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 3 | |
| 25/01/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 4 | |
| 01/02/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 5 | |
| 15/03/2023 | Understanding the Diffusion Objective as a Weighted Integral of ELBOs | |
| 22/03/2023 | Laplace Redux – Effortless Bayesian Deep Learning | |
| 29/03/2023 | A Theory on Adam Instability in Large-Scale Machine Learning | |
| 12/04/2023 | Sigma-Reparam: Stable Transformer Training with Spectral Reparametrization | |
| 24/05/2023 | Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning | |
| 31/05/2023 | Loss Landscapes are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent | |
| 07/06/2023 | Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training | |
| 14/06/2023 | QLoRA: Efficient Finetuning of Quantized LLMs | |
| 09/08/2023 | Limitations of the Empirical Fisher Approximation for Natural Gradient Descent | |
| 23/08/2023 | The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning | |
| 20/09/2023 | Transformers as Support Vector Machines | |
| 27/09/2023 | Flat Minima | |
| 04/10/2023 | Language Modeling Is Compression | |
| 11/10/2023 | Efficient Streaming Language Models with Attention Sinks | |
| 25/10/2023 | SGPT: GPT Sentence Embeddings for Semantic Search | |
| 01/11/2023 | Adam through a Second-Order Lens | |
| 08/11/2023 | Maximum a Posteriori Policy Optimisation | |
| 22/11/2023 | Simplifying Transformer Blocks | |