| Date | Reading | Notes |
| --- | --- | --- |
| 06/07/2022 | Principles of Deep Learning Theory, Chapters 1 & 2 | Chapter 1, Chapter 2 |
| 20/07/2022 | PoDLT, Chapter 3 | Chapter 3, Chapter 4 |
| 27/07/2022 | We decided to change our approach and read papers instead | |
| 03/08/2022 | Neural Tangent Kernel | https://rajatvd.github.io/NTK/, Notes |
| 17/08/2022 | Multilayer Feedforward Networks are Universal Approximators | Notes |
| 31/08/2022 | Explaining Neural Scaling Laws | |
| 22/09/2022 | Git Re-Basin | |
| 05/10/2022 | Monte Carlo Gradient Estimation in Machine Learning, Section 4 | Notes |
| 12/10/2022 | Monte Carlo Gradient Estimation in Machine Learning, Section 5 | Notes |
| 19/10/2022 | Monte Carlo Gradient Estimation in Machine Learning, Section 7… | Notes |
| 26/10/2022 | Toy Models of Superposition, Sections 1, 2 & 3 | Notes |
| 16/11/2022 | Toy Models of Superposition, Sections 4, 5 & 6 | |
| 23/11/2022 | Toy Models of Superposition, Sections 7, 8, 9 & 10 | |
| 30/11/2022 | Gradient Estimation with Discrete Stein Operators | |
| 07/12/2022 | Exact learning dynamics of deep linear networks with prior knowledge | |
| 04/01/2023 | Neural networks and physical systems with emergent collective computational abilities (Hopfield Networks) | Notes |
| 11/01/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 1 | |
| 18/01/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 2 | |
| 11/01/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 3 | |
| 25/01/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 4 | |
| 01/02/2023 | CSC2541 Winter 2022, Topics in Machine Learning: Neural Net Training Dynamics, Lecture 5 | |
| 15/03/2023 | Understanding the Diffusion Objective as a Weighted Integral of ELBOs | |
| 22/03/2023 | Laplace Redux – Effortless Bayesian Deep Learning | |
| 29/03/2023 | A Theory on Adam Instability in Large-Scale Machine Learning | |
| 12/04/2023 | Sigma-Reparam: Stable Transformer Training with Spectral Reparametrization | |
| 24/05/2023 | Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning | |
| 31/05/2023 | Loss Landscapes are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent | |
| 07/06/2023 | Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training | |
| 14/06/2023 | QLoRA: Efficient Finetuning of Quantized LLMs | |
| 09/08/2023 | Limitations of the Empirical Fisher Approximation for Natural Gradient Descent | |
| 23/08/2023 | The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning | |
| 20/09/2023 | Transformers as Support Vector Machines | |
| 27/09/2023 | Flat Minima | |
| 04/10/2023 | Language Modeling Is Compression | |
| 11/10/2023 | Efficient Streaming Language Models with Attention Sinks | |
| 25/10/2023 | SGPT: GPT Sentence Embeddings for Semantic Search | |
| 01/11/2023 | Adam through a Second-Order Lens | |
| 08/11/2023 | Maximum a Posteriori Policy Optimisation | |
| 22/11/2023 | Simplifying Transformer Blocks | |