AI Coffee Break with Letitia
  • Videos: 111
  • Views: 1,484,025
Supercharging RAG with Generative Feedback Loops from Weaviate
How do you supercharge RAG applications? With Generative Feedback Loops: feed data from a database to your LLM, store the outputs back into the database together with a vector embedding, and then search the generated data in near real time for future applications.
In this video, we explain RAG and Generative Feedback Loops, give examples of applications that require them, and show you how to implement them with @Weaviate.
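The loop described above can be sketched in a few lines of plain Python. This is a toy, in-memory stand-in (the `VectorStore`, `embed`, and `fake_llm` helpers are made up for illustration; they are not the Weaviate client or a real embedding model), just to show the retrieve → generate → store-back cycle:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real app would use an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """In-memory stand-in for a vector database like Weaviate."""
    def __init__(self):
        self.items = []  # (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def fake_llm(prompt):
    # Stand-in for a real LLM call.
    return "Summary: " + prompt

db = VectorStore()
db.add("Weaviate is a vector database storing objects and embeddings.")

# 1) RAG step: retrieve context and generate with the LLM.
context = db.search("what is Weaviate?", k=1)[0]
generated = fake_llm(context)

# 2) Feedback step: store the generation back, vector-embedded.
db.add(generated)

# The generated text is now searchable for future queries.
print(db.search("summary", k=1)[0])
```

In a real deployment the store and the embedding model are external services, and the write-back happens asynchronously, but the cycle is the same.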
Weaviate (Sponsor) 👉 weaviate.io/
AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/
Outline:
00:00 Generative Feedback Loops Motivation
01:32 RAG explained
03:21 Generative Feedback Loops
05:09 Concrete example with Weaviate code
08:30 DSPy: More Applications of Generative Feed...
Views: 3,367

Videos

GaLore EXPLAINED: Memory-Efficient LLM Training by Gradient Low-Rank Projection
8K views · a month ago
We explain GaLore, a new memory-efficient training technique that outperforms LoRA in accuracy and supports both pre-training and fine-tuning. Now you can train LLMs without running out of GPU memory! You can even pre-train a LLaMA-7B from scratch on a single 24 GB GPU (NVIDIA RTX 4090), for example. AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Thanks to our Patrons who support us i...
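A rough numpy sketch of the core mechanism as I read it (simplified: the actual method also keeps the Adam states in the subspace and recomputes the projection only periodically): the gradient is projected into a low-rank subspace, and the update is projected back before being applied to the full weights.

```python
# Sketch of gradient low-rank projection (simplified; not the official GaLore code).
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4                       # weight shape and projection rank

W = rng.standard_normal((m, n))           # full weight matrix (trained as usual)
G = rng.standard_normal((m, n))           # full gradient dL/dW

# Projection from the top-r left singular vectors of the gradient
U, _, _ = np.linalg.svd(G, full_matrices=False)
P = U[:, :r]                              # (m, r)

G_low = P.T @ G                           # (r, n): optimizer states live here
update = P @ G_low                        # project back up to (m, n)

W -= 0.01 * update                        # apply the low-rank update to full W

# Optimizer-state memory shrinks from m*n to r*n floats per weight matrix.
print(G_low.size, "vs", G.size)           # 128 vs 2048
```

Unlike LoRA, the full weight matrix W is updated, which is why the method works for pre-training, not only fine-tuning.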
Shapley Values Explained | Interpretability for AI models, even LLMs!
4.6K views · a month ago
Ever wondered how to interpret your machine learning models? 🤔 We explain a powerful interpretability technique for machine learning models: Shapley values. They can be used to explain any model. 💻 We show a simple code example of how they work, and then explain the theory behind them. AssemblyAI (Sponsor) 👉 www.assemblyai.com/research/universal-1/? AI Coffee Break Merch! 🛍️ aicoffeebreak.crea...
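For intuition, here is the exact Shapley value computed by brute force for a tiny hypothetical 3-player game (the payoff table is made up for illustration; for real models one uses sampling approximations, e.g. the SHAP library, since the exact sum is exponential in the number of features):

```python
# Brute-force Shapley values for a tiny hypothetical 3-player game.
from itertools import permutations

players = ["A", "B", "C"]

payoffs = {
    frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20,
    frozenset("C"): 30, frozenset("AB"): 40, frozenset("AC"): 50,
    frozenset("BC"): 60, frozenset("ABC"): 90,
}

def value(coalition):
    # Characteristic function: what a coalition "earns" together.
    return payoffs[frozenset(coalition)]

def shapley(player):
    # Average marginal contribution of `player` over all join orders.
    orders = list(permutations(players))
    total = sum(
        value(set(order[:order.index(player)]) | {player})
        - value(order[:order.index(player)])
        for order in orders
    )
    return total / len(orders)

vals = {p: shapley(p) for p in players}
print(vals)                          # {'A': 20.0, 'B': 30.0, 'C': 40.0}

# Efficiency axiom: the values sum to the grand coalition's payoff.
assert abs(sum(vals.values()) - value(players)) < 1e-9
```

For a model explanation, "players" become input features and `value` becomes the model's prediction on a feature subset.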
Stealing Part of a Production LLM | API protect LLMs no more
16K views · 2 months ago
How is it possible to steal part of an LLM protected behind an API? 🥷 We explain both papers that made a breakthrough on this: one from Carlini et al. (Google), and the other from Finlayson et al. (USC); see references below. SPONSOR: AssemblyAI 👉 www.assemblyai.com/research/universal-1/? AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📄 Carlini, Nicholas, Daniel Paleka, Krishnamu...
Genie explained 🧞 Generative Interactive Environments paper explained
3.5K views · 4 months ago
Genie just watched YouTube videos, inferred actions from them, and learned to render environments! 🗺️ In this video, we explain the Genie paper from Google DeepMind: "Genie: Generative Interactive Environments". AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏 Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael Outline:...
MAMBA and State Space Models explained | SSM explained
41K views · 4 months ago
We simply explain and illustrate Mamba, State Space Models (SSMs), and Selective SSMs. SSMs match the performance of transformers but are faster and more memory-efficient, which is crucial for long sequences! AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Celebrating our merch launch, here is a limited-time offer! 👉 Get a 25% discount on AI Coffee Break Merch with the code MAMBA...
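The linear recurrence at the core of an SSM fits in a few lines (a toy discrete SSM with fixed, made-up parameters; Mamba's "selective" twist makes A, B, C depend on the input, which is not shown here):

```python
# Toy discrete state-space model:
#   h_t = A h_{t-1} + B x_t,    y_t = C h_t
import numpy as np

def ssm_scan(A, B, C, xs):
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:                      # sequential scan: O(L) in sequence length
        h = A @ h + B * x             # state update
        ys.append(C @ h)              # readout
    return np.array(ys)

A = np.array([[0.9, 0.0],             # decaying, diagonal toy dynamics
              [0.0, 0.5]])
B = np.array([1.0, 1.0])
C = np.array([1.0, -1.0])

# An impulse input: the output traces the system's impulse response.
ys = ssm_scan(A, B, C, xs=[1.0, 0.0, 0.0])
print(ys)                             # approximately [0, 0.4, 0.56]
```

Because the recurrence is linear (for fixed parameters), it can also be computed as a convolution or a parallel scan, which is where the speed advantage over attention's quadratic cost comes from.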
Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained
5K views · 5 months ago
Contextual sparsity: Take an LLM and make it sparse at inference time. In this video, we explain how the DEJAVU method implements contextual sparsity. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📜 Liu, Zichang, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava et al. "Deja vu: Contextual sparsity for efficient llms at inference time." In Internati...
Transformers explained | The architecture behind LLMs
20K views · 5 months ago
All you need to know about the transformer architecture: how to structure the inputs, attention (queries, keys, values), positional embeddings, residual connections. Bonus: an overview of the difference between Recurrent Neural Networks (RNNs) and transformers. Correction at 9:19: the order of multiplication should be the opposite, x1 (vector) * Wq (matrix) = q1 (vector); otherwise we do not get the 1x3 dimensionali...
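Scaled dot-product attention, sketched as code in the row-vector convention that the correction in the description uses (x1 @ Wq = q1). Single head, no masking, and the dimensions are made up for illustration:

```python
# Minimal single-head scaled dot-product attention.
import numpy as np

def attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # project inputs to Q, K, V
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # (L, L) scaled similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                           # weighted mix of values

rng = np.random.default_rng(0)
L, d = 4, 3                                      # sequence length, model dim
X = rng.standard_normal((L, d))                  # one row per token
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

out = attention(X, Wq, Wk, Wv)
print(out.shape)                                 # (4, 3)
```

Each output row is a convex combination of the value vectors, with the mixing weights given by how well that token's query matches every key.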
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained
19K views · 6 months ago
Direct Preference Optimization (DPO) fine-tunes LLMs without reinforcement learning. At NeurIPS 2023, DPO was one of the two Outstanding Main Track Runner-Up papers. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📜 Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. "Direct preference optimization: Your language model is secretly a reward mod...
LLM hallucinations discover new math solutions!? | FunSearch explained
12K views · 6 months ago
The unreasonable effectiveness of guided confabulation: solving math problems with hallucinating LLMs is now possible!? 🤯 We explain how Google DeepMind did it. Bonus: the answer of 2022 Fields Medalist Hugo Duminil-Copin at #HLF23, when I asked about AI helping mathematicians solve problems. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📚 FunSearch blog post: deepmind...
DALL-E 3 is better at following Text Prompts! Here is why. - DALL-E 3 explained
3.9K views · 7 months ago
Synthetic captions help DALL-E 3 follow text prompts better than DALL-E 2. We explain how OpenAI innovates the training of diffusion models with better image captions. ► Sponsor: Gradient 👉 gradient.1stcollab.com/aicoffeebreak 📜 „ Improving Image Generation with Better Captions“ James Betker et al., 2023 cdn.openai.com/papers/dall-e-3.pdf 📚 openai.com/dall-e-3 📜 The Google Paper about recaption...
Adversarial Attacks and Defenses. The Dimpled Manifold Hypothesis. David Stutz from DeepMind #HLF23
2.9K views · 8 months ago
🎙️ Interview with David Stutz from Google DeepMind at the 10th HLF. We spoke about adversarial attacks and defenses for neural networks and about hypotheses that aim to explain their existence. The HLF is an annual gathering of 200 young researchers from math and computer science and laureates of the most prestigious awards in these two fields, such as the ACM Turing Award, ACM Prize in Computi...
What is LoRA? Low-Rank Adaptation for finetuning LLMs EXPLAINED
36K views · 9 months ago
How does LoRA work? Low-Rank Adaptation for parameter-efficient LLM fine-tuning explained. It works for any other neural network as well, not just LLMs. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📜 „Lora: Low-rank adaptation of large language models“ Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L. and Chen, W., 2021. arxiv.org/abs/2106.09685 📚 sebas...
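A minimal numpy sketch of the idea (dimensions are made up; the init follows the paper's convention: A random, B zero, so training starts exactly at the pretrained model):

```python
# LoRA sketch: freeze W, learn a low-rank update W_eff = W + (alpha/r) * B @ A.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))       # trainable, random init
B = np.zeros((d_out, r))                 # trainable, zero init

def forward(x):
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal(d_in)

# At initialization the LoRA branch contributes nothing:
assert np.allclose(forward(x), x @ W.T)

# Only A and B are trained: 2*r*d parameters instead of d*d.
print(A.size + B.size, "vs", W.size)     # 32 vs 64
```

In practice only A and B receive gradients, so the optimizer state and the saved adapter are tiny compared to the full model.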
Are ChatBots their own death? | Training on Generated Data Makes Models Forget - Paper explained
5K views · 10 months ago
If LLMs flood the Internet with content over the next few years, they will likely sign their own death certificate. How likely is that to happen, and why is training on AI-generated content so bad? ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏 Dres. Trost GbR, Siltax, Edvard Grødem, Vignesh Valliappan, Mutual Information, Kshitij 📜...
The first law on AI regulation | The EU AI Act
10K views · 11 months ago
The European Union is the first major regulator to propose a concrete law for regulating AI, and it will surely not be the last. Let's have a look at what's in the draft of the EU's AI Act and what it means for researchers, consumers, and citizens inside and outside the EU. ► Sponsor: AssemblyAI www.assemblyai.com/ 👉 LeMUR Blog: www.assemblyai.com/blog/lemur 👉 LeMUR Playground: www.assemblyai.com/pl...
Author Interviews, Poster Highlights, Summary of the ACL 2023 Toronto NLP
3.8K views · 11 months ago
Author Interviews, Poster Highlights, Summary of the ACL 2023 Toronto NLP
ChatGPT is not an intelligent agent. It is a cultural technology. - Gopnik Keynote
3.8K views · 11 months ago
ChatGPT is not an intelligent agent. It is a cultural technology. - Gopnik Keynote
[Own work] MM-SHAP to measure modality contributions
4.1K views · a year ago
[Own work] MM-SHAP to measure modality contributions
Eight Things to Know about Large Language Models
18K views · a year ago
Eight Things to Know about Large Language Models
Moral Self-Correction in Large Language Models | paper explained
3.4K views · a year ago
Moral Self-Correction in Large Language Models | paper explained
AI beats us at another game: STRATEGO | DeepNash paper explained
4.2K views · a year ago
AI beats us at another game: STRATEGO | DeepNash paper explained
Why ChatGPT fails | Language Model Limitations EXPLAINED
8K views · a year ago
Why ChatGPT fails | Language Model Limitations EXPLAINED
"Watermarking Language Models" paper and GPTZero EXPLAINED | How to detect text by ChatGPT?
8K views · a year ago
"Watermarking Language Models" paper and GPTZero EXPLAINED | How to detect text by ChatGPT?
Training learned optimizers: VeLO paper EXPLAINED
5K views · a year ago
Training learned optimizers: VeLO paper EXPLAINED
ChatGPT vs Sparrow - Battle of Chatbots
23K views · a year ago
ChatGPT vs Sparrow - Battle of Chatbots
Paella: Text to image FASTER than diffusion models | Paella paper explained
9K views · a year ago
Paella: Text to image FASTER than diffusion models | Paella paper explained
Generate long form video with Transformers | Phenaki from Google Brain explained
11K views · a year ago
Generate long form video with Transformers | Phenaki from Google Brain explained
Movie Diffusion explained | Make-a-Video from MetaAI and Imagen Video from Google Brain
14K views · a year ago
Movie Diffusion explained | Make-a-Video from MetaAI and Imagen Video from Google Brain
Beyond neural scaling laws - Paper Explained
12K views · a year ago
Beyond neural scaling laws - Paper Explained
How does Stable Diffusion work? - Latent Diffusion Models EXPLAINED
87K views · a year ago
How does Stable Diffusion work? - Latent Diffusion Models EXPLAINED

Comments

  • @supanutsookkho2749 · 2 days ago

    Great video and a good explanation. Thanks for your hard work on this amazing video!!

  • @robert75019 · 8 days ago

    Very clear 👏👏

  • @uw10isplaya · 9 days ago

    Had to go back and rewatch a section after I realized I'd been spacing out staring at the coffee bean's reactions.

  • @RyanRobots · 9 days ago

    Thanks!

    • @AICoffeeBreak · 9 days ago

      Thank you! Wow, this is an old video, now that I think about it. 😅

  • @maxvell77 · 10 days ago

    Most insightful explanation I have found on this subject so far. I was looking for it for days... Thank you! Keep going, you rock!

    • @AICoffeeBreak · 10 days ago

      Thank you a lot! Also for the super thanks!

  • @maxvell77 · 10 days ago

    Thanks!

  • @elpepemandioca · 11 days ago

    Hey Letitia, I think you should consider changing the logo. It kinda looks like a turd.

  • @partywen · 14 days ago

    Super informative and helpful! Thanks a lot!

  • @daniyalkabir6527 · 15 days ago

    Excellent video

  • @heejuneAhn · 16 days ago

    BEST of BEST explanation: 1) visually, 2) intuitively, 3) with numerical examples. And your English is easier for non-native listeners to understand than a native speaker's.

  • @nogueirad · 16 days ago

    Awesome! Thanks for sharing!

  • @user-wr4yl7tx3w · 17 days ago

    But is the benefit worth the potential negatives? Surely there are tradeoffs here.

  • @marcfruchtman9473 · 17 days ago

    Thanks for the video. So, is Weaviate a database service that leverages an LLM to generate natural-language queries to access the data? Basically, rather than hiring an SQL programmer, you just feed Weaviate the data (somehow) and use the LLM to generate the queries, which are then fed back to Weaviate so it can provide the query results?

    • @AICoffeeBreak · 14 days ago

      Thanks for watching it and for your question! Weaviate is a vector database (we've made a whole video on that topic, it's linked in the description, if you're interested), which means in addition to storing structured and unstructured data, like a "classical" database, it also stores the vector embeddings of the items. This way you can find related terms and use natural language queries. The default way to interact with Weaviate as a developer would be through GraphQL, or one of the other APIs (e.g. Python). Do check out our video on the topic, if you want to know more details!

    • @marcfruchtman9473 · 14 days ago

      @@AICoffeeBreak Ah Thanks!

  • @TemporaryForstudy · 17 days ago

    Nice video. I am from India and working in the AI field. Do you have any remote opportunities for me?

  • @kfliden · 17 days ago

    Thanks, this is the first video on probing that makes sense to me. But I'm wondering: is probing just for diagnostics, or is it an actual option for fine-tuning in production?

    • @AICoffeeBreak · 17 days ago

      When the representations of the model are really good, it might happen that probing (tuning just a linear head at the end) is enough. But most of the time, in production you need new model abilities and knowledge, so fine-tuning is often the option.

  • @Ben_D. · 17 days ago

  • @femkeplantinga3726 · 17 days ago

    Great video! 🤌

  • @connorshorten6311 · 17 days ago

    Awesome!!

  • @DerPylz · 17 days ago

    Cool! I love these hands-on videos with code examples! Thanks for sharing the notebook!

  • @vladimirtchuiev2218 · 18 days ago

    One of the selling points of LoRA is being able to mix and match the A and B matrices from different fine-tuning runs without having to keep the weights of the full model, if they are available elsewhere. Here it seems you have to save the entire model, so this is a big tradeoff compared to LoRA and its derivatives.

  • @renanmonteirobarbosa8129 · 19 days ago

    I am afraid you did not fully understand the mechanism of information geometry behind UMAP and how the KL divergence acts as the "spring-dampener" mechanism. Keenan Crane and Melvin Leok have great educational materials on the topic.

  • @brunofilipeaguiar · 19 days ago

    yes

  • @AICoffeeBreak · 20 days ago

    8:39 What I meant to say is VQ-VAE, not VQGAN! Thanks to Luis Cunha for spotting this!

  • @soulfuljourney22 · 20 days ago

    The concept of the rank of a matrix, taught in such an effective way.

  • @HomunMage · 24 days ago

    I love this paper. The solution is elegant.

  • @nerisozen9029 · 25 days ago

    thanks! also very cute merch!

  • @proterotype · 28 days ago

    I ran into trouble trying to fine-tune a Swin model with LoRA; that type of model isn't supported yet. I wonder if it'll be the same for GaLore.

  • @DeepakKori-vn8zr · 28 days ago

    OMG, such an amazing video explaining positional embeddings....

  • @AaronNicholsonAI · a month ago

    Thank you so much! Super helpful.

  • @ravindrasharma85 · a month ago

    excellent explanation!

  • @vardhan254 · a month ago

    Congrats Letitia!!

  • @ArnaldurBjarnason · a month ago

    The size of a LoRA (on disk) is a fraction of the size of the model it's applied to. GaLore loses this benefit by updating the weights directly. If you have use for many fine-tunes of a single model, two GaLore fine-tunes will take more space than ten LoRAs (depending on rank, to be fair). I assume they don't mention this very significant tradeoff, as you don't mention it in the video. That seems like a dishonest comparison, if that's the case.

    • @AICoffeeBreak · a month ago

      Fair. It's just that HDD / SSD storage is not considered a bottleneck, while the size of the GPUs surely is. The first is cheap and abundant (terabytes), the second one is very expensive and limited (tens of gigabytes).

    • @vinno97 · 14 days ago

      @@AICoffeeBreak Minor nitpick: it also allows you to fit multiple LoRAs in GPU memory at the same time and thus run inference on different fine-tunes at the same time. P.S. Congrats on the PhD!

  • @cosmic_reef_17 · a month ago

    Great video!

  • @timothywcrane · a month ago

    Can info shed during the lossy compression process be set aside in a non memory fashion for retrieval? Thinking out loud. Not always helpful. But state mapping would be interesting in the training process as well as post.

  • @learnsomethingnew1651 · a month ago

    Great video! Got a follow-up question: what kind of fine-tuning is provided by the OpenAI API, where it fine-tunes a model based on a training set of Q&A pairs provided by the user?

    • @AICoffeeBreak · a month ago

      Ask Sam Altman? 😅 I didn't see any recent technical paper about fine-tuning from OpenAI, nor do they explain on their website what they do. They are too open for us to comprehend. Since they have large GPUs, it is safe to assume they are not forced to do parameter-efficient tuning like us noobs with Gaming GPUs.

  • @alexis-michelmugabushaka2297 · a month ago

    Congratulations Frau Doktor.

  • @SU3D3 · a month ago

    😘 LOL! 礼 Stay for the EPIC lipstick! M4D L0V3!

  • @AaronALAI · a month ago

    Holy frick, what a perfectly concise video on GaLore! There is a GitHub implementation of the research from the paper; they are currently working on a multi-GPU implementation. I too am curious how well things scale up to modern, larger LLMs, and I have a multi-GPU rig I want to test it out on.

  • @shahidjabbar5933 · a month ago

    Congratulations

  • @IxCIHAoX · a month ago

    As far as I get it, I determine the gradient G and also a low-rank projection P. That projection allows me to "shrink" the gradient matrix G I calculate at every step down to rank r before applying it to W. So I do not save compute while calculating the gradients, but while applying and storing them (as momentum and such)?

    • @elinetshaaf75 · a month ago

      underwhelmed?

    • @IxCIHAoX · a month ago

      @@elinetshaaf75 On the contrary, I am afraid that I don't quite get it 😅

  • @MechanicumMinds · a month ago

    I never thought I'd say this, but I'm actually excited to learn about efficient training methods for deep learning models on consumer GPUs. Who knew running out of GPU memory could be so... enlightening? Thanks for explaining LoRA and GaLore in a way that doesn't make my brain hurt (too much). Now, if you'll excuse me, I have some large language models to train, or at least try not to run out of GPU memory.

  • @jiahao2709 · a month ago

    Thank you a lot for your explanation! May I know which software you use for making the animations in your videos? Thanks!

    • @AICoffeeBreak · a month ago

      My editor uses Adobe Premiere Pro for video editing (this is also when Ms. Coffee Bean comes in). I use PowerPoint for all other visualisations.

    • @jiahao2709 · a month ago

      @@AICoffeeBreak really beautiful!

  • @JosephCatrambone · a month ago

    Congrats on the PhD! :D

  • @vladimirnadvornik8254 · a month ago

    Do I understand correctly that it can't work with quantization, and that the model must fit in memory in 16-bit?

  • @TheRyulord · a month ago

    One important thing to note is that while this technique is more memory efficient it's also 17% slower in the setup the authors use. That's a pretty big deal, especially for pretraining.

    • @tornyu · a month ago

      But since it's more memory efficient, could you run more training in parallel, in a distributed or federated way?

    • @TheRyulord · a month ago

      @@tornyu Sure but it's still 17% more compute and that means 17% more money. With LLMs costing millions to train from scratch 17% more money is a big ask for pretraining.

    • @tornyu · a month ago

      But federated training could enable larger open source models, which until now could only be fine-tuned?

  • @amitgalor · a month ago

    I always liked Galor(e), though I might be biased.

  • @azertyQ · a month ago

    Congrats on your PhD! Could you use 'r' as a hyperparameter during pretraining as well? e.g. start pretraining with low r and gradually increase it as more precision is needed? I don't think it could do that much since gains are already very high at the start.