- 111 videos
- 1,484,025 views
AI Coffee Break with Letitia
Germany
Joined 26 Apr 2020
Lighthearted bite-sized ML videos for your AI Coffee Break! 📺 Mostly videos about the latest technical advancements in AI, such as large language models (LLMs), text-to-image models and everything cool in natural language processing, computer vision, etc.!
We try to post twice a month, if not more often! 🤞 But, you know, there is still a PhD thesis to work on.
Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.
Impressum: aicoffeebreak.com/impressum.html
Supercharging RAG with Generative Feedback Loops from Weaviate
How do you supercharge RAG applications? With Generative Feedback Loops: feed data from a database to your LLM, then store the outputs back into the database along with a vector embedding. You can then search the generated data in near real time for future applications.
In this video, we explain RAG and Generative Feedback Loops, give examples of applications that require them, and show you how to implement them with @Weaviate.
Weaviate (Sponsor) 👉 weaviate.io/
AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/
Outline:
00:00 Generative Feedback Loops Motivation
01:32 RAG explained
03:21 Generative Feedback Loops
05:09 Concrete example with Weaviate code
08:30 DSPy: More Applications of Generative Feed...
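The loop described above (retrieve context, generate with an LLM, write the output back into the vector store) can be sketched in a few lines of Python. Everything here — the `VectorStore` class, the toy hash-based `embed`, and the `fake_llm` stub — is hypothetical scaffolding to illustrate the data flow, not Weaviate's actual client API:

```python
import hashlib
import math

def embed(text):
    # Toy deterministic embedding: hash each token into an 8-dim vector.
    # (A real app would use an embedding model; this just makes it runnable.)
    vec = [0.0] * 8
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % 8] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorStore:
    """Minimal in-memory stand-in for a vector database (hypothetical API)."""
    def __init__(self):
        self.items = []  # (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        qv = embed(query)
        scored = sorted(self.items,
                        key=lambda it: -sum(a * b for a, b in zip(qv, it[1])))
        return [text for text, _ in scored[:k]]

def fake_llm(prompt, context):
    # Placeholder generator; a real loop would call an actual LLM here.
    return f"summary: {context}"

store = VectorStore()
store.add("weaviate stores vectors for semantic search")

# RAG step: retrieve context, generate an answer...
context = store.search("semantic search", k=1)[0]
answer = fake_llm("What does Weaviate do?", context)

# ...feedback step: write the generated output back into the store,
# so future queries can retrieve it in near real time.
store.add(answer)
```

The key difference from plain RAG is that last line: generated outputs become searchable data for the next query.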
3,367 views
Videos
GaLore EXPLAINED: Memory-Efficient LLM Training by Gradient Low-Rank Projection
8K views · 1 month ago
We explain GaLore, a new parameter-efficient training technique that outperforms LoRA in accuracy and supports both pre-training and fine-tuning. Now you can train LLMs without running out of GPU memory! You can even pre-train a LLaMA-7B from scratch on one 24GB GPU (NVIDIA RTX 4090), for example. AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Thanks to our Patrons who support us i...
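The core GaLore idea from the description can be made concrete with a toy NumPy sketch: project the full gradient into a low-rank subspace (where optimizer state would be kept, saving memory), then project the update back. The shapes, rank, and quadratic toy loss are illustrative assumptions, not the paper's training setup:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 6, 2            # weight matrix is m x n; projection rank is r
W = rng.standard_normal((m, n))
lr = 0.1

def loss_grad(W):
    # Toy quadratic loss ||W||^2 / 2, whose gradient is simply W.
    return W.copy()

# Pick a low-rank projection P from the gradient's SVD. In GaLore this
# projection is refreshed periodically during training.
U, _, _ = np.linalg.svd(loss_grad(W), full_matrices=False)
P = U[:, :r]                              # m x r, orthonormal columns

for step in range(50):
    G = loss_grad(W)                      # full gradient, m x n
    g_low = P.T @ G                       # r x n low-rank statistics
    # (a real implementation would run Adam on g_low here, which is where
    # the optimizer-state memory saving comes from)
    W -= lr * (P @ g_low)                 # project the update back

# Only the component of W inside span(P) has been driven toward zero.
```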
Shapley Values Explained | Interpretability for AI models, even LLMs!
4.6K views · 1 month ago
Ever wondered how to interpret your machine learning models? 🤔 We explain a powerful interpretability technique for machine learning models: Shapley Values. They can be used to explain any model. 💻 We show a simple example code of how they work, and then explain the theory behind them. AssemblyAI (Sponsor) 👉 www.assemblyai.com/research/universal-1/? AI Coffee Break Merch! 🛍️ aicoffeebreak.crea...
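For tiny models, Shapley values can be computed exactly by enumerating every feature coalition, which makes the definition concrete. The `shapley_values` helper and the linear toy model below are our own illustration, not the video's code:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating every feature coalition.
    Features outside the coalition are replaced by their baseline value."""
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi

# Tiny linear "model": its Shapley values are exactly w_i * (x_i - baseline_i).
w = [2.0, -1.0, 0.5]
model = lambda z: sum(wi * zi for wi, zi in zip(w, z))
phi = shapley_values(model, x=[1.0, 3.0, 2.0], baseline=[0.0, 0.0, 0.0])
# phi ≈ [2.0, -3.0, 1.0]; the values sum to f(x) - f(baseline) (efficiency)
```

For real models this enumeration is exponential in the number of features, which is why practical tools approximate it by sampling.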
Stealing Part of a Production LLM | API protect LLMs no more
16K views · 2 months ago
How is it possible to steal part of an LLM protected behind an API? 🥷 We explain both papers that made a breakthrough on this: one from Carlini et al. (Google) and the other from Finlayson et al. (USC); see references below. SPONSOR: AssemblyAI 👉 www.assemblyai.com/research/universal-1/? AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📄 Carlini, Nicholas, Daniel Paleka, Krishnamu...
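One core observation behind these attacks — that every logit vector an API returns lies in the column span of the secret final layer, so their numerical rank reveals the hidden size — can be demonstrated on a toy model. The `api_logits` stub below is a stand-in, not a real API:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab, n_queries = 16, 100, 200

# Toy "API-protected" model: the caller sees only logits = W_out @ h,
# never the hidden state h or the secret final layer W_out.
W_out = rng.standard_normal((vocab, hidden))

def api_logits():
    h = rng.standard_normal(hidden)       # unknown internal activation
    return W_out @ h

# Attack sketch: stack logit vectors from many queries. They all lie in
# span(W_out), so the numerical rank of the stack reveals the hidden size.
L = np.stack([api_logits() for _ in range(n_queries)])   # n_queries x vocab
s = np.linalg.svd(L, compute_uv=False)
recovered_dim = int((s > 1e-6 * s[0]).sum())   # equals the secret hidden size
```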
Genie explained 🧞 Generative Interactive Environments paper explained
3.5K views · 4 months ago
Genie just watched YouTube videos, inferred actions, and learned to render environments! 🗺️ In this video, we explain the Genie paper from Google DeepMind: "Genie: Generative Interactive Environments" explained. AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏 Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael Outline:...
MAMBA and State Space Models explained | SSM explained
41K views · 4 months ago
We simply explain and illustrate Mamba, State Space Models (SSMs), and Selective SSMs. SSMs match the performance of transformers but are faster and more memory-efficient, which is crucial for long sequences! AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Celebrating our merch launch, here is a limited-time offer! 👉 Get a 25% discount on AI Coffee Break Merch with the code MAMBA...
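The discretized SSM recurrence at the heart of these models fits in a few lines. This sketch is the plain, non-selective case with fixed, made-up matrices; in selective SSMs (Mamba), B, C, and the step size depend on the input:

```python
import numpy as np

# Minimal discretized state space model (SSM) recurrence:
#   h_t = A_bar @ h_{t-1} + B_bar * x_t,    y_t = C @ h_t
rng = np.random.default_rng(0)
d_state = 4
A_bar = np.diag(rng.uniform(0.5, 0.9, d_state))   # stable diagonal dynamics
B_bar = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)

def ssm_scan(xs):
    h = np.zeros(d_state)
    ys = []
    for x in xs:          # linear in sequence length, constant state size
        h = A_bar @ h + B_bar * x
        ys.append(float(C @ h))
    return ys

ys = ssm_scan([1.0, 0.0, 0.0, 0.0])   # impulse response of the system
```

The constant-size state `h` is why SSMs are more memory-efficient than attention, whose cost grows with sequence length.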
Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained
5K views · 5 months ago
Contextual sparsity: Take an LLM and make it sparse at inference time. In this video, we explain how the DEJAVU method implements contextual sparsity. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📜 Liu, Zichang, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava et al. "Deja vu: Contextual sparsity for efficient llms at inference time." In Internati...
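The insight behind contextual sparsity is that a ReLU MLP only "uses" the neurons whose pre-activation is positive for the current input. DEJAVU trains a cheap predictor of that set; in this sketch we use an oracle in its place, just to show the sparse computation is exact when the prediction is right:

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 8, 64
W1 = rng.standard_normal((hidden, d))
W2 = rng.standard_normal((d, hidden))

def mlp_dense(x):
    # Standard two-layer ReLU MLP: all 64 hidden neurons are computed.
    return W2 @ np.maximum(W1 @ x, 0.0)

def mlp_sparse(x, active_idx):
    # Compute only the neurons predicted to be active for THIS input.
    return W2[:, active_idx] @ np.maximum(W1[active_idx] @ x, 0.0)

x = rng.standard_normal(d)
# Oracle "predictor": the exact set of firing neurons. DEJAVU replaces this
# with a small learned predictor that runs ahead of the layer.
active = np.where(W1 @ x > 0)[0]
```

With a random ReLU layer, roughly half the neurons fire, so the sparse path does about half the work here while producing the identical output.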
Transformers explained | The architecture behind LLMs
20K views · 5 months ago
All you need to know about the transformer architecture: How to structure the inputs, attention (Queries, Keys, Values), positional embeddings, residual connections. Bonus: an overview of the difference between Recurrent Neural Networks (RNNs) and transformers. 9:19 Order of multiplication should be the opposite: x1(vector) * Wq(matrix) = q1(vector). Otherwise we do not get the 1x3 dimensionali...
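The attention computation summarized above — including the order-of-multiplication correction at 9:19 (row vector times matrix, x1 @ Wq = q1) — can be sketched in NumPy. All shapes and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8

X = rng.standard_normal((seq_len, d_model))   # one token embedding per row
Wq = rng.standard_normal((d_model, d_k))
Wk = rng.standard_normal((d_model, d_k))
Wv = rng.standard_normal((d_model, d_k))

# Per the correction in the description: Q = X @ Wq, row-wise,
# so each query q_i is the row vector x_i times the matrix Wq.
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d_k)               # scaled dot products
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True) # softmax over the keys
out = weights @ V                             # weighted sum of values
```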
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained
19K views · 6 months ago
Direct Preference Optimization (DPO) to finetune LLMs without reinforcement learning. DPO was one of the two Outstanding Main Track Runner-Up papers. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📜 Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. "Direct preference optimization: Your language model is secretly a reward mod...
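The DPO objective itself is short enough to write out: a logistic loss on the policy's log-probability margin between the chosen and rejected answer, measured relative to the reference model. The numbers below are made-up log-probabilities for illustration:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair (chosen y_w, rejected y_l):
    -log sigmoid(beta * ((logpi(y_w) - logpref(y_w))
                         - (logpi(y_l) - logpref(y_l))))"""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At zero margin (policy == reference) the loss is log 2; once the policy
# favors the chosen answer more than the reference does, the loss drops.
base = dpo_loss(-5.0, -5.0, -5.0, -5.0)     # no preference learned yet
better = dpo_loss(-4.0, -6.0, -5.0, -5.0)   # policy now favors y_w
```

Because the margin is differentiable in the policy's log-probabilities, this trains with plain gradient descent, with no reward model or RL loop.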
LLM hallucinations discover new math solutions!? | FunSearch explained
12K views · 6 months ago
The unreasonable effectiveness of guided confabulation: Solving math problems with hallucinatory LLMs is now possible!? 🤯 We explain how Google DeepMind did it. Bonus: The answer of Fields Medalist 2022 Hugo Duminil-Copin at the last #HLF23 after I asked about AI helping mathematicians solve problems. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📚 FunSearch blog post: deepmind...
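The "guided confabulation" loop can be caricatured in a few lines: an LLM proposes candidate programs (here a mocked proposer over a one-parameter toy family), and a deterministic evaluator keeps only the ones that score well, so hallucinated junk is simply filtered out. Everything below is an illustrative stand-in for FunSearch, not its actual code:

```python
import random

random.seed(0)

def evaluate(program):
    # Deterministic scorer. FunSearch's key trick: the evaluator filters out
    # bad or broken ("hallucinated") candidates, so the proposer can be wild.
    try:
        return sum(program(n) for n in range(1, 6))
    except Exception:
        return float("-inf")

def mock_llm_propose(best_c):
    # Stand-in for the LLM: perturb the best coefficient found so far.
    # (The real system prompts an LLM with the best programs so far.)
    return best_c + random.choice([-1, 1]) * random.random()

best_c, best_score = 0.0, float("-inf")
for step in range(300):
    c = mock_llm_propose(best_c)
    score = evaluate(lambda n, c=c: c * n - c * c)   # toy program family
    if score > best_score:                           # keep improvements only
        best_c, best_score = c, score
# best_score climbs toward the optimum 11.25 (attained at c = 1.5)
```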
DALL-E 3 is better at following Text Prompts! Here is why. - DALL-E 3 explained
3.9K views · 7 months ago
Synthetic captions help DALL-E 3 follow text prompts better than DALL-E 2. We explain how OpenAI innovates the training of diffusion models with better image captions. ► Sponsor: Gradient 👉 gradient.1stcollab.com/aicoffeebreak 📜 „ Improving Image Generation with Better Captions“ James Betker et al., 2023 cdn.openai.com/papers/dall-e-3.pdf 📚 openai.com/dall-e-3 📜 The Google Paper about recaption...
Adversarial Attacks and Defenses. The Dimpled Manifold Hypothesis. David Stutz from DeepMind #HLF23
2.9K views · 8 months ago
🎙️ Interview with David Stutz from Google DeepMind at the 10th HLF. We spoke about adversarial attacks and defenses for neural networks and about hypotheses that aim to explain their existence. The HLF is an annual gathering of 200 young researchers from math and computer science and laureates of the most prestigious awards in these two fields, such as the ACM Turing Award, ACM Prize in Computi...
What is LoRA? Low-Rank Adaptation for finetuning LLMs EXPLAINED
36K views · 9 months ago
How does LoRA work? Low-Rank Adaptation for Parameter-Efficient LLM Finetuning explained. Works for any other neural network as well, not just for LLMs. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📜 „Lora: Low-rank adaptation of large language models“ Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L. and Chen, W., 2021. arxiv.org/abs/2106.09685 📚 sebas...
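The LoRA parameterization from the paper — freeze W, learn a rank-r update (alpha/r) * B @ A with B zero-initialized — fits in a short NumPy sketch. The dimensions and scaling are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 16, 16, 2, 4.0

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight (256 params)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, r x d_in
B = np.zeros((d_out, r))                   # trainable, zero-initialized

def lora_forward(x):
    # Only A and B are trained: 2 * (2 * 16) = 64 params instead of 256.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
out_before = lora_forward(x)               # B = 0, so the adapter starts inert

B = rng.standard_normal((d_out, r))        # pretend B has been trained
delta = (alpha / r) * B @ A                # the learned update has rank <= r
```

Zero-initializing B guarantees the adapted model starts out identical to the pretrained one, and the tiny (A, B) pair is all you need to store per fine-tune.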
Are ChatBots their own death? | Training on Generated Data Makes Models Forget - Paper explained
5K views · 10 months ago
If LLMs flood the Internet with content over the next years, they will likely sign their own death certificate. How likely is it that this happens and why is training on AI content so bad? ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏 Dres. Trost GbR, Siltax, Edvard Grødem, Vignesh Valliappan, Mutual Information, Kshitij 📜...
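The collapse dynamic can be demonstrated with a toy experiment (our own illustration, not the paper's setup): repeatedly fit a Gaussian to samples drawn from the previous generation's fit, mimicking models trained on the output of earlier models. The distribution's tails get clipped a little each generation:

```python
import random
import statistics

random.seed(0)

# Toy "model collapse": each generation fits a Gaussian to samples drawn
# from the PREVIOUS generation's fit, i.e. training on AI-generated data.
mu, sigma = 0.0, 1.0
variances = [sigma ** 2]
for generation in range(200):
    samples = [random.gauss(mu, sigma) for _ in range(20)]
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)   # MLE is biased low on small samples
    variances.append(sigma ** 2)
# The tails disappear: variance shrinks generation after generation.
```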
The first law on AI regulation | The EU AI Act
10K views · 11 months ago
The European Union is the first major regulator to propose a concrete law for regulating AI and it will surely not be the last. Let’s have a look at what’s in the draft of EU’s AI act and what it means for researchers, consumers, and citizens inside and outside the EU. ► Sponsor: AssemblyAI www.assemblyai.com/ 👉 LeMUR Blog: www.assemblyai.com/blog/lemur 👉 LeMUR Playground: www.assemblyai.com/pl...
Author Interviews, Poster Highlights, Summary of the ACL 2023 Toronto NLP
3.8K views · 11 months ago
ChatGPT is not an intelligent agent. It is a cultural technology. - Gopnik Keynote
3.8K views · 11 months ago
[Own work] MM-SHAP to measure modality contributions
4.1K views · 1 year ago
Eight Things to Know about Large Language Models
18K views · 1 year ago
Moral Self-Correction in Large Language Models | paper explained
3.4K views · 1 year ago
AI beats us at another game: STRATEGO | DeepNash paper explained
4.2K views · 1 year ago
Why ChatGPT fails | Language Model Limitations EXPLAINED
8K views · 1 year ago
"Watermarking Language Models" paper and GPTZero EXPLAINED | How to detect text by ChatGPT?
8K views · 1 year ago
Training learned optimizers: VeLO paper EXPLAINED
5K views · 1 year ago
ChatGPT vs Sparrow - Battle of Chatbots
23K views · 1 year ago
Paella: Text to image FASTER than diffusion models | Paella paper explained
9K views · 1 year ago
Generate long form video with Transformers | Phenaki from Google Brain explained
11K views · 1 year ago
Movie Diffusion explained | Make-a-Video from MetaAI and Imagen Video from Google Brain
14K views · 1 year ago
Beyond neural scaling laws - Paper Explained
12K views · 1 year ago
How does Stable Diffusion work? - Latent Diffusion Models EXPLAINED
87K views · 1 year ago
Great video and a good explanation. Thanks for your hard work on this amazing video!!
Very clear 👏👏
Thank you!
Had to go back and rewatch a section after I realized I'd been spacing out staring at the coffee bean's reactions.
Thanks!
Thank you! Wow, this is an old video, now that I think about it. 😅
Most insightful explanation I have found on this subject so far. I was looking for it for days... Thank you! Keep going, you rock!
Thank you a lot! Also for the super thanks!
Thanks!
Oh, thank you!
Hey letitia. I think you should consider changing the logo. It kinda looks like a turd
Super informative and helpful! Thanks a lot!
Oh wow, thanks!
Excellent video
BEST of BEST explanation: 1) visually, 2) intuitively, 3) with numerical examples. And your English is easier for foreigners to listen to than a native speaker's.
Awesome! Thanks for sharing!
but is the benefit worth the potential negatives? surely, there are tradeoffs here.
Thanks for the video. So, is Weaviate a database service that is leveraging the LLM to generate natural language queries to access the data? So, basically rather than hiring an SQL programmer, you just feed Weaviate the data (somehow), and use the LLM to gen the queries, which are then fed back to Weaviate so it can provide the query results?
Thanks for watching it and for your question! Weaviate is a vector database (we've made a whole video on that topic, it's linked in the description, if you're interested), which means in addition to storing structured and unstructured data, like a "classical" database, it also stores the vector embeddings of the items. This way you can find related terms and use natural language queries. The default way to interact with Weaviate as a developer would be through GraphQL, or one of the other APIs (e.g. Python). Do check out our video on the topic, if you want to know more details!
@@AICoffeeBreak Ah Thanks!
nice video. I am from India. I am working in the AI field. do you have any remote opportunities for me?
Thanks, first video on probing that makes sense to me. But I'm just wondering: is probing only for diagnostics, or is it an actual option for fine-tuning in production?
When the representations of the model are really good, it might happen that probing (tuning just a linear head at the end) is enough. But most of the time, in production you need new model abilities and knowledge, so fine-tuning is often the option.
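The distinction in this reply — probing trains only a linear head on frozen representations — can be sketched with a toy example. The `frozen_model` feature map and the closed-form least-squares fit are our own illustration (in practice the head is trained by gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_model(x):
    # Stand-in for a frozen network's learned representation of input x.
    return np.array([x, x ** 2, 1.0])

xs = rng.uniform(-1, 1, 100)
feats = np.stack([frozen_model(x) for x in xs])    # 100 x 3 feature matrix
ys = 2 * xs ** 2 - 0.5                             # label is linear in features

# "Probing" = training ONLY a linear head on top of the frozen features
# (here solved in closed form by least squares instead of gradient descent).
w, *_ = np.linalg.lstsq(feats, ys, rcond=None)
preds = feats @ w   # fits perfectly, because the representation is good enough
```

When the label is not linearly recoverable from the frozen features, the probe fails, and that is exactly the situation where full fine-tuning is needed.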
☺
Great video! 🤌
Awesome!!
Cool! I love these hands-on videos with code examples! Thanks for sharing the notebook!
One of the selling points of LoRA is being able to mix and match the A and B matrices from different fine-tuning runs, without having to keep the weights of the full model if they are available elsewhere. Here it seems you have to save the entire model, so this is a big tradeoff compared to LoRA and derivatives.
I am afraid you did not fully understand the mechanism of information geometry behind UMAP and how the KL divergence acts as the "spring-dampener" mechanism. Keenan Crane and Melvin Leok have great educational materials on the topic.
yes
8:39 What I meant to say is VQ-VAE, not VQGAN! Thanks to Luis Cunha for spotting this!
The concept of the rank of a matrix, taught in such an effective way.
Cheers!
I love this paper. The solution is elegant.
thanks! also very cute merch!
thanks!
I ran into trouble trying to Fine Tune a Swin model with LoRA. That type of model isn’t supported yet for LoRA. I wonder if it’ll be the same for GaLoRA
OMG, such an amazing video explaining positional embeddings....
thank you!
Thank you so much! Super helpful.
excellent explanation!
congrats leititia!!
Thank you!
The size of a LoRA (on disk) is a fraction of the size of the model it's applied to. GaLore loses this benefit by updating the weights. If you have use for many fine-tunes of a single model, two GaLore fine-tunes will take more space than ten LoRAs (depending on rank, to be fair). I assume they don't mention this very significant tradeoff, as you don't mention it in the video. That seems like a dishonest comparison, if that's the case.
Fair. It's just that HDD / SSD storage is not considered a bottleneck, while the size of the GPUs surely is. The first is cheap and abundant (terabytes), the second one is very expensive and limited (tens of gigabytes).
@@AICoffeeBreak Minor nitpick: it also allows you to fit multiple LoRAs in GPU memory at the same time and thus run inference on different fine-tunes at the same time. P.S. Congrats on the PhD!
Great video!
Thank you!
Can info shed during the lossy compression process be set aside in a non memory fashion for retrieval? Thinking out loud. Not always helpful. But state mapping would be interesting in the training process as well as post.
Great video! Got a follow up question: what kind of finetuning is the finetuning provided by the openai API, where it finetunes a model based on a training set of Q&A pairs provided by the user?
Ask Sam Altman? 😅 I didn't see any recent technical paper about fine-tuning from OpenAI, nor do they explain on their website what they do. They are too open for us to comprehend. Since they have large GPUs, it is safe to assume they are not forced to do parameter-efficient tuning like us noobs with Gaming GPUs.
Congratulations Frau Doktor.
😘 LOL! 礼 Stay for the EPIC lipstick! M4D L0V3!
Holy frick what a perfectly concise video on galore! There is a GitHub implementation of the research from the paper, they are currently working on a multigpu implementation. I too am curious how well things scale up to modern and larger llms, and have a multigpu rig I want to test it out on.
Congratulations
Thank you!
As far as I get it, I determine the gradients G and also a low-rank component P. That component allows me to "shrink" the gradient matrix G I calculate at every step down to rank R before applying it to W. So I do not save compute while calculating the gradients, but while applying and saving them (as momentum and such)?
underwhelmed?
@@elinetshaaf75 on the contrary, i am afraid that i don’t quite get it😅
I never thought I'd say this, but I'm actually excited to learn about efficient training methods for deep learning models on consumer GPUs. Who knew running out of GPU memory could be so... enlightening? Thanks for explaining LoRA and Galore in a way that doesn't make my brain hurt (too much). Now, if you'll excuse me, I have some large language models to train or at least, try not to run out of GPU memory
Cheers!
Thank you a lot for your explanation! May I know which software you use for making the animations in your videos? Thanks!
My editor uses Adobe Premiere Pro for video editing (this is also when Ms. Coffee Bean comes in). I use PowerPoint for all other visualisations.
@@AICoffeeBreak really beautiful!
Congrats on the PhD! :D
Thank you!
Do I understand it correctly that it can't work with quantization, and the model must fit in memory in 16-bit?
One important thing to note is that while this technique is more memory efficient it's also 17% slower in the setup the authors use. That's a pretty big deal, especially for pretraining.
But since it's more memory efficient, could you run more training in parallel, in a distributed or federated way?
@@tornyu Sure but it's still 17% more compute and that means 17% more money. With LLMs costing millions to train from scratch 17% more money is a big ask for pretraining.
But federated training could enable larger open source models, which until now could only be fine-tuned?
I always liked Galor(e), though I might be biased.
Congrats on your PhD! Could you use 'r' as a hyperparameter during pretraining as well? e.g. start pretraining with low r and gradually increase it as more precision is needed? I don't think it could do that much since gains are already very high at the start.
Sure. Great idea.