AI Coffee Break with Letitia
  • Videos: 111
  • Views: 1,484,025
Supercharging RAG with Generative Feedback Loops from Weaviate
How do you supercharge RAG applications? With Generative Feedback Loops: feed data from a database to your LLM, store the outputs back into the database together with a vector embedding, and then search the generated data in near real time for future applications.
In this video, we explain RAG and Generative Feedback Loops, give examples of applications that require them, and show you how to implement them with @Weaviate.
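The loop described above can be sketched in a few lines of plain Python. This is a toy, in-memory stand-in (the `VectorStore`, `embed`, and `fake_llm` helpers are made up for illustration; they are not the Weaviate client or a real embedding model), just to show the retrieve → generate → store-back cycle:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real app would use an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """In-memory stand-in for a vector database like Weaviate."""
    def __init__(self):
        self.items = []  # (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def fake_llm(prompt):
    # Stand-in for a real LLM call.
    return "Summary: " + prompt

db = VectorStore()
db.add("Weaviate is a vector database storing objects and embeddings.")

# 1) RAG step: retrieve context and generate with the LLM.
context = db.search("what is Weaviate?", k=1)[0]
generated = fake_llm(context)

# 2) Feedback step: store the generation back, vector-embedded.
db.add(generated)

# The generated text is now searchable for future queries.
print(db.search("summary", k=1)[0])
```

In a real deployment the store and the embedding model are external services, and the write-back happens asynchronously, but the cycle is the same.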
Weaviate (Sponsor) 👉 weaviate.io/
AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/
Outline:
00:00 Generative Feedback Loops Motivation
01:32 RAG explained
03:21 Generative Feedback Loops
05:09 Concrete example with Weaviate code
08:30 DSPy: More Applications of Generative Feed...
Views: 3,367

Videos

GaLore EXPLAINED: Memory-Efficient LLM Training by Gradient Low-Rank Projection
8K views · a month ago
We explain GaLore, a new memory-efficient training technique that outperforms LoRA in accuracy and supports both pre-training and fine-tuning. Now you can train LLMs without running out of GPU memory! You can even pre-train a LLaMA-7B from scratch on a single 24 GB GPU (NVIDIA RTX 4090), for example. AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Thanks to our Patrons who support us i...
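A rough numpy sketch of the core mechanism as I read it (simplified: the actual method also keeps the Adam states in the subspace and recomputes the projection only periodically): the gradient is projected into a low-rank subspace, and the update is projected back before being applied to the full weights.

```python
# Sketch of gradient low-rank projection (simplified; not the official GaLore code).
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4                       # weight shape and projection rank

W = rng.standard_normal((m, n))           # full weight matrix (trained as usual)
G = rng.standard_normal((m, n))           # full gradient dL/dW

# Projection from the top-r left singular vectors of the gradient
U, _, _ = np.linalg.svd(G, full_matrices=False)
P = U[:, :r]                              # (m, r)

G_low = P.T @ G                           # (r, n): optimizer states live here
update = P @ G_low                        # project back up to (m, n)

W -= 0.01 * update                        # apply the low-rank update to full W

# Optimizer-state memory shrinks from m*n to r*n floats per weight matrix.
print(G_low.size, "vs", G.size)           # 128 vs 2048
```

Unlike LoRA, the full weight matrix W is updated, which is why the method works for pre-training, not only fine-tuning.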
Shapley Values Explained | Interpretability for AI models, even LLMs!
4.6K views · a month ago
Ever wondered how to interpret your machine learning models? 🤔 We explain a powerful interpretability technique for machine learning models: Shapley values. They can be used to explain any model. 💻 We show a simple code example of how they work, and then explain the theory behind them. AssemblyAI (Sponsor) 👉 www.assemblyai.com/research/universal-1/? AI Coffee Break Merch! 🛍️ aicoffeebreak.crea...
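For intuition, here is the exact Shapley value computed by brute force for a tiny hypothetical 3-player game (the payoff table is made up for illustration; for real models one uses sampling approximations, e.g. the SHAP library, since the exact sum is exponential in the number of features):

```python
# Brute-force Shapley values for a tiny hypothetical 3-player game.
from itertools import permutations

players = ["A", "B", "C"]

payoffs = {
    frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20,
    frozenset("C"): 30, frozenset("AB"): 40, frozenset("AC"): 50,
    frozenset("BC"): 60, frozenset("ABC"): 90,
}

def value(coalition):
    # Characteristic function: what a coalition "earns" together.
    return payoffs[frozenset(coalition)]

def shapley(player):
    # Average marginal contribution of `player` over all join orders.
    orders = list(permutations(players))
    total = sum(
        value(set(order[:order.index(player)]) | {player})
        - value(order[:order.index(player)])
        for order in orders
    )
    return total / len(orders)

vals = {p: shapley(p) for p in players}
print(vals)                          # {'A': 20.0, 'B': 30.0, 'C': 40.0}

# Efficiency axiom: the values sum to the grand coalition's payoff.
assert abs(sum(vals.values()) - value(players)) < 1e-9
```

For a model explanation, "players" become input features and `value` becomes the model's prediction on a feature subset.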
Stealing Part of a Production LLM | API protect LLMs no more
16K views · 2 months ago
How is it possible to steal part of an LLM protected behind an API? 🥷 We explain both papers that made a breakthrough on this: one from Carlini et al. (Google), and the other from Finlayson et al. (USC); see references below. SPONSOR: AssemblyAI 👉 www.assemblyai.com/research/universal-1/? AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📄 Carlini, Nicholas, Daniel Paleka, Krishnamu...
Genie explained 🧞 Generative Interactive Environments paper explained
3.5K views · 4 months ago
Genie just watched YouTube videos, inferred actions from them, and learned to render environments! 🗺️ In this video, we explain the Genie paper from Google DeepMind: "Genie: Generative Interactive Environments". AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏 Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael Outline:...
MAMBA and State Space Models explained | SSM explained
41K views · 4 months ago
We simply explain and illustrate Mamba, State Space Models (SSMs), and Selective SSMs. SSMs match the performance of transformers but are faster and more memory-efficient, which is crucial for long sequences! AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Celebrating our merch launch, here is a limited-time offer! 👉 Get a 25% discount on AI Coffee Break Merch with the code MAMBA...
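The linear recurrence at the core of an SSM fits in a few lines (a toy discrete SSM with fixed, made-up parameters; Mamba's "selective" twist makes A, B, C depend on the input, which is not shown here):

```python
# Toy discrete state-space model:
#   h_t = A h_{t-1} + B x_t,    y_t = C h_t
import numpy as np

def ssm_scan(A, B, C, xs):
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:                      # sequential scan: O(L) in sequence length
        h = A @ h + B * x             # state update
        ys.append(C @ h)              # readout
    return np.array(ys)

A = np.array([[0.9, 0.0],             # decaying, diagonal toy dynamics
              [0.0, 0.5]])
B = np.array([1.0, 1.0])
C = np.array([1.0, -1.0])

# An impulse input: the output traces the system's impulse response.
ys = ssm_scan(A, B, C, xs=[1.0, 0.0, 0.0])
print(ys)                             # approximately [0, 0.4, 0.56]
```

Because the recurrence is linear (for fixed parameters), it can also be computed as a convolution or a parallel scan, which is where the speed advantage over attention's quadratic cost comes from.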
Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained
5K views · 5 months ago
Contextual sparsity: Take an LLM and make it sparse at inference time. In this video, we explain how the DEJAVU method implements contextual sparsity. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📜 Liu, Zichang, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava et al. "Deja vu: Contextual sparsity for efficient llms at inference time." In Internati...
Transformers explained | The architecture behind LLMs
20K views · 5 months ago
All you need to know about the transformer architecture: how to structure the inputs, attention (queries, keys, values), positional embeddings, residual connections. Bonus: an overview of the difference between Recurrent Neural Networks (RNNs) and transformers. Correction at 9:19: the order of multiplication should be the opposite, x1 (vector) * Wq (matrix) = q1 (vector); otherwise we do not get the 1x3 dimensionali...
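Scaled dot-product attention, sketched as code in the row-vector convention that the correction in the description uses (x1 @ Wq = q1). Single head, no masking, and the dimensions are made up for illustration:

```python
# Minimal single-head scaled dot-product attention.
import numpy as np

def attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # project inputs to Q, K, V
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # (L, L) scaled similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                           # weighted mix of values

rng = np.random.default_rng(0)
L, d = 4, 3                                      # sequence length, model dim
X = rng.standard_normal((L, d))                  # one row per token
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

out = attention(X, Wq, Wk, Wv)
print(out.shape)                                 # (4, 3)
```

Each output row is a convex combination of the value vectors, with the mixing weights given by how well that token's query matches every key.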
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained
19K views · 6 months ago
Direct Preference Optimization (DPO) fine-tunes LLMs without reinforcement learning. At NeurIPS 2023, DPO was one of the two Outstanding Main Track Runner-Up papers. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📜 Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. "Direct preference optimization: Your language model is secretly a reward mod...
LLM hallucinations discover new math solutions!? | FunSearch explained
12K views · 6 months ago
The unreasonable effectiveness of guided confabulation: solving math problems with hallucinating LLMs is now possible!? 🤯 We explain how Google DeepMind did it. Bonus: the answer of 2022 Fields Medalist Hugo Duminil-Copin at #HLF23, when I asked about AI helping mathematicians solve problems. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📚 FunSearch blog post: deepmind...
DALL-E 3 is better at following Text Prompts! Here is why. - DALL-E 3 explained
3.9K views · 7 months ago
Synthetic captions help DALL-E 3 follow text prompts better than DALL-E 2. We explain how OpenAI innovates the training of diffusion models with better image captions. ► Sponsor: Gradient 👉 gradient.1stcollab.com/aicoffeebreak 📜 „ Improving Image Generation with Better Captions“ James Betker et al., 2023 cdn.openai.com/papers/dall-e-3.pdf 📚 openai.com/dall-e-3 📜 The Google Paper about recaption...
Adversarial Attacks and Defenses. The Dimpled Manifold Hypothesis. David Stutz from DeepMind #HLF23
2.9K views · 8 months ago
🎙️ Interview with David Stutz from Google DeepMind at the 10th HLF. We spoke about adversarial attacks and defenses for neural networks and about hypotheses that aim to explain their existence. The HLF is an annual gathering of 200 young researchers from math and computer science and laureates of the most prestigious awards in these two fields, such as the ACM Turing Award, ACM Prize in Computi...
What is LoRA? Low-Rank Adaptation for finetuning LLMs EXPLAINED
36K views · 9 months ago
How does LoRA work? Low-Rank Adaptation for parameter-efficient LLM fine-tuning explained. It works for any other neural network as well, not just LLMs. ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ 📜 „Lora: Low-rank adaptation of large language models“ Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L. and Chen, W., 2021. arxiv.org/abs/2106.09685 📚 sebas...
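A minimal numpy sketch of the idea (dimensions are made up; the init follows the paper's convention: A random, B zero, so training starts exactly at the pretrained model):

```python
# LoRA sketch: freeze W, learn a low-rank update W_eff = W + (alpha/r) * B @ A.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))       # trainable, random init
B = np.zeros((d_out, r))                 # trainable, zero init

def forward(x):
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal(d_in)

# At initialization the LoRA branch contributes nothing:
assert np.allclose(forward(x), x @ W.T)

# Only A and B are trained: 2*r*d parameters instead of d*d.
print(A.size + B.size, "vs", W.size)     # 32 vs 64
```

In practice only A and B receive gradients, so the optimizer state and the saved adapter are tiny compared to the full model.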
Are ChatBots their own death? | Training on Generated Data Makes Models Forget - Paper explained
5K views · 10 months ago
If LLMs flood the Internet with content over the next few years, they will likely sign their own death certificate. How likely is that to happen, and why is training on AI-generated content so bad? ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring.com/ Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏 Dres. Trost GbR, Siltax, Edvard Grødem, Vignesh Valliappan, Mutual Information, Kshitij 📜...
The first law on AI regulation | The EU AI Act
10K views · 11 months ago
The European Union is the first major regulator to propose a concrete law for regulating AI, and it will surely not be the last. Let's have a look at what's in the draft of the EU's AI Act and what it means for researchers, consumers, and citizens inside and outside the EU. ► Sponsor: AssemblyAI www.assemblyai.com/ 👉 LeMUR Blog: www.assemblyai.com/blog/lemur 👉 LeMUR Playground: www.assemblyai.com/pl...
Author Interviews, Poster Highlights, Summary of the ACL 2023 Toronto NLP
3.8K views · 11 months ago
Author Interviews, Poster Highlights, Summary of the ACL 2023 Toronto NLP
ChatGPT is not an intelligent agent. It is a cultural technology. - Gopnik Keynote
3.8K views · 11 months ago
ChatGPT is not an intelligent agent. It is a cultural technology. - Gopnik Keynote
[Own work] MM-SHAP to measure modality contributions
4.1K views · a year ago
[Own work] MM-SHAP to measure modality contributions
Eight Things to Know about Large Language Models
18K views · a year ago
Eight Things to Know about Large Language Models
Moral Self-Correction in Large Language Models | paper explained
3.4K views · a year ago
Moral Self-Correction in Large Language Models | paper explained
AI beats us at another game: STRATEGO | DeepNash paper explained
4.2K views · a year ago
AI beats us at another game: STRATEGO | DeepNash paper explained
Why ChatGPT fails | Language Model Limitations EXPLAINED
8K views · a year ago
Why ChatGPT fails | Language Model Limitations EXPLAINED
"Watermarking Language Models" paper and GPTZero EXPLAINED | How to detect text by ChatGPT?
8K views · a year ago
"Watermarking Language Models" paper and GPTZero EXPLAINED | How to detect text by ChatGPT?
Training learned optimizers: VeLO paper EXPLAINED
5K views · a year ago
Training learned optimizers: VeLO paper EXPLAINED
ChatGPT vs Sparrow - Battle of Chatbots
23K views · a year ago
ChatGPT vs Sparrow - Battle of Chatbots
Paella: Text to image FASTER than diffusion models | Paella paper explained
9K views · a year ago
Paella: Text to image FASTER than diffusion models | Paella paper explained
Generate long form video with Transformers | Phenaki from Google Brain explained
11K views · a year ago
Generate long form video with Transformers | Phenaki from Google Brain explained
Movie Diffusion explained | Make-a-Video from MetaAI and Imagen Video from Google Brain
14K views · a year ago
Movie Diffusion explained | Make-a-Video from MetaAI and Imagen Video from Google Brain
Beyond neural scaling laws - Paper Explained
12K views · a year ago
Beyond neural scaling laws - Paper Explained
How does Stable Diffusion work? - Latent Diffusion Models EXPLAINED
87K views · a year ago
How does Stable Diffusion work? - Latent Diffusion Models EXPLAINED

Comments

  • @supanutsookkho2749 · 2 days ago

    Great video and a good explanation. Thanks for your hard work on this amazing video!!

  • @robert75019 · 8 days ago

    Very clear 👏👏

  • @uw10isplaya · 9 days ago

    Had to go back and rewatch a section after I realized I'd been spacing out staring at the coffee bean's reactions.

  • @RyanRobots · 9 days ago

    Thanks!

    • @AICoffeeBreak · 9 days ago

      Thank you! Wow, this is an old video, now that I think about it. 😅

  • @maxvell77 · 10 days ago

    Most insightful explanation I have found on this subject so far. I was looking for it for days... Thank you! Keep going, you rock!

    • @AICoffeeBreak · 10 days ago

      Thank you a lot! Also for the super thanks!

  • @maxvell77 · 10 days ago

    Thanks!

  • @elpepemandioca · 11 days ago

    Hey Letitia, I think you should consider changing the logo. It kinda looks like a turd.

  • @partywen · 14 days ago

    Super informative and helpful! Thanks a lot!

  • @daniyalkabir6527 · 15 days ago

    Excellent video

  • @heejuneAhn · 16 days ago

    BEST of BEST explanation: 1) visually, 2) intuitively, 3) with numerical examples. And your English is easier for non-native listeners to understand than a native speaker's.

  • @nogueirad · 16 days ago

    Awesome! Thanks for sharing!

  • @user-wr4yl7tx3w · 17 days ago

    But is the benefit worth the potential negatives? Surely there are tradeoffs here.

  • @marcfruchtman9473 · 17 days ago

    Thanks for the video. So, is Weaviate a database service that leverages an LLM to generate natural-language queries to access the data? Basically, rather than hiring an SQL programmer, you just feed Weaviate the data (somehow) and use the LLM to generate the queries, which are then fed back to Weaviate so it can provide the query results?

    • @AICoffeeBreak · 14 days ago

      Thanks for watching it and for your question! Weaviate is a vector database (we've made a whole video on that topic, it's linked in the description, if you're interested), which means in addition to storing structured and unstructured data, like a "classical" database, it also stores the vector embeddings of the items. This way you can find related terms and use natural language queries. The default way to interact with Weaviate as a developer would be through GraphQL, or one of the other APIs (e.g. Python). Do check out our video on the topic, if you want to know more details!

    • @marcfruchtman9473 · 14 days ago

      @@AICoffeeBreak Ah Thanks!

  • @TemporaryForstudy · 17 days ago

    Nice video. I am from India and working in the AI field. Do you have any remote opportunities for me?

  • @kfliden · 17 days ago

    Thanks, this is the first video on probing that makes sense to me. But I'm wondering: is probing just for diagnostics, or is it an actual option for fine-tuning in production?

    • @AICoffeeBreak · 17 days ago

      When the representations of the model are really good, it might happen that probing (tuning just a linear head at the end) is enough. But most of the time, in production you need new model abilities and knowledge, so fine-tuning is often the option.

  • @Ben_D. · 17 days ago

  • @femkeplantinga3726 · 17 days ago

    Great video! 🤌

  • @connorshorten6311 · 17 days ago

    Awesome!!

  • @DerPylz · 17 days ago

    Cool! I love these hands-on videos with code examples! Thanks for sharing the notebook!

  • @vladimirtchuiev2218 · 18 days ago

    One of the selling points of LoRA is being able to mix and match the A and B matrices from different fine-tuning runs without having to keep the weights of the full model, if they are available elsewhere. Here it seems you have to save the entire model, so this is a big tradeoff compared to LoRA and its derivatives.

  • @renanmonteirobarbosa8129 · 19 days ago

    I am afraid you did not fully understand the mechanism of information geometry behind UMAP and how the KL divergence acts as the "spring-dampener" mechanism. Keenan Crane and Melvin Leok have great educational materials on the topic.

  • @brunofilipeaguiar · 19 days ago

    yes

  • @AICoffeeBreak · 20 days ago

    8:39 What I meant to say is VQ-VAE, not VQGAN! Thanks to Luis Cunha for spotting this!

  • @soulfuljourney22 · 20 days ago

    The concept of the rank of a matrix, taught in such an effective way.

  • @HomunMage · 24 days ago

    I love this paper. The solution is elegant.

  • @nerisozen9029 · 25 days ago

    thanks! also very cute merch!

  • @proterotype · 28 days ago

    I ran into trouble trying to fine-tune a Swin model with LoRA; that type of model isn't supported yet. I wonder if it'll be the same for GaLore.

  • @DeepakKori-vn8zr · 28 days ago

    OMG, such an amazing video explaining positional embeddings....

  • @AaronNicholsonAI · a month ago

    Thank you so much! Super helpful.

  • @ravindrasharma85 · a month ago

    excellent explanation!

  • @vardhan254 · a month ago

    Congrats Letitia!!

  • @ArnaldurBjarnason · a month ago

    The size of a LoRA (on disk) is a fraction of the size of the model it's applied to. GaLore loses this benefit by updating the weights directly. If you have use for many fine-tunes of a single model, two GaLore fine-tunes will take more space than ten LoRAs (depending on rank, to be fair). I assume they don't mention this very significant tradeoff, as you don't mention it in the video. That seems like a dishonest comparison, if that's the case.

    • @AICoffeeBreak · a month ago

      Fair. It's just that HDD / SSD storage is not considered a bottleneck, while the size of the GPUs surely is. The first is cheap and abundant (terabytes), the second one is very expensive and limited (tens of gigabytes).

    • @vinno97 · 14 days ago

      @@AICoffeeBreak Minor nitpick: it also allows you to fit multiple LoRAs in GPU memory at the same time and thus run inference on different fine-tunes at the same time. P.S. Congrats on the PhD!

  • @cosmic_reef_17 · a month ago

    Great video!

  • @timothywcrane · a month ago

    Can info shed during the lossy compression process be set aside in a non memory fashion for retrieval? Thinking out loud. Not always helpful. But state mapping would be interesting in the training process as well as post.

  • @learnsomethingnew1651 · a month ago

    Great video! Got a follow-up question: what kind of fine-tuning is provided by the OpenAI API, where it fine-tunes a model based on a training set of Q&A pairs provided by the user?

    • @AICoffeeBreak · a month ago

      Ask Sam Altman? 😅 I didn't see any recent technical paper about fine-tuning from OpenAI, nor do they explain on their website what they do. They are too open for us to comprehend. Since they have large GPUs, it is safe to assume they are not forced to do parameter-efficient tuning like us noobs with Gaming GPUs.

  • @alexis-michelmugabushaka2297 · a month ago

    Congratulations Frau Doktor.

  • @SU3D3 · a month ago

    😘 LOL! 礼 Stay for the EPIC lipstick! M4D L0V3!

  • @AaronALAI · a month ago

    Holy frick, what a perfectly concise video on GaLore! There is a GitHub implementation of the research from the paper; they are currently working on a multi-GPU implementation. I too am curious how well things scale up to modern, larger LLMs, and I have a multi-GPU rig I want to test it out on.

  • @shahidjabbar5933 · a month ago

    Congratulations

  • @IxCIHAoX · a month ago

    As far as I get it, I determine the gradient G and also a low-rank projection P. That projection allows me to "shrink" the gradient matrix G I calculate at every step down to rank r before applying it to W. So I do not save compute while calculating the gradients, but while applying and storing them (as momentum and such)?

    • @elinetshaaf75 · a month ago

      underwhelmed?

    • @IxCIHAoX · a month ago

      @@elinetshaaf75 On the contrary, I am afraid that I don't quite get it 😅

  • @MechanicumMinds · a month ago

    I never thought I'd say this, but I'm actually excited to learn about efficient training methods for deep learning models on consumer GPUs. Who knew running out of GPU memory could be so... enlightening? Thanks for explaining LoRA and GaLore in a way that doesn't make my brain hurt (too much). Now, if you'll excuse me, I have some large language models to train, or at least try not to run out of GPU memory.

  • @jiahao2709 · a month ago

    Thank you a lot for your explanation! May I know which software you use for making the animations in your videos? Thanks!

    • @AICoffeeBreak · a month ago

      My editor uses Adobe Premiere Pro for video editing (this is also when Ms. Coffee Bean comes in). I use PowerPoint for all other visualisations.

    • @jiahao2709 · a month ago

      @@AICoffeeBreak really beautiful!

  • @JosephCatrambone · a month ago

    Congrats on the PhD! :D

  • @vladimirnadvornik8254 · a month ago

    Do I understand correctly that it can't work with quantization, and that the model must fit in memory in 16-bit?

  • @TheRyulord · a month ago

    One important thing to note is that while this technique is more memory efficient it's also 17% slower in the setup the authors use. That's a pretty big deal, especially for pretraining.

    • @tornyu · a month ago

      But since it's more memory efficient, could you run more training in parallel, in a distributed or federated way?

    • @TheRyulord · a month ago

      @@tornyu Sure but it's still 17% more compute and that means 17% more money. With LLMs costing millions to train from scratch 17% more money is a big ask for pretraining.

    • @tornyu · a month ago

      But federated training could enable larger open source models, which until now could only be fine-tuned?

  • @amitgalor · a month ago

    I always liked Galor(e), though I might be biased.

  • @azertyQ · a month ago

    Congrats on your PhD! Could you use 'r' as a hyperparameter during pretraining as well? e.g. start pretraining with low r and gradually increase it as more precision is needed? I don't think it could do that much since gains are already very high at the start.