Vivek Haldar
LLM Agents beat Human Debaters
arxiv.org/abs/2408.04472
github.com/ZhangYiqun018/agent-for-debate
00:00 Introduction to LLM Agent Systems for Debating
00:44 Overview of Competitive Debating Structure
02:01 Four Agents in the Debating System
02:52 The Searcher Agent
03:08 The Analyzer Agent
03:54 The Writer Agent
04:12 The Reviewer/Critic Agent
05:43 Evaluating the Debating System
06:54 Comparison with Baseline and Human Evaluators
07:31 Performance Results: Debatrix Evaluation
08:27 Performance Results: Human Evaluation
09:06 GitHub Repository and Prompts
Views: 176
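For readers who want to see roughly how such a pipeline hangs together, here is a minimal sketch of a four-agent debate pipeline (searcher, analyzer, writer, reviewer) built on the OpenAI chat API. The role prompts, model choice, and helper names are illustrative assumptions, not the authors' actual prompts (those are in the linked repo).

```python
# Minimal sketch of a four-agent debate pipeline: searcher -> analyzer -> writer -> reviewer.
# Role prompts and helper names are illustrative, not taken from the paper's repo.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def run_agent(system_prompt: str, user_content: str) -> str:
    """One agent = one system prompt + one call to a chat model."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    )
    return resp.choices[0].message.content

def debate_speech(motion: str, side: str) -> str:
    evidence = run_agent(
        "You are a Searcher. List facts, examples, and statistics relevant to the motion.",
        f"Motion: {motion}\nSide: {side}")
    arguments = run_agent(
        "You are an Analyzer. Turn the evidence into structured arguments and rebuttals.",
        f"Motion: {motion}\nSide: {side}\nEvidence:\n{evidence}")
    draft = run_agent(
        "You are a Writer. Compose a persuasive competitive-debate speech from these arguments.",
        f"Motion: {motion}\nSide: {side}\nArguments:\n{arguments}")
    final = run_agent(
        "You are a Reviewer. Critique the draft for logic and style, then output an improved version.",
        f"Motion: {motion}\nDraft speech:\n{draft}")
    return final

print(debate_speech("This house would ban targeted political advertising", "proposition"))
```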

Videos

Laypeople cannot prompt LLMs
Views: 1.4K · 14 days ago
dl.acm.org/doi/10.1145/3544548.3581388 0:00 - Introduction to Large Language Models and Prompting 2:00 - Overview of the Prompting Tool Used in the Study 4:45 - Study Results: Challenges in Prompting for Non-Experts 6:49 - Fundamental Barriers and Evaluation Issues 8:10 - Conclusion: Difficulties of Prompting for General Users
GraphReader: RAG with multi-step reasoning over graphs
Views: 651 · 1 month ago
arxiv.org/abs/2406.14550v1 Previous video on GraphRAG: ua-cam.com/video/ODomovYfI6I/v-deo.html 00:00 Introduction and Background 00:31 Overview of Graph Reader System 00:55 Advantages over Traditional Models 01:36 Graph Construction Process 02:08 Query and Reasoning Workflow 03:32 Evaluation and Results 05:06 Importance of Knowledge Graph Quality 05:28 Prompts for Graph Reader System 06:20 Comp...
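A rough sketch of the GraphReader-style loop described above: an agent walks a pre-built knowledge graph, takes notes at each node, and asks the LLM at every step whether it can answer or should expand to a neighbor. The node attributes, prompts, and model choice are assumptions for illustration, not the paper's implementation.

```python
# Sketch of GraphReader-style multi-step reasoning over a pre-built knowledge graph.
# Graph format and prompts are illustrative assumptions.
import networkx as nx
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def graph_read(graph: nx.Graph, start_node: str, question: str, max_steps: int = 8) -> str:
    notes, visited, frontier = [], set(), [start_node]
    for _ in range(max_steps):
        if not frontier:
            break
        node = frontier.pop(0)
        if node in visited:
            continue
        visited.add(node)
        notes.append(f"{node}: {graph.nodes[node].get('summary', '')}")
        decision = ask(
            f"Question: {question}\nNotes so far:\n" + "\n".join(notes) +
            "\nReply 'ANSWER: <answer>' if the notes suffice, otherwise reply 'EXPAND'.")
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        frontier.extend(n for n in graph.neighbors(node) if n not in visited)
    return ask(f"Question: {question}\nAnswer using only these notes:\n" + "\n".join(notes))

# Tiny example graph (node summaries stand in for the paper's atomic facts).
g = nx.Graph()
g.add_node("Ada Lovelace", summary="Wrote the first published algorithm for the Analytical Engine.")
g.add_node("Charles Babbage", summary="Designed the Analytical Engine, a mechanical general-purpose computer.")
g.add_edge("Ada Lovelace", "Charles Babbage")
print(graph_read(g, "Ada Lovelace", "Who designed the Analytical Engine?"))
```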
Studying GSM8K Leaderboard
Views: 198 · 1 month ago
paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k arxiv.org/pdf/2404.14963v3.pdf arxiv.org/pdf/2308.07921v1.pdf 0:00 Introduction to GSM 8K Benchmark 0:47 Flattening of Scores in Recent Years 1:20 Top Approach: "Deeply Understanding the Problem" 1:45 Three-Step Problem-Solving Method 3:08 Comparison to Zero-Shot Chain of Thought 3:39 Second Top Approach: Code Generation 4:15 Code Interprete...
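A sketch of the code-generation approach mentioned above: instead of doing arithmetic in text, the model writes a small Python program that is then executed. The prompt, model choice, and fence handling are illustrative, not any specific leaderboard entry.

```python
# Sketch of "generate code instead of doing arithmetic in text" for GSM8K-style problems.
# Prompt wording and sandboxing are simplified for illustration.
from openai import OpenAI

client = OpenAI()

PROMPT = """Solve the grade-school math problem by writing a Python function
`solve()` that returns the numeric answer. Output only the code.

Problem: {problem}"""

def solve_with_code(problem: str) -> float:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(problem=problem)}])
    # Naive markdown-fence stripping; a real harness would parse the code block properly.
    code = resp.choices[0].message.content.strip().strip("`").removeprefix("python")
    namespace: dict = {}
    exec(code, namespace)  # NOTE: run untrusted model-generated code in a real sandbox
    return namespace["solve"]()

print(solve_with_code(
    "A baker makes 24 muffins per tray and bakes 7 trays. "
    "He sells 150 muffins. How many are left?"))
```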
Think-and-execute prompting for LLMs
Views: 316 · 1 month ago
arxiv.org/abs/2404.02575 00:00 Introduction 00:18 New Prompting Method for LLMs 00:37 Overview of Chain of Thought and Program of Thought 01:19 Think and Execute Approach Explained 02:06 Detailed Steps of Think and Execute 03:10 Instructor and Reasoner Models 04:02 Creating Pseudo Code in Think Phase 05:00 Experiment Results with Different Models 06:07 Benefits of Code Pre-trained Models 06:33 ...
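A sketch of the two-phase Think-and-Execute idea: an instructor model writes task-level pseudocode once, and a reasoner then simulates that pseudocode on each instance. The prompts here are paraphrased assumptions; the paper's exact prompts are in its appendix.

```python
# Sketch of Think-and-Execute two-phase prompting.
# Prompts are paraphrased for illustration.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def think(task_description: str, examples: str) -> str:
    """THINK phase: produce task-level pseudocode that solves the task in general."""
    return ask(
        f"Task: {task_description}\nExamples:\n{examples}\n"
        "Write commented pseudocode (as a Python-like function) that solves this task "
        "for any input. Do not solve the examples directly.")

def execute(pseudocode: str, instance: str) -> str:
    """EXECUTE phase: the reasoner simulates the pseudocode on one instance."""
    return ask(
        f"Pseudocode:\n{pseudocode}\n\nInput: {instance}\n"
        "Simulate the pseudocode step by step on this input, printing intermediate "
        "variables, then give the final answer.")

plan = think("Determine whether a word is a palindrome.", "level -> yes\nhello -> no")
print(execute(plan, "racecar"))
```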
AutoGen: Programming LLM Agents
Views: 337 · 2 months ago
github.com/microsoft/autogen openreview.net/pdf?id=uAjxFFing2 0:00 Introduction to Agents and LLMs 0:29 Understanding the Need for Agents 1:28 Overview of the Autogen Framework 2:05 Why Agents Work with LLMs 3:17 Autogen Structure and Abstractions 4:27 Example: Math Problem Solving Agents 6:25 Example: Retrieval Augmented Q&A 8:59 Conclusion and Wrap-up vivekhaldar.com x.com/vivekhaldar
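A minimal two-agent AutoGen example in the style of the classic pyautogen API: an assistant that writes code and a user proxy that executes it and feeds results back. The model name in the config is a placeholder, and argument defaults may differ across versions.

```python
# Minimal two-agent AutoGen example (classic pyautogen API).
# The assistant writes code; the user proxy executes it and replies with the results.
import os
import autogen

config_list = [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",            # fully automatic back-and-forth
    max_consecutive_auto_reply=5,
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The user proxy sends the task, runs any code blocks the assistant replies with,
# and loops until the assistant signals it is done.
user_proxy.initiate_chat(
    assistant,
    message="What is the 20th Fibonacci number? Write and run Python to check.",
)
```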
Fine-tuning LLMs encourages hallucinations
Views: 317 · 2 months ago
arxiv.org/abs//2405.05904 0:00 Introduction and recap of previous paper 0:29 Fine-tuning LLMs can lead to hallucination 1:18 Constructing an experiment to test the conjecture 1:51 Categorizing knowledge into four categories 2:59 Fine-tuning with different percentages of unknown examples 3:31 Impact of unknown items on fine-tuning accuracy 4:02 Fine-tuning improves utilization of pre-existing kn...
Fine-tuning or RAG?
Views: 831 · 2 months ago
arxiv.org/abs/2312.05934 0:00 Comparing Fine-tuning and Retrieval Augmented Generation 0:34 Using LLMs for Specialized Domains 1:13 Fine-tuning vs In-context Learning Techniques 2:23 Causes of LLM Factual Errors and Hallucinations 3:50 Constructing the Experiment Dataset 4:45 Models Tested and Accuracy Comparison 5:51 RAG Outperforms Fine-tuning Across Models 6:20 Why RAG Performs Better Than F...
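For contrast with fine-tuning, a minimal sketch of the RAG side of the comparison: retrieve the most similar passages by embedding similarity and put them in the prompt, with no weight updates. The embedding model, documents, and prompt wording are illustrative assumptions.

```python
# Minimal RAG sketch: embed documents once, retrieve top-k by cosine similarity,
# and stuff them into the prompt. No model weights are changed.
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = OpenAI()

docs = [
    "The Wright brothers flew the first powered airplane in 1903.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The Treaty of Versailles was signed in 1919, ending World War I.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def rag_answer(question: str, k: int = 2) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q_vec)[::-1][:k]   # cosine similarity (vectors are normalized)
    context = "\n".join(docs[i] for i in top)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}])
    return resp.choices[0].message.content

print(rag_answer("When did the first powered airplane fly?"))
```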
Fixing RAG with GraphRAG
Views: 7K · 3 months ago
arxiv.org/abs/2404.16130 0:00 Introduction to RAG and its Limitations 1:08 Sense-Making and Graph-Based Approaches to RAG 2:27 Overview of the Graph RAG Pipeline 4:19 Extracting Concepts and Relationships from Documents 5:07 Summarizing Graph Elements and Clustering into Communities 6:32 Answering Queries with Graph RAG 8:58 Evaluating Graph RAG: Datasets and Question Generation 10:42 Comparing...
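A compressed sketch of the pipeline described above: extract entity-relation triples per chunk, build a graph, cluster it into communities, summarize each community, and answer a global question over the summaries. Prompts, output formats, and the clustering algorithm (the paper uses Leiden; greedy modularity is a stand-in here) are assumptions.

```python
# Compressed GraphRAG-style sketch: extract triples per chunk, build a graph,
# cluster into communities, summarize each community, answer over the summaries.
import json
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def build_graph(chunks: list[str]) -> nx.Graph:
    g = nx.Graph()
    for chunk in chunks:
        # Assumes the model returns bare JSON; real code needs error handling.
        triples = json.loads(ask(
            "Extract entity relations from the text as a JSON list of "
            "[source, relation, target] triples. Return only JSON.\nText:\n" + chunk))
        for src, rel, dst in triples:
            g.add_edge(src, dst, relation=rel)
    return g

def answer_global_question(chunks: list[str], question: str) -> str:
    g = build_graph(chunks)
    communities = greedy_modularity_communities(g)   # stand-in for Leiden clustering
    summaries = [ask("Summarize this group of related entities and relations:\n"
                     + "; ".join(f"{u} -[{d['relation']}]-> {v}"
                                 for u, v, d in g.subgraph(c).edges(data=True)))
                 for c in communities]
    return ask(f"Question: {question}\nCommunity summaries:\n" + "\n---\n".join(summaries))
```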
LLMs improve writing-based knowledge work
Views: 282 · 3 months ago
0:00 Introduction: Impact of LLMs on Knowledge Workers 0:33 Experiment Setup: Professionals & Writing Tasks 1:02 Results Overview: Positive Effects of LLMs 2:33 Detailed Results: Time & Grade Improvements 3:29 AI Impact: Lower vs Higher Performers 4:48 Time Allocation: Shifting to Editing with LLMs 5:35 Job Satisfaction: Increased with LLM Use 5:42 Summary of Benefits: Quality & Speed Improveme...
Co-intelligence: book review
Views: 448 · 3 months ago
www.oneusefulthing.org/ www.hbs.edu/ris/Publication Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf ua-cam.com/video/ogQbgdZQiaI/v-deo.html www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html RobertRMorris/status/1611450197707464706 replika.com/ ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf ua-cam.com/video/bJIhXrfOH58/v-deo.html studio.ribbonf...
Winning prompt! $10k LLM reasoning challenge
Views: 518 · 4 months ago
0:00 Introduction to the AB Problem Challenge 0:29 Overview of the Winning Prompt 1:52 Detailed Mechanics and Problem-Solving Steps 3:43 Few-Shot Prompting with Textual Solution Format 4:53 Key Lessons from the Winning Prompt 5:53 Emphasis on Repetition and Instruction Clarity 6:40 Trial and Error Process for Prompt Creation 7:31 Alternative Approach: LLM-Generated Code Solution 8:59 Concluding...
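Not the actual winning prompt, but a scaffold illustrating the lessons drawn out in the video: spell out the mechanics explicitly, give worked few-shot examples in a fixed textual solution format, and repeat the critical instructions. All placeholder text is hypothetical.

```python
# Illustrative few-shot prompt scaffold (NOT the actual winning prompt).
FEW_SHOT_PROMPT = """You will solve a token-rewriting puzzle.

RULES (read carefully; they are repeated at the end):
1. Work strictly left to right.
2. Apply exactly one rewrite rule per step.
3. Show every intermediate state before giving the final answer.

EXAMPLE 1
Input: <example input 1>
Steps:
  state 0: <...>
  state 1: <...>
Final answer: <...>

EXAMPLE 2
Input: <example input 2>
Steps:
  state 0: <...>
  state 1: <...>
Final answer: <...>

REMEMBER: one rule per step, left to right, show all intermediate states.

Input: {problem}
Steps:"""

print(FEW_SHOT_PROMPT.format(problem="<new problem instance>"))
```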
$10k for LLM reasoning
Views: 897 · 4 months ago
0:00 Introduction and Problem Description 0:23 LLM Reasoning Challenge 0:51 Details of the $10,000 Challenge 1:15 Internet Takes Up the Challenge 2:28 Winning Entry and Success Rates 3:25 LLMs Can Do Reasoning with Prompting 4:10 Benchmarking LLM Reasoning Capabilities 5:07 Boundaries of LLM Reasoning Unclear 5:36 Claude Opus Outperforms GPT-4 6:08 Conclusion and Future Video Plans Original cla...
LLM agents do software engineering
Views: 702 · 4 months ago
0:00 Introduction to Autodev and Agent-based LLM Systems 0:48 Limitations of Co-pilot and the Need for Automation 1:10 Autodev Architecture Overview 2:01 The Role of the Conversation Manager and Agents 3:22 Demonstrating Autodev's End-to-End Flow 5:03 Comparing Autodev's Performance to GPT-4 Baseline 6:03 Autodev's Performance on the Human Eval Benchmark 6:37 Autodev's Performance on Test Gener...
LLM benchmarks
Views: 777 · 4 months ago
How are LLMs evaluated? 00:00 - Introduction and motivation for looking at LLM benchmarks 00:38 HumanEval benchmark for code synthesis 02:27 - Exploring the HumanEval dataset 03:24 - MMLU (Massive Multitask Language Understanding) benchmark 04:37 - Exploring the MMLU dataset 05:58 - BigBench meta-benchmark with 200 tasks 06:50 - Exploring a logical reasoning task in BigBench 08:13 - BigBench Ha...
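A sketch of how a HumanEval-style pass@1 check works: the model completes a function signature plus docstring, and the completion is run against the task's hidden tests. This assumes the `openai_humaneval` dataset on the Hugging Face Hub and uses a bare exec for brevity; a real harness sandboxes the untrusted code.

```python
# Sketch of a HumanEval-style pass@1 check over a few problems.
from datasets import load_dataset
from openai import OpenAI

client = OpenAI()
problems = load_dataset("openai_humaneval", split="test")

def complete(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Continue this Python code to complete the function. "
                              "Return only the code that follows, correctly indented.\n" + prompt}])
    return resp.choices[0].message.content

def passes(problem) -> bool:
    # prompt = signature + docstring; test defines check(candidate); entry_point is the function name.
    program = problem["prompt"] + complete(problem["prompt"]) + "\n" + problem["test"]
    env: dict = {}
    try:
        exec(program, env)                      # NOTE: sandbox this in real use
        env["check"](env[problem["entry_point"]])
        return True
    except Exception:
        return False

sample = problems.select(range(5))
print(sum(passes(p) for p in sample), "/ 5 passed")
```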
LLMs eat entry-level SWEs
Views: 1.1K · 5 months ago
LLMs can debug with prints
Views: 471 · 5 months ago
Determinism ⇒ Fast LLMs (Groq)
Views: 605 · 5 months ago
Self-discovery: Choosing the Best Prompt for the Problem
Views: 390 · 5 months ago
Asleep at the wheel: can AI reduce performance?
Views: 129 · 5 months ago
The Future of RAG in the Age of Large Context Windows
Views: 669 · 5 months ago
GPT-4 passed the Turing Test!
Views: 2.2K · 6 months ago
CS higher ed in North America: all the stats you should know
Views: 148 · 6 months ago
I remember @AndrejKarpathy's deleted tweet
Views: 383 · 6 months ago
LLMs for real world knowledge work
Views: 267 · 6 months ago
Watch me build a GPT for journaling
Views: 303 · 7 months ago
LLMs can "breed" their own prompts
Views: 1.2K · 7 months ago
LLMs with infinite context?
Views: 777 · 7 months ago
Can prompt engineering beat fine-tuning?
Views: 706 · 7 months ago
Can LLMs discover new math and CS?
Views: 1.1K · 7 months ago

COMMENTS

  • @somanshukumar1344 · 2 days ago

    Always wanted to see this type of content

  • @rtos · 13 days ago

    Unfortunately, even the so-called power users of LLMs with their own UA-cam channels always seem to have a small set of stock prompts, which get repeated with every new review. If LLMs were trained on these specific questions, then they're going to start appearing super intelligent! Things like "why is the sky blue" or "write a snake game in Python" are hardly a test of machine intelligence, as all that is needed is to be trained on accurate code or factual data.

  • @matty-oz6yd · 16 days ago

    I really value what you do my dude <3

  • @RAHUDAS · 17 days ago

    Bot designer ltd crashed, I'm not able to access it.

  • @ashwinnair5803 · 17 days ago

    Why not just use RAPTOR instead?

  • @yiwensin5913 · 18 days ago

    Excellent! I didn't know you before and I just stumbled upon your video while searching for material on prompting LLMs (for a local LLM project). You now have a new sub :)

  • @user-wr4yl7tx3w · 18 days ago

    Can you discuss DSPy and give your opinion on it, given how it is related to prompting?

  • @arthurdhonneur276 · 18 days ago

    Nice video thank you very much !

  • @Starhopp3r · 26 days ago

    Excellent review! Thank you. I really enjoyed this book and have been recommending it to people in order to help them set expectations about “AI” without excessive optimism or pessimism. Currently reading Deep Utopia; hope to see your review soon!

  • @user-bw6oi5mf9y · 1 month ago

    I think the main problem is how they traverse the graph by asking the LLM at each step. I'm not sure this is feasible in production.

  • @sasha297603ha · 1 month ago

    Great paper, thanks for covering!

  • @user-wr4yl7tx3w · 1 month ago

    Excellent content. Well explained.

  • @PreetiGuptaAvril · 1 month ago

    Sir, do you explore the vision domain as well, or do you know of any such YouTuber whom I can follow for paper explanations?

  • @themax2go · 1 month ago

    very well "ragged"... both on the local domain (details) and global domain (overview of pros-cons) 😉😎

  • @wayneqwele8847 · 1 month ago

    Thank you for the video, that was a great paper to go through. I find RAG research techniques have so much insight to how we can develop and identify our own cognitive impediments to our own judgement. The Comprehensiveness, Diversity of perspective, Empowerment and Directness is such a good mental model to use in our own human judgement.

  • @fintech1378 · 1 month ago

    super excellent video

  • @goelnikhils · 1 month ago

    Amazing explanation Vivek

  • @awakenwithoutcoffee · 1 month ago

    Great presentation Vivek. Some questions:
    - Is GraphRAG production ready? If not, would it be difficult to upgrade RAG methods once we are in production?
    - Is there a RAG provider/stack that you prefer? (DataStax, Pinecone, Weaviate, plus a bunch of others who are all competing for attention)
    - What are your thoughts on LangChain vs LangGraph?

  • @christopherconyers767 · 1 month ago

    Awesome review - thanks for the great work!

  • @brandonheaton6197 · 1 month ago

    can you pontificate on the combination of upcoming transformer inference ASICs with deep agentic workflows employing GraphRAG style strategies? Seems like we will be close to our personal assistants writing a PhD thesis in the background whenever we ask a question. SOHU is reporting 500,000 tokens per second with Llama3 70B....

  • @RoulDukeGonzo · 1 month ago

    Seems clear that for 'current events' rag is going to win, but for broader, domain specific themes or logic, how does fine tuning stack up? E.g. create code using our internal suite of APIs... If context is big enough, icl should be fine, but rag may miss some key docs based on semantic similarity alone... I guess... I should write a paper 😂

  • @sasha297603ha · 1 month ago

    Very interesting papers, thanks for covering!

  • @stevenwatson2927 · 1 month ago

    It's surprising to see ChatGPT achieving below 99% when Wolfram Alpha can basically answer anything just by having specific knowledge. It's also surprising that "playing" with the wording of the prompt does anything at all, let alone gives a better result. It makes no sense, especially when we can clearly see from the research that the information entropy is basically the same between prompts with and without extra steps.

  • @therobotocracy · 1 month ago

    Is it flattening out because it maxes at %100?

    • @VivekHaldar · 1 month ago

      Yes, that too! People have started looking at harder benchmarks like GSM8k-Hard and MATH.

  • @karinlv890 · 1 month ago

    Thank you for saving my group meeting! Your video helps a lot!

  • @wanfuse · 1 month ago

    Wouldn't it cut to the chase to train an LLM on your own data? There's your graph. Use one of these: OpenAI's GPT-3/4, Hugging Face Transformers (e.g., GPT-2, GPT-3 via third-party providers), Google's T5 (Text-to-Text Transfer Transformer), Meta's BART and BlenderBot, Anthropic's Claude. Update the LLM every week. Summarization is the death of real data; better off with one level of summarization? Just a thought!

    • @mccleod6235 · 1 month ago

      Maybe you don't want to send all your valuable business data to third party companies.

    • @wanfuse · 1 month ago

      @mccleod6235 That's true, but it's not necessary; there are open-source models you can train air-gapped on a Jetson.

    • @bohnohboh676 · 1 month ago

      "every week update the llm" yeah no way unless you have tons of cash, compute, and time

    • @wanfuse · 1 month ago

      Maybe, maybe not, I'll let you know! You're probably right; we'll see if my idea pans out.

  • @rafikyahia7100 · 1 month ago

    Excellent content summarizing cutting edge approaches, thank you!

  • @sasha297603ha · 1 month ago

    Very interesting paper! Looks like team lead model and a bunch of juniors 😅 Thanks for covering!

  • @christopherd.winnan8701 · 1 month ago

    Are there any models where we can try this think and exe method for ourselves?

    • @VivekHaldar · 1 month ago

      As described in the paper, the authors tried it with GPT-3.5 and Llama. The prompts are in the paper; you could try it with any LLM of your choice.

  • @vida91963 · 2 months ago

    Nice presentation thank you!

  • @jordycollingwood · 2 months ago

    Really great explanation. I'm currently struggling to decide on my own KG structure for a corpus of 2,000 medical PDFs, so this was very helpful.

    • @awakenwithoutcoffee · 1 month ago

      Same here brother. There are so many techniques; every day I learn something new, which is both good and terrifying, ha. What stack are you thinking of using? We are researching DataStax, Pinecone, Weaviate and are learning to build agents with LangGraph.

  • @kaixiliu7469 · 2 months ago

    Thanks for sharing the review Vivek! Would you mind sharing your book list as well?

    • @VivekHaldar · 1 month ago

      Hey Kaixi! Don't have an explicit list, just pick up what looks interesting at the time... :-)

  • @btscheung · 2 months ago

    Really appreciate your in depth review of the book! This provides more thoughtful reading when I start the book.

  • @thankqwerty · 2 months ago

    Thanks for sharing the paper. In my experience using Llama3-8B on my benchmark dataset, I noticed that the LLM has learned a fact that is incorrect or contradicts my application. I tried to clarify that in the prompt, but noticed the LLM is actually quite stubborn, which leads to quite fragile responses, i.e. the LLM sometimes gets it right and sometimes gets it wrong with minimal changes to the prompt, which could be as small as adding spaces. I wonder if you have come across a similar situation or papers that discuss this behavior. Thanks.

    • @VivekHaldar · 2 months ago

      Yes that kind of brittleness is a common issue unfortunately.

  • @harivarsha4016 · 2 months ago

    I love this kind of content, please never stop !!!

  • @atomwalk · 2 months ago

    Awesome work! Thanks🤗

  • @user-wr4yl7tx3w · 2 months ago

    More agent papers please. Thanks 😊

  • @willtipton1698 · 2 months ago

    Nice video ty

  • @colinwar · 2 months ago

    You ask vanilla questions if you can't un-cloak a machine response! The reasoning is not there with language models; how stupid are people to not be able to ask the right questions? I call lies on these claims. Show the test as proof; I doubt you can or will show the actual test. This is absurd.

  • @gilinachum · 2 months ago

    But why is the paper's fine-tuning different from the original pre-training and alignment fine-tuning that came before it? All of them expose the model to a mix of existing and new data...

    • @VivekHaldar · 2 months ago

      You are correct: in principle, fine-tuning works the same way as pre-training (updating weights), so FT can be thought of as continued PT. The difference is in the data used. One fine-tunes when one has a domain-specific dataset that's very different from the pre-training data.
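A minimal sketch of that point: fine-tuning uses the same mechanism as pre-training (gradient updates on the next-token objective), just continued on a small domain-specific corpus. The model name and corpus here are placeholders.

```python
# Fine-tuning as continued pre-training: same next-token objective, new domain data.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                      # placeholder; any causal LM fine-tunes the same way
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token            # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

domain_corpus = [
    "<domain-specific document 1>",
    "<domain-specific document 2>",
]
loader = DataLoader(domain_corpus, batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for texts in loader:
        batch = tok(list(texts), return_tensors="pt", padding=True, truncation=True)
        labels = batch["input_ids"].clone()
        labels[batch["attention_mask"] == 0] = -100   # ignore padding in the loss
        out = model(**batch, labels=labels)            # same objective as pre-training
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```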

  • @hosseinmohammadi4574 · 2 months ago

    Interesting! Tnx

  • @sasha297603ha · 2 months ago

    Very interesting paper, thanks for covering!

  • @HampusAhlgren · 2 months ago

    Just wanted to say I really appreciate your videos. Everything is short and concise and I love that you’re always using papers as the foundation for the conclusions. Keep it up!

    • @VivekHaldar · 2 months ago

      Thanks for the kind words. That's the idea!

  • @dennyoviedo4102 · 2 months ago

    Good brother 😊 thanks for an excellent explanation, peer-2-peer of BTC formula. I'll eat this info into my brain 🧠 until my neurons start a new circuit. 😂

  • @sasha297603ha · 3 months ago

    Very interesting paper, thanks for covering!

  • @MatySiman · 3 months ago

    Great video! Why was the > 1 a mistake? Didn't it return False as it should?

    • @MatySiman · 3 months ago

      More specifically, I find this sentence a bit weird: "This means that if any element has more than 1 duplicate, the function will return False. However, the task requires that if there are more than 1 duplicate of the same number, the function should return False."

    • @MatySiman · 2 months ago

      @VivekHaldar

  • @christopherd.winnan8701 · 3 months ago

    Does this also mean that experts in their field might choose to wait for improved AI abilities so that they can do more than just superficial improvements? I predict that we will see a tsunami of low quality generations followed by a true paradigm leap in terms of content.

    • @VivekHaldar · 3 months ago

      They don't need to wait. You can get pretty far before hitting the limits of current SOTA models. There is a tiny fraction of writers who are sought out for their unique voice. Everyone else is producing generic sounding copy, ripe for replacement.

    • @christopherd.winnan8701 · 3 months ago

      @VivekHaldar I still cannot find a model that can handle more advanced tasks. Do you have any recs?

    • @VivekHaldar · 3 months ago

      @christopherd.winnan8701 What's an example of an advanced task you see LLMs having trouble with? See the recent videos (two weeks ago) on the channel about the $10k reasoning challenge for an example problem and the resulting prompt that solved it.

    • @christopherd.winnan8701 · 3 months ago

      @VivekHaldar Is it an open-access LLM?