Knowledge Science - Alles über KI, ML und NLP

Episode 149 - English AI generated : KS Pulse - Theorem Proving, LongEmbed

April 23, 2024 · Sigurd Schacht, Carsten Lanquillon · Season 1, Episode 149

English version - a German version also exists, but the content differs only minimally:
AI-generated news of the day. The Pulse is an experiment to see whether it is interesting to get the latest news in small, five-minute packages generated by an AI every day.

It is completely AI-generated. Only the content is curated. Carsten and I select suitable news items. After that, both the manuscript and the audio file are created completely automatically.

Accordingly, we cannot always guarantee accuracy.

Topic 1: Towards Large Language Models as Copilots for Theorem Proving in Lean https://arxiv.org/pdf/2404.12534.pdf
Topic 2: LongEmbed: Extending Embedding Models for Long Context Retrieval https://arxiv.org/abs/2404.12096

It would be great if you compared the German version to the English version and gave us feedback.


Welcome to the Knowledge Science Pulse podcast, where we discuss the latest advancements in AI and machine learning. I'm your host Sigurd, and today I'm joined by my co-host Carsten. We have an exciting episode for you today, delving into two fascinating research papers that explore the intersection of large language models, theorem proving, and embedding models for long context retrieval. Carsten, let's dive right in!

#### Absolutely, Sigurd! The first paper we'll discuss is "Towards Large Language Models as Copilots for Theorem Proving in Lean" by Song et al. This work introduces Lean Copilot, a framework for running large language model inference in the Lean theorem prover. Instead of proving theorems autonomously, the authors propose using LLMs as copilots to assist humans in the proving process.

#### That's a very interesting approach, Carsten. The paper showcases tools built using Lean Copilot for suggesting proof steps, completing intermediate proof goals, and selecting relevant premises. Users can bring their own pre-trained models or use the provided ones, running either locally or on the cloud. The experimental results demonstrate the effectiveness of this method in assisting humans and automating the theorem proving process compared to existing rule-based proof automation in Lean.
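As a rough illustration, a Lean Copilot session might look like the following sketch. The tactic name `suggest_tactics` follows the paper's description of its step-suggestion tool, but the exact setup and theorem here are hypothetical:

```lean
import LeanCopilot

-- Toy example: the user states a goal, then invokes the LLM-backed
-- tactic; suggested next steps appear in the Lean infoview, and the
-- user can accept one with a click instead of typing it manually.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  suggest_tactics
```

The key design point is that the human stays in the loop: the model proposes, Lean checks, and the user decides which suggestion to keep.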

#### Indeed, Sigurd. The authors also open-source their code under a permissive MIT license to facilitate further research in this area. Moving on to the second paper, "LongEmbed: Extending Embedding Models for Long Context Retrieval" by Zhu et al., the authors explore extending the context window of embedding models for long input scenarios.

#### Yes, while large language models have surpassed the million-token context limit, embedding models are still confined to a narrow context window not exceeding 8k tokens. This paper introduces the LongEmbed benchmark, comprising synthetic tasks and carefully chosen real-world tasks featuring documents of varying lengths and dispersed target information.

#### The benchmarking results underscore the need for improvement in current embedding models. The authors demonstrate that training-free context window extension strategies, such as position interpolation, can effectively extend the context window of existing embedding models severalfold, regardless of whether their original context window is 512 or beyond 4k tokens.
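The core of position interpolation can be sketched in a few lines: the positions of a long input are linearly rescaled so they all fall inside the position range the model was trained on. This is a minimal sketch of the general idea, not the authors' implementation:

```python
def interpolate_positions(seq_len, orig_max_len):
    """Map the positions of a long sequence into the model's original
    trained position range by linear scaling (position interpolation).

    Short inputs keep their integer positions; long inputs get
    fractional positions squeezed into [0, orig_max_len).
    """
    if seq_len <= orig_max_len:
        return list(range(seq_len))
    scale = orig_max_len / seq_len
    return [i * scale for i in range(seq_len)]

# A 2048-token input is squeezed into the 0..512 range a model with a
# 512-token training window has already learned to handle.
positions = interpolate_positions(2048, 512)
```

Because every rescaled position stays within the trained range, the model never sees an out-of-distribution position index, which is why this works without any retraining.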

#### Moreover, for models employing absolute position encoding (APE), the authors show the possibility of further fine-tuning to achieve notable performance gains while strictly preserving original behavior for short inputs. For models using rotary position embedding (RoPE), significant enhancements are observed when employing RoPE-specific methods, such as NTK and SelfExtend, indicating RoPE's superiority over APE for context window extension.
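To illustrate one such RoPE-specific method: NTK-aware scaling raises RoPE's frequency base so that the low rotary frequencies stretch to cover a longer context while the high frequencies stay nearly unchanged. This sketch uses the commonly cited base-scaling formula, not necessarily the paper's exact variant:

```python
def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies for a head dimension `dim`."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def ntk_scaled_inv_freq(dim, scale, base=10000.0):
    """NTK-aware scaling: enlarge the base by scale^(dim/(dim-2)) so
    the slowest-rotating dimensions span `scale` times more context."""
    new_base = base * scale ** (dim / (dim - 2))
    return [new_base ** (-2 * i / dim) for i in range(dim // 2)]

orig = rope_inv_freq(64)
scaled = ntk_scaled_inv_freq(64, 4.0)  # 4x context extension
```

The highest frequency (index 0) is identical in both lists, while the lowest frequency shrinks, which is exactly the asymmetric stretching that makes NTK scaling gentler than plain linear interpolation.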

#### To facilitate future research, the authors release their models along with the LongEmbed benchmark. This work paves the way for developing embedding models capable of handling long context scenarios, opening up new possibilities for various applications.

#### Absolutely, Carsten. Both papers contribute significantly to the advancement of AI and machine learning, particularly in the areas of theorem proving and long context retrieval. The Lean Copilot framework demonstrates the potential for human-AI collaboration in interactive theorem proving, while the LongEmbed benchmark and the context window extension strategies provide valuable insights for developing more capable embedding models.

#### Indeed, Sigurd. These papers not only push the boundaries of what's possible with large language models and embedding models but also highlight the importance of benchmarking and open-sourcing code to drive progress in the field. We're excited to see how these advancements will shape the future of AI and its applications.

#### Well, that wraps up our discussion for today. We hope you found this episode informative and engaging. Join us next time as we continue to explore the latest developments in AI and machine learning. Until then, stay curious and keep learning!