Large Language Models (LLMs) of sufficient size exhibit emergent abilities, such as learning from their input context and decomposing a complex problem into a chain of simpler steps. These emergent abilities, together with overall model performance, suggest that model size plays a key role in creating a powerful language model. The LLM4all project focuses on resolving two outstanding problems for such LLMs.
First, it addresses the question of how to take a foundation model and continue its training while avoiding the well-known problem of catastrophic forgetting, in which the original model loses some of its knowledge and abilities. Such continual pretraining would bypass the need to retrain a model from scratch on both the old and new data, a process so costly that even high-profile commercial models tend not to be trained on the most recent data. LLM4all will develop an approach to continual pretraining by, for instance, combining neural networks with sparse architectures.
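As a purely illustrative sketch of this idea (not the project's actual method), the snippet below freezes a pretrained layer and learns only a sparse set of additional parameters during continual pretraining, so the original weights, and the knowledge they encode, are left untouched. The class name, sparsity level, and placeholder loss are assumptions made for the example.

import torch
import torch.nn as nn

class SparseDelta(nn.Module):
    """Trainable sparse update on top of a frozen pretrained linear layer.

    Only the entries selected by `mask` are learned; the pretrained weights
    stay fixed, which limits interference with previously acquired knowledge.
    Hypothetical illustration, not the LLM4all method.
    """
    def __init__(self, base: nn.Linear, sparsity: float = 0.95):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze original knowledge
        mask = (torch.rand_like(base.weight) > sparsity).float()
        self.register_buffer("mask", mask)               # fixed sparse support
        self.delta = nn.Parameter(torch.zeros_like(base.weight))

    def forward(self, x):
        # Effective weight = frozen pretrained weight + sparse learned correction.
        w = self.base.weight + self.delta * self.mask
        return nn.functional.linear(x, w, self.base.bias)

# Continual-pretraining step: only the sparse deltas receive gradients.
layer = SparseDelta(nn.Linear(512, 512))
opt = torch.optim.AdamW([layer.delta], lr=1e-4)
x = torch.randn(8, 512)
loss = layer(x).pow(2).mean()                            # placeholder for an LM loss
loss.backward()
opt.step()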
Second, the project aims to reduce the cost of running such large models and will explore a variety of potential solutions depending on the target use case. These include optimization algorithms that trade speed for memory, voluntary collaborative computing that distributes the computation load across several nodes, sparse approaches such as Mixture of Experts, and distillation approaches when the target task is known.
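As a hedged illustration of the sparse Mixture-of-Experts idea mentioned above (a generic sketch, not code from the project), the minimal layer below routes each token to a single expert, so only a fraction of the layer's parameters is used per token; all names and sizes are invented for the example.

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal top-1 Mixture-of-Experts feed-forward layer (illustrative only)."""
    def __init__(self, d_model: int = 256, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, num_experts)
        probs = scores.softmax(dim=-1)
        top_p, top_idx = probs.max(dim=-1)     # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = top_idx == e                 # tokens routed to expert e
            if sel.any():
                out[sel] = top_p[sel].unsqueeze(-1) * expert(x[sel])
        return out

tokens = torch.randn(10, 256)
print(TinyMoE()(tokens).shape)                 # torch.Size([10, 256])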
Beyond releasing generic, multimodal LLMs, these approaches will also be validated on two use cases in French: automatic meeting summarization and emergency call analysis.
ANR project
1 October 2023 – 31 March 2027
Lead: LORIA