Anyone who has used generative AI for any length of time will be more than familiar with hallucinations. These occur when AI systems generate false or misleading information, an error that often stems from limitations in their training data or model design. Such inaccuracies can emerge unpredictably and vary widely in severity, from minor factual slips to substantial biases that can significantly skew decision-making.
Lamini Memory Tuning aims to cut hallucinations from 50% to 5%, a 90% reduction. The technology embeds exact facts into LLMs and reportedly achieves accuracy rates of up to 95%, a significant jump from the roughly 50% accuracy offered by previous methods.
By tuning millions of expert adapters, such as LoRAs (Low-Rank Adaptations), onto each open-source LLM, Lamini Memory Tuning accurately preserves facts ranging from historical events to complex technical data, without the high latency and costs typically associated with such precision.
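To make the idea concrete, here is a minimal sketch of what a single low-rank adapter of this kind might look like, assuming a PyTorch-style linear layer. The class name, rank, and scaling factor are illustrative choices for explanation, not Lamini's actual implementation.

```python
import torch
import torch.nn as nn

class LoRAAdapter(nn.Module):
    """Illustrative low-rank adapter: output = W x + (alpha / r) * B A x,
    where A and B are small trainable matrices added to a frozen base layer."""
    def __init__(self, base_linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear                      # frozen pretrained weights
        self.base.weight.requires_grad_(False)
        in_f, out_f = base_linear.in_features, base_linear.out_features
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # trainable, low-rank
        self.B = nn.Parameter(torch.zeros(out_f, rank))         # starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base model output plus a low-rank correction that can store new facts.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because only the small A and B matrices are trained, many such adapters can be kept per model at a fraction of the cost of full fine-tuning.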
Mixture of memory experts
This method, inspired by mind mapping, selectively activates the most relevant experts from an index during inference, dramatically reducing unnecessary computations.
For example, the company says that when the system is instructed to recall specific facts about the Roman Empire, it retrieves only the necessary information about Julius Caesar, aqueducts, or legions, avoiding the activation of irrelevant model weights.
The underlying technology behind Lamini Memory Tuning includes a sparse activation framework known as a Mixture of Memory Experts (MoME), which can scale to support a large number of facts limited only by the size of the training data. Lamini says this approach not only improves model responsiveness, but also significantly reduces computational requirements, making it a viable solution for improving the performance of LLMs in various applications.
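The routing idea can be illustrated with a short sketch: a bank of small low-rank experts is indexed by keys, and only the few experts whose keys best match the current hidden state are applied. This is a simplified assumption-based example in PyTorch, not Lamini's MoME code; the class name, shapes, and top-k choice are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryExpertRouter(nn.Module):
    """Illustrative sparse routing: select the few experts most relevant to a
    query and apply only those, leaving all other expert weights untouched."""
    def __init__(self, hidden_dim: int, num_experts: int, rank: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One key per expert; the "index" here is a simple similarity lookup.
        self.expert_keys = nn.Parameter(torch.randn(num_experts, hidden_dim))
        # Each expert is a small low-rank pair (A_i, B_i).
        self.A = nn.Parameter(torch.randn(num_experts, rank, hidden_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, hidden_dim, rank))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) hidden state acting as the query into the index.
        scores = h @ self.expert_keys.T                    # (batch, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)     # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(h)
        for slot in range(self.top_k):
            A = self.A[idx[:, slot]]                       # (batch, rank, hidden_dim)
            B = self.B[idx[:, slot]]                       # (batch, hidden_dim, rank)
            delta = torch.bmm(B, torch.bmm(A, h.unsqueeze(-1))).squeeze(-1)
            out = out + weights[:, slot:slot + 1] * delta
        return h + out                                     # only selected experts contribute
```

The design point this sketch captures is that capacity can grow with the number of experts while per-token compute stays roughly constant, since only the selected handful of adapters participate in each forward pass.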