Transforming LLMs with Externalized World Knowledge
The concept of externalizing world knowledge to make language models more efficient has been gaining traction in AI. Current LLMs memorize enormous amounts of factual information in their parameters, much of which is rarely useful or relevant at inference time. Offloading these "facts" would let LLMs concentrate on language and reasoning skills. One potential solution is to store world knowledge in a vector database and retrieve it on demand.
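A minimal sketch of the retrieval side of this idea: facts live outside the model and are fetched by embedding similarity. The facts, the 3-dimensional vectors, and the `retrieve` function here are all hypothetical toys for illustration; a real system would embed text with a trained encoder and use an approximate nearest-neighbor index.

```python
import numpy as np

# Toy "externalized knowledge base": facts paired with hand-made
# 3-d embedding vectors (a real system would use a trained encoder).
FACTS = [
    ("Paris is the capital of France.", np.array([0.9, 0.1, 0.0])),
    ("Water boils at 100 °C at sea level.", np.array([0.0, 0.9, 0.1])),
    ("The Moon orbits the Earth.", np.array([0.1, 0.0, 0.9])),
]

def retrieve(query_vec: np.ndarray, k: int = 1) -> list[str]:
    """Return the k facts whose embeddings are most similar to the query."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(FACTS, key=lambda f: cosine(query_vec, f[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query vector near the first fact's direction retrieves the Paris fact.
print(retrieve(np.array([0.8, 0.2, 0.1])))  # → ['Paris is the capital of France.']
```

The model itself never needs to have memorized these facts; it only has to use whatever the retriever hands it.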
However, some have questioned the feasibility of this approach: it may not be possible to cleanly separate "common sense" from "language" in LLMs, since idiomatic constructions, figures of speech, and metaphors all rely on a shared understanding of the world. Nevertheless, externalizing world knowledge could lead to fairer and more transparent language translation and LLM development.
This idea is not new: it resembles DeepMind's RETRO, which retrieves text chunks from a large external database via nearest-neighbor search during generation. The key difference is that the proposed approach aims to separate language ability from external knowledge, whereas RETRO uses external knowledge to improve model performance.
Another consideration is the role of memorization in LLMs. Memorization can be useful, but it is not always necessary and consumes model capacity. An idealized model would rely on computation, reasoning step by step (CoT) over retrieved facts, rather than on memorization, but it is uncertain whether current models can reach that level of reasoning without some degree of memorization.
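One way such a computation-over-memorization pipeline might look: supply retrieved facts in the prompt and explicitly ask the model to reason over them step by step. The `build_prompt` helper and the prompt wording below are hypothetical illustrations, not an established API.

```python
def build_prompt(question: str, retrieved_facts: list[str]) -> str:
    """Assemble a prompt that supplies facts externally and asks the
    model to reason over them step by step (chain-of-thought style),
    rather than relying on memorized knowledge."""
    context = "\n".join(f"- {fact}" for fact in retrieved_facts)
    return (
        "Use only the facts below; do not rely on memorized knowledge.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}\n"
        "Let's think step by step."
    )

prompt = build_prompt(
    "Which city is the capital of France?",
    ["Paris is the capital of France."],
)
print(prompt)
```

Whether a model trained this way could actually stop leaning on its parametric memory is exactly the open question raised above.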
Overall, the proposed approach has the potential to make LLMs more efficient and transparent. However, it is important to weigh its limitations and challenges and to continue exploring ways to improve language models.
Tags: LLMs, AI, RETRO, language models, vector database, world knowledge, language translation, CoT reasoning, memorization, model performance.