Stack Llama and Vicuna-13B Comparison
Stack Llama, available through the TRL library, is an RLHF-trained model that performs well on logical tasks, comparable to the standard Vicuna-13B 1.1 in initial testing. However, it requires about 25.2 GB of dedicated GPU VRAM and takes approximately 12 seconds to load.
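The ~25 GB figure is consistent with simple back-of-the-envelope arithmetic for a 13B-parameter model in half precision. The sketch below is illustrative only: the helper name is made up, and the real footprint also includes activations, the KV cache, and framework overhead on top of the raw weights.

```python
# Rough sanity check of the ~25 GB VRAM figure for a 13B-parameter model.
# Assumption: weights stored in fp16/bf16, i.e. 2 bytes per parameter.

def estimate_weight_vram_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the GPU memory needed just to hold the weights, in gigabytes."""
    return num_params * bytes_per_param / 1024**3

print(estimate_weight_vram_gb(13e9))  # roughly 24.2 GB for the fp16 weights alone
```

The small gap between ~24.2 GB for the weights and the observed ~25.2 GB is plausibly runtime overhead.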
The Stack Llama model was trained with RLHF using the TRL library: supervised fine-tuning on Stack Exchange question-and-answer data, followed by reward modeling and PPO optimization. This pipeline aims to align the model with human preferences and make it more robust to noisy training data, and the Stack Exchange corpus spans many topics, which helps the model generalize across question answering and text generation tasks.
In comparison charts, TheBloke_stable-vicuna-13B-HF and eachadea_vicuna-13b-1.1 are two models that have shown low perplexity in testing. The former is a stabilized Vicuna variant available on Hugging Face; the latter has been benchmarked against other models in the same charts.
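For readers unfamiliar with the metric used in those charts: perplexity is the exponential of the average negative log-likelihood the model assigns to each token, so lower is better. A minimal sketch (the function name and the toy log-probabilities are illustrative, not from the charts):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood over the tokens)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 0.5 to every token has perplexity 2.
print(perplexity([math.log(0.5)] * 4))  # 2.0
```

In practice the log-probabilities come from running the model over a held-out corpus, but the aggregation is exactly this.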
Open source in this context means that the code is available for review, use, and modification by anyone, without payment or license fees. However, the Llama models are not completely free to use in practice, as running them requires a large amount of computing resources.
Delta weights are weights trained on additional data that can be combined with the original Llama weights to reconstruct the full fine-tuned model. It is unclear from the comments whether the Hugging Face-format version of the weights is required for this combination. The commercial usefulness of the Llama models remains debated.
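Mechanically, combining delta weights with the base weights is a keywise elementwise addition over the two checkpoints. The sketch below uses plain dictionaries of numbers for clarity; real checkpoints hold tensors (e.g. PyTorch state dicts), and releases like Vicuna ship their own merge scripts, but the core operation is the same. The function name is hypothetical.

```python
def apply_delta(base_weights, delta_weights):
    """Merge delta weights into base weights by keywise addition.

    Illustrative sketch: real checkpoints map parameter names to tensors,
    but the merge is the same per-parameter addition shown here.
    """
    if base_weights.keys() != delta_weights.keys():
        raise ValueError("base and delta checkpoints must have matching keys")
    return {k: base_weights[k] + delta_weights[k] for k in base_weights}

# Toy example with one scalar "parameter":
merged = apply_delta({"layer.weight": 1.0}, {"layer.weight": 0.5})
print(merged)  # {'layer.weight': 1.5}
```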
There is enthusiasm for the StableLM releases and hope for future initiatives.
For more information on Stack Llama, refer to the Hugging Face blog post.