Automated Reasoning with Language Models
Automated reasoning with language models is a fascinating area for probing whether models can genuinely reason. Recently, a model named Supercot showed unexpected proficiency in prose and story writing. However, it is essential to use original riddles, or to modify existing ones, to ensure that the models are actually reasoning and not merely reproducing answers already published on the web.
Several models have been run through a series of reasoning tasks, among them Vicuna-1.1-Free-V4.3-13B-ggml-q5_1. It performed well apart from losing two points on the coding questions. With those two coding points excluded, Koala scored slightly higher than Vicuna-1.1-Free-V4.3-13B-ggml-q5_1. Stable Vicuna and Open Assistant did not perform as well as expected, even though Open Assistant was partly trained on a dataset aimed at reasoning tasks.
The tests involved solving riddles, which is an effective way to probe reasoning skills. Automating the process is challenging, however, because model answers do not always arrive in a consistent format. The suggested approach is to write a script that automates prompting and scoring, and then to test different sampling parameters.
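As a rough illustration of that approach, the sketch below loops over a small riddle set, queries a model backend, normalizes the free-form answers, and tallies a score across sampling parameters. The riddle list, the query_model stub, and the parameter values are placeholders for illustration, not the setup actually used in these tests.

    import re

    # Placeholder riddle set; the actual test suite is not public, so this
    # single entry (the David riddle mentioned below) is for illustration only.
    RIDDLES = [
        {
            "prompt": "David has three sisters. Each of them has one brother. "
                      "How many brothers does David have?",
            "expected": "0",
        },
    ]

    def query_model(prompt, temperature=0.7, top_p=0.9):
        # Stand-in for whatever inference backend is under test (llama.cpp,
        # text-generation-webui, an HTTP API, ...). Returns a canned reply so
        # the sketch runs end to end; replace with a real call.
        return "David has zero brothers."

    def extract_number(answer):
        # Models rarely answer in a fixed format, so pull the first standalone
        # number (digit or number word) out of the free-form reply.
        words_to_digits = {"zero": "0", "none": "0", "one": "1",
                           "two": "2", "three": "3"}
        for token in re.findall(r"[a-z]+|\d+", answer.lower()):
            if token.isdigit():
                return token
            if token in words_to_digits:
                return words_to_digits[token]
        return None

    def run_suite(temperature, top_p):
        # Score one pass over the riddle set with the given sampling parameters.
        score = 0
        for riddle in RIDDLES:
            reply = query_model(riddle["prompt"], temperature=temperature, top_p=top_p)
            if extract_number(reply) == riddle["expected"]:
                score += 1
        return score

    if __name__ == "__main__":
        # Sweep a couple of parameter settings to see how sensitive the score is.
        for temp in (0.2, 0.7):
            print(f"temperature={temp}: {run_suite(temperature=temp, top_p=0.9)}/{len(RIDDLES)}")

Normalizing the answer before comparison is the key design choice here: it lets one script score models that reply with "0", "zero", or a full sentence, without forcing a rigid output format on them.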
The experiment showed that WizardLM performed better than expected, beating all of its same-parameter peers; only the 13B Wizard model scored higher. The correct answer to the riddle about David's brothers is that David has zero brothers: the one brother each of his sisters has is David himself.
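To make that reasoning explicit, here is a tiny check that models the family directly. It assumes the familiar phrasing of the riddle in which David has three sisters, which may differ from the exact wording used in these tests.

    # Minimal sanity check of the riddle's logic, assuming David has three
    # sisters and each sister has exactly one brother.
    family = [
        {"name": "David", "sex": "M"},
        {"name": "sister 1", "sex": "F"},
        {"name": "sister 2", "sex": "F"},
        {"name": "sister 3", "sex": "F"},
    ]

    # Each sister's "one brother" can only be David himself, so the set of
    # brothers that David has (males other than David) is empty.
    brothers_of_david = [p["name"] for p in family
                         if p["sex"] == "M" and p["name"] != "David"]
    print(len(brothers_of_david))  # prints 0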
If you're interested in learning more about automated reasoning, you can check out this article on Automated Reasoning.