Turn it into an ik_llama.cpp K-quant, and you should be able to squeeze even more performance out of it!
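To give a feel for what a K-quant buys you, here's a heavily simplified sketch of block-wise 4-bit quantization in Python. This is my own illustration, not ik_llama.cpp's actual code: real K-quants (e.g. Q4_K) use super-blocks with quantized scales and mins, while this toy version just gives each block of weights one shared fp32 scale.

```python
import numpy as np

def quantize_block_q4(block):
    """Toy block-wise 4-bit quantization: one shared fp32 scale per
    block, values stored as signed 4-bit ints in [-8, 7]."""
    block = np.asarray(block, dtype=np.float32)
    scale = max(float(np.abs(block).max()) / 7.0, 1e-8)
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_block_q4(q, scale):
    """Recover approximate fp32 weights from the 4-bit codes."""
    return q.astype(np.float32) * scale
```

The point is that each weight costs ~4 bits instead of 16, and the per-block reconstruction error is bounded by half a quantization step, which is why a well-made quant of a good model loses so little.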
FYI, you can find more models like this by looking up a base model of interest (not the instruct version), then clicking the 'finetunes' category. For example:
https://huggingface.co/models?other=base_model%3Afinetune%3AQwen%2FQwen3-30B-A3B-Base&sort=modified
https://huggingface.co/models?other=base_model%3Afinetune%3Amistralai%2FMistral-Small-24B-Base-2501&sort=modified
This one’s also the perfect size for you, but has no finetunes yet: https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-Base-PT
One other thing: a lot of folks (like me) tend to use the base models, not the instruct finetunes, in completion mode, since they tend to be devoid of AI slop. But you have to prompt them differently from a regular LLM: instead of a multi-turn conversation, you write out a starting block of text for them to 'latch onto', and get them to continue it from your cursor.
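In practice that just means sending raw text to a completion endpoint with no chat template. Here's a minimal sketch, assuming a llama.cpp-style server running locally with a `/completion` endpoint (the field names match llama.cpp's server, but treat the URL and defaults as placeholders for your own setup):

```python
import json
import urllib.request

def build_completion_request(context, n_predict=256, temperature=0.8):
    """Build a raw-completion payload: no roles, no chat template.
    The base model simply continues `context` from where it ends."""
    return {
        "prompt": context,          # raw text the model latches onto
        "n_predict": n_predict,     # how many tokens to generate
        "temperature": temperature,
        "stream": False,
    }

def continue_text(context, url="http://127.0.0.1:8080/completion", **kw):
    """POST the payload and return the continuation. Assumes a local
    llama.cpp-compatible server is already running at `url`."""
    payload = json.dumps(build_completion_request(context, **kw)).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

So instead of asking a question, you'd write the opening of the document you want, e.g. `continue_text("The field notes, dated March 3rd, read as follows:\n")`, and let the model carry it forward.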
But prompt them right, and they will do literally whatever you want, devoid of any sycophancy or guardrails.
Mikupad is great for this since it also shows token probabilities. So you can, for instance, click on a critical word, see what 'choices' the LLM was considering internally as a set of branches, and regenerate from there.
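Those per-token 'branches' are just a softmax over the model's logits for the next position. A quick sketch of what a UI like Mikupad is displaying when you click a word (the token strings and logit values here are made up for illustration):

```python
import math

def top_choices(logits, k=3):
    """Rank candidate next tokens by probability, given a dict of
    token -> raw logit. Uses a numerically stable softmax."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```

Clicking a word in the UI is effectively showing you this ranked list; regenerating from there means picking a different branch and re-running the completion from that point.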