If you pay enough for a GPU, you can cyber with it.
What a time to be alive
/u/Lower_Tradition3145, 12023-12-11
If you've ever been struck with the strange desire to chat with your PC, you can do so because we live in the future. Here's a condensed version of what I've spent the past few months learning about.
Before we begin, here's a quick definition list of all the jargon used in this community.
Hardware Needed
- CPU: Any Ryzen or Gen 4+ Core
- The AVX2 instruction set is needed for some optimizations, unless using KoboldCPP.
- RAM: 16-128GB, depending on model complexity
- 16GB should be enough for working with 1-13B models, but you'll want to step it up to 64GB for 20-34B, 128GB for 70-120B, and so on. (There's a quick sketch of the arithmetic at the end of this section.)
- GPU: Any 700-series or later GeForce card(s) with 6+GB VRAM
Yes, it should be one from Nvidia. Some people are able to get Radeon GPUs working, but AMD's software support for their cards is a mess, and only designed for specific versions of specific operating systems. If your Radeon is 5+ years old, it probably won't work without cryptic hacker bypass shit that failed on my machine.
The more VRAM, the better; as far as speed is concerned, it's damn near the only variable that matters. I'd recommend 16GB minimum, or more if you can justify the upgrade. But 8GB also works for using smaller models at acceptable speeds.
While a GTX 1060 or newer is preferred for current software support, the older 700-900 series GeForce cards will work if you install CUDA 11.8; TextGenWebUI will give you the option to try this method on its first startup.
- NOTE: GPU is 'optional' (Patience Required if missing/unusable).
- It is possible to generate text entirely on the CPU, but doing it this way will take substantially longer. As in, it'll be an entire order of magnitude slower. This is something you definitely want a supported video card for unless you're going to run some weaksauce 'mini' model with ≤3B parameters.
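To sanity-check the RAM figures above, the arithmetic is roughly parameters times bits-per-weight, plus headroom for context and runtime. Here's a minimal sketch, assuming a ~4.5 bits-per-weight quant (about Q4_K_M territory) and a 1.2x overhead fudge factor of my own invention:

```python
# Back-of-the-envelope RAM estimate for a quantized model.
# bits_per_weight ~4.5 approximates a Q4_K_M quant; the 1.2x overhead
# factor (context, KV cache, runtime) is a rough assumption, not gospel.
def gguf_ram_gib(params_billions, bits_per_weight=4.5, overhead=1.2):
    weight_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1024**3

for size in (7, 13, 34, 70):
    print(f"{size}B: ~{gguf_ram_gib(size):.1f} GiB")
```

That lands around 4 GiB for a 7B and 44 GiB for a 70B, which squares with the tiers above once you leave room for your OS and everything else you've got open.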
Software Needed
Backend with OpenAI-compatible API
- llama.cpp
- OG command-line utility for the minimalists, purists, and control freaks.
- Ollama
- Simple interface for running LLMs locally, with its own library of compatible models.
- KoboldCPP
- Single-file LLM and Stable Diffusion interface which is a great starting point for people just getting into this. Long writing sessions go much faster by way of its god-tier Context Shifting feature.
- TextGen WebUI ("oobabooga")
- The most popular model loader, and the best option if you're the type to try a bunch of different things. Simply run the startup script to get everything installed, add "--api" to cmd_flags.txt, and enjoy. Once it's running, anything that speaks the OpenAI API can talk to it; see the sketch after this list.
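Whichever backend you pick, they all end up speaking the same OpenAI-flavored HTTP API, so a quick way to verify yours is alive is to poke it from a script. A minimal sketch, assuming TextGen WebUI's default API port of 5000 (llama.cpp's server defaults to 8080 and KoboldCPP to 5001, so adjust the URL for your setup):

```python
# Send one chat request to a local OpenAI-compatible backend.
# Port 5000 is TextGen WebUI's default with --api; swap it for your backend.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Introduce yourself in one sentence."}],
        "max_tokens": 48,
        "temperature": 0.7,
    },
    timeout=300,  # CPU-only generation can take a while
)
print(resp.json()["choices"][0]["message"]["content"])
```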
Frontend: SillyTavern
Absolute must-have for getting serious generative writing done. Lots of excellent features that you won't want to live without. Configures your inputs real nice, keeps your shit organized with a tagging system, and easily lets you create alternate universes (swipes/branching) without resorting to wiping out earlier replies or manipulating files. You can even directly modify the characters you're playing with, without having to remember specific parameters or trying to figure out JSON.
Large Language Models
Unless you have one or more RTX cards with shittons of VRAM, you're going to want to use GGUF files, as they allow the model to be partially offloaded, splitting its layers between VRAM and system RAM. This way, you can run stuff that's more advanced than you'd otherwise be able to. In my case, here's what I was able to run and how fast (there's a code sketch of layer offloading after the table):
Model | 1070 8GB | 2x 1070 8GB |
---|---|---|
7B | Q5_K_M 35/35, 17t/s | Q8_K.. 35/35, 14.65t/s |
10.7B | Untested | Q5_K_M 51/51, 12.54t/s |
13B | Q4_K_M 30/41, 5.4t/s | Q4_K_M 43/43, 11.58t/s |
20B | Q3_K_M 37/63, 3.6t/s | Q4_K_S 60/65, 3.99t/s |
34B | Untested | Q3_K_S 40/63, 3.12t/s |
t/s is tokens generated per second. What's a token? Who the fuck knows. (Roughly: a chunk of a word; figure about three-quarters of an English word on average.) The X/Y pairs are how many of the model's layers I could offload to the GPU, out of its total. To my eyes, 8/s is slow but tolerable, 12/s is a comfortable reading speed, and 16+/s outpaces me.
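To make those X/Y numbers concrete: with the llama-cpp-python bindings (one of several ways to load a GGUF), the offload count is a single parameter. A minimal sketch; the filename is a placeholder, and the layer counts are the same ones from the table:

```python
# Load a GGUF with partial GPU offload via llama-cpp-python
# (pip install llama-cpp-python, built with CUDA support).
# model_path is a placeholder; n_gpu_layers is the X in the table's X/Y --
# lower it until the model fits in VRAM, or set -1 to offload everything.
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-model.Q4_K_M.gguf",
    n_gpu_layers=30,  # e.g. 30/41 for a 13B on a single 8GB card
    n_ctx=4096,       # context window
)
out = llm("The quickest way to run a local LLM is", max_tokens=64)
print(out["choices"][0]["text"])
```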
And, of course, to chat with your system, you must provide it with some vocabulary. Do this by slapping the desired GGUF file into your preferred backend's models directory. Here are the ones I've seen recommended most often:
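If you'd rather script the slapping than click around Hugging Face, the huggingface_hub package can fetch a file straight into your models directory. A minimal sketch; the repo and filename below are just examples of the naming pattern, so substitute whatever model you actually settle on:

```python
# Download one GGUF quant from a Hugging Face repo into ./models.
# repo_id and filename are illustrative -- point them at your chosen model.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q5_K_M.gguf",
    local_dir="models",
)
print("Saved to:", path)
```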
Characters
Lastly, if you're going to do some RPing, you need to define the Rs that will be Pd. This is achieved with weird PNG files that have embedded text descriptions of the pictured characters. Here are a few creators whose characters I've been enjoying:
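If you're curious what's actually inside those weird PNGs: the common Tavern-style card stashes a base64-encoded JSON blob in a PNG text chunk named "chara". A minimal sketch using Pillow, with a placeholder filename; newer v2 cards nest their fields under a "data" key:

```python
# Extract the embedded character definition from a Tavern-style card PNG.
# Assumes the usual format: base64 JSON in a tEXt chunk keyed "chara".
import base64, json
from PIL import Image

img = Image.open("some_character.png")      # placeholder filename
card = json.loads(base64.b64decode(img.text["chara"]))
data = card.get("data", card)               # v2 cards nest fields under "data"
print(data.get("name"))
print((data.get("description") or "")[:200])
```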
Unless stated otherwise, all graphics on this page are Copyright the respective rightsholders, and all text is published under the Creative Commons Attribution-ShareAlike License.
Originally appeared on Bytemoth's Brook / CC BY-SA
Please enjoy responsibly.