If you pay enough for a GPU, you can cyber with it.
What a time to be alive
/u/Lower_Tradition3145, 12023-12-11
If you've ever been struck with the strange desire to chat with your PC, you can do so because we live in the future. Here's a condensed version of what I've spent the past few months learning about.
Before we begin, here's a quick definition list of all the jargon used in this community.
Hardware Needed
- CPU: Any Ryzen or Gen 4+ Core
- The AVX2 instruction set is needed for some optimizations, unless using KoboldCPP.
- RAM: 16-128GB, depending on model complexity
- 16GB should be enough for working with 1-13B models, but you'll want to step it up to 64GB for 20-34B, 128GB for 70-120B, and so on. (There's a quick sketch of the arithmetic at the end of this section.)
- GPU: Any 700-series or later GeForce card(s) with 6+GB VRAM
Yes, it should be one from Nvidia. Some people are able to get Radeon GPUs working, but AMD's software support for their cards is a mess, and only designed for specific versions of specific operating systems. If your Radeon is 5+ years old, it probably won't work without cryptic hacker bypass shit that failed on my machine.
The more VRAM, the better; as far as speed is concerned, it's damn near the only variable that matters. I'd recommend 16GB minimum, or more if you can justify the upgrade. But 8GB also works for using smaller models at acceptable speeds.
While a GTX 1060 or newer is preferred for current software support, the older 700-900 series GeForce cards will work if you install CUDA 11.8; TextGenWebUI will give you the option to try this method on its first startup.
- NOTE: GPU is 'optional' (Patience Required if missing/unusable).
- It is possible to generate text entirely on the CPU, but doing it this way will take substantially longer. As in, it'll be an entire order of magnitude slower. This is something you definitely want a supported video card for unless you're going to run some weaksauce 'mini' model with ≤3B parameters.
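To sanity-check the RAM figures above, the arithmetic is roughly parameters times bits-per-weight, plus headroom for context and runtime. Here's a minimal sketch, assuming a ~4.5 bits-per-weight quant (about Q4_K_M territory) and a 1.2x overhead fudge factor of my own invention:

```python
# Back-of-the-envelope RAM estimate for a quantized model.
# bits_per_weight ~4.5 approximates a Q4_K_M quant; the 1.2x overhead
# factor (context, KV cache, runtime) is a rough assumption, not gospel.
def gguf_ram_gib(params_billions, bits_per_weight=4.5, overhead=1.2):
    weight_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1024**3

for size in (7, 13, 34, 70):
    print(f"{size}B: ~{gguf_ram_gib(size):.1f} GiB")
```

That lands around 4 GiB for a 7B and 44 GiB for a 70B, which squares with the tiers above once you leave room for your OS and everything else you've got open.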
Software Needed
Backend with OpenAI-compatible API
- llama.cpp
- OG command-line utility for the minimalists, purists, and control freaks.
- Ollama
- Simple interface for running LLMs locally, with its own library of compatible models.
- KoboldCPP
- Single-file LLM and Stable Diffusion interface which is a great starting point for people just getting into this. Long writing sessions go much faster by way of its god-tier Context Shifting feature.
- TextGen WebUI ("oobabooga")
- The most popular model loader, and the best option if you're the type to try a bunch of different things. Simply run the startup script to get everything installed, add "--api" to cmd_flags.txt, and enjoy. Once it's running, anything that speaks the OpenAI API can talk to it; see the sketch after this list.
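Whichever backend you pick, they all end up speaking the same OpenAI-flavored HTTP API, so a quick way to verify yours is alive is to poke it from a script. A minimal sketch, assuming TextGen WebUI's default API port of 5000 (llama.cpp's server defaults to 8080 and KoboldCPP to 5001, so adjust the URL for your setup):

```python
# Send one chat request to a local OpenAI-compatible backend.
# Port 5000 is TextGen WebUI's default with --api; swap it for your backend.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Introduce yourself in one sentence."}],
        "max_tokens": 48,
        "temperature": 0.7,
    },
    timeout=300,  # CPU-only generation can take a while
)
print(resp.json()["choices"][0]["message"]["content"])
```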
Frontend: SillyTavern
Absolute must-have for getting serious generative writing done. Lots of excellent features that you won't want to live without. Configures your inputs real nice, keeps your shit organized with a tagging system, and easily lets you create alternate universes (swipes/branching) without resorting to wiping out earlier replies or manipulating files. You can even directly modify the characters you're playing with, without having to remember specific parameters or trying to figure out JSON.
Large Language Models
Unless you have one or more RTX cards with shittons of VRAM, you're going to want to use GGUF files, as they allow the model to be partially offloaded, splitting its layers between VRAM and system RAM. This way, you can run stuff that's more advanced than you'd otherwise be able to. In my case, here's what I was able to run and how fast (there's a code sketch of layer offloading after the table):
Model | 1070 8GB | 2x 1070 8GB |
---|---|---|
7B | Q5_K_M 35/35, 17t/s | Q8_K.. 35/35, 14.65t/s |
10.7B | Untested | Q5_K_M 51/51, 12.54t/s |
13B | Q4_K_M 30/41, 5.4t/s | Q4_K_M 43/43, 11.58t/s |
20B | Q3_K_M 37/63, 3.6t/s | Q4_K_S 60/65, 3.99t/s |
34B | Untested | Q3_K_S 40/63, 3.12t/s |
t/s is tokens generated per second. What's a token? Who the fuck knows. (Roughly: a chunk of a word; figure about three-quarters of an English word on average.) The X/Y pairs are how many of the model's layers I could offload to the GPU, out of its total. To my eyes, 8/s is slow but tolerable, 12/s is a comfortable reading speed, and 16+/s outpaces me.
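To make those X/Y numbers concrete: with the llama-cpp-python bindings (one of several ways to load a GGUF), the offload count is a single parameter. A minimal sketch; the filename is a placeholder, and the layer counts are the same ones from the table:

```python
# Load a GGUF with partial GPU offload via llama-cpp-python
# (pip install llama-cpp-python, built with CUDA support).
# model_path is a placeholder; n_gpu_layers is the X in the table's X/Y --
# lower it until the model fits in VRAM, or set -1 to offload everything.
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-model.Q4_K_M.gguf",
    n_gpu_layers=30,  # e.g. 30/41 for a 13B on a single 8GB card
    n_ctx=4096,       # context window
)
out = llm("The quickest way to run a local LLM is", max_tokens=64)
print(out["choices"][0]["text"])
```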
And, of course, to chat with your system, you must provide it with some vocabulary. Do this by slapping the desired GGUF file into your preferred backend's models directory. Here are the ones I've seen recommended most often:
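If you'd rather script the slapping than click around Hugging Face, the huggingface_hub package can fetch a file straight into your models directory. A minimal sketch; the repo and filename below are just examples of the naming pattern, so substitute whatever model you actually settle on:

```python
# Download one GGUF quant from a Hugging Face repo into ./models.
# repo_id and filename are illustrative -- point them at your chosen model.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q5_K_M.gguf",
    local_dir="models",
)
print("Saved to:", path)
```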
Characters
Lastly, if you're going to do some RPing, you need to define the Rs that will be Pd. This is achieved with weird PNG files that have embedded text descriptions of the pictured characters. Here are a few creators whose characters I've been enjoying:
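If you're curious what's actually inside those weird PNGs: the common Tavern-style card stashes a base64-encoded JSON blob in a PNG text chunk named "chara". A minimal sketch using Pillow, with a placeholder filename; newer v2 cards nest their fields under a "data" key:

```python
# Extract the embedded character definition from a Tavern-style card PNG.
# Assumes the usual format: base64 JSON in a tEXt chunk keyed "chara".
import base64, json
from PIL import Image

img = Image.open("some_character.png")      # placeholder filename
card = json.loads(base64.b64decode(img.text["chara"]))
data = card.get("data", card)               # v2 cards nest fields under "data"
print(data.get("name"))
print((data.get("description") or "")[:200])
```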
Unless stated otherwise, all graphics on this page are Copyright the respective rightsholders, and all text is published under the Creative Commons Attribution-ShareAlike License.
Originally appeared on Bytemoth's Brook / CC BY-SA
Please enjoy responsibly.