Another option available is to self-host an LLM on your own infrastructure. This effectively prevents your data from being sent to third parties.

It can run fully offline on your own device, but you'll need the required hardware resources, and you'll have to understand some more advanced concepts related to LLMs.
Parameter Count
Of course, using Q4_0 will introduce some rounding error in the quantization step, but it's usually an acceptable trade-off for the smaller model size.
I highlighted the Q4_K_S and Q4_K_M quantization methods since they usually offer the best balance between model size and accuracy.
They usually use a bit more than 4 bits per parameter but have better precision than plain Q4_0. If you're pulling a model from ollama without specifying the precision, there's a high chance you'll get the Q4_K_M variant, since it has been the default for some time.
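As a rough sanity check on sizes (a back-of-the-envelope estimate that ignores metadata and runtime overhead): an 8B-parameter model at about 4.5 bits per parameter works out to roughly 8,000,000,000 × 4.5 / 8 ≈ 4.5 GB of weights, which is why such models fit on a GPU with 6GB of VRAM with some room left over for the context.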
Contrary to what companies in the field often say, AI isn't a silver bullet. It won't solve most of the problems we face as privacy-conscious people.
However, when it comes to self-hosted models, there are some good use cases even for privacy and anonymity. We already discussed how stylometry protection can be achieved with an LLM running locally.
Translation - LLMs provide high-quality, real-time translations, allowing for communication across languages without external data leaks.
Rewriting - They assist in paraphrasing content to protect against stylometry or to improve its flow (see the example after this list).
Solving Problems - LLMs can be used as personal assistants to answer everyday questions and help with personal issues.
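As a concrete illustration of the rewriting use case, ollama exposes a local HTTP API on port 11434 that you can call directly once it's running; the model tag below is only an example, use whatever you have pulled:
oxeo@andromeda:~$ curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Paraphrase the following text so it does not resemble my usual writing style: ...",
  "stream": false
}'
The request never leaves your machine, which is the whole point of self-hosting.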
We'll show how to check prompt length and set the appropriate context size in Open WebUI.
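Until then, a rough way to estimate prompt length yourself is the common approximation of about 4 characters per token for English text; assuming your prompt is saved in a file such as prompt.txt (a hypothetical name), this gives a ballpark token count:
oxeo@andromeda:~$ echo $(( $(wc -c < prompt.txt) / 4 ))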
To follow this tutorial, you'll need a system running Debian 12. Although ollama can work on CPU only, performance will be much worse than with a model that fits entirely in the GPU's VRAM.
To comfortably use an 8B model, it's strongly advised to have a dedicated GPU with at least 6GB of VRAM. You can check the supported GPU models here.
This tutorial showcases ollama setup with Nvidia drivers, but AMD GPUs are also supported.
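If you're unsure how much VRAM your card actually has, nvidia-smi will report it once the driver is installed (driver installation is covered below):
oxeo@andromeda:~$ nvidia-smi --query-gpu=name,memory.total --format=csv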
If you want to expose Open WebUI via Tor to access it remotely, you should have an onion v3 vanity address and Tor installed.
It's also possible to set this up inside Proxmox VE or any KVM-based VM. You just need to PCI-passthrough the appropriate GPU in the Hardware tab:
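If you prefer the CLI to the web interface, the same passthrough can be done with Proxmox's qm tool; the VM ID 100 and PCI address 01:00.0 below are placeholders, so find the real ones with qm list and lspci (pcie=1 also assumes the q35 machine type):
root@proxmox:~# qm set 100 --hostpci0 01:00.0,pcie=1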
To install Docker, follow the official guide: Install Docker Engine on Debian. After installation, add your user to the docker group:
oxeo@andromeda:~$ sudo /sbin/usermod -aG docker oxeo
oxeo@andromeda:~$ sudo systemctl enable docker
This ensures you can manage Docker without needing sudo privileges. Finally, reboot your system.
oxeo@andromeda:~$ sudo systemctl reboot
Update your package list to include "contrib non-free" at the end of every line in /etc/apt/sources.list:
deb http://deb.debian.org/debian/ bookworm main contrib non-free
deb-src http://deb.debian.org/debian/ bookworm main contrib non-free
deb http://security.debian.org/debian-security bookworm-security main contrib non-free
deb-src http://security.debian.org/debian-security bookworm-security main contrib non-free
Run:
oxeo@andromeda:~$ sudo apt update
oxeo@andromeda:~$ sudo apt install linux-headers nvidia-driver firmware-misc-nonfree
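One assumption worth noting: the Docker test below also needs the NVIDIA Container Toolkit so that Docker can use the nvidia runtime. If the verification fails, install the nvidia-container-toolkit package from NVIDIA's repository and let it configure Docker:
oxeo@andromeda:~$ sudo apt install nvidia-container-toolkit
oxeo@andromeda:~$ sudo nvidia-ctk runtime configure --runtime=docker
oxeo@andromeda:~$ sudo systemctl restart docker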
To verify installation, execute:
oxeo@andromeda:~$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
This command checks if the NVIDIA driver is accessible within Docker.
Create a docker-compose.yml file in ~/openwebui-stack with the following contents. This setup uses ollama for LLM management and open-webui as the user interface.
services:
ollama:
image: ollama/ollama
container_name: ollama
volumes:
  open-webui: {}
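For reference, a full file for this stack can look something like the sketch below; the image tags, the localhost port 3000 mapping (which the Tor configuration later in this guide points at), the OLLAMA_BASE_URL variable and the GPU reservation are assumptions based on the upstream Open WebUI examples, so adjust them to your setup:
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama:/root/.ollama           # persist downloaded models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia           # hand the NVIDIA GPU to the container
              count: all
              capabilities: [gpu]
    restart: unless-stopped
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "127.0.0.1:3000:8080"          # web UI reachable on localhost:3000 only
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data   # persist chats and settings
    depends_on:
      - ollama
    restart: unless-stopped
volumes:
  ollama: {}
  open-webui: {}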
To start the stack:
oxeo@andromeda:~$ cd ~/openwebui-stack
oxeo@andromeda:~$ docker compose up -d
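At this point the containers are running but ollama has no models yet; you can pull one straight into the container (the tag below is just an example) and check that the web interface answers locally:
oxeo@andromeda:~$ docker exec -it ollama ollama pull llama3.1:8b
oxeo@andromeda:~$ curl -I http://127.0.0.1:3000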
To expose open-webui via Tor, edit your torrc file:
HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 80 127.0.0.1:3000
Restart Tor and check the generated hostname:
sudo systemctl restart tor
sudo cat /var/lib/tor/hidden_service/hostname
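To confirm the onion service works end to end, you can fetch the interface through Tor with torsocks, substituting the hostname printed by the previous command:
oxeo@andromeda:~$ torsocks curl -I http://youronionaddress.onion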