diff --git a/opsec/openwebuilocalllms/index.html b/opsec/openwebuilocalllms/index.html
index 457490c..95f5fe5 100644
--- a/opsec/openwebuilocalllms/index.html
+++ b/opsec/openwebuilocalllms/index.html
@@ -135,7 +135,12 @@ There're several such services including ppq.ai,

Open LLMs Primer

-Another option available is to self-host LLM on your own infrastructure. This will effectively prevent sending your data from being sent to third parties. It can work fully offline on device but you'll need to have the required resources. You also have to understand certain more advanced concepts related to LLMs.
+Another option is to self-host an LLM on your own infrastructure. This will effectively prevent your data from being sent to third parties.



+It can work fully offline on-device, but you'll need the required hardware resources. You also have to understand certain more advanced concepts related to LLMs.

Parameter Count
@@ -150,8 +155,7 @@ Of course using Q4_0 will introduce some rounding error in quantization step, bu
-

-
+


I highlighted the Q4_K_S and Q4_K_M quantization methods since they usually offer the best balance between model size and accuracy. They typically use a bit more than 4 bits per parameter, but have better precision than plain Q4_0. If you're pulling a model from ollama without specifying the precision, there's a high chance you'll get the Q4_K_M variant, since it has been the default for some time.
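For instance, with ollama you can pin the quantization by pulling a full tag instead of the bare model name. A sketch, assuming these llama3.1 tags exist in the registry (exact tag names vary per model):

oxeo@andromeda:~$ ollama pull llama3.1:8b-instruct-q4_K_M
oxeo@andromeda:~$ ollama pull llama3.1:8b-instruct-q8_0

At roughly 4.5 bits per parameter, the Q4_K_M variant of an 8B model takes about 4.5GB on disk and in VRAM, while a q8_0 variant needs nearly twice that.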

@@ -186,9 +190,9 @@ We'll show how to check prompt length and set appropriate context size in Open W
-Uses of local AI
+Use-Cases

-Contrary to what companies in the field often say - AI isn't a silver bullet. It won't solve all most problems we face as privacy concious people.
-However there are some good use-cases even for privacy and anonymity. We already discussed how
-stylometry protection can be achieved with local AI.
+Contrary to what companies in the field often say - AI isn't a silver bullet. It won't solve most of the problems we face as privacy-conscious people.
+However, when it comes to self-hosted models, there are some good use-cases even for privacy and anonymity. We already discussed how stylometry protection can be achieved with an LLM running locally.

Translation - LLMs provide high-quality, real-time translations, allowing for communication across languages without external data leaks (see the sketch after this list).
Rewriting - They assist in paraphrasing content to protect against stylometry or to improve its flow.
Solving Problems - LLMs can be used as personal assistants to answer everyday questions and help with personal issues.
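As an illustration of the translation use-case: once the stack from this tutorial is running, you can query ollama's local HTTP API directly. A minimal sketch, assuming ollama listens on its default port 11434 and the model tag has already been pulled:

oxeo@andromeda:~$ curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b-instruct-q4_K_M",
  "prompt": "Translate to French: The meeting is moved to Tuesday.",
  "stream": false
}'

The reply is a single JSON object with the completion in its "response" field, and nothing leaves the machine.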
@@ -208,7 +212,7 @@ We'll show how to check prompt length and set appropriate context size in Open W

To follow this tutorial, you'll need a system running Debian 12. Although ollama can run on CPU alone, performance will be much worse than with a model that fits in the GPU's VRAM.
To comfortably use an 8B model, it's strongly advised to have a dedicated GPU with at least 6GB of VRAM. You can check the supported GPU models here.

This tutorial showcases ollama setup with Nvidia drivers, but AMD GPUs are also supported.

-If you want to expose Open WebUI via Tor to access it remotely, you should have a hidden service setup.
+If you want to expose Open WebUI via Tor to access it remotely, you should have an onion v3 vanity address and Tor installed.

It's also possible to set this up inside Proxmox VE or any KVM-based VM. You just need to PCI-passthrough the appropriate GPU in the Hardware tab:
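The same passthrough can also be configured from the Proxmox host shell. A sketch, where the VM ID 100 and PCI address 0000:01:00.0 are placeholders for your own values (look yours up with lspci), and pcie=1 assumes a q35 machine type:

root@proxmox:~# qm set 100 -hostpci0 0000:01:00.0,pcie=1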

@@ -222,11 +226,11 @@ To comfortably use an 8B model, it's strongly advised to have a dedicated GPU wi

Docker Setup

To install Docker, follow the official guide: Install Docker Engine on Debian. After installation, add your user to the docker group:

-oxeo@andromeda:~$ /sbin/usermod -aG docker oxeo
+oxeo@andromeda:~$ sudo /sbin/usermod -aG docker oxeo
 oxeo@andromeda:~$ sudo systemctl enable docker
 

This ensures you can manage Docker without needing sudo privileges. Finally, reboot your system.

-oxeo@andromeda:~$ sudo systemctl reboot
+oxeo@andromeda:~$ sudo systemctl reboot
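After the reboot, you can verify that your user is in the docker group and that the daemon is reachable without sudo; a quick sanity check, not part of the original guide:

oxeo@andromeda:~$ groups
oxeo@andromeda:~$ docker run --rm hello-world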
 
@@ -238,19 +242,19 @@ oxeo@andromeda:~$ sudo systemctl enable docker

Nvidia Driver and Container Toolkit

-Update your package list to include "contrib non-free" at the end of every line in /etc/apt/sources.list:
-deb http://deb.debian.org/debian/ bookworm main contrib non-free
+Update your package list to include "contrib non-free non-free-firmware" at the end of every line in /etc/apt/sources.list (on Debian 12, firmware packages live in the new non-free-firmware component):
+deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
 deb-src http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
 
 deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
 deb-src http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
 

Run:

-oxeo@andromeda:~$ sudo apt update
+oxeo@andromeda:~$ sudo apt update
 oxeo@andromeda:~$ sudo apt install linux-headers-amd64 nvidia-driver firmware-misc-nonfree
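Note that the docker-based test below also requires the NVIDIA Container Toolkit, so containers can access the GPU. A minimal sketch of that step, following NVIDIA's documentation and assuming their apt repository has already been added:

oxeo@andromeda:~$ sudo apt install nvidia-container-toolkit
oxeo@andromeda:~$ sudo nvidia-ctk runtime configure --runtime=docker
oxeo@andromeda:~$ sudo systemctl restart docker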
 

To verify installation, execute:

-oxeo@andromeda:~$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
+oxeo@andromeda:~$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
 

This command checks if the NVIDIA driver is accessible within Docker.
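If it fails, first make sure the driver itself works on the host before debugging the container runtime:

oxeo@andromeda:~$ nvidia-smi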

@@ -265,7 +269,7 @@ oxeo@andromeda:~$ sudo apt install linux-headers nvidia-driver firmware-misc-non

Open WebUI Docker Stack

Create a docker-compose.yml file in ~/openwebui-stack with the following contents. This setup uses ollama for LLM management and open-webui as the user interface.

-services:
+services:
   ollama:
     image: ollama/ollama
     container_name: ollama
@@ -306,8 +310,8 @@ volumes:
   open-webui: {}
 

To start the stack:

-cd ~/openwebui-stack
-docker compose up -d
+oxeo@andromeda:~$ cd ~/openwebui-stack
+oxeo@andromeda:~$ docker compose up -d
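To confirm both containers came up and to watch the startup logs, the standard compose subcommands work from the same directory:

oxeo@andromeda:~$ docker compose ps
oxeo@andromeda:~$ docker compose logs -f open-webui

Once open-webui finishes starting, the interface should respond locally at http://127.0.0.1:3000.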
 
@@ -320,13 +324,11 @@ docker compose up -d

Exposing Hidden Service

To expose open-webui via Tor, edit your torrc file:

-SocksPort 9050
-HiddenServiceDir /var/lib/tor/hidden_service/
-HiddenServiceVersion 3
+HiddenServiceDir /var/lib/tor/hidden_service/
 HiddenServicePort 80 127.0.0.1:3000
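If you generated a vanity address earlier, copy the key material produced by your generator (for example mkp224o) into /var/lib/tor/hidden_service/ before restarting Tor, so the service comes up under that address.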
 

Restart Tor and check the generated hostname:

-sudo systemctl restart tor
+sudo systemctl restart tor
 cat /var/lib/tor/hidden_service/hostname
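From any machine with Tor available, you can then run a quick reachability check against the printed address; a sketch, with the .onion hostname as a placeholder (torsocks assumed to be installed):

oxeo@andromeda:~$ torsocks curl -s http://youronionaddress.onion | head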