diff --git a/graphs/.$openwebuilocalllms.drawio.bkp b/graphs/.$openwebuilocalllms.drawio.bkp
new file mode 100644
index 0000000..f0ca867
--- /dev/null
+++ b/graphs/.$openwebuilocalllms.drawio.bkp
@@ -0,0 +1,713 @@
[713 lines of drawio XML not shown]
diff --git a/graphs/openwebuilocalllms.drawio b/graphs/openwebuilocalllms.drawio
new file mode 100644
index 0000000..0f0f894
--- /dev/null
+++ b/graphs/openwebuilocalllms.drawio
@@ -0,0 +1,722 @@
[722 lines of drawio XML not shown]
diff --git a/opsec/openwebuilocalllms/0.png b/opsec/openwebuilocalllms/0.png
new file mode 100644
index 0000000..ea3bb6d
Binary files /dev/null and b/opsec/openwebuilocalllms/0.png differ
diff --git a/opsec/openwebuilocalllms/1.png b/opsec/openwebuilocalllms/1.png
new file mode 100644
index 0000000..7c60d39
Binary files /dev/null and b/opsec/openwebuilocalllms/1.png differ
diff --git a/opsec/openwebuilocalllms/10.png b/opsec/openwebuilocalllms/10.png
new file mode 100644
index 0000000..aacb0ae
Binary files /dev/null and b/opsec/openwebuilocalllms/10.png differ
diff --git a/opsec/openwebuilocalllms/11.png b/opsec/openwebuilocalllms/11.png
new file mode 100644
index 0000000..d1137b8
Binary files /dev/null and b/opsec/openwebuilocalllms/11.png differ
diff --git a/opsec/openwebuilocalllms/12.png b/opsec/openwebuilocalllms/12.png
new file mode 100644
index 0000000..0c14beb
Binary files /dev/null and b/opsec/openwebuilocalllms/12.png differ
diff --git a/opsec/openwebuilocalllms/13.png b/opsec/openwebuilocalllms/13.png
new file mode 100644
index 0000000..8f0fb37
Binary files /dev/null and b/opsec/openwebuilocalllms/13.png differ
diff --git a/opsec/openwebuilocalllms/14.png b/opsec/openwebuilocalllms/14.png
new file mode 100644
index 0000000..0e2d0c3
Binary files /dev/null and b/opsec/openwebuilocalllms/14.png differ
diff --git a/opsec/openwebuilocalllms/15.png b/opsec/openwebuilocalllms/15.png
new file mode 100644
index 0000000..f75e778
Binary files /dev/null and b/opsec/openwebuilocalllms/15.png differ
diff --git a/opsec/openwebuilocalllms/16.png b/opsec/openwebuilocalllms/16.png
new file mode 100644
index 0000000..133a86d
Binary files /dev/null and b/opsec/openwebuilocalllms/16.png differ
diff --git a/opsec/openwebuilocalllms/17.png b/opsec/openwebuilocalllms/17.png
new file mode 100644
index 0000000..725ef2e
Binary files /dev/null and b/opsec/openwebuilocalllms/17.png differ
diff --git a/opsec/openwebuilocalllms/18.png b/opsec/openwebuilocalllms/18.png
new file mode 100644
index 0000000..30a2dc8
Binary files /dev/null and b/opsec/openwebuilocalllms/18.png differ
diff --git a/opsec/openwebuilocalllms/19.png b/opsec/openwebuilocalllms/19.png
new file mode 100644
index 0000000..f4137f3
Binary files /dev/null and b/opsec/openwebuilocalllms/19.png differ
diff --git a/opsec/openwebuilocalllms/2.png b/opsec/openwebuilocalllms/2.png
new file mode 100644
index 0000000..44aeafd
Binary files /dev/null and b/opsec/openwebuilocalllms/2.png differ
diff --git a/opsec/openwebuilocalllms/20.png b/opsec/openwebuilocalllms/20.png
new file mode 100644
index 0000000..8a3c8b9
Binary files /dev/null and b/opsec/openwebuilocalllms/20.png differ
diff --git a/opsec/openwebuilocalllms/21.png b/opsec/openwebuilocalllms/21.png
new file mode 100644
index 0000000..5de041a
Binary files /dev/null and b/opsec/openwebuilocalllms/21.png differ
diff --git a/opsec/openwebuilocalllms/22.png b/opsec/openwebuilocalllms/22.png
new file mode 100644
index 0000000..86abfca
Binary files /dev/null and b/opsec/openwebuilocalllms/22.png differ
diff --git a/opsec/openwebuilocalllms/23.png b/opsec/openwebuilocalllms/23.png
new file mode 100644
index 0000000..7734a35
Binary files /dev/null and b/opsec/openwebuilocalllms/23.png differ
diff --git a/opsec/openwebuilocalllms/24.png b/opsec/openwebuilocalllms/24.png
new file mode 100644
index 0000000..02c136a
Binary files /dev/null and b/opsec/openwebuilocalllms/24.png differ
diff --git a/opsec/openwebuilocalllms/25.png b/opsec/openwebuilocalllms/25.png
new file mode 100644
index 0000000..c571bef
Binary files /dev/null and b/opsec/openwebuilocalllms/25.png differ
diff --git a/opsec/openwebuilocalllms/26.png b/opsec/openwebuilocalllms/26.png
new file mode 100644
index 0000000..55f9899
Binary files /dev/null and b/opsec/openwebuilocalllms/26.png differ
diff --git a/opsec/openwebuilocalllms/27.png b/opsec/openwebuilocalllms/27.png
new file mode 100644
index 0000000..90500ca
Binary files /dev/null and b/opsec/openwebuilocalllms/27.png differ
diff --git a/opsec/openwebuilocalllms/28.png b/opsec/openwebuilocalllms/28.png
new file mode 100644
index 0000000..725e4ac
Binary files /dev/null and b/opsec/openwebuilocalllms/28.png differ
diff --git a/opsec/openwebuilocalllms/29.png b/opsec/openwebuilocalllms/29.png
new file mode 100644
index 0000000..e641b42
Binary files /dev/null and b/opsec/openwebuilocalllms/29.png differ
diff --git a/opsec/openwebuilocalllms/3.png b/opsec/openwebuilocalllms/3.png
new file mode 100644
index 0000000..860ecd2
Binary files /dev/null and b/opsec/openwebuilocalllms/3.png differ
diff --git a/opsec/openwebuilocalllms/30.png b/opsec/openwebuilocalllms/30.png
new file mode 100644
index 0000000..875e7ac
Binary files /dev/null and b/opsec/openwebuilocalllms/30.png differ
diff --git a/opsec/openwebuilocalllms/31.png b/opsec/openwebuilocalllms/31.png
new file mode 100644
index 0000000..9896696
Binary files /dev/null and b/opsec/openwebuilocalllms/31.png differ
diff --git a/opsec/openwebuilocalllms/32.png b/opsec/openwebuilocalllms/32.png
new file mode 100644
index 0000000..d9b4d79
Binary files /dev/null and b/opsec/openwebuilocalllms/32.png differ
diff --git a/opsec/openwebuilocalllms/33.png b/opsec/openwebuilocalllms/33.png
new file mode 100644
index 0000000..b28539e
Binary files /dev/null and b/opsec/openwebuilocalllms/33.png differ
diff --git a/opsec/openwebuilocalllms/4.png b/opsec/openwebuilocalllms/4.png
new file mode 100644
index 0000000..dd96d05
Binary files /dev/null and b/opsec/openwebuilocalllms/4.png differ
diff --git a/opsec/openwebuilocalllms/5.png b/opsec/openwebuilocalllms/5.png
new file mode 100644
index 0000000..408fed5
Binary files /dev/null and b/opsec/openwebuilocalllms/5.png differ
diff --git a/opsec/openwebuilocalllms/6.png b/opsec/openwebuilocalllms/6.png
new file mode 100644
index 0000000..922134b
Binary files /dev/null and b/opsec/openwebuilocalllms/6.png differ
diff --git a/opsec/openwebuilocalllms/7.png b/opsec/openwebuilocalllms/7.png
new file mode 100644
index 0000000..297e096
Binary files /dev/null and b/opsec/openwebuilocalllms/7.png differ
diff --git a/opsec/openwebuilocalllms/8.png b/opsec/openwebuilocalllms/8.png
new file mode 100644
index 0000000..19c9a6a
Binary files /dev/null and b/opsec/openwebuilocalllms/8.png differ
diff --git a/opsec/openwebuilocalllms/gen_quant_graph.py b/opsec/openwebuilocalllms/gen_quant_graph.py
new file mode 100644
index 0000000..e715f06
--- /dev/null
+++ b/opsec/openwebuilocalllms/gen_quant_graph.py
@@ -0,0 +1,53 @@
+import matplotlib.pyplot as plt
+
+# Your data: quantization level -> (memory usage, accuracy)
+data = {
+    "Q2_K": (3032, 74.29),
+    "Q3_K_S": (3495, 82.19),
+    "Q3_K_M": (3833, 93.29),
+    "Q4_0": (4460, 96.09),
+    "Q4_K_S": (4476, 97.38),
+    "Q4_K_M": (4693, 97.67),
+    "Q4_1": (4893, 97.18),
+    "Q5_0": (5354, 98.98),
+    "Q5_K_S": (5340, 99.08),
+    "Q5_K_M": (5468, 99.00),
+    "Q5_1": (5788, 99.16),
+    "Q6_K": (6291, 99.58),
+    "Q8_0": (8146, 99.93)
+}
+
+# Extract labels, memory usage, and accuracy
+labels = list(data.keys())
+memory_usage = [value[0] for value in data.values()]
+accuracy = [value[1] for value in data.values()]
+
+# Plot setup using a dark theme
+plt.style.use('dark_background')
+
+fig, ax1 = plt.subplots()
+plt.title('Quantization Levels of llama 3.1 8B')
+
+# Create two y-axes: one for memory usage and the other for accuracy
+color_memory = 'tab:cyan'
+ax1.set_xlabel('Quantization Level')
+ax1.set_ylabel('Memory Usage (MB)', color=color_memory)
+ax1.bar(labels, memory_usage, color=color_memory, alpha=0.8, label='Memory Usage')
+ax1.tick_params(axis='y', labelcolor=color_memory)
+
+# Second y-axis for accuracy
+ax2 = ax1.twinx()
+color_accuracy = 'tab:orange'
+ax2.set_ylabel('Accuracy (%)', color=color_accuracy)
+ax2.plot(labels, accuracy, color=color_accuracy, marker='o', linestyle='-', linewidth=2, markersize=8, label='Accuracy')
+ax2.tick_params(axis='y', labelcolor=color_accuracy)
+
+# Adding legends
+fig.tight_layout()  # To ensure the layout is tight
+lines1, labels1 = ax1.get_legend_handles_labels()
+lines2, labels2 = ax2.get_legend_handles_labels()
+ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')
+
+# Show plot
+plt.show()
diff --git a/opsec/openwebuilocalllms/index.html b/opsec/openwebuilocalllms/index.html
new file mode 100644
index 0000000..457490c
--- /dev/null
+++ b/opsec/openwebuilocalllms/index.html
@@ -0,0 +1,408 @@
[HTML head omitted; page title: Anonymity - Self-Hosted LLM Hidden Service]
oxeo0 - 2025 / 04 / 18
Anonymity - Self-Hosted LLM Hidden Service

Sidenote: Help us improve this tutorial by letting us know if anything is missing or incorrect, directly on this git issue!
Current state of LLMs

If you've been on the internet recently, there's a high chance you've heard about Large Language Models. The most notable companies in this field include OpenAI, Google, Anthropic and xAI. To access their models, you typically communicate with the service via an API. While convenient, this means the user has little to no knowledge of how the data they send is stored and used.

Additionally, data submitted through these services may end up embedded into future models. Companies often train new models on user-submitted data, which can include any text inputs you've provided. This raises serious privacy concerns, as personal information could inadvertently become part of the training set for subsequent AI models. AI giants will often say they're trying to respect your privacy with data "anonymization" and other techniques, but we all know how this works in practice. See: Anthropic's Privacy Policy and OpenAI explaining how they "improve model performance" with user data.

The vast amount of sensitive user data these companies store can have devastating consequences if a leak occurs. In the AI space it's not uncommon for data to leak, either through compromised servers or through the models themselves. In the past year alone, OpenAI, Anthropic and DeepSeek have all suffered such leaks.

Assume that any conversation with an online chatbot can become public at any time.
Privacy LLM frontends

A partial solution to these problems could be a service that aggregates multiple model APIs and anonymizes its users, a bit like searxng does for search engines. The AI companies can't know who exactly uses their models, since the amount of metadata they receive is heavily limited.

There are several such services, including ppq.ai, NanoGPT and DuckDuckGo chat. This is only a partial solution, though: the contents of your conversations can still be saved and used for later training by the large AI companies.
Open LLMs Primer

Another option is to self-host an LLM on your own infrastructure. This effectively prevents your data from being sent to third parties. It can work fully offline, on your own device, but you'll need the required resources, and you have to understand a few more advanced concepts related to LLMs.

Parameter Count
Each open model has a specified number of parameters. This can range from 0.5 billion (qwen 2.5) to as much as 671 billion (deepseek r1). The more parameters a model has, the more knowledge can be packed into it, at the cost of more physical RAM/VRAM being used. Newer generation models are fairly capable even at 8 billion parameters, but it's not uncommon to use 12, 14 or 32 B ones.
Quantization (improving memory usage)
Usually the model plus its context needs to fit into RAM/VRAM. Each model parameter can be stored with a certain precision. For example, FP16 uses 16 bits (2 bytes) of memory per parameter, while Q4_0 uses only 4 bits, so an FP16 model needs roughly 4x the memory of its Q4_0 version (for an 8B model, roughly 16 GB versus 4-5 GB). Of course, Q4_0 introduces some rounding error in the quantization step, but it's usually not a big deal. Look at the graph below to see how different quantization levels affect model accuracy and memory usage of llama 3.1 8B.
[graph: memory usage vs. accuracy for the quantization levels of llama 3.1 8B]

I highlighted the Q4_K_S and Q4_K_M quantization methods since they usually offer the best balance between model size and accuracy. They use a bit more than 4 bits per parameter, but have better precision than plain Q4_0. If you're pulling a model from ollama without specifying the precision, there's a high chance you'll get the Q4_K_M variant, since it has been the default for some time.
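If you want a specific quantization instead of the default, ollama lets you pick it through the model tag. A minimal sketch, assuming the ollama container from the Docker section further down is already running; the exact tags are examples, so check the ollama library page for the model you want:

oxeo@andromeda:~$ docker exec -it ollama ollama pull llama3.1:8b-instruct-q4_K_M
oxeo@andromeda:~$ docker exec -it ollama ollama pull llama3.1:8b-instruct-q8_0
oxeo@andromeda:~$ docker exec -it ollama ollama list

ollama list prints the size on disk, so you can directly compare what a higher-precision variant costs in storage and, roughly, in memory.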

The rough formula for estimating the memory usage of a Q4_K_M-quantized LLM is: [n billion parameters] * (4.5 / 8) GB + [memory needed for the context window]. For an 8B model that gives 8 * 0.5625 = 4.5 GB for the weights alone, so with a typical context window we need around 6 GB of VRAM/RAM to run it comfortably as Q4_K_M.
Context size
Context size is the amount of previous conversation, measured in tokens, that the LLM takes into account when generating a response.
In ollama it defaults to 2048 tokens, which is around 1200 words or 6 kilobytes of text. Larger context sizes require more memory to store the context. The models also have a context size limit (e.g. 16k tokens for Phi-4, 128k for Gemma 3). If the context size is too small, the LLM may forget what it was doing before. Take a look at this simplified example:

In order to generate a correct response, the entire prompt should fit into the context window:

We'll show how to check prompt length and set an appropriate context size in Open WebUI a bit later on.
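Outside of Open WebUI, you can also bake a larger context window directly into a model with an ollama Modelfile. This is only a sketch, assuming the ollama container from the Docker section below is already running and the base model has been pulled; the tag llama3.1:8b and the name llama3.1-8k are placeholders:

oxeo@andromeda:~$ printf 'FROM llama3.1:8b\nPARAMETER num_ctx 8192\n' > Modelfile
oxeo@andromeda:~$ docker cp Modelfile ollama:/root/Modelfile
oxeo@andromeda:~$ docker exec -it ollama ollama create llama3.1-8k -f /root/Modelfile

Keep in mind that a larger num_ctx also means more RAM/VRAM is needed for the context.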

Model recommendations
[table that I accidentally deleted...]
Uses of local AI

Contrary to what companies in the field often claim, AI isn't a silver bullet. It won't solve most of the problems we face as privacy-conscious people. However, there are some good use cases even for privacy and anonymity. We already discussed how stylometry protection can be achieved with local AI.

Translation - LLMs provide high-quality, real-time translations, allowing communication across languages without external data leaks.
Rewriting - They assist in paraphrasing content to protect against stylometry or to improve the flow of a text.
Solving Problems - LLMs can be used as personal assistants to answer everyday questions and help with personal issues.
Programming Aid - Developers use them for code suggestions and debugging without exposing their sensitive codebases.

It's crucial to stress that AI can hallucinate (make things up). It should never be fully trusted with anything important. If in doubt, always verify the information against reputable sources.
Prerequisites

To follow this tutorial, you'll need a system running Debian 12. Although ollama can run on the CPU alone, performance will be much worse than with a model that fits into the GPU's VRAM. To comfortably use an 8B model, it's strongly advised to have a dedicated GPU with at least 6GB of VRAM. You can check the supported GPU models here.
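To quickly check which GPU your machine actually has before going further, lspci (from the pciutils package) is enough:

oxeo@andromeda:~$ lspci | grep -iE 'vga|3d|nvidia'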

This tutorial showcases an ollama setup with Nvidia drivers, but AMD GPUs are also supported.

If you want to expose Open WebUI via Tor to access it remotely, you should already have a hidden service set up.

It's also possible to set all of this up inside Proxmox VE or any KVM-based VM. You just need to pass the appropriate GPU through to the VM (PCI passthrough) in the Hardware tab:
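If you prefer the Proxmox CLI over the Hardware tab, the same passthrough can be attached with qm set. This is only a rough sketch: the VM ID 100 and the PCI address 01:00.0 are placeholders for your own values, and IOMMU has to be enabled on the host:

root@proxmox:~# lspci -nn | grep -i nvidia
root@proxmox:~# qm set 100 --hostpci0 0000:01:00.0,pcie=1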
Docker Setup

To install Docker, follow the official guide: Install Docker Engine on Debian. After installation, add your user to the docker group and enable the Docker service:

oxeo@andromeda:~$ sudo usermod -aG docker oxeo
oxeo@andromeda:~$ sudo systemctl enable docker

This ensures you can manage Docker without needing sudo privileges. Finally, reboot your system so the group change takes effect.

oxeo@andromeda:~$ sudo systemctl reboot
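After the reboot, you can confirm that your user can talk to the Docker daemon without sudo by running the standard hello-world image:

oxeo@andromeda:~$ docker run --rm hello-world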
Nvidia Driver and Container Toolkit

Update your package sources to include the "contrib non-free non-free-firmware" components at the end of every line in /etc/apt/sources.list:

deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
deb-src http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware

deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
deb-src http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware

Run:

oxeo@andromeda:~$ sudo apt update
oxeo@andromeda:~$ sudo apt install linux-headers-amd64 nvidia-driver firmware-misc-nonfree

Reboot afterwards so the new kernel modules are loaded.
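The verification command below also relies on the NVIDIA Container Toolkit, which provides the nvidia runtime for Docker. The steps below are only a sketch of what NVIDIA's documentation suggests at the time of writing; check the official NVIDIA Container Toolkit install guide in case the repository URL or package names have changed:

oxeo@andromeda:~$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
oxeo@andromeda:~$ curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
oxeo@andromeda:~$ sudo apt update
oxeo@andromeda:~$ sudo apt install -y nvidia-container-toolkit
oxeo@andromeda:~$ sudo nvidia-ctk runtime configure --runtime=docker
oxeo@andromeda:~$ sudo systemctl restart docker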

To verify installation, execute:

+
oxeo@andromeda:~$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
+
+

This command checks if the NVIDIA driver is accessible within Docker.

Open WebUI Docker Stack

Create a docker-compose.yml file in ~/openwebui-stack with the following contents. This setup uses ollama for LLM management and open-webui as the user interface.
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    pull_policy: always
    ports:
      - 127.0.0.1:11434:11434
    tty: true
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    ports:
      - 127.0.0.1:3000:8080  # Remove "127.0.0.1:" to access from LAN
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

volumes:
  ollama: {}
  open-webui: {}

To start the stack:

cd ~/openwebui-stack
docker compose up -d
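Once the containers are up, you can pull a first model into the ollama volume and then open Open WebUI in a browser at http://127.0.0.1:3000 to create the admin account. The model tag is just an example; pick one that fits your VRAM:

oxeo@andromeda:~$ docker exec -it ollama ollama pull llama3.1:8b
oxeo@andromeda:~$ docker exec -it ollama ollama list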
Exposing Hidden Service

To expose open-webui via Tor, edit your torrc file:

SocksPort 9050
HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServiceVersion 3
HiddenServicePort 80 127.0.0.1:3000
Restart Tor and check the generated hostname:

sudo systemctl restart tor
sudo cat /var/lib/tor/hidden_service/hostname
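To make sure the hidden service actually answers, you can request it through the Tor SOCKS port from the same machine. The onion address below is a placeholder for whatever the previous command printed:

curl --socks5-hostname 127.0.0.1:9050 -I http://youronionaddress.onion/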
Troubleshooting

If you encounter issues with hardware acceleration in ollama, check the following (example commands below):

  • Ensure the NVIDIA driver is correctly installed and accessible within Docker.
  • Verify GPU resources are allocated by running `docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi`.
  • Check logs with `docker compose logs -f` for any error messages.
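As a starting point, those checks could look roughly like this; whether nvidia-smi is available inside the container depends on the NVIDIA runtime being configured correctly, and on recent ollama versions `ollama ps` shows whether a loaded model runs on the GPU or the CPU:

docker compose logs -f ollama
docker exec -it ollama nvidia-smi
docker exec -it ollama ollama ps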
Closing Remarks

In this tutorial, you've set up a private LLM stack using ollama and open-webui. By exposing it as a Tor hidden service, you can reach it remotely while keeping both the server's location and your identity hidden. While consumer-grade hardware offers less computational power than corporate setups, you retain full control over your data and privacy.
Nihilism

Until there is Nothing left.

Legal Disclaimer
Creative Commons Zero: No Rights Reserved

My Links
RSS Feed
SimpleX Chatrooms

About nihilist
Donate XMR: 8AUYjhQeG3D5aodJDtqG499N5jXXM71gYKD8LgSsFB9BUV1o7muLv3DXHoydRTK4SZaaUBq4EAUqpZHLrX2VZLH71Jrd9k8
Donate XMR to the author: 862Sp3N5Y8NByFmPVLTPrJYzwdiiVxkhQgAdt65mpYKJLdVDHyYQ8swLgnVr8D3jKphDUcWUCVK1vZv9u8cvtRJCUBFb8MQ
Contact: nihilist@contact.nowhere.moe (PGP)