[wip] small fixes in openwebui local llms

oxeo0 2025-04-18 02:36:41 +02:00
parent 2f9ff33267
commit 25ee57dcad


@ -135,7 +135,12 @@ There're several such services including <a href="https://ppq.ai">ppq.ai</a>, <a
<div class="col-lg-8 col-lg-offset-2">
<h2><b>Open LLMs Primer</b></h2>
<p>
Another option available is to self-host LLM on your own infrastructure. This will effectively prevent sending your data from being sent to third parties. It can work fully offline on device but you'll need to have the required resources. You also have to understand certain more advanced concepts related to LLMs.
Another option is to self-host an LLM on your own infrastructure. This effectively prevents your data from being sent to third parties.</p>
<img src="3.png" style="width:480px">
<p><br>
It can work fully offline on-device, but you'll need to have the required resources. You also have to understand a few more advanced concepts related to LLMs.
</p>
<p>
<b>Parameter Count</b><br>
@ -150,8 +155,7 @@ Of course using Q4_0 will introduce some rounding error in quantization step, bu
<img src="2.png" class="imgRz">
<p>
<br>
<p><br>
I highlighted the <b>Q4_K_S</b> and <b>Q4_K_M</b> quantization methods since they usually offer the best balance between model size and accuracy.
They use a bit more than 4 bits per parameter, but have better precision than plain <b>Q4_0</b>. If you're pulling a model from ollama without specifying the precision, there's a high chance you'll get the <b>Q4_K_M</b> variant, since it has been the default for some time.
</p>
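<p>
As a rough back-of-the-envelope estimate of what this means in practice (an illustration, not an exact figure): an 8B-parameter model at about 4.5 bits per parameter takes roughly 8 × 4.5 / 8 ≈ 4.5 GB for the weights alone, plus extra memory for the context. If you want a specific quantization rather than the default, you can name it in the ollama tag. The tags below are only examples and differ per model, so check the model's page in the ollama library; later in this guide ollama runs inside Docker, so these commands would be prefixed with <b>docker exec -it ollama</b>:
</p>
<pre><code class="nim">ollama pull llama3.1:8b-instruct-q4_K_M
ollama pull llama3.1:8b-instruct-q8_0
</code></pre>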
@ -186,9 +190,9 @@ We'll show how to check prompt length and set appropriate context size in Open W
<div class="container">
<div class="row">
<div class="col-lg-8 col-lg-offset-2">
<h2><b>Uses of local AI</b></h2>
<h2><b>Use-Cases</b></h2>
<p>Contrary to what companies in the field often say - AI isn't a silver bullet. It won't solve all most problems we face as privacy concious people.<br>However there are some good use-cases even for privacy and anonymity. We already discussed how <a href="../stylometry/index.html">stylometry protection</a> can be achieved with local AI.</p>
<p>Contrary to what companies in the field often say - AI isn't a silver bullet. It won't solve most problems we face as privacy-conscious people.<br>However when it comes to self-hosted models, there are some good use-cases even for privacy and anonymity. We already discussed how <a href="../stylometry/index.html">stylometry protection</a> can be achieved with an LLM running locally.</p>
<p><b>Translation</b> - LLMs provide high-quality, real-time translations, allowing for communication across languages without external data leaks.<br>
<b>Rewriting</b> - They assist in paraphrasing content to protect against stylometry or to improve the flow.<br>
<b>Solving Problems</b> - LLMs can be used as personal assistants to answer everyday questions and help with personal issues.<br>
@ -208,7 +212,7 @@ We'll show how to check prompt length and set appropriate context size in Open W
<p>To follow this tutorial, you'll need a system running Debian 12. Although ollama can work on CPU only, the performance will be much worse than with a model that fits in the GPU's VRAM.<br>
To comfortably use an 8B model, it's strongly advised to have a dedicated GPU with at least 6GB of VRAM. You can check the supported GPU models <a href="https://github.com/ollama/ollama/blob/main/docs/gpu.md">here</a>.</p>
<p>This tutorial showcases ollama setup with Nvidia drivers, but AMD GPUs are also supported.</p>
<p>If you want to expose Open WebUI via Tor to access it remotely, you should have a <a href="../torwebsite/index.html">hidden service</a> setup.</p>
<p>If you want to expose Open WebUI via Tor to access it remotely, you should have an <a href="../torwebsite/index.html">onion v3 vanity address and Tor installed</a>.</p>
<p>It's also possible to set this up inside Proxmox VE or any KVM-based VM. You just need to PCI-passthrough the appropriate GPU in the <b>Hardware tab</b>:</p>
<img src="6.png" class="imgRz">
</div>
@ -222,11 +226,11 @@ To comfortably use an 8B model, it's strongly advised to have a dedicated GPU wi
<div class="col-lg-8 col-lg-offset-2">
<h2><b>Docker Setup</b></h2>
<p>To install Docker, follow the official guide: <a href="https://docs.docker.com/engine/install/debian/">Install Docker Engine on Debian</a>. After installation, add your user to the docker group:</p>
<pre><code>oxeo@andromeda:~$ /sbin/usermod -aG docker oxeo
<pre><code class="nim">oxeo@andromeda:~$ /sbin/usermod -aG docker oxeo
oxeo@andromeda:~$ sudo systemctl enable docker
</code></pre>
<p>This ensures you can manage Docker without needing sudo privileges. Finally, reboot your system.</p>
<pre><code>oxeo@andromeda:~$ sudo systemctl reboot
<pre><code class="nim">oxeo@andromeda:~$ sudo systemctl reboot
</code></pre>
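<p>After the reboot, a quick way to confirm that your user can talk to the Docker daemon without sudo (purely a sanity check) is to run a throwaway container:</p>
<pre><code class="nim">oxeo@andromeda:~$ docker run --rm hello-world
</code></pre>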
</div>
</div><!-- /row -->
@ -238,19 +242,19 @@ oxeo@andromeda:~$ sudo systemctl enable docker
<div class="row">
<div class="col-lg-8 col-lg-offset-2">
<h2><b>Nvidia Driver and Container Toolkit</b></h2>
<p>Update your package list to include "contrib non-free" at the end of every line in /etc/apt/sources.list:</p>
<pre><code>deb http://deb.debian.org/debian/ bookworm main contrib non-free
<p>Update your package list to include "contrib non-free non-free-firmware" at the end of every line in <b>/etc/apt/sources.list</b> (on Debian 12, firmware packages such as firmware-misc-nonfree live in the separate non-free-firmware component):</p>
<pre><code class="nim">deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
deb-src http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
deb-src http://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
</code></pre>
<p>Run:</p>
<pre><code>oxeo@andromeda:~$ sudo apt update
<pre><code class="nim">oxeo@andromeda:~$ sudo apt update
oxeo@andromeda:~$ sudo apt install linux-headers-amd64 nvidia-driver firmware-misc-nonfree
</code></pre>
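<p>The driver only loads after a reboot, so reboot once the install finishes. The heading also mentions the Container Toolkit: before Docker can hand the GPU to containers with <b>--runtime=nvidia</b>, the <b>nvidia-container-toolkit</b> package has to be installed and registered with Docker. The steps below are a sketch following NVIDIA's documented Debian instructions; the repository URLs may change over time, so cross-check them against NVIDIA's current container toolkit install guide:</p>
<pre><code class="nim">oxeo@andromeda:~$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
oxeo@andromeda:~$ curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
oxeo@andromeda:~$ sudo apt update
oxeo@andromeda:~$ sudo apt install nvidia-container-toolkit
oxeo@andromeda:~$ sudo nvidia-ctk runtime configure --runtime=docker
oxeo@andromeda:~$ sudo systemctl restart docker
</code></pre>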
<p>To verify installation, execute:</p>
<pre><code>oxeo@andromeda:~$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
<pre><code class="nim">oxeo@andromeda:~$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
</code></pre>
<p>This command checks if the NVIDIA driver is accessible within Docker.</p>
</div>
@ -265,7 +269,7 @@ oxeo@andromeda:~$ sudo apt install linux-headers nvidia-driver firmware-misc-non
<h2><b>Open WebUI Docker Stack</b></h2>
<p>Create a docker-compose.yml file in <b>~/openwebui-stack</b> with the following contents. This setup uses ollama for LLM management and open-webui as the user interface.</p>
<pre><code>services:
<pre><code class="nim">services:
ollama:
image: ollama/ollama
container_name: ollama
@ -306,8 +310,8 @@ volumes:
open-webui: {}
</code></pre>
<p>To start the stack:</p>
<pre><code>cd ~/openwebui-stack
docker compose up -d
<pre><code class="nim">oxeo@andromeda:~$ cd ~/openwebui-stack
oxeo@andromeda:~$ docker compose up -d
</code></pre>
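<p>Once the stack is up, you can check that both containers are running and pull a first model into the ollama container. The model tag below is just an example; pick one that fits your VRAM:</p>
<pre><code class="nim">oxeo@andromeda:~$ docker compose ps
oxeo@andromeda:~$ docker exec -it ollama ollama pull llama3.1:8b
</code></pre>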
</div>
</div><!-- /row -->
@ -320,13 +324,11 @@ docker compose up -d
<div class="col-lg-8 col-lg-offset-2">
<h2><b>Exposing Hidden Service</b></h2>
<p>To expose open-webui via Tor, edit your torrc file (usually <b>/etc/tor/torrc</b>):</p>
<pre><code>SocksPort 9050
HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServiceVersion 3
<pre><code class="nim">HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 80 127.0.0.1:3000
</code></pre>
<p>Restart Tor and check the generated hostname:</p>
<pre><code>sudo systemctl restart tor
<pre><code class="nim">sudo systemctl restart tor
cat /var/lib/tor/hidden_service/hostname
</code></pre>
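<p>From another machine, you can now open that .onion address in Tor Browser to reach Open WebUI. For a quick check from the server itself, one option (assuming the <b>torsocks</b> and <b>curl</b> packages are installed) is to request the page through Tor:</p>
<pre><code class="nim">oxeo@andromeda:~$ sudo apt install torsocks curl
oxeo@andromeda:~$ torsocks curl -I http://youraddress.onion/   # replace with the hostname printed above
</code></pre>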
</div>