From f52a71dbc9765553f9e2e586a7df9268af90ec85 Mon Sep 17 00:00:00 2001 From: oxeo0 Date: Fri, 18 Apr 2025 03:05:07 +0200 Subject: [PATCH] [wip] add info on configuring openwebui and start downloading models section --- opsec/openwebuilocalllms/index.html | 52 +++++++++++++++++++++++++++-- 1 file changed, 49 insertions(+), 3 deletions(-) diff --git a/opsec/openwebuilocalllms/index.html b/opsec/openwebuilocalllms/index.html index 95f5fe5..f098a2d 100644 --- a/opsec/openwebuilocalllms/index.html +++ b/opsec/openwebuilocalllms/index.html @@ -133,7 +133,7 @@ There're several such services including ppq.ai,
-

Open LLMs Primer

+

Open LLMs Primer

Another option available is to self-host an LLM on your own infrastructure. This effectively prevents your data from being sent to third parties.

@@ -150,7 +150,7 @@ Each open model has specified number of parameters. This can range from 0.5 bill

Quantization (improving memory usage)
Usually the model + context needs to fit into RAM/VRAM memory. Each model parameter can be represented with certain precision. For example, FP16 uses 16 bits (2 bytes) of memory to store a single parameter, while Q4_0 uses only 4 bits. This means that an FP16 model will use ~4x the memory of its Q4_0 counterpart. -Of course using Q4_0 will introduce some rounding error in quantization step, but it's usually not a big deal. Look at the graph below to see how different quantization parameters affect +Of course, using Q4_0 introduces some rounding error in the quantization step, but it's usually not a big deal. Look at the graph below to see how different quantization levels affect
model accuracy and memory usage of Llama 3.1 8B:
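+
+For a quick back-of-envelope check of these numbers: at FP16, an 8B-parameter model needs about 8B × 2 bytes ≈ 16 GB for the weights alone, while at Q4_0 it needs only about 8B × 0.5 bytes ≈ 4 GB, small enough for a GPU with 6-8 GB of VRAM (actual usage is somewhat higher once context is included).
+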

@@ -198,7 +198,7 @@ We'll show how to check prompt length and set appropriate context size in Open W Solving Problems - LLMs can be used as personal assistants to answer everyday questions and help with personal issues.
Programming Aid - Developers use them for code suggestions and debugging without exposing their sensitive codebases.

-

It's crucial to stress that AI can hallucinate (make stuff up). Thus it's never to be fully trusted with anything important. You should always check the information in reputable sources in case of any doubts.

+

It's crucial to stress that AI can hallucinate (make stuff up), so it should never be fully trusted with anything important. If in doubt, always verify the information against reputable sources.

@@ -337,9 +337,55 @@ cat /var/lib/tor/hidden_service/hostname
+
+
+
+

Initial Open WebUI Configuration

+

+Go to the local IP or onion address of Open WebUI and create an admin account when you're asked to. You don't need to enter any real data, but save the credentials somewhere so that you can log in later. +
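+
+If the page doesn't load, you can quickly check whether Open WebUI is reachable at all. A minimal sketch, assuming the default 3000:8080 docker port mapping and the onion hostname from the earlier step:
+
+# check that Open WebUI answers on the local port
+curl -I http://127.0.0.1:3000
+
+# check the onion address through Tor (replace with your own hostname)
+torsocks curl -I http://youronionaddress.onion
+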

+ + +


+After that, you should be greeted by the Open WebUI main interface and a changelog popup, which you can simply close. +

+ + +


+Then, we'll go to the settings page and change the theme to dark mode. +

+ + + +


+Go to the Admin settings and proceed with the next steps. +

+ +
+
+
+
+ +
+

Downloading a Model

+

+To see the available models, head to the ollama library. Sadly, it blocks Tor traffic, so if you have to use Tor, use its Chinese mirror.
+Next, pick a model you want to download. In our case, we want Gemma 3. Then click on Tags to see all available variants (a command-line alternative is sketched after the screenshots below). +

+ + +
+
+
+
+ +
+
+
+
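+
+Alternatively, you can pull the model directly from the command line on the machine running ollama, instead of going through the web interface. A minimal sketch, assuming you picked the 4B variant of Gemma 3 (use whichever tag fits your hardware):
+
+# download the chosen model variant from the ollama library
+ollama pull gemma3:4b
+
+# verify the model is now available locally
+ollama list
+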

Troubleshooting

If you encounter issues with hardware acceleration on ollama, check: