Another option is to self-host an LLM on your own infrastructure. This effectively prevents your data from being sent to third parties.
Each open model has a specified number of parameters, which can range from 0.5 billion to hundreds of billions.
Quantization (improving memory usage)
Usually the model + context needs to fit into RAM/VRAM. Each model parameter can be stored with a certain precision. For example, FP16 uses 16 bits (2 bytes) of memory to store a single parameter, while Q4_0 uses only 4 bits. This means that an FP16 model will use ~4x the memory of a Q4_0 one.
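To make the arithmetic concrete, here is a minimal sketch that estimates how much memory a model's weights need at different precisions. The model sizes are illustrative; real usage is somewhat higher because of the context (KV cache), runtime overhead, and per-block scale factors stored by quantization formats:

```python
# Rough estimate of the memory needed to hold a model's weights.
# Real usage is higher (context/KV cache, runtime overhead, per-block scales).

BITS_PER_PARAM = {
    "FP16": 16,  # 2 bytes per parameter
    "Q8_0": 8,   # ~1 byte per parameter
    "Q4_0": 4,   # ~0.5 byte per parameter
}

def weights_gib(params_billions: float, quant: str) -> float:
    """Approximate size of the weights in GiB for a given quantization."""
    bits = params_billions * 1e9 * BITS_PER_PARAM[quant]
    return bits / 8 / 2**30

# Example: an 8B model such as Llama 3.1 8B
for quant in BITS_PER_PARAM:
    print(f"8B @ {quant}: ~{weights_gib(8, quant):.1f} GiB")
# FP16 comes out to ~14.9 GiB, Q4_0 to ~3.7 GiB -- roughly 4x less.
```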
Of course, using Q4_0 introduces some rounding error in the quantization step, but it's usually not a big deal. Look at the graph below to see how different quantization parameters affect model accuracy and memory usage of Llama 3.1 8B:
It's crucial to stress that AI can hallucinate (make stuff up), so it should never be fully trusted with anything important. If in any doubt, always verify the information against reputable sources.
Go to the local IP or onion address of your Open WebUI instance and create an admin account when asked. You don't need to enter any real data, but save the credentials somewhere so that you can log in later.
After that, you should be greeted with the Open WebUI main interface and a changelog popup, which you can close.
Then, go into the settings page and change the theme to dark mode.
Go to the Admin settings and proceed with the next steps.
To see the available models, head to the ollama library. Sadly, it blocks Tor traffic, so if you have to use Tor, use its Chinese mirror.
Next, pick the model you want to download; in our case, Gemma 3. Then click on Tags to see all available variants.
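If you prefer to script the download instead of clicking through the web interface, ollama also exposes a local HTTP API (port 11434 by default). A minimal sketch, assuming ollama is reachable on localhost and that the tag `gemma3:4b` is the variant you picked from the Tags page:

```python
import json
import requests  # third-party: pip install requests

OLLAMA_URL = "http://localhost:11434"  # ollama's default port

# Pull a model. The tag "gemma3:4b" is an assumption -- substitute
# whichever variant you chose on the Tags page.
with requests.post(
    f"{OLLAMA_URL}/api/pull",
    json={"model": "gemma3:4b"},
    stream=True,
) as resp:
    resp.raise_for_status()
    # The endpoint streams one JSON object per line with progress info.
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status"))
```

Once the pull reports success, the model should also show up in Open WebUI's model selector, since Open WebUI lists the models known to ollama.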
If you encounter issues with hardware acceleration on ollama, check: