Tired of paying for ChatGPT or worrying about your data in the cloud? In 2026, you can run powerful AI models locally on your Linux machine, completely offline and private, using Ollama.
Ollama makes it dead simple to download and run open-source large language models (LLMs) like Llama 3.2, DeepSeek-R1, Gemma 3, or Qwen. No complex setup, and it supports NVIDIA GPU acceleration for fast responses.
Let's talk about it! I'll cover the following points:
- Installing Ollama on Ubuntu/Fedora/other distros
- Enabling NVIDIA GPU support (common issues fixed!)
- Downloading and running models
- Basic usage and tips
- Optional: Web UI for a ChatGPT-like interface
Step 1: Install Ollama
The easiest way is the official one-liner script (works on most Linux distros):
curl -fsSL https://ollama.com/install.sh | sh
This downloads and sets up Ollama as a service. After it finishes, verify:
ollama --version
You should see something like "ollama version 0.x.x".
Ollama now runs in the background as a systemd service; check it with systemctl status ollama (or start it manually with ollama serve in a terminal).
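A quick sanity check: the background service exposes a local HTTP API on port 11434 (the default), which is also what web UIs and scripts will talk to later. A minimal check, assuming the default install:

curl http://localhost:11434              # should print "Ollama is running"
curl http://localhost:11434/api/version  # should return the server version as JSON

If neither responds, the service isn't running yet.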
Step 2: Enable NVIDIA GPU Acceleration (If You Have an NVIDIA Card)
Ollama auto-detects NVIDIA GPUs with recent drivers, so there's no need to install the full CUDA toolkit separately.
First, ensure drivers are installed:
- On Ubuntu: sudo ubuntu-drivers autoinstall, then reboot.
- Verify: nvidia-smi (should show your GPU and driver version).
Common issues & fixes:
- "No GPU detected" → Reinstall Ollama after drivers: run the install script again.
- Old drivers → Update to latest (535+ recommended).
- After suspend/resume, GPU lost → Restart Ollama service: sudo systemctl restart ollama.
- Multiple GPUs → Limit with export CUDA_VISIBLE_DEVICES=0 (replace 0 with your GPU ID).
When you run a model, check usage with nvidia-smi or nvtop (install via sudo apt install nvtop).
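One gotcha with the export CUDA_VISIBLE_DEVICES tip above: a shell export only affects commands launched from that shell, while Ollama normally runs as a systemd service. To make the setting stick for the service, here's a minimal sketch using a systemd override (the same approach works for other Ollama environment variables):

sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="CUDA_VISIBLE_DEVICES=0"
# Then save, exit, and reload:
sudo systemctl daemon-reload
sudo systemctl restart ollama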
Step 3: Download and Run a Model
List the models you've already downloaded with ollama list, or browse the full catalog at https://ollama.com/library.
Start with a fast one:
ollama pull phi3         # Small & quick (3.8B params, great for beginners)
ollama pull llama3.2     # Meta's latest, excellent general-purpose
ollama pull gemma3       # Google's new powerhouse
ollama pull deepseek-r1  # Top reasoning model in 2025
Larger variants (e.g., :70b tags) need far more memory: 8B-class models run comfortably in 8-16GB of RAM/VRAM, while 70B models typically want 40GB+ even quantized.
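Not sure whether a model will fit? ollama list shows the on-disk size of everything you've pulled, and ollama ps (while a model is loaded) shows how much of it sits on the GPU versus spilling over to the CPU:

ollama list   # downloaded models with their sizes
ollama ps     # running models, memory use, and GPU/CPU split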
Run it:
ollama run llama3.2
Then chat! Type prompts and hit Enter. Exit with /bye.
Example:
>>> Explain quantum computing simply
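The interactive prompt isn't the only way in. For scripts, you can pass a prompt directly on the command line, or call the same local REST API that web UIs use:

ollama run llama3.2 "Explain quantum computing simply"

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing simply",
  "stream": false
}'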
Pro Tip: Ollama pulls 4-bit quantized builds by default; on modest hardware, pick a smaller variant (e.g., llama3.2:3b) or a lower-bit quantization tag for faster responses.
Step 4: Optional - ChatGPT-Like Web Interface (Open WebUI)
For a polished browser UI (requires Docker):
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Open http://localhost:3000, sign up, and connect to Ollama (it auto-detects).
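If you'd rather manage the container with Docker Compose, here's a minimal docker-compose.yml sketch that mirrors the run command above (same image, port, volume, and restart policy); start it with docker compose up -d:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui:/app/backend/data
    restart: always

volumes:
  open-webui: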
Common Problems [SOLVED]
- Slow on CPU only → Get NVIDIA drivers working!
- Out of memory → Use a smaller model (e.g., :3b or :8b) or add swap (see the sketch after this list).
- Model download stuck → Check internet; retry with ollama pull <model>.
- Service not starting → journalctl -u ollama for logs.
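For the "add swap" fix above: swap is far slower than RAM, so treat it as a stopgap for models that almost fit, but here's a quick sketch for adding an 8GB swap file (adjust the size to taste):

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # make it permanent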
Conclusion:
You've now got your own private AI running locally: faster than the cloud for many tasks, zero cost, and full privacy.
Start experimenting! Try coding help with deepseek-coder or image understanding with llava.
Share your experiences in the comments. What model are you running?
[Tags: Linux, AI, Ollama, Local LLM, NVIDIA GPU]