Add GPU monitoring docs, Agap Installation page

2026-03-08 06:51:40 +00:00
parent 458699f31b
commit 304198ebbd
3 changed files with 111 additions and 0 deletions

Agap-Installation.md (new file, 85 lines)

@@ -0,0 +1,85 @@
# Agap Installation
Steps to set up an Agap server from scratch.
## 1. GPU & Docker
```bash
sudo ./nvidia-docker-install.sh # Docker + NVIDIA Container Toolkit
./install-cuda.sh # CUDA toolkit (no driver)
```
## 2. Zabbix Agent (host)
Install the agent and the NVIDIA GPU plugin:
```bash
# Add Zabbix repo
wget https://repo.zabbix.com/zabbix/7.4/release/ubuntu/pool/main/z/zabbix-release/zabbix-release_latest_7.4+ubuntu24.04_all.deb
sudo dpkg -i zabbix-release_latest_7.4+ubuntu24.04_all.deb
sudo apt update
# Install agent and GPU plugin
sudo apt install zabbix-agent2 zabbix-agent2-plugin-nvidia-gpu
```
Configure `/etc/zabbix/zabbix_agent2.conf`:
```ini
Server=127.0.0.1
ServerActive=127.0.0.1:10051
Hostname=AgapHost
PluginSocket=/run/zabbix/agent.plugin.sock
ControlSocket=/run/zabbix/agent.sock
Include=/etc/zabbix/zabbix_agent2.d/plugins.d/*.conf
Include=/etc/zabbix/zabbix_agent2.d/*.conf
```
```bash
sudo systemctl enable --now zabbix-agent2
```
In Zabbix UI, link these templates to the `AgapHost` host:
- **Linux by Zabbix agent active**
- **Nvidia by Zabbix agent 2 active**
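
Both templates use active checks, and the server matches an active agent by name, so `Hostname` in the agent config has to equal the host name created in the Zabbix UI. A minimal sketch for reading a value back out of the config file; the helper name `conf_value` is an assumption made for illustration and is not part of Zabbix:

```bash
# conf_value KEY FILE: print the last value assigned to KEY in a KEY=VALUE config file.
# Hypothetical helper for illustration; not shipped with Zabbix.
conf_value() {
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Example on the host:
#   conf_value Hostname /etc/zabbix/zabbix_agent2.conf
```

If the printed value differs from `AgapHost`, the server will most likely not hand out any active checks to this agent.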
## 3. Custom Zabbix UserParameters
Add backup monitoring to `/etc/zabbix/zabbix_agent2.d/gitea_backup.conf`:
```ini
# Gitea backup
UserParameter=gitea.backup.status,grep -c "Finish dumping" /mnt/backups/gitea/backup.log 2>/dev/null | grep -qx 1 && echo 1 || echo 0
UserParameter=gitea.backup.age,f=$(ls -t /mnt/backups/gitea/gitea-dump-*.zip 2>/dev/null | head -1); [ -n "$f" ] && echo $(( $(date +%s) - $(stat -c %Y "$f") )) || echo -1
# DBS backup
UserParameter=dbs.backup.age,f=/mnt/backups/dbs/.last_sync; [ -f "$f" ] && echo $(( $(date +%s) - $(stat -c %Y "$f") )) || echo -1
# Immich backup
UserParameter=immich.backup.age,f=/mnt/backups/media/.last_sync; [ -f "$f" ] && echo $(( $(date +%s) - $(stat -c %Y "$f") )) || echo -1
```
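
The one-liners above are dense, so here is the same logic unrolled into two shell functions for trying out by hand. This is a sketch, not the deployed config: the function names `backup_age` and `backup_status` are made up for illustration, and `stat -c %Y` assumes GNU coreutils (as on Ubuntu).

```bash
# backup_age FILE: seconds since FILE was last modified, or -1 if it does not exist.
# Hypothetical helper mirroring the *.backup.age UserParameters.
backup_age() {
    if [ -f "$1" ]; then
        echo $(( $(date +%s) - $(stat -c %Y "$1") ))
    else
        echo -1
    fi
}

# backup_status LOG: 1 if LOG contains exactly one "Finish dumping" line, else 0.
# Hypothetical helper mirroring gitea.backup.status.
backup_status() {
    grep -c "Finish dumping" "$1" 2>/dev/null | grep -qx 1 && echo 1 || echo 0
}
```

Note that the status check returns 1 only for exactly one matching line, so it assumes `backup.log` is overwritten on each run rather than appended to. On the host, `zabbix_agent2 -t gitea.backup.age` should print the value the agent would report for that key, which is a quick way to check a new UserParameter.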
## 4. Root Cron Jobs
```bash
sudo crontab -e
```
Add:
```
0 3 * * * /home/alvis/agap_git/gitea/backup.sh >> /mnt/backups/gitea/cron.log 2>&1
30 2 * * * /home/alvis/agap_git/immich-app/backup.sh >> /mnt/backups/media/cron.log 2>&1
30 3 * * * rsync -a --delete /mnt/ssd/dbs/ /mnt/backups/dbs/ >> /mnt/backups/dbs/cron.log 2>&1 && touch /mnt/backups/dbs/.last_sync
```
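
In the `rsync` entry, `&&` means `.last_sync` is touched only when `rsync` exits successfully, so `dbs.backup.age` keeps growing after a failed sync. The same pattern as a standalone sketch; `run_and_mark` is a hypothetical helper name, not something that exists in the repository:

```bash
# run_and_mark MARKER CMD [ARGS...]: run CMD and touch MARKER only if CMD succeeds.
# Hypothetical helper; the crontab entry inlines the same `cmd && touch` pattern.
run_and_mark() {
    marker=$1
    shift
    "$@" && touch "$marker"
}

# Example on the host:
#   run_and_mark /mnt/backups/dbs/.last_sync rsync -a --delete /mnt/ssd/dbs/ /mnt/backups/dbs/
```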
## 5. Services
Start all Docker services:
```bash
cd ~/agap_git
docker compose up -d # Immich
cd gitea && docker compose up -d # Gitea
cd ../openai && docker compose up -d # Open WebUI + Ollama
cd ../zabbix && docker compose up -d # Zabbix
```

@@ -2,6 +2,7 @@
## Infrastructure
- [[Agap-Installation]] — Fresh install guide
- [[Network]] — Netplan, Caddy, port forwarding
- [[Storage]] — LVM setup
- [[Backups]] — Gitea and database backups

@@ -64,6 +64,31 @@ Host "HA Agap" receives alerts from Home Assistant via `history.push` API.
To add a new HA alert: create a trapper item + trigger on "HA Agap", add `rest_command` in HA `configuration.yaml`, create HA automation to call it.
## GPU Monitoring (AgapHost)
The host agent monitors the GTX 1070 GPU via the `zabbix-agent2-plugin-nvidia-gpu` package.
**Installed packages:**
```bash
sudo apt install zabbix-agent2-plugin-nvidia-gpu
```
Plugin binary: `/usr/libexec/zabbix/zabbix-agent2-plugin-nvidia-gpu`
Plugin config: `/etc/zabbix/zabbix_agent2.d/plugins.d/nvidia.conf`
Template linked to AgapHost: **Nvidia by Zabbix agent 2 active** — uses `nvml.*` keys.
Key metrics reported:
| Item | Key |
|------|-----|
| GPU memory free | `nvml.device.memory.fb.free` |
| GPU memory used | `nvml.device.memory.fb.used` |
| GPU utilization | `nvml.device.utilization.gpu` |
| Temperature | `nvml.device.temperature` |
| Power usage | `nvml.device.power.usage` |
> ECC memory error items are expected to show errors: the GTX 1070 does not support ECC.
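
To sanity-check what Zabbix shows, the same five metrics can be read directly from the driver with `nvidia-smi`. A rough sketch: the `--query-gpu` field names are standard `nvidia-smi` fields, while the `label_gpu_csv` helper is a made-up name for illustration. Units (MiB, %, °C, W) may differ from what the template stores.

```bash
# nvidia-smi --query-gpu field names, in the order of the metrics table.
GPU_FIELDS="memory.free,memory.used,utilization.gpu,temperature.gpu,power.draw"

# label_gpu_csv: read one CSV line on stdin and print one "field=value" pair per line.
# Hypothetical helper for illustration only.
label_gpu_csv() {
    awk -F', *' -v names="$GPU_FIELDS" '{
        n = split(names, name, ",")
        for (i = 1; i <= n; i++) print name[i] "=" $i
    }'
}

# On the host (requires the NVIDIA driver):
#   nvidia-smi --query-gpu="$GPU_FIELDS" --format=csv,noheader,nounits | label_gpu_csv
```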
## Notes
- Zabbix server port 10051 is exposed on the host so the host agent can reach it for active checks (`ServerActive=127.0.0.1:10051`)