Add GPU monitoring docs, Agap Installation page
85
Agap-Installation.md
Normal file
@@ -0,0 +1,85 @@
# Agap Installation

Steps to set up a fresh Agap server from scratch.

## 1. GPU & Docker

```bash
sudo ./nvidia-docker-install.sh   # Docker + NVIDIA Container Toolkit
./install-cuda.sh                 # CUDA toolkit (no driver)
```
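
Before moving on, it may help to confirm that the expected tools are on `PATH`. The sketch below uses a hypothetical helper, `check_tools`; the tool names are assumptions, since the two scripts are not shown here. In particular, `nvidia-smi` ships with the NVIDIA driver (which `install-cuda.sh` does not install), and `nvcc` may need the CUDA `bin` directory added to `PATH`.

```bash
# Hypothetical helper: report which of the given commands are on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed tool names; adjust to what the install scripts actually provide.
check_tools docker nvidia-smi nvcc || echo "Some tools are missing; see the list above."
```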

## 2. Zabbix Agent (host)

Install agent and plugins:

```bash
# Add Zabbix repo
wget https://repo.zabbix.com/zabbix/7.4/release/ubuntu/pool/main/z/zabbix-release/zabbix-release_latest_7.4+ubuntu24.04_all.deb
sudo dpkg -i zabbix-release_latest_7.4+ubuntu24.04_all.deb
sudo apt update

# Install agent and GPU plugin
sudo apt install zabbix-agent2 zabbix-agent2-plugin-nvidia-gpu
```

Configure `/etc/zabbix/zabbix_agent2.conf`:

```ini
Server=127.0.0.1
ServerActive=127.0.0.1:10051
Hostname=AgapHost
PluginSocket=/run/zabbix/agent.plugin.sock
ControlSocket=/run/zabbix/agent.sock
Include=/etc/zabbix/zabbix_agent2.d/plugins.d/*.conf
Include=/etc/zabbix/zabbix_agent2.d/*.conf
```
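
A quick sanity check on the config file can catch a missing or commented-out setting before the agent is started. The sketch below defines a hypothetical helper, `check_conf_keys`, that reports whether each named key appears as an uncommented `Key=` line; which keys to check is an assumption based on the settings listed above.

```bash
# Hypothetical helper: verify that each key is set (uncommented) in a config file.
# Usage: check_conf_keys FILE KEY...
check_conf_keys() {
  conf=$1
  shift
  rc=0
  for key in "$@"; do
    # Anchoring on "KEY=" keeps "Server" from matching "ServerActive".
    if grep -q "^${key}=" "$conf" 2>/dev/null; then
      echo "set: $key"
    else
      echo "unset: $key"
      rc=1
    fi
  done
  return "$rc"
}

check_conf_keys /etc/zabbix/zabbix_agent2.conf Server ServerActive Hostname \
  || echo "One or more settings are not set; see the list above."
```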

```bash
sudo systemctl enable --now zabbix-agent2
```

In the Zabbix UI, link these templates to the `AgapHost` host:

- **Linux by Zabbix agent active**
- **Nvidia by Zabbix agent 2 active**

## 3. Custom Zabbix UserParameters

Add backup monitoring to `/etc/zabbix/zabbix_agent2.d/gitea_backup.conf`:

```ini
# Gitea backup
UserParameter=gitea.backup.status,grep -c "Finish dumping" /mnt/backups/gitea/backup.log 2>/dev/null | grep -qx 1 && echo 1 || echo 0
UserParameter=gitea.backup.age,f=$(ls -t /mnt/backups/gitea/gitea-dump-*.zip 2>/dev/null | head -1); [ -n "$f" ] && echo $(( $(date +%s) - $(stat -c %Y "$f") )) || echo -1

# DBS backup
UserParameter=dbs.backup.age,f=/mnt/backups/dbs/.last_sync; [ -f "$f" ] && echo $(( $(date +%s) - $(stat -c %Y "$f") )) || echo -1

# Immich backup
UserParameter=immich.backup.age,f=/mnt/backups/media/.last_sync; [ -f "$f" ] && echo $(( $(date +%s) - $(stat -c %Y "$f") )) || echo -1
```
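
The `*.backup.age` items share one pattern: print the number of seconds since a file was last modified, or `-1` if the file is missing. Written out as a function for readability, it looks roughly like this. This is only a sketch: `backup_age` is a hypothetical name, and `stat -c %Y` assumes GNU coreutils.

```bash
# Hypothetical helper mirroring the age UserParameters above.
# Prints seconds since FILE was last modified, or -1 if FILE does not exist.
backup_age() {
  f=$1
  if [ -f "$f" ]; then
    echo $(( $(date +%s) - $(stat -c %Y "$f") ))
  else
    echo -1
  fi
}

backup_age /mnt/backups/dbs/.last_sync
```

A trigger can then treat `-1` as "no backup found" and any value above a chosen threshold (for example 86400 seconds, one day) as "backup too old"; the threshold is a suggestion, not something the configuration above prescribes.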

## 4. Root Cron Jobs

```bash
sudo crontab -e
```

Add:

```
0 3 * * * /home/alvis/agap_git/gitea/backup.sh >> /mnt/backups/gitea/cron.log 2>&1
30 2 * * * /home/alvis/agap_git/immich-app/backup.sh >> /mnt/backups/media/cron.log 2>&1
30 3 * * * rsync -a --delete /mnt/ssd/dbs/ /mnt/backups/dbs/ >> /mnt/backups/dbs/cron.log 2>&1 && touch /mnt/backups/dbs/.last_sync
```
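
In the `rsync` entry, `&& touch /mnt/backups/dbs/.last_sync` runs only when `rsync` exits successfully, so a failed sync leaves the marker's timestamp unchanged and `dbs.backup.age` keeps growing. A minimal sketch of that pattern, using a hypothetical `run_and_stamp` helper and a temporary directory instead of the real backup paths:

```bash
# Hypothetical helper: run a command and update the marker file only on success.
# Usage: run_and_stamp MARKER COMMAND [ARGS...]
run_and_stamp() {
  marker=$1
  shift
  "$@" && touch "$marker"
}

demo_dir=$(mktemp -d)
run_and_stamp "$demo_dir/ok" true   # command succeeds, marker is created
run_and_stamp "$demo_dir/failed" false || echo "command failed; marker not touched"
ls "$demo_dir"
rm -rf "$demo_dir"
```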

## 5. Services

Start all Docker services:

```bash
cd ~/agap_git
docker compose up -d                   # Immich
cd gitea && docker compose up -d       # Gitea
cd ../openai && docker compose up -d   # Open WebUI + Ollama
cd ../zabbix && docker compose up -d   # Zabbix
```
1
Home.md
@@ -2,6 +2,7 @@

## Infrastructure

- [[Agap-Installation]] — Fresh install guide
- [[Network]] — Netplan, Caddy, port forwarding
- [[Storage]] — LVM setup
- [[Backups]] — Gitea and database backups

25
Zabbix.md
@@ -64,6 +64,31 @@ Host "HA Agap" receives alerts from Home Assistant via `history.push` API.

To add a new HA alert: create a trapper item + trigger on "HA Agap", add `rest_command` in HA `configuration.yaml`, create HA automation to call it.

## GPU Monitoring (AgapHost)

The host agent monitors the GTX 1070 GPU via the `zabbix-agent2-plugin-nvidia-gpu` package.

**Installed packages:**

```bash
apt install zabbix-agent2-plugin-nvidia-gpu
```

Plugin binary: `/usr/libexec/zabbix/zabbix-agent2-plugin-nvidia-gpu`
Plugin config: `/etc/zabbix/zabbix_agent2.d/plugins.d/nvidia.conf`

Template linked to AgapHost: **Nvidia by Zabbix agent 2 active** — uses `nvml.*` keys.

Key metrics reported:

| Item | Key |
|------|-----|
| GPU memory free | `nvml.device.memory.fb.free` |
| GPU memory used | `nvml.device.memory.fb.used` |
| GPU utilization | `nvml.device.utilization.gpu` |
| Temperature | `nvml.device.temperature` |
| Power usage | `nvml.device.power.usage` |

> ECC memory error items report errors, which is expected: the GTX 1070 does not support ECC.

## Notes

- Zabbix server port 10051 is exposed on the host for the host agent