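# KEDA Autoscaling

## Overview

KEDA (Kubernetes Event-driven Autoscaling) provides event-driven autoscaling for the K3s cluster.

### Core Features

- **Start and stop services on demand**: scale idle services down to 0 to save resources
- **Metric-based autoscaling**: adjust the replica count dynamically to match actual load
- **Many trigger types**: CPU, memory, Prometheus metrics, database connections, and more
- **Prometheus integration**: base scaling decisions on existing monitoring data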
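
Scale-to-zero happens in two phases: the KEDA operator itself moves a workload between 0 and 1 replica (activation and deactivation), while the HPA that KEDA creates scales between 1 and `maxReplicaCount`. The shell sketch below models that split as a simplified illustration under assumptions, not KEDA source code: `keda_phase` is a hypothetical helper, metric values are integers, and the activation threshold stands for a trigger's activation setting (for the Prometheus trigger, `activationThreshold`, which defaults to 0).

```bash
# Hypothetical helper (not part of KEDA): simplified model of KEDA's two phases.
# Args: metric_value activation_threshold idle_seconds cooldown_period current_replicas
#   idle_seconds = seconds since the trigger was last active
keda_phase() {
  local value="$1" activation="$2" idle="$3" cooldown="$4" current="$5"
  if [ "$value" -gt "$activation" ]; then
    # Trigger active: the operator scales 0 -> 1, otherwise the HPA handles 1..N
    if [ "$current" -eq 0 ]; then echo activate; else echo hpa; fi
  elif [ "$current" -gt 0 ] && [ "$idle" -ge "$cooldown" ]; then
    # Inactive for a full cooldownPeriod: the operator scales back to 0
    echo deactivate
  else
    # Still cooling down, or already at 0
    echo hold
  fi
}

keda_phase 12 0 0 180 0    # prints "activate"
keda_phase 0 0 200 180 1   # prints "deactivate"
```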

## Deployment

```bash
cd /home/fei/k3s/009-基础设施/007-keda
bash deploy.sh
```

## Configured Services

### 1. Navigation Service ✅

- **Minimum replicas**: 0 (fully stopped when idle)
- **Maximum replicas**: 10
- **Triggers**:
  - HTTP request rate > 10 req/min
  - CPU utilization > 60%
- **Cooldown period**: 3 minutes (before scaling to 0)

**Config file**: `scalers/navigation-scaler.yaml`
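
When a ScaledObject has several triggers, as Navigation does, all of them feed one HPA, and the HPA follows whichever trigger proposes the most replicas. The sketch below approximates that arithmetic. Assumptions: the request-rate trigger uses KEDA's default `AverageValue` metric type, so its proposal is ceil(metric / threshold); the CPU trigger uses the standard HPA formula ceil(current replicas × utilization / target); the HPA's 10% tolerance is ignored; the query is assumed to return requests per minute; and all function names are hypothetical.

```bash
# Hypothetical helpers approximating the HPA arithmetic (10% tolerance ignored).

# AverageValue trigger (e.g. request rate): ceil(metric / threshold)
avg_value_replicas() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# CPU Utilization trigger: ceil(current_replicas * utilization / target)
cpu_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Take the largest proposal, then clamp it to [1, max_replicas]
# (KEDA itself, not the HPA, handles 0 <-> 1).
combine_replicas() {
  local max="$1" best=1 p
  shift
  for p in "$@"; do
    if [ "$p" -gt "$best" ]; then best="$p"; fi
  done
  if [ "$best" -gt "$max" ]; then best="$max"; fi
  echo "$best"
}

# Navigation example: 35 req/min against a threshold of 10,
# and 2 replicas at 90% CPU against a 60% target.
combine_replicas 10 "$(avg_value_replicas 35 10)" "$(cpu_replicas 2 90 60)"   # prints 4
```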

### 2. Redis Cache Service ⏳

- **Minimum replicas**: 0 (fully stopped when idle)
- **Maximum replicas**: 5
- **Triggers**:
  - At least one client connection
  - CPU utilization > 70%
- **Cooldown period**: 5 minutes (before scaling to 0)

**Config file**: `scalers/redis-scaler.yaml`

**Status**: not yet applied (a Prometheus exporter must be added to Redis first)

### 3. PostgreSQL Database ❌

**Scaling PostgreSQL with KEDA is not recommended!**

Reasons:

- PostgreSQL is a stateful service; multiple replicas sharing its storage would conflict
- Scaling it safely requires primary/replica replication
- Use a PostgreSQL Operator, or PgBouncer + KEDA, instead

Details: `scalers/postgresql-说明.md`

## Applying ScaledObjects

### Deploy All Scalers

```bash
# Apply the Navigation scaler
kubectl apply -f scalers/navigation-scaler.yaml

# Apply the Redis scaler (set up the Redis exporter first)
kubectl apply -f scalers/redis-scaler.yaml

# ⚠️ Scaling PostgreSQL with KEDA is not recommended
# See: scalers/postgresql-说明.md
```

### Check ScaledObject Status

```bash
# List all ScaledObjects
kubectl get scaledobject -A

# Show details
kubectl describe scaledobject navigation-scaler -n navigation
kubectl describe scaledobject redis-scaler -n redis
# Only exists if a PostgreSQL scaler was applied (not recommended, see above)
kubectl describe scaledobject postgresql-scaler -n postgresql
```
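
With many ScaledObjects, it helps to list only the ones whose `Ready` condition is not `True`. The filter below is a hypothetical helper, not a KEDA tool; it assumes tab-separated input produced by the `kubectl` JSONPath query shown in its comment.

```bash
# Hypothetical helper: print namespace/name of every ScaledObject whose Ready
# condition is not "True" (including a missing condition). Expects lines of
# "namespace<TAB>name<TAB>ready", for example from:
#   kubectl get scaledobject -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' | not_ready
not_ready() {
  # Skip blank lines (empty name); print anything whose third field is not "True"
  awk -F '\t' '$2 != "" && $3 != "True" { print $1 "/" $2 }'
}
```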

### View the Auto-Created HPAs

```bash
# KEDA creates a HorizontalPodAutoscaler for each ScaledObject
# (named keda-hpa-<ScaledObject name> by default)
kubectl get hpa -A
```

## Supported Trigger Types

### 1. Prometheus Metrics

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
      metricName: custom_metric
      query: sum(rate(http_requests_total[1m]))
      threshold: "100"
```

### 2. CPU/Memory Utilization

```yaml
triggers:
  - type: cpu
    metricType: Utilization   # replaces the deprecated metadata.type field
    metadata:
      value: "70"
  - type: memory
    metricType: Utilization
    metadata:
      value: "80"
```

### 3. Redis List Length

```yaml
triggers:
  - type: redis
    metadata:
      address: redis.redis.svc.cluster.local:6379
      listName: mylist
      listLength: "5"
```

### 4. PostgreSQL Query

```yaml
triggers:
  - type: postgresql
    metadata:
      connectionString: postgresql://user:pass@host:5432/db
      query: "SELECT COUNT(*) FROM tasks WHERE status='pending'"
      targetQueryValue: "10"
```

### 5. Cron Schedule

```yaml
triggers:
  - type: cron
    metadata:
      timezone: Asia/Shanghai
      start: 0 8 * * *    # scale up at 8:00 every day
      end: 0 18 * * *     # scale down at 18:00 every day
      desiredReplicas: "3"
```
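
The cron trigger requests `desiredReplicas` only inside its window; outside it the trigger is inactive, and the replica count falls back to what the other triggers and `minReplicaCount` allow (a drop to 0 still waits for `cooldownPeriod`). The hypothetical `cron_replicas` helper below models this at whole-hour granularity only, which matches `0 8 * * *` / `0 18 * * *` but not arbitrary cron expressions.

```bash
# Hypothetical helper: whole-hour model of the cron trigger above.
# Args: hour start_hour end_hour desired_replicas other_replicas
#   other_replicas = what the other triggers / minReplicaCount allow
cron_replicas() {
  # [ ] compares as decimal, so zero-padded hours such as "08" from date +%H are fine
  if [ "$1" -ge "$2" ] && [ "$1" -lt "$3" ]; then
    echo "$4"
  else
    echo "$5"
  fi
}

# Evaluate for the current hour in the trigger's timezone
cron_replicas "$(TZ=Asia/Shanghai date +%H)" 8 18 3 0
```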

## Adding Autoscaling to a New Service

### Step 1: Make Sure the Service Is Configured Correctly

The service's Deployment must set `resources.requests` (CPU and memory triggers compute utilization relative to the requests):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp              # must match the ScaledObject's namespace
spec:
  # Do not set replicas; KEDA manages the replica count
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest   # placeholder image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
```
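
`resources.requests` matters because `Utilization` triggers measure usage as a percentage of the container's request, not its limit; without a request there is nothing to divide by. The sketch below shows the arithmetic with the request above (`cpu: 100m`); `cpu_utilization` is a hypothetical helper working in millicores.

```bash
# Hypothetical helper: CPU utilization (%) as the HPA defines it,
# i.e. usage divided by the container's CPU request. Values in millicores.
cpu_utilization() {
  if [ "$2" -le 0 ]; then
    echo "no CPU request set; Utilization cannot be computed" >&2
    return 1
  fi
  echo $(( $1 * 100 / $2 ))
}

cpu_utilization 70 100    # prints 70
cpu_utilization 250 100   # prints 250: usage above the request (but under the 500m limit) exceeds 100%
```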

### Step 2: Create a ScaledObject

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler
  namespace: myapp
spec:
  scaleTargetRef:
    name: myapp          # Deployment in the same namespace
  minReplicaCount: 0
  maxReplicaCount: 10
  pollingInterval: 30    # seconds between trigger checks
  cooldownPeriod: 300    # seconds without an active trigger before scaling to 0
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
        metricName: myapp_requests
        query: sum(rate(http_requests_total{app="myapp"}[1m]))
        threshold: "50"
```

### Step 3: Apply the Configuration

```bash
kubectl apply -f myapp-scaler.yaml
```

## Monitoring and Debugging

### View KEDA Logs

```bash
# Operator logs
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator -f

# Metrics API server logs
kubectl logs -n keda -l app.kubernetes.io/name=keda-metrics-apiserver -f
```

### View Scaling Events

```bash
# HPA events
kubectl describe hpa -n <namespace>

# All events in the namespace (Pods, HPA, and so on), sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```

### Query KEDA Metrics in Prometheus

Open https://prometheus.u6.net3w.com and run:

```promql
# Whether each KEDA scaler is active
keda_scaler_active

# KEDA scaler errors
keda_scaler_errors_total

# Current metric values
keda_scaler_metrics_value
```

### View the KEDA Dashboard in Grafana

1. Open https://grafana.u6.net3w.com
2. Import the official KEDA dashboard, ID **14691**
3. Watch scaling activity in real time

## Testing Autoscaling

### Test the Navigation Service

**Test scaling down to 0:**

```bash
# 1. Stop accessing the navigation page and wait at least the 3-minute cooldown period
#    (scale-down can lag further by the query's range window and one polling interval)
sleep 180

# 2. Check the replica count
kubectl get deployment navigation -n navigation

# Expected output: READY 0/0
```
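
A fixed `sleep 180` may check too early, because scale-down also waits for the metric's range window and the next polling cycle. Polling until the replica count reaches the expected value is more robust. `wait_for` below is a hypothetical helper; it treats empty output as 0 because a Deployment omits `.status.replicas` when it is zero.

```bash
# Hypothetical helper: run a command every <interval> seconds until it prints
# <expected>, or give up after <timeout> seconds. Empty output counts as "0";
# a failing command never counts as a match.
# Usage: wait_for <expected> <timeout> <interval> <command...>
wait_for() {
  local expected="$1" timeout="$2" interval="$3" waited=0 value
  shift 3
  [ "$interval" -ge 1 ] || interval=1   # avoid an endless loop on interval 0
  while [ "$waited" -le "$timeout" ]; do
    if value="$("$@" 2>/dev/null)" && [ "${value:-0}" = "$expected" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  return 1
}
```

For example, to wait up to 7 minutes for Navigation to reach 0 replicas: `wait_for 0 420 15 kubectl get deployment navigation -n navigation -o jsonpath='{.status.replicas}'`.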

**Test scaling up from 0:**

```bash
# 1. Access the navigation page
#    (core KEDA does not hold requests, so this first request may fail while the Pod starts)
curl https://dh.u6.net3w.com

# 2. Watch the replica count change
kubectl get deployment navigation -n navigation -w

# Expected: replicas go from 0 to 1 (in roughly 10-30 seconds)
```

### Test the Redis Service

**Test scaling on client connections:**

```bash
# 1. Connect to Redis
kubectl run redis-client --rm -it --image=redis:7-alpine -- redis-cli -h redis.redis.svc.cluster.local

# 2. Watch from another terminal
kubectl get deployment redis -n redis -w

# Expected: replicas go from 0 to 1 once a client is connected
```
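
### Test the PostgreSQL Service

⚠️ This test only applies if a PostgreSQL ScaledObject was created despite the recommendation above.

**Test scaling on connection count:**

```bash
# 1. Open 15 database connections, each held for 60 seconds
#    (<password> is a placeholder; the label lets the client pods be cleaned up)
for i in {1..15}; do
  kubectl run "pg-client-$i" --image=postgres:16-alpine --restart=Never \
    --labels=app=pg-client --env="PGPASSWORD=<password>" -- \
    psql -h postgresql-service.postgresql.svc.cluster.local -U postgres -c "SELECT pg_sleep(60);"
done

# 2. Watch the replica count
kubectl get statefulset postgresql -n postgresql -w

# Expected: once there are more than 10 connections, replicas go from 1 to 2

# 3. Remove the client pods
kubectl delete pod -l app=pg-client
```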

## Troubleshooting

### ScaledObject Has No Effect

**Check the ScaledObject status:**

```bash
kubectl describe scaledobject <name> -n <namespace>
```

**Common problems:**

1. **The Deployment sets a fixed `replicas` value**
   - Fix: remove the `replicas` field from the Deployment (otherwise every re-apply resets the replica count KEDA set)
2. **`resources.requests` is missing**
   - Fix: add `resources.requests` to the container (CPU/memory triggers need it)
3. **The Prometheus query is wrong**
   - Fix: test the query in the Prometheus UI

### Service Does Not Scale Down to 0

**Possible causes:**

1. **There are still active connections or requests**
   - Check: look at the metric value in Prometheus
2. **The `cooldownPeriod` has not elapsed yet**
   - Check: wait for the cooldown period to end
3. **`minReplicaCount` is set incorrectly**
   - Check: make sure `minReplicaCount: 0`

### Scaling Up Is Slow

**Tuning suggestions:**

1. **Shorten `pollingInterval`**
   ```yaml
   pollingInterval: 15  # from 30 seconds down to 15 seconds
   ```
2. **Lower `threshold`**
   ```yaml
   threshold: "5"  # lower threshold: more replicas for the same load
   ```
3. **Use multiple triggers**
   ```yaml
   triggers:
     - type: prometheus
       # ...
     - type: cpu
       # ...
   ```
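
## Best Practices

### 1. Set a Sensible Replica Range

- **Stateless services**: `minReplicaCount: 0` to save resources
- **Stateful services**: `minReplicaCount: 1` to keep them available
- **Critical services**: `minReplicaCount: 2` for high availability

### 2. Choose an Appropriate Cooldown Period

`cooldownPeriod` only governs the final step from 1 replica to 0; scale-down between N and 1 follows the HPA's own behavior (see `advanced.horizontalPodAutoscalerConfig` in the configuration reference below).

- **Services that must react quickly**: `cooldownPeriod: 60-180` (1-3 minutes)
- **Typical services**: `cooldownPeriod: 300` (5 minutes)
- **Database services**: `cooldownPeriod: 600-900` (10-15 minutes)

### 3. Monitor Scaling Behavior

- Check the Grafana dashboard regularly
- Set up alerting rules
- Review the scaling history

### 4. Measure Cold-Start Time

- Measure the time from 0 replicas to a service that can handle requests
- Shrink the image and speed up startup scripts
- Consider `minReplicaCount: 1` to avoid cold starts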
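
One way to measure cold-start time is to time a command that blocks until the workload is ready. `time_to_ready` below is a hypothetical helper; the usage note assumes `kubectl wait` with a JSONPath condition, which recent kubectl releases support.

```bash
# Hypothetical helper: print how many whole seconds a command took.
# The command's own stdout is discarded; fails if the command fails.
time_to_ready() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null || return 1
  end=$(date +%s)
  echo $(( end - start ))
}
```

For example, trigger a request and time how long Navigation takes to get a ready Pod: `curl -s -o /dev/null https://dh.u6.net3w.com & time_to_ready kubectl wait deployment/navigation -n navigation --for=jsonpath='{.status.readyReplicas}'=1 --timeout=300s`.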

## Configuration Reference

### Complete ScaledObject Example

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-scaler
  namespace: example
spec:
  scaleTargetRef:
    name: example-deployment
    kind: Deployment        # optional: Deployment (default), StatefulSet
    apiVersion: apps/v1     # optional
  minReplicaCount: 1        # minimum replicas while a trigger is active
  maxReplicaCount: 10       # maximum replicas
  pollingInterval: 30       # polling interval (seconds)
  cooldownPeriod: 300       # wait after the last active trigger before scaling to idle/0 (seconds)
  idleReplicaCount: 0       # replicas while no trigger is active; must be lower than minReplicaCount
  fallback:                 # fallback when the scaler keeps failing
    failureThreshold: 3
    replicas: 2
  advanced:                 # advanced settings
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 50
              periodSeconds: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: custom_metric
        query: sum(rate(metric[1m]))
        threshold: "100"
```
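
The `scaleDown` policy in this example (`type: Percent`, `value: 50`, `periodSeconds: 60`) lets the HPA remove at most half of the replicas per minute. The sketch below approximates the fastest path this allows, assuming the remaining count is rounded up and ignoring the 300-second stabilization window (which can only slow scale-down further). `scale_down_steps` is hypothetical; it stops at 1 because the HPA never goes below one replica, leaving the final 1 → 0 step to KEDA.

```bash
# Hypothetical helper: fastest replica path allowed by a 50%-per-period
# scale-down policy, starting from <start> and stopping at <floor> (never below 1).
# Usage: scale_down_steps <start> [floor]
scale_down_steps() {
  local n="$1" floor="${2:-1}" out=""
  [ "$floor" -ge 1 ] || floor=1        # the HPA never goes below 1 replica
  while [ "$n" -gt "$floor" ]; do
    n=$(( (n + 1) / 2 ))               # keep ceil(n * 50 / 100) replicas
    [ "$n" -ge "$floor" ] || n="$floor"
    out="$out $n"
  done
  echo "${out# }"
}

scale_down_steps 10   # prints "5 3 2 1": one step per 60-second period
```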

## Uninstalling KEDA

```bash
# Delete all ScaledObjects first so the HPAs KEDA created are removed with them
# (workloads keep their current replica count, possibly 0,
#  unless advanced.restoreToOriginalReplicaCount is true)
kubectl delete scaledobject --all -A

# Uninstall KEDA
helm uninstall keda -n keda

# Delete the namespace
kubectl delete namespace keda
```

## References

- KEDA documentation: https://keda.sh/docs/
- KEDA scalers: https://keda.sh/docs/scalers/
- KEDA on GitHub: https://github.com/kedacore/keda
- Grafana dashboard: https://grafana.com/grafana/dashboards/14691

---

**KEDA makes your K3s cluster smarter and more efficient!** 🚀