Compare commits

2 Commits

Author SHA1 Message Date
douxu 64b6562784 docs: overhaul deploy.md cleanup and pg verification sections
  - add pg connection verification commands (pg_isready, psql queries)
  - renumber pg subsections (4.4.2→4.4.5) to accommodate new section
  - remove MongoDB deploy section (section 4.5) from modelRT deploy guide
  - remove MongoDB SSH tunnel port-forward entries (27017/30017)
  - rewrite section 8 cleanup guide: split into local Docker, local run,
    and K8s(Minikube) categories with scale-down and full-delete options
  - add one-liner kubectl delete -f deploy/k8s/ for full teardown
2026-06-10 16:42:29 +08:00
douxu 05c64dda14 chore: add imagePullPolicy and migrate WaitGroup to wg.Go
  - add imagePullPolicy: IfNotPresent to all k8s Deployments, DaemonSet
    (grafana, jaeger, loki, rabbitmq, redis, promtail)
  - migrate wg.Add(1)/go/defer wg.Done() pattern to wg.Go() (Go 1.25+)
    in logger/loki_syncer.go and task/worker.go
  - simplify redundant map existence check before delete in diagram/graph.go
  - update deploy.md to reflect pg PVC size (6Gi) and resource limits
2026-06-10 16:40:50 +08:00
12 changed files with 146 additions and 83 deletions

5
.gitignore vendored

@@ -22,7 +22,7 @@
 go.work
 .vscode
-.idea
+.idea
 # Shield all log files in the log folder
 /log/
 # Shield config files in the configs folder
@@ -32,6 +32,7 @@ go.work
 # ai config
 .cursor/
 .claude/
+.codewhale/
 .cursorrules
 .copilot/
 .chatgpt/
@@ -39,4 +40,4 @@ go.work
 .vector_cache/
 ai-debug.log
 *.patch
-*.diff
+*.diff

deploy.md

@@ -695,7 +695,11 @@ kubectl apply -f deploy/k8s/pg-service.yaml
 | **Database** | `demo` | `POSTGRES_DB` in the ConfigMap |
 | **Username** | `postgres` | `POSTGRES_USER` in the ConfigMap |
 | **Password** | `coslight` | Set in ConfigMap `postgres-config`; move to a Secret in production |
-| **Storage** | `2Gi` | PVC `postgres-data` |
+| **Storage** | `6Gi` | PVC `postgres-data` |
+| **CPU** | `100m` request / `500m` limit | StatefulSet `resources` field |
+| **Memory** | `256Mi` request / `512Mi` limit | StatefulSet `resources` field |
 > **Note:** The password is currently stored in plaintext in `pg-configmap.yaml`. In production, move it to a K8s Secret and inject it into the container through environment variables, so that the plaintext password is not committed to version control.
 ##### 4.4.1 Wait for the Pod to become ready
@@ -703,7 +707,23 @@ kubectl apply -f deploy/k8s/pg-service.yaml
 kubectl wait --for=condition=ready pod -l app=postgres --timeout=120s
 ```
-##### 4.4.2 Initialize the async task tables
+##### 4.4.2 Verify the connection
+```bash
+# Quick check that PostgreSQL is accepting connections
+kubectl exec -it $(kubectl get pod -l app=postgres -o jsonpath='{.items[0].metadata.name}') \
+    -- pg_isready -U postgres -d demo
+# Run a simple query through psql to confirm the database is usable
+kubectl exec -it $(kubectl get pod -l app=postgres -o jsonpath='{.items[0].metadata.name}') \
+    -- psql -U postgres -d demo -c "SELECT current_database(), version();"
+# List all databases (confirm that the demo database has been created)
+kubectl exec -it $(kubectl get pod -l app=postgres -o jsonpath='{.items[0].metadata.name}') \
+    -- psql -U postgres -c "\l"
+```
+##### 4.4.3 Initialize the async task tables
 Once PostgreSQL is ready, run the table-creation SQL from section 1.4. You can run it inside the container as follows:
@@ -717,14 +737,14 @@ kubectl exec -i $(kubectl get pod -l app=postgres -o jsonpath='{.items[0].metada
     -- psql -U postgres -d demo < /path/to/init.sql
 ```
-##### 4.4.3 Status check
+##### 4.4.4 Status check
 ```bash
 kubectl get pods -l app=postgres
 kubectl logs -l app=postgres --tail=30
 ```
-##### 4.4.4 Cleanup
+##### 4.4.5 Cleanup
 ```bash
 kubectl delete -f deploy/k8s/pg-service.yaml \
@@ -733,54 +753,6 @@ kubectl delete -f deploy/k8s/pg-service.yaml \
     -f deploy/k8s/pg-configmap.yaml
 ```
-#### 4.5 Deploy MongoDB
-```bash
-kubectl apply -f deploy/k8s/mongodb-secret.yaml
-kubectl apply -f deploy/k8s/mongodb-pvc.yaml
-kubectl apply -f deploy/k8s/mongodb-statefulset.yaml
-kubectl apply -f deploy/k8s/mongodb-service.yaml
-```
-| Parameter | Value | Description |
-| :--- | :--- | :--- |
-| **Image** | `mongo:7.0` | MongoDB 7.0 |
-| **NodePort** | `30017` | Port for access from outside the cluster |
-| **Username** | `admin` | Root administrator |
-| **Password** | `coslight` | Set in Secret `mongodb-secret`; replace with a strong password in production |
-| **Storage** | `2Gi` | PVC `mongodb-data` |
-> **Note:** The password is stored under `stringData` in `mongodb-secret.yaml`. In production, replace it with a strong password and avoid committing the plaintext password to version control.
-##### 4.5.1 Wait for the Pod to become ready
-```bash
-kubectl wait --for=condition=ready pod -l app=mongodb --timeout=120s
-```
-##### 4.5.2 Verify the connection
-```bash
-kubectl exec -it $(kubectl get pod -l app=mongodb -o jsonpath='{.items[0].metadata.name}') \
-    -- mongosh -u admin -p coslight --authenticationDatabase admin
-```
-##### 4.5.3 Status check
-```bash
-kubectl get pods -l app=mongodb
-kubectl logs -l app=mongodb --tail=30
-```
-##### 4.5.4 Cleanup
-```bash
-kubectl delete -f deploy/k8s/mongodb-service.yaml \
-    -f deploy/k8s/mongodb-statefulset.yaml \
-    -f deploy/k8s/mongodb-pvc.yaml \
-    -f deploy/k8s/mongodb-secret.yaml
-```
 ### 5\. Deploy ModelRT (Kubernetes)
 All resources are deployed in the `default` namespace; the YAML files are in `deploy/k8s/`.
@@ -1008,7 +980,6 @@ Mac local port ──SSH tunnel──▶ Ubuntu host (192.168.1.101)
 ```bash
 ssh -L 5432:192.168.49.2:30432 \
-    -L 27017:192.168.49.2:30017 \
     -L 5671:192.168.49.2:30671 \
     -L 15671:192.168.49.2:31671 \
     -L 6379:192.168.49.2:30001 \
@@ -1024,7 +995,6 @@ ssh -L 5432:192.168.49.2:30432 \
 ```bash
 ssh -fN \
     -L 5432:192.168.49.2:30432 \
-    -L 27017:192.168.49.2:30017 \
     -L 5671:192.168.49.2:30671 \
     -L 15671:192.168.49.2:31671 \
     -L 6379:192.168.49.2:30001 \
@@ -1040,7 +1010,6 @@ ssh -fN \
 | Mac local port | Minikube NodePort | Service | Description |
 | :--- | :--- | :--- | :--- |
 | `5432` | `30432` | PostgreSQL | Database connection at `localhost:5432` |
-| `27017` | `30017` | MongoDB | Database connection at `localhost:27017` |
 | `5671` | `30671` | RabbitMQ AMQP | Message queue connection for ModelRT / EventRT |
 | `15671` | `31671` | RabbitMQ Management | RabbitMQ management UI at `http://localhost:15671` |
 | `6379` | `30001` | Redis | Distributed locks / data storage |
@@ -1064,14 +1033,111 @@ kill <PID>
 ### 8\. Follow-up operations (stop and clean up)
-#### 8.1 Stop the containers
+#### 8.1 Cleaning up a local Docker deployment
+Applies to the PostgreSQL and Redis containers started with `docker run` in sections 1 and 2.
 ```bash
+# Stop the containers
 docker stop postgres redis
-```
-#### 8.2 Remove the containers (data is lost once they are removed)
-```bash
+# Remove the containers (data inside the containers is lost with them)
 docker rm postgres redis
 ```
+#### 8.2 Cleaning up a local run
+Applies to the ModelRT service started locally in section 3, either with `go run` or as a compiled binary.
+When it runs in the foreground, press `Ctrl+C` to stop it. When it runs in the background, find and terminate the process:
+```bash
+# Terminate the process started with go run
+pkill -f "go run main.go"
+# Or terminate the compiled binary
+pkill model-rt
+```
+#### 8.3 Cleaning up a K8s (Minikube) deployment
+Applies to all resources deployed to Minikube in sections 4, 5, and 6.
+##### 8.3.1 Per-service cleanup
+**Stop only (scale down to 0; PVC data is kept)**
+Scale every Deployment and StatefulSet down to 0 replicas. The Pods stop running but the persistent volume data is not deleted, and scaling back up to 1 restores the services.
+```bash
+# Stop all Deployments (Redis / RabbitMQ / ModelRT / Jaeger / Loki / Grafana)
+kubectl scale deployment --all --replicas=0
+# Stop all StatefulSets (PostgreSQL; PVC data is kept)
+kubectl scale statefulset --all --replicas=0
+```
+To restore:
+```bash
+kubectl scale deployment --all --replicas=1
+kubectl scale statefulset --all --replicas=1
+```
+> **Note:** A DaemonSet (Promtail) cannot be stopped with `scale`. To disable it, delete its resource manually: `kubectl delete -f deploy/k8s/promtail-daemonset.yaml`.
+---
+**Permanent cleanup (deletes all resources, including PVCs; data cannot be recovered)**
+Delete each service's resources in the reverse of the deployment order:
+```bash
+# Observability stack (Grafana / Promtail / Loki / Jaeger)
+kubectl delete -f deploy/k8s/grafana-service.yaml \
+    -f deploy/k8s/grafana-deployment.yaml \
+    -f deploy/k8s/grafana-configmap.yaml \
+    -f deploy/k8s/promtail-daemonset.yaml \
+    -f deploy/k8s/promtail-configmap.yaml \
+    -f deploy/k8s/promtail-rbac.yaml \
+    -f deploy/k8s/loki-service.yaml \
+    -f deploy/k8s/loki-deployment.yaml \
+    -f deploy/k8s/loki-pvc.yaml \
+    -f deploy/k8s/loki-configmap.yaml \
+    -f deploy/k8s/jaeger-service.yaml \
+    -f deploy/k8s/jaeger-deployment.yaml
+# ModelRT application
+kubectl delete -f deploy/k8s/modelrt-service.yaml \
+    -f deploy/k8s/modelrt-deployment.yaml \
+    -f deploy/k8s/modelrt-configmap.yaml \
+    -f deploy/k8s/modelrt-secret.yaml
+kubectl delete secret modelrt-certs
+# PostgreSQL
+kubectl delete -f deploy/k8s/pg-service.yaml \
+    -f deploy/k8s/pg-statefulset.yaml \
+    -f deploy/k8s/pg-pvc.yaml \
+    -f deploy/k8s/pg-configmap.yaml
+# RabbitMQ
+kubectl delete -f deploy/k8s/rabbitmq-service.yaml \
+    -f deploy/k8s/rabbitmq-deployment.yaml \
+    -f deploy/k8s/rabbitmq-users-config.yaml \
+    -f deploy/k8s/rabbitmq-config.yaml \
+    -f deploy/k8s/rabbitmq-secret.yaml
+kubectl delete secret rabbitmq-certs
+# Redis
+kubectl delete -f deploy/k8s/redis-service.yaml \
+    -f deploy/k8s/redis-deployment.yaml
+```
+##### 8.3.2 One-step cleanup
+> **Note:** This deletes the K8s resources defined by every YAML file under `deploy/k8s/`, including PVCs. **Persistent data is permanently lost**, so confirm before running it.
+```bash
+kubectl delete -f deploy/k8s/
+kubectl delete secret rabbitmq-certs modelrt-certs
+```

deploy/k8s/grafana-deployment.yaml

@@ -16,6 +16,7 @@ spec:
       containers:
         - name: grafana
           image: grafana/grafana:10.4.2
+          imagePullPolicy: IfNotPresent
           ports:
             - containerPort: 3000
           env:

deploy/k8s/jaeger-deployment.yaml

@@ -15,6 +15,7 @@ spec:
       containers:
         - name: jaeger
           image: jaegertracing/all-in-one:1.56
+          imagePullPolicy: IfNotPresent
           env:
             - name: COLLECTOR_OTLP_ENABLED
               value: "true"

deploy/k8s/loki-deployment.yaml

@@ -20,6 +20,7 @@ spec:
       containers:
         - name: loki
           image: grafana/loki:2.9.4
+          imagePullPolicy: IfNotPresent
           args:
             - -config.file=/etc/loki/loki.yaml
           ports:

deploy/k8s/mongodb-statefulset.yaml

@@ -34,9 +34,9 @@ spec:
                 - mongosh
                 - --eval
                 - "db.adminCommand('ping')"
-            initialDelaySeconds: 10
-            periodSeconds: 5
-            timeoutSeconds: 3
+            initialDelaySeconds: 30
+            periodSeconds: 10
+            timeoutSeconds: 10
             failureThreshold: 12
           livenessProbe:
             exec:
@@ -44,10 +44,10 @@ spec:
                 - mongosh
                 - --eval
                 - "db.adminCommand('ping')"
-            initialDelaySeconds: 30
-            periodSeconds: 20
-            timeoutSeconds: 3
-            failureThreshold: 3
+            initialDelaySeconds: 120
+            periodSeconds: 10
+            timeoutSeconds: 30
+            failureThreshold: 5
           resources:
             requests:
               cpu: 100m

deploy/k8s/promtail-daemonset.yaml

@@ -19,6 +19,7 @@ spec:
       containers:
         - name: promtail
           image: grafana/promtail:2.9.4
+          imagePullPolicy: IfNotPresent
           args:
             - -config.file=/etc/promtail/promtail.yaml
           ports:

deploy/k8s/rabbitmq-deployment.yaml

@@ -15,6 +15,7 @@ spec:
       containers:
         - name: rabbitmq
           image: rabbitmq:4.1.1-management-alpine
+          imagePullPolicy: IfNotPresent
           ports:
             - containerPort: 4369
             - containerPort: 5671

deploy/k8s/redis-deployment.yaml

@@ -15,6 +15,7 @@ spec:
       containers:
         - name: redis
           image: redis/redis-stack-server:latest
+          imagePullPolicy: IfNotPresent
           resources:
             limits:
               memory: "128Mi"

diagram/graph.go

@@ -65,9 +65,7 @@ func (g *Graph) AddEdge(from, to uuid.UUID) {
 	// When creating new topology information, if the linked vertex already
 	// exists among the free vertices, remove it
-	if _, exist := g.FreeVertexs[toKey]; exist {
-		delete(g.FreeVertexs, toKey)
-	}
+	delete(g.FreeVertexs, toKey)
 }
 // DelNode delete a node to the graph
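
The simplification in this hunk relies on Go's built-in `delete` being a no-op when the key is absent or the map is nil, so the preceding existence check adds nothing. A minimal sketch of that behaviour follows; the `removeFree` helper and the string keys are hypothetical stand-ins, not the project's `Graph` types.

```go
package main

import "fmt"

// removeFree deletes key from a hypothetical free-vertex set and
// returns how many entries remain afterwards.
func removeFree(free map[string]struct{}, key string) int {
	delete(free, key) // no-op when key is absent or the map is nil
	return len(free)
}

func main() {
	free := map[string]struct{}{"a": {}, "b": {}}
	fmt.Println(removeFree(free, "a"))       // prints 1
	fmt.Println(removeFree(free, "missing")) // prints 1
	fmt.Println(removeFree(nil, "a"))        // prints 0
}
```

The check-then-delete form is only worth keeping when the code needs to know whether the key was present, for example to log or count removals.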

logger/loki_syncer.go

@@ -47,8 +47,7 @@ func newLokiSyncer(lCfg config.LokiConfig) *lokiSyncer {
 		client: &http.Client{Timeout: 5 * time.Second},
 		ch: make(chan string, 512),
 	}
-	ls.wg.Add(1)
-	go ls.run()
+	ls.wg.Go(ls.run)
 	return ls
 }
@@ -70,7 +69,6 @@ func (ls *lokiSyncer) Sync() error {
 }
 func (ls *lokiSyncer) run() {
-	defer ls.wg.Done()
 	ticker := time.NewTicker(2 * time.Second)
 	defer ticker.Stop()

task/worker.go

@@ -185,13 +185,11 @@ func (w *TaskWorker) Start() error {
 	// Start multiple consumers for better throughput
 	for i := 0; i < w.cfg.QueueConsumerCount; i++ {
-		w.wg.Add(1)
-		go w.consumerLoop(i)
+		w.wg.Go(func() { w.consumerLoop(i) })
 	}
 	// Start health check goroutine
-	w.wg.Add(1)
-	go w.healthCheckLoop()
+	w.wg.Go(w.healthCheckLoop)
 	logger.Info(w.ctx, "task worker started successfully")
 	return nil
@@ -199,8 +197,6 @@ func (w *TaskWorker) Start() error {
 // consumerLoop runs a single RabbitMQ consumer
 func (w *TaskWorker) consumerLoop(consumerID int) {
-	defer w.wg.Done()
 	logger.Info(w.ctx, "starting consumer", "consumer_id", consumerID)
 	// Consume messages from the queue
@@ -478,8 +474,6 @@ func (w *TaskWorker) dispatch(ctx context.Context, taskType TaskType, taskID uui
 // healthCheckLoop periodically checks worker health and metrics
 func (w *TaskWorker) healthCheckLoop() {
-	defer w.wg.Done()
 	ticker := time.NewTicker(w.cfg.PollingInterval)
 	defer ticker.Stop()
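
The `wg.Go` migration in these hunks can be read as folding three steps into one call. As documented for Go 1.25+, `sync.WaitGroup.Go` runs `f` in a new goroutine, adds that task to the WaitGroup, and removes it when `f` returns. That is why each `defer wg.Done()` is deleted together with its `wg.Add(1)`: leaving one behind would call `Done` twice for the same task. The sketch below spells out the equivalent pattern with a hypothetical `spawn` helper so that it also compiles on older toolchains; `runConsumers` and its ID bookkeeping are illustrative only and not part of the repository.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// spawn is a hypothetical helper with the same shape as
// sync.WaitGroup.Go (Go 1.25+): register one task, run f in a new
// goroutine, and mark the task done when f returns.
func spawn(wg *sync.WaitGroup, f func()) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		f()
	}()
}

// runConsumers starts n illustrative consumers and returns the IDs
// that ran, sorted for a deterministic result.
func runConsumers(n int) []int {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		ids []int
	)
	for i := 0; i < n; i++ {
		id := i // explicit copy; redundant with Go 1.22+ per-iteration loop variables
		spawn(&wg, func() {
			mu.Lock()
			defer mu.Unlock()
			ids = append(ids, id)
		})
	}
	wg.Wait() // returns only after every spawned function has returned
	sort.Ints(ids)
	return ids
}

func main() {
	fmt.Println(runConsumers(3)) // prints [0 1 2]
}
```

On Go 1.25 or later, `spawn(&wg, f)` can be replaced by `wg.Go(f)`. The closure in `w.wg.Go(func() { w.consumerLoop(i) })` captures `i` directly, which is safe only because Go 1.22+ gives each loop iteration its own variable.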