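The recently disclosed Linux kernel vulnerability CVE-2026-31431, dubbed "Copy Fail", is reached through AF_ALG sockets, the userspace interface to the Linux kernel crypto API. The bug is not in Docker. However, before Docker Engine v29.4.3, the default seccomp profile let containers create AF_ALG sockets, which left attackers a way in. Starting with v29.4.3, the default profile no longer lets containers create AF_ALG sockets.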
Below we cover where the risk boundary lies, how to check your exposure, and how to harden your setup.
Where the vulnerability meets Docker
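Exploiting CVE-2026-31431 depends on calling `socket(AF_ALG, ...)`. AF_ALG lets userspace programs use the kernel's crypto algorithms (AES, SHA, and so on) directly. It was designed as a performance feature. It also exposes kernel code paths to any process that can open such a socket, and that exposure is what makes it an entry point for kernel bugs.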
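To make the interface concrete, here is a minimal sketch of an ordinary AF_ALG client built on Python's standard `socket` module. It asks the kernel to compute a SHA-256 digest. Because it issues the same `socket(AF_ALG, ...)` call the exploit path depends on, it also works as a rough reachability test. The function name `kernel_sha256` is illustrative. The sketch assumes a Linux host and returns `None` wherever AF_ALG is unavailable or blocked.

```python
import hashlib
import socket
from typing import Optional


def kernel_sha256(data: bytes) -> Optional[bytes]:
    """Hash `data` with the kernel's sha256 implementation through AF_ALG.

    Returns None when AF_ALG is unavailable: a non-Linux platform, a kernel
    without AF_ALG hash support, or a seccomp/AppArmor policy that rejects
    socket(AF_ALG, ...).
    """
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        return None  # this Python build does not expose AF_ALG
    try:
        # The "transform" socket selects the algorithm type and name
        with socket.socket(family, socket.SOCK_SEQPACKET, 0) as algo:
            algo.bind(("hash", "sha256"))
            # accept() returns an "operation" socket bound to that algorithm
            op, _ = algo.accept()
            with op:
                op.sendall(data)    # input goes into the kernel
                return op.recv(32)  # the 32-byte digest comes back
    except OSError:
        return None


if __name__ == "__main__":
    digest = kernel_sha256(b"abc")
    if digest is None:
        print("AF_ALG is not available to this process")
    else:
        print("kernel sha256:", digest.hex())
        print("matches hashlib:", digest == hashlib.sha256(b"abc").digest())
```

Inside a container running under a profile that blocks AF_ALG, this should print the "not available" line.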
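Docker Engine's default seccomp profile is an allowlist maintained by the Docker project. Before v29.4.3, that allowlist permitted `socket()` without filtering out the AF_ALG address family, so container processes could create crypto sockets freely. This means: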
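- If a process inside a container is compromised, the attacker can trigger the kernel bug through `AF_ALG` and escalate from the container into the host kernel.
- The flaw is not a defect in Docker's own code. The default policy was simply too permissive and did not block a known-dangerous syscall entry point.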
The v29.4.3 change is direct: the default seccomp profile no longer permits creating AF_ALG sockets. Containers created under v29.4.3 can no longer call `socket(AF_ALG, ...)`.
Are you affected? A quick check
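Docker's side of the exposure is closed if any of the following holds. The kernel bug itself remains until the host kernel is patched.
- Docker Engine is v29.4.3 or later, and the container was created after the upgrade
- The container uses a custom seccomp profile that blocks `AF_ALG`
- The container uses an AppArmor profile that denies `AF_ALG` operations

Running with `--privileged` does not belong on this list. Privileged containers get neither seccomp nor AppArmor confinement, so `AF_ALG` is fully open to them. Privileged mode is a far larger exposure and should never be treated as a "fix".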
Check the version first:

```shell
# Show the Docker Engine (server) version
docker version --format '{{.Server.Version}}'
```
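If you check many hosts, the "v29.4.3 or later" comparison is easy to script. The following is a minimal sketch. It assumes the version string looks like the output of the command above: `29.4.3`, possibly followed by a `-rc.N` pre-release tag or a `+dfsg`-style build suffix. The function name `engine_is_patched` is hypothetical.

```python
import re
import sys

# First Docker Engine release whose default seccomp profile blocks AF_ALG (per the article)
FIXED = (29, 4, 3)


def engine_is_patched(version: str, fixed: tuple = FIXED) -> bool:
    """Return True if a `docker version` server string is at or above `fixed`.

    A pre-release of the fixed version (e.g. "29.4.3-rc.1") counts as unpatched;
    build metadata after "+" (common in distro packages) is ignored.
    """
    m = re.match(r"^v?(\d+)\.(\d+)\.(\d+)(.*)$", version.strip())
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    core = tuple(int(part) for part in m.group(1, 2, 3))
    suffix = m.group(4)
    if core != fixed:
        return core > fixed
    # Same major.minor.patch: only a final release (no "-" pre-release tag) qualifies
    return not suffix.startswith("-")


if __name__ == "__main__" and len(sys.argv) == 2:
    print("patched" if engine_is_patched(sys.argv[1]) else "NOT patched: upgrade Engine")
```

Example usage (the script name is hypothetical): `python3 engine_check.py "$(docker version --format '{{.Server.Version}}')"`.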
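Next, check whether seccomp is enabled and how the default profile treats `socket`:

```shell
# List the daemon's security options, one per line
docker info --format '{{range .SecurityOptions}}{{println .}}{{end}}'
# If the seccomp line shows profile=unconfined, there is no seccomp restriction at all: high risk

# Download the upstream default profiles (v29.4.3 and v28.0.0) for comparison
wget -q https://raw.githubusercontent.com/moby/moby/v29.4.3/profiles/seccomp/default.json -O /tmp/seccomp-default-v2943.json
wget -q https://raw.githubusercontent.com/moby/moby/v28.0.0/profiles/seccomp/default.json -O /tmp/seccomp-default-v28.json
# Show the action and argument filters of every rule that names socket (requires jq)
jq '.syscalls[] | select(any(.names[]?; . == "socket")) | {action, args}' /tmp/seccomp-default-v2943.json
jq '.syscalls[] | select(any(.names[]?; . == "socket")) | {action, args}' /tmp/seccomp-default-v28.json
```

Grepping the profile for "alg" proves nothing, because address families are stored as numbers, not names. AF_ALG is 38. What matters is the `socket` rule. Containers can create AF_ALG sockets, and are exposed, in either of these cases:
- `socket` appears in an `SCMP_ACT_ALLOW` entry with no `args`.
- The rule's conditions on argument 0 admit the value 38.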
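Because address families appear in the profile only as numbers, a small script can evaluate the `socket` rules directly instead of relying on text search. The sketch below is simplified and assumes an allowlist-style profile like Docker's default:
- It looks only at `SCMP_ACT_ALLOW` rules that name `socket`.
- It evaluates their argument-0 conditions against 38.
- It ignores `includes`/`excludes` and architecture filters.
- When a rule has several conditions on argument 0, it counts any single match as a pass, so an ambiguous rule is reported as allowing AF_ALG.

The function names are hypothetical.

```python
import json
import sys

AF_ALG = 38  # Linux address-family number for AF_ALG


def _cond_matches(cond: dict, value: int) -> bool:
    """Evaluate one seccomp argument condition against `value`."""
    op, ref = cond["op"], cond.get("value", 0)
    if op == "SCMP_CMP_NE":
        return value != ref
    if op == "SCMP_CMP_EQ":
        return value == ref
    if op == "SCMP_CMP_LT":
        return value < ref
    if op == "SCMP_CMP_LE":
        return value <= ref
    if op == "SCMP_CMP_GT":
        return value > ref
    if op == "SCMP_CMP_GE":
        return value >= ref
    if op == "SCMP_CMP_MASKED_EQ":
        # OCI semantics: (arg & value) == valueTwo
        return (value & ref) == cond.get("valueTwo", 0)
    raise ValueError(f"unsupported op {op!r}")


def af_alg_allowed(profile: dict) -> bool:
    """Return True if the profile appears to let socket(AF_ALG, ...) through."""
    for rule in profile.get("syscalls") or []:
        if rule.get("action") != "SCMP_ACT_ALLOW" or "socket" not in (rule.get("names") or []):
            continue
        conds = [c for c in (rule.get("args") or []) if c.get("index") == 0]
        # No domain filter means every family passes; with several filters,
        # any single match counts as a pass (the conservative reading)
        if not conds or any(_cond_matches(c, AF_ALG) for c in conds):
            return True
    # Nothing explicitly allowed it, so the call falls through to defaultAction
    return profile.get("defaultAction") == "SCMP_ACT_ALLOW"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            allowed = af_alg_allowed(json.load(f))
        print(f"{path}: AF_ALG {'ALLOWED' if allowed else 'blocked'}")
```

Example usage (the script name is hypothetical): `python3 check_af_alg.py /tmp/seccomp-default-v28.json /tmp/seccomp-default-v2943.json`.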
Hardening in practice: three paths
1. Upgrade Docker Engine (recommended)
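This is the cleanest option:

```shell
# Ubuntu/Debian example (Docker's official apt repository)
sudo apt-get update
# Find the exact package version; docker-ce versions carry an epoch, such as 5:29.4.3-1~<distro suffix>
apt-cache madison docker-ce | grep 29.4.3
# Install that version, substituting the string printed above
VERSION_STRING='<version from apt-cache madison>'
sudo apt-get install docker-ce="$VERSION_STRING" docker-ce-cli="$VERSION_STRING"
# Confirm after the upgrade; expect 29.4.3 or later
docker version --format '{{.Server.Version}}'
```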
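Containers that are already running do not pick up the new profile, so recreate them:

```shell
# List running containers
docker ps --format '{{.ID}} {{.Image}} {{.Names}}'
# Recreate them one by one (example for a single container)
docker stop my-container
docker rm my-container
# Pass the same options the container was originally created with
docker run -d --name my-container my-image
# For Compose services: docker compose up -d --force-recreate
# The new container uses the v29.4.3 default seccomp profile
```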
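A process keeps the seccomp filter it was started with. That makes containers whose current process started before the upgrade the first candidates for recreation. The sketch below finds them in the JSON printed by `docker inspect $(docker ps -q)`. The upgrade time is an input you supply, for example from your change log, and is assumed to be UTC. The function and script names are hypothetical.

```python
import json
import sys
from datetime import datetime, timezone


def parse_docker_time(ts: str) -> datetime:
    """Parse a Docker RFC 3339 UTC timestamp such as 2026-01-01T08:00:00.123456789Z."""
    ts = ts.rstrip("Z")
    if "." in ts:
        base, frac = ts.split(".", 1)
        # datetime holds microseconds at most, so trim or pad the fraction to 6 digits
        ts = f"{base}.{frac[:6].ljust(6, '0')}"
    return datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)


def started_before(inspect_json: str, upgraded_at: str) -> list:
    """Names of running containers whose process started before `upgraded_at`."""
    cutoff = parse_docker_time(upgraded_at)
    stale = []
    for container in json.loads(inspect_json):
        state = container.get("State") or {}
        if state.get("Running") and parse_docker_time(state["StartedAt"]) < cutoff:
            stale.append(container["Name"].lstrip("/"))
    return stale


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python3 stale_containers.py inspect.json 2026-01-15T00:00:00Z
    with open(sys.argv[1]) as f:
        print("\n".join(started_before(f.read(), sys.argv[2])))
```

Example usage: `docker inspect $(docker ps -q) > /tmp/inspect.json`, then run the script on that file with your upgrade time.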
2. Block AF_ALG explicitly with a custom seccomp profile
If you can't upgrade Engine yet, tighten the policy with a custom profile. The core change is to stop allowing `socket()` calls whose address family is AF_ALG (value 38).

A first attempt at `/tmp/seccomp-no-afalg.json` might be an allowlist that leaves `socket` out altogether:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 1,
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86_32"],
  "syscalls": [
    {
      "names": [
        "accept", "arch_prctl", "bind", "brk", "capget",
        "capset", "chdir", "chmod", "chown", "chroot",
        "close", "connect", "dup", "dup2", "dup3",
        "epoll_create", "epoll_create1", "epoll_ctl",
        "epoll_wait", "eventfd", "eventfd2", "execve",
        "exit", "exit_group", "faccessat", "fadvise64",
        "fallocate", "fchmod", "fchmodat", "fchown",
        "fchownat", "fcntl", "fdatasync", "fgetxattr",
        "flistxattr", "flock", "fork", "fremovexattr",
        "fsetxattr", "fstat", "fstatfs", "fsync",
        "ftruncate", "futex", "getcwd", "getdents",
        "getdents64", "getegid", "geteuid", "getgid",
        "getgroups", "getpeername", "getpgrp", "getpid",
        "getppid", "getpriority", "getresgid",
        "getresuid", "getrlimit", "getsockname",
        "getsockopt", "gettid", "getuid", "ioctl",
        "keyctl", "lseek", "lstat", "madvise",
        "mincore", "mkdir", "mkdirat", "mknod",
        "mknodat", "mlock", "mlockall", "mmap",
        "mprotect", "mremap", "msync", "munlock",
        "munlockall", "munmap", "nanosleep", "newfstatat",
        "open", "openat", "pause", "pipe", "pipe2",
        "poll", "prctl", "pread64", "preadv",
        "prlimit64", "pwrite64", "pwritev", "read",
        "readahead", "readlink", "readlinkat", "readv",
        "recv", "recvfrom", "recvmmsg", "recvmsg",
        "rename", "renameat", "renameat2", "restart_syscall",
        "rmdir", "rt_sigaction", "rt_sigprocmask",
        "rt_sigreturn", "rt_sigsuspend", "rt_sigtimedwait",
        "sched_getaffinity", "sched_yield",
        "seccomp", "select", "send", "sendfile",
        "sendmmsg", "sendmsg", "sendto", "set_robust_list",
        "set_tid_address", "setdomainname", "setfsgid",
        "setfsuid", "setgid", "setgroups", "sethostname",
        "setpriority", "setresgid", "setresuid",
        "setrlimit", "setsid", "setsockopt", "setuid",
        "shutdown", "sigaltstack", "socketpair",
        "splice", "stat", "statfs", "symlink",
        "symlinkat", "sync", "syncfs", "sysinfo",
        "tee", "tgkill", "timer_create",
        "timer_delete", "timer_getoverrun",
        "timer_gettime", "timer_settime",
        "timerfd_create", "timerfd_gettime",
        "timerfd_settime", "times", "tkill", "truncate",
        "umask", "uname", "unlink", "unlinkat",
        "unshare", "utime", "utimensat", "utimes",
        "vfork", "vmsplice", "wait4", "waitid",
        "write", "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

The catch is that this allowlist has no `socket` at all, which is far too strict. Most containers need `socket` to create ordinary network sockets, and the list also omits other commonly needed syscalls such as `clone`. The more precise approach is to allow `socket` but restrict its address family:
```shell
# Start from the default profile that matches your Engine version
# (here the v28.0.0 copy downloaded earlier; adjust the tag to your version).
# 1) drop socket from unconditional ALLOW entries, 2) remove entries left empty,
# 3) append one conditional socket rule that rejects domain 38 (AF_ALG)
jq '
  .syscalls |= map(
    if .action == "SCMP_ACT_ALLOW" and ((.args // []) | length) == 0
    then .names -= ["socket"]
    else . end
  )
  | .syscalls |= map(select(.names | length > 0))
  | .syscalls += [{
      "names": ["socket"],
      "action": "SCMP_ACT_ALLOW",
      "args": [{"index": 0, "op": "SCMP_CMP_NE", "value": 38, "valueTwo": 0}],
      "comment": "allow all socket() calls except domain 38 (AF_ALG)"
    }]
' /tmp/seccomp-default-v28.json > /tmp/seccomp-no-afalg.json
```

Building on the full default profile matters. A hand-trimmed allowlist easily misses syscalls the C library needs at startup, such as `arch_prctl` and `futex`, and containers then fail to start. There is also a second pitfall: the base profile may already contain an `ALLOW` rule for `socket` with its own argument conditions. This filter leaves such a rule in place, and that rule may still match domain 38, so inspect the `socket` entries before relying on the result.
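In the `args` entry, `index: 0` refers to the first argument of `socket()`, `domain`. `SCMP_CMP_NE` with `value: 38` means "allow when domain is not 38 (AF_ALG)". Containers can still create AF_INET, AF_UNIX, and other ordinary sockets. Only AF_ALG calls fail to match the rule, so they fall through to the profile's `defaultAction` and are rejected.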
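The same change can be applied programmatically to a copy of the default profile. The sketch below follows the rule described above. It removes `socket` from unconditional `ALLOW` entries and appends one `SCMP_CMP_NE 38` rule. If the profile already filters `socket()` arguments, it raises an error rather than guessing how to merge. It assumes an allowlist-style profile (a `defaultAction` other than `SCMP_ACT_ALLOW`). The names `block_af_alg` and `AF_ALG_RULE` are hypothetical.

```python
import copy
import json
import sys

AF_ALG = 38  # Linux address-family number for AF_ALG

# socket() is allowed unless its first argument (domain) equals AF_ALG
AF_ALG_RULE = {
    "names": ["socket"],
    "action": "SCMP_ACT_ALLOW",
    "args": [{"index": 0, "value": AF_ALG, "op": "SCMP_CMP_NE"}],
}


def block_af_alg(profile: dict) -> dict:
    """Return a copy of an allowlist-style profile that rejects socket(AF_ALG, ...)."""
    if profile.get("defaultAction") == "SCMP_ACT_ALLOW":
        raise ValueError("expected an allowlist profile (defaultAction must not be ALLOW)")
    patched = copy.deepcopy(profile)
    kept = []
    for rule in patched.get("syscalls", []):
        names = rule.get("names") or []
        if "socket" in names and rule.get("action") == "SCMP_ACT_ALLOW":
            if rule.get("args"):
                # An existing conditional socket rule could still match domain 38
                raise ValueError("profile already filters socket() arguments; merge by hand")
            rule["names"] = [n for n in names if n != "socket"]
            if not rule["names"]:
                continue  # the entry allowed nothing but socket; drop it
        kept.append(rule)
    kept.append(copy.deepcopy(AF_ALG_RULE))
    patched["syscalls"] = kept
    return patched


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python3 block_af_alg.py <default.json> <output.json>
    with open(sys.argv[1]) as src:
        result = block_af_alg(json.load(src))
    with open(sys.argv[2], "w") as dst:
        json.dump(result, dst, indent=2)
```

Example usage: `python3 block_af_alg.py /tmp/seccomp-default-v28.json /tmp/seccomp-no-afalg.json`.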
Usage:

```shell
# Start a container with the custom profile
docker run -d \
  --security-opt seccomp=/tmp/seccomp-no-afalg.json \
  --name my-hardened-container \
  my-image
# Verify: try to create an AF_ALG socket inside the container (requires python3 in the image)
docker exec my-hardened-container python3 -c "
import socket
try:
    s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET)
    print('AF_ALG socket created: the profile is NOT in effect')
except OSError as e:
    print(f'AF_ALG blocked: {e}')
"
# Expected output: AF_ALG blocked: [Errno 1] Operation not permitted
```
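The one-liner above reports "blocked" for any `OSError`. That includes `EAFNOSUPPORT`, which only means the kernel has no AF_ALG support, not that seccomp stopped the call. The following sketch separates the cases by errno. It assumes seccomp denials surface as `EPERM`, as with `defaultErrnoRet: 1`, and AppArmor denials as `EACCES`. The function name is hypothetical.

```python
import errno
import socket


def probe_af_alg() -> str:
    """Try to create an AF_ALG socket and classify the outcome.

    "allowed"     socket() succeeded
    "blocked"     EPERM or EACCES, the usual signature of seccomp or AppArmor
    "unsupported" this Python build or the kernel has no AF_ALG
    "unknown"     any other error
    """
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        return "unsupported"
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError as exc:
        if exc.errno in (errno.EPERM, errno.EACCES):
            return "blocked"
        if exc.errno == errno.EAFNOSUPPORT:
            return "unsupported"
        return "unknown"
    sock.close()
    return "allowed"


if __name__ == "__main__":
    print(probe_af_alg())
```

Example usage (the file name is hypothetical): `docker cp probe_af_alg.py my-hardened-container:/tmp/`, then `docker exec my-hardened-container python3 /tmp/probe_af_alg.py`.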
3. Add an AppArmor restriction
If the host has AppArmor enabled, you can add another layer:

```shell
# Check whether AppArmor is enabled
docker info --format '{{range .SecurityOptions}}{{println .}}{{end}}' | grep apparmor
# Create the AppArmor profile /etc/apparmor.d/docker-no-afalg (writing there needs root)
sudo tee /etc/apparmor.d/docker-no-afalg > /dev/null << 'EOF'
#include <tunables/global>
profile docker-no-afalg flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>
  # Capabilities and file access (including exec and mmap) must be granted explicitly
  capability,
  file,
  network inet,
  network inet6,
  network unix,
  network icmp,
  deny network alg,
}
EOF
# Load the profile
sudo apparmor_parser -r /etc/apparmor.d/docker-no-afalg
# Specify it when starting the container
docker run -d \
  --security-opt seccomp=/tmp/seccomp-no-afalg.json \
  --security-opt apparmor=docker-no-afalg \
  --name my-double-hardened-container \
  my-image
```
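`deny network alg` blocks AF_ALG socket operations at the AppArmor layer, giving a second safeguard alongside seccomp. Note that `--security-opt apparmor=...` replaces Docker's `docker-default` profile instead of stacking on top of it. As a result, this minimal profile drops docker-default's other protections. For production use, extend docker-default's rules rather than starting from scratch.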
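To confirm the profile is actually loaded, the usual tool is `sudo aa-status`. The sketch below reads the same information from securityfs instead. It assumes the kernel lists loaded profiles one per line as `name (mode)` in `/sys/kernel/security/apparmor/profiles`, and that the file is readable, which typically needs root. The function name is hypothetical.

```python
from pathlib import Path

# securityfs file listing loaded AppArmor profiles as "name (mode)" lines
PROFILES_FILE = Path("/sys/kernel/security/apparmor/profiles")


def loaded_profiles(text: str) -> dict:
    """Map each loaded profile name to its mode (enforce, complain, ...)."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(")") or " (" not in line:
            continue  # skip anything that is not "name (mode)"
        name, mode = line[:-1].rsplit(" (", 1)
        result[name] = mode
    return result


if __name__ == "__main__":
    try:
        profiles = loaded_profiles(PROFILES_FILE.read_text())
    except OSError as exc:
        print(f"cannot read {PROFILES_FILE}: {exc} (AppArmor disabled, or not root?)")
    else:
        print("docker-no-afalg:", profiles.get("docker-no-afalg", "not loaded"))
```

`enforce` is the mode you want to see for `docker-no-afalg`. `complain` only logs violations.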
Caveats
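- Running containers are unaffected by the change: a container process keeps the seccomp filter it was started with. After you upgrade Engine or edit a profile, existing containers still run under the old rules, so you must recreate them.
- `--privileged` bypasses everything: privileged containers are confined by neither seccomp nor AppArmor, so `AF_ALG` is fully open. If your workload must run privileged, the kernel patch is your only line of defense.
- The kernel patch is the real fix: Docker's seccomp restriction only shrinks the attack surface. Actually fixing CVE-2026-31431 requires a kernel update. Check the host kernel version and follow your distribution's security updates.
- Workloads that genuinely need kernel crypto: some applications rely on `AF_ALG` for kernel or hardware-accelerated crypto, for example through OpenSSL's afalg engine. Blocking AF_ALG forces them onto a userspace implementation, or makes them fail if they have no fallback. For these workloads, prioritize the kernel upgrade over seccomp blocking.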
Checklist
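| Check | Command / action | Pass criterion |
|---|---|---|
| Docker Engine version | `docker version --format '{{.Server.Version}}'` | ≥ 29.4.3 |
| seccomp enabled | `docker info --format '{{.SecurityOptions}}'` | Lists `name=seccomp` with the built-in default profile (`profile=builtin` or `profile=default`), never `unconfined` |
| Per-container overrides | `docker inspect <container> --format '{{.HostConfig.Privileged}} {{.HostConfig.SecurityOpt}}'` | `false`, and either no `seccomp=unconfined` or a hardened profile specified |
| AF_ALG availability in the container | The Python check above | Creation fails with EPERM |
| Host kernel version | `uname -r` | Includes the CVE-2026-31431 fix |
| AppArmor status | `docker info \| grep apparmor` | Enabled, and the container's profile contains `deny network alg` |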
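The per-container rows of the checklist can be automated across a host. The sketch below reads the JSON array printed by `docker inspect $(docker ps -aq)` and flags any container that is privileged or sets `seccomp=unconfined`/`apparmor=unconfined`. It also accepts the older `key:value` spelling of those options. It does not detect which built-in default profile a container started with. The function name is hypothetical.

```python
import json
import sys


def audit(inspect_json: str) -> dict:
    """Map container name -> list of settings that reopen the AF_ALG path."""
    findings = {}
    for container in json.loads(inspect_json):
        host = container.get("HostConfig") or {}
        opts = host.get("SecurityOpt") or []
        issues = []
        if host.get("Privileged"):
            issues.append("privileged: seccomp and AppArmor are not applied")
        if "seccomp=unconfined" in opts or "seccomp:unconfined" in opts:
            issues.append("seccomp=unconfined")
        if "apparmor=unconfined" in opts or "apparmor:unconfined" in opts:
            issues.append("apparmor=unconfined")
        if issues:
            findings[container["Name"].lstrip("/")] = issues
    return findings


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: docker inspect $(docker ps -aq) > inspect.json; python3 audit.py inspect.json
    with open(sys.argv[1]) as f:
        for name, issues in audit(f.read()).items():
            print(f"{name}: {'; '.join(issues)}")
```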
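In one sentence: upgrade Engine to v29.4.3 or later and recreate your containers, and also apply the kernel patch. Only with both layers in place is the problem truly closed. Seccomp blocking stops the bleeding quickly; the kernel fix is the cure.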