Docker Engine mitigates CVE-2026-31431 ("Copy Fail"): AF_ALG sockets are no longer allowed by default

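2026-05-27
Source: docker.com (AI summary)

Disclaimer: this article is an AI-generated summary and is best read alongside the original. The summary may omit context, version differences, or edge cases, and is not an official statement.

Estimated reading time: 13 minutes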

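The recently disclosed Linux kernel vulnerability CVE-2026-31431, nicknamed "Copy Fail", has an attack surface centered on AF_ALG sockets, the userspace interface to the Linux kernel crypto API. The vulnerability is not in Docker itself, but the default seccomp profile in Docker Engine releases before v29.4.3 allowed containers to create AF_ALG sockets, which left a door open for attackers. Starting with v29.4.3, the default profile no longer permits socket() calls for the AF_ALG address family.

The sections below cover the risk boundary, how to check your exposure, and how to harden.

Where the vulnerability intersects with Docker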

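Exploiting CVE-2026-31431 depends on a socket(AF_ALG, ...) call. In the Linux kernel, AF_ALG gives userspace programs direct access to in-kernel cryptographic algorithms (AES, SHA, and so on). It was designed as a performance optimization, but it also serves as an entry point to kernel-side bugs.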
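
To make the interface concrete, here is a minimal sketch of ordinary, benign AF_ALG use: asking the kernel to compute a SHA-256 digest. It is unrelated to the exploit itself. The helper name kernel_sha256 is invented for this example, and it returns None wherever AF_ALG is missing or blocked.

```python
import hashlib
import socket


def kernel_sha256(data: bytes):
    """Hash data via the kernel crypto API, or return None if AF_ALG is unusable."""
    af_alg = getattr(socket, "AF_ALG", None)  # only defined on Linux builds of Python
    if af_alg is None:
        return None
    try:
        with socket.socket(af_alg, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("hash", "sha256"))  # select the algorithm type and name
            op, _ = alg.accept()          # per-operation socket
            with op:
                op.sendall(data)
                return op.recv(32)        # a SHA-256 digest is 32 bytes
    except OSError:
        # Raised when seccomp/AppArmor rejects the call or the kernel lacks the algorithm
        return None


if __name__ == "__main__":
    digest = kernel_sha256(b"abc")
    if digest is None:
        print("AF_ALG is not available in this environment")
    else:
        print("kernel and hashlib agree:", digest == hashlib.sha256(b"abc").digest())
```

The very first call, socket(AF_ALG, ...), is the one a seccomp profile can reject; everything after it never runs once that call is denied.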

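Docker Engine's default seccomp profile is an allowlist maintained by Docker. Before v29.4.3, that allowlist permitted the socket() system call for the AF_ALG family, so container processes could create crypto sockets freely. This means:

  • If a process inside a container is compromised, the attacker can reach the kernel vulnerability through AF_ALG and escalate from the container to the host kernel.
  • The vulnerability is not a defect in Docker's own code; the default policy was simply too permissive and did not block a dangerous syscall entry point.

The change in v29.4.3 is straightforward: the default seccomp profile no longer grants permission to create AF_ALG sockets. Newly started containers can no longer call socket(AF_ALG, ...).

Are you affected: a quick check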

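AF_ALG is blocked at the container layer if any one of the following holds (the host kernel still needs its own patch):

  1. Docker Engine version ≥ v29.4.3, with containers created after the upgrade
  2. A custom seccomp profile is in use and already blocks AF_ALG
  3. An AppArmor profile is in use and already restricts AF_ALG operations

Containers run with --privileged are the exception: they bypass seccomp and AppArmor, so none of the conditions above protects them. Privileged mode is itself a larger risk exposure and must not be treated as a "solution".

Check the version first:

# Show the Docker Engine version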
docker version --format '{{.Server.Version}}'
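
If you want to script the comparison, the version string printed by the command above can be checked against the 29.4.3 threshold. The following is a small sketch; the function name is_patched is hypothetical, and it ignores pre-release suffixes (so a string like "29.4.3-rc.1" counts as 29.4.3).

```python
import re


def is_patched(server_version: str, minimum=(29, 4, 3)) -> bool:
    """Return True if a Docker Engine version string is at or above `minimum`."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", server_version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {server_version!r}")
    # Compare as integer tuples so that 29.10.0 sorts above 29.4.3
    return tuple(int(part) for part in match.groups()) >= minimum


if __name__ == "__main__":
    for version in ("28.0.0", "29.4.2", "29.4.3", "29.10.0"):
        print(version, "patched" if is_patched(version) else "needs upgrade")
```

Comparing integer tuples rather than raw strings avoids the common mistake of ranking "29.10.0" below "29.4.3".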

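Next, check whether the default seccomp profile still allows AF_ALG:

# List the security options the daemon reports (this does not print the profile itself)
docker info --format '{{.SecurityOptions}}' | tr ',' '\n'
# If the seccomp entry shows unconfined, there is no seccomp restriction at all: high risk

# More direct: fetch the default profile and inspect its socket rule (requires jq).
# Profiles identify address families by number (AF_ALG is 38), so a text search for "alg" is not reliable.
wget -q https://raw.githubusercontent.com/moby/moby/v29.4.3/profiles/seccomp/default.json -O /tmp/seccomp-default-v2943.json
jq '.syscalls[] | select(.names | index("socket"))' /tmp/seccomp-default-v2943.json

# Compare with an older profile (v28.x)
wget -q https://raw.githubusercontent.com/moby/moby/v28.0.0/profiles/seccomp/default.json -O /tmp/seccomp-default-v28.json
jq '.syscalls[] | select(.names | index("socket"))' /tmp/seccomp-default-v28.json

If the older profile's socket rule has no argument filter that excludes address family 38 (AF_ALG), containers running under it can create AF_ALG sockets and are exposed.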

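Hardening in practice: three paths

1. Upgrade Docker Engine (recommended)

The cleanest approach:

# Ubuntu/Debian example
sudo apt-get update
# List the available versions and pick the entry for 29.4.3 (or later)
apt-cache madison docker-ce
# Set VERSION_STRING to the full version string of that entry, then install
sudo apt-get install docker-ce="$VERSION_STRING" docker-ce-cli="$VERSION_STRING"

# Confirm after upgrading
docker version --format '{{.Server.Version}}'
# Should print 29.4.3 or higher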

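After the upgrade, containers that are already running do not pick up the new profile automatically; they have to be recreated:

# List all running containers
docker ps --format '{{.ID}} {{.Image}} {{.Names}}'

# Recreate them one by one (example for a single container)
docker stop my-container
docker rm my-container
docker run -d --name my-container my-image
# The new container uses the default seccomp profile shipped with v29.4.3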

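2. Block AF_ALG explicitly with a custom seccomp profile

If you cannot upgrade the Engine yet, you can tighten things with a custom profile. The core change: exclude socket() calls whose address family is AF_ALG (value 38) from the syscalls allowlist.

Create /tmp/seccomp-no-afalg.json: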

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 1,
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": [
        "accept", "arch_prctl", "bind", "brk", "capget",
        "capset", "chdir", "chmod", "chown", "chroot",
        "close", "connect", "dup", "dup2", "dup3",
        "epoll_create", "epoll_create1", "epoll_ctl",
        "epoll_wait", "eventfd", "eventfd2", "execve",
        "exit", "exit_group", "faccessat", "fadvise64",
        "fallocate", "fchmod", "fchmodat", "fchown",
        "fchownat", "fcntl", "fdatasync", "fgetxattr",
        "flistxattr", "flock", "fork", "fremovexattr",
        "fsetxattr", "fstat", "fstatfs", "fsync",
        "ftruncate", "futex", "getcwd", "getdents",
        "getdents64", "getegid", "geteuid", "getgid",
        "getgroups", "getpeername", "getpgrp", "getpid",
        "getppid", "getpriority", "getresgid",
        "getresuid", "getrlimit", "getsockname",
        "getsockopt", "gettid", "getuid", "ioctl",
        "keyctl", "lseek", "lstat", "madvise",
        "mincore", "mkdir", "mkdirat", "mknod",
        "mknodat", "mlock", "mlockall", "mmap",
        "mprotect", "mremap", "msync", "munlock",
        "munlockall", "munmap", "nanosleep", "newfstatat",
        "open", "openat", "pause", "pipe", "pipe2",
        "poll", "prctl", "pread64", "preadv",
        "prlimit64", "pwrite64", "pwritev", "read",
        "readahead", "readlink", "readlinkat", "readv",
        "recv", "recvfrom", "recvmmsg", "recvmsg",
        "rename", "renameat", "renameat2", "restart_syscall",
        "rmdir", "rt_sigaction", "rt_sigprocmask",
        "rt_sigreturn", "rt_sigsuspend", "rt_sigtimedwait",
        "sched_getaffinity", "sched_yield",
        "seccomp", "select", "send", "sendfile",
        "sendmmsg", "sendmsg", "sendto", "set_robust_list",
        "set_tid_address", "setdomainname", "setfsgid",
        "setfsuid", "setgid", "setgroups", "sethostname",
        "setpriority", "setresgid", "setresuid",
        "setrlimit", "setsid", "setsockopt", "setuid",
        "shutdown", "sigaltstack", "socketpair",
        "splice", "stat", "statfs", "symlink",
        "symlinkat", "sync", "syncfs", "sysinfo",
        "tee", "tgkill", "timer_create",
        "timer_delete", "timer_getoverrun",
        "timer_gettime", "timer_settime",
        "timerfd_create", "timerfd_gettime",
        "timerfd_settime", "times", "tkill", "truncate",
        "umask", "uname", "unlink", "unlinkat",
        "unshare", "utime", "utimensat", "utimes",
        "vfork", "vmsplice", "wait4", "waitid",
        "write", "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

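Key point: the allowlist above does not contain socket at all, so it blocks every socket type. That is too strict, since most containers need socket to create ordinary network sockets. A more precise approach allows socket but restricts the address family: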

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 1,
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ALLOW",
      "args": [
        {
          "index": 0,
          "op": "SCMP_CMP_NE",
          "value": 38,
          "valueTwo": 0
        }
      ],
      "comment": "Allow socket calls for every address family except AF_ALG (38)"
    },
    {
      "names": [
        "accept", "bind", "connect", "close", "read",
        "write", "exit", "exit_group", "fcntl", "fstat",
        "lseek", "mmap", "mprotect", "munmap", "open",
        "openat", "poll", "readv", "recvfrom", "recvmsg",
        "sendmsg", "sendto", "sigaltstack", "socketpair",
        "stat", "writev", "dup", "dup2", "dup3",
        "epoll_create1", "epoll_ctl", "epoll_wait",
        "fork", "vfork", "clone", "execve",
        "getpid", "getppid", "getuid", "getgid",
        "geteuid", "getegid", "getgroups",
        "setuid", "setgid", "setgroups",
        "prctl", "seccomp", "rt_sigaction",
        "rt_sigprocmask", "access", "chdir",
        "chmod", "chown", "fchmod", "fchown",
        "lstat", "mkdir", "rmdir", "unlink",
        "rename", "link", "symlink", "readlink",
        "umask", "uname", "sysinfo", "times",
        "getrlimit", "setrlimit", "getdents64",
        "getcwd", "nanosleep", "clock_gettime",
        "clock_nanosleep", "pipe", "pipe2",
        "select", "pselect6", "pause",
        "sigsuspend", "sigwaitinfo", "sigtimedwait",
        "timer_create", "timer_settime",
        "timer_gettime", "timer_delete",
        "timerfd_create", "timerfd_settime",
        "timerfd_gettime", "gettimeofday",
        "settimeofday", "getresuid", "getresgid",
        "setresuid", "setresgid", "kill", "tgkill",
        "tkill", "raise", "signal", "sigaction"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

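In the args entry, index: 0 refers to the first argument of socket(), the domain. SCMP_CMP_NE with value: 38 means "allow when domain is not 38 (AF_ALG)". Containers can still create normal sockets such as AF_INET and AF_UNIX; only AF_ALG fails to match and falls through to the default SCMP_ACT_ERRNO action. Note that the allowlist shown here is abbreviated (it omits syscalls such as brk and futex that most programs need), so in practice start from Docker's full default profile and tighten the socket rule there.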
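
The matching logic can be sketched with a toy evaluator. This is a simplified model for illustration only, not how libseccomp or the container runtime evaluate filters internally; the function name decide and the trimmed PROFILE are assumptions made for the example.

```python
AF_INET, AF_ALG = 2, 38  # Linux address family numbers

# Trimmed-down profile containing only the socket rule discussed above
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {
            "names": ["socket"],
            "action": "SCMP_ACT_ALLOW",
            "args": [{"index": 0, "op": "SCMP_CMP_NE", "value": AF_ALG}],
        }
    ],
}

COMPARATORS = {
    "SCMP_CMP_NE": lambda arg, value: arg != value,
    "SCMP_CMP_EQ": lambda arg, value: arg == value,
}


def decide(profile: dict, syscall: str, args: tuple) -> str:
    """Return the action this simplified filter model takes for one call."""
    for rule in profile["syscalls"]:
        if syscall not in rule["names"]:
            continue
        conditions = rule.get("args", [])
        # In this model a rule matches only if every argument condition holds
        if all(COMPARATORS[c["op"]](args[c["index"]], c["value"]) for c in conditions):
            return rule["action"]
    # No rule matched: the default action applies
    return profile["defaultAction"]


if __name__ == "__main__":
    print("AF_INET:", decide(PROFILE, "socket", (AF_INET, 1, 0)))
    print("AF_ALG: ", decide(PROFILE, "socket", (AF_ALG, 5, 0)))
```

The AF_ALG call is never explicitly denied; it simply fails the allow rule's condition and lands on the default action, which is why defaultAction must stay restrictive for this pattern to work.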

Usage:

# Start a container with the custom profile
docker run -d \
  --security-opt seccomp=/tmp/seccomp-no-afalg.json \
  --name my-hardened-container \
  my-image

# Verify: try to create an AF_ALG socket inside the container
docker exec my-hardened-container python3 -c "
import socket
try:
    s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET)
    print('AF_ALG socket created: the profile is NOT in effect')
except OSError as e:
    print(f'AF_ALG blocked: {e}')
"
# Expected output: AF_ALG blocked: [Errno 1] Operation not permitted

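3. Additional restriction with AppArmor

If AppArmor is enabled on the host, you can add another layer:

# Check whether it is enabled
docker info --format '{{.SecurityOptions}}' | grep apparmor

# Create the AppArmor profile (/etc/apparmor.d/docker-no-afalg); writing there requires root
sudo tee /etc/apparmor.d/docker-no-afalg > /dev/null << 'EOF'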
#include <tunables/global>
profile docker-no-afalg flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>
  network inet,
  network inet6,
  network unix,
  network icmp,
  deny network alg,
  /** rw,
  /proc/** r,
}
EOF

# Load the profile
sudo apparmor_parser -r /etc/apparmor.d/docker-no-afalg

# Reference it when starting the container
docker run -d \
  --security-opt seccomp=/tmp/seccomp-no-afalg.json \
  --security-opt apparmor=docker-no-afalg \
  --name my-double-hardened-container \
  my-image

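deny network alg blocks AF_ALG socket operations at the AppArmor layer, giving a second line of defense alongside seccomp.

Boundaries to keep in mind

  • Running containers are not updated: the seccomp profile is bound when a container is created, so after upgrading the Engine or changing a profile, existing containers keep the old rules. They must be recreated.
  • --privileged bypasses everything: privileged containers are not restricted by seccomp or AppArmor, so AF_ALG is fully available. If your workload must run privileged, the kernel patch is the only line of defense.
  • The kernel patch is the real fix: Docker's seccomp restriction only shrinks the attack surface; actually fixing CVE-2026-31431 requires a kernel upgrade. Check the host kernel version and follow your distribution's security updates.
  • Workloads that genuinely need kernel crypto: if an application relies on AF_ALG for hardware-accelerated encryption (for example IPSec or TLS offload), blocking it forces a fallback to a userspace implementation, or a failure if the application has none. In these cases, prioritize upgrading the kernel rather than relying on the seccomp block.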
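
To find containers that fall outside the seccomp protection, the first two points can be checked against docker inspect output. The sketch below assumes the JSON has already been captured (for example with docker inspect on the IDs from docker ps -q); the helper name exposure_flags and the sample data are hypothetical.

```python
import json


def exposure_flags(container: dict) -> list:
    """List reasons a container bypasses seccomp, based on `docker inspect` output."""
    host_config = container.get("HostConfig", {})
    flags = []
    if host_config.get("Privileged"):
        flags.append("privileged")
    # SecurityOpt is null when no --security-opt was passed
    for opt in host_config.get("SecurityOpt") or []:
        if opt == "seccomp=unconfined":
            flags.append("seccomp=unconfined")
    return flags


# Hypothetical sample in the shape `docker inspect` emits (a JSON array)
sample = json.loads("""
[
  {"Name": "/web",   "HostConfig": {"Privileged": false, "SecurityOpt": null}},
  {"Name": "/agent", "HostConfig": {"Privileged": true,  "SecurityOpt": ["seccomp=unconfined"]}}
]
""")

for entry in sample:
    print(entry["Name"], exposure_flags(entry) or "no bypass flags")
```

An empty result only means the container does not opt out of seccomp. It does not show which profile version the container was created with, so containers that predate the Engine upgrade still need to be recreated.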

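Checklist

Check                               Command / action                                                    Pass criterion
Docker Engine version               docker version --format '{{.Server.Version}}'                       ≥ 29.4.3
seccomp enabled                     docker info --format '{{.SecurityOptions}}'                         seccomp listed with the default (builtin) profile, not unconfined
Container not bypassing seccomp     docker inspect <container> --format '{{.HostConfig.SecurityOpt}}'   No seccomp=unconfined; default or a hardened custom profile, container created after the upgrade
AF_ALG availability in container    Python verification script above                                    Creation fails with EPERM
Host kernel version                 uname -r                                                            Includes the CVE-2026-31431 patch
AppArmor status                     docker info | grep apparmor                                         Enabled, with a profile containing deny network alg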

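In one sentence: upgrade the Engine to v29.4.3 or later and recreate your containers, and follow up with the kernel patch; only with both layers in place is the loop truly closed. The seccomp block is a quick way to stop the bleeding; the kernel fix is the cure.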

