May 31, 2018

An Analysis of Clair, an Open-Source Image Security Scanner

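1. Introduction

2. Clair architecture overview

3. Clair source code analysis

  • 3.1. Program entry point: main.go
  • 3.2. The Boot function
  • 3.3. The ProcessLayer function
  • 3.4. The detectContent function
  • 3.5. The detectFeatureVersions function

4. An approach to custom scanning requirements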



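1. Introduction

Clair is an open-source security scanner for Docker images. It statically scans an image for known vulnerabilities.

This article analyzes the source code of the Clair v2.0.3 release (https://github.com/coreos/clair/archive/v2.0.3.zip).

The focus is on how Clair statically scans Docker images, followed by some thoughts on how to implement custom scanning requirements.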

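2. Clair architecture overview

The overall architecture of Clair is shown in the figure below:

[Figure: Clair architecture diagram]

Clair exposes a set of RESTful APIs for submitting the image layer files to be scanned and for querying the details of the vulnerabilities stored in its database, together with their suggested fixes.

When POST /v1/layers is called, a worker is started to scan the layer file.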


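3. Clair source code analysis

3.1. Program entry point: main.go

The main function in main.go parses several command-line flags that control how Clair runs:

  • cpu-profile: path of the file to which the runtime/pprof standard library writes a CPU profile; no profile is recorded by default.
  • log-level: the logging level; defaults to info.
  • insecure-tls: whether to skip verification of the TLS server's certificate chain and hostname when pulling image layers; defaults to false.
  • config: path of the YAML configuration file, which defines the behavior of the basic components (database, api, worker, updater, notifier); defaults to /etc/clair/config.yaml.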
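The flag handling described above can be sketched with Go's standard flag package. This is a minimal illustration, not Clair's actual main.go: the options struct and the parseOptions helper are names invented for this example, and only the flag names and defaults come from the list above.

```go
package main

import (
	"flag"
	"fmt"
)

// options mirrors the four flags described above (hypothetical type).
type options struct {
	configPath  string
	cpuProfile  string
	logLevel    string
	insecureTLS bool
}

// parseOptions parses args (without the program name) into options.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("clair", flag.ContinueOnError)
	fs.StringVar(&o.configPath, "config", "/etc/clair/config.yaml", "path to the YAML configuration file")
	fs.StringVar(&o.cpuProfile, "cpu-profile", "", "file to write a CPU profile to (empty disables profiling)")
	fs.StringVar(&o.logLevel, "log-level", "info", "logging level")
	fs.BoolVar(&o.insecureTLS, "insecure-tls", false, "disable TLS certificate verification when pulling layers")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions([]string{"-log-level=debug"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

Flags that are not passed keep the defaults listed above, so the example only overrides log-level.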

An example YAML configuration file is shown below:

# Copyright 2015 clair authors
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# The values specified here are the default values that Clair uses if no configuration file is specified or if the keys are not defined.
clair:
  # Database used by Clair
  database:
    # Database driver
    type: pgsql
    options:
      # PostgreSQL Connection string
      # https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNSTRING
      # source: host=localhost port=5432 user=postgres sslmode=disable statement_timeout=60000
      source: postgresql://postgres:passw0rd@

      # Number of elements kept in the cache
      # Values unlikely to change (e.g. namespaces) are cached in order to save prevent needless roundtrips to the database.
      cachesize: 16384

      # 32-bit URL-safe base64 key used to encrypt pagination tokens
      # If one is not provided, it will be generated.
      # Multiple clair instances in the same cluster need the same value.
      paginationkey:

  # Behavior of the Clair API
  api:
    # v3 grpc/RESTful API server address
    addr: ""

    # Health server address
    # This is an unencrypted endpoint useful for load balancers to check to healthiness of the clair server.
    healthaddr: ""

    # Deadline before an API request will respond with a 503
    timeout: 900s

    # Optional PKI configuration
    # If you want to easily generate client certificates and CAs, try the following projects:
    # https://github.com/coreos/etcd-ca
    # https://github.com/cloudflare/cfssl
    servername:
    cafile:
    keyfile:
    certfile:

  # Behavior of the worker that scans layers
  worker:
    namespace_detectors:
      - os-release
      - lsb-release
      - apt-sources
      - alpine-release
      - redhat-release

    feature_listers:
      - apk
      - dpkg
      - rpm

  # Behavior of the vulnerability database updater
  updater:
    # Frequency the database will be updated with vulnerabilities from the default data sources
    # The value 0 disables the updater entirely.
    interval: 2h
    enabled_updaters:
      - debian
      - ubuntu
      - rhel
      - oracle
      - alpine

  # Behavior of the notifier
  notifier:
    # Number of attempts before the notification is marked as failed to be sent
    attempts: 3

    # Duration before a failed notification is retried
    renotifyinterval: 2h

    http:
      # Optional endpoint that will receive notifications via POST requests
      endpoint:

      # Optional PKI configuration
      # If you want to easily generate client certificates and CAs, try the following projects:
      # https://github.com/cloudflare/cfssl
      # https://github.com/coreos/etcd-ca
      servername:
      cafile:
      keyfile:
      certfile:

      # Optional HTTP Proxy: must be a valid URL (including the scheme).
      proxy:

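Once the flags have been parsed, main calls the Boot function, which starts several key goroutines.

3.2. The Boot function


func Boot(config *Config)

Boot starts four goroutines and controls their lifecycle through a shared stop channel.

The four goroutines are:

  • api: serves Clair's RESTful API
  • api-healthcheck: the health check endpoint for the api
  • notifier: the component that notifies users when new vulnerabilities are found
  • updater: the component that periodically refreshes vulnerability data from the vulnerability sources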
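The pattern of several goroutines governed by one stop channel can be sketched as follows. This is a generic illustration of the technique rather than Clair's code: runComponent and bootAndStop are hypothetical helpers, and the component bodies are placeholders.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runComponent simulates a long-running component that exits when stop is closed.
func runComponent(name string, stop <-chan struct{}, wg *sync.WaitGroup, stopped chan<- string) {
	defer wg.Done()
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			stopped <- name
			return
		case <-ticker.C:
			// A real component would serve requests or poll here.
		}
	}
}

// bootAndStop starts one goroutine per component, then stops them all
// and returns the names of the components that shut down.
func bootAndStop(names []string) []string {
	stop := make(chan struct{})
	stopped := make(chan string, len(names))
	var wg sync.WaitGroup
	for _, n := range names {
		wg.Add(1)
		go runComponent(n, stop, &wg, stopped)
	}
	close(stop) // closing the channel signals every goroutine at once
	wg.Wait()
	close(stopped)
	var out []string
	for n := range stopped {
		out = append(out, n)
	}
	return out
}

func main() {
	names := []string{"api", "api-healthcheck", "notifier", "updater"}
	fmt.Println(len(bootAndStop(names)), "components stopped")
}
```

Closing a channel wakes every receiver, which is why a single stop channel is enough to shut down all four components together.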

api-healthcheck, notifier and updater are outside the scope of this article and are not discussed further.

The api goroutine serves Clair's RESTful endpoints; for details see: https://coreos.com/clair/docs/latest/api_v1.html

The key endpoint is POST /v1/layers, which submits a layer of a given Docker image for security scanning.

Its handler is postLayer, which in turn calls ProcessLayer to scan the layer.

A request body looks like this:

{
  "Layer": {
    "Name": "523ef1d23f222195488575f52a39c729c76a8c5630c9a194139cb246fb212da6",
    "Path": "https://mystorage.com/layers/523ef1d23f222195488575f52a39c729c76a8c5630c9a194139cb246fb212da6/layer.tar",
    "ParentName": "140f9bdfeb9784cf8730e9dab5dd12fbd704151cf555ac8cae650451794e5ac2",
    "Format": "Docker"
  }
}

  • Name: the name of the current layer (its sha256 digest)
  • Path: the location from which the current layer can be fetched
  • ParentName: the name of the current layer's parent layer
  • Format: the layer format; Docker and appc are currently supported

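A complete Docker image has to be analyzed by submitting the layers that make it up one by one. This article uses the ubuntu image for testing. It consists of 5 layers, so scanning it takes 5 requests, issued in the order of the layers' parent-child relationships.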
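Submitting layers in parent-to-child order can be sketched like this. The helper names (chainLayers, submit), the layer digests and the storage URL are placeholders made up for the example, the check for a 201 Created response is an assumption based on the API v1 documentation, and the server here is a local stand-in rather than a real Clair instance.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type layerBody struct {
	Name       string
	Path       string
	ParentName string `json:",omitempty"`
	Format     string
}

type layerRequest struct {
	Layer layerBody
}

// chainLayers turns digests ordered from base layer to top layer into
// requests in which each layer names the previous one as its parent.
func chainLayers(digests []string, baseURL string) []layerRequest {
	reqs := make([]layerRequest, 0, len(digests))
	parent := ""
	for _, d := range digests {
		reqs = append(reqs, layerRequest{Layer: layerBody{
			Name:       d,
			Path:       baseURL + "/" + d + "/layer.tar",
			ParentName: parent,
			Format:     "Docker",
		}})
		parent = d
	}
	return reqs
}

// submit posts the requests one by one and stops at the first failure.
func submit(endpoint string, reqs []layerRequest) error {
	for _, r := range reqs {
		body, err := json.Marshal(r)
		if err != nil {
			return err
		}
		resp, err := http.Post(endpoint+"/v1/layers", "application/json", bytes.NewReader(body))
		if err != nil {
			return err
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusCreated {
			return fmt.Errorf("layer %s: unexpected status %s", r.Layer.Name, resp.Status)
		}
	}
	return nil
}

func main() {
	// A stand-in server that accepts every layer; a real run would target Clair.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusCreated)
	}))
	defer srv.Close()

	reqs := chainLayers([]string{"layer-1", "layer-2", "layer-3"}, "https://mystorage.com/layers")
	fmt.Println("submit error:", submit(srv.URL, reqs))
}
```

The base layer is sent without a ParentName, and every later request refers to the layer submitted just before it.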

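3.3. The ProcessLayer function


func ProcessLayer(datastore database.Datastore, imageFormat, name, parentName, path string, headers map[string]string) error

ProcessLayer first queries the database to check whether the current layer has already been scanned.

  • If it has, the function returns immediately.
  • If it has not, the function checks whether the parent layer named by parentName already has scan results in the database. If not, it returns an error. If so, it calls detectContent to scan the layer file and stores the results in the database.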
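The decision flow above can be sketched with an in-memory map standing in for the database. Everything here is hypothetical scaffolding (the store type, processLayer, errParentUnknown). The sketch also assumes that an empty parentName marks a base layer that needs no parent check.

```go
package main

import (
	"errors"
	"fmt"
)

var errParentUnknown = errors.New("parent layer is unknown")

// store is a hypothetical in-memory stand-in for Clair's database:
// layer name -> features detected in that layer.
type store map[string][]string

// processLayer mirrors the decision flow described above.
func processLayer(db store, name, parentName string, scan func() []string) error {
	if _, done := db[name]; done {
		return nil // already analyzed: nothing to do
	}
	// Assumption: an empty parentName means a base layer with no parent to check.
	if parentName != "" {
		if _, ok := db[parentName]; !ok {
			return errParentUnknown
		}
	}
	db[name] = scan() // scan the layer and persist the result
	return nil
}

func main() {
	db := store{}
	scan := func() []string { return []string{"util-linux"} }

	fmt.Println(processLayer(db, "child", "base", scan)) // parent not analyzed yet
	fmt.Println(processLayer(db, "base", "", scan))
	fmt.Println(processLayer(db, "child", "base", scan))
}
```

This is also why the layers of an image must be submitted from the base layer upward: a child submitted before its parent is rejected.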

3.4. The detectContent function


func detectContent(imageFormat, name, path string, headers map[string]string, parent *database.Layer) (namespace *database.Namespace, featureVersions []database.FeatureVersion, err error)

detectContent works as follows:

  1. It calls imagefmt.Extract, which dispatches on imageFormat, and stores the layer's files in the files variable (of type tarutil.FilesMap).
  2. It calls detectNamespace with the layer name, the files and the parent layer to determine the context in which the current layer should be scanned for vulnerabilities. (namespace: a context around features and vulnerabilities (e.g. an operating system or a programming language))
  3. It calls detectFeatureVersions with the layer name, the files, the namespace and the parent layer to detect the features that may carry known vulnerabilities. (feature: anything that when present in a filesystem could be an indication of a vulnerability (e.g. the presence of a file or an installed software package))
  4. It returns the result of detectFeatureVersions.

The layer files extracted in step 1 are held in a tarutil.FilesMap (map[string][]byte).

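To get a better look at the files inside a layer, we add the following debugging code to detectContent, which dumps the contents of files to the local disk:

// Debug only; needs "io/ioutil", "os", "path/filepath" and "strings". Go does not expand "~".
dir := filepath.Join(os.Getenv("HOME"), "tmp/clair/files", name)
os.MkdirAll(dir, 0755)
for key, value := range files {
    ioutil.WriteFile(filepath.Join(dir, strings.Replace(key, "/", "___", -1)), value, 0644)
}

The output shows that, of the following 5 layers of the ubuntu image: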

  • cc8487ed6373e8b38c60ff8fc5bdfdd9576aa49226a6e4dcac522f61f5f19d31
  • 614c02cb92ee20d3cd51770f07d67503f87a75602ddf032a0a6163527fcf97e0
  • 08ca6384a97957eac5a5a69cdc799434739655c88e69efb23d2bb963110dbf48
  • 0fa211e5edebeb29d3e29cc2c8c87e9a6a8306901816c19b7f6fb6a7392c3cef
  • daf8616e33b20539309a114814ba9864367630ad8da63d4e96bea40dd22841ba

the first layer, cc8487ed6373e8b38c60ff8fc5bdfdd9576aa49226a6e4dcac522f61f5f19d31, and the fourth layer, 0fa211e5edebeb29d3e29cc2c8c87e9a6a8306901816c19b7f6fb6a7392c3cef, contain files.

3.5. The detectFeatureVersions function


func detectFeatureVersions(name string, files tarutil.FilesMap, namespace *database.Namespace, parent *database.Layer) (features []database.FeatureVersion, err error) {
    // Call every registered featurefmt plugin to scan files.
    // The plugins built in by default are apk, rpm and dpkg.
    features, err = featurefmt.ListFeatures(files)
    if err != nil {
        return
    }

    // If no feature was found in this layer, fall back to the parent layer's features.
    if len(features) == 0 && parent != nil {
        features = parent.Features
        return
    }

    // Build a map of the namespaces for each FeatureVersion in our parent layer.
    parentFeatureNamespaces := make(map[string]database.Namespace)
    if parent != nil {
        for _, parentFeature := range parent.Features {
            parentFeatureNamespaces[parentFeature.Feature.Name+":"+parentFeature.Version] = parentFeature.Feature.Namespace
        }
    }

    // Make sure every feature is associated with a namespace.
    for i, feature := range features {
        if feature.Feature.Namespace.Name != "" {
            // There is a Namespace associated.
            continue
        }

        if parentFeatureNamespace, ok := parentFeatureNamespaces[feature.Feature.Name+":"+feature.Version]; ok {
            // The FeatureVersion is present in the parent layer; associate with their Namespace.
            features[i].Feature.Namespace = parentFeatureNamespace
            continue
        }

        if namespace != nil {
            // The Namespace has been detected in this layer; associate it.
            features[i].Feature.Namespace = *namespace
            continue
        }

        log.WithFields(log.Fields{"feature name": feature.Feature.Name, "feature version": feature.Version, logLayerName: name}).Warning("Namespace unknown")
        err = ErrUnsupported
        return
    }

    return
}
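The order in which a namespace is chosen for each feature in the loop above can be isolated in a small standalone sketch. The feature type and assignNamespaces are simplified stand-ins invented for this example, with namespaces reduced to plain strings.

```go
package main

import "fmt"

// feature is a simplified stand-in for database.FeatureVersion.
type feature struct {
	Name, Version, Namespace string
}

// assignNamespaces applies the precedence used in the loop above:
// the feature's own namespace, then the parent's namespace for the same
// name and version, then the namespace detected for the current layer.
func assignNamespaces(features, parentFeatures []feature, detected string) ([]feature, error) {
	parentNS := map[string]string{}
	for _, p := range parentFeatures {
		parentNS[p.Name+":"+p.Version] = p.Namespace
	}
	out := make([]feature, len(features))
	copy(out, features)
	for i, f := range out {
		switch {
		case f.Namespace != "":
			// Already associated with a namespace.
		case parentNS[f.Name+":"+f.Version] != "":
			out[i].Namespace = parentNS[f.Name+":"+f.Version]
		case detected != "":
			out[i].Namespace = detected
		default:
			return nil, fmt.Errorf("namespace unknown for %s %s", f.Name, f.Version)
		}
	}
	return out, nil
}

func main() {
	out, err := assignNamespaces(
		[]feature{{"pam", "1.1.8-3.6ubuntu2", ""}, {"util-linux", "2.31.1-0.4ubuntu3", ""}},
		[]feature{{"pam", "1.1.8-3.6ubuntu2", "ubuntu:16.04"}},
		"ubuntu:18.04",
	)
	fmt.Println(out, err)
}
```

A feature inherited unchanged from the parent keeps the parent's namespace even when the current layer reports a different one.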


As the code shows, the feature scanning in detectFeatureVersions is implemented by featurefmt.ListFeatures.

// ListFeatures produces the list of FeatureVersions in an image layer using
// every registered Lister.
func ListFeatures(files tarutil.FilesMap) ([]database.FeatureVersion, error) {
    listersM.RLock()
    defer listersM.RUnlock()

    var totalFeatures []database.FeatureVersion
    for _, lister := range listers {
        features, err := lister.ListFeatures(files)
        if err != nil {
            return []database.FeatureVersion{}, err
        }
        totalFeatures = append(totalFeatures, features...)
    }

    return totalFeatures, nil
}

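Here lister is a value of the Lister interface type. As mentioned earlier, three detection plugins are built in by default: apk, rpm and dpkg.

Since this article uses the ubuntu image as its example, we only look at the dpkg plugin's ListFeatures implementation, shown below: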

func (l lister) ListFeatures(files tarutil.FilesMap) ([]database.FeatureVersion, error) {
    f, hasFile := files["var/lib/dpkg/status"]
    if !hasFile {
        return []database.FeatureVersion{}, nil
    }

    // Create a map to store packages and ensure their uniqueness
    packagesMap := make(map[string]database.FeatureVersion)

    var pkg database.FeatureVersion
    var err error
    scanner := bufio.NewScanner(strings.NewReader(string(f)))
    for scanner.Scan() {
        line := scanner.Text()

        if strings.HasPrefix(line, "Package: ") {
            // Package line
            // Defines the name of the package

            pkg.Feature.Name = strings.TrimSpace(strings.TrimPrefix(line, "Package: "))
            pkg.Version = ""
        } else if strings.HasPrefix(line, "Source: ") {
            // Source line (Optionnal)
            // Gives the name of the source package
            // May also specifies a version

            srcCapture := dpkgSrcCaptureRegexp.FindAllStringSubmatch(line, -1)[0]
            md := map[string]string{}
            for i, n := range srcCapture {
                md[dpkgSrcCaptureRegexpNames[i]] = strings.TrimSpace(n)
            }

            pkg.Feature.Name = md["name"]
            if md["version"] != "" {
                version := md["version"]
                err = versionfmt.Valid(dpkg.ParserName, version)
                if err != nil {
                    log.WithError(err).WithField("version", string(line[1])).Warning("could not parse package version. skipping")
                } else {
                    pkg.Version = version
                }
            }
        } else if strings.HasPrefix(line, "Version: ") && pkg.Version == "" {
            // Version line
            // Defines the version of the package
            // This version is less important than a version retrieved from a Source line
            // because the Debian vulnerabilities often skips the epoch from the Version field
            // which is not present in the Source version, and because +bX revisions don't matter
            version := strings.TrimPrefix(line, "Version: ")
            err = versionfmt.Valid(dpkg.ParserName, version)
            if err != nil {
                log.WithError(err).WithField("version", string(line[1])).Warning("could not parse package version. skipping")
            } else {
                pkg.Version = version
            }
        } else if line == "" {
            pkg.Feature.Name = ""
            pkg.Version = ""
        }

        // Add the package to the result array if we have all the informations
        if pkg.Feature.Name != "" && pkg.Version != "" {
            packagesMap[pkg.Feature.Name+"#"+pkg.Version] = pkg
            pkg.Feature.Name = ""
            pkg.Version = ""
        }
    }

    // Convert the map to a slice
    packages := make([]database.FeatureVersion, 0, len(packagesMap))
    for _, pkg := range packagesMap {
        packages = append(packages, pkg)
    }

    return packages, nil
}

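As the code shows, the dpkg plugin mainly parses the var/lib/dpkg/status file and extracts the name and version of each installed package, which is the information needed to decide whether a dpkg package is at a vulnerable version. An excerpt of a var/lib/dpkg/status file follows: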

Package: fdisk
Status: install ok installed
Priority: important
Section: utils
Installed-Size: 426
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: foreign
Source: util-linux
Version: 2.31.1-0.4ubuntu3
Replaces: util-linux (<< 2.30.1-0ubuntu4~)
Depends: libc6 (>= 2.14), libfdisk1 (>= 2.31.1), libmount1 (>= 2.24.2), libncursesw5 (>= 6), libsmartcols1 (>= 2.28~rc1), libtinfo5 (>= 6)
Breaks: util-linux (<< 2.30.1-0ubuntu4~)
Description: collection of partitioning utilities
 This package contains the classic fdisk, sfdisk and cfdisk partitioning
 utilities from the util-linux suite.
 The utilities included in this package allow you to partition
 your hard disk. The utilities supports both modern and legacy
 partition tables (eg. GPT, MBR, etc).
 The fdisk utility is the classical text-mode utility.
 The cfdisk utilitity gives a more userfriendly curses based interface.
 The sfdisk utility is mostly for automation and scripting uses.
Important: yes
Original-Maintainer: LaMont Jones <lamont@debian.org>

Package: libpam-runtime
Status: install ok installed
Priority: required
Section: admin
Installed-Size: 300
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: all
Multi-Arch: foreign
Source: pam
Version: 1.1.8-3.6ubuntu2
Replaces: libpam0g-dev, libpam0g-util
Depends: debconf (>= 0.5) | debconf-2.0, debconf (>= 1.5.19) | cdebconf, libpam-modules (>= 1.0.1-6)
Conflicts: libpam0g-util
Conffiles:
 /etc/pam.conf 87fc76f18e98ee7d3848f6b81b3391e5
 /etc/pam.d/other 31aa7f2181889ffb00b87df4126d1701
Description: Runtime support for the PAM library
 Contains configuration files and  directories required for
 authentication  to work on Debian systems.  This package is required
 on almost all installations.
Homepage: http://www.linux-pam.org/
Original-Maintainer: Steve Langasek <vorlon@debian.org>
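To see what the parsing logic yields for records like the two above, here is a simplified, self-contained sketch. It is not the plugin itself: parseStatus and sourceRe are names invented for the example, version validation through versionfmt is omitted, and the regular expression is an approximation of the one the plugin uses.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// sourceRe captures the source package name and an optional version in parentheses.
var sourceRe = regexp.MustCompile(`^Source: (\S+)(?: \((.*)\))?`)

// parseStatus returns sorted "name#version" keys. As in the plugin, a Source
// line overrides the package name, and a version given on the Source line
// takes precedence over the Version field.
func parseStatus(status string) []string {
	seen := map[string]bool{}
	name, version := "", ""
	scanner := bufio.NewScanner(strings.NewReader(status))
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case strings.HasPrefix(line, "Package: "):
			name, version = strings.TrimSpace(strings.TrimPrefix(line, "Package: ")), ""
		case strings.HasPrefix(line, "Source: "):
			if m := sourceRe.FindStringSubmatch(line); m != nil {
				name = m[1]
				if m[2] != "" {
					version = m[2]
				}
			}
		case strings.HasPrefix(line, "Version: ") && version == "":
			version = strings.TrimPrefix(line, "Version: ")
		case line == "":
			name, version = "", ""
		}
		// Record the package as soon as both name and version are known.
		if name != "" && version != "" {
			seen[name+"#"+version] = true
			name, version = "", ""
		}
	}
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	status := "Package: fdisk\nStatus: install ok installed\nSource: util-linux\nVersion: 2.31.1-0.4ubuntu3\n\n" +
		"Package: libpam-runtime\nStatus: install ok installed\nSource: pam\nVersion: 1.1.8-3.6ubuntu2\n"
	for _, k := range parseStatus(status) {
		fmt.Println(k)
	}
	// Prints:
	// pam#1.1.8-3.6ubuntu2
	// util-linux#2.31.1-0.4ubuntu3
}
```

Note that the resulting features are named after the source packages (util-linux, pam) rather than the binary packages (fdisk, libpam-runtime), because the Source line replaces the name taken from the Package line.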

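4. An approach to custom scanning requirements

As section 3 shows, Clair's vulnerability scanning roughly works by extracting the file contents of each layer of an image and then analyzing a few key files to decide whether vulnerabilities may be present.

To implement a custom scanning requirement, it is enough to write a featurefmt plugin that implements the scanning logic over each layer's files, following the interface defined by the Clair framework.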
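A plugin of this kind could look roughly like the sketch below. To stay runnable on its own it does not import Clair. The filesMap, featureVersion, lister, registerLister and listFeatures definitions are local stand-ins modeled on tarutil.FilesMap, database.FeatureVersion and the featurefmt package, and should be checked against the actual source. The manifestLister plugin and the opt/app/manifest.txt file it reads are entirely hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Local stand-ins for Clair's tarutil.FilesMap and database.FeatureVersion.
type filesMap map[string][]byte

type featureVersion struct {
	Name    string
	Version string
}

// lister is modeled on the featurefmt Lister interface (assumed shape).
type lister interface {
	ListFeatures(files filesMap) ([]featureVersion, error)
	RequiredFilenames() []string
}

var (
	listersM sync.RWMutex
	listers  = map[string]lister{}
)

// registerLister plays the role of the plugin registration function.
func registerLister(name string, l lister) {
	listersM.Lock()
	defer listersM.Unlock()
	listers[name] = l
}

// listFeatures runs every registered lister over the layer's files.
func listFeatures(files filesMap) ([]featureVersion, error) {
	listersM.RLock()
	defer listersM.RUnlock()
	var total []featureVersion
	for _, l := range listers {
		fs, err := l.ListFeatures(files)
		if err != nil {
			return nil, err
		}
		total = append(total, fs...)
	}
	return total, nil
}

// manifestLister is a hypothetical plugin: it reads "name==version" lines
// from a made-up manifest file, opt/app/manifest.txt.
type manifestLister struct{}

func (manifestLister) RequiredFilenames() []string {
	return []string{"opt/app/manifest.txt"}
}

func (manifestLister) ListFeatures(files filesMap) ([]featureVersion, error) {
	data, ok := files["opt/app/manifest.txt"]
	if !ok {
		return nil, nil // this layer does not contain the manifest
	}
	var out []featureVersion
	for _, line := range strings.Split(string(data), "\n") {
		parts := strings.SplitN(strings.TrimSpace(line), "==", 2)
		if len(parts) == 2 && parts[0] != "" && parts[1] != "" {
			out = append(out, featureVersion{Name: parts[0], Version: parts[1]})
		}
	}
	return out, nil
}

// Plugins register themselves at package initialization time.
func init() { registerLister("manifest", manifestLister{}) }

func main() {
	files := filesMap{"opt/app/manifest.txt": []byte("libfoo==1.2.3\nlibbar==0.9\n")}
	features, err := listFeatures(files)
	fmt.Println(features, err)
}
```

The plugin declares which files it needs and turns their contents into name and version pairs; the rest of the pipeline does not need to know how those pairs were found.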




Docker terms

  • Container: the execution of an image
  • Image: a set of tarballs that contain the filesystem contents and run-time metadata of a container
  • Layer: one of the tarballs used in the composition of an image, often expressed as a filesystem delta from another layer

Clair terms

  • Ancestry: the Clair-internal representation of an Image
  • Feature: anything that when present in a filesystem could be an indication of a vulnerability (e.g. the presence of a file or an installed software package)
  • Feature Namespace (featurens): a context around features and vulnerabilities (e.g. an operating system or a programming language)
  • Vulnerability Source (vulnsrc): the component of Clair that tracks upstream vulnerability data and imports them into Clair's database
  • Vulnerability Metadata Source (vulnmdsrc): the component of Clair that tracks upstream vulnerability metadata and associates them with vulnerabilities in Clair's database