thanos source code analysis -- the sidecar shipper ships data to minio

The sidecar is responsible for periodically shipping local Prometheus block data to remote storage. Its runtime parameters are as follows:

/bin/thanos sidecar     
    --prometheus.url=http://localhost:9090/
    --tsdb.path=/prometheus
    --grpc-address=[$(POD_IP)]:10901
    --http-address=[$(POD_IP)]:10902
    --objstore.config=$(OBJSTORE_CONFIG)

Where environment variables:

Environment:
    POD_IP:            (v1:status.podIP)
    OBJSTORE_CONFIG:  <set to the key 'thanos.yaml' in secret 'thanos-objstore-config'>  Optional: false

POD_IP is obtained via the Kubernetes downward API, and OBJSTORE_CONFIG is the thanos.yaml content stored in the secret. The s3 remote storage (minio) is configured as follows:

type: s3
config:
  bucket: thanos
  endpoint: minio.minio.svc.cluster.local:9000
  access_key: minio
  secret_key: minio
  insecure: true
  signature_version2: false

sidecar source code entry

First, find the sidecar entry point:

func main() {
    ......
    app := extkingpin.NewApp(kingpin.New(filepath.Base(os.Args[0]), "A block storage based long-term storage for Prometheus.").Version(version.Print("thanos")))
    registerSidecar(app)
    .......
    var g run.Group
    ......
    if err := g.Run(); err != nil {
        // Use %+v for github.com/pkg/errors error to print with stack.
        level.Error(logger).Log("err", fmt.Sprintf("%+v", errors.Wrapf(err, "%s command failed", cmd)))
        os.Exit(1)
    }
}

registerSidecar() registers the sidecar command:

// cmd/thanos/sidecar.go
func registerSidecar(app *extkingpin.App) {
    cmd := app.Command(component.Sidecar.String(), "Sidecar for Prometheus server.")
    conf := &sidecarConfig{}
    conf.registerFlag(cmd)
    cmd.Setup(func(g *run.Group, logger log.Logger, reg *prometheus.Registry, tracer opentracing.Tracer, _ <-chan struct{}, _ bool) error {
        ......
        return runSidecar(g, logger, reg, tracer, rl, component.Sidecar, *conf, grpcLogOpts, tagOpts)
    })
}

The startup process of shipper in sidecar is as follows:

  • When there is remote storage, uploads=true;
  • Run the shipper as a background goroutine;
  • While the shipper is running, it checks for new blocks every 30s; if there is a new block, it executes Sync() to ship it to the remote storage.

// cmd/thanos/sidecar.go
func runSidecar(
    g *run.Group,
    logger log.Logger,
    reg *prometheus.Registry,
    tracer opentracing.Tracer,
    reloader *reloader.Reloader,
    comp component.Component,
    conf sidecarConfig,
    grpcLogOpts []grpc_logging.Option,
    tagOpts []tags.Option,
) error {
    ......
    // uploads=true when there is a remote storage configuration
    var uploads = true
    if len(confContentYaml) == 0 {
        level.Info(logger).Log("msg", "no supported bucket was configured, uploads will be disabled")
        uploads = false
    }
    ......
    if uploads {
        // The background shipper continuously scans the data directory and uploads
        // new blocks to Google Cloud Storage or an S3-compatible storage service.
        bkt, err := client.NewBucket(logger, confContentYaml, reg, component.Sidecar.String())
        if err != nil {
            return err
        }

        ctx, cancel := context.WithCancel(context.Background())
        // The shipper is executed as a background goroutine
        g.Add(func() error {
            ......
            s := shipper.New(logger, reg, conf.tsdb.path, bkt, m.Labels, metadata.SidecarSource,
                conf.shipper.uploadCompacted, conf.shipper.allowOutOfOrderUpload, metadata.HashFunc(conf.shipper.hashFunc))
            .......
            //Check and execute every 30s
            return runutil.Repeat(30*time.Second, ctx.Done(), func() error {
                if uploaded, err := s.Sync(ctx); err != nil {
                    level.Warn(logger).Log("err", err, "uploaded", uploaded)
                }
                minTime, _, err := s.Timestamps()
                if err != nil {
                    level.Warn(logger).Log("msg", "reading timestamps failed", "err", err)
                    return nil
                }
                m.UpdateTimestamps(minTime, math.MaxInt64)
                return nil
            })
        }, func(error) {
            cancel()
        })
    }
    ......
    return nil
}

Here, run.Group is used to manage the concurrent tasks: when one task exits with an error, the other tasks are stopped as well.
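
To make that behavior concrete, below is a minimal standalone sketch of github.com/oklog/run (the run.Group used above). It is illustrative only, not Thanos code:

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/oklog/run"
)

func main() {
    var g run.Group
    ctx, cancel := context.WithCancel(context.Background())

    // Task 1: loops until its context is cancelled.
    g.Add(func() error {
        for {
            select {
            case <-ctx.Done():
                return nil
            case <-time.After(time.Second):
                fmt.Println("tick")
            }
        }
    }, func(error) {
        cancel() // interrupt function: cancel the context so the loop exits
    })

    // Task 2: fails after 3s; its error makes run.Group interrupt task 1 as well.
    g.Add(func() error {
        time.Sleep(3 * time.Second)
        return fmt.Errorf("task 2 failed")
    }, func(error) {})

    // Run blocks until the first task returns, then calls every interrupt
    // function, waits for the remaining tasks to finish, and returns the first error.
    fmt.Println(g.Run())
}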

shipper process

  • First, check which local blocks need to be uploaded;
  • Then protect each block dir to be uploaded with hard links;
  • Finally, upload the block dir to minio (via the minio API);

1. Check which blocks need to be uploaded

s := shipper.New(logger, reg, conf.tsdb.path, bkt, m.Labels, metadata.SidecarSource,
        conf.shipper.uploadCompacted, conf.shipper.allowOutOfOrderUpload, metadata.HashFunc(conf.shipper.hashFunc))
.......
//Check and execute every 30s
return runutil.Repeat(30*time.Second, ctx.Done(), func() error {
    if uploaded, err := s.Sync(ctx); err != nil {
        level.Warn(logger).Log("err", err, "uploaded", uploaded)
    }
    minTime, _, err := s.Timestamps()
    if err != nil {
        level.Warn(logger).Log("msg", "reading timestamps failed", "err", err)
        return nil
    }
    m.UpdateTimestamps(minTime, math.MaxInt64)
    return nil
})

Focus on s.Sync(ctx):

  • Uploaded blocks: read the metafile, which records the blocks already uploaded;
  • Current blocks: read the Prometheus data directory and list all current blocks;
  • Upload the blocks in the data directory that have not been shipped yet to remote storage through s.upload();
  • Finally, rewrite the metafile and record the newly uploaded blocks in it (metafile: thanos.shipper.json);

// pkg/shipper/shipper.go
// Sync performs a single synchronization, which ensures all non-compacted local blocks have been uploaded
// to the object bucket once.
//
// If uploaded.
//
// It is not concurrency-safe, however it is compactor-safe (running concurrently with compactor is ok).
func (s *Shipper) Sync(ctx context.Context) (uploaded int, err error) {
    meta, err := ReadMetaFile(s.dir)    // Read the shipper metafile (thanos.shipper.json) in the data directory
    
    // Build a map of blocks we already uploaded.
    hasUploaded := make(map[ulid.ULID]struct{}, len(meta.Uploaded))
    for _, id := range meta.Uploaded {
        hasUploaded[id] = struct{}{}
    }

    // Reset the uploaded list; it is rebuilt below with the blocks that still exist locally.
    meta.Uploaded = nil

    var uploadErrs int
    metas, err := s.blockMetasFromOldest()    // All current blocks in the data directory, oldest first
    for _, m := range metas {
        // Do not sync a block if we already uploaded or ignored it. If it's no longer found in the bucket,
        // it was generally removed by the compaction process.
        if _, uploaded := hasUploaded[m.ULID]; uploaded {    //Already uploaded
            meta.Uploaded = append(meta.Uploaded, m.ULID)
            continue
        }
        if err := s.upload(ctx, m); err != nil {    //upload
            uploadErrs++
            continue
        }
        meta.Uploaded = append(meta.Uploaded, m.ULID)
        uploaded++
    }    
    
    if err := WriteMetaFile(s.logger, s.dir, meta); err != nil {   //Write metafile: thanos.shipper.json
        level.Warn(s.logger).Log("msg", "updating meta file failed", "err", err)
    }    
    ......
    return uploaded, nil
}

The metafile is actually data/thanos.shipper.json, which records the uploaded blocks:

/prometheus $ cat thanos.shipper.json
{
        "version": 1,
        "uploaded": [
                "01FEYW9R0P134EWRCPQSQSCEZM",
                "01FEZ35F8Q1WBHSDCGBJGN52YN",
                "01FEZA16GMX4E1VZRQKMEJ7B5R",
                "01FEZGWXRT31P1M8BG5SMFARAJ"
        ]
}
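
For reference, the file can be read with a struct that simply mirrors this JSON. This is a sketch; the struct and field names below are assumptions for illustration, not the exact Thanos types:

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// shipperMeta mirrors the JSON above.
type shipperMeta struct {
    Version  int      `json:"version"`
    Uploaded []string `json:"uploaded"` // ULIDs of blocks already shipped
}

func main() {
    b, err := os.ReadFile("/prometheus/thanos.shipper.json")
    if err != nil {
        panic(err)
    }
    var m shipperMeta
    if err := json.Unmarshal(b, &m); err != nil {
        panic(err)
    }
    fmt.Println("already uploaded blocks:", m.Uploaded)
}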

2. Protect the block dir to be uploaded by hardlink

The block dir to be uploaded is hard-linked, and the hard-linked files are temporarily placed under the thanos folder, so that other operations cannot modify the dir while it is being uploaded;

/prometheus $ ls
01FEZGWXRT31P1M8BG5SMFARAJ  thanos
01FEZ35F8Q1WBHSDCGBJGN52YN  chunks_head                 thanos.shipper.json
01FEZA16GMX4E1VZRQKMEJ7B5R  queries.active              wal

Implementation code:

// pkg/shipper/shipper.go
// upload uploads the block if it does not already exist in remote storage.
func (s *Shipper) upload(ctx context.Context, meta *metadata.Meta) error {
    level.Info(s.logger).Log("msg", "upload new block", "id", meta.ULID)

    // We hard-link the files into a temporary upload directory so we are not affected
    // by other operations happening against the TSDB directory.
    updir := filepath.Join(s.dir, "thanos", "upload", meta.ULID.String())    //Temporary directory

    // Remove updir just in case.
    if err := os.RemoveAll(updir); err != nil {
        return errors.Wrap(err, "clean upload directory")
    }
    if err := os.MkdirAll(updir, 0750); err != nil {
        return errors.Wrap(err, "create upload dir")
    }
    .....
    dir := filepath.Join(s.dir, meta.ULID.String())
    if err := hardlinkBlock(dir, updir); err != nil {
        return errors.Wrap(err, "hard link block")
    }
    ......
    return block.Upload(ctx, s.logger, s.bucket, updir, s.hashFunc)
}

Because a Linux hard link cannot be created for a directory, a new directory is created and each file under the block directory is hard-linked one by one.
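
A quick illustration of why the code below links file by file; the paths here are made up for demonstration only:

package main

import (
    "fmt"
    "os"
)

func main() {
    // Hard-linking a regular file works (assuming /tmp/src.txt exists).
    if err := os.Link("/tmp/src.txt", "/tmp/dst.txt"); err != nil {
        fmt.Println("file link failed:", err)
    }

    // Hard-linking a directory is rejected by the kernel,
    // which is why hardlinkBlock() links the files one by one instead.
    if err := os.Link("/tmp/somedir", "/tmp/somedir-link"); err != nil {
        fmt.Println("dir link failed:", err) // e.g. "operation not permitted"
    }
}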

Each block contains the following files:

/prometheus/01FEWYG6RK8JE9MY45XBJ0893G $ ls -alh
total 3M
drwxr-sr-x    3 1000     2000          68 Sep  6 07:00 .
drwxrwsrwx   18 root     2000        4.0K Sep  7 08:19 ..
drwxr-sr-x    2 1000     2000          20 Sep  6 07:00 chunks
-rw-r--r--    1 1000     2000        2.5M Sep  6 07:00 index
-rw-r--r--    1 1000     2000         280 Sep  6 07:00 meta.json
-rw-r--r--    1 1000     2000           9 Sep  6 07:00 tombstones
/prometheus/01FEWYG6RK8JE9MY45XBJ0893G $
/prometheus/01FEWYG6RK8JE9MY45XBJ0893G $ ls chunks/
000001

hardlinkBlock() traverses each file in the block directory and hard-links it into the upload directory:

// pkg/shipper/shipper.go
func hardlinkBlock(src, dst string) error {
    //chunks directory
    chunkDir := filepath.Join(dst, block.ChunksDirname)
    if err := os.MkdirAll(chunkDir, 0750); err != nil {
        return errors.Wrap(err, "create chunks dir")
    }
    fis, err := ioutil.ReadDir(filepath.Join(src, block.ChunksDirname))
    if err != nil {
        return errors.Wrap(err, "read chunk dir")
    }
    files := make([]string, 0, len(fis))
    //Traverse the chunks directory
    for _, fi := range fis {
        files = append(files, fi.Name())
    }
    for i, fn := range files {
        files[i] = filepath.Join(block.ChunksDirname, fn)
    }
    // meta.json file, index file
    files = append(files, block.MetaFilename, block.IndexFilename)
    // hardlink all files under dir
    for _, fn := range files {
        if err := os.Link(filepath.Join(src, fn), filepath.Join(dst, fn)); err != nil {
            return errors.Wrapf(err, "hard link file %s", fn)
        }
    }
    return nil
}

3. Upload to remote storage

// pkg/block/block.go
// Upload uploads a TSDB block to the object storage. It verifies basic
// features of Thanos block.
func Upload(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir string, hf metadata.HashFunc) error {
    return upload(ctx, logger, bkt, bdir, hf, true)
}

Upload will upload each file in the block directory separately:

// pkg/block/block.go
func upload(ctx context.Context, logger log.Logger, bkt objstore.Bucket, bdir string, hf metadata.HashFunc, checkExternalLabels bool) error {
    ......
    // Upload chunks directory
    if err := objstore.UploadDir(ctx, logger, bkt, path.Join(bdir, ChunksDirname), path.Join(id.String(), ChunksDirname)); err != nil {
        return cleanUp(logger, bkt, id, errors.Wrap(err, "upload chunks"))
    }
    // Upload index file
    if err := objstore.UploadFile(ctx, logger, bkt, path.Join(bdir, IndexFilename), path.Join(id.String(), IndexFilename)); err != nil {
        return cleanUp(logger, bkt, id, errors.Wrap(err, "upload index"))
    }
    // Upload meta.json file
    if err := bkt.Upload(ctx, path.Join(id.String(), MetaFilename), strings.NewReader(metaEncoded.String())); err != nil {
        return errors.Wrap(err, "upload meta file")
    }
    ......
}
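
After these three steps, the layout of one block inside the bucket looks roughly like this:

thanos                                   <- bucket
└── 01FEZGWXRT31P1M8BG5SMFARAJ
    ├── chunks
    │   └── 000001
    ├── index
    └── meta.json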

The directory-upload function UploadDir() walks the files in the directory and uploads them one by one:

// pkg/objstore/objstore.go
func UploadDir(ctx context.Context, logger log.Logger, bkt Bucket, srcdir, dstdir string) error {
    .......
    return filepath.Walk(srcdir, func(src string, fi os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        if fi.IsDir() {
            return nil
        }
        dst := filepath.Join(dstdir, strings.TrimPrefix(src, srcdir))
        return UploadFile(ctx, logger, bkt, src, dst)
    })
}
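
UploadFile() itself (not shown above) essentially opens the local file and streams it to bkt.Upload() under the destination object name. A rough sketch of that idea, assuming the objstore.Bucket interface from the Thanos codebase; the real helper also adds logging and cleanup on failure:

package example

import (
    "context"
    "os"

    "github.com/thanos-io/thanos/pkg/objstore"
)

// uploadFile opens the local file and streams it into the bucket under dst.
func uploadFile(ctx context.Context, bkt objstore.Bucket, src, dst string) error {
    f, err := os.Open(src)
    if err != nil {
        return err
    }
    defer f.Close()
    return bkt.Upload(ctx, dst, f)
}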

When uploading, different interfaces are used depending on the object storage backend; for minio, the S3 implementation is used, which calls the client provided by minio to upload:

// pkg/objstore/s3/s3.go
// Upload the contents of the reader as an object into the bucket.
func (b *Bucket) Upload(ctx context.Context, name string, r io.Reader) error {
    sse, err := b.getServerSideEncryption(ctx)
    ...
    size, err := objstore.TryToGetSize(r)
    partSize := b.partSize
    _, err = b.client.PutObject(    // minio client API
        ctx,
        b.name,
        name,
        r,
        size,
        minio.PutObjectOptions{
            PartSize:             partSize,
            ServerSideEncryption: sse,
            UserMetadata:         b.putUserMetadata,
        },
    )
    ...
}
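
For comparison, the same PutObject call can be made directly with the minio-go v7 client, using the minio endpoint and keys from the objstore config at the top. This is a standalone sketch; the object name is made up:

package main

import (
    "context"
    "strings"

    "github.com/minio/minio-go/v7"
    "github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
    // Matches the objstore config shown earlier: plain-HTTP minio with static keys.
    client, err := minio.New("minio.minio.svc.cluster.local:9000", &minio.Options{
        Creds:  credentials.NewStaticV4("minio", "minio", ""),
        Secure: false,
    })
    if err != nil {
        panic(err)
    }

    body := strings.NewReader("hello")
    // PutObject streams the reader into the "thanos" bucket under the given object name.
    if _, err := client.PutObject(context.Background(), "thanos", "demo/hello.txt",
        body, int64(body.Len()), minio.PutObjectOptions{}); err != nil {
        panic(err)
    }
}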

references

1. linux hard-link: https://linuxhandbook.com/har...
