# Go microservice framework kratos learning note 6 (kratos service discovery: discovery)



In addition to the warden direct connection covered last time, kratos has another service discovery SDK: discovery

discovery can be understood as an http service

Its simplest discovery process may be as follows:

1. Service registers appid with discovery service
2. client queries addr of service from discovery through appid

Of course, there is much more to it than that: it also includes many features, such as service self-discovery, load balancing, and so on

This section only looks at the simplest demo of service discovery

First go through the discovery http api

http api

// innerRouter init local router api path.
func innerRouter(e *bm.Engine) {
    group := e.Group("/discovery")
    group.POST("/register", register)
    group.POST("/renew", renew)
    group.POST("/cancel", cancel)
    group.GET("/fetch/all", initProtect, fetchAll)
    group.GET("/fetch", initProtect, fetch)
    group.GET("/fetchs", initProtect, fetchs)
    group.GET("/poll", initProtect, poll)
    group.GET("/polls", initProtect, polls)
    group.POST("/set", set)
    group.GET("/nodes", initProtect, nodes)
}

The bm engine in discovery registers these endpoints; I then test them with postman.

register: service registration

fetch: fetch an instance

fetchs: fetch instances in batch

polls: poll instances in batch (long polling)

nodes: fetch the discovery node list in batch

renew: heartbeat

POST http://HOST/discovery/renew

curl '' -d "zone=sh1&env=test&appid=provider&hostname=myhostname"


cancel: offline (deregister)

POST http://HOST/discovery/cancel

curl '' -d "zone=sh1&env=test&appid=provider&hostname=myhostname"


Application discovery logic

Implementation logic of official application discovery

1. Select an available node and add the appid to the poll's appid list
2. If the poll request returns err, switch to another node; the switching logic is the same as for a self-discovery error
3. If the poll returns -304, the appid has not changed, and the poll is re-issued to keep listening for changes
4. The poll interface returns the instances list of the appid, completing service discovery; different load balancing algorithms can then be chosen for node scheduling as required

Service registration

Service registration demo

Directly create a new instance and register the demo service with discovery

Add the following registration code to the service registration part of the main function.

    ip := ""
    port := "9000"
    hn, _ := os.Hostname()
    dis := discovery.New(nil)
    ins := &naming.Instance{
        Zone:     env.Zone,
        Env:      env.DeployEnv,
        AppID:    "demo.service",
        Hostname: hn,
        Addrs: []string{
            "grpc://" + ip + ":" + port,
        },
    }
    cancel, err := dis.Register(context.Background(), ins)
    if err != nil {
        panic(err)
    }
    defer cancel()

Running this panics because it cannot find a node. This is the node address of our discovery service, which can be supplied through an environment variable.

I:\VSProject\kratos-note\kratos-note\warden\discovery\server>kratos run
INFO 01/04-19:32:28.198 I:/VSProject/go/pkg/mod/github.com/bilibili/kratos@v0.3.2-0.20191224125553-6e1180f53a8e/pkg/net/rpc/warden/server.go:329 warden: start grpc listen addr: [::]:9000
panic: invalid discovery config nodes:[] region:region01 zone:zone01 deployEnv:dev host:DESKTOP-NUEKD5O

Successfully registered after configuring the discovery node

I:\VSProject\kratos-note\kratos-note\warden\discovery\server>set DISCOVERY_NODES=

I:\VSProject\kratos-note\kratos-note\warden\discovery\server>kratos run
INFO 01/04-19:40:25.426 I:/VSProject/kratos-note/kratos-note/warden/discovery/server/cmd/main.go:23 abc start
2020/01/04 19:40:25 start watch filepath: I:\VSProject\kratos-note\kratos-note\warden\discovery\server\configs
INFO 01/04-19:40:25.497 I:/VSProject/go/pkg/mod/github.com/bilibili/kratos@v0.3.2-0.20191224125553-6e1180f53a8e/pkg/net/http/blademaster/server.go:98 blademaster: start http listen addr:
[warden] config is Deprecated, argument will be ignored. please use -grpc flag or GRPC env to configure warden server.
INFO 01/04-19:40:25.500 I:/VSProject/go/pkg/mod/github.com/bilibili/kratos@v0.3.2-0.20191224125553-6e1180f53a8e/pkg/net/rpc/warden/server.go:329 warden: start grpc listen addr: [::]:9000
INFO 01/04-19:40:25.501 I:/VSProject/go/pkg/mod/github.com/bilibili/kratos@v0.3.2-0.20191224125553-6e1180f53a8e/pkg/naming/discovery/discovery.go:248 disocvery: AddWatch(infra.discovery) already watch(false)
INFO 01/04-19:40:25.514 I:/VSProject/go/pkg/mod/github.com/bilibili/kratos@v0.3.2-0.20191224125553-6e1180f53a8e/pkg/naming/discovery/discovery.go:631 discovery: successfully polls( instances ({"infra.discovery":{"instances":{"sh001":[{"region":"sh","zone":"sh001","env":"dev","appid":"infra.discovery","hostname":"test1","addrs":[""],"version":"","latest_timestamp":1578122538945305700,"metadata":null,"status":1}]},"latest_timestamp":1578122538945305700,"scheduler":null}})
INFO 01/04-19:40:25.527 I:/VSProject/go/pkg/mod/github.com/bilibili/kratos@v0.3.2-0.20191224125553-6e1180f53a8e/pkg/naming/discovery/discovery.go:414 discovery: register client.Get( env(dev) appid(demo.service) addrs([grpc://]) success

Service registration logic

Now let's follow the log.

As the log below shows, the service registration logic is register -> renew -> cancel: register once, then continuously renew the heartbeat, and finally cancel to go offline.

Intercept a local service registration log

The operation is as follows:

1. Start the discovery service
2. Start demo.server, which registers the demo.service appid
3. Wait a short while for a heartbeat, then shut down demo.server

Then you can see that the process of the whole log is roughly as follows:

1. 0: start the discovery service
2. 2/3/4: service initialization
3. 5: long polling (self-discovery) of the infra.discovery service
4. 6/7: new connection & service registration; this is where the demo.server we started gets registered
5. 9: long polling (self-discovery) of the infra.discovery service
6. 10: renew heartbeat
7. 12: finally, I killed the registered service and a cancel request was issued

The logs match the expected logic fairly closely; next, look at service discovery.

0: discovery -conf discovery-example.toml -log.v=0
2/3: INFO 01/10-10:31:19.575 C:/server/src/go/src/discovery/discovery/syncup.go:160 discovery changed nodes:[] zones:map[]
4: INFO 01/10-10:31:19.575 C:/server/src/go/pkg/mod/github.com/bilibili/kratos@v0.1.0/pkg/net/http/blademaster/server.go:98 blademaster: start http listen addr:
INFO 01/10-10:31:19.575 C:/server/src/go/src/discovery/registry/registry.go:219 Polls from(test1) new connection(1)

5:INFO 01/10-10:31:31.796 http-access-log ts=0 method=GET ip= traceid= user=no_user params=appid=infra.discovery&env=dev&hostname=DESKTOP-9NFHKD0&latest_timestamp=0 msg=0 stack=<nil> err= timeout_quota=39.98 path=/discovery/polls ret=0

6:INFO 01/10-10:31:31.798 C:/server/src/go/src/discovery/registry/registry.go:219 Polls from(DESKTOP-9NFHKD0) new connection(1)

7:INFO 01/10-10:31:31.799 http-access-log method=POST user=no_user path=/discovery/register err= ts=0 params=addrs=grpc%3A%2F%2F127.0.0.1%3A9000&appid=demo.service&env=dev&hostname=DESKTOP-9NFHKD0&metadata=&region=region01&status=1&version=&zone=zone01 stack=<nil> ret=0 timeout_quota=39.98 ip= msg=0 traceid=

8:INFO 01/10-10:32:01.799 C:/server/src/go/src/discovery/registry/registry.go:370 DelConns from(DESKTOP-9NFHKD0) delete(1)

9:ERROR 01/10-10:32:01.799 http-access-log method=GET ip= err=-304 timeout_quota=39.98 user=no_user path=/discovery/polls params=appid=infra.discovery&env=dev&hostname=DESKTOP-9NFHKD0&latest_timestamp=1578623479566211700 ret=-304 msg=-304 stack=-304 ts=30.0011342 traceid=

10:INFO 01/10-10:32:01.799 http-access-log msg=0 err= timeout_quota=39.98 method=POST ip= user=no_user ret=0 path=/discovery/renew traceid= params=appid=demo.service&env=dev&hostname=DESKTOP-9NFHKD0&region=region01&zone=zone01 stack=<nil> ts=0

11:INFO 01/10-10:32:01.800 C:/server/src/go/src/discovery/registry/registry.go:219 Polls from(DESKTOP-9NFHKD0) new connection(1)

12:INFO 01/10-10:32:08.499 http-access-log timeout_quota=39.98 path=/discovery/cancel ret=0 stack=<nil> ip= msg=0 traceid= ts=0 method=POST user=no_user err= params=appid=demo.service&env=dev&hostname=DESKTOP-9NFHKD0&region=region01&zone=zone01

Service discovery

Also configure the discovery node: set DISCOVERY_NODES=

Change NewClient() to the following

package dao

import (
    "context"

    "github.com/bilibili/kratos/pkg/naming/discovery"
    "github.com/bilibili/kratos/pkg/net/rpc/warden"
    "github.com/bilibili/kratos/pkg/net/rpc/warden/resolver"

    "google.golang.org/grpc"
)

// AppID your appid, ensure unique.
const AppID = "demo.service" // NOTE: example

func init() {
    // NOTE: enabling this code means using discovery for service discovery
    // NOTE: also note that resolver.Register is globally valid, so it is recommended to execute this during process initialization
    // NOTE: !!! remember not to Register multiple resolvers in one process !!!
    // NOTE: when starting the application, the discovery nodes can be specified through the flag (-discovery.nodes) or the environment variable (DISCOVERY_NODES)
    resolver.Register(discovery.Builder())
}

// NewClient new member grpc client
func NewClient(cfg *warden.ClientConfig, opts ...grpc.DialOption) (DemoClient, error) {
    client := warden.NewClient(cfg, opts...)
    conn, err := client.Dial(context.Background(), "discovery://default/"+AppID)
    if err != nil {
        return nil, err
    }
    // Note to replace here:
    // the NewDemoClient method is generated under the "api" directory,
    // corresponding to the service name defined in the proto file; replace it with the correct method name
    return NewDemoClient(conn), nil
}

At the same time, the client is embedded in the dao structure, and a SayHello test call is made in the same way as with the warden direct connection last time.

// dao dao.
type dao struct {
    db         *sql.DB
    redis      *redis.Redis
    mc         *memcache.Memcache
    demoClient demoapi.DemoClient
    cache      *fanout.Fanout
    demoExpire int32
}

// New new a dao and return.
func New(r *redis.Redis, mc *memcache.Memcache, db *sql.DB) (d Dao, err error) {
    var cfg struct {
        DemoExpire xtime.Duration
    }
    if err = paladin.Get("application.toml").UnmarshalTOML(&cfg); err != nil {
        return
    }
    grpccfg := &warden.ClientConfig{
        Dial:              xtime.Duration(time.Second * 10),
        Timeout:           xtime.Duration(time.Millisecond * 250),
        Subset:            50,
        KeepAliveInterval: xtime.Duration(time.Second * 60),
        KeepAliveTimeout:  xtime.Duration(time.Second * 20),
    }
    var grpcClient demoapi.DemoClient
    grpcClient, err = NewClient(grpccfg)
    if err != nil {
        return
    }
    d = &dao{
        db:         db,
        redis:      r,
        mc:         mc,
        demoClient: grpcClient,
        cache:      fanout.New("cache"),
        demoExpire: int32(time.Duration(cfg.DemoExpire) / time.Second),
    }
    return
}

Test call

Operation process

1. Start the discovery service
2. Start demo.server, which registers the demo.service appid
3. Start demo.client
4. Finally, call from demo.client's SayHello HTTP interface through to demo.server's gRPC SayHello interface


context deadline exceeded

I found that sometimes when doing service discovery, the client would fail to start, reporting context deadline exceeded.

Because I added the new client to the dao, demo.client panics directly when the dial times out.

According to the client log, you can find
warden client: dial discovery://default/demo.service?subset=50 error context deadline exceeded!panic: context deadline exceeded

The call to discovery's polls timed out. The gRPC dial timeout I configured is 10s, while the official discovery documentation says that during node self-discovery, if the server instances have not changed, the interface blocks for 30s before returning -304 (the polls interface is a long-polling interface).

As for service self-discovery, I won't go into detail here; this section only focuses on application discovery logic. If you are interested, go read the discovery source.

INFO 01/10-15:22:34.436 http-access-log method=GET path=/discovery/polls user=no_user params=appid=infra.discovery&env=dev&hostname=CLII&latest_timestamp=0 stack=<nil> err= timeout_quota=39.98 ts=0 msg=0 traceid= ip= ret=0
INFO 01/10-15:22:34.438 C:/server/src/go/src/discovery/registry/registry.go:222 Polls from(CLII) new connection(1)
INFO 01/10-15:22:34.440 C:/server/src/go/src/discovery/registry/registry.go:228 Polls from(CLII) reuse connection(2)
INFO 01/10-15:22:44.219 C:/server/src/go/src/discovery/registry/registry.go:373 DelConns from(DESKTOP-9NFHKD0) delete(1)
ERROR 01/10-15:22:44.219 http-access-log path=/discovery/polls ret=-304 msg=-304 timeout_quota=39.98 ip= params=appid=infra.discovery&env=dev&hostname=DESKTOP-9NFHKD0&latest_timestamp=1578637331623587200 user=no_user ts=39.9808023 err=-304 traceid= method=GET stack=-304
INFO 01/10-15:22:44.221 C:/server/src/go/src/discovery/registry/registry.go:222 Polls from(DESKTOP-9NFHKD0) new connection(1)
INFO 01/10-15:22:44.525 http-access-log ts=0 method=POST ip= user=no_user stack=<nil> path=/discovery/renew err= traceid= ret=0 msg=0 timeout_quota=39.98 params=appid=demo.service&env=dev&hostname=DESKTOP-9NFHKD0&region=region01&zone=zone01
INFO 01/10-15:23:04.438 C:/server/src/go/src/discovery/registry/registry.go:370 DelConns from(CLII) count decr(2)
ERROR 01/10-15:23:04.438 http-access-log msg=-304 ts=30.0002154 method=GET err=-304 stack=-304 timeout_quota=39.98 ip= user=no_user path=/discovery/polls params=appid=infra.discovery&env=dev&hostname=CLII&latest_timestamp=1578637331623587200 ret=-304 traceid=
INFO 01/10-15:23:04.440 C:/server/src/go/src/discovery/registry/registry.go:373 DelConns from(CLII) delete(1)
ERROR 01/10-15:23:04.440 http-access-log ts=30.0013758 traceid= user=no_user path=/discovery/polls ret=-304 err=-304 method=GET ip= params=appid=infra.discovery&appid=demo.service&env=dev&hostname=CLII&latest_timestamp=1578637331623587200&latest_timestamp=0 msg=-304 stack=-304 timeout_quota=39.98

Combining the discovery logs:

1. 15:22:34: the client sends the dial
2. around 15:22:45: the client panics
3. 15:23:04: discovery finally replies with -304 (no instance change)

This is actually because client.Dial() wraps the official gRPC service discovery; in the end it is the official gRPC service discovery logic, wrapped by kratos warden, that runs.

Let's take a look at this layer's logic. It is quite convoluted and I did not fully understand it either; I can only go through it briefly and will fill in the details if I get the chance to work with it again.

Take a look at the official grpc service discovery logic

// NewClient new grpc client
func NewClient(cfg *warden.ClientConfig, opts ...grpc.DialOption) (demoapi.DemoClient, error) {
    client := warden.NewClient(cfg, opts...)
    cc, err := client.Dial(context.Background(), fmt.Sprintf("discovery://default/%s", AppID))
    if err != nil {
        return nil, err
    }
    return demoapi.NewDemoClient(cc), nil
}

In fact, client.Dial() goes through a process:

client.Dial() -> DialContext() -> parseTarget() in grpc parses the scheme and obtains the corresponding resolver Builder (discovery in this case)

    if cc.dopts.resolverBuilder == nil {
        // Only try to parse target when resolver builder is not already set.
        cc.parsedTarget = parseTarget(cc.target)
        grpclog.Infof("parsed scheme: %q", cc.parsedTarget.Scheme)
        cc.dopts.resolverBuilder = resolver.Get(cc.parsedTarget.Scheme)
        if cc.dopts.resolverBuilder == nil {
            // If resolver builder is still nil, the parsed target's scheme is
            // not registered. Fallback to default resolver and set Endpoint to
            // the original target.
            grpclog.Infof("scheme %q not registered, fallback to default scheme", cc.parsedTarget.Scheme)
            cc.parsedTarget = resolver.Target{
                Scheme:   resolver.GetDefaultScheme(),
                Endpoint: target,
            }
            cc.dopts.resolverBuilder = resolver.Get(cc.parsedTarget.Scheme)
        }
    } else {
        cc.parsedTarget = resolver.Target{Endpoint: target}
    }

If DialContext() succeeds, you get the ClientConn structure -> the ccResolverWrapper is initialized -> Build() is called

    defer ccr.resolverMu.Unlock()

    ccr.resolver, err = rb.Build(cc.parsedTarget, ccr, rbo)
// ClientConn represents a virtual connection to a conceptual endpoint, to
// perform RPCs.
// A ClientConn is free to have zero or more actual connections to the endpoint
// based on configuration, load, etc. It is also free to determine which actual
// endpoints to use and may change it every RPC, permitting client-side load
// balancing.
// A ClientConn encapsulates a range of functionality including name
// resolution, TCP connection establishment (with retries and backoff) and TLS
// handshakes. It also handles errors on established connections by
// re-resolving the name and reconnecting.
type ClientConn struct {
    ctx    context.Context
    cancel context.CancelFunc

    target       string
    parsedTarget resolver.Target
    authority    string
    dopts        dialOptions
    csMgr        *connectivityStateManager

    balancerBuildOpts balancer.BuildOptions
    blockingpicker    *pickerWrapper

    mu              sync.RWMutex
    resolverWrapper *ccResolverWrapper
    sc              *ServiceConfig
    conns           map[*addrConn]struct{}
    // Keepalive parameter can be updated if a GoAway is received.
    mkp             keepalive.ClientParameters
    curBalancerName string
    balancerWrapper *ccBalancerWrapper
    retryThrottler  atomic.Value

    firstResolveEvent *grpcsync.Event

    channelzID int64 // channelz unique identification number
    czData     *channelzData
}

The user's Builder implementation can push updates through ClientConn.UpdateState -> the ClientConn's updateResolverState -> address initialization and the rest of the official gRPC logic

// Builder creates a resolver that will be used to watch name resolution updates.
type Builder interface {
    // Build creates a new resolver for the given target.
    // gRPC dial calls Build synchronously, and fails if the returned error is
    // not nil.
    Build(target Target, cc ClientConn, opts BuildOptions) (Resolver, error)
    // Scheme returns the scheme supported by this resolver.
    // Scheme is defined at https://github.com/grpc/grpc/blob/master/doc/naming.md.
    Scheme() string
}

// ClientConn contains the callbacks for resolver to notify any updates
// to the gRPC ClientConn.
// This interface is to be implemented by gRPC. Users should not need a
// brand new implementation of this interface. For the situations like
// testing, the new implementation should embed this interface. This allows
// gRPC to add new methods to this interface.
type ClientConn interface {
    // UpdateState updates the state of the ClientConn appropriately.
    UpdateState(State)
    // ReportError notifies the ClientConn that the Resolver encountered an
    // error.  The ClientConn will notify the load balancer and begin calling
    // ResolveNow on the Resolver with exponential backoff.
    ReportError(error)
    // NewAddress is called by resolver to notify ClientConn a new list
    // of resolved addresses.
    // The address list should be the complete list of resolved addresses.
    // Deprecated: Use UpdateState instead.
    NewAddress(addresses []Address)
    // NewServiceConfig is called by resolver to notify ClientConn a new
    // service config. The service config should be provided as a json string.
    // Deprecated: Use UpdateState instead.
    NewServiceConfig(serviceConfig string)
    // ParseServiceConfig parses the provided service config and returns an
    // object that provides the parsed config.
    ParseServiceConfig(serviceConfigJSON string) *serviceconfig.ParseResult
}

kratos discovery

Warden wraps gRPC's entire service discovery implementation logic; the code is located in pkg/naming/naming.go and warden/resolver/resolver.go respectively

naming.go defines the Instance structure for describing business instances, the Registry interface for service registration, and the Resolver interface for service discovery.

// Resolver resolve naming service
type Resolver interface {
    Fetch(context.Context) (*InstancesInfo, bool)
    Watch() <-chan struct{}
    Close() error
}

// Registry Register an instance and renew automatically.
type Registry interface {
    Register(ctx context.Context, ins *Instance) (cancel context.CancelFunc, err error)
    Close() error
}

// InstancesInfo instance info.
type InstancesInfo struct {
    Instances map[string][]*Instance `json:"instances"`
    LastTs    int64                  `json:"latest_timestamp"`
    Scheduler *Scheduler             `json:"scheduler"`
}

The resolver.go implements the official resolver.Builder and resolver.Resolver interfaces of gRPC, and also exposes the naming.Builder and naming.Resolver interfaces in naming.go

// Resolver resolve naming service
type Resolver interface {
    Fetch(context.Context) (*InstancesInfo, bool)
    Watch() <-chan struct{}
    Close() error
}

// Builder resolver builder.
type Builder interface {
    Build(id string) Resolver
    Scheme() string
}
kratos wraps gRPC's Build so that only the appid of the target service needs to be passed in: after the gRPC call, warden/resolver/resolver.go looks up the corresponding naming.Builder implementation by its Scheme method and calls Build, passing in the id. The naming.Resolver implementation can then query instance information (the Fetch interface) from the service discovery middleware (here, the discovery service) by appid. In addition to the simple Fetch operation, the Watch method monitors node changes in the service discovery middleware so that service instance information is updated in real time.

The service registration and discovery logic that uses discovery as middleware is implemented in naming/discovery. Overall, you can see that it issues poll requests against the discovery service middleware.

// Build disovery resovler builder.
func (d *Discovery) Build(appid string, opts ...naming.BuildOpt) naming.Resolver {
    r := &Resolve{
        id:    appid,
        d:     d,
        event: make(chan struct{}, 1),
        opt:   new(naming.BuildOptions),
    }
    for _, opt := range opts {
        opt.Apply(r.opt)
    }
    app, ok := d.apps[appid]
    if !ok {
        app = &appInfo{
            resolver: make(map[*Resolve]struct{}),
        }
        d.apps[appid] = app
        cancel := d.cancelPolls
        if cancel != nil {
            cancel()
        }
    }
    app.resolver[r] = struct{}{}
    if ok {
        select {
        case r.event <- struct{}{}:
        default:
        }
    }
    log.Info("disocvery: AddWatch(%s) already watch(%v)", appid, ok)
    d.once.Do(func() {
        go d.serverproc()
    })
    return r
}

func (d *Discovery) serverproc() {
    var (
        retry  int
        ctx    context.Context
        cancel context.CancelFunc
    )
    ticker := time.NewTicker(time.Minute * 30)
    defer ticker.Stop()
    for {
        if ctx == nil {
            ctx, cancel = context.WithCancel(d.ctx)
            d.cancelPolls = cancel
        }
        select {
        case <-d.ctx.Done():
            return
        case <-ticker.C:
            // periodically re-pick a discovery node (elided)
        default:
        }
        apps, err := d.polls(ctx)
        if err != nil {
            if ctx.Err() == context.Canceled {
                ctx = nil
                continue
            }
            time.Sleep(time.Second)
            retry++
            continue
        }
        retry = 0
        // broadcast the new instances to watchers (elided)
    }
}

Tags: Go github Redis JSON curl

Posted on Mon, 13 Jan 2020 06:46:50 -0500 by hacksurfin