mit_6.824_2021_lab2A_leader_election

This is the first in a series of summary articles I am writing after finishing lab2.

If lab1's MapReduce was the warm-up for the Distributed Systems course, then lab2 is where the course really begins.

The lab2 series implements the Raft distributed consensus protocol. The extended Raft paper has to be read over and over, especially Figure 2 and the implementation details in Section 5.

Raft decomposes distributed consensus into subproblems, which map onto the lab2 series:

  • Leader election (lab2A)
  • Log replication (lab2B)
  • Safety (lab2B and 2C); besides persistence, 2C also covers handling of the Figure 8 log scenario

These are Raft's core features, but production use calls for further work:

  • Log compaction via snapshots (lab2D)
  • Cluster membership changes

lab2A: leader election

Lab Task

Implement Raft leader election and heartbeats (AppendEntries RPCs with no log entries). The goal of Part 2A is for a single leader to be elected, for that leader to remain the leader if there are no failures, and for a new leader to take over if the old leader fails or if packets to and from the old leader are lost. Run go test -run 2A -race to test your 2A code.

Lab Hints

The following hints are taken directly from a translation of the 6.824 lab2 Raft handout:

  • Tests must be run with -race, that is, go test -run 2A -race.

  • Follow Figure 2 of the paper. Implement the RequestVote RPC, the rules related to elections, and the state related to leader election.

  • Add the leader election state from Figure 2 to the Raft struct in raft.go. You also need to define a struct to hold information about each log entry (here I define logEntry to represent a log entry).

  • Fill in the RequestVoteArgs and RequestVoteReply structs. Modify Make() to create a background goroutine that periodically starts a leader election by sending out RequestVote RPCs when it has not heard from another peer for a while. This way a peer learns who the leader is, or becomes the leader itself when it cannot reach one. Implement the RequestVote() RPC handler so that servers can vote for candidates.

  • To implement heartbeats, define an AppendEntries RPC struct (lab2A does not use log entries yet) and have the leader send it periodically. Write an AppendEntries RPC handler that resets the election timeout, so that other servers do not step forward as leaders once one has already been elected.

  • Make sure that the election timeouts of different peers do not always fire at the same time; otherwise all peers will vote only for themselves and no one will become leader.

  • The tester requires that the leader send heartbeat RPCs no more than ten times per second (that is, at most one heartbeat every 100 ms).

  • The tester requires your Raft to elect a new leader within five seconds of the old leader's failure (if a majority of peers can still communicate). Remember, however, that leader election may take multiple rounds if a split vote occurs (which can happen if packets are lost or if candidates unluckily choose the same random backoff time). You must pick election timeouts (and thus heartbeat intervals) short enough that an election is very likely to complete in less than five seconds even if it takes multiple rounds.

    In other words, a single leader must be elected within five seconds or the test fails.

  • Section 5.2 of the paper mentions election timeouts in the range of 150 to 300 milliseconds. Such a range only makes sense if the leader sends heartbeats considerably more often than once every 150 milliseconds. Because the tester limits you to 10 heartbeats per second, you will have to use an election timeout larger than the paper's 150 to 300 milliseconds, but not too large, because then you may fail to elect a leader within five seconds.

  • You may find Go's rand package useful.

  • You need to write code that takes action periodically or after a delay. The easiest way is to create a goroutine with a loop that calls time.Sleep() (see the ticker() goroutine that Make() creates for this purpose). Do not use Go's time.Timer or time.Ticker, which are difficult to use correctly.

  • The course guidance page has some tips on how to develop and debug your code.

  • If your code fails the tests, read Figure 2 of the paper again; the full logic of leader election is spread across multiple parts of the figure.

  • Don't forget to implement GetState().

  • The tester calls your Raft's rf.Kill() when it is permanently shutting down an instance. You can use rf.killed() to check whether Kill() has been called. You may want to do this in all loops, to avoid having dead Raft instances print confusing messages.

  • Go RPC sends only struct fields whose names start with capital letters. Sub-structures must also have capitalized field names (for example, the fields of log records in an array). The labgob package warns you about this; do not ignore the warnings.
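The last hint can be checked outside the lab. The lab's labgob package wraps Go's encoding/gob, and gob silently skips unexported fields. The sketch below round-trips two structs through gob; RequestVoteArgs and RequestVoteReply follow the field list in Figure 2, while lowercaseArgs and roundTrip are hypothetical names used only for this illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// RequestVoteArgs mirrors the RequestVote arguments listed in Figure 2.
type RequestVoteArgs struct {
	Term         int
	CandidateId  int
	LastLogIndex int
	LastLogTerm  int
}

// RequestVoteReply mirrors the RequestVote results listed in Figure 2.
type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}

// lowercaseArgs is a deliberately broken struct (hypothetical, for
// illustration): candidateId is unexported, so gob does not transmit it.
type lowercaseArgs struct {
	Term        int
	candidateId int
}

// roundTrip encodes in with gob and decodes the bytes into out,
// which must be a pointer to the same struct type.
func roundTrip(in, out interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(out)
}

func main() {
	var good RequestVoteArgs
	if err := roundTrip(RequestVoteArgs{Term: 3, CandidateId: 7}, &good); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", good)

	var bad lowercaseArgs
	if err := roundTrip(lowercaseArgs{Term: 3, candidateId: 7}, &bad); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", bad) // candidateId arrives as the zero value
}
```

The receiver sees no error in the second case, only a zero value, which is why the labgob warning is worth taking seriously.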

Ideas for implementation

After digesting the hints and reading Section 5.2 of the Raft paper, the following points stand out for the implementation:

  1. There are only three roles in Raft: leader, candidate and follower. The transitions between them are clearly described in the paper.

  2. Figure 2 lists the state each server has to keep and the methods that have to be implemented.

  3. The leader periodically broadcasts AppendEntries RPCs, and a candidate periodically broadcasts RequestVote RPCs.

  4. The RPC handlers to implement are AppendEntries and RequestVote. A follower only receives RPCs passively and never initiates one (a leader or candidate can also receive RPCs from other peers, for example after a network disruption).

  5. The heartbeat timeout needs a ticker goroutine, started in Make(), that checks periodically. A timer is not recommended here; use time.Sleep() instead. I use sleep for almost every periodic task.

  6. Some of the periodic sleep(timeout) calls use a random timeout, namely the heartbeat timeout and the election timeout.

    The leader's broadcast interval is not random and must be frequent enough; I use 100 ms. Both the heartbeat timeout and the election timeout are drawn from [250, 400] ms.

  7. GetState() has to be implemented because the tests call it.

  8. Field names in RPC structs must start with an uppercase letter; otherwise Go does not export them.

  9. Sprinkle DPrint calls through the code and print leaderId and rf.me often; finding bugs basically depends on them QAQ

  10. Be sure to read the TA's guide several times and get the TA's go-test-many.sh script. A single full pass can be a fluke; only a large batch of passing runs shows that the code is correct.
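The random timeout in point 6 can be sketched as follows. This is a minimal version assuming the [250, 400] ms range mentioned above; the constant and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Assumed bounds, taken from the [250, 400] ms range used in this article.
const (
	electionTimeoutMin = 250 * time.Millisecond
	electionTimeoutMax = 400 * time.Millisecond
)

// randomElectionTimeout returns a duration chosen uniformly from
// [electionTimeoutMin, electionTimeoutMax], both ends included.
func randomElectionTimeout() time.Duration {
	span := int64(electionTimeoutMax - electionTimeoutMin)
	return electionTimeoutMin + time.Duration(rand.Int63n(span+1))
}

func main() {
	// Each peer draws its own value, so timeouts rarely coincide.
	for i := 0; i < 3; i++ {
		fmt.Println(randomElectionTimeout())
	}
}
```

Drawing a fresh value on every sleep, rather than once per server, keeps two peers that collided in one round from colliding again in the next.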

About Code Organization

I don't recommend cramming all the code into raft.go. From lab2A to lab2D my file layout kept changing, because the AppendEntries RPC implementation alone is nearly 100 lines (including logging and comments) even after encapsulation and reuse.

One option is to split the files by role; opinions differ.

About Figure 2

Many materials, including the students' guide, say that Figure 2 of the paper must be consulted repeatedly and implemented strictly.

That is true, but in practice implementing it strictly is not enough, because Figure 2 gives limited information and leaves a lot open to interpretation.

First, it does not spell out how a leader or candidate should handle the reply after sending an RPC; that is buried in Section 5 of the paper. The RPC boxes in Figure 2 describe only the receiver's side, that is, the follower's implementation.

Some rules also apply to all servers, such as:

If commitIndex > lastApplied: increment lastApplied, apply log[lastApplied] to state machine (§5.3)
If RPC request or response contains term T > currentTerm: set currentTerm = T, convert to follower (§5.1)

Should these two all-server rules go in the RPC handler or in the reply handling?

The candidate rules also contain:

If AppendEntries RPC received from new leader: convert to follower

Where in AppendEntries should this go?

And so on. Figure 2 is best understood as an outline; we have to fill it in with our own understanding of Section 5 of the paper, without deviating from it.
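One possible answer for the term rule is to put it in a single helper and call it both at the top of every RPC handler and at the top of every reply handler. The sketch below is a standalone illustration, not the lab's Raft struct: node, role and observeTerm are hypothetical names, and using -1 for "voted for nobody" is an assumption.

```go
package main

import "fmt"

type role int

const (
	follower role = iota
	candidate
	leader
)

// node holds only the fields the term rule touches. In the lab these
// live on the Raft struct and are protected by rf.mu.
type node struct {
	currentTerm int
	votedFor    int // -1 means "voted for nobody" (an assumption)
	state       role
}

// observeTerm applies the all-servers rule from Figure 2: if term is
// newer than currentTerm, adopt it and convert to follower. It reports
// whether the node stepped down.
func (n *node) observeTerm(term int) bool {
	if term <= n.currentTerm {
		return false
	}
	n.currentTerm = term
	n.votedFor = -1 // a new term means no vote has been cast yet
	n.state = follower
	return true
}

func main() {
	n := &node{currentTerm: 2, votedFor: 1, state: leader}
	fmt.Println(n.observeTerm(2), n.state == leader)
	fmt.Println(n.observeTerm(5), n.currentTerm, n.state == follower)
}
```

Because the helper takes only a term, the same call works for args.Term in a handler and reply.Term in a sender.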

Implementation Details

  • All of the State in Figure 2 goes into the code; there is nothing more to say about it.

  • Each of the two RPCs gets an args struct and a reply struct, with capitalized field names.

  • For me the headaches in lab2A were the periodic checks, the state transitions, and the details of each RPC.

  • Before sending args and before handling a reply, every RPC path must check whether the server's state has changed. If it has, the request is not sent, or the reply is discarded without processing. In short: before acting, check whether your own state has changed or the RPC is stale.

    A bug of this kind kept me stuck in lab2C for a long time, tears QAQ

  • For other common mistakes, see other materials; here I only cover the ones that gave me headaches.
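The staleness check from the list above can be reduced to one comparison: a reply is only meaningful if the server is still in the role and term in which the request was sent. This is a minimal standalone sketch; node, role and replyIsStale are hypothetical names.

```go
package main

import "fmt"

type role int

const (
	follower role = iota
	candidate
	leader
)

// node holds the two fields needed for the check; in the lab they are
// fields of the Raft struct and must be read with rf.mu held.
type node struct {
	currentTerm int
	state       role
}

// replyIsStale reports whether a reply should be discarded. sentTerm is
// the term carried in the request args, and sentAs is the role the
// sender had when it built the request.
func (n *node) replyIsStale(sentTerm int, sentAs role) bool {
	return n.currentTerm != sentTerm || n.state != sentAs
}

func main() {
	n := &node{currentTerm: 4, state: candidate}
	fmt.Println(n.replyIsStale(4, candidate)) // still the same election
	n.currentTerm = 5                         // a new election started meanwhile
	fmt.Println(n.replyIsStale(4, candidate))
}
```

Comparing against the term stored in args, rather than a term read again after the RPC returns, is what catches replies from an earlier election.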

General encapsulation

The all-server rules and the implementation details just mentioned can be handled uniformly, which leads to a general shape for RPC handlers and for reply handling:

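// RpcHandler stands for AppendEntries or RequestVote. It is a template
// (pseudocode); no method with this name is actually implemented.
func (rf *Raft) RpcHandler(args *Args, reply *Reply) {
	rf.mu.Lock()
	defer func() {
		reply.Term = rf.currentTerm
		rf.mu.Unlock()
	}()

	// ...

	// If the request's term T > currentTerm: set currentTerm = T, convert to follower
	if args.Term > rf.currentTerm {
		rf.convertToFollower(args.Term) // helper not shown
	}

	// ...
}

func (rf *Raft) RpcBroadcast() {
	for i := range rf.peers {
		// Stop broadcasting as soon as the role has changed (checked under rf.mu)
		if rf.roleChanged() {
			return
		}

		go rf.RpcSender(i)
	}
}

func (rf *Raft) RpcSender(peer int) {
	// ... send the RPC and wait for reply, then take rf.mu ...

	// Discard the reply if it is stale or the role has changed
	if rf.replyExpired(args) || rf.roleChanged() {
		return
	}

	// If the reply's term T > currentTerm: set currentTerm = T, convert to follower, and return
	if reply.Term > rf.currentTerm {
		rf.convertToFollower(reply.Term)
		return
	}

	// ...
}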

Periodic check

The heartbeat timeout can be tracked with a time.Time that records when the last heartbeat arrived.

Here are the durations I use: 100 ms for the leader, and [250, 400] ms for followers and candidates.
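Tracking the heartbeat with a time.Time might look like the sketch below. heartbeatClock and its methods are hypothetical names, and the current time is passed in as a parameter only so the logic can be exercised without sleeping; in the lab the timestamp would simply be a field on the Raft struct guarded by rf.mu.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// heartbeatTimeout is an assumed value inside the [250, 400] ms range.
const heartbeatTimeout = 250 * time.Millisecond

// heartbeatClock remembers when the last heartbeat (or granted vote) was seen.
type heartbeatClock struct {
	mu   sync.Mutex
	last time.Time
}

// reset records that a heartbeat arrived at time now.
func (c *heartbeatClock) reset(now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.last = now
}

// timedOut reports whether at least timeout has passed since the last reset.
func (c *heartbeatClock) timedOut(now time.Time, timeout time.Duration) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return now.Sub(c.last) >= timeout
}

func main() {
	c := &heartbeatClock{}
	start := time.Now()
	c.reset(start)
	// Half a timeout later the follower is still fine.
	fmt.Println(c.timedOut(start.Add(heartbeatTimeout/2), heartbeatTimeout))
	// A full timeout with no heartbeat means an election should start.
	fmt.Println(c.timedOut(start.Add(heartbeatTimeout), heartbeatTimeout))
}
```

The ticker then only has to sleep for a random duration and ask timedOut afterwards; the AppendEntries and RequestVote handlers call reset.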

The ticker pseudocode is as follows

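// Helper names such as randomHeartbeatTimeout and heartbeatTimedOut stand
// for code that is not shown here.
func (rf *Raft) ticker() {
	for !rf.killed() {
		rf.mu.Lock()
		switch rf.state {
		case FOLLOWER:
			rf.mu.Unlock()
			time.Sleep(randomHeartbeatTimeout()) // random duration in [250, 400] ms
			rf.mu.Lock()
			if rf.heartbeatTimedOut() { // no heartbeat arrived while sleeping
				rf.convertToCandidate()
				rf.mu.Unlock()
				// The election starts right after becoming a candidate and runs asynchronously
				go rf.startElection()
			} else {
				rf.mu.Unlock()
			}
		case LEADER:
			rf.mu.Unlock()
			// Broadcast a heartbeat immediately after becoming leader, then at a fixed interval
			go rf.broadcastAppendEntries()
			time.Sleep(leaderHeartbeatInterval) // fixed, 100 ms
		default:
			rf.mu.Unlock()
			// Candidate: startElection drives the election. The short sleep is an
			// addition (assumed value) so this loop does not spin at full speed.
			time.Sleep(10 * time.Millisecond)
		}
	}
}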

As you can see, locks have to be used carefully: because this is an infinite loop, the unlock cannot be deferred, so take care not to deadlock.

The pseudocode for the leader's periodic heartbeat broadcast is as follows:

func (rf *Raft) broadcastAppendEntries() {
	// One round of broadcasting to the followers
	for i := range rf.peers {
		if i == rf.me {
			continue
		}
		// In abstract terms: build args -> send RPC -> process reply
		rf.mu.Lock()
		if rf.stateChanged() { // helper not shown: no longer leader, or the term moved on
			// If the old leader became a follower after request 1 was sent,
			// the rest of the loop must not send request 2
			rf.mu.Unlock()
			return
		}
		// nextIndex > snapshotIndex
		// Send AE
		args := rf.buildAppendEntriesArgs(i) // helper not shown
		go rf.sendAppendEntriesHandler(i, args)
		rf.mu.Unlock()
	}
}

The pseudocode for how the leader handles the RPC reply is as follows:

func (rf *Raft) sendAppendEntriesHandler(peerIndex int, args *AppendEntriesArgs) {
	reply := &AppendEntriesReply{}
	ok := rf.sendAppendEntries(peerIndex, args, reply)
	if ok {
		rf.mu.Lock()
		defer rf.mu.Unlock()
		isRetry := rf.handlerAppendEntriesReply(peerIndex, args, reply)
		// If AppendEntries fails because of log inconsistency: decrement nextIndex and retry (§5.3)
		if isRetry {
			// retry here (needed from lab2B on)
		}
	}
}


// handlerAppendEntriesReply handles the reply to an AppendEntries RPC and
// reports whether the request should be retried.
func (rf *Raft) handlerAppendEntriesReply(peerIndex int, args *AppendEntriesArgs, reply *AppendEntriesReply) bool {
	// Discard stale replies: the state changed or the request has expired
	if rf.stateChanged() || rf.replyExpired(args) { // helpers not shown
		return false
	}
	// If RPC request or response contains term T > currentTerm: set currentTerm = T, convert to follower (§5.1)
	if reply.Term > rf.currentTerm {
		rf.convertToFollower(reply.Term)
		return false
	}

	// The reply has to be checked whether or not the request was a heartbeat (lab2B)
	if reply.Success {
		// ...
	}
	return false
}

State Machine Transition

The follower-to-candidate transition is already handled in ticker, and it happens only there.

The candidate-to-leader transition happens after the candidate has run an election and received votes from more than half of the peers.

The startElection pseudocode is as follows:

func (rf *Raft) startElection() {
	for !rf.killed() {
		rf.mu.Lock()
		// On conversion to candidate, start election:
		// increment currentTerm (terms increase monotonically and never repeat)
		rf.currentTerm++
		// vote for self
		rf.votedFor = rf.me
		rf.mu.Unlock()
		// ---- outside the critical section ----
		// In the meantime this server may win the election and become leader, may receive
		// a heartbeat from another leader and become follower, or the election may time out
		// reset election timer
		// send RequestVote RPCs to all other servers
		go rf.broadcastVote() // the broadcast has to run concurrently
		time.Sleep(randomElectionTimeout()) // helper not shown
		// ---- back to the critical section ----
		// Once the election has timed out, the Raft state must be checked again
		rf.mu.Lock()
		if rf.state == CANDIDATE {
			// The election timed out: start a new election immediately
			rf.mu.Unlock()
		} else {
			// No longer a candidate, so this loop can exit
			rf.mu.Unlock()
			return
		}
	}
}

While broadcasting the vote requests, the candidate handles each reply and decides from the number of votes received whether it has become leader.

The pseudocode for broadcasting the vote requests is as follows:

func (rf *Raft) broadcastVote() {
	voteGranted := 1 // the candidate's own vote
	// The broadcast is processed concurrently; the election ends in one of three ways:
	//  - become leader on receiving votes from a majority of servers
	//  - become follower on receiving an AppendEntries RPC from another leader
	//  - start a new election if the election timeout elapses again
	for i := range rf.peers {
		// No need to send a request to myself
		if i == rf.me {
			continue
		}
		rf.mu.Lock()
		if rf.state != CANDIDATE {
			rf.mu.Unlock()
			return
		}
		args := rf.buildRequestVoteArgs() // helper not shown
		rf.mu.Unlock()
		// Send concurrently; the reply is handled in the same goroutine
		go func(peerIndex int, args *RequestVoteArgs) {
			reply := &RequestVoteReply{}
			ok := rf.sendRequestVote(peerIndex, args, reply)
			if ok {
				rf.mu.Lock()
				defer rf.mu.Unlock()
				// For all servers:
				//   If RPC request or response contains term T > currentTerm:
				//   set currentTerm = T, convert to follower (§5.1)
				if reply.Term > rf.currentTerm {
					rf.convertToFollower(reply.Term)
					return
				}
				// Discard replies that arrive after the state changed or the request expired
				if rf.state != CANDIDATE || rf.replyExpired(args) {
					return
				}

				// This part cannot be moved into a separate handler, because voteGranted is captured by the closure
				// got the vote
				if reply.VoteGranted {
					voteGranted++
					if rf.isMajority(voteGranted) {
						rf.convertToLeader()
						return
					}
				}
			}
		}(i, args)
	}
}

A closure is used here: voteGranted counts the votes won in this round. The variable escapes to the heap and does not need to live on the Raft struct, because the votes won in this round have nothing to do with the previous round.
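The closure-based counting can be tried in isolation. The sketch below is a simplified stand-in, not the lab code: runElection and ask are hypothetical names, ask replaces the RequestVote RPC, and the function waits for every reply only so that its result is deterministic (a real candidate would act as soon as a majority is reached).

```go
package main

import (
	"fmt"
	"sync"
)

// runElection asks every peer except me for a vote concurrently and reports
// whether a majority was reached, counting the candidate's own vote.
func runElection(peers, me int, ask func(peer int) bool) bool {
	var mu sync.Mutex
	var wg sync.WaitGroup
	votes := 1 // vote for self; captured by the closures below
	won := false
	for i := 0; i < peers; i++ {
		if i == me {
			continue
		}
		wg.Add(1)
		go func(peer int) {
			defer wg.Done()
			granted := ask(peer) // no lock is held while "sending the RPC"
			mu.Lock()
			defer mu.Unlock()
			if granted {
				votes++
				if votes > peers/2 {
					won = true
				}
			}
		}(i)
	}
	wg.Wait()
	return won
}

func main() {
	// Five peers, candidate 0; peers 1 and 2 grant their vote.
	fmt.Println(runElection(5, 0, func(peer int) bool { return peer <= 2 }))
	// Only peer 1 grants: two votes out of five is not a majority.
	fmt.Println(runElection(5, 0, func(peer int) bool { return peer == 1 }))
}
```

Each call gets a fresh votes variable, which is the property the paragraph above relies on: counts from one round can never leak into the next.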

There are only two things peers need to do when they receive a RequestVote request from candidate, as shown in Figure 2:

  1. Reply false if term < currentTerm (§5.1)
  2. If votedFor is null or candidateId, and candidate's log is at
    least as up-to-date as receiver's log, grant vote (§5.2, §5.4)

What "up-to-date" means is explained in a paragraph of Section 5 of the paper (§5.4.1), along with other details. The pseudocode is as follows:

// RequestVote
// example RequestVote RPC handler.
//
func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
	// Your code here (2A, 2B).
	rf.mu.Lock()
	defer func() {
		reply.Term = rf.currentTerm
		rf.mu.Unlock()
	}()
	// 1. Reply false if term < currentTerm (§5.1)
	if args.Term < rf.currentTerm {
		reply.VoteGranted = false
		return
	}
	// For all servers: If RPC request or response contains term T > currentTerm: set currentTerm = T, convert to follower (§5.1)
	if args.Term > rf.currentTerm {
		rf.convertToFollower(args.Term) // helper not shown; also sets currentTerm
		rf.clearVote()                  // helper not shown; votedFor = nobody
	}
	// 2. If votedFor is null or candidateId, and candidate's log is at
	// least as up-to-date as receiver's log, grant vote (§5.2, §5.4)
	if rf.canVoteFor(args) && rf.isUpToDate(args) { // helpers not shown
		// Even if the vote was just cleared above, it still has to be cast for this candidate
		// Reset the heartbeat timeout after voting
		rf.resetHeartbeatTimeout()
		rf.voteFor(args.CandidateId)
		reply.VoteGranted = true
	} else {
		reply.VoteGranted = false
	}
}
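The up-to-date comparison in rule 2 follows §5.4.1 of the paper: the log whose last entry has the later term wins, and if the last terms are equal, the longer log wins. A minimal standalone sketch, with a hypothetical function name and plain integers in place of the lab's structs:

```go
package main

import "fmt"

// candidateUpToDate reports whether the candidate's log is at least as
// up-to-date as the receiver's, comparing the term and index of the last
// entry in each log (§5.4.1 of the Raft paper).
func candidateUpToDate(candLastTerm, candLastIndex, myLastTerm, myLastIndex int) bool {
	if candLastTerm != myLastTerm {
		return candLastTerm > myLastTerm
	}
	return candLastIndex >= myLastIndex
}

func main() {
	fmt.Println(candidateUpToDate(3, 5, 2, 9)) // later last term beats a longer log
	fmt.Println(candidateUpToDate(2, 4, 2, 6)) // same term, shorter log loses
	fmt.Println(candidateUpToDate(2, 6, 2, 6)) // identical logs: vote may be granted
}
```

In lab2A the logs are empty, so the check always passes; it starts to matter from lab2B on.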

Handling the leader's AppendEntries request is also simple in 2A, because lab2A involves no log entries:

  1. Reply false if term < currentTerm (§5.1)

func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
	rf.mu.Lock()
	defer func() {
		reply.Term = rf.currentTerm
		rf.mu.Unlock()
	}()
	// Reply false if term < currentTerm (§5.1)
	if args.Term < rf.currentTerm {
		reply.Success = false
		return
	}

	// 2A
	// If RPC request or response contains term T > currentTerm: set currentTerm = T, convert to follower (§5.1)
	if args.Term > rf.currentTerm {
		rf.clearVote() // helper not shown; votedFor = nobody in the new term
	}
	// If AppendEntries RPC received from new leader: convert to follower
	rf.convertToFollowerAndResetHeartbeat(args.Term) // helper not shown

	// ... lab2B, 2C, 2D

	// Finally, report success
	reply.Success = true
}

Here is a small idea of my own: once the term has been checked, the server can be set to follower unconditionally, because Figure 2 says:

If RPC request or response contains term T > currentTerm: set currentTerm = T, convert to follower (§5.1)

If AppendEntries RPC received from new leader: convert to follower

Combining the two, converting to follower unconditionally is fine. A follower stays a follower, a candidate becomes a follower, and a leader that sees an AppendEntries whose term is not lower than its own also becomes a follower, which at least guarantees a single leader in lab2A.
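The reasoning above can be exercised with a small standalone model of the 2A heartbeat handler. It is a sketch under assumptions, not the lab code: node, role and handleHeartbeat are hypothetical names, -1 stands for "voted for nobody", and a counter stands in for resetting the heartbeat timeout.

```go
package main

import "fmt"

type role int

const (
	follower role = iota
	candidate
	leader
)

type node struct {
	currentTerm int
	votedFor    int // -1 means "voted for nobody" (an assumption)
	state       role
	resets      int // stands in for resetting the heartbeat timeout
}

// handleHeartbeat models the 2A part of AppendEntries: reject older terms,
// otherwise adopt the term and become follower no matter the current role.
func (n *node) handleHeartbeat(leaderTerm int) (replyTerm int, success bool) {
	if leaderTerm < n.currentTerm {
		return n.currentTerm, false
	}
	if leaderTerm > n.currentTerm {
		n.currentTerm = leaderTerm
		n.votedFor = -1
	}
	n.state = follower
	n.resets++
	return n.currentTerm, true
}

func main() {
	c := &node{currentTerm: 3, votedFor: 0, state: candidate}
	fmt.Println(c.handleHeartbeat(3)) // same term: the candidate steps down
	old := &node{currentTerm: 3, votedFor: 1, state: leader}
	fmt.Println(old.handleHeartbeat(4)) // newer leader: the old leader steps down
	fmt.Println(old.handleHeartbeat(2)) // stale leader: rejected
}
```

Note that the vote is cleared only when the term actually increases; a candidate stepping down within the same term keeps its votedFor, so it cannot vote twice in that term.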

Results

Reflections

Coming back to this after finishing the labs, I have a lot of feelings. When I first touched lab2A, it felt just like when I first touched lab1. Going from 0 to 1 is always hard, but the key is daring to write the first line of code, even if it is only a comment, just to get a feel for the code.

mit 6.824 🐂🍺

Tags: Go Distribution raft

Posted on Sun, 10 Oct 2021 13:09:55 -0400 by amycrystal123