# Unexpectedly, I wrote 10,000 words about rate limiting!

As a common stability measure in microservices, rate limiting comes up often in interviews. I like to ask candidates: which rate limiting algorithms do you know? Have you read any source code? How are they implemented?

The first part of this article covers the rate limiting algorithms themselves; the second walks through how they are implemented in real source code.

# Rate limiting algorithms

Rate limiting algorithms fall into four categories: fixed window counter, sliding window counter, leaky bucket, and token bucket.

## Fixed window

The fixed window is, compared with the other rate limiting algorithms, the simplest one.

It simply counts the requests within a fixed time window; once the count exceeds the threshold, further requests are discarded.

For example, the yellow area in the figure below is a fixed time window, with a default range of 60s and a limit of 100 requests.

As shown by the bracket in the figure, there is no traffic for a while, then 100 requests arrive in the last 30 seconds of the window. Since the threshold is not exceeded, they all pass. Then, within the first 20 seconds of the next window, another 100 requests pass.

So effectively, 200 requests passed within the roughly 40 seconds covered by the bracket, far exceeding our threshold. This is the boundary problem of fixed window rate limiting.
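The fixed window scheme above can be sketched in a few lines. This is my own toy illustration (the class name `FixedWindowLimiter` is made up), not code from any library:

```java
// Toy fixed-window counter: count requests per window, reject above the limit.
class FixedWindowLimiter {
    private final int limit;      // max requests allowed per window, e.g. 100
    private final long windowMs;  // window length in milliseconds, e.g. 60_000
    private long windowStart;     // start time of the current window
    private int count;            // requests counted in the current window

    FixedWindowLimiter(int limit, long windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.windowStart = System.currentTimeMillis();
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMs) {
            // A new window begins: reset the counter abruptly
            windowStart = now;
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        // Threshold exceeded: discard the request
        return false;
    }
}
```

The abrupt counter reset at each window edge is exactly what causes the boundary problem described above.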

## Sliding window

To fix this problem, the sliding window algorithm was introduced. As the name suggests, the time window keeps moving forward as time passes.

The sliding window divides the fixed time window into N small windows and counts each small window separately; the sum of the requests across all small windows must not exceed the configured threshold.

Take the figure below as an example. Suppose our window is divided into three small windows of 20s each. Continuing the example above, when 100 requests arrive in the third 20s slot, they can still pass.
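The small-window bookkeeping just described can be sketched like this. It is a toy version of my own (time is passed in explicitly so it can be tested), not Sentinel's real implementation, which we'll read later:

```java
// Toy sliding-window counter: N small windows, reject when their sum hits the limit.
class SlidingWindowLimiter {
    private final int limit;      // threshold over the whole sliding window
    private final long bucketMs;  // length of each small window
    private final long[] starts;  // start time of each small window
    private final int[] counts;   // request count in each small window

    SlidingWindowLimiter(int limit, int buckets, long bucketMs) {
        this.limit = limit;
        this.bucketMs = bucketMs;
        this.starts = new long[buckets];
        this.counts = new int[buckets];
    }

    synchronized boolean tryAcquire(long nowMs) {
        int idx = (int) ((nowMs / bucketMs) % counts.length);
        long bucketStart = nowMs - nowMs % bucketMs;
        if (starts[idx] != bucketStart) {
            // This slot belongs to an expired round of the circle: reset it
            starts[idx] = bucketStart;
            counts[idx] = 0;
        }
        // Sum every small window that is still inside the big window
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            if (nowMs - starts[i] < bucketMs * counts.length) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false; // the whole window is already at the threshold
        }
        counts[idx]++;
        return true;
    }
}
```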

Then the window slides forward, and another 100 requests arrive in the next 20s. Now the total within the 60s sliding window would exceed 100, so the new requests are rejected.

## Leaky bucket

The leaky bucket algorithm is, as the name implies, a bucket with a leak. No matter how many requests pour in, traffic flows out of the outlet at a fixed rate, and whatever exceeds the bucket's capacity is discarded.

In other words, the inflow rate is unpredictable, but the outflow rate is constant.

This is similar to MQ's peak shaving and valley filling: facing a sudden surge of traffic, the leaky bucket queues requests and releases them at a constant rate.

The leaky bucket's advantage is its uniform rate, which is both a strength and a weakness. Many people say the leaky bucket cannot handle traffic bursts, which is not quite accurate.

The leaky bucket is precisely meant to absorb intermittent bursts: traffic spikes faster than the system can process it, so the excess is worked off during idle time, preventing the burst from crashing the system and protecting its stability.

Looked at another way, though, such a burst may put no real pressure on the system at all, yet requests still drain slowly and evenly through the queue, which wastes system capacity.

For that scenario, the token bucket has the advantage over the leaky bucket.
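A minimal leaky bucket can be sketched as follows. Again, this is a toy of my own (time injected as a parameter for testability), not a production implementation:

```java
// Toy leaky bucket: water leaks out at a constant rate; overflow is discarded.
class LeakyBucket {
    private final long capacity;     // bucket size
    private final double leakPerMs;  // constant outflow rate, requests per ms
    private double water;            // amount currently queued in the bucket
    private long lastLeakMs;         // last time we let the bucket leak

    LeakyBucket(long capacity, double leakPerMs, long nowMs) {
        this.capacity = capacity;
        this.leakPerMs = leakPerMs;
        this.lastLeakMs = nowMs;
    }

    synchronized boolean tryAcquire(long nowMs) {
        // First leak out what the elapsed time allows...
        water = Math.max(0, water - (nowMs - lastLeakMs) * leakPerMs);
        lastLeakMs = nowMs;
        // ...then admit the request only if it still fits in the bucket
        if (water + 1 <= capacity) {
            water += 1;
            return true;
        }
        return false;
    }
}
```

A capacity of 2 with `leakPerMs = 0.001` means at most two requests queue at once and one drains per second, no matter how bursty the inflow is.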

## Token bucket

In the token bucket algorithm, the system drops tokens into a bucket at a certain rate. When a request arrives, it tries to take a token from the bucket: if it gets one, it proceeds normally; otherwise it is rejected.

Current token bucket implementations, such as Guava's and Sentinel's, also support a cold start / warm-up mode. To avoid knocking over a cold system when traffic surges, the token bucket runs at a reduced rate for an initial period, then dynamically raises the allowed rate as traffic grows, until requests finally reach the system's configured threshold.
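A bare-bones token bucket (without the warm-up logic; Sentinel's warm-up version appears in the source walkthrough below) might look like this. It is my own sketch, with time injected for testability:

```java
// Toy token bucket: refill lazily by elapsed time, spend one token per request.
class TokenBucket {
    private final long capacity;       // max tokens the bucket can hold
    private final double tokensPerMs;  // token generation rate
    private double tokens;             // tokens currently in the bucket
    private long lastRefillMs;         // last refill timestamp

    TokenBucket(long capacity, double tokensPerMs, long nowMs) {
        this.capacity = capacity;
        this.tokensPerMs = tokensPerMs;
        this.tokens = capacity;        // start full, so an initial burst can pass
        this.lastRefillMs = nowMs;
    }

    synchronized boolean tryAcquire(long nowMs) {
        // Refill according to elapsed time, capped at the bucket capacity
        tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * tokensPerMs);
        lastRefillMs = nowMs;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

Unlike the leaky bucket, a full token bucket lets a burst of up to `capacity` requests pass immediately, which is exactly the advantage described above.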

# Source code example

We'll take Sentinel as the example. Sentinel uses the sliding window algorithm for its statistics, and it also implements both the leaky bucket and token bucket algorithms.

## Sliding window

Sentinel uses the sliding window for statistics, but its implementation differs a little from the figure I drew above; Sentinel's sliding window is better described as a circle.

During startup, nodes are created and the slots are chained together in a chain-of-responsibility pattern. StatisticSlot collects data through the sliding window, while FlowSlot holds the actual rate limiting logic, along with some degradation and system protection measures. Together they form Sentinel's whole rate limiting pipeline; the official diagram shows it well.

The sliding window implementation lives mainly in LeapArray, which defines the time window parameters with these defaults.

In Sentinel, windows actually exist at two levels: second and minute. The second level has 2 windows of 500ms each; the minute level has 60 windows of 1s each, covering a total period of 60s. Here we use the minute-level statistics as the example.

```java
public abstract class LeapArray<T> {
    // Window length in milliseconds, default 1000ms
    protected int windowLengthInMs;
    // Number of windows, default 60
    protected int sampleCount;
    // Total interval in milliseconds, default 60 * 1000
    protected int intervalInMs;
    // Total interval in seconds, default 60
    private double intervalInSecond;
    // The time window array
    protected final AtomicReferenceArray<WindowWrap<T>> array;
    // ...
}
```

Next, let's see how it locates the current window. The logic is actually spelled out clearly in the source, but it is hard to follow if, as before, you picture the windows extending along a straight line.

Calculating the array index and the window start time is relatively simple; the difficulty lies in the third branch. What does it mean for the new window start to be greater than the old one's? Let's go through the cases in detail.

1. The slot in the array is empty, meaning time has moved past initialization and this slot has never been filled. Create a new window, install it with CAS, and return it.

2. If the window start times are exactly equal, there is nothing to say: return the window directly.

3. The third case is the hard one; the two timeline diagrams in the code comments help. The first pass around the array completes at 1200, and then the circular window array wraps and starts again from 1200. When time reaches 1676 it maps to slot B2, but if B2 still holds the old window, its start time is 600. It is stale, so we reset that window's start time to the current one.

4. The last case is unlikely to happen unless the clock is moved backwards.

So statistics are kept per WindowWrap time window. The QPS results computed from these windows are then consumed by the flow rules below; I won't go into that here, just know where they are used.

```java
private int calculateTimeIdx(/*@Valid*/ long timeMillis) {
    long timeId = timeMillis / windowLengthInMs;
    // Calculate current index so we can map the timestamp to the leap array.
    return (int) (timeId % array.length());
}

protected long calculateWindowStart(/*@Valid*/ long timeMillis) {
    return timeMillis - timeMillis % windowLengthInMs;
}

public WindowWrap<T> currentWindow(long timeMillis) {
    // A negative timestamp yields no window
    if (timeMillis < 0) {
        return null;
    }
    // Calculate the index of the time window in the array
    int idx = calculateTimeIdx(timeMillis);
    // Calculate the start time of the current time window
    long windowStart = calculateWindowStart(timeMillis);

    while (true) {
        // Get the window at that index
        WindowWrap<T> old = array.get(idx);
        if (old == null) {
            /*
             *     B0       B1      B2    NULL      B4
             * ||_______|_______|_______|_______|_______||___
             * 200     400     600     800     1000    1200  timestamp
             *                             ^
             *                          time=888
             * For example, the current time is 888. The computed slot is empty,
             * so just create a new window directly.
             */
            WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            if (array.compareAndSet(idx, null, window)) {
                // Successfully updated, return the created bucket.
                return window;
            } else {
                // Contention failed, the thread will yield its time slice to wait for bucket available.
                Thread.yield();
            }
        } else if (windowStart == old.windowStart()) {
            /*
             *     B0       B1      B2     B3      B4
             * ||_______|_______|_______|_______|_______||___
             * 200     400     600     800     1000    1200  timestamp
             *                             ^
             *                          time=888
             * The easy case: the start times match, so just return the window directly.
             */
            return old;
        } else if (windowStart > old.windowStart()) {
            /*
             *     B0       B1      B2     B3      B4
             * |_______|_______|_______|_______|_______||___
             * 200     400     600     800     1000    1200  timestamp
             *             B0       B1      B2    NULL      B4
             * |_______||_______|_______|_______|_______|_______||___
             * ...    1200     1400    1600    1800    2000    2200  timestamp
             *                              ^
             *                           time=1676
             * Think of the array as a circle. The first pass ends at 1200, then the
             * array wraps and starts again from 1200. Now at time 1676 we land on B2,
             * whose new window start should be 1600, but the old window still says 600.
             * It is stale, so reset it to the new window start.
             */
            if (updateLock.tryLock()) {
                try {
                    // Successfully got the update lock, now reset the bucket.
                    return resetWindowTo(old, windowStart);
                } finally {
                    updateLock.unlock();
                }
            } else {
                // Contention failed, the thread will yield its time slice to wait for bucket available.
                Thread.yield();
            }
        } else if (windowStart < old.windowStart()) {
            // Should not go through here unless the clock was moved backwards
            return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
        }
    }
}
```

## Leaky bucket

Sentinel controls traffic through the flow rules in FlowSlot, and RateLimiterController is its leaky bucket implementation. It is much simpler than the others; a quick read should make it clear.

1. First, calculate the cost of one request when the allowed count is spread evenly over 1s, then compute the expected pass time of this request.

2. If the expected time is not later than the current time, the request passes and the last pass time becomes the current time.

3. Otherwise the request must queue. We check whether its wait would exceed the maximum queueing time; if so, it is discarded immediately.

4. If not, update the last pass time first, then check the timeout again. If the wait still exceeds the limit, roll the time back and discard; if the wait is within range, sleep for that long and then pass.

```java
public class RateLimiterController implements TrafficShapingController {
    // Maximum queueing timeout, default 500ms
    private final int maxQueueingTimeMs;
    // The rate limit count
    private final double count;
    // The last pass time
    private final AtomicLong latestPassedTime = new AtomicLong(-1);

    @Override
    public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        // Pass when acquire count is less or equal than 0.
        if (acquireCount <= 0) {
            return true;
        }
        // Reject when count is less or equal than 0.
        // Otherwise, the costTime will be max of long and waitTime will overflow in some cases.
        if (count <= 0) {
            return false;
        }

        long currentTime = TimeUtil.currentTimeMillis();
        // Spread the allowed count evenly over 1s to get the cost of this request
        long costTime = Math.round(1.0 * (acquireCount) / count * 1000); // e.g. 1 / 100 * 1000 = 10ms

        // Expected pass time of this request
        long expectedTime = costTime + latestPassedTime.get();

        if (expectedTime <= currentTime) {
            // The expected time is not later than now: pass, and last pass time = current time
            latestPassedTime.set(currentTime);
            return true;
        } else {
            // The expected time is in the future, so the request must queue.
            // Read again to compute the wait time and avoid races
            long waitTime = costTime + latestPassedTime.get() - TimeUtil.currentTimeMillis();
            // If the wait exceeds the maximum queueing time, discard the request
            if (waitTime > maxQueueingTimeMs) {
                return false;
            } else {
                // Otherwise, claim a slot by updating the last pass time first
                long oldTime = latestPassedTime.addAndGet(costTime);
                try {
                    waitTime = oldTime - TimeUtil.currentTimeMillis();
                    // If, after the update, the wait still exceeds the timeout, roll back and discard
                    if (waitTime > maxQueueingTimeMs) {
                        latestPassedTime.addAndGet(-costTime);
                        return false;
                    }
                    // Within the time range: sleep until our slot arrives
                    if (waitTime > 0) {
                        Thread.sleep(waitTime);
                    }
                    return true;
                } catch (InterruptedException e) {
                }
            }
        }
        return false;
    }

}
```

## Token bucket

Last comes the token bucket. This part is less about reimplementing it than about what you discover when reading the source... Sentinel's token bucket implementation is based on Guava's, and the code lives in WarmUpController.

We can actually skip the derivation behind the algorithm's formulas (I confess I don't fully understand it either), as long as the overall process is clear.

The core parameters are explained in the comments below. Don't worry for now about how the constructor computes them (the exact math doesn't affect understanding); the key is what canPass does.

1. Get the QPS of the current window and of the previous window.

2. Fill tokens, i.e. throw tokens into the bucket. We'll look at the filling logic next.

```java
public class WarmUpController implements TrafficShapingController {
    // The rate limit QPS
    protected double count;
    // Cold start factor, default 3
    private int coldFactor;
    // The warning-line token count
    protected int warningToken = 0;
    // Maximum number of tokens
    private int maxToken;
    // Slope: controls the speed at which tokens are generated
    protected double slope;

    // Number of stored tokens
    protected AtomicLong storedTokens = new AtomicLong(0);
    // Last token fill time
    protected AtomicLong lastFilledTime = new AtomicLong(0);

    public WarmUpController(double count, int warmUpPeriodInSec, int coldFactor) {
        construct(count, warmUpPeriodInSec, coldFactor);
    }

    public WarmUpController(double count, int warmUpPeriodInSec) {
        construct(count, warmUpPeriodInSec, 3);
    }

    private void construct(double count, int warmUpPeriodInSec, int coldFactor) {
        if (coldFactor <= 1) {
            throw new IllegalArgumentException("Cold factor should be larger than 1");
        }
        this.count = count;
        this.coldFactor = coldFactor;

        // stableInterval: the stable token generation interval, 1/QPS
        // warmUpPeriodInSec: the warm-up / cold start period, default 10s
        warningToken = (int) (warmUpPeriodInSec * count) / (coldFactor - 1);
        maxToken = warningToken + (int) (2 * warmUpPeriodInSec * count / (1.0 + coldFactor));
        // The slope formula is taken directly from Guava
        slope = (coldFactor - 1.0) / count / (maxToken - warningToken);
    }

    @Override
    public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        // QPS passed in the current time window
        long passQps = (long) node.passQps();
        // QPS of the previous time window
        long previousQps = (long) node.previousPassQps();
        // Fill tokens
        syncToken(previousQps);

        // If we are above the warning line, start adjusting the allowed QPS
        long restToken = storedTokens.get();
        if (restToken >= warningToken) {
            // The stored tokens exceed the warning line; get the number above it
            long aboveToken = restToken - warningToken;
            // Consumption is faster than the warning rate but slower than the stable rate
            // current interval = restToken * slope + 1 / count
            double warningQps = Math.nextUp(1.0 / (aboveToken * slope + 1.0 / count));
            if (passQps + acquireCount <= warningQps) {
                return true;
            }
        } else {
            if (passQps + acquireCount <= count) {
                return true;
            }
        }

        return false;
    }
}
```
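To make the constructor's formulas concrete, here is a worked example with assumed parameters: count = 100 QPS, warmUpPeriodInSec = 10, and the default coldFactor = 3 (the numbers are mine, not from the source):

```java
// Plugging sample numbers into the construct() formulas above.
class WarmUpParams {
    static final double count = 100;          // rate limit, QPS (assumed)
    static final int warmUpPeriodInSec = 10;  // warm-up period (assumed)
    static final int coldFactor = 3;          // default cold factor

    // warningToken = (10 * 100) / (3 - 1) = 500
    static final int warningToken = (int) (warmUpPeriodInSec * count) / (coldFactor - 1);
    // maxToken = 500 + (2 * 10 * 100 / (1 + 3)) = 1000
    static final int maxToken = warningToken + (int) (2 * warmUpPeriodInSec * count / (1.0 + coldFactor));
    // slope = (3 - 1) / 100 / (1000 - 500) = 4.0E-5
    static final double slope = (coldFactor - 1.0) / count / (maxToken - warningToken);
}
```

So with these numbers the bucket holds at most 1000 tokens, the warning line sits at 500, and the slope determines how quickly the allowed QPS climbs from its cold-start value back to the full 100.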

The token filling logic works as follows:

1. Get the current time, then strip the milliseconds to work at second granularity.

2. The time comparison ensures tokens are refilled at most once per second.

3. coolDownTokens then computes how many tokens the cold start / warm-up phase should fill.

4. Finally, the stored token count is updated: the refreshed total minus what was consumed in the previous window is what remains in the bucket.

```java
protected void syncToken(long passQps) {
    long currentTime = TimeUtil.currentTimeMillis();
    // Strip the milliseconds from the current time
    currentTime = currentTime - currentTime % 1000;
    long oldLastFillTime = lastFilledTime.get();
    // Ensure tokens are filled at most once per second
    if (currentTime <= oldLastFillTime) {
        return;
    }
    // Current number of tokens
    long oldValue = storedTokens.get();
    // Compute the new token count; this is where the warm-up logic lives
    long newValue = coolDownTokens(currentTime, passQps);
    if (storedTokens.compareAndSet(oldValue, newValue)) {
        // The stored count is, of course, reduced by the tokens consumed last window
        long currentValue = storedTokens.addAndGet(0 - passQps);
        if (currentValue < 0) {
            storedTokens.set(0L);
        }
        lastFilledTime.set(currentTime);
    }
}
```
1. At startup, both lastFilledTime and oldValue are 0, so the current timestamp produces a very large number, which is then capped at maxToken. So the very first fill loads the bucket with maxToken tokens.

2. Now assume the system's QPS starts out very low and then suddenly soars. At first we follow the above-the-warning-line logic, and since passQps is tiny, the bucket keeps being refilled (currentTime - lastFilledTime.get() is always 1000, i.e. 1 second), adding up to count tokens each time, so the bucket stays essentially full.

3. Then the surge arrives. QPS shoots up in an instant, tokens are consumed faster than they are produced, and the stored count gradually drops below the warning line. We enter the first if branch, and tokens keep being added according to count.

```java
private long coolDownTokens(long currentTime, long passQps) {
    long oldValue = storedTokens.get();
    long newValue = oldValue;

    // Below the warning line: always generate tokens
    if (oldValue < warningToken) {
        // Add tokens according to the elapsed time since the last fill; since
        // milliseconds were stripped, one second generates count tokens
        // (on the very first call this yields a huge value, capped at maxToken below)
        newValue = (long) (oldValue + (currentTime - lastFilledTime.get()) * count / 1000);
    } else if (oldValue > warningToken) {
        // Above the warning line: only refill when QPS is low. The higher the QPS,
        // the slower tokens are generated; the lower the QPS, the faster
        if (passQps < (int) count / coldFactor) {
            newValue = (long) (oldValue + (currentTime - lastFilledTime.get()) * count / 1000);
        }
    }
    // Never exceed the maximum number of tokens
    return Math.min(newValue, maxToken);
}
```

With that logic straightened out, we can continue with the rate limiting logic itself:

1. After the tokens are filled, we check whether the stored count exceeds the warning line. By the analysis above, in the low-QPS state it always does, so a warningQps is computed from the slope. We are in the cold start state, and at this stage the point is to derive a reduced QPS from the slope so traffic can climb gradually to the peak the system can bear. For example, with count = 100, while QPS is very low the token bucket stays full, but the QPS actually allowed through is warningQps, perhaps only 10 or 20 depending on the formula (the exact computation doesn't affect understanding). As QPS gradually rises, the tokens above the warning line shrink and warningQps grows, until the stored tokens fall below the warning line and we enter the else branch.

2. The traffic surge case is the else branch below the warning line. The bucket keeps adding tokens according to count, but consumption outpaces production, which can keep us below the warning line; there, naturally, we limit by the full QPS threshold.

```java
long restToken = storedTokens.get();
if (restToken >= warningToken) {
    // The stored tokens exceed the warning line; get the number above it
    long aboveToken = restToken - warningToken;
    // Consumption is faster than the warning rate but slower than the stable rate
    // current interval = restToken * slope + 1 / count
    double warningQps = Math.nextUp(1.0 / (aboveToken * slope + 1.0 / count));
    if (passQps + acquireCount <= warningQps) {
        return true;
    }
} else {
    if (passQps + acquireCount <= count) {
        return true;
    }
}
```

So picture the whole journey from low QPS to a sudden surge:

1. At the beginning the system's QPS is very low, and right after initialization the token bucket is filled to the maximum.

2. The low-QPS state then lasts for a while. Since we keep refilling up to count tokens each second (and the minimum with maxToken is taken, so the bucket barely changes), the token bucket stays full, and the system's allowed throughput stays at a relatively low level.

The stages above, always above the warning line, form the process called cold start / warm-up.

3. Then the system's QPS suddenly surges and tokens are consumed too fast. Even though we add the maximum count each second, refilling cannot keep up, so the tokens in the bucket keep decreasing. Meanwhile the QPS allowed during the cold start phase keeps rising, until finally the tokens in the bucket fall below the warning line.

4. Below the warning line, the system limits traffic by the maximum QPS. This is the process of the system gradually reaching its full rate limit.

In this way we achieve the goal of handling a traffic surge: the whole system adapts slowly to the sudden high QPS and finally reaches the system's QPS threshold.