Redis in practice: a User-Agent (UA) pool

Some data search projects need to simulate requests. To get past anti-crawling checks, every request carries a random User-Agent, so Redis was used to implement a very simple UA pool.

Background

A recent requirement was to simulate requests. The User-Agent in the header of each request must satisfy the following points:
  • The User-Agent obtained each time is random.
  • The User-Agent obtained each time must not repeat within a short period.
  • The User-Agent obtained each time must contain mainstream operating system information (Unix, Windows, iOS, Android, etc.).
All three points can be handled at the source, when the UA data is prepared. What deserves attention is the concrete implementation. After a brief analysis, the process is as follows:

[Figure: UA acquisition flow]

When designing the UA pool, its data structure closely resembles a ring queue:

[Figure: UAs arranged in a ring queue]

In the figure, UAs of different colors are assumed to be completely different; they are scattered around the ring queue by a shuffle algorithm. After taking out one UA, you only need to move the cursor forward or backward by one slot (the cursor can even start at any element in the queue). In the end, what has to be implemented is a distributed queue (just a queue, not a message queue), and that requires middleware.
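The ring-queue idea can be sketched in plain Java before any middleware is involved. This is a minimal single-process illustration, not the distributed implementation; the class name `UaRing` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Single-process sketch of the ring queue; UaRing is a hypothetical name. */
final class UaRing {

    private final List<String> slots;
    private int cursor = 0;

    UaRing(List<String> userAgents, Random random) {
        if (userAgents.isEmpty()) {
            throw new IllegalArgumentException("the UA pool must not be empty");
        }
        // Copy first so the caller's list is left untouched, then shuffle once.
        this.slots = new ArrayList<>(userAgents);
        Collections.shuffle(this.slots, random);
    }

    /** Returns the UA under the cursor and advances the cursor by one slot. */
    synchronized String next() {
        String ua = slots.get(cursor);
        // Wrap around at the end so the list behaves like a ring.
        cursor = (cursor + 1) % slots.size();
        return ua;
    }

    public static void main(String[] args) {
        UaRing ring = new UaRing(Arrays.asList("UA-0", "UA-1", "UA-2", "UA-3"), new Random());
        // Two full laps: the second lap repeats the first in the same order.
        for (int i = 1; i <= 8; i++) {
            System.out.println("call " + i + ": " + ring.next());
        }
    }
}
```

With a pool of N distinct UAs, no UA repeats within a lap, and each UA recurs exactly every N calls. That is what the "no repeats within a short time" requirement relies on.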

Specific implementation scheme

A distributed data store is clearly needed to hold the prepared UAs, and Redis is the natural first choice. Next, a Redis data type has to be selected, mainly with these considerations:
  • It should behave like a queue.
  • Ideally it supports random access.
  • Pushing, popping and random access should have low time complexity, since a large number of interface calls will fetch UAs.
The Redis data type that best matches these points is List: pushing and popping at either end are O(1), and LINDEX provides access by index, although it is O(N) for elements away from the ends. Note that a List does not de-duplicate its elements, so de-duplication has to be done in code. The process by which a client obtains a UA is then straightforward: pop an element from one end of the list and push it back onto the other end.
Combined with the previous analysis, the coding process includes the following steps:
  1. Prepare the UA data to be imported; it can be read from a data source or directly from a file.
  2. Because the UA data set is generally not large, shuffle it in memory first. In Java you can use Collections#shuffle() directly, or you can implement your own shuffling algorithm. This step matters in scenarios where the party being simulated strictly checks the legitimacy of the UA.
  3. Import the UA data into a Redis List.
  4. Write a Lua script that combines RPOP and LPUSH to implement a distributed circular queue.
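Steps 1 and 2 can be sketched as follows. This is a minimal illustration under assumptions: the class name `UaPreparer` and the OS marker substrings are hypothetical, and a real legitimacy check would need stricter rules. Duplicates are dropped in code because a Redis List does not de-duplicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Sketch of steps 1 and 2; UaPreparer and OS_MARKERS are hypothetical. */
final class UaPreparer {

    // Assumed substrings that mark a mainstream operating system in a UA string.
    private static final List<String> OS_MARKERS =
            Arrays.asList("Windows", "Linux", "Android", "iPhone OS", "Mac OS X");

    static boolean hasKnownOs(String ua) {
        for (String marker : OS_MARKERS) {
            if (ua.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Trims, filters, de-duplicates and shuffles the raw UA lines. */
    static List<String> prepare(List<String> rawLines, Random random) {
        Set<String> unique = new LinkedHashSet<>();
        for (String line : rawLines) {
            String ua = line.trim();
            if (!ua.isEmpty() && hasKnownOs(ua)) {
                unique.add(ua); // the set silently drops repeated UAs
            }
        }
        List<String> result = new ArrayList<>(unique);
        Collections.shuffle(result, random);
        return result;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
                "Mozilla/5.0 (Linux; Android 11)",
                "curl/7.79.1");
        // Two UAs survive: the duplicate and the UA without OS information are dropped.
        System.out.println(prepare(raw, new Random()));
    }
}
```

The returned list is what step 3 would push into the Redis List.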

Coding and testing examples

Add the dependency for Lettuce, an advanced Redis client:
<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>5.2.1.RELEASE</version>
</dependency>
Write the Lua script that combines RPOP and LPUSH. The script is tentatively named L_RPOP_LPUSH.lua and is placed in the resources/scripts/lua directory:
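local key = KEYS[1]
-- RPOP returns false when the list is empty or does not exist
local value = redis.call('RPOP', key)
if value then
    -- Push the UA back onto the head so the list acts as a circular queue
    redis.call('LPUSH', key, value)
end
return value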
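To make the rotation easier to reason about, the same RPOP + LPUSH step can be modeled in plain Java with a `Deque`. This only illustrates the semantics inside one process; `LocalUaQueue` is a hypothetical name, and `synchronized` stands in for the atomicity that Redis gives a Lua script.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** In-memory model of the rotation; LocalUaQueue is a hypothetical name. */
final class LocalUaQueue {

    private final Deque<String> deque = new ArrayDeque<>();

    /** Mirrors LPUSH key v1 v2 ...: each value is pushed onto the head in turn. */
    synchronized void lpush(List<String> values) {
        for (String value : values) {
            deque.addFirst(value);
        }
    }

    /** Pops the tail, pushes it back onto the head and returns it. */
    synchronized String rotate() {
        String value = deque.pollLast(); // null when the queue is empty
        if (value != null) {
            deque.addFirst(value);
        }
        return value;
    }

    public static void main(String[] args) {
        LocalUaQueue queue = new LocalUaQueue();
        queue.lpush(Arrays.asList("UA-0", "UA-1", "UA-2"));
        // Prints UA-0, UA-1, UA-2, UA-0: insertion order, then it wraps around.
        for (int i = 0; i < 4; i++) {
            System.out.println(queue.rotate());
        }
    }
}
```

Because LPUSH adds each value at the head and the rotation reads from the tail, UAs come back in the order they were pushed, so the shuffled order is preserved.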
The Lua script is very simple, yet it already implements a circular queue. The test code is as follows:
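import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.ScriptOutputType;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
// Spring utilities, assumed from the class names used below
import org.springframework.core.io.ClassPathResource;
import org.springframework.util.StreamUtils;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.IntStream;

public class UaPoolTest {

    private static RedisClient REDIS_CLIENT;
    private static StatefulRedisConnection<String, String> CONNECTION;
    private static RedisCommands<String, String> COMMANDS;

    private static final AtomicReference<String> LUA_SHA = new AtomicReference<>();
    private static final String KEY = "UA_POOL";

    @BeforeClass
    public static void beforeClass() throws Exception {
        // Initialize the Redis client
        RedisURI uri = RedisURI.builder().withHost("localhost").withPort(6379).build();
        REDIS_CLIENT = RedisClient.create(uri);
        CONNECTION = REDIS_CLIENT.connect();
        COMMANDS = CONNECTION.sync();
        // Build mock raw data for the UA pool: 10 UAs named UA-0 ... UA-9
        List<String> uaList = new ArrayList<>();
        IntStream.range(0, 10).forEach(e -> uaList.add(String.format("UA-%d", e)));
        // Shuffle the UAs
        Collections.shuffle(uaList);
        // Load the Lua script and cache its SHA1 digest
        ClassPathResource resource = new ClassPathResource("/scripts/lua/L_RPOP_LPUSH.lua");
        String content = StreamUtils.copyToString(resource.getInputStream(), StandardCharsets.UTF_8);
        String sha = COMMANDS.scriptLoad(content);
        LUA_SHA.compareAndSet(null, sha);
        // Write the UA data into the Redis list. For large data sets, consider
        // writing in batches so that the Redis server is not blocked for long.
        COMMANDS.lpush(KEY, uaList.toArray(new String[0]));
    }

    @AfterClass
    public static void afterClass() throws Exception {
        // Remove the test data, then release the connection and the client
        COMMANDS.del(KEY);
        CONNECTION.close();
        REDIS_CLIENT.shutdown();
    }

    @Test
    public void testUaPool() {
        // 20 calls against a pool of 10 UAs, i.e. two full laps of the ring
        IntStream.range(1, 21).forEach(e -> {
            String result = COMMANDS.evalsha(LUA_SHA.get(), ScriptOutputType.VALUE, KEY);
            System.out.println(String.format("UA obtained on call %d: %s", e, result));
        });
    }
}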

The output of one run is as follows:

UA obtained on call 1: UA-0
UA obtained on call 2: UA-8
UA obtained on call 3: UA-2
UA obtained on call 4: UA-4
UA obtained on call 5: UA-7
UA obtained on call 6: UA-5
UA obtained on call 7: UA-1
UA obtained on call 8: UA-3
UA obtained on call 9: UA-6
UA obtained on call 10: UA-9
UA obtained on call 11: UA-0
UA obtained on call 12: UA-8
UA obtained on call 13: UA-2
UA obtained on call 14: UA-4
UA obtained on call 15: UA-7
UA obtained on call 16: UA-5
UA obtained on call 17: UA-1
UA obtained on call 18: UA-3
UA obtained on call 19: UA-6
UA obtained on call 20: UA-9
The shuffle works reasonably well: the data is well scattered, and the sequence repeats only after all 10 UAs have been handed out, so no UA recurs within a full cycle.
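The code comment about batch writing can be sketched as follows. The helper only splits the list into chunks; the `pushBatch` callback is a hypothetical stand-in for a call such as `COMMANDS.lpush(KEY, batch.toArray(new String[0]))`, and the batch size is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of batched writes; BatchWriter and pushBatch are hypothetical. */
final class BatchWriter {

    /** Splits the input into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /** Hands each chunk to pushBatch in order, so the overall order is preserved. */
    static <T> void writeInBatches(List<T> items, int batchSize, Consumer<List<T>> pushBatch) {
        for (List<T> batch : partition(items, batchSize)) {
            pushBatch.accept(batch);
        }
    }

    public static void main(String[] args) {
        List<String> uaList = Arrays.asList("UA-0", "UA-1", "UA-2", "UA-3", "UA-4");
        // The callback only prints here; against Redis it would issue one LPUSH per batch.
        writeInBatches(uaList, 2, batch -> System.out.println("LPUSH " + batch));
    }
}
```

Since LPUSH pushes its arguments onto the head one after another, several smaller LPUSH calls issued in order leave the list in the same state as one large call.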

Summary

The design of a UA pool is not difficult, but a few key points deserve attention:
  • There are generally not many system versions of mainstream mobile or desktop devices, so the source UA data set is small. In the simplest implementation it can be stored in a file and written to Redis in one go.
  • The UA data must be shuffled so that UAs of the same device or system type are not clustered together, which could trigger risk-control rules when simulating some requests.
  • You need to be familiar with Lua syntax, since Lua scripts are the usual way to combine several Redis commands into a single atomic operation.

Posted on Fri, 03 Dec 2021 22:49:09 -0500 by craigengbrecht