Sensitive word filtering: AOP + annotation + DFA algorithm

Preface

Sensitive word filtering can look like a small matter. Legitimate users rarely cause trouble, but some malicious users deliberately post offensive content through your service, and that is anything but harmless! With the Internet as developed as it is now, anything that goes wrong spreads quickly, and the impact can be considerable!!!
Today we implement a sensitive word filter based on the DFA algorithm, using AOP and a custom annotation!
Note: the DFA utility class below was found on the Internet. I modified it and combined it with AOP to make it more convenient to use!!!

Practice

Java utility class for the DFA algorithm

Here are my improvements to this utility class:

  • The original was a purely static utility class, so the word library path had to be edited in the source every time. To let Spring manage it, the path is now injected into the class from configuration
  • Some logging was added to the word library initialization
  • A condition was added so that Spring only loads the bean when sensitive word filtering is enabled
package com.zyu.boot.demo.utils.sensitiveword;


import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import java.io.*;
import java.util.*;

/**
 * Sensitive word processing tool DFA algorithm implementation
 */
@Component
@ConditionalOnProperty(name = "sensitiveWord.enable", havingValue = "true")
public class SensitiveWordUtil {
    //Directory containing the word library files
    @Value("${sensitiveWord.path}")
    private String filePath;
    private static String SENSITIVE_WORD_PATH;

    private static Logger logger = LoggerFactory.getLogger(SensitiveWordUtil.class);

    /**
     * Sensitive words matching rules
     */
    public static final int MinMatchTYpe = 1;      //Minimum matching: with the word library ["ab", "abc"] and the text "xabcx", the match is x[ab]cx
    public static final int MaxMatchType = 2;      //Maximum matching: with the word library ["ab", "abc"] and the text "xabcx", the match is x[abc]x

    /**
     * Sensitive word set
     */
    @SuppressWarnings("rawtypes")
    public static HashMap sensitiveWordMap;

    /**
     * Load the sensitive word library and build the DFA model
     */
    @PostConstruct
    private synchronized void init() {
        SENSITIVE_WORD_PATH = filePath;
        Set<String> sensitiveWordSet = new HashSet<>();

        // Read every file in the configured word library directory (UTF-8, one word per line)
        logger.info("Initializing sensitive word library from {}", SENSITIVE_WORD_PATH);
        File[] wordFiles = new File(SENSITIVE_WORD_PATH).listFiles();
        if (wordFiles == null) {
            // listFiles() returns null when the path does not exist or is not a directory
            logger.error("Sensitive word directory {} not found, no words loaded", SENSITIVE_WORD_PATH);
            wordFiles = new File[0];
        }
        for (File wordFile : wordFiles) {
            logger.info("Loading word file {}", wordFile.getName());
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(wordFile), "UTF-8"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    line = line.trim();
                    if (!line.isEmpty()) { // skip blank lines
                        sensitiveWordSet.add(line);
                        logger.trace("Loaded sensitive word {}", line);
                    }
                }
            } catch (IOException e) {
                logger.error("Failed to read word file {}", wordFile.getName(), e);
            }
        }
        logger.info("Loaded {} sensitive words", sensitiveWordSet.size());
        initSensitiveWordMap(sensitiveWordSet);
    }

    /**
     * Build the DFA model (a trie of nested maps) from the sensitive word set
     *
     * @param sensitiveWordSet set of sensitive words
     */
    @SuppressWarnings({"rawtypes", "unchecked"})
    private static void initSensitiveWordMap(Set<String> sensitiveWordSet) {
        //Pre-size the map to reduce rehashing
        sensitiveWordMap = new HashMap(sensitiveWordSet.size());
        String key;
        Map nowMap;
        Map<String, String> newWorMap;
        //Iterate over the sensitive word set
        Iterator<String> iterator = sensitiveWordSet.iterator();
        while (iterator.hasNext()) {
            //keyword
            key = iterator.next();
            nowMap = sensitiveWordMap;
            for (int i = 0; i < key.length(); i++) {
                //Convert to char
                char keyChar = key.charAt(i);
                //Look up this character at the current level of the trie
                Object wordMap = nowMap.get(keyChar);
                //If the node exists, descend into it for the next character
                if (wordMap != null) {
                    nowMap = (Map) wordMap;
                } else {
                    //Otherwise create a new node with isEnd = 0, because this character does not end the word yet
                    newWorMap = new HashMap<>();
                    //Not the last one
                    newWorMap.put("isEnd", "0");
                    nowMap.put(keyChar, newWorMap);
                    nowMap = newWorMap;
                }

                if (i == key.length() - 1) {
                    //the last one
                    nowMap.put("isEnd", "1");
                }
            }
        }
    }

    /**
     * Determine whether the text contains any sensitive word
     *
     * @param txt       text to check
     * @param matchType Matching rule 1: minimum matching rule, 2: maximum matching rule
     * @return true if at least one sensitive word is found, false otherwise
     */
    public static boolean contains(String txt, int matchType) {
        for (int i = 0; i < txt.length(); i++) {
            //A match length greater than 0 means a sensitive word starts at i, so there is no need to scan further
            if (checkSensitiveWord(txt, i, matchType) > 0) {
                return true;
            }
        }
        return false;
    }

    /**
     * Determine whether the text contains any sensitive word, using the maximum matching rule
     *
     * @param txt text to check
     * @return true if at least one sensitive word is found, false otherwise
     */
    public static boolean contains(String txt) {
        return contains(txt, MaxMatchType);
    }

    /**
     * Get the sensitive words contained in the text
     *
     * @param txt       text to check
     * @param matchType Matching rule 1: minimum matching rule, 2: maximum matching rule
     * @return the set of sensitive words found (empty if none)
     */
    public static Set<String> getSensitiveWord(String txt, int matchType) {
        Set<String> sensitiveWordList = new HashSet<>();

        for (int i = 0; i < txt.length(); i++) {
            //Length of the sensitive word starting at i, or 0 if none
            int length = checkSensitiveWord(txt, i, matchType);
            if (length > 0) {//Found one: add it to the set
                sensitiveWordList.add(txt.substring(i, i + length));
                i = i + length - 1;//Subtract 1 because the for loop increments i
            }
        }

        return sensitiveWordList;
    }

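    /**
     * Get the sensitive words contained in the text, using the maximum matching rule
     *
     * @param txt text to check
     * @return the set of sensitive words found (empty if none)
     */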
    public static Set<String> getSensitiveWord(String txt) {
        return getSensitiveWord(txt, MaxMatchType);
    }

    /**
     * Replace sensitive words character by character
     *
     * @param txt         text
     * @param replaceChar replacement character; every character of a matched word is replaced by it. For example, text "I love Chinese", sensitive word "Chinese", replacement '*', result "I love *******"
     * @param matchType   Sensitive words matching rules
     * @return the text with sensitive words masked
     */
    public static String replaceSensitiveWord(String txt, char replaceChar, int matchType) {
        String resultTxt = txt;
        //Get all sensitive words
        Set<String> set = getSensitiveWord(txt, matchType);
        for (String word : set) {
            //String.replace matches literally; replaceAll would treat the word as a regular expression
            resultTxt = resultTxt.replace(word, getReplaceChars(replaceChar, word.length()));
        }
        return resultTxt;
    }

    /**
     * Replace sensitive words character by character, using the maximum matching rule
     *
     * @param txt         text
     * @param replaceChar replacement character; every character of a matched word is replaced by it. For example, text "I love Chinese", sensitive word "Chinese", replacement '*', result "I love *******"
     * @return the text with sensitive words masked
     */
    public static String replaceSensitiveWord(String txt, char replaceChar) {
        return replaceSensitiveWord(txt, replaceChar, MaxMatchType);
    }

    /**
     * Replace sensitive words with a string
     *
     * @param txt        text
     * @param replaceStr replacement string; each matched sensitive word is replaced by the whole string. For example, text "I love Chinese", sensitive word "Chinese", replacement "[shield]", result "I love [shield]"
     * @param matchType  Sensitive words matching rules
     * @return the text with sensitive words replaced
     */
    public static String replaceSensitiveWord(String txt, String replaceStr, int matchType) {
        String resultTxt = txt;
        //Get all sensitive words
        Set<String> set = getSensitiveWord(txt, matchType);
        for (String word : set) {
            //String.replace matches literally, so words or replacements containing regex characters such as . * $ are safe
            resultTxt = resultTxt.replace(word, replaceStr);
        }
        return resultTxt;
    }

    /**
     * Replace sensitive words with a string, using the maximum matching rule
     *
     * @param txt        text
     * @param replaceStr replacement string; each matched sensitive word is replaced by the whole string. For example, text "I love Chinese", sensitive word "Chinese", replacement "[shield]", result "I love [shield]"
     * @return the text with sensitive words replaced
     */
    public static String replaceSensitiveWord(String txt, String replaceStr) {
        return replaceSensitiveWord(txt, replaceStr, MaxMatchType);
    }

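    /**
     * Build a replacement string made of one repeated character
     *
     * @param replaceChar character to repeat
     * @param length      number of repetitions (the length of the matched word)
     * @return the replacement string
     */
    private static String getReplaceChars(char replaceChar, int length) {
        //Fill a char array instead of concatenating Strings in a loop
        char[] chars = new char[length];
        Arrays.fill(chars, replaceChar);
        return new String(chars);
    }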

    /**
     * Check whether a sensitive word starts at position beginIndex of the text
     *
     * @param txt        text to check
     * @param beginIndex index at which matching starts
     * @param matchType  1: minimum matching rule, 2: maximum matching rule
     * @return the length of the matched sensitive word if one exists, otherwise 0
     */
    @SuppressWarnings("rawtypes")
    private static int checkSensitiveWord(String txt, int beginIndex, int matchType) {
        //Length of the last complete sensitive word seen so far; 0 means none yet
        int matchLength = 0;
        char word;
        Map nowMap = sensitiveWordMap;
        for (int i = beginIndex; i < txt.length(); i++) {
            word = txt.charAt(i);
            //Follow the transition for this character
            nowMap = (Map) nowMap.get(word);
            if (nowMap == null) {
                //No transition: no longer word can match from here
                break;
            }
            if ("1".equals(nowMap.get("isEnd"))) {
                //A complete word ends here, so record its length
                matchLength = i - beginIndex + 1;
                //Minimum rule stops at the shortest word; maximum rule keeps looking for a longer one
                if (MinMatchTYpe == matchType) {
                    break;
                }
            }
        }
        //Lengths are only recorded at word ends, so a dead-end prefix (e.g. "abc" when only "ab" and "abcd" are words) is never returned
        return matchLength;
    }

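//    //Note: sensitiveWordMap is only populated after Spring calls init(), so this main method is for reference only
//    public static void main(String[] args) {
//
//        System.out.println("Number of first-level nodes in the DFA map: " + SensitiveWordUtil.sensitiveWordMap.size());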
//        String string = "too much sentimentality may be limited to the scenes on the feeding base screen. "
//                +"Then our role is to join up with the protagonist's fans in anger, sorrow and too farfetched to attach our emotions to the screen plot, and then moved to tears."
//                +"When you are sad, lie in someone's arms and fully explain your heart or cell phone card duplicator. A bitch has a glass of red wine and a movie. In the dead of night, turn off the phone and stay still. ";
//        System.out.println("Number of characters to check: " + string.length());
//
//        //Keyword or not
//        boolean result = SensitiveWordUtil.contains(string);
//        System.out.println(result);
//        result = SensitiveWordUtil.contains(string, SensitiveWordUtil.MinMatchTYpe);
//        System.out.println(result);
//
//        //Get sensitive words in statements
//        Set<String> set = SensitiveWordUtil.getSensitiveWord(string);
//        System.out.println("Number of sensitive words in the text: " + set.size() + ". Words: " + set);
//        set = SensitiveWordUtil.getSensitiveWord(string, SensitiveWordUtil.MinMatchTYpe);
//        System.out.println("Number of sensitive words in the text: " + set.size() + ". Words: " + set);
//
//        //Replacing sensitive words in sentences
//        String filterStr = SensitiveWordUtil.replaceSensitiveWord(string, '*');
//        System.out.println(filterStr);
//        filterStr = SensitiveWordUtil.replaceSensitiveWord(string, '*', SensitiveWordUtil.MinMatchTYpe);
//        System.out.println(filterStr);
//
//        String filterStr2 = SensitiveWordUtil.replaceSensitiveWord(string, "[* sensitive word *]");
//        System.out.println(filterStr2);
//        filterStr2 = SensitiveWordUtil.replaceSensitiveWord(string, "[* sensitive word *]", SensitiveWordUtil.MinMatchTYpe);
//        System.out.println(filterStr2);
//    }

}
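The core of SensitiveWordUtil does not depend on Spring. Below is a minimal standalone sketch of the same DFA (trie) idea, assuming Java 8+. It swaps the raw nested HashMap and the "isEnd" string flag for a small typed Node class; the names DfaSketch, matchAt and mask are illustrative and not part of the original code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a typed version of the nested-map trie used by SensitiveWordUtil
class DfaSketch {
    static final int MIN_MATCH = 1;
    static final int MAX_MATCH = 2;

    /** One trie node: outgoing transitions plus an end-of-word flag (the "isEnd" entry). */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    DfaSketch(List<String> words) {
        for (String word : words) {
            if (word.isEmpty()) {
                continue;
            }
            Node node = root;
            for (char c : word.toCharArray()) {
                // Create the child node the first time this prefix is seen
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.end = true;
        }
    }

    /** Length of the word matched starting at begin, or 0 if none. */
    int matchAt(String text, int begin, int matchType) {
        Node node = root;
        int matched = 0;
        for (int i = begin; i < text.length(); i++) {
            node = node.next.get(text.charAt(i));
            if (node == null) {
                break;                          // no transition: stop scanning
            }
            if (node.end) {
                matched = i - begin + 1;        // remember the last complete word
                if (matchType == MIN_MATCH) {
                    break;                      // shortest word is enough
                }
            }
        }
        return matched;
    }

    /** Masks every matched word with the given character, scanning left to right. */
    String mask(String text, char mask, int matchType) {
        StringBuilder out = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int len = matchAt(text, i, matchType);
            if (len > 0) {
                for (int j = i; j < i + len; j++) {
                    out.setCharAt(j, mask);
                }
                i += len;                       // continue after the matched word
            } else {
                i++;
            }
        }
        return out.toString();
    }
}
```

With the words ["ab", "abcd"] and the text "abcx", maximum matching returns 2 in this sketch: the length is only updated on end-of-word nodes, so the walk through "c" is treated as a dead end. Masking by index in a single pass also avoids the regex pitfalls of String.replaceAll.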


Define a marker annotation for sensitive word filtering

Once this annotation is placed on a method or a class, the String inputs of that method (or of every method in the class) are filtered for sensitive words.

package com.zyu.boot.demo.annotation;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Marker annotation for sensitive word filtering
 * Filtering is enabled only when value is explicitly set to true.
 * An annotation on a method takes precedence over one on its class. If the class is annotated with true but some methods should not be filtered, annotate those methods with @SensitiveWordFilter(false).
 */
@Target({ElementType.TYPE,ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
public @interface SensitiveWordFilter {
    /**
     * Filter is not enabled by default
     * @return
     */
    public boolean value() default false;
}
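The precedence rule described in the Javadoc (method first, then class, otherwise disabled) can be checked with plain reflection. This is a minimal, self-contained sketch: it declares a local copy of SensitiveWordFilter and hypothetical DemoService/PlainService classes so it compiles without Spring; in the aspect the same lookup would run on the intercepted method and target class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class FilterResolution {
    // Local copy of the annotation so this sketch compiles on its own
    @Target({ElementType.TYPE, ElementType.METHOD})
    @Retention(RetentionPolicy.RUNTIME)
    @interface SensitiveWordFilter {
        boolean value() default false;
    }

    /** Method-level annotation wins; otherwise fall back to the class; otherwise disabled. */
    static boolean isFilterEnabled(Method method, Class<?> targetClass) {
        SensitiveWordFilter onMethod = method.getAnnotation(SensitiveWordFilter.class);
        if (onMethod != null) {
            return onMethod.value();
        }
        SensitiveWordFilter onClass = targetClass.getAnnotation(SensitiveWordFilter.class);
        return onClass != null && onClass.value();
    }

    /** Convenience lookup for a public method taking one String argument. */
    static boolean isFilterEnabled(Class<?> targetClass, String methodName) {
        try {
            return isFilterEnabled(targetClass.getMethod(methodName, String.class), targetClass);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical service: filtering on for the class, switched off for one method
    @SensitiveWordFilter(true)
    static class DemoService {
        public void create(String s) { }

        @SensitiveWordFilter(false)
        public void login(String s) { }
    }

    // Hypothetical service with no annotation at all
    static class PlainService {
        public void create(String s) { }
    }
}
```

Returning false when neither annotation is present keeps the lookup null-safe, which matters when the method is reached through an @annotation match but the class itself carries no annotation.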

Define an AOP aspect

The aspect uses the annotation above as its pointcut and handles both the annotation logic and the sensitive word replacement.
To use AOP, the AOP starter dependency must be added:

<!-- integrate Aop -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

The AOP implementation class follows.
Like the utility class, it uses @ConditionalOnProperty so that it is only loaded when filtering is enabled.


  • Annotation handling: if the method carries the annotation, its value decides; otherwise the annotation on the class is used. Filtering runs only when the value is true
  • Sensitive word replacement: only String values are processed. A String parameter is filtered directly; for any other object parameter, reflection is used to read each declared field and filter the fields of type String
package com.zyu.boot.demo.aop.sensitiveword;

import com.zyu.boot.demo.annotation.SensitiveWordFilter;
import com.zyu.boot.demo.utils.sensitiveword.SensitiveWordUtil;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.stereotype.Component;

import java.lang.reflect.Field;
import java.lang.reflect.Method;

/**
 * Aspect implementing sensitive word filtering
 * Filters the parameters of methods annotated with @SensitiveWordFilter, or of methods in annotated classes
 */
@Aspect
@Component
@ConditionalOnProperty(name = "sensitiveWord.enable", havingValue = "true")
public class SensitiveWordAspect {
    @Pointcut("@within(com.zyu.boot.demo.annotation.SensitiveWordFilter) || @annotation(com.zyu.boot.demo.annotation.SensitiveWordFilter)")
    public void sensitiveWordPointCut() {
    }

    @Around("sensitiveWordPointCut()")
    public Object around(ProceedingJoinPoint point) throws Throwable {
        //Resolve annotations on the target class and on its own implementation of the intercepted method
        Class<?> clazz = point.getTarget().getClass();
        Method method = org.springframework.aop.support.AopUtils.getMostSpecificMethod(
                ((MethodSignature) point.getSignature()).getMethod(), clazz);
        SensitiveWordFilter methodSensitiveWordFilter = method.getAnnotation(SensitiveWordFilter.class);
        SensitiveWordFilter clazzSensitiveWordFilter = clazz.getAnnotation(SensitiveWordFilter.class);

        boolean enableFilter = false;
        if (methodSensitiveWordFilter != null) {//The method annotation takes priority
            enableFilter = methodSensitiveWordFilter.value();
        } else if (clazzSensitiveWordFilter != null) {//Otherwise fall back to the class annotation
            enableFilter = clazzSensitiveWordFilter.value();
        }

        Object[] paramValues = point.getArgs();
        if (enableFilter) {
            for (int i = 0; i < paramValues.length; i++) {
                Object value = paramValues[i];
                if (value instanceof String) {//String parameter: filter directly
                    value = SensitiveWordUtil.replaceSensitiveWord((String) value, '*', SensitiveWordUtil.MinMatchTYpe);
                } else if (value != null && !isBasicType(value.getClass())) {//Object parameter: filter its String fields
                    for (Field field : value.getClass().getDeclaredFields()) {
                        int modifiers = field.getModifiers();
                        //Only instance fields declared exactly as String; static and final fields are left untouched
                        if (field.getType() == String.class
                                && !java.lang.reflect.Modifier.isStatic(modifiers)
                                && !java.lang.reflect.Modifier.isFinal(modifiers)) {
                            field.setAccessible(true);
                            String fieldValue = (String) field.get(value);
                            if (fieldValue != null) {
                                field.set(value, SensitiveWordUtil.replaceSensitiveWord(fieldValue, '*', SensitiveWordUtil.MinMatchTYpe));
                            }
                        }
                    }
                }
                paramValues[i] = value;
            }
        }
        return point.proceed(paramValues);
    }

    /**
     * Determine whether a class should be skipped by the reflective filtering:
     * primitives, their wrappers and other JDK types (java.*), whose internal fields must not be modified
     *
     * @param clazz class to check
     * @return true if the class should not be traversed
     */
    private boolean isBasicType(Class<?> clazz) {
        //Wrappers such as Integer and Boolean are covered by the java.* check
        return clazz.isPrimitive() || clazz.getName().startsWith("java.");
    }
}
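The field handling in the aspect can be tried out without Spring. The sketch below is a minimal variant under stated assumptions: Java 8+, application-defined DTO classes (not JDK types, which Java 9+ module rules may refuse to open via setAccessible), and hypothetical names StringFieldSanitizer, BaseDto and UserDto. Unlike the aspect, it also walks superclasses, since getDeclaredFields() only returns fields declared on the concrete class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.function.UnaryOperator;

class StringFieldSanitizer {
    /** Applies filter to every non-static, non-final String field of target, including inherited ones. */
    static void sanitize(Object target, UnaryOperator<String> filter) {
        if (target == null) {
            return;
        }
        for (Class<?> c = target.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                int mod = field.getModifiers();
                // Only instance, non-final fields whose declared type is exactly String
                if (field.getType() != String.class || Modifier.isStatic(mod) || Modifier.isFinal(mod)) {
                    continue;
                }
                try {
                    field.setAccessible(true);
                    String current = (String) field.get(target);
                    if (current != null) {
                        field.set(target, filter.apply(current));
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot filter field " + field, e);
                }
            }
        }
    }

    // Hypothetical DTOs, used only to exercise the sketch
    static class BaseDto {
        String remark;
    }

    static class UserDto extends BaseDto {
        String name;
        int age;
        static String table = "user";   // static field: deliberately left alone
    }
}
```

Because the filter is a UnaryOperator<String>, the aspect could pass something like s -> SensitiveWordUtil.replaceSensitiveWord(s, '*', SensitiveWordUtil.MinMatchTYpe).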

Modify the application.yml configuration file

#Configuration of sensitive word filtering
sensitiveWord:
  enable: true #Enable sensitive word filtering
  path: D:\test\sensitive-words #Directory containing the word library files (UTF-8, one word per line)

Prepare the word library

So that nobody accuses me of being vulgar or violent, I made up a few harmless words for the library here! In practice, you can download a sensitive word list from the Internet and put its files in the word library directory!
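The word library is simply a directory of text files. Below is a sketch of a loader using java.nio.file, assuming UTF-8 files with one word per line (the format init() reads); WordLibraryLoader is a hypothetical name. It also strips the byte-order mark that some Windows editors write at the start of UTF-8 files, which Java's UTF-8 decoder keeps and which would otherwise stick to the first word.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

class WordLibraryLoader {
    /** Reads every regular file in dir as UTF-8, one word per line; blank lines are skipped. */
    static Set<String> load(Path dir) {
        Set<String> words = new HashSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // ignore sub-directories
                }
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    // Drop byte-order marks and surrounding whitespace
                    String word = line.replace("\uFEFF", "").trim();
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read word library in " + dir, e);
        }
        return words;
    }
}
```

Duplicates across files collapse naturally because the result is a Set, matching how init() collects words.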

Test it

Annotate a method with @SensitiveWordFilter(true)

@SensitiveWordFilter(true)
public User createUser(User user) {
    System.out.println(user);
    return user;
}

Write a test method

package com.zyu.boot.demo;

import com.zyu.boot.demo.entity.User;
import com.zyu.boot.demo.service.UserService;
import com.zyu.boot.demo.utils.pwd.PasswordHash;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.security.NoSuchAlgorithmException;
import java.security.spec.InvalidKeySpecException;
import java.util.Date;

@SpringBootTest(classes = {DemoApplication.class})
@RunWith(SpringRunner.class)
public class UserTest {
    @Autowired
    private UserService userService;

    @Test
    public void sensitiveWordTest() throws InvalidKeySpecException, NoSuchAlgorithmException {
        User user = new User();
        user.setUserid("zyufocus Two strokes");
        user.setPassword(PasswordHash.createHash("zyufocus"));
        user.setName("I'm tie Han");
        user.setAge(18);
        user.setGender(false);
        user.setCreateDate(new Date());
        user.setRole("admin");

        user = userService.createUser(user);
        System.out.println(user);
    }
}

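Run the test

  • When the project starts, the log shows the sensitive word library being scanned and loaded
  • Test result (the user is printed once inside createUser and once by the test method):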
User{userid='zyufocus**', password='1000:9c785ced38921934eeee7572cd0146109efaadcf4bc8e5d65d33b648afd0e9b5:75caa0d93b8a40d7bcdf3dada9f4b732e4b5af07fca7982c0087f6e76eab1b6e9195fc86439877a3254d0c8ca726d4c43369b0d923afae0b1c5ebb091baed921', name='I am***', gender=false, age=18, createDate=Sat Jun 27 08:52:08 CST 2020, role='admin'}
User{userid='zyufocus**', password='1000:9c785ced38921934eeee7572cd0146109efaadcf4bc8e5d65d33b648afd0e9b5:75caa0d93b8a40d7bcdf3dada9f4b732e4b5af07fca7982c0087f6e76eab1b6e9195fc86439877a3254d0c8ca726d4c43369b0d923afae0b1c5ebb091baed921', name='I am***', gender=false, age=18, createDate=Sat Jun 27 08:52:08 CST 2020, role='admin'}

That's it!

Improvement plan

There is still room for improvement, for example updating and removing sensitive words at runtime:

  • Keep the sensitive words in Redis and reload them from Redis on a schedule, for example once a day
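One way to follow up on the Redis idea is to rebuild the DFA map off to the side and swap it in atomically, rather than filling the shared map in place as initSensitiveWordMap does, so that requests never see a half-built map. The sketch assumes Java 8+; the word source is abstracted as a Supplier<Set<String>> (hypothetical; in practice it might read a Redis set), and a scheduled job, such as a Spring @Scheduled method, could call refresh() once a day. RefreshableWordFilter is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class RefreshableWordFilter {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end;
    }

    private final Supplier<Set<String>> source;   // hypothetical word source, e.g. backed by Redis
    private final AtomicReference<Node> root = new AtomicReference<>(new Node());

    RefreshableWordFilter(Supplier<Set<String>> source) {
        this.source = source;
        refresh();
    }

    /** Builds a brand-new trie from the current word source, then publishes it in one step. */
    void refresh() {
        Node fresh = new Node();
        for (String word : source.get()) {
            if (word.isEmpty()) {
                continue;
            }
            Node node = fresh;
            for (char c : word.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.end = true;
        }
        root.set(fresh);   // readers see either the old or the new trie, never a partial one
    }

    /** True if any sensitive word occurs in text (minimum matching is enough for a yes/no answer). */
    boolean contains(String text) {
        Node start = root.get();   // read once so a single call uses one consistent snapshot
        for (int i = 0; i < text.length(); i++) {
            Node node = start;
            for (int j = i; j < text.length(); j++) {
                node = node.next.get(text.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.end) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Published tries are never modified after the swap, so readers need no locking; the cost is briefly holding two tries in memory during a refresh.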

Conclusion

There is no end to learning

Tags: Java Spring Junit Redis

Posted on Fri, 26 Jun 2020 21:22:22 -0400 by jasonbullard