Preface
Sensitive word filtering can be a big deal or a small one. Legitimate users rarely cause problems, but some malicious users will deliberately abuse your service, which is not very friendly! After all, with the Internet as developed as it is today, anything that spreads online can have a considerable impact!
Today we implement a sensitive word filtering scheme based on the DFA algorithm, combined with AOP and a custom annotation!
Note: the DFA tool class was found on the Internet. I modified it and combined it with AOP to make it more convenient to use!
Practice
Java tool class for the DFA algorithm
Let's talk about my improvements on this tool class:
- The original is a static tool class whose word library path is hard-coded, so it has to be edited every time. The path is now injected into the tool class by Spring
- Some logging was added to the initialization of the word library
- A condition was added so that Spring only loads the bean when sensitive word filtering is enabled
package com.zyu.boot.demo.utils.sensitiveword;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

/**
 * Sensitive word processing tool, implemented with the DFA algorithm
 */
@Component
@ConditionalOnProperty(name = "sensitiveWord.enable", havingValue = "true")
public class SensitiveWordUtil {

    // Word library path (the sensitiveWord.path entry in application.yml)
    @Value("${sensitiveWord.path}")
    private String filePath;
    private static String SENSITIVE_WORD_PATH;
    private static final Logger logger = LoggerFactory.getLogger(SensitiveWordUtil.class);

    /**
     * Sensitive word matching rules
     */
    // Minimum matching rule: stop at the shortest word. With the words "ab" and "abc", the text "abc" matches [ab]c
    public static final int MinMatchTYpe = 1;
    // Maximum matching rule: continue to the longest word. With the words "ab" and "abc", the text "abc" matches [abc]
    public static final int MaxMatchType = 2;

    /**
     * Sensitive word map (the DFA model)
     */
    @SuppressWarnings("rawtypes")
    public static HashMap sensitiveWordMap;

    /**
     * Load the sensitive word library and build the DFA model
     */
    @PostConstruct
    private synchronized void init() {
        SENSITIVE_WORD_PATH = filePath;
        Set<String> sensitiveWordSet = new HashSet<>();
        // Read every file in the word library directory
        logger.info("Initializing sensitive word library: {}", SENSITIVE_WORD_PATH);
        File[] wordFiles = new File(SENSITIVE_WORD_PATH).listFiles();
        if (wordFiles == null) {
            // listFiles() returns null when the path is not a readable directory
            logger.error("Sensitive word library path is not a readable directory: {}", SENSITIVE_WORD_PATH);
            wordFiles = new File[0];
        }
        try {
            for (File wordFile : wordFiles) {
                logger.info("Loading word file {}", wordFile.getName());
                // try-with-resources closes the reader even if reading fails
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(new FileInputStream(wordFile), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        sensitiveWordSet.add(line);
                        logger.trace("Loaded sensitive word {}", line);
                    }
                }
            }
        } catch (IOException e) {
            logger.error("Failed to initialize the sensitive word library", e);
        }
        logger.info("Loaded {} sensitive words", sensitiveWordSet.size());
        initSensitiveWordMap(sensitiveWordSet);
    }

    /**
     * Build the DFA model from the sensitive word set
     *
     * @param sensitiveWordSet the sensitive word set
     */
    @SuppressWarnings({"rawtypes", "unchecked"})
    private static void initSensitiveWordMap(Set<String> sensitiveWordSet) {
        // Size the container up front to reduce resizing
        sensitiveWordMap = new HashMap(sensitiveWordSet.size());
        String key;
        Map nowMap;
        Map<String, String> newWorMap;
        Iterator<String> iterator = sensitiveWordSet.iterator();
        while (iterator.hasNext()) {
            key = iterator.next();
            nowMap = sensitiveWordMap;
            for (int i = 0; i < key.length(); i++) {
                char keyChar = key.charAt(i);
                Object wordMap = nowMap.get(keyChar);
                if (wordMap != null) {
                    // The character already exists: descend into its map
                    nowMap = (Map) wordMap;
                } else {
                    // Otherwise create a new map, with isEnd = 0 because it is not yet the end of a word
                    newWorMap = new HashMap<>();
                    newWorMap.put("isEnd", "0");
                    nowMap.put(keyChar, newWorMap);
                    nowMap = newWorMap;
                }
                if (i == key.length() - 1) {
                    // Last character of the word
                    nowMap.put("isEnd", "1");
                }
            }
        }
    }

    /**
     * Determine whether the text contains sensitive words
     *
     * @param txt       the text
     * @param matchType matching rule, 1: minimum matching rule, 2: maximum matching rule
     * @return true if a sensitive word is found, false otherwise
     */
    public static boolean contains(String txt, int matchType) {
        for (int i = 0; i < txt.length(); i++) {
            // A length greater than 0 means a sensitive word starts at i
            if (checkSensitiveWord(txt, i, matchType) > 0) {
                return true;
            }
        }
        return false;
    }

    /**
     * Determine whether the text contains sensitive words, using the maximum matching rule
     */
    public static boolean contains(String txt) {
        return contains(txt, MaxMatchType);
    }

    /**
     * Get the sensitive words in the text
     *
     * @param txt       the text
     * @param matchType matching rule, 1: minimum matching rule, 2: maximum matching rule
     * @return the set of sensitive words found
     */
    public static Set<String> getSensitiveWord(String txt, int matchType) {
        Set<String> sensitiveWordList = new HashSet<>();
        for (int i = 0; i < txt.length(); i++) {
            int length = checkSensitiveWord(txt, i, matchType);
            if (length > 0) { // found, add it to the result
                sensitiveWordList.add(txt.substring(i, i + length));
                i = i + length - 1; // minus 1 because the for loop increments i
            }
        }
        return sensitiveWordList;
    }

    /**
     * Get the sensitive words in the text, using the maximum matching rule
     */
    public static Set<String> getSensitiveWord(String txt) {
        return getSensitiveWord(txt, MaxMatchType);
    }

    /**
     * Replace sensitive words character by character
     *
     * @param txt         the text
     * @param replaceChar replacement character, repeated once per character of the matched word.
     *                    For example, text: "I love badword", sensitive word: "badword",
     *                    replacement character: *, result: "I love *******"
     * @param matchType   matching rule
     * @return the filtered text
     */
    public static String replaceSensitiveWord(String txt, char replaceChar, int matchType) {
        String resultTxt = txt;
        for (String word : getSensitiveWord(txt, matchType)) {
            // replace() treats the word literally; replaceAll() would interpret it as a regex
            resultTxt = resultTxt.replace(word, getReplaceChars(replaceChar, word.length()));
        }
        return resultTxt;
    }

    /**
     * Replace sensitive words character by character, using the maximum matching rule
     */
    public static String replaceSensitiveWord(String txt, char replaceChar) {
        return replaceSensitiveWord(txt, replaceChar, MaxMatchType);
    }

    /**
     * Replace each sensitive word with a fixed string
     *
     * @param txt        the text
     * @param replaceStr replacement string that stands in for the whole matched word.
     *                   For example, text: "I love badword", sensitive word: "badword",
     *                   replacement string: [shield], result: "I love [shield]"
     * @param matchType  matching rule
     * @return the filtered text
     */
    public static String replaceSensitiveWord(String txt, String replaceStr, int matchType) {
        String resultTxt = txt;
        for (String word : getSensitiveWord(txt, matchType)) {
            resultTxt = resultTxt.replace(word, replaceStr);
        }
        return resultTxt;
    }

    /**
     * Replace each sensitive word with a fixed string, using the maximum matching rule
     */
    public static String replaceSensitiveWord(String txt, String replaceStr) {
        return replaceSensitiveWord(txt, replaceStr, MaxMatchType);
    }

    /**
     * Build a string of replaceChar repeated length times
     */
    private static String getReplaceChars(char replaceChar, int length) {
        StringBuilder resultReplace = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            resultReplace.append(replaceChar);
        }
        return resultReplace.toString();
    }

    /**
     * Check whether a sensitive word starts at beginIndex
     *
     * @return the length of the sensitive word if one is found, otherwise 0
     */
    @SuppressWarnings("rawtypes")
    private static int checkSensitiveWord(String txt, int beginIndex, int matchType) {
        // Number of characters matched along the current path
        int matchFlag = 0;
        // Length of the last complete word seen, so that a longer partial match is not reported as a word
        int matchedLength = 0;
        Map nowMap = sensitiveWordMap;
        for (int i = beginIndex; i < txt.length(); i++) {
            nowMap = (Map) nowMap.get(txt.charAt(i));
            if (nowMap == null) { // no such character in the model, stop
                break;
            }
            matchFlag++;
            if ("1".equals(nowMap.get("isEnd"))) {
                matchedLength = matchFlag;
                // Minimum rule returns immediately, maximum rule keeps searching
                if (MinMatchTYpe == matchType) {
                    break;
                }
            }
        }
        // A match must be at least 2 characters long to count as a word
        return matchedLength < 2 ? 0 : matchedLength;
    }

    // public static void main(String[] args) {
    //     System.out.println("Number of sensitive words: " + SensitiveWordUtil.sensitiveWordMap.size());
    //     String string = "too much sentimentality may be limited to the scenes on the feeding base screen. "
    //             + "Then our role is to join up with the protagonist's fans in anger, sorrow and too farfetched to attach our emotions to the screen plot, and then moved to tears."
    //             + "When you are sad, lie in someone's arms and fully explain your heart or cell phone card duplicator. A bitch has a glass of red wine and a movie. In the dead of night, turn off the phone and stay still. ";
    //     System.out.println("Number of characters to check: " + string.length());
    //
    //     // Does the text contain sensitive words?
    //     boolean result = SensitiveWordUtil.contains(string);
    //     System.out.println(result);
    //     result = SensitiveWordUtil.contains(string, SensitiveWordUtil.MinMatchTYpe);
    //     System.out.println(result);
    //
    //     // Get the sensitive words in the text
    //     Set<String> set = SensitiveWordUtil.getSensitiveWord(string);
    //     System.out.println("Number of sensitive words in the text: " + set.size() + ". They are: " + set);
    //     set = SensitiveWordUtil.getSensitiveWord(string, SensitiveWordUtil.MinMatchTYpe);
    //     System.out.println("Number of sensitive words in the text: " + set.size() + ". They are: " + set);
    //
    //     // Replace the sensitive words in the text
    //     String filterStr = SensitiveWordUtil.replaceSensitiveWord(string, '*');
    //     System.out.println(filterStr);
    //     filterStr = SensitiveWordUtil.replaceSensitiveWord(string, '*', SensitiveWordUtil.MinMatchTYpe);
    //     System.out.println(filterStr);
    //
    //     String filterStr2 = SensitiveWordUtil.replaceSensitiveWord(string, "[* sensitive word *]");
    //     System.out.println(filterStr2);
    //     filterStr2 = SensitiveWordUtil.replaceSensitiveWord(string, "[* sensitive word *]", SensitiveWordUtil.MinMatchTYpe);
    //     System.out.println(filterStr2);
    // }
}
A gorgeous dividing line
Define a marker annotation for sensitive word filtering
Once the annotation is placed on a method or a class, the input parameters of that method (or of the methods in that class) are filtered for sensitive words
package com.zyu.boot.demo.annotation;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Marker annotation for sensitive word filtering.
 * Filtering is only enabled when value is explicitly set to true.
 * The annotation on a method takes precedence over the one on its class
 * (if not every method in an annotated class should be filtered,
 * put @SensitiveWordFilter(false) on the methods to skip)
 */
@Target({ElementType.TYPE, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
public @interface SensitiveWordFilter {

    /**
     * Filtering is not enabled by default
     *
     * @return whether filtering is enabled
     */
    public boolean value() default false;
}
Define an AOP aspect
The annotation above serves as the pointcut. The aspect mainly handles resolving the annotation and replacing the sensitive words
Of course, to use AOP we need to add the AOP dependency
<!-- integrate AOP -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
The following is the AOP implementation class
It also uses @ConditionalOnProperty for conditional loading
Another gorgeous dividing line
- Resolving the annotation: if the method carries the annotation, its value is used directly; if not, the annotation on the class is used instead. Filtering is executed when the resolved value is true
- Sensitive word replacement: only String parameters and complex (object) parameters are processed. The aspect checks the type of each parameter. If it is a String, the sensitive words are replaced directly. If it is a complex type, reflection (with setAccessible(true) to reach private fields) is used to inspect every field, and the sensitive words in String fields are replaced.
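The second bullet can be tried in isolation, without Spring. The following is a minimal sketch under stated assumptions: `FieldMaskSketch`, `Comment` and `filterStringFields` are hypothetical names, and a plain `UnaryOperator<String>` stands in for the call to `SensitiveWordUtil.replaceSensitiveWord`.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.function.UnaryOperator;

/** Hypothetical demo: rewrite the String fields of an object via reflection. */
final class FieldMaskSketch {

    /** Hypothetical bean standing in for a method argument such as User. */
    static final class Comment {
        private String author;
        private String body;
        private int likes;

        Comment(String author, String body, int likes) {
            this.author = author;
            this.body = body;
            this.likes = likes;
        }

        String getAuthor() { return author; }
        String getBody() { return body; }
        int getLikes() { return likes; }
    }

    /** Applies filter to every non-static, non-null String field declared on target's class. */
    static void filterStringFields(Object target, UnaryOperator<String> filter) {
        if (target == null) {
            return; // nothing to inspect
        }
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.getType() != String.class || Modifier.isStatic(field.getModifiers())) {
                continue; // skip other types and constants
            }
            field.setAccessible(true); // private fields are otherwise unreachable
            try {
                String current = (String) field.get(target);
                if (current != null) {
                    field.set(target, filter.apply(current));
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot rewrite field " + field.getName(), e);
            }
        }
    }

    public static void main(String[] args) {
        Comment comment = new Comment("bob", "some badword", 3);
        // Stand-in filter (assumption): the real aspect would call the word filter here
        filterStringFields(comment, s -> s.replace("badword", "*******"));
        System.out.println(comment.getBody()); // some *******
    }
}
```

Note that `getDeclaredFields()` only sees fields declared on the object's own class, so String fields inherited from a superclass, or nested inside other objects, are not touched by this approach.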
package com.zyu.boot.demo.aop.sensitiveword;

import com.zyu.boot.demo.annotation.SensitiveWordFilter;
import com.zyu.boot.demo.utils.sensitiveword.SensitiveWordUtil;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.stereotype.Component;

import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/**
 * Aspect for the sensitive word filtering feature.
 * Filters the arguments of methods annotated with @SensitiveWordFilter
 */
@Aspect
@Component
@ConditionalOnProperty(name = "sensitiveWord.enable", havingValue = "true")
public class SensitiveWordAspect {

    @Pointcut("@within(com.zyu.boot.demo.annotation.SensitiveWordFilter) || @annotation(com.zyu.boot.demo.annotation.SensitiveWordFilter)")
    public void sensitiveWordPointCut() {
    }

    @Around("sensitiveWordPointCut()")
    public Object around(ProceedingJoinPoint point) throws Throwable {
        boolean enableFilter = false;
        MethodSignature signature = (MethodSignature) point.getSignature();
        Method method = signature.getMethod();
        Class<?> clazz = method.getDeclaringClass();
        SensitiveWordFilter methodSensitiveWordFilter = method.getAnnotation(SensitiveWordFilter.class);
        SensitiveWordFilter clazzSensitiveWordFilter = clazz.getAnnotation(SensitiveWordFilter.class);
        if (methodSensitiveWordFilter != null) { // the method annotation takes precedence
            enableFilter = methodSensitiveWordFilter.value();
        } else if (clazzSensitiveWordFilter != null) { // otherwise fall back to the class annotation
            enableFilter = clazzSensitiveWordFilter.value();
        }
        Class<?>[] parameterTypes = method.getParameterTypes();
        Object[] paramValues = point.getArgs();
        if (enableFilter) {
            for (int i = 0; i < paramValues.length; i++) {
                Object value = paramValues[i];
                if (value == null) { // nothing to filter
                    continue;
                }
                if (value instanceof String) { // String parameters are filtered directly
                    paramValues[i] = SensitiveWordUtil.replaceSensitiveWord((String) value, '*', SensitiveWordUtil.MinMatchTYpe);
                } else if (!isBasicType(parameterTypes[i])) { // object parameters: filter their String fields
                    for (Field field : value.getClass().getDeclaredFields()) {
                        // Only instance fields declared as String can be read and rewritten safely
                        if (field.getType() != String.class || Modifier.isStatic(field.getModifiers())) {
                            continue;
                        }
                        field.setAccessible(true);
                        String fieldValue = (String) field.get(value);
                        if (fieldValue != null) {
                            field.set(value, SensitiveWordUtil.replaceSensitiveWord(fieldValue, '*', SensitiveWordUtil.MinMatchTYpe));
                        }
                    }
                }
            }
        }
        return point.proceed(paramValues);
    }

    /**
     * Determine whether a parameter type is a basic type (a primitive or its wrapper)
     *
     * @param clazz the declared parameter type
     * @return true for basic types
     */
    private boolean isBasicType(Class<?> clazz) {
        return clazz.isPrimitive()
                || clazz.isAssignableFrom(Integer.class)
                || clazz.isAssignableFrom(Byte.class)
                || clazz.isAssignableFrom(Long.class)
                || clazz.isAssignableFrom(Double.class)
                || clazz.isAssignableFrom(Float.class)
                || clazz.isAssignableFrom(Character.class)
                || clazz.isAssignableFrom(Short.class)
                || clazz.isAssignableFrom(Boolean.class);
    }
}
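The rule that a method annotation wins over the class annotation can also be checked on its own with plain reflection. This is a minimal sketch, not the aspect itself: `PrecedenceSketch`, `Filter`, `DemoService` and `isFilterEnabled` are hypothetical names, with `Filter` standing in for `@SensitiveWordFilter`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical demo of "method annotation first, class annotation second". */
final class PrecedenceSketch {

    /** Stand-in for the article's marker annotation. */
    @Target({ElementType.TYPE, ElementType.METHOD})
    @Retention(RetentionPolicy.RUNTIME)
    @interface Filter {
        boolean value() default false;
    }

    /** Filtering is enabled for the whole class, with one method opting out. */
    @Filter(true)
    static class DemoService {
        public void create(String name) { }

        @Filter(false)
        public void rawImport(String name) { }
    }

    /** Resolves the effective flag, treating a missing annotation as disabled. */
    static boolean isFilterEnabled(Method method) {
        Filter onMethod = method.getAnnotation(Filter.class);
        if (onMethod != null) {
            return onMethod.value();
        }
        Filter onClass = method.getDeclaringClass().getAnnotation(Filter.class);
        return onClass != null && onClass.value();
    }

    /** Convenience lookup by name among the public methods of type. */
    static boolean isFilterEnabled(Class<?> type, String methodName) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(methodName)) {
                return isFilterEnabled(m);
            }
        }
        throw new IllegalArgumentException("No public method named " + methodName);
    }

    public static void main(String[] args) {
        System.out.println(isFilterEnabled(DemoService.class, "create"));    // true
        System.out.println(isFilterEnabled(DemoService.class, "rawImport")); // false
    }
}
```

Treating a missing annotation as "disabled" avoids a NullPointerException when the declaring class itself is not annotated, for example when the annotated class inherits the method from an unannotated parent.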
Modify the application.yml configuration file
# Sensitive word filtering configuration
sensitiveWord:
  enable: true                    # enable sensitive word filtering
  path: D:\test\sensitive-words   # directory the word library is loaded from
Prepare the word library
To keep this post clean, I made up a few harmless words for the word library! In practice, you can download sensitive word lists from the Internet and put them into the word library directory configured above!
Test it
Annotate a method with @SensitiveWordFilter(true)
@SensitiveWordFilter(true)
public User createUser(User user) {
    System.out.println(user);
    return user;
}
Write a test method
package com.zyu.boot.demo;

import com.zyu.boot.demo.entity.User;
import com.zyu.boot.demo.service.UserService;
import com.zyu.boot.demo.utils.pwd.PasswordHash;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.security.NoSuchAlgorithmException;
import java.security.spec.InvalidKeySpecException;
import java.util.Date;

@SpringBootTest(classes = {DemoApplication.class})
@RunWith(SpringRunner.class)
public class UserTest {

    @Autowired
    private UserService userService;

    @Test
    public void sensitiveWordTest() throws InvalidKeySpecException, NoSuchAlgorithmException {
        User user = new User();
        user.setUserid("zyufocus Two strokes");
        user.setPassword(PasswordHash.createHash("zyufocus"));
        user.setName("I'm tie Han");
        user.setAge(18);
        user.setGender(false);
        user.setCreateDate(new Date());
        user.setRole("admin");
        user = userService.createUser(user);
        System.out.println(user);
    }
}
Run the test
- The sensitive word library is scanned when the project starts
- Test result (printed twice: once inside createUser and once by the test)
User{userid='zyufocus**', password='1000:9c785ced38921934eeee7572cd0146109efaadcf4bc8e5d65d33b648afd0e9b5:75caa0d93b8a40d7bcdf3dada9f4b732e4b5af07fca7982c0087f6e76eab1b6e9195fc86439877a3254d0c8ca726d4c43369b0d923afae0b1c5ebb091baed921', name='I am***', gender=false, age=18, createDate=Sat Jun 27 08:52:08 CST 2020, role='admin'}
User{userid='zyufocus**', password='1000:9c785ced38921934eeee7572cd0146109efaadcf4bc8e5d65d33b648afd0e9b5:75caa0d93b8a40d7bcdf3dada9f4b732e4b5af07fca7982c0087f6e76eab1b6e9195fc86439877a3254d0c8ca726d4c43369b0d923afae0b1c5ebb091baed921', name='I am***', gender=false, age=18, createDate=Sat Jun 27 08:52:08 CST 2020, role='admin'}
And that's it!
Improvement plan
In fact, there is still room for improvement, such as updating and removing sensitive words at runtime
- You can store the sensitive words in Redis and pull the latest word list from Redis on a schedule, for example once a day
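A scheduled refresh along those lines might look like the following. This is a minimal sketch under stated assumptions, not a Redis integration: `ReloadingWordList` is a hypothetical class, and the `Supplier` is a stand-in for whatever call reads the word set from Redis. The class only shows the refresh-and-swap part.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical holder that periodically swaps in a fresh snapshot of the word list. */
final class ReloadingWordList {

    // Assumption: in a real setup this supplier would read the word set from Redis
    private final Supplier<Set<String>> source;

    // volatile so that readers always see the most recently published snapshot
    private volatile Set<String> words = Collections.emptySet();

    ReloadingWordList(Supplier<Set<String>> source) {
        this.source = source;
        refresh(); // load once at startup
    }

    /** Loads a fresh copy; on failure the previous snapshot stays in use. */
    void refresh() {
        try {
            Set<String> fresh = source.get();
            if (fresh != null) {
                words = Collections.unmodifiableSet(new HashSet<>(fresh));
            }
        } catch (RuntimeException e) {
            // Keep serving the old list rather than dropping the filter
        }
    }

    /** Current immutable snapshot of the word list. */
    Set<String> current() {
        return words;
    }

    /** Schedules refresh() once a day on a daemon thread and returns the scheduler. */
    ScheduledExecutorService scheduleDaily() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "word-list-reload");
            t.setDaemon(true); // do not block JVM shutdown
            return t;
        });
        scheduler.scheduleAtFixedRate(this::refresh, 1, 1, TimeUnit.DAYS);
        return scheduler;
    }
}
```

To connect this to the tool class, each refresh would also have to rebuild the DFA map from the new snapshot and publish it in a single assignment, so that requests being filtered at that moment never see a half-built map.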
Conclusion
There is no end to learning