Incorrect string value: '\xF0\x9F\x98\x84\xF0\x9F'
When developing a WeChat official account, we usually store each visiting user's account information as a record: OPEN_ID, nickname, gender, city, country, avatar, follow status, and the other user parameters that WeChat provides.
Environment: MySQL database
When a user's nickname contains an emoji, the insert into the database fails with an error on the nickname field: Incorrect string value: '\xF0\x9F\x98\x84\xF0\x9F'.
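The bytes in the error message are the UTF-8 encoding of the emoji itself. As a minimal sketch (the class name `EmojiBytes` and the helper `toHex` are hypothetical, not part of the original code), the following shows that U+1F604 encodes to the four bytes F0 9F 98 84, which is the prefix MySQL reports. A column or connection limited to MySQL's 3-byte utf8 character set cannot store such a value.

```java
import java.nio.charset.StandardCharsets;

public class EmojiBytes {
    // Hypothetical helper: render the UTF-8 bytes of a string as space-separated hex.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String smile = "\uD83D\uDE04"; // U+1F604, written as a surrogate pair
        System.out.println(toHex(smile)); // F0 9F 98 84
        System.out.println(toHex("A"));   // 41
    }
}
```

Any character that needs four bytes in UTF-8 lies above U+FFFF, which in a Java string means it is stored as a surrogate pair. That is why the filter in this post looks for surrogates.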
The database's character set was already utf8mb4. Nothing else I tried solved the problem, so I finally settled on the following strategy: filter out the emoji.
Filter out emoji in strings
/**
 * Check whether the string contains emoji characters.
 * StringUtils is assumed to be org.apache.commons.lang3.StringUtils.
 *
 * @param source the string to check
 * @return true if the string contains at least one emoji character
 */
public static boolean containsEmoji(String source) {
    if (StringUtils.isBlank(source)) {
        return false;
    }
    int len = source.length();
    for (int i = 0; i < len; i++) {
        char codePoint = source.charAt(i);
        // isEmojiCharacter returns true for ordinary characters, so negate it
        if (!isEmojiCharacter(codePoint)) {
            return true;
        }
    }
    return false;
}
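// Despite its name, this returns true for ordinary characters that are safe to keep.
// It returns false for surrogate halves (which make up emoji), most control
// characters, and 0xFFFE/0xFFFF. The last range can never match a 16-bit char.
private static boolean isEmojiCharacter(char codePoint) {
    return (codePoint == 0x0) ||
            (codePoint == 0x9) ||
            (codePoint == 0xA) ||
            (codePoint == 0xD) ||
            ((codePoint >= 0x20) && (codePoint <= 0xD7FF)) ||
            ((codePoint >= 0xE000) && (codePoint <= 0xFFFD)) ||
            ((codePoint >= 0x10000) && (codePoint <= 0x10FFFF));
}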
/**
 * Replace emoji and other non-text characters with "*".
 *
 * @param source the string to filter
 * @return the filtered string, or null if source is null
 */
public static String filterEmoji(String source) {
    if (source == null) {
        return null;
    }
    // Replace each supplementary character (U+10000 to U+10FFFF) and each unpaired surrogate with "*"
    source = source.replaceAll("[\\ud800\\udc00-\\udbff\\udfff\\ud800-\\udfff]", "*");
    int len = source.length();
    // Allocate the buffer up front so neither branch below can see a null buffer
    StringBuilder buf = new StringBuilder(len);
    for (int i = 0; i < len; i++) {
        char codePoint = source.charAt(i);
        if (isEmojiCharacter(codePoint)) {
            // Ordinary character: keep it
            buf.append(codePoint);
        } else {
            // Anything else (for example a control character): mask it
            buf.append("*");
        }
    }
    return buf.toString();
}
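For comparison, the same idea can be written in terms of code points rather than individual char values. This is only a sketch under stated assumptions: the class `EmojiFilterSketch` and the method `maskSupplementary` are hypothetical names, Java 8 or later is assumed for `String.codePoints()`, and the method masks only characters above U+FFFF plus unpaired surrogates, not control characters.

```java
public class EmojiFilterSketch {
    // Hypothetical helper: replace every code point above U+FFFF (four bytes in
    // UTF-8) and every unpaired surrogate with '*'; keep everything else.
    static String maskSupplementary(String source) {
        if (source == null || source.isEmpty()) {
            return source;
        }
        StringBuilder buf = new StringBuilder(source.length());
        source.codePoints().forEach(cp -> {
            boolean fourByte = Character.isSupplementaryCodePoint(cp);
            boolean loneSurrogate = cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;
            if (fourByte || loneSurrogate) {
                buf.append('*');
            } else {
                buf.appendCodePoint(cp);
            }
        });
        return buf.toString();
    }

    public static void main(String[] args) {
        // "hi" + U+1F604 + "!" becomes "hi*!"
        System.out.println(maskSupplementary("hi\uD83D\uDE04!"));
        // Plain text passes through unchanged
        System.out.println(maskSupplementary("nickname"));
    }
}
```

Emoji that live in the Basic Multilingual Plane, such as U+2764, pass through this sketch untouched. They need only three bytes in UTF-8, so they do not trigger the four-byte error described above.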