Filtering emoticons from text with Python

2022-01-31 21:15:42 Zong AI's life

Background

In our project we needed to filter text. The emoticons in that text are mainly WeChat emoticons, which take the form "[emoticon name]". Since they are plain strings, I decided to match and replace them with regular expressions.
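
As a rough illustration of the idea (not the approach this article ends up using), a generic pattern can strip anything written in square brackets. The sample string below is hypothetical. The sketch also suggests why a list of known emoticon names is useful: the generic pattern removes bracketed text that is not an emoticon at all.

```python
import re

# Generic pattern: "[" + one or more non-bracket characters + "]".
BRACKETED = re.compile(r'\[[^\[\]]+\]')


def strip_bracketed(text):
    """Remove every "[...]" span, whether or not it is a real emoticon."""
    return BRACKETED.sub('', text)


# Hypothetical sample: "[smile]" is an emoticon, "[note 1]" is ordinary text.
# Both spans are removed, which is usually more than we want.
print(strip_bracketed('hi[smile] see [note 1]'))
```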

Steps

Finding a WeChat emoticon library

To replace emoticons, we need a library of WeChat emoticons, so the first step is to find out which emoticons WeChat has. After some searching, I found the page "WeChat (Wechat) Emoticon list", which lists the WeChat emoticons. Because it is a web page, the names cannot simply be copied and pasted, so we need another way to extract them.

Getting the list of WeChat emoticon names

To read the page content, we can use JavaScript to get the values of the DOM nodes. Open the browser console, inspect the nodes, and, based on their structure, write a short piece of JS code that prints the emoticon names.

var doms = document.getElementsByClassName('emoji_card_list');
for(var i=0;i<doms.length;i++){
    var tds = doms[i].getElementsByTagName('td');
    for(var j=0;j<tds.length;j++){
        var text = tds[j].innerText;
        if(text.indexOf('[') === 0 || text.indexOf('/')===0) {
            console.log(text);
        }
    }
}
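
If you would rather stay in Python, roughly the same extraction can be sketched with the standard library's `html.parser`, run against a saved copy of the page. This is only a sketch under assumptions: the real page's markup has not been verified here, the class name `EmojiNameParser` and the function `extract_names` are made up for illustration, and, unlike the JS above, it looks at every `<td>` rather than only those inside `emoji_card_list` elements.

```python
from html.parser import HTMLParser


class EmojiNameParser(HTMLParser):
    """Collect the text of <td> cells that start with "[" or "/"."""

    def __init__(self):
        super().__init__()
        self.in_td = False   # True while we are inside a <td> element
        self.buffer = []     # text fragments of the current cell
        self.names = []      # collected emoticon names

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_td:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self.in_td:
            self.in_td = False
            text = ''.join(self.buffer).strip()
            if text.startswith('[') or text.startswith('/'):
                self.names.append(text)


def extract_names(html):
    """Return the emoticon-like cell texts found in an HTML string."""
    parser = EmojiNameParser()
    parser.feed(html)
    parser.close()
    return parser.names
```

Calling `extract_names` on the saved HTML returns the names as a list, so the console's source-location suffix never appears in the first place.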

After pasting this into the console and running it, copy the output, which looks like this (only part is shown):

[ Let me see ] debugger eval code:7:21
[666] debugger eval code:7:21
[ roll one's eye ] debugger eval code:7:21
/ smile  debugger eval code:7:21
/ Pout  debugger eval code:7:21
/ color  debugger eval code:7:21
/ Shyness  debugger eval code:7:21
/ Shut up  debugger eval code:7:21
/ sleep  debugger eval code:7:21
/ Wangchai  debugger eval code:7:21
/ Ladybug 

For output in this form, the trailing part of each line can be removed with find-and-replace in a text editor or, of course, with Python.

Processing the WeChat emoticon names

In the Python console this is easy to handle. The following code produces the final collection of emoticon names.

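data = """paste the full console output copied above here"""
# Keep only the first space-separated token of each non-empty line,
# which drops the trailing "debugger eval code:7:21" part.
data_list = [x.strip().split(' ')[0] for x in data.split('\n') if x.strip()]
emoji_list = []
for x in data_list:
    if x[0] == '/':
        # Convert the "/name" form to the "[name]" form
        emoji_list.append('[%s]' % x[1:])
    else:
        emoji_list.append(x)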
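
Splitting on the first space works when the emoticon names contain no spaces. As a hedged variant for names that do contain spaces, the sketch below strips the console suffix with a regular expression instead. The function name `parse_console_names` is illustrative, and the suffix pattern is an assumption based on the sample output shown above.

```python
import re

# Source-location suffix as it appears in the sample console output.
SUFFIX = re.compile(r'\s*debugger eval code:\d+:\d+\s*$')


def parse_console_names(raw):
    """Turn copied console output into a list of "[name]" strings."""
    names = []
    for line in raw.splitlines():
        name = SUFFIX.sub('', line).strip()
        if not name:
            continue  # skip blank lines
        if name.startswith('/'):
            # Normalize the "/name" form to "[name]"
            name = '[%s]' % name[1:].strip()
        if name not in names:
            names.append(name)  # keep first occurrence, preserve order
    return names
```

The result can be used in place of `emoji_list`.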

Matching emoticons with a regular expression

With the emoticon names in hand, we use a regular expression to match them in the text. The matching code is as follows:

import re

def remove_emoji(text):
    """
    Remove emoticons from text. Emoticons have the form "[emoticon name]".
    return: str
    """
    if '[' not in text:
        return text
    # Escape each name so that "[" and "]" are matched literally
    reg_expression = '|'.join(re.escape(x) for x in emoji_list)
    pattern = re.compile(reg_expression)
    matched_words = pattern.findall(text)
    # Deduplicate with a set so each distinct match is replaced only once
    for matched_word in set(matched_words):
        text = text.replace(matched_word, '')
    return text

The test result is as follows and meets the requirement:

>>> remove_emoji("[ ha-ha ][ Shut up ][ Shut up ]")
'[ ha-ha ]'
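
One possible refinement, offered as a sketch rather than as the article's method: compile the pattern once and let `re.sub` remove all matches in a single pass, instead of rebuilding the pattern on every call and looping over the matches. The names in `sample_names` are an illustrative subset, and the helper names are made up for this example.

```python
import re


def build_emoji_pattern(emoji_names):
    """Compile one alternation pattern from a list of "[name]" strings."""
    unique = list(dict.fromkeys(emoji_names))  # drop duplicates, keep order
    return re.compile('|'.join(re.escape(name) for name in unique))


def remove_emoji_single_pass(text, pattern):
    """Delete every match of the precompiled pattern in one pass."""
    return pattern.sub('', text)


sample_names = ['[Shut up]', '[smile]', '[666]']  # illustrative subset
sample_pattern = build_emoji_pattern(sample_names)
print(remove_emoji_single_pass('[ha-ha][Shut up][Shut up]', sample_pattern))
# prints: [ha-ha]
```

Building the pattern once matters mainly when the function is called on many strings.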

Final notes

Our project uses emoticons that are not standard Unicode emoji. If your project uses standard Unicode emoji, you can use Python's emoji package, which includes a standard emoji list. For details, see the online article "Filtering emoji from text in a Python environment".
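
For the standard Unicode case, a minimal sketch might look like the following. It assumes the third-party `emoji` package is installed in a version that provides `replace_emoji` (the 2.x releases do). When that import fails, it falls back to a rough standard-library regular expression that covers only a few common emoji blocks and is not a complete substitute. The function name `remove_unicode_emoji` is illustrative.

```python
import re

try:
    # Third-party package; replace_emoji is assumed to be available.
    from emoji import replace_emoji
except ImportError:
    replace_emoji = None

# Rough fallback: a few common emoji blocks only, far from complete.
_FALLBACK = re.compile(
    '['
    '\U0001F300-\U0001F5FF'  # Miscellaneous Symbols and Pictographs
    '\U0001F600-\U0001F64F'  # Emoticons
    '\U0001F680-\U0001F6FF'  # Transport and Map Symbols
    '\U0001F900-\U0001F9FF'  # Supplemental Symbols and Pictographs
    '\u2600-\u27BF'          # Miscellaneous Symbols and Dingbats
    '\uFE0F\u200D'           # variation selector, zero-width joiner
    ']+'
)


def remove_unicode_emoji(text):
    """Strip Unicode emoji, preferring the emoji package when present."""
    if replace_emoji is not None:
        return replace_emoji(text, replace='')
    return _FALLBACK.sub('', text)
```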

Copyright notice
Author: Zong AI's life. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201312115400887.html
