
Extracting specific text with Python in practice: the first step toward an efficient office

2022-01-31 05:18:04 Grey ape

This is day 8 of my participation in the November writing challenge. For event details, see: 2021 final writing challenge.

Recently, the power of Python for office automation has been promoted in many places, so today Big Gray Wolf will share a practical Python office automation project with you.

We often waste a lot of time on tedious, boring tasks. For example, suppose you need to find every phone number and e-mail address in a long web page or document. Doing this by hand can take a lot of time and effort.

Now suppose there is a program that can find phone numbers and e-mail addresses in the text on the clipboard. You just press Ctrl+A to select all the text and Ctrl+C to copy it to the clipboard.

Then you run the program. It finds the phone numbers and e-mail addresses that match the patterns you defined and replaces the text on the clipboard with them. Wouldn't that feel like a big gain in efficiency?

In this article, Big Gray Wolf shows you how to extract specific text with Python. The program reads text from your computer's clipboard, extracts the specific information you want from it, and copies the result back to the clipboard.

Big Gray Wolf has covered Python regular expressions before. If you are not familiar with them, see “Python Tutorial regular expressions (The basic chapter)” and “Python Tutorial regular expressions (Improve)”.

The hands-on project in this section is a good real-world exercise for regular expressions.

First we need to import a Python library called pyperclip. This library lets a Python program read and write the text on the computer's clipboard.

Import the required libraries:

import re, pyperclip

text = str(pyperclip.paste())

Next we create a regular expression for phone numbers. The most common phone numbers here have 11 digits.

They consist of a three-digit prefix that identifies the carrier (for example China Mobile or China Unicom), a four-digit area code in the middle, and four final digits assigned at random. Many phone numbers are therefore written in three parts, joined by a space, a dot, or a hyphen.

To extract these different forms, the regular expression first matches the first three digits, with or without parentheses, (\d{3}|\(\d{3}\)), and then the space, dot, or hyphen that separates the parts, ([-.\s]).

It then matches the four-digit area code, (\d{4}|\(\d{4}\)), followed by another separator, ([-.\s]), and finally the last four digits, (\d{4}|\(\d{4}\)).

We also pass re.VERBOSE as the second argument to re.compile. This flag makes the pattern ignore whitespace and allows comments inside it, which is why the groups can be separated by spaces. The regular expression for phone numbers therefore looks like this:

Create the regular expression for phone numbers:

telRegex=re.compile(r'''(\d{3}|\(\d{3}\)) ([-.\s]) (\d{4}|\(\d{4}\)) ([-.\s]) (\d{4}|\(\d{4}\))''', re.VERBOSE)
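Because re.VERBOSE ignores whitespace and permits comments, the same pattern can also be spread over several annotated lines. The following is a minimal sketch of that layout; the name tel_regex and the sample text are made up for illustration and are not part of the original program.

```python
import re

# Same pattern as above, laid out one group per line.
# re.VERBOSE ignores the whitespace and treats '#' as a comment marker.
tel_regex = re.compile(r'''
    (\d{3}|\(\d{3}\))   # first three digits, optionally in parentheses
    ([-.\s])            # separator: hyphen, dot, or whitespace
    (\d{4}|\(\d{4}\))   # middle four digits
    ([-.\s])            # separator
    (\d{4}|\(\d{4}\))   # last four digits
''', re.VERBOSE)

# Hypothetical sample text
sample = "Call 138-1234-5678 or 139.8765.4321 today."
groups = tel_regex.findall(sample)
print(groups)
```

Note that the separator groups are required as written, so a number typed as 11 consecutive digits is not matched. Adding ? after each separator group would make the separator optional.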

Next we create a regular expression for e-mail addresses. The user part of an e-mail address is one or more characters, which can be upper- and lowercase letters, digits, periods, underscores, percent signs, plus signs, or hyphens, so we put all of these into one character class: [\w\d._%+-].

The user name and the domain name are separated by the @ character.

The domain name allows fewer characters: only letters, digits, periods, and hyphens. For simplicity, the expression here reuses the same character class, [\w\d._%+-], which is more permissive than strictly necessary.

The last part is the top-level domain, matched here as a dot followed by 2 to 4 characters: \.[\w]{2,4}.

So the e-mail regular expression we build looks like this:

Create the e-mail regular expression:

mailRegex = re.compile(r'''[\w\d._%+-]+ @ [\w\d._%+-]+ \.[\w]{2,4}''', re.VERBOSE | re.I)
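To see how the e-mail pattern behaves, here is a small sketch that runs it against a made-up sentence. The name mail_regex and the addresses are illustrative assumptions; the pattern itself is the one from the article, split across commented lines.

```python
import re

mail_regex = re.compile(r'''
    [\w\d._%+-]+    # user name
    @               # separator between user name and domain
    [\w\d._%+-]+    # domain name
    \.[\w]{2,4}     # dot plus a top-level domain of 2 to 4 characters
''', re.VERBOSE | re.I)

# Hypothetical sample text
sample = "Write to alice@example.com or bob.smith+news@mail.example.org."
found = mail_regex.findall(sample)
print(found)
```

The pattern contains no groups, so findall returns each address as a plain string. The trailing period of the sentence is not included, because the top-level domain part needs at least two word characters after its dot.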

With the regular expressions for phone numbers and e-mail addresses in place, we can match them against the text from the clipboard. We create a list to store the phone numbers and e-mail addresses that are found:

Create the marches list:

marches = []

First, we use a for loop to go through every phone number match in the text. Because the phone number expression contains groups, findall returns each match as a tuple of segments, separators included.

So we only need the digit segments of each phone number. We join them with hyphens and append the result of each iteration to the list:

Use a for loop to extract the phone numbers:

for grops in telRegex.findall(text):
    telNum = '-'.join([grops[0], grops[2], grops[4]])
    marches.append(telNum)
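A side effect of joining only the digit groups is that numbers written with different separators all end up in one format. The sketch below illustrates this on made-up input; tel_regex, sample, and normalized are names chosen for this example only.

```python
import re

# Same phone pattern as in the article, written on one line
tel_regex = re.compile(
    r'(\d{3}|\(\d{3}\))([-.\s])(\d{4}|\(\d{4}\))([-.\s])(\d{4}|\(\d{4}\))'
)

# Hypothetical sample text with two different separator styles
sample = "Office: 138 1234 5678, mobile: 139.8765.4321"

normalized = []
for groups in tel_regex.findall(sample):
    # groups[1] and groups[3] hold the separators, so they are skipped
    normalized.append('-'.join([groups[0], groups[2], groups[4]]))

print(normalized)
```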

Then we use a second for loop to go through the text again, find the text that matches the e-mail regular expression, and append it to the marches list.

When both for loops have finished, the list contains every phone number and e-mail address found in the text.

Use a for loop to extract the e-mail addresses:

for grops in mailRegex.findall(text):
    marches.append(grops)
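The two loops handle their results differently because findall changes its return type depending on whether the pattern contains groups. A minimal sketch of that difference, using simplified patterns that are not the ones from the program:

```python
import re

text = "138-1234-5678"

# With groups: each match is a tuple holding one string per group
with_groups = re.findall(r'(\d{3})-(\d{4})-(\d{4})', text)

# Without groups: each match is the whole matched string
without_groups = re.findall(r'\d{3}-\d{4}-\d{4}', text)

print(with_groups)
print(without_groups)
```

This is why the phone number loop indexes into each result, while the e-mail loop can append the result directly.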

Now we join the stored entries with newline characters and put the result on the clipboard. The function we need for this is the copy function of the pyperclip library:

Copy the extracted text to the clipboard:

pyperclip.copy('\n'.join(marches))

Once the program is written, save it as a Python script.

If you are not familiar with Python scripts, you can read Big Gray Wolf's article “Python Build a script environment, To configure path Detailed steps for setting environment variables”.

After saving it, copy some text that contains phone numbers and e-mail addresses, run the script, and paste. Only the extracted phone numbers and e-mail addresses are pasted!

Complete source code

Finally, here is the complete source code:

#! python3
import re, pyperclip

text = str(pyperclip.paste())

# Create the regular expression for phone numbers
telRegex = re.compile(r'''(\d{3}|\(\d{3}\)) ([-.\s]) (\d{4}|\(\d{4}\)) ([-.\s]) (\d{4}|\(\d{4}\))''', re.VERBOSE)

# Create the regular expression for e-mail addresses
mailRegex = re.compile(r'''[\w\d._%+-]+ @ [\w\d._%+-]+ \.[\w]{2,4}''', re.VERBOSE | re.I)

marches = []
for grops in telRegex.findall(text):
    telNum = '-'.join([grops[0], grops[2], grops[4]])
    marches.append(telNum)

for grops in mailRegex.findall(text):
    marches.append(grops)

pyperclip.copy('\n'.join(marches))
print('\n'.join(marches))
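If you want to try the extraction logic without touching the clipboard, one option is to wrap it in a function that takes the text as an argument. The following is a sketch under that assumption; extract_contacts, TEL_REGEX, MAIL_REGEX, and the sample text are hypothetical names and data, not part of the original script.

```python
import re

TEL_REGEX = re.compile(
    r'''(\d{3}|\(\d{3}\)) ([-.\s]) (\d{4}|\(\d{4}\)) ([-.\s]) (\d{4}|\(\d{4}\))''',
    re.VERBOSE,
)
MAIL_REGEX = re.compile(
    r'''[\w\d._%+-]+ @ [\w\d._%+-]+ \.[\w]{2,4}''',
    re.VERBOSE | re.I,
)


def extract_contacts(text):
    """Return phone numbers, then e-mail addresses, found in text."""
    matches = []
    for groups in TEL_REGEX.findall(text):
        # Keep the three digit groups and drop the two separator groups
        matches.append('-'.join([groups[0], groups[2], groups[4]]))
    # The e-mail pattern has no groups, so findall yields plain strings
    matches.extend(MAIL_REGEX.findall(text))
    return matches


# Hypothetical sample text standing in for the clipboard contents
sample = "Contact: 138-1234-5678, alice@example.com"
result = extract_contacts(sample)
print('\n'.join(result) if result else 'No phone numbers or e-mail addresses found.')
```

To connect this to the clipboard as in the script above, you would pass str(pyperclip.paste()) as the argument and hand the joined result to pyperclip.copy.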

The program is not difficult, but it is very useful.

With a program like this, we can extract different kinds of specific text from different documents by changing the regular expressions. This greatly reduces the time and effort spent searching long texts for specific content and makes office work more efficient!

If you found this useful, remember to follow and share! Big Gray Wolf will keep making progress together with you!

copyright notice
Author: Grey ape. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/01/202201310518018246.html
