
Python notes (23): the re module

2022-01-30 19:07:48 A bowl week


Hello everyone, I am A bowl week, a front-end developer who doesn't want to get swallowed up by the rat race. If I'm fortunate enough to earn your like, I'm very lucky~

Python provides a module for working with regular expressions: the re module.

Regular expression modifiers (flags)

Modifier | Description | Full name
re.I | Makes matching case-insensitive | re.IGNORECASE
re.A | Makes \w, \W, \b, \B, \d, \D, \s and \S match ASCII only instead of Unicode | re.ASCII
re.L | Performs locale-aware matching | re.LOCALE
re.M | Multi-line matching; affects ^ and $, which in multi-line mode also match at the start and end of each line | re.MULTILINE
re.S | Makes . match every character, including the newline | re.DOTALL
re.U | Interprets characters according to the Unicode character set; affects \w, \W, \b, \B (already the default for str patterns in Python 3) | re.UNICODE
re.X | Lets you write more readable regular expressions by ignoring whitespace in the pattern and allowing comments | re.VERBOSE
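To make the table concrete, here is a minimal sketch of the most commonly used flags. The sample text and the variable names are illustrative assumptions, not part of the original notes.

```python
import re

text = "Hello\nworld"

# re.I: ignore case
case_insensitive = re.findall(r"hello", text, flags=re.I)    # ['Hello']

# re.M: ^ and $ also match at the start and end of each line
line_starts = re.findall(r"^\w+", text, flags=re.M)          # ['Hello', 'world']

# re.S: . also matches the newline character
with_dotall = re.search(r"Hello.world", text, flags=re.S)    # a match object
without_dotall = re.search(r"Hello.world", text)             # None

# re.X: whitespace and # comments inside the pattern are ignored
year_month_re = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
""", re.X)
year_month = year_month_re.match("2022-01").groups()         # ('2022', '01')

# Flags are combined with the | operator
combined = re.findall(r"^hello$", text, flags=re.I | re.M)   # ['Hello']
```

Passing flags by keyword (flags=...) keeps calls readable and avoids mixing them up with other positional arguments.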

Find a single match

match

re.match: if zero or more characters at the beginning of string match the regular expression pattern, it returns a corresponding match object. If there is no match, it returns None; note that this is different from a zero-length match.

Syntax

re.match(pattern, string, flags=0)
  • pattern: the regular expression to match.
  • string: the string to search.
  • flags: flags that control how the regular expression is matched, such as case sensitivity, multi-line matching, and so on.

If the match succeeds, re.match returns a match object; otherwise it returns None.

Sample code

"""
-*- coding:utf-8 -*-
author:  Xiaotian 
time:2020/5/30
"""
import re
string1 = "hello python"
string2 = "hell5o python"
pattern = r"[a-z]+\s\w+"  # one or more letters a-z, one whitespace character, then one or more word characters
print(re.match(pattern, string1))  # <re.Match object; span=(0, 12), match='hello python'>
print(re.match(pattern, string2))  # None

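First import the re module. The r"" prefix marks a raw string, which is the usual way to write regular expression patterns because backslashes are kept as-is.

Because string2 has the digit 5 in the middle of the first word, [a-z]+ stops there and the pattern does not match.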

group

group() is a method of the Match object that returns what was matched. Called without an argument it defaults to 0. Group indexes start at 0 (0 is the entire match); if the pattern contains capture groups, the first group is 1. Groups can also be named and retrieved by name.

Sample code

"""
-*- coding:utf-8 -*-
author:  Xiaotian 
time:2020/5/30
"""
import re
string1 = "hello python"
string2 = "hell5o python"
pattern = r"[a-z]+\s\w+"
pattern1 = r"(\w+)(\s)(\w+)"
pattern2 = r"(?P<first>\w+\s)(?P<last>\w+)"  # named groups
print(re.match(pattern, string1))  # <re.Match object; span=(0, 12), match='hello python'>
print(re.match(pattern, string1).group())  # hello python
print(re.match(pattern, string2))  # None
print(re.match(pattern1, string2).group(0))  # hell5o python
print(re.match(pattern1, string2).group(1))  # hell5o
print(re.match(pattern1, string2).group(2))  # the matched whitespace: a single space
print(re.match(pattern1, string2).group(3))  # python
print(re.match(pattern2, string2).group("last"))  # python
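Besides group(), a Match object offers a few other ways to read groups. The following is a small sketch; the pattern and variable names are illustrative assumptions rather than code from the original notes.

```python
import re

m = re.match(r"(?P<first>\w+)\s(?P<last>\w+)", "hello python")

all_groups = m.groups()       # ('hello', 'python'): every capture group as a tuple
named = m.groupdict()         # {'first': 'hello', 'last': 'python'}
pair = m.group(1, 2)          # ('hello', 'python'): several groups in one call
first_span = m.span("first")  # (0, 5): start and end index of a group
last_by_index = m["last"]     # 'python': same as m.group("last")
```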

search

re.search scans the entire string for the first position where the pattern matches and returns a corresponding match object. If no position matches, it returns None; note that this is different from finding a zero-length match. The syntax is the same as for match.

Sample code

"""
-*- coding:utf-8 -*-
author:  Xiaotian 
time:2020/5/30
"""
import re
string = "Hi World Hello python"
pattern = r"Hello python"
print(re.search(pattern, string).group())  # Hello python
print(re.match(pattern, string))  # None

The difference between the two

re.match only matches at the beginning of the string: if the start of the string does not match the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.

fullmatch

re.fullmatch: if the whole string matches the regular expression, it returns a corresponding match object. Otherwise it returns None; note that this is different from a zero-length match.

The syntax is the same as above.

Sample code

"""
-*- coding:utf-8 -*-
author:  Xiaotian 
time:2020/5/30
"""
import re
string = "Hi World Hello python"
pattern = r"Hi World Hello python"
pattern1 = r"hi World hello python"

print(re.fullmatch(pattern, string))  # <re.Match object; span=(0, 21), match='Hi World Hello python'>
print(re.fullmatch(pattern1, string))  # None

Differences among the three

match: matches only at the start of the string

search: finds the first match anywhere in the string

fullmatch: the entire string must match the regular expression
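The three functions can be compared side by side with one pattern and a few inputs. This is a minimal sketch; the pattern \d+ and the test strings are chosen for illustration only.

```python
import re

pattern = r"\d+"
results = {}
for text in ("123", "123abc", "abc123"):
    results[text] = (
        bool(re.match(pattern, text)),      # digits at the start?
        bool(re.search(pattern, text)),     # digits anywhere?
        bool(re.fullmatch(pattern, text)),  # digits and nothing else?
    )

# results == {
#     "123":    (True, True, True),
#     "123abc": (True, True, False),
#     "abc123": (False, True, False),
# }
```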

Match objects

A match object always has a boolean value of True. Since match() and search() return None when there is no match, you can test whether there was a match with a simple if statement.

Sample code

import re
string = "Hi World Hello python"
pattern = r"Hello python"
match1 = re.search(pattern, string)
match2 = re.match(pattern, string)
if match1:
    print(match1.group())  # Hello python

if match2:  # match2 is None, so this branch is not executed
    print(match2.group())

Find multiple matches

compile

re.compile compiles a regular expression pattern into a regular expression object (re.Pattern), which can then be used for matching.

Syntax

re.compile(pattern, flags=0)
  • pattern: the regular expression to compile.
  • flags: flags that control how the regular expression is matched, such as case sensitivity, multi-line matching, and so on.
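A compiled pattern exposes the same matching functions as methods, so the pattern and its flags are written once and reused. A minimal sketch, with an illustrative pattern and inputs that are not from the original notes:

```python
import re

# Compile once, with the flag attached to the pattern object
word_re = re.compile(r"[a-z]+", re.I)

first = word_re.match("Hello python").group()  # 'Hello'
found = word_re.search("42 apples").group()    # 'apples'
all_words = word_re.findall("Hi World")        # ['Hi', 'World']
no_match = word_re.match("42 apples")          # None: the string starts with digits
```

Compiling is mainly a readability and reuse convenience when the same pattern is applied in many places.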

findall

re.findall finds every substring in the string that the regular expression matches and returns them as a list; if nothing matches, it returns an empty list. Unlike match and search, which find a single match, findall finds all of them.

Syntax

re.findall(pattern, string, flags=0)
Pattern.findall(string[, pos[, endpos]])
  • pattern: the regular expression to match (module-level function only).
  • string: the string to search.
  • pos: optional; the index in the string where the search starts, 0 by default (compiled pattern method only).
  • endpos: optional; the index in the string where the search stops, the length of the string by default (compiled pattern method only).
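Two details of findall are easy to miss: the shape of the result changes when the pattern contains capture groups, and pos/endpos are only available on a compiled pattern. The sketch below illustrates both; the sample text is an assumption made for the example.

```python
import re

text = "a=1, b=2, c=3"

# No groups: the list holds the whole matches
no_groups = re.findall(r"\w=\d", text)        # ['a=1', 'b=2', 'c=3']
# One group: the list holds only that group
one_group = re.findall(r"\w=(\d)", text)      # ['1', '2', '3']
# Several groups: the list holds tuples
two_groups = re.findall(r"(\w)=(\d)", text)   # [('a', '1'), ('b', '2'), ('c', '3')]

# pos and endpos exist only on the compiled pattern's method
pair_re = re.compile(r"\w=\d")
from_pos = pair_re.findall(text, 5)           # searches text[5:]  -> ['b=2', 'c=3']
window = pair_re.findall(text, 5, 8)          # searches text[5:8] -> ['b=2']
```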

finditer

re.finditer returns an iterator that yields a match object for each non-overlapping match of pattern in string. The string is scanned from left to right, and matches are returned in the order found. Empty matches are included in the result.

The syntax is the same as for match.

Sample code

import re
from collections.abc import Iterator  # abstract base class used to check whether an object is an iterator
string = "hello python hi javascript"
pattern = r"\b\w+\b"
pattern_object = re.compile(r"\b\w+\b")
print(type(pattern_object))  # <class 're.Pattern'>

findall = pattern_object.findall(string)
for i in findall:
    print(i)

finditer = re.finditer(pattern, string)
#  Determine whether it is an iterator 
print(isinstance(finditer, Iterator))  # True
for _ in range(4):
    finditer1 = next(finditer)  # take the next match object
    print(finditer1.group())
'''
-- The result of the cycle --
hello
python
hi
javascript
'''

If there are many matches, finditer is a better choice than findall, because it yields match objects one at a time instead of building the whole list first. That is the usual difference between an iterator and a list.
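In practice the iterator is usually consumed with a for loop or a comprehension rather than by calling next() a fixed number of times, and each match object also carries its position. A minimal sketch reusing the sample string from above (variable names are illustrative):

```python
import re

text = "hello python hi javascript"

# Each item is a match object, so both the text and its position are available
positions = [(m.group(), m.span()) for m in re.finditer(r"\b\w+\b", text)]
# [('hello', (0, 5)), ('python', (6, 12)), ('hi', (13, 15)), ('javascript', (16, 26))]

# Taking only the first match without scanning for the rest
first = next(re.finditer(r"\b\w+\b", text), None)
first_word = first.group() if first else None   # 'hello'
```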

Splitting: split

re.split splits the string at every substring that matches the pattern and returns the pieces as a list.

Syntax

re.split(pattern, string, maxsplit=0, flags=0)
  • pattern: the regular expression that matches the separators.
  • string: the string to split.
  • maxsplit: the maximum number of splits; maxsplit=1 splits once. The default is 0, which means no limit.
  • flags: flags that control how the regular expression is matched, such as case sensitivity, multi-line matching, and so on.

Sample code

import re
string = '''hello hi    good morning
goodnight
python
javascript
Linux
'''
pattern = r'\s+'  # one or more whitespace characters (spaces, tabs, newlines)
print(re.split(pattern, string))  # no limit on the number of splits
# ['hello', 'hi', 'good', 'morning', 'goodnight', 'python', 'javascript', 'Linux', '']
print(re.split(pattern, string, maxsplit=5))  # split at most 5 times
# ['hello', 'hi', 'good', 'morning', 'goodnight', 'python\njavascript\nLinux\n']

Unlike str.split, re.split accepts a regular expression as the separator.
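The difference shows up as soon as a string uses more than one kind of separator. A short sketch with an illustrative input; the last line also shows that a capture group in the pattern keeps the separators in the result.

```python
import re

text = "a,b;c  d"

plain = text.split(",")                    # ['a', 'b;c  d']: one fixed separator
regex = re.split(r"[,;\s]+", text)         # ['a', 'b', 'c', 'd']: any run of , ; or whitespace

# With a capture group, the matched separators are returned as well
keep_seps = re.split(r"([,;])", "a,b;c")   # ['a', ',', 'b', ';', 'c']
```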

Replace

sub

re.sub replaces the matches of a pattern in a string.

Syntax

re.sub(pattern, repl, string, count=0, flags=0)
  • pattern: the regular expression pattern.
  • repl: the replacement string; it can also be a function.
  • string: the original string to search and replace in.
  • count: the maximum number of replacements; the default 0 replaces all matches.
  • flags: the matching flags used when compiling the pattern, given as an integer (for example re.I).
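The repl and count parameters are easier to understand with an example. The sketch below shows a backreference in the replacement string, a function used as repl, and count; the helper name double and the sample strings are hypothetical, chosen only for illustration.

```python
import re

# Backreferences in repl: \1 and \2 refer to the capture groups
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello python")   # 'python hello'

def double(match):
    """Hypothetical helper: receives a match object, returns the replacement text."""
    return str(int(match.group()) * 2)

# repl as a function: called once per match
doubled = re.sub(r"\d+", double, "3 apples and 10 pears")    # '6 apples and 20 pears'

# count limits how many matches are replaced
limited = re.sub(r"\d+", "N", "1 2 3", count=2)              # 'N N 3'
```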

With this we can build a small example in the style of a certain app's comment section: replacing unwanted words in a comment.

import re
string = input("Please enter a comment: ")
pattern = r"beautiful|lovely|generous"  # words to detect
print(re.sub(pattern, " ' ", string))


subn

Behaves the same as sub(), but returns a tuple (new string, number of replacements).
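A minimal sketch of subn, with an illustrative input string: it collapses runs of whitespace and reports how many runs were replaced.

```python
import re

new_text, n = re.subn(r"\s+", " ", "hello    hi \n python")
# new_text == 'hello hi python', n == 2
```

The count is handy when you need to know whether anything changed at all, since n == 0 means the string was returned unmodified.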

escape

re.escape(pattern) escapes the special characters in pattern, such as regular expression metacharacters.

Sample code

import re
pattern = r'\w\s*\d\d.'
# print pattern with its special characters escaped
print(re.escape(pattern))  # \\w\\s\*\\d\\d\.

This is useful when you need to match an arbitrary literal string that may contain regular expression metacharacters. It is easy to misuse, though, so escaping by hand is sometimes the better choice.
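The typical use is matching text that should be taken literally, for example a search term typed by a user. A minimal sketch; the strings below are illustrative assumptions.

```python
import re

literal = "1+1=2 (really?)"               # contains the metacharacters + ( ? )
text = "is 1+1=2 (really?) true"

escaped = re.escape(literal)              # every metacharacter gets a backslash

# Used as-is, the metacharacters are interpreted and the literal text is not found
unescaped_result = re.search(literal, text)        # None
# After escaping, the string is matched character for character
escaped_result = re.search(escaped, text).group()  # '1+1=2 (really?)'
```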

purge

re.purge() clears the regular expression cache.

copyright notice
author[A bowl week],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/01/202201301907447369.html
