Python notes - day14 - regular expressions

2022-05-15 06:03:51 Wandering mage 12

Preface
Notes from learning Python syntax, left here for anyone who needs them. If you know, you know!!

# coding=utf8
# @time:2022/4/22 20:14
# Author  Haoyu 


# The re module is Python's built-in module for working with regular expressions
from re import fullmatch,findall,search

# I. Regular expression matching symbols
# 1. What is a regular expression
# A regular expression is a tool that makes string processing much simpler
# A regular expression is a rule that describes strings using various regex symbols
# The regex syntax is largely the same across programming languages, but it is written differently: e.g. Python - 'regular expression'; JS - /regular expression/

# 2. Regex symbols
# 1) Ordinary characters - an ordinary character represents itself in a regular expression
# fullmatch(regular expression, string) - checks whether the whole string conforms to the rule described by the regex; if it does, the result is a match object, otherwise it is None
'''''''''
re_str = 'abc'    # Rule: the string has three characters, namely a, b and c
result = fullmatch(re_str,'abc')
print(result)   # <re.Match object; span=(0, 3), match='abc'>
'''''''''
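Because fullmatch returns either a match object or None, its result can be tested directly. A minimal sketch (the helper name is_exactly_abc is made up for illustration and is not part of the notes):

```python
from re import fullmatch

# Hypothetical helper: True only when the whole string is exactly 'abc'.
def is_exactly_abc(text):
    return fullmatch(r'abc', text) is not None

print(is_exactly_abc('abc'))    # True
print(is_exactly_abc('abcd'))   # False: fullmatch requires the whole string to match
```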

# 2) Special characters
# First: . - matches any single character (by default, anything except a newline)
'''''''''
'a.b'   -    matches a string of length three: the first character is a, the last is b, and the middle is any single character
re_str = 'a.b'
result = fullmatch(re_str,'a+b')
print(result)   # <re.Match object; span=(0, 3), match='a+b'>

# 'xy..' - matches a string of length 4: the first two characters are xy, and the last two are any characters
'''''''''

# Second: \d - matches any single digit
'''''''''
re_str = r'a\db'    # Matches a string of length 3: the first character is a, the last is b, and the middle is any digit
result = fullmatch(re_str,'a5b')
print(result)   # <re.Match object; span=(0, 3), match='a5b'>
'''''''''

# Third: \s - matches a single whitespace character
# Whitespace characters: space, '\n' (newline, like pressing Enter), '\t' (like pressing the Tab key)
'''''''''
re_str = r'abc\s123'
result = fullmatch(re_str,'abc 123')
print(result)   # <re.Match object; span=(0, 7), match='abc 123'>
'''''''''

# Fourth: \w (just be aware of it) - matches a letter, digit, or underscore (characters outside the ASCII table also match)
'''''''''
re_str = r'\d\w\d'
result = fullmatch(re_str,'2 see 8')
print(result)   # None, because \w matches exactly one character and a space is not a word character
'''''''''

# Fifth: \D - matches any single non-digit character
'''''''''
re_str = r'a\Db'
result = fullmatch(re_str,'aab')
print(result)   # <re.Match object; span=(0, 3), match='aab'>
'''''''''

# Sixth: \S - matches any single non-whitespace character
'''''''''
re_str = r'a\Sb'
result = fullmatch(re_str,'acb')
print(result)   # <re.Match object; span=(0, 3), match='acb'>
'''''''''
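The six symbols so far can be compared side by side. The sketch below is only an illustration; the sample strings are made up, and every pattern uses a raw string so that Python leaves the backslashes alone:

```python
from re import fullmatch

# Each pair is (pattern, text); all of these are expected to match.
samples = [
    (r'a.b', 'a+b'),            # . matches any single character except a newline
    (r'a\db', 'a5b'),           # \d matches one digit
    (r'abc\s123', 'abc\t123'),  # \s matches one whitespace character (here a tab)
    (r'\d\w\d', '2_8'),         # \w matches a letter, digit, or underscore
    (r'a\Db', 'a-b'),           # \D matches one non-digit
    (r'a\Sb', 'a_b'),           # \S matches one non-whitespace character
]
for pattern, text in samples:
    print(pattern, repr(text), fullmatch(pattern, text) is not None)

# By default . does not match a newline
print(fullmatch(r'a.b', 'a\nb'))   # None
```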

# Seventh: [character set] - matches any one character in the character set
'''''''''
Note: a. One [] matches exactly one character
      b. Inside [] you can put - between two characters to indicate a range, but the code point of the character before - must not be greater than that of the character after it
      c. Inside [] a - only has a special meaning (a range) between two characters; at the very start or end it means - itself

[a1+]   -    matches the character a, 1, or +
[\dxy]  -    matches any digit, x, or y
[1-9]   -    matches any digit from 1 to 9
[a-z]   -    matches any lowercase letter
[A-Z]   -    matches any uppercase letter
[\u4e00-\u9fa5]  -    matches any (commonly used) Chinese character

[a-z+=\\]    -    matches any lowercase letter, +, =, or \
'''''''''

'''''''''
re_str = 'a[xyz]b'   # Matches a string of length 3: the first character is a, the third is b, and the middle is any one of x, y, z
result = fullmatch(re_str,'axb')
print(result)   # <re.Match object; span=(0, 3), match='axb'>
'''''''''
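A few of the character-set rules are easier to see when run. This is a small sketch with made-up sample strings, not an exhaustive test:

```python
from re import fullmatch

# Three sets in a row: a digit 1-9, a lowercase letter, an uppercase letter
print(fullmatch(r'[1-9][a-z][A-Z]', '7qK') is not None)   # True

# A - at the start (or end) of the set is a literal hyphen
print(fullmatch(r'[-+]\d', '-5') is not None)              # True

# [\u4e00-\u9fa5] covers the commonly used Chinese characters
print(fullmatch(r'[\u4e00-\u9fa5]', '中') is not None)      # True

# To put a backslash into a set, write it twice in the regex
print(fullmatch(r'[a-z+=\\]', '\\') is not None)           # True
```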


# Eighth: [^character set] - matches any one character that is not in the character set
'''''''''
[^abc]  -    matches any character except a, b and c
[^\d]   -    matches any character that is not a digit
[^a-z]  -    matches any character that is not a lowercase letter
Note: ^ only negates the set when it is the first character inside []; in the middle or at the end it means ^ itself.

re_str = r'\d[^abc]\d'
result = fullmatch(re_str,'374')
print(result)   # <re.Match object; span=(0, 3), match='374'>
'''''''''
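The position of ^ inside the brackets decides whether it negates the set or is just an ordinary character. A short sketch with made-up inputs:

```python
from re import fullmatch

# ^ first: the set is negated
print(fullmatch(r'[^abc]', 'a'))                # None: a is excluded
print(fullmatch(r'[^abc]', 'x') is not None)    # True

# ^ not first: it is a literal ^, so the set contains a and ^
print(fullmatch(r'[a^]', '^') is not None)      # True
print(fullmatch(r'[a^]', 'b'))                  # None
```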




# II. Detection symbols
# A matching symbol consumes one character per symbol; a detection symbol does not affect the length of the match. It only checks, once the rest has matched, whether the position where the symbol sits meets the requirement.
# 1.
# \b - checks that the position is a word boundary
# \B - checks that the position is not a word boundary
# Word boundary: anything that separates two words in everyday text, for example whitespace, punctuation such as commas, and the start and end of the string
'''''''''
re_str = r'abc,\b123'  # The r prefix stops Python from treating \b as an escape sequence (backspace); it does not change the regex itself
result = fullmatch(re_str,'abc,123')
print(result)   # <re.Match object; span=(0, 7), match='abc,123'>
Note: first see whether the characters can match, then check the position of \b: is there a word boundary after the comma, or equivalently in front of the 1? There is, so the check succeeds.
'''''''''
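The r prefix matters most for \b, because in an ordinary Python string '\b' is the backspace character. The sketch below shows the difference, plus \b and \B on made-up sample text:

```python
from re import fullmatch, findall

# Without r, Python turns '\b' into one backspace character (code 8)
print(len('\b'), ord('\b'))    # 1 8
print(len(r'\b'))              # 2: a backslash and the letter b

print(fullmatch('abc,\b123', 'abc,123'))                # None: the pattern expects a backspace
print(fullmatch(r'abc,\b123', 'abc,123') is not None)   # True

# \b picks out whole words; \B requires that there is no boundary
print(findall(r'\bcat\b', 'cat concat cat.'))   # ['cat', 'cat']
print(findall(r'\Bcat', 'cat concat'))          # ['cat'] (the one inside concat)
```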

# 2. ^ - checks that the position is the start of the string
'''''''''
re_str = r'^\d\d\d'
result = findall(re_str,'123asd')
print(result)   # ['123']
'''''''''

# 3. $ - checks that the position is the end of the string
'''''''''
re_str = r'\d\d\d$'
result = findall(re_str,'sd123')
print(result)   # ['123']
'''''''''
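The effect of ^ and $ shows up when the digits are in the wrong place. A brief sketch with made-up strings:

```python
from re import findall, search

print(findall(r'^\d\d\d', 'asd123'))   # []: the digits are not at the start
print(findall(r'\d\d\d$', '123sd'))    # []: the digits are not at the end

# Using both anchors with search behaves like fullmatch for this pattern
print(search(r'^\d\d\d$', '123') is not None)   # True
print(search(r'^\d\d\d$', '1234'))              # None
```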


# III. Controlling the number of matches
# 1. + - one or more times (at least once)
'''''''''
a+  -    matches the character a one or more times (at least one a)
\d+ -    matches any digit one or more times
.+  -    matches any character one or more times

re_str = 'xa+y'
result = fullmatch(re_str,'xaay')
print(result)   # <re.Match object; span=(0, 4), match='xaay'>

re_str = r'x\d+y'
result = fullmatch(re_str,'x12y')
print(result)   # <re.Match object; span=(0, 4), match='x12y'>
'''''''''

# 2. * - matches 0 or more times (any number of times)
'''''''''
re_str = 'xa*y'
result = fullmatch(re_str,'xaaaaaay')
print(result)   # <re.Match object; span=(0, 8), match='xaaaaaay'>
'''''''''

# 3. ? - matches 0 times or 1 time
'''''''''
re_str = 'xa?y'
result = fullmatch(re_str,'xay')
result1 = fullmatch(re_str,'xy')
print(result)   # <re.Match object; span=(0, 3), match='xay'>
print(result1)  # <re.Match object; span=(0, 2), match='xy'>
'''''''''

# 4. {} - matches a specified number of times
'''''''''
1) {N}    -    matches exactly N times
2) {M,N}  -    matches M to N times (both ends included, i.e. [M, N])
3) {M,}   -    matches at least M times
4) {,N}   -    matches at most N times
'''''''''
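The four {} forms can be checked quickly with fullmatch. In this sketch the helper ok is a made-up convenience function, not something from the re module:

```python
from re import fullmatch

# Hypothetical helper: True if the whole text matches the pattern
def ok(pattern, text):
    return fullmatch(pattern, text) is not None

print(ok(r'\d{3}', '123'), ok(r'\d{3}', '12'))         # True False  (exactly 3)
print(ok(r'a{2,4}', 'aaa'), ok(r'a{2,4}', 'aaaaa'))    # True False  (2 to 4)
print(ok(r'a{2,}', 'aaaaaa'), ok(r'a{2,}', 'a'))       # True False  (at least 2)
print(ok(r'a{,2}', ''), ok(r'a{,2}', 'aaa'))           # True False  (at most 2)

# + * ? are shorthand for {1,} {0,} {0,1}
print(ok(r'xa{1,}y', 'xaay'), ok(r'xa{0,}y', 'xy'), ok(r'xa{0,1}y', 'xay'))   # True True True
```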

# (Important!) 5. Greedy and non-greedy
# When the number of matches is not fixed, matching is either greedy or non-greedy; the default is greedy.
# 1) Greedy: the default for quantifiers with a variable count (+ * ? {M,N} {M,} {,N})
# Among the repetition counts that let the match succeed, take the largest. (If 3, 4 and 6 repetitions would all succeed, the result uses 6.)

# re_str = r'a.+b'
# print(findall(re_str,'amsnbsdhdnb')) # ['amsnbsdhdnb']

# 2) Non-greedy: add ? after a quantifier with a variable count (+? *? ??)
# Among the repetition counts that let the match succeed, take the smallest. (If 3, 4 and 6 repetitions would all succeed, the result uses 3.)

# re_str = r'a.+?b'
# print(findall(re_str,'amsnbsdhdnb')) # ['amsnb']
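The same two patterns can be run side by side, and the idea carries over to {M,N}. A small sketch; the digit string is made up for illustration:

```python
from re import findall, search

text = 'amsnbsdhdnb'
print(search(r'a.+b', text).group())    # amsnbsdhdnb  (greedy: stops at the last b)
print(search(r'a.+?b', text).group())   # amsnb        (non-greedy: stops at the first b)

# The same rule applies to {M,N}
print(search(r'\d{2,4}', '123456').group())    # 1234
print(search(r'\d{2,4}?', '123456').group())   # 12

# When only one repetition count can succeed, both modes give the same result
print(findall(r'a.+b', 'axxb'), findall(r'a.+?b', 'axxb'))   # ['axxb'] ['axxb']
```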


# IV. Groups and branches
# 1. () - grouping
# Grouping wraps part of a regular expression in () so that it can be treated as a single unit
# In a regular expression, each () is one group
'''''''''
# 1) Operating on a group as a whole
print(fullmatch(r'(\d{2}[a-z]{3})+','22asd'))    # <re.Match object; span=(0, 5), match='22asd'>

# 2) Repeating a group
\M  -    repeats the content matched by the M-th group
print(fullmatch(r'(\d{2})ab\1','22ab22'))   # <re.Match object; span=(0, 6), match='22ab22'>

# 3) Screening 

'''''''''
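The three uses of groups can be sketched as below. Item 3) has no example in the notes, so the last part is a guess at what it refers to: findall returning only the group contents when the pattern contains a group. The sample strings are made up.

```python
from re import fullmatch, findall

# 1) Whole operation: the group is repeated as a unit
print(fullmatch(r'(\d{2}[a-z]{3})+', '22asd33qwe') is not None)   # True

# 2) Backreference: \1 must repeat exactly what group 1 matched
print(fullmatch(r'(\d{2})ab\1', '22ab22') is not None)   # True
print(fullmatch(r'(\d{2})ab\1', '22ab33'))               # None

# 3) Assumed meaning of "screening": with a group, findall keeps only the group
print(findall(r'[a-z]+\d+', 'abc123 de45'))     # ['abc123', 'de45']
print(findall(r'[a-z]+(\d+)', 'abc123 de45'))   # ['123', '45']
```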

# 2. | - branch
'''''''''
# Exercise: write a regular expression that matches a string made of abc followed by either three digits or three uppercase letters.
# 'abc827', 'abcKNM'
re_str = r'abc(\d{3}|[A-Z]{3})'
print(fullmatch(re_str,'abcKNM'))  # <re.Match object; span=(0, 6), match='abcKNM'>
'''''''''
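Where the parentheses go changes what | separates, because | has the lowest precedence in a regular expression. In this sketch the names loose and strict are made up for illustration:

```python
from re import fullmatch

# Without a group around the alternatives, | splits the whole pattern in two
loose = r'abc\d{3}|[A-Z]{3}'       # 'abc' + 3 digits, OR just 3 uppercase letters
strict = r'abc(\d{3}|[A-Z]{3})'    # 'abc' followed by 3 digits or 3 uppercase letters

print(fullmatch(loose, 'KNM') is not None)       # True: matches without any abc
print(fullmatch(strict, 'KNM'))                  # None
print(fullmatch(strict, 'abcKNM') is not None)   # True
print(fullmatch(strict, 'abc827') is not None)   # True
```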

# 3. Escape symbol
# Putting \ in front of a symbol that has a special meaning in regex removes that meaning, so the symbol represents itself
# re_str = r'\d{2}\.a' - two digits, followed by a literal dot, and finally an a

# Note: apart from the symbols that have a special meaning inside [] (^ and -), other symbols inside [] represent themselves
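The difference between an escaped and an unescaped dot is easy to check. A short sketch with made-up inputs:

```python
from re import fullmatch

print(fullmatch(r'\d{2}\.a', '12.a') is not None)   # True
print(fullmatch(r'\d{2}\.a', '12xa'))               # None: \. only matches a real dot
print(fullmatch(r'\d{2}.a', '12xa') is not None)    # True: an unescaped . matches any character

# Inside [] these symbols are already literal, so no backslash is needed
print(fullmatch(r'[.+*?]', '+') is not None)        # True
print(fullmatch(r'[.+*?]', 'x'))                    # None
```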




from re import fullmatch,match,search,findall,split,sub
# V. The re module
# 1. fullmatch(regular expression, string) - checks whether the whole string matches the regular expression; returns a match object if it does, otherwise None
# 2. match(regular expression, string) - checks whether the beginning of the string matches the regular expression; returns a match object if it does, otherwise None
'''''''''
re_str = r'\d{3}'
print(match(re_str,'123asd asd'))   # <re.Match object; span=(0, 3), match='123'>
'''''''''
# 3. search(regular expression, string) - finds the first substring that satisfies the regular expression; returns a match object if there is one, otherwise None
'''''''''
re_str = r'\d{3}'
print(search(re_str,'123asd321asd')) # <re.Match object; span=(0, 3), match='123'>
'''''''''
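The three functions differ only in where the match is allowed to sit, which is clearest when the digits are not at the start. A sketch with a made-up string:

```python
from re import fullmatch, match, search

text = 'asd123asd321'
print(fullmatch(r'\d{3}', text))   # None: the whole string is not three digits
print(match(r'\d{3}', text))       # None: the string does not start with three digits

result = search(r'\d{3}', text)    # scans forward and finds the first hit
print(result.group(), result.span())   # 123 (3, 6)
```

The group() and span() methods of the match object give the matched text and its position.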
# 4. findall(regular expression, string) - gets all substrings that satisfy the regular expression and returns them as a list
'''''''''
print(findall(r'\d{3}','123asd432123'))  # ['123', '432', '123']
'''''''''

# 5. split(regular expression, string) - splits the string, using every substring that satisfies the regular expression as a cutting point
'''''''''
print(split('[ab]','dsadbdsfe'))    # ['ds', 'd', 'dsfe']
'''''''''

# 6. sub(regular expression, string 1, string 2) - replaces every substring of string 2 that satisfies the regular expression with string 1
'''''''''
print(sub(r'\d','+','jiushi123'))    # jiushi+++
'''''''''
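Both split and sub also accept an optional limit (maxsplit and count, each defaulting to 0, which means no limit). The notes do not cover these parameters; the sketch below adds them for illustration, with made-up inputs:

```python
from re import split, sub

# Split on a comma or semicolon plus any whitespace after it
print(split(r'[,;]\s*', 'a, b;c,  d'))           # ['a', 'b', 'c', 'd']

# maxsplit limits how many cuts are made
print(split(r'[ab]', 'dsadbdsfe', maxsplit=1))   # ['ds', 'dbdsfe']

# count limits how many replacements are made
print(sub(r'\d', '+', 'jiushi123', count=2))     # jiushi++3

# Replacing a run of digits as a whole instead of one digit at a time
print(sub(r'\d+', '+', 'jiushi123'))             # jiushi+
```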

copyright notice
author[Wandering mage 12],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/131/202205110607424172.html
