Python, zsbd

2022-05-15 05:59:13 daoboker

Python

1.1 Brief introduction

These notes mainly follow the Python Basic course and the Python 3 course. Some features are rarely used or were not covered (database and network programming, for example), so a few chapters are skipped. The accompanying code can be downloaded from GitHub.

1.1.1 Characteristics

  • Dynamically typed language
    • Types are checked only at run time. You never declare a data type for a variable; a name simply refers to whatever object was last assigned to it.
  • Strongly typed language
    • Implicit conversions between unrelated types are not tolerated. An object keeps its type unless you convert it explicitly.
  • Scripting language
    • Code is executed statement by statement, with no separate build step. Note that the interpreter still compiles each file to bytecode before running it, so a syntax error anywhere in a file stops the whole file from running; only run-time errors (such as using an undefined name) leave the earlier statements unaffected.
  • Object-oriented language
    • Encapsulation: code is packaged into independent units that expose only an interface. This reduces coupling, improves code safety, and makes later changes easier.
      Inheritance: a class can be derived from another class. The new class is called the derived class and the original class the base class. The derived class inherits the methods and variables of the base class and can add its own to implement specific behavior, which improves code reuse.
      Polymorphism: different objects can respond to the same message, and the same message can produce different results depending on the object that receives it. Suppose class A is derived from class B. An instance of A is then an instance of both A and B. If B defines a method f, instances of both A and B can call f, and A can override f to behave differently from B.
  • Interpreted language
    • There is no need to compile the program into a binary; it runs directly from the source code.
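
The characteristics above can be illustrated with a minimal sketch. The class names Base and Derived are hypothetical and exist only for this example.

```python
# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# Strong typing: mixing str and int without an explicit conversion fails.
try:
    "1" + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Inheritance and polymorphism with two hypothetical classes.
class Base:
    def f(self):
        return "Base.f"

class Derived(Base):
    def f(self):  # overrides Base.f
        return "Derived.f"

# The same call, obj.f(), gives a different result for each object.
results = [obj.f() for obj in (Base(), Derived())]
print(first_type, second_type, mixed_ok, results)
```

An instance of Derived is also an instance of Base, which is why both objects can be handled through the same call.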

1.1.2 Advantages

  • Easy to learn, read, and maintain.
  • A broad standard library and a large number of powerful third-party libraries, which improve development efficiency.
  • It is a high-level language, so we do not need to deal with low-level details.
  • It is portable, so programs can easily be moved between platforms.
  • It is extensible: Python programs can use code written in C/C++ (which can improve running speed and lets you ship compiled code instead of source).
  • It is embeddable: Python can be embedded in C/C++ programs.
  • Code can be executed interactively from a shell.
  • It provides interfaces to all major commercial databases.

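1.1.3 Disadvantages

  • Slow running speed compared with compiled languages.
  • The code cannot be encrypted: programs are normally distributed as source.
  • In CPython, the global interpreter lock (GIL) prevents threads from running Python bytecode on several cores at once, so multithreading cannot take advantage of a multicore CPU for CPU-bound work.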

1.2 Environment

1.2.1 Interpreter

Adding the following line as the first line of a script specifies the interpreter used to run it.

#!/usr/bin/python3

The following form looks up the interpreter through the environment (PATH) instead of hard-coding its location.

#!/usr/bin/env python3

1.2.2 Encoding

You can add the following line at the top of a script to declare the encoding of the source file.

# -*- coding: utf-8 -*-

UTF-8 lets the source contain Chinese text (it is already the default source encoding in Python 3). In addition, when reading a file with pickle, you sometimes need iso-8859-1 (Latin-1, used for Western European languages).

1.3 Basics

1.3.1 Identifiers

In Python, an identifier may contain letters (case-sensitive), digits, and underscores, but must not start with a digit.

foo     #  legal 
FOO     #  legal 
foo123  #  legal 
f_oo    #  legal 
foo_    #  legal 
123foo  #  illegal 

Identifiers that begin with underscores have special meanings.

# Class definition
class A:
    # A single leading underscore marks a name as internal by convention;
    # callers are expected to use the interface provided by the class.
    # Module-level names of this form are skipped by `from xxx import *`.
    def _foo(self):
        pass

    # Two leading underscores mark a private member of the class.
    # The name is mangled to _A__foo.
    def __foo(self):
        pass

    # Leading and trailing double underscores are reserved for Python's
    # special methods, such as __init__(), the initializer of a class.
    def __foo__(self):
        pass

# Main program
a = A()

a._foo()     # Works; Python does not object, though an IDE or linter may warn

# a.__foo()  # Cannot be called from outside: raises AttributeError
a._A__foo()  # Reaches __foo from outside through its mangled name

In Python, a semicolon lets you put several statements on the same line.

print("hello"); print("world");

1.3.2 Reserved words

   
and (logical AND)                        exec (executes Python code stored in a string or file)   not (logical NOT)
assert (assertion)                       finally (exception handling)                             or (logical OR)
break (exits a loop)                     for (loop)                                               pass (empty statement)
class (class definition)                 from (imports from a module)                             print (prints output)
continue (skips to the next iteration)   global (declares a global variable)                      raise (raises an exception)
def (function definition)                if (condition)                                           return (returns from a function)
del (deletes a reference)                import (import)                                          try (exception handling)
elif (equivalent to else if)             in (membership operator)                                 while (loop)
else (condition)                         is (identity comparison)                                 with (context management, shorthand for try … finally)
except (exception handling)              lambda (lambda expression)                               yield (generator)
as (alias)                               None (null value)

Note that exec and print are keywords only in Python 2; in Python 3 both are built-in functions.

1.3.3 Format

Python does not use {} to delimit code blocks. Instead, it uses indentation to delimit classes, functions, and other logical blocks. The amount of indentation is up to you, but all statements in the same block must be indented by the same amount.

if True:
    print ("Answer")
    print ("True")
else:
    print ("Answer")
    # The next line is indented inconsistently, so Python raises an IndentationError
  print ("False")

In Python, a backslash ( \ ) splits one statement across several lines. If the statement is enclosed in [], {}, or (), the line continuation character is not needed.

sum = 1 + \
      2 + \
      3

days = ['Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday']

Python strings can be written with single quotes ( ' ), double quotes ( " ), or triple quotes ( ''' or """ ). The opening and closing quotes must be of the same type.

word = 'word'
sentence = " This is a sentence ."
paragraph = """ This is a paragraph .
                Contains multiple statements """

Single-line comments in Python start with #. Multi-line comments are usually written with three single quotes or three double quotes; strictly speaking, these are string literals that the program simply ignores.

#  Single-line comments 

'''
Multiline comment
Multiline comment
'''

"""
Multiline comment
Multiline comment
"""

1.3.4 Input and output

The input() function accepts any input, treats it as text, and always returns a string.

a = input("please input an integer: ")
print(a, type(a))

b = input("please input a float number: ")
print(b, type(b))

c = input("please input a string: ")
print(c, type(c))

1.4 Variable types

Python's standard data types are:

  • Numbers
  • String
  • List
  • Tuple
  • Dictionary

The following forms of assignment make the code more concise:

a = b = c = 1

a, b, c = 2020, 3.14, 'python'

1.4.1 Numbers

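Numbers include the following types:

  • int (integer)
  • float (floating-point number)
  • complex (complex number)

int       float         complex
10        0.0           3.14j
100       15.20         45.j
-786      -21.9         9.322e-36j
080       32.3e+18      .876j
-0490     -90.          -.6545+0J
-0x260    -32.54e100    3e+26J
0x69      70.2E-12      4.53e-7j

Note that 080 and -0490 are not valid literals in Python 3, which rejects decimal integers with leading zeros; write 80 and -490 instead.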

In Python, binary literals start with 0b (0b10 == 2), octal literals with 0o (0o10 == 8), and hexadecimal literals with 0x (0x10 == 16). A complex number (common in filtering algorithms) consists of a real part and an imaginary part and can be written as a + bj or complex(a, b); the real part a and the imaginary part b are both floating-point numbers. In everyday code we rarely need binary, octal, hexadecimal, or complex numbers. When dealing with large numbers, scientific notation keeps the code concise (1e3 == 1000; note that 1e3 is a float).

NumPy supports many more data types than Python's built-in ones. Most of them correspond to C data types, and some correspond to Python built-in types; see NumPy data types. In addition, np.inf (positive infinity) and np.nan (not a number, used for invalid values) are very practical.

1.4.2 String

Python has no separate character type; a single character is simply a string of length 1.

var1 = 'python'
var2 = 'p'

To access a substring, slice the string with square brackets.

print(var1[1:4])
print(var2[0])

When a string needs special characters, use the following escape sequences:

  • \ (at the end of a line), line continuation
  • \\, backslash
  • \', single quote
  • \", double quote
  • \n, newline
  • \t, horizontal tab
  • \a, bell: running print("\a") in a shell makes the terminal beep
  • \b, backspace: print("xy\bz") displays xz in a terminal
  • \r, carriage return
  • \v, vertical tab
  • \f, form feed
  • \000, null character

In Python, strings support the following operators:

  • +
  • *
  • []
  • [:]
  • in
  • not in
  • r
  • R
  • %
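
As a rough illustration, the sketch below applies each operator in the list to a sample string. The variable names are chosen only for this example.

```python
s = "python"

concatenated = s + "3"      # + joins two strings
repeated = "ab" * 3         # * repeats a string
third_char = s[2]           # [] indexes a single character
middle = s[1:4]             # [:] slices a substring
has_th = "th" in s          # in tests whether a substring is present
lacks_z = "z" not in s      # not in is the negation of in
raw = r"a\nb"               # the r (or R) prefix keeps backslashes literal
formatted = "%s has %d letters" % (s, len(s))  # % formats values into a string

print(concatenated, repeated, third_char, middle)
print(has_th, lacks_z, raw, formatted)
```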

Python supports formatted string output. Format expressions can become quite complex, but the most basic use is inserting a value into a string that contains the %s placeholder. This style of string formatting uses the same syntax as the sprintf function in C.

Python string formatting symbols:

  • %c, a single character (from a one-character string or an integer character code)
  • %s, string
  • %d, integer
  • %u, unsigned integer (obsolete; identical to %d)
  • %o, octal number
  • %x, hexadecimal number
  • %X, hexadecimal number (uppercase)
  • %f, floating-point number; the precision after the decimal point can be specified
  • %e, floating-point number in scientific notation
  • %E, same as %e, with an uppercase E
  • %g, shorthand that chooses between %f and %e
  • %G, shorthand that chooses between %f and %E
  • %p, address of a variable as a hexadecimal number (not available in Python 3)

Formatting operator flags and modifiers:

  • *, takes the width or the precision from the argument list
  • -, left-aligns the value
  • +, shows a plus sign ( + ) before positive numbers
  • (space), shows a space before positive numbers
  • #, shows the prefix '0o' before octal numbers and '0x' or '0X' before hexadecimal numbers (depending on whether 'x' or 'X' is used)
  • 0, pads the displayed number with '0' instead of the default space
  • %, '%%' outputs a single '%'
  • (var), maps to a dictionary key (dictionary argument)
  • m.n, m is the minimum total width and n is the number of digits after the decimal point (if applicable)
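
The flags and modifiers above can be tried out with a short sketch like the following. The values are arbitrary sample data.

```python
padded = "%5d" % 42            # minimum width 5, right-aligned
left = "%-5d|" % 42            # - left-aligns within the width
signed = "%+d" % 42            # + shows the sign of a positive number
zero = "%05d" % 42             # 0 pads with zeros instead of spaces
hexa = "%#x" % 255             # # adds the 0x prefix
precise = "%8.3f" % 3.14159    # m.n: width 8, three digits after the point
star = "%*d" % (6, 42)         # * reads the width (6) from the arguments
mapped = "%(name)s is %(age)d" % {"name": "Alice", "age": 30}  # (var) mapping
percent = "%d%%" % 50          # %% outputs a literal percent sign

for text in (padded, left, signed, zero, hexa, precise, star, mapped, percent):
    print(repr(text))
```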

For f-strings and more built-in string methods, see Python3 strings.

1.4.3 List

Lists are the most commonly used Python data type. A list is written as comma-separated values inside square brackets, and its items do not need to share the same type.

Access the values in the list :

list1 = ['Google', 'Now', 1997, 2020]
list2 = list(range(1, 7))

print("list1[0]: ", list1[0])
print("list2[1:5]: ", list2[1:5])

Update list :

list1 = ['Google', 'Now', 1997, 2020]

print(" The third element is : ", list1[2])
list1[2] = 2001
print(" The third updated element is : ", list1[2])

Delete list elements :

list1 = ['Google', 'Now', 1997, 2020]

print(" Original list : ", list1)
del list1[2]
# or list1.pop(2)
print(" Delete the third element : ", list1)

More details can be found in Python3 list .

1.4.4 Tuple

A Python tuple is similar to a list; the difference is that the elements of a tuple cannot be modified.

tup1 = ('Google', 'Now', 1997, 2020)
tup2 = (1, 2, 3, 4, 5)
tup3 = "a", "b", "c", "d"       #  You don't need brackets 
print(tup1, tup2, tup3)

When a tuple contains only one element, add a comma after the element; otherwise the parentheses are treated as ordinary grouping parentheses.

tup4 = (50)
print(tup4)     #  No commas , Type is integer 
tup4 = (50,)
print(tup4)     #  Add a comma , The type is tuple 

Access tuples :

tup1 = ('Google', 'Now', 1997, 2020)
tup2 = (1, 2, 3, 4, 5, 6, 7)

print("tup1[0]: ", tup1[0])
print("tup2[1:5]: ", tup2[1:5])

Modifying tuples: element values in a tuple cannot be modified, but tuples can be concatenated.

tup1 = ('Google', 'Now', 1997, 2020)
tup2 = (1, 2, 3, 4, 5, 6, 7)

tup3 = tup1 + tup2
print(tup3)

Deleting tuples: individual elements of a tuple cannot be deleted, but the del statement deletes the entire tuple.

tup1 = ('Google', 'Now', 1997, 2020)
tup2 = (1, 2, 3, 4, 5, 6, 7)

tup3 = tup1 + tup2
del tup3
print(tup3)     # Raises NameError: tup3 no longer exists

More details can be found in Python3 Tuples .

1.4.5 Dictionary

A dictionary is another mutable container and can store objects of any type. Each key-value pair (key => value) is separated by a colon, pairs are separated by commas, and the whole dictionary is enclosed in curly braces.

dict1 = {'Alice': '2341', 'Beth': '9102', 'Cecil': '3258'}

dict2 = {'abc': 456}
dict3 = {'abc': 123, 98.6: 37}

Access values in a dictionary:

dict1 = {'Alice': '2341', 'Beth': '9102', 'Cecil': '3258'}

print("dict1['Alice']: ", dict1['Alice'])
print("dict1['Cecil']: ", dict1['Cecil'])

Modify a dictionary:

dict1 = {'Alice': '2341', 'Beth': '9102', 'Cecil': '3258'}

dict1['Alice'] = '1432'    # Update an existing key
dict1['Eddie'] = '1996'    # Add a new key
print(dict1)

Delete dictionary elements:

dict1 = {'Alice': '2341', 'Beth': '9102', 'Cecil': '3258'}

del dict1['Alice']    # Remove one key; deleting a missing key raises KeyError
# dict1.clear()       # Empties the dictionary
print(dict1)

Dictionary features:

  • The same key cannot appear twice. If the same key is assigned twice at creation, the later value wins.
  • Keys must be immutable, so numbers, strings, and tuples can serve as keys, but lists cannot.
  • Lookups are fast: whether there are 10 elements or 100,000, the lookup time is about the same, whereas searching a list slows down as the list grows. This speed is not free: a dictionary uses considerably more memory. A list is the opposite: it uses little memory but searches slowly.
  • Values can be any Python object, built-in or user-defined.
  • Since Python 3.7, dictionaries preserve insertion order; in older versions the order of key-value pairs was not guaranteed.

In short, if we need frequent lookups, a dict is the more reasonable choice.

list1 = list(range(int(1e6)))
dict1 = dict(zip(list1, list1))

print(54502 in list1)
print(54502 in dict1)

More details can be found in Python3 Dictionaries .

In addition, in some cases a set can be an alternative to a dictionary.
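
To see the difference in lookup speed, a minimal timing sketch using the standard timeit module is given below. The container size and repetition count are arbitrary assumptions, and the measured times depend on the machine, so no specific numbers are claimed here.

```python
import timeit

n = 100_000
items = list(range(n))
as_dict = dict(zip(items, items))
as_set = set(items)
target = n - 1  # worst case for the list: the last element

# Time 50 membership tests against each container.
list_time = timeit.timeit(lambda: target in items, number=50)
dict_time = timeit.timeit(lambda: target in as_dict, number=50)
set_time = timeit.timeit(lambda: target in as_set, number=50)

print("list: %.6f s" % list_time)
print("dict: %.6f s" % dict_time)
print("set:  %.6f s" % set_time)
```

When only membership matters and no value needs to be stored per key, the set gives the same kind of hash-based lookup as the dictionary.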

1.5 Operators

Python supports the following types of operators:

  • Arithmetic operators
  • Comparison (relational) operators
  • Assignment operators
  • Logical operators
  • Bitwise operators
  • Membership operators
  • Identity operators

More details can be found in Python Operators.

Among them, ** (exponentiation) and // (floor division) are practical arithmetic operators that are easy to overlook.
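
A short sketch of these two operators, next to ordinary division for comparison:

```python
power = 2 ** 10             # exponentiation: 1024
true_div = 7 / 2            # true division always returns a float: 3.5
floor_div = 7 // 2          # floor division: 3
neg_floor = -7 // 2         # rounds toward negative infinity: -4
remainder = 7 % 2           # modulo: 1
quotient, rest = divmod(7, 2)  # quotient and remainder in one call

print(power, true_div, floor_div, neg_floor, remainder, quotient, rest)
```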

1.6 Conditional statements

A Python conditional statement decides which code block to execute based on whether one or more expressions evaluate to True or False.

if expression:
    operation
elif expression:
    operation
else:
    operation

Before version 3.10, Python had no switch statement, so multi-way branching had to be written with elif (Python 3.10 added the match statement). You can also put an if statement on a single line (in my opinion, this is not a good habit).

var = 100
if var == 100: print("The value of var is 100")

If you have used C++, you have probably seen the conditional operator (a ? b : c). Python has a similar syntax, b if a else c, which in some cases makes the code more concise.

def isOdd(num):
    return ("%d is an odd number" if num % 2 == 1 else "%d is not an odd number") % num

print(isOdd(241))
print(isOdd(8832))

>> 241 is an odd number
>> 8832 is not an odd number

1.7 Loop statement

A loop statement lets us execute a statement or group of statements repeatedly. Python provides the for loop and the while loop.

for loop:

for var in sequence:
    statements

while loop:

while condition:
    statements

Loop control statements change the normal order of execution. Python supports the following loop control statements:

  • break
  • continue
  • pass
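
The three statements can be sketched as follows. The loop bounds are arbitrary values chosen for the example.

```python
evens = []
for n in range(10):
    if n % 2 == 1:
        continue        # skip the rest of this iteration for odd numbers
    if n > 6:
        break           # leave the loop entirely
    evens.append(n)

count = 0
while count < 3:
    count += 1

for _ in range(3):
    pass                # placeholder: does nothing, keeps the block valid

print(evens, count)
```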

1.8 Date and time

For a full reference, see Python Date and time.

Here we briefly introduce a few common uses.

Calculate the execution time of a piece of code :

import time

tick = time.time()

for i in range(int(1e8)):
    pass

print("executed in %f seconds" % (time.time() - tick))

Convert a timestamp to a date (useful in feature engineering):

import time

print(time.localtime(20647191))

>> time.struct_time(tm_year=1970, tm_mon=8, tm_mday=28, tm_hour=7, tm_min=19, tm_sec=51, tm_wday=4, tm_yday=240, tm_isdst=0)

Format date :

import time

print(time.strftime("%Y-%m-%d %H:%M:%S %a", time.localtime()))

Python date and time format symbols:

  • %y, two-digit year (00-99)
  • %Y, four-digit year (0000-9999)
  • %m, month (01-12)
  • %d, day of the month (01-31)
  • %H, hour on the 24-hour clock (00-23)
  • %I, hour on the 12-hour clock (01-12)
  • %M, minute (00-59)
  • %S, second (00-59)
  • %a, locale's abbreviated weekday name
  • %A, locale's full weekday name
  • %b, locale's abbreviated month name
  • %B, locale's full month name
  • %c, locale's date and time representation
  • %j, day of the year (001-366)
  • %p, locale's equivalent of A.M. or P.M.
  • %U, week of the year (00-53), with Sunday as the first day of the week
  • %w, weekday (0-6), with Sunday as 0
  • %W, week of the year (00-53), with Monday as the first day of the week
  • %x, locale's date representation
  • %X, locale's time representation
  • %Z, name of the current time zone
  • %%, a literal % character

Convert a formatted string to a timestamp (useful in feature engineering):

import time

today = "2020-01-04 03:25:50 Sat"
print(time.mktime(time.strptime(today, "%Y-%m-%d %H:%M:%S %a")))

1.9 Functions

Functions improve the modularity of a program and make code reusable. Python provides many built-in functions, such as print(), but you can also create your own, which are called user-defined functions.

  • A function block starts with the def keyword, followed by the function name and parentheses ().
  • Parameters are declared between the parentheses, and arguments are passed between them when the function is called.
  • The first statement of a function can optionally be a documentation string (docstring) that describes the function.
  • The function body starts after a colon and is indented.
  • return [expression] ends the function and optionally returns a value to the caller. A return without an expression returns None.

1.9.1 Parameter passing

In Python, types belong to objects; variables have no type:

a = [1,2,3]

a = "Hello"

In the code above, [1,2,3] is a list and "Hello" is a string. The variable a has no type of its own; it is just a reference (a pointer) to an object, and it can point to a list object or to a string object.

In Python, strings, tuples, and numbers are immutable objects, while lists, dicts, and similar types are mutable objects.

  • Immutable types: after a = 5, the assignment a = 10 creates a new int object 10 and makes a refer to it, and the object 5 is discarded. The old object is not changed; a is simply rebound to a new one.
  • Mutable types: after la = [1,2,3,4], the assignment la[2] = 5 changes the third element of the list. la still refers to the same list; only part of its contents has been modified.

Parameter passing in Python functions:

  • Immutable types: this behaves like pass-by-value in C++ (integers, strings, tuples). In fun(a), the function receives the object a refers to but cannot change it; assigning to the parameter inside fun only rebinds the local name and does not affect a outside.
  • Mutable types: this behaves like pass-by-reference in C++ (lists, dictionaries). In fun(la), the function receives the list itself, so in-place modifications inside fun are visible through la outside.

In both cases Python passes a reference to the object; what differs is whether the object can be modified in place. Pay special attention to this difference during development.
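
The difference can be checked with a minimal sketch. The function names rebind, mutate, and rebind_list are hypothetical helpers written for this example.

```python
def rebind(number):
    number = number + 1    # creates a new int; the caller's variable is untouched
    return number

def mutate(items):
    items.append(99)       # modifies the caller's list in place

def rebind_list(items):
    items = [0]            # rebinding the local name does not affect the caller

a = 5
rebind_result = rebind(a)

la = [1, 2, 3]
mutate(la)

lb = [1, 2, 3]
rebind_list(lb)

print(a, rebind_result, la, lb)
```

The last function shows that even a mutable argument is unaffected when the parameter is reassigned rather than modified in place.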

1.9.2 Parameters

The following kinds of arguments can be used when calling a function:

  • Required arguments
  • Keyword arguments
  • Default arguments
  • Variable-length arguments

Required arguments must be passed to the function in the correct order, and their number must match the declaration.

Keyword arguments are tied to the function call: the call names each parameter it supplies. This allows the order of arguments in the call to differ from the declaration, because the Python interpreter matches values to parameter names.

If a call does not supply a value for a parameter that has a default, the default value is used.

You may need a function that can handle more arguments than it declares. These are called variable-length arguments; unlike the kinds above, they are not named individually in the declaration but are collected through *args (a tuple) and **kwargs (a dict).
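
The four kinds of arguments can be combined in one function. The function describe and its parameter names are hypothetical and serve only as an illustration.

```python
def describe(name, age, city="unknown", *args, **kwargs):
    """Return everything the function received, for inspection."""
    return name, age, city, args, kwargs

positional = describe("Alice", 30)                  # required arguments, in order
keyword = describe(age=30, name="Alice")            # keyword arguments, any order
with_city = describe("Bob", 25, "Paris")            # overrides the default
variadic = describe("Carol", 41, "Rome", "x", "y", role="admin")  # extras collected

print(positional)
print(keyword)
print(with_city)
print(variadic)
```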

1.9.3 Anonymous functions

Python uses lambda to create anonymous functions. Anonymous means the function is not defined in the standard way with a def statement.

  • lambda is just an expression, and its body is much simpler than that of a def.
  • The body of a lambda is a single expression, not a block of code, so only limited logic can be packed into it.
  • A lambda has its own local namespace for its parameters. Like any other function, it can still read names from enclosing scopes and from the global namespace.
  • Although a lambda fits on one line, it is not the same as an inline function in C or C++, whose purpose is to avoid the call overhead of small functions and so run faster.

lambda [arg1 [,arg2,.....argn]]:expression
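
A brief sketch of typical lambda usage follows. The sample data and names are made up for the example.

```python
add = lambda x, y: x + y           # a lambda bound to a name (a def is usually clearer)

pairs = [("b", 2), ("a", 3), ("c", 1)]
by_number = sorted(pairs, key=lambda pair: pair[1])   # lambda as a sort key

offset = 10
shift = lambda x: x + offset       # reads a name from outside its parameter list

print(add(2, 3), by_number, shift(5))
```

Passing a small function as the key argument of sorted is one of the places where a lambda is most convenient.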

1.9.4 Function Annotations

Python 3 provides a syntax for attaching metadata to the parameters and return value of a function declaration.

  • Each parameter in the declaration can be followed by : and an annotation expression.
  • If the parameter has a default value, the annotation goes between the parameter name and the = sign.
  • To annotate the return value, add -> and an expression between the closing ) and the : at the end of the declaration. The expression can be of any type. The most common annotations are classes (such as str or int) and strings (such as 'int > 0').

The only thing Python does with annotations is store them in the function's __annotations__ attribute. It does no checking, enforcement, or validation; annotations mean nothing to the interpreter. They are just metadata that IDEs, frameworks, decorators, and other tools can use. (An IDE may highlight a call whose argument type or return type does not match.)
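
As a small sketch of the syntax, the hypothetical functions clip and double below carry annotations, and the last call shows that the interpreter does not enforce them.

```python
def clip(text: str, max_len: 'int > 0' = 80) -> str:
    """Return at most max_len characters of text."""
    return text[:max_len]

def double(x: int) -> int:
    return x * 2

annotations = clip.__annotations__   # where Python stores the metadata
clipped = clip("annotations", 4)
unchecked = double("ab")             # a str is accepted despite the int annotation

print(annotations)
print(clipped, unchecked)
```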

1.10 Iterators and generators

1.10.1 Iterators

An iterator is an object that remembers its position in a traversal. It starts at the first element of a collection and continues until every element has been visited. Iterators can only move forward, never backward.

To use a class as an iterator, implement two methods in it: __iter__() and __next__().

This is rarely needed in practice. Iterators are generally written only when defining a complex data structure (to provide a way to traverse it) or when generating a sequence (for example, producing the next number of the Fibonacci sequence).
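
The Fibonacci case mentioned above could be sketched as a class like the following. The class name and its limit parameter are assumptions made for this example.

```python
class Fibonacci:
    """Iterator over the first `limit` Fibonacci numbers."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.current, self.following = 0, 1

    def __iter__(self):
        return self          # the object is its own iterator

    def __next__(self):
        if self.count >= self.limit:
            raise StopIteration   # signals that the traversal is finished
        value = self.current
        self.current, self.following = self.following, self.current + self.following
        self.count += 1
        return value

first_ten = list(Fibonacci(10))
print(first_ten)
```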

1.10.2 Generators

In Python, a function that uses yield is called a generator.

Unlike an ordinary function, a generator function returns an iterator and can only be used for iteration; put simply, a generator is an iterator. While a generator runs, each time it reaches yield it pauses, saves its current state, and returns the yielded value; the next call to next() resumes from that point. Calling a generator function returns an iterator object.

A list comprehension also produces a sequence of values, but it builds every element at once. With a list comprehension we can create a list directly; however, memory is limited, so the capacity of a list is limited too. Creating a list of one million elements not only takes a lot of storage space: if we only need the first few elements, most of the space occupied by the rest is wasted. If the elements can be computed by some algorithm, there is no need to create the complete list, which saves a lot of space.

To let callers interact with it, a generator object also provides the send, throw, and close methods.

send takes one argument, which becomes the value of the yield expression at which the generator is currently suspended. Overall, the only difference between send and next is that send first sets the value of that suspended yield expression, which lets the caller pass data into the generator. Note that before next has been called for the first time, no yield is suspended, so calling send with a value other than None raises an error (TypeError).

throw raises an exception inside the generator at the point where it was last suspended. If the generator handles the exception, execution continues until the next yield, whose value throw returns. If the generator finishes without reaching another yield, StopIteration is raised.

close raises a GeneratorExit exception at the point where the generator is suspended. The generator is expected to finish after that: cleanup code (such as a finally block) still runs, but if the generator yields another value, close raises RuntimeError.
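
The three methods can be exercised with a small sketch. The generator running_total and the list cleanup_log are hypothetical names introduced for this example.

```python
cleanup_log = []

def running_total():
    total = 0
    try:
        while True:
            try:
                received = yield total     # pauses here and hands back total
            except ValueError:
                total = 0                  # reset when ValueError is thrown in
                continue
            if received is not None:
                total += received
    finally:
        cleanup_log.append("closed")       # runs when close() raises GeneratorExit

gen = running_total()
first = next(gen)                              # start the generator
after_five = gen.send(5)                       # 5 becomes the value of the yield
after_two = gen.send(2)
after_reset = gen.throw(ValueError("reset"))   # handled inside, next yield returned
gen.close()

fresh = running_total()
try:
    fresh.send(1)                          # not started yet, so this fails
    early_send_failed = False
except TypeError:
    early_send_failed = True

print(first, after_five, after_two, after_reset, cleanup_log, early_send_failed)
```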

1.11 Modules

A module is a file with the suffix .py that contains the functions and variables you define. Modules can be imported by other programs to use the functions they contain. This is also how the Python standard library is used.

To use a Python source file, run an import statement in another source file. When the interpreter encounters an import statement, it imports the module if it can be found on the current search path. When developing with PyCharm, the project's root directory is added to the search path.

import sys
import os

Python's from statement imports specific names from a module into the current namespace.

from random import random

It is also possible to import all the contents of a module into the current namespace.

from random import *

In addition, we can give a module or an imported name an alias.

import numpy as np
from numpy import inf as INFINITY

Finally, when a module is imported for the first time, its top-level code runs. If we want a block in a module not to run on import, we can use the __name__ attribute so that the block runs only when the module itself is executed as the main program (similar to the main function in C++).

if __name__ == '__main__':
    print("hello world")

Suppose our project structure is as follows:

[Figure: project directory structure]

We make the following changes to b.py and c.py:

[Figure: contents of b.py and c.py]

We can try importing the different modules in a.py.

# When a module is first imported by another program, its top-level code runs
from lib import b

if __name__ == '__main__':
    pass

>> Module b.py - main function

# Use the __name__ attribute so that the main program of c.py runs only when the module itself is executed
import lib.c

if __name__ == '__main__':
    pass

>>

from lib.b import Printer

if __name__ == '__main__':
    p = Printer()
    p()

>> Module b.py - main function
>> Module b.py - Class Printer

from lib.b import print_one

if __name__ == '__main__':
    print_one()

>> Module b.py - main function
>> Module b.py - 1

from lib.c import *

if __name__ == '__main__':
    print_one()
    print_two()

>> Module c.py - 1
>> Module c.py - 2

from lib.b import Printer as BPrinter
from lib.c import Printer as CPrinter

if __name__ == '__main__':
    BPrinter()()
    CPrinter()()

>> Module b.py - main function
>> Module b.py - Class Printer
>> Module c.py - Class Printer

1.12 Namespace and scope

1.12.1 Namespace

A namespace is a mapping from names to objects; most namespaces are implemented as Python dictionaries. Namespaces provide a way to avoid name conflicts in a project. Each namespace is independent of the others, so a name cannot be duplicated within one namespace, but different namespaces can use the same name without affecting each other.

There are generally three namespaces:

  • Built-in names: names built into the Python language, such as the function names abs and chr and the exception names BaseException and Exception.
  • Global names: names defined in a module, including functions, classes, other imported modules, and module-level variables and constants.
  • Local names: names defined in a function, including its parameters and locally defined variables. (Names defined in a class body work the same way.)

Namespace lookup order: local -> global -> built-in.

The lifetime of a namespace depends on the scope of the object: when the object finishes executing (for example, when a function returns), its namespace ends. Therefore, names in an inner namespace cannot be accessed from an outer namespace.

1.12.2 Scope

A scope is the region of a Python program from which a namespace can be accessed directly. When a program refers to a variable directly, the scopes are searched from the inside out until the name is found; otherwise a NameError is raised. Not every variable is accessible everywhere: access depends on where the variable is assigned. The scope of a variable determines which part of the program can access that name.

Python has four kinds of scope:

  • L (Local): the innermost layer, containing local variables, such as the inside of a function or method.
  • E (Enclosing): contains variables that are neither local nor global. For example, with two nested functions, where function A contains function B, the names in A's scope are nonlocal from B's point of view.
  • G (Global): the outermost layer of the current script, such as the global variables of the current module.
  • B (Built-in): contains the built-in names.

Lookup order: L -> E -> G -> B

In Python, only modules, classes, and functions (def, lambda) introduce a new scope. Other code blocks (such as if/elif/else, try/except, and for/while) do not, so variables defined inside these statements can also be accessed outside them.

When an inner scope wants to assign to a variable of an outer scope, it needs the global or nonlocal keyword.
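
A minimal sketch of both keywords, and of a loop variable remaining visible after its loop, is shown below. All names are invented for the example, and the code is assumed to run at module level.

```python
counter = 0

def increment_global():
    global counter          # assign to the module-level name
    counter += 1

def make_counter():
    count = 0               # lives in the enclosing (E) scope of step
    def step():
        nonlocal count      # assign to the enclosing function's variable
        count += 1
        return count
    return step

increment_global()
step = make_counter()
step_values = [step(), step(), step()]

for i in range(3):
    last_seen = i           # for does not introduce a new scope

print(counter, step_values, i, last_seen)
```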

1.13 Input and output

1.13.1 str() And repr()

To convert a value to a string for output, use the repr() or str() function.

  • str(): returns a user-friendly representation.
  • repr(): returns a representation that the interpreter can read.
import datetime

datetime.datetime.now()
>>> datetime.datetime(2018, 9, 30, 18, 50, 35, 860171)

print(datetime.datetime.now())
>>> 2018-09-30 18:50:41.618557

print(repr(datetime.datetime.now()))
>>> datetime.datetime(2018, 9, 30, 18, 56, 11, 277455)

print(str(datetime.datetime.now()))
>>> 2018-09-30 18:50:41.618557

Printing first tries __str__, which is also what the built-in str() function uses (print runs the equivalent internally); it should usually return a user-friendly display. __repr__ is used in all other contexts: for the echo in interactive mode and for the repr() function, and also by print and str() when no __str__ is defined. It should usually return a string that looks like code and could be used to recreate the object, or a detailed display for developers.

  • __repr__ is used when an object is echoed directly in an interactive environment, whereas __str__ is only used through print() or str().
  • The result of __repr__ should be precise and is aimed at programmers who are debugging; the result of __str__ should be readable and is aimed at users.
# Override __repr__
class TestRepr():
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return 'TestRepr(%s)' % self.data

>>> tr = TestRepr('5')
>>> tr
TestRepr(5)
>>> print(tr)
TestRepr(5)

# Override __str__
class TestStr():
    def __init__(self, data):
        self.data = data
    def __str__(self):
        return '[Value: %s]' % self.data

>>> ts = TestStr('5')
>>> ts
<__main__.TestStr at 0x7fa91c314e50>
>>> print(ts)
[Value: 5]

# Override both __str__ and __repr__
class Test():
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return 'Test(%s)' % self.data
    def __str__(self):
        return '[Value: %s]' % self.data

t = Test('5')
t
>>> Test(5)
print(t)
>>> [Value: 5]

When both __str__ and __repr__ are overridden, print() prefers __str__ over __repr__. That is, direct output in an interactive environment calls __repr__ automatically, while print() calls __str__ automatically.

  • When we output an object with print(), __str__ is called first.
  • When we store objects in a list and output the list directly, Python calls __repr__ on each element.

The return value of repr() can generally be passed to eval() to restore the object, so the following equality usually holds.

obj = str()
print(obj == eval(repr(obj)))

1.13.2 Output format beautification

String objects provide the rjust() method, which right-aligns a string by padding it with spaces on the left. The similar methods ljust() and center() left-align and center a string. Another method, zfill(), pads a numeric string on the left with zeros.

for x in range(1, 11):
    print(repr(x).rjust(2), repr(x*x).rjust(3), end=' ')
    # Note the use of 'end' on the previous line
    print(repr(x*x*x).rjust(4))

for x in range(1, 11):
    print('{0:2d} {1:3d} {2:4d}'.format(x, x*x, x*x*x))
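The paragraph above mentions zfill(), ljust(), and center() without showing them, so here is a small sketch of each. The variable names are chosen only for illustration.

```python
number = "42"

padded_zero = number.zfill(5)       # pad with zeros on the left
signed_zero = "-3.14".zfill(7)      # the sign stays in front of the zeros
right = number.rjust(5)             # pad with spaces on the left
left = number.ljust(5, ".")         # pad with a custom fill character on the right
middle = number.center(6, "*")      # pad on both sides

print(padded_zero, signed_zero, repr(right), left, middle)
```

All four methods return a new string and leave the original unchanged; if the string is already at least as long as the requested width, it is returned as is.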

Basic use of str.format():

# Basic usage
print("{} {}".format('Hello', 'World'))
# Positional fields
print("{0} {1}".format('Hello', 'World'))
print("{1} {0}".format('Hello', 'World'))
# Keyword fields
print("{verb} {noun}".format(verb='Hello', noun='World'))
# Positional and keyword fields together
print("{0} {noun}".format('Hello', noun='World'))
# !a (apply ascii()), !s (apply str()), and !r (apply repr()) convert a value before it is formatted
import datetime
now = datetime.datetime.now()
print("{0!s}, {0!r}".format(now))
# An optional ':' and format specifier can follow the field name
# Specify the number of decimal places
pi = 3.141592653589793
print("{0:.3f}".format(pi))
# Specify the width
prices = {'banana': 1, 'apple': 2, 'mango': 3}
for (fruit, price) in prices.items():
    print('{0:10} ==> {1:10d}'.format(fruit, price))

If you have a long format string that you do not want to split up, it is better to refer to the values by name rather than by position:

prices = {'banana': 1, 'apple': 2, 'mango': 3}
print('banana: {0[banana]:d}; apple: {0[apple]:d}; mango: {0[mango]:d}'.format(prices))
print('banana: {banana:d}; apple: {apple:d}; mango: {mango:d}'.format(**prices))

1.13.3 Keyboard entry

Python provides the built-in function input(), which reads a line of text from standard input; by default, standard input is the keyboard. In Python 3, input() always returns the text as a string and does not evaluate it as an expression, so numeric input has to be converted explicitly, for example with int().

content = input("please input a sentence: ")
print(content)

1.14 File read

Python's open() function opens a file and returns a file object, which is then used to process the file. If the file cannot be opened, OSError is raised. When using open(), make sure the file object is closed afterwards by calling its close() method.

open() usually takes two arguments: the file name (file) and the mode (mode).

f = open(file, mode='r')
f = open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
  • r: open for reading only. The file pointer is placed at the beginning of the file. This is the default mode.
  • rb: open in binary format for reading only. The file pointer is placed at the beginning of the file.
  • r+: open for reading and writing. The file pointer is placed at the beginning of the file.
  • rb+: open in binary format for reading and writing. The file pointer is placed at the beginning of the file.
  • w: open for writing only. If the file already exists, its content is deleted and writing starts from the beginning; otherwise a new file is created.
  • wb: open in binary format for writing only. If the file already exists, its content is deleted and writing starts from the beginning; otherwise a new file is created.
  • w+: open for reading and writing. If the file already exists, its content is deleted and writing starts from the beginning; otherwise a new file is created.
  • wb+: open in binary format for reading and writing. If the file already exists, its content is deleted and writing starts from the beginning; otherwise a new file is created.
  • a: open for appending. If the file already exists, the file pointer is placed at the end, so new content is written after the existing content; otherwise a new file is created for writing.
  • ab: open in binary format for appending. If the file already exists, the file pointer is placed at the end, so new content is written after the existing content; otherwise a new file is created for writing.
  • a+: open for reading and appending. If the file already exists, the file pointer is placed at the end; otherwise a new file is created for reading and writing.
  • ab+: open in binary format for reading and appending. If the file already exists, the file pointer is placed at the end; otherwise a new file is created for reading and writing.

When processing a file object, the with keyword is a good choice. When the block ends, it closes the file correctly for you, even if an error occurs, and it is shorter than the equivalent try/finally block.

with open(file, 'r') as f:
    data = f.read()
print(f.closed)
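To make the comparison with try/finally concrete, the following sketch reads the same file both ways. It writes a temporary file first so that it does not depend on any existing file; the file name demo.txt is arbitrary.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on existing files.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")

# Manual version: close() has to be called explicitly, even on errors.
f = open(path, "r", encoding="utf-8")
try:
    manual_data = f.read()
finally:
    f.close()

# Equivalent version using with: the file is closed when the block exits.
with open(path, "r", encoding="utf-8") as g:
    with_data = g.read()

print(manual_data == with_data, f.closed, g.closed)
```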

1.14.1 pickle modular

Python's pickle module implements basic data serialization and deserialization.

  • Through serialization, the pickle module lets us save the state of objects in a running program to a file for permanent storage.
  • Through deserialization, the pickle module lets us recreate from that file the objects the program saved last time.
# Save the object obj to the file object file
pickle.dump(obj, file, protocol=None)
# Read an object back from the file object file
data = pickle.load(file)

where:

  • protocol is the protocol version used for serialization. Protocols 0 and 1 are compatible with older versions of Python. In Python 2 the default was 0; in Python 3 the default is a binary protocol (pickle.DEFAULT_PROTOCOL). A value of -1 selects the highest protocol available. In general, the protocol does not need to be specified.
    • 0: ASCII protocol; the serialized object is represented in printable ASCII;
    • 1: the old binary protocol;
    • 2: the new binary protocol introduced in Python 2.3, more efficient than the earlier ones.
  • file is the file-like object the object is saved to. It must provide a write() interface, so it can be an opened file or any other object that implements write(). In Python 3 the file must be opened in binary mode (for example 'wb'), and an in-memory target should be an io.BytesIO object.

A common use of the pickle module is saving a trained model or a set of parameters.
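A minimal round trip might look like the sketch below. The params dictionary stands in for a trained model or parameter set, and the file is written to a temporary directory; both are assumptions made for the example.

```python
import os
import pickle
import tempfile

params = {"learning_rate": 3e-4, "layers": [64, 32], "name": "demo"}
path = os.path.join(tempfile.mkdtemp(), "params.pkl")

# Serialize: the file must be opened in binary write mode.
with open(path, "wb") as f:
    pickle.dump(params, f, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize: only load pickle files from sources you trust.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == params, restored is params)
```

The restored object is equal to the original but is a new object, which is why the second comparison is False.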

1.15 system operation

The os module provides a rich set of functions for handling files and directories. For details, see Python3 OS file/directory methods and the Python3 os.path module.

Next, we introduce some commonly used os functions in the context of specific scenarios.

1.15.1 Create folder

As mentioned in the previous section, we can use pickle to save a trained scikit-learn model. Suppose we use the following code to save the model:

#  Save the model 
with open('save/sklearn.pickle', 'wb') as f:
    pickle.dump(tree, f)

If the current directory has no save folder, Python reports the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'save/sklearn.pickle'

In this case, we need to create a folder named save first to avoid the error:

#  Create folder 
os.mkdir('save')
#  Save the model 
with open('save/sklearn.pickle', 'wb') as f:
    pickle.dump(tree, f)

1.15.2 Path detection

Continuing the previous example, suppose we run the following code:

#  Create folder 
os.mkdir('save')
#  Save the model 
with open('save/sklearn.pickle', 'wb') as f:
    pickle.dump(tree, f)

If the save folder already exists in the current directory, Python reports the following error:

FileExistsError: [Errno 17] File exists: 'save'

To avoid this, we modify the previous code further:

# Check whether the save folder exists
if not os.path.isdir('save'):
    #  Create folder 
    os.mkdir('save')
#  Save the model 
with open('save/sklearn.pickle', 'wb') as f:
    pickle.dump(tree, f)

With the modified code, the save folder is created automatically when it does not exist, which avoids the error.
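The check-then-create pattern can also be collapsed into a single call: os.makedirs() accepts exist_ok=True and then does nothing if the folder already exists. The sketch below uses a temporary directory in place of the current directory and a plain dictionary in place of the trained model; both are stand-ins for this example.

```python
import os
import pickle
import tempfile

base = tempfile.mkdtemp()               # stand-in for the current directory
save_dir = os.path.join(base, "save")

# exist_ok=True makes the call a no-op when the folder is already there.
os.makedirs(save_dir, exist_ok=True)
os.makedirs(save_dir, exist_ok=True)    # second call does not raise

model = {"weights": [0.1, 0.2]}         # placeholder for a trained model
model_path = os.path.join(save_dir, "sklearn.pickle")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

print(os.path.isdir(save_dir), os.path.exists(model_path))
```

Unlike os.mkdir(), os.makedirs() also creates any missing parent folders along the way.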

Besides checking whether a folder exists, we can also check whether a file exists, to avoid overwriting a model that has already been trained:

#  Check whether the model exists 
if not os.path.exists('save/sklearn.pickle'):
    with open('save/sklearn.pickle', 'wb') as f:
        pickle.dump(tree, f)

Building on this, a more rigorous version asks the user before overwriting:

# Check whether the file exists
if os.path.exists(self.filenames):
    # Ask the user what to do
    flag = input('file %s already exists, do you want to remove it? [y/n]\n' % self.filenames)
    if flag == 'y':
        # Remove the existing file
        os.remove(self.filenames)
    elif flag == 'n':
        # Raise a "file exists" error
        raise FileExistsError
    else:
        # Raise an error for invalid input
        raise ValueError

with open(self.filenames, 'wb') as f:
    pickle.dump(tree, f)

1.15.3 Path synthesis

Suppose we want to train and save multiple models; each model then needs its own name. We can build the names by joining path components:

path = 'save/'

with open(os.path.join(path, time.strftime('%Y-%m-%d_%H-%M-%S.pkl', time.localtime(time.time()))), 'wb') as f:
    pickle.dump(tree, f)

with open(os.path.join(path, time.strftime('%Y-%m-%d_%H-%M-%S.pkl', time.localtime(time.time()))), 'wb') as f:
    pickle.dump(tree, f)

Besides the form above, paths can also be joined in the following ways:

print(os.path.join('.', time.strftime('%Y-%m-%d_%H-%M-%S.pkl', time.localtime(time.time()))))

>>> ./2020-01-14_13-47-14.pkl

print(os.path.join('.', 'save', time.strftime('%Y-%m-%d_%H-%M-%S.pkl', time.localtime(time.time()))))

>>> ./save/2020-01-14_13-47-40.pkl

# os.getcwd() returns the current working directory
print(os.path.join(os.getcwd(), time.strftime('%Y-%m-%d_%H-%M-%S.pkl', time.localtime(time.time()))))

>>> /Users/Eddie/Desktop/Note/2020-01-14_13-48-32.pkl

1.15.4 Document Retrieval

If a dataset contains many files of several kinds, we can first collect all the file names and then process them by kind:

# obtain available data automatically
files = os.listdir(self.path)

The files variable above is a list containing all the file names. We can then use regular expressions (introduced later) to classify and filter these files:

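# Get all file names
files = os.listdir(self.path)
# Define the regular expression rules (note the comma after every pattern;
# without it, adjacent string literals are silently concatenated)
regulation = [r'\d+_dataset_info.txt', r'\d+_fit_crop_info.txt',
                r'\d+_image.png', r'\d+_render_light.png', r'\d+_joints.npy', r'\d+_body.pkl']
# Classify by rule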
category = []
for r in regulation:
    p = re.compile(r)
    target = []
    for name in files:
        if re.match(p, name):
            target.append(name)
    category.append(target)

# judge the validity of data
category[0].sort(key=lambda x: x[0:5])
for i in range(1, len(category)):
    if len(category[i]) != len(category[i-1]):
        raise ValueError('Incomplete sample or redundant sample exists')
    category[i].sort(key=lambda x: x[0:5])

# create tfrecord
for index in range(len(category[0])):
    for i in range(1, len(category)):
        if category[i][index][0:5] != category[i-1][index][0:5]:
            raise ValueError('Mismatched sample exists at index %d' % index)
    info_path = self.path + category[0][index]
    crop_info_path = self.path + category[1][index]
    image_path = self.path + category[2][index]
    silhouette_path = self.path + category[3][index]
    keypoint_path = self.path + category[4][index]
    mesh_path = self.path + category[5][index]
    # serialize
    self.__serialize(info_path, crop_info_path, image_path, silhouette_path, keypoint_path, mesh_path)
self.writer.close()
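The snippet above depends on a surrounding class, so here is a self-contained sketch of the same idea: classify file names by pattern, then check that every category contains the same sample ids. The file names and the two patterns are hypothetical stand-ins for the result of os.listdir().

```python
import re

# Hypothetical file names standing in for the result of os.listdir().
files = ["00002_image.png", "00001_image.png", "00001_joints.npy",
         "00002_joints.npy", "notes.md"]

patterns = [r"\d+_image\.png", r"\d+_joints\.npy"]

category = []
for pattern in patterns:
    compiled = re.compile(pattern)
    # fullmatch() requires the whole name to match the pattern.
    matched = sorted(name for name in files if compiled.fullmatch(name))
    category.append(matched)

# Every category should hold the same sample ids in the same order.
ids = [[name[0:5] for name in group] for group in category]
consistent = all(group == ids[0] for group in ids)

print(category, consistent)
```

This sketch escapes the dot in each pattern and uses fullmatch() rather than match(), so a name such as 00001_image.png.bak would not be accepted; that is a stricter choice than the original code makes.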

1.15.5 File permissions

When reading or writing some files, we may find that we lack permission. In that case, we can use the following function to change the file's permissions:

os.chmod(path, mode)

For details, see the Python3 os.chmod() method reference.

1.16 Errors and exceptions

1.16.1 Syntax errors

Syntax errors are also called parsing errors in Python. When we run code that contains a syntax error, the parser raises SyntaxError and points out where the error is. It is advisable to develop in an IDE such as PyCharm or Spyder, since these IDEs provide IntelliSense-style code completion and error highlighting.

File "/Users/Eddie/Desktop/Note/tmp.py", line 4
    a = b -= c
           ^
SyntaxError: invalid syntax

1.16.2 Exception capture

Even if the syntax of a Python program is correct, errors can still occur while it runs. Errors detected at run time are called exceptions.

The simplest way to detect a problem is the assert statement. assert evaluates an expression and raises an exception (AssertionError) when the expression is false.

The syntax of assert is:

assert expression [, arguments]

This is equivalent to the following (raise is described in detail later):

if not expression:
    # Use raise to throw the error
    raise AssertionError(arguments)
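As a small illustration, the sketch below guards a function with assert and then catches the resulting AssertionError. The function mean is invented for this example.

```python
def mean(values):
    # The message after the comma becomes the argument of AssertionError.
    assert len(values) > 0, "values must not be empty"
    return sum(values) / len(values)


average = mean([1, 2, 3])

try:
    mean([])
except AssertionError as err:
    message = str(err)

print(average, message)
```

Assertions are removed when Python runs with the -O option, so they suit internal sanity checks better than validation of user input.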

If we want the program to handle an exception when one occurs, we need an exception-handling statement. Exceptions can be caught with the try/except statement. Here is an example:

# Simple try/except
while True:
    try:
        dividend = int(input("please input the dividend: "))
        divisor = int(input("please input the divisor: "))
        break
    except ValueError as err:
        print("please input again: %s. " % err)

The program first tries to execute the body of the try clause. If no exception occurs, the except clause is skipped and execution continues after the try statement. If an exception occurs while the try clause is executing, the rest of the clause is skipped. If the exception's type matches the name after except, the corresponding except clause is executed; if the exception matches no except clause, it is passed on to the enclosing try statement.

A try statement may contain more than one except clause in order to handle different specific exceptions. At most one of them is executed. A handler only deals with exceptions raised in the corresponding try clause, not with exceptions raised in other handlers of the same try statement. Exception handling covers not only exceptions raised directly in the try clause but also exceptions raised in functions called from the clause, even indirectly.

One except clause can handle several exceptions at once; they are listed in parentheses as a tuple:

while True:
    try:
        dividend = int(input("please input the dividend: "))
        divisor = int(input("please input the divisor: "))
        print("{} / {} = {}".format(dividend, divisor, dividend / divisor))
        break
    except (ValueError, ZeroDivisionError) as err:
        print("please input again: %s. " % err)

The last except clause may omit the exception name, in which case it acts as a wildcard. This can be used to print an error message and then raise the exception again.
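A wildcard handler that reports the problem and re-raises it could be sketched as follows. The function parse_ratio is hypothetical; it simply gives the handlers something to catch.

```python
def parse_ratio(text):
    try:
        numerator, denominator = text.split("/")
        return int(numerator) / int(denominator)
    except ZeroDivisionError:
        # A specific handler is checked first.
        return float("inf")
    except:
        # Wildcard handler: report the problem, then re-raise the same exception.
        print("could not parse %r" % text)
        raise


ok = parse_ratio("1/4")
infinite = parse_ratio("1/0")

try:
    parse_ratio("abc")
except ValueError as err:
    reraised = type(err).__name__

print(ok, infinite, reraised)
```

A bare raise inside a handler re-raises the exception currently being handled, so the caller still sees the original ValueError. Because a bare except also catches KeyboardInterrupt and SystemExit, it is usually only appropriate when the exception is re-raised like this.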

try/except also has an optional else clause. If it is used, it must come after all except clauses. The else clause is executed when the try clause raises no exception.

while True:
    try:
        dividend = int(input("please input the dividend: "))
        divisor = int(input("please input the divisor: "))
        result = dividend / divisor
    except (ValueError, ZeroDivisionError) as err:
        print("please input again: %s. " % err)
    else:
        print("{} / {} = {}".format(dividend, divisor, result))
        break

In a try/finally statement, the finally clause (which can be thought of as clean-up code) is executed whether or not an exception occurs.
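The following sketch records the order in which the clauses run; the events list and the divide function exist only for this illustration.

```python
events = []


def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        events.append("error")
        return None
    finally:
        # Runs on success, on a handled exception, and even before a return.
        events.append("cleanup")


first = divide(6, 3)
second = divide(1, 0)

print(first, second, events)
```

The finally clause runs once per call: after the successful return in the first call, and after the except clause in the second.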

1.16.3 Throw an exception

Python uses the raise statement to throw a specified exception. In Python 3 the syntax is:

raise [expression [from cause]]

Here is a simple example:

while True:
    try:
        negative = int(input("please input a negative integer: "))
        #  When the input is not negative , Throw an exception 
        if negative >= 0:
            raise ValueError("non-negative inputs are not expected")
    except ValueError as err:
        print("please input again: %s" % err)
    else:
        print("your input is {}".format(negative))
        break

When an exception is caught and only its message is printed, as above, we get the error text but not where the error happened or how the program got there. With the traceback module we can obtain the full details and locate the failing line. We can also define our own exceptions by creating a new exception class (usually this is not necessary). For how to define custom exceptions, see Python3 Errors and exceptions.
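Both ideas can be combined in a short sketch: a custom exception class and traceback.format_exc() to capture where the error occurred. NegativeExpectedError and check are hypothetical names used only here.

```python
import traceback


class NegativeExpectedError(ValueError):
    """Hypothetical custom exception used only in this sketch."""


def check(value):
    if value >= 0:
        raise NegativeExpectedError("non-negative input: %d" % value)
    return value


try:
    check(5)
except NegativeExpectedError:
    # format_exc() returns the full traceback as a string,
    # including the file name, line number, and function name.
    report = traceback.format_exc()

print(report)
```

Deriving the custom class from ValueError means that existing `except ValueError` handlers still catch it.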

1.17 Regular expressions

A regular expression is a special sequence of characters that makes it easy to check whether a string matches a certain pattern. Python's re module provides the full set of regular expression features.

First, a note on the r prefix, which prevents character escaping. If '\t' appears in a path, for example, it is interpreted as a tab character without the r prefix, whereas with the prefix it is kept exactly as written. Adding r in front of a string literal therefore stops the string from being escaped. Strings that start with r (raw strings) are commonly used for regular expressions with the re module.

print(r"\nHello World")
print("\nHello World")

Next, let's look at two functions: re.match() and re.search().

re.match() tries to match a pattern at the beginning of the string and returns None if the match does not succeed there; re.search() scans the entire string and returns the first successful match.
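The difference shows up as soon as the pattern occurs somewhere other than the start of the string, as in this sketch (the sample text is made up):

```python
import re

text = "order 66 shipped"

at_start = re.match(r"\d+", text)      # None: the string does not start with digits
anywhere = re.search(r"\d+", text)     # finds the digits later in the string
prefix = re.match(r"order", text)      # matches at position 0

print(at_start, anywhere.group(), anywhere.span(), prefix.group())
```

Both functions return a match object on success, from which group() gives the matched text and span() gives its start and end positions.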

1.17.1 Regular expression patterns

Pattern strings use a special syntax to express a regular expression. Letters and digits stand for themselves: a letter or digit in a pattern matches the same character in the string. Most letters and digits take on a different meaning when preceded by a backslash. Most punctuation characters have a special meaning and match themselves only when escaped. The backslash itself must also be escaped. Because regular expressions usually contain backslashes, it is best to write them as raw strings. Pattern elements (such as r'\t', which is equivalent to '\\t') match the corresponding special characters.

1.17.2 Regular expression modifiers

Regular expressions can take optional flag modifiers that control how matching is performed. The modifiers are passed as an optional flags argument.

For more patterns and modifiers, see Python3 Regular expressions.
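As a brief illustration of modifiers, the sketch below uses three flags from the re module: re.IGNORECASE, re.MULTILINE, and re.DOTALL. The sample text is invented; several flags are combined with the | operator.

```python
import re

text = "Error: disk full\nerror: retrying\nok"

# re.IGNORECASE ignores case; re.MULTILINE lets ^ match at every line start.
default = re.findall(r"^error", text)
flagged = re.findall(r"^error", text, re.IGNORECASE | re.MULTILINE)

# re.DOTALL lets '.' match newlines as well.
dotall = re.search(r"full.*retrying", text, re.DOTALL) is not None

print(default, flagged, dotall)
```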

1.18 Object-oriented programming

For an introduction to object-oriented techniques, see Python3 object-oriented.

1.18.1 Inheritance

Here is the basic form of inheritance:

class DerivedClassName(BaseClassName1):
    <statement-1>
    .
    .
    .
    <statement-N>

If DerivedClassName and BaseClassName1 in the example above are not defined in the same scope (for example, the base class lives in another module), we need the following change:

class DerivedClassName(modname.BaseClassName1):
    <statement-1>
    .
    .
    .
    <statement-N>

A derived class inherits the fields and methods of its base class. Inheritance also allows an object of a derived class to be treated as an object of the base class. Here is a classic example:

# A Cat object is derived from the Animal class
class Animal:
    def __init__(self, name):
        self.name = name

    def yell(self):
        print("%s: %s" % (self.name, "..."))

class Cat(Animal):
    def __init__(self, name):
        Animal.__init__(self, name)

a, b = Animal("a"), Cat("b")
a.yell(), b.yell()
print(a.__class__, b.__class__)

>>> a: ...
>>> b: ...
>>> <class '__main__.Animal'> <class '__main__.Cat'>

In this simple example, Cat inherits the yell() method of Animal and calls the Animal constructor from its own constructor. There are also other ways to call a parent class's methods (see the code file for details).
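One such alternative is the built-in super(), sketched below under two small changes made for the example: yell() returns its text instead of printing it, and Cat takes an extra color argument.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def yell(self):
        return "%s: %s" % (self.name, "...")


class Cat(Animal):
    def __init__(self, name, color):
        # super() finds the parent class without naming it explicitly.
        super().__init__(name)
        self.color = color


cat = Cat("b", "black")
print(cat.yell(), cat.color, isinstance(cat, Animal))
```

Because the parent class is not named in the call, the constructor keeps working if Cat is later given a different base class.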

If we do not implement yell() in Cat, Cat uses the yell() method of Animal by default. We can implement a yell() in Cat that differs from the one in Animal; this is called method overriding. Method overriding is how we implement polymorphism.

class Animal:
    def __init__(self, name):
        self.name = name

    def yell(self):
        print("%s: %s" % (self.name, "..."))

class Cat(Animal):
    def __init__(self, name):
        Animal.__init__(self, name)

    def yell(self):
        print("%s: %s" % (self.name, "Meow"))

a, b = Animal("a"), Cat("b")
a.yell(), b.yell()
print(a.__class__, b.__class__)

>>> a: ...
>>> b: Meow
>>> <class '__main__.Animal'> <class '__main__.Cat'>

Once the new yell() is implemented, calling yell() on a Cat produces output different from the original.

Building on this, we can also use multiple inheritance, in which a derived class inherits from several classes at once.

class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>

Pay attention to the order of the parent classes in the parentheses. If several parent classes define a method with the same name and the subclass does not say which one it wants, Python searches from left to right: when the method is not found in the subclass, the parent classes are checked from left to right for the method.

On the topic of multiple inheritance, see also Multiple inheritance problem (super and mro), which explains the advantages of super.
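The left-to-right search can be observed directly through the __mro__ attribute, as in this sketch. The class names Base1, Base2, and Derived follow the template above; the methods are invented.

```python
class Base1:
    def hello(self):
        return "Base1"


class Base2:
    def hello(self):
        return "Base2"

    def only_in_base2(self):
        return "Base2 only"


class Derived(Base1, Base2):
    pass


d = Derived()
order = [cls.__name__ for cls in Derived.__mro__]

print(d.hello(), d.only_in_base2(), order)
```

hello() is found in Base1 first because Base1 is listed first; only_in_base2() is found by continuing along the same order.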

1.19 Easter eggs

1.19.1 Config module

Moving configuration items out of the code and into a configuration file means that changing the configuration no longer requires changing the code. It also avoids a pile of baffling magic numbers and makes later maintenance much easier. Python provides a standard module for reading and writing configuration, ConfigParser (renamed to configparser in Python 3), which reads configuration files in INI format.

Suppose we want to build a model that needs several different sets of parameters. A config file makes this easier.

# First define a class (suppose it is a model)
class Model:
    def __init__(self,
                 file='',
                 image_size=256,
                 batch_size=8,
                 num_epochs=None,
                 iterations=5e5,
                 learning_rate=3e-4, 
                 training=True):
        self.file = file
        self.image_size = image_size
        self.batch_size = batch_size
        self.num_epochs = num_epochs
        self.iterations = int(iterations)
        self.learning_rate = learning_rate
        self.training = training

Let's assume the model above is saved in a file named model.py. Next, we create a configuration file named config.ini in the same directory, with the following contents:

[DEFAULT]
file = default.tfrecords
image_size = 256
batch_size = 8
num_epochs = 100
iterations = None
learning_rate = 3e-4
training = true

[training]
file = training.tfrecords
image_size = 256
batch_size = 8
num_epochs = None
iterations = 5e5
learning_rate = 3e-4
training = true

[testing]
file = testing.tfrecords
image_size = 512
batch_size = 1
num_epochs = 1
training = false

When we need to configure parameters for the model, we can read the configuration file with the following code:

import configparser

# Create a new config parser; the section list of a new parser is empty
config = configparser.ConfigParser()
print("Empty config: %s" % config.sections())

# Read the configuration file
config.read("config.ini")
# sections() does not include the DEFAULT section
print("Config sections: %s" % config.sections())

>>> Empty config: []
>>> Config sections: ['training', 'testing']

Here are some more detailed operations:

# Check whether a section exists
print("training: ", 'training' in config)
print("nothing: ", 'nothing' in config)

>>> training: True
>>> nothing: False

# The structure of the configuration is similar to a dict
for key, value in config.items():
    print(key, value)
    for k, v in value.items():
        print(k, v)

>>> DEFAULT <Section: DEFAULT>
>>> file default.tfrecords
>>> image_size 256
>>> batch_size 8
>>> num_epochs 100
>>> iterations None
>>> learning_rate 3e-4
>>> training true

>>> training <Section: training>
>>> file training.tfrecords
>>> image_size 256
>>> batch_size 8
>>> num_epochs None
>>> iterations 5e5
>>> learning_rate 3e-4
>>> training true

>>> testing <Section: testing>
>>> file testing.tfrecords
>>> image_size 512
>>> batch_size 1
>>> num_epochs 1
>>> training false
>>> iterations None
>>> learning_rate 3e-4

# Values can also be accessed by subscript
print(config['DEFAULT']['file'])

>>> default.tfrecords

Next, we use the configuration file to create the model. First, the model needs a few changes:

class Model:
    def __init__(self,
                 file='',
                 image_size=256,
                 batch_size=8,
                 num_epochs=None,
                 iterations=5e5,
                 learning_rate=3e-4,
                 training=True,
                 config_session=None):
        if config_session is None:
            self.file = file
            self.image_size = image_size
            self.batch_size = batch_size
            self.num_epochs = num_epochs
            self.iterations = int(iterations)
            self.learning_rate = learning_rate
            self.training = training
        else:
            self.file = config_session['file']
            self.image_size = config_session.getint('image_size')
            self.batch_size = config_session.getint('batch_size')
            self.num_epochs = config_session.getint('num_epochs')
            self.iterations = int(config_session.getfloat('iterations'))
            self.learning_rate = config_session.getfloat('learning_rate')
            self.training = config_session.getboolean('training')
# Read the training configuration
model = Model(config_session=config['training'])
print(model.file, model.iterations, model.training)

>>> training.tfrecords 500000 True

# The default values can be read as well
model = Model(config_session=config['DEFAULT'])
print(model.file, model.iterations, model.training)

# Read the testing configuration
# Because iterations and learning_rate are not found in the testing section, the values are read from DEFAULT
model = Model(config_session=config['testing'])
print(model.file, model.iterations, model.training)

>>> testing.tfrecords 0 False
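One caveat when running the code above: configparser stores every value as a string, so getint() and getfloat() raise ValueError for an entry whose text is None. The sketch below shows this and one possible workaround; get_optional_float is a hypothetical helper, not part of configparser, and the configuration is an inline excerpt rather than the full config.ini.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
iterations = None
learning_rate = 3e-4

[training]
iterations = 5e5
""")


def get_optional_float(section, key):
    # Hypothetical helper: treat a missing key or the text 'None' as None.
    raw = section.get(key)
    if raw is None or raw.strip().lower() == "none":
        return None
    return float(raw)


default_iterations = get_optional_float(config["DEFAULT"], "iterations")
training_iterations = get_optional_float(config["training"], "iterations")

# The built-in accessor fails on the text 'None'.
try:
    config["DEFAULT"].getfloat("iterations")
    failed = False
except ValueError:
    failed = True

print(default_iterations, training_iterations, failed)
```

Another option is to leave such keys out of the file and pass a fallback value to getint() or getfloat().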

However, reading the configuration this way still looks somewhat bloated. A more elegant approach is to use PyYAML, which makes configuration files easier to read. PyYAML is a third-party Python library and must be installed separately.

To convert the config.ini above to YAML format, we can use the following code:

import yaml

config = dict()

config['DEFAULT'] = {'file': 'default.tfrecords', 'image_size': 256, 'batch_size': 8,
                     'num_epochs': 100, 'iterations': None, 'learning_rate': 3e-4,
                     'training': True}
config['training'] = {'file': 'training.tfrecords', 'image_size': 256, 'batch_size': 8,
                      'num_epochs': None, 'iterations': 5e5, 'learning_rate': 3e-4,
                      'training': True}
config['testing'] = {'file': 'testing.tfrecords', 'image_size': 512, 'batch_size': 1,
                     'num_epochs': 1, 'iterations': None, 'learning_rate': 0,
                     'training': True}

with open('config.yaml', 'w', encoding='utf-8') as f:
    yaml.dump(config, f)

Running the code above produces a YAML file with the following contents:

DEFAULT: {batch_size: 8, file: default.tfrecords, image_size: 256, iterations: null,
  learning_rate: 0.0003, num_epochs: 100, training: true}
testing: {batch_size: 1, file: testing.tfrecords, image_size: 512, iterations: null,
  learning_rate: 0, num_epochs: 1, training: true}
training: {batch_size: 8, file: training.tfrecords, image_size: 256, iterations: 500000.0,
  learning_rate: 0.0003, num_epochs: null, training: true}

To read the YAML file, we only need the following code:

with open('config.yaml', 'r', encoding='utf-8') as f:
    # config is a dict; safe_load() is used because recent PyYAML versions
    # require an explicit loader for yaml.load()
    config = yaml.safe_load(f)

Next, we can use the YAML file to initialize our class. Reading the values from the dictionary and assigning them one by one is tedious, so we can initialize through the __dict__ attribute instead.

class Model:
    def __init__(self, kwargs):
        self.__dict__ = kwargs

with open('config.yaml', 'r', encoding='utf-8') as f:
    config = yaml.safe_load(f)

model = Model(config['training'])
print(model.file, model.batch_size, model.num_epochs, model.training)

>>> training.tfrecords 8 None True

However, this should not be the only way to initialize the class, because it leaves the parameters undefined in the code itself. Here is one possible improvement:

class Model:
    def __init__(self, kwargs, *,
                 file='',
                 image_size=256,
                 batch_size=8,
                 num_epochs=None,
                 iterations=5e5,
                 learning_rate=3e-4,
                 training=True):
        if kwargs:
            self.__dict__ = kwargs
        else:
            self.file = file
            self.image_size = image_size
            self.batch_size = batch_size
            self.num_epochs = num_epochs
            self.iterations = int(iterations)
            self.learning_rate = learning_rate
            self.training = training

1.19.2 Command line arguments

Command-line arguments can make our model more user-friendly.

The sys module gives us access to them. Suppose our code file is as follows:

import sys

print(sys.argv)

Running the following command on the command line produces the result shown below:

Eddie$ python3.5 tmp.py --h --i 123

>>> ['tmp.py', '--h', '--i', '123']

The getopt module is dedicated to processing command-line arguments, that is, the contents of sys.argv, and is used to extract the options and their values. Command-line options make a program's parameters more flexible. Both the short option form (-) and the long option form (--) are supported. The call getopt.getopt(args, options[, long_options]) works as follows:

  • args: the list of command-line arguments to parse.
  • options: the short options, given as a string. A colon (:) after an option letter means the option requires an argument; without a colon, the option takes none.
  • long_options: the long options, given as a list. An equals sign (=) after an option name means the option requires an argument; without it, the option takes none.
  • The return value consists of two elements: the first is a list of (option, value) tuples; the second is the list of remaining arguments, that is, those that do not start with '-' or '--'.
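As a minimal sketch of these rules, the snippet below parses a hard-coded argument list instead of sys.argv. The option names (-h, -i, --file, --verbose) and the helper name parse_demo are made up for illustration.

```python
import getopt

def parse_demo(argv):
    # Short options are one string: "h" takes no value, "i:" requires one.
    # Long options are a list: "file=" requires a value, "verbose" does not.
    opts, rest = getopt.getopt(argv, "hi:", ["file=", "verbose"])
    return opts, rest

opts, rest = parse_demo(["-h", "-i", "123", "--file", "training.tfrecords", "extra"])
print(opts)  # list of (option, value) tuples
print(rest)  # arguments that are not options
```

Options that take no value are returned with an empty string as their value, so every entry of opts is a two-element tuple.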

Here is an application: the user can modify the model's parameters through command-line arguments.

import getopt
import sys

class Model:
    def __init__(self, kwargs, *,
                 file='',
                 image_size=256,
                 batch_size=8,
                 num_epochs=None,
                 iterations=5e5,
                 learning_rate=3e-4,
                 training=True):
        self.file = file
        self.image_size = image_size
        self.batch_size = batch_size
        self.num_epochs = num_epochs
        self.iterations = int(iterations)
        self.learning_rate = learning_rate
        self.training = training

        for key, value in kwargs.items():
            self.__dict__[key] = value

# Extract the command-line arguments; sys.argv[1:] skips element 0, the script name
opts, args = getopt.getopt(sys.argv[1:], '', ['file=', 'image_size=', 'batch_size=', 'num_epochs=',
                                              'iterations=', 'learning_rate=', 'training='])
# Build the parameter dictionary (every value arrives as a string)
config = dict()
for k, v in opts:
    # k[2:] strips the leading '--'
    config[k[2:]] = v

# Create the object
model = Model(config)
print(model.file)

Eddie$ python3.5 tmp.py --file training.tfrecords
>>> training.tfrecords

Recursion

The Tower of Hanoi problem

A task can often be broken down into several steps; this idea, called task decomposition, is very important. If the subtasks have the same form as the original task but a smaller scale, the task can be solved recursively. Take stair climbing as an example and follow the idea of "take one step first, then ask how to handle what is left": the first step climbs either one stair or two, so the number of ways to climb n stairs is ways(n) = ways(n-1) + ways(n-2).
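The stair-climbing recurrence can be sketched as follows. The base cases (exactly one way to climb zero stairs or one stair) are an assumption, since the text gives only the recurrence itself.

```python
def ways(n):
    # Number of ways to climb n stairs when each step covers 1 or 2 stairs.
    if n <= 1:
        return 1  # assumed base case: nothing left to decide
    # First step covers one stair (n-1 remain) or two stairs (n-2 remain).
    return ways(n - 1) + ways(n - 2)

print([ways(n) for n in range(1, 7)])
```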

def Hanoi(n, src, mid, dest):
    # Move the n disks on peg src to peg dest, using peg mid as the transfer peg.
    # Each print stands for one move.
    if n == 1:  # only one disk to move
        # move it directly from src to dest
        print(src + "->" + dest)
        return  # end of the recursion
    Hanoi(n-1, src, dest, mid)  # first move n-1 disks from src to mid
    print(src + "->" + dest)    # then move the remaining disk from src to dest
    Hanoi(n-1, mid, src, dest)  # finally move the n-1 disks from mid to dest; move count T(n) = 2T(n-1) + 1

n = int(input())
Hanoi(n, 'A', 'B', 'C')
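The move count T(n) = 2T(n-1) + 1 mentioned in the last comment can be checked with a short sketch. The helper name hanoi_moves is made up for illustration; it evaluates the recurrence and compares it with the closed form 2^n - 1.

```python
def hanoi_moves(n):
    # T(1) = 1; T(n) = 2 * T(n-1) + 1 for n > 1
    if n == 1:
        return 1
    return 2 * hanoi_moves(n - 1) + 1

for n in range(1, 6):
    # recurrence value next to the closed form 2**n - 1
    print(n, hanoi_moves(n), 2 ** n - 1)
```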
	

Drawing a snowflake curve (Koch curve)

Recursive definition of the snowflake curve:
1) A snowflake curve of order 0 with length size and direction x (x is an angle) is a line segment of length size drawn in direction x.
2) A snowflake curve of order n with length size and direction x consists of the following four parts:
1. a snowflake curve of order n-1 with length size/3 and direction x
2. a snowflake curve of order n-1 with length size/3 and direction x+60
3. a snowflake curve of order n-1 with length size/3 and direction x-60
4. a snowflake curve of order n-1 with length size/3 and direction x
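The definition can be tried out without any drawing. The sketch below, with the made-up helper name koch_headings, only lists the direction of each line segment in the order it would be drawn.

```python
def koch_headings(n, x=0):
    # Order 0: a single segment in direction x (degrees).
    if n == 0:
        return [x]
    headings = []
    # Order n: four order n-1 curves in directions x, x+60, x-60, x.
    for d in (x, x + 60, x - 60, x):
        headings.extend(koch_headings(n - 1, d))
    return headings

print(koch_headings(1))       # [0, 60, -60, 0]
print(len(koch_headings(3)))  # 64 segments, i.e. 4**3
```

Each order multiplies the number of segments by 4 and divides the segment length by 3.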


import turtle

def snow(n, size):
    # n is the order and size is the length. Starting from the current position,
    # draw a snowflake curve of order n and length size in the current direction.
    if n == 0:
        turtle.fd(size)  # move the pen forward by size in the current direction
    else:
        for angle in [0, 60, -120, 60]:
            turtle.left(angle)  # turn the pen left by angle degrees; turtle.lt(angle) also works
            snow(n-1, size/3)

turtle.setup(800, 600)
# By default the window is centered on the screen; it is 800*600 pixels and its center has coordinates (0,0)
# The initial pen direction is 0 degrees: due east is 0 degrees, due north is 90 degrees
turtle.penup()           # lift the pen
turtle.goto(-300, -50)   # move the pen to (-300, -50)
turtle.pendown()
turtle.pensize(3)
snow(3, 600)             # draw a horizontal snowflake curve of order 3 and length 600
turtle.done()            # keep the drawing window open

Exercises

The ways of making up 5 fall into two cases: those that use 5 and those that do not. In the case without 5, (5,4) means making up 5 where the largest number that may be used is 4.

def ways(n, m):
    if n == 0:       # boundary condition
        return 1     # taking no numbers at all is the one and only way
    if m == 0:
        return 0
    w = ways(n, m-1)     # the case where m is not taken
    if n >= m:           # the case where m is taken
        w += ways(n-m, m)
    return w

a = int(input())
print(ways(a, a))
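The same recurrence recomputes many (n, m) pairs, so one possible refinement is to cache the results. This is only a sketch under the assumption that the exercise counts the ways to write n as a sum of positive integers no larger than m; the name ways_memo is made up here.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ways_memo(n, m):
    # Number of ways to write n as a sum of positive integers, each at most m.
    if n == 0:
        return 1
    if m == 0:
        return 0
    w = ways_memo(n, m - 1)       # m is not used
    if n >= m:
        w += ways_memo(n - m, m)  # m is used at least once
    return w

print(ways_memo(5, 5))
```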

Strings and tuples - Exercise: rock-paper-scissors

Description
Rock-paper-scissors is a common guessing game: rock beats scissors, scissors beat paper, and paper beats rock. If both players throw the same gesture, the round is a draw.
One day, Little A and Little B happen to be playing rock-paper-scissors. Their throws are known to follow a periodic pattern, for example "rock - paper - rock - scissors - rock - paper - rock - scissors …", which repeats with the cycle "rock - paper - rock - scissors". After Little A and Little B have played N rounds, who has won more rounds?
Input
The input contains three lines:
The first line contains three integers N, NA, NB: the number of rounds played, the cycle length of Little A's throws, and the cycle length of Little B's throws, with 0 < N, NA, NB < 100. The second line contains NA integers giving Little A's pattern. The third line contains NB integers giving Little B's pattern.
Here 0 stands for "rock", 2 for "scissors", and 5 for "paper". Adjacent integers are separated by a single space.
Output
Output one line: A if Little A wins more rounds, B if Little B wins more rounds, and draw if they are tied.

def result(a,b):
    if a == b:
        return 0
    if a == 5 and b == 0:
        return 1
    if a == 0 and b == 5:
        return -1
    if a < b:
        return 1
    else:
        return -1

s  = input().split()
n,na,nb = int(s[0]),int(s[1]),int(s[2])
sa = input().split()
sb = input().split()
winA = winB  = 0
ptrA = ptrB = 0
for i in range(n):
    r = result( int(sa[ptrA]), int(sb[ptrB]) )
    if r == 1:
        winA += 1
    elif r == -1:
        winB += 1
    ptrA = (ptrA + 1) % na
    ptrB = (ptrB + 1) % nb
if winA > winB:
    print("A")
elif winA <winB:
    print("B")
else:
    print("draw")



#  Upper and lower case letters are interchanged  
s = input()
for c in s:
    if 'a' <= c <= 'z':
        print(chr(ord(c) - 32 ),end="")
    elif 'A' <= c <= 'Z':
        print(chr(ord(c) + 32),end="")
    else:
        print(c,end="")

List application example: trees outside the school gate

Outside a school gate there is a row of trees along a road of length L, with adjacent trees 1 meter apart. We can think of the road as a number axis, with one end at position 0 and the other at position L; every integer point on the axis, namely 0, 1, 2, ……, L, has a tree.
Some areas of the road will be used to build subways. Each area is given by its starting and ending points on the number axis. The coordinates of the starting and ending points of every area are integers, and areas may overlap. The trees in these areas (including the two trees at the endpoints of each area) are to be removed. Your task is to calculate how many trees remain on the road after all of these trees have been removed.
Input
The first line contains two integers L (1 <= L <= 10000) and M (1 <= M <= 100), separated by a space; L is the length of the road and M is the number of areas. Each of the next M lines contains two different integers, separated by a space, giving the coordinates of the starting point and the ending point of an area.
Output
One line containing a single integer: the number of trees remaining on the road.
Sample input
500 3
150 300
100 200
470 471
Sample output
298

s = input().split()
L, M = int(s[0]), int(s[1])
good = [True] * (L+1)  # good[i] is True while the tree at coordinate i is still there
for i in range(M):
    s = input().split()
    start, end = int(s[0]), int(s[1])
    for k in range(start, end + 1):
        good[k] = False  # the tree at coordinate k has been removed
print(sum(good))  # sum is a built-in Python function that adds up the list elements;
# True counts as 1 and False as 0

Sorting algorithms for lists

Selection sort is the most naive method. It makes $\frac{n(n-1)}{2}$ comparisons, so its time complexity is O(n²): sorting a list (array) of n elements takes on the order of n² comparisons.
Good sorting algorithms, such as merge sort and quicksort, have complexity O(n log(n)).
Python's built-in sorting function also has complexity O(n log(n)), which is generally taken as the default time complexity of sorting.

def SelectionSort(a):  # selection sort
    # sort list a from small to large
    n = len(a)
    for i in range(n-1):
        # each pass picks the smallest of a[i] and the elements to its right and puts it at position a[i]
        for j in range(i+1,n):  # examine the elements to the right of a[i] in turn
            if a[j] < a[i]:
                a[i],a[j] = a[j],a[i]
lst = [1,12,4,56,6,2]
SelectionSort(lst)
print(lst)  #>>[1, 2, 4, 6, 12, 56]
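To make the n(n-1)/2 figure concrete, here is a sketch that runs the same two loops while counting comparisons. The function name selection_sort_count is made up for this illustration, and it sorts a copy so that the input list stays unchanged.

```python
def selection_sort_count(a):
    # Same loops as SelectionSort, but also counts element comparisons.
    a = list(a)  # work on a copy
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[i]:
                a[i], a[j] = a[j], a[i]
    return a, comparisons

sorted_list, count = selection_sort_count([1, 12, 4, 56, 6, 2])
print(sorted_list, count)  # n = 6, so 6*5/2 = 15 comparisons
```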

Sorting with a custom comparison rule: if students is a tuple, students = ( (), (), .... () ), then, because tuples cannot be modified, it has no sort method. Use sorted to obtain a new sorted list instead, for example print(sorted(students, key = f)).

# Multi-level sorting
def f(x):
    return (-x[2],x[1],x[0])
students = [('John', 'A', 15), ('Mike', 'C', 19),
            ('Wang', 'B', 12), ('Mike', 'B', 12),
            ('Mike', 'C', 12),
            ('Mike', 'C', 18),
            ('Bom', 'D', 10)]
students.sort(key = f)  # first by age from high to low, then by grade letter in ascending order, then by name in dictionary order
print(students)
#>>[('Mike', 'C', 19), ('Mike', 'C', 18), ('John', 'A', 15), ('Mike', 'B', 12),
#  ('Wang', 'B', 12), ('Mike', 'C', 12), ('Bom', 'D', 10)]
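For the tuple case, a small sketch with a shortened, made-up student tuple shows that sorted returns a new list and leaves the tuple itself untouched:

```python
students = (("John", "A", 15), ("Mike", "C", 19), ("Wang", "B", 12), ("Mike", "B", 12))

def f(x):
    # age descending, then grade letter ascending, then name ascending
    return (-x[2], x[1], x[0])

ranked = sorted(students, key=f)  # students itself is left unchanged
print(ranked)
print(type(students).__name__, type(ranked).__name__)
```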

List exercise - Grade ranking

Description
Given the transcript for one course of a class, sort it by score from high to low; if scores are equal, the name that is smaller in dictionary order comes first.
Input
The first line is n (0 < n < 20), the number of students in the class. Each of the next n lines contains a student's name and score, separated by a single space. Names contain only letters and are no longer than 20 characters; scores are non-negative integers no greater than 100.
Output
Output the transcript sorted from high to low, one student per line, with the name and the score separated by a space.
Sample input
4
Kitty 80
Hanmeimei 90
Joey 92
Tim 28
Sample output
Joey 92
Hanmeimei 90
Kitty 80
Tim 28

n = int(input())
a = []
for i in range(n):
    s = input().split()
    a.append((s[0],int(s[1])))  # store each student as a (name, score) tuple
a.sort(key = lambda x: (-x[1],x[0]))  # score descending, then name ascending
for x in a:
    print(x[0],x[1])

List exercise - Image blurring

Description
Given the gray value of every pixel of an image with n rows and m columns, blur the image as follows:
1. The gray values of the outermost pixels stay unchanged;
2. The new gray value of each interior pixel is the average of the original gray values of that pixel and its four neighbors above, below, left, and right (rounded to the nearest integer).
Input
The first line contains two integers n and m, the number of pixel rows and columns of the image. 1 <= n <= 100, 1 <= m <= 100.
Each of the next n lines contains m integers, the gray values of the pixels; adjacent integers are separated by a single space, and every value is between 0 and 255.
Output
n lines of m integers each, giving the blurred image. Adjacent integers are separated by a single space.
Sample input
4 5
100 0 100 0 50
50 100 200 0 0
50 50 100 100 200
100 100 50 50 100
Sample output
100 0 100 0 50
50 80 100 60 0
50 80 100 90 200
100 100 50 50 100

import copy
n,m = map(int,input().split())  # n rows and m columns
a = []
for i in range(n):
    lst = list(map(int,input().split()))
    a.append(lst)  # a is a two-dimensional list
b = copy.deepcopy(a)
# interior pixels: (1,1) - (n-2,m-2)
for i in range(1,n-1):
	for j in range(1,m-1):
		b[i][j] = round((a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1])/5)

for i in range(0,n):
	for j in range(0,m):
		print(b[i][j],end = " ")
	print("")

Dictionaries and sets

Example: counting word frequency

Input
Several lines, one word per line
Output
Output all the words in order of their number of occurrences, from high to low. Words with the same count are ordered from small to large in dictionary order.
Sample input
about
send
about
me
Sample output
2 about
1 me
1 send
dt = {}
while True:
    try:
        wd = input()
        if wd in dt:  # there is already an element whose key is wd
            dt[wd] += 1
        else:
            dt[wd] = 1  # add an element with key wd and value 1
        # The if/else above can be replaced by dt[wd] = dt.get(wd,0) + 1:
        # if dt contains an element with key wd, get returns its value; otherwise it returns 0
    except EOFError:
        break  # when the input ends, input() raises an exception; control jumps here and leaves the loop
result = []  # a dictionary cannot be sorted in place, so use a list
for x in dt.items():
    result.append(x)  # x is a tuple: x[0] is the word, x[1] is its number of occurrences
result.sort(key = lambda x:(-x[1],x[0]))  # by occurrences from high to low, then by word in dictionary order
for x in result:
    print(x[1],x[0])  # the comma makes print separate the two items with a space; print ends with a newline by default, which end="" would replace
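The counting step can also be delegated to collections.Counter from the standard library. The sketch below uses the sample words as a hard-coded list instead of reading input, which is an assumption made only to keep the example self-contained.

```python
from collections import Counter

words = ["about", "send", "about", "me"]
counts = Counter(words)  # maps each word to its number of occurrences
# occurrences descending, then word in dictionary order
ranked = sorted(counts.items(), key=lambda x: (-x[1], x[0]))
for word, times in ranked:
    print(times, word)
```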

Example: the number that appears least often

Description
Several integers are entered (no more than 20 m), each less than 1 billion. Find the number that appears the fewest times. If there is more than one such number, output the one that appears earliest.
Input
Several integers, one per line.
Output
The integer that meets the requirement and its number of occurrences.
Sample input
200
18
9
70
70
6
200
18
Sample output
9 1

#{200:[1,0],.....}  each value is [number of occurrences, position of first occurrence]
dt = {}
i = 0
while True:
    try:
        n = int(input())
        if n not in dt:
            dt[n] = [1,i]  # the value for key n is [1,i]
        else:
            dt[n][0] += 1
        i += 1  # advance the input position
    except (EOFError, ValueError):
        break
ans = 0  # tracked values: ans, minT, pos
minT = 1000000  # smallest number of occurrences seen so far
pos = 1000000   # position of the first occurrence of ans
for x in dt.items():  # each item is a tuple x = (number, [occurrences, position])
    if minT > x[1][0]:
        minT = x[1][0]
        ans = x[0]
        pos = x[1][1]
    elif minT == x[1][0]:
        if pos > x[1][1]:
            ans = x[0]
            pos = x[1][1]
print(ans, minT)
# Alternative using a list; the time complexity is O(n²)
dt = []
while True:
	try:
		dt.append(int(input()))
	except:
		break
ans = 0
minT = 1000000
for x in dt:
	cnt = dt.count(x)
	if minT > cnt:
		minT = cnt
		ans = x
print(ans, minT)
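A shorter variant is sketched below. It relies on two facts about Python 3.7 and later: dictionaries keep insertion order, and min returns the first minimal item it meets. The function name least_frequent and the hard-coded sample list are made up for illustration.

```python
def least_frequent(nums):
    counts = {}
    for x in nums:
        counts[x] = counts.get(x, 0) + 1
    # keys are in first-appearance order, so ties resolve to the earliest number
    ans = min(counts, key=counts.get)
    return ans, counts[ans]

print(least_frequent([200, 18, 9, 70, 70, 6, 200, 18]))
```

This runs in a single pass over the input plus one pass over the distinct values, so it avoids the O(n²) cost of calling count for every element.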

File processing: word frequency statistics

Program 1: counting the cumulative word frequency of a single file

result = {}  # result dictionary, in the form { 'a':2,'about':3 ....}
# countFile is the single-file function defined in Program 2 below
if countFile(sys.argv[1],result) == 0:  # argv[1] is the source file; the analysis results are recorded in result
    exit()
lst = list(result.items())
lst.sort()  # sort the words in dictionary order
f = open(sys.argv[2],"w",encoding="gbk")  # argv[2] is the result file; "w" means write
for x in lst:
    f.write("%s\t%d\n" % (x[0],x[1]))
f.close()

Program 2: counting the cumulative word frequency of multiple files

Usage: python countfiles.py result_file
For example: python countfiles.py result.txt

Performs word frequency statistics on all .txt files in the current folder (the folder containing countfiles.py) whose names start with the letter a, and writes the combined result into the "result file" result.txt.

Approach:
Get all files in the folder containing the .py program whose names start with a and end with .txt. For each file, call the function from countfile.py that processes a single file.
import os imports the os library that ships with Python.
os.listdir() returns a list of all files and folders in the current folder. The elements of the list are file or folder names, without the path (directory).
os.path.isfile(x) tells whether x is a file (a folder is not a file).
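The two os calls can be tried in isolation. The sketch below builds a throwaway folder with tempfile so that it does not depend on any real files; the helper name list_a_txt and the file names are made up. Unlike the program in this section, it joins the folder path to each name, because it does not run inside the folder being listed.

```python
import os
import tempfile

def list_a_txt(folder):
    # Names of regular files in folder that start with "a" and end with ".txt".
    names = []
    for x in os.listdir(folder):
        if os.path.isfile(os.path.join(folder, x)):  # skip sub-folders
            if x.lower().startswith("a") and x.lower().endswith(".txt"):
                names.append(x)
    return sorted(names)

with tempfile.TemporaryDirectory() as d:
    for name in ["a1.txt", "a2.TXT", "b.txt", "a.py"]:
        open(os.path.join(d, name), "w").close()
    os.mkdir(os.path.join(d, "a_dir.txt"))  # a folder whose name also matches
    found = list_a_txt(d)
print(found)
```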

'''
Usage of this program:
    python countfiles.py result_file
For example:
    python countfiles.py result.txt
Performs word frequency analysis on all "a*.txt" files in the current folder
and writes the combined results into the "result file".
'''
import sys
import re
import os
def countFile(filename,words):
    # Analyze the word frequency of file filename and record the results in the dictionary words
    try:
        f = open(filename,"r",encoding = "gbk")  # change the encoding argument to match the file, e.g. encoding="utf-8" or encoding="gbk"
    except Exception as e:
        print(e)
        return 0
    txt = f.read()  # the whole content of the file is stored in the string txt
    f.close()
    splitChars = set([])  # the set of separator characters
    # find all non-alphabetic characters in the file and use them as separators
    for c in txt:
        if not ( c >= 'a' and c <= 'z' or c >= 'A' and c <= 'Z'):
            splitChars.add(c)
    splitStr = ""  # regular expression used for re.split
    # the regular expression looks like ",|:| |-": each string between two vertical bars is a separator
    for c in splitChars:
        # re.escape puts "\" in front of characters that are special in regular expressions
        splitStr += re.escape(c) + "|"
    splitStr += " "  # something must follow the last '|'; an extra blank does no harm
    lst = re.split(splitStr,txt)  # lst is the list of separated words; splitStr is the regular expression and txt is the string to split
    for x in lst:
        if x == "":  # two adjacent separators produce an empty string between them; ignore it
            continue
        lx = x.lower()
        if lx in words:
            words[lx] += 1  # already in the dictionary: increase the word's count by 1
        else:
            words[lx] = 1  # not in the dictionary yet: add the word with a count of 1
    return 1

result = {}  # result dictionary
lst = os.listdir()  # list all files and folders in the current folder
for x in lst:
    if os.path.isfile(x):  # if x is a file
        if x.lower().endswith(".txt") and x.lower().startswith("a"):
            # x starts with 'a' and ends with .txt
            countFile(x,result)
lst = list(result.items())
lst.sort()  # sort the words in dictionary order
f = open(sys.argv[1],"w",encoding="gbk")  # argv[1] is the result file; "w" means write
for x in lst:
    f.write("%s\t%d\n" % (x[0],x[1]))
f.close()

Program 3: counting word frequency accurately

Usage:
python countfile_novary.py source_file result_file
Analyzes the word frequency of the "source file" and writes the results into the "result file".
If a word appears in an inflected form, it is converted to its base form before counting, so only base forms are counted.
Format of the base form / inflected form vocabulary table:
act
acted|acting|acts
action
actions
active
actively|activeness

Approach:
1) As before, a dictionary is needed to count words and their occurrences.
2) Read the vocabulary file word_varys.txt and build a dictionary dt whose elements have the form:
{acted:act, acting:act, acts:act, actions:action,…}
The keys are inflected forms and the values are base forms.
3) For each word w in the "source file", look up the element x of dt whose key is w. If x does not exist, w is itself a base form and is counted directly. If x exists, its value x[1] is the base form, and the count of x[1] is increased by 1.
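The three steps can be sketched on in-memory data before any files are involved. The names make_vary_dict and count_words and the sample word list are made up for this illustration; the table lines follow the vocabulary format shown above.

```python
def make_vary_dict(lines):
    # lines alternate: a base form, then its inflected forms separated by "|"
    vary = {}
    for i in range(0, len(lines) - 1, 2):
        base = lines[i].strip()
        for w in lines[i + 1].strip().split("|"):
            vary[w] = base
    return vary

def count_words(words, vary):
    counts = {}
    for w in words:
        base = vary.get(w, w)  # an unknown word is treated as its own base form
        counts[base] = counts.get(base, 0) + 1
    return counts

table = ["act", "acted|acting|acts", "action", "actions"]
vary = make_vary_dict(table)
counts = count_words(["acts", "act", "actions", "acting", "run"], vary)
print(counts)
```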

# On Windows, the source files are in ANSI format
'''
Usage of this program:
    python countfile_novary.py source_file result_file
For example:
    python countfile_novary.py a1.txt r1.txt
Analyzes the word frequency of the "source file" and writes the results into the "result file".
If a word appears in an inflected form, it is converted to its base form before counting.
Base form / inflected form vocabulary table: word_varys.txt
Format:
act
acted|acting|acts
action
actions
active
actively|activeness
'''
import sys
import re
def makeVaryWordsDict():
    vary_words = {}  # elements have the form inflected form : base form, e.g. {acts:act,acting:act,boys:boy....}
    f = open("word_varys.txt","r",encoding="gbk")
    lines = f.readlines()
    f.close()
    L = len(lines)
    for i in range(0,L-1,2):  # every two lines hold the base form and the inflected forms of one word
        word = lines[i].strip()  # base form; strip removes the surrounding spaces and line breaks
        varys = lines[i+1].strip().split("|")  # the inflected forms, separated by vertical bars
        for w in varys:
            vary_words[w] = word  # add inflected form : base form, i.e. key w maps to the base form word
    return vary_words

def makeSplitStr(txt):  # build the regular expression used to split the text
    splitChars = set([])
    # find all non-alphabetic characters in the text and use them as separators
    for c in txt:
        if not ( c >= 'a' and c <= 'z' or c >= 'A' and c <= 'Z'):
            splitChars.add(c)
    splitStr = ""
    # build the separator pattern for re.split
    for c in splitChars:
        splitStr += re.escape(c) + "|"  # re.escape adds "\" in front of characters that are special in regular expressions
    splitStr += " "
    return splitStr

def countFile(filename,vary_word_dict):
    # vary_word_dict is a parameter; the main program passes it the dictionary returned by makeVaryWordsDict()
    # Analyze the file filename and return a dictionary as the result; base forms are looked up in vary_word_dict
    try:
        f = open(filename,"r",encoding="gbk")
    except Exception as e:  # e is the exception that was raised
        print(e)
        return None
    txt = f.read()
    f.close()
    splitStr = makeSplitStr(txt)  # build the regular expression
    words = {}  # start with an empty dictionary
    lst = re.split(splitStr,txt)
    for x in lst:
        lx = x.lower()  # convert to lowercase
        if lx == "":
            continue
        if lx in vary_word_dict:  # if the word has a base form in the dictionary, convert it before counting
            lx = vary_word_dict[lx]
        # the if statement above can be replaced by the single line lx = vary_word_dict.get(lx,lx)
        words[lx] = words.get(lx,0) + 1
        # get(k,v) returns the value of the element with key k if there is one, and v otherwise
    return words

result = countFile(sys.argv[1],makeVaryWordsDict())  # makeVaryWordsDict() returns the inflected form -> base form table
if result != None and result != {}:
    lst = list(result.items())
    lst.sort()
    f = open(sys.argv[2],"w",encoding="gbk")
    for x in lst:
        f.write("%s\t%d\n" % (x[0],x[1]))
    f.close()

Program 4: extracting only the words that are not in the CET-4 word list

Usage:
python countfile_nocet4.py source_file result_file
Analyzes the word frequency of the "source file", keeps only the words that are not in the CET-4 word list, and writes the results into the "result file".
The CET-4 word list is in the file cet4words.txt; each word is on its own line and starts with $.
Approach:
Read the words in cet4words.txt and put them in a set. When a word is encountered in the source file, first check whether it is in the set; if it is, discard it.

# On Windows, the source files are in ANSI format
'''
Usage of this program:
    python countfile_nocet4.py source_file result_file
For example:
    python countfile_nocet4.py a1.txt r1.txt
Analyzes the word frequency of the "source file", keeps only the words that are
not in the CET-4 word list, and writes the results into the "result file".
CET-4 word list: cet4words.txt
Format:
$abandon
[?'b?nd?n] vt. abandon; give up; indulge (oneself)
$ability
[?'b?l?t?] n. ability
$able
['e?bl] a. able; capable
$aboard
[?'b?:d] ad.&prep. on board (a ship, plane, or vehicle)
....
'''
import sys
import re
def makeFilterSet():
    cet4words = set([])
    f = open("cet4words.txt", "r",encoding="gbk")
    lines = f.readlines()
    f.close()
    for line in lines:
        line = line.strip()
        if line == "":
            continue
        if line[0] == "$":
            cet4words.add(line[1:])  #  Add level 4 words to   aggregate 
    return cet4words

def makeSplitStr(txt):  # build the regular expression used to split the text
    splitChars = set([])
    # find all non-alphabetic characters in the text and use them as separators
    for c in txt:
        if not ( c >= 'a' and c <= 'z' or c >= 'A' and c <= 'Z'):
            splitChars.add(c)
    splitStr = ""
    # build the separator pattern for re.split
    for c in splitChars:
        splitStr += re.escape(c) + "|"  # re.escape adds "\" in front of characters that are special in regular expressions
    splitStr += " "
    return splitStr

def countFile(filename,filterdict):  # word frequency statistics, skipping the words in the set filterdict
    words = {}
    try:
        f = open(filename,"r",encoding="gbk")
    except Exception as e:
        print(e)
        return 0
    txt = f.read()
    f.close()
    splitStr = makeSplitStr(txt)
    lst = re.split(splitStr,txt)
    for x in lst:
        lx = x.lower()
        if lx == "" or lx in filterdict:  # skip empty strings and words in filterdict
            continue
        words[lx] = words.get(lx,0) + 1
    return words

result = countFile(sys.argv[1],makeFilterSet())
if result:  # false both for the 0 returned on failure and for an empty dictionary
    lst = list(result.items())
    lst.sort()
    f = open(sys.argv[2],"w",encoding="gbk")
    for x in lst:
        f.write("%s\t%d\n" % (x[0],x[1]))
    f.close()

Regular expressions

Application example: Cao Cao in the words of Zhuge Liang

Goal: find every passage in Romance of the Three Kingdoms in which Zhuge Liang (Kong Ming) mentions Cao Cao, and see what he said.
Patterns:

Kong Ming said: “If Cao Cao leads his troops here, what is to be done?”
Kong Ming said: “Gongjin's intention is to surrender to Cao, which is quite reasonable.”
Kong Ming replied: “Cao Cao is a traitor to the Han; why even ask?”
Kong Ming said with a smile: “Now Cao has led a host of a million men, …”
In the source text, the colon and the quotation marks (:“”) are all Chinese full-width punctuation.

import re
f = open(" The romance of The Three Kingdoms utf8.txt", "r", encoding = "utf-8")
txt = f.read()
f.close()
# The pattern has to be written in Chinese to match the Chinese text:
# 孔明 (Kong Ming), 曰 (said), and the names 曹操, 曹贼, 操贼, 曹阿瞒, 操 used for Cao Cao
pt = "(孔明.{0,2}曰:“[^”]*(曹操|曹贼|操贼|曹阿瞒|操).*?”)"  # [^”] matches any character other than a closing quotation mark
a = re.findall(pt,txt)
print(len(a))  #>>58
for x in a:  # x is a tuple of (group 1, group 2): the whole quotation and the matched name for Cao Cao; the first * is a greedy match
    print(x[0])

Extracting IP addresses, email addresses, and URLs

Generally speaking, writing an exact regular expression is difficult. For example, an exact regular expression for IP addresses must satisfy two conditions: every string it matches is an IP address, and every IP address is matched by it.
A regular expression can instead be written more leniently: every IP address matches it, but not everything that matches it is an IP address. The matched strings are then checked further to rule out the ones that are not IP addresses, which is easier than writing an exact regular expression.
For example: '\d+\.(\d{1,3}\.){2}\d+'
Then split the match and check in code that each part is no greater than 255.
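The lenient approach might look like the sketch below. The pattern is a simplified stand-in rather than the exact expression quoted above, and the function name find_ips and the sample sentence are made up for illustration.

```python
import re

def find_ips(text):
    # Lenient pattern: four dot-separated groups of 1 to 3 digits.
    candidates = re.findall(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", text)
    # Extra check done in code: every part must be at most 255.
    return [c for c in candidates if all(int(p) <= 255 for p in c.split("."))]

print(find_ips("My ip is 223.44.3.4, not 999.1.1.1 or 1.2.3"))
```

Here the regular expression only finds candidates, and the range check that is awkward to express in a pattern is left to ordinary code.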

# These patterns can generally be copied as they are; there is no need to understand every detail
import re
ipadr = r"\b((25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d)))\.){3}(25[0-5]|2[0-4]\d|((1\d{2})|([1-9]?\d)))\b"
mailbox = r"\b[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+\b"
url = r'http://[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+(/[-a-zA-Z0-9]+)*\b'
s = "My ip is 223.44.3.4, this is http://www.pku.edu.cn/python/new, http://www.sohu.com my mailbox is [email protected] ok?"
m = re.search(ipadr,s)
if m != None:
	print(m.group()) #>>223.44.3.4

Processing images with Pillow

Image scaling

from PIL import Image  # import the Image class for image processing
img = Image.open("c:/tmp/pic/grass.jpg")  # load the image file into the object img
w,h = img.size  # get the width and height of the image (unit: pixels); img.size is a tuple
newSize = (w//2,h//2)  # the new image size
newImg = img.resize(newSize)  # get a new image half the width and height of the original
newImg.save("c:/tmp/pic/grass_half.jpg")  # save the new image file
newImg.thumbnail((128,128))  # shrink newImg in place to a thumbnail that fits within 128 x 128 pixels
newImg.save("c:/tmp/pic/grass_thumb.png", "PNG")  # save the new image as a png file
newImg.show()  # display the image

Image rotation, flipping, and filter effects

from PIL import Image
from PIL import ImageFilter  # needed for filter effects
img = Image.open("c:/tmp/pic/grass_half.jpg")
print(img.format,img.mode)  #>>JPEG RGB
newImg = img.rotate(90,expand = True)  # rotate the image 90 degrees counterclockwise
newImg.show()
newImg = img.transpose(Image.FLIP_LEFT_RIGHT)  # flip left-right
newImg = img.transpose(Image.FLIP_TOP_BOTTOM)  # flip top-bottom (upside down)
newImg = img.filter(ImageFilter.BLUR)  # blur effect

Cropping images

from PIL import Image
img = Image.open("c:/tmp/pic/grass.jpg")
w,h = img.size[0]//3,img.size[1]//3
gap = 10 # Blank space of 10 pixels between adjacent tiles in the 3x3 grid
newImg = Image.new("RGB",(w * 3 + gap * 2,h * 3 + gap * 2),"white")
for i in range(0,3):
	for j in range(0,3):
		clipImg = img.crop((j*w,i*h,(j+1)*w,(i+1)*h))
		clipImg.save("c:/tmp/pic/grass%d%d.jpg" % (i,j))
		newImg.paste(clipImg,(j*(w + gap), i * ( h + gap)))
newImg.save("c:/tmp/pic/grass9.jpg") # Save the 3x3 grid image
newImg.show()
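
The coordinates passed to `crop` and `paste` in the loop above can be checked without loading any image. The sketch below reproduces only that arithmetic; the helper name `gridBoxes` and the 300x150 example size are illustrative assumptions.

```python
def gridBoxes(width, height, gap=10, n=3):
    """Return (cropBox, pastePosition) pairs for an n x n split of a width x height image."""
    w, h = width // n, height // n
    result = []
    for i in range(n):          # row index
        for j in range(n):      # column index
            cropBox = (j * w, i * h, (j + 1) * w, (i + 1) * h)   # left, upper, right, lower
            pastePos = (j * (w + gap), i * (h + gap))            # top-left corner in the new image
            result.append((cropBox, pastePos))
    return result

boxes = gridBoxes(300, 150)
print(boxes[0])   #>>((0, 0, 100, 50), (0, 0))
print(boxes[4])   #>>((100, 50, 200, 100), (110, 60))
```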

Web crawler design

Example: crawling daily stock trading data with pyppeteer

• The Python program has to fetch the web page and run the JavaScript inside it to obtain the stock data. requests.get cannot retrieve the page as it is displayed, so selenium or pyppeteer must be used.
Complete list of ChiNext (Growth Enterprise Market) stock codes: https://www.banban.cn/gupiao/list_cyb.html
Complete list of Shenzhen Stock Exchange codes: https://www.banban.cn/gupiao/list_sz.html
Complete list of Shanghai Stock Exchange codes: https://www.banban.cn/gupiao/list_sh.html
• Viewing the page source shows entries such as:
<li><a href="/gupiao/600151/"> Space mechatronics (600151)</a></li>
<li><a href="/gupiao/600156/"> Huasheng shares (600156)</a></li>
• Page for a single stock: quote.eastmoney.com/sh600000.html

import re
import asyncio # Coroutine library bundled with Python 3.6 and later
import pyppeteer as pyp
import bs4
async def antiAntiCrawler(page): # Add measures to page that counter anti-crawler detection
	await page.setUserAgent('Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
		'AppleWebKit/537.36 (KHTML, like Gecko) '
		'Chrome/78.0.3904.70 Safari/537.36')
	await page.evaluateOnNewDocument('() =>{ Object.defineProperties(navigator,'
		'{ webdriver:{ get: () => false } }) }')


Get the stock name and code with regular expression

# Get the stock name and code with regular expression 
async def getStockCodes(page):
	# Get all stock names and codes from the page loaded from "https://www.banban.cn/gupiao/list_sh.html"
	codes = []
	# Final content: ["Luqiao, Sichuan (600039)", "Baotou Steel Group (600010)", ...]
	html = await page.content()
	pt = r'<a href="/gupiao/[^"]*">([^<]*\(\d+\))</a>'
	# Matches entries such as <li><a href="/gupiao/600151/"> Space mechatronics (600151)</a></li>
	for x in re.findall(pt,html):
		codes.append(x)
	return codes
	# Elapsed time: 0:00:00.033943
	
async def getStockInfo(url):
	browser = await pyp.launch(headless=False)
	# Launch Chromium; browser is the Chromium browser, started with a visible window
	page = await browser.newPage() # Open a new page (tab) in the browser
	await antiAntiCrawler(page) # Call this right after creating a new page to counter anti-crawler detection
	await page.goto(url) # Load the web page at url
	codes = await getStockCodes(page)
	for x in codes[:3]: # Only process the first three stocks
		print("-----",x) # x looks like "Luqiao, Sichuan (600039)"
		pos1, pos2 = x.index("("), x.index(")")
		code = x[pos1 + 1:pos2] # Extract the stock code, e.g. 600039
		url = "https://quote.eastmoney.com/sh" + code + ".html"
		await page.goto(url)
		html = await page.content() # Tip: print(html) and inspect it before writing the code below
		pt = r'<td>([^<]*)</td>.*?<td[^>]*id="gt\d*?"[^>]*>([^<]*)</td>'
		for item in re.findall(pt,html,re.DOTALL):
			print(item[0],item[1])
	await browser.close() # Close the browser
url = "https://www.banban.cn/gupiao/list_sh.html"
loop = asyncio.get_event_loop()
loop.run_until_complete(getStockInfo(url))
''' <td> Open today :</td> <td id="gt1" class="txtl" data-bind="46">1.22</td> '''
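
The pattern used in getStockCodes can be tried offline on the two sample lines from the page source quoted earlier, without launching a browser. This is only a sketch: the variable names `html`, `pairs`, `name` and `code` are chosen for illustration, and the live page may contain other markup.

```python
import re

# Sample lines taken from the page source shown earlier in this section
html = '''<li><a href="/gupiao/600151/">Space mechatronics (600151)</a></li>
<li><a href="/gupiao/600156/">Huasheng shares (600156)</a></li>'''

pt = r'<a href="/gupiao/[^"]*">([^<]*\(\d+\))</a>'
pairs = []
for item in re.findall(pt, html):       # one group in the pattern, so each item is a string
    pos1, pos2 = item.index("("), item.index(")")
    name, code = item[:pos1].strip(), item[pos1 + 1:pos2]
    pairs.append((name, code))
    print(name, code)
#>>Space mechatronics 600151
#>>Huasheng shares 600156
```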

html = await page.content()
print(html) # Paste the printed output into a text editor to inspect it

Getting stock names and codes with BeautifulSoup

# Get stock names and codes with BeautifulSoup
async def getStockCodes(page):
	codes = []
	html = await page.content()
	soup = bs4.BeautifulSoup(html, "html.parser")
	for x in soup.find_all("li"):
	# Matches entries such as <li><a href="/gupiao/600151/"> Space mechatronics (600151)</a></li>
		a = x.find("a")
		if a is not None and "(" in a.text and ")" in a.text: # skip <li> elements without a link
			codes.append(a.text)
	return codes
# Elapsed time: 0:00:00.193480

Getting stock names and codes with the browser's own element lookup

# Get stock names and codes with the browser's own element lookup.
# To inspect an element in Chrome, right-click it (for example the value "10.23" next to "Open today") and choose "Inspect" from the menu.
async def getStockCodes(page):
	codes = []
	elements = await page.querySelectorAll("li") # Find elements by tag name
	# Matches entries such as <li><a href="/gupiao/600151/"> Space mechatronics (600151)</a></li>
	for e in elements:
		a = await e.querySelector("a") # Find an element by tag name
		if a is None: # skip <li> elements without a link
			continue
		obj = await a.getProperty("text") # a.getProperty("href") works the same way
		# If this does not work, change it to: obj = await a.getProperty("innerText")
		text = await obj.jsonValue() # Standard idiom for reading the property value
		if "(" in text and ")" in text:
			codes.append(text)
	return codes
	# Elapsed time: 0:00:04.421178

Using pyppeteer to crawl all of your own submitted source code from OpenJudge (a homework submission site)

Many websites require a login before their content can be accessed. The program can simulate the login in a browser: enter the username and password, then click the login button. Alternatively, the program launches the browser, waits for a manual login (needed if there is a verification code), and then continues crawling.
A more advanced approach uses no browser: analyze the network packets, then use the requests library to send the data and log in.

import asyncio
import pyppeteer as pyp
async def antiAntiCrawler(page):
	# Add measures to page that counter anti-crawler detection
	await page.setUserAgent('Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
		'AppleWebKit/537.36 (KHTML, like Gecko) '
		'Chrome/78.0.3904.70 Safari/537.36')
	await page.evaluateOnNewDocument('() =>{ Object.defineProperties(navigator,'
		'{ webdriver:{ get: () => false } }) }')

async def getOjSourceCode(loginUrl):
	width, height = 1400, 800 # Width and height of the web page
	browser = await pyp.launch(headless=False, userDataDir = "c:/tmp", args=[f'--window-size={width},{height}'])
	page = await browser.newPage()
	await antiAntiCrawler(page)
	await page.setViewport({'width': width, 'height': height})
	await page.goto(loginUrl)
	# If you log in manually, the following lines can be removed
	element = await page.querySelector("#email") # Find the account input box
	await element.type("[email protected]") # Enter the email address
	element = await page.querySelector("#password") # Find the password input box
	await element.type("XXXXXXXXX") # Enter the password
	element = await page.querySelector("#main > form > div.user-login > p:nth-child(2) > button") # Find the login button
	await element.click() # Click the login button
	# If you log in manually, the lines above can be removed
	await page.waitForSelector("#main>h2", timeout=30000) # Wait until the "Ongoing competition ..." heading appears
	element = await page.querySelector("#userMenu>li:nth-child(2)>a")
	# Find the "Personal homepage" link
	await element.click() # Click the personal homepage link
	await page.waitForNavigation() # Wait until the new page has loaded
	elements = await page.querySelectorAll(".result-right")
	# Find all "Accepted" links; they have the attribute class="result-right"
	
	page2 = await browser.newPage() # Open a new page (tab)
	await antiAntiCrawler(page2)
	for element in elements[:2]: # Only print the first two programs
		obj = await element.getProperty("href") # Get the href property
		url = await obj.jsonValue()
		await page2.goto(url) # Load the page in the new tab
		element = await page2.querySelector("pre") # Find the pre tag
		obj = await element.getProperty("innerText") # Get the source code
		text = await obj.jsonValue()
		print(text) 
		print("-------------------------")
	await browser.close()
	
def main():
	url = "http://openjudge.cn/auth/login/"
	asyncio.get_event_loop().run_until_complete(getOjSourceCode(url))
main()

Writing a fast crawler with pyppeteer + requests

• Logging in with requests is troublesome (it requires techniques such as packet capture).
• pyppeteer is slower than requests (because the browser has to render the web pages).
• When a site requires a login and the pages behind the login are not dynamic pages generated by JavaScript, you can log in with pyppeteer and then use requests for everything else.
Background knowledge: cookies and sessions
1. After a successful login, the server sends some identifying data, called a cookie, to the browser. The browser includes the cookie in every later request, so the server knows the request comes from the browser that logged in earlier.
2. The server keeps a session in memory for each browser. The session stores the browser's state (for example, how far it has progressed through a multi-step form). Each session has its own session id; if the browser sends the session id with a request, the server also knows which browser is asking.
3. On the client side, a session that identifies the same browser can be built from the cookies.

How it works
1. A pyppeteer page has a cookies() function that returns the cookies.
2. requests.Session() creates an empty session.
3. The session's cookies.update(cookies) function fills the session with the given cookies.
4. The session's get(url) function sends a request that carries the session to the server.
5. Once you have the cookies and have built the session, fetch every later page with the session's get function (provided the page is not generated by JavaScript; if it is, keep crawling with the pyppeteer browser).

import asyncio
import pyppeteer as pyp
import bs4
import requests
def sessionGetHtml(session,url): # Send a web page request that carries the session
	fakeHeaders = {
		'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
			'AppleWebKit/537.36 (KHTML, like Gecko) '
			'Chrome/81.0.4044.138 Safari/537.36 Edg/81.0.416.77'
	} # Request headers that make the request look like it comes from a browser
	try:
		result = session.get(url,headers = fakeHeaders)
		result.encoding = result.apparent_encoding
		return result.text
	except Exception as e:
		print(e)
		return ""
		
async def makeSession(page):
	# Return a session whose cookies are copied from the pyppeteer browser page object
	cookies = await page.cookies() # cookies is a list; every element is a dictionary
	cookies1 = {}
	for cookie in cookies: # requests only needs the "name" and "value" of each cookie
		cookies1[cookie['name']] = cookie['value']
	session = requests.Session()
	session.cookies.update(cookies1)
	return session		# A session that represents the logged-in browser
	
async def antiAntiCrawler(page):
	# Add measures to page that counter anti-crawler detection
	await page.setUserAgent('Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
		'AppleWebKit/537.36 (KHTML, like Gecko) '
		'Chrome/78.0.3904.70 Safari/537.36')
	await page.evaluateOnNewDocument('() =>{ Object.defineProperties(navigator,'
		'{ webdriver:{ get: () => false } }) }')

async def getOjSourceCode(loginUrl):
	width, height = 1400, 800 # Width and height of the web page
	browser = await pyp.launch(headless=False, userDataDir = "c:/tmp", args=[f'--window-size={width},{height}'])
	page = await browser.newPage()
	await antiAntiCrawler(page)
	await page.setViewport({'width': width, 'height': height})
	await page.goto(loginUrl)
	'''
	element = await page.querySelector("#email") # Find the account input box
	await element.type("[email protected]") # Enter the email address
	element = await page.querySelector("#password") # Find the password input box
	await element.type("XXXXXXXXX") # Enter the password
	element = await page.querySelector("#main > form > div.user-login > p:nth-child(2) > button") # Find the login button
	await element.click() # Click the login button
	'''
	await page.waitForSelector("#main>h2", timeout=30000) # Wait for the manual login, until the "Ongoing competition ..." heading appears
	element = await page.querySelector("#userMenu>li:nth-child(2)>a")
	# Find the "Personal homepage" link
	await element.click() # Click the personal homepage link
	await page.waitForNavigation() # Wait until the new page has loaded
	elements = await page.querySelectorAll(".result-right")
	# Find all "Accepted" links; they have the attribute class="result-right"
	
	# From here on the code differs from the previous version
	'''
	page2 = await browser.newPage() # Open a new page (tab)
	await antiAntiCrawler(page2)
	for element in elements[:2]: # Only print the first two programs
		obj = await element.getProperty("href") # Get the href property
		url = await obj.jsonValue()
		await page2.goto(url) # Load the page in the new tab
		element = await page2.querySelector("pre") # Find the pre tag
		obj = await element.getProperty("innerText") # Get the source code
		text = await obj.jsonValue()
	'''
	session = await makeSession(page)
	for element in elements[:2]:
		obj = await element.getProperty("href")	# Get the href property
		url = await obj.jsonValue()
		html = sessionGetHtml(session, url)		# Let the session object send the request to the server and fetch the page at url
		soup = bs4.BeautifulSoup(html, "html.parser")
		element = soup.find("pre")	# Find the pre tag
		print(element.text)
		print("-------------------------")
	await browser.close()
	
def main():
	url = "http://openjudge.cn/auth/login/"
	asyncio.get_event_loop().run_until_complete(getOjSourceCode(url))
main()

Additional tips

Getting the current URL with the requests library:

r = requests.get("http://openjudge.cn")
print(r.url) #>>http://openjudge.cn
Or:
session = requests.session()
r = session.get("http://openjudge.cn")
print(r.url)

Getting the current URL with the pyppeteer library:

browser = await pyp.launch(headless=False)
page = await browser.newPage()
await page.goto("http://openjudge.cn")
print(page.url) #>>http://openjudge.cn

Add a suitable delay between two consecutive operations to simulate human behavior, so that the crawler is not detected for acting too quickly. time.sleep(...) can also be used to wait a while so that the page has time to finish loading.

import time
time.sleep(2) # Pause for 2 seconds and do nothing
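
Inside a coroutine, `time.sleep` blocks the whole event loop while it waits, so `asyncio.sleep` is usually the better fit in pyppeteer code. A minimal sketch of a randomized pause follows; the helper name `humanDelay` and its default bounds are illustrative assumptions.

```python
import asyncio
import random

async def humanDelay(minSeconds=1.0, maxSeconds=3.0):
    """Pause for a random time in [minSeconds, maxSeconds] without blocking the event loop."""
    delay = random.uniform(minSeconds, maxSeconds)
    await asyncio.sleep(delay)
    return delay

async def demo():
    # Short bounds so that the demonstration finishes quickly
    waited = await humanDelay(0.01, 0.05)
    print("waited %.3f seconds" % waited)
    return waited

waited = asyncio.run(demo())
```

In a crawler, `await humanDelay()` would go between two page operations, for example between typing the password and clicking the login button.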

Common operations

1. await page.type('id/class', 'value', {'delay': time}) # The first argument is a selector for the element's id or class
2. await page.evaluate("""JavaScript code; anything that runs in the browser console can be pasted here""")
3. await page.waitFor(2000) # Wait 2 seconds
4. await page.waitForNavigation() # Wait for the page navigation to finish
5. await page.goto(url) # Navigate the current page to url
6. await page.screenshot({'path': './get_pid.png'}) # Take a screenshot
7. f_cat = await page.xpath('xpath') # The result is a list
   link = await (await f_cat[0].getProperty('href')).jsonValue()
   content = await (await f_cat[0].getProperty('textContent')).jsonValue()
   To read a custom attribute, end the XPath with `/@data-size` and then read textContent.
8. await page.keyboard.press('Enter')
# See the pyppeteer documentation for other operations
9. await page.goto(url, {'timeout': 0, 'waitUntil': 'networkidle0'}) # A timeout of 0 disables the timeout, which prevents timeout errors
10. await page.type('selector', '')
    await page.click('selector')
    These have the same effect as finding the element first and then typing into it or clicking it.

Object oriented programming

Classes and objects

• A class represents a kind of thing. For a kind of thing you can design a class that summarizes its attributes, represented by member variables, and the operations it can perform, represented by member functions. Member variables are also called the "attributes" of the class, and member functions are also called its "methods".
• An instance of a class is called an "object". A class represents the common characteristics of a kind of thing; an object is one concrete individual.
• To create an object: ClassName(arg1, arg2, ...)

class ClassName:
	def __init__(self, arg1, arg2, ...):
		......
	def memberFunction1(self, arg1, arg2, ...):
		......
	def memberFunction2(self, arg1, arg2, ...):
		......
	def memberFunctionN(self, arg1, arg2, ...):
		......
Object comparison

a<b is equivalent to a.__lt__(b)
a>b is equivalent to a.__gt__(b)
a<=b is equivalent to a.__le__(b)
a>=b is equivalent to a.__ge__(b)
'''
By default, the __eq__ method of a custom class checks whether the two objects have the same id.
By default, for two objects a and b of a custom class, a == b means the same as a is b: both ask whether a and b refer to the same object. Likewise, a != b means the same as not a is b.
By default, objects of a custom class cannot be ordered: the __lt__, __gt__, __le__ and __ge__ methods inherited from object return NotImplemented, so a < b raises TypeError.
'''
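
To make objects of a custom class comparable, define the corresponding methods yourself. A minimal sketch follows; the classes `point` and `plain` are made up for illustration.

```python
class point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):             # a == b calls a.__eq__(b)
        return isinstance(other, point) and (self.x, self.y) == (other.x, other.y)
    def __lt__(self, other):             # a < b calls a.__lt__(b)
        return (self.x, self.y) < (other.x, other.y)
    def __repr__(self):
        return "point(%d,%d)" % (self.x, self.y)

a, b = point(1, 5), point(2, 0)
print(a < b)               #>>True
print(a > b)               #>>False (no __gt__ defined, so Python tries b.__lt__(a))
print(a == point(1, 5))    #>>True
print(sorted([b, a]))      #>>[point(1,5), point(2,0)]

class plain:                # defines no comparison methods
    pass

try:
    plain() < plain()
except TypeError as e:      # the default behavior described above
    print("TypeError:", e)
```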

Inheritance and derivation

• Suppose you need to write classes for primary school students, middle school students, college students, and so on. All students have things in common, and each kind of student has its own features. How can you avoid writing every class from scratch and repeating the common parts? Use inheritance (derivation).
• When defining a new class B, if B has all the features of an existing class A plus some features that A lacks, there is no need to write B from scratch. Instead, take A as the "base class" (also called the "parent class") and write B as a "derived class" (also called a "subclass") of A. You can then say that class B is "derived" from class A, or that class B "inherits" from class A.
Syntax: class ClassName(BaseClassName):

import datetime
class student:
	def __init__(self,id,name,gender,birthYear):	# The attributes shared by all students are constructor parameters used for initialization
		self.id,self.name,self.gender,self.birthYear = id,name,gender,birthYear
	def printInfo(self):	# Print all information about the student
		print("Name:",self.name)
		print("ID:", self.id)
		print("Birth Year:",self.birthYear)
		print("Gender:",self.gender)
		print("Age:",self.countAge())
	def countAge(self):	# Compute the student's age
		return datetime.datetime.now().year - self.birthYear

class undergraduateStudent(student): # Undergraduate class; inherits from student and has all of its attributes and methods (member variables and member functions)
	def __init__(self,id,name,gender,birthYear,department):	# One more parameter: department
		student.__init__(self,id,name,gender,birthYear)	# Call the base-class constructor; it is called through the class name, so self must be passed explicitly
		self.department = department
	def qualifiedForBaoyan(self): # Mark the student as qualified for baoyan (postgraduate recommendation)
		print(self.name + " is qualified for baoyan")
	def printInfo(self): # The base class has a method with the same name
		student.printInfo(self) # Call the base-class printInfo
		print("Department:" ,self.department)
def main():
	s2 = undergraduateStudent("118829212","Harry Potter","M",2000,"Computer Science")
	s2.printInfo()	# s2 is an object of the derived class, so this printInfo is the derived-class method
	s2.qualifiedForBaoyan()
	if s2.countAge() > 18:
		print(s2.name , "is older than 18")
main()
'''
Output (Age depends on the current year; 20 is the value when run in 2020):
Name: Harry Potter
ID: 118829212
Birth Year: 2000
Gender: M
Age: 20
Department: Computer Science
Harry Potter is qualified for baoyan
Harry Potter is older than 18
'''
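
The explicit call `student.__init__(self, ...)` can also be written with `super()`, which looks up the base class automatically. A minimal sketch with made-up classes `animal` and `dog` (not from the original course) follows; it also shows that a derived-class object counts as an instance of the base class.

```python
class animal:
    def __init__(self, name):
        self.name = name
    def speak(self):
        return self.name + " makes a sound"

class dog(animal):
    def __init__(self, name, trick):
        super().__init__(name)       # same effect as animal.__init__(self, name)
        self.trick = trick
    def speak(self):                 # overrides the base-class method
        return super().speak() + ": Woof"

d = dog("Rex", "roll over")
print(d.speak())                     #>>Rex makes a sound: Woof
print(isinstance(d, animal))         #>>True
```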

The object class
Every class is derived from the object class and therefore has the methods of object.
The __lt__, __gt__ and similar methods inherited from object return NotImplemented, so objects of such a class cannot be ordered by default.

class A:
	def func(x):
		pass
print(dir(A)) # List the attributes and methods of A
''' Output: ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'func'] '''

Static properties and static methods

• A static attribute is shared by all objects of the class; there is only one copy.
• A static method does not act on a specific object and cannot access non-static attributes.
• Static attributes and static methods exist so that fewer global variables and global functions need to be written.

class employee:
	totalSalary = 0 # Static attribute: records the total salary paid to all employees
	def __init__(self,name,income):
		self.name,self.income = name, income
	def pay(self,salary):
		self.income += salary
		employee.totalSalary += salary	# Write employee.totalSalary, not self.totalSalary: this updates the static attribute of employee
	
	@staticmethod
	def printTotalSalary(): # Static method: no self parameter, because it does not act on a specific object
		print(employee.totalSalary)	# A static method cannot access non-static attributes such as self.name, because it has no self parameter

e1 = employee("Jack",0)
e2 = employee("Tom",0)
e1.pay(100)
e2.pay(200)
employee.printTotalSalary() # >>300, call a static method (static member function) as ClassName.methodName
e1.printTotalSalary() # >>300, calling through an object also works; the result does not depend on e1 or e2
e2.printTotalSalary() # >>300
print(employee.totalSalary) # >>300, access a static attribute as ClassName.attributeName
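
The reason pay() writes employee.totalSalary rather than self.totalSalary can be seen in a small experiment. The class `counter` below is made up for illustration.

```python
class counter:
    total = 0                   # static (class) attribute, shared by all objects
    def addWrong(self, n):
        self.total += n         # reads counter.total, then creates an attribute on this object only
    def addRight(self, n):
        counter.total += n      # updates the shared class attribute

c1, c2 = counter(), counter()
c1.addWrong(5)
print(c1.total, c2.total, counter.total)   #>>5 0 0
c2.addRight(7)
print(c1.total, c2.total, counter.total)   #>>5 7 7
```

After addWrong, c1 has its own `total` that hides the shared one, which is why c1.total stays at 5 while the class attribute changes.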

Objects as set elements and dictionary keys

• Dictionaries and sets are hash tables. The "slot" in which an element is stored is found from the element's hash value, which can be regarded as the slot number. Several elements with the same hash value can be stored in one slot.
• If hash(a) != hash(b) for two objects a and b, then a and b can both be in the same set (and can both be keys of different entries in the same dictionary).
• If hash(a) == hash(b) but a == b is false, then a and b can also both be in the same set (and can both be keys of different entries in the same dictionary). They do not count as duplicates, and they can be stored in the same slot.

If dt is a dictionary, dt[x] is evaluated as follows:

  1. Use hash(x) to find the number of the slot where x should be.
  2. If the slot has no elements, dt has no element with key x.
  3. If the slot has elements, look for an element y in the slot whose key == x. If one is found, dt[x] is the value of y. If none is found, dt[x] is undefined, which means dt has no element with key x.
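
The three lookup steps above can be sketched as a toy hash table. This is a simplified model for illustration only: the class `toyDict` is made up, and Python's real dict is implemented differently internally.

```python
class toyDict:
    def __init__(self, slotCount=8):
        self.slots = [[] for _ in range(slotCount)]      # each slot holds (key, value) pairs
    def _slot(self, key):
        return self.slots[hash(key) % len(self.slots)]   # step 1: hash value -> slot number
    def __setitem__(self, key, value):
        slot = self._slot(key)
        for i, (k, _) in enumerate(slot):
            if k == key:                  # an equal key is already stored: overwrite it
                slot[i] = (key, value)
                return
        slot.append((key, value))
    def __getitem__(self, key):
        for k, v in self._slot(key):      # step 3: compare keys in the slot with ==
            if k == key:
                return v
        raise KeyError(key)               # steps 2 and 3: empty slot, or no equal key in it

dt = toyDict()
dt[1], dt[9] = "one", "nine"    # hash(1) % 8 == hash(9) % 8, so both land in the same slot
print(dt[1], dt[9])             #>>one nine
```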

• For objects of a custom class, the hash value is computed from the object's id by default. So for two objects a and b, as long as a is b is false, their hash values differ, and they can both be in one set or both be keys of different entries in the same dictionary.
• You can override the __hash__() method of a custom class so that the hash value depends on the object's value instead of its id. Combined with a matching __eq__, objects with the same value then cannot both be in the same set, and cannot both be keys of different entries in the same dictionary.

class A:
	def __init__(self,x):
		self.x = x
a,b = A(5),A(5) # The two A(5) objects are different objects, so a and b have different ids
dt = {a:20,A(5):30,b:40} # The three keys have different ids, so they are three separate entries
print(len(dt),dt[a],dt[b]) #>>3 20 40
print(dt[A(5)]) # runtime error (KeyError): this new A(5) is not equal to any existing key

• a == b is equivalent to a.__eq__(b). The default __eq__ of a custom class checks whether the two objects have the same id. The default __hash__ of a custom class computes the hash value from the object's id.
• If a custom class overrides the __eq__(self,other) member function, its __hash__ member function is automatically set to None. The class then becomes unhashable.
• A custom class is unhashable only when it overrides __eq__ without also overriding __hash__.
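
The rule about overriding only __eq__ can be checked directly. The class name `onlyEq` below is made up for illustration.

```python
class onlyEq:
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):            # __eq__ is overridden, __hash__ is not
        return isinstance(other, onlyEq) and self.x == other.x

print(onlyEq.__hash__)                  #>>None
try:
    s = {onlyEq(1)}                     # putting the object in a set requires hashing it
except TypeError as e:
    print("TypeError:", e)
```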

# A custom class that overrides both __hash__ and __eq__
class A:
	def __init__(self,x):
		self.x = x
	def __eq__(self,other):
		if isinstance(other,A): # Judge other Is it a class A The object of 
			return self.x == other.x
		elif isinstance(other,int): # If other Is an integer 
			return self.x == other
		else:
			return False
	def __hash__(self):
		return self.x

a = A(3)
print(3 == a) #>>True
b = A(3)
d = {A(5):10,A(3):20,a:30}
print(len(d),d[a],d[b],d[3]) #>>2 30 30 30

GUI programming with tkinter

Control properties and event response


import tkinter as tk


def btLogin_click():  # Event handler of the login button; called when the button is clicked
    if username.get() == "pku" and password.get()  == "123": # Correct username and password
        lbHint["text"] = "Login successful!"  # Change the text of lbHint
        lbHint["fg"] = "black"	# Turn the text black; "fg" is the foreground color, "bg" the background color
    else:
        username.set("")  # Clear the username input box
        password.set("")  # Clear the password input box
        lbHint["fg"] = "red"  # Turn the text red
        lbHint["text"] = "Wrong username or password, please try again!"


def cbPassword_click(): # Event handler of the "Show password" checkbox; called when the checkbox is clicked
    if showPassword.get():  # showPassword is the tkinter Boolean variable bound to cbPassword
        etPassword["show"] = ""  # Make the password input box display the password normally; Entry has a show attribute
    else:
        etPassword["show"] = "*"    # Make the password input box display only '*' characters


win = tk.Tk()
win.title("Sign in")
username, password = tk.StringVar(), tk.StringVar()
# Two string variables, associated with the username input box and the password input box respectively
lbHint = tk.Label(win, text="Please log in")  # Create the widget first, then place it
lbHint.grid(row=0, column=0, columnspan=2)
lbUsername = tk.Label(win, text="Username:")
lbUsername.grid(row=1, column=0, padx=5, pady=5)
lbPassword = tk.Label(win, text="Password:")
lbPassword.grid(row=2, column=0, padx=5, pady=5)
etUsername = tk.Entry(win, textvariable=username)
# The input box etUsername is associated with the variable username
etUsername.grid(row=1, column=1, padx=5, pady=5)
etPassword = tk.Entry(win, textvariable=password, show="*")
# The Entry (single-line edit box) attribute show="*" makes the box display only '*' whatever its content is; show="" displays the content normally
etPassword.grid(row=2, column=1, padx=5, pady=5)
showPassword = tk.BooleanVar()  # Associated with the "Show password" checkbox
showPassword.set(False)  # Make cbPassword unchecked at first
cbPassword = tk.Checkbutton(win, text="Show password",
                            variable=showPassword, command=cbPassword_click)
# cbPassword is associated with the variable showPassword; its event handler is cbPassword_click,
# so clicking it calls cbPassword_click()
cbPassword.grid(row=3, column=0, padx=5, pady=5)
btLogin = tk.Button(win, text="Sign in", command=btLogin_click)
# Clicking the btLogin button runs btLogin_click()
btLogin.grid(row=4, column=0, pady=5)
btQuit = tk.Button(win, text="Quit", command=win.quit)
# Clicking btQuit runs win.quit(), which ends the main loop, so the window closes and the program ends
btQuit.grid(row=4, column=1, pady=5)
win.mainloop()

copyright notice
author[daoboker],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/131/202205110609376642.html
