
Python file content operation

2022-05-15 05:51:04 Tang Keke, Laoxiang, Henan

Classification of files

  • Text file: stores ordinary strings, usually with '\n' marking the end of each line. (An ordinary string is text that can be edited directly in any editor and read directly by humans, such as English letters, Chinese characters, and digit strings.)
  • Binary file: stores the content of objects as a sequence of bytes (bytes). It cannot be edited directly with Notepad or other ordinary word processors and is usually not human-readable; dedicated software is needed to decode, read, display, modify, or execute it. Common examples include image files, audio and video files, executables, resource files, database files, and Office documents.

Basic process of file operations

There are four main steps: open, operate (read or write), save, and close.

Built-in function open()

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
  • file: the name (path) of the file to open.
  • mode: how the file will be used after it is opened (see the modes below).
  • buffering: the buffering policy for reads and writes. 0 disables buffering (binary mode only), 1 selects line buffering (text mode only), and an integer greater than 1 sets the buffer size in bytes. The default, -1, uses the default buffering policy.
  • encoding: the codec used to encode and decode text. It applies only to text mode and can be any encoding Python supports, such as GBK, UTF-8, or CP936.
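The following is a minimal sketch of what the encoding parameter controls. It assumes a throwaway file in a temporary directory; the file name enc_demo.txt is only illustrative. Text written as UTF-8 reads back correctly with the same encoding. Decoding the same bytes as ASCII fails, and binary mode returns the raw bytes without decoding them.

```python
import os
import tempfile

# A temporary directory keeps the example away from real files
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'enc_demo.txt')

    # Write Chinese text explicitly encoded as UTF-8
    with open(path, 'w', encoding='utf-8') as fp:
        fp.write('你好，世界')

    # Reading with the same encoding restores the original text
    with open(path, 'r', encoding='utf-8') as fp:
        text = fp.read()

    # The same bytes are not valid ASCII, so decoding fails
    decode_failed = False
    try:
        with open(path, 'r', encoding='ascii') as fp:
            fp.read()
    except UnicodeDecodeError:
        decode_failed = True

    # Binary mode performs no decoding and returns raw bytes
    with open(path, 'rb') as fp:
        raw = fp.read()

print(text)      # 你好，世界
print(len(raw))  # 15: each of the 5 characters takes 3 bytes in UTF-8
```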

File open modes

  • r: read mode (the default, may be omitted); raises an exception if the file does not exist
  • w: write mode; if the file already exists, its contents are erased first
  • x: exclusive-creation write mode; creates a new file and raises an exception if the file already exists
  • a: append mode; writes go to the end of the file without overwriting its existing contents (the file is created if it is missing)
  • b: binary mode (combined with other modes, e.g. rb, wb)
  • t: text mode (the default, may be omitted)
  • +: read and write (combined with other modes)
  • r+: read and write; raises an error if the file does not exist. w+: read and write; creates the file if it does not exist and erases its contents if it does
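The sketch below exercises several of the modes listed above on one file. It assumes a temporary directory and an illustrative file name, modes_demo.txt. Mode x refuses to overwrite an existing file, a appends, w truncates, r+ overwrites in place without truncating, and r fails on a missing file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'modes_demo.txt')

    # 'x' creates a new file and fails if the file already exists
    with open(path, 'x') as fp:
        fp.write('first\n')
    x_refused = False
    try:
        open(path, 'x')
    except FileExistsError:
        x_refused = True

    # 'a' appends to the end without touching the existing content
    with open(path, 'a') as fp:
        fp.write('second\n')
    with open(path) as fp:
        after_append = fp.read()   # 'first\nsecond\n'

    # 'w' truncates the file before writing
    with open(path, 'w') as fp:
        fp.write('third\n')
    with open(path) as fp:
        after_write = fp.read()    # 'third\n'

    # 'r+' reads and writes without truncating; the pointer starts at 0
    with open(path, 'r+') as fp:
        fp.write('T')              # overwrite the first character
        fp.seek(0)
        after_rplus = fp.read()    # 'Third\n'

    # 'r' on a missing file raises FileNotFoundError
    r_refused = False
    try:
        open(os.path.join(tmp, 'missing.txt'), 'r')
    except FileNotFoundError:
        r_refused = True
```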

Return value of open()

If it succeeds, open() returns a file object, which you then use to read and write the file. If the file object cannot be created, an exception is raised. Typical causes are that the specified file does not exist, access is denied, or the disk is full.

After you finish working with the file's contents, be sure to close the file object. Closing it ensures that all changes are actually written to the file.

f1 = open('file1.txt', 'r')  # open the file in read mode
f1.close()                   # close it when done
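Without with, closing reliably needs try/finally, and a missing file has to be handled as an exception. Below is a minimal sketch under those assumptions. The helper name read_text is hypothetical, and the demo creates its own file1.txt in a temporary directory.

```python
import os
import tempfile

def read_text(path):
    """Hypothetical helper: return the file's text, or None if it does not exist."""
    try:
        f1 = open(path, 'r')
    except FileNotFoundError:
        # open() signals a missing file by raising, not by returning None
        return None
    try:
        return f1.read()
    finally:
        # runs whether read() succeeds or raises, so the file is always closed
        f1.close()

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, 'file1.txt')
    with open(existing, 'w') as fp:
        fp.write('some text\n')
    found = read_text(existing)                                  # 'some text\n'
    missing = read_text(os.path.join(tmp, 'no_such_file.txt'))   # None
```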

Common methods of file objects

  • close(): writes the buffer contents to the file, closes the file, and releases the file object
  • flush(): writes the buffer contents to the file without closing it
  • read([size]): reads size characters from a text file (Python 3.x), or size bytes from a binary file, and returns them; if size is omitted, all remaining content is read
  • readline(): reads one line from a text file and returns it
  • readlines(): returns a list in which each line of the text file is stored as a string
  • seek(offset[, whence]): moves the file pointer to a new byte position, offset bytes relative to whence. whence=0 counts from the start of the file, 1 from the current position, and 2 from the end of the file; the default is 0
  • tell(): returns the current position of the file pointer
  • write(s): writes the content of s to the file
  • writelines(s): writes a list of strings to a text file without adding line breaks
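The short sketch below uses most of the methods listed above on one small file. It assumes a temporary directory and an illustrative file name. Text-mode files reject nonzero seeks relative to the end of the file (whence=2), so the end-relative seek is done in binary mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'methods_demo.txt')

    # newline='' disables newline translation, so the file holds exactly '\n'
    with open(path, 'w', encoding='utf-8', newline='') as fp:
        fp.write('line 1\n')                     # write() adds no newline by itself
        fp.writelines(['line 2\n', 'line 3\n'])  # writelines() adds none either

    with open(path, 'r', encoding='utf-8') as fp:
        first = fp.readline()   # 'line 1\n'
        rest = fp.readlines()   # ['line 2\n', 'line 3\n']
        fp.seek(0)              # back to the start of the file
        head = fp.read(4)       # 'line': size counts characters in text mode

    # Text-mode files reject nonzero end-relative seeks, so use 'rb' here
    with open(path, 'rb') as fp:
        fp.read(4)
        pos = fp.tell()         # 4: bytes consumed so far
        fp.seek(-7, 2)          # whence=2: 7 bytes before the end of the file
        tail = fp.read()        # b'line 3\n'
```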

Context management statement: with

In real-world development, prefer the with statement when reading and writing files. with manages resources automatically. However control leaves the with block, even when the code raises an exception, the file is guaranteed to be closed properly, and the state that existed before the block is restored once it finishes. The statement is commonly used for file operations, database connections, network connections, and lock management when synchronizing threads and processes.

with open(filename, mode, encoding=encoding) as fp:
    ...  # operate on the file through fp
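This minimal sketch checks the guarantee described above. It assumes a temporary directory and an illustrative file name. An exception is raised on purpose inside the with block, yet the file object is closed afterwards and the buffered write still reaches the disk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'with_demo.txt')
    try:
        with open(path, 'w', encoding='utf-8') as fp:
            fp.write('data written before the error\n')
            raise RuntimeError('simulated failure inside the with block')
    except RuntimeError:
        pass
    closed_after_error = fp.closed   # True: leaving the with block closed the file

    with open(path, encoding='utf-8') as fp:
        saved = fp.read()            # the buffered write was flushed on close
```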

Examples

Write to a text file, then read it back.

s = 'Hello world\nI always like deer\nmikudaisuki\n'
with open('sample.txt', 'w') as fp:  # no encoding given: uses the platform default (cp936 on Chinese Windows)
    fp.write(s)
with open('sample.txt') as fp:       # read back with the same default encoding
    print(fp.read())

Iterate over and print all lines of a text file.

with open('sample.txt') as fp:  # assumes the file uses the platform default encoding (e.g. CP936)
    for line in fp:             # file objects can be iterated line by line
        print(line, end='')     # each line already ends with '\n'

Assume the file data.txt contains several integers, one per line. Write a program that reads all the integers, sorts them in descending order, and writes them to the text file data_desc.txt.

with open('data.txt', 'r') as fp:
    data = fp.readlines()                   # read all lines into a list
data = [int(item) for item in data]         # list comprehension: convert to integers
data.sort(reverse=True)                     # sort in descending order
data = [str(item) + '\n' for item in data]  # convert back to strings, one per line
with open('data_desc.txt', 'w') as fp:      # write the results to a file
    fp.writelines(data)
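A slightly more defensive variant is sketched below, assuming the same one-integer-per-line layout. The function name sort_desc is hypothetical. It skips blank lines, such as a trailing empty line, which would otherwise make int() raise ValueError. It also uses sorted() with a single write() call. The demo builds its own data.txt in a temporary directory.

```python
import os
import tempfile

def sort_desc(in_path, out_path):
    """Hypothetical helper: read integers (one per line), write them in descending order."""
    with open(in_path, 'r') as fp:
        # skip blank lines; int() tolerates surrounding whitespace such as '\n'
        numbers = [int(line) for line in fp if line.strip()]
    with open(out_path, 'w') as fp:
        fp.write(''.join(f'{n}\n' for n in sorted(numbers, reverse=True)))

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'data.txt')
    dst = os.path.join(tmp, 'data_desc.txt')
    with open(src, 'w') as fp:
        fp.write('3\n10\n\n-2\n7\n')
    sort_desc(src, dst)
    with open(dst) as fp:
        result = fp.read()   # '10\n7\n3\n-2\n'
```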

Find the length and content of the longest line in a text file.

with open('sample.txt') as fp:
    result = [0, '']          # [length, line] of the longest line seen so far
    for line in fp:
        t = len(line)         # the length includes the trailing '\n'
        if t > result[0]:
            result = [t, line]
print(result)
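The same search can be written with the built-in max() and key=len, which iterates the file lazily just like the loop above. Below is a sketch that assumes a sample.txt created in a temporary directory. On ties, max() returns the first longest line, which matches the loop's strict > comparison.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w', encoding='utf-8') as fp:
        fp.write('Hello world\nI always like deer\nmikudaisuki\n')

    with open(path, encoding='utf-8') as fp:
        # key=len compares line lengths; default='' covers an empty file
        longest = max(fp, key=len, default='')
    result = [len(longest), longest]

print(result)  # [19, 'I always like deer\n']
```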

Binary file operations

Binary files should not be read or written with Notepad or other text editors. A Python file object can read the raw bytes of a binary file, but it cannot interpret them on its own. To understand the contents of a binary file accurately, you must know the file's structure and the serialization rules used to write it, and then apply the matching deserialization rules.
In short, serialization is the process of converting in-memory data into binary form without losing its type information. After correct deserialization, the serialized data must be restorable exactly to the original object.
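To make "without losing its type information" concrete, here is a small sketch that stores the same list two ways. It assumes a temporary directory and illustrative file names. Written as text, the list comes back as a plain string. Serialized with pickle (covered next), it comes back as a list.

```python
import os
import pickle
import tempfile

data = [1, 2, 3]

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, 'as_text.txt')
    bin_path = os.path.join(tmp, 'as_pickle.dat')

    # Writing str(data) keeps the characters but not the type
    with open(txt_path, 'w') as fp:
        fp.write(str(data))
    with open(txt_path) as fp:
        from_text = fp.read()          # '[1, 2, 3]': a str, not a list

    # Serializing with pickle keeps the type information
    with open(bin_path, 'wb') as fp:
        pickle.dump(data, fp)
    with open(bin_path, 'rb') as fp:
        raw = fp.read()                # opaque bytes, not meant for human reading
    from_pickle = pickle.loads(raw)    # [1, 2, 3]: a list again
```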

pickle module

# Write
import pickle
data = ("walnut", 2333, [1, 2, 3])
with open('sample_pickle.dat', 'wb') as f:
    try:
        pickle.dump(len(data), f)   # first store the number of objects
        for item in data:
            pickle.dump(item, f)    # serialize each object and write it to the file
    except Exception as e:
        print('Error writing file:', e)
# Read
with open('sample_pickle.dat', 'rb') as f:
    n = pickle.load(f)              # number of objects that follow
    for i in range(n):
        x = pickle.load(f)          # read and deserialize each object
        print(x)
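Storing a count first and then each item works, but the whole tuple can also be pickled in a single call. pickle.dumps()/pickle.loads() do the same in memory with bytes. Below is a sketch assuming a temporary directory.

```python
import os
import pickle
import tempfile

data = ("walnut", 2333, [1, 2, 3])

# In memory: dumps() returns bytes, loads() rebuilds the object
blob = pickle.dumps(data)
copy_in_memory = pickle.loads(blob)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample_pickle.dat')
    # One dump() of the whole tuple replaces the count + per-item loop
    with open(path, 'wb') as f:
        pickle.dump(data, f)
    with open(path, 'rb') as f:
        copy_from_file = pickle.load(f)
```

Note that unpickling can execute arbitrary code, so pickle.load() should only be used on data from a trusted source.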

struct module

import struct
# serialize: i is an integer, f a (single-precision) float, ? a boolean
sn = struct.pack('if?', 123, 114.514, True)
s = "Walnut, I really like you!"
with open('sample_struct.dat', 'wb') as f:
    f.write(sn)
    f.write(s.encode())  # a str must be encoded to bytes before writing to a binary file

Read

with open('sample_struct.dat', 'rb') as f:
    sn = f.read(struct.calcsize('if?'))  # 9 bytes for the format 'if?'
    n, x, b1 = struct.unpack('if?', sn)  # deserialize using the same format
    print('n=', n, 'x=', x, 'b1=', b1)
    s = f.read().decode()                # the rest of the file is the encoded string
    print('s=', s)
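Two details are easy to miss here. First, a format without a prefix uses native byte order and alignment, so a prefix such as '<' (little-endian, standard sizes, no padding) makes the file layout portable. Second, 'f' is a 4-byte single-precision float, so 114.514 does not come back exactly; 'd' (8-byte double) does. Below is a minimal sketch.

```python
import struct

fmt = '<if?'                          # '<': little-endian, standard sizes, no padding
size = struct.calcsize(fmt)           # 4 + 4 + 1 = 9 bytes
packed = struct.pack(fmt, 123, 114.514, True)
n, x, b1 = struct.unpack(fmt, packed)

# 'f' rounds to single precision, so the value is close but not identical
x_is_exact = (x == 114.514)           # False
x_close = abs(x - 114.514) < 1e-4     # True

# 'd' stores an 8-byte double and round-trips a Python float exactly
(x_double,) = struct.unpack('<d', struct.pack('<d', 114.514))
```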

shelve module

Write

import shelve
zhangsan = {'age': 38, 'sex': 'Male', 'address': 'SDIBT'}
lisi = {'age': 40, 'sex': 'Male', 'qq': '1234567', 'tel': '7654321'}
with shelve.open('shelve_test.dat') as fp:
    fp['zhangsan'] = zhangsan  # store data in the file as if assigning to a dictionary
    fp['lisi'] = lisi
    for i in range(5):
        fp[str(i)] = str(i)

Read

with shelve.open('shelve_test.dat') as fp:
    print(fp['zhangsan']) # Read and display the contents of the file 
    print(fp['zhangsan']['age'])
    print(fp['lisi']['qq'])
    print(fp['3'])

Result

{'age': 38, 'sex': 'Male', 'address': 'SDIBT'}
38
1234567
3
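One behavior of shelve surprises many users. Mutating a nested value in place, as in fp['zhangsan']['age'] = 39, changes only a temporary copy and is not saved. Below is a sketch of the two usual fixes, assuming a temporary directory and an illustrative shelf name: reassign the whole value, or open the shelf with writeback=True.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'shelve_demo')

    with shelve.open(path) as db:
        db['zhangsan'] = {'age': 38, 'sex': 'Male'}
        # db['zhangsan'] returns a fresh copy; changing it is NOT saved
        db['zhangsan']['age'] = 39
    with shelve.open(path) as db:
        age_without_writeback = db['zhangsan']['age']   # still 38

    # Fix 1: read, modify, then assign the whole value back
    with shelve.open(path) as db:
        record = db['zhangsan']
        record['age'] = 39
        db['zhangsan'] = record
    with shelve.open(path) as db:
        age_after_reassign = db['zhangsan']['age']      # 39

    # Fix 2: writeback=True caches accessed entries and writes them on close
    with shelve.open(path, writeback=True) as db:
        db['zhangsan']['age'] = 40
    with shelve.open(path) as db:
        age_with_writeback = db['zhangsan']['age']      # 40
```

writeback=True is convenient but keeps every accessed entry in memory until the shelf is closed, so reassigning is usually cheaper for large shelves.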

marshal module

Write

import marshal  # import the module
x = [30, 5.0, [1, 2, 3], (4, 5, 6), {'a': 1, 'b': 2, 'c': 3}, {8, 9, 7}]
with open('test.dat', 'wb') as fp:  # create a binary file
    marshal.dump(len(x), fp)        # write the number of objects first
    for item in x:
        marshal.dump(item, fp)      # serialize each object in the list and write it to the file

Read

with open('test.dat', 'rb') as fp:  # open the binary file
    n = marshal.load(fp)            # get the number of objects
    for i in range(n):
        print(marshal.load(fp))     # deserialize and print each object

Result

30
5.0
[1, 2, 3]
(4, 5, 6)
{'a': 1, 'b': 2, 'c': 3}
{8, 9, 7}
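marshal also offers in-memory dumps()/loads(). Unlike pickle, it handles only core built-in types, so an instance of a user-defined class is rejected with ValueError. Below is a sketch; the class Point is hypothetical and exists only to trigger that error.

```python
import marshal

x = [30, 5.0, [1, 2, 3], (4, 5, 6), {'a': 1, 'b': 2, 'c': 3}, {8, 9, 7}]

# dumps()/loads() work on bytes in memory instead of a file
blob = marshal.dumps(x)
restored = marshal.loads(blob)

class Point:  # hypothetical user-defined class
    pass

# marshal supports only core built-in types; other objects raise ValueError
try:
    marshal.dumps(Point())
    rejected = False
except ValueError:
    rejected = True
```

The marshal format may change between Python versions, which is why the Python documentation recommends pickle for general persistence.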

Working with Excel files

Use the extension library openpyxl to read and write Excel 2007 and later (.xlsx) files.

import openpyxl
from openpyxl import Workbook
fn = r'f:\test.xlsx'                          # file name
wb = Workbook()                               # create a workbook (it already contains one sheet)
ws = wb.create_sheet(title='Hello, world')    # create a worksheet
ws['A1'] = 'This is the first cell'           # assign a cell value
ws['B1'] = 3.1415926
wb.save(fn)                                   # save the Excel file
wb = openpyxl.load_workbook(fn)               # open an existing Excel file
ws = wb.worksheets[1]                         # select the worksheet by index
print(ws['A1'].value)                         # read and print a cell value
ws.append([1, 2, 3, 4, 5])                    # append a row of data
ws.merge_cells('F2:F3')                       # merge cells
ws['F2'] = '=SUM(A2:E2)'                      # write a formula
for r in range(10, 15):
    for c in range(3, 8):
        ws.cell(row=r, column=c, value=r * c) # write cell data
wb.save(fn)

Convert the Notepad file test.txt into an Excel 2007+ file. Assume that the first line of test.txt is the header, that the actual data starts on the second line, and that fields in both the header and the data lines are separated by commas.

from openpyxl import Workbook

def main(txtFileName):
    new_XlsxFileName = txtFileName[:-3] + 'xlsx'  # test.txt -> test.xlsx
    wb = Workbook()
    ws = wb.worksheets[0]                         # use the default worksheet
    with open(txtFileName) as fp:
        for line in fp:
            line = line.strip().split(',')        # split the fields on commas
            ws.append(line)                       # write the fields as one row
    wb.save(new_XlsxFileName)

main('test.txt')
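Splitting on ',' breaks on quoted fields such as "Smith, J", and every value ends up in Excel as text. The parsing step could instead use the standard csv module and convert numeric fields. The sketch below assumes the text file is UTF-8. The helper names read_rows and to_number are hypothetical. Each returned row could then be passed to ws.append() as in main() above.

```python
import csv
import os
import tempfile

def to_number(text):
    """Hypothetical helper: '90' -> 90, '87.5' -> 87.5, anything else stays text."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text

def read_rows(txt_path):
    """Hypothetical helper: parse a comma-separated text file into a list of rows."""
    rows = []
    # assumption: the file is UTF-8; newline='' is what the csv module expects
    with open(txt_path, newline='', encoding='utf-8') as fp:
        for fields in csv.reader(fp):
            if not fields:        # skip blank lines
                continue
            if rows:              # every row after the header is a data row
                fields = [to_number(f.strip()) for f in fields]
            rows.append(fields)
    return rows

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.txt')
    with open(path, 'w', encoding='utf-8', newline='') as fp:
        fp.write('name,score\nAlice,90\nBob,87.5\n\n"Smith, J",70\n')
    rows = read_rows(path)
```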

Copyright notice
Author: Tang Keke, Laoxiang, Henan. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/131/202205110611060215.html
