
Python office automation - 91 - Word file automation - Word operations and reading Word files

2022-05-15 05:08:20 Husky eager for power

Wanyeji
Faint thunder, cloudy sky.
But I hope the wind and rain will come, so I can keep you here.

Preface
About the author: Husky who yearns for power (you can call me Siberian Husky), a blogger dedicated to explaining technical knowledge in plain language.
CSDN certified blog expert, top_1 in the full-stack track of Nova Plan Season 3, Huawei Cloud sharing expert, Alibaba Cloud expert blogger.
If anything in this article is wrong, please correct me! Let's learn and make progress together.
Motto: being superior to others is not noble; true nobility is being superior to your former self.
If you find the article useful, please follow, like, and bookmark it to support the blogger.



In the previous chapters we learned how to work with ordinary files: creating, copying and pasting, cutting and pasting, renaming, deleting, and so on. We also worked through some basic exercises, such as finding files by name and finding files by their content.

From this chapter on, we will start learning how to automate some special file types, such as Word, Excel, and PPT. Although they are special file types, they are in fact ones we use all the time in everyday work.

Let's begin with automating Word documents.

New modules covered in this chapter

  • python-docx
  • pdfkit
  • pydocx

Using Python to read files in bulk

python-docx: a powerful tool for Word

python-docx is a Python library for creating and modifying Microsoft Word documents. It provides a complete set of Word operations and is one of the most commonly used Word tools.

Before using it, let's go over a few concepts:

  • Document: a Word file object. Unlike the Worksheet concept in VBA, each Document is independent: opening different Word files yields different Document objects, and they do not affect each other.
  • Paragraph: a paragraph. A Word document consists of multiple paragraphs. Pressing Enter in the document starts a new paragraph; pressing Shift + Enter inserts a line break without starting a new paragraph.
  • Run: a run of text. Each paragraph consists of one or more runs; continuous text with the same style within a paragraph makes up one run, so a Paragraph object holds a list of Run objects.

For example, take the following Word document:



The structure of this Word document breaks down as follows:



Installing python-docx

Install:

pip install python-docx

If the installation is too slow, you can switch to a domestic (China) mirror, as follows:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-docx


Import (the package is installed as python-docx but imported as docx):

import docx

from docx import …

python-docx and Document

Import the package and class:

from docx import Document


Usage:

Document(path to the Word file)


Return value:

A Word file object (Document)

Reading paragraph content with python-docx

Reading a Word file mostly comes down to reading its paragraphs and tables. Both paragraphs and tables contain strings, and our goal is to read those strings.

Let's first look at how to read paragraph content:

Source:

document_obj.paragraphs

The paragraphs attribute of the Document object returns a list of paragraph objects; if the Word file contains multiple paragraphs, the list contains one paragraph object for each.


Usage:

Loop over the list to get each paragraph object, and read its text attribute

The demonstration script is as follows:

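# coding:utf-8


import os
from docx import Document


# Build the path to the sample file relative to the current working directory
path = os.path.join(os.getcwd(), 'test_file/Text.docx')
print("The path of 'Text.docx' is:", path)     # Debug: check the path

doc = Document(path)

# Print the text of every paragraph in the document
for p in doc.paragraphs:
    print(p.text)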

The output is as follows (PS: the text is just a demonstration; I am not from a training institution!):



Reading table content with python-docx

Next, let's look at how to read the table content in a Word document:

Source:

document_obj.tables

The tables attribute of the Document object returns a list of tables; each item in the list is a table object.


Usage:

Again, loop over the tables to get the contents of their rows and columns


Return value:

The text of each table cell (a string)

The demonstration code is as follows:

# coding:utf-8


import os
from docx import Document


path = os.path.join(os.getcwd(), 'test_file/Text.docx')
print("The path of 'Text.docx' is:", path)     # Debug: check the path

doc = Document(path)

# for p in doc.paragraphs:
#     print(p.text)

for t in doc.tables:            # Loop over the table objects
    for row in t.rows:          # Get each row of the table
        row_str = []
        for cell in row.cells:  # Collect the text of every cell in this row
            row_str.append(cell.text)
        print(row_str)          # Print the row once all of its cells are collected

# The contents can also be read column by column through "columns"; try it yourself

The output is as follows:



Copyright notice
Author: [Husky eager for power]. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/131/202205111302321904.html
