Resume Automation - word 92
2022-05-15 02:52:01【Husky eager for power】
Wanyeji:
Faint thunder, cloudy sky.
But I hope the wind and rain will come, so I can keep you here.
Preface:
About the author: Husky who yearns for power (you can call me Siberian Husky), a blogger dedicated to explaining technical knowledge in plain language.
CSDN Blog Expert certification, Nova Plan Season 3 full-stack track top_1, Huawei Cloud sharing expert, Alibaba Cloud expert blogger.
If anything in this article is wrong, please correct me! Let's learn and make progress together.
Motto: being superior to some other man is not noble; the true nobility is to be superior to one's former self.
In the previous chapter we practiced extracting the information in a Word file through its tables and paragraphs. Now let's do a small practical exercise: read a batch of resumes and screen out the ones that meet the recruitment requirements. Let's see how to implement this small feature.
Resume screening
The resume used in this exercise contains the following information:

Defining the ReadDoc class to read Word files
Known condition:
We want to find the resumes that contain specified keywords (such as Python or Java).
Implementation idea:
Read the Word files in a batch (using glob to collect them), get all of their readable content, filter it by keyword, and obtain the paths of the target resumes.
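The batch-collection step of this idea can be sketched on its own. This is only a minimal sketch, not the script used later in this article: `find_docx_files` is a hypothetical helper name, and skipping names that start with `~$` (the lock files Word leaves next to an open document) is an added assumption.

```python
import glob
import os


def find_docx_files(folder):
    """Return the sorted paths of the .docx files directly inside folder."""
    pattern = os.path.join(folder, '*.docx')
    paths = []
    for path in glob.glob(pattern):
        name = os.path.basename(path)
        # Assumption: names starting with "~$" are Word lock files, not resumes.
        if os.path.isfile(path) and not name.startswith('~$'):
            paths.append(path)
    return sorted(paths)


if __name__ == '__main__':
    # List the candidate resumes in the current working directory.
    for path in find_docx_files(os.getcwd()):
        print(path)
```

Matching on `*.docx` in the glob pattern means directories and other file types never reach the reading step.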
One thing to note here: not every resume is laid out as paragraphs. For instance, resumes downloaded from the "Liepin" website are in table form, while resumes downloaded from "boss" are in paragraph form, so pay attention to this when reading them. The demo script in this exercise works on a resume in table form.
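If you are not sure which form a resume uses, a rough check is to compare how much text sits in body paragraphs and how much sits in table cells. The sketch below is an illustration only: `resume_layout` is a hypothetical helper, and the "whichever holds more text" rule is an assumption, not something python-docx provides. It only relies on the `paragraphs`, `tables`, `rows`, `cells` and `text` attributes of a python-docx document.

```python
def resume_layout(doc):
    """Guess whether a python-docx Document keeps its text in tables or paragraphs."""
    # Length of the text in body paragraphs, ignoring surrounding whitespace.
    para_chars = sum(len(p.text.strip()) for p in doc.paragraphs)
    # Length of the text in table cells, ignoring surrounding whitespace.
    table_chars = sum(
        len(cell.text.strip())
        for table in doc.tables
        for row in table.rows
        for cell in row.cells
    )
    if para_chars == 0 and table_chars == 0:
        return 'empty'
    # Assumption: the form that holds more text is the form of the resume.
    return 'table' if table_chars > para_chars else 'paragraph'


# Usage with python-docx (the path is a placeholder):
# from docx import Document
# print(resume_layout(Document('test_file/resume1.docx')))
```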
For this we can define a dedicated "ReadDoc" class with two functions, one for reading paragraphs and one for reading tables.
The practical case script is as follows:
# coding:utf-8
import os

from docx import Document


class ReadDoc(object):  # Define a ReadDoc class to read a Word file
    def __init__(self, path):  # The constructor takes the path of the Word file and reads it right away
        self.doc = Document(path)
        self.p_text = ''
        self.table_text = ''
        self.get_para()
        self.get_table()

    def get_para(self):  # get_para reads the paragraphs of the Word document
        for p in self.doc.paragraphs:
            self.p_text += p.text + '\n'  # End each paragraph with a newline
        print(self.p_text)

    def get_table(self):  # get_table loops over the contents of every table
        for table in self.doc.tables:
            for row in table.rows:
                _cell_str = ''  # Collects the full text of the current row
                for cell in row.cells:
                    _cell_str += cell.text + ','  # Separate the cells of a row with ","
                self.table_text += _cell_str + '\n'  # End each row with a newline
        print(self.table_text)


if __name__ == '__main__':
    # Build the path of the sample resume below the current working directory
    path = os.path.join(os.getcwd(), 'test_file/resume1.docx')
    doc = ReadDoc(path)
    print(doc)  # Prints the default object repr; the text itself is printed by get_para and get_table
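One caveat about `get_table`: when a table contains horizontally merged cells, python-docx reports the merged cell once for every grid column it spans, so the same text can show up several times in one row. The sketch below is one possible workaround, not part of the script above: `row_text` is a hypothetical helper, and treating an identical neighbouring text as a repeat is a heuristic that would also collapse two genuinely equal adjacent cells.

```python
def row_text(row, sep=','):
    """Join the cell texts of one table row, skipping adjacent repeats."""
    texts = []
    for cell in row.cells:
        text = cell.text.strip()
        # Heuristic: a cell with the same text as its left neighbour is
        # assumed to be the same merged cell reported again.
        if texts and text == texts[-1]:
            continue
        texts.append(text)
    return sep.join(texts)
```

Inside `get_table`, the inner loop over `row.cells` could then be replaced by a single call such as `self.table_text += row_text(row) + '\n'`.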
Let's look at the result of running the ReadDoc class:

Defining the search_word function to filter the resumes whose content matches
We have now read the resume Word file successfully. Next, we check the content we read against the chosen keywords and pick out the resumes that contain them.
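The keyword check itself is small enough to look at in isolation. Note that Python's `in` operator is case-sensitive, so the keyword 'python' does not match a resume that only says "Python". The sketch below is one possible way to handle that: `contains_all` and its `ignore_case` parameter are hypothetical names introduced here for illustration.

```python
def contains_all(text, targets, ignore_case=True):
    """Return True when every keyword in targets occurs in text."""
    if ignore_case:
        # Compare lower-cased copies so "Python" and "python" are treated alike.
        text = text.lower()
        targets = [t.lower() for t in targets]
    return all(t in text for t in targets)


if __name__ == '__main__':
    sample = 'Skills: Python, Golang, React'
    print(contains_all(sample, ['python', 'golang']))
    print(contains_all(sample, ['python', 'golang'], ignore_case=False))
```

A resume is kept only when all keywords are present; a single missing keyword rejects it.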
The practical case script is as follows:
# coding:utf-8
import glob
import os

from docx import Document


class ReadDoc(object):  # Define a ReadDoc class to read a Word file
    def __init__(self, path):  # The constructor takes the path of the Word file and reads it right away
        self.doc = Document(path)
        self.p_text = ''
        self.table_text = ''
        self.get_para()
        self.get_table()

    def get_para(self):  # get_para reads the paragraphs of the Word document
        for p in self.doc.paragraphs:
            self.p_text += p.text + '\n'  # End each paragraph with a newline
        # print(self.p_text)  # Debug output: the paragraph text of the document

    def get_table(self):  # get_table loops over the contents of every table
        for table in self.doc.tables:
            for row in table.rows:
                _cell_str = ''  # Collects the full text of the current row
                for cell in row.cells:
                    _cell_str += cell.text + ','  # Separate the cells of a row with ","
                self.table_text += _cell_str + '\n'  # End each row with a newline
        # print(self.table_text)  # Debug output: the table text of the document


def search_word(path, targets):  # Screen the resumes whose content matches; targets is a list of keywords
    result = glob.glob(path)
    final_result = []  # Paths of the files that contain every keyword
    for i in result:  # Loop over the paths that glob found
        isuse = True  # Whether the file is usable
        if os.path.isfile(i):  # Skip anything that is not a file
            if i.endswith('.docx'):  # Only .docx files are read with ReadDoc
                doc = ReadDoc(i)
                p_text = doc.p_text  # Text of the paragraphs
                table_text = doc.table_text  # Text of the tables
                all_text = p_text + table_text
                for target in targets:  # Check that every keyword occurs in the text
                    if target not in all_text:
                        isuse = False
                        break
                if not isuse:
                    continue
                final_result.append(i)
    return final_result


if __name__ == '__main__':
    path = os.path.join(os.getcwd(), '*')
    # "event tracking" was added to resume1.docx on purpose to demonstrate the effect
    result = search_word(path, ['python', 'golang', 'react', 'event tracking'])
    print(result)
The result of running the script is as follows:

copyright notice
author[Husky eager for power],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/132/202205120053089437.html