Resume Automation - word 92

2022-05-15 02:52:01 Husky eager for power

Man'yōshū (Wanyeji)
Faint thunder, cloudy sky.
I hope the wind and rain will come, so that I can keep you here.

Preface
About the author: Husky who yearns for power (you can call me Husky), a blogger dedicated to explaining technical knowledge in plain language.
CSDN blog expert, top 1 in the full-stack track of Nova Plan Season 3, Huawei Cloud sharing expert, Alibaba Cloud expert blogger.
If anything in this article is wrong, please correct me! Let's learn and improve together.
Motto: There is nothing noble in being superior to some other man; the true nobility is in being superior to one's former self.
If you like the blogger's articles, please follow, like, and bookmark them to show your support.


Series columns:
    Python full-stack series [updating] (this article belongs to this series)
        Python for absolute beginners
        Python advanced syntax
        Python office automation
    Network security series
        Pitfalls on the road to network security
        Network security basics
        Vulhub vulnerability reproduction
        Shell script programming
        Web attack and defense (updates stopped on 2021-09-03; moved to security communities such as Prophet)
        Collection of penetration tools (updates stopped on 2021-09-03; moved to security communities such as Prophet)
    Test engineer series
        Testing tools: Charles
        Testing tools: Fiddler
        Testing tools: JMeter
        Automation: RobotFramework series
        Automation: web UI automation based on Java
        Automation: app UI automation based on MonkeyRunner

In the previous chapter we practiced extracting information from a Word file through its tables and paragraphs. Now let's do a small hands-on exercise: read a batch of resumes and pick out the ones that meet the recruitment requirements. Let's see how to implement this small feature.

Resume screening

The resume information is as follows:



Define the ReadDoc class to read Word files

Known condition:

We want to find the resumes that contain the specified keywords (such as Python or Java).


Implementation idea:

Read the Word files in a batch (using glob to collect the Word file paths), extract all of their readable content, filter it by keyword, and obtain the paths of the target resumes.
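The batch-collection step can be sketched on its own as follows. This is only a minimal sketch: the helper name `find_docx_files` is made up for illustration and is not part of the original script, and skipping file names that start with `~$` (the temporary lock files Word creates next to an open document) is an added assumption.

```python
import glob
import os


def find_docx_files(directory):
    """Return the sorted paths of the .docx files directly inside `directory`."""
    pattern = os.path.join(directory, '*.docx')
    paths = []
    for path in glob.glob(pattern):
        name = os.path.basename(path)
        # Assumption: skip Word's temporary lock files such as "~$resume1.docx".
        if name.startswith('~$'):
            continue
        # Keep regular files only, in case a directory happens to end in ".docx".
        if os.path.isfile(path):
            paths.append(path)
    return sorted(paths)
```

Putting the `.docx` suffix into the glob pattern means the suffix does not have to be checked again later; the script in this article globs `*` and checks the suffix afterwards, which gives the same set of files.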


One thing to note here: not every resume is laid out as paragraphs. For example, resumes downloaded from the Liepin website are in table form, while resumes downloaded from BOSS Zhipin are in paragraph form, so keep this in mind when reading them. The demo script in this exercise works on a resume in table form.
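To check which layout a given resume uses, something like the sketch below may help. The function name `describe_layout` is hypothetical. It only relies on the `paragraphs`, `tables`, `rows`, `cells` and `text` attributes that python-docx objects expose, so it accepts any object with the same shape.

```python
def describe_layout(doc):
    """Report whether a document keeps its text in paragraphs, tables, or both."""
    # doc.paragraphs covers body paragraphs; text inside tables is read via doc.tables.
    has_paragraph_text = any(p.text.strip() for p in doc.paragraphs)
    has_table_text = any(
        cell.text.strip()
        for table in doc.tables
        for row in table.rows
        for cell in row.cells
    )
    if has_paragraph_text and has_table_text:
        return 'both'
    if has_table_text:
        return 'table'
    if has_paragraph_text:
        return 'paragraph'
    return 'empty'


# Intended usage (requires python-docx; the path is a placeholder):
# from docx import Document
# print(describe_layout(Document('test_file/resume1.docx')))
```

Reading both the paragraphs and the tables, as the ReadDoc class in this article does, avoids having to branch on the layout at all; the check is mainly useful for debugging a resume that unexpectedly yields no text.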


For this, we can define a dedicated ReadDoc class with two methods: one for reading paragraphs and one for reading tables.

The hands-on script is as follows:

# coding:utf-8

import os

from docx import Document


class ReadDoc(object):              # ReadDoc reads the content of a Word file
    def __init__(self, path):       # The constructor takes the path of the Word file and reads it immediately
        self.doc = Document(path)
        self.p_text = ''
        self.table_text = ''

        self.get_para()
        self.get_table()


    def get_para(self):             # get_para reads the paragraphs of the Word document
        for p in self.doc.paragraphs:
            self.p_text += p.text + '\n'    # End each paragraph with a newline
        print(self.p_text)


    def get_table(self):            # get_table loops over the tables and reads their content
        for table in self.doc.tables:
            for row in table.rows:
                _cell_str = ''      # Collects the complete text of one row
                for cell in row.cells:
                    _cell_str += cell.text + ','    # Separate the cells of a row with ","
                self.table_text += _cell_str + '\n'     # End each table row with a newline
        print(self.table_text)


if __name__ == '__main__':
    path = os.path.join(os.getcwd(), 'test_file/resume1.docx')
    doc = ReadDoc(path)     # The constructor already prints the paragraph and table text
    print(doc)              # Prints only the default object representation

Take a look at the result of running the ReadDoc class:
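The get_table method above builds each row by repeated string concatenation, which leaves a trailing comma at the end of every line. A possible alternative is sketched below with `str.join`. The helper name `rows_to_text` is hypothetical, and it takes plain lists of strings so that it can be tried without a Word file.

```python
def rows_to_text(rows):
    """Join the cell texts of each row with ',' and the rows with newlines."""
    lines = [','.join(cells) for cells in rows]
    return '\n'.join(lines)


# With python-docx, the input could be built like this (doc is a Document):
# rows = [[cell.text for cell in row.cells]
#         for table in doc.tables
#         for row in table.rows]
print(rows_to_text([['Name', 'Husky'], ['Skill', 'Python']]))
```

For keyword screening the trailing comma is harmless, so this is a readability choice rather than a fix.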



Define the search_word function to pick out the Word documents whose content matches the desired resume

Now that we have successfully read the resume Word file, the next step is to check the content that was read against the chosen keywords and select the resumes that contain them.

The hands-on script is as follows:
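The keyword check itself can be sketched in isolation as follows. The name `contains_all` and the `ignore_case` switch are assumptions added for illustration. The `in` test used in this article is case-sensitive, so the keyword 'python' does not match a resume that only says 'Python' unless the case is folded first.

```python
def contains_all(text, targets, ignore_case=False):
    """Return True if every keyword in `targets` occurs in `text`."""
    if ignore_case:
        # casefold() is a more aggressive lower() intended for caseless matching.
        text = text.casefold()
        targets = [target.casefold() for target in targets]
    return all(target in text for target in targets)


resume_text = 'Name,Husky,\nSkills,Python,Golang,React,\n'
print(contains_all(resume_text, ['python', 'golang']))
print(contains_all(resume_text, ['python', 'golang'], ignore_case=True))
```

Whether to ignore case depends on the recruitment requirement; for technology names it is usually the friendlier default.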

# coding:utf-8

import glob
import os

from docx import Document


class ReadDoc(object):              # ReadDoc reads the content of a Word file
    def __init__(self, path):       # The constructor takes the path of the Word file and reads it immediately
        self.doc = Document(path)
        self.p_text = ''
        self.table_text = ''

        self.get_para()
        self.get_table()


    def get_para(self):             # get_para reads the paragraphs of the Word document
        for p in self.doc.paragraphs:
            self.p_text += p.text + '\n'    # End each paragraph with a newline
        # print(self.p_text) # Debug output: the paragraph text of the Word document


    def get_table(self):            # get_table loops over the tables and reads their content
        for table in self.doc.tables:
            for row in table.rows:
                _cell_str = ''      # Collects the complete text of one row
                for cell in row.cells:
                    _cell_str += cell.text + ','    # Separate the cells of a row with ","
                self.table_text += _cell_str + '\n'     # End each table row with a newline
        # print(self.table_text) # Debug output: the table text of the Word document


def search_word(path, targets):     # Screen resumes by content; path is a glob pattern, targets is a list of keywords
    result = glob.glob(path)
    final_result = []               # Stores the paths of the matching files

    for i in result:                # Loop over the paths matched by the pattern

        isuse = True                # Whether this resume is usable

        if os.path.isfile(i):       # Only handle files, not directories
            if i.endswith('.docx'):      # Only handle ".docx" files; read each one with ReadDoc
                doc = ReadDoc(i)
                p_text = doc.p_text         # Paragraph text of the Word document
                table_text = doc.table_text # Table text of the Word document
                all_text = p_text + table_text

                for target in targets:      # Every keyword must be present (the match is case-sensitive)
                    if target not in all_text:
                        isuse = False
                        break

                if not isuse:
                    continue
                final_result.append(i)
    return final_result


if __name__ == '__main__':
    path = os.path.join(os.getcwd(), '*')
    result = search_word(path, ['python', 'golang', 'react', 'buried point'])      # 'buried point' (event tracking) was added to "resume1.docx" on purpose to demonstrate the effect
    print(result)

The running result is as follows:



Copyright notice
Author: Husky eager for power. Please include the original link when reprinting, thank you.
https://en.pythonmana.com/2022/132/202205120053089437.html
