
Python script to batch-convert PPT, PDF, and Word files to images

2022-02-01 16:43:32 web_ zhou

Preface

One day the operations team told us that working in the editing backend was too inefficient. For every PPT, PDF, or Word file they uploaded, they first had to export each page as an image and then upload the images one by one. The PNGs are used as previews, because the PPT, PDF, and Word source files cannot be downloaded directly and must be paid for. They asked whether there was a way to upload just the file. Thinking it over, converting every upload by hand really is inefficient, since some files produce dozens of images.

With the help of GitHub and other developers' blog posts, I eventually solved the problem of converting files to images automatically. This is my first Python script, so there may be mistakes. Corrections are welcome~

This article uses Python 3.9.5. The script must run on Windows with Microsoft Office installed.

How the script works

The operations team uploads PPT, PDF, and Word files, and their links are stored in the database. The script reads each file's remote link -> downloads the file locally -> converts it to images -> uploads the images to cloud storage -> gets the remote image links -> stores them in the database.

Connect to the database and query the files to convert

import pymysql  # pip install pymysql

# Connect to the database and return the connection plus a dict cursor
def connectDatabase():
    # host='localhost' also works; try it if 127.0.0.1 does not
    conn = pymysql.connect(host='127.0.0.1', user='root', password='', database='pic', port=3306)
    cur = conn.cursor(pymysql.cursors.DictCursor)  # rows come back as dicts
    return {
        "conn": conn,
        "cur": cur
    }
# Get the list of files to convert
def getUrlArr(cur):
    sql = 'select * from pic'  # replace with your own SQL query
    try:
        cur.execute(sql)  # run the query once
        return cur.fetchall()
    except Exception as e:
        # the original "return" inside "finally" silently swallowed errors; report them instead
        print('Query failed', e)
        return []
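The main loop at the end calls setData(_url, id, imgArr, connect) to write the image links back, but the article does not show that function. Below is a minimal sketch, assuming the `pic` table has an `id` column and a hypothetical `imgs` text column, and that the links are stored as a JSON array. Adjust the table, columns, and storage format to your own schema.

```python
import json


def setData(_url, file_id, imgArr, connect):
    """Hypothetical sketch: store the uploaded image links for one row.

    Assumes a `pic` table with `id` and `imgs` columns (`imgs` is an assumed
    name) and a `connect` dict like the one connectDatabase() returns.
    """
    sql = 'UPDATE pic SET imgs = %s WHERE id = %s'
    try:
        # Pass values as parameters so the driver escapes them
        connect['cur'].execute(sql, (json.dumps(imgArr), file_id))
        connect['conn'].commit()  # PyMySQL does not autocommit by default
        print('Saved', _url)
    except Exception as e:
        connect['conn'].rollback()  # undo the partial update
        print('Save failed', _url, e)
```

Using a parameterized query instead of string formatting keeps URLs with quotes or other special characters from breaking the SQL.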

Download the file locally

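import os
import wget  # pip install wget

# Download a file to the current directory and return the local file name
def downLoad(url):
    print('----url-----', url)
    filename = ''
    try:
        # splitext keeps only the last extension, e.g. 'a.b.pptx' -> '.pptx'
        suffix = os.path.splitext(os.path.basename(url))[1]
        filename = 'miaohui' + suffix
        if os.path.exists(filename):  # delete the file left over from the previous run
            os.remove(filename)
        wget.download(url, filename)
    except IOError:
        print('Download failed', url)
        return None  # the caller should skip this file
    else:
        print('\n')
        print('Download successful', url)
        return filename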
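downLoad takes the extension from os.path.basename(url). If the storage link carries a query string such as `?token=...`, that string ends up in the suffix. A safer approach is to parse the URL first and take the extension from its path only. `url_suffix` is a hypothetical helper, not part of the original script.

```python
import os
from urllib.parse import unquote, urlparse


def url_suffix(url):
    """Return the lower-case file extension of a URL, ignoring query and fragment."""
    path = unquote(urlparse(url).path)  # '/files/report%201.PPTX' -> '/files/report 1.PPTX'
    return os.path.splitext(path)[1].lower()  # -> '.pptx'
```

For example, `filename = 'miaohui' + url_suffix(url)` could replace the basename-based line in downLoad.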

PPT to images

# pip install pywin32
import win32com.client

# Start PowerPoint (alternative: comtypes.client.CreateObject("Powerpoint.Application"))
def init_powerpoint():
    powerpoint = win32com.client.Dispatch('PowerPoint.Application')
    powerpoint.Visible = 1
    return powerpoint
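# Convert a PPT to PNG images, one per slide
# downLoad_path is a global output folder name defined elsewhere in the script (not shown)
def ppt2png(url, pptFileName, powerpoint):
    try:
        ppt_path = os.path.abspath(pptFileName)
        ppt = powerpoint.Presentations.Open(ppt_path)
        # Save as images: PowerPoint writes one image per slide into a folder
        # named after img_path without its extension
        img_path = os.path.abspath(downLoad_path + '.png')
        ppt.SaveAs(img_path, 18)  # 18 = ppSaveAsPNG; use 17 (ppSaveAsJPG) for JPG
        ppt.Close()  # close the opened presentation
    except Exception:  # COM failures raise pywintypes.com_error, not IOError
        print('PPT to png failed', url)
    else:
        print('PPT to png succeeded', url)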

PDF to images

# pip install PyMuPDF
import os
import fitz

# Convert every page of a PDF to a PNG image
# downLoad_path is a global output folder name defined elsewhere in the script (not shown)
def pdf2png(_url, pptFileName):
    imagePath = os.path.abspath(downLoad_path)
    try:
        if not os.path.exists(imagePath):  # create the output folder if it does not exist
            os.makedirs(imagePath)
        pdfDoc = fitz.open(pptFileName)
        for pg in range(pdfDoc.page_count):  # page_count replaces the deprecated pageCount
            page = pdfDoc[pg]
            rotate = 0
            # Without a matrix, PyMuPDF renders at 72 dpi (a 792x612 pt page gives 792x612 px).
            # A zoom of 4/3 in each direction gives about 96 dpi (1056x816); 2 gives 1584x1224.
            zoom_x = 1.33333333
            zoom_y = 1.33333333
            mat = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)
            pix = page.get_pixmap(matrix=mat, alpha=False)
            # Name files Slide1.png, Slide2.png, ... to match PowerPoint's export
            pix.save(os.path.join(imagePath, 'Slide%s.png' % (pg + 1)))
        pdfDoc.close()
    except Exception:
        print('pdf to png failed', _url)
    else:
        print('pdf to png succeeded', _url)
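The zoom values in pdf2png follow from PyMuPDF's 72 dpi default rendering. A page measuring W × H points renders to roughly (W·zoom) × (H·zoom) pixels, so zoom = target_dpi / 72. The two helpers below (hypothetical names) make that arithmetic explicit. PyMuPDF's own rounding of the pixmap size may differ by a pixel.

```python
def zoom_for_dpi(dpi):
    """PyMuPDF maps 1 point to 1 pixel (72 dpi) by default, so zoom = dpi / 72."""
    return dpi / 72


def output_size(width_pt, height_pt, zoom):
    """Approximate pixel size of a page rendered with fitz.Matrix(zoom, zoom)."""
    return round(width_pt * zoom), round(height_pt * zoom)
```

For example, `zoom_for_dpi(144)` gives 2.0, which is the "2 --> 1584x1224" setting from the comment in pdf2png.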

Word to images

Converting Word to images takes two steps: first convert the Word file to PDF, then convert that PDF to images.

import os
from win32com.client import Dispatch

# Convert a Word file to PDF
def word2pdf(word_file):
    '''Convert a Word file to a PDF file.
    :param word_file: absolute path of the Word file
    :return: path of the generated PDF file
    '''
    word = Dispatch('Word.Application')  # get the Word application object
    doc_ = word.Documents.Open(word_file)  # open the file as a Document object
    # Save as PDF next to the source; splitext swaps only the extension,
    # whereas str.replace would also rewrite "doc" elsewhere in the path
    pdf_file = os.path.splitext(word_file)[0] + '.pdf'
    doc_.SaveAs(pdf_file, FileFormat=17)  # 17 = wdFormatPDF
    print(word_file, '---- to pdf succeeded')
    doc_.Close()  # close the document
    word.Quit()  # quit Word
    return pdf_file

Then pass the resulting PDF to the pdf2png function above.

Upload to object storage

I won't post this part here. We use Huawei Cloud OBS. Alibaba Cloud, Tencent Cloud, and other object storage services all provide Python SDKs, and integrating them is straightforward.
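The main loop's uploadImg(_url) is not shown either. Whichever SDK you use, upload order matters. PowerPoint names exported files like Slide1.PNG, Slide2.PNG, ..., Slide10.PNG (the prefix may differ with the Office language), and a plain string sort puts Slide10 before Slide2. Below is a minimal sketch that sorts by slide number before uploading. `upload_one` is a hypothetical callable that wraps your SDK's upload call and returns the remote URL.

```python
import os
import re


def slide_key(filename):
    """Sort key: the first number in the name, so Slide2 comes before Slide10."""
    match = re.search(r'\d+', filename)
    # Files without a number sort after numbered ones, alphabetically
    return (0, int(match.group()), filename) if match else (1, 0, filename)


def list_slide_images(folder):
    """Return the PNG/JPG paths in folder, ordered by slide number."""
    names = [n for n in os.listdir(folder) if n.lower().endswith(('.png', '.jpg'))]
    return [os.path.join(folder, n) for n in sorted(names, key=slide_key)]


def upload_images(folder, upload_one):
    """Upload images in slide order; upload_one(path) -> remote URL is an assumed wrapper."""
    return [upload_one(path) for path in list_slide_images(folder)]
```

Keeping the SDK call behind `upload_one` means the same ordering logic works for OBS, OSS, or COS.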

Putting it all together

import os
import time
from urllib.parse import unquote

# downLoad_path, removeFileInFirstDir, uploadImg and setData are defined elsewhere in the script (not shown)
if __name__ == '__main__':
    connect = connectDatabase()
    powerpoint = init_powerpoint()
    downArr = getUrlArr(connect['cur'])
    for i in downArr:
        if os.path.exists('./' + downLoad_path):  # clear images left from the previous file
            removeFileInFirstDir('./' + downLoad_path)
        _url = unquote(i['url'])
        file_id = i['id']  # avoid shadowing the built-in id()
        pptFileName = downLoad(_url)  # download the file
        if not pptFileName:  # skip files that failed to download
            continue
        if '.pdf' in _url:
            pdf2png(_url, pptFileName)
        elif '.doc' in _url:  # matches both .doc and .docx
            _file = os.path.abspath(pptFileName)
            pdfName = word2pdf(_file)
            pdf2png(_url, pdfName)
        else:
            ppt2png(_url, pptFileName, powerpoint)  # convert to png
        imgArr = uploadImg(_url)  # upload images to cloud storage and get the remote links
        setData(_url, file_id, imgArr, connect)  # save to the database
        time.sleep(2)
        print('\n')
        print('\n')
    connect['cur'].close()  # close the cursor
    connect['conn'].close()  # close the database connection and release resources
    powerpoint.Quit()
    input('Press Enter to exit')
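The main loop also relies on removeFileInFirstDir, which the article does not show, and it picks a converter with substring checks such as `'.pdf' in _url`. Below is a sketch of both under stated assumptions. removeFileInFirstDir is guessed from its name to delete only the files directly inside a folder, leaving subfolders alone. `file_kind` is a hypothetical helper that looks only at the URL path's real extension.

```python
import os
from urllib.parse import urlparse


def removeFileInFirstDir(target_dir):
    """Assumed behavior: delete the files directly inside target_dir, keep subfolders."""
    for name in os.listdir(target_dir):
        path = os.path.join(target_dir, name)
        if os.path.isfile(path):
            os.remove(path)


def file_kind(url):
    """Classify a URL by extension: 'pdf', 'word', 'ppt', or None if unsupported."""
    suffix = os.path.splitext(urlparse(url).path)[1].lower()
    if suffix == '.pdf':
        return 'pdf'
    if suffix in ('.doc', '.docx'):
        return 'word'
    if suffix in ('.ppt', '.pptx'):
        return 'ppt'
    return None
```

In the loop, `kind = file_kind(_url)` could drive the if/elif chain. Unsupported files (kind is None) would then be skipped instead of being handed to PowerPoint by the final else branch.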

Because the tool is only used internally, you can package it into a single exe with pyinstaller for the operations team. After uploading the data, they just run it, and the images are converted automatically in batches.

# Package the .py script into an exe (-F: single file, -c: console window, -i: icon)
pyinstaller -c -F -i a.ico ppt_to_img.py

Final words

I hope this article helps. If you find any problems, corrections are welcome~


copyright notice
Author: web_ zhou. Please include the original link when reposting. Thank you.
https://en.pythonmana.com/2022/02/202202011643313210.html
