Python office automation: playing with the browser

2022-01-31 23:34:07 somenzz

In day-to-day work we inevitably use a browser for some tasks, and Python has many browser automation tools. I have used selenium, splinter and playwright, and finally settled on Microsoft's playwright. The reason is that it can install the browsers automatically, so there is no need to download a browser driver such as chromedriver by hand, and automation tools written this way are easy to port to other systems.

Playwright can drive Chromium, Firefox and WebKit through a single API. It supports headless mode and runs on Linux, macOS and Windows. The automation it provides is evergreen, capable, reliable and fast. Use your imagination to think about what it could do for you.
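To make the "single API" point concrete, here is a minimal sketch that opens the same URL in all three engines. The function names collect_titles and main are my own for illustration, and main assumes the browsers have already been downloaded with playwright install.

```python
def collect_titles(playwright, url, browser_names=("chromium", "firefox", "webkit")):
    """Open `url` in each named browser and return {browser_name: page_title}."""
    titles = {}
    for name in browser_names:
        # playwright.chromium, playwright.firefox and playwright.webkit
        # all expose the same launch() method.
        browser_type = getattr(playwright, name)
        browser = browser_type.launch()  # headless by default
        try:
            page = browser.new_page()
            page.goto(url)
            titles[name] = page.title()
        finally:
            # Close the browser even if navigation fails.
            browser.close()
    return titles


def main():
    # Imported here so the helper above can be read without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        print(collect_titles(p, "https://playwright.dev"))
```

Calling main() should print one title per engine. Only the attribute name changes between browsers, and the rest of the code is identical.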

Installation:

Official documentation: playwright.dev/python/docs…

pip install playwright
playwright install

The playwright install command downloads the Chromium, Firefox and WebKit browser binaries, which is very convenient. Python 3.7 or later is required.
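If you are unsure whether a machine meets these requirements, a small check like the following can help. It is only a sketch: check_environment is a name I made up, and it only confirms the Python version and that the playwright package can be found. It does not verify that the browser binaries were downloaded.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 7)):
    """Report whether the interpreter and the playwright package look usable."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_ok": sys.version_info >= min_version,
        # find_spec only looks the package up; it does not import it.
        "playwright_installed": importlib.util.find_spec("playwright") is not None,
    }


for key, value in check_environment().items():
    print(f"{key}: {value}")
```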

Let's start with some sample code:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://playwright.dev")
    print(page.title())
    browser.close()

When you run the program, it launches Chromium (headless by default, so no window appears), visits playwright.dev and prints the page title.

Automatic code generation

What attracts me most about Playwright is that it can record your actions in the browser and generate executable code from them. It is a real gem that greatly improves the efficiency of writing browser automation. To generate code, just run:

python -m playwright codegen baidu.com

It generates code like the following:

from playwright.sync_api import Playwright, sync_playwright


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    # Open new page
    page = context.new_page()
    # Go to https://www.baidu.com/
    page.goto("https://www.baidu.com/")
    # Click input[name="wd"]
    page.click("input[name=\"wd\"]")
    # Fill input[name="wd"]
    page.fill("input[name=\"wd\"]", "playwright ")
    # Press CapsLock
    page.press("input[name=\"wd\"]", "CapsLock")
    # Fill input[name="wd"]
    page.fill("input[name=\"wd\"]", "playwright  course ")
    # Press Enter
    # with page.expect_navigation(url="https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=playwright%20%E6%95%99%E7%A8%8B&fenlei=256&rsv_pq=880cdb05002fe1ed&rsv_t=19abqiURFrqQT3i6%2F84nvsfVrJlI%2B1T6XbVpQkOap78JGssznOJ4%2FVasRzE&rqlang=cn&rsv_dl=tb&rsv_enter=1&rsv_sug3=23&rsv_sug1=20&rsv_sug7=100&rsv_sug2=0&rsv_btype=i&inputT=6608&rsv_sug4=11435&rsv_jmp=fail"):
    with page.expect_navigation():
        page.press("input[name=\"wd\"]", "Enter")
    # Click text=Playwright-python  course _ The world is free to go -CSDN Blog 
    # with page.expect_navigation(url="https://blog.csdn.net/lb245557472/article/details/111572119"):
    with page.expect_navigation():
        with page.expect_popup() as popup_info:
            page.click("text=Playwright-python  course _ The world is free to go -CSDN Blog ")
        page1 = popup_info.value
    # Click text=×
    page1.click("text=×")
    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)

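Recorded scripts tend to repeat the same "click, then grab the new tab" pattern seen in the generated code. When tidying one up by hand, it could be wrapped in a small helper along these lines. This is a sketch only: click_and_get_popup is a hypothetical name of mine, not something codegen produces.

```python
def click_and_get_popup(page, selector):
    """Click `selector` and return the page that opens in a new tab or window."""
    # expect_popup() must wrap the click so that the popup event is not missed.
    with page.expect_popup() as popup_info:
        page.click(selector)
    popup = popup_info.value
    # Wait for the new page's load event before handing it back.
    popup.wait_for_load_state()
    return popup
```

With a helper like this, the nested with blocks in the recording shrink to a single call such as page1 = click_and_get_popup(page, "text=...").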

How to interact with browser elements

First, get familiar with a few concepts.

Browser

A Browser is an instance of a browser: Chromium, Firefox or WebKit. A Playwright script usually starts by launching a browser and ends by closing it. You can run in headless mode, in which the browser runs but no window is shown, so you cannot watch it start up and operate. The example below passes headless=False so that the window is visible.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    browser.close()

Browser context

A browser context is an isolated, incognito-like session within a browser instance. Browser contexts are fast and cheap to create. It is recommended to run each test scenario in its own new browser context so that browser state is isolated between tests. Browser contexts can also be used to emulate multi-page scenarios involving mobile devices, permissions, locale and color scheme.

browser = playwright.chromium.launch()
context = browser.new_context()
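
As a rough sketch of the emulation options mentioned above, new_context accepts keyword arguments such as viewport, locale, color_scheme, permissions and geolocation. The helper name new_emulated_context and all the concrete values below are illustrative assumptions of mine.

```python
def new_emulated_context(browser):
    """Create an isolated context that emulates a phone-sized, dark-mode, German-locale session."""
    return browser.new_context(
        viewport={"width": 390, "height": 844},  # phone-sized viewport (illustrative values)
        locale="de-DE",                          # affects navigator.language and number/date formats
        color_scheme="dark",                     # emulates prefers-color-scheme: dark
        permissions=["geolocation"],             # grant the permission up front
        geolocation={"latitude": 52.52, "longitude": 13.405},  # illustrative coordinates
    )
```

Each call returns a fresh context, so cookies and storage from one context are not visible in another.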

Pages and frames

A browser context can have multiple pages. A page is a single tab or popup window within the browser context. It is used to navigate to URLs and interact with the page content.

page = context.new_page()

# Navigate explicitly, like entering a URL in the browser.
page.goto('http://example.com')
# Fill in an input.
page.fill('#search', 'query')

# Navigate implicitly by clicking a link.
page.click('#submit')
# Expect a new URL.
print(page.url)

# The page can also navigate from script; Playwright will pick that up.
# window.location.href = 'https://example.com'
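
When a click triggers a navigation and you want to read page.url only after it has happened, one option is the expect_navigation pattern that also appears in the recorded script earlier. The sketch below is an assumption-laden illustration: submit_and_get_url is my own name, and the default selectors are simply borrowed from the snippet above.

```python
def submit_and_get_url(page, query, input_selector="#search", submit_selector="#submit"):
    """Fill a search box, submit it, and return the URL the page ends up on."""
    page.fill(input_selector, query)
    # The with block exits only after the navigation started by the click has finished.
    with page.expect_navigation():
        page.click(submit_selector)
    return page.url
```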

A page can have one or more Frame objects attached to it. Each page has a main frame, and page-level interactions (such as click) are assumed to operate on the main frame.

A page can have additional frames attached through the iframe HTML tag. These frames can be accessed as follows:

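import re

# Get frame using the frame's name attribute
frame = page.frame('frame-login')

# Get frame using a regular expression matched against the frame's URL
# (a plain string would be treated as a glob pattern, not a regex)
frame = page.frame(url=re.compile(r'.*domain.*'))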

# Get frame using any other selector
frame_element_handle = page.query_selector('.frame-class')
frame = frame_element_handle.content_frame()

# Interact with the frame
frame.fill('#username-input', 'John')
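
If you do not yet know a frame's name or URL, it can help to list everything attached to the page first. A minimal sketch, where describe_frames is a name I chose and the main frame is identified as the one without a parent:

```python
def describe_frames(page):
    """Return (name, url, is_main) for every frame attached to the page."""
    return [
        # The main frame is the only frame that has no parent frame.
        (frame.name, frame.url, frame.parent_frame is None)
        for frame in page.frames
    ]
```

Printing the result for a page shows which name or URL pattern to pass to page.frame().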

Selectors

A selector is a way of picking out elements in an HTML page.

Playwright can find elements by CSS selector, XPath selector, HTML attributes such as id or data-test-id, and even by text content.

You can explicitly specify which selector engine to use, or let Playwright detect it.

Playwright's selectors are intuitive and easy to use. See the official documentation to learn more about selectors and selector engines.
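To illustrate the difference between naming the engine and letting Playwright detect it, here is a simplified re-implementation of the detection rule as I understand it from the documentation: an explicit engine= prefix wins, selectors starting with // or .. are treated as XPath, quoted selectors as text, and everything else as CSS. This is my own sketch, not Playwright's parser, and it ignores chained selectors joined with >>.

```python
import re

# Matches an explicit engine prefix such as "text=", "css=", "xpath=" or "id=".
_EXPLICIT_ENGINE = re.compile(r"^([A-Za-z_][\w-]*)=")


def guess_engine(selector):
    """Guess which selector engine a single (unchained) selector would use."""
    s = selector.strip()
    match = _EXPLICIT_ENGINE.match(s)
    if match:
        return match.group(1)
    if s.startswith("//") or s.startswith(".."):
        return "xpath"
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "\"'":
        return "text"
    return "css"


for example in ['input[name="wd"]', "text=×", "id=studiedTime", "//div[@id='main']", "'Log in'"]:
    print(f"{example!r:28} -> {guess_engine(example)}")
```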

Automatically playing videos on a video website

Below is a simple script that opens a video website and detects when a video has finished playing by periodically reloading the page.


from playwright.sync_api import sync_playwright
import re, sys
import progressbar
from log import logger
from urllib.parse import urlparse

import time
from config import chromium, browser_path


current_milli_time = lambda: int(round(time.time() * 1000))


class AutoLearning(object):
    @staticmethod
    def get_total_seconds(time_str):
        hour, minute, seconds = 0, 0, 0
        time = [int(i) for i in time_str.split(":")]
        if len(time) == 2:
            minute, seconds = time
        elif len(time) == 1:
            seconds = time[0]
        elif len(time) == 3:
            hour, minute, seconds = time
        else:
            pass
        return hour * 60 * 60 + minute * 60 + seconds

    def __init__(self, username, passwd, base_url, key=None):
        self.username = username
        self.passwd = passwd
        urlparseObj = urlparse(base_url)
        self.base_url = f"{urlparseObj.scheme}://{urlparseObj.hostname}"
        self.hostname = urlparseObj.hostname
        self.sync_playwright = sync_playwright()
        self.playwright = self.sync_playwright.start()
        if chromium:
            self.browser = self.playwright.chromium.launch(executable_path=browser_path, headless=False)
        else:
            self.browser = self.playwright.firefox.launch(executable_path=browser_path,headless=False)
        self.context = self.browser.new_context()
        self.current_page = self.context.new_page()
        self.cookies = {}
        self.corp_code = "default"
        self.map_url = f"{self.base_url}/els/html/index.parser.do?id=0007"

        self.headers = {
            "Host": self.hostname,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
            "Origin": self.base_url,
        }

        self.eln_session_id = ""

    def __del__(self):
        self.context.close()
        self.browser.close()
        # Stop the Playwright instance that was started with sync_playwright().start().
        self.playwright.stop()

    def login(self):
        logger.info(self.base_url)
        self.current_page.goto(url=self.base_url)
        page = self.current_page
        # self.context.set_default_timeout(6000)
        try:
            # Click [placeholder=" Please enter a user name "]
            page.click('[name="loginName"]')

            # Fill [placeholder=" Please enter a user name "]
            page.fill('[name="loginName"]', self.username)

            # Click [placeholder=" Please input a password "]
            page.click('[name="password"]')

            # Fill [placeholder=" Please input a password "]
            page.fill('[name="password"]', self.passwd)

            # Click text= Sign in 
            page.click("input.login_Btn")
            print(" If there is a verification code , Please log in manually on the browser ")
            # Click text= Continue to log in 
            # if page.is_visible("text= Continue to log in ", timeout=15000):
            page.click("text= Continue to log in ")

        except Exception:
            print(" Please log in manually on the browser ")
            time.sleep(3)

        while True:
            try:
                if (
                        page.is_visible("text=' Course Center '", timeout=3000)
                        or page.is_visible("text=' Personal center '", timeout=3000)
                        or page.is_visible("text=' Learning Center '", timeout=3000)
                ):
                    logger.info(" Landing successful ")
                    self.current_page = page
                    break
            except Exception:
                print(" Please log in manually on the browser ")
            time.sleep(5)


    def learn_course_from_learn_map(self, which_one_to_learn=1, skip_num=0):

        logger.info("learn_course_from_learn_map begin.")
        self.current_page.goto(self.map_url)
        self.current_page.wait_for_selector(
            f":nth-match(div.track-list-tit,{which_one_to_learn})"
        )
        item = self.current_page.query_selector(
            f":nth-match(div.track-list-tit,{which_one_to_learn})"
        )
        link = self.current_page.query_selector(
            f":nth-match(a.track-list-linktoName,{which_one_to_learn})"
        )
        item_title = item.inner_text()
        link_title = link.inner_text()
        if " Learning progress :100%" in item_title:
            logger.info(f"{link_title}  Have learned , sign out ")
            return

        logger.info(f" Begin to learn  {item_title}")
        link.click()

        self.current_page.wait_for_selector("a.innercan.goCourseByStudy")
        courses = self.current_page.query_selector_all("a.innercan.goCourseByStudy")

        for course in courses[skip_num:]:
            time.sleep(2)
            with self.current_page.expect_popup() as popup_info:
                course.click()
            new_page = popup_info.value
            new_page.wait_for_load_state(timeout=60000)
            time.sleep(1)
            h3 = new_page.query_selector("h3.cs-test-title")
            if h3:
                logger.info(" The video of this course has been played , No need to play ")
                if h3.inner_text() == " curriculum evaluation ":
                    self.evaluation(new_page)
                new_page.close()
                continue

            course_item = {
                "courseId": course.get_attribute("id"),
                "courseName": course.get_attribute("title"),
            }

            logger.info(
                f" Playing  {course_item['courseName']},courseId = {course_item['courseId']}"
            )

            if new_page.is_visible("iframe.url-course-content"):
                self.play_single_course2(new_page)

            if new_page.is_visible("text=' determine '",timeout = 3000):
                new_page.click("text=' determine '")

            if new_page.is_visible("a:has-text(' next step ')"):
                new_page.click("a:has-text(' next step ')")
                self.evaluation(new_page)
            new_page.close()
        logger.info(" The task of learning map has been completed ")

    def play_single_course2(self, page):
        """  A split screen 、 Split screen play  """

        page.wait_for_selector("time.cl-time")
        page.wait_for_selector("id=studiedTime")
        time.sleep(5)
        total_time_ele = page.query_selector("time.cl-time")
        total_minutes = int(0 if total_time_ele.inner_text() == '' else total_time_ele.inner_text())
        alread_time_ele = page.query_selector("id=studiedTime")
        alread_minutes = int(0 if alread_time_ele.inner_text() == '' else alread_time_ele.inner_text())

        chapters = page.query_selector_all("a.scormItem-no.cl-catalog-link.cl-catalog-link-sub.item-no")
        if len(chapters) > 0:
            logger.info(f" This time we need to play  {len(chapters)}  section ")
            chapters[0].click()
            logger.info(f" Playing  {chapters[0].get_attribute('title')}  It takes a total of time  {total_minutes}  minute ")

        bar = None
        if sys.platform == "win32":
            bar = progressbar.bar.ProgressBar(max_value=total_minutes)
        else:
            bar = progressbar.ProgressBar(max_value=total_minutes)

        bar.update(alread_minutes)
        wait_count = 0
        while True:
            time.sleep(60)
            wait_count += 1
            if wait_count >= 7:
                page.reload()
                wait_count = 0
            if wait_count % 3 == 0:
                chapters = page.query_selector_all("a.scormItem-no.cl-catalog-link.cl-catalog-link-sub.item-no")
                if len(chapters) > 0:
                    logger.info(f" This time we need to play  {len(chapters)}  section ")
                    chapters[0].click()
                    logger.info(f" Playing  {chapters[0].get_attribute('title')}  It takes a total of time  {total_minutes}  minute ")

            page.wait_for_selector("id=studiedTime")
            alread_time_ele = page.query_selector("id=studiedTime")
            alread_minutes = int(0 if alread_time_ele.inner_text() == '' else alread_time_ele.inner_text())
            if page.is_visible("a:has-text(' next step ')"):
                break
            bar.update(alread_minutes)
        time.sleep(1)
        bar.update(total_minutes)
        logger.info(" The video on this page has been played ")



if __name__ == "__main__":
    auto = AutoLearning(username='****', passwd='*', base_url='http://*****.net')
    auto.login()
    auto.learn_course_from_learn_map(which_one_to_learn=1, skip_num=0)
    time.sleep(100)

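The script above waits in two places with a hand-written while True loop plus time.sleep: once for the login to succeed, and once for the "next step" link to appear. That polling pattern could be factored into one helper. The following is a sketch only: wait_until and its sleep and clock parameters are my own additions (they make the loop easy to test), not part of Playwright or of the original script.

```python
import time


def wait_until(predicate, interval=5.0, timeout=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call predicate() every `interval` seconds until it is truthy or `timeout` elapses.

    Returns True if the predicate succeeded and False on timeout.
    """
    deadline = clock() + timeout
    while True:
        try:
            if predicate():
                return True
        except Exception:
            # Treat a failed check as "not yet", as the login loop in the script does.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, the end of playback could be detected with wait_until(lambda: page.is_visible(selector), interval=60, timeout=3 * 60 * 60), where selector is whatever marks completion on your site and the three-hour timeout is an arbitrary value.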

(End)

Copyright notice
Author: somenzz. Please include the original link when reprinting. Thank you.
https://en.pythonmana.com/2022/01/202201312334045674.html
