
Python office automation - play with browser

2022-01-31 23:34:07 somenzz

We inevitably use a browser for some of our everyday work, and Python has many browser automation tools. I have used selenium, splinter and playwright, and finally settled on Microsoft's playwright. The reason is that it installs the browsers automatically, so there is no need to download a browser driver such as chromedriver by hand, and automation tools written this way are easy to port to other systems.

Playwright drives Chromium, Firefox and WebKit through a single API, supports headless mode, and runs on Linux, macOS and Windows. The automation it provides is evergreen, powerful, stable and fast, which leaves plenty of room to imagine what it can do.

Installation:

Official documents…

pip install playwright
playwright install

playwright install downloads the Chromium, Firefox and WebKit browser binaries, which is very convenient. Python 3.7 or later is required.

Let's start with some sample code:

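from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    # Placeholder URL; substitute the site you want to visit.
    page.goto("https://example.com")
    print(page.title())
    browser.close()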

When you run the program, it launches the browser, visits the page and prints the page title. Because launch() defaults to headless mode, no browser window appears; pass headless=False to watch it work.
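Because the three engines share one API, the browser can be picked by name at run time. The following is a minimal sketch under stated assumptions: `fetch_title` and `check_browser_name` are hypothetical helper names, and the URL you pass in is your own choice. Playwright is imported lazily so the name check can be used on its own.

```python
SUPPORTED_BROWSERS = ("chromium", "firefox", "webkit")


def check_browser_name(name):
    """Return `name` if Playwright ships a browser type with that name."""
    if name not in SUPPORTED_BROWSERS:
        raise ValueError(f"unknown browser {name!r}, expected one of {SUPPORTED_BROWSERS}")
    return name


def fetch_title(url, browser_name="chromium", headless=True):
    """Open `url` in the chosen browser and return the page title."""
    # Imported here so the module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    check_browser_name(browser_name)
    with sync_playwright() as p:
        # p.chromium, p.firefox and p.webkit all expose the same launch() method.
        browser_type = getattr(p, browser_name)
        browser = browser_type.launch(headless=headless)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

Calling `fetch_title("https://example.com", "firefox", headless=False)` would run the same steps in a visible Firefox window, provided `playwright install` has downloaded that browser.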

Automatic code generation

What attracts me most about Playwright is that it can record your actions in the browser and turn them into executable code. It is a real power tool that greatly speeds up browser automation. To generate code, just run

python -m playwright codegen

It produces code like the following:

from playwright.sync_api import Playwright, sync_playwright


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    # Open new page
    page = context.new_page()
    # Go to the search page (the URL is not shown here)
    # page.goto(...)
    # Click input[name="wd"]
    page.click("input[name=\"wd\"]")
    # Fill input[name="wd"]
    page.fill("input[name=\"wd\"]", "playwright ")
    # Press CapsLock
    page.press("input[name=\"wd\"]", "CapsLock")
    # Fill input[name="wd"]
    page.fill("input[name=\"wd\"]", "playwright  course ")
    # Press Enter
    # with page.expect_navigation(url=""):
    with page.expect_navigation():
        page.press("input[name=\"wd\"]", "Enter")
    # Click text=Playwright-python  course _ The world is free to go -CSDN Blog 
    # with page.expect_navigation(url=""):
    with page.expect_navigation():
        with page.expect_popup() as popup_info:
            page.click("text=Playwright-python  course _ The world is free to go -CSDN Blog ")
        page1 = popup_info.value
    # Click text=×
    page1.click("text=×")
    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)

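The recorder can also start on a given page and write the script straight to a file. Below is a small sketch that assembles such a command line; `build_codegen_command` and `record` are hypothetical helper names, while `--target` and `-o` are options of `playwright codegen`.

```python
import subprocess
import sys


def build_codegen_command(url=None, output=None, target="python"):
    """Build the argument list for `python -m playwright codegen`."""
    cmd = [sys.executable, "-m", "playwright", "codegen", "--target", target]
    if output:
        # -o saves the generated script to a file instead of only showing it.
        cmd += ["-o", output]
    if url:
        # The recorder opens this page first.
        cmd.append(url)
    return cmd


def record(url, output):
    """Start a recording session and return the recorder's exit code."""
    return subprocess.run(build_codegen_command(url, output), check=False).returncode
```

For example, `record("https://example.com", "recorded.py")` opens the recorder on a placeholder page and saves the result to `recorded.py` when the window is closed.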

How to interact with browser elements

First, get familiar with a few concepts.

Browser

A Browser is an instance of a browser, which can be Chromium, Firefox or WebKit. Playwright scripts usually start by launching a browser and end by closing it. You can use headless mode, in which the browser runs but its window stays hidden, so you do not see it start up or operate.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)

Browser context

A browser context is an isolated, incognito-like session within a browser instance. Browser contexts are fast and cheap to create. Running each test scenario in its own new browser context is recommended, so that browser state stays isolated between tests. Browser contexts can also be used to emulate multi-page scenarios involving mobile devices, permissions, locale and color scheme.

browser = playwright.chromium.launch()
context = browser.new_context()
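To illustrate the isolation and emulation just described, here is a sketch that opens the same page in two contexts with different settings. `context_options` and `open_isolated_pages` are hypothetical helpers; `locale`, `color_scheme` and `permissions` are keyword arguments of `browser.new_context()`.

```python
def context_options(locale="en-US", dark=False, permissions=()):
    """Collect keyword arguments for browser.new_context()."""
    return {
        "locale": locale,
        "color_scheme": "dark" if dark else "light",
        "permissions": list(permissions),
    }


def open_isolated_pages(browser, url):
    """Open `url` in two contexts that share no cookies or local storage."""
    german_dark = browser.new_context(**context_options(locale="de-DE", dark=True))
    with_location = browser.new_context(**context_options(permissions=["geolocation"]))
    pages = []
    for context in (german_dark, with_location):
        page = context.new_page()
        page.goto(url)
        pages.append(page)
    return pages
```

Logging in on the first page would leave the second one logged out, since each context keeps its own cookies.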

Pages and frames

A browser context can have multiple pages. A page is a single tab or pop-up window within the browser context. Use it to navigate to URLs and interact with the page content.

page = context.new_page()

# Navigate explicitly, similar to entering a URL in the browser.
page.goto('http://example.com')  # example URL
# Fill an input.
page.fill('#search', 'query')

# Navigate implicitly by clicking a link.
page.click('#submit')
# Expect a new URL.
print(page.url)

# The page can also navigate from a script; Playwright will pick this up.
# window.location.href = ''

A page can have one or more Frame objects attached to it. Each page has a main frame, and page-level interactions (such as click) are assumed to operate in the main frame.

A page can have additional frames attached through the iframe HTML tag. You can access these frames:

# Get frame using the frame's name attribute
frame = page.frame('frame-login')

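# Get frame using frame's URL. A plain string is treated as a glob pattern,
# so compile the regular expression (this needs "import re").
frame = page.frame(url=re.compile(r'.*domain.*'))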

# Get frame using any other selector
frame_element_handle = page.query_selector('.frame-class')
frame = frame_element_handle.content_frame()

# Interact with the frame
frame.fill('#username-input', 'John')
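`page.frame()` returns None when nothing matches, which only surfaces later as an AttributeError. A small wrapper can fail early instead. This is a sketch: `find_frame` is a hypothetical helper, and it relies only on the `name` and `url` keyword arguments of `page.frame()`.

```python
import re


def find_frame(page, name=None, url_regex=None):
    """Return the frame matching `name` or `url_regex`, or raise LookupError."""
    if name is not None:
        frame = page.frame(name=name)
    elif url_regex is not None:
        # A compiled pattern is matched as a regular expression against the frame URL.
        frame = page.frame(url=re.compile(url_regex))
    else:
        raise ValueError("pass either name or url_regex")
    if frame is None:
        raise LookupError(f"no frame found for name={name!r}, url_regex={url_regex!r}")
    return frame
```

With it, `find_frame(page, name="frame-login").fill("#username-input", "John")` either fills the field or reports clearly that the frame is missing.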


Selectors

A selector is a way of picking out elements in an HTML page.

Playwright can find elements by CSS selector, XPath selector, HTML attributes such as id or data-test-id, and even by text content.

You can state explicitly which selector engine you are using, or let Playwright detect it.

Playwright's selectors are intuitive and easy to use; see the documentation on selectors and selector engines for more information.
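The engine can be named with an `engine=` prefix in front of the selector body. The helpers below are hypothetical and only build such strings (they do not escape quotes inside the text), which can then be passed to calls like `page.click()` or `page.fill()`.

```python
def by_css(css):
    """CSS engine, e.g. css=input[name="wd"]."""
    return f"css={css}"


def by_xpath(expression):
    """XPath engine, e.g. xpath=//button."""
    return f"xpath={expression}"


def by_text(text, exact=False):
    """Text engine; quoting the text asks for an exact match."""
    return f"text='{text}'" if exact else f"text={text}"


def by_test_id(value):
    """Attribute engine for elements marked with data-test-id."""
    return f"data-test-id={value}"


examples = [
    by_css('input[name="wd"]'),
    by_xpath("//button[@type='submit']"),
    by_text("Log in", exact=True),
    by_test_id("submit"),
]
```

Without a prefix, Playwright treats a selector starting with `//` or `..` as XPath and anything else as CSS, so `page.click("#submit")` and `page.click("css=#submit")` behave the same.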

Automating playback on a video website

Below is simple code that opens a video website and detects when a video has finished playing by refreshing the browser.

from playwright.sync_api import sync_playwright
import re, sys
import progressbar
from log import logger
from urllib.parse import urlparse

import time
from config import chromium, browser_path

current_milli_time = lambda: int(round(time.time() * 1000))

class AutoLearning(object):
    @staticmethod
    def get_total_seconds(time_str):
        """Convert "SS", "MM:SS" or "HH:MM:SS" into a number of seconds."""
        hour, minute, seconds = 0, 0, 0
        # Named "parts" so the imported time module is not shadowed.
        parts = [int(i) for i in time_str.split(":")]
        if len(parts) == 2:
            minute, seconds = parts
        elif len(parts) == 1:
            seconds = parts[0]
        elif len(parts) == 3:
            hour, minute, seconds = parts
        return hour * 60 * 60 + minute * 60 + seconds

    def __init__(self, username, passwd, base_url, key=None):
        self.username = username
        self.passwd = passwd
        urlparseObj = urlparse(base_url)
        self.base_url = f"{urlparseObj.scheme}://{urlparseObj.hostname}"
        self.hostname = urlparseObj.hostname
        self.sync_playwright = sync_playwright()
        self.playwright = self.sync_playwright.start()
        # Launch whichever browser is selected in config.py.
        if chromium:
            self.browser = self.playwright.chromium.launch(executable_path=browser_path, headless=False)
        else:
            self.browser = self.playwright.firefox.launch(executable_path=browser_path, headless=False)
        self.context = self.browser.new_context()
        self.current_page = self.context.new_page()
        self.cookies = {}
        self.corp_code = "default"
        self.map_url = f"{self.base_url}/els/html/"

        self.headers = {
            "Host": self.hostname,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
            "Origin": self.base_url,
        }
        self.eln_session_id = ""

    def __del__(self):
        # Assumed cleanup: close the browser and stop the Playwright driver.
        self.browser.close()
        self.playwright.stop()

    def login(self):
        page = self.current_page
        # self.context.set_default_timeout(6000)
        try:
            # Assumption: the site root is opened here; the login URL itself is not shown.
            page.goto(self.base_url)
            # Click [placeholder="Please enter a user name"]
            page.click('[name="loginName"]')
            # Fill [placeholder="Please enter a user name"]
            page.fill('[name="loginName"]', self.username)

            # Click [placeholder="Please enter a password"]
            page.click('[name="password"]')
            # Fill [placeholder="Please enter a password"]
            page.fill('[name="password"]', self.passwd)

            # Click text=Sign in
            page.click("text=Sign in")
            print("If there is a verification code, please log in manually in the browser")
            # Click text=Continue to log in
            # if page.is_visible("text=Continue to log in", timeout=15000):
            page.click("text=Continue to log in")
        except Exception:
            print("Please log in manually in the browser")

        # Poll until one of the post-login menu entries becomes visible.
        while True:
            try:
                if (
                    page.is_visible("text='Course Center'", timeout=3000)
                    or page.is_visible("text='Personal center'", timeout=3000)
                    or page.is_visible("text='Learning Center'", timeout=3000)
                ):
                    logger.info("Login successful")
                    self.current_page = page
                    break
            except Exception:
                print("Please log in manually in the browser")
            time.sleep(1)  # assumed pause, so the loop does not spin

    def learn_course_from_learn_map(self, which_one_to_learn=1, skip_num=0):
        logger.info("learn_course_from_learn_map begin.")
        # The two selectors below are not shown; fill them in for your site.
        item = self.current_page.query_selector(...)
        link = self.current_page.query_selector(...)
        item_title = item.inner_text()
        link_title = link.inner_text()
        if "Learning progress: 100%" in item_title:
            logger.info(f"{link_title} has already been learned, exiting")
            return
        logger.info(f"Begin to learn {item_title}")

        courses = self.current_page.query_selector_all("a.innercan.goCourseByStudy")

        for course in courses[skip_num:]:
            # Each course opens in a pop-up page.
            with self.current_page.expect_popup() as popup_info:
                course.click()
            new_page = popup_info.value
            h3 = new_page.query_selector("h3.cs-test-title")
            if h3:
                logger.info("The video of this course has been played, no need to play it")
                if h3.inner_text() == "Course evaluation":
                    pass  # the evaluation step is not shown

            course_item = {
                "courseId": course.get_attribute("id"),
                "courseName": course.get_attribute("title"),
            }
            logger.info(
                f"Playing {course_item['courseName']}, courseId = {course_item['courseId']}"
            )

            if new_page.is_visible("iframe.url-course-content"):
                pass  # the playback call is not shown

            if new_page.is_visible("text='OK'", timeout=3000):
                new_page.click("text='OK'")

            if new_page.is_visible("a:has-text('Next step')"):
                new_page.click("a:has-text('Next step')")
            new_page.close()
        logger.info("The learning map task has been completed")

    def play_single_course2(self, page):
        """Split-screen playback."""

        # The selectors passed as "" below are not shown; fill them in for your site.
        total_time_ele = page.query_selector("")
        total_minutes = int(0 if total_time_ele.inner_text() == '' else total_time_ele.inner_text())
        alread_time_ele = page.query_selector("id=studiedTime")
        alread_minutes = int(0 if alread_time_ele.inner_text() == '' else alread_time_ele.inner_text())

        chapters = page.query_selector_all("")
        if len(chapters) > 0:
            logger.info(f"This time we need to play {len(chapters)} sections")
            logger.info(f"Playing {chapters[0].get_attribute('title')}, total time {total_minutes} minutes")

        # The Windows-specific progress bar setup is not shown; one bar is used everywhere.
        bar = progressbar.ProgressBar(max_value=total_minutes)

        wait_count = 0
        while True:
            wait_count += 1
            if wait_count >= 7:
                # Refresh the page so the studied time and the "Next step" link update.
                page.reload()
                wait_count = 0
            if wait_count % 3 == 0:
                chapters = page.query_selector_all("")
                if len(chapters) > 0:
                    logger.info(f"This time we need to play {len(chapters)} sections")
                    logger.info(f"Playing {chapters[0].get_attribute('title')}, total time {total_minutes} minutes")

            alread_time_ele = page.query_selector("id=studiedTime")
            alread_minutes = int(0 if alread_time_ele.inner_text() == '' else alread_time_ele.inner_text())
            bar.update(min(alread_minutes, total_minutes))
            if page.is_visible("a:has-text('Next step')"):
                break
            time.sleep(10)  # assumed polling interval
        bar.update(total_minutes)
        logger.info("The video on this page has been played")

if __name__ == "__main__":
    auto = AutoLearning(username='****', passwd='*', base_url='http://*****.net')
    auto.login()
    auto.learn_course_from_learn_map(which_one_to_learn=1, skip_num=0)

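The core idea of the script, polling a condition and refreshing the page every few polls, can be separated from Playwright altogether. The following is a sketch under stated assumptions: `wait_until_done` is a hypothetical helper, and the interval, refresh frequency and timeout defaults are arbitrary.

```python
import time


def wait_until_done(is_done, refresh, poll_interval=10.0, refresh_every=6,
                    timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True; call refresh() every few polls.

    Returns True once is_done() succeeds, or False when `timeout` seconds pass.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    polls = 0
    while clock() < deadline:
        if is_done():
            return True
        polls += 1
        if polls % refresh_every == 0:
            # Reload so that server-side progress becomes visible on the page.
            refresh()
        sleep(poll_interval)
    return False
```

With a Playwright page this could be called as `wait_until_done(lambda: page.is_visible("a:has-text('Next step')"), page.reload)`. Unlike a bare `while True` loop, it gives up after the timeout instead of hanging forever.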

(End)

Copyright notice
Author: somenzz. Please include the original link when reprinting, thank you.
