
[Python learning] A step-by-step guide to parsing and modifying XML in Python

2022-02-02 04:45:31 Charlie is not a dog

Abstract: We often need to parse data written in different languages. Python provides many libraries for parsing or splitting data written in other languages. In this Python XML parser tutorial, you will learn how to parse XML using Python.


Here are all the topics covered in this tutorial :

What is XML?
Python XML Parsing Modules
xml.etree.ElementTree Module

  • Using parse() function
  • Using fromstring() function
  • Finding Elements of Interest
  • Modifying XML files
  • Adding to XML
  • Deleting from XML

xml.dom.minidom Module

  • Using parse() function
  • Using parseString() function
  • Finding Elements of Interest

Let's get started. :)

What is XML?

XML stands for Extensible Markup Language. It looks similar to HTML, but XML is used to represent data, whereas HTML is used to define how data is displayed. XML is designed for sending and receiving data back and forth between clients and servers. Take a look at the following example:

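Example:

<metadata>
    <food>
        <item name="breakfast">Idly</item>
        <price>$2.5</price>
        <description>Two idly's with chutney</description>
        <calories>553</calories>
    </food>
    <food>
        <item name="breakfast">Paper Dosa</item>
        <price>$2.7</price>
        <description></description>
        <calories>700</calories>
    </food>
    <food>
        <item name="breakfast">Upma</item>
        <price>$3.65</price>
        <description>Rava upma with bajji</description>
        <calories>600</calories>
    </food>
    <food>
        <item name="breakfast">Bisi Bele Bath</item>
        <price>$4.50</price>
        <description>Bisi Bele Bath with sev</description>
        <calories>400</calories>
    </food>
    <food>
        <item name="breakfast">Kesari Bath</item>
        <price>$1.95</price>
        <description>Saffron sweet rava</description>
        <calories>950</calories>
    </food>
</metadata>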

The example above shows the contents of a file that I have named "sample.xml". I will use the same content for all the upcoming examples in this Python XML parser tutorial.

Python XML parsing modules

Python provides two modules for parsing XML files: xml.etree.ElementTree and minidom (a minimal DOM implementation). Parsing means reading information from a file and splitting it into pieces by identifying the parts of that particular XML file. Let's learn more about how to use these modules to parse XML data.

The xml.etree.ElementTree module:

This module helps us represent XML data in a tree structure, which is the most natural representation of hierarchical data. The Element type allows hierarchical data structures to be stored in memory and has the following properties:

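  • tag: a string naming the type of data the element holds
  • attrib: the element's attributes, stored as a dictionary
  • text: the text inside the element
  • tail: the text that follows the element's end tag
  • child elements: the nested elements, stored as a sequence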
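To make the properties of an Element concrete, here is a minimal sketch. The one-line XML snippet is a hypothetical fragment shaped like the sample menu (the stray " and more" text is made up purely to demonstrate tail); it is not part of sample.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet shaped like one entry of the sample menu.
# The " and more" text exists only to demonstrate the tail property.
snippet = '<food><item name="breakfast">Idly</item> and more<price>$2.5</price></food>'
food = ET.fromstring(snippet)
item = food[0]          # child elements are accessed like list items

print(item.tag)         # name of the element
print(item.attrib)      # attributes as a dictionary
print(item.text)        # text directly inside the element
print(item.tail)        # text between this element's end tag and the next tag
print(len(food))        # number of child elements of <food>
```

Indexing an element (food[0]) and calling len() on it work because an Element behaves like a sequence of its children.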

ElementTree is a class that wraps the element structure and allows conversion to and from XML. Now let's try to parse the XML file above using this Python module.

There are two ways to parse a document with the ElementTree module. The first is the parse() function and the second is the fromstring() function. parse() parses an XML document supplied as a file, whereas fromstring() parses XML supplied as a string, for example within triple quotes.

Using the parse() function:

As mentioned earlier, this function takes XML in file format and parses it. See the following example:

Example:

import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()

As you can see, the first thing you need to do is import the xml.etree.ElementTree module. Then the parse() method parses the "sample.xml" file, and the getroot() method returns its root element.

When you execute the code above, you will not see any output, but the absence of errors indicates that the code has executed successfully. To check the root element, you can simply use a print statement, as shown below:

Example:

import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()
print(myroot)

Output: <Element 'metadata' at 0x...>

The output above shows that the root element of our XML document is "metadata". The memory address after "at" differs from run to run.

Using the fromstring() function:

You can also use the fromstring() function to parse string data. To do this, pass the XML as a string within triple quotes, as shown below:

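import xml.etree.ElementTree as ET
data = '''<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <description>Two idly's with chutney</description>
    <calories>553</calories>
</food>
</metadata>'''
myroot = ET.fromstring(data)
#print(myroot)
print(myroot.tag)

The code above parses the string and prints the root tag, metadata. Uncommenting print(myroot) shows the same root element as in the previous example. Please note that the XML document used as a string here is only a part of "sample.xml", which I use to keep the example readable. You can also use the full XML document.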
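The difference between the two entry points can be sketched as follows. The XML text is an abbreviated, assumed copy of the sample menu, and a temporary file stands in for sample.xml so the snippet can run anywhere.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Abbreviated, assumed copy of the sample menu (one food entry only).
xml_text = """<metadata>
    <food>
        <item name="breakfast">Idly</item>
        <price>$2.5</price>
    </food>
</metadata>"""

# Write the text to a temporary file so that parse() has a file to read.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(xml_text)
    tree = ET.parse(path)              # parse() returns an ElementTree
    root_from_file = tree.getroot()    # the root Element needs getroot()

root_from_string = ET.fromstring(xml_text)  # fromstring() returns the root Element directly

print(root_from_file.tag, root_from_string.tag)
```

In short, parse() gives you the whole tree object (which also offers write()), while fromstring() hands back the root element itself.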

You can also use the tag attribute to retrieve the root tag, as shown below:

Example:

print(myroot.tag)

Output: metadata

You can also slice the tag string by specifying which part of the string you want to see in the output.

Example:

print(myroot.tag[0:4])

Output: meta

As mentioned earlier, tags can also have attributes, which are stored as a dictionary. To check whether the root tag has any attributes, you can use the attrib attribute, as shown below:

Example:

print(myroot.attrib)

Output: {}

As you can see, the output is an empty dictionary, because our root tag has no attributes.

Finding elements of interest:

The root is also made up of child tags. To retrieve a child of the root tag, you can use the following:

Example:

print(myroot[0].tag)

Output: food

Now, if you want to retrieve all the child tags of the root's first child, you can iterate over it with a for loop, as shown below:

Example:

for x in myroot[0]:
    print(x.tag, x.attrib)

Output:

item {'name': 'breakfast'}
price {}
description {}
calories {}

All the items returned are the child tags of food, together with their attributes.

To separate the text from the XML using ElementTree, you can use the text attribute. For example, if I want to retrieve all the information about the first food item, I should use the following code:

Example:

for x in myroot[0]:
    print(x.text)

Output:

Idly
$2.5
Two idly's with chutney
553

As you can see, the text of the first item has been returned as output. Now, if you want to display every item together with its price, you can use the findall() and find() methods: findall() returns all matching child elements, and find() returns the first match. (To read an element's attributes rather than its text, use the get() method.)

Example:

for x in myroot.findall('food'):
    item = x.find('item').text
    price = x.find('price').text
    print(item, price)

Output:

Idly $2.5
Paper Dosa $2.7
Upma $3.65
Bisi Bele Bath $4.50
Kesari Bath $1.95

The output above shows all the required items along with the price of each. Using ElementTree, you can also modify XML files.
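As a rough sketch of how find(), findall(), findtext(), and get() fit together, the following collects each menu entry into a dictionary. The two-entry XML string is an assumed, shortened version of the sample menu, and the 'n/a' fallback is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, assumed version of the sample menu (no <calories> tags on purpose).
menu = ET.fromstring("""<metadata>
    <food>
        <item name="breakfast">Idly</item>
        <price>$2.5</price>
    </food>
    <food>
        <item name="breakfast">Paper Dosa</item>
        <price>$2.7</price>
    </food>
</metadata>""")

rows = []
for food in menu.findall('food'):          # every direct <food> child
    item = food.find('item')               # first <item> inside this <food>
    rows.append({
        'item': item.text,                             # text content
        'meal': item.get('name'),                      # attribute lookup
        'price': food.findtext('price'),               # text of a child in one call
        'calories': food.findtext('calories', 'n/a'),  # default when the child is absent
    })

for row in rows:
    print(row)
```

findtext() with a default is handy when some entries may lack a tag, since calling .text on the None returned by find() would raise an AttributeError.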

Modifying XML files:

The elements in an XML file can be manipulated. To set an attribute on an element, you can use the set() function. Let us first look at how to add something to the XML.

Adding to XML:

The following example shows how to add something to the descriptions of the items.

Example:

for description in myroot.iter('description'):
    # append a note to the existing description text
    new_desc = str(description.text) + ' will be served'
    description.text = str(new_desc)
    description.set('updated', 'yes')

mytree.write('new.xml')

The write() function creates a new XML file and writes the updated output to it. However, you can also modify the original file using the same function. After executing the code above, you will see that a new file has been created with the updated results.

[Screenshot: the new.xml file with the updated descriptions]
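The modify-and-write cycle can be checked end to end with a small sketch like the one below. It assumes a one-entry version of the menu and writes to a temporary directory instead of the working directory; the `or ''` guard is an addition that avoids a literal "None" prefix when a description is empty.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# One-entry, assumed version of the sample menu.
root = ET.fromstring("""<metadata>
    <food>
        <item name="breakfast">Idly</item>
        <description>Two idly's with chutney</description>
    </food>
</metadata>""")

for description in root.iter('description'):
    # (text or '') keeps empty descriptions from turning into the string "None".
    description.text = (description.text or '') + ' will be served'
    description.set('updated', 'yes')

# Wrap the root in an ElementTree so that write() is available.
tree = ET.ElementTree(root)
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'new.xml')
    tree.write(out_path, encoding='utf-8', xml_declaration=True)
    reloaded = ET.parse(out_path).getroot()   # read the file back to verify it

updated = reloaded.find('food/description')
print(updated.text, updated.attrib)
```

Reading the file back is a cheap way to confirm that both the new text and the new attribute actually reached disk.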

The image above shows the modified descriptions of our food items. To add a new child tag, you can use the SubElement() method. For example, if you want to add a new speciality tag to the first item, Idly, you can do the following:

Example:

ET.SubElement(myroot[0], 'speciality')
for x in myroot.iter('speciality'):
    new_desc = 'South Indian Special'
    x.text = str(new_desc)

mytree.write('output5.xml')

After running this code, output5.xml contains a new speciality tag under the first food tag. You can add tags wherever you like by specifying the subscript within the [] brackets. Now let's see how to delete items using this module.
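A slightly shorter way to add children is sketched below. It relies on the fact that SubElement() returns the element it creates, and it uses Element.insert() to place a child at a chosen position. The 'note' tag and its text are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# One-entry, assumed version of the sample menu.
root = ET.fromstring("""<metadata>
    <food>
        <item name="breakfast">Idly</item>
        <price>$2.5</price>
    </food>
</metadata>""")
first_food = root[0]

# SubElement() appends the new child and returns it, so no loop is needed.
speciality = ET.SubElement(first_food, 'speciality')
speciality.text = 'South Indian Special'

# insert() places a child at a given index; 'note' is a made-up tag.
note = ET.Element('note')
note.text = 'illustrative only'
first_food.insert(1, note)

print([child.tag for child in first_food])
```

SubElement() always appends at the end, so insert() is the tool to reach for when the position of the new tag matters.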

Deleting from XML:

To delete an attribute using ElementTree, you can call the pop() method on the element's attrib dictionary. This removes an attribute that is no longer needed; child elements are removed with the remove() method shown further below.

Example:

myroot[0][0].attrib.pop('name', None)

# create a new XML file with the results
mytree.write('output5.xml')

After running this code, the name attribute has been removed from the item tag in output5.xml. To remove a complete tag, you can use the remove() method, as shown below:

Example:

myroot[0].remove(myroot[0][0])
mytree.write('output6.xml')

Output:

[Screenshot: the output6.xml file]

The output shows that the first child element of the food tag has been deleted. If you want to delete all of its tags, you can use the clear() function, as shown below:

Example:

myroot[0].clear()
mytree.write('output7.xml')

When the code above is executed, the first food tag is emptied completely: all of its child tags, attributes, and text are removed, although the empty food tag itself stays in the tree. So far in this Python XML parser tutorial, we have been using the xml.etree.ElementTree module. Now let's see how to parse XML using minidom.
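The three deletion approaches above can be compared side by side. This is only a sketch: fresh_food() is a hypothetical helper, and the kind="demo" attribute is made up so that the effect of clear() on attributes is visible.

```python
import xml.etree.ElementTree as ET

def fresh_food():
    # Hypothetical helper: returns a new, untouched <food> element each time.
    return ET.fromstring(
        '<food kind="demo"><item name="breakfast">Idly</item><price>$2.5</price></food>'
    )

# 1. attrib.pop() removes one attribute and leaves the element in place.
food = fresh_food()
food[0].attrib.pop('name', None)
after_pop = ET.tostring(food, encoding='unicode')

# 2. remove() detaches one child element from its parent.
food = fresh_food()
food.remove(food[0])
after_remove = ET.tostring(food, encoding='unicode')

# 3. clear() strips children, attributes, and text, but keeps the element itself.
food = fresh_food()
food.clear()
after_clear = ET.tostring(food, encoding='unicode')

print(after_pop)
print(after_remove)
print(after_clear)
```

Printing the three strings shows that only remove() takes an element out of the tree; pop() and clear() change what an element contains.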

The xml.dom.minidom module:

This module is mainly used by people who are proficient with the DOM (Document Object Model). DOM applications usually start by parsing XML into a DOM. In xml.dom.minidom, this can be done in the following ways:

Using the parse() function:

The first way is to use the parse() function, supplying the XML file to be parsed as a parameter. For example:

Example:

from xml.dom import minidom
p1 = minidom.parse("sample.xml")

Once you have done this, you will be able to split up the XML file and fetch the data you need. You can also use this function to parse an open file.

Example:

# open the file first, then hand the file object to parse()
with open('sample.xml') as dat:
    p2 = minidom.parse(dat)

In this case, the variable that holds the open file is passed to the parse function as a parameter.

Using the parseString() method:

This method is used when you want to supply the XML to be parsed as a string.

Example:

# the string must be well-formed XML; the myxml tag is illustrative
p3 = minidom.parseString('<myxml>Using parseString</myxml>')

You can parse XML using any of the methods above. Now let's try to fetch data using this module.
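Here is a small sketch showing that the string and file-object routes lead to the same kind of document. The one-line XML is an assumed fragment of the sample menu, and io.BytesIO stands in for an open file so that no file on disk is needed.

```python
import io
from xml.dom import minidom

# Assumed one-line fragment of the sample menu.
xml_text = '<metadata><food><item name="breakfast">Idly</item></food></metadata>'

# parseString() takes the XML text itself.
doc_from_string = minidom.parseString(xml_text)

# parse() accepts a filename or an open file-like object;
# BytesIO stands in for an open file here.
doc_from_file = minidom.parse(io.BytesIO(xml_text.encode('utf-8')))

# documentElement is the root element of a DOM document.
print(doc_from_string.documentElement.tagName)
print(doc_from_file.documentElement.tagName)
```

Either way the result is a Document object, and its documentElement attribute plays the role that getroot() plays in ElementTree.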

Finding elements of interest:

After my file has been parsed, if I try to print it, the output shows that the variable storing the parsed data is a DOM object.

Example:

dat = minidom.parse('sample.xml')
print(dat)

Output: <xml.dom.minidom.Document object at 0x...>

Accessing elements using getElementsByTagName:

Example:

tagname = dat.getElementsByTagName('item')[0]
print(tagname)

If I try to fetch the first element using the getElementsByTagName method, I will see the following output:

Output: <DOM Element: item at 0x...>

Please note that only one element is returned, because I have used the [0] subscript for convenience. It will be removed in the further examples.

To access the value of an attribute, I have to use the value attribute as follows:

Example:

dat = minidom.parse('sample.xml')
tagname = dat.getElementsByTagName('item')
print(tagname[0].attributes['name'].value)

Output: breakfast

To retrieve the data present in these tags, you can use the data attribute, as shown below:

Example:

print(tagname[1].firstChild.data)

Output: Paper Dosa

You can also retrieve the value of an attribute on any other item using the value attribute.

Example:

# items holds every <item> element in the document
items = dat.getElementsByTagName('item')
print(items[1].attributes['name'].value)

Output: breakfast

To print out all the items available on our menu, you can loop over the items and print each one.

Example:

for x in items:
    print(x.firstChild.data)

Output:

Idly
Paper Dosa
Upma
Bisi Bele Bath
Kesari Bath

To count the number of items on our menu, you can use the len() function, as shown below:

Example:

print(len(items))

The output shows that our menu contains 5 items.
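Putting the minidom pieces together, the sketch below gathers each menu entry into a dictionary. It assumes a two-entry version of the menu supplied as a string, and it uses getAttribute(), a shorthand for going through attributes[...].value.

```python
from xml.dom import minidom

# Two-entry, assumed version of the sample menu.
doc = minidom.parseString("""<metadata>
    <food><item name="breakfast">Idly</item><price>$2.5</price></food>
    <food><item name="breakfast">Paper Dosa</item><price>$2.7</price></food>
</metadata>""")

menu = []
for food in doc.getElementsByTagName('food'):
    # getElementsByTagName() can also be called on an element to search below it.
    item = food.getElementsByTagName('item')[0]
    price = food.getElementsByTagName('price')[0]
    menu.append({
        'item': item.firstChild.data,          # text node inside <item>
        'meal': item.getAttribute('name'),     # same value as attributes['name'].value
        'price': price.firstChild.data,
    })

print(len(menu))
for entry in menu:
    print(entry)
```

Searching from each food element rather than from the whole document keeps every item paired with its own price.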

This brings us to the end of this Python XML parser tutorial. I hope everything is clear to you.

Reprinted from :blog.51cto.com/u_15214399/…

copyright notice
author[Charlie is not a dog],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/02/202202020445299984.html
