How to use Python lxml.etree

lxml is the most feature-rich and easy-to-use library for processing XML and HTML in Python.

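lxml is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and feature completeness of these libraries with the simplicity of a native Python API. It is compatible with the ElementTree API, but superior to it.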

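Programming with libxml2 is like the thrilling embrace of an exotic stranger: it seems able to fulfil your wildest dreams, but a nagging voice deep inside keeps warning you that you may get hurt in the worst possible way. That is why lxml exists.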

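This is a tutorial on processing XML with lxml.etree. It gives a short overview of the main concepts of the ElementTree API, plus some simple enhancements that make your life as a programmer easier.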

First, import lxml.etree:

from lxml import etree

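To aid code portability, the examples in this tutorial make it clear which parts of the API are lxml.etree extensions on top of the ElementTree API as defined by Fredrik Lundh's ElementTree library; such calls are marked "lxml.etree only!".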

An Element is the main container object of the ElementTree API. Most of the XML tree functionality is accessed through this class. Elements are easy to create:

root = etree.Element("root")

The XML tag name of an element is accessed through its tag property:

>>> print(root.tag)

root
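A minimal sketch, assuming lxml is installed, of creating an element and reading its tag. As an extra illustration not covered in the text above, it also shows that the tag property can be assigned to rename an element; the name "document" is purely hypothetical.

```python
from lxml import etree

# Element() is a factory: it creates a new element that has no parent yet.
root = etree.Element("root")
print(root.tag)                # root

# The tag property is writable, so an element can be renamed in place.
root.tag = "document"
print(root.tag)                # document
print(etree.tostring(root))    # b'<document/>'
```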

Elements are organised in an XML tree structure. To create a child element and add it to a parent element, use the append() method:

>>> root.append(etree.Element("child1"))

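There is also a shorter and more efficient way: the SubElement factory. It takes the same arguments as the Element factory, but additionally requires the parent element as its first argument: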

>>> child2 = etree.SubElement(root, "child2")
>>> child3 = etree.SubElement(root, "child3")
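A small sketch comparing the two ways of adding children described above. It assumes that extra keyword arguments to SubElement become attributes of the new child (which is how lxml's SubElement factory behaves); the attribute name "kind" and its value are made up for illustration.

```python
from lxml import etree

root = etree.Element("root")

# SubElement creates the child and appends it to the parent in one step.
first = etree.SubElement(root, "child1")
# Extra keyword arguments are stored as attributes on the new child.
second = etree.SubElement(root, "child2", kind="example")

# The long-hand equivalent: create a standalone element, then append it.
third = etree.Element("child3")
root.append(third)

print([child.tag for child in root])   # ['child1', 'child2', 'child3']
print(second.get("kind"))              # example
```

Both forms produce the same tree; SubElement simply saves the separate append() call.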

You can serialise the tree you have created:

>>> print(etree.tostring(root, pretty_print=True).decode())
<root>
  <child1/>
  <child2/>
  <child3/>
</root>
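A short sketch of a few serialisation options, assuming a recent lxml under Python 3, where tostring() returns bytes by default. The encoding="unicode" and xml_declaration=True parameters are standard tostring() keywords; the exact text of the XML declaration is not asserted here.

```python
from lxml import etree

root = etree.Element("root")
for name in ("child1", "child2", "child3"):
    etree.SubElement(root, name)

# Default: compact output as bytes.
raw = etree.tostring(root)
print(raw)  # b'<root><child1/><child2/><child3/></root>'

# encoding="unicode" returns a str, which avoids a manual .decode().
text = etree.tostring(root, pretty_print=True, encoding="unicode")
print(text)

# xml_declaration=True prepends an <?xml ...?> header to the byte output.
with_decl = etree.tostring(root, xml_declaration=True, encoding="UTF-8")
print(with_decl.splitlines()[0])
```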

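To make access to these subelements easy and straightforward, elements mimic the behaviour of normal Python lists as closely as possible: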

>>> child = root[0]
>>> print(child.tag)
child1
>>> print(len(root))
3
>>> root.index(root[1])  # lxml.etree only!
1
>>> children = list(root)
>>> for child in root:
...     print(child.tag)
...
child1
child2
child3
>>> root.insert(0, etree.Element("child0"))
>>> start = root[:1]
>>> end = root[-1:]
>>> print(start[0].tag)
child0
>>> print(end[0].tag)
child3
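A few more list-like operations, as a sketch assuming lxml's Element supports extend(), remove(), slicing and del the way a list does (these are part of the ElementTree API that lxml implements). Note that a slice returns a plain Python list of elements.

```python
from lxml import etree

root = etree.Element("root")
# extend() appends several elements at once, like list.extend().
root.extend([etree.Element("child1"), etree.Element("child2")])
etree.SubElement(root, "child3")

# Negative indices and slices work as with lists.
print(root[-1].tag)                  # child3
print([c.tag for c in root[1:]])     # ['child2', 'child3']

# remove() detaches one specific child element.
root.remove(root[0])
print([c.tag for c in root])         # ['child2', 'child3']

# del with an index removes a child by position.
del root[0]
print(len(root))                     # 1
```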

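Before ElementTree 1.3 and lxml 2.0, you could also check the truth value of an element to see whether it has children:

if root:   # this no longer works!
    print("The root element has children")

This is no longer supported: people tend to expect any element to count as "something", whether or not it has children, so an element that evaluates to False in an if statement is surprising. Use len(element) instead, which is both more explicit and less error-prone: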

>>> print(etree.iselement(root))  # test if it's some kind of Element
True
>>> if len(root):  # test if it has children
...     print("The root element has children")
...
The root element has children
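A sketch of why truth-value tests are error-prone, assuming lxml's fromstring() and find() (neither is covered in the text above; find() returns the first matching child or None). An element that exists but has no children would look "false", so "was it found?" should be tested with `is not None`, and "does it have children?" with len().

```python
from lxml import etree

root = etree.fromstring("<root><leaf/></root>")

leaf = root.find("leaf")        # exists, but has no children
missing = root.find("nothing")  # no match, so None

# "Was something found?" -> compare against None, not truthiness.
print(leaf is not None)         # True
print(missing is not None)      # False

# "Does it have children?" -> use len().
print(len(root))                # 1
print(len(leaf))                # 0
```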

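There is one important difference from lists: in lxml.etree an element can only sit in one position of one tree. Assigning an element to a new position therefore moves it: it is automatically removed from its previous position. With a plain list, the same assignment just copies the reference, so the item ends up in two places: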

>>> for child in root:
...     print(child.tag)
...
child0
child1
child2
child3
>>> root[0] = root[-1]  # this moves the element!
>>> for child in root:
...     print(child.tag)
...
child3
child1
child2
>>> l = [0, 1, 2, 3]
>>> l[0] = l[-1]
>>> l
[3, 1, 2, 3]

>>> root is root[0].getparent()  # lxml.etree only!
True

If you want to copy an element to a different position in lxml.etree, consider creating an independent deep copy using the copy module from Python's standard library:

>>> from copy import deepcopy
>>> element = etree.Element("neu")
>>> element.append(deepcopy(root[1]))
>>> print(element[0].tag)
child1
>>> print([c.tag for c in root])
['child3', 'child1', 'child2']
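A self-contained sketch contrasting "move" and "copy", assuming only lxml and the standard copy module. It rebuilds the tree from the examples above, shows that index assignment moves the element (the tree shrinks from four to three children), and that a deepcopy gets its own parent.

```python
from copy import deepcopy
from lxml import etree

root = etree.Element("root")
for name in ("child0", "child1", "child2", "child3"):
    etree.SubElement(root, name)

# Index assignment MOVES the element: child3 leaves its old slot
# and replaces child0, so only three children remain.
root[0] = root[-1]
print([c.tag for c in root])          # ['child3', 'child1', 'child2']

# A deep copy can live in another tree without touching the original.
other = etree.Element("neu")
other.append(deepcopy(root[1]))
print(other[0].tag)                   # child1
print([c.tag for c in root])          # ['child3', 'child1', 'child2']

# The copy belongs to its new parent; the original stays under root.
print(other[0].getparent() is other)  # True
print(root[1].getparent() is root)    # True
```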

XML elements support attributes. You can create them directly in the Element factory:

>>> root = etree.Element("root", interesting="totally")
>>> etree.tostring(root)
b'<root interesting="totally"/>'
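Keyword arguments only work for attribute names that are valid Python identifiers. A sketch, assuming the Element factory's attrib dictionary parameter, of passing names such as "class" (a Python keyword) or "data-id" (contains a hyphen); the names and values are illustrative. It also assumes attribute values must be strings, so a number is converted explicitly.

```python
from lxml import etree

# Names that cannot be written as keyword arguments go through attrib=.
root = etree.Element("root", attrib={"class": "main", "data-id": "42"})
print(etree.tostring(root))   # b'<root class="main" data-id="42"/>'

# Attribute values are strings; convert other types explicitly.
root.set("count", str(3))
print(root.get("count"))      # 3
```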

Attributes are just unordered name-value pairs, so a convenient way to work with them is the dictionary-like interface of elements:

>>> print(root.get("interesting"))
totally
>>> print(root.get("hello"))
None
>>> root.set("hello", "Huhu")
>>> print(root.get("hello"))
Huhu
>>> etree.tostring(root)
b'<root interesting="totally" hello="Huhu"/>'
>>> sorted(root.keys())
['hello', 'interesting']
>>> for name, value in sorted(root.items()):
...     print('%s = %r' % (name, value))
...
hello = 'Huhu'
interesting = 'totally'
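A sketch of a few more dict-style operations, assuming Element.get() accepts a default value (as dict.get() does) and that membership tests and deletion go through the attrib mapping, which lxml supports. The "missing" key is hypothetical.

```python
from lxml import etree

root = etree.Element("root", interesting="totally")
root.set("hello", "Huhu")

# get() takes an optional default, like dict.get().
print(root.get("missing", "n/a"))   # n/a

# Membership tests and deletion use the attrib mapping.
print("hello" in root.attrib)       # True
del root.attrib["hello"]
print("hello" in root.attrib)       # False
print(etree.tostring(root))         # b'<root interesting="totally"/>'
```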

If you need a real dict-like object, for example to pass it around, use the attrib property:

>>> attributes = root.attrib
>>> print(attributes["interesting"])
totally
>>> print(attributes.get("no-such-attribute"))
None
>>> attributes["hello"] = "Guten Tag"
>>> print(attributes["hello"])
Guten Tag
>>> print(root.get("hello"))
Guten Tag
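A sketch showing that attrib is a live view in both directions, assuming lxml's attrib object supports dict-style update(). The "lang" attribute is an illustrative addition.

```python
from lxml import etree

root = etree.Element("root", interesting="totally")
attributes = root.attrib      # a live view backed by the element, not a copy

# Changes made through the element are visible in the view ...
root.set("hello", "Huhu")
print(attributes["hello"])    # Huhu

# ... and changes made through the view, including update(),
# show up on the element and in its serialisation.
attributes.update({"hello": "Guten Tag", "lang": "de"})
print(root.get("lang"))       # de
print(etree.tostring(root))
# b'<root interesting="totally" hello="Guten Tag" lang="de"/>'
```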

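Since attrib is a dict-like object backed by the element itself, any change to the element is reflected in attrib and vice versa. It also means that the XML tree stays alive in memory as long as the attrib of any of its elements is still in use. To get an independent snapshot of the attributes that does not depend on the XML tree, copy it into a dict: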

>>> d = dict(root.attrib)
>>> sorted(d.items())
[('hello', 'Guten Tag'), ('interesting', 'totally')]
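A sketch confirming that the dict snapshot is independent of the tree, assuming only lxml. After copying, changes on either side do not affect the other.

```python
from lxml import etree

root = etree.Element("root", interesting="totally", hello="Guten Tag")

# dict(...) copies the current attributes into a plain dictionary.
snapshot = dict(root.attrib)

# Later changes on either side stay on that side.
root.set("hello", "Huhu")
snapshot["interesting"] = "not any more"

print(snapshot["hello"])          # Guten Tag
print(root.get("interesting"))    # totally
print(sorted(snapshot.items()))   # [('hello', 'Guten Tag'), ('interesting', 'not any more')]
```

Because the snapshot is an ordinary dict, it can be kept or passed around without keeping the XML tree alive.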