XML parsing | 潘锦的空间

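DOM parser

DOM is platform- and language-independent, and it is the official W3C standard for representing XML documents. A DOM parser loads the entire document up front and converts it into a tree that holds the document's content. This tree structure makes it easy for developers to locate specific information, and its operations make it straightforward to add and modify elements.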
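To make the "load, look up, modify" idea concrete, here is a minimal sketch using DOMDocument. The inline XML string is a hypothetical sample trimmed down from a weather-style reply, so the snippet runs without network access.

```php
<?php
// Hypothetical sample document (an assumption, not a live API response).
$xml = '<weather><current_conditions>'
     . '<condition data="Sunny"/><temp_c data="31"/>'
     . '</current_conditions></weather>';

$dom = new DOMDocument();
$dom->loadXML($xml);    // the whole document is parsed into a tree here

// Look up a specific node anywhere in the tree.
$condition = $dom->getElementsByTagName('condition')->item(0);
echo $condition->getAttribute('data'), "\n";

// Modify an existing element in place.
$condition->setAttribute('data', 'Cloudy');

// Add a new element under the same parent.
$humidity = $dom->createElement('humidity');
$humidity->setAttribute('data', '49%');
$condition->parentNode->appendChild($humidity);

// Serialize the modified tree back to XML text.
$result = $dom->saveXML($dom->documentElement);
echo $result, "\n";
```

Because the tree stays in memory, the lookup, the change and the insertion can happen in any order before the document is serialized again.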

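In PHP the tree is exposed as objects, although underneath it is stored in a set of structs. The objects relate to one another in a parent-child fashion, and together they form a tree.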
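The parent-child relationship can be seen directly on the node objects. The following sketch walks up and sideways from one node; the XML string is a hypothetical sample written without whitespace so that no text nodes appear between the elements.

```php
<?php
$dom = new DOMDocument();
// Hypothetical sample document (an assumption for illustration).
$dom->loadXML('<weather><current_conditions>'
    . '<condition data="Sunny"/><temp_c data="31"/>'
    . '</current_conditions></weather>');

$temp    = $dom->getElementsByTagName('temp_c')->item(0);
$parent  = $temp->parentNode;        // the enclosing current_conditions element
$sibling = $temp->previousSibling;   // the condition element just before temp_c
$root    = $parent->parentNode;      // the weather element

$path = array($root->nodeName, $parent->nodeName, $sibling->nodeName);
echo implode(' > ', $path), "\n";

// Walking down: list the children of the parent node.
$children = array();
foreach ($parent->childNodes as $child) {
    $children[] = $child->nodeName;
}
echo implode(', ', $children), "\n";
```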

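Because a DOM parser preloads the document, the whole document structure stays in memory. It can therefore be modified at any point during its lifetime, letting the application change both data and structure. The tree can also be navigated up and down at any time, and a DOM parser is comparatively simple to use. The same preloading, however, means the entire XML document has to be processed, so the demands on performance and memory are higher. With very large documents, parsing and loading the whole thing can be slow and resource-hungry. In that case another approach is needed, such as processing the data while reading it, or an event-based model like SAX.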

In PHP, DOM parsing is implemented through the DOMDocument class. For concrete code, see the earlier article "Five ways to parse XML in PHP" (PHP中的XML解析的5种方法).

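SAX parser

SAX stands for Simple API for XML. Unlike DOM, it does not preload anything; it parses the document as it scans it. A SAX parser uses an event-based model: while parsing an XML document it fires a series of events, and when it encounters a given tag it invokes a callback to report that the tag has been found. SAX parsers usually need less memory, because they leave it to the developer to decide which tags to handle. This scalability pays off most when only part of the data in a document is needed. On the other hand, coding against a SAX parser is harder, and it is difficult to access several different parts of the same document at once.

Here is an example of a SAX parser in PHP, using the XML document from the Google weather API.

API URL: http://www.google.com/ig/api?weather=shenzhen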

<?php
/**
 * A simple SAX parser for the Google weather feed.
 * Extracts the forecast for the coming days from
 * http://www.google.com/ig/api?weather=shenzhen
 */
class weatherSaxParser {
    private $_parser;
    private $_xmlData;

    /** @var string|null Name of the tag currently being processed */
    private $_tag;

    /** @var array Parsed forecast entries */
    private $_weather = array();

    /** @var int Next key to use in the weather array */
    private $_key;

    private $_attributes;

    /** @var array Tags whose data should be extracted */
    private $_parseTags = array('low', 'day_of_week', 'high', 'condition');

    public function __construct() {
        $this->_key = 0;
        $this->_parser = xml_parser_create();
        xml_set_object($this->_parser, $this);
        xml_set_element_handler($this->_parser, 'tagStart', 'tagEnd');
        xml_set_character_data_handler($this->_parser, 'tagContent');
    }

    public function setXmlData($xml) {
        $this->_xmlData = $xml;
    }

    /**
     * Run the parser on the current chunk of data.
     */
    public function run() {
        xml_parse($this->_parser, $this->_xmlData);
    }

    /**
     * Callback for an opening tag.
     * Tag and attribute names arrive upper-cased (the parser's default case folding).
     */
    public function tagStart($parser, $tagName, $attributes = NULL) {
        $this->_tag = strtolower($tagName);
        $this->_attributes = $attributes;

        // Each forecast_conditions element starts a new entry.
        if ($this->_tag == 'forecast_conditions') {
            $this->_weather[$this->_key] = array();
            $this->_key++;
        }

        if ($this->checkTag()) {
            if (empty($this->_weather[$this->_key - 1][$this->_tag]) && isset($this->_attributes['DATA'])) {
                $this->_weather[$this->_key - 1][$this->_tag] = $this->_attributes['DATA'];
            }
        }
    }

    /**
     * Callback for a closing tag.
     */
    public function tagEnd($parser, $tagName) {
        $this->_tag = NULL;
        $this->_attributes = NULL;   // the original misspelled $this as $htis here
    }

    /**
     * Callback for character data inside a tag.
     */
    public function tagContent($parser, $content) {
        if ($this->checkTag()) {
            $this->_weather[$this->_key - 1][$this->_tag] = $content;
        }
    }

    public function __destruct() {
        xml_parser_free($this->_parser);
    }

    /**
     * True when the current tag is one we want and lies inside a forecast entry.
     */
    public function checkTag() {
        return in_array($this->_tag, $this->_parseTags) && $this->_key > 0;
    }

    public function getWeather() {
        return $this->_weather;
    }
}

// Feed the file to the parser in 4096-byte chunks.
$fp = fopen('weather.xml', 'r');
$saxParser = new weatherSaxParser();
while ($data = fread($fp, 4096)) {
    $saxParser->setXmlData($data);
    $saxParser->run();
}
fclose($fp);

print_r($saxParser->getWeather());
unset($saxParser);

The code above handles the low, day_of_week, high, condition and forecast_conditions tags. With a SAX parser, traversal of the document is driven by reading the file sequentially, here in 4096-byte chunks.

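The example above also shows some drawbacks of a SAX parser:

A SAX parser does not allow random access to the XML file; it can only read sequentially.

Moving between elements is hard with a SAX parser, and so is moving across several tags.

After parsing a node, SAX invokes a callback, passes that node's information to the caller, then discards it and moves on to the next node. It neither stores the whole XML document in advance nor keeps any parse results afterwards.

SAX is poorly suited to modifying XML.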
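The "callback, then discard" behaviour can be shown with a much smaller sketch than the class above. It uses plain closures as handlers and keeps only the values the caller asks for; the XML string, the `$highs` array and the 32-byte chunk size are assumptions chosen for illustration.

```php
<?php
// Hypothetical sample data standing in for a larger forecast document.
$xml = '<weather>'
     . '<forecast_conditions><day_of_week data="Mon"/><high data="88"/></forecast_conditions>'
     . '<forecast_conditions><day_of_week data="Tue"/><high data="90"/></forecast_conditions>'
     . '</weather>';

$highs = array();   // the only state we choose to keep

$parser = xml_parser_create();
// Keep tag and attribute names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $parser,
    // Start-tag event: the parser hands us the node once and never again.
    function ($parser, $name, $attrs) use (&$highs) {
        if ($name === 'high') {
            $highs[] = (int) $attrs['data'];
        }
    },
    // End-tag event: nothing to do in this sketch.
    function ($parser, $name) {
    }
);

// Feed the document in small chunks, as a file read loop would.
foreach (str_split($xml, 32) as $chunk) {
    xml_parse($parser, $chunk, false);
}
xml_parse($parser, '', true);   // signal the end of the document

print_r($highs);
```

Everything that is not a `high` element passes through the handlers and is gone. Going back to an earlier node would mean parsing the document again.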

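Conclusion

DOM accesses an XML document by building a tree, which reflects the optimization idea of preprocessing. It suits cases where the document is loaded once and then queried or modified many times. DOM also has a well-defined interface and is convenient to program against, so in general choosing DOM is far more comfortable. SAX has its place too: its event model reflects the idea of taking only what is needed. It performs better when the data is processed only once, or when the data is too large to preload.

DOM and SAX are somewhat like the two ways of reading a file in PHP: file_get_contents versus the fopen/fread/feof combination. file_get_contents is convenient and fetches all the data in one go for later use. The fopen/fread/feof combination opens the file and processes it piece by piece. A SAX-based program may well use fopen/fread/feof itself.

Everything is a trade-off. The point is to weigh gains against losses and pick the technique that fits the actual situation.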
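The file-reading analogy can be sketched as follows. The temporary file and its contents are made up for the demonstration; the point is only the contrast between holding everything at once and holding one chunk at a time.

```php
<?php
// Create a throwaway file to read (hypothetical data).
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, str_repeat('<item/>', 1000));

// Whole-file style, comparable to DOM: everything is in memory at once.
$all = file_get_contents($path);
$countAll = substr_count($all, '<item/>');

// Chunked style, comparable to SAX: only one chunk is held at a time.
$bytes = 0;
$chunks = 0;
$fp = fopen($path, 'r');
while (!feof($fp)) {
    $chunk = fread($fp, 4096);
    $bytes += strlen($chunk);
    $chunks++;
}
fclose($fp);
unlink($path);

echo "whole file: {$countAll} items, ", strlen($all), " bytes\n";
echo "chunked: {$bytes} bytes in {$chunks} reads\n";
```

Note that a chunk boundary can fall in the middle of a tag, which is why the chunked style needs a parser that accepts partial input, as xml_parse does.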

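【Preface】
Whether in desktop software or in web applications, XML is everywhere!
In day-to-day work, however, I had only handled XML (generating it, parsing it, and so on) through ready-made wrapper classes. With some free time over the holiday, I summarized the ways to parse XML in PHP as follows:

The examples parse the weather report provided by the Google API and extract today's conditions and temperature.
API URL: http://www.google.com/ig/api?weather=shenzhen

【XML file contents】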

<?xml version="1.0"?>
<xml_api_reply version="1">
    <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" >
        <forecast_information>
            <city data="Shenzhen, Guangdong"/>
            <postal_code data="shenzhen"/>
            <latitude_e6 data=""/>
            <longitude_e6 data=""/>
            <forecast_date data="2009-10-05"/>
            <current_date_time data="2009-10-04 05:02:00 +0000"/>
            <unit_system data="US"/>
        </forecast_information>
        <current_conditions>
            <condition data="Sunny"/>
            <temp_f data="88"/>
            <temp_c data="31"/>
            <humidity data="Humidity: 49%"/>
            <icon data="/ig/images/weather/sunny.gif"/>
            <wind_condition data="Wind:  mph"/>
        </current_conditions>
    </weather>
</xml_api_reply>

【Parsing with DOMDocument】

<?PHP
header("Content-type:text/html; Charset=utf-8");
$url = "http://www.google.com/ig/api?weather=shenzhen";
 
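// Load the XML content
$content = file_get_contents($url);
$content = get_utf8_string($content);
// loadXML() is an instance method, so create the document first
$dom = new DOMDocument();
$dom->loadXML($content);
/*
Alternatively, load straight from the URL:
$dom = new DOMDocument();
$dom->load($url);
 */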
 
$elements = $dom->getElementsByTagName("current_conditions");
$element = $elements->item(0);
$condition = get_google_xml_data($element, "condition");
$temp_c = get_google_xml_data($element, "temp_c");
echo 'Weather: ', $condition, '<br />';
echo 'Temperature: ', $temp_c, '<br />';
 
function get_utf8_string($content) {    // Convert the content to UTF-8
    $encoding = mb_detect_encoding($content, array('ASCII','UTF-8','GB2312','GBK','BIG5'));
    return  mb_convert_encoding($content, 'utf-8', $encoding);
}
 
function get_google_xml_data($element, $tagname) {
    $tags = $element->getElementsByTagName($tagname);   // All elements named $tagname
 
    $tag = $tags->item(0);  // First element named $tagname, or null if there is none
    if ($tag !== null && $tag->hasAttribute("data")) {    // Read the data attribute
        return $tag->getAttribute("data");
    } else {
        return false;
    }
}
?>

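This is only a simple example that uses loadXML, item, getAttribute and getElementsByTagName. There are other useful methods as well, to be used as your needs dictate.

【XMLReader】
When reading XML in PHP, many classes offer functions that spare us from parsing character by character: tag and attribute names are enough to pull attributes and content out of the document, which is far more convenient. Among them, XMLReader walks through the nodes of an XML file sequentially. Think of it as a cursor passing over every node of the document and grabbing the content you need.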

<?PHP
header("Content-type:text/html; Charset=utf-8");
$url = "http://www.google.com/ig/api?weather=shenzhen";
 
// Load the XML content
$xml = new XMLReader();
$xml->open($url);
 
$condition = '';
$temp_c = '';
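while ($xml->read()) {
//      echo $xml->name, "==>", $xml->depth, "<br>";
      if (!empty($condition) && !empty($temp_c)) {
          break;
      }
      if ($xml->nodeType != XMLReader::ELEMENT) {  // Skip whitespace, end tags and other non-element nodes
          continue;
      }
      if ($xml->name == 'condition' && empty($condition)) {  // Take the first condition
            $condition = $xml->getAttribute('data');
      }
 
      if ($xml->name == 'temp_c' && empty($temp_c)) {    // Take the first temp_c
          $temp_c = $xml->getAttribute('data');
      }
      // The while condition already advances the cursor; an extra read() here would skip every other node
}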
 
$xml->close();
echo 'Weather: ', $condition, '<br />';
echo 'Temperature: ', $temp_c, '<br />';

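We only need the first condition and the first temp_c, so we walk the nodes, store the first condition and the first temp_c we meet in variables, and print them at the end.

【DOMXPath】
This approach requires building the whole document structure with a DOMDocument object first: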
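The same cursor walk can be tried offline. The sketch below feeds XMLReader an inline string instead of a URL; the sample XML and the `$wanted` / `$found` names are assumptions made for the illustration.

```php
<?php
// Hypothetical sample: a current condition followed by a forecast condition.
$xml = '<xml_api_reply><weather>'
     . '<current_conditions><condition data="Sunny"/><temp_c data="31"/></current_conditions>'
     . '<forecast_conditions><condition data="Rain"/></forecast_conditions>'
     . '</weather></xml_api_reply>';

$reader = new XMLReader();
$reader->XML($xml);   // read from a string rather than from a URL

$wanted = array('condition', 'temp_c');
$found  = array();

while ($reader->read()) {
    // Only element nodes carry the data attribute we are after.
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue;
    }
    $name = $reader->name;
    if (in_array($name, $wanted, true) && !isset($found[$name])) {
        $found[$name] = $reader->getAttribute('data');   // first occurrence wins
    }
    if (count($found) === count($wanted)) {
        break;   // stop early; the rest of the document is never visited
    }
}
$reader->close();

print_r($found);
```

Stopping as soon as both values are found is where a forward-only reader saves work compared with loading the full tree.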

<?PHP
header("Content-type:text/html; Charset=utf-8");
$url = "http://www.google.com/ig/api?weather=shenzhen";
 
// Load the XML content
$dom = new DOMDocument();
$dom->load($url);
 
$xpath = new DOMXPath($dom);
$element = $xpath->query("/xml_api_reply/weather/current_conditions")->item(0);
$condition = get_google_xml_data($element, "condition");
$temp_c = get_google_xml_data($element, "temp_c");
echo 'Weather: ', $condition, '<br />';
echo 'Temperature: ', $temp_c, '<br />';
 
function get_google_xml_data($element, $tagname) {
    $tags = $element->getElementsByTagName($tagname);   // All elements named $tagname
 
    $tag = $tags->item(0);  // First element named $tagname, or null if there is none
    if ($tag !== null && $tag->hasAttribute("data")) {    // Read the data attribute
        return $tag->getAttribute("data");
    } else {
        return false;
    }
}
?>
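As a possible shortcut, an XPath expression can address the attribute itself, so no helper function is needed. This is a sketch against a hypothetical inline sample rather than the live API; it relies on DOMXPath::evaluate() returning a string when the expression is wrapped in the XPath string() function.

```php
<?php
// Hypothetical sample document mirroring the structure shown earlier.
$xml = '<xml_api_reply><weather><current_conditions>'
     . '<condition data="Sunny"/><temp_c data="31"/>'
     . '</current_conditions></weather></xml_api_reply>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$base = '/xml_api_reply/weather/current_conditions';

// string(...) converts the first matching attribute node to its text value.
$condition = $xpath->evaluate("string($base/condition/@data)");
$temp_c    = $xpath->evaluate("string($base/temp_c/@data)");

// A path that matches nothing yields an empty string instead of an error.
$missing   = $xpath->evaluate("string($base/no_such_tag/@data)");

echo 'Weather: ', $condition, "\n";
echo 'Temperature: ', $temp_c, "\n";
```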

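【xml_parse_into_struct】
Description: int xml_parse_into_struct ( resource parser, string data, array &values [, array &index] )

This function parses XML data into two parallel arrays. The index array holds, for each tag name, the positions of the corresponding entries in the values array. The last two array parameters are passed to the function by reference.
Note: xml_parse_into_struct() returns 0 on failure and 1 on success. These are not the same as FALSE and TRUE, so take care with operators such as ===.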
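The relationship between the two arrays is easier to see on a tiny document. The following sketch uses a hypothetical three-element sample; tag and attribute names come back upper-cased because case folding is on by default.

```php
<?php
// Hypothetical sample document.
$xml = '<weather><condition data="Sunny"/><temp_c data="31"/></weather>';

$parser = xml_parser_create();
$ok = xml_parse_into_struct($parser, $xml, $values, $index);

// $index maps each tag name to the positions of its entries in $values.
$pos = $index['CONDITION'][0];

// $values[$pos] describes that tag: name, type, nesting level, attributes.
$entry = $values[$pos];
echo $entry['tag'], ' => ', $entry['attributes']['DATA'], "\n";

// The same two-step lookup for the temperature.
$temp_c = $values[$index['TEMP_C'][0]]['attributes']['DATA'];
echo 'TEMP_C => ', $temp_c, "\n";
```

Elements that have both a start and an end tag, such as the root here, get one entry for the opening tag and another for the closing tag, so their index list contains more than one position.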

<?PHP
header("Content-type:text/html; Charset=utf-8");
$url = "http://www.google.com/ig/api?weather=shenzhen";
 
// Load the XML content
$content = file_get_contents($url);
$p = xml_parser_create();
xml_parse_into_struct($p, $content, $vals, $index);
xml_parser_free($p);
 
echo 'Weather: ', $vals[$index['CONDITION'][0]]['attributes']['DATA'], '<br />';
echo 'Temperature: ', $vals[$index['TEMP_C'][0]]['attributes']['DATA'], '<br />';

【SimpleXML】
This method is available as of PHP 5.
Google's official documentation has a related example. The code here is as follows:

<?php
// Charset: utf-8
/**
  * Calls the Google weather forecast API with PHP SimpleXML; differs from Google's official example.
  * Google's official PHP domxml example for the weather forecast:
  * http://www.google.com/tools/toolbar/buttons/intl/zh-CN/apis/howto_guide.html
  *
  * @copyright Copyright (c) 2008 <cmpan(at)qq.com>
  * @license New BSD License
  * @version 2008-11-9
  */
 
// City, given as its pinyin name
$city = empty($_GET['city']) ? 'shenzhen' : $_GET['city'];
// urlencode() keeps user input from breaking the query string
$content = file_get_contents("http://www.google.com/ig/api?weather=" . urlencode($city) . "&hl=zh-cn");
$content || die("No such city's data");
$content = mb_convert_encoding($content, 'UTF-8', 'GBK');
$xml = simplexml_load_string($content);
 
$date = $xml->weather->forecast_information->forecast_date->attributes();
$html = $date. "<br>\r\n";
 
$current = $xml->weather->current_conditions;
 
$condition = $current->condition->attributes();
$temp_c = $current->temp_c->attributes();
$humidity = $current->humidity->attributes();
$icon = $current->icon->attributes();
$wind = $current->wind_condition->attributes();
 
$condition && $condition = $xml->weather->forecast_conditions->condition->attributes();
$icon && $icon = $xml->weather->forecast_conditions->icon->attributes();
 
$html.= "Current: {$condition}, {$temp_c}°C,<img src='http://www.google.com/ig{$icon}'/> {$humidity} {$wind} <br />\r\n";
 
foreach($xml->weather->forecast_conditions as $forecast) {
    $low = $forecast->low->attributes();
    $high = $forecast->high->attributes();
    $icon = $forecast->icon->attributes();
    $condition = $forecast->condition->attributes();
    $day_of_week = $forecast->day_of_week->attributes();
    $html.= "{$day_of_week} : {$high} / {$low} °C, {$condition} <img src='http://www.google.com/ig{$icon}' /><br />\r\n";
}
 
header('Content-type: text/html; charset=utf-8');
print $html;
?>
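SimpleXML also lets a single attribute be read with array syntax, which avoids calling attributes() for every element. The sketch below runs on a hypothetical inline sample; the casts to string and int are needed because attribute access returns SimpleXMLElement objects rather than plain values.

```php
<?php
// Hypothetical sample document mirroring the structure used above.
$xml = simplexml_load_string(
    '<xml_api_reply><weather>'
    . '<current_conditions><condition data="Sunny"/><temp_c data="31"/></current_conditions>'
    . '<forecast_conditions><day_of_week data="Mon"/><high data="88"/></forecast_conditions>'
    . '<forecast_conditions><day_of_week data="Tue"/><high data="90"/></forecast_conditions>'
    . '</weather></xml_api_reply>'
);

$current = $xml->weather->current_conditions;

// Array syntax reads one named attribute; cast it to get a plain value.
$condition = (string) $current->condition['data'];
$temp_c    = (int) $current->temp_c['data'];

// Repeated elements can be iterated directly.
$forecast = array();
foreach ($xml->weather->forecast_conditions as $day) {
    $forecast[(string) $day->day_of_week['data']] = (int) $day->high['data'];
}

echo "{$condition}, {$temp_c}\n";
print_r($forecast);
```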
