Help

XML representation of Wikipedia articles

The key structural elements are:

  • doc("wiki") - document which contains Wikipedia content
  • doc("wiki")/mediawiki/page - corresponds to an article in Wikipedia
  • page/title - the title of the article
  • page/revision/text - a container for content of the article
  • page/revision/text/section - a section in the content
  • page/revision/text/section/@title - a title of the section
  • page/revision/text//a - a link from the article to another article; the target is given in the 'href' attribute
  • page/revision/text//p - a paragraph inside the content
  • page/revision/text//img - a reference to an image, specified via the 'src' attribute
  • page/revision/catlinks - the categories which the article belongs to
  • page/revision/catlinks/catlink/@href - a single category which the article belongs to
  • page/revision/text//template - templates which contain various metadata. The type of metadata is defined in the 'head' attribute. In particular, infoboxes are represented as 'template' elements.
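
The paths listed above can be combined in an ordinary XQuery expression. The following is only a sketch: it assumes the export-0.5 namespace used by the index definitions on this page, and that an article titled 'Igor Kurchatov' is present in the database.

```xquery
(: Section titles of one article, found by a plain path scan (no index) :)
declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
for $section in doc("wiki")/mediawiki/page[title = "Igor Kurchatov"]/revision/text/section
return string($section/@title)
```

A path scan like this visits every page, so for lookups by title the predefined index described under Sedna Indexes is likely the better choice.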

Article example in XML

<page xmlns="http://www.mediawiki.org/xml/export-0.3/">
    <title>Igor Kurchatov</title>
    <id>222567</id>
    <revision>
       <id>130994728</id>
       <timestamp>2007-05-15T09:34:39Z</timestamp>
       <contributor>
         <ip>212.192.96.9</ip>
       </contributor>
       <comment>/* References */</comment>
       <text xml:space="preserve">
          <section depth="1" title="Life and career">
          <p>
            <a href="image:kurchatov.jpg">thumb|right|Igor Kurchatov</a> <b>Igor Vasilyevich Kurchatov</b> 
            (<a href="January 12">January 12</a>, <a href="1903">1903</a> &ndash; <a href="February 7">February 7</a>,
            <a href="1960">1960</a>) was a <a href="Soviet Union">Soviet</a>/<a href="Russia">Russian</a> physicist.
            He was the leader of the <a href="Soviet atomic bomb project">Soviet atomic bomb project</a>. 
            Kurchatov was born in <i>Simsky zavod</i>, <a href="Ufa">Ufa</a> <a href="Guberniya">Guberniya</a>  
            (now city of <i>Sim</i>, <a href="Chelyabinsk Oblast">Chelyabinsk Oblast</a>). After completing 
            <a href="Simferopol gymnasium 1"> Simferopol gymnasium 1</a> he studied <a href="physics">physics</a> 
            at Crimea State University and ship building at the <a href="Saint Petersburg Polytechnical University">
            Polytechnical Institute</a> in <a href="Petrograd">Petrograd</a>. In <a href="1925">1925</a> he moved to the 
            <a href="Ioffe Physico-Technical Institute">Physico-Technical Institute</a>, where he worked (under
            <a href="Abram Fedorovich Ioffe">Abram Fedorovich Ioffe</a>) on various problems connected with <a href="radioactivity">
            radioactivity</a>. In <a href="1932">1932</a> he received funding for his own nuclear science research team, 
            which built the Soviet Union's first <a href="cyclotron">cyclotron</a> (<a href="September 21">September 21</a>, 
            <a href="1939">1939</a>).
          </p>
          <p>
            Igor Kurchatov and his apprentice <a href="Georgy Flyorov">Georgy Flyorov</a> discovered the basic ideas of the 
            uranium chain reaction and the nuclear reactor concept in the 1930's. In 1942 Kurchatov declared: "At breaking up
            of kernels in a kilogram of uranium, the energy released must be equal to the explosion of 20,000 tons of trotyl." 
            This announcement was practically verified during the atomic bombing of Hiroshima.
          </p>
          </section>
          <section depth="1" title="External links">
            <links>
              <link href="http://www.kiae.ru/">Kurchatov institute</link>
              <link href="http://www.cultinfo.ru/fulltext/1/001/008/067/772.htm">
                 Biography of Igor Kurchatov (in Russian)
              </link>
            </links>
          </section>
       </text>
       <catlinks>
         <catlink href="Category:Russian physicists"/>
         <catlink href="Category:Soviet physicists"/>
         <catlink href="Category:Nuclear physicists"/>
       </catlinks>
    </revision>
</page>

Sedna Indexes

For efficient query execution, a number of predefined indexes have been created in Sedna. Each index covers a common use case in content processing.

Retrieve article by its name

Definition

declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
CREATE INDEX 'article-by-title' ON doc("wiki")/mediawiki/page BY title AS xs:string

Query example

(: Return the article whose title is 'Internet' :)
index-scan('article-by-title','Internet','EQ')
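
Since the index is defined on page elements, the result of the scan can be followed by further path steps. A minimal sketch, assuming the 'Internet' article exists and uses the link markup described in the structural elements list:

```xquery
(: Distinct link targets found in the 'Internet' article :)
declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
distinct-values(index-scan('article-by-title','Internet','EQ')/revision/text//a/@href)
```

The namespace declaration is needed here because the path steps after index-scan name elements, unlike the bare scan above.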

Retrieve articles which refer to the article with the specified title (what links here)

Definition

declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
CREATE INDEX 'article-by-link' ON doc("wiki")/mediawiki/page BY .//link/@label AS xs:string

Query example

(: Return the titles of articles which have references to the 'Anarchism' article :)
declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
index-scan('article-by-link','Anarchism','EQ')/title

Retrieve articles which belong to the given category

Definition

declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
CREATE INDEX 'article-by-cat' ON doc("wiki")/mediawiki/page BY ./revision/catlinks/catlink/@href AS xs:string

Query example

(: Return titles of the articles in 'Russian mathematicians' category :)
declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
declare ordering unordered;
index-scan('article-by-cat','Category:Russian mathematicians','EQ')/title
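
Results of two index scans can also be combined with the standard XQuery node-set operators. The sketch below is illustrative only: the category and link values are examples rather than values known to be in the database, and it assumes both scans return page elements from the same document.

```xquery
(: Titles of articles that are in a given category and also link to a given article.
   Both key values are illustrative. :)
declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
declare ordering unordered;
let $in-category := index-scan('article-by-cat','Category:Russian physicists','EQ')
let $linking := index-scan('article-by-link','Soviet Union','EQ')
return ($in-category intersect $linking)/title
```

The intersect operator compares nodes by identity, so a page that matches both scans appears only once in the result.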

Full-text search index

Definition

declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
CREATE FULL-TEXT INDEX 'fti' ON doc("wiki")/mediawiki/page TYPE "xml"

Query example

(: Retrieve all links from the articles which have 'sedna' in their titles :)
declare default element namespace "http://www.mediawiki.org/xml/export-0.5/";
declare ordering unordered;
ftindex-scan('fti','(title contains sedna)')//a/@href

Predefined queries

For your convenience we have also prepared a number of predefined queries. You can customize them in any way (click on [edit] to the right of the predefined queries) or write your own query from scratch.