Mediawiki API: Difference between revisions

From XPUB & Lens-Based wiki
Latest revision as of 18:49, 24 September 2019

Using python + mwclient

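from mwclient import Site

# Connect to English Wikipedia; its API lives under the /w/ path
site = Site("en.wikipedia.org", path="/w/")

# Print the title of every page that belongs to a category
cat = site.pages["Category:Computer scientists"]
for page in cat.members():
    print(page.name)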


Mediawiki API: semantic search and download

The script wiki-download.py, from scan-utils (https://git.xpub.nl/scan-utils/), uses the MediaWiki API to download files that match Ask queries.

API: what is the API?

For more on the basics of the MediaWiki API, see Wiki_publishing#API.


Semantic queries: ASK

Semantic MediaWiki includes a simple query language for semantic search, so that users can directly request certain information from the wiki.[1]

On a wiki with the Semantic MediaWiki (SMW) extension installed, the page [[Special:Ask]] provides an interface for querying the wiki. See Autonomous Archive's Special:Ask.

Although many parameters can be included, let's keep it simple and focus on the syntax for selecting pages.

Selecting pages

Wiki pages

  • select all pages under the File: namespace (all files) [[File:+]]
  • select all pages under the User: namespace (all users) [[User:+]]

Semantic properties and values

  • all pages/items with property Actor [[Actor::+]]
  • all pages/items with property Actor::Eetcafe [[Actor::Eetcafe]]

Combining more than one query string

  • all File: pages/items with property Production_Method::Typewriter [[File:+]][[Production_Method::Typewriter]]
  • all File: pages/items

Notes on selecting pages

Difference between Wiki pages and Semantic property and values:

  • one colon (:) selects wiki pages, e.g. [[User:Zalán_Szakács]]
  • two colons (::) select semantic property values, e.g. [[Production_Method::Typewriter]]
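
The selection syntax above can also be assembled programmatically. The following is a minimal sketch, not part of the wiki or of scan-utils: the function names page_condition, property_condition and combine are hypothetical helpers introduced only for illustration. It shows the one-colon form for wiki pages, the two-colon form for semantic properties, and how conditions written one after another narrow the selection.

```python
def page_condition(namespace, title="+"):
    """Select wiki pages: ONE colon between namespace and title.

    The default title "+" is the wildcard, e.g. [[File:+]] for all files.
    """
    return f"[[{namespace}:{title}]]"


def property_condition(prop, value="+"):
    """Select by semantic property: TWO colons between property and value.

    The default value "+" matches any page that has the property at all.
    """
    return f"[[{prop}::{value}]]"


def combine(*conditions):
    """Write conditions one after another; a page must satisfy all of them."""
    return "".join(conditions)


all_files = page_condition("File")
typewritten_files = combine(
    all_files,
    property_condition("Production_Method", "Typewriter"),
)

print(all_files)
print(typewritten_files)
```

The resulting strings can be pasted into the Special:Ask query box as they are.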


To display the values of a property, add ?Property to the Additional data to display box.

  • ?Production Method displays the Production_Method values of the page
  • ?Actor|?Origin displays the Actor and Origin values of the page
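
When a query is written as a single string rather than typed into the two boxes of Special:Ask, the printout statements are appended after the conditions, each introduced by a | character. A small sketch of that, where with_printouts is a hypothetical helper name and not something the wiki provides:

```python
def with_printouts(conditions, properties):
    """Append one |?Property printout statement per property name."""
    printouts = "".join(f"|?{name}" for name in properties)
    return conditions + printouts


# All pages that have an Actor, showing their Actor and Origin values
query = with_printouts("[[Actor::+]]", ["Actor", "Origin"])
print(query)
```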

Finding existing properties and their use:

Ask 2 more interfaces

API

The same Ask requests that we have been making in the wiki Ask interface can be made through the MediaWiki API, using the ask module.[2]

Using: /api.php?action=ask&query=

https://aa.xpub.nl/api.php?action=ask&query=[[Production_Method::%2B]]&format=jsonfm

https://aa.xpub.nl/api.php?action=ask&query=[[Actor::Poortgebouw]]&format=jsonfm

https://aa.xpub.nl/api.php?action=ask&query=[[Actor::Poortgebouw]][[Production_Method::Typewriter]]&format=jsonfm

Note that:

  • the query conditions are wrapped in [[ ]], as they were in the wiki Ask interface
  • the + sign is encoded with URL percent encoding, becoming %2B
  • we declare the output format with the format parameter; format=jsonfm and format=xml are both possible
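
The same URLs can be built from Python with the standard library, which takes care of the percent encoding. This is only a sketch under a few assumptions: ask_url, result_titles and ask are hypothetical names; format=json is used instead of jsonfm because jsonfm is the human-readable HTML rendering of the same data; and the response is assumed to hold the matching pages under query → results, keyed by page title, which should be checked against the actual output of the wiki.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://aa.xpub.nl/api.php"  # endpoint used in the examples above


def ask_url(query, fmt="json", api_url=API_URL):
    """Build an action=ask URL; urlencode percent-encodes the query."""
    params = {"action": "ask", "query": query, "format": fmt}
    return api_url + "?" + urlencode(params)


def result_titles(response):
    """Return the sorted titles of the pages in a decoded ask response.

    Assumption: results is a dict keyed by page title; anything else
    (for instance an empty list when nothing matches) gives no titles.
    """
    results = response.get("query", {}).get("results", {})
    return sorted(results) if isinstance(results, dict) else []


def ask(query):
    """Send the request and decode the JSON reply (needs network access)."""
    with urlopen(ask_url(query)) as reply:
        return json.load(reply)


url = ask_url("[[Production_Method::+]]")
print(url)
```

Note that urlencode encodes the brackets and colons as well as the + sign, which is equivalent to the hand-written URLs above. Calling result_titles(ask("[[Actor::Poortgebouw]]")) would then list the matching pages.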

Script

With the wiki-download.py script from https://git.xpub.nl/scan-utils/ we can perform the same semantic queries and download the resulting items, if they happen to be files.

Usage:

Ask for help: python wiki-download.py --help

python wiki-download.py --download imgs --ask [[Production_Method::Typewriter]]

python wiki-download.py --download imgs --ask [[Production_Method::+]][[Document_Type::Flyer]]


Production method can be Hand-written OR Typewriter: python wiki-download.py --download imgs --ask [[Production_Method::Hand-written]]OR[[Production_Method::Typewriter]]
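
Some shells treat unquoted square brackets as a filename pattern (zsh, for example, stops with a "no matches found" error), so it can be safer to quote the query. The sketch below joins alternatives with OR and prints a shell-safe command line. The names any_of and download_command are hypothetical; only the --download and --ask options shown above are used, and nothing here reflects how wiki-download.py works internally.

```python
import shlex


def any_of(*conditions):
    """Join alternative conditions with OR, as in the example above."""
    return "OR".join(conditions)


def download_command(query, target="imgs"):
    """Return the command line with every argument quoted for the shell."""
    args = ["python", "wiki-download.py", "--download", target, "--ask", query]
    return " ".join(shlex.quote(arg) for arg in args)


query = any_of(
    "[[Production_Method::Hand-written]]",
    "[[Production_Method::Typewriter]]",
)
command = download_command(query)
print(command)
```

Only the query ends up wrapped in single quotes, since it is the only argument containing characters the shell could interpret.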

References

[2] https://www.semantic-mediawiki.org/wiki/Help:API:ask