Mediawiki API
Latest revision as of 12:13, 20 September 2024
Using python + mwclient
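from mwclient import Site

# Connect to English Wikipedia; its api.php lives under /w/
site = Site("en.wikipedia.org", path="/w/")
cat = site.pages["Category:Computer scientists"]
# Print the name of every member page of the category
for page in cat.members():
    print(page.name)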
Mediawiki API: semantic search and download
The script wiki-download.py, from scan-utils, interfaces with the mediawiki API in order to download files that match Ask requests.
API: what is the API?
More on the basics of the MediaWiki API in Wiki_publishing#API
Semantic queries: ASK
Semantic MediaWiki includes a simple query language for semantic search, so that users can directly request certain information from the wiki.[1]
Inside a wiki with the SMW extension installed, the page [[Special:Ask]] provides an interface for querying the wiki. See Autonomous Archive's Special:Ask.
Although many parameters can be included, let's keep it simple and focus on the syntax for selecting pages.
Selecting pages
Wiki pages
- select all pages under the File: namespace (all files) [[File:+]]
- select all pages under the User: namespace (all users) [[User:+]]
Semantic properties and values
- all pages/items with property Actor [[Actor::+]]
- all pages/items with property Actor::Eetcafe [[Actor::Eetcafe]]
Combining more than one query string
- all File: pages/items with property Production_Method::Typewriter [[File:+]][[Production_Method::Typewriter]]
- all File: pages/items
Notes on selecting pages
Difference between Wiki pages and Semantic property and values:
- one colon (:) for wiki pages: [[User:Zalán_Szakács]]
- two colons (::) for semantic property values: [[Production_Method::Typewriter]]
To display the values of a property, add ?Property to the Additional data to display window.
- ?Production Method displays the Production_Method values of the page
- ?Actor|?Origin displays the Actor and Origin values of the page
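The selection and display syntax above can also be assembled into a single query string. The following is a minimal sketch, assuming the single-string form in which conditions are written one after another and each printout is appended after a pipe. The helper name build_ask_query is hypothetical and is not part of Semantic MediaWiki or of the scan-utils scripts.

```python
def build_ask_query(conditions, printouts=()):
    """Join Ask conditions and printouts into a single query string.

    conditions: iterable such as ["File:+", "Production_Method::Typewriter"]
    printouts:  property names to display, such as ["Actor", "Origin"]
    """
    # Each condition is wrapped in [[ ]]; consecutive conditions must all match.
    selection = "".join(f"[[{c}]]" for c in conditions)
    # Each printout is written as ?Property and preceded by a pipe.
    display = "".join(f"|?{p}" for p in printouts)
    return selection + display


print(build_ask_query(["File:+", "Production_Method::Typewriter"], ["Actor", "Origin"]))
```

Keeping conditions and printouts as separate lists mirrors the two input windows of the Special:Ask page.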
Finding existing properties and their use:
- Visit the Special:Properties page
- Visit the Special:SearchByProperty page
Ask: two more interfaces
API
The same Ask requests that we have been making in the wiki Ask interface can be made using the mediawiki API, through the ASK module[2]
Using: /api.php?action=ask&query=
https://aa.xpub.nl/api.php?action=ask&query=[[Production_Method::%2B]]&format=jsonfm
https://aa.xpub.nl/api.php?action=ask&query=[[Actor::Poortgebouw]]&format=jsonfm
https://aa.xpub.nl/api.php?action=ask&query=[[Actor::Poortgebouw]][[Production_Method::Typewriter]]&format=jsonfm
Note that:
- we use the same [[ ]] syntax that we were using in the wiki Ask interface
- the + sign is encoded with URL percent encoding, becoming %2B
- we declare the output format with format=jsonfm; format=xml is also possible
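The percent-encoding step does not have to be done by hand. Here is a minimal sketch using only the Python standard library; the endpoint is taken from the example URLs above, while the function name build_ask_url and the choice of format=json as default are assumptions made for illustration.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Endpoint taken from the example URLs above.
API_URL = "https://aa.xpub.nl/api.php"


def build_ask_url(query, fmt="json", api_url=API_URL):
    """Return an api.php URL for the given Ask query string."""
    params = {"action": "ask", "query": query, "format": fmt}
    # urlencode percent-encodes reserved characters: "+" becomes %2B.
    # The brackets and colons are encoded too, and the server decodes them.
    return api_url + "?" + urlencode(params)


url = build_ask_url("[[Production_Method::+]]")
print(url)

# Decoding the URL again gives back the original query string.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

Unlike the hand-written URLs, this also encodes the brackets and colons, which is equally valid.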
Script
With the wiki-download.py script from https://git.xpub.nl/scan-utils/ we can perform the same semantic queries and download the resulting items, if they happen to be files.
Usage:
python wiki-download.py --help (ask for help)
python wiki-download.py --download imgs --ask [[Production_Method::Typewriter]]
python wiki-download.py --download imgs --ask [[Production_Method::+]][[Document_Type::Flyer]]
Production method can be Hand-written OR Typewriter:
python wiki-download.py --download imgs --ask [[Production_Method::Hand-written]]OR[[Production_Method::Typewriter]]
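How the script works internally is not shown here. As a rough illustration of the "if they happen to be files" step, the sketch below filters an Ask response down to pages in the File: namespace. It assumes the JSON layout returned by the ask module (a query.results mapping keyed by page title, each entry carrying fulltext, fullurl and namespace). The sample response is hand-written for illustration and is not real output from the wiki, and the function name file_results is hypothetical.

```python
FILE_NAMESPACE = 6  # MediaWiki's standard namespace number for File: pages


def file_results(response):
    """Return (title, page URL) pairs for the File: pages in an Ask response."""
    results = response.get("query", {}).get("results", {})
    # An empty result set may be serialized as a list rather than a mapping.
    if not isinstance(results, dict):
        return []
    return [
        (item["fulltext"], item["fullurl"])
        for item in results.values()
        if item.get("namespace") == FILE_NAMESPACE
    ]


# Hand-written sample in the assumed layout (not real output).
sample = {
    "query": {
        "results": {
            "File:Flyer01.jpg": {
                "fulltext": "File:Flyer01.jpg",
                "fullurl": "https://example.org/File:Flyer01.jpg",
                "namespace": 6,
            },
            "Poortgebouw": {
                "fulltext": "Poortgebouw",
                "fullurl": "https://example.org/Poortgebouw",
                "namespace": 0,
            },
        }
    }
}

print(file_results(sample))
```

Note that fullurl points at the file's wiki page; fetching the file itself would need a further step, which is left out of this sketch.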