Python

How to use pywikipedia

meta:python wikipedia bot, meta:wikipedia.py

  1. Download Python from http://python.org/download
  2. Install Python.
  3. Install pywikipediabot, either by:
    1. downloading m:python wikipedia bot from http://sourceforge.net/projects/pywikipediabot/, or
    2. fetching the latest version via Subversion from http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/

Trial run: fetch the Main Page (頭版) of the Cantonese Wikipedia. Create a file like this one, C:\python\loadGuongDongWhaTouBan.py:

import wikipedia
# Load the site object for the Cantonese Wikipedia (zh-yue)
site = wikipedia.getSite('zh-yue', 'wikipedia')
# '%E9%A0%AD%E7%89%88' is the percent-encoded title 頭版 (the Main Page)
page = wikipedia.Page(site, '%E9%A0%AD%E7%89%88')
text = page.get() # fetch the wikitext of the page
print text
wikipedia.stopme()

Then run C:\python\python loadGuongDongWhaTouBan.py (A no-brainer tip: if you use Windows, remember to keep the file in the same directory as python.exe.)
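The page title in the script above is written in percent-encoded form. The following is a minimal sketch, not part of pywikipedia, of how to check what such a string stands for. It assumes Python 3's standard library (the scripts on this page are Python 2 style), and `decode_title` is a hypothetical helper name.

```python
from urllib.parse import unquote

def decode_title(encoded):
    """Turn a percent-encoded (UTF-8) page title back into readable text."""
    return unquote(encoded)

# The title used in loadGuongDongWhaTouBan.py
print(decode_title('%E9%A0%AD%E7%89%88'))
```

Decoding a title before running a script is a cheap way to confirm that the bot will touch the page you intended.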


To do real work, you need to create the file user-config.py

Configuration

user-config.py

family='wikipedia'
mylang='zh-yue'
usernames['wikipedia']['zh-yue'] = 'R. Hillgentleman'
usernames['wikiversity']['beta'] = 'R. Hillgentleman'
console_encoding = 'utf-8'

Login

C:\python\python login.py logs you in to the site configured in user-config.py (family='wikipedia', mylang='zh-yue'). Running it once is enough.

Boilerplate programs

pywikiBoilerplate.py

import wikipedia
#set up
site = wikipedia.getSite()
page = wikipedia.Page(site, u"pageName")
 
#to get a page:
text = page.get(get_redirect = True)
 
#to update a page:
page.put(u"newText", u"Edit comment")
 
#CategoryPageGenerator
import wikipedia
import pagegenerators
import catlib

site = wikipedia.getSite()
cat = catlib.Category(site,'Category:%E9%A1%9E') # %E9%A1%9E={{subst:urlencode:類}}
gen = pagegenerators.CategorizedPageGenerator(cat)
for page in gen:
  #Do something with the page object, for example:
  text = page.get()

Add a line to wikipedia:sandbox:

import wikipedia
# Define the main function
def main():
    site = wikipedia.getSite()
    pagename = 'wikipedia:Sandbox'
    page = wikipedia.Page(site, pagename)
    wikipedia.output(u"Loading %s..." % pagename) # note the "u" prefix: this is a unicode string
    try:
        text = page.get(force = False, get_redirect=False, throttle = True, sysop = False,
                                             nofollow_redirects=False, change_edit_time = True) # text = page.get() <-- is the same
    except wikipedia.NoPage: # the page does not exist yet: start from empty text
        text = ''
    except wikipedia.IsRedirectPage: # the page is a redirect: report it and stop
        wikipedia.output(u'%s is a redirect!' % pagename)
        return # wikipedia.stopme() runs in the finally block below, so leaving main() is enough
    except wikipedia.Error: # any other pywikipedia error: report it and stop
        wikipedia.output(u"Some Error, skipping..")
        return
    newtext = text + '\nHello, World!'
    page.put(newtext, comment='Bot: Test', watchArticle = None, minorEdit = True)  # page.put(newtext, 'Bot: Test') <-- is the same
 
if __name__ == '__main__':
    try:
        main()
    finally:
        wikipedia.stopme()
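The control flow of the script above (a missing page is treated as empty, a redirect is skipped, anything else is appended to) can be tried without touching a wiki. The sketch below is only an illustration: `NoPage`, `IsRedirectPage`, `FakePage` and `append_line` are hypothetical stand-ins defined here, not the pywikipedia classes.

```python
class NoPage(Exception):
    """Stand-in for wikipedia.NoPage (hypothetical, for illustration)."""

class IsRedirectPage(Exception):
    """Stand-in for wikipedia.IsRedirectPage (hypothetical)."""

class FakePage(object):
    """In-memory imitation of a page; text=None means it does not exist."""
    def __init__(self, text=None, redirect=False):
        self.text = text
        self.redirect = redirect

    def get(self):
        if self.redirect:
            raise IsRedirectPage()
        if self.text is None:
            raise NoPage()
        return self.text

    def put(self, newtext, comment):
        self.text = newtext  # a real bot would save to the wiki here

def append_line(page, line):
    """Append line to page; return True if saved, False if skipped."""
    try:
        text = page.get()
    except NoPage:
        text = u''        # missing page: start from empty text
    except IsRedirectPage:
        return False      # redirect: leave it alone
    page.put(text + u'\n' + line, u'Bot: Test')
    return True

missing = FakePage()
print(append_line(missing, u'Hello, World!'))
print(append_line(FakePage(redirect=True), u'Hello, World!'))
```

Returning a flag instead of exiting inside the helper keeps the clean-up in one place, which is the same reason the original script calls wikipedia.stopme() only in its finally block.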

Which pages are in category:類?

Look up which pages are in category:類, then dump the list onto wikipedia:sandbox.[1]

getCategoryWriteYue.py

import wikipedia
import pagegenerators
import catlib

site = wikipedia.getSite()
cat = catlib.Category(site,'Category:%E9%A1%9E')   # %E9%A1%9E={{subst:urlencode:類}}
gen = pagegenerators.CategorizedPageGenerator(cat)
pagelist=''   # named pagelist so that the built-in list is not shadowed
for page in gen:
  pagelist = pagelist + '\n' + page.title()

#write the list at the end of [[wikipedia:sandbox]]
sandbox = wikipedia.Page(site, u"wikipedia:sandbox")
sandboxtext = sandbox.get(get_redirect = True)

sandboxtext = sandboxtext + '\n' + pagelist

sandbox.put(sandboxtext, comment='Mechanical test: get pages in [[Category:%E9%A1%9E]] and dump them on [[wikipedia:sandbox]]', watchArticle = None, minorEdit = True)
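The script above builds its list by repeated string concatenation. A common alternative is to collect the titles first and join them once. This is a minimal sketch with made-up titles; `format_title_list` and `append_block` are hypothetical helper names, not pywikipedia functions.

```python
def format_title_list(titles):
    """Return the titles one per line, in the order given."""
    return u'\n'.join(titles)

def append_block(existing_text, block):
    """Append block to existing_text, separated by a single newline."""
    return existing_text + u'\n' + block

# Made-up titles standing in for page.title() results
titles = [u'Example page A', u'Example page B']
print(append_block(u'Sandbox text', format_title_list(titles)))
```

In a real script the list could be filled with `page.title()` for each page from the generator, then passed to these helpers before calling `put`.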

Find which categories are inside a category

C:\python>python category.py -family:wikipedia -lang:zh-yue listify -from:%E7%B6%AD%E5%9F%BA%E7%99%BE%E7%A7%91 (If you also add " -recurse:True ", the subcategories of the subcategories, and so on down, are included as well.)[2]

Look up which pages are inside a category

C:\python>python pagegenerators.py -family:wikipedia -lang:zh-yue subcat:%E7%B6%AD%E5%9F%BA%E7%99%BE%E7%A7%91 

Moving a category

C:\python>python category.py move -from:A類  -to:B類  # Chinese characters must be written in percent-encoded (%XX) form

Demo: [3]
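The percent-encoded category names needed on the command line can be produced with the standard library. A minimal sketch, assuming Python 3 and UTF-8 encoding; `encode_arg` is a hypothetical helper, and the script only prints the command rather than running it.

```python
from urllib.parse import quote

def encode_arg(option, name):
    """Return e.g. '-from:<percent-encoded name>' for a command line."""
    return '-%s:%s' % (option, quote(name))

# Build (but do not run) the move command with encoded category names
parts = ['python', 'category.py', 'move',
         encode_arg('from', 'A類'), encode_arg('to', 'B類')]
print(' '.join(parts))
```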

C:\python>python replace.py -links:pagename -regex "Template:(.*?)" "Template:\1"

Or a simple literal find-and-replace:

 C:\python>python replace.py -cat:A類 "xxx old text" "yyy new text"
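To see what a pattern and replacement pair will do before letting a bot loose, the pair can be tried with Python's re module. This sketch assumes replace.py's -regex mode follows Python regular expression syntax, including \1 back-references; the sample text and the second pattern are made up for illustration.

```python
import re

text = u"See [[Template:Copyvio]] and [[Template:Stub]]."

# The pair shown above: the lazy group (.*?) at the end of the pattern
# matches the empty string, so the text comes back unchanged.
unchanged = re.sub(r"Template:(.*?)", r"Template:\1", text)
print(unchanged == text)

# A hypothetical variant: anchoring the group on the closing brackets
# captures the template name, which \1 then copies into the result.
renamed = re.sub(r"Template:(.*?)\]\]", r"T:\1]]", text)
print(renamed)
```

The pair from the command line above is therefore best read as a template to adapt: the group needs something after it to stop at before \1 carries any text.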

#getting the history of [[wikipedia:sandbox]]
import wikipedia #importing the wikipedia.py module
pg = wikipedia.Page(wikipedia.getSite(), 'wikipedia:sandbox') #creating the Page object corresponding to [[wikipedia:sandbox]]
x=pg.getVersionHistoryTable() #calling the function getVersionHistoryTable() in the wikipedia.py module
print x

Look up the history of Main Page, then dump it onto wikipedia:sandbox

getSandboxHistoryWrite.py

#get the history of [[Main Page]] and dump it on [[wikipedia:sandbox]]

import wikipedia #importing the wikipedia.py module
site=wikipedia.getSite()  #the site configured in user-config.py (here wikipedia, zh-yue)
pg = wikipedia.Page(site, 'Main Page') #creating the Page object corresponding to [[Main Page]]
x=pg.getVersionHistoryTable() #calling the getVersionHistoryTable() method of the Page object

#writing the result on [[wikipedia:sandbox]]
sand = wikipedia.Page(site, 'wikipedia:sandbox') #create the object corresponding to sandbox
y = sand.get()   #read sandbox
y = y+x           #append x, the page history of [[Main Page]]
sand.put( y , 'Robot testing: dumping the history of [[Main Page]] on [[wikipedia:sandbox]]')  #write

Demo: [4]


Look up the editors of template:copyvio, then dump them onto wikipedia:sandbox

getPageContributingUsersWrite.py

#get the contributing users of [[template:copyvio]] and dump them on [[wikipedia:sandbox]]

import wikipedia #importing the wikipedia.py module
site=wikipedia.getSite()  # setting the site, from configuration
pg = wikipedia.Page(site, 'template:copyvio') #creating the Page object in question
x=pg.contributingUsers()  #getting the contributing users


sand = wikipedia.Page(site, 'wikipedia:sandbox')
y = sand.get() #getting the current sandbox text
for i in x:
  y = y + '\n' + i   #appending each user name on its own line
sand.put( y , 'Robot testing: getting the contributing users of the page [[template:copyvio]] and dump the result on [[wikipedia:sandbox]]')

Demo: [5]

#to get a list of new pages from site.newpages()
import wikipedia
site=wikipedia.getSite()

newPageList = site.newpages()
for i in newPageList:
    # each entry is a tuple describing one new page
    page, timestamp, length, empty, username, comment = i
    t = page.title()
    wikipedia.output('User:'+username+'.....Title:'+t)
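The tuple unpacking used in the loop above can be tried offline. In this sketch the entries are made-up sample data, the title is a plain string rather than a Page object, and `summarize` is a hypothetical helper; only the six-field layout is taken from the script above.

```python
def summarize(entries):
    """Format (title, timestamp, length, empty, username, comment) tuples."""
    lines = []
    for title, timestamp, length, empty, username, comment in entries:
        lines.append(u'User:%s.....Title:%s' % (username, title))
    return lines

# Made-up sample entries, for illustration only
sample = [
    (u'Example page', u'2000-01-01T00:00:00Z', 120, False, u'ExampleUser', u''),
    (u'Another page', u'2000-01-02T00:00:00Z', 0, True, u'OtherUser', u'test'),
]
for line in summarize(sample):
    print(line)
```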

Demo:

C:\Python25\pywikipedia>python newPages.py
Checked for running processes. 2 processes currently running, including the current process.
User:Contributions/219.79.136.159.....Title:闆呰檸鐭ヨ瓨+
User:Contributions/219.79.136.159.....Title:XD
User:Contributions/219.79.136.159.....Title:Orz
User:Contributions/219.79.136.159.....Title:鍥?
User:Contributions/59.112.213.23.....Title:娓呴洸绉戞妧澶у
User:Happynewyear.....Title:1250骞?
User:Happynewyear.....Title:1251骞?
User:Happynewyear.....Title:1252骞?
User:Happynewyear.....Title:1253骞?
User:Happynewyear.....Title:1254骞?

C:\Python25\pywikipedia>
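The titles in the transcript above look garbled, which is typical of UTF-8 text displayed by a console that uses a different code page. The sketch below reproduces the effect under the assumption that the console decodes bytes as GBK; the source does not say which code page was in use, so treat that choice as a guess. `show_as` is a hypothetical helper, and the code assumes Python 3.

```python
def show_as(text, wrong_encoding):
    """Encode text as UTF-8, then decode the bytes with another codec."""
    return text.encode('utf-8').decode(wrong_encoding, errors='replace')

title = '1250年'
print(show_as(title, 'gbk'))             # decoded with the wrong codec
print(show_as(title, 'utf-8') == title)  # decoded with the matching codec
```

If this is the cause, the page titles themselves are intact and only their display is affected; the console_encoding setting in user-config.py is the place to look.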