|
Historical ban data (Excel sheet): test.xls.zip (325 K)
Statistics (total sample size: 6132):
Ban reason statistics:
Ban time statistics:
Combined statistics:
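Counts like the ones behind these statistics could be tallied from the spreadsheet rows roughly as follows. This is only a sketch in Python 3, not the method used for the post: the sample rows are made up, `tally` and `tally_by_month` are hypothetical helper names, and the month grouping assumes the time string starts with `YYYY-MM`, which the post does not confirm.

```python
from collections import Counter

# Hypothetical sample rows in the column order the scraper writes:
# (time, area, name, reason, status). All values are made up for illustration.
rows = [
    ("2015-09-30 12:00", "area-1", "player_a", "reason X", "banned"),
    ("2015-09-30 13:00", "area-1", "player_b", "reason Y", "banned"),
    ("2015-10-01 09:30", "area-2", "player_c", "reason X", "banned"),
]


def tally(rows, column):
    """Count how often each value occurs in the given column index."""
    return Counter(row[column] for row in rows)


def tally_by_month(rows):
    """Count rows per month, ASSUMING the time string starts with YYYY-MM."""
    return Counter(row[0][:7] for row in rows)


reason_counts = tally(rows, 3)       # column 3 holds the ban reason
month_counts = tally_by_month(rows)  # column 0 holds the ban time

print(reason_counts.most_common())
print(sorted(month_counts.items()))
```

A combined view (for example reason per month) could use the same idea with a tuple key such as `(row[0][:7], row[3])`.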
Python code (scrapes the ban data and saves it as an Excel sheet; Python 2.7):

    import json
    import urllib
    import urllib2

    import xlwt
    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3


    def get_data(wb, area, sz, sheet):
        """Fetch the ban list for one server area and write it to a new sheet."""
        req_url = 'http://service.tiancity.com/gb/Block/Querykart'
        req_data = urllib.urlencode({'n1': area, 'sz': sz})
        # Passing data makes urllib2 send the request as a POST.
        req = urllib2.Request(url=req_url, data=req_data)
        res_data = urllib2.urlopen(req)
        json_data = json.loads(res_data.read())
        # Patch the row tags in the returned HTML fragment before parsing it.
        html = BeautifulSoup(
            json_data['ReturnObject']['Html'].replace('<tr><tr>', '</tr><tr>'))

        rows = []
        for tr in html.findAll('tr'):
            td = tr.findAll('td')
            if not td:
                break  # stop at the first row that has no <td> cells
            # Columns: time, area, name, reason, status
            rows.append((td[0].string, td[1].contents[0], td[2].contents[0],
                         td[3].contents[0], td[4].contents[0]))

        ws = wb.add_sheet(sheet)
        for i, row in enumerate(rows):
            for j, value in enumerate(row):
                ws.write(i, j, value)


    if __name__ == '__main__':
        wb = xlwt.Workbook()
        # Arguments: workbook, area code (n1), sz value, sheet name
        get_data(wb, 'SVG013CT1', '4414', 'SVG013CT1')
        get_data(wb, 'SVG013CT2', '263', 'SVG013CT2')
        get_data(wb, 'SVG013CNC1', '1592', 'SVG013CNC1')
        get_data(wb, 'SVG013CNC2', '52', 'SVG013CNC2')
        wb.save('test.xls')
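For anyone without Python 2.7, the fetch-and-parse step might look roughly like this in Python 3 using only the standard library. Treat it as a sketch under assumptions rather than a drop-in replacement: `RowParser`, `parse_rows` and `fetch_rows` are hypothetical names, the response is assumed to be UTF-8 JSON with the same `ReturnObject`/`Html` layout as in the post, and the endpoint may no longer respond. Unlike the listing in the post, it skips rows without `<td>` cells instead of stopping at them, and it starts a new row at every `<tr>`, so the `<tr><tr>` patch is not needed.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

QUERY_URL = 'http://service.tiancity.com/gb/Block/Querykart'  # endpoint from the post


class RowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._flush_row()  # tolerate a missing </tr>
            self._row = []
        elif tag == 'td':
            self._flush_cell()
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag == 'td':
            self._flush_cell()
        elif tag == 'tr':
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _flush_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
        self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row:  # rows without <td> cells (e.g. headers) are skipped
            self.rows.append(self._row)
        self._row = None

    def close(self):
        super().close()
        self._flush_row()


def parse_rows(html_fragment):
    """Return a list of rows, each a list of cell strings."""
    parser = RowParser()
    parser.feed(html_fragment)
    parser.close()
    return parser.rows


def fetch_rows(area, sz):
    """POST the same form fields as the post's script (n1, sz); needs network access."""
    data = urlencode({'n1': area, 'sz': sz}).encode('ascii')
    with urlopen(Request(QUERY_URL, data=data)) as resp:
        payload = json.loads(resp.read())  # assumes a UTF-8 JSON body
    return parse_rows(payload['ReturnObject']['Html'])


# Offline demo on a made-up fragment with a header row and a missing </tr>.
sample = ('<tr><th>h</th></tr>'
          '<tr><td>t1</td><td>a1</td><td>n1</td><td>r1</td><td>s1</td>'
          '<tr><td>t2</td><td>a2</td><td>n2</td><td>r2</td><td>s2</td></tr>')
for row in parse_rows(sample):
    print(row)
```

Calling `fetch_rows('SVG013CT1', '4414')` would mirror the first `get_data` call above, minus the Excel output.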
[ This post was last edited by GisMan丶 on 2015-10-07 14:56 ]
|