Chapter 02 Big Data Crawling and Analysis
Python, Requests, BeautifulSoup, Regular Expression, Selenium, Pandas
2.4 Selenium

2.4.1 Selenium

Install Selenium with pip:

    pip install selenium

Chrome WebDriver
Selenium controls Google Chrome (Linux, Mac, Windows) through Chrome WebDriver, which is downloaded from:
https://sites.google.com/a/chromium.org/chromedriver/downloads
On Windows, download <chromedriver_win32.zip>, extract <chromedriver.exe>, and copy <chromedriver.exe> to <C:\ProgramData\Anaconda3>.

Creating a Google Chrome browser object
After importing selenium, call webdriver.Chrome() to create a Google Chrome browser object:

    from selenium import webdriver
    driver = webdriver.Chrome()

Selenium Webdriver API
Attributes and methods of the Selenium Webdriver API:
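Method or attribute         Description
current_url                 URL of the current page
page_source                 HTML source of the current page
text                        Text content of an element
size                        Size of an element, e.g. {'width': 250, 'height': 30}
get_window_position()       Get the position of the browser window
set_window_position(x,y)    Move the browser window to (x, y)
maximize_window()           Maximize the browser window
get_window_size()           Get the size of the browser window
set_window_size(x,y)        Set the browser window to width x and height y
click()                     Click an element
close()                     Close the current window
get(url)                    Load the page at url
refresh()                   Reload the current page
back()                      Go back one page in the history
forward()                   Go forward one page in the history
clear()                     Clear the content of an input field
send_keys()                 Send keystrokes to an element
submit()                    Submit a form
quit()                      Close the browser and end the driver session

For example, the following Python program opens Google Chrome, uses get() to load the Google home page, and then uses quit() to close the browser:

    from selenium import webdriver
    driver = webdriver.Chrome()
    driver.get('http://www.google.com')
    driver.quit()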
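2.4.2 Locating Elements with the Selenium Webdriver API

Attribute or method                            Description
find_element_by_id(id)                         Find the element with the given id
find_element_by_class_name(name)               Find an element by its class name
find_element_by_tag_name("tag name")           Find an element by its HTML tag name
find_element_by_name(name)                     Find an element by its name attribute
find_element_by_link_text(text)                Find a link by its exact text
find_element_by_partial_link_text("cheese")    Find a link whose text contains the given text
find_element_by_css_selector(selector)         Find an element with a CSS selector
find_element_by_xpath()                        Find an element with an XPath expression, the XML language for addressing nodes

Each method returns the first matching element. Changing element to elements in the method name (for example find_elements_by_name) returns all matching elements as a list. These helpers belong to the Selenium 3 API; Selenium 4.3 and later removed them in favor of find_element(By.ID, ...) and its siblings.

Take the following HTML as an example, with driver created by webdriver.Chrome():

    <html>
    <body>
    <h1>welcome</h1>
    <form id="loginform">
    <p class="content">are you sure you want to do this?</p>
    <a href="continue.html">continue</a>
    <a href="cancel.html">cancel</a>
    <input name="username" type="text" />
    <input name="password" type="password" />
    <input name="continue" type="submit" value="login" />
    <input name="continue" type="button" value="clear" />
    </form>
    </body>
    </html>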
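To see which elements the id, class, tag name, name, and link text lookups would select in the sample page without launching a browser, the sketch below uses the standard library's xml.etree.ElementTree as a stand-in. This is not Selenium: it only works here because the sample markup happens to be well-formed XML, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The sample page from the text, with the closing </html> tag in place.
PAGE = """<html>
<body>
<h1>welcome</h1>
<form id="loginform">
<p class="content">are you sure you want to do this?</p>
<a href="continue.html">continue</a>
<a href="cancel.html">cancel</a>
<input name="username" type="text" />
<input name="password" type="password" />
<input name="continue" type="submit" value="login" />
<input name="continue" type="button" value="clear" />
</form>
</body>
</html>"""

root = ET.fromstring(PAGE)

# Roughly what a lookup by id "loginform" selects.
form = root.find(".//form[@id='loginform']")

# Roughly what a lookup by class name "content" selects.
message = root.find(".//p[@class='content']").text

# Roughly what a lookup by tag name "h1" selects.
heading = root.find(".//h1").text

# Lookup by name: find() gives the first match, findall() gives every match,
# mirroring the find_element / find_elements distinction.
first_continue = root.find(".//input[@name='continue']")
all_continue = root.findall(".//input[@name='continue']")

# Roughly what a lookup by the link text "cancel" selects.
cancel_link = [a for a in root.iter("a") if a.text == "cancel"][0]

print(form.tag, heading, message)
print(first_continue.get("value"), len(all_continue), cancel_link.get("href"))
```

The two inputs named continue show why the singular and plural lookups differ: the singular form stops at the submit button, while the plural form also returns the clear button.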
3.1.4 Matplotlib

Bar chart
A bar chart is drawn with the bar method:

    plt.bar(x, y, ...)

ch03\bar1.py (excerpt)

    plt.bar(listx1, listy1, label=" ")
    plt.bar(listx2, listy2, color="red", label=" ")
    plt.title(" ")
    plt.xlabel(" ")
    plt.ylabel(" ")
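The excerpt shows only the plotting calls, so here is a minimal self-contained sketch of a two-series bar chart. The data values, label strings, and output file name are assumptions made for illustration, and the Agg backend is selected only so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical data: x positions and bar heights for two series.
listx1 = [1, 5, 7]
listy1 = [15, 50, 80]
listx2 = [2, 6, 8]
listy2 = [10, 40, 60]

# Each plt.bar call draws one series; label feeds the legend.
bars1 = plt.bar(listx1, listy1, label="Series A")
bars2 = plt.bar(listx2, listy2, color="red", label="Series B")

plt.legend()
plt.title("Sample bar chart")
plt.xlabel("x value")
plt.ylabel("y value")

# Illustrative output file name.
plt.savefig("bar_sketch.png")
```

Calling plt.legend() is what makes the label arguments visible; without it the labels are stored but never drawn.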
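Pie chart
A pie chart is drawn with the pie method:

    plt.pie(data [, other parameters])

Parameter        Description
labels           Label text for each wedge
colors           Color of each wedge
explode          Offset of each wedge from the center, as a fraction of the radius; 0 leaves the wedge in place, and values such as 0.1 or 0.2 pull it progressively farther out
labeldistance    Distance of the labels from the center, as a multiple of the radius; 1.1 means 1.1 times the radius
autopct          Format of the percentage text, written as a % format string; for example "%2.1f%%" prints a minimum width of 2 with 1 decimal place, followed by a percent sign
shadow           True draws a shadow; the default is False
startangle       Angle at which the first wedge starts, measured counterclockwise from the positive x-axis (0 at the right, 90 at the top, 180 at the left, 270 at the bottom)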
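The parameters above can be combined as in the following sketch. The data, labels, colors, and output file name are hypothetical; only the parameter names come from the text. The Agg backend is chosen so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical data: three wedges that sum to 100.
sizes = [25, 25, 50]
labels = ["East", "West", "South"]
colors = ["red", "green", "blue"]
explode = (0, 0, 0.1)  # pull only the last wedge out by 0.1 of the radius

# With autopct set, plt.pie returns the wedges, the label texts,
# and the percentage texts.
patches, texts, autotexts = plt.pie(
    sizes,
    labels=labels,
    colors=colors,
    explode=explode,
    labeldistance=1.1,   # labels sit at 1.1 times the radius
    autopct="%2.1f%%",   # one decimal place plus a literal percent sign
    shadow=False,
    startangle=90,       # first wedge starts at the top
)
plt.axis("equal")  # keep the pie circular

percent_labels = [t.get_text() for t in autotexts]
print(percent_labels)

# Illustrative output file name.
plt.savefig("pie_sketch.png")
```

Note that explode takes one value per wedge, so a single wedge can be emphasized while the others stay in place.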
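4.2 csv

4.2.1 Writing csv files
To write a csv file, create a writer object with csv.writer and write each row with writerow. The following program writes <test.csv>.

ch04\csv_write.py

    import csv

    # Open the output csv file
    with open('test.csv', 'w', newline='') as csvfile:
        # Create the csv writer
        writer = csv.writer(csvfile)

        # Write the header row (column names assumed from the data: name, height, weight)
        writer.writerow(['name', 'height', 'weight'])

        # Write the data rows
        writer.writerow(['chiou', 170, 65])
        writer.writerow(['david', 183, 78])

When opening the csv file, pass newline='' so that no blank line appears between the rows of <test.csv>.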
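The reason newline='' matters is that csv.writer ends every row with "\r\n" itself. If the file object also translates line endings, as text mode does on Windows, each row ends in "\r\r\n", which shows up as a blank line. The sketch below makes the row terminator visible by writing into an in-memory buffer; the column names are the same assumed stand-ins used above.

```python
import csv
import io

# An in-memory text buffer with newline translation switched off,
# just like open(..., newline='').
buffer = io.StringIO(newline="")

writer = csv.writer(buffer)
writer.writerow(["name", "height", "weight"])  # assumed column names
writer.writerow(["chiou", 170, 65])

# repr() exposes the "\r\n" that csv.writer appends to each row.
raw_text = buffer.getvalue()
print(repr(raw_text))
```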
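A Python dictionary can also be written to a csv file: csv.DictWriter maps the dictionary keys to csv columns.

ch04\csv_write_dict.py

    import csv

    with open('test.csv', 'w', newline='') as csvfile:
        # Define the columns (names assumed from the data: name, height, weight)
        fieldnames = ['name', 'height', 'weight']

        # Create a writer that writes dictionaries to the csv file
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        # Write the header row
        writer.writeheader()

        # Write the data rows
        writer.writerow({'name': 'chiou', 'height': 170, 'weight': 65})
        writer.writerow({'name': 'david', 'height': 183, 'weight': 78})

4.2.2 Reading csv files
To read a csv file, create a reader object with csv.reader and iterate over its rows. The following program reads <test.csv>.

ch04\csv_read.py

    import csv

    # Open the csv file
    with open('test.csv', newline='') as csvfile:
        # Read the csv file content
        rows = csv.reader(csvfile)

        # Print each row
        for row in rows:
            print(row)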
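As a companion to csv.DictWriter, the standard library also offers csv.DictReader, which returns each row as a dictionary keyed by the header row. The following round-trip sketch writes two records and reads them back; it uses a temporary directory so that no existing <test.csv> is overwritten, and the column names are assumed stand-ins.

```python
import csv
import os
import tempfile

fieldnames = ["name", "height", "weight"]  # assumed column names
people = [
    {"name": "chiou", "height": 170, "weight": 65},
    {"name": "david", "height": 183, "weight": 78},
]

# Write into a throwaway directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), "test.csv")

with open(path, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(people)

with open(path, newline="") as csvfile:
    # Each row comes back as a dictionary; every value is a string.
    records = list(csv.DictReader(csvfile))

# Numbers must be converted back explicitly after reading.
heights = [int(record["height"]) for record in records]

print(records)
print(heights)
```

The csv module does not record types, so 170 is written as text and read back as the string '170'; converting with int() or float() is left to the caller.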
Chapter 11 Hands-on Project: Township and District Weather Forecasts and Building an API
MySQL, API, JSON, Python
11.1 368 MySQL Django API JSON Heroku API 48 3 <threeday1.py> MySQL Django 2 JSON API Heroku API
11.2
11.2.1 Excel http://www.cwb.gov.tw/v7/forecast/town368/3hr/.htm Excel https://www.stat.gov.tw/ct.asp?xitem=14380&ctnode=1519&mp=4 <712693030RPKUP4RX.xlsx> 7851
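11.3 API
This section stores the data in MySQL and builds the API: Django returns the data as JSON, and the API is deployed to Heroku.

11.3.1 CSV
The Excel file <712693030RPKUP4RX.xlsx> is downloaded from https://www.stat.gov.tw/ct.asp?xitem=14380&CtNode=1519&mp=4 (Excel). The following Python program converts <712693030RPKUP4RX.xlsx> to CSV.

ch11\district1.py

     1 import pandas
     2
     3 df = pandas.read_excel('712693030RPKUP4RX.xlsx')
     4 header = df.iloc[2]  # row that holds the column names
     5 df1 = df[3:].copy()  # data rows after the header row
     6 df1 = df1.rename(columns = header)  # apply the column names
     7 df2 = df1.drop(columns=[' ', ' ', ' ', ' '], axis=1)  # drop unneeded columns
     8 df3 = df2.drop_duplicates()  # remove duplicate rows
     9
    10 df3.to_csv('district.csv', encoding='big5', index=False)

Program notes
3     Read the Excel file with Pandas.
4     Take the row that holds the column names as header.
5     Keep the data rows, which start at index 3, the row after the header.
6     Rename the DataFrame columns using header.
7     Drop the columns that are not needed.
8     Remove duplicate rows.
10    Write the result to a CSV file in big5 encoding.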
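The header-promotion steps in district1.py can be tried without the spreadsheet. The sketch below applies the same sequence (take the header row, slice the data rows, rename, drop, de-duplicate) to a small in-memory DataFrame. The table contents and the column names county, district, and village are hypothetical stand-ins, not the columns of the real file.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: two title rows, the real
# column names at index 2, then village-level data rows.
raw = pd.DataFrame([
    ["Title", None, None],
    [None, None, None],
    ["county", "district", "village"],
    ["A County", "East", "v1"],
    ["A County", "East", "v2"],
    ["A County", "West", "v3"],
])

header = raw.iloc[2]                 # row that holds the column names
data = raw[3:].copy()                # data rows after the header row
data = data.rename(columns=header)   # map 0, 1, 2 to county, district, village

# Dropping the finest-grained column leaves repeated rows,
# which drop_duplicates collapses to one row per district.
districts = data.drop(columns=["village"]).drop_duplicates()

rows = districts.values.tolist()
print(list(districts.columns))
print(rows)
```

Passing the header Series to rename works because pandas treats it as a mapping from the old column labels (its index) to the new names (its values).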