Python, Selenium and Beautiful Soup for URLs

I'm trying to write a script that uses Selenium to open pastebin, run a search, and print the URLs of the results as text. I need only the visible result URLs and nothing else, for example:

pastebin.com/VYQTSbzY

The current script is:

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup

    browser = webdriver.Firefox()
    browser.get('http://www.pastebin.com')

    search = browser.find_element_by_name('q')
    search.send_keys("test")
    search.send_keys(Keys.RETURN)

    soup = BeautifulSoup(browser.page_source)
    for link in soup.find_all('a'):
        print link.get('href', None), link.get_text()

You don't really need BeautifulSoup here. Selenium itself is quite capable of locating elements:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys

    browser = webdriver.Firefox()
    browser.get('http://www.pastebin.com')

    search = browser.find_element_by_name('q')
    search.send_keys("test")
    search.send_keys(Keys.RETURN)

    # wait for results to appear
    wait = WebDriverWait(browser, 10)
    results = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.gsc-resultsbox-visible")))

    # grab results
    for link in results.find_elements_by_css_selector("a.gs-title"):
        print link.get_attribute("href")

    browser.close()

This prints:

    http://pastebin.com/VYQTSbzY
    http://pastebin.com/VYQTSbzY
    http://pastebin.com/VAAQCjkj
    ...
    http://pastebin.com/fVUejyRK
    http://pastebin.com/fVUejyRK
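Some URLs appear twice in that output. If you want each result only once, one option is to filter the links before printing them. The following is a minimal sketch, not part of the original script: `unique_hrefs` is a hypothetical helper name, and it only assumes that each item has a `get_attribute("href")` method, as Selenium's link elements do.

```python
def unique_hrefs(links):
    """Return the href of each link once, keeping first-seen order.

    `links` is any iterable of objects exposing get_attribute("href"),
    e.g. the elements returned by the CSS selector lookup above.
    """
    seen = set()
    hrefs = []
    for link in links:
        href = link.get_attribute("href")
        # Skip anchors without an href and URLs already collected.
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs
```

You could then loop over `unique_hrefs(results.find_elements_by_css_selector("a.gs-title"))` instead of looping over the elements directly.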

Note the use of an explicit wait (`WebDriverWait` combined with an expected condition), which makes the script wait until the search results are visible before it reads the links.
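To make the idea behind an explicit wait concrete: it repeatedly checks a condition until the condition returns something truthy or a timeout expires. The sketch below is a simplified stand-in written in plain Python, not Selenium's actual implementation. The names `wait_until` and `WaitTimeout` are hypothetical. The 0.5 second default mirrors `WebDriverWait`'s default poll frequency, and the real class additionally ignores certain exceptions while polling.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        # Sleep briefly so the loop does not spin at full speed.
        time.sleep(poll_interval)
```

In the script above, the condition is "the results box is visible", and the value handed back by the wait is the located element, which is why `results` can be searched for links right away.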