
Scraping

pull/41/head
Jure Šorn 5 years ago
parent
commit
bead7cc3b4
2 changed files with 9 additions and 15 deletions
  1. README.md (12 lines changed)
  2. index.html (12 lines changed)

12
README.md

@ -2300,7 +2300,7 @@ retention=<int>|<datetime.timedelta>|<str>
Scraping
--------
#### Scrapes and prints Python's URL and version number from Wikipedia:
#### Scrapes Python's URL, version number and logo from Wikipedia page:
```python
# $ pip3 install requests beautifulsoup4
import requests
@ -2312,15 +2312,11 @@ table = doc.find('table', class_='infobox vevent')
rows = table.find_all('tr')
link = rows[11].find('a')['href']
ver = rows[6].find('div').text.split()[0]
print(link, ver)
```
#### Downloads Python's logo:
```python
url_img = rows[0].find('img')['src']
image = requests.get(f'https:{url_img}').content
url_i = rows[0].find('img')['src']
image = requests.get(f'https:{url_i}').content
with open('test.png', 'wb') as file:
    file.write(image)
print(link, ver)
```
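
Review note: the merged snippet builds the image URL with `f'https:{url_i}'`, which assumes the `src` attribute is always protocol-relative (starts with `//`). A minimal sketch of a more tolerant alternative, using only the standard library's `urljoin`, is shown below. The helper name `absolute_src` and the example `src` values are hypothetical and not part of this commit; only the page URL comes from the snippet.

```python
from urllib.parse import urljoin

# Page URL as used by the snippet in this diff.
PAGE_URL = 'https://en.wikipedia.org/wiki/Python_(programming_language)'

def absolute_src(src, base=PAGE_URL):
    """Hypothetical helper: resolves an <img> 'src' value against the page URL."""
    return urljoin(base, src)

# Made-up example values, not real Wikipedia paths.
print(absolute_src('//upload.example.org/logo.png'))     # protocol-relative
print(absolute_src('/static/logo.png'))                  # root-relative
print(absolute_src('https://cdn.example.org/logo.png'))  # already absolute
```

With this approach the download line would become `requests.get(absolute_src(url_i)).content`; whether the extra import is worth it in a cheatsheet-sized example is a judgment call.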

12
index.html

@ -1981,7 +1981,7 @@ logger.&lt;level&gt;(<span class="hljs-string">'A logging message.'</span>)
<li><strong><code class="python hljs"><span class="hljs-string">'&lt;timedelta&gt;'</span></code> - Max age of a file.</strong></li>
<li><strong><code class="python hljs"><span class="hljs-string">'&lt;str&gt;'</span></code> - Max age as a string: <code class="python hljs"><span class="hljs-string">'1 week, 3 days'</span></code>, <code class="python hljs"><span class="hljs-string">'2 months'</span></code>, …</strong></li>
</ul>
<div><h2 id="scraping"><a href="#scraping" name="scraping">#</a>Scraping</h2><div><h4 id="scrapesandprintspythonsurlandversionnumberfromwikipedia">Scrapes and prints Python's URL and version number from Wikipedia:</h4><pre><code class="python language-python hljs"><span class="hljs-comment"># $ pip3 install requests beautifulsoup4</span>
<div><h2 id="scraping"><a href="#scraping" name="scraping">#</a>Scraping</h2><div><h4 id="scrapespythonsurlversionnumberandlogofromwikipediapage">Scrapes Python's URL, version number and logo from Wikipedia page:</h4><pre><code class="python language-python hljs"><span class="hljs-comment"># $ pip3 install requests beautifulsoup4</span>
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup
url = <span class="hljs-string">'https://en.wikipedia.org/wiki/Python_(programming_language)'</span>
@ -1991,16 +1991,14 @@ table = doc.find(<span class="hljs-string">'table'</span>, class_=<span class="h
rows = table.find_all(<span class="hljs-string">'tr'</span>)
link = rows[<span class="hljs-number">11</span>].find(<span class="hljs-string">'a'</span>)[<span class="hljs-string">'href'</span>]
ver = rows[<span class="hljs-number">6</span>].find(<span class="hljs-string">'div'</span>).text.split()[<span class="hljs-number">0</span>]
url_i = rows[<span class="hljs-number">0</span>].find(<span class="hljs-string">'img'</span>)[<span class="hljs-string">'src'</span>]
image = requests.get(<span class="hljs-string">f'https:<span class="hljs-subst">{url_i}</span>'</span>).content
<span class="hljs-keyword">with</span> open(<span class="hljs-string">'test.png'</span>, <span class="hljs-string">'wb'</span>) <span class="hljs-keyword">as</span> file:
    file.write(image)
print(link, ver)
</code></pre></div></div>
<div><h4 id="downloadspythonslogo">Downloads Python's logo:</h4><pre><code class="python language-python hljs">url_img = rows[<span class="hljs-number">0</span>].find(<span class="hljs-string">'img'</span>)[<span class="hljs-string">'src'</span>]
image = requests.get(<span class="hljs-string">f'https:<span class="hljs-subst">{url_img}</span>'</span>).content
<span class="hljs-keyword">with</span> open(<span class="hljs-string">'test.png'</span>, <span class="hljs-string">'wb'</span>) <span class="hljs-keyword">as</span> file:
    file.write(image)
</code></pre></div>
<div><h2 id="web"><a href="#web" name="web">#</a>Web</h2><pre><code class="python language-python hljs"><span class="hljs-comment"># $ pip3 install bottle</span>
<span class="hljs-keyword">from</span> bottle <span class="hljs-keyword">import</span> run, route, post, template, request, response
<span class="hljs-keyword">import</span> json
