
Big changes in Pandas and Plotly

pull/144/merge
Jure Šorn 4 months ago
parent
commit
0975955787
4 changed files with 110 additions and 110 deletions
  1. 99
      README.md
  2. 104
      index.html
  3. 1
      parse.js
  4. 16
      pdf/index_for_pdf.html

99
README.md

@ -3155,7 +3155,7 @@ import pandas as pd, matplotlib.pyplot as plt
**Ordered dictionary with a name.**
```python
>>> pd.Series([1, 2], index=['x', 'y'], name='a')
>>> sr = pd.Series([1, 2], index=['x', 'y'], name='a'); sr
x 1
y 2
Name: a, dtype: int64
@ -3203,7 +3203,7 @@ plt.show() # Displays the plot. Also plt.sav
```
```python
>>> sr = pd.Series([2, 3], index=['x', 'y'])
>>> sr = pd.Series([2, 3], index=['x', 'y']); sr
x 2
y 3
```
@ -3234,7 +3234,7 @@ y 3
**Table with labeled rows and columns.**
```python
>>> pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
>>> l = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y']); l
x y
a 1 2
b 3 4
@ -3270,13 +3270,14 @@ b 3 4
<DF> = <DF>.sort_values(col_key/s) # Sorts rows by passed column/s. Also `axis=1`.
```
```python
<DF>.plot.line/area/bar/scatter(x=col_key, …) # `y=col_key/s`. Also hist/box(by=col_key).
plt.show() # Displays the plot. Also plt.savefig(<path>).
```
#### DataFrame — Merge, Join, Concat:
```python
>>> l = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
x y
a 1 2
b 3 4
>>> r = pd.DataFrame([[4, 5], [6, 7]], index=['b', 'c'], columns=['y', 'z'])
>>> r = pd.DataFrame([[4, 5], [6, 7]], index=['b', 'c'], columns=['y', 'z']); r
y z
b 4 5
c 6 7
@ -3323,7 +3324,7 @@ c 6 7
* **All operations operate on columns by default. Pass `'axis=1'` to process the rows instead.**
```python
>>> df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
>>> df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y']); df
x y
a 1 2
b 3 4
@ -3350,15 +3351,11 @@ b 3 4
```
* **Use `'<DF>[col_key_1, col_key_2][row_key]'` to get the fifth result's values.**
#### DataFrame — Plot, Encode, Decode:
```python
<DF>.plot.line/area/bar/scatter(x=col_key, …) # `y=col_key/s`. Also hist/box(by=col_key).
plt.show() # Displays the plot. Also plt.savefig(<path>).
```
#### DataFrame — Encode, Decode:
```python
<DF> = pd.read_json/html('<str/path/url>') # Run `$ pip3 install beautifulsoup4 lxml`.
<DF> = pd.read_csv('<path/url>') # `header/index_col/dtype/parse_dates/…=<obj>`.
<DF> = pd.read_csv('<path/url>') # `header/index_col/dtype/usecols/…=<obj>`.
<DF> = pd.read_pickle/excel('<path/url>') # Use `sheet_name=None` to get all Excel sheets.
<DF> = pd.read_sql('<table/query>', <conn.>) # SQLite3/SQLAlchemy connection (see #SQLite).
```
@ -3369,23 +3366,29 @@ plt.show() # Displays the plot. Also plt.sav
<DF>.to_pickle/excel(<path>) # Run `$ pip3 install "pandas[excel]" odfpy`.
<DF>.to_sql('<table_name>', <connection>) # Also `if_exists='fail/replace/append'`.
```
* **Read\_csv() only parses dates in columns that were specified by the 'parse\_dates' argument. It automatically tries to detect the format, but it can be helped with the 'date\_format' or 'dayfirst' arguments. Both dates and datetimes get stored as pd.Timestamp objects.**
* **If a column contains even a single invalid date, read\_csv() returns the whole column as a series of strings, unlike `'<Sr> = pd.to_datetime(<Sr>, errors="coerce")'`, which replaces invalid values with pd.NaT.**
* **To get specific attributes from a series of Timestamps use `'<Sr>.dt.year/date/…'`.**
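A minimal sketch of the date-handling notes above. The CSV text and the column names `day` and `value` are made up for illustration; only `parse_dates`, the `.dt` accessor and `errors='coerce'` come from the notes:

```python
import io
import pandas as pd

# Hypothetical CSV text; 'day' and 'value' are illustrative column names.
csv_text = 'day,value\n2024-10-15,1\n2024-10-17,2\n'

# Only the columns listed in parse_dates are converted to timestamps.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['day'])
years = df.day.dt.year                      # Series of ints taken from each Timestamp.

# to_datetime() with errors='coerce' turns unparsable values into pd.NaT.
raw = pd.Series(['2024-10-15', 'not a date'])
coerced = pd.to_datetime(raw, errors='coerce')

print(df.dtypes)
print(years.tolist())
print(coerced)
```

Reading from `io.StringIO` keeps the sketch self-contained. A path or URL can be passed to `read_csv()` in the same way.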
### GroupBy
**Object that groups together rows of a dataframe based on the value of the passed column.**
```python
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], list('abc'), list('xyz'))
>>> df.groupby('z').get_group(6)
>>> gb = df.groupby('z'); gb.apply(print)
x y z
a 1 2 3
x y z
b 4 5 6
c 7 8 6
```
```python
<GB> = <DF>.groupby(column_key/s) # Splits DF into groups based on passed column.
<GB> = <DF>.groupby(col_key/s) # Splits DF into groups based on passed column.
<DF> = <GB>.apply(<func>) # Maps each group. Func can return DF, Sr or el.
<GB> = <GB>[column_key] # Single column GB. All operations return a Sr.
<DF> = <GB>.get_group(<num>) # Selects a group by grouping column's value.
<Sr> = <GB>.size() # A Sr of group sizes. Same keys as get_group().
<GB> = <GB>[col_key] # Single column GB. All operations return a Sr.
```
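The calls listed above can be tried on the same three-row dataframe used in this section. This is only an illustrative sketch; the variable names `group_6`, `sizes` and `x_sums` are not part of the original text:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], list('abc'), list('xyz'))
gb = df.groupby('z')

group_6 = gb.get_group(6)    # Rows whose 'z' value equals 6 (rows 'b' and 'c').
sizes = gb.size()            # Series of group sizes, keyed by the values of 'z'.
x_sums = gb['x'].sum()       # Single-column GroupBy, so the result is a Series.

print(group_6)
print(sizes)
print(x_sums)
```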
#### GroupBy — Aggregate, Transform, Map:
@ -3396,37 +3399,20 @@ c 7 8 6
```
```python
>>> gb = df.groupby('z'); gb.apply(print)
x y z
a 1 2 3
x y z
b 4 5 6
c 7 8 6
```
```text
+-----------------+-------------+-------------+-------------+---------------+
|                 |    'sum'    |   'rank'    |  ['rank']   | {'x': 'rank'} |
+-----------------+-------------+-------------+-------------+---------------+
| gb.agg(…)       |      x   y  |             |      x    y |               |
|                 |  z          |      x  y   |   rank rank |        x      |
|                 |  3   1   2  |   a  1  1   | a    1    1 |     a  1      |
|                 |  6  11  13  |   b  1  1   | b    1    1 |     b  1      |
|                 |             |   c  2  2   | c    2    2 |     c  2      |
+-----------------+-------------+-------------+-------------+---------------+
| gb.transform(…) |      x   y  |      x  y   |             |               |
|                 |  a   1   2  |   a  1  1   |             |               |
|                 |  b  11  13  |   b  1  1   |             |               |
|                 |  c  11  13  |   c  2  2   |             |               |
+-----------------+-------------+-------------+-------------+---------------+
>>> gb.sum()
x y
z
3 1 2
6 11 13
```
* **Result has a named index that creates column `'z'` instead of `'index'` on reset_index().**
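A short sketch of the note above, reusing the section's dataframe. The names `totals` and `flat` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], list('abc'), list('xyz'))

totals = df.groupby('z').sum()    # Index is named 'z' after the grouping column.
flat = totals.reset_index()       # The named index becomes a regular column 'z'.

print(totals.index.name)
print(list(flat.columns))
```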
### Rolling
**Object for rolling window calculations.**
```python
<RSr/RDF/RGB> = <Sr/DF/GB>.rolling(win_size) # Also: `min_periods=None, center=False`.
<RSr/RDF/RGB> = <RDF/RGB>[column_key/s] # Or: <RDF/RGB>.column_key
<RSr/RDF/RGB> = <RDF/RGB>[col_key/s] # Or: <RDF/RGB>.col_key
<Sr/DF> = <R>.mean/sum/max() # Or: <R>.apply/agg(<agg_func/str>)
```
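As a rough illustration of the rolling calls above, assuming a small made-up series of four integers:

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4])

# Window of 2: the first position has too few values, so its result is NaN.
means = sr.rolling(2).mean()

# min_periods=1 lets incomplete windows produce a result as well.
sums = sr.rolling(2, min_periods=1).sum()

print(means.tolist())
print(sums.tolist())
```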
@ -3435,10 +3421,20 @@ Plotly
------
```python
# $ pip3 install pandas plotly kaleido
import pandas as pd, plotly.express as ex
<Figure> = ex.line(<DF>, x=<col_name>, y=<col_name>) # Or: ex.line(x=<list>, y=<list>)
<Figure>.update_layout(margin=dict(t=0, r=0, b=0, l=0), …) # `paper_bgcolor='rgb(0, 0, 0)'`.
<Figure>.write_html/json/image('<path>') # Also <Figure>.show().
import pandas as pd, plotly.express as px
```
```python
<Fig> = px.line(<DF>, x=col_key, y=col_key) # Or: px.line(x=<list>, y=<list>)
<Fig>.update_layout(margin=dict(t=0, r=0, b=0, l=0)) # Also `paper_bgcolor='rgb(0, 0, 0)'`.
<Fig>.write_html/json/image('<path>') # Also <Fig>.show().
```
```python
<Fig> = px.area/bar/box(<DF>, x=col_key, y=col_key) # Also `color=col_key`.
<Fig> = px.scatter(<DF>, x=col_key, y=col_key) # Also `color/size/symbol=col_key`.
<Fig> = px.scatter_3d(<DF>, x=col_key, y=col_key, …) # `z=col_key`. Also color/size/symbol.
<Fig> = px.histogram(<DF>, x=col_key [, nbins=<int>]) # Number of bins depends on DF size.
```
#### Displays a line chart of total coronavirus deaths per million grouped by continent:
@ -3457,7 +3453,7 @@ df = df.groupby(['Continent_Name', 'date']).sum().reset_index()
df['Total Deaths per Million'] = df.total_deaths * 1e6 / df.population
df = df[df.date > '2020-03-14']
df = df.rename({'date': 'Date', 'Continent_Name': 'Continent'}, axis='columns')
ex.line(df, x='Date', y='Total Deaths per Million', color='Continent').show()
px.line(df, x='Date', y='Total Deaths per Million', color='Continent').show()
```
#### Displays a multi-axis line chart of total coronavirus cases and changes in prices of Bitcoin, Dow Jones and gold:
@ -3470,20 +3466,23 @@ import pandas as pd, plotly.graph_objects as go
def main():
    covid, bitcoin, gold, dow = scrape_data()
    display_data(wrangle_data(covid, bitcoin, gold, dow))
    df = wrangle_data(covid, bitcoin, gold, dow)
    display_data(df)
def scrape_data():
    def get_covid_cases():
        url = 'https://covid.ourworldindata.org/data/owid-covid-data.csv'
        df = pd.read_csv(url, usecols=['location', 'date', 'total_cases'])
        return df[df.location == 'World'].set_index('date').total_cases
        df = df[df.location == 'World']
        return df.set_index('date').total_cases
    def get_ticker(symbol):
        url = (f'https://query1.finance.yahoo.com/v7/finance/download/{symbol}?'
               'period1=1579651200&period2=9999999999&interval=1d&events=history')
        df = pd.read_csv(url, usecols=['Date', 'Close'])
        return df.set_index('Date').Close
    out = get_covid_cases(), get_ticker('BTC-USD'), get_ticker('GC=F'), get_ticker('^DJI')
    return map(pd.Series.rename, out, ['Total Cases', 'Bitcoin', 'Gold', 'Dow Jones'])
    names = ['Total Cases', 'Bitcoin', 'Gold', 'Dow Jones']
    return map(pd.Series.rename, out, names)
def wrangle_data(covid, bitcoin, gold, dow):
    df = pd.concat([bitcoin, gold, dow], axis=1)  # Creates table by joining columns on dates.

104
index.html

@ -54,7 +54,7 @@
<body>
<header>
<aside>October 15, 2024</aside>
<aside>October 17, 2024</aside>
<a href="https://gto76.github.io" rel="author">Jure Šorn</a>
</header>
@ -2571,7 +2571,7 @@ W, H, MAX_S = <span class="hljs-number">50</span>, <span class="hljs-number">50<
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd, matplotlib.pyplot <span class="hljs-keyword">as</span> plt
</code></pre></div>
<div><h3 id="series">Series</h3><p><strong>Ordered dictionary with a name.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>pd.Series([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>], name=<span class="hljs-string">'a'</span>)
<div><h3 id="series">Series</h3><p><strong>Ordered dictionary with a name.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>sr = pd.Series([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>], name=<span class="hljs-string">'a'</span>); sr
x <span class="hljs-number">1</span>
y <span class="hljs-number">2</span>
Name: a, dtype: int64
@ -2605,7 +2605,7 @@ plt.show() <span class="hljs-comment"># Disp
&lt;Sr&gt; = &lt;Sr&gt;.fillna(&lt;el&gt;) <span class="hljs-comment"># Or: &lt;Sr&gt;.agg/transform/map(lambda &lt;el&gt;: &lt;el&gt;)</span>
</code></pre></div>
<pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>sr = pd.Series([<span class="hljs-number">2</span>, <span class="hljs-number">3</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>])
<pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>sr = pd.Series([<span class="hljs-number">2</span>, <span class="hljs-number">3</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>]); sr
x <span class="hljs-number">2</span>
y <span class="hljs-number">3</span>
</code></pre>
@ -2630,7 +2630,7 @@ y <span class="hljs-number">3</span>
<li><strong>Methods ffill(), interpolate(), fillna() and dropna() accept <code class="python hljs"><span class="hljs-string">'inplace=True'</span></code>.</strong></li>
<li><strong>Last result has a hierarchical index. Use <code class="python hljs"><span class="hljs-string">'&lt;Sr&gt;[key_1, key_2]'</span></code> to get its values.</strong></li>
</ul>
<div><h3 id="dataframe">DataFrame</h3><p><strong>Table with labeled rows and columns.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>])
<div><h3 id="dataframe">DataFrame</h3><p><strong>Table with labeled rows and columns.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>l = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>]); l
x y
a <span class="hljs-number">1</span> <span class="hljs-number">2</span>
b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
@ -2657,11 +2657,10 @@ b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
&lt;DF&gt; = &lt;DF&gt;.sort_index(ascending=<span class="hljs-keyword">True</span>) <span class="hljs-comment"># Sorts rows by row keys. Use `axis=1` for cols.</span>
&lt;DF&gt; = &lt;DF&gt;.sort_values(col_key/s) <span class="hljs-comment"># Sorts rows by passed column/s. Also `axis=1`.</span>
</code></pre>
<div><h4 id="dataframemergejoinconcat">DataFrame — Merge, Join, Concat:</h4><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>l = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>])
x y
a <span class="hljs-number">1</span> <span class="hljs-number">2</span>
b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>r = pd.DataFrame([[<span class="hljs-number">4</span>, <span class="hljs-number">5</span>], [<span class="hljs-number">6</span>, <span class="hljs-number">7</span>]], index=[<span class="hljs-string">'b'</span>, <span class="hljs-string">'c'</span>], columns=[<span class="hljs-string">'y'</span>, <span class="hljs-string">'z'</span>])
<pre><code class="python language-python hljs">&lt;DF&gt;.plot.line/area/bar/scatter(x=col_key, …) <span class="hljs-comment"># `y=col_key/s`. Also hist/box(by=col_key).</span>
plt.show() <span class="hljs-comment"># Displays the plot. Also plt.savefig(&lt;path&gt;).</span>
</code></pre>
<div><h4 id="dataframemergejoinconcat">DataFrame — Merge, Join, Concat:</h4><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>r = pd.DataFrame([[<span class="hljs-number">4</span>, <span class="hljs-number">5</span>], [<span class="hljs-number">6</span>, <span class="hljs-number">7</span>]], index=[<span class="hljs-string">'b'</span>, <span class="hljs-string">'c'</span>], columns=[<span class="hljs-string">'y'</span>, <span class="hljs-string">'z'</span>]); r
y z
b <span class="hljs-number">4</span> <span class="hljs-number">5</span>
c <span class="hljs-number">6</span> <span class="hljs-number">7</span>
@ -2705,7 +2704,7 @@ c <span class="hljs-number">6</span> <span class="hljs-number">7</span>
<ul>
<li><strong>All operations operate on columns by default. Pass <code class="python hljs"><span class="hljs-string">'axis=1'</span></code> to process the rows instead.</strong></li>
</ul>
<pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>df = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>])
<pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>df = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>]); df
x y
a <span class="hljs-number">1</span> <span class="hljs-number">2</span>
b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
@ -2730,72 +2729,70 @@ b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
<ul>
<li><strong>Use <code class="python hljs"><span class="hljs-string">'&lt;DF&gt;[col_key_1, col_key_2][row_key]'</span></code> to get the fifth result's values.</strong></li>
</ul>
<div><h4 id="dataframeplotencodedecode">DataFrame — Plot, Encode, Decode:</h4><pre><code class="python language-python hljs">&lt;DF&gt;.plot.line/area/bar/scatter(x=col_key, …) <span class="hljs-comment"># `y=col_key/s`. Also hist/box(by=col_key).</span>
plt.show() <span class="hljs-comment"># Displays the plot. Also plt.savefig(&lt;path&gt;).</span>
</code></pre></div>
<pre><code class="python language-python hljs">&lt;DF&gt; = pd.read_json/html(<span class="hljs-string">'&lt;str/path/url&gt;'</span>) <span class="hljs-comment"># Run `$ pip3 install beautifulsoup4 lxml`.</span>
&lt;DF&gt; = pd.read_csv(<span class="hljs-string">'&lt;path/url&gt;'</span>) <span class="hljs-comment"># `header/index_col/dtype/parse_dates/…=&lt;obj&gt;`.</span>
<div><h4 id="dataframeencodedecode">DataFrame — Encode, Decode:</h4><pre><code class="python language-python hljs">&lt;DF&gt; = pd.read_json/html(<span class="hljs-string">'&lt;str/path/url&gt;'</span>) <span class="hljs-comment"># Run `$ pip3 install beautifulsoup4 lxml`.</span>
&lt;DF&gt; = pd.read_csv(<span class="hljs-string">'&lt;path/url&gt;'</span>) <span class="hljs-comment"># `header/index_col/dtype/usecols/…=&lt;obj&gt;`.</span>
&lt;DF&gt; = pd.read_pickle/excel(<span class="hljs-string">'&lt;path/url&gt;'</span>) <span class="hljs-comment"># Use `sheet_name=None` to get all Excel sheets.</span>
&lt;DF&gt; = pd.read_sql(<span class="hljs-string">'&lt;table/query&gt;'</span>, &lt;conn.&gt;) <span class="hljs-comment"># SQLite3/SQLAlchemy connection (see #SQLite).</span>
</code></pre>
</code></pre></div>
<pre><code class="python language-python hljs">&lt;dict&gt; = &lt;DF&gt;.to_dict(<span class="hljs-string">'d/l/s/…'</span>) <span class="hljs-comment"># Returns columns as dicts, lists or series.</span>
&lt;str&gt; = &lt;DF&gt;.to_json/html/csv/latex() <span class="hljs-comment"># Saves output to file if path is passed.</span>
&lt;DF&gt;.to_pickle/excel(&lt;path&gt;) <span class="hljs-comment"># Run `$ pip3 install "pandas[excel]" odfpy`.</span>
&lt;DF&gt;.to_sql(<span class="hljs-string">'&lt;table_name&gt;'</span>, &lt;connection&gt;) <span class="hljs-comment"># Also `if_exists='fail/replace/append'`.</span>
</code></pre>
<ul>
<li><strong>Read_csv() only parses dates in columns that were specified by the 'parse_dates' argument. It automatically tries to detect the format, but it can be helped with the 'date_format' or 'dayfirst' arguments. Both dates and datetimes get stored as pd.Timestamp objects.</strong></li>
<li><strong>If a column contains even a single invalid date, read_csv() returns the whole column as a series of strings, unlike <code class="python hljs"><span class="hljs-string">'&lt;Sr&gt; = pd.to_datetime(&lt;Sr&gt;, errors="coerce")'</span></code>, which replaces invalid values with pd.NaT.</strong></li>
<li><strong>To get specific attributes from a series of Timestamps use <code class="python hljs"><span class="hljs-string">'&lt;Sr&gt;.dt.year/date/…'</span></code>.</strong></li>
</ul>
<div><h3 id="groupby">GroupBy</h3><p><strong>Object that groups together rows of a dataframe based on the value of the passed column.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>df = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>], [<span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">6</span>]], list(<span class="hljs-string">'abc'</span>), list(<span class="hljs-string">'xyz'</span>))
<span class="hljs-meta">&gt;&gt;&gt; </span>df.groupby(<span class="hljs-string">'z'</span>).get_group(<span class="hljs-number">6</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>gb = df.groupby(<span class="hljs-string">'z'</span>); gb.apply(print)
x y z
a <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span>
x y z
b <span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span>
c <span class="hljs-number">7</span> <span class="hljs-number">8</span> <span class="hljs-number">6</span>
</code></pre></div>
c <span class="hljs-number">7</span> <span class="hljs-number">8</span> <span class="hljs-number">6</span></code></pre></div>
<pre><code class="python language-python hljs">&lt;GB&gt; = &lt;DF&gt;.groupby(column_key/s) <span class="hljs-comment"># Splits DF into groups based on passed column.</span>
<pre><code class="python language-python hljs">&lt;GB&gt; = &lt;DF&gt;.groupby(col_key/s) <span class="hljs-comment"># Splits DF into groups based on passed column.</span>
&lt;DF&gt; = &lt;GB&gt;.apply(&lt;func&gt;) <span class="hljs-comment"># Maps each group. Func can return DF, Sr or el.</span>
&lt;GB&gt; = &lt;GB&gt;[column_key] <span class="hljs-comment"># Single column GB. All operations return a Sr.</span>
&lt;DF&gt; = &lt;GB&gt;.get_group(&lt;num&gt;) <span class="hljs-comment"># Selects a group by grouping column's value.</span>
&lt;Sr&gt; = &lt;GB&gt;.size() <span class="hljs-comment"># A Sr of group sizes. Same keys as get_group().</span>
&lt;GB&gt; = &lt;GB&gt;[col_key] <span class="hljs-comment"># Single column GB. All operations return a Sr.</span>
</code></pre>
<div><h4 id="groupbyaggregatetransformmap">GroupBy — Aggregate, Transform, Map:</h4><pre><code class="python language-python hljs">&lt;DF&gt; = &lt;GB&gt;.sum/max/mean/idxmax/all() <span class="hljs-comment"># Or: &lt;GB&gt;.agg(lambda &lt;Sr&gt;: &lt;el&gt;)</span>
&lt;DF&gt; = &lt;GB&gt;.rank/diff/cumsum/ffill() <span class="hljs-comment"># Or: &lt;GB&gt;.transform(lambda &lt;Sr&gt;: &lt;Sr&gt;)</span>
&lt;DF&gt; = &lt;GB&gt;.fillna(&lt;el&gt;) <span class="hljs-comment"># Or: &lt;GB&gt;.transform(lambda &lt;Sr&gt;: &lt;Sr&gt;)</span>
</code></pre></div>
<pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>gb = df.groupby(<span class="hljs-string">'z'</span>); gb.apply(print)
x y z
a <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span>
x y z
b <span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span>
c <span class="hljs-number">7</span> <span class="hljs-number">8</span> <span class="hljs-number">6</span></code></pre>
<pre><code class="python hljs">┏━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┓
┃ │ <span class="hljs-string">'sum'</span><span class="hljs-string">'rank'</span> │ [<span class="hljs-string">'rank'</span>] │ {<span class="hljs-string">'x'</span>: <span class="hljs-string">'rank'</span>} ┃
┠─────────────────┼─────────────┼─────────────┼─────────────┼───────────────┨
┃ gb.agg(…) │ x y │ │ x y │ ┃
┃ │ z │ x y │ rank rank │ x ┃
┃ │ <span class="hljs-number">3</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> │ a <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ a <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ a <span class="hljs-number">1</span>
┃ │ <span class="hljs-number">6</span> <span class="hljs-number">11</span> <span class="hljs-number">13</span> │ b <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ b <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ b <span class="hljs-number">1</span>
┃ │ │ c <span class="hljs-number">2</span> <span class="hljs-number">2</span> │ c <span class="hljs-number">2</span> <span class="hljs-number">2</span> │ c <span class="hljs-number">2</span>
┠─────────────────┼─────────────┼─────────────┼─────────────┼───────────────┨
┃ gb.transform(…) │ x y │ x y │ │ ┃
┃ │ a <span class="hljs-number">1</span> <span class="hljs-number">2</span> │ a <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ │ ┃
┃ │ b <span class="hljs-number">11</span> <span class="hljs-number">13</span> │ b <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ │ ┃
┃ │ c <span class="hljs-number">11</span> <span class="hljs-number">13</span> │ c <span class="hljs-number">2</span> <span class="hljs-number">2</span> │ │ ┃
┗━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┛
<pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>gb.sum()
x y
z
<span class="hljs-number">3</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span>
<span class="hljs-number">6</span> <span class="hljs-number">11</span> <span class="hljs-number">13</span>
</code></pre>
<ul>
<li><strong>Result has a named index that creates column <code class="python hljs"><span class="hljs-string">'z'</span></code> instead of <code class="python hljs"><span class="hljs-string">'index'</span></code> on reset_index().</strong></li>
</ul>
<div><h3 id="rolling">Rolling</h3><p><strong>Object for rolling window calculations.</strong></p><pre><code class="python language-python hljs">&lt;RSr/RDF/RGB&gt; = &lt;Sr/DF/GB&gt;.rolling(win_size) <span class="hljs-comment"># Also: `min_periods=None, center=False`.</span>
&lt;RSr/RDF/RGB&gt; = &lt;RDF/RGB&gt;[column_key/s] <span class="hljs-comment"># Or: &lt;RDF/RGB&gt;.column_key</span>
&lt;RSr/RDF/RGB&gt; = &lt;RDF/RGB&gt;[col_key/s] <span class="hljs-comment"># Or: &lt;RDF/RGB&gt;.col_key</span>
&lt;Sr/DF&gt; = &lt;R&gt;.mean/sum/max() <span class="hljs-comment"># Or: &lt;R&gt;.apply/agg(&lt;agg_func/str&gt;)</span>
</code></pre></div>
<div><h2 id="plotly"><a href="#plotly" name="plotly">#</a>Plotly</h2><pre><code class="python language-python hljs"><span class="hljs-comment"># $ pip3 install pandas plotly kaleido</span>
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd, plotly.express <span class="hljs-keyword">as</span> ex
&lt;Figure&gt; = ex.line(&lt;DF&gt;, x=&lt;col_name&gt;, y=&lt;col_name&gt;) <span class="hljs-comment"># Or: ex.line(x=&lt;list&gt;, y=&lt;list&gt;)</span>
&lt;Figure&gt;.update_layout(margin=dict(t=<span class="hljs-number">0</span>, r=<span class="hljs-number">0</span>, b=<span class="hljs-number">0</span>, l=<span class="hljs-number">0</span>), …) <span class="hljs-comment"># `paper_bgcolor='rgb(0, 0, 0)'`.</span>
&lt;Figure&gt;.write_html/json/image(<span class="hljs-string">'&lt;path&gt;'</span>) <span class="hljs-comment"># Also &lt;Figure&gt;.show().</span>
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd, plotly.express <span class="hljs-keyword">as</span> px
</code></pre></div>
<pre><code class="python language-python hljs">&lt;Fig&gt; = px.line(&lt;DF&gt;, x=col_key, y=col_key) <span class="hljs-comment"># Or: px.line(x=&lt;list&gt;, y=&lt;list&gt;)</span>
&lt;Fig&gt;.update_layout(margin=dict(t=<span class="hljs-number">0</span>, r=<span class="hljs-number">0</span>, b=<span class="hljs-number">0</span>, l=<span class="hljs-number">0</span>)) <span class="hljs-comment"># Also `paper_bgcolor='rgb(0, 0, 0)'`.</span>
&lt;Fig&gt;.write_html/json/image(<span class="hljs-string">'&lt;path&gt;'</span>) <span class="hljs-comment"># Also &lt;Fig&gt;.show().</span>
</code></pre>
<pre><code class="python language-python hljs">&lt;Fig&gt; = px.area/bar/box(&lt;DF&gt;, x=col_key, y=col_key) <span class="hljs-comment"># Also `color=col_key`.</span>
&lt;Fig&gt; = px.scatter(&lt;DF&gt;, x=col_key, y=col_key) <span class="hljs-comment"># Also `color/size/symbol=col_key`.</span>
&lt;Fig&gt; = px.scatter_3d(&lt;DF&gt;, x=col_key, y=col_key, …) <span class="hljs-comment"># `z=col_key`. Also color/size/symbol.</span>
&lt;Fig&gt; = px.histogram(&lt;DF&gt;, x=col_key [, nbins=&lt;int&gt;]) <span class="hljs-comment"># Number of bins depends on DF size.</span>
</code></pre>
<div><h4 id="displaysalinechartoftotalcoronavirusdeathspermilliongroupedbycontinent">Displays a line chart of total coronavirus deaths per million grouped by continent:</h4><p></p><div id="2a950764-39fc-416d-97fe-0a6226a3095f" class="plotly-graph-div" style="height:312px; width:914px;"></div><pre><code class="python language-python hljs">covid = pd.read_csv(<span class="hljs-string">'https://raw.githubusercontent.com/owid/covid-19-data/8dde8ca49b'</span>
<span class="hljs-string">'6e648c17dd420b2726ca0779402651/public/data/owid-covid-data.csv'</span>,
usecols=[<span class="hljs-string">'iso_code'</span>, <span class="hljs-string">'date'</span>, <span class="hljs-string">'total_deaths'</span>, <span class="hljs-string">'population'</span>])
@ -2806,7 +2803,7 @@ df = df.groupby([<span class="hljs-string">'Continent_Name'</span>, <span class=
df[<span class="hljs-string">'Total Deaths per Million'</span>] = df.total_deaths * <span class="hljs-number">1e6</span> / df.population
df = df[df.date &gt; <span class="hljs-string">'2020-03-14'</span>]
df = df.rename({<span class="hljs-string">'date'</span>: <span class="hljs-string">'Date'</span>, <span class="hljs-string">'Continent_Name'</span>: <span class="hljs-string">'Continent'</span>}, axis=<span class="hljs-string">'columns'</span>)
ex.line(df, x=<span class="hljs-string">'Date'</span>, y=<span class="hljs-string">'Total Deaths per Million'</span>, color=<span class="hljs-string">'Continent'</span>).show()
px.line(df, x=<span class="hljs-string">'Date'</span>, y=<span class="hljs-string">'Total Deaths per Million'</span>, color=<span class="hljs-string">'Continent'</span>).show()
</code></pre></div>
@ -2815,20 +2812,23 @@ ex.line(df, x=<span class="hljs-string">'Date'</span>, y=<span class="hljs-strin
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span><span class="hljs-params">()</span>:</span>
covid, bitcoin, gold, dow = scrape_data()
display_data(wrangle_data(covid, bitcoin, gold, dow))
df = wrangle_data(covid, bitcoin, gold, dow)
display_data(df)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">scrape_data</span><span class="hljs-params">()</span>:</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_covid_cases</span><span class="hljs-params">()</span>:</span>
url = <span class="hljs-string">'https://covid.ourworldindata.org/data/owid-covid-data.csv'</span>
df = pd.read_csv(url, usecols=[<span class="hljs-string">'location'</span>, <span class="hljs-string">'date'</span>, <span class="hljs-string">'total_cases'</span>])
<span class="hljs-keyword">return</span> df[df.location == <span class="hljs-string">'World'</span>].set_index(<span class="hljs-string">'date'</span>).total_cases
df = df[df.location == <span class="hljs-string">'World'</span>]
<span class="hljs-keyword">return</span> df.set_index(<span class="hljs-string">'date'</span>).total_cases
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_ticker</span><span class="hljs-params">(symbol)</span>:</span>
url = (<span class="hljs-string">f'https://query1.finance.yahoo.com/v7/finance/download/<span class="hljs-subst">{symbol}</span>?'</span>
<span class="hljs-string">'period1=1579651200&amp;period2=9999999999&amp;interval=1d&amp;events=history'</span>)
df = pd.read_csv(url, usecols=[<span class="hljs-string">'Date'</span>, <span class="hljs-string">'Close'</span>])
<span class="hljs-keyword">return</span> df.set_index(<span class="hljs-string">'Date'</span>).Close
out = get_covid_cases(), get_ticker(<span class="hljs-string">'BTC-USD'</span>), get_ticker(<span class="hljs-string">'GC=F'</span>), get_ticker(<span class="hljs-string">'^DJI'</span>)
<span class="hljs-keyword">return</span> map(pd.Series.rename, out, [<span class="hljs-string">'Total Cases'</span>, <span class="hljs-string">'Bitcoin'</span>, <span class="hljs-string">'Gold'</span>, <span class="hljs-string">'Dow Jones'</span>])
names = [<span class="hljs-string">'Total Cases'</span>, <span class="hljs-string">'Bitcoin'</span>, <span class="hljs-string">'Gold'</span>, <span class="hljs-string">'Dow Jones'</span>]
<span class="hljs-keyword">return</span> map(pd.Series.rename, out, names)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">wrangle_data</span><span class="hljs-params">(covid, bitcoin, gold, dow)</span>:</span>
df = pd.concat([bitcoin, gold, dow], axis=<span class="hljs-number">1</span>) <span class="hljs-comment"># Creates table by joining columns on dates.</span>
@ -2926,7 +2926,7 @@ $ deactivate <span class="hljs-comment"># Deactivates the active
<footer>
<aside>October 15, 2024</aside>
<aside>October 17, 2024</aside>
<a href="https://gto76.github.io" rel="author">Jure Šorn</a>
</footer>

1
parse.js

@ -310,6 +310,7 @@ const MARIO =
' main()\n';
const GROUPBY =
'<span class="hljs-meta">&gt;&gt;&gt; </span>df = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>], [<span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">6</span>]], list(<span class="hljs-string">\'abc\'</span>), list(<span class="hljs-string">\'xyz\'</span>))\n' +
'<span class="hljs-meta">&gt;&gt;&gt; </span>gb = df.groupby(<span class="hljs-string">\'z\'</span>); gb.apply(print)\n' +
' x y z\n' +
'a <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span>\n' +

16
pdf/index_for_pdf.html

@ -30,7 +30,7 @@
<strong>copy function, <a href="#copy">15</a></strong><br>
<strong>coroutine, <a href="#coroutines">33</a></strong><br>
<strong>counter, <a href="#counter">2</a>, <a href="#generator">4</a>, <a href="#nonlocal">12</a>, <a href="#iterator-1">17</a></strong><br>
<strong>csv, <a href="#csv">26</a>, <a href="#printsacsvspreadsheettotheconsole">34</a>, <a href="#dataframeplotencodedecode">46</a>, <a href="#displaysalinechartoftotalcoronavirusdeathspermilliongroupedbycontinent">47</a></strong><br>
<strong>csv, <a href="#csv">26</a>, <a href="#printsacsvspreadsheettotheconsole">34</a>, <a href="#dataframeencodedecode">46</a>, <a href="#displaysalinechartoftotalcoronavirusdeathspermilliongroupedbycontinent">47</a></strong><br>
<strong>curses module, <a href="#runsaterminalgamewhereyoucontrolanasteriskthatmustavoidnumbers">33</a>, <a href="#consoleapp">34</a></strong><br>
<strong>cython, <a href="#typeannotations">15</a>, <a href="#cython">49</a></strong> </p>
<h3 id="d">D</h3>
@ -43,11 +43,11 @@
<h3 id="e">E</h3>
<p><strong>enum module, <a href="#enum">19</a>-<a href="#inline-1">20</a></strong><br>
<strong>enumerate function, <a href="#enumerate">3</a></strong><br>
<strong>excel, <a href="#dataframeplotencodedecode">46</a></strong><br>
<strong>excel, <a href="#dataframeencodedecode">46</a></strong><br>
<strong>exceptions, <a href="#exceptions">20</a>-<a href="#exceptionobject">21</a>, <a href="#exceptions-1">23</a>, <a href="#logging">32</a></strong><br>
<strong>exit function, <a href="#exit">21</a></strong> </p>
<h3 id="f">F</h3>
<p><strong>files, <a href="#print">22</a>-<a href="#memoryview">29</a>, <a href="#runsabasicfileexplorerintheconsole">34</a>, <a href="#dataframeplotencodedecode">46</a></strong><br>
<p><strong>files, <a href="#print">22</a>-<a href="#memoryview">29</a>, <a href="#runsabasicfileexplorerintheconsole">34</a>, <a href="#dataframeencodedecode">46</a></strong><br>
<strong>filter function, <a href="#mapfilterreduce">11</a></strong><br>
<strong>flask library, <a href="#web">36</a></strong><br>
<strong>floats, <a href="#abstractbaseclasses">4</a>, <a href="#floats">6</a>, <a href="#numbers">7</a></strong><br>
@ -73,7 +73,7 @@
<strong>iterator, <a href="#enumerate">3</a>-<a href="#generator">4</a>, <a href="#comprehensions">11</a>, <a href="#iterator-1">17</a></strong><br>
<strong>itertools module, <a href="#itertools">3</a>, <a href="#combinatorics">8</a></strong> </p>
<h3 id="j">J</h3>
<p><strong>json, <a href="#json">25</a>, <a href="#restrequest">36</a>, <a href="#dataframeplotencodedecode">46</a></strong> </p>
<p><strong>json, <a href="#json">25</a>, <a href="#restrequest">36</a>, <a href="#dataframeencodedecode">46</a></strong> </p>
<h3 id="l">L</h3>
<p><strong>lambda, <a href="#lambda">11</a></strong><br>
<strong>lists, <a href="#list">1</a>-<a href="#list">2</a>, <a href="#abstractbaseclasses">4</a>, <a href="#otheruses">11</a>, <a href="#sequence">18</a>-<a href="#tableofrequiredandautomaticallyavailablespecialmethods">19</a>, <a href="#collectionsandtheirexceptions">21</a></strong><br>
@ -82,7 +82,7 @@
<h3 id="m">M</h3>
<p><strong>main function, <a href="#main">1</a>, <a href="#basicscripttemplate">49</a></strong><br>
<strong>match statement, <a href="#matchstatement">31</a></strong><br>
<strong>matplotlib library, <a href="#plot">34</a>, <a href="#series">44</a>, <a href="#dataframeplotencodedecode">46</a></strong><br>
<strong>matplotlib library, <a href="#plot">34</a>, <a href="#series">44</a>, <a href="#dataframeencodedecode">46</a></strong><br>
<strong>map function, <a href="#mapfilterreduce">11</a>, <a href="#operator">31</a></strong><br>
<strong>math module, <a href="#numbers">7</a></strong><br>
<strong>memoryviews, <a href="#memoryview">29</a></strong><br>
@ -102,7 +102,7 @@
<strong>paths, <a href="#paths">23</a>-<a href="#oscommands">24</a>, <a href="#runsabasicfileexplorerintheconsole">34</a></strong><br>
<strong>pickle module, <a href="#pickle">25</a></strong><br>
<strong>pillow library, <a href="#image">39</a>-<a href="#animation">40</a></strong><br>
<strong>plotting, <a href="#plot">34</a>, <a href="#series">44</a>, <a href="#dataframeplotencodedecode">46</a>, <a href="#plotly">47</a>-<a href="#displaysamultiaxislinechartoftotalcoronaviruscasesandchangesinpricesofbitcoindowjonesandgold">48</a></strong><br>
<strong>plotting, <a href="#plot">34</a>, <a href="#series">44</a>, <a href="#dataframeencodedecode">46</a>, <a href="#plotly">47</a>-<a href="#displaysamultiaxislinechartoftotalcoronaviruscasesandchangesinpricesofbitcoindowjonesandgold">48</a></strong><br>
<strong>print function, <a href="#class">14</a>, <a href="#print">22</a></strong><br>
<strong>profiling, <a href="#profiling">36</a>-<a href="#profilingbyline">37</a></strong><br>
<strong>progress bar, <a href="#progressbar">34</a></strong><br>
@ -119,14 +119,14 @@
<strong>requests library, <a href="#scrapespythonsurlandlogofromitswikipediapage">35</a>, <a href="#startstheappinitsownthreadandqueriesitsrestapi">36</a></strong> </p>
<h3 id="s">S</h3>
<p><strong>scope, <a href="#insidefunctiondefinition">10</a>, <a href="#nonlocal">12</a>, <a href="#complexexample">20</a></strong><br>
<strong>scraping, <a href="#scraping">35</a>, <a href="#basicmariobrothersexample">43</a>, <a href="#dataframeplotencodedecode">46</a>, <a href="#displaysalinechartoftotalcoronavirusdeathspermilliongroupedbycontinent">47</a>-<a href="#displaysamultiaxislinechartoftotalcoronaviruscasesandchangesinpricesofbitcoindowjonesandgold">48</a></strong><br>
<strong>scraping, <a href="#scraping">35</a>, <a href="#basicmariobrothersexample">43</a>, <a href="#dataframeencodedecode">46</a>, <a href="#displaysalinechartoftotalcoronavirusdeathspermilliongroupedbycontinent">47</a>-<a href="#displaysamultiaxislinechartoftotalcoronaviruscasesandchangesinpricesofbitcoindowjonesandgold">48</a></strong><br>
<strong>sequence, <a href="#abstractbaseclasses">4</a>, <a href="#sequence">18</a>-<a href="#abcsequence">19</a></strong><br>
<strong>sets, <a href="#set">2</a>, <a href="#abstractbaseclasses">4</a>, <a href="#otheruses">11</a>, <a href="#tableofrequiredandautomaticallyavailablespecialmethods">19</a>, <a href="#collectionsandtheirexceptions">21</a></strong><br>
<strong>shell commands, <a href="#shellcommands">25</a></strong><br>
<strong>sleep function, <a href="#progressbar">34</a></strong><br>
<strong>sortable, <a href="#list">1</a>, <a href="#sortable">16</a></strong><br>
<strong>splat operator, <a href="#splatoperator">10</a>-<a href="#otheruses">11</a>, <a href="#readrowsfromcsvfile">26</a></strong><br>
<strong>sql, <a href="#sqlite">27</a>, <a href="#dataframeplotencodedecode">46</a></strong><br>
<strong>sql, <a href="#sqlite">27</a>, <a href="#dataframeencodedecode">46</a></strong><br>
<strong>statistics, <a href="#statistics">7</a>, <a href="#numpy">37</a>-<a href="#indexing">38</a>, <a href="#pandas">44</a>-<a href="#displaysamultiaxislinechartoftotalcoronaviruscasesandchangesinpricesofbitcoindowjonesandgold">48</a></strong><br>
<strong>strings, <a href="#abstractbaseclasses">4</a>-<a href="#comparisonofpresentationtypes">7</a>, <a href="#class">14</a></strong><br>
<strong>struct module, <a href="#struct">28</a>-<a href="#integertypesuseacapitalletterforunsignedtypeminimumandstandardsizesareinbrackets">29</a></strong><br>
