Working on Pandas

pull/57/head
Jure Šorn 4 years ago
parent
commit
1abd12b6e7
2 changed files with 211 additions and 239 deletions
  1. 231
      README.md
  2. 219
      index.html

231
README.md

@ -3111,6 +3111,7 @@ Name: a, dtype: int64
<Sr>.update(<Sr>) # Updates items that are already present.
```
#### Apply, Aggregate, Transform:
```python
<el> = <Sr>.sum/max/mean/idxmax/all() # Or: <Sr>.aggregate(<agg_func>)
<Sr> = <Sr>.diff/cumsum/rank/pct_change() # Or: <Sr>.agg/transform(<trans_func>)
@ -3119,32 +3120,31 @@ Name: a, dtype: int64
* **Also: `'ffill()'` and `'interpolate()'`.**
* **`'aggregate()'` and `'transform()'` determine whether a function accepts an element or the whole Series by first passing it a single value; if that raises an error, they pass it the whole Series.**
#### Apply, Aggregate, Transform:
```python
>>> sr = Series([1, 2], index=['x', 'y'], name='a')
>>> sr = Series([1, 2], index=['x', 'y'])
x 1
y 2
Name: a, dtype: int64
dtype: int64
```
```python
+-------------+--------+-----------+---------------+
|             |  'sum' |  ['sum']  | {'s': 'sum'}  |
+-------------+--------+-----------+---------------+
| sr.apply(…) |        |           |               |
| sr.agg(…)   |    3   |   sum  3  |     s  3      |
|             |        |           |               |
+-------------+--------+-----------+---------------+
+-------------+---------------+---------------+---------------+
|             |     'sum'     |    ['sum']    | {'s': 'sum'}  |
+-------------+---------------+---------------+---------------+
| sr.apply(…) |               |               |               |
| sr.agg(…)   |       3       |     sum  3    |     s  3      |
|             |               |               |               |
+-------------+---------------+---------------+---------------+
```
```python
+-------------+--------+-----------+---------------+
|             | 'rank' |  ['rank'] | {'r': 'rank'} |
+-------------+--------+-----------+---------------+
| sr.apply(…) |        |      rank |               |
| sr.agg(…)   |  x  1  |   x     1 |    r  x  1    |
| sr.trans(…) |  y  2  |   y     2 |       y  2    |
+-------------+--------+-----------+---------------+
+-------------+---------------+---------------+---------------+
|             |     'rank'    |    ['rank']   | {'r': 'rank'} |
+-------------+---------------+---------------+---------------+
| sr.apply(…) |               |      rank     |               |
| sr.agg(…)   |     x  1      |   x     1     |    r  x  1    |
| sr.trans(…) |     y  2      |   y     2     |       y  2    |
+-------------+---------------+---------------+---------------+
```
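The Series tables above can be reproduced with a short sketch. It assumes pandas is installed and imported as `pd`; the variable names are illustrative and not part of the cheatsheet. Only string function names are used, so the result does not depend on how callables are dispatched.

```python
import pandas as pd

sr = pd.Series([1, 2], index=['x', 'y'])

# Aggregations reduce the Series; the argument type decides the result's shape.
total = sr.agg('sum')                # Scalar.
total_list = sr.agg(['sum'])         # Series indexed by the function name.
total_dict = sr.agg({'s': 'sum'})    # Series indexed by the dict key.

# Transformations return a Series with the original index.
ranks = sr.transform('rank')

print(total, total_list.to_dict(), total_dict.to_dict(), ranks.to_dict())
```

Passing a list or a dict wraps the scalar in a Series, which is why the `['sum']` and `{'s': 'sum'}` columns show a label next to the value.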
### DataFrame
@ -3187,44 +3187,6 @@ b 3 4
<DF> = <DF>.melt(id_vars=column_key/s) # Melts on columns.
```
```python
<Sr> = <DF>.sum/max/mean/idxmax/all() # Or: <DF>.apply/agg/transform(<agg_func>)
<DF> = <DF>.diff/cumsum/rank/pct_change() # Or: <DF>.apply/agg/transform(<trans_func>)
<DF> = <DF>.fillna(<el>) # Or: <DF>.applymap(<map_func>)
```
* **Also: `'ffill()'` and `'interpolate()'`.**
* **All operations operate on columns by default. Use `'axis=1'` parameter to process the rows instead.**
#### Apply, Aggregate, Transform:
```python
>>> df = DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
x y
a 1 2
b 3 4
```
```python
+-------------+---------------+---------------+---------------+
|             |     'sum'     |    ['sum']    |  {'x': 'sum'} |
+-------------+---------------+---------------+---------------+
| df.apply(…) |               |        x  y   |               |
| df.agg(…)   |     x  4      |   sum  4  6   |     x  4      |
| df.trans(…) |     y  6      |               |               |
+-------------+---------------+---------------+---------------+
```
```python
+-------------+---------------+---------------+---------------+
|             |     'rank'    |    ['rank']   | {'x': 'rank'} |
+-------------+---------------+---------------+---------------+
| df.apply(…) |       x  y    |       x    y  |        x      |
| df.agg(…)   |    a  1  1    |    rank rank  |     a  1      |
| df.trans(…) |    b  2  2    |  a    1    1  |     b  2      |
|             |               |  b    2    2  |               |
+-------------+---------------+---------------+---------------+
```
* **Transform() doesn't work with `['sum']` and `{'x': 'sum'}`.**
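As a hedged illustration of the DataFrame tables and the note about `transform()`, the sketch below runs the same calls on the example frame. It assumes pandas is imported as `pd` and that a rejected transform raises `ValueError`, which is the behaviour I would expect from current pandas releases rather than something the cheatsheet states.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])

col_sums = df.agg('sum')          # Series with one value per column.
sums_list = df.agg(['sum'])       # DataFrame with a single row labelled 'sum'.
x_sum = df.agg({'x': 'sum'})      # Series restricted to column 'x'.
ranks = df.transform('rank')      # DataFrame with the same shape as df.

# An aggregation does not preserve the index, so transform() rejects it.
try:
    df.transform(['sum'])
    transform_rejected = False
except ValueError:                # Assumed exception type, see the note above.
    transform_rejected = True

print(col_sums.to_dict(), x_sum.to_dict(), transform_rejected)
```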
#### Merge, Join, Concat:
```python
>>> l = DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
@ -3269,99 +3231,124 @@ c 6 7
┗━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━┛
```
### GroupBy
**Object that groups together rows of a dataframe based on the value of passed column.**
#### Apply, Aggregate, Transform:
```python
<Sr> = <DF>.sum/max/mean/idxmax/all() # Or: <DF>.apply/agg/transform(<agg_func>)
<DF> = <DF>.diff/cumsum/rank/pct_change() # Or: <DF>.apply/agg/transform(<trans_func>)
<DF> = <DF>.fillna(<el>) # Or: <DF>.applymap(<map_func>)
```
* **Also: `'ffill()'` and `'interpolate()'`.**
* **All operations operate on columns by default. Use `'axis=1'` parameter to process the rows instead.**
```python
>>> df = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], index=list('abc'), columns=list('xyz'))
>>> gb = df.groupby('z')
x y z
3: a 1 2 3
6: b 4 5 6
c 7 8 6
>>> df = DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
x y
a 1 2
b 3 4
```
```python
<GB> = <DF>.groupby(column_key/s) # DF is split into groups based on passed column.
<DF> = <GB>.get_group(group_key) # Selects a group by value of grouping column.
<DF> = <GB>.<operation>() # Executes operation on each col of each group.
+-------------+---------------+---------------+---------------+
|             |     'sum'     |    ['sum']    |  {'x': 'sum'} |
+-------------+---------------+---------------+---------------+
| df.apply(…) |               |        x  y   |               |
| df.agg(…)   |     x  4      |   sum  4  6   |     x  4      |
|             |     y  6      |               |               |
+-------------+---------------+---------------+---------------+
```
* **Result of an operation is a dataframe with index made up of group keys. Use `'<DF>.reset_index()'` to move the index back into its own column.**
#### Aggregations:
```python
<DF> = <GB>.sum/max/mean/idxmax/all()
<DF> = <GB>.apply/agg/transform(<agg_func>)
+-------------+---------------+---------------+---------------+
|             |     'rank'    |    ['rank']   | {'x': 'rank'} |
+-------------+---------------+---------------+---------------+
| df.apply(…) |       x  y    |       x    y  |        x      |
| df.agg(…)   |    a  1  1    |    rank rank  |     a  1      |
| df.trans(…) |    b  2  2    |  a    1    1  |     b  2      |
|             |               |  b    2    2  |               |
+-------------+---------------+---------------+---------------+
```
#### Encode:
```python
+-------------+------------+-------------+---------------+
|             |    'sum'   |   ['sum']   |  {'x': 'sum'} |
+-------------+------------+-------------+---------------+
| gb.apply(…) |    x  y  z |             |               |
|             | z          |             |               |
|             | 3  1  2  3 |             |               |
|             | 6 11 13 12 |             |               |
+-------------+------------+-------------+---------------+
| gb.agg(…)   |      x  y  |      x   y  |         x     |
|             |  z         |    sum sum  |     z         |
|             |  3   1  2  |  z          |     3   1     |
|             |  6  11 13  |  3   1   2  |     6  11     |
|             |            |  6  11  13  |               |
+-------------+------------+-------------+---------------+
| gb.trans(…) |      x  y  |             |               |
|             |  a   1  2  |             |               |
|             |  b  11 13  |             |               |
|             |  c  11 13  |             |               |
+-------------+------------+-------------+---------------+
<DF> = pd.read_json/html('<str/path/url>')
<DF> = pd.read_csv/pickle/excel('<path/url>')
<DF> = pd.read_sql('<query>', <connection>)
<DF> = pd.read_clipboard()
```
#### Transformations:
#### Decode:
```python
<DF> = <GB>.diff/cumsum/rank() # …/pct_change/fillna/ffill()
<DF> = <GB>.agg/transform(<trans_func>)
<dict> = <DF>.to_dict(['d/l/s/sp/r/i'])
<str> = <DF>.to_json/html/csv/markdown/latex([<path>])
<DF>.to_pickle/excel(<path>)
<DF>.to_sql('<table_name>', <connection>)
```
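A small round trip shows how the Encode and Decode calls fit together. This is a sketch that assumes pandas is imported as `pd`; it spells the `to_dict()` orientations out in full (`'list'`, `'records'`) because, as far as I know, newer pandas releases no longer accept the one-letter abbreviations.

```python
import io
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])

as_lists = df.to_dict('list')         # One list of values per column.
as_records = df.to_dict('records')    # One dict per row.

csv_text = df.to_csv()                # Returns a str because no path is passed.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(as_lists, as_records, restored.equals(df))
```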
### GroupBy
**Object that groups together rows of a dataframe based on the value of passed column.**
```python
+-------------+------------+-------------+---------------+
|             |   'rank'   |   ['rank']  | {'x': 'rank'} |
+-------------+------------+-------------+---------------+
| gb.agg(…)   |      x  y  |      x    y |        x      |
|             |  a   1  1  |   rank rank |     a  1      |
|             |  b   1  1  | a    1    1 |     b  1      |
|             |  c   2  2  | b    1    1 |     c  2      |
|             |            | c    2    2 |               |
+-------------+------------+-------------+---------------+
| gb.trans(…) |      x  y  |             |               |
|             |  a   1  1  |             |               |
|             |  b   1  1  |             |               |
|             |  c   2  2  |             |               |
+-------------+------------+-------------+---------------+
<GB> = <DF>.groupby(column_key/s) # DF is split into groups based on passed column.
<DF> = <GB>.get_group(group_key) # Selects a group by value of grouping column.
```
### Rolling
#### Apply, Aggregate, Transform:
```python
<Rl_S/D/G> = <Sr/DF/GB>.rolling(window_size) # Also: `min_periods=None, center=False`.
<Rl_S/D> = <Rl_D/G>[column_key/s] # Or: <Rl>.column_key
<Sr/DF/DF> = <Rl_S/D/G>.sum/max/mean()
<Sr/DF/DF> = <Rl_S/D/G>.apply(<agg_func>) # Invokes function on every window.
<Sr/DF/DF> = <Rl_S/D/G>.aggregate(<func/str>) # Invokes function on every window.
<DF> = <GB>.sum/max/mean/idxmax/all() # Or: <GB>.apply/agg(<agg_func>)
<DF> = <GB>.diff/cumsum/rank/ffill() # Or: <GB>.aggregate(<trans_func>)
<DF> = <GB>.fillna(<el>) # Or: <GB>.transform(<map_func>)
```
### Encode
```python
<DF> = pd.read_json/html('<str/path/url>')
<DF> = pd.read_csv/pickle/excel('<path/url>')
<DF> = pd.read_sql('<query>', <connection>)
<DF> = pd.read_clipboard()
>>> df = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], index=list('abc'), columns=list('xyz'))
>>> gb = df.groupby('z')
x y z
3: a 1 2 3
6: b 4 5 6
c 7 8 6
```
### Decode
```python
<dict> = <DF>.to_dict(['d/l/s/sp/r/i'])
<str> = <DF>.to_json/html/csv/markdown/latex([<path>])
<DF>.to_pickle/excel(<path>)
<DF>.to_sql('<table_name>', <connection>)
+-------------+-------------+-------------+---------------+
|             |    'sum'    |   ['sum']   |  {'x': 'sum'} |
+-------------+-------------+-------------+---------------+
| gb.agg(…)   |      x   y  |      x   y  |         x     |
|             |  z          |    sum sum  |     z         |
|             |  3   1   2  |  z          |     3   1     |
|             |  6  11  13  |  3   1   2  |     6  11     |
|             |             |  6  11  13  |               |
+-------------+-------------+-------------+---------------+
| gb.trans(…) |      x   y  |             |               |
|             |  a   1   2  |             |               |
|             |  b  11  13  |             |               |
|             |  c  11  13  |             |               |
+-------------+-------------+-------------+---------------+
```
```python
+-------------+-------------+-------------+---------------+
|             |    'rank'   |   ['rank']  | {'x': 'rank'} |
+-------------+-------------+-------------+---------------+
| gb.agg(…)   |      x  y   |      x    y |        x      |
|             |  a   1  1   |   rank rank |     a  1      |
|             |  b   1  1   | a    1    1 |     b  1      |
|             |  c   2  2   | b    1    1 |     c  2      |
|             |             | c    2    2 |               |
+-------------+-------------+-------------+---------------+
| gb.trans(…) |      x  y   |             |               |
|             |  a   1  1   |             |               |
|             |  b   1  1   |             |               |
|             |  c   2  2   |             |               |
+-------------+-------------+-------------+---------------+
```
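The GroupBy tables can be checked with the sketch below. It assumes pandas is imported as `pd`; the variable names are illustrative. Aggregations return one row per group key, while transformations return one row per original row.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]],
                  index=list('abc'), columns=list('xyz'))
gb = df.groupby('z')

group_6 = gb.get_group(6)            # Rows 'b' and 'c', where z == 6.
sums = gb.agg('sum')                 # Indexed by the group keys 3 and 6.
x_sums = gb.agg({'x': 'sum'})        # Same, restricted to column 'x'.
broadcast = gb.transform('sum')      # Group sums aligned to rows a, b, c.
ranks = gb.transform('rank')         # Rank of each row within its group.

print(sums, broadcast, ranks, sep='\n\n')
```

Row `c` is the second row of group `6`, so its rank within the group is 2 in both columns.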
### Rolling
```python
<Rl_S/D/G> = <Sr/DF/GB>.rolling(window_size) # Also: `min_periods=None, center=False`.
<Rl_S/D> = <Rl_D/G>[column_key/s] # Or: <Rl>.column_key
<Sr/DF/DF> = <Rl_S/D/G>.sum/max/mean()
<Sr/DF/DF> = <Rl_S/D/G>.apply(<agg_func>) # Invokes function on every window.
<Sr/DF/DF> = <Rl_S/D/G>.aggregate(<func/str>) # Invokes function on every window.
```
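A minimal sketch of the rolling calls on a Series, assuming pandas is imported as `pd`. The window size of 2 and the lambda are illustrative choices, not part of the cheatsheet.

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4])

# With the default min_periods, incomplete windows produce NaN.
window_sums = sr.rolling(2).sum()

# min_periods=1 lets the first, partial window yield a value.
window_means = sr.rolling(2, min_periods=1).mean()

# apply() calls the function once per window and expects a single number back.
window_spans = sr.rolling(2).apply(lambda w: w.max() - w.min())

print(window_sums.tolist(), window_means.tolist(), window_spans.tolist())
```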

219
index.html

@ -2639,35 +2639,35 @@ Name: a, dtype: int64
&lt;Sr&gt; = &lt;Sr&gt;.combine_first(&lt;Sr&gt;) <span class="hljs-comment"># Adds items that are not yet present.</span>
&lt;Sr&gt;.update(&lt;Sr&gt;) <span class="hljs-comment"># Updates items that are already present.</span>
</code></pre>
<pre><code class="python language-python hljs">&lt;el&gt; = &lt;Sr&gt;.sum/max/mean/idxmax/all() <span class="hljs-comment"># Or: &lt;Sr&gt;.aggregate(&lt;agg_func&gt;)</span>
<div><h4 id="applyaggregatetransform">Apply, Aggregate, Transform:</h4><pre><code class="python language-python hljs">&lt;el&gt; = &lt;Sr&gt;.sum/max/mean/idxmax/all() <span class="hljs-comment"># Or: &lt;Sr&gt;.aggregate(&lt;agg_func&gt;)</span>
&lt;Sr&gt; = &lt;Sr&gt;.diff/cumsum/rank/pct_change() <span class="hljs-comment"># Or: &lt;Sr&gt;.agg/transform(&lt;trans_func&gt;)</span>
&lt;Sr&gt; = &lt;Sr&gt;.fillna(&lt;el&gt;) <span class="hljs-comment"># Or: &lt;Sr&gt;.apply/agg/transform/map(&lt;map_func&gt;)</span>
</code></pre>
</code></pre></div>
<ul>
<li><strong>Also: <code class="python hljs"><span class="hljs-string">'ffill()'</span></code> and <code class="python hljs"><span class="hljs-string">'interpolate()'</span></code>.</strong></li>
<li><strong><code class="python hljs"><span class="hljs-string">'aggregate()'</span></code> and <code class="python hljs"><span class="hljs-string">'transform()'</span></code> determine whether a function accepts an element or the whole Series by first passing it a single value; if that raises an error, they pass it the whole Series.</strong></li>
</ul>
<div><h4 id="applyaggregatetransform">Apply, Aggregate, Transform:</h4><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>sr = Series([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>], name=<span class="hljs-string">'a'</span>)
<pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>sr = Series([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>])
x <span class="hljs-number">1</span>
y <span class="hljs-number">2</span>
Name: a, dtype: int64
</code></pre></div>
<pre><code class="python language-python hljs">+-------------+--------+-----------+---------------+
| | <span class="hljs-string">'sum'</span> | [<span class="hljs-string">'sum'</span>] | {<span class="hljs-string">'s'</span>: <span class="hljs-string">'sum'</span>} |
+-------------+--------+-----------+---------------+
| sr.apply(…) | | | |
| sr.agg(…) | <span class="hljs-number">3</span> | sum <span class="hljs-number">3</span> | s <span class="hljs-number">3</span> |
| | | | |
+-------------+--------+-----------+---------------+
dtype: int64
</code></pre>
<pre><code class="python language-python hljs">+-------------+--------+-----------+---------------+
| | <span class="hljs-string">'rank'</span> | [<span class="hljs-string">'rank'</span>] | {<span class="hljs-string">'r'</span>: <span class="hljs-string">'rank'</span>} |
+-------------+--------+-----------+---------------+
| sr.apply(…) | | rank | |
| sr.agg(…) | x <span class="hljs-number">1</span> | x <span class="hljs-number">1</span> | r x <span class="hljs-number">1</span> |
| sr.trans(…) | y <span class="hljs-number">2</span> | y <span class="hljs-number">2</span> | y <span class="hljs-number">2</span> |
+-------------+--------+-----------+---------------+
<pre><code class="python language-python hljs">+-------------+---------------+---------------+---------------+
| | <span class="hljs-string">'sum'</span> | [<span class="hljs-string">'sum'</span>] | {<span class="hljs-string">'s'</span>: <span class="hljs-string">'sum'</span>} |
+-------------+---------------+---------------+---------------+
| sr.apply(…) | | | |
| sr.agg(…) | <span class="hljs-number">3</span> | sum <span class="hljs-number">3</span> | s <span class="hljs-number">3</span> |
| | | | |
+-------------+---------------+---------------+---------------+
</code></pre>
<pre><code class="python language-python hljs">+-------------+---------------+---------------+---------------+
| | <span class="hljs-string">'rank'</span> | [<span class="hljs-string">'rank'</span>] | {<span class="hljs-string">'r'</span>: <span class="hljs-string">'rank'</span>} |
+-------------+---------------+---------------+---------------+
| sr.apply(…) | | rank | |
| sr.agg(…) | x <span class="hljs-number">1</span> | x <span class="hljs-number">1</span> | r x <span class="hljs-number">1</span> |
| sr.trans(…) | y <span class="hljs-number">2</span> | y <span class="hljs-number">2</span> | y <span class="hljs-number">2</span> |
+-------------+---------------+---------------+---------------+
</code></pre>
<div><h3 id="dataframe">DataFrame</h3><p><strong>Table with labeled rows and columns.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>])
x y
@ -2696,40 +2696,6 @@ b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
&lt;DF&gt; = &lt;DF&gt;.transpose() <span class="hljs-comment"># Rotates the table.</span>
&lt;DF&gt; = &lt;DF&gt;.melt(id_vars=column_key/s) <span class="hljs-comment"># Melts on columns.</span>
</code></pre>
<pre><code class="python language-python hljs">&lt;Sr&gt; = &lt;DF&gt;.sum/max/mean/idxmax/all() <span class="hljs-comment"># Or: &lt;DF&gt;.apply/agg/transform(&lt;agg_func&gt;)</span>
&lt;DF&gt; = &lt;DF&gt;.diff/cumsum/rank/pct_change() <span class="hljs-comment"># Or: &lt;DF&gt;.apply/agg/transform(&lt;trans_func&gt;)</span>
&lt;DF&gt; = &lt;DF&gt;.fillna(&lt;el&gt;) <span class="hljs-comment"># Or: &lt;DF&gt;.applymap(&lt;map_func&gt;)</span>
</code></pre>
<ul>
<li><strong>Also: <code class="python hljs"><span class="hljs-string">'ffill()'</span></code> and <code class="python hljs"><span class="hljs-string">'interpolate()'</span></code>.</strong></li>
<li><strong>All operations operate on columns by default. Use <code class="python hljs"><span class="hljs-string">'axis=1'</span></code> parameter to process the rows instead.</strong> </li>
</ul>
<div><h4 id="applyaggregatetransform-1">Apply, Aggregate, Transform:</h4><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>df = DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>])
x y
a <span class="hljs-number">1</span> <span class="hljs-number">2</span>
b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
</code></pre></div>
<pre><code class="python language-python hljs">+-------------+---------------+---------------+---------------+
| | <span class="hljs-string">'sum'</span> | [<span class="hljs-string">'sum'</span>] | {<span class="hljs-string">'x'</span>: <span class="hljs-string">'sum'</span>} |
+-------------+---------------+---------------+---------------+
| df.apply(…) | | x y | |
| df.agg(…) | x <span class="hljs-number">4</span> | sum <span class="hljs-number">4</span> <span class="hljs-number">6</span> | x <span class="hljs-number">4</span> |
| df.trans(…) | y <span class="hljs-number">6</span> | | |
+-------------+---------------+---------------+---------------+
</code></pre>
<pre><code class="python language-python hljs">+-------------+---------------+---------------+---------------+
| | <span class="hljs-string">'rank'</span> | [<span class="hljs-string">'rank'</span>] | {<span class="hljs-string">'x'</span>: <span class="hljs-string">'rank'</span>} |
+-------------+---------------+---------------+---------------+
| df.apply(…) | x y | x y | x |
| df.agg(…) | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | rank rank | a <span class="hljs-number">1</span> |
| df.trans(…) | b <span class="hljs-number">2</span> <span class="hljs-number">2</span> | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | b <span class="hljs-number">2</span> |
| | | b <span class="hljs-number">2</span> <span class="hljs-number">2</span> | |
+-------------+---------------+---------------+---------------+
</code></pre>
<ul>
<li><strong>Transform() doesn't work with <code class="python hljs">[<span class="hljs-string">'sum'</span>]</code> and <code class="python hljs">{<span class="hljs-string">'x'</span>: <span class="hljs-string">'sum'</span>}</code>.</strong></li>
</ul>
<div><h4 id="mergejoinconcat">Merge, Join, Concat:</h4><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>l = DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>])
x y
a <span class="hljs-number">1</span> <span class="hljs-number">2</span>
@ -2770,84 +2736,103 @@ c <span class="hljs-number">6</span> <span class="hljs-number">7</span>
┃ │ c . <span class="hljs-number">6</span> <span class="hljs-number">7</span> │ │ │ ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━┛
</code></pre>
<div><h3 id="groupby">GroupBy</h3><p><strong>Object that groups together rows of a dataframe based on the value of passed column.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>df = DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>], [<span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">6</span>]], index=list(<span class="hljs-string">'abc'</span>), columns=list(<span class="hljs-string">'xyz'</span>))
<span class="hljs-meta">&gt;&gt;&gt; </span>gb = df.groupby(<span class="hljs-string">'z'</span>)
x y z
<span class="hljs-number">3</span>: a <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span>
<span class="hljs-number">6</span>: b <span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span>
c <span class="hljs-number">7</span> <span class="hljs-number">8</span> <span class="hljs-number">6</span>
<div><h4 id="applyaggregatetransform-1">Apply, Aggregate, Transform:</h4><pre><code class="python language-python hljs">&lt;Sr&gt; = &lt;DF&gt;.sum/max/mean/idxmax/all() <span class="hljs-comment"># Or: &lt;DF&gt;.apply/agg/transform(&lt;agg_func&gt;)</span>
&lt;DF&gt; = &lt;DF&gt;.diff/cumsum/rank/pct_change() <span class="hljs-comment"># Or: &lt;DF&gt;.apply/agg/transform(&lt;trans_func&gt;)</span>
&lt;DF&gt; = &lt;DF&gt;.fillna(&lt;el&gt;) <span class="hljs-comment"># Or: &lt;DF&gt;.applymap(&lt;map_func&gt;)</span>
</code></pre></div>
<pre><code class="python language-python hljs">&lt;GB&gt; = &lt;DF&gt;.groupby(column_key/s) <span class="hljs-comment"># DF is split into groups based on passed column.</span>
&lt;DF&gt; = &lt;GB&gt;.get_group(group_key) <span class="hljs-comment"># Selects a group by value of grouping column.</span>
&lt;DF&gt; = &lt;GB&gt;.&lt;operation&gt;() <span class="hljs-comment"># Executes operation on each col of each group.</span>
</code></pre>
<ul>
<li><strong>Result of an operation is a dataframe with index made up of group keys. Use <code class="python hljs"><span class="hljs-string">'&lt;DF&gt;.reset_index()'</span></code> to move the index back into its own column.</strong></li>
<li><strong>Also: <code class="python hljs"><span class="hljs-string">'ffill()'</span></code> and <code class="python hljs"><span class="hljs-string">'interpolate()'</span></code>.</strong></li>
<li><strong>All operations operate on columns by default. Use <code class="python hljs"><span class="hljs-string">'axis=1'</span></code> parameter to process the rows instead.</strong> </li>
</ul>
<div><h4 id="aggregations">Aggregations:</h4><pre><code class="python language-python hljs">&lt;DF&gt; = &lt;GB&gt;.sum/max/mean/idxmax/all()
&lt;DF&gt; = &lt;GB&gt;.apply/agg/transform(&lt;agg_func&gt;)
</code></pre></div>
<pre><code class="python language-python hljs">+-------------+------------+-------------+---------------+
| | <span class="hljs-string">'sum'</span> | [<span class="hljs-string">'sum'</span>] | {<span class="hljs-string">'x'</span>: <span class="hljs-string">'sum'</span>} |
+-------------+------------+-------------+---------------+
| gb.apply(…) | x y z | | |
| | z | | |
| | <span class="hljs-number">3</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span> | | |
| | <span class="hljs-number">6</span> <span class="hljs-number">11</span> <span class="hljs-number">13</span> <span class="hljs-number">12</span> | | |
+-------------+------------+-------------+---------------+
| gb.agg(…) | x y | x y | x |
| | z | sum sum | z |
| | <span class="hljs-number">3</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> | z | <span class="hljs-number">3</span> <span class="hljs-number">1</span> |
| | <span class="hljs-number">6</span> <span class="hljs-number">11</span> <span class="hljs-number">13</span> | <span class="hljs-number">3</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> | <span class="hljs-number">6</span> <span class="hljs-number">11</span> |
| | | <span class="hljs-number">6</span> <span class="hljs-number">11</span> <span class="hljs-number">13</span> | |
+-------------+------------+-------------+---------------+
| gb.trans(…) | x y | | |
| | a <span class="hljs-number">1</span> <span class="hljs-number">2</span> | | |
| | b <span class="hljs-number">11</span> <span class="hljs-number">13</span> | | |
| | c <span class="hljs-number">11</span> <span class="hljs-number">13</span> | | |
+-------------+------------+-------------+---------------+
</code></pre>
<div><h4 id="transformations">Transformations:</h4><pre><code class="python language-python hljs">&lt;DF&gt; = &lt;GB&gt;.diff/cumsum/rank() <span class="hljs-comment"># …/pct_change/fillna/ffill()</span>
&lt;DF&gt; = &lt;GB&gt;.agg/transform(&lt;trans_func&gt;)
</code></pre></div>
<pre><code class="python language-python hljs">+-------------+------------+-------------+---------------+
| | <span class="hljs-string">'rank'</span> | [<span class="hljs-string">'rank'</span>] | {<span class="hljs-string">'x'</span>: <span class="hljs-string">'rank'</span>} |
+-------------+------------+-------------+---------------+
| gb.agg(…) | x y | x y | x |
| | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | rank rank | a <span class="hljs-number">1</span> |
| | b <span class="hljs-number">1</span> <span class="hljs-number">1</span> | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | b <span class="hljs-number">1</span> |
| | c <span class="hljs-number">2</span> <span class="hljs-number">2</span> | b <span class="hljs-number">1</span> <span class="hljs-number">1</span> | c <span class="hljs-number">2</span> |
| | | c <span class="hljs-number">2</span> <span class="hljs-number">2</span> | |
+-------------+------------+-------------+---------------+
| gb.trans(…) | x y | | |
| | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | | |
| | b <span class="hljs-number">1</span> <span class="hljs-number">1</span> | | |
| | c <span class="hljs-number">2</span> <span class="hljs-number">2</span> | | |
+-------------+------------+-------------+---------------+
<pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>df = DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>])
x y
a <span class="hljs-number">1</span> <span class="hljs-number">2</span>
b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
</code></pre>
<div><h3 id="rolling">Rolling</h3><pre><code class="python language-python hljs">&lt;Rl_S/D/G&gt; = &lt;Sr/DF/GB&gt;.rolling(window_size) <span class="hljs-comment"># Also: `min_periods=None, center=False`.</span>
&lt;Rl_S/D&gt; = &lt;Rl_D/G&gt;[column_key/s] <span class="hljs-comment"># Or: &lt;Rl&gt;.column_key</span>
&lt;Sr/DF/DF&gt; = &lt;Rl_S/D/G&gt;.sum/max/mean()
&lt;Sr/DF/DF&gt; = &lt;Rl_S/D/G&gt;.apply(&lt;agg_func&gt;) <span class="hljs-comment"># Invokes function on every window.</span>
&lt;Sr/DF/DF&gt; = &lt;Rl_S/D/G&gt;.aggregate(&lt;func/str&gt;) <span class="hljs-comment"># Invokes function on every window.</span>
</code></pre></div>
<div><h3 id="encode-2">Encode</h3><pre><code class="python language-python hljs">&lt;DF&gt; = pd.read_json/html(<span class="hljs-string">'&lt;str/path/url&gt;'</span>)
<pre><code class="python language-python hljs">+-------------+---------------+---------------+---------------+
| | <span class="hljs-string">'sum'</span> | [<span class="hljs-string">'sum'</span>] | {<span class="hljs-string">'x'</span>: <span class="hljs-string">'sum'</span>} |
+-------------+---------------+---------------+---------------+
| df.apply(…) | | x y | |
| df.agg(…) | x <span class="hljs-number">4</span> | sum <span class="hljs-number">4</span> <span class="hljs-number">6</span> | x <span class="hljs-number">4</span> |
| | y <span class="hljs-number">6</span> | | |
+-------------+---------------+---------------+---------------+
</code></pre>
<pre><code class="python language-python hljs">+-------------+---------------+---------------+---------------+
| | <span class="hljs-string">'rank'</span> | [<span class="hljs-string">'rank'</span>] | {<span class="hljs-string">'x'</span>: <span class="hljs-string">'rank'</span>} |
+-------------+---------------+---------------+---------------+
| df.apply(…) | x y | x y | x |
| df.agg(…) | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | rank rank | a <span class="hljs-number">1</span> |
| df.trans(…) | b <span class="hljs-number">2</span> <span class="hljs-number">2</span> | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | b <span class="hljs-number">2</span> |
| | | b <span class="hljs-number">2</span> <span class="hljs-number">2</span> | |
+-------------+---------------+---------------+---------------+
</code></pre>
<div><h4 id="encode-2">Encode:</h4><pre><code class="python language-python hljs">&lt;DF&gt; = pd.read_json/html(<span class="hljs-string">'&lt;str/path/url&gt;'</span>)
&lt;DF&gt; = pd.read_csv/pickle/excel(<span class="hljs-string">'&lt;path/url&gt;'</span>)
&lt;DF&gt; = pd.read_sql(<span class="hljs-string">'&lt;query&gt;'</span>, &lt;connection&gt;)
&lt;DF&gt; = pd.read_clipboard()
</code></pre></div>
<div><h3 id="decode-3">Decode</h3><pre><code class="python language-python hljs">&lt;dict&gt; = &lt;DF&gt;.to_dict([<span class="hljs-string">'d/l/s/sp/r/i'</span>])
<div><h4 id="decode-3">Decode:</h4><pre><code class="python language-python hljs">&lt;dict&gt; = &lt;DF&gt;.to_dict([<span class="hljs-string">'d/l/s/sp/r/i'</span>])
&lt;str&gt; = &lt;DF&gt;.to_json/html/csv/markdown/latex([&lt;path&gt;])
&lt;DF&gt;.to_pickle/excel(&lt;path&gt;)
&lt;DF&gt;.to_sql(<span class="hljs-string">'&lt;table_name&gt;'</span>, &lt;connection&gt;)
</code></pre></div>
<div><h3 id="groupby">GroupBy</h3><p><strong>Object that groups together rows of a dataframe based on the value of passed column.</strong></p><pre><code class="python language-python hljs">&lt;GB&gt; = &lt;DF&gt;.groupby(column_key/s) <span class="hljs-comment"># DF is split into groups based on passed column.</span>
&lt;DF&gt; = &lt;GB&gt;.get_group(group_key) <span class="hljs-comment"># Selects a group by value of grouping column.</span>
</code></pre></div>
<div><h4 id="applyaggregatetransform-2">Apply, Aggregate, Transform:</h4><pre><code class="python language-python hljs">&lt;DF&gt; = &lt;GB&gt;.sum/max/mean/idxmax/all() <span class="hljs-comment"># Or: &lt;GB&gt;.apply/agg(&lt;agg_func&gt;)</span>
&lt;DF&gt; = &lt;GB&gt;.diff/cumsum/rank/ffill() <span class="hljs-comment"># Or: &lt;GB&gt;.aggregate(&lt;trans_func&gt;) </span>
&lt;DF&gt; = &lt;GB&gt;.fillna(&lt;el&gt;) <span class="hljs-comment"># Or: &lt;GB&gt;.transform(&lt;map_func&gt;)</span>
</code></pre></div>
<pre><code class="python language-python hljs"><span class="hljs-meta">&gt;&gt;&gt; </span>df = DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>], [<span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">6</span>]], index=list(<span class="hljs-string">'abc'</span>), columns=list(<span class="hljs-string">'xyz'</span>))
<span class="hljs-meta">&gt;&gt;&gt; </span>gb = df.groupby(<span class="hljs-string">'z'</span>)
x y z
<span class="hljs-number">3</span>: a <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span>
<span class="hljs-number">6</span>: b <span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span>
c <span class="hljs-number">7</span> <span class="hljs-number">8</span> <span class="hljs-number">6</span>
</code></pre>
<pre><code class="python language-python hljs">+-------------+-------------+-------------+---------------+
| | <span class="hljs-string">'sum'</span> | [<span class="hljs-string">'sum'</span>] | {<span class="hljs-string">'x'</span>: <span class="hljs-string">'sum'</span>} |
+-------------+-------------+-------------+---------------+
| gb.agg(…) | x y | x y | x |
| | z | sum sum | z |
| | <span class="hljs-number">3</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> | z | <span class="hljs-number">3</span> <span class="hljs-number">1</span> |
| | <span class="hljs-number">6</span> <span class="hljs-number">11</span> <span class="hljs-number">13</span> | <span class="hljs-number">3</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> | <span class="hljs-number">6</span> <span class="hljs-number">11</span> |
| | | <span class="hljs-number">6</span> <span class="hljs-number">11</span> <span class="hljs-number">13</span> | |
+-------------+-------------+-------------+---------------+
| gb.trans(…) | x y | | |
| | a <span class="hljs-number">1</span> <span class="hljs-number">2</span> | | |
| | b <span class="hljs-number">11</span> <span class="hljs-number">13</span> | | |
| | c <span class="hljs-number">11</span> <span class="hljs-number">13</span> | | |
+-------------+-------------+-------------+---------------+
</code></pre>
<pre><code class="python language-python hljs">+-------------+-------------+-------------+---------------+
| | <span class="hljs-string">'rank'</span> | [<span class="hljs-string">'rank'</span>] | {<span class="hljs-string">'x'</span>: <span class="hljs-string">'rank'</span>} |
+-------------+-------------+-------------+---------------+
| gb.agg(…) | x y | x y | x |
| | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | rank rank | a <span class="hljs-number">1</span> |
| | b <span class="hljs-number">1</span> <span class="hljs-number">1</span> | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | b <span class="hljs-number">1</span> |
| | c <span class="hljs-number">2</span> <span class="hljs-number">2</span> | b <span class="hljs-number">1</span> <span class="hljs-number">1</span> | c <span class="hljs-number">2</span> |
| | | c <span class="hljs-number">2</span> <span class="hljs-number">2</span> | |
+-------------+-------------+-------------+---------------+
| gb.trans(…) | x y | | |
| | a <span class="hljs-number">1</span> <span class="hljs-number">1</span> | | |
| | b <span class="hljs-number">1</span> <span class="hljs-number">1</span> | | |
|             |  c   <span class="hljs-number">2</span>  <span class="hljs-number">2</span>  |             |               |
+-------------+-------------+-------------+---------------+
</code></pre>
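<p>The difference in result shape between aggregating and transforming can be checked with a short sketch. It assumes pandas is imported as <code>pd</code> and rebuilds the example frame; the names <code>sums</code>, <code>broadcast</code> and <code>ranks</code> are illustrative.</p>

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]],
                  index=list('abc'), columns=list('xyz'))
gb = df.groupby('z')

# Aggregation: one row per group, indexed by the group key.
sums = gb.agg('sum')

# Transformation: one row per original row, indexed like df.
broadcast = gb.transform('sum')   # Group totals repeated for every member row.
ranks = gb.transform('rank')      # Rank of each value within its own group.

print(sums)
print(broadcast)
print(ranks)
```

<p>Note that pandas returns ranks as floats, so the printed values appear as <code>1.0</code> and <code>2.0</code> rather than as integers.</p>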
<div><h3 id="rolling">Rolling</h3><pre><code class="python language-python hljs">&lt;Rl_S/D/G&gt; = &lt;Sr/DF/GB&gt;.rolling(window_size) <span class="hljs-comment"># Also: `min_periods=None, center=False`.</span>
&lt;Rl_S/D&gt; = &lt;Rl_D/G&gt;[column_key/s] <span class="hljs-comment"># Or: &lt;Rl&gt;.column_key</span>
&lt;Sr/DF/DF&gt; = &lt;Rl_S/D/G&gt;.sum/max/mean()
&lt;Sr/DF/DF&gt; = &lt;Rl_S/D/G&gt;.apply(&lt;agg_func&gt;) <span class="hljs-comment"># Invokes function on every window.</span>
&lt;Sr/DF/DF&gt; = &lt;Rl_S/D/G&gt;.aggregate(&lt;func/str&gt;) <span class="hljs-comment"># Invokes function on every window.</span>
</code></pre></div>
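<p>A small sketch of rolling windows on a Series, assuming pandas is imported as <code>pd</code>. The data and the lambda are made up for illustration.</p>

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4])

rl = sr.rolling(2)                 # Window covers the current and the previous item.
sums = rl.sum()                    # First result is NaN because its window is incomplete.

# With min_periods=1 an incomplete window still produces a value.
partial = sr.rolling(2, min_periods=1).sum()

# apply() calls the function once per complete window and expects a single number back.
ranges = rl.apply(lambda window: window.max() - window.min())

print(sums)
print(partial)
print(ranges)
```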
<div><h2 id="plotly"><a href="#plotly" name="plotly">#</a>Plotly</h2><div><h3 id="top10countriesbypercentageofpopulationwithconfirmedcovid19infection">Top 10 Countries by Percentage of Population With Confirmed COVID-19 Infection</h3><pre><code class="text language-text">|
|
|
