<Sr> = <Sr> > <el/Sr> # Returns a Series of bools.
<Sr> = <Sr> + <el/Sr> # Items with non-matching keys get value NaN.
<S> = <S> > <el/S> # Returns a Series of bools.
<S> = <S> + <el/S> # Items with non-matching keys get value NaN.
```
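A minimal sketch of the two operators above, assuming two small made-up Series `a` and `b` that share only the key `'y'`:

```python
import pandas as pd

a = pd.Series([1, 2], index=['x', 'y'])      # Hypothetical sample data.
b = pd.Series([10, 20], index=['y', 'z'])

is_big = a > 1    # Compares every item with the scalar.
total = a + b     # Aligns on keys. Only 'y' exists in both.
```

Here `total` keeps the union of keys, and the unmatched `'x'` and `'z'` hold NaN.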
```python
<Sr> = pd.concat(<coll_of_Sr>) # Concats multiple series into one long Series.
<Sr> = <Sr>.combine_first(<Sr>) # Adds items that are not yet present.
<Sr>.update(<Sr>) # Updates items that are already present.
<S> = pd.concat(<coll_of_S>) # Concats multiple series into one long Series.
<S> = <S>.combine_first(<S>) # Adds items that are not yet present.
<S>.update(<S>) # Updates items that are already present.
```
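The difference between the three calls above can be sketched as follows. The Series `a` and `b` and their values are made up for illustration:

```python
import pandas as pd

a = pd.Series([1, 2], index=['x', 'y'])      # Hypothetical sample data.
b = pd.Series([20, 30], index=['y', 'z'])

joined = pd.concat([a, b])       # Keeps all items, so key 'y' appears twice.
filled = a.combine_first(b)      # Keeps a's 'y' and adds the missing 'z'.
updated = a.copy()
updated.update(b)                # Overwrites 'y' in place and ignores 'z'.
```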
```python
<Sr>.plot.line/area/bar/pie/hist() # Generates a Matplotlib plot.
<S>.plot.line/area/bar/pie/hist() # Generates a Matplotlib plot.
plt.show() # Displays the plot. Also plt.savefig(<path>).
```
* **Indexing objects can't be tuples because `'obj[x, y]'` is converted to `'obj[(x, y)]'`!**
* **Pandas uses NumPy types like `'np.int64'`. Series is converted to `'float64'` if we assign np.nan to any item. Use `'<S>.astype(<str/type>)'` to get converted Series.**
* **Use `'<DF>[col_key_1, col_key_2][row_key]'` to get the fifth result's values.**
* **All methods operate on columns by default. Pass `'axis=1'` to process the rows instead.**
* **Fifth result's columns are indexed with a multi-index. This means we need a tuple of column keys to specify a single column: `'<DF>.loc[row_k, (col_k_1, col_k_2)]'`.**
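A minimal sketch of the `axis=1` and tuple-key points above. The frames `df` and `ranked` are made up for illustration, and the multi-index columns are built by hand here rather than produced by an aggregation:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
col_sums = df.sum()          # Default axis=0 gives one result per column.
row_sums = df.sum(axis=1)    # axis=1 gives one result per row.

# Hypothetical frame whose columns are indexed with a multi-index.
cols = pd.MultiIndex.from_tuples([('x', 'rank'), ('y', 'rank')])
ranked = pd.DataFrame([[1.0, 1.0], [2.0, 2.0]], index=['a', 'b'], columns=cols)
value = ranked.loc['b', ('x', 'rank')]    # A tuple selects a single column.
```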
#### DataFrame — Encode, Decode:
#### DataFrame — Multi-Index:
```python
<DF> = <DF>.xs(row_key, level=<int>) # Rows with key on passed level of multi-index.
<DF> = <DF>.xs(row_keys, level=<ints>) # Rows that have first key on first level, etc.
<DF> = <DF>.set_index(col_keys) # Combines multiple columns into a multi-index.
<S/DF> = <DF>.stack/unstack(level=-1) # Combines col keys with row keys or vice versa.
<DF>.to_sql('<table_name>', <connection>) # Also `if_exists='fail/replace/append'`.
```
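The multi-index calls above can be tried on a small frame. This is only a sketch, and the column names `k1`, `k2` and `v` are assumptions made for the example:

```python
import pandas as pd

df = pd.DataFrame({'k1': ['a', 'a', 'b'], 'k2': [1, 2, 1], 'v': [10, 20, 30]})

mi = df.set_index(['k1', 'k2'])    # Rows are now keyed by (k1, k2).
rows_a = mi.xs('a', level=0)       # Rows whose first-level key is 'a'.
wide = mi.unstack(level=-1)        # Moves k2 from row keys to column keys.
```

After unstack() the columns are tuples such as `('v', 2)`, and the missing combination `('b', 2)` holds NaN.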
* **Read\_csv() only parses dates of columns that were specified by 'parse\_dates' argument. It automatically tries to detect the format, but it can be helped with 'date\_format' or 'dayfirst' arguments. Both dates and datetimes get stored as pd.Timestamp objects.**
* **If there's a single invalid date then it returns the whole column as a series of strings, unlike `'<Sr> = pd.to_datetime(<Sr>, errors="coerce")'`, which uses pd.NaT.**
* **To get specific attributes from a series of Timestamps use `'<Sr>.dt.year/date/…'`.**
* **If there's a single invalid date then it returns the whole column as a series of strings, unlike `'<S> = pd.to_datetime(<S>, errors="coerce")'`, which uses pd.NaT.**
* **To get specific attributes from a series of Timestamps use `'<S>.dt.year/date/…'`.**
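A short sketch of the to_datetime() and `.dt` notes above, using a made-up Series of strings in which one entry is not a date:

```python
import pandas as pd

s = pd.Series(['2024-01-15', 'not a date', '2025-12-31'])   # Hypothetical data.

parsed = pd.to_datetime(s, errors='coerce')   # Invalid entries become pd.NaT.
years = parsed.dt.year                        # NaN where the date is missing.
```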
### GroupBy
**Object that groups together rows of a dataframe based on the value of the passed column.**
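As a minimal sketch, grouping can be tried on a small frame with columns `x`, `y` and `z`, where two rows share the value 6 in column `z`. The variable names are chosen for the example only:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], list('abc'), list('xyz'))

gb = df.groupby('z')         # Two groups: z == 3 and z == 6.
sums = gb.sum()              # Sums columns x and y inside each group.
sizes = gb.size()            # Number of rows in each group.
group = gb.get_group(6)      # Rows 'b' and 'c'.
flat = sums.reset_index()    # The named index becomes column 'z'.
```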
<str> = <pipe>.read(size=<span class="hljs-number">-1</span>) <span class="hljs-comment"># Reads 'size' chars or until EOF. Also readline/s().</span>
<int> = <pipe>.close() <span class="hljs-comment"># Closes the pipe. Returns None on success (returncode 0).</span>
<int> = <pipe>.close() <span class="hljs-comment"># Returns None if last command exited with returncode 0.</span>
</code></pre></div>
<div><h4 id="sends11tothebasiccalculatorandcapturesitsoutput">Sends '1 + 1' to the basic calculator and captures its output:</h4><pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>subprocess.run(<span class="hljs-string">'bc'</span>, input=<span class="hljs-string">'1 + 1\n'</span>, capture_output=<span class="hljs-keyword">True</span>, text=<span class="hljs-keyword">True</span>)
<div><h3 id="series">Series</h3><p><strong>Ordered dictionary with a name.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>sr = pd.Series([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>], name=<span class="hljs-string">'a'</span>); sr
<div><h3 id="series">Series</h3><p><strong>Ordered dictionary with a name.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>s = pd.Series([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>], name=<span class="hljs-string">'a'</span>); s
<pre><code class="python language-python hljs"><Sr> = <Sr> > <el/Sr> <span class="hljs-comment"># Returns a Series of bools.</span>
<Sr> = <Sr> + <el/Sr> <span class="hljs-comment"># Items with non-matching keys get value NaN.</span>
<pre><code class="python language-python hljs"><S> = <S> > <el/S> <span class="hljs-comment"># Returns a Series of bools.</span>
<S> = <S> + <el/S> <span class="hljs-comment"># Items with non-matching keys get value NaN.</span>
</code></pre>
<pre><code class="python language-python hljs"><Sr> = pd.concat(<coll_of_Sr>) <span class="hljs-comment"># Concats multiple series into one long Series.</span>
<Sr> = <Sr>.combine_first(<Sr>) <span class="hljs-comment"># Adds items that are not yet present.</span>
<Sr>.update(<Sr>) <span class="hljs-comment"># Updates items that are already present.</span>
<pre><code class="python language-python hljs"><S> = pd.concat(<coll_of_S>) <span class="hljs-comment"># Concats multiple series into one long Series.</span>
<S> = <S>.combine_first(<S>) <span class="hljs-comment"># Adds items that are not yet present.</span>
<S>.update(<S>) <span class="hljs-comment"># Updates items that are already present.</span>
</code></pre>
<pre><code class="python language-python hljs"><Sr>.plot.line/area/bar/pie/hist() <span class="hljs-comment"># Generates a Matplotlib plot.</span>
<pre><code class="python language-python hljs"><S>.plot.line/area/bar/pie/hist() <span class="hljs-comment"># Generates a Matplotlib plot.</span>
plt.show() <span class="hljs-comment"># Displays the plot. Also plt.savefig(<path>).</span>
<li><strong>Indexing objects can't be tuples because <code class="python hljs"><span class="hljs-string">'obj[x, y]'</span></code> is converted to <code class="python hljs"><span class="hljs-string">'obj[(x, y)]'</span></code>!</strong></li>
<li><strong>Pandas uses NumPy types like <code class="python hljs"><span class="hljs-string">'np.int64'</span></code>. Series is converted to <code class="python hljs"><span class="hljs-string">'float64'</span></code> if we assign np.nan to any item. Use <code class="python hljs"><span class="hljs-string">'<S>.astype(<str/type>)'</span></code> to get converted Series.</strong></li>
<li><strong>Indexing objects can't be tuples because <code class="python hljs"><span class="hljs-string">'obj[x, y]'</span></code> is converted to <code class="python hljs"><span class="hljs-string">'obj[(x, y)]'</span></code>!</strong></li>
<li><strong>Methods ffill(), interpolate(), fillna() and dropna() accept <code class="python hljs"><span class="hljs-string">'inplace=True'</span></code>.</strong></li>
<li><strong>Last result has a hierarchical index. Use <code class="python hljs"><span class="hljs-string">'<Sr>[key_1, key_2]'</span></code> to get its values.</strong></li>
<li><strong>Last result has a multi-index. Use <code class="python hljs"><span class="hljs-string">'<S>[key_1, key_2]'</span></code> to get its values.</strong></li>
</ul>
<div><h3 id="dataframe">DataFrame</h3><p><strong>Table with labeled rows and columns.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>l = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>]); l
x y
@ -2638,25 +2638,29 @@ b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
</code></pre></div>
<pre><code class="python language-python hljs"><DF> = pd.DataFrame(<list_of_rows>) <span class="hljs-comment"># Rows can be either lists, dicts or series.</span>
<DF> = pd.DataFrame(<dict_of_columns>) <span class="hljs-comment"># Columns can be either lists, dicts or series.</span>
<pre><code class="python language-python hljs"><DF> = pd.DataFrame(<list_of_rows>) <span class="hljs-comment"># Rows can be either lists, dicts or series.</span>
<DF> = pd.DataFrame(<dict_of_columns>) <span class="hljs-comment"># Columns can be either lists, dicts or series.</span>
<DF> = <DF>[row_bools] <span class="hljs-comment"># Keeps rows as specified by bools.</span>
<DF> = <DF>[<DF_of_bools>] <span class="hljs-comment"># Assigns NaN to items that are False in bools.</span>
</code></pre>
<pre><code class="python language-python hljs"><DF> = <DF> > <el/Sr/DF> <span class="hljs-comment"># Returns DF of bools. Sr is treated as a row.</span>
<DF> = <DF> + <el/Sr/DF> <span class="hljs-comment"># Items with non-matching keys get value NaN.</span>
<pre><code class="python language-python hljs"><DF> = <DF> > <el/S/DF> <span class="hljs-comment"># Returns DF of bools. S is treated as a row.</span>
<DF> = <DF> + <el/S/DF> <span class="hljs-comment"># Items with non-matching keys get value NaN.</span>
<DF> = <DF>.reset_index(drop=<span class="hljs-keyword">False</span>) <span class="hljs-comment"># Drops or moves row keys to column named index.</span>
<DF> = <DF>.sort_index(ascending=<span class="hljs-keyword">True</span>) <span class="hljs-comment"># Sorts rows by row keys. Use `axis=1` for cols.</span>
<DF> = <DF>.sort_values(col_key/s) <span class="hljs-comment"># Sorts rows by passed column/s. Also `axis=1`.</span>
<DF> = <DF>.reset_index(drop=<span class="hljs-keyword">False</span>) <span class="hljs-comment"># Drops or moves row keys to column named index.</span>
<DF> = <DF>.sort_index(ascending=<span class="hljs-keyword">True</span>) <span class="hljs-comment"># Sorts rows by row keys. Use `axis=1` for cols.</span>
<DF> = <DF>.sort_values(col_key/s) <span class="hljs-comment"># Sorts rows by passed column/s. Also `axis=1`.</span>
</code></pre>
<pre><code class="python language-python hljs"><DF> = <DF>.head/tail/sample(<int>) <span class="hljs-comment"># Returns first, last, or random n elements.</span>
<DF> = <DF>.describe() <span class="hljs-comment"># Describes columns. Also shape, info(), corr().</span>
<DF> = <DF>.query(<span class="hljs-string">'<query>'</span>) <span class="hljs-comment"># Filters rows with e.g. 'col_1 == val_1 and …'.</span>
</code></pre>
<pre><code class="python language-python hljs"><DF>.plot.line/area/bar/scatter(x=col_key, …) <span class="hljs-comment"># `y=col_key/s`. Also hist/box(by=col_key).</span>
plt.show() <span class="hljs-comment"># Displays the plot. Also plt.savefig(<path>).</span>
@ -2684,52 +2688,47 @@ c <span class="hljs-number">6</span> <span class="hljs-number">7</span>
┃ axis=<spanclass="hljs-number">0</span>, │ a <spanclass="hljs-number">1</span><spanclass="hljs-number">2</span> . │ <spanclass="hljs-number">2</span> │ │ Uses <spanclass="hljs-string">'outer'</span> by default. ┃
┃ join=…) │ b <spanclass="hljs-number">3</span><spanclass="hljs-number">4</span> . │ <spanclass="hljs-number">4</span> │ │ A Series is treated as a ┃
┃ │ b . <spanclass="hljs-number">4</span><spanclass="hljs-number">5</span> │ <spanclass="hljs-number">4</span> │ │ column. To add a row use ┃
┃ pd.concat([l, r], │ x y y z │ │ │ Adds columns at the ┃
┃ axis=<spanclass="hljs-number">1</span>, │ a <spanclass="hljs-number">1</span><spanclass="hljs-number">2</span> . . │ x y y z │ │ right end. Uses <spanclass="hljs-string">'outer'</span> ┃
┃ join=…) │ b <spanclass="hljs-number">3</span><spanclass="hljs-number">4</span><spanclass="hljs-number">4</span><spanclass="hljs-number">5</span> │ <spanclass="hljs-number">3</span><spanclass="hljs-number">4</span><spanclass="hljs-number">4</span><spanclass="hljs-number">5</span> │ │ by default. A Series is ┃
┃ │ c . . <spanclass="hljs-number">6</span><spanclass="hljs-number">7</span> │ │ │ treated as a column. ┃
<li><strong>All operations operate on columns by default. Pass <code class="python hljs"><span class="hljs-string">'axis=1'</span></code> to process the rows instead.</strong></li>
┃ df.transform(…) │ a <spanclass="hljs-number">1</span><spanclass="hljs-number">1</span> │ a <spanclass="hljs-number">1</span><spanclass="hljs-number">1</span> │ a <spanclass="hljs-number">1</span> ┃
┃ │ b <spanclass="hljs-number">2</span><spanclass="hljs-number">2</span>│ b<spanclass="hljs-number">2</span><spanclass="hljs-number">2</span> │ b <spanclass="hljs-number">2</span> ┃
┃ l.apply(…) │ │ x y │ ┃
┃ l.agg(…) │ x y │ rank rank │ x ┃
┃ l.transform(…) │ a <spanclass="hljs-number">1.0</span><spanclass="hljs-number">1.0</span> │ a <spanclass="hljs-number">1.0</span><spanclass="hljs-number">1.0</span> │ a <spanclass="hljs-number">1.0</span> ┃
┃ │ b <spanclass="hljs-number">2.0</span><spanclass="hljs-number">2.0</span> │ b <spanclass="hljs-number">2.0</span><spanclass="hljs-number">2.0</span> │ b <spanclass="hljs-number">2.0</span> ┃
<li><strong>Use <code class="python hljs"><span class="hljs-string">'<DF>[col_key_1, col_key_2][row_key]'</span></code> to get the fifth result's values.</strong></li>
<li><strong>All methods operate on columns by default. Pass <code class="python hljs"><span class="hljs-string">'axis=1'</span></code> to process the rows instead.</strong></li>
<li><strong>Fifth result's columns are indexed with a multi-index. This means we need a tuple of column keys to specify a single column: <code class="python hljs"><span class="hljs-string">'<DF>.loc[row_k, (col_k_1, col_k_2)]'</span></code>.</strong></li>
</ul>
<div><h4 id="dataframemultiindex">DataFrame — Multi-Index:</h4><pre><code class="python language-python hljs"><DF> = <DF>.xs(row_key, level=<int>) <span class="hljs-comment"># Rows with key on passed level of multi-index.</span>
<DF> = <DF>.xs(row_keys, level=<ints>) <span class="hljs-comment"># Rows that have first key on first level, etc.</span>
<DF> = <DF>.set_index(col_keys) <span class="hljs-comment"># Combines multiple columns into a multi-index.</span>
<S/DF> = <DF>.stack/unstack(level=<span class="hljs-number">-1</span>) <span class="hljs-comment"># Combines col keys with row keys or vice versa.</span>
<DF> = pd.read_pickle/excel(<span class="hljs-string">'<path/url>'</span>) <span class="hljs-comment"># Use `sheet_name=None` to get all Excel sheets.</span>
@ -2743,41 +2742,37 @@ b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
</code></pre>
<ul>
<li><strong>Read_csv() only parses dates of columns that were specified by 'parse_dates' argument. It automatically tries to detect the format, but it can be helped with 'date_format' or 'dayfirst' arguments. Both dates and datetimes get stored as pd.Timestamp objects.</strong></li>
<li><strong>If there's a single invalid date then it returns the whole column as a series of strings, unlike <code class="python hljs"><span class="hljs-string">'<Sr> = pd.to_datetime(<Sr>, errors="coerce")'</span></code>, which uses pd.NaT.</strong></li>
<li><strong>To get specific attributes from a series of Timestamps use <code class="python hljs"><span class="hljs-string">'<Sr>.dt.year/date/…'</span></code>.</strong></li>
<li><strong>If there's a single invalid date then it returns the whole column as a series of strings, unlike <code class="python hljs"><span class="hljs-string">'<S> = pd.to_datetime(<S>, errors="coerce")'</span></code>, which uses pd.NaT.</strong></li>
<li><strong>To get specific attributes from a series of Timestamps use <code class="python hljs"><span class="hljs-string">'<S>.dt.year/date/…'</span></code>.</strong></li>
</ul>
<div><h3 id="groupby">GroupBy</h3><p><strong>Object that groups together rows of a dataframe based on the value of the passed column.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>df = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>], [<span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">6</span>]], list(<span class="hljs-string">'abc'</span>), list(<span class="hljs-string">'xyz'</span>))
<div><h3 id="groupby">GroupBy</h3><p><strong>Object that groups together rows of a dataframe based on the value of the passed column.</strong></p><pre><code class="python language-python hljs"><GB> = <DF>.groupby(col_key/s) <span class="hljs-comment"># Splits DF into groups based on passed column.</span>
<DF> = <GB>.apply(<func>) <span class="hljs-comment"># Maps each group. Func can return DF, S or el.</span>
<DF> = <GB>.get_group(<el>) <span class="hljs-comment"># Selects a group by grouping column's value.</span>
<S> = <GB>.size() <span class="hljs-comment"># S of group sizes. Same keys as get_group().</span>
<GB> = <GB>[col_key] <span class="hljs-comment"># Single column GB. All operations return S.</span>
<div><h4 id="dividesrowsintogroupsandsumstheircolumnsresulthasanamedindexthatcreatescolumnzonreset_index">Divides rows into groups and sums their columns. Result has a named index that creates column <code class="python hljs"><span class="hljs-string">'z'</span></code> on reset_index():</h4><pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>df = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>], [<span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">6</span>]], list(<span class="hljs-string">'abc'</span>), list(<span class="hljs-string">'xyz'</span>))
a <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span>
x y z
b <span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span>
c <span class="hljs-number">7</span> <span class="hljs-number">8</span> <span class="hljs-number">6</span></code></pre></div>
<pre><code class="python language-python hljs"><GB> = <DF>.groupby(col_key/s) <span class="hljs-comment"># Splits DF into groups based on passed column.</span>
<DF> = <GB>.apply(<func>) <span class="hljs-comment"># Maps each group. Func can return DF, Sr or el.</span>
<DF> = <GB>.get_group(<el>) <span class="hljs-comment"># Selects a group by grouping column's value.</span>
<Sr> = <GB>.size() <span class="hljs-comment"># A Sr of group sizes. Same keys as get_group().</span>
<GB> = <GB>[col_key] <span class="hljs-comment"># Single column GB. All operations return a Sr.</span>
<li><strong>Result has a named index that creates column <code class="python hljs"><span class="hljs-string">'z'</span></code> instead of <code class="python hljs"><span class="hljs-string">'index'</span></code> on reset_index().</strong></li>
</ul>
<div><h3 id="rolling">Rolling</h3><p><strong>Object for rolling window calculations.</strong></p><pre><code class="python language-python hljs"><RSr/RDF/RGB> = <Sr/DF/GB>.rolling(win_size) <span class="hljs-comment"># Also: `min_periods=None, center=False`.</span>