<DF> = pd.read_pickle/excel('<path/url>') # Use `sheet_name=None` to get all Excel sheets.
<DF> = pd.read_sql('<table/query>', <conn.>) # SQLite3/SQLAlchemy connection (see #SQLite).
```
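Below is a minimal sketch of `read_sql()` against an in-memory SQLite database, using the standard-library `sqlite3` driver. The `people` table and its rows are made up for illustration. With a plain sqlite3 connection, pandas only executes queries, so reading a table by bare name needs an SQLAlchemy connection instead.

```python
import sqlite3

import pandas as pd

# In-memory database; the 'people' table and its rows are hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (name TEXT, age INTEGER)')
conn.executemany('INSERT INTO people VALUES (?, ?)', [('Ann', 34), ('Bob', 27)])
conn.commit()

# With a plain sqlite3 connection, pass an SQL query rather than a table name.
df = pd.read_sql('SELECT name, age FROM people ORDER BY age', conn)
print(df)
conn.close()
```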
```python
<DF>.to_pickle/excel(<path>) # Run `$ pip3 install "pandas[excel]" odfpy`.
<DF>.to_sql('<table_name>', <connection>) # Also `if_exists='fail/replace/append'`.
```
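The writers above can be exercised with a round trip. This sketch sticks to pickle and SQLite, so it needs no Excel engine. The `items` table name and the temporary file path are hypothetical.

```python
import os
import sqlite3
import tempfile

import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})

# Pickle round trip through a throwaway directory (path is illustrative).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'df.pkl')
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# The first to_sql() creates the table. Calling it again with the default
# if_exists='fail' would raise, so 'append' is used to add the rows a second time.
conn = sqlite3.connect(':memory:')
df.to_sql('items', conn, index=False)
df.to_sql('items', conn, index=False, if_exists='append')
count = pd.read_sql('SELECT COUNT(*) AS n FROM items', conn)['n'].iloc[0]
conn.close()
```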
* **Read\_csv() only parses dates of columns that were specified by the 'parse\_dates' argument. It automatically tries to detect the format, but it can be helped with the 'date\_format' or 'dayfirst' arguments. Both dates and datetimes get stored as pd.Timestamp objects.**
* **If there's a single invalid date then it returns the whole column as a series of strings, unlike `'<Sr> = pd.to_datetime(<Sr>, errors="coerce")'`, which uses pd.NaT.**
* **To get specific attributes from a series of Timestamps use `'<Sr>.dt.year/date/…'`.**
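Here is a small sketch of these rules, using an in-memory CSV string with made-up column names. It uses 'dayfirst' rather than 'date\_format' because the latter only exists in pandas 2.0 and later.

```python
import io

import pandas as pd

csv = 'date,value\n03/04/2024,1\n15/04/2024,2\n'

# Only the 'date' column is parsed; dayfirst=True reads 03/04 as 3 April.
df = pd.read_csv(io.StringIO(csv), parse_dates=['date'], dayfirst=True)
print(df['date'].dt.day.tolist())  # .dt accessor exposes Timestamp attributes

# to_datetime() with errors='coerce' turns unparsable values into NaT.
sr = pd.to_datetime(pd.Series(['2024-01-15', 'not a date']), errors='coerce')
```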
### GroupBy
**Object that groups together rows of a dataframe based on the value of the passed column.**
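The following sketch shows the main GroupBy calls on a small 3x3 frame (rows 'a' to 'c', columns 'x' to 'z'), grouped by column 'z'.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], list('abc'), list('xyz'))
gb = df.groupby('z')

sizes = gb.size()          # Number of rows per group, keyed by the value of 'z'.
group_6 = gb.get_group(6)  # Rows whose 'z' equals 6.
sums = gb.sum()            # Per-group sums of the remaining columns.
x_sums = gb['x'].sum()     # Single-column GB, so the result is a Series.
flat = sums.reset_index()  # Named index 'z' becomes an ordinary column.
```

`sum()` drops the grouping column from the result's columns and uses its values as a named index. This is why `reset_index()` brings back a 'z' column rather than an 'index' column.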
<div><h3 id="series">Series</h3><p><strong>Ordered dictionary with a name.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>sr = pd.Series([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>], name=<span class="hljs-string">'a'</span>); sr
<pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>sr = pd.Series([<span class="hljs-number">2</span>, <span class="hljs-number">3</span>], index=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>]); sr
x <span class="hljs-number">2</span>
y <span class="hljs-number">3</span>
</code></pre>
<li><strong>Methods ffill(), interpolate(), fillna() and dropna() accept <code class="python hljs"><span class="hljs-string">'inplace=True'</span></code>.</strong></li>
<li><strong>Last result has a hierarchical index. Use <code class="python hljs"><span class="hljs-string">'<Sr>[key_1, key_2]'</span></code> to get its values.</strong></li>
</ul>
<div><h3 id="dataframe">DataFrame</h3><p><strong>Table with labeled rows and columns.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>l = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]], index=[<span class="hljs-string">'a'</span>, <span class="hljs-string">'b'</span>], columns=[<span class="hljs-string">'x'</span>, <span class="hljs-string">'y'</span>]); l
x y
a <span class="hljs-number">1</span> <span class="hljs-number">2</span>
b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
<DF> = <DF>.sort_index(ascending=<span class="hljs-keyword">True</span>) <span class="hljs-comment"># Sorts rows by row keys. Use `axis=1` for cols.</span>
<DF> = <DF>.sort_values(col_key/s) <span class="hljs-comment"># Sorts rows by passed column/s. Also `axis=1`.</span>
b <span class="hljs-number">4</span> <span class="hljs-number">5</span>
c <span class="hljs-number">6</span> <span class="hljs-number">7</span>
<ul>
<li><strong>By default, all operations are applied to each column. Pass <code class="python hljs"><span class="hljs-string">'axis=1'</span></code> to process the rows instead.</strong></li>
a <span class="hljs-number">1</span> <span class="hljs-number">2</span>
b <span class="hljs-number">3</span> <span class="hljs-number">4</span>
<ul>
<li><strong>Use <code class="python hljs"><span class="hljs-string">'<DF>[col_key_1, col_key_2][row_key]'</span></code> to get the fifth result's values.</strong></li>
<DF> = pd.read_pickle/excel(<span class="hljs-string">'<path/url>'</span>) <span class="hljs-comment"># Use `sheet_name=None` to get all Excel sheets.</span>
<DF> = pd.read_sql(<span class="hljs-string">'<table/query>'</span>, <conn.>) <span class="hljs-comment"># SQLite3/SQLAlchemy connection (see #SQLite).</span>
</code></pre>
</code></pre></div>
<pre><code class="python language-python hljs"><dict> = <DF>.to_dict(<span class="hljs-string">'d/l/s/…'</span>) <span class="hljs-comment"># Returns columns as dicts, lists or series.</span>
<str> = <DF>.to_json/html/csv/latex() <span class="hljs-comment"># Saves output to a file if a path is passed.</span>
<DF>.to_pickle/excel(<path>) <span class="hljs-comment"># Run `$ pip3 install "pandas[excel]" odfpy`.</span>
<DF>.to_sql(<span class="hljs-string">'<table_name>'</span>, <connection>) <span class="hljs-comment"># Also `if_exists='fail/replace/append'`.</span>
</code></pre>
<ul>
<li><strong>Read_csv() only parses dates of columns that were specified by the 'parse_dates' argument. It automatically tries to detect the format, but it can be helped with the 'date_format' or 'dayfirst' arguments. Both dates and datetimes get stored as pd.Timestamp objects.</strong></li>
<li><strong>If there's a single invalid date then it returns the whole column as a series of strings, unlike <code class="python hljs"><span class="hljs-string">'<Sr> = pd.to_datetime(<Sr>, errors="coerce")'</span></code>, which uses pd.NaT.</strong></li>
<li><strong>To get specific attributes from a series of Timestamps use <code class="python hljs"><span class="hljs-string">'<Sr>.dt.year/date/…'</span></code>.</strong></li>
</ul>
<div><h3 id="groupby">GroupBy</h3><p><strong>Object that groups together rows of a dataframe based on the value of the passed column.</strong></p><pre><code class="python language-python hljs"><span class="hljs-meta">>>> </span>df = pd.DataFrame([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>], [<span class="hljs-number">7</span>, <span class="hljs-number">8</span>, <span class="hljs-number">6</span>]], list(<span class="hljs-string">'abc'</span>), list(<span class="hljs-string">'xyz'</span>))
a <span class="hljs-number">1</span> <span class="hljs-number">2</span> <span class="hljs-number">3</span>
x y z
b <span class="hljs-number">4</span> <span class="hljs-number">5</span> <span class="hljs-number">6</span>
c <span class="hljs-number">7</span> <span class="hljs-number">8</span> <span class="hljs-number">6</span>
</code></pre></div>
<pre><code class="python language-python hljs"><GB> = <DF>.groupby(col_key/s) <span class="hljs-comment"># Splits DF into groups based on passed column.</span>
<DF> = <GB>.apply(<func>) <span class="hljs-comment"># Maps each group. Func can return DF, Sr or el.</span>
<DF> = <GB>.get_group(<el>) <span class="hljs-comment"># Selects a group by grouping column's value.</span>
<Sr> = <GB>.size() <span class="hljs-comment"># A Sr of group sizes. Same keys as get_group().</span>
<GB> = <GB>[col_key] <span class="hljs-comment"># Single column GB. All operations return a Sr.</span>
┃ │ <span class="hljs-number">3</span> <span class="hljs-number">1</span> <span class="hljs-number">2</span> │ a <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ a <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ a <span class="hljs-number">1</span> ┃
┃ │ <span class="hljs-number">6</span> <span class="hljs-number">11</span> <span class="hljs-number">13</span> │ b <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ b <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ b <span class="hljs-number">1</span> ┃
┃ │ │ c <span class="hljs-number">2</span> <span class="hljs-number">2</span> │ c <span class="hljs-number">2</span> <span class="hljs-number">2</span> │ c <span class="hljs-number">2</span> ┃
┃ │ a <span class="hljs-number">1</span> <span class="hljs-number">2</span> │ a <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ │ ┃
┃ │ b <span class="hljs-number">11</span> <span class="hljs-number">13</span> │ b <span class="hljs-number">1</span> <span class="hljs-number">1</span> │ │ ┃
┃ │ c <span class="hljs-number">11</span> <span class="hljs-number">13</span> │ c <span class="hljs-number">2</span> <span class="hljs-number">2</span> │ │ ┃
<li><strong>Result has a named index that creates column <code class="python hljs"><span class="hljs-string">'z'</span></code> instead of <code class="python hljs"><span class="hljs-string">'index'</span></code> on reset_index().</strong></li>
</ul>
<div><h3 id="rolling">Rolling</h3><p><strong>Object for rolling window calculations.</strong></p><pre><code class="python language-python hljs"><RSr/RDF/RGB> = <Sr/DF/GB>.rolling(win_size) <span class="hljs-comment"># Also: `min_periods=None, center=False`.</span>
<Fig>.update_layout(margin=dict(t=<span class="hljs-number">0</span>, r=<span class="hljs-number">0</span>, b=<span class="hljs-number">0</span>, l=<span class="hljs-number">0</span>)) <span class="hljs-comment"># Also `paper_bgcolor='rgb(0, 0, 0)'`.</span>
<Fig>.write_html/json/image(<span class="hljs-string">'<path>'</span>) <span class="hljs-comment"># Also <Fig>.show().</span>
</code></pre>
<pre><code class="python language-python hljs"><Fig> = px.area/bar/box(<DF>, x=col_key, y=col_key) <span class="hljs-comment"># Also `color=col_key`.</span>
<Fig> = px.scatter(<DF>, x=col_key, y=col_key) <span class="hljs-comment"># Also `color/size/symbol=col_key`.</span>
<Fig> = px.scatter_3d(<DF>, x=col_key, y=col_key, …) <span class="hljs-comment"># `z=col_key`. Also color/size/symbol.</span>
<Fig> = px.histogram(<DF>, x=col_key [, nbins=<int>]) <span class="hljs-comment"># Number of bins depends on DF size.</span>
</code></pre>
<div><h4 id="displaysalinechartoftotalcoronavirusdeathspermilliongroupedbycontinent">Displays a line chart of total coronavirus deaths per million grouped by continent:</h4><p></p><div id="2a950764-39fc-416d-97fe-0a6226a3095f" class="plotly-graph-div" style="height:312px; width:914px;"></div><pre><code class="python language-python hljs">covid = pd.read_csv(<span class="hljs-string">'https://raw.githubusercontent.com/owid/covid-19-data/8dde8ca49b'</span>