{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Jupyter Notebooks\n",
    "\n",
    "In this tutorial, we discuss some basic tasks to get your Jupyter notebooks up and running on your computer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "source": [
    "## Spellchecker: LanguageTool Browser Extension"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Last thing you want on your Jupyter notebooks is typos. Jupyter notebooks have some spellchecker extensions, but it gets problematic installing them on different software environments. For spellchecking, we actually recommend a free browser-based extension called **LanguageTool** [here](https://languagetool.org/). This extension not only checks for typos in your notebooks, but also anything you type within your browser as an extra bonus. Sweet!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to check for Python and module versions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Within your shell, you can issue the command:\n",
    "```HTML\n",
    "> python --version\n",
    "```\n",
    "Sometimes python's executable command name will be \"python3\", so you might need:\n",
    "```HTML\n",
    "> python3 --version\n",
    "```\n",
    "Within the Jupyter notebooks environment, to issue a system command, you will need to an exclamation mark (\"!\") in front as shown below, which will have the same effect:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python 3.11.9\n"
     ]
    }
   ],
   "source": [
    "!python3 --version"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check for version number of a Python module, you can view its `__version__` attribute as below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2.0.0'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to read CSV files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assuming that your file is under a directory called `data`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean_radius</th>\n",
       "      <th>mean_texture</th>\n",
       "      <th>mean_perimeter</th>\n",
       "      <th>mean_area</th>\n",
       "      <th>mean_smoothness</th>\n",
       "      <th>mean_compactness</th>\n",
       "      <th>mean_concavity</th>\n",
       "      <th>mean_concave_points</th>\n",
       "      <th>mean_symmetry</th>\n",
       "      <th>mean_fractal_dimension</th>\n",
       "      <th>...</th>\n",
       "      <th>worst_texture</th>\n",
       "      <th>worst_perimeter</th>\n",
       "      <th>worst_area</th>\n",
       "      <th>worst_smoothness</th>\n",
       "      <th>worst_compactness</th>\n",
       "      <th>worst_concavity</th>\n",
       "      <th>worst_concave_points</th>\n",
       "      <th>worst_symmetry</th>\n",
       "      <th>worst_fractal_dimension</th>\n",
       "      <th>diagnosis</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17.99</td>\n",
       "      <td>10.38</td>\n",
       "      <td>122.80</td>\n",
       "      <td>1001.0</td>\n",
       "      <td>0.11840</td>\n",
       "      <td>0.27760</td>\n",
       "      <td>0.3001</td>\n",
       "      <td>0.14710</td>\n",
       "      <td>0.2419</td>\n",
       "      <td>0.07871</td>\n",
       "      <td>...</td>\n",
       "      <td>17.33</td>\n",
       "      <td>184.60</td>\n",
       "      <td>2019.0</td>\n",
       "      <td>0.1622</td>\n",
       "      <td>0.6656</td>\n",
       "      <td>0.7119</td>\n",
       "      <td>0.2654</td>\n",
       "      <td>0.4601</td>\n",
       "      <td>0.11890</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>20.57</td>\n",
       "      <td>17.77</td>\n",
       "      <td>132.90</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>0.08474</td>\n",
       "      <td>0.07864</td>\n",
       "      <td>0.0869</td>\n",
       "      <td>0.07017</td>\n",
       "      <td>0.1812</td>\n",
       "      <td>0.05667</td>\n",
       "      <td>...</td>\n",
       "      <td>23.41</td>\n",
       "      <td>158.80</td>\n",
       "      <td>1956.0</td>\n",
       "      <td>0.1238</td>\n",
       "      <td>0.1866</td>\n",
       "      <td>0.2416</td>\n",
       "      <td>0.1860</td>\n",
       "      <td>0.2750</td>\n",
       "      <td>0.08902</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>19.69</td>\n",
       "      <td>21.25</td>\n",
       "      <td>130.00</td>\n",
       "      <td>1203.0</td>\n",
       "      <td>0.10960</td>\n",
       "      <td>0.15990</td>\n",
       "      <td>0.1974</td>\n",
       "      <td>0.12790</td>\n",
       "      <td>0.2069</td>\n",
       "      <td>0.05999</td>\n",
       "      <td>...</td>\n",
       "      <td>25.53</td>\n",
       "      <td>152.50</td>\n",
       "      <td>1709.0</td>\n",
       "      <td>0.1444</td>\n",
       "      <td>0.4245</td>\n",
       "      <td>0.4504</td>\n",
       "      <td>0.2430</td>\n",
       "      <td>0.3613</td>\n",
       "      <td>0.08758</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>11.42</td>\n",
       "      <td>20.38</td>\n",
       "      <td>77.58</td>\n",
       "      <td>386.1</td>\n",
       "      <td>0.14250</td>\n",
       "      <td>0.28390</td>\n",
       "      <td>0.2414</td>\n",
       "      <td>0.10520</td>\n",
       "      <td>0.2597</td>\n",
       "      <td>0.09744</td>\n",
       "      <td>...</td>\n",
       "      <td>26.50</td>\n",
       "      <td>98.87</td>\n",
       "      <td>567.7</td>\n",
       "      <td>0.2098</td>\n",
       "      <td>0.8663</td>\n",
       "      <td>0.6869</td>\n",
       "      <td>0.2575</td>\n",
       "      <td>0.6638</td>\n",
       "      <td>0.17300</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20.29</td>\n",
       "      <td>14.34</td>\n",
       "      <td>135.10</td>\n",
       "      <td>1297.0</td>\n",
       "      <td>0.10030</td>\n",
       "      <td>0.13280</td>\n",
       "      <td>0.1980</td>\n",
       "      <td>0.10430</td>\n",
       "      <td>0.1809</td>\n",
       "      <td>0.05883</td>\n",
       "      <td>...</td>\n",
       "      <td>16.67</td>\n",
       "      <td>152.20</td>\n",
       "      <td>1575.0</td>\n",
       "      <td>0.1374</td>\n",
       "      <td>0.2050</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.1625</td>\n",
       "      <td>0.2364</td>\n",
       "      <td>0.07678</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 31 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   mean_radius  mean_texture  mean_perimeter  mean_area  mean_smoothness  \\\n",
       "0        17.99         10.38          122.80     1001.0          0.11840   \n",
       "1        20.57         17.77          132.90     1326.0          0.08474   \n",
       "2        19.69         21.25          130.00     1203.0          0.10960   \n",
       "3        11.42         20.38           77.58      386.1          0.14250   \n",
       "4        20.29         14.34          135.10     1297.0          0.10030   \n",
       "\n",
       "   mean_compactness  mean_concavity  mean_concave_points  mean_symmetry  \\\n",
       "0           0.27760          0.3001              0.14710         0.2419   \n",
       "1           0.07864          0.0869              0.07017         0.1812   \n",
       "2           0.15990          0.1974              0.12790         0.2069   \n",
       "3           0.28390          0.2414              0.10520         0.2597   \n",
       "4           0.13280          0.1980              0.10430         0.1809   \n",
       "\n",
       "   mean_fractal_dimension  ...  worst_texture  worst_perimeter  worst_area  \\\n",
       "0                 0.07871  ...          17.33           184.60      2019.0   \n",
       "1                 0.05667  ...          23.41           158.80      1956.0   \n",
       "2                 0.05999  ...          25.53           152.50      1709.0   \n",
       "3                 0.09744  ...          26.50            98.87       567.7   \n",
       "4                 0.05883  ...          16.67           152.20      1575.0   \n",
       "\n",
       "   worst_smoothness  worst_compactness  worst_concavity  worst_concave_points  \\\n",
       "0            0.1622             0.6656           0.7119                0.2654   \n",
       "1            0.1238             0.1866           0.2416                0.1860   \n",
       "2            0.1444             0.4245           0.4504                0.2430   \n",
       "3            0.2098             0.8663           0.6869                0.2575   \n",
       "4            0.1374             0.2050           0.4000                0.1625   \n",
       "\n",
       "   worst_symmetry  worst_fractal_dimension  diagnosis  \n",
       "0          0.4601                  0.11890          M  \n",
       "1          0.2750                  0.08902          M  \n",
       "2          0.3613                  0.08758          M  \n",
       "3          0.6638                  0.17300          M  \n",
       "4          0.2364                  0.07678          M  \n",
       "\n",
       "[5 rows x 31 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\"./data/breast_cancer_wisconsin.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(569, 31)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python Package Management"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We strongly recommend installing a virtual environment to avoid module version clashes. For Mac:\n",
    "\n",
    "The command below will create a folder called `.venv` which will host your virtual environment.\n",
    "```HTML\n",
    "> python3 -m venv .venv\n",
    "```\n",
    "Activate it so that you can use it:\n",
    "```HTML\n",
    "> source .venv/bin/activate\n",
    "```\n",
    "When done, simply deactivate your virtual environment:\n",
    "```HTML\n",
    "> deactivate\n",
    "```\n",
    "Please look this up on Google if you're on Windows.\n",
    "\n",
    "To get a list of all the Python modules on your current environment, try pip list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip list "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!pip install --upgrade pip"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How to install multiple packages at once:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install pandas matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The pipreqs Module"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In many cases, you will need to compile a list of all the modules you installed in your virtual environment for documentation, which is usually in a text file called `requirements.txt`. \n",
    "\n",
    "We recommend the `pipreqs` module for this purpose, which is usually better than the common practice of `pip freeze requirements.txt`. In particular, `pipreqs` will avoid listing Jupyter notebooks modules, which is what you need as you won't need these in case you just need to run the code elsewhere without Jupyter notebooks. \n",
    "\n",
    "Simply install it via \n",
    "```HTML\n",
    "> pip install pipreqs \n",
    "```\n",
    "Next, save the list of your installed modules to `requirements.txt` via \n",
    "```HTML\n",
    "> pipreqs . --force\n",
    "```\n",
    "The `--force` option above overrides any existing `requirements.txt` file. The dot means \"this folder\". That is, you will need to run this command where your virtual environment folder `.venv` is, so that pipreqs picks up the correct modules.\n",
    "\n",
    "When you need to replicate your Python environment on a different machine, create a new virtual environment and install all the modules in your requirements.txt file as below:\n",
    "```HTML\n",
    "> pip install -r requirements.txt\n",
    "```\n",
    "This way, all the modules you need for your project will be installed with the correct version numbers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}