{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Jupyter Notebooks\n", "\n", "In this tutorial, we discuss some basic tasks to get your Jupyter notebooks up and running on your computer." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "source": [ "## Spellchecker: LanguageTool Browser Extension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last thing you want in your Jupyter notebooks is typos. Jupyter has some spellchecker extensions, but installing them across different software environments can be problematic. For spellchecking, we recommend a free browser extension called [**LanguageTool**](https://languagetool.org/). This extension checks for typos not only in your notebooks but also in anything else you type in your browser. Sweet!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to check Python and module versions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Within your shell, you can issue the command:\n", "```HTML\n", "> python --version\n", "```\n", "Sometimes Python's executable is named \"python3\", so you might need:\n", "```HTML\n", "> python3 --version\n", "```\n", "Within a Jupyter notebook, to issue a system command, you need to put an exclamation mark (\"!\") in front of it, as shown below. This has the same effect as running the command in your shell:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python 3.11.9\n" ] } ], "source": [ "!python3 --version" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To check the version number of a Python module, inspect its `__version__` attribute, as below."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2.0.0'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to read CSV files" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assuming that your file is under a directory called `data`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>mean_radius</th><th>mean_texture</th><th>mean_perimeter</th><th>mean_area</th><th>mean_smoothness</th><th>mean_compactness</th><th>mean_concavity</th><th>mean_concave_points</th><th>mean_symmetry</th><th>mean_fractal_dimension</th><th>...</th><th>worst_texture</th><th>worst_perimeter</th><th>worst_area</th><th>worst_smoothness</th><th>worst_compactness</th><th>worst_concavity</th><th>worst_concave_points</th><th>worst_symmetry</th><th>worst_fractal_dimension</th><th>diagnosis</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>17.99</td><td>10.38</td><td>122.80</td><td>1001.0</td><td>0.11840</td><td>0.27760</td><td>0.3001</td><td>0.14710</td><td>0.2419</td><td>0.07871</td><td>...</td><td>17.33</td><td>184.60</td><td>2019.0</td><td>0.1622</td><td>0.6656</td><td>0.7119</td><td>0.2654</td><td>0.4601</td><td>0.11890</td><td>M</td></tr>\n",
"<tr><th>1</th><td>20.57</td><td>17.77</td><td>132.90</td><td>1326.0</td><td>0.08474</td><td>0.07864</td><td>0.0869</td><td>0.07017</td><td>0.1812</td><td>0.05667</td><td>...</td><td>23.41</td><td>158.80</td><td>1956.0</td><td>0.1238</td><td>0.1866</td><td>0.2416</td><td>0.1860</td><td>0.2750</td><td>0.08902</td><td>M</td></tr>\n",
"<tr><th>2</th><td>19.69</td><td>21.25</td><td>130.00</td><td>1203.0</td><td>0.10960</td><td>0.15990</td><td>0.1974</td><td>0.12790</td><td>0.2069</td><td>0.05999</td><td>...</td><td>25.53</td><td>152.50</td><td>1709.0</td><td>0.1444</td><td>0.4245</td><td>0.4504</td><td>0.2430</td><td>0.3613</td><td>0.08758</td><td>M</td></tr>\n",
"<tr><th>3</th><td>11.42</td><td>20.38</td><td>77.58</td><td>386.1</td><td>0.14250</td><td>0.28390</td><td>0.2414</td><td>0.10520</td><td>0.2597</td><td>0.09744</td><td>...</td><td>26.50</td><td>98.87</td><td>567.7</td><td>0.2098</td><td>0.8663</td><td>0.6869</td><td>0.2575</td><td>0.6638</td><td>0.17300</td><td>M</td></tr>\n",
"<tr><th>4</th><td>20.29</td><td>14.34</td><td>135.10</td><td>1297.0</td><td>0.10030</td><td>0.13280</td><td>0.1980</td><td>0.10430</td><td>0.1809</td><td>0.05883</td><td>...</td><td>16.67</td><td>152.20</td><td>1575.0</td><td>0.1374</td><td>0.2050</td><td>0.4000</td><td>0.1625</td><td>0.2364</td><td>0.07678</td><td>M</td></tr>\n",
"</tbody>\n", "</table>\n",
"<p>5 rows × 31 columns</p>\n",
"</div>
" ], "text/plain": [ " mean_radius mean_texture mean_perimeter mean_area mean_smoothness \\\n", "0 17.99 10.38 122.80 1001.0 0.11840 \n", "1 20.57 17.77 132.90 1326.0 0.08474 \n", "2 19.69 21.25 130.00 1203.0 0.10960 \n", "3 11.42 20.38 77.58 386.1 0.14250 \n", "4 20.29 14.34 135.10 1297.0 0.10030 \n", "\n", " mean_compactness mean_concavity mean_concave_points mean_symmetry \\\n", "0 0.27760 0.3001 0.14710 0.2419 \n", "1 0.07864 0.0869 0.07017 0.1812 \n", "2 0.15990 0.1974 0.12790 0.2069 \n", "3 0.28390 0.2414 0.10520 0.2597 \n", "4 0.13280 0.1980 0.10430 0.1809 \n", "\n", " mean_fractal_dimension ... worst_texture worst_perimeter worst_area \\\n", "0 0.07871 ... 17.33 184.60 2019.0 \n", "1 0.05667 ... 23.41 158.80 1956.0 \n", "2 0.05999 ... 25.53 152.50 1709.0 \n", "3 0.09744 ... 26.50 98.87 567.7 \n", "4 0.05883 ... 16.67 152.20 1575.0 \n", "\n", " worst_smoothness worst_compactness worst_concavity worst_concave_points \\\n", "0 0.1622 0.6656 0.7119 0.2654 \n", "1 0.1238 0.1866 0.2416 0.1860 \n", "2 0.1444 0.4245 0.4504 0.2430 \n", "3 0.2098 0.8663 0.6869 0.2575 \n", "4 0.1374 0.2050 0.4000 0.1625 \n", "\n", " worst_symmetry worst_fractal_dimension diagnosis \n", "0 0.4601 0.11890 M \n", "1 0.2750 0.08902 M \n", "2 0.3613 0.08758 M \n", "3 0.6638 0.17300 M \n", "4 0.2364 0.07678 M \n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"./data/breast_cancer_wisconsin.csv\")\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(569, 31)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python Package Management" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We strongly recommend installing a virtual environment to avoid module version clashes. 
For Mac:\n", "\n", "The command below creates a folder called `.venv`, which will host your virtual environment.\n", "```HTML\n", "> python3 -m venv .venv\n", "```\n", "Activate it so that you can use it:\n", "```HTML\n", "> source .venv/bin/activate\n", "```\n", "When done, simply deactivate your virtual environment:\n", "```HTML\n", "> deactivate\n", "```\n", "On Windows, the commands differ slightly (for example, activation is `.venv\\Scripts\\activate`); see the Python `venv` documentation for details.\n", "\n", "To list all the Python packages installed in your current environment, use `pip list`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip list" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!pip install --upgrade pip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How to install multiple packages at once:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install pandas matplotlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The pipreqs Module" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In many cases, you will need to document the modules your project depends on, usually in a text file called `requirements.txt`. \n", "\n", "We recommend the `pipreqs` module for this purpose, which is usually better than the common practice of `pip freeze > requirements.txt`. In particular, `pipreqs` lists only the packages that your project's code imports, so it leaves out Jupyter-related packages, which you won't need if you just run the code elsewhere without Jupyter notebooks. \n", "\n", "Simply install it via \n", "```HTML\n", "> pip install pipreqs \n", "```\n", "Next, write your project's required modules to `requirements.txt` via \n", "```HTML\n", "> pipreqs . --force\n", "```\n", "The `--force` option above overwrites any existing `requirements.txt` file. The dot means \"this folder\". That is, you need to run this command from the folder that contains your virtual environment folder `.venv`, so that pipreqs picks up the correct modules.\n", "\n", "When you need to replicate your Python environment on a different machine, create a new virtual environment and install all the modules in your `requirements.txt` file as below:\n", "```HTML\n", "> pip install -r requirements.txt\n", "```\n", "This way, all the modules you need for your project will be installed with the correct version numbers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 4 }