If you got here via a link to the slides

Please check out the full git repository for the presentation notebook as well as other example notebooks.

https://github.com/OttoStruve/ipython_notebook_presentation/tree/gsps

This presentation is actually adapted from one given by Joshua Barratt at LinuxCon

In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo('XkXXpaVpNSc', width=853, height=480)
Out[1]:

Open Source FTW!

Outline

  • What is the notebook
    • OMG it's the best
    • Quick Demo
    • IPython Ecosystem (2.0 Caveat)
  • Why is it awesome
    • Literate Code
    • Sharing
    • Rapid Prototyping/Exploring/Learning
      • Terminal(Editor <-> Renderer) -> Email vs Cyclic
    • Blogging
  • Workflows (edit, share, publish)
    • Under the hood ipynb
    • HTML
    • PDF
    • Gist + nbviewer
  • Demos
    • Code Mentorship (sets, objects, katas)
    • Runbooks
    • Log analysis epic
    • Shell scripting demo
    • Churn analysis
    • Latency Heatmap
  • Extending
  • It came from inside the presentation!

What is the notebook?

A "browser-based interactive computing environment"

Why are we talking about it today?

  • Extremely useful tool, like IT WILL CHANGE YOUR LIFE useful.
  • Makes the (powerful) Python data/science/module ecosystem even more powerful
  • If you code (even if not in Python), do sysadmin work, write documentation, blog, or do any analysis or visualization, you might get a lot out of the notebook.

Why It's Great

These attributes will come up over and over as we explore this tool.

"Literate Computing"

  • This is a big part of where the title comes from: it's about the story more than the software.

  • IPython's founder, Fernando Perez @fperez_org has a blog post on this concept.

  • Here's an example

Enough Meta: Let's Install It:

Install Anaconda

  • Everything* you need gets installed
  • Easy to install additional packages using the conda utility
  • Even more packages available through Binstar

    *Well, almost everything

If you're morally opposed to Continuum Analytics

you can use pip install too.

pip install ipython[all]

Simple as that.*

*Dependencies can be painful, YMMV.

Run 'ipython notebook'

Browser Launches

Build a notebook

(Actually, it's a process per open notebook.)

Run A Cell

No Need To Be Local...

Demo Time

Notebook Workflows: The Big Picture

Not covered today, but cool: clustering capabilities

How I Fell: Report Workflow v1

Problems:

  • slow (read whole data file each time, lots of context switching)
  • version controlled analysis, but not commentary, difficult to 'go back to'
  • Automating requires non-trivial additional dev

Report Workflow now()

Speedups primarily from no context switching, interactivity, and reusable data loading.

Reproducible, literate, annotatable, auditable.

An Example of Iterative Workflow

Viewing and manipulating FITS images in Python

We'll need to import some basic modules to do this; note that %matplotlib inline causes images to appear in the notebook.

In [2]:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

In order to handle FITS images in Python, we'll need the fits module from astropy:

In [3]:
from astropy.io import fits

The following is optional, and will only work if you have Seaborn installed on your machine, but is a significant improvement over the matplotlib defaults.

  • Linux: conda install -c mutirri seaborn
  • OSX: conda install -c asmeurer seaborn
In [4]:
import seaborn as sns
sns.set_context('poster')
sns.set_style('white')

Downloading the data

Go to the SDSS website and download the image of your choice from the SDSS catalog. Here we'll use M42, the Orion Nebula, which looks like this:

You will need to unzip the files!!! The FITS images you download will be compressed (with a .gz or .bz2 extension). You'll need to extract the file using the program of your choice before proceeding.

Opening FITS files and loading the image data

Let's open the g-band FITS file and find out what it contains.

In [5]:
hdu_list = fits.open("data/frame-g-006073-4-0063.fits")
hdu_list.info()
Filename: data/frame-g-006073-4-0063.fits
No.    Name         Type      Cards   Dimensions   Format
0    PRIMARY     PrimaryHDU      96   (2048, 1489)   float32   
1                ImageHDU         6   (2048,)      float32   
2                BinTableHDU     27   1R x 3C      [49152E, 2048E, 1489E]   
3                BinTableHDU     79   1R x 31C     [J, 3A, J, A, D, D, 2J, J, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, E, E]   

Generally the image information is located in the PRIMARY block. The blocks are numbered and can be accessed by indexing hdu_list.

In [6]:
image_data = hdu_list[0].data

Your data is now stored as a 2-D numpy array. Want to know the dimensions of the image? Just look at the shape of the array.

In [7]:
print(type(image_data))
print(image_data.shape)
<type 'numpy.ndarray'>
(1489, 2048)

At this point, we can just close the FITS file. We have stored everything we wanted in a variable.

In [8]:
hdu_list.close()

SHORTCUT

If you don't need to examine the FITS header, you can call fits.getdata to bypass the previous steps.

In [9]:
image_data = fits.getdata("data/frame-g-006073-4-0063.fits")
print(type(image_data))
print(image_data.shape)
<type 'numpy.ndarray'>
(1489, 2048)

Let's get some basic statistics about our image

In [10]:
print('Min:', np.min(image_data))
print('Max:', np.max(image_data))
print('Mean:', np.mean(image_data))
print('Stdev:', np.std(image_data))
('Min:', 0.037841797)
('Max:', 208.25)
('Mean:', 6.6654449)
('Stdev:', 19.653004)

Viewing the image

In [11]:
plt.imshow(image_data, cmap='afmhot', origin='lower')
plt.colorbar()
# To see more color maps go to http://wiki.scipy.org/Cookbook/Matplotlib/Show_colormaps
Out[11]:
<matplotlib.colorbar.Colorbar instance at 0x10cb79d40>

Unfortunately, we can't really see much here because of the color range. Let's adjust that manually and see what happens.

Plotting a histogram

To make a histogram with matplotlib.pyplot.hist(), you need to convert the data from a 2-D array to something one-dimensional.

Here we'll use the iterable Python object image_data.flat.
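For comparison, here is a small sketch of the three usual ways to get a one-dimensional view of a 2-D array. The tiny `image` array is a hypothetical stand-in for image_data, not the SDSS frame, and the variable names are made up for illustration.

```python
import numpy as np

# Tiny stand-in for image_data (hypothetical values, not the real FITS image)
image = np.arange(6, dtype=float).reshape(2, 3)

as_iter = image.flat       # numpy.flatiter: iterates over elements, builds no new array
as_view = image.ravel()    # 1-D array; a view of the same memory when the layout allows it
as_copy = image.flatten()  # 1-D array; always an independent copy

print(as_view.shape, as_copy.shape)
print(np.shares_memory(image, as_view), np.shares_memory(image, as_copy))
```

Any of the three should work as input to a histogram; `.flat` simply avoids allocating a second array for a large image.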

In [12]:
print(type(image_data.flat))
<type 'numpy.flatiter'>

In [13]:
NBINS = 1000
with sns.axes_style("darkgrid"):
    histogram = plt.hist(image_data.flat, NBINS)
In [14]:
plt.imshow(image_data, cmap='afmhot', origin='lower')
plt.clim(0,30)
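The limits passed to plt.clim above were picked by eye from the histogram. One possible alternative, sketched here under the assumption that clipping at the 1st and 99th percentiles is acceptable for your image, is to compute the limits from the data. `fake_image` is synthetic data invented for this example, not the SDSS frame.

```python
import numpy as np

# Synthetic stand-in for image_data: faint background plus a few very bright pixels
rng = np.random.RandomState(0)
fake_image = rng.exponential(scale=5.0, size=(100, 100))
fake_image[:5, :5] = 200.0

# Use the 1st and 99th percentiles of the pixel values as the display range
vmin, vmax = np.percentile(fake_image, [1, 99])
print("suggested color limits:", vmin, vmax)
```

The resulting pair could then be handed to plt.clim(vmin, vmax) in place of the hand-picked values.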

Displaying the image with a logarithmic scale

In [15]:
NBINS = 1000
with sns.axes_style("darkgrid"):
    histogram = plt.hist(image_data.flat, NBINS)
    plt.yscale('log', nonposy='clip') #Same histogram, just with logarithmic scaling on the y-axis.

To get a logarithmically scaled image, we need to load the LogNorm object from matplotlib.
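As a rough picture of what a logarithmic norm does, the sketch below maps pixel values onto the 0 to 1 color range linearly in log space. It is an illustration of the idea only, not matplotlib's implementation; `log_normalize` is a hypothetical helper, and the sample values are the min, mean, and max printed earlier.

```python
import math

def log_normalize(value, vmin, vmax):
    """Map value from [vmin, vmax] onto [0, 1], linearly in log space."""
    if vmin <= 0 or value <= 0:
        raise ValueError("logarithmic scaling needs strictly positive values")
    return (math.log(value) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))

# Min and max taken from the image statistics printed earlier
vmin, vmax = 0.037841797, 208.25

for value in (vmin, 1.0, 6.6654449, vmax):
    print("{:10.4f} -> {:.3f}".format(value, log_normalize(value, vmin, vmax)))
```

On this scale the mean pixel value lands above the midpoint of the color range, whereas on a linear scale it sits near the bottom, which is why the faint structure becomes visible.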

In [16]:
from matplotlib.colors import LogNorm
In [17]:
plt.imshow(image_data, cmap='afmhot', norm=LogNorm(), origin='lower')
Out[17]:
<matplotlib.image.AxesImage at 0x1212bc8d0>

Why was that cool?

  • Interactive & Exploratory.
    • Scroll back up, re-review JSON, go another route
  • Cached all the things
    • Not hitting twitter a bunch (rate limits, etc)
    • Static data set (not changing every time you run the code)
    • Can even keep developing while on conference wifi (oohhhhhh)
  • Easy to keep around as a log for future experiments
  • Easy to take that learning and 'bake' it into something more permanent

The "IPython" in "IPython Notebook": Interactive Python

The Future

Skills port to the IPython console

[jhummel@r900-4 ~]$ ipython
Python 2.7.8 |Anaconda 2.0.1 (64-bit)| (default, Aug 21 2014, 18:22:21) 
Type "copyright", "credits" or "license" for more information.

IPython 2.2.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import pyGadget as pyg

In [2]: pyg.
pyg.analyze      pyg.hdf5         pyg.sim          pyg.units
pyg.constants    pyg.multiplot    pyg.sink         pyg.visualize
pyg.coordinates  pyg.nbody        pyg.snapshot     
pyg.halo         pyg.plotting     pyg.sph

Interactive Gotcha: Single Namespace

As you recall:

Remember Me

So what happens when you do...

In [28]:
x = 5
x
Out[28]:
5
In [32]:
# I ran this cell a few times
x += 1
x
Out[32]:
9
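The same effect can be reproduced outside the notebook. In this sketch a plain dict (`namespace`) stands in for the kernel's shared namespace and `exec` stands in for running a cell; both are illustrative assumptions, not a description of how IPython is implemented.

```python
# A plain dict stands in for the kernel's single shared namespace (illustrative only)
namespace = {}

def run_cell(source, ns):
    """Execute a cell's source against the shared namespace, like (re-)running a cell."""
    exec(source, ns)
    return ns

run_cell("x = 5", namespace)

# Running the same "cell" four times keeps mutating the same x
for _ in range(4):
    run_cell("x += 1", namespace)

print(namespace["x"])  # 9
```

The value depends on how many times the cell ran, not on where it sits in the notebook, so "Restart and run all" is the safest way to confirm a notebook still works top to bottom.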

IPython Magic: Development Powertools

Which method is faster?

In [20]:
# make a big array of random numbers
x = np.random.random(500)
In [21]:
# Plan A: iterate through and add them up
def addr(numbers):
    tot = 0.
    for entry in numbers:
        tot += entry
    return tot

# Plan B: use numpy.sum()

# %timeit is IPython Magic to do a quick benchmark
%timeit addr(x)
10000 loops, best of 3: 100 µs per loop

In [22]:
%timeit np.sum(x)
100000 loops, best of 3: 5.9 µs per loop
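%timeit only exists inside IPython. In a plain script, the standard-library timeit module gives a similar quick benchmark. The sketch below is not the notebook's code: it uses a Python list and the built-in sum() as a stand-in for numpy.sum(), and `add_loop` is a hypothetical name.

```python
import random
import timeit

# 500 random floats, mirroring the array size used above
numbers = [random.random() for _ in range(500)]

def add_loop(values):
    """Plan A: iterate through the values and add them up."""
    total = 0.0
    for entry in values:
        total += entry
    return total

# Plan B here is the built-in sum(), standing in for numpy.sum()
loop_time = timeit.timeit(lambda: add_loop(numbers), number=1000)
builtin_time = timeit.timeit(lambda: sum(numbers), number=1000)

print("loop:    {:.3f} ms for 1000 calls".format(loop_time * 1e3))
print("builtin: {:.3f} ms for 1000 calls".format(builtin_time * 1e3))
```

Timings will vary by machine, so compare the two numbers to each other rather than to the figures shown above.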

In [23]:
%lsmagic
Out[23]:
Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %install_default_config  %install_ext  %install_profiles  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

Don't Panic

%%writefile?

    %writefile [-a] filename
    Write the contents of the cell to a file.

Exporting

ipynb format is clean, readable JSON, which inlines any output results, including base64'd images.

...
{
 "cell_type": "markdown",
 "metadata": {
  "slideshow": {
   "slide_type": "slide"
  }
 },
 "source": [
  "# Magic can be magical"
 ]
},
...
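Because the file is plain JSON, it can be inspected with nothing but the standard library. The sketch below assumes the nbformat 4 layout, with a top-level "cells" list; notebooks saved by older IPython versions nest cells under "worksheets" instead. The notebook content and the `summarize` helper are made up for illustration.

```python
import json

# A tiny hand-built notebook (hypothetical content) in the assumed nbformat 4 layout
notebook_dict = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown",
         "metadata": {"slideshow": {"slide_type": "slide"}},
         "source": ["# Magic can be magical"]},
        {"cell_type": "code",
         "metadata": {},
         "execution_count": 1,
         "outputs": [],
         "source": ["x = 5\n", "x"]},
    ],
}

# Round-trip through text, as if the notebook had been read from an .ipynb file
notebook = json.loads(json.dumps(notebook_dict, indent=1))

def summarize(nb):
    """Return (cell_type, first source line) for every cell in the notebook."""
    summary = []
    for cell in nb["cells"]:
        source = "".join(cell["source"])
        first_line = source.splitlines()[0] if source else ""
        summary.append((cell["cell_type"], first_line))
    return summary

for cell_type, first_line in summarize(notebook):
    print(cell_type, "|", first_line)
```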

Great Notebook Use Cases

There are many use cases where the notebook makes a lot of sense to use. Here are a few illustrated examples:

We won't go into them all for lack of time, but here are a few highlights:

Use Case: Data Analysis

This is the gateway drug that gets many people into IPython Notebook. It's the real sweet spot between what makes Python great (pandas, scikit*, numpy, matplotlib, etc) and IPython Notebook great (Literate, Visual, Interactive, Iterative.)

big data

Did I permanently ruin your ability to hear the term 'big data' without thinking of this? You're welcome.

Use Case: Code Mentorship

Pairon

Because you can't always pair...

Use Case: Document ALL THE THINGS

disturbing

Use Case: Wiki Publishing

Also can work for HTML emails, etc.

Use Case: Blogging

It turns out this is super easy!

See nikola or pelican for automated ways to blog.

Rich Objects

You can also define additional __repr__()-type methods on custom objects. This has all kinds of fun possibilities.

Supported hooks include _repr_html_(), _repr_svg_(), _repr_png_(), _repr_jpeg_(), _repr_javascript_(), and _repr_latex_().

In [24]:
class FancyText(object):
    def __init__(self, text):
        self.text = text
        
    def _repr_html_(self):
        """ Use some fancy CSS3 styling when we return this """
        style=("text-shadow: 0 1px 0 #ccc,0 2px 0 #c9c9c9,0 3px 0 #bbb,"
               "0 4px 0 #b9b9b9,0 5px 0 #aaa,0 6px 1px rgba(0,0,0,.1)")
        
        return '<h1 style="{}">{}</h1>'.format(style, self.text)

FancyText("Hello GSPS!")
Out[24]:

Hello GSPS!
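The other hooks work the same way. As a sketch, here is a hypothetical `Dot` class (not part of the original talk) that returns SVG markup from _repr_svg_(); in a notebook, leaving an instance as the last expression of a cell should draw the circle, while a plain console falls back to __repr__().

```python
class Dot(object):
    """Hypothetical example: a colored circle that renders as SVG in the notebook."""

    def __init__(self, radius=20, color="tomato"):
        self.radius = radius
        self.color = color

    def _repr_svg_(self):
        # Return a complete SVG document as a string
        size = 2 * self.radius
        return ('<svg xmlns="http://www.w3.org/2000/svg" '
                'width="{size}" height="{size}">'
                '<circle cx="{r}" cy="{r}" r="{r}" fill="{color}"/>'
                '</svg>').format(size=size, r=self.radius, color=self.color)

    def __repr__(self):
        # Plain-text fallback for terminals and logs
        return "Dot(radius={!r}, color={!r})".format(self.radius, self.color)

dot = Dot(radius=10, color="steelblue")
print(repr(dot))
print(dot._repr_svg_())
```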

IPython (& Notebook) Customization

custommap

See more on Profiles, Javascript Extensions, IPython Extensions, and nbconvert Templates

Javascript, Huh, What is it good for

Customizing the UI

IPython.toolbar.add_buttons_group([
            {
                id : 'toggle_codecells',
                label : 'Toggle codecell display',
                icon : 'icon-list-alt',
                callback : toggle
            }
      ]);

And more...

Turns out, a lot! You can execute anything you can run in an IPython Notebook cell.

IPython.notebook.kernel.execute("!rm -rf /")

Demo Of a less scary example

Don't forget custom.css

For example, base16 color schemes

Sharing Notebooks

Oh, one more thing

IT CAME FROM INSIDE THE NOTEBOOK

cant stop cant stop the top

  • Highly technical decks can be created quickly
  • Collaboration features are still quite useful
  • Check It Out

Building slides

  • Turn on the 'slideshow' cell toolbar
  • Types:
    • Slide: start a new slide
    • -: Continue a slide
    • Sub-Slide: Make a 'down' slide
    • Fragment: Make a 'bullet' type incoming slide
    • Skip: keep in the notebook, not the deck
    • Notes: speaker notes
In [25]:
!ipython nbconvert Presentation.ipynb --to slides
[NbConvertApp] Using existing profile dir: u'/Users/jhummel/.ipython/profile_default'
[NbConvertApp] Converting notebook Presentation.ipynb to slides
[NbConvertApp] Support files will be in Presentation_files/
[NbConvertApp] Loaded template slides_reveal.tpl
[NbConvertApp] Writing 1882989 bytes to Presentation.slides.html
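The slide types chosen in the cell toolbar end up in each cell's metadata, so they can also be set from a script. The sketch below assumes the metadata layout shown in the Exporting section and assumes the toolbar stores the strings listed in `VALID_SLIDE_TYPES`; the helper name and the sample cells are hypothetical.

```python
import json

# Assumed strings for Slide, Sub-Slide, Fragment, Skip, Notes, and "-" (continue)
VALID_SLIDE_TYPES = {"slide", "subslide", "fragment", "skip", "notes", "-"}

def set_slide_type(cell, slide_type):
    """Record a slide type in a cell's metadata, creating the nesting if needed."""
    if slide_type not in VALID_SLIDE_TYPES:
        raise ValueError("unknown slide type: {!r}".format(slide_type))
    metadata = cell.setdefault("metadata", {})
    metadata.setdefault("slideshow", {})["slide_type"] = slide_type
    return cell

# Three made-up markdown cells
cells = [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "markdown", "metadata": {}, "source": ["* a bullet"]},
    {"cell_type": "markdown", "metadata": {}, "source": ["Remember to breathe"]},
]

set_slide_type(cells[0], "slide")
set_slide_type(cells[1], "fragment")
set_slide_type(cells[2], "notes")

print(json.dumps(cells[0]["metadata"], indent=1))
```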

Other Resources

Try It Online

Installing

  • Anaconda OR
  • $ pip install ipython[all] (brew install python) OR
  • docker-ipython
    • Preloaded with lots of sometimes challenging-to-install packages like Pattern, NLTK, Pandas, NumPy, SciPy, Numba, Biopython...

Learning More

  • Slides will be up on the GSPS site later.
  • Complete source for this presentation is available on github!
  • A Gallery of Interesting IPython Notebooks
  • Extensions
  • nbviewer (good way to discover organically)
  • Pandas/numpy, Statsmodels, Matplotlib, bokeh, vincent, scikit-learn, scikit-image, .... (F150!)
  • Talk to me!

Credits