In [53]:

```
import numpy as np
import pandas as pd
import seaborn as sb
from numpy.random import randn
import matplotlib.pyplot as plt
from matplotlib import rcParams
from pandas import Series, DataFrame
```

Don't forget to add inline matplotlib and configure your base grid.

In [54]:

```
%matplotlib inline
# Figsize is represented in inches.
rcParams['figure.figsize']= 5,4
sb.set_style('whitegrid')
```

This takes two dataframes and concatinates them based on a common key. This is handy if you have sales data on a platform like Google Analytics while keeping the details of your profit margins offline. In this way you can merge them and do your own offline analytics while keeping the details of your specific margins private.

This would then make it easy to maintain a single list of your products and bump it against a sales report from any time period. Below we read in the test analytics data. Set the sku data type to string, and specify which columns to use.

In [55]:

```
Sales = pd.read_csv("./Test.csv", parse_dates=True, dtype = {"sku":str}, usecols = ["date", "price", "sku"])
Sales
```

Out[55]:

date | price | sku | |
---|---|---|---|

0 | 22 Aug 2013 | 72 | 445 |

1 | 22 Aug 2013 | 56 | 203 |

2 | 22 Aug 2013 | 56 | 203 |

3 | 22 Aug 2013 | 115 | 661 |

4 | 04 Sep 2013 | 115 | 661 |

5 | 11 Sep 2013 | 72 | 445 |

6 | 28 Sep 2013 | 72 | 445 |

7 | 28 Sep 2013 | 117 | 661 |

8 | 02 Oct 2013 | 72 | 445 |

9 | 02 Oct 2013 | 56 | 203 |

10 | 02 Oct 2013 | 115 | 661 |

In [56]:

```
# Cost of goods sold COGS
COGS = DataFrame({'sku':['203','445', '661'],
'cost':[ 46, 53, 89]})
COGS
```

Out[56]:

cost | sku | |
---|---|---|

0 | 46 | 203 |

1 | 53 | 445 |

2 | 89 | 661 |

In [57]:

```
# Create a new frame by joining the original two, merging by matching sku numbers to the main DF
C = pd.merge(Sales, COGS, how = 'left', on='sku')
print(C)
```

In [58]:

```
Col_Order = ['date', 'sku','cost', 'price']
C1 = DataFrame(C, columns=Col_Order)
C1
```

Out[58]:

date | sku | cost | price | |
---|---|---|---|---|

0 | 22 Aug 2013 | 445 | 53 | 72 |

1 | 22 Aug 2013 | 203 | 46 | 56 |

2 | 22 Aug 2013 | 203 | 46 | 56 |

3 | 22 Aug 2013 | 661 | 89 | 115 |

4 | 04 Sep 2013 | 661 | 89 | 115 |

5 | 11 Sep 2013 | 445 | 53 | 72 |

6 | 28 Sep 2013 | 445 | 53 | 72 |

7 | 28 Sep 2013 | 661 | 89 | 117 |

8 | 02 Oct 2013 | 445 | 53 | 72 |

9 | 02 Oct 2013 | 203 | 46 | 56 |

10 | 02 Oct 2013 | 661 | 89 | 115 |

In [59]:

```
# Creates a profit series and calculates the difference between the
# price and the cost.
C['profit']=C['price']-C['cost']
# Sets the column order including new column.
cols = ['date', 'sku', 'cost', 'price', 'profit']
# Creates a new DF and passes in the source DF and desired column order.
Profit = DataFrame(C, columns=cols)
Profit
```

Out[59]:

date | sku | cost | price | profit | |
---|---|---|---|---|---|

0 | 22 Aug 2013 | 445 | 53 | 72 | 19 |

1 | 22 Aug 2013 | 203 | 46 | 56 | 10 |

2 | 22 Aug 2013 | 203 | 46 | 56 | 10 |

3 | 22 Aug 2013 | 661 | 89 | 115 | 26 |

4 | 04 Sep 2013 | 661 | 89 | 115 | 26 |

5 | 11 Sep 2013 | 445 | 53 | 72 | 19 |

6 | 28 Sep 2013 | 445 | 53 | 72 | 19 |

7 | 28 Sep 2013 | 661 | 89 | 117 | 28 |

8 | 02 Oct 2013 | 445 | 53 | 72 | 19 |

9 | 02 Oct 2013 | 203 | 46 | 56 | 10 |

10 | 02 Oct 2013 | 661 | 89 | 115 | 26 |

In [60]:

```
sales_by_date = Profit.groupby(Profit['date'])
sales_by_date.sum()
```

Out[60]:

cost | price | profit | |
---|---|---|---|

date | |||

02 Oct 2013 | 188 | 243 | 55 |

04 Sep 2013 | 89 | 115 | 26 |

11 Sep 2013 | 53 | 72 | 19 |

22 Aug 2013 | 234 | 299 | 65 |

28 Sep 2013 | 142 | 189 | 47 |

In [64]:

```
sales_by_date.sum().plot()
```

Out[64]:

<matplotlib.axes._subplots.AxesSubplot at 0x116b239b0>

In [63]:

```
sales_by_date['profit'].sum().plot()
```

Out[63]:

<matplotlib.axes._subplots.AxesSubplot at 0x116ab3e48>

In [65]:

```
sales_by_date = Profit.groupby(Profit['sku'])
sales_by_date.sum()
```

Out[65]:

cost | price | profit | |
---|---|---|---|

sku | |||

203 | 138 | 168 | 30 |

445 | 212 | 288 | 76 |

661 | 356 | 462 | 106 |

In [66]:

```
sales_by_date.sum().plot(kind='bar')
```

Out[66]:

<matplotlib.axes._subplots.AxesSubplot at 0x116cbae80>

Here we take a closer look at our vehicle data again.

In [12]:

```
# Reading in the data
cars = pd.read_csv("/Users/Steglitz/jupyter/mtcars.csv")
# Setting the columns that you want to use.
# NOTE: You can chose the display order by reordering the columns the way you want.
cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
cars.index = cars.car_names
# Returns the first five rows.
cars.head()
```

Out[12]:

car_names | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|

car_names | ||||||||||||

Mazda RX4 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |

Mazda RX4 Wag | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |

Datsun 710 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |

Hornet 4 Drive | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |

Hornet Sportabout | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |

In [13]:

```
mpg = cars['mpg']
fig = plt.figure()
ax = fig.add_axes([.1,.1,1,1])
mpg.plot()
ax.set_xticks(range(32))
ax.set_xticklabels(cars.car_names, rotation=60, fontsize='medium')
ax.set_title('Miles per Gallon of Cars in mtcars')
ax.set_xlabel('car names')
ax.set_ylabel('miles/gal')
ax.legend(loc='best')
```

Out[13]:

<matplotlib.legend.Legend at 0x112fd1240>

You can also grab quick details about the dataset like:

In [15]:

```
# Maximum value
mpg.max()
```

Out[15]:

33.899999999999999

In [16]:

```
# Minimum value
mpg.min()
```

Out[16]:

10.4

In [17]:

```
# Record count
mpg.count()
```

Out[17]:

32

In [18]:

```
mpg.describe()
```

Out[18]:

count 32.000000 mean 20.090625 std 6.026948 min 10.400000 25% 15.425000 50% 19.200000 75% 22.800000 max 33.900000 Name: mpg, dtype: float64

In [19]:

```
mpg.plot(kind='hist')
```

Out[19]:

<matplotlib.axes._subplots.AxesSubplot at 0x112f2a828>

In [20]:

```
sb.distplot(mpg)
```

Out[20]:

<matplotlib.axes._subplots.AxesSubplot at 0x113144f60>

In [70]:

```
cars.plot(kind='scatter', x='hp', y='mpg', c=['lightblue'], s=50)
```

Out[70]:

<matplotlib.axes._subplots.AxesSubplot at 0x11701c908>

In [76]:

```
sb.regplot(x='hp', y='mpg', data=cars, scatter=True, ci=80)
```

Out[76]:

<matplotlib.axes._subplots.AxesSubplot at 0x1176869e8>

In [82]:

```
cars_df = pd.DataFrame((cars.ix[:,(1,3,4,6)].values), columns = ['mpg', 'disp', 'hp', 'wt'])
cars_target = cars.ix[:,9].values
target_names = [0,1]
cars_df['group'] = pd.Series(cars_target, dtype="category")
sb.pairplot(cars_df, hue='group', palette='hls')
```

Out[82]:

<seaborn.axisgrid.PairGrid at 0x1186f7ba8>

In [27]:

```
np.set_printoptions(precision=2)
```

You can create arrays by passing lists to the array method.

In [83]:

```
a = np.array([1,2,3,4,5,6])
a
```

Out[83]:

array([1, 2, 3, 4, 5, 6])

You can create a multidimensional array by passing in a list of lists.

In [84]:

```
b = np.array([[10,20,30], [40,50,60]])
b
```

Out[84]:

array([[10, 20, 30], [40, 50, 60]])

Creating an array by assignment

In [85]:

```
np.random.seed(25)
c = 36*np.random.randn(6)
c
```

Out[85]:

array([ 8.22, 36.97, -30.23, -21.28, -34.45, -8. ])

In [86]:

```
d = np.arange(1,35)
d
```

Out[86]:

array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

Using the arrays we just created, you can do arithmetic operations on entire arrays.

You can do multiplication:

In [87]:

```
a * 10
```

Out[87]:

array([10, 20, 30, 40, 50, 60])

Addition:

In [88]:

```
print('C: ', c);print('A: ', a)
c + a
```

C: [ 8.22 36.97 -30.23 -21.28 -34.45 -8. ] A: [1 2 3 4 5 6]

Out[88]:

array([ 9.22, 38.97, -27.23, -17.28, -29.45, -2. ])

In [89]:

```
Subtraction:
```

File "<ipython-input-89-c7951b29771d>", line 1 Subtraction: ^ SyntaxError: invalid syntax

In [90]:

```
print('C: ', c);print('A: ', a)
c - a
```

C: [ 8.22 36.97 -30.23 -21.28 -34.45 -8. ] A: [1 2 3 4 5 6]

Out[90]:

array([ 7.22, 34.97, -33.23, -25.28, -39.45, -14. ])

Array multiplication:

In [91]:

```
print('C: ', c);print('A: ', a)
c * a
```

C: [ 8.22 36.97 -30.23 -21.28 -34.45 -8. ] A: [1 2 3 4 5 6]

Out[91]:

array([ 8.22, 73.94, -90.68, -85.13, -172.24, -48.02])

Division:

In [92]:

```
print('C: ', c);print('A: ', a)
c /a
```

C: [ 8.22 36.97 -30.23 -21.28 -34.45 -8. ] A: [1 2 3 4 5 6]

Out[92]:

array([ 8.22, 18.48, -10.08, -5.32, -6.89, -1.33])

In [93]:

```
# Setting up the aa matrix.
aa = np.array([[2.,4.,6.],[1.,3.,5.],[10.,20.,30.]])
aa
```

Out[93]:

array([[ 2., 4., 6.], [ 1., 3., 5.], [ 10., 20., 30.]])

In [94]:

```
# Setting up the bb matrix.
bb = np.array([[0.,1.,2.],[3.,4.,5.],[6.,7.,8.]])
bb
```

Out[94]:

array([[ 0., 1., 2.], [ 3., 4., 5.], [ 6., 7., 8.]])

Matrix addition:

In [95]:

```
print('Matrix aa: \n', aa);print('Matrix bb: \n', bb)
aa+bb
```

Matrix aa: [[ 2. 4. 6.] [ 1. 3. 5.] [ 10. 20. 30.]] Matrix bb: [[ 0. 1. 2.] [ 3. 4. 5.] [ 6. 7. 8.]]

Out[95]:

array([[ 2., 5., 8.], [ 4., 7., 10.], [ 16., 27., 38.]])

Subtraction:

In [96]:

```
print('Matrix aa: \n', aa);print('Matrix bb: \n', bb)
aa-bb
```

Matrix aa: [[ 2. 4. 6.] [ 1. 3. 5.] [ 10. 20. 30.]] Matrix bb: [[ 0. 1. 2.] [ 3. 4. 5.] [ 6. 7. 8.]]

Out[96]:

array([[ 2., 3., 4.], [ -2., -1., 0.], [ 4., 13., 22.]])

Matrix multiplication:

In [97]:

```
print('Matrix aa: \n', aa);print('Matrix bb: \n', bb)
aa*bb
```

Matrix aa: [[ 2. 4. 6.] [ 1. 3. 5.] [ 10. 20. 30.]] Matrix bb: [[ 0. 1. 2.] [ 3. 4. 5.] [ 6. 7. 8.]]

Out[97]:

array([[ 0., 4., 12.], [ 3., 12., 25.], [ 60., 140., 240.]])

Matrix division:

In [46]:

```
print('Matrix aa: \n', aa);print('Matrix bb: \n', bb)
aa /bb
```

Matrix aa: [[ 2. 4. 6.] [ 1. 3. 5.] [ 10. 20. 30.]] Matrix bb: [[ 0. 1. 2.] [ 3. 4. 5.] [ 6. 7. 8.]]

/Users/Steglitz/Envs/DataAnalysis/lib/python3.6/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in true_divide This is separate from the ipykernel package so we can avoid doing imports until

Out[46]:

array([[ inf, 4. , 3. ], [ 0.33, 0.75, 1. ], [ 1.67, 2.86, 3.75]])