01_Getting_&_Knowing_Your_Data -> Chipotle

This time we are going to pull data directly from the internet. Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Step 1. Import the necessary libraries

import pandas as pd
import numpy as np

Step 2. Import the dataset from this address: https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv

Step 3. Assign it to a variable called chipo.

chipo = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv', sep='\t')
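
The `sep='\t'` argument matters because the file is tab-separated rather than comma-separated. The sketch below shows the difference offline; `sample_tsv` is a hypothetical three-row sample in the same column layout, not the real file.

```python
import io
import pandas as pd

# Hypothetical three-row sample laid out like the Chipotle file (tab-separated)
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "2\t2\tChicken Bowl\t[Tomatillo-Red Chili Salsa (Hot)]\t$16.98\n"
)

# With sep='\t' every tab-delimited field becomes its own column
sample = pd.read_csv(io.StringIO(sample_tsv), sep='\t')

# With the default comma separator each line is read as a single field
wrong = pd.read_csv(io.StringIO(sample_tsv))

print(sample.shape)  # (3, 5)
print(wrong.shape)   # (3, 1)
```

The empty `choice_description` field in the first row is read as `NaN`, which matches the missing values visible in the notebook output.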

Step 4. See the first 10 entries

chipo.head(10)

	order_id	quantity	item_name	choice_description	item_price
0	1	1	Chips and Fresh Tomato Salsa	NaN	$2.39
1	1	1	Izze	[Clementine]	$3.39
2	1	1	Nantucket Nectar	[Apple]	$3.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NaN	$2.39
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98
5	3	1	Chicken Bowl	[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...	$10.98
6	3	1	Side of Chips	NaN	$1.69
7	4	1	Steak Burrito	[Tomatillo Red Chili Salsa, [Fajita Vegetables...	$11.75
8	4	1	Steak Soft Tacos	[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...	$9.25
9	5	1	Steak Burrito	[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...	$9.25

Step 5. What is the number of observations in the dataset?

# Solution 1

chipo.shape

(4622, 5)

# Solution 2

chipo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   order_id            4622 non-null   int64 
 1   quantity            4622 non-null   int64 
 2   item_name           4622 non-null   object
 3   choice_description  3376 non-null   object
 4   item_price          4622 non-null   object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB

Step 6. What is the number of columns in the dataset?

chipo.shape[1]
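
Steps 5 and 6 both read from the same `(rows, columns)` tuple. A minimal sketch on a hypothetical stand-in frame called `demo` (not the real dataset) shows the tuple unpacked, alongside `len()` as another way to get the row count:

```python
import pandas as pd

# Hypothetical stand-in for chipo with 3 rows and 3 columns
demo = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "item_name": ["Izze", "Side of Chips", "Chicken Bowl"],
})

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = demo.shape

# len() of a DataFrame is its number of rows
print(n_rows, n_cols, len(demo))  # 3 3 3
```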

Step 7. Print the name of all the columns.

chipo.columns

Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

Step 8. How is the dataset indexed?

chipo.index

RangeIndex(start=0, stop=4622, step=1)

Step 9. Which was the most-ordered item?

chipo.groupby(by="item_name").sum().sort_values('quantity', ascending=False).head(1)

	order_id	quantity	choice_description	item_price
item_name
Chicken Bowl	713926	761	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98 $10.98 $11.25 $8.75 $8.49 $11.25 $8.75 ...

Step 10. For the most-ordered item, how many items were ordered?

chipo.groupby(by="item_name").sum().sort_values('quantity', ascending=False).head(1)

	order_id	quantity	choice_description	item_price
item_name
Chicken Bowl	713926	761	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98 $10.98 $11.25 $8.75 $8.49 $11.25 $8.75 ...
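
Summing every column also adds up `order_id` and concatenates the text columns, and those totals carry no meaning here. One possible alternative, sketched on a hypothetical five-row frame called `orders`, restricts the aggregation to `quantity` and reads the top item and its total directly:

```python
import pandas as pd

# Hypothetical order lines, not the real dataset
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3],
    "quantity": [1, 2, 2, 1, 1],
    "item_name": ["Izze", "Chicken Bowl", "Chicken Bowl", "Izze", "Side of Chips"],
})

# Total units per item, using only the quantity column
qty_per_item = orders.groupby("item_name")["quantity"].sum()

top_item = qty_per_item.idxmax()  # index label with the largest total
top_qty = qty_per_item.max()      # the total itself

print(top_item, top_qty)  # Chicken Bowl 4
```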

Step 11. What was the most ordered item in the choice_description column?

chipo.groupby(by="choice_description").sum().sort_values('quantity', ascending=False).head(1)

	order_id	quantity	item_name	item_price
choice_description
[Diet Coke]	123455	159	Canned SodaCanned SodaCanned Soda6 Pack Soft D...	$2.18 $1.09 $1.09 $6.49 $2.18 $1.25 $1.09 $6.4...
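
Rows with a missing `choice_description` do not appear in this result, because `groupby` drops `NaN` keys by default. The sketch below illustrates that on a hypothetical four-row frame called `drinks`; passing `dropna=False` keeps the missing group.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one has no choice_description
drinks = pd.DataFrame({
    "quantity": [1, 2, 1, 3],
    "choice_description": ["[Diet Coke]", np.nan, "[Sprite]", "[Diet Coke]"],
})

# Default behaviour: the NaN key is excluded from the groups
by_choice = drinks.groupby("choice_description")["quantity"].sum()

# dropna=False keeps NaN as its own group
with_missing = drinks.groupby("choice_description", dropna=False)["quantity"].sum()

print(by_choice.idxmax(), by_choice.max())  # [Diet Coke] 4
print(len(by_choice), len(with_missing))    # 2 3
```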

Step 12. How many items were ordered in total?

# Sum the quantities: one row can cover several units, so counting rows would undercount
chipo.quantity.sum()
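
Row counts and unit counts differ whenever a row has a `quantity` above 1, as row 4 of the Step 4 output does. A minimal sketch on a hypothetical three-row frame called `lines` shows the gap:

```python
import pandas as pd

# Hypothetical order lines; the second row covers two units
lines = pd.DataFrame({
    "item_name": ["Izze", "Chicken Bowl", "Side of Chips"],
    "quantity": [1, 2, 1],
})

n_lines = lines.item_name.count()  # number of non-null rows
n_items = lines.quantity.sum()     # number of units ordered

print(n_lines, n_items)  # 3 4
```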

Step 13. Turn the item price into a float

Step 13.a. Check the item price type

chipo.item_price.dtype

dtype('O')

Step 13.b. Create a lambda function and change the type of item price

# Each price looks like '$2.39 ': drop the leading '$' and the trailing space, then convert
dollarizer = lambda x: float(x[1:-1])
chipo.item_price = chipo.item_price.apply(dollarizer)

Step 13.c. Check the item price type

chipo.item_price.dtype

dtype('float64')
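
The `x[1:-1]` slice assumes every value ends with exactly one extra character (the concatenated prices in the Step 9 output suggest a trailing space). As a hedged alternative, the sketch below uses pandas string methods so that a value without the trailing space is still parsed correctly; `prices` is a made-up series.

```python
import pandas as pd

# Hypothetical prices; the last one has no trailing space
prices = pd.Series(["$2.39 ", "$16.98 ", "$10.98"])

# Strip surrounding whitespace, drop the leading '$', then cast to float
as_float = prices.str.strip().str.lstrip("$").astype(float)

# The fixed slice silently cuts a digit when the trailing space is absent
sliced = prices.apply(lambda x: float(x[1:-1]))

print(as_float.tolist())  # [2.39, 16.98, 10.98]
print(sliced.tolist())    # [2.39, 16.98, 10.9]
```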

Step 14. How much was the revenue for the period in the dataset?

revenue = (chipo.item_price * chipo.quantity).sum()
print('Revenue is : $ '+ str(revenue))

Revenue is : $ 39237.02
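
Concatenating with `str()` prints the float as stored, so floating-point noise can leak into the message. One option is an f-string format spec that fixes two decimals and adds thousands separators. The frame `sales` and its values below are made up for illustration.

```python
import pandas as pd

# Hypothetical order lines with numeric prices
sales = pd.DataFrame({
    "quantity": [1, 2, 1],
    "item_price": [2.39, 8.49, 3.39],
})

# Revenue per line is price times quantity; the total is their sum
total = (sales.item_price * sales.quantity).sum()

# ',.2f' rounds to two decimals and inserts thousands separators
message = f"Revenue is: ${total:,.2f}"
print(message)  # Revenue is: $22.76
```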

Step 15. How many orders were made in the period?

chipo.order_id.value_counts().count()

Step 16. What is the average revenue amount per order?

# Solution 1

chipo['revenue'] = chipo['quantity'] * chipo['item_price']
order_grouped = chipo.groupby(by=['order_id']).sum()
order_grouped['revenue'].mean()

21.39423118865867
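
Only the `revenue` column is needed for this average, so the grouping can be limited to it. The sketch below uses a hypothetical three-row frame called `lines` and cross-checks the result against total revenue divided by the number of distinct orders.

```python
import pandas as pd

# Hypothetical order lines: order 1 has two lines, order 2 has one
lines = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 4.0, 10.0],
})
lines["revenue"] = lines["quantity"] * lines["item_price"]

# Revenue per order, then the mean across orders
revenue_per_order = lines.groupby("order_id")["revenue"].sum()
avg_order_value = revenue_per_order.mean()

# Cross-check: total revenue divided by the number of distinct orders
cross_check = lines["revenue"].sum() / lines["order_id"].nunique()

print(avg_order_value, cross_check)  # 10.0 10.0
```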

Step 17. How many different items are sold?

chipo.item_name.value_counts().count()
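
Steps 15 and 17 both count distinct values in a column. `value_counts().count()` does this by building the full frequency table first; `nunique()` returns the same number directly. A short sketch on a made-up series called `order_ids`:

```python
import pandas as pd

# Hypothetical order IDs with repeats
order_ids = pd.Series([1, 1, 2, 3, 3, 3], name="order_id")

via_value_counts = order_ids.value_counts().count()  # length of the frequency table
via_nunique = order_ids.nunique()                    # distinct non-null values

print(via_value_counts, via_nunique)  # 3 3
```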