This time we are going to pull data directly from the internet. Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.
Step 1. Import the necessary libraries
import pandas as pd
import numpy as np
Step 2. Import the dataset from this address: https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv
Step 3. Assign it to a variable called chipo.
chipo = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv', sep='\t')
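As an offline illustration of what `sep='\t'` does, the sketch below parses a small tab-separated string instead of the remote file. The three rows are taken from the first entries of the dataset; the names `tsv_text` and `sample` are made up for this example and are not part of the exercise.

```python
import io

import pandas as pd

# Hypothetical stand-in for the remote file: a header plus three tab-separated rows.
tsv_text = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "1\t1\tNantucket Nectar\t[Apple]\t$3.39\n"
)

# sep='\t' tells read_csv to split fields on tabs instead of commas.
sample = pd.read_csv(io.StringIO(tsv_text), sep='\t')

print(sample.shape)
print(sample.columns.tolist())
```

The empty `choice_description` field in the first row is read as `NaN`, which is how missing descriptions appear in the real data as well.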
Step 4. See the first 10 entries
chipo.head(10)
   order_id  quantity  item_name                              choice_description                                 item_price
0         1         1  Chips and Fresh Tomato Salsa           NaN                                                $2.39
1         1         1  Izze                                   [Clementine]                                       $3.39
2         1         1  Nantucket Nectar                       [Apple]                                            $3.39
3         1         1  Chips and Tomatillo-Green Chili Salsa  NaN                                                $2.39
4         2         2  Chicken Bowl                           [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98
5         3         1  Chicken Bowl                           [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...  $10.98
6         4         1  Side of Chips                          NaN                                                $1.69
7         4         1  Steak Burrito                          [Tomatillo Red Chili Salsa, [Fajita Vegetables...  $11.75
8         4         1  Steak Soft Tacos                       [Tomatillo Green Chili Salsa, [Pinto Beans, Ch...  $9.25
9         5         1  Steak Burrito                          [Fresh Tomato Salsa, [Rice, Black Beans, Pinto...  $9.25
Step 5. What is the number of observations in the dataset?
# Solution 2
chipo.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   order_id            4622 non-null   int64
 1   quantity            4622 non-null   int64
 2   item_name           4622 non-null   object
 3   choice_description  3376 non-null   object
 4   item_price          4622 non-null   object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB
Step 6. What is the number of columns in the dataset?
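One way to answer Steps 5 and 6 together is the `shape` attribute, which returns a `(rows, columns)` tuple. The following is a minimal sketch on a hypothetical three-row stand-in frame named `sample`; on the full data, `chipo.shape` would be used the same way.

```python
import pandas as pd

# Hypothetical stand-in for chipo, using a few rows from the dataset.
sample = pd.DataFrame({
    'order_id': [1, 1, 2],
    'quantity': [1, 1, 2],
    'item_name': ['Chips and Fresh Tomato Salsa', 'Izze', 'Chicken Bowl'],
})

# shape is a (number_of_rows, number_of_columns) tuple.
n_rows, n_cols = sample.shape
print(n_rows)
print(n_cols)

# len() on a DataFrame also returns the number of rows.
print(len(sample))
```

According to the `info()` output above, the full dataset has 4622 rows and 5 columns.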
Step 7. Print the name of all the columns.
chipo.head(0)
# Alternative: chipo.columns
Step 8. How is the dataset indexed?
chipo.index
RangeIndex(start=0, stop=4622, step=1)
Step 9. Which was the most-ordered item?
chipo.groupby(by="item_name").sum().sort_values('quantity', ascending=False).head(1)
              order_id  quantity  choice_description                                 item_price
item_name
Chicken Bowl    713926       761  [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98 $10.98 $11.25 $8.75 $8.49 $11.25 $8.75 ...
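Summing every column also adds up `order_id` and concatenates the text columns, as the output above shows. A possible variant, sketched here on a hypothetical stand-in frame `sample`, selects `quantity` before aggregating so that only the relevant total is computed.

```python
import pandas as pd

# Hypothetical stand-in for chipo with a handful of rows.
sample = pd.DataFrame({
    'order_id': [1, 1, 2, 3, 4],
    'quantity': [1, 1, 2, 1, 1],
    'item_name': ['Izze', 'Nantucket Nectar', 'Chicken Bowl',
                  'Chicken Bowl', 'Steak Burrito'],
})

# Total quantity per item, computed on the quantity column only.
totals = sample.groupby('item_name')['quantity'].sum()

# Same "sort and take the first row" idea as in the exercise.
top = totals.sort_values(ascending=False).head(1)
print(top)

# idxmax gives the item label, max gives its total quantity.
print(totals.idxmax(), totals.max())
```

The same expression with `chipo` in place of `sample` would answer both this step and the next one, since the single result carries the item name and its total quantity.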
Step 10. For the most-ordered item, how many items were ordered?
chipo.groupby(by="item_name").sum().sort_values('quantity', ascending=False).head(1)
              order_id  quantity  choice_description                                 item_price
item_name
Chicken Bowl    713926       761  [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98 $10.98 $11.25 $8.75 $8.49 $11.25 $8.75 ...
Step 11. What was the most ordered item in the choice_description column?
chipo.groupby(by="choice_description").sum().sort_values('quantity', ascending=False).head(1)
                    order_id  quantity  item_name                                          item_price
choice_description
[Diet Coke]           123455       159  Canned SodaCanned SodaCanned Soda6 Pack Soft D...  $2.18 $1.09 $1.09 $6.49 $2.18 $1.25 $1.09 $6.4...
Step 12. How many items were ordered in total?
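A minimal sketch for this step, assuming that "items ordered" means the sum of the `quantity` column. The frame `sample` is a hypothetical stand-in for `chipo`.

```python
import pandas as pd

# Hypothetical stand-in for chipo.
sample = pd.DataFrame({
    'item_name': ['Izze', 'Chicken Bowl', 'Side of Chips'],
    'quantity': [1, 2, 1],
})

# Each row can cover several units, so sum the quantities rather than count rows.
total_items = sample['quantity'].sum()
print(total_items)
```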
Step 13. Turn the item price into a float
Step 13.a. Check the item price type
Step 13.b. Create a lambda function and change the type of item price
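# Drop the leading '$' and the final character (this slice assumes a trailing space in the raw values), then convert to float
dollarizer = lambda x: float(x[1:-1])
chipo.item_price = chipo.item_price.apply(dollarizer)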
Step 13.c. Check the item price type
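The type checks in Steps 13.a and 13.c can be done with the `dtype` attribute. The sketch below runs on a hypothetical stand-in Series `prices` and, as an assumption that differs from the lambda above, removes whitespace and the `$` sign explicitly instead of slicing by position, so it works whether or not a value ends with an extra character.

```python
import pandas as pd

# Hypothetical stand-in for chipo.item_price, with and without trailing spaces.
prices = pd.Series(['$2.39 ', '$3.39', '$16.98 '], name='item_price')

# Before conversion the values are text.
print(prices.dtype)

# Strip surrounding whitespace, remove the leading '$', then cast to float.
converted = prices.str.strip().str.lstrip('$').astype(float)

# After conversion the dtype is float64.
print(converted.dtype)
print(converted.tolist())
```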
Step 14. How much was the revenue for the period in the dataset?
revenue = (chipo.item_price * chipo.quantity).sum()
print('Revenue is: $' + str(revenue))
Step 15. How many orders were made in the period?
chipo.order_id.value_counts().count()
Step 16. What is the average revenue amount per order?
# Solution 1
chipo['revenue'] = chipo['quantity'] * chipo['item_price']
order_grouped = chipo.groupby(by=['order_id']).sum()
order_grouped['revenue'].mean()
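An equivalent formulation, sketched on a hypothetical stand-in frame `sample`, aggregates only the `revenue` column instead of summing every column per order. It keeps the same revenue definition as Solution 1 (`quantity * item_price`).

```python
import pandas as pd

# Hypothetical stand-in for chipo, with item_price already converted to float.
sample = pd.DataFrame({
    'order_id': [1, 1, 2, 3],
    'quantity': [1, 1, 2, 1],
    'item_price': [2.39, 3.39, 16.98, 10.98],
})

# Same revenue definition as in Solution 1.
sample['revenue'] = sample['quantity'] * sample['item_price']

# Total revenue per order, then the mean across orders.
per_order = sample.groupby('order_id')['revenue'].sum()
average_per_order = per_order.mean()
print(average_per_order)
```

Selecting the column before calling `sum()` avoids aggregating the text columns, which are not needed for this answer.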
Step 17. How many different items are sold?
chipo.item_name.value_counts().count()
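`value_counts().count()` counts the distinct non-null values. pandas also provides `nunique()` for the same purpose. The short sketch below, on a hypothetical stand-in Series `items`, checks that the two agree; the same call would apply to `chipo.order_id` in Step 15.

```python
import pandas as pd

# Hypothetical stand-in for chipo.item_name.
items = pd.Series(
    ['Izze', 'Chicken Bowl', 'Chicken Bowl', 'Steak Burrito'],
    name='item_name',
)

# Approach used in the exercise: one row per distinct value, then count the rows.
via_value_counts = items.value_counts().count()

# Direct alternative: number of distinct non-null values.
via_nunique = items.nunique()

print(via_value_counts, via_nunique)
```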