import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
# set this so the graphs open internally
%matplotlib inline
03_Visualization -> Chipotle
This time we are going to pull data directly from the internet. Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.
Step 1. Import the necessary libraries
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called chipo.
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep = '\t')
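To see what `sep = '\t'` does without depending on the network, here is a minimal sketch that parses a small tab-separated string with the same `pd.read_csv` call. The names `sample_tsv` and `sample` are made up for this illustration, and the two rows are only a hand-typed stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical two-row sample laid out like a tab-separated file (not the real dataset)
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNaN\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
)

# read_csv accepts any file-like object, so StringIO stands in for the URL here
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(sample.shape)
print(sample.dtypes)
```

Because the prices carry a dollar sign, `item_price` is read as text rather than as a number, which is why Step 6 has to convert it before summing.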
Step 4. See the first 10 entries
chipo.head(10)
| | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 6 | 3 | 1 | Side of Chips | NaN | $1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |
Step 5. Create a histogram of the top 5 items bought
# get the Series of the names
x = chipo.item_name
# use the Counter class from collections to create a dictionary with keys (item names) and frequencies
letter_counts = Counter(x)
# convert the dictionary to a DataFrame
df = pd.DataFrame.from_dict(letter_counts, orient='index')
# sort the counts in ascending order and keep the last 5 rows, i.e. the 5 most-ordered items
df = df[0].sort_values(ascending = True).tail(5)
# create the plot
df.plot(kind='bar')
# Set the title and labels
plt.xlabel('Items')
plt.ylabel('Number of Times Ordered')
plt.title('Most ordered Chipotle\'s Items')
# show the plot
plt.show()
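The counting in Step 5 can also be done with `Series.value_counts`, which this notebook uses later in Step 7. The sketch below compares the two routes on a tiny hand-made Series; `names`, `via_counter` and `via_value_counts` are hypothetical names, and the six entries are not taken from the dataset.

```python
from collections import Counter

import pandas as pd

# Hypothetical mini-sample standing in for chipo.item_name
names = pd.Series(
    ["Chicken Bowl", "Izze", "Chicken Bowl", "Steak Burrito", "Chicken Bowl", "Izze"]
)

# Counter route: count with Counter, then sort in descending order and keep the top 2
via_counter = pd.Series(Counter(names)).sort_values(ascending=False).head(2)

# value_counts route: counting and descending sort happen in one call
via_value_counts = names.value_counts().head(2)

print(via_counter.to_dict())
print(via_value_counts.to_dict())
```

Both routes count rows (order lines), not units; an item bought with `quantity` 2 on one line still counts once.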
Step 6. Create a scatterplot with the number of items ordered per order price
Hint: Price should be on the X-axis and Items ordered on the Y-axis
# create a list of prices
chipo.item_price = [float(value[1:-1]) for value in chipo.item_price] # strip the dollar sign and trailing space
# then group by order and sum only the numeric columns
orders = chipo.groupby('order_id')[['quantity', 'item_price']].sum()
# creates the scatterplot
# plt.scatter(orders.quantity, orders.item_price, s = 50, c = 'green')
plt.scatter(x = orders.item_price, y = orders.quantity, s = 50, c = 'green')
# Set the title and labels
plt.xlabel('Order Price')
plt.ylabel('Items ordered')
plt.title('Number of items ordered per order price')
plt.ylim(0)
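As a possible alternative to slicing each string with `value[1:-1]`, the price column can be cleaned with pandas string methods, which do not depend on the exact position of the dollar sign or the trailing space. This is only a sketch on a hypothetical three-row frame called `sample`; the rows imitate the layout of `chipo` but are typed by hand.

```python
import pandas as pd

# Hypothetical rows imitating chipo: prices carry a '$' and a trailing space
sample = pd.DataFrame(
    {
        "order_id": [1, 1, 2],
        "quantity": [1, 1, 2],
        "item_name": ["Chips and Fresh Tomato Salsa", "Izze", "Chicken Bowl"],
        "item_price": ["$2.39 ", "$3.39 ", "$16.98 "],
    }
)

# Remove the dollar sign and surrounding whitespace, then convert to float
sample["item_price"] = (
    sample["item_price"].str.replace("$", "", regex=False).str.strip().astype(float)
)

# Sum only the numeric columns so the text columns are left out of the aggregation
orders = sample.groupby("order_id")[["quantity", "item_price"]].sum()

print(orders)
```

Selecting the numeric columns before calling `sum()` keeps the result limited to the two values the scatterplot needs.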
Step 7. BONUS: Create a question and a graph to answer your own question.
Question: Create a bar chart of the 5 least-ordered items
item_counts = chipo['item_name'].value_counts()
df = pd.DataFrame(item_counts).iloc[-5:]
df.reset_index(inplace=True)
df.columns = ['item_name', 'Number of Times Ordered']
df.plot(kind='bar', x='item_name', y='Number of Times Ordered', legend=False)
plt.xlabel('Items')
plt.ylabel('Number of Times Ordered')
plt.title('Least Ordered Chipotle\'s Items')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
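The reset-and-rename steps in Step 7 can be folded into one chain by naming the index and the count column explicitly. The sketch below shows one way to do that on a hypothetical Series `names`; the values are invented for the example, and `least` is an illustrative name.

```python
import pandas as pd

# Hypothetical mini-sample standing in for chipo['item_name']
names = pd.Series(
    ["Chicken Bowl", "Izze", "Chicken Bowl", "Steak Burrito", "Chicken Bowl", "Izze"]
)

least = (
    names.value_counts()   # counts sorted in descending order
    .tail(2)               # the last rows are the least-ordered items
    .rename_axis("item_name")                      # name the index explicitly
    .reset_index(name="Number of Times Ordered")   # name the count column explicitly
)

print(least)
```

Setting both names explicitly avoids relying on the default column labels that `reset_index` would otherwise produce.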