November 14, 2014 Stardate: 68335.6 Tagged as: Python Matplotlib
from IPython.display import Image
Image("https://upload.wikimedia.org/wikipedia/en/thumb/e/e0/CS81698-01A-BIG.jpg/220px-CS81698-01A-BIG.jpg")
Ludacris is a rapper who found success in the early 2000s and went on to become an actor. His most famous role is probably in The Fast and the Furious movies. Anyway, he’s got a song called Area Codes where he raps about all the area codes in which he’s got “hoes”. I’ve seen an old map that plots every area code he mentions. It’s from 2008, and the FlowingData.com post that links to it says the source is a now-defunct blog called Strangemaps.com.
I haven’t worked very much with maps or shapefiles so I thought it would be fun to recreate this map in Python. It was a good experience and I wanted to share it.
# import libraries
from bs4 import BeautifulSoup
import requests
import re
import warnings
warnings.filterwarnings('ignore')
%load_ext version_information
First, I scrape the lyrics from the song’s page on http://rap.genius.com/.
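# URL of the song's lyrics page
artist_url = "http://genius.com/Ludacris-area-codes-lyrics"
# scrape the Rap Genius lyrics page
response = requests.get(artist_url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko'})
response.raise_for_status()  # fail loudly on a 4xx/5xx response instead of parsing an error page
soup = BeautifulSoup(response.text, "html.parser")
# the lyrics sat in <div class="lyrics"> when this was written; find() returns None if that markup changes
lyrics = soup.find('div', class_='lyrics').text.strip()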
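If the `lyrics` div is missing (the live Genius page may no longer use that class name), `soup.find` returns `None` and `.text` raises an `AttributeError`. To show what that lookup does without BeautifulSoup, here is a minimal sketch using only the standard library's `html.parser`. The `LyricsDivParser` class and `extract_lyrics` helper are hypothetical names, and the sample HTML is made up to mimic the old `<div class="lyrics">` markup.

```python
from html.parser import HTMLParser


class LyricsDivParser(HTMLParser):
    """Collect the text inside the first <div class="lyrics"> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside the target div; counts nested divs
        self.done = False   # True once the target div has closed
        self.chunks = []    # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if tag != 'div' or self.done:
            return
        if self.depth:
            self.depth += 1  # a div nested inside the lyrics block
        elif 'lyrics' in (dict(attrs).get('class') or '').split():
            self.depth = 1   # entered the lyrics div

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # ignore any later lyrics divs

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_lyrics(html):
    """Return the stripped text of the lyrics div, or '' if there is none."""
    parser = LyricsDivParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks).strip()


# made-up page fragment in the style of the old markup
sample = ('<div class="header">menu</div>'
          '<div class="lyrics"><p>I got hoes\nin different area codes</p></div>')
print(extract_lyrics(sample))
```

Unlike the one-liner above, this returns an empty string instead of raising when no lyrics div is found.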
Let’s check that it worked. The lyrics should be one long string.
print(lyrics)
Success! Now, let’s scrape out all the area codes. If you look, they are written in the format #-#-#. We can use a regular expression to search for this pattern.
By the way, regular expressions are black magic. I use the online site www.pyrex.com to build and verify expressions.
pattern = r"[0-9]-[0-9]-[0-9]"
areacodes = re.compile(pattern).findall(lyrics)
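One caveat with `[0-9]-[0-9]-[0-9]`: it also matches the first three digits of a longer hyphenated run such as 1-2-3-4. A `\b` word boundary doesn't help here because the hyphen is not a word character, so this sketch uses lookarounds instead. The sample sentence is made up; whether the real lyrics contain such runs is not checked here.

```python
import re

# made-up sample text; the real input is the `lyrics` string scraped above
sample = "Hoes in 2-1-2 and 7-1-8, but not in 1-2-3-4 or 555-1234"

# the pattern used above: matches any digit-hyphen-digit-hyphen-digit run
loose = re.findall(r"[0-9]-[0-9]-[0-9]", sample)

# stricter: not preceded or followed by another digit or hyphen
strict = re.findall(r"(?<![\d-])\d-\d-\d(?![\d-])", sample)

# drop the hyphens to get 3-digit area codes
cleaned = [match.replace("-", "") for match in strict]

print(loose)    # ['2-1-2', '7-1-8', '1-2-3']
print(strict)   # ['2-1-2', '7-1-8']
print(cleaned)  # ['212', '718']
```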
Let’s check again.
areacodes[0:10]
Good, now we have all the area codes pulled out, but they’re in a goofy format. Let’s strip out the hyphens.
# remove the hyphens, e.g. '2-1-2' -> '212'
codes = [code.replace("-", "") for code in areacodes]
And check it…
codes[0:10]
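If the song mentions an area code more than once, `codes` will contain repeats. They don't change the map (the same polygon is just filled twice), but they would inflate a count of how many area codes he names. Here is a small sketch that removes duplicates while keeping first-mention order; `codes_example` is a made-up stand-in for `codes`.

```python
# made-up stand-in for `codes`; the real list comes from the lyrics
codes_example = ["212", "718", "212", "404", "718"]

# dict keys keep insertion order (Python 3.7+), so this drops repeats
# but keeps each code at the position where it was first mentioned
unique_codes = list(dict.fromkeys(codes_example))

print(unique_codes)  # ['212', '718', '404']
print(len(codes_example), "mentions,", len(unique_codes), "distinct area codes")
```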
Next, import the libraries for plotting the map.
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from matplotlib.patches import Polygon
%matplotlib inline
I downloaded shapefiles for all the area codes from USGS. They looked good, but they included some Canadian boundaries that I was not interested in, so I loaded the shapefile into www.mapshaper.org and filtered those out. I’m still left with Hawaii, Alaska, and Puerto Rico, but I can window those out if needed.
Image("mapshaper.png")
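Windowing out Hawaii, Alaska, and Puerto Rico could also be done in code rather than by hand. A rough sketch: keep a polygon only if the average of its vertices falls inside a lon/lat box around the contiguous US. The box limits, the `in_lower48` helper, and the sample rings are assumptions for illustration. It also assumes each ring is a list of (lon, lat) pairs in degrees, so it would have to run on the raw shapefile coordinates, because Basemap's `readshapefile` converts them to projected map coordinates.

```python
# rough lon/lat box around the contiguous US (an assumption, not taken from the shapefile)
LON_MIN, LON_MAX = -125.0, -66.0
LAT_MIN, LAT_MAX = 24.0, 50.0


def in_lower48(ring):
    """Return True if the average (lon, lat) of a ring's vertices lies inside the box."""
    lon = sum(p[0] for p in ring) / len(ring)
    lat = sum(p[1] for p in ring) / len(ring)
    return LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX


# made-up squares, roughly placed, standing in for real area code polygons
kansas = [(-99.0, 38.0), (-98.0, 38.0), (-98.0, 39.0), (-99.0, 39.0)]
oahu = [(-158.3, 21.2), (-157.6, 21.2), (-157.6, 21.7), (-158.3, 21.7)]
puerto_rico = [(-67.3, 17.9), (-65.6, 17.9), (-65.6, 18.5), (-67.3, 18.5)]
anchorage = [(-150.5, 61.0), (-149.5, 61.0), (-149.5, 61.5), (-150.5, 61.5)]

rings = {"Kansas": kansas, "Oahu": oahu, "Puerto Rico": puerto_rico, "Anchorage": anchorage}
kept = [name for name, ring in rings.items() if in_lower48(ring)]
print(kept)  # ['Kansas']
```

Averaging the vertices is crude, but it separates polygons that are thousands of kilometers apart; a polygon straddling the edge of the box could go either way.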
Now that I have a good shapefile, let’s map it out.
# instantiate a figure
plt.figure(figsize=(24,12))
# create a Lambert conformal conic map windowed on the contiguous US
# (named m rather than map so it doesn't shadow Python's built-in map())
m = Basemap(llcrnrlon=-119,llcrnrlat=22,urcrnrlon=-64,urcrnrlat=49,
            projection='lcc',lat_1=33,lat_2=45,lon_0=-95)
# load the shapefile I created; this sets m.areacodes (one vertex list per polygon,
# in map coordinates) and m.areacodes_info (a matching attribute dict for each polygon)
m.readshapefile('./USAreaCode/AreaCode', name='areacodes', drawbounds=True)
ax = plt.gca() # get current axes instance
# fill every polygon whose 'NPA' attribute (the 3-digit area code) appears in the song;
# looping over all shapes also fills area codes that are split into several polygons
song_codes = set(codes)
for shape_info, seg in zip(m.areacodes_info, m.areacodes):
    if shape_info['NPA'] in song_codes:
        poly = Polygon(seg, facecolor='red', edgecolor='red')
        ax.add_patch(poly)
plt.title('Area Codes that Ludacris has \'Hoes\'', fontsize=16)
# to save the figure, call savefig before plt.show(), which may leave an empty figure behind
#plt.savefig('ludacris_areacodes.png')
plt.show()
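A lookup like `area_codes.index(code)` returns only the first polygon whose NPA matches. Since Basemap's `readshapefile` stores one entry per polygon ring, an area code made up of several rings (islands, for example) would be only partly filled, and a code missing from the shapefile would raise `ValueError`. One way to look up all of a code's shapes by its 3-digit number is to group the two parallel lists into a dictionary. The lists below are made-up stand-ins for `areacodes_info` and `areacodes`, with placeholder strings in place of real vertex lists.

```python
from collections import defaultdict

# made-up stand-ins: one attribute dict and one "segment" per polygon, in the same order;
# area code 305 is given two polygons here purely for illustration
areacodes_info = [{"NPA": "212"}, {"NPA": "305"}, {"NPA": "404"}, {"NPA": "305"}]
areacodes = ["seg-212", "seg-305-a", "seg-404", "seg-305-b"]

npas = [info["NPA"] for info in areacodes_info]
print(npas.index("305"))  # 1 -- list.index() stops at the first match

# group every polygon under its area code
segments_by_npa = defaultdict(list)
for info, seg in zip(areacodes_info, areacodes):
    segments_by_npa[info["NPA"]].append(seg)

print(segments_by_npa["305"])          # ['seg-305-a', 'seg-305-b']
print(segments_by_npa.get("999", []))  # [] -- a missing code gives no polygons instead of ValueError
```

In the plotting loop, `for seg in segments_by_npa.get(code, []): ax.add_patch(Polygon(seg, facecolor='red', edgecolor='red'))` would then fill every polygon for each song area code.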
There you go: a map of where Ludacris has got hoes. I left the 808 area code (Hawaii) out of the map window because including it throws the view way off, and at that scale Hawaii is so small you can’t even see it.
%version_information bs4, requests, matplotlib, basemap