Lecture 30 – Perception, Case Study

Data 94, Spring 2021

In [1]:
from datascience import *
import numpy as np
Table.interactive_plots()
import plotly.express as px
In [2]:
sky = Table.read_table('data/skyscrapers.csv') \
           .where('status.current', are.contained_in(['completed', 'under construction'])) \
           .select('name', 'location.city', 'location.latitude', 'location.longitude', 
                   'statistics.floors above', 'statistics.height', 'status.completed.year') \
           .relabeled(['location.city', 'location.latitude', 'location.longitude',
                       'statistics.floors above', 'statistics.height', 'status.completed.year'],
                      ['city', 'latitude', 'longitude', 'floors', 'height', 'year']) \
           .where('height', are.above(0)) \
           .where('floors', are.above(0))

sky
Out[2]:
name city latitude longitude floors height year
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019
Empire State Building New York City 40.7484 -73.9856 102 381 1931
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973

... (2145 rows omitted)

Perception

In [3]:
sky.group('city') \
   .where('count', are.above_or_equal_to(40)) \
   .sort('count', descending = True) \
   .barh('city', title = 'Number of Skyscrapers Per City')
In [4]:
# Remember, you're not responsible for the code here.
px.pie(sky.group('city').where('count', are.above_or_equal_to(40)).to_df(),
       values = 'count',
       names = 'city',
       title = 'Number of Skyscrapers Per City (Top 10 Only)'
)

Case Study – Skyscrapers

In [5]:
sky.shuffle()
Out[5]:
name city latitude longitude floors height year
Sheraton Denver Downtown Hotel Denver 39.7419 -104.989 22 66.45 1960
52 East End Avenue New York City 40.7723 -73.9471 40 125 1987
Kaleida Health GVI and UB CRTC Buffalo 42.9016 -78.866 10 57.53 2012
Opera Tower Miami 25.7924 -80.1872 56 165.5 2007
Austin Hilton Covention Center Hotel Austin 30.2654 -97.7379 31 114.91 2004
Paraiso Bayviews Miami 25.8069 -80.1874 44 152.4 2018
West Ocean Condominiums I Long Beach 33.7667 -118.197 29 105.16 2007
North Harbor Tower Chicago 41.8854 -87.6154 55 169.47 1988
One Rockefeller Plaza New York City 40.758 -73.9789 34 149 1937
Museum Tower Chicago 41.8661 -87.6213 38 124.64 2006

... (2145 rows omitted)

Which cities have the most skyscrapers?

In [6]:
sky.group('city') \
   .where('count', are.above_or_equal_to(20)) \
   .sort('count', descending = True)
Out[6]:
city count
New York City 558
Chicago 376
Miami 107
San Francisco 65
Houston 61
Seattle 60
Honolulu 56
Los Angeles 51
Las Vegas 43
Minneapolis 39

... (9 rows omitted)

In [7]:
sky.group('city') \
   .where('count', are.above_or_equal_to(20)) \
   .sort('count', descending = True) \
   .barh('city', title = 'Number of Skyscrapers Per City (Min. 20)')

Do any of the above cities stick out to you?

What is the distribution of skyscraper heights?

In [8]:
sky.column('height').min()
Out[8]:
35.970001220703004
In [9]:
sky.column('height').max()
Out[9]:
541.29998779297
In [10]:
sky.hist('height', density = False, bins = np.arange(0, 600, 25),
        title = 'Distribution of Skyscraper Heights')

Let's zoom in a little more.

In [11]:
sky.where('height', are.below(300)) \
   .hist('height', density = False, bins = np.arange(0, 310, 10),
        title = 'Distribution of Skyscraper Heights Below 300m')

What's the distribution of short vs. tall skyscrapers in each city?

In [12]:
sky
Out[12]:
name city latitude longitude floors height year
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019
Empire State Building New York City 40.7484 -73.9856 102 381 1931
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973

... (2145 rows omitted)

Let's say a skyscraper is "short" if its height is less than or equal to 150 meters; otherwise, it's "tall".

In [13]:
def height_cat(height):
    if height <= 150:
        return 'short'
    return 'tall'
In [14]:
sky.apply(height_cat, 'height')
Out[14]:
array(['tall', 'tall', 'tall', ..., 'short', 'short', 'short'],
      dtype='<U5')
In [15]:
sky = sky.with_columns('height category', sky.apply(height_cat, 'height'))
sky
Out[15]:
name city latitude longitude floors height year height category
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014 tall
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019 tall
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974 tall
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018 tall
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015 tall
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009 tall
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019 tall
Empire State Building New York City 40.7484 -73.9856 102 381 1931 tall
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009 tall
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973 tall

... (2145 rows omitted)

We can use pivot to draw a bar chart of the number of short and tall skyscrapers per city.

Quick Check 1

Fill in the blanks to create the table short_and_tall, which has two columns, 'short' and 'tall', and one row for each city with at least 5 short and 5 tall skyscrapers. The first five rows of short_and_tall are shown below.

city short tall
New York City 341 217
Chicago 268 108
Miami 58 49
Houston 34 27
San Francisco 43 22
short_and_tall = sky.pivot(__(a)__, __(b)__) \
                    .where(__(c)__, are.above_or_equal_to(5)) \
                    .where('tall', are.above_or_equal_to(5)) \
                    .sort('tall', descending = True)
In [16]:
# short_and_tall = sky.pivot(__(a)__, __(b)__) \
#                     .where(__(c)__, are.above_or_equal_to(5)) \
#                     .where('tall', are.above_or_equal_to(5)) \
#                     .sort('tall', descending = True)
In [17]:
# short_and_tall.barh('city', title = 'Number of Short and Tall Skyscrapers Per City (Min. 5 Each)')

It seems like most cities have roughly twice as many "short" skyscrapers as they do "tall" skyscrapers.

What if we want to look at the distribution of the number of floors per skyscraper, separated by height category?

In [36]:
sky.hist('floors', group = 'height category', 
         density = False,
         bins = np.arange(0, 150, 5),
         title = 'Distribution of Number of Floors Per Skyscraper')

Since there is overlap between the two histograms, we have that there are some short skyscrapers (below 150m) with more floors than some tall skyscrapers!

What's the relationship between height and number of floors?

In [19]:
sky
Out[19]:
name city latitude longitude floors height year height category
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014 tall
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019 tall
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974 tall
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018 tall
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015 tall
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009 tall
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019 tall
Empire State Building New York City 40.7484 -73.9856 102 381 1931 tall
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009 tall
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973 tall

... (2145 rows omitted)

In [20]:
sky.scatter('height', 'floors',
           s = 30,
           group = 'height category',
           title = 'Number of Floors vs. Height', 
           yaxis_title = 'Number of Floors')
In [21]:
sky.where('height', are.above(300)) \
   .scatter('height', 'floors',
            s = 50,
            labels = 'name',
            title = 'Number of Floors vs. Height (Min. 300m)')

How many skyscrapers were built per year?

In [22]:
sky
Out[22]:
name city latitude longitude floors height year height category
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014 tall
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019 tall
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974 tall
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018 tall
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015 tall
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009 tall
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019 tall
Empire State Building New York City 40.7484 -73.9856 102 381 1931 tall
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009 tall
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973 tall

... (2145 rows omitted)

In [23]:
sky.group('year')
Out[23]:
year count
0 7
1861 1
1888 1
1889 1
1892 1
1893 1
1895 2
1896 2
1897 1
1898 1

... (111 rows omitted)

This is obviously an error in our data.

In [24]:
sky.where('year', 0)
Out[24]:
name city latitude longitude floors height year height category
Miami River Miami 0 0 57 192.03 0 tall
461 Dean Street New York City 40.682 -73.9755 32 105.77 0 short
640 North Wells Street Chicago 41.8936 -87.6344 22 85.34 0 short
SkyHouse Channelside Tampa 27.9479 -82.4469 23 83.82 0 short
SkyHouse Dallas Dallas 0 0 24 80 0 short
2950 North Sheridan Road Chicago 41.9359 -87.6398 19 65.2 0 short
Anthem Blue Cross Los Angeles 34.1806 -118.599 13 57.91 0 short
In [25]:
sky.where('year', are.not_equal_to(0)) \
   .group('year') \
   .plot('year', title = 'Number of Skyscrapers Built Per Year')

What if we want to look at the number of skyscrapers per year built in different cities?

In [26]:
sky.where('city', are.contained_in(['New York City', 'Chicago'])) \
   .where('year', are.not_equal_to(0)) \
   .pivot('city', 'year')
Out[26]:
year Chicago New York City
1888 1 0
1889 1 0
1893 1 0
1895 1 1
1896 0 1
1899 1 2
1902 0 2
1903 1 2
1904 1 0
1905 0 1

... (100 rows omitted)

In [27]:
sky.where('city', are.contained_in(['New York City', 'Chicago'])) \
   .where('year', are.not_equal_to(0)) \
   .pivot('city', 'year') \
   .plot('year',
        title = 'Number of Skyscrapers Built Per Year in NYC and Chicago')

Where on a map are most skyscrapers located?

In [28]:
sky
Out[28]:
name city latitude longitude floors height year height category
One World Trade Center New York City 40.7131 -74.0134 94 541.3 2014 tall
Central Park Tower New York City 40.7664 -73.9809 95 541.02 2019 tall
Willis Tower Chicago 41.8789 -87.6359 108 442.14 1974 tall
111 West 57th Street New York City 40.7648 -73.9775 80 438.3 2018 tall
432 Park Avenue New York City 40.7616 -73.9719 85 425.5 2015 tall
Trump International Hotel & Tower Chicago 41.8889 -87.6264 98 423.22 2009 tall
30 Hudson Yards New York City 40.754 -74.0008 73 386.61 2019 tall
Empire State Building New York City 40.7484 -73.9856 102 381 1931 tall
Bank of America Tower New York City 40.7554 -73.9844 55 365.8 2009 tall
Aon Center Chicago 41.8852 -87.6215 83 346.26 1973 tall

... (2145 rows omitted)

In [29]:
Circle.map_table(sky.select('latitude', 'longitude'),
                line_color = None,
                fill_opacity = 0.65,
                area = 75,
                color = 'orange')
Out[29]:
Make this Notebook Trusted to load map: File -> Trust Notebook