This project explores the costs of living and purchasing power characteristics of 500 major cities around the world.
The analysis in this post concerns itself with the following questions:
The data was sourced from Numbeo.com, which hosts user-contributed data - current within the last 18 months.
The IPython Notebook for this project is available on GitHub.
First, let's take a look at our key metrics - local purchasing power and total cost of living (including rent) - on a global scale:
Next, let's look at the distribution of purchasing power in each region of the world.
The first three graphs plot the proportion of cities in each region that enjoy varying levels of wealth, relative to the worldwide median. The fourth graph plots the same metric for the world at large, against the medians for each of our regions. (These are just two slightly different views into the same data.)
Next, let's look at the distribution of rent costs - same regions, different metric.
How much does the average price of rent vary from city to city, for each region in the world?
What is the relationship between non-rent costs of living, and cost of rent, for the top 500 major cities, worldwide?
Which of these costs varies more?
The scatterplot on the left illustrates cost of rent vs non-rent costs of living*.
The kdeplot (essentially, a smoothed histogram) shows a much greater spread in the cost of rent, compared to non-rent costs*.
What does this tell us?
On average, as non-rent costs of living increase, rent increases a full 2.13 times faster.
*The above charts graph the delta in costs for each city, with relation to the worldwide median for each metric.
The scatterplot, then, does not illustrate absolute costs, but rather the ratio by which costs are more (or less) expensive than average.
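As a rough illustration of the two steps described above, here is a minimal sketch that expresses each city's costs as a ratio to the median and then fits a least-squares slope of rent against non-rent costs. The post does not say exactly how the 2.13 figure was computed, so reading it as a regression slope is an assumption. The function names and the five data points are made up for the example, and a simple median stands in for the worldwide median.

```python
from statistics import median

def ratios_to_median(values):
    """Express each value as a multiple of the median of the list."""
    m = median(values)
    return [v / m for v in values]

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

# Hypothetical index values for five cities (not Numbeo data).
non_rent = [40.0, 60.0, 80.0, 100.0, 120.0]
rent = [10.0, 30.0, 50.0, 70.0, 90.0]

non_rent_ratio = ratios_to_median(non_rent)
rent_ratio = ratios_to_median(rent)

# A slope above 1 means rent moves away from its median faster
# than non-rent costs move away from theirs.
slope = ols_slope(non_rent_ratio, rent_ratio)
print(f"rent changes {slope:.2f}x as fast as non-rent costs")
```

Working in ratios rather than raw index values is what makes the two metrics comparable: a slope of 1 would mean rent and non-rent costs drift from their medians at the same rate.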
This raises an interesting question:
Given that rent and non-rent costs are strongly correlated, and given that rent rises faster (relative to the worldwide median) than non-rent costs of living, to what degree is rent (rather than non-rent) the major driver of variance in cost, for cities around the world?
In simple terms, are "expensive" cities expensive because rent in those cities is expensive, or are they expensive because non-rent factors are driving up the cost of living?
The following graph plots each of our major cities relative to the worldwide median cost of living:
From this chart, it becomes immediately apparent that - in the vast majority of cities around the world - rent is the primary driver of cost.
The typical expensive city is expensive because rent in that city is expensive. Cheap cities, then, are cheap because rent is cheap.
The more astute readers will notice that more than half of our cities in this visualization fall above the "median." This is because we used a calculated median to address sampling bias in the data. Significantly more than half of the cities sampled are from rich countries. This means that taking a simple median (the 250th of 500 data points) would result in a figure higher than the true worldwide median cost of living. To fix this, we first calculated the median cost in each region, then took the median of all of our regional medians.
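The median-of-regional-medians idea can be sketched in a few lines. This is only an illustration of the approach described above, not the notebook's actual code. The region labels and cost values are invented, with the "Rich" region deliberately oversampled to mimic the bias in question.

```python
from collections import defaultdict
from statistics import median

def median_of_regional_medians(rows):
    """rows: iterable of (region, cost) pairs.

    Returns the median of the per-region medians, plus the per-region
    medians themselves for inspection.
    """
    by_region = defaultdict(list)
    for region, cost in rows:
        by_region[region].append(cost)
    regional = {region: median(costs) for region, costs in by_region.items()}
    return median(regional.values()), regional

# Hypothetical sample: five cities from a rich region, three from elsewhere.
sample = [
    ("Rich", 90), ("Rich", 100), ("Rich", 110), ("Rich", 120), ("Rich", 130),
    ("Middle", 50), ("Middle", 60),
    ("Poor", 20),
]

simple = median(cost for _, cost in sample)
balanced, regional = median_of_regional_medians(sample)
print("simple median:", simple)
print("median of regional medians:", balanced)
```

With this toy sample the simple median is pulled toward the oversampled region, while the median of regional medians gives each region one vote regardless of how many cities it contributed.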
A final point that needs to be addressed is how we calculated the cost ratio for each city, relative to our median:
Numbeo creates its total cost of living index by attributing (essentially) equal weight to both rent and non-rent costs of living. Thus, 50% of the cost for any city is derived from rent, and the other 50% from non-rent. Using these figures would have resulted in a boring and quite useless graph, with equal parts red and blue for every city. To get around this, we calculated two additional indexes for each city:
We then used these two columns to calculate one final metric: the proportion of the variance in total cost that is attributable to rent costs, specifically. This is the metric which determines our red/blue splits on the graph displayed above.
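To make the idea of a rent share concrete, here is one hypothetical way such a split could be computed. The exact definitions of the two indexes are not reproduced in this post, so the inputs below (each city's rent and non-rent cost as a ratio to the median) and the formula itself are assumptions for illustration only, not the method used for the graph.

```python
def rent_share_of_deviation(rent_ratio, non_rent_ratio):
    """Hypothetical metric: fraction of a city's total deviation from the
    median (ratio = 1.0) that comes from rent rather than non-rent costs."""
    rent_dev = abs(rent_ratio - 1.0)
    non_rent_dev = abs(non_rent_ratio - 1.0)
    total = rent_dev + non_rent_dev
    if total == 0:
        # City sits exactly on the median for both metrics; call it an even split.
        return 0.5
    return rent_dev / total

# Made-up city: rent at 2.5x the median, non-rent costs at 1.5x the median.
share = rent_share_of_deviation(2.5, 1.5)
print(f"{share:.0%} of this city's deviation comes from rent")
```

Under this assumed definition, a share above 50% would be drawn as mostly "rent" on the chart, and a share below 50% as mostly "non-rent".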
Here is another graph, highlighting the five most expensive and five least expensive cities in our dataset:
Interestingly, San Francisco (a city in which the author of the study has lived) is the second most expensive location in the world, and nearly all of that expense is due to the cost of rent. This is of course no surprise, as San Francisco lays claim to the most expensive real estate on the entire continent.
Next, let's explore another of our hypotheses:
Is there a relationship between cost of living and local purchasing power?