DATA EXPLORATIONS
  • Data Science Lab
  • Lucid Analytics Project

Primary Drivers of Costs of Living in 500 Major Cities, Worldwide

3/31/2017

 
This project explores the costs of living and purchasing power characteristics of 500 major cities around the world.

 The analysis in this post concerns itself with the following questions:
  1. Is there a relationship between the cost of living and local purchasing power?
  2. Which is the primary driver of cost in "expensive" cities - rent, or non-rent costs of living?
  3. How can we visualize the data - and highlight the differences between regions?

The data was sourced from Numbeo.com, which hosts user-contributed data that is current within the last 18 months.
The IPython Notebook for this project is available on GitHub.

First, let's take a look at our key metrics - local purchasing power and total cost of living (including rent) - on a global scale:

Next, let's look at the distribution of purchasing power in each region of the world.

The first three graphs plot the proportion of cities in each region at each level of purchasing power, relative to the worldwide median. The fourth graph plots the same metric for the world at large, against the medians for each of our regions. (These are just two slightly different views into the same data.)

Picture
Picture
Picture
Picture

Next, let's look at the distribution of rent costs - same regions, different metric.

How much does the average price of rent vary from city to city, for each region in the world?
Picture
Picture
Picture
Picture

What is the relationship between non-rent costs of living, and cost of rent, for the top 500 major cities, worldwide?
Which of these costs varies more?
Picture
Picture
The scatterplot on the left illustrates the cost of rent versus non-rent costs of living*.
The kdeplot (essentially, a smoothed histogram) shows a much greater spread in the cost of rent, compared to non-rent costs*.

What does this tell us?
  • The cost of rent is indeed correlated with the non-rent cost of living in any given city
  • While it is unclear which variable, if either, is causal, an r-squared of 0.63 (a correlation coefficient of roughly 0.79) indicates a strong correlation.
  • A rise in the non-rent cost of living is correlated with a much greater rise in the cost of rent in that city
  • (Or, conversely, a rise in the cost of rent correlates to a predictable, but smaller increase in non-rent costs of living)

On average, as non-rent costs of living increase, rent increases a full 2.13 times faster.

*The above charts graph each city's costs relative to the worldwide median for each metric.
The scatterplot, then, does not illustrate absolute costs, but rather the ratio by which costs are more (or less) expensive than the worldwide median.
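
For readers who want to reproduce figures like the slope and r-squared quoted above, here is a minimal sketch of an ordinary least-squares fit using only the Python standard library. The function name `fit_line` and the sample ratios are hypothetical and are not taken from the project notebook or from Numbeo; the notebook may well have used a different fitting method.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical ratios to the worldwide median (1.0 = at the median).
# These are illustrative values, not Numbeo data.
non_rent_ratio = [0.6, 0.8, 1.0, 1.2, 1.5]
rent_ratio = [0.4, 0.6, 1.0, 1.5, 2.0]

slope, intercept, r_squared = fit_line(non_rent_ratio, rent_ratio)
print(f"slope={slope:.2f}, r_squared={r_squared:.2f}")
```

A slope above 1 in this setup means rent moves away from its median faster than non-rent costs move away from theirs, which is the pattern the post describes.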


This raises an interesting question:

Given that rent and non-rent costs are strongly correlated, and given that rent rises faster (relative to the worldwide median) than non-rent costs of living, to what degree is rent (rather than non-rent) the major driver of variance in cost, for cities around the world?

In simple terms, are "expensive" cities expensive because rent in those cities is expensive, or are they expensive because non-rent factors are driving up the cost of living?

The following graph plots each of our major cities relative to the worldwide median cost of living:

  • Cities range from most expensive on the left, to least expensive on the right.
  • The absolute height of each bar indicates the multiple above or below the worldwide median.
  • Red indicates the portion of the variance due to costs of rent
  • Blue indicates the portion of the variance due to non-rent costs
Picture
From this chart, it becomes immediately apparent that - in the vast majority of cities around the world - rent is the primary driver of cost.

The typical expensive city is expensive because rent in that city is expensive. Cheap cities, then, are cheap because rent is cheap.

The more astute readers will notice that more than half of our cities in this visualization fall above the "median." This is because we used a calculated median to address sampling bias in the data. Significantly more than half of the cities sampled are from rich countries. This means that taking a simple median (the 250th of 500 data points) would yield a figure higher than the true worldwide median cost of living. To fix this, we first calculated the median cost in each region, then took the median of all of our regional medians.
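
The median-of-regional-medians adjustment described above can be sketched as follows. This is a standard-library illustration only; the function name, region labels, and cost values are hypothetical and do not come from the project notebook.

```python
from collections import defaultdict
from statistics import median


def median_of_regional_medians(cities):
    """Return the median of per-region medians.

    cities: iterable of (region, cost) pairs.
    """
    by_region = defaultdict(list)
    for region, cost in cities:
        by_region[region].append(cost)
    regional_medians = {r: median(costs) for r, costs in by_region.items()}
    return median(regional_medians.values())


# Hypothetical sample in which richer regions are over-represented.
cities = [
    ("North America", 90), ("North America", 100),
    ("North America", 110), ("North America", 120),
    ("Europe", 70), ("Europe", 80), ("Europe", 90),
    ("Asia", 40),
    ("Africa", 30),
]

simple_median = median(cost for _, cost in cities)
adjusted_median = median_of_regional_medians(cities)
print(simple_median, adjusted_median)
```

Because each region contributes exactly one value to the final median, a region with many sampled cities no longer pulls the benchmark toward its own cost level.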

A final point that needs to be addressed is how we calculated the cost ratio for each city, relative to our median:

Numbeo creates their total cost of living index by attributing (essentially) equal weight to both rent, and non-rent costs of living. Thus, 50% of the cost for any city is derived from rent, and the other 50% from non-rent. Using these figures would have resulted in a boring and quite useless graph, with equal parts red and blue for every city. To get around this, we calculated two additional indexes for each city:

  • Cost of living (non-rent), as multiple above/below worldwide median (again, median of regional medians)
  • Cost of rent, as multiple above/below worldwide median

We then used these two columns to calculate one final metric: the proportion of the variance in total cost that is attributable to rent costs, specifically. This is the metric which determines our red/blue splits on the graph displayed above.
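
The post does not spell out the exact formula behind the red/blue split, so the sketch below shows only one plausible reading: express each cost as a multiple of its worldwide median, measure how far each multiple sits from 1.0, and take rent's share of the combined distance. The function names, the formula itself, and the sample numbers are all assumptions made for illustration.

```python
def ratio_to_median(value, worldwide_median):
    """Express a city's index value as a multiple of the worldwide median."""
    return value / worldwide_median


def rent_share_of_deviation(rent_ratio, non_rent_ratio):
    """Share of a city's total deviation from the median that comes from rent.

    Assumed formula (not confirmed by the source):
        |rent_ratio - 1| / (|rent_ratio - 1| + |non_rent_ratio - 1|)
    Returns None when the city sits exactly at the median on both metrics,
    since there is no deviation to split.
    """
    rent_dev = abs(rent_ratio - 1.0)
    non_rent_dev = abs(non_rent_ratio - 1.0)
    total = rent_dev + non_rent_dev
    if total == 0:
        return None
    return rent_dev / total


# Hypothetical city: rent index 150 against a median of 50,
# non-rent index 90 against a median of 60.
rent_ratio = ratio_to_median(150, 50)       # 3.0 times the median
non_rent_ratio = ratio_to_median(90, 60)    # 1.5 times the median
share = rent_share_of_deviation(rent_ratio, non_rent_ratio)
print(f"rent share of deviation: {share:.0%}")
```

Under this reading, a city whose rent is far above the median while its non-rent costs are only slightly above would show up as a mostly red bar.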

Here is another graph, highlighting the five most expensive and five least expensive cities in our dataset:
Picture
Interestingly, San Francisco (a city in which the author of the study has lived) is the second most expensive location in the world, and it is expensive almost entirely because of the cost of rent. This is of course no surprise, as San Francisco holds claim to the most expensive real estate on the entire continent.

Next, let's explore another of our hypotheses:
Is there a relationship between cost of living and local purchasing power?

