Data Rep: 500,000 Hotels

all

This week in Data Rep, we are working with a database of 500,000 hotels around the globe. The assignment is to create different maps for hotel star ratings, to find the northernmost hotel and to find the most remote hotel.

Split by Star Rating

In class, we went over creating the original map, shown at the top of the page. To make different maps based on star rating, I offset the X position based on the width of the map multiplied by the number of stars. The result is shown below (one star on left, five stars on right).

hotels_split_full
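The offset described above can be sketched roughly as follows. This is plain Java rather than my actual Processing sketch, and the class name, the helper names, the equirectangular mapping and the choice to put one star in panel 0 are all assumptions made for illustration.

```java
// Hypothetical helper: the names and the equirectangular mapping are assumptions.
final class StarMapLayout {
    // Map a longitude (-180..180) to an x position inside a single map panel.
    static float lonToX(float lon, float mapWidth) {
        return (lon + 180.0f) / 360.0f * mapWidth;
    }

    // Map a latitude (90..-90) to a y position; north is at the top.
    static float latToY(float lat, float mapHeight) {
        return (90.0f - lat) / 180.0f * mapHeight;
    }

    // Shift x into the panel for this star rating.
    // Assumption: one star goes in panel 0 (leftmost), five stars in panel 4.
    static float xForStars(float lon, int stars, float mapWidth) {
        return (stars - 1) * mapWidth + lonToX(lon, mapWidth);
    }
}
```

With a 200px-wide map, a five-star hotel on the prime meridian would land in the middle of the fifth panel, so the canvas needs to be five map widths wide.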

Northernmost

To find the northernmost hotel, I check the latitude of each hotel against a running “largest latitude” variable. If the latitude of the current hotel is greater than the current largest latitude, I replace the largest latitude and continue checking. The result is the Radisson Blu Polar Hotel Spitsbergen Longyearbyen, located in Longyearbyen, Norway. Sure enough, the hotel’s website confirms its status as the “World’s northernmost full-service hotel.”

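String[] northernmost;   // columns of the northernmost hotel found so far
float maxLa = -90.0;     // start at the South Pole so any hotel beats it

void checkNorthernmost(float la, String[] cols) {
  // a larger latitude means farther north
  if(la > maxLa) {
    maxLa = la;
    northernmost = cols;
    println("The current northernmost hotel is " + northernmost[1] + ", located in " + northernmost[4] + ", " + northernmost[7]);
  }
}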

Most Remote (Guessing)

I figured that I would ultimately have to write code that checks every hotel’s coordinates against every other hotel’s coordinates to accurately determine the most remote. But since I knew that code would take a while to run, I first wanted a faster way to guess the most remote locations. The solution I came up with was to shrink the canvas so much that one pixel represents a very large area of the Earth’s surface. The more hotels a pixel contains, the dimmer it gets; the brightest pixels remaining therefore contain only one hotel and are among the remotest.

To do this, I had to take two passes through the data. Starting with a black background and a 22x11px canvas, I plotted the map as normal, except ignoring star rating data and just putting down a white dot for every hotel:

firstpass

Then I passed through the data once more. This time, instead of a white dot, I drew a 50% opaque black dot for every hotel. Enough of these stacked on top of each other, as found in less remote areas, effectively turn the white pixel back to black. What’s left are only a few illuminated pixels:

secondpass
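The same idea can be expressed without drawing anything, by counting hotels per coarse cell. The sketch below is plain Java, not the Processing code I ran; the class name, the {latitude, longitude} array layout and the equirectangular mapping are assumptions, and the brightness formula is an idealized version of stacking 50% black dots that ignores anti-aliasing.

```java
// Hypothetical stand-in for the two drawing passes: count hotels per coarse cell.
final class RemoteGuessGrid {
    static final int COLS = 22;
    static final int ROWS = 11;

    // coords[i] = {latitude, longitude}; assumes valid geographic coordinates.
    static int[][] countPerCell(float[][] coords) {
        int[][] counts = new int[ROWS][COLS];
        for (float[] c : coords) {
            int col = (int) ((c[1] + 180.0f) / 360.0f * COLS);
            int row = (int) ((90.0f - c[0]) / 180.0f * ROWS);
            // Clamp so points exactly on the east or south edge stay in the grid.
            col = Math.max(0, Math.min(COLS - 1, col));
            row = Math.max(0, Math.min(ROWS - 1, row));
            counts[row][col]++;
        }
        return counts;
    }

    // Idealized pixel brightness: one white dot, then n half-opaque black dots.
    static float brightness(int hotelsInCell) {
        if (hotelsInCell == 0) return 0.0f;
        return 255.0f * (float) Math.pow(0.5, hotelsInCell);
    }
}
```

Cells with a count of exactly one are the brightest, and those are the candidates for remote hotels. Counting this way would also keep the hotel records attached to each cell if the array stored indices instead of totals.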

Next, I scaled that image up and laid it over the full map at 50% opacity. Since the bright pixels mark remote hotels, it was easy to look inside them and find the actual hotel on the high-resolution map. I’ve added arrowheads pointing to each hotel’s actual location so they are easier to see at lower resolution. These are my six guesses at the most remote hotels on the map (the numbers do not indicate a ranking).

mostremote_guess

The problem with this approach is that I then had to go back and find the names of the hotels. By relying on pixel data to figure out remoteness, I had thrown away all the supplemental data associated with each hotel location. This is where a little cheating helps: with a map overlay to help with borders, plus Google Maps, I could search around and find the hotels with relative ease, cross-checking the dataset to confirm names. They are, in no particular order:

  • Narwhal Inn, Resolute, Canada
  • Hotel Narsaq, Narsaq, Greenland
  • Talnakh Hotel, Talnakh, Russia
  • Azia, Blagoveschensk, Russia
  • Best Eastern Vm-Centralnaya, Magadan, Russia
  • Cocos Castaway, Cocos (Keeling) Islands, Australia

While this method involves manual labor, it took me about 10 minutes from running the Processing sketch to having my list of the six most remote hotels.

Most Remote (Checking)

My approach for checking these results was to have every hotel find its closest neighbor. The hotel whose closest neighbor is farthest away is the most remote. To implement this, I first created a Hotel class, with one instance for every hotel:

class Hotel {
  private float _la,_lo;
  private String _name,_region,_country;
  private float _distanceToClosestHotel;
  private int _id;
 
  Hotel(float la, float lo, String name, String region, String country, int id) {
    _la = la;
    _lo = lo;
    _name = name;
    _region = region;
    _country = country;
    _id = id;
   
    _distanceToClosestHotel = 40000.0; // sentinel, larger than any possible distance in degrees
  }
 
  public void findDistanceToClosestHotel(ArrayList<Hotel> hotels, int hotelsLength) {
    for(int i=0;i<hotelsLength;i++) { // start at index 0 so the first hotel in the list is not skipped
      Hotel h = hotels.get(i);
      if(h.getId() == _id) continue;
     
      float d = dist(h.getLo(), h.getLa(), _lo, _la);
      if(d < _distanceToClosestHotel) {
        _distanceToClosestHotel = d;
        if(d < 6) return; //threshold to improve performance - after running this with a threshold of d < .5 degrees, I determined that the most remote hotel is at least 6 degrees from its nearest neighbor  
      }
    }
   
    println(_name + ", " + _id + ": closest above threshold: " + _distanceToClosestHotel);
  }
 
  public float getLa() {
    return _la;
  }
 
  public float getLo() {
    return _lo;
  }
 
  public int getId() {
    return _id;
  }
 
  public float getDistanceToClosestHotel() {
    return _distanceToClosestHotel;
  }
 
  public String toString() {
    return _name + " in " + _region + ", " + _country;
  }
}

The class has one main function besides the constructor: findDistanceToClosestHotel. Its job is to loop through the provided ArrayList of every hotel (the for loop on line 19), measure the distance from this hotel to each other one (line 23), compare that distance to the shortest distance found so far (line 24) and, if it is shorter, replace the shortest distance (line 25) and continue checking. To improve performance, I added a threshold (line 26) that stops the search as soon as it finds a neighbor less than 6 degrees away, since a hotel with a neighbor that close cannot be the most remote.

A note about my unit of measurement: in lieu of converting geographic coordinate distance to miles, I’m leaving distances in degrees. One degree of latitude is about 69 miles. A degree of longitude covers that much ground only at the equator and less toward the poles, so these figures are approximate.

I first ran the code with a threshold of .5 degrees, which took half an hour. I saw that a couple of hotels had a closest neighbor at least six degrees away, so I then increased the threshold to 6 degrees, which lowered the time to around 15 minutes.
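For anyone who wants miles instead of degrees, the haversine formula is one common way to get a great-circle distance from two latitude/longitude pairs. I did not use it for the results in this post; the sketch below is plain Java, the class and method names are made up for illustration, and it treats the Earth as a sphere with a mean radius of about 3958.8 miles.

```java
// Hypothetical helper, not part of the sketch used for the results in this post.
final class GreatCircle {
    // Mean Earth radius in miles; treating the planet as a sphere is an approximation.
    static final double EARTH_RADIUS_MILES = 3958.8;

    // Haversine great-circle distance between two points given in degrees.
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * sinLon * sinLon;
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }
}
```

One degree of latitude comes out at roughly 69 miles with this function, while one degree of longitude at 60 degrees north is only about half that. This matters for a remoteness ranking because several of the candidates sit at high latitudes.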

At this point, two errors in the dataset made themselves known. First, the Fairmont Dubai had its longitude listed as 555 instead of 55. This made it the most remote hotel by a great distance, which is obviously not true since there are many hotels in Dubai. The second error involved the Villa del Faro, a hotel that popped out as a candidate for most remote. According to the dataset, the hotel is in Baja California, which I figured was an odd place for a remote hotel. Sure enough, when I went into the data and compared the geographical coordinates there to the ones listed on Google Maps, the latitude and longitude were flipped and the latitude was off by 10 degrees.
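A simple range check while loading the data would flag a value like the 555 longitude before it skews the results. This is a sketch in plain Java that I did not have in my original code; the class and method names are hypothetical, and the coordinates in the usage note are only illustrative.

```java
// Hypothetical sanity check for coordinates read from the dataset.
final class CoordinateCheck {
    // Valid latitudes run from -90 to 90 and valid longitudes from -180 to 180.
    // NaN fails every comparison, so it is reported as invalid too.
    static boolean isValid(float lat, float lon) {
        return lat >= -90.0f && lat <= 90.0f
            && lon >= -180.0f && lon <= 180.0f;
    }
}
```

For example, CoordinateCheck.isValid(25.2f, 555.0f) is false, while CoordinateCheck.isValid(25.2f, 55.3f) is true. A flipped latitude and longitude is only caught when the swapped latitude falls outside -90 to 90, so a check like this complements manual spot-checking rather than replacing it.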

With these errors fixed, I ran the code and got these results (in descending order of remoteness):

  1. Talnakh Hotel, Talnakh, Russia: 13.236553 degrees
  2. Hotel Narsaq, Narsaq, Greenland: 11.598428 degrees
  3. Best Eastern Vm-Centralnaya, Magadan, Russia: 10.022774 degrees
  4. Cocos Castaway, Cocos (Keeling) Islands, Australia: 9.980954 degrees
  5. Manihiki Lagoon Villas, Aitutaki, Cook Islands: 8.503619 degrees
  6. Narwhal Inn, Resolute, Canada: 7.3379154 degrees
  7. Hipton Hotel, Kassala, Sudan: 7.1672177 degrees
  8. Kosrae Nautilus Resort, Lelu, Federated States of Micronesia: 7.049556 degrees
  9. American House (Amerikansky Dom), Irkutsk, Russia: 6.708204 degrees
  10. Crusoe Island Lodge, Robinson Crusoe Island, Chile: 6.5688543 degrees

One hotel I had earlier guessed would be among the most remote, Azia, is absent from the top 10. The remaining five earlier guesses are all here, though: numbers 1, 2, 3, 4 and 6 in the list.

Having coded both ways of finding the most remote hotel, I can see there wasn’t much need for the guessing method. It was faster, but it couldn’t tell me exactly which hotel was the remotest. It was a fun exercise, however, and I’m glad it worked as well as it did.