K-Means Algorithm – Roadtrip

The goal of the project is to create a website for travel planning on Google Maps using the K-Means algoritm

General description of the website:

[P_LA.png]

The website plans you a trip using Google Maps according to your choices. The user has the option to choose between –

K-Means using RANSAC

[P_LA_ransac.png]

K-Means using Coreset

[P_LA_coreset.png]

Both algorithms

[P_LA_both.png]

The user can choose the types of attractions he wants to see on the map.

[P_LA_places.png]

General workflow of the website:

The map bounds are saved and all the roads inside those bounds are sent to the server (NodeJS Server) as a JSON and the server runs a python file that runs the K-Means algorithm.

The points returned by the algorithm are marked on the map and a route is drawn between them, using DirectionsService in the Google Maps API.

The route is wrapped by multiple objects called RouteBoxer. Those objects contain every point within a given distance of the route.

Using the PlacesService in the Google Maps API, every attraction (of types chosen by the user) within the RouteBoxer objects are marked on the map asynchronously.

Input:

The roads are obtained from a file that includes most of the major roads in the world.

The file was genrated by using ArcGIS, with the help of Dr. Anna Brook, from the department of geography and enviormental studies in the University of Haifa.

Algoritm description:

Given a set L on n lines in (ℝ^2), the goal is to compute k-means (points) that minimize the sum of squared Euclidean distances over each input line to it’s nearest mean.

That problem can be solved via exhaustive search over every possible k-points out of G ⊆ (ℝ^2),

where G is the union over every closest two points on each pair of input lines, with O(n^k) running time – might be very slow when n increases.

The state-of-the-art approach is to sample a subset and run the slow algorithm on that sample, this algorithm is called RANSAC.

We use Coreset to solve that. Using a small (logarithmic size to the input) and smart weighted sample out of L called coreset,

we can run the slowest and naive algorithm on that coreset and with probability of at least 1/2,

we get the same results we would get if we ran the slow algorithm on the entire data, up to small ε additive error.

Comparison between RANSAC and Coreset K-Means algorithms:

The following experiment was carried out- We set the centers number to 3 and each time increased the sample size – out of 40,000 input 2D lines we got from the roads file.

The results were different from one data to another, in Los Angeles we got good results:

[P_Graph1.png]

However, different places yield different roads, and when the lines are not “clustered” and basically looks like one big mess of roads, Coreset doesn’t perform much better than RANSAC.

for example, these are the results we got from Shanghai, China:

[P_Graph2.png]

This happens when all the sensitivities (lines importance for the sample, see the paper) are more or less around the value 1/40000, which means uniform distribution given 40,000 input data points (lines).

Video presentation:

[V_KMeans_Roadtrip.avi]

Contact us:

Bar Hassin hashi7ster@gmail.com

Yuval Zernik yuvalzernik.yz@gmail.com