If you've ever needed geocoding (address to GPS coordinates) or reverse geocoding (coordinates to address), you have most probably used Google. And then forgot about it. Unless you are doing an insane amount of queries, the cost doesn't really become prohibitive. Until it does. And their data is good, really good.

Well, we started using reverse geocoding rather a lot in one of our projects, so I had to dig a little deeper: look at the alternatives, and see whether it's possible to cache more. I searched far and wide but didn't manage to find anything other than some partial lists of services providing geocoding. So I had to dig deeper myself. It might have gotten a little out of hand, so I decided to publish my findings so that YOU don't have to jump through all these hoops.

In addition to the service comparison, I will list the most common pitfalls, hopefully saving you a couple of hours of work, should you be implementing or evaluating these services. I looked at 14 different services providing, at least in principle, street-number-level accuracy. That's ideally five decimal places in the coordinates, by the way. You might be able to live with four decimals just fine, the accuracy in that case being 3m - 4.5m.
Google: the good, the bad & the confusing
For the project I needed the geocoding for, Google was my choice. And it would be good to go, if it weren't for two things: cost & cache policy. After the free credits, it can get mighty expensive. And according to their terms of service, you are not allowed to cache the data in any way. I recently ran an analysis of a data set that costs some hundreds of dollars per month with Google; were it cached, the bill would be a few dollars. Need to do a geocoding query on each page load of a busy ecommerce site? We are easily looking at tens of thousands of dollars per month.
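A quick back-of-the-envelope sketch of that claim. Both numbers here are illustrative assumptions (roughly $5 per 1,000 geocoding requests is in the ballpark of Google's list price, but check their current pricing; the traffic volume is made up):

```go
package main

import "fmt"

func main() {
	// Assumed list price of roughly $5 per 1,000 geocoding requests;
	// an illustrative figure, check Google's current pricing.
	const costPer1000 = 5.0

	// A busy ecommerce site doing one backend geocoding call per page load.
	const pageLoadsPerMonth = 5_000_000

	monthlyCost := float64(pageLoadsPerMonth) / 1000 * costPer1000
	fmt.Printf("%d page loads/month ≈ $%.0f/month\n", pageLoadsPerMonth, monthlyCost)
	// → 5000000 page loads/month ≈ $25000/month
}
```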
When I write about geocoding queries, I refer to backend calls. Google has several client-side interfaces that are free to use, such as their Android and iOS SDKs, where they provide awesome suggestive search without any cost. So you can help users find a place for free, but when you actually extract info about said place, it's cha-ching again. Also, because Google hasn't been very good at staying consistent with their licensing, your model of implementation might become prohibitively expensive overnight.
So, the alternatives: you can just switch, right?
There are tons of alternatives, and since the implementation is only one backend call, changing yours will probably take less than an hour. First, about the alternatives: you will never get a 100% match for Google's data quality. You'll find a data quality comparison at the end of the article, and from it you can see that the spread is pretty big. A lot of the services miss some very basic things, like reporting the city wrong. Something fairly specific to countries with Cyrillic letters: sometimes you might get street names back in Cyrillic characters, sometimes in Latin. Ulitsa (street) might be Улица, Ул. or Ul, and as some services don't return all address components in a well-defined way, you might end up with different permutations, such as:
• ulitsa „Georgi S. Rakovski“ 59
• 59 ul Georgi Rakovski
• Улица Георги Раковски
• Улица Георги С. Раковски 59
• etc.
I'm sure you get the idea... For many applications this is not such a big problem, but for anything where the documents are official, it can be a real disaster. Say you got a parking ticket: unless the location is written exactly in the official format, it could be contestable. When comparing services, you should compare them for your target area and country to get the full picture. Some other things to note are the huge variance in pricing, pricing models that can be really hard to calculate at times, rate limiting/throttling (are you calling directly from the user's session?), and of course the already mentioned TOS policies regarding caching.
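On the throttling point: if you call a provider straight from live traffic, you'll want to limit yourself before they do it for you. A minimal client-side limiter sketch using golang.org/x/time/rate (the 10 req/s quota is a made-up example, use whatever your plan actually allows):

```go
package main

import (
	"context"
	"fmt"
	"net/http"

	"golang.org/x/time/rate"
)

// geocodeClient wraps an HTTP client with a rate limiter so requests
// stay under a provider's quota.
type geocodeClient struct {
	http    *http.Client
	limiter *rate.Limiter
}

func newGeocodeClient(reqPerSec float64) *geocodeClient {
	return &geocodeClient{
		http:    &http.Client{},
		limiter: rate.NewLimiter(rate.Limit(reqPerSec), 1),
	}
}

func (c *geocodeClient) get(ctx context.Context, url string) (*http.Response, error) {
	// Wait blocks until the limiter allows another request
	// (or the context is cancelled).
	if err := c.limiter.Wait(ctx); err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	return c.http.Do(req)
}

func main() {
	c := newGeocodeClient(10) // assumed quota: 10 req/s
	resp, err := c.get(context.Background(), "https://example.com/reverse?lat=42.69&lon=23.32")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```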
How we ended up implementing it
Our solution might result in some providers updating their TOS, but so be it. We do it quite simply: if a cached value does not exist in our database, we serve the value queried from Google. After this, we try to see whether we get a 100% (or close) match from any of the other services, and save that to our caching database as per that provider's caching policy. It might be a slightly overengineered solution, but it gives us very good flexibility to work with any provider at any time. A sketch of the flow follows below.
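Here is a minimal sketch of that flow. All names are hypothetical stand-ins for illustration (the cache store, provider interface, and match function are not our actual code):

```go
package geocode

import "context"

// Address is a normalised result; a hypothetical shape for illustration.
type Address struct {
	Street, City, Region, Postcode, Country string
}

// Provider is anything that can reverse geocode a coordinate pair.
type Provider interface {
	ReverseGeocode(ctx context.Context, lat, lon float64) (Address, error)
	Name() string
}

// Cache is our own store, holding only results from providers whose
// terms allow caching.
type Cache interface {
	Get(lat, lon float64) (Address, bool)
	Put(lat, lon float64, a Address, provider string)
}

// Lookup serves the cached value when one exists; otherwise it answers
// with Google's result and, in the background, looks for a (near-)identical
// result from a cache-friendly provider and stores that one instead.
func Lookup(ctx context.Context, cache Cache, google Provider, others []Provider,
	lat, lon float64, match func(a, b Address) bool) (Address, error) {

	if a, ok := cache.Get(lat, lon); ok {
		return a, nil
	}

	a, err := google.ReverseGeocode(ctx, lat, lon)
	if err != nil {
		return Address{}, err
	}

	// Fill the cache asynchronously so the user-facing call stays fast.
	go func() {
		for _, p := range others {
			b, err := p.ReverseGeocode(context.Background(), lat, lon)
			if err != nil {
				continue
			}
			if match(a, b) { // 100% or close enough to Google's answer
				cache.Put(lat, lon, b, p.Name())
				return
			}
		}
	}()

	return a, nil
}
```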
Where is all this data from & special mentions
It's not very surprising, but if a service provider doesn't have their own worldwide mapping data (like Google or HERE do), they will need to aggregate it from numerous sources.
There are two main aggregate open data providers:
OpenAddresses & OpenStreetMap
OpenAddresses collects data mostly from countries' public sources and provides it as downloadable data sets. Unfortunately, the data sets are not standardised, so if you want to turn them into a nifty searchable database, you will need to put in some effort (a rough sketch of that effort follows below). It looks like the majority of the providers have committed that effort and are using the data from OpenAddresses and OpenStreetMap.
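To give an idea of the first step, here is a sketch that reads an OpenAddresses-style CSV into a struct you could then index. It assumes the commonly seen column names (LON, LAT, NUMBER, STREET, CITY, REGION, POSTCODE); the file name is hypothetical, and you should verify the header of the file you actually download:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
	"strconv"
)

// Row mirrors the columns we care about in an OpenAddresses-style CSV.
type Row struct {
	Lon, Lat             float64
	Number, Street, City string
	Region, Postcode     string
}

func main() {
	f, err := os.Open("bg-sofia.csv") // hypothetical file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := csv.NewReader(f)

	// Map column names to indexes from the header row, since column
	// order is not guaranteed across data sets.
	header, err := r.Read()
	if err != nil {
		panic(err)
	}
	col := map[string]int{}
	for i, name := range header {
		col[name] = i
	}

	for {
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			continue // skip malformed lines rather than abort the import
		}
		lon, _ := strconv.ParseFloat(rec[col["LON"]], 64)
		lat, _ := strconv.ParseFloat(rec[col["LAT"]], 64)
		row := Row{
			Lon: lon, Lat: lat,
			Number: rec[col["NUMBER"]], Street: rec[col["STREET"]],
			City: rec[col["CITY"]], Region: rec[col["REGION"]],
			Postcode: rec[col["POSTCODE"]],
		}
		// Insert into your search index / database here.
		fmt.Printf("%+v\n", row)
	}
}
```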
Things to consider when analysing the output
Output from different services differs: some give an accuracy estimation, with some you can get a large number of responses, and every service organises the information in a different way.

Here is an example output from Precisely, showing us the number of matches & the precision level:
An example from the same service, when the address information is well organised:
So if you use several providers, you will need some further logic to normalise the data for your app, as sketched below.
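A common way to handle that is to map every provider's response into one struct of your own. A minimal sketch; the field names are mine, and the adapter below parses a purely hypothetical payload (loosely modelled on services that return a list of scored matches; it is not the real schema of Precisely or anyone else):

```go
package geocode

import (
	"encoding/json"
	"errors"
)

// NormalizedAddress is the single address shape our app works with;
// every provider adapter has to fill it, whatever its raw response
// looks like.
type NormalizedAddress struct {
	HouseNumber string
	Street      string
	City        string
	Region      string
	Postcode    string
	CountryISO  string  // e.g. "BG"
	Confidence  float64 // 0..1, rescaled from provider-specific scores
}

// Normalizer adapts one provider's raw response into NormalizedAddress.
type Normalizer interface {
	Normalize(raw []byte) (NormalizedAddress, error)
}

// exampleProvider parses a hypothetical payload for illustration.
type exampleProvider struct{}

func (exampleProvider) Normalize(raw []byte) (NormalizedAddress, error) {
	var resp struct {
		Matches []struct {
			Number string  `json:"number"`
			Street string  `json:"street"`
			City   string  `json:"city"`
			Score  float64 `json:"score"` // assumed to be 0..100
		} `json:"matches"`
	}
	if err := json.Unmarshal(raw, &resp); err != nil {
		return NormalizedAddress{}, err
	}
	if len(resp.Matches) == 0 {
		return NormalizedAddress{}, errors.New("no matches")
	}
	m := resp.Matches[0] // take the best match
	return NormalizedAddress{
		HouseNumber: m.Number,
		Street:      m.Street,
		City:        m.City,
		Confidence:  m.Score / 100, // rescale to 0..1
	}, nil
}
```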
Data analysis from different geolocation services
I put together a small Go app that takes a small data set of GPS coordinates from around the world. These are first reverse geocoded with Google, and each service's output is then compared against Google's for accuracy. It's not looking for an exact match, but for an address with a certain degree of similarity; a simplified version of that check follows below. The accuracy of this analysis is obviously not 100%, so treat it as a rough overview.
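The similarity check boils down to fuzzy string matching. A stripped-down sketch of the idea, using a plain Levenshtein distance; the 85% threshold is a made-up example, not the exact cut-off used in the analysis:

```go
package main

import (
	"fmt"
	"strings"
)

// levenshtein returns the edit distance between two strings,
// using the standard two-row dynamic programming formulation.
func levenshtein(a, b string) int {
	ar, br := []rune(a), []rune(b)
	prev := make([]int, len(br)+1)
	curr := make([]int, len(br)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ar); i++ {
		curr[0] = i
		for j := 1; j <= len(br); j++ {
			cost := 1
			if ar[i-1] == br[j-1] {
				cost = 0
			}
			curr[j] = min(curr[j-1]+1, prev[j]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(br)]
}

// similar reports whether two addresses are "close enough": here, at
// least 85% similar after lowercasing (an example threshold).
func similar(a, b string) bool {
	a, b = strings.ToLower(a), strings.ToLower(b)
	maxLen := max(len([]rune(a)), len([]rune(b)))
	if maxLen == 0 {
		return true
	}
	d := levenshtein(a, b)
	return 1-float64(d)/float64(maxLen) >= 0.85
}

func main() {
	fmt.Println(similar("ul. Georgi Rakovski 59", "ul Georgi Rakovski 59")) // true
}
```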
Query median times are calculated from all requests, discarding the outliers, and executed from Europe; a helper along the lines of the one below. Prices are from the lowest tier where pricing is tiered. "Street address only" means we got a fairly good match on the street address. "Accuracy" also takes into account the city, region, and postcode, so you might have poor accuracy for the street but better overall accuracy, thanks to the extra information. Ah, the averages.
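For the median-with-outliers-discarded part, something like this trimmed median works; the 20% trim fraction in the example is made up, as the write-up doesn't pin down the exact cut-off:

```go
package main

import (
	"fmt"
	"sort"
)

// trimmedMedian sorts the samples, drops the given fraction from each
// end (the outliers), and returns the median of the rest.
func trimmedMedian(ms []float64, trim float64) float64 {
	s := append([]float64(nil), ms...)
	sort.Float64s(s)
	cut := int(float64(len(s)) * trim)
	s = s[cut : len(s)-cut]
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

func main() {
	times := []float64{88.1, 90.4, 91.2, 89.9, 2400.0} // ms; one obvious outlier
	fmt.Printf("%.2fms\n", trimmedMedian(times, 0.2))  // 90.40ms
}
```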
| Service | Accuracy | Street address only | Median query time | Cost per 1,000 |
| --- | --- | --- | --- | --- |
| Google (baseline) | baseline | baseline | 97.20 ms | 4.22€ |
| | 84.25% | 61.21% | 89.92 ms | 1€ |
| | 71.75% | 31.73% | 93.64 ms | 0.1€ |
| | 50.88% | 53.40% | 750.22 ms | 0.1€ |
| | 78.12% | 50.83% | 192.88 ms | 0.15€ |
| | 80.30% | 54.88% | 144.62 ms | ≈0.16€ |
| | 25.12% | 40.05% | 102.86 ms | 0.5€ |
| | 52.36% | 27.30% | 157.25 ms | 3.3€ |
| | 24.87% | 28.01% | 103.94 ms | n/a |
| | 62.63% | 38.68% | 471.20 ms | 0.84€ |
| | 52.12% | 28.59% | 181.08 ms | 10€ |
Your mileage may vary, so if you've found a service that works exceptionally well, do share it in the comments. If there is one key takeaway, it's the fairly obvious one: compare before you commit.