Creating a Geo-aware System: How to Better Detect Whether a User Is Inside a City [Bonus: Implementation Using Tile38]
To engage users in a particular area, we need to attract them with local promotions or information. This can be done by capturing the user’s location via GPS (after asking for their permission first, of course). Then, based on that location data, we query our database to decide whether the user needs to know a piece of information or is eligible for a promotion. Sometimes the data is only valid for certain cities (each of which may cover a small area), so how do we make sure that the user is really inside the city? And what if the user is 1 km away from the city: is the data still valid for them?
This article shows how to convert a user’s latitude and longitude into a location name.
Using a Paid Service
The easier but costly solution is to use a third-party service, such as:
- Google Maps API https://developers.google.com/maps/documentation/geocoding/overview#ReverseGeocoding
- Bing Maps https://www.bing.com/api/maps/sdk/mapcontrol/isdk/searchbypoint
- …
- or any similar services.
These services work and scale very well, but a cost-aware start-up may need a free, basic solution until the business model proves it can generate revenue. For a free solution, we need to write our own system. In the last section of this article, I will show how Tile38, an open source (MIT licensed) in-memory geolocation data store, spatial index, and realtime geofence, can be used to solve this.
Before using Tile38, I want to explain why we should never use a single latitude and longitude point per city as the main data.
Using City Points
We start with a simple idea: list the latitude and longitude of every possible city. When we get the user’s location, we match it against all of those points. But this simply doesn’t work, because very few users (possibly none) will be at an exact match. So we add a radius around each point.
Suppose we have only one city, named Sleman. When I search Google Maps for the keyword “Sleman”, the resulting URL gives its center point: -7.689355, 110.2411879.
For example, we have a user located at -7.7677384, 110.3771634 (my campus 😝), which is actually in Sleman Regency. But, as we see below, the user’s location is too far from the city’s point: even with a 1 km radius, the circle still doesn’t reach the city’s center.
To make this easier to understand, I created a page here: https://yusufsyaifudin.github.io/wilayah-indonesia/radius.html
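The city-point approach boils down to a distance check against every city center. Below is a minimal sketch in Python, assuming the haversine formula with a mean Earth radius of 6,371 km and the Sleman center point from the Google Maps search. `CITY_POINTS` and `cities_within_radius` are hypothetical names used only for illustration:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (haversine assumes a sphere)

# Hypothetical city-point table: one center point per city, as (lat, lon).
CITY_POINTS = {"sleman": (-7.689355, 110.2411879)}


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def cities_within_radius(lat: float, lon: float, city_points: dict, radius_m: float) -> list:
    """Names of cities whose center point lies within radius_m of the user."""
    return [
        name
        for name, (clat, clon) in city_points.items()
        if haversine_m(lat, lon, clat, clon) <= radius_m
    ]


if __name__ == "__main__":
    campus = (-7.7677384, 110.3771634)
    print(haversine_m(*campus, *CITY_POINTS["sleman"]))  # roughly 17 km, printed in meters
    print(cities_within_radius(*campus, CITY_POINTS, 1000))  # prints []
```

The campus is about 17 km from the single Sleman point. No reasonable radius fixes that without also swallowing neighboring cities.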
What if we move the city’s point? We can, but it leads to other problems, because:
- A city’s border is neither round nor rectangular; it is better represented as a polygon.
- If the radius is too large, it may give wrong results, because the circle may reach a city other than the one the user is actually in.
Using a Polygon for Geofencing
As we’ve seen in the Google Maps screenshot above, Sleman’s border looks somewhat like a mountain (a pyramid). Every user inside that border should be detected as being inside Sleman.
To describe where Sleman’s border is, we can use GeoJSON.
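For a single point, “inside the border” is a point-in-polygon test. Below is a minimal ray-casting sketch. It assumes the border is a GeoJSON Polygon or MultiPolygon. `TOY_CITY` is a made-up square, not the real Sleman border. GeoJSON stores positions as [longitude, latitude], the reverse of the lat/long order used in the rest of this article:

```python
# Hypothetical toy polygon (NOT the real Sleman border).
# GeoJSON stores every position as [longitude, latitude].
TOY_CITY = {
    "type": "Polygon",
    "coordinates": [
        [[110.0, -8.0], [110.5, -8.0], [110.5, -7.5], [110.0, -7.5], [110.0, -8.0]],
    ],
}


def point_in_ring(lon, lat, ring):
    """Ray casting: toggle 'inside' each time an eastward ray from the point crosses an edge."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_polygon(lat, lon, rings):
    """Inside the outer ring (rings[0]) and not inside any hole (rings[1:])."""
    outer, *holes = rings
    return point_in_ring(lon, lat, outer) and not any(point_in_ring(lon, lat, h) for h in holes)


def point_in_geometry(lat, lon, geometry):
    """Handle the two geometry types a city border usually uses."""
    if geometry["type"] == "Polygon":
        return point_in_polygon(lat, lon, geometry["coordinates"])
    if geometry["type"] == "MultiPolygon":
        return any(point_in_polygon(lat, lon, poly) for poly in geometry["coordinates"])
    raise ValueError(f"unsupported geometry type: {geometry['type']}")
```

With a real border, you would load the GeoJSON file and pass its geometry instead of `TOY_CITY`.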
But this raises a question: how do we know the border of the city? Thanks to OpenStreetMap, this data is free to use under an open data license.
If you already have GeoJSON data for your cities, you may skip this part, which explains how to get GeoJSON data for a city’s border.
For example, to get the border of Sleman, Indonesia, open https://www.openstreetmap.org and search for “Sleman Indonesia”:
Then select “County Boundary Sleman Regency, Special Region of Yogyakarta, Indonesia”, and you will find Sleman’s boundary:
In the top left corner, you can see that Sleman Regency has the OpenStreetMap ID (OSM ID) 5615254. To get its details, we can use the Overpass API.
You can use any Overpass API server. Open this URL in your browser and you will be prompted to download the output file; it may take a while before the download begins:
http://overpass-api.de/api/interpreter?data=(relation(5615254);>;);out;
If the response takes too long, the server may be busy. You can switch to a different server with the same payload:
https://overpass.kumi.systems/api/interpreter?data=(relation(5615254);>;);out;
Or you can use cURL:
curl 'https://overpass.kumi.systems/api/interpreter?data=(relation(5615254);>;);out;' > 5615254.xml
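Please note that the format data=(relation(osmid);>;);out; must be written exactly like that. It is an Overpass Query Language (Overpass QL) query. relation(osmid) selects the boundary relation, > recursively adds its member ways and their nodes, and out; prints them all. The output therefore contains every point needed to draw the border.
Then convert the output to GeoJSON using https://tyrasd.github.io/osmtogeojson/
This process is exhausting if done manually, so you should build a system to automate it. For Indonesia, I already did this for my bachelor thesis, although I forgot where I put the script I used to crawl the data.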
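The download step can be automated with a small script. This sketch assumes the two Overpass servers mentioned above and the same query. It tries the first server and falls back to the second when a request fails or times out. It also pauses between downloads so the public servers are not overloaded. The function names are hypothetical. Converting each XML file into GeoJSON is still left to osmtogeojson:

```python
import time
import urllib.parse
import urllib.request

# The two public Overpass servers mentioned above, tried in this order.
OVERPASS_MIRRORS = [
    "http://overpass-api.de/api/interpreter",
    "https://overpass.kumi.systems/api/interpreter",
]


def overpass_url(base: str, osm_id: int) -> str:
    """URL for the (relation(<id>);>;);out; query, with the query URL-encoded."""
    query = f"(relation({osm_id});>;);out;"
    return base + "?" + urllib.parse.urlencode({"data": query})


def download_relation(osm_id: int, timeout_s: float = 180.0) -> bytes:
    """Return the relation's OSM XML, falling back to the next server on failure."""
    last_error = None
    for base in OVERPASS_MIRRORS:
        try:
            with urllib.request.urlopen(overpass_url(base, osm_id), timeout=timeout_s) as resp:
                return resp.read()
        except OSError as err:  # URLError, HTTPError and timeouts are all OSError subclasses
            last_error = err
    raise RuntimeError(f"every Overpass server failed for relation {osm_id}") from last_error


def download_many(osm_ids, pause_s: float = 5.0) -> None:
    """Save each relation as <osm_id>.xml, pausing between requests to stay polite."""
    for osm_id in osm_ids:
        data = download_relation(osm_id)
        with open(f"{osm_id}.xml", "wb") as f:
            f.write(data)
        time.sleep(pause_s)
```

For example, `download_many([5615254])` would save Sleman’s relation as 5615254.xml, ready for conversion.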
Back to the main topic: we now have one city’s border, and we still assume that our user is at -7.7677384, 110.3771634. To illustrate this, I created https://yusufsyaifudin.github.io/wilayah-indonesia/geojson.html
And we get this:
Zoom in on the area!
But a problem remains. What if the polygon we got from the internet, or the one we created manually with https://geojson.io, is slightly inaccurate? The actual city border (IRL) differs from the one on the map, so we need to add an error threshold to our system.
Using a Polygon for Geofencing and Adding a User Radius
Still using Sleman as the polygon data, we now have a user at Museum Affandi, whose latitude and longitude are -7.7828038, 110.3963476. Using the same GeoJSON data, we get this:
Now we add a 500-meter radius around the user’s position as the error threshold.
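Adding a radius turns the check into “does the circle touch the polygon?”. The user matches if they are inside the polygon or if any border edge is within the radius. Below is a minimal sketch with two assumptions. First, it uses a local equirectangular projection, which is accurate enough over a few kilometers but not over long distances. Second, it checks only the outer ring of the polygon and ignores holes:

```python
import math

M_PER_DEG_LAT = 111_320.0  # approximate meters per degree of latitude


def to_local_xy(lat0, lon0, lat, lon):
    """Equirectangular projection: meters east (x) and north (y) of (lat0, lon0)."""
    x = (lon - lon0) * M_PER_DEG_LAT * math.cos(math.radians(lat0))
    y = (lat - lat0) * M_PER_DEG_LAT
    return x, y


def segment_distance(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB, all in meters."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0
    if length_sq > 0:
        # Clamp the projection of P onto AB to the segment's endpoints.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def circle_intersects_ring(lat, lon, radius_m, ring):
    """True if the circle (lat, lon, radius_m) overlaps a GeoJSON ring of [lon, lat]."""
    pts = [to_local_xy(lat, lon, plat, plon) for plon, plat in ring]
    inside = False
    for (ax, ay), (bx, by) in zip(pts, pts[1:] + pts[:1]):
        if segment_distance(0.0, 0.0, ax, ay, bx, by) <= radius_m:
            return True  # a border edge passes within the radius
        # Ray casting from the user (the origin) towards +x.
        if (ay > 0) != (by > 0) and ax - ay * (bx - ax) / (by - ay) > 0:
            inside = not inside
    return inside
```

With a radius of 0, this reduces to a plain point-in-polygon test, apart from points lying exactly on an edge.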
After adding the 500 m radius, the circle around the user’s location intersects the polygon. It looks like the system can now detect the user’s location perfectly. But wait, this is not the end. What if we add a neighboring city’s polygon? Adding Yogyakarta, we get this:
In this situation, the system may rank Yogyakarta above Sleman. This should be acceptable, because we don’t have the true border of each city. Also, compared to the first method, which uses only each city’s latitude and longitude, this approach should produce fewer false positives and false negatives.
Another example is a user praying at Masjid Baitul Arqom (-7.7928974, 110.3983692). This mosque is located in Bantul, so none of the polygons currently on the map (Sleman and Yogyakarta) contains it. But since we use a 500-meter radius, we see this:
The user may be detected in Yogyakarta, because more of the 500-meter circle overlaps Yogyakarta than Sleman.
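You can rank several matching cities by how much of the circle overlaps each one. One simple option is to sample the circle on a grid and count the samples that fall inside each polygon. Each sample stands for roughly step_m * step_m square meters. This sampling approximation is my choice for illustration; a geometry library could compute exact intersection areas instead. The 25 m step and the helper names are assumptions:

```python
import math

M_PER_DEG_LAT = 111_320.0  # approximate meters per degree of latitude


def point_in_ring(lon, lat, ring):
    """Ray casting on a GeoJSON ring of [lon, lat] positions."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > lat) != (y2 > lat):
            if lon < x1 + (lat - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def overlap_scores(lat, lon, radius_m, cities, step_m=25.0):
    """For each city ring, count grid samples of the circle that fall inside it.

    Each sample stands for about step_m * step_m square meters of the circle.
    """
    scores = {name: 0 for name in cities}
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(lat))
    n = int(radius_m // step_m)
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = i * step_m, j * step_m
            if dx * dx + dy * dy > radius_m * radius_m:
                continue  # this grid point is outside the circle
            s_lat, s_lon = lat + dy / M_PER_DEG_LAT, lon + dx / m_per_deg_lon
            for name, ring in cities.items():
                if point_in_ring(s_lon, s_lat, ring):
                    scores[name] += 1
    return scores


def rank_cities(lat, lon, radius_m, cities, step_m=25.0):
    """City names ordered by approximate overlap (largest first); no-overlap cities dropped."""
    scores = overlap_scores(lat, lon, radius_m, cities, step_m)
    return sorted((name for name, s in scores.items() if s > 0), key=lambda name: -scores[name])
```

The cost grows with the number of samples times the number of polygon vertices. A coarser step trades accuracy for speed.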
Implementation Using Tile38
We are done with the theory, so now we can implement it using Tile38. I will not describe the installation steps in detail, since they are already well documented on the Tile38 website.
Suppose that you have successfully installed and started tile38-server.
You can add Sleman’s polygon over HTTP. The SET command expects the GeoJSON itself after OBJECT, not a URL. So first download the raw file (the GitHub blob page returns HTML) and put it on one line with jq -c:
GEOJSON=$(curl -s 'https://raw.githubusercontent.com/yusufsyaifudin/wilayah-indonesia/master/data/geojson/regency/3404.geojson' | jq -c .)
curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw "SET cities sleman OBJECT $GEOJSON"
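The same SET command can be sent from application code. Below is a minimal sketch using only the Python standard library. It assumes tile38-server listens on localhost:9851, as used above. I am not certain how Tile38 tokenizes JSON that contains spaces or line breaks. To be safe, the sketch keeps only the geometry and minifies it onto one line. `geometry_of`, `set_object_command`, and `tile38` are hypothetical helper names:

```python
import json
import urllib.request

TILE38_URL = "http://localhost:9851"  # the tile38-server address used in this article


def geometry_of(geojson: dict) -> dict:
    """Unwrap a Feature (or a single-feature FeatureCollection) down to its geometry."""
    if geojson["type"] == "Feature":
        return geojson["geometry"]
    if geojson["type"] == "FeatureCollection" and len(geojson["features"]) == 1:
        return geojson["features"][0]["geometry"]
    return geojson


def set_object_command(key: str, obj_id: str, geometry: dict) -> str:
    """Build 'SET <key> <id> OBJECT <geojson>' with the GeoJSON minified onto one line."""
    return f"SET {key} {obj_id} OBJECT " + json.dumps(geometry, separators=(",", ":"))


def tile38(command: str, url: str = TILE38_URL) -> dict:
    """POST one raw command to tile38-server over HTTP and decode its JSON reply."""
    req = urllib.request.Request(
        url,
        data=command.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, load 3404.geojson with `json.load` into `data`. Then `tile38(set_object_command("cities", "sleman", geometry_of(data)))` should store Sleman’s border.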
Now, query for a user at -7.7677384, 110.3771634 with a radius of 0:
curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7677384 110.3771634 0'
It returns sleman:
{
"ok": true,
"points": [
{
"id": "sleman",
"point": {
"lat": -7.6903140544890505,
"lon": 110.38651657104501
}
}
],
"count": 1,
"cursor": 0,
"elapsed": "47.899µs"
}
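In application code, you can build the query and read its JSON reply like this. This sketch assumes the reply shape shown above: `ok`, plus `points`, each with an `id`. It also assumes an error reply carries its message in `err`. The helper names are hypothetical:

```python
import json


def intersects_command(key: str, lat: float, lon: float, radius_m: float) -> str:
    """Build the same INTERSECTS ... POINTS CIRCLE <lat> <lon> <meters> query used above."""
    return f"INTERSECTS {key} POINTS CIRCLE {lat} {lon} {radius_m}"


def matched_ids(response_body: str) -> list:
    """Extract the matched object ids, in the order Tile38 returned them."""
    reply = json.loads(response_body)
    if not reply.get("ok"):
        # Assumption: an error reply carries its message in "err".
        raise RuntimeError(reply.get("err", "tile38 replied ok=false"))
    return [item["id"] for item in reply.get("points", [])]
```

For the reply above, `matched_ids` yields `["sleman"]`.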
Now, test a user located at Museum Affandi (-7.7828038, 110.3963476):
curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7828038 110.3963476 0'
It returns an empty result:
{
"ok": true,
"points": [],
"count": 0,
"cursor": 0,
"elapsed": "36.427µs"
}
Now, add a radius of 500 meters:
curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7828038 110.3963476 500 '
It detects Sleman:
{
"ok": true,
"points": [
{
"id": "sleman",
"point": {
"lat": -7.6903140544890505,
"lon": 110.38651657104501
}
}
],
"count": 1,
"cursor": 0,
"elapsed": "142.258µs"
}
Now, add a new city, Yogyakarta:
GEOJSON=$(curl -s 'https://raw.githubusercontent.com/yusufsyaifudin/wilayah-indonesia/master/data/geojson/regency/3471.geojson' | jq -c .)
curl -X POST 'localhost:9851' -H 'Content-Type: text/plain' --data-raw "SET cities yogyakarta OBJECT $GEOJSON"
Then test the same user at Museum Affandi (-7.7828038, 110.3963476) with a radius of 500 meters:
curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7828038 110.3963476 500 '
{
"ok": true,
"points": [
{
"id": "sleman",
"point": {
"lat": -7.6903140544890505,
"lon": 110.38651657104501
}
},
{
"id": "yogyakarta",
"point": {
"lat": -7.80279421806335,
"lon": 110.3762512207
}
}
],
"count": 2,
"cursor": 0,
"elapsed": "155.937µs"
}
This result wasn’t what I expected, because Yogyakarta should be ranked above Sleman. So I tried lowering the radius to 300, and the result was the same:
When I set the radius to 0, it worked as expected, with Yogyakarta as the only matching polygon:
After deleting all polygon data, I inserted Yogyakarta first and then Sleman. This time the same query returned Yogyakarta first, so the order of the results seems to follow insertion order rather than overlap. Note that INTERSECTS only reports which objects intersect the search area. It does not rank them by how much they overlap, so the result order should not be treated as a ranking.
curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7828038 110.3963476 500 '
Response:
{
"ok": true,
"points": [
{
"id": "yogyakarta",
"point": {
"lat": -7.80279421806335,
"lon": 110.3762512207
}
},
{
"id": "sleman",
"point": {
"lat": -7.6903140544890505,
"lon": 110.38651657104501
}
}
],
"count": 2,
"cursor": 0,
"elapsed": "151.785µs"
}
Now, go back to inserting Sleman first and then Yogyakarta, with the user at Masjid Baitul Arqom (-7.7928974, 110.3983692), which is actually located in Bantul:
curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7928974 110.3983692 500 '
Again, Sleman comes first, although we saw above that the circle overlaps Yogyakarta more than Sleman.
{
"ok": true,
"points": [
{
"id": "sleman",
"point": {
"lat": -7.6903140544890505,
"lon": 110.38651657104501
}
},
{
"id": "yogyakarta",
"point": {
"lat": -7.80279421806335,
"lon": 110.3762512207
}
}
],
"count": 2,
"cursor": 0,
"elapsed": "129.766µs"
}
When I lowered the radius to 230 meters, Yogyakarta was the only match:
As before, I tried changing the insertion order: I inserted Yogyakarta first, then Sleman, and ran the exact same query:
curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7928974 110.3983692 500 '
This time, Yogyakarta came first:
{
"ok": true,
"points": [
{
"id": "yogyakarta",
"point": {
"lat": -7.80279421806335,
"lon": 110.3762512207
}
},
{
"id": "sleman",
"point": {
"lat": -7.6903140544890505,
"lon": 110.38651657104501
}
}
],
"count": 2,
"cursor": 0,
"elapsed": "121.298µs"
}
For now, we can consider this a minor issue. A query with a radius of 0 meters still returns the polygon when the user is really inside it, and returns nothing when the user is outside every polygon. As a workaround, we can increase the radius gradually, 10 meters at a time, to find the nearest city, but each step costs another query. For an asynchronous system this may still be acceptable, but in a synchronous call it may severely hurt system performance.
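The gradual-radius workaround can be sketched as a loop that starts at 0 meters and grows by 10 meters until something matches. `query` is a hypothetical callback that returns the matched ids for a given radius, for example one wrapping the INTERSECTS call above. The 500 m cap is an assumption, taken from the threshold used in this article:

```python
from typing import Callable, List, Tuple

# Hypothetical callback type: (lat, lon, radius_m) -> matched ids, e.g. a wrapper
# that sends "INTERSECTS cities POINTS CIRCLE lat lon radius_m" to tile38-server.
Query = Callable[[float, float, float], List[str]]


def nearest_by_expanding_radius(
    query: Query, lat: float, lon: float, step_m: float = 10.0, max_radius_m: float = 500.0
) -> Tuple[List[str], float]:
    """Try radius 0, step_m, 2*step_m, ... and stop at the first non-empty result.

    Returns (ids, radius) for the first radius that matched, or ([], max_radius_m)
    when nothing matched. Every step is one more round trip to the server.
    """
    for i in range(int(max_radius_m // step_m) + 1):
        radius = i * step_m
        ids = query(lat, lon, radius)
        if ids:
            return ids, radius
    return [], max_radius_m
```

A bigger circle can only intersect more polygons, so “has a match” is monotonic in the radius. A binary search between 0 and the cap would therefore need about 6 queries instead of up to 51 for the same 10 m resolution.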
Conclusion
Reverse geocoding (translating latitude and longitude into a location name) is not an easy problem. But knowing the limitations of each possible solution helps minimize the problem. At the very least, it lets us recognize when a wrong location result is the expected behavior of the system rather than a bug.
Yogyakarta, July 20th, 2021 20.48 GMT+7
Happy Ied Al Adha 1442 Hijri!