Creating a Geo-aware System: How to Better Detect Whether a User Is Inside a City [Bonus: Implement Using Tile38]

Yusuf Syaifudin
10 min read · Jul 20, 2021

To engage users in a particular area, we need to attract them with local promotions or information. This can be done by capturing the user’s location via GPS (after asking for their permission, of course) and then querying our database to check whether the user should see a piece of information or is eligible for a promotion based on that location. The data is sometimes valid only for certain cities (each of which may cover a small area), but how do we ensure that the user is really inside the city? What if the user is 1 km away from the city: is it still valid?

This article will show you how to convert a user’s latitude and longitude into a location name.

Using Paid Service

Well, the easier but costlier solution is to use a third-party service.

This works and scales very well, but a cost-aware start-up may need a free, basic solution until the business model proves it can generate revenue. For a free solution, we need to write our own system. In the last section of this article I will share how Tile38, an open source (MIT licensed) in-memory geolocation data store, spatial index, and realtime geofence, can be used to solve this.

Before using Tile38, I want to stress that we should never use a single latitude and longitude per city as the main data.

Using City Point

We start with a simple idea: list the latitude and longitude of every possible city. When we get the user’s location, we match it against all of those points. But this simply can’t work, because only a small number of users (maybe zero) will match a point exactly. So we add a radius.

Suppose that we have only one city, named Sleman. When I search Google Maps with the keyword “Sleman”, it shows the URL below. Its center point is -7.689355, 110.2411879.

https://www.google.com/maps/place/Sleman+Regency,+Special+Region+of+Yogyakarta/@-7.689355,110.2411879,11z/data=!3m1!4b1!4m5!3m4!1s0x2e7a5ee1c5671249:0x3027a76e352bc20!8m2!3d-7.7325213!4d110.402376

For example, we have a user located at -7.7677384, 110.3771634 (it is my campus location 😝). It is actually in Sleman regency. But, as we see below, the user’s location is too far from the city’s point. Even after we add a radius of 1 km, it still can’t reach the city’s center.

To make this easier to understand, I created a page here: https://yusufsyaifudin.github.io/wilayah-indonesia/radius.html

What if we move the city’s point? Yes, we can, but it leads to another problem because we know:

  • A city’s border is neither round nor rectangular; it is better represented using a polygon.
  • If we add too large a radius, it may produce wrong values, because the user may be nearer to the point of a different city than to the point of the city they are actually in.
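The point-plus-radius check described here can be sketched in a few lines of Python. This is only an illustration, assuming the haversine formula on a spherical Earth; the function names `haversine_m` and `within_radius` are made up for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(user, center, radius_m):
    """True when the user is at most radius_m away from the city's single point."""
    return haversine_m(*user, *center) <= radius_m


sleman_center = (-7.689355, 110.2411879)  # center point taken from the Google Maps URL
user = (-7.7677384, 110.3771634)          # the campus location

distance = haversine_m(*user, *sleman_center)
print(f"{distance / 1000:.1f} km", within_radius(user, sleman_center, 1_000))
```

The distance comes out at roughly 17 km, so a 1 km radius around either point is nowhere near enough.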

Using Polygon as Geo fencing

As we’ve seen in the Google Maps screenshot above, Sleman has a border shaped somewhat like a mountain (pyramid). So, every user inside the border should be detected as inside Sleman.

To draw Sleman’s border, we can use GeoJSON.

But this leaves a question: how do I know the border of the city? Thanks to OpenStreetMap, whose data is free to use under an open license.

If you already have or know about GeoJSON data, you may skip this part. Here I will show you how to get the GeoJSON data of a city’s border.

For example, suppose you want to get the border of Sleman, Indonesia. You can open https://www.openstreetmap.org and search “Sleman Indonesia”:
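To make the idea concrete, here is a minimal Python sketch of a GeoJSON polygon and a point-in-polygon test using ray casting. The triangle below is a hypothetical, heavily simplified shape, not the real Sleman border, and `point_in_ring` is a name invented for this example. It ignores holes and multi-polygons.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a closed ring of [lon, lat] pairs?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical toy shape; NOT the real border of any city.
toy_city = {
    "type": "Feature",
    "properties": {"name": "toy-city"},
    "geometry": {
        "type": "Polygon",
        # GeoJSON stores positions as [longitude, latitude], and the ring is closed
        # by repeating the first position at the end.
        "coordinates": [[
            [110.20, -7.80],
            [110.50, -7.80],
            [110.40, -7.55],
            [110.20, -7.80],
        ]],
    },
}

outer_ring = toy_city["geometry"]["coordinates"][0]
print(point_in_ring(110.3771634, -7.7677384, outer_ring))  # a point inside the triangle
print(point_in_ring(110.10, -7.70, outer_ring))            # a point west of the triangle
```

Note the coordinate order: GeoJSON uses longitude first, while most of the coordinates quoted in this article are written latitude first.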

Search result of “Sleman Indonesia” on openstreetmap.org on 20 July 2021, 16:26 GMT+7

Then select “County Boundary Sleman Regency, Special Region of Yogyakarta, Indonesia”; there you find Sleman’s boundary:

Sleman’s boundary on openstreetmap.org

In the top left corner you can see that Sleman Regency has the OpenStreetMap ID (OSM ID) 5615254. To get its details, we can use the Overpass API:

You can select one of the Overpass API servers. Open this URL in a browser and you will be prompted to download the output file. It may take a while before the download begins.

http://overpass-api.de/api/interpreter?data=(relation(5615254);>;);out;

If it takes too long to get the response, the server may be busy. You can switch to a different server with the same payload:

https://overpass.kumi.systems/api/interpreter?data=(relation(5615254);>;);out;

or you can use cURL:

curl 'https://overpass.kumi.systems/api/interpreter?data=(relation(5615254);>;);out;' > 5615254.xml

Please note that the format data=(relation(osmid);>;);out; MUST be written exactly like that. It is Overpass Query Language.
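If you want to build that URL for other OSM IDs, a small helper like the one below may be handy. It is a sketch: `overpass_url` is a hypothetical name, and it only assembles the URL (with the query percent-encoded) without making any request.

```python
from urllib.parse import urlencode


def overpass_url(osm_relation_id, base="http://overpass-api.de/api/interpreter"):
    """Build the Overpass API URL that returns a relation plus all of its members."""
    query = f"(relation({osm_relation_id});>;);out;"
    # urlencode percent-encodes the parentheses, semicolons and '>' in the query
    return f"{base}?{urlencode({'data': query})}"


print(overpass_url(5615254))
print(overpass_url(5615254, base="https://overpass.kumi.systems/api/interpreter"))
```

The second call shows how to point the same payload at a different server when the first one is busy.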

OSM XML data from Overpass API

Then you can convert it using https://tyrasd.github.io/osmtogeojson/

Voila, you get the GeoJSON data from OSM XML!
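To get a feeling for what the converter does, the sketch below reads OSM XML with Python’s standard library and turns each way into a GeoJSON LineString. It is deliberately incomplete: it does not assemble the ways of a relation into a polygon the way osmtogeojson does, and the embedded XML is a tiny made-up sample, not the real Sleman data.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical sample in the OSM XML layout (nodes, then ways referencing them).
SAMPLE_OSM_XML = """
<osm version="0.6">
  <node id="1" lat="-7.80" lon="110.20"/>
  <node id="2" lat="-7.80" lon="110.50"/>
  <node id="3" lat="-7.55" lon="110.40"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <nd ref="1"/>
  </way>
</osm>
""".strip()


def ways_to_linestrings(osm_xml):
    """Convert every <way> in an OSM XML string into a GeoJSON LineString feature."""
    root = ET.fromstring(osm_xml)
    # Index node coordinates by id, already in GeoJSON [lon, lat] order
    nodes = {
        n.get("id"): [float(n.get("lon")), float(n.get("lat"))]
        for n in root.iter("node")
    }
    features = []
    for way in root.iter("way"):
        coords = [nodes[nd.get("ref")] for nd in way.iter("nd") if nd.get("ref") in nodes]
        features.append({
            "type": "Feature",
            "properties": {"osm_way_id": way.get("id")},
            "geometry": {"type": "LineString", "coordinates": coords},
        })
    return {"type": "FeatureCollection", "features": features}


collection = ways_to_linestrings(SAMPLE_OSM_XML)
print(len(collection["features"]), collection["features"][0]["geometry"]["coordinates"][0])
```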

This process may be exhausting if you do it manually, so you need a system to automate it. For Indonesia, I did this when I wrote my bachelor thesis, but I forget where I put the script that crawled the data.

Back to the main topic: we already have one city’s border. We still assume that our user is at -7.7677384, 110.3771634. To illustrate this, I created https://yusufsyaifudin.github.io/wilayah-indonesia/geojson.html

And we got this:

The user is inside Sleman.

Zoom in the area!

The user location (-7.7677384, 110.3771634) is inside the area (Sleman, Indonesia).

But it still leaves a problem. What if the polygon we got from the internet, or the one we created manually with https://geojson.io, is slightly off, even by less than a meter? Since the actual city border (IRL) and the one on the map differ, we need to add an error threshold to our system.

Using Polygon as Geo Fencing and Add User Radius

Still using Sleman as the polygon data, now we have a user at Museum Affandi. Its latitude and longitude are -7.7828038, 110.3963476. Using the same GeoJSON data, we get this:

Museum Affandi is not inside the polygon, but IRL it is located in Sleman regency.

Now we add a radius of 500 meters around the user’s position as the error threshold.

After adding the 500-meter radius, the circle intersects the Sleman regency polygon.

Now, after adding the radius of 500 m, the circle around the user’s location intersects the polygon. We might assume that the system can now detect the user’s location perfectly. But wait, this is not the end. What if we add a neighboring city’s polygon? Adding Yogyakarta, we get this:

The circle intersects both Yogyakarta and Sleman.

In this situation, the system may rank Yogyakarta above Sleman. This should be acceptable because we don’t have the true border data of each city. Also, compared to the first method, which uses only the city’s latitude and longitude, this system should produce fewer false positives and false negatives.

Another example is a user who goes to Masjid Baitul Arqom (-7.7928974, 110.3983692) to pray. This mosque is located in Bantul, so none of the polygons currently on the map (Sleman and Yogyakarta) matches. But since we use a radius of 500 meters, we will see this:

The user may be detected in Yogyakarta because the 500-meter circle has more intersected area in Yogyakarta than in Sleman.
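The circle-versus-polygon check can be sketched without any geo library. The code below is an illustration under stated assumptions: it projects coordinates to local meters with a simple equirectangular approximation (fine for city-sized areas), and it treats the circle as intersecting when the user is inside the ring or within the radius of any edge. The rectangle is a hypothetical toy shape whose southern edge lies about 312 m north of Museum Affandi; it is not the real Sleman border.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def to_local_m(lon, lat, lon0, lat0):
    """Project (lon, lat) to x/y meters around the origin (lon0, lat0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def point_in_ring(x, y, ring):
    """Ray-casting point-in-polygon test on a closed ring of (x, y) pairs."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def dist_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def circle_intersects_ring(lat, lon, radius_m, ring):
    """True when a circle around the user touches or overlaps a [lon, lat] ring."""
    local = [to_local_m(rlon, rlat, lon, lat) for rlon, rlat in ring]
    if point_in_ring(0.0, 0.0, local):  # the user is the origin of the local frame
        return True
    return any(dist_to_segment(0.0, 0.0, *a, *b) <= radius_m
               for a, b in zip(local, local[1:]))


# Hypothetical rectangle; its southern edge is at latitude -7.78.
toy_ring = [[110.38, -7.78], [110.41, -7.78], [110.41, -7.70], [110.38, -7.70], [110.38, -7.78]]
museum = (-7.7828038, 110.3963476)

for radius in (0, 300, 500):
    print(radius, circle_intersects_ring(*museum, radius, toy_ring))
```

With this toy shape the user is outside at radius 0 and 300 m, and only the 500 m circle reaches the border, which mirrors the behavior described above.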
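One way to rank the candidates by how much of the circle they cover is to sample the circle on a grid and count the samples that fall inside each polygon. This is a rough sketch, not something Tile38 does: `overlap_shares` is a hypothetical helper, the projection is a simple equirectangular approximation, and the two rectangles are toy shapes that share a border at latitude -7.78.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def to_local_m(lon, lat, lon0, lat0):
    """Project (lon, lat) to x/y meters around the origin (lon0, lat0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def point_in_ring(x, y, ring):
    """Ray-casting point-in-polygon test on a closed ring of (x, y) pairs."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def overlap_shares(lat, lon, radius_m, cities, step_m=10.0):
    """Approximate the share of the user's circle covered by each city, largest first."""
    n = int(radius_m // step_m)
    # Grid points (in meters, relative to the user) that fall inside the circle
    samples = [
        (i * step_m, j * step_m)
        for i in range(-n, n + 1)
        for j in range(-n, n + 1)
        if (i * step_m) ** 2 + (j * step_m) ** 2 <= radius_m ** 2
    ]
    shares = {}
    for name, ring in cities.items():
        local = [to_local_m(rlon, rlat, lon, lat) for rlon, rlat in ring]
        hits = sum(point_in_ring(x, y, local) for x, y in samples)
        shares[name] = hits / len(samples)
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy rectangles, NOT real city borders.
toy_cities = {
    "toy-north": [[110.38, -7.78], [110.41, -7.78], [110.41, -7.70], [110.38, -7.70], [110.38, -7.78]],
    "toy-south": [[110.38, -7.86], [110.41, -7.86], [110.41, -7.78], [110.38, -7.78], [110.38, -7.86]],
}

ranked = overlap_shares(-7.7828038, 110.3963476, 500, toy_cities)
print(ranked)
```

The grid step trades accuracy for speed; a real system would more likely use a geometry library to compute exact intersection areas.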

Implement Using Tile38

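We are done with the theory; now we can implement it using Tile38. I will not write out the installation steps, since they are already documented very well on the website.

Suppose that you have successfully installed and started tile38-server.

You can add the polygon of Sleman using HTTP. In the command below, “geojson from <URL>” is shorthand: replace it with the GeoJSON content of that file, since SET expects the object inline after OBJECT: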

curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'SET cities sleman OBJECT geojson from https://github.com/yusufsyaifudin/wilayah-indonesia/blob/master/data/geojson/regency/3404.geojson'
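Tile38’s SET command takes the GeoJSON itself after OBJECT, so in practice the content of the file has to be embedded in the command. A minimal Python sketch of building such a command is shown below; `set_object_command` is a hypothetical helper and the triangle is a toy shape, not the real Sleman polygon.

```python
import json


def set_object_command(key, object_id, geojson):
    """Build a Tile38 SET command with the GeoJSON embedded inline after OBJECT."""
    # Compact separators keep the whole command on a single line
    compact = json.dumps(geojson, separators=(",", ":"))
    return f"SET {key} {object_id} OBJECT {compact}"


# Hypothetical toy polygon; in practice, load the downloaded .geojson file instead,
# e.g. with json.load(open("3404.geojson")).
toy_polygon = {
    "type": "Polygon",
    "coordinates": [[[110.20, -7.80], [110.50, -7.80], [110.40, -7.55], [110.20, -7.80]]],
}

command = set_object_command("cities", "toy-city", toy_polygon)
print(command)
```

The resulting string can be sent as the request body, exactly like the --data-raw value in the cURL call above.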

Now try to find the user at -7.7677384, 110.3771634 with radius 0:

curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7677384 110.3771634 0'

It will return sleman:

{
  "ok": true,
  "points": [
    {
      "id": "sleman",
      "point": {
        "lat": -7.6903140544890505,
        "lon": 110.38651657104501
      }
    }
  ],
  "count": 1,
  "cursor": 0,
  "elapsed": "47.899µs"
}

With the user located at -7.7677384, 110.3771634, it detects Sleman.

Now, test with the user located at Museum Affandi (-7.7828038, 110.3963476):
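The same query can be issued from application code. The sketch below uses only Python’s standard library and mirrors the cURL call; the helper names are made up for this example, `send_command` needs a running tile38-server on localhost:9851, and the `err` field read on failure is an assumption about the error reply.

```python
import json
from urllib import request


def intersects_command(key, lat, lon, meters):
    """Build the INTERSECTS ... CIRCLE command used throughout this article."""
    return f"INTERSECTS {key} POINTS CIRCLE {lat} {lon} {meters}"


def send_command(command, url="http://localhost:9851"):
    """POST one command to Tile38 over HTTP and decode the JSON reply."""
    req = request.Request(
        url,
        data=command.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def matched_ids(response):
    """Extract object ids, in the order Tile38 returned them."""
    if not response.get("ok"):
        # Assumption: failed replies carry the message in an "err" field
        raise RuntimeError(f"Tile38 error: {response.get('err')}")
    return [p["id"] for p in response.get("points", [])]


# Parsing the sample response from this article (no server required):
sample = json.loads(
    '{"ok": true, "points": [{"id": "sleman", "point": '
    '{"lat": -7.6903140544890505, "lon": 110.38651657104501}}], '
    '"count": 1, "cursor": 0, "elapsed": "47.899\u00b5s"}'
)
print(intersects_command("cities", -7.7677384, 110.3771634, 0))
print(matched_ids(sample))
```

With a server running, `matched_ids(send_command(intersects_command("cities", lat, lon, 500)))` would give the list of candidate cities for a user.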

curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7828038 110.3963476 0'

It returns an empty response:

{
  "ok": true,
  "points": [],
  "count": 0,
  "cursor": 0,
  "elapsed": "36.427µs"
}

With the user located at -7.7828038, 110.3963476, it does not detect any city.

Now, add a radius of 500 meters:

curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7828038 110.3963476 500'

It detects Sleman:

{
  "ok": true,
  "points": [
    {
      "id": "sleman",
      "point": {
        "lat": -7.6903140544890505,
        "lon": 110.38651657104501
      }
    }
  ],
  "count": 1,
  "cursor": 0,
  "elapsed": "142.258µs"
}

With the user located at -7.7828038, 110.3963476 and a radius of 500 meters, it detects Sleman.

Now, add a new city, Yogyakarta:

SET cities yogyakarta OBJECT geojson from https://github.com/yusufsyaifudin/wilayah-indonesia/blob/master/data/geojson/regency/3471.geojson
Yogyakarta added successfully.

Then test the same user at Museum Affandi (-7.7828038, 110.3963476) with a radius of 500 meters:

curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7828038 110.3963476 500'

Museum Affandi (-7.7828038, 110.3963476) with a radius of 500 meters still returns Sleman as the first result, even though it has more intersected area in Yogyakarta:

{
  "ok": true,
  "points": [
    {
      "id": "sleman",
      "point": {
        "lat": -7.6903140544890505,
        "lon": 110.38651657104501
      }
    },
    {
      "id": "yogyakarta",
      "point": {
        "lat": -7.80279421806335,
        "lon": 110.3762512207
      }
    }
  ],
  "count": 2,
  "cursor": 0,
  "elapsed": "155.937µs"
}

The result wasn’t what I expected, because Yogyakarta should rank above Sleman. So I tried lowering the radius to 300, and the result was the same:

Museum Affandi (-7.7828038, 110.3963476) with a radius of 300 meters still returns Sleman as the first result, even though it has more intersected area in Yogyakarta.
Museum Affandi (-7.7828038, 110.3963476) with a radius of 300 meters using Tile38.

When I set the radius to 0, it worked as expected, with Yogyakarta as the only polygon that matches:

Museum Affandi (-7.7828038, 110.3963476) with a radius of 0 meters works as expected: it returns Yogyakarta only.

After deleting all polygon data and then inserting Yogyakarta first and Sleman second, the same query returned Yogyakarta first. Maybe this is an issue in Tile38 where insertion order matters.

curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7828038 110.3963476 500'

Response:

{
  "ok": true,
  "points": [
    {
      "id": "yogyakarta",
      "point": {
        "lat": -7.80279421806335,
        "lon": 110.3762512207
      }
    },
    {
      "id": "sleman",
      "point": {
        "lat": -7.6903140544890505,
        "lon": 110.38651657104501
      }
    }
  ],
  "count": 2,
  "cursor": 0,
  "elapsed": "151.785µs"
}

After changing the insertion order (Yogyakarta, then Sleman), Museum Affandi (-7.7828038, 110.3963476) with a radius of 500 meters returns Yogyakarta first.

Now, back to inserting Sleman then Yogyakarta, with the user at Masjid Baitul Arqom (-7.7928974, 110.3983692), which is actually located in Bantul:

curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7928974 110.3983692 500'

Again, it returns Sleman first, although we saw above that the circle has more intersected area in Yogyakarta than in Sleman.

{
  "ok": true,
  "points": [
    {
      "id": "sleman",
      "point": {
        "lat": -7.6903140544890505,
        "lon": 110.38651657104501
      }
    },
    {
      "id": "yogyakarta",
      "point": {
        "lat": -7.80279421806335,
        "lon": 110.3762512207
      }
    }
  ],
  "count": 2,
  "cursor": 0,
  "elapsed": "129.766µs"
}

Masjid Baitul Arqom (-7.7928974, 110.3983692) with a radius of 500 meters returns Sleman as the first result.

I tried lowering the radius to 230 meters, and Yogyakarta became the only match:

Masjid Baitul Arqom (-7.7928974, 110.3983692) with a radius of 230 meters returns only Yogyakarta.
Masjid Baitul Arqom (-7.7928974, 110.3983692) with a radius of 230 meters intersects only Yogyakarta.

As above, I changed the insertion order. Now I insert Yogyakarta then Sleman and run the exact same query:

curl -X POST 'localhost:9851' \
-H 'Content-Type: text/plain' \
--data-raw 'INTERSECTS cities POINTS CIRCLE -7.7928974 110.3983692 500'

And got Yogyakarta first:

{
  "ok": true,
  "points": [
    {
      "id": "yogyakarta",
      "point": {
        "lat": -7.80279421806335,
        "lon": 110.3762512207
      }
    },
    {
      "id": "sleman",
      "point": {
        "lat": -7.6903140544890505,
        "lon": 110.38651657104501
      }
    }
  ],
  "count": 2,
  "cursor": 0,
  "elapsed": "121.298µs"
}

After changing the insertion order, Masjid Baitul Arqom (-7.7928974, 110.3983692) with a radius of 500 meters returns Yogyakarta as the first result.

For now we can consider this a minor bug, because a query with a radius of 0 meters still returns a result when the point is really inside a polygon and returns nothing when it is outside. As a workaround, we can increase the radius gradually in 10-meter steps to find the nearest city, but the extra queries are costly. For an asynchronous system this may still be acceptable, but for a synchronous call it may severely hurt system performance.
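The growing-radius workaround can be sketched as a small loop. This is an illustration only: `nearest_cities` is a hypothetical helper that takes any query function returning a list of city ids, and `fake_query` stands in for a real Tile38 call with made-up thresholds (120 m and 400 m) so the example runs without a server.

```python
def nearest_cities(query, lat, lon, max_radius_m=500, step_m=10):
    """Grow the radius step by step until the query returns at least one city id.

    Returns (radius_m, ids), or (None, []) if nothing matched up to max_radius_m.
    """
    radius = 0
    while radius <= max_radius_m:
        ids = query(lat, lon, radius)
        if ids:
            return radius, ids
        radius += step_m
    return None, []


calls = []


def fake_query(lat, lon, radius_m):
    """Stand-in for an INTERSECTS call; the distances here are hypothetical."""
    calls.append(radius_m)
    ids = []
    if radius_m >= 400:
        ids.append("sleman")
    if radius_m >= 120:
        ids.append("yogyakarta")
    return ids


result = nearest_cities(fake_query, -7.7928974, 110.3983692)
print(result, len(calls))
```

Each step is one round trip to the database, so in the worst case this issues max_radius_m / step_m + 1 queries (51 with the defaults), which is why it fits asynchronous processing better than a synchronous request path.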

Conclusion

Reverse geocoding (translating latitude and longitude into a location name) is not easy to solve. But knowing the limitations of each possible solution will minimize the problem, or at least let us recognize when a wrong location is simply the expected behaviour of the system.

Yogyakarta, July 20th, 2021 20.48 GMT+7
Happy Ied Al Adha 1442 Hijri!
