User:Jules/geoipdatabases

From XPUB & Lens-Based wiki
About GeoIP databases



I have been working with a GeoIP database recently. After running into a lot of questions about the inaccuracy of geomapping IP addresses, I thought it would be nice to look at how these types of databases are made. GeoIP databases are mostly not a free service. The information mapping Internet Protocol addresses to geography is a commercial product that you can use under a certain set of conditions. I use the MaxMind GeoIP legacy database, which gives me free but partial information. If I wanted more precise information, I'd have to pay $12 per month for the country database and $90 for the city database (including geographic coordinates, postcode, etc.).

MaxMind claims that its data covers 99.9999% of the IP addresses currently in use and that its precision service has had 99.98% uptime since 2002. The precision service is advertised as being more precise and is billed per query, but it is only accessible through an API, which means that you cannot see the database, just interact with it. The terms and conditions state that it is absolutely forbidden to give access to external users. You cannot share the information, nor use it to develop a commercial product yourself.

MaxMind states that “the GeoIP2 data is updated weekly, based on insights gleaned from the MaxMind network”. The problem with that sentence is that it isn't clear what constitutes the MaxMind network, or how this gleaning can guarantee the quality of the data. One has to trust that they provide accurate data, although it is not made clear how the process involved can guarantee any level of quality. So I wondered how it is possible to make the claims that MaxMind is making. Where does the data come from? How do you gather such information? Answering those questions may give an idea of the level of trust you can place in this type of service.

Part 1- IP address distribution isn't geographically determined

Both IPv4 and IPv6 addresses are assigned in a hierarchical manner. Users are assigned IP addresses by Internet Service Providers (ISPs). ISPs obtain allocations of IP addresses from a Local Internet Registry (LIR), National Internet Registry (NIR), or Regional Internet Registry (RIR). There have been five Regional Internet Registries since AfriNIC was created in 2005. This means that addresses are not assigned on a per-country basis but per large region, and those regions can be redefined.

RIR.png

IP address ranges are allocated as needed by IANA[1] to each of the Regional Internet Registries, in accordance with global policy and the protocol assignments documented by the IETF.

In the end we get a very hierarchical schema:
Users → ISP → RIR → IANA + IETF
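
To make the schema concrete, here is a minimal sketch in Python of what such a delegation record looks like when read from the top down. Everything in it is an assumption made for illustration: the prefixes are reserved documentation ranges, and the registry, ISP and reseller names are invented. The thing to notice is that the record says who holds a block, and nothing about where its users are.

```python
import ipaddress

# Hypothetical delegation table. The prefixes are reserved documentation
# ranges and every registry/ISP name is invented for this example.
DELEGATIONS = [
    (ipaddress.ip_network("192.0.2.0/24"),
     ["IANA", "RIR-A", "ISP-One"]),
    (ipaddress.ip_network("198.51.100.0/24"),
     ["IANA", "RIR-B", "ISP-Two"]),
    # A sub-block that ISP-Two has passed on to a smaller vendor.
    (ipaddress.ip_network("198.51.100.128/25"),
     ["IANA", "RIR-B", "ISP-Two", "Reseller-Three"]),
]


def delegation_chain(address):
    """Return the chain of holders for the most specific block
    containing `address`, or None if no block matches."""
    ip = ipaddress.ip_address(address)
    matches = [(net, chain) for net, chain in DELEGATIONS if ip in net]
    if not matches:
        return None
    # The longest prefix is the most specific delegation.
    _, chain = max(matches, key=lambda m: m[0].prefixlen)
    return chain


if __name__ == "__main__":
    chain = delegation_chain("198.51.100.200")
    print(" <- ".join(reversed(chain)))
```

The lookup picks the longest matching prefix because a block can be split and handed on several times, and the smallest enclosing block is the one that says who currently holds the address.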

IP addresses are not distributed according to the geographical places where they end up being used. So how can an address be traced to a location?

Part 2- How Geographic information about IP address gets generated

Each of the RIRs maintains a whois server which can be queried to find out not only which ISP has been assigned a given netblock but, to a certain extent, which end-user it belongs to and that end-user's address. Many ISPs do not fill out this information for every single customer. Hence if you're a residential subscriber of a DSL service, it's likely that the Geo records will give the ISP's address, and not the address where the IP address is actually in use. The various GeoLocation providers mostly work by mining these whois records, so their information is derived from records that may be incomplete or incorrectly filled in. Not surprisingly, the legality of doing so is something of a grey area, as you cannot copyright facts but can argue there is creativity in the way you build up the structure that ties them together.[2]
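
As a rough illustration of what "mining whois records" might look like, here is a small Python sketch that parses the text of a whois reply and derives a location from it. This is not MaxMind's or anyone else's actual method. The sample record is invented, its field names are only loosely modelled on the key/value style of RIR whois output, and `parse_whois` and `guess_location` are hypothetical helpers.

```python
# Invented whois-style reply, for illustration only. The netblock is a
# reserved documentation range and the organisation does not exist.
SAMPLE_WHOIS = """\
% Invented record for illustration only
inetnum:        192.0.2.0 - 192.0.2.255
netname:        EXAMPLE-DSL-POOL
descr:          Example ISP residential customers
country:        NL
address:        Example ISP B.V.
address:        Rotterdam
"""


def parse_whois(text):
    """Collect 'key: value' lines into a dict of lists, skipping
    blank lines and comment lines starting with % or #."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#")):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        record.setdefault(key.strip().lower(), []).append(value.strip())
    return record


def guess_location(record):
    """Derive a location from the registrant's details. This is the
    address of whoever registered the block, which may not be where
    the addresses are actually in use."""
    country = record.get("country", [None])[0]
    address = ", ".join(record.get("address", []))
    return {"country": country, "address": address or None}


if __name__ == "__main__":
    print(guess_location(parse_whois(SAMPLE_WHOIS)))
```

For this sample the result is the ISP's office in Rotterdam for every address in the pool, whichever city the individual subscribers are in. That is the inaccuracy described above.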

Regarding "how it works": MaxMind does not provide much information, but it seems clear that the databases are to a large extent maintained manually, which would explain the prices. Another service, HostIP[3], is more transparent about the method it uses. In this case the database is free, because the data is contributed by users through an API.
In recent years the distribution of IP address space has become even more decentralized, which means small private vendors can acquire IPv4 address ranges regardless of geographic region. This is why Google acquired Urchin in 2005 (the service was discontinued in 2012): to use its technology for Google Analytics, which provides very accurate IP-to-geographic-region information.

There is always an element of risk/estimation in mapping an IP address (or a domain name) to a physical location.

  1. The IANA, Internet Assigned Numbers Authority, is the organization in charge of the global coordination of the Internet Protocol addressing systems and the Autonomous System Numbers used for routing Internet traffic. IANA also maintains the root zone for the DNS, but that is completely separate from any IP allocation functions. Domain name operations and IP addresses are distinct.
  2. http://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._Rural_Telephone_Service_Co.
  3. http://www.hostip.info/about.html