Google’s ascent is one of the most remarkable technological feats of our modern era.
Google’s rise required not only engineering prowess and a unique penchant for digital innovation, but also long-term strategy, agility, and foresight.
To stay ahead of the curve, Google has had to scale its infrastructure enormously.
- In 1999, Google spent a month indexing 50 million webpages.
- In 2012, Google could index 50 million webpages in one minute.
- In 1998, when Google was founded, it was serving 10,000 searches each day.
- By the end of 2006, Google served 10,000 searches every second. (InternetLiveStats)
Now, Google serves some 90,000 searches each second – since you’ve been reading this, Google has served millions of searches for millions of people worldwide!
Of course, many of these stats reflect how technological capacity has grown exponentially on the whole, but they also raise the question of how Google manages to do it.
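To put the figures above in perspective, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above; the 30-day month is an assumption made for the sake of round arithmetic.

```python
# Back-of-the-envelope arithmetic using only the figures quoted above.
MINUTES_PER_MONTH = 30 * 24 * 60   # assumption: a 30-day month
SECONDS_PER_DAY = 24 * 60 * 60

# 1999: 50 million pages in a month; 2012: 50 million pages in a minute.
# The page count is the same, so the speed-up is simply the ratio of the times.
indexing_speedup = MINUTES_PER_MONTH

# 1998: 10,000 searches per day; end of 2006: 10,000 searches per second.
search_growth = SECONDS_PER_DAY

# Today: roughly 90,000 searches per second.
searches_per_day = 90_000 * SECONDS_PER_DAY
seconds_per_million = 1_000_000 / 90_000

print(f"Indexing speed-up, 1999 to 2012: about {indexing_speedup:,}x")
print(f"Search volume growth, 1998 to 2006: about {search_growth:,}x")
print(f"Searches per day at 90,000 per second: {searches_per_day:,}")
print(f"Seconds needed to serve one million searches: {seconds_per_million:.1f}")
```

At 90,000 searches per second, a million searches takes only around eleven seconds, which is why a minute or two of reading adds up to millions of queries.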
What Are Data Centers?
Data centers are buildings that house data infrastructure and other technology, including the various software stacks required to run the technology.
Data centers consist of servers, data storage devices, and processor systems, plus a wide range of supporting technologies, power grids, and security systems.
Google’s data centers are unique in that they house the hardware and software required to maintain Google’s services, including Google Search and the Google search index, which is over 100,000,000 gigabytes in size.
The data center has to store this data, but also has to serve it to the user.
Data Center Technologies
Data center technologies include:
- Routers
- Servers
- Application-delivery controllers
- Software security (e.g. firewalls)
- Hardware security (e.g. biometric or passcode entrances and exits, blast-proof walls)
- Storage systems
- Switches
- Environmental control (e.g. temperature, dehumidification)
- Cooling
- Load-balancing and fault-tolerance software for managing issues and errors
- Predictive maintenance tools to measure system decline and to forecast and schedule engineering and maintenance work
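To give a feel for what load-balancing and fault-tolerance software does, here is a minimal sketch of a round-robin balancer that skips servers marked as unhealthy. The class name `RoundRobinBalancer` and the server names are hypothetical, and this illustrates the general idea only, not how Google's own tooling works.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        """Record that a server has failed a health check."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Record that a known server has recovered."""
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        """Return the next healthy server, trying each server at most once."""
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["s1", "s2", "s3"])
balancer.mark_down("s2")  # simulate a failure
print([balancer.next_server() for _ in range(4)])
```

With `s2` marked down, requests alternate between `s1` and `s3` until `mark_up("s2")` brings it back into rotation.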
How Many Google Data Centers Are There?
Google currently has 21 data center locations.
- 13 data centers in North America
- 1 data center in South America
- 5 data centers in Europe
- 2 data centers in Asia
Across these data centers, Google was estimated in 2016 to have some 2.5 million servers.
The current figure is likely much higher, especially as Google has invested considerably in providing cloud computing services to rival AWS and Microsoft Azure.
There has been considerable speculation that Google owns backup or other secret data centers, but these have not been verified to date.
It’s worth mentioning that Google’s cloud data centers are not the same as the data centers they use to serve Google services, though there is likely some cross-over.
As of 2021, Google Cloud data centers are available in 24 regions.
Google Data Center Technology
Google’s data center technology can be broken down into two broad categories: indexation and server types.
Indexation
Google says they know about some 130 trillion pages or more, but this doesn’t represent how many pages are actually in their index.
The index is different from the internet. Imagine the index as the collection of books that a librarian has read, or analyzed, and placed in the library database. Aside from these books, there are stacks of other books the librarian hasn’t indexed; they may have had a look and decided the content is not worthy of the index.
This information resides in what has come to be known as the “deep web”. Whilst the connotations of the deep web are often dark or even criminal, this really isn’t the case.
The deep web is mostly made up of unpublished, half-finished, or half-dead material, content that sits behind paywalls or sign-in credentials, content that the owners have blocked from indexing, and much more.
Google’s index works via an inverted index: rather than listing the words in each document, it lists, for each term, the documents that contain it. When you enter a query on Google, it looks up the query’s terms in the index, retrieves the relevant material from where it’s stored, and serves it to you.
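The idea of an inverted index is easy to sketch in a few lines. The toy example below maps each term to the set of document IDs that contain it, then answers a query by intersecting those sets. The function names, the whitespace tokenizer, and the sample documents are all illustrative assumptions; a production index is vastly more sophisticated.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real tokenizers are far richer."""
    return text.lower().split()


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "google data center cooling",
    2: "data center security badges",
    3: "google search index",
}
index = build_inverted_index(documents)
print(sorted(search(index, "data center")))   # [1, 2]
print(sorted(search(index, "google index")))  # [3]
```

Because the lookup goes from term to documents, answering a query never requires scanning every document, which is what makes this structure practical at scale.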
Document IDs are split into what are known as database shards, which are distributed and replicated across many servers.
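Sharding and replication can be illustrated with a small sketch. Here a document ID is assigned to a shard by a simple modulo, and each shard is copied to several servers so that a lookup can fall back to another replica if one server is down. The shard count, server count, replica count, and placement rule are all hypothetical choices for illustration; Google's actual scheme is not described in this article.

```python
NUM_SHARDS = 4    # hypothetical number of shards
NUM_SERVERS = 6   # hypothetical number of servers
REPLICAS = 3      # hypothetical number of copies of each shard


def shard_for(doc_id):
    """Assign a document ID to a shard using a simple modulo."""
    return doc_id % NUM_SHARDS


def servers_for_shard(shard):
    """Place each shard on REPLICAS consecutive servers, wrapping around."""
    return [(shard + i) % NUM_SERVERS for i in range(REPLICAS)]


def locate(doc_id):
    """Return every server that holds a copy of this document's shard."""
    return servers_for_shard(shard_for(doc_id))


def pick_server(doc_id, down=frozenset()):
    """Return the first replica that is not in the set of failed servers."""
    for server in locate(doc_id):
        if server not in down:
            return server
    raise RuntimeError("all replicas unavailable")


print(locate(10))                 # [2, 3, 4]
print(pick_server(10))            # 2
print(pick_server(10, down={2}))  # 3
```

Replication is what lets the system keep answering when individual machines fail: the lookup for document 10 still succeeds with server 2 offline.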
In 2010, Google rolled out a new indexing system called “Caffeine” that updates the search index continuously, in small portions, rather than in large periodic batches. These days, the entire search index is created by crawlers that work in tandem with indexation algorithms to locate, parse, save and rank internet data.
Google Data Center Server Types
Google uses many hardware and software technologies, some publicly known and others secret.
There are several server types:
- Web servers form the backbone of the data center’s networking tech. These execute the queries of internet users. Queries are sent from Google Search to index servers. These are coordinated with the ranking algorithms and also receive information on auto-suggest and other search features. The servers will deliver search results and all other relevant information.
- Google stores internet documents and files in document servers. These return documents relevant to queries.
- Google employs spiders and crawlers that traverse the web. Google’s crawler is known as Googlebot. This robot network crawls the web to locate new and existing web pages for potential ranking in the index.
- Other servers manage other Google services ranging from Maps to Shopping and Ads.
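The division of labor between these server types can be sketched as a toy model: a web server receives the query, asks an index server which documents match, then asks a document server for the documents themselves. The class names, method names, and sample data are hypothetical and chosen only to mirror the roles described above.

```python
class IndexServer:
    """Holds term -> document ID postings (a tiny stand-in for an index shard)."""

    def __init__(self, postings):
        self.postings = postings

    def lookup(self, term):
        return self.postings.get(term, [])


class DocumentServer:
    """Holds the stored documents, keyed by document ID."""

    def __init__(self, documents):
        self.documents = documents

    def fetch(self, doc_id):
        return self.documents[doc_id]


class WebServer:
    """Receives the user's query and coordinates the other servers."""

    def __init__(self, index_server, document_server):
        self.index_server = index_server
        self.document_server = document_server

    def handle(self, query):
        doc_ids = []
        for term in query.lower().split():
            for doc_id in self.index_server.lookup(term):
                if doc_id not in doc_ids:  # keep first-seen order, no duplicates
                    doc_ids.append(doc_id)
        return [self.document_server.fetch(doc_id) for doc_id in doc_ids]


index_server = IndexServer({"maps": [2], "google": [1, 2]})
document_server = DocumentServer({1: "Google Search help", 2: "Google Maps overview"})
web_server = WebServer(index_server, document_server)
print(web_server.handle("google maps"))
```

In this sketch a document is returned if it matches any query term, and no ranking is applied; the real pipeline adds ranking, auto-suggest, and many other stages.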
Environment and Security
Google Data Center security was hush-hush until Google published some posts on the interior of their data centers. Indeed, they’re not as secretive as they once were.
Security is provided by a multitude of entry protocols and security tiers controlled using badges. Retinal scanning and multi-stage doors are also used at some data centers.
Hardware and software security is obviously a huge priority and there are numerous in-house-developed and third-party technologies used here to safeguard both data and hardware from all forms of vulnerability and attack.
Some of the most common criticisms of data centers revolve around energy usage and environmental impact. Some data centers are thought to use billions of gallons of water daily to cool servers and other hardware.
Back in 2016, Google pledged to use 100% renewable energy, and it is one of the world’s biggest corporate investors in renewable technologies today. It’s unclear exactly how far Google has come on that pledge, though some progress has certainly been made.
How Google Serves Your Query
The data center is the brain and lifeblood of Google Search and other Google services.
Google has become phenomenally intelligent over the years and is now able to serve queries with extreme accuracy.
This involves a few steps:
- Google uses natural language processing (NLP), a field closely tied to machine learning (ML), to understand your query. The operative words you use will alter the results.
For example, ‘change’ can be an entirely different operative word to ‘replace’, despite the two being interchangeable in some contexts. Google will understand when ‘replace’ is being used differently to ‘change’ and interpret the query accordingly.
- Once Google understands your query, it will calculate the most relevant responses. This is where SEO comes into play, as Google will have already ranked responses for that query.
Google uses a priority system to serve users the best results. Optimized pages that Google thinks will help the user sit at the top of the pecking order in the search engine results pages (SERPs).
- Google will also take the context of the search into account, including your location and other personalized variables.
For example, if you’re currently in Barcelona and Barcelona vs Real Madrid is on that night, the query “football” will bring up results about this match.
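The steps above can be sketched as a toy ranking function: match the query terms against each page, start from a precomputed score, and boost pages that fit the user's context. Everything here (the `Page` fields, the scoring formula, the `location_boost` value, and the sample pages) is a made-up illustration of the idea, not Google's ranking algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    title: str
    keywords: frozenset
    base_score: float               # stand-in for a precomputed ranking score
    location: Optional[str] = None  # place the page is most relevant to, if any


def rank(query, pages, user_location=None, location_boost=0.5):
    """Return titles of matching pages, best first."""
    terms = set(query.lower().split())
    scored = []
    for page in pages:
        overlap = len(terms & page.keywords)
        if overlap == 0:
            continue  # the page shares no terms with the query
        score = overlap * page.base_score
        if user_location is not None and page.location == user_location:
            score += location_boost  # context: favor locally relevant pages
        scored.append((score, page.title))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [title for _, title in scored]


pages = [
    Page("Football rules explained", frozenset({"football", "rules"}), 1.0),
    Page("Barcelona vs Real Madrid tonight",
         frozenset({"football", "barcelona"}), 0.8, "Barcelona"),
    Page("Replace a bike tyre", frozenset({"replace", "tyre"}), 0.9),
]

print(rank("football", pages))
print(rank("football", pages, user_location="Barcelona"))
```

Without a location, the general football page comes first; for a user in Barcelona, the match page is boosted to the top, mirroring the example above.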
All of these actions are processed at Google’s data centers. They can be thought of as super-computers that are networked to the web via colossal numbers of servers.
Of course, Google’s data centers will also process other information and data related to their business, whether that’s data related to the Android mobile operating system or data related to their DeepMind AI platform.