What is the deep web?

The deep web is an umbrella term for parts of the internet that are not fully accessible through standard search engines such as Google, Bing and Yahoo. Its contents include pages that search engines have not indexed, paywalled sites, private databases and the dark web.

Every search engine uses bots to crawl the web and add the new content they find to the search engine’s index. It isn’t known how large the deep web is, but many experts estimate that search engines crawl and index less than 1% of all the content that can be accessed over the internet. The searchable content of the web is referred to as the surface web.
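
As a rough illustration of the crawl-and-index idea described above, the Python sketch below builds a tiny word index from a handful of pages. The URLs, page text and function names are made up for this example, and real search engine indexes are far more elaborate. The point is only that a page the bot never fetched cannot appear in any search result.

```python
from collections import defaultdict

def build_index(crawled_pages):
    """Map each word to the set of URLs whose crawled text contains it."""
    index = defaultdict(set)
    for url, text in crawled_pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, word):
    """Return the indexed URLs containing the word, in sorted order."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical pages a bot was able to fetch (the surface web).
crawled = {
    "example.com/news": "Public news article",
    "example.com/about": "About this public site",
}

# Hypothetical page behind a login. The bot never fetched it,
# so it is never passed to build_index (the deep web).
not_crawled = {"example.com/inbox": "Private email messages"}

index = build_index(crawled)
public_hits = search(index, "public")
private_hits = search(index, "private")
```

Here a search for "public" finds both crawled pages, while a search for "private" finds nothing, even though the inbox page exists and is reachable by its owner.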

Much of the content of the deep web is legitimate and noncriminal in nature. Deep web content includes email messages, chat messages, private content on social media sites, electronic bank statements, electronic health records (EHR) and other content that is accessible one way or another over the internet.

Any paywalled website, such as a news site or an educational content site that requires a subscription, is also off-limits to search engine bots. Fee-for-service sites like Netflix are not crawled by the bots either.

Keeping this content out of search results has some advantages. For starters, much of the content on the deep web would be irrelevant to most searches and would only make them more difficult. There is also a privacy issue: no one would want Google bots crawling their Netflix viewing history or Fidelity Investments account.

The deep web and the dark web are often conflated in public discourse. Most people don’t know that the deep web consists mostly of benign content, such as your password-protected email account, certain parts of paid subscription services like Netflix, and sites that can be accessed only through an online form. (Just imagine if someone could access your Gmail inbox by simply googling your name!) The deep web is also huge: back in 2001, it was estimated to be 400–550 times larger than the surface web, and it has been growing exponentially since then.

Also called the hidden web or invisible web, the deep web is different from the surface web, whose content can be found through search engines. Information on sites like Investopedia is part of the surface web, as it can be reached through search engines. Most experts estimate that the deep web is much bigger than the surface web. Many web pages are dynamically generated or do not have links from other sites. Because search engine bots discover new pages by following links from pages they have already indexed, they cannot find pages that nothing links to. That is why getting links from other pages is a basic principle of search engine optimization (SEO).
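
The role of links in discovery can be sketched in a few lines of Python. The page names and link graph below are invented for illustration, and the function is a simplified stand-in for a real crawler, which would fetch pages over the network. It starts from a known page and follows links outward, so a page with no inbound links is never reached.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["article-1"],
    "about": [],
    "article-1": ["home"],
    "orphan-page": ["home"],  # links out, but no other page links to it
}

def crawl(seeds, links):
    """Return every page reachable by following links from the seed pages."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

discovered = crawl(["home"], LINKS)
undiscovered = set(LINKS) - discovered
```

Starting from "home", the crawl reaches "news", "about" and "article-1", but "orphan-page" stays undiscovered because nothing points to it. Adding a single link to it from any discovered page would be enough to bring it into the crawl.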

Fee-for-service sites are another major source of deep web content. Although fee-for-service sites, such as Netflix, are visible on the web, most of their content is not. Customers must pay a fee, create a user ID, and set up a password to get most of the material offered by these sites. Only those willing and able to pay the fees can access the content. This restriction of information to paying customers goes against the egalitarian spirit of the early internet. While access to movies might seem trivial, serious research tools like JSTOR and Statista also charge fees.

Private databases are also a crucial component of the deep web. Private databases can be as simple as a few photos shared between friends on Dropbox. They also include financial transactions made on major sites like PayPal. The key feature of private databases is that their owners wish to share the information with only certain people, or to preserve it without making it publicly accessible. That makes them part of the deep web rather than the surface web.