Google Just Gets to the Tip of the Iceberg: How to Get to the Gems in the Deep Web
The Deep Web fascinates most of us and scares some of us, but is used by almost all of us. Although more and more information about the Deep Web has surfaced over the past couple of years, how to find reputable information in those depths remains shrouded in mystery. Abe Lederman, CEO of Deep Web Technologies, wrote a guest article for the Summer 2016 issue of Refer, republished in part below. Refer is an online journal published three times a year for the Information Services Group of the Chartered Institute of Library and Information Professionals (CILIP).
The Web is divided into three layers: the Surface Web, the Deep Web and the Dark Web. The Surface Web consists of several billion web sites, different subsets of which are crawled by search engines such as Google, Yahoo and Bing. The next layer, the Deep Web, consists of millions of databases or information sources, whether public, subscription-based or internal to an organization. Deep Web content is usually behind paywalls, often requires a password to access, or is dynamically generated when a user enters a query into a search box (e.g. Netflix), and thus is not accessible to the Surface Web search engines. Deep Web content is valuable because, for the most part, it contains higher quality information than the Surface Web. The bottom-most layer, called the Dark Web, gained a lot of notoriety in October 2013 when the FBI shut down the Silk Road website, an eBay-style marketplace for selling illegal drugs, stolen credit cards and other nefarious items. The Dark Web offers its users anonymity and thus is also used to express political dissent without fear of repercussion. Accessing the gems that can be found in the Deep Web is the focus of this article.
Michael Bergman coined the term “Deep Web” in a seminal white paper published in August 2001, The Deep Web: Surfacing Hidden Value. The Deep Web is also known as the Hidden Web or the Invisible Web. According to a study conducted in 2000 by Bergman and colleagues, the Deep Web was 400-550 times larger than the Surface Web, consisting of 200,000 websites, 550 billion documents and 7,500 terabytes of information. Every few years, while writing an article on the Deep Web, I search for current information on its size, and I am not able to find anything new and authoritative. Many articles that I come across, like this one, still refer to Bergman’s 2001 white paper.
Many users may not be familiar with the concept of the Deep Web. However, if they have searched the U.S. National Library of Medicine PubMed database, searched subscription databases from EBSCO or Elsevier, searched the website of a newspaper such as the Financial Times, or purchased a train ticket online, then they have been to the Deep Web.
If you are curious about what’s in the Deep Web and how to find some good stuff, here are some places you can go to do some Deep Web diving…