Deep Web Tech Blog

  • An Innovative Approach to Discovery

    I was pleasantly surprised when I woke up one recent morning to an email message from Nick Dimant, Managing Director of our partner PTFS Europe. My company and PTFS Europe were partners-in-crime in a unique (and, we hope, often repeated) collaboration at the University of the Arts, London (UAL).

    Nick had sent me a copy of “An innovative approach to discovery” (available here), a feature article in the June 2016 issue of Update, the monthly magazine of CILIP (the Chartered Institute of Library and Information Professionals), by Karen Carden, Resources & Systems Manager, Library Services, UAL, and Jess Crilly, Associate Director, Content and Discovery, Library Services, UAL.

    In their article, Carden and Crilly explain in detail their justification for, and approach to, implementing their Library Search solution, which “brings together two separate products into (what looks like) a single interface for the user where they can search across our print and e-resources.”

    Carden and Crilly also discuss why they selected Explorit Everywhere! back in 2013 (a passage I of course love):

    “After a great deal of research, discussion and testing we opted for an unusual – especially in the UK – next generation federated search tool. Like most libraries in the sector we had experienced first generation federated search, but found that this was quite a different experience.”

    The authors describe UAL as a specialist university. To me, this means that because UAL focuses on the arts, many of the databases it subscribes to are not mainstream databases. They are therefore not included in the Discovery Services, but Explorit Everywhere! can easily federate them.
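    To make the difference concrete: a discovery service answers queries from a central index built in advance, while a federated search tool sends each query to the live sources at search time, so a niche arts database only needs a connector rather than a place in someone else's index. The Python sketch below is a minimal illustration of that fan-out and merge step under simplifying assumptions: the source names, records and relevance scores are hypothetical, and `query_source` substitutes a substring match for a real connector's network call. It is not how Explorit Everywhere! is implemented.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    source: str
    title: str
    score: float  # hypothetical relevance score assigned by the source


async def query_source(name: str, records: list[tuple[str, float]], query: str) -> list[Result]:
    """Stand-in for one live connector; a real one would send an HTTP request here."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    q = query.lower()
    return [Result(name, title, score) for title, score in records if q in title.lower()]


async def federated_search(sources: dict[str, list[tuple[str, float]]], query: str) -> list[Result]:
    """Send the query to every source at once, then merge the answers."""
    batches = await asyncio.gather(
        *(query_source(name, records, query) for name, records in sources.items())
    )
    merged = [result for batch in batches for result in batch]
    # Simplification: assumes scores are comparable across sources.
    return sorted(merged, key=lambda r: r.score, reverse=True)


# Hypothetical sources and records, for illustration only.
sources = {
    "ArtDB": [("Michelangelo and the Sistine Chapel", 0.9), ("Bauhaus posters", 0.4)],
    "DesignIndex": [("Michelangelo's David", 0.7)],
}
for r in asyncio.run(federated_search(sources, "michelangelo")):
    print(r.source, r.title, r.score)
```

    Because `asyncio.gather` runs the connectors concurrently, the total wait is roughly that of the slowest source rather than the sum of all of them. A production system would also need timeouts, deduplication and a ranking merge that does not assume every source scores results on the same scale.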

    In Federating the Unfederatable we give another example of a specialist library, this time at a university focused on defense and international policy, where Explorit Everywhere! provides one-stop discovery of all the sources important to the library’s patrons, many of which are not available through the Discovery Services.

    If you’d like to read more about our Explorit Everywhere! solution at UAL, check out these blog articles: Customer Corner – Paul Mellinger presentation, Promoting Explorit Everywhere! at UAL, and Faceted Navigation – UAL example.


  • Faceted Navigation – Explorit Everywhere!’s Coolest New Feature

    In a January Sneak Peek blog article, Darcy gave us a preview of some of what my engineers were working on. Now I am excited to present faceted navigation, one of the coolest features we have ever added to Explorit Everywhere!

    In an excerpt from Peter Morville’s Search Patterns (a classic on designing effective search-focused user interfaces, published in 2010), Morville quotes Professor Marti Hearst of UC Berkeley as saying:

    “Faceted Navigation is arguably the most significant search innovation of the past decade.”

    Morville describes faceted navigation simply: “It features an integrated, incremental search and browse experience that lets users begin with a classic keyword search and then scan a list of results” (p. 95).

    Our faceted navigation, combined with our clustering technology, offers researchers a more refined way to zoom in on the most relevant results for their search. The cluster facets show terms related to the search query, and the researcher can narrow the results by selecting a Topic. Once a Topic is selected, the clusters are regenerated from the results associated with that Topic, and the researcher is presented with new facets related only to the selected Topic. This cuts out the noise and lets the user review very specific results.
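    The narrowing loop just described can be sketched in a few lines of Python. This is a simplified illustration, not our clustering engine: it assumes each result already carries a list of topic labels (a hypothetical `topics` field) and simply counts those labels, whereas Explorit Everywhere! derives its clusters from result metadata. The step it does show faithfully is that facets are recomputed from the narrowed subset rather than from the original result set. The records are toy data.

```python
from collections import Counter


def topic_facets(results: list[dict], top_n: int = 10) -> list[tuple[str, int]]:
    """Count how many results carry each topic label, most common first."""
    # dict.fromkeys de-duplicates labels within one record while keeping their order.
    counts = Counter(t for r in results for t in dict.fromkeys(r["topics"]))
    return counts.most_common(top_n)


def drill_down(results: list[dict], topic: str, top_n: int = 10):
    """Keep only results tagged with `topic`, then regenerate facets from that subset."""
    narrowed = [r for r in results if topic in r["topics"]]
    # The selected topic tags every narrowed result, so this sketch drops it from the new facets.
    facets = [(t, n) for t, n in topic_facets(narrowed, top_n + 1) if t != topic][:top_n]
    return narrowed, facets


results = [
    {"title": "Sistine Chapel ceiling", "topics": ["Sistine Chapel", "Ceiling Frescos"]},
    {"title": "The Last Judgment", "topics": ["Sistine Chapel", "Fresco"]},
    {"title": "David in Florence", "topics": ["David", "Sculpture"]},
]
narrowed, facets = drill_down(results, "Sistine Chapel")
print(len(narrowed), facets)  # 2 [('Ceiling Frescos', 1), ('Fresco', 1)]
```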

    Let’s now take a look at how faceted navigation works on one of our customer solutions at the University of the Arts, London.

    We will start with a search for “Michelangelo”, which returns 2,785 results (see Figure 1 above). In the Topics list of clusters, we can see several related topics: Art, Artist, Design, Sistine Chapel, David, Analysis, and so forth. These topics were derived from the metadata of the results returned by the 50 databases searched simultaneously.
    When I select the topic facet Sistine Chapel, the cluster facets are regenerated from the 81 results for that topic (see Figure 2). With this new view of the selected results, we now see more specific topics related primarily to Michelangelo’s Sistine Chapel. While the topic Ceiling Frescos looks interesting, I am curious to focus on the images under the Document Type facet.

    As the researcher explores the results, our faceted navigation generates “bread crumbs” that record each step of the drill-down. In Figure 3, we see the trail of selections made so far. Clicking on > Sistine Chapel lets me step back up, and then step down into Ceiling Frescos when I want to. See Figure 4 below for some of the interesting images I found of Michelangelo’s Sistine Chapel.
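    One natural way to model the bread crumb trail is as a stack of facet selections: drilling down pushes a step, and clicking an earlier crumb truncates everything after it. The Python sketch below assumes exactly that model. The `Breadcrumbs` class, its method names and the "Images" facet value are hypothetical, and the real interface may track more state (for example, the result set at each step).

```python
from dataclasses import dataclass, field


@dataclass
class Breadcrumbs:
    """Hypothetical model of a drill-down trail: keyword query plus facet selections."""
    query: str
    steps: list[tuple[str, str]] = field(default_factory=list)  # (facet, value) pairs

    def select(self, facet: str, value: str) -> None:
        """Drilling down one level pushes a new step onto the trail."""
        self.steps.append((facet, value))

    def back_to(self, index: int) -> None:
        """Clicking crumb `index` keeps steps[0..index]; -1 returns to the bare query."""
        del self.steps[index + 1:]

    def render(self) -> str:
        return " > ".join([self.query] + [value for _, value in self.steps])


trail = Breadcrumbs("Michelangelo")
trail.select("Topic", "Sistine Chapel")
trail.select("Document Type", "Images")
print(trail.render())  # Michelangelo > Sistine Chapel > Images
trail.back_to(0)       # click "> Sistine Chapel" to step back up
trail.select("Topic", "Ceiling Frescos")
print(trail.render())  # Michelangelo > Sistine Chapel > Ceiling Frescos
```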


    If you’d like to learn more about Explorit Everywhere! clustering, check out these two blog articles: Clusters that Think and Explorit Now Features Visual Clustering.


  • Google Just Gets to the Tip of the Iceberg: How to Get to the Gems in the Deep Web

    The Deep Web fascinates most of us and scares some of us, but is used by almost all of us. Although more and more information about the Deep Web has surfaced over the past couple of years, finding reputable information in its depths is still shrouded in mystery. Abe Lederman, CEO of Deep Web Technologies, wrote a guest article for the Summer 2016 issue of Refer, republished in part below. Refer is an online journal published three times a year for the Information Services Group of the Chartered Institute of Library and Information Professionals (CILIP).


    The Web is divided into three layers: the Surface Web, the Deep Web and the Dark Web. The Surface Web consists of several billion web sites, different subsets of which are crawled by search engines such as Google, Yahoo and Bing. The next layer, the Deep Web, consists of millions of databases or information sources, whether public, subscription-based or internal to an organization. Deep Web content is usually behind paywalls, often requires a password, or is dynamically generated when a user enters a query into a search box (e.g. Netflix), and thus is not accessible to Surface Web search engines. This Deep Web content is valuable because, for the most part, it contains higher-quality information than the Surface Web. The bottom-most layer, the Dark Web, gained a lot of notoriety in October 2013 when the FBI shut down the Silk Road website, an eBay-style marketplace for selling illegal drugs, stolen credit cards and other nefarious items. Because the Dark Web guarantees anonymity, it is also used for political dissent without fear of repercussion. Accessing the gems that can be found in the Deep Web is the focus of this article.

    Michael Bergman coined the term “Deep Web” in a seminal white paper, The Deep Web: Surfacing Hidden Value, published in August 2001. The Deep Web is also known as the Hidden Web or the Invisible Web. According to a study conducted in 2000 by Bergman and colleagues, the Deep Web was 400 to 550 times larger than the Surface Web, consisting of 200,000 websites, 550 billion documents and 7,500 terabytes of information. Every few years, when writing an article on the Deep Web, I search for current information on its size, and I’m not able to find anything new and authoritative. Many articles I come across, like this one, still refer to Bergman’s 2001 white paper.

    Many users may not be familiar with the concept of the Deep Web. However, if they have searched the U.S. National Library of Medicine’s PubMed database, searched subscription databases from EBSCO or Elsevier, searched the website of a newspaper such as the Financial Times, or bought a train ticket online, then they have been to the Deep Web.

    If you are curious about what’s in the Deep Web and how to find some good stuff, here are some places you can go to do some deep web diving…

    Read more at Refer Summer 2016

  • Extending our Account Management Services

    The Professional Services department at Deep Web Technologies is extending its Account Management efforts to help our customers get the most out of their Explorit Everywhere! (EE!) service. As with almost any software service, the basic features are easy to use, and EE! is no exception: everyone who has EE! can run searches, review results, and create alerts with ease. But we all know that to really get the most out of any technology, it needs to become part of our habits. For EE!, that means having it available wherever and whenever you have a question or an information need. After all, while Google has become a habit for many, EE!, with its high-quality, authoritative, and often expensive sources purchased by our customers for their users, needs to be the habit for those information needs and questions that Google can’t easily answer.

    Explorit Everywhere! has several features that can help customers integrate their search application into their website. That way it’s there for their users wherever they need it. And folks at DWT want to help. Three quick and easy ways to better integrate your EE! searcher include:

    Another part of DWT’s Account Management campaign is helping our customers fine-tune their Explorit Everywhere! service so that the quality of results meets their users’ needs. As we all know, there are hundreds, if not thousands, of information databases, and they all perform differently and provide a wide range of data. The advantage of EE! is its ability to bring the results from multiple databases together. So, given that your databases may be a mix of medical, news, and journal databases or other informational websites, we can help you tune your application so that your searches display the most meaningful data.

    In the coming months, here are a couple of different things we will be doing behind-the-scenes to make your EE! searcher work better for your users:

    • We recently added the means to fine-tune how the first page of results is generated. As part of our Account Management services, we will be reviewing your EE! service to ensure that those first results are the highest-ranking results.
    • In addition to monitoring your connectors daily to make sure they are working, we will be doing periodic reviews to make sure we are optimizing them for quality results. Sources often add new search fields and new results data. For example, we have been incorporating the ability to search by journal DOI or PubMed ID in the Full Record field in EE!, which includes the Quick Search box. Our goal is to make sure your EE! service makes the most of each source.
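    Supporting identifier searches in a quick-search box usually starts with recognizing what the user typed. The Python sketch below is a hypothetical illustration, not EE!’s actual logic: it uses a pattern adapted from the regular expression Crossref suggests for matching modern DOIs, and it assumes a bare string of one to eight digits is a PubMed ID, a heuristic that fits current PMIDs but is only a guess about intent. The function name `classify_query` is made up for this example.

```python
import re

# Adapted from the pattern Crossref suggests for modern DOIs; it covers most, not all, DOIs.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
# Assumption: a PMID is a bare positive integer of at most 8 digits.
PMID_RE = re.compile(r"^[1-9]\d{0,7}$")

DOI_PREFIXES = ("https://doi.org/", "doi:")


def classify_query(query: str) -> str:
    """Return 'doi', 'pmid' or 'keywords' for a quick-search string (hypothetical helper)."""
    q = query.strip()
    for prefix in DOI_PREFIXES:
        if q.lower().startswith(prefix):
            q = q[len(prefix):]
            break
    if DOI_RE.match(q):
        return "doi"
    if PMID_RE.match(q):
        return "pmid"
    return "keywords"


for q in ["10.1000/182", "doi:10.1000/182", "12345678", "Michelangelo"]:
    print(q, "->", classify_query(q))
```

    One consequence of this heuristic worth noting: a bare number such as a year ("2016") would be routed as a PMID, so a real implementation might search it both ways rather than trust the guess.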

    Lastly, we will be asking our customers to fill out a short survey to encourage more discussion, so we can better understand how to make Explorit Everywhere! the best service possible. We also look forward to having more conference calls with our customers. Ultimately, we want our customers’ users to get the most out of EE! in fulfilling their information needs.

  • Unique Chem Safety Gateway Developed in Collaboration with Stanford University Libraries

    Abe’s Note:

    My staff and I have had the pleasure of working closely with Grace Baysinger, Head Librarian and Bibliographer of the Swain Chemistry and Chemical Engineering Library at Stanford University, to develop a unique research gateway focused on chemical safety.

    Photo credit: Kenneth Chan

    My relationship with Grace goes back two decades, to when I developed SciSearch@LANL, a precursor to Web of Science, for Los Alamos National Laboratory; Grace was our customer representative at Stanford.

    More recently, we have worked closely with Grace on the development of xSearch (Stanford’s name for Explorit Everywhere!), our largest federated search implementation.

    I have asked Grace to give us an overview of this important chemical safety resource that we have developed together.

    While chemists are among the most intensive users of information, many are unfamiliar with the chemical safety resources they should consult before working in the lab. Chemists consulting material safety data sheets or safety data sheets (MSDS/SDS) discover that these often list “NA” (not available) for physical properties they need for their lab work.

    Grace’s goal in working with Deep Web Technologies was to develop a research gateway that provides access to a wide collection of information sources focused on chemical safety. The gateway uses federated search technology (the ability to search multiple sources at one time), which helps users find the information they need more effectively and efficiently. Users can view results visually, jump to a particular resource in the search results, and set up an alert to be notified when new information is published on a topic. Common search terms include chemical names, CAS Registry Numbers, and topic keywords. If a resource contains InChI or SMILES values for a chemical substance, these may be used as search terms too.
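    A CAS Registry Number carries its own check digit, so a search tool can recognize one before deciding how to route a query. The Python sketch below validates the format and the check digit using the published CAS rule: multiply each digit except the check digit by its position counting from the right, starting at 1, and compare the sum modulo 10 with the check digit. Using this to route queries is an assumption for illustration, not a description of the Chem Safety Gateway's internals; `is_valid_cas_rn` is a hypothetical helper name.

```python
import re

# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(text: str) -> bool:
    """Return True if `text` is a well-formed CAS Registry Number with a correct check digit."""
    match = CAS_RE.match(text.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


for cas in ["7732-18-5", "64-17-5", "7732-18-4", "water"]:
    print(cas, is_valid_cas_rn(cas))  # water and ethanol pass; the altered number and the word fail
```

    For water (7732-18-5), the weighted sum is 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5, which matches the check digit. InChI and SMILES strings need different handling and are not covered here.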

    Moving soon from prototype to production, the Stanford University version of the chem safety gateway will be a collaborative effort between the Stanford University Libraries and Stanford Environmental Health and Safety. This gateway has 60+ information sources, including SDS/MSDS, safety data, syntheses and reactions databases, citation databases, full-text eBooks and eJournals, plus a number of Environmental Health & Safety (EH&S) websites. While the SDS/MSDS and safety data resources form the core of this collection, curated databases such as Organic Syntheses, Organic Reactions, Science of Synthesis, Merck Index, Reaxys, and e-EROS (Encyclopedia of Reagents for Organic Synthesis) include protocols and safety information that are useful to bench chemists. eBooks and eJournals are full-text searchable, allowing researchers to find property and safety information in handbooks, and methods and protocols in journal articles. EH&S websites from selected universities, plus websites for the ACS Committee on Chemical Safety, the ACS Division of Chemical Health and Safety, and the U.S. Chemical Safety Board, will help users discover information such as training materials, standard operating procedures, and lessons learned. Search results for a chemical name search also include the “Chemical Box” from Wikipedia in the right column.

    At the ACS Spring 2016 National Meeting held in San Diego, Grace and colleagues from Stanford’s EH&S unit gave a presentation, “Using a chemical inventory system to optimize safe laboratory research”, in a Division of Chemical Health and Safety symposium. The first part of the presentation covers ChemTracker, and the latter part (starting on slide 23) shows screenshots of the Stanford Chem Safety Gateway. Slide 24 lists the resources being searched in the Stanford gateway. For a current list of resources, please see Grace’s recent blog entry, Chemical safety resource gateway available.

    Grace then helped the DWT team develop a public version of the Chem Safety Gateway that is available to test-drive at:

    This public site searches a subset of the sources searched at the Stanford site, because DWT is not able to search subscription sources through its public site.

    Please test-drive the public version of the Explorit Everywhere! Chem Safety Gateway and give Abe feedback on how useful it is to be able to search a broad set of chemical safety resources at the same time. Be sure to register (registration is not required to search) to use the Alerts and MyLibrary features. Did the gateway return relevant results? What sources (subscription or public) would you add to make it even better? Do you have any other suggestions for improving the gateway?

    Please email your feedback to

  • Let’s take a look at some really BIG “big data”

    A couple of months ago I came across the claim that we are generating 2.5 billion GB of new data every day, and thought that I should write a fun little blog article about it. Here it is.

    This claim, repeated by many, is attributed to IBM’s 2013 Annual Report. In this report, IBM claims that in 2012, 2.5 billion GB of data was generated every day, and that 80% of this data is unstructured, with audio, video, sensor data and social media among the newer contributors to this deluge. IBM also claims in the report that by 2015, 1 trillion connected objects and devices will be generating data across our planet.

    So how big is a billion GB? A billion GB is an Exabyte (a 1 followed by 18 zeros), i.e., 1000 petabytes or 1,000,000 terabytes.
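    These conversions are easy to double-check. The short Python snippet below assumes decimal (SI) units, where each step up is a factor of 1,000; with binary units (GiB, EiB and so on) the numbers would differ slightly.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

assert 1_000_000_000 * GB == EB   # a billion GB is one exabyte
assert EB // PB == 1_000          # ...which is 1,000 petabytes
assert EB // TB == 1_000_000      # ...or 1,000,000 terabytes

daily_bytes = 2_500_000_000 * GB  # IBM's 2.5 billion GB per day
print(daily_bytes / EB, "EB per day")  # 2.5 EB per day
```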

    My research took me to the Scientific American article “What is the Memory Capacity of the Human Brain?”, which I read to try to put all the huge numbers I’ve been throwing around into context. Professor of Psychology Paul Reber estimates that:

    The human brain consists of about one billion neurons. Each neuron forms about 1,000 connections to other neurons, amounting to more than a trillion connections. If each neuron could only help store a single memory, running out of space would be a problem. You might have only a few gigabytes of storage space, similar to the space in an iPod or a USB flash drive. Yet neurons combine so that each one helps with many memories at a time, exponentially increasing the brain’s memory storage capacity to something closer to around 2.5 petabytes (or a million gigabytes).

    So, if I’m doing my math right, the 2.5 billion GB of information generated daily could be stored in 1,000 human brains, and human memory capacity is still way higher than our electronic storage capacity.
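    The math does check out. Working in decimal units, Reber’s estimate of about 2.5 petabytes per brain is 2.5 million GB, so the daily 2.5 billion GB divides evenly into 1,000 brains, as this small Python check shows.

```python
GB = 10**9
PB = 10**15

daily_data = 2_500_000_000 * GB     # IBM: 2.5 billion GB generated per day
brain_capacity = 25 * PB // 10      # Reber: about 2.5 PB per human brain

assert brain_capacity == 2_500_000 * GB   # 2.5 PB is 2.5 million GB
brains_needed = daily_data // brain_capacity
print(brains_needed)  # 1000
```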

    My research then took me to this interesting 2015 blog article, Surprising Facts and Stats About The Big Data Industry. Some of the facts and stats I found most interesting in its infographic include:

    1. Google is the largest ‘big data’ company in the world, processing 3.5 billion requests per day, storing 10 Exabytes of data.
    2. Amazon hosts the most servers of any company, estimated at 1,400,000 servers with Google and Microsoft close behind.
    3. Amazon Web Services (AWS) are used by 60,000 companies and field more than 650,000 requests every second. It is estimated that 1/3 of all Internet users use a website hosted on AWS daily, and that 1% of all Internet traffic goes through Amazon.
    4. Facebook collects 500 terabytes of data daily, including 2.5 billion pieces of content, 2.7 billion likes and 300 million photos.
    5. 90% of all the data in the world was produced in the last 2 years.
    6. It is estimated that 40 Zettabytes (40,000 Exabytes) of data will be created by 2020.

    Another interesting infographic, on how much data some of our favorite web applications generated every minute in 2014, is available at Data Never Sleeps 2.0.

    I would be remiss if I didn’t tie my blog article to what we do at Deep Web Technologies. So please take a look at our marketing piece, Take on Big Data & Web Sources with Deep Web Technologies. We’d love to hear from you and explore how we can feed content and data from a myriad of disparate sources to your big data analytics engine on the back end, as well as how we can enhance the insights derived by your big data solutions by providing real-time access to content that complements them.

  • The Doctor is in the House

    Congratulations to Dr. Ellen Wilson (Ellee), our VP of Professional Services and Engineering and a new PhD recipient. Ellee walked in the PhD ceremony at her alma mater, Pacifica Graduate Institute, on Sunday, May 29, no doubt breathing a huge sigh of relief as she wrapped up this milestone achievement.

    Deep Web Technologies hires smart people; we have smart engineers, smart project managers, smart leaders. But Ellee just may be at the top of the list. While her day job may be to guide the ebb and flow of DWT’s team, projects, and software development, her real passion is depth psychology, the exploration of theories and ideas delving into the relationship between consciousness and the unconscious. 

    Ellee’s dissertation didn’t just add a new set of letters after her name; her line of exploration is unique in the realm of depth psychology. She presents credible research and ideas about how the world imprints itself on people, tying the order of mathematics, throughout history and in the present day, to the chaos of a more sensory-based world. Are you interested in how the development of non-Euclidean geometry created a diversion from the linear focus of its time and contributed to a multiplistic expression of human thought and experience? So is Dr. Ellen Wilson.

    Don’t think that receiving her PhD is the end of the line for Ellee. From a very young age, Ellee has been driven to explore what makes us tick. She has multitudinous degrees, ranging from Women’s Studies, Mathematics, and Computer Engineering to Library and Information Science, and now, Depth Psychology. Her PhD may just be her intellectual fulcrum, harnessing her past and funneling it into a rich, erudite future.

    DWT is proud to have Ellee on our staff.

  • Include Libraries in Learning Management Systems, Don’t EXCLUDE Them!

    Learning Management Systems (LMS) are web-based technologies used to offer online classes. The most common LMS are Blackboard, Canvas, Moodle, and Brightspace (previously known as Desire2Learn). And there are many more edging into the market, because the idea of providing an online educational portal is really taking off. It’s one of the fastest-growing markets out there!

    These LMS are used not only for online classes but also in traditional face-to-face classes, as an online portal for managing course materials, hosting bulletin boards for online discussions, and turning in assignments. Given this growing, ubiquitous need for every classroom (including K-12, corporate, etc.) to cultivate an online presence, you would think that one of the central features of these LMS would be easy incorporation of the library into their online portals, since every learning setting has a library. But alas, this is far from the truth. In fact, it’s just the opposite. (See Libraries in the Learning Management System, Farkas, 2015)

    Most of these LMS have no easy user interface modules that allow instructors to include library tools such as access to librarians (e.g., Ask a Librarian) or subject-focused web pages (e.g., LibGuides), much less a module for students to search the school catalog and the numerous available research databases. Often the best instructors can offer is a link to the library website as an additional resource. The goal, however, should be to bring the library into these online portals, not to make students leave them.

    At Deep Web Technologies, we have been researching how to improve this situation for our customers, since the primary goal of our federated search tool, Explorit Everywhere!, is to help librarians give students easy, quick, and direct access to their subscription content. We found that most LMS that do have tools to support third-party content charge extra for the features needed to incorporate library materials. Then, after the school adds these LMS features, the librarian has to do further work to get access to the course management interfaces and to develop these modules. Once all of this is in place, librarians will most likely need to work with instructors to coordinate adding library tools to their course management pages. Obviously, such an ordeal does not scale to the hundreds of instructors at larger institutions.

    Every LMS should automatically support incorporating the library as one of its base features. In any LMS, it should be easy for an instructor to configure an Explorit Everywhere! search widget so students have easy access to the library’s catalog and subscription content. Moreover, the instructor can use the Explorit Everywhere! Search Builder to customize the search widget to search specific course-related databases. Librarians work hard to make their libraries useful and patron-friendly. Learning Management Systems need to work with them, not put up more barriers.

  • Promoting Explorit Everywhere! at the University of the Arts London

    This past weekend I was writing an email message to one of our new customers. As promised, I was sending this customer some example uses of Search Builder, our Explorit Everywhere! tool that makes it easy to create custom search widgets that can be dropped onto any web page.

    I started my quest for Search Builder widgets by going to the library website of the University of the Arts, London (UAL). UAL is one of our favorite customers, in part because they do a lot to promote Explorit Everywhere! to the faculty and students across their 6 colleges.

    I digress, but last December (2015) I had the pleasure of attending the User Conference of one of our partners, PTFS Europe, where Paul Mellinger, Discovery Manager at the UAL Library, gave a wonderful presentation on their implementation of Explorit Everywhere! He followed it up with a guest article in the Customer Corner section of our company blog.

    Back to the task at hand: I went to UAL’s Library Subject Guides page and found that they have created 39 subject guides on subjects such as Beauty and Cosmetic Sciences, Graphic Design and Jewellery, paralleling the way they have organized the sources that we federate for them. If you open many of these subject guides and click on the e-Resources tab, you’ll see an Articles Plus Search widget at the top left of the page (Articles Plus is what UAL calls Explorit Everywhere!). Each of the widgets on the various e-Resources pages searches a different, appropriate subset of all the sources that we federate for them.

    Then, to top it off, I came across the Articles Plus Help subject guide (which I had not seen before), a wonderful guide nicely organized into tabs for Simple Search, Advanced Search, Search Results and Tools and Navigation. Also recently added to this subject guide is a three-and-a-half-minute video tutorial on Articles Plus. This is why we love UAL!!

  • Is Google Good Enough for Medicine?

    Every morning I wake up to a number of Alerts generated by several of our portals, including Biznar and Mednar. Yesterday morning, one alert titled “Is Google good enough for Medicine” caught my attention.

    In their editorial commentary in the Journal of Neurology, Neurosurgery and Psychiatry, three medical professionals from Down Under discuss how Google (now a verb in the Oxford English Dictionary) is changing the way doctors practice medicine.

    Here’s one anecdote that Dr. Cindy Shin-Yi Lin and her colleagues relate that I found interesting (and scary):

    “In a recent letter, a rheumatologist describes a scene at rounds where a professor asked the presenting fellow to explain how he arrived at his diagnosis, ‘I entered the salient features into Google, and [the diagnosis] popped right up’.”

    The authors of the editorial also write:

    “Most clinicians will be familiar with the increasingly frequent scenario of a patient entering the consult room with a sizeable stack of printed webpages containing symptoms, pictures and a dreaded list of potential (and often grave) diagnoses that will undoubtedly commit the clinician to an arduous task of analyzing (and not infrequently, refuting) this information with the ‘cyberchondriac’ patient.”

    If this topic interests you, check out the blog article we published last year, Relying on Google for Science Information is Bad for your Health. And if you want to bring more authoritative stacks of paper to your physician on your next visit, try out our freely available medical research site, Mednar, which searches 40+ sources of quality medical information all at the same time.

    Oh, and I love that mug!!
