FAQs

Frequently Asked Questions

Numerous myths and misunderstandings exist about federated search. For example, how does a real-time search differ from an indexed search, and why does the difference matter? At Deep Web Technologies, we welcome your questions, and we are confident that once you see the value of our technology, Explorit Everywhere!, and understand our offerings, you’ll choose Deep Web Technologies.

We invite you to read the questions and answers below and to contact us with questions we didn’t cover.

The Deep Web and Federated Search
What is the Deep Web?
What is federated search?
How does federated search work?
What is the relationship between the Deep Web and federated search?
Why is it so difficult to search the Deep Web?
How is federated search different from metasearch?
Why do I need federated search? Can’t I harvest/crawl/index everything like Google does?
What type of content can I find in the Deep Web via federated search that Google won’t provide?
Explorit Everywhere! and My Business
How does Explorit Everywhere! fit in with enterprise search?
How does Explorit Everywhere! improve my bottom line?
How does Explorit Everywhere! connect me to my knowledge?
Technical Questions about Explorit Everywhere! Federated Search
What is a connector?
How do you handle duplicates?
Why is Explorit Everywhere! faster than other federated searches?
What are “incremental results”?
What is “Collection Status”?
What is clustering?
How do you access subscription content that requires authentication?
How do you handle advanced searches if a particular source doesn’t support all the fields filled in?
How do you perform relevance ranking?
Where would my application run? Would you host it? Would I host it? Who monitors and maintains it?
What are alerts and how do they work?

The Deep Web and Federated Search

What is the Deep Web?

The Deep Web is the set of websites and documents that cannot be reached by crawler-based search engines such as Google. Deep Web content typically lives inside databases and is accessed through search forms. Read more about the Deep Web vs. the Dark Web. Deep Web Technologies is not involved with searching the Dark Web.

What is federated search?

Federated search is a powerful way to comprehensively search multiple databases in real time. Instead of crawling and indexing static content the way Google, Bing, and other popular search engines do, Explorit Everywhere! federated search queries selected, high-quality collections simultaneously. Although this usually takes a few seconds longer, it returns results that are current and come from trusted sources. For instance, federated search helps researchers avoid outdated articles and spam, so they can focus on the most pertinent information. Federated search can also reach private collections and other sources that cannot be indexed (this is more common than you might imagine). While federated search comes in a variety of sizes and shapes, Explorit Everywhere! by Deep Web Technologies is designed to help serious researchers search their known resources simultaneously. Wikipedia has a good article about federated search.

How does federated search work?

Federated search engines, such as Explorit Everywhere!, use software “connectors” to access information sources. A connector is a piece of code that takes the user’s search query, transforms the search terms to match its content source’s requirements, submits the query to that source, and returns the results. The engine runs the connectors for all selected sources simultaneously. Explorit Everywhere! takes federated search one step further: when the search results come back from each of the sources, Explorit Everywhere! merges them into a single set of results pages with a unified look and feel. From there, the user can apply filters, clusters, and sorts to narrow the results to the critical information.
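The dispatch-and-merge flow can be sketched in Python. This is a minimal illustration, assuming each connector is a plain function that takes the query string and returns a list of results; the `Result` type, the `federated_search` function, and the two stand-in connectors are hypothetical and do not reflect Explorit Everywhere!’s actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    title: str
    url: str
    source: str


# Hypothetical connector signature: query in, results from one source out.
Connector = Callable[[str], List[Result]]


def federated_search(query: str, connectors: Dict[str, Connector]) -> List[Result]:
    """Send the query to every connector at once and merge the result lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        # Each connector runs in its own thread, so sources are searched simultaneously.
        futures = [pool.submit(connector, query) for connector in connectors.values()]
        merged: List[Result] = []
        for future in futures:
            merged.extend(future.result())
    return merged


# Stand-in connectors for illustration only; real ones would call remote sources.
def journals_connector(query: str) -> List[Result]:
    return [Result(f"{query} in journals", "https://example.org/j/1", "journals")]


def patents_connector(query: str) -> List[Result]:
    return [Result(f"{query} patent", "https://example.org/p/7", "patents")]


if __name__ == "__main__":
    for r in federated_search("solar cells", {"journals": journals_connector,
                                              "patents": patents_connector}):
        print(r.source, "|", r.title)
```

A fuller engine would also de-duplicate and rank the merged list and would not let one slow source hold up the others.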

What is the relationship between the Deep Web and federated search?

Although federated search technically refers to the simultaneous search of multiple content sources regardless of how the content is accessed, in practice federated search is often performed on Deep Web content sources such as legal, medical, academic, and multilingual sources of information. Deep Web sources are usually not indexed by popular search engines, so they must either be searched individually, one by one, or be searched simultaneously by a search engine like Explorit Everywhere!

Why is it so difficult to search the Deep Web?

Searching the Deep Web is difficult because each source has its own method of access. The search engine must be configured to search each source properly and to process its unique results. Additionally, sources change their access methods from time to time, requiring modification or rewriting of the interface to that source. Some sources are especially difficult to search because they require the use of cookies, sessions, and authentication mechanisms.

How is federated search different from metasearch?

Metasearch typically refers to a search engine that searches other search engines. However, some people use the term “metasearch” synonymously with “federated search.” Other names for this kind of single search across multiple sources include distributed search, broadcast search, unified search, data fusion, parallel search, cross-database search, single search, one-search, integrated search, and universal search.

Why do I need federated search? Can’t I harvest/crawl/index everything like Google does?

In some instances, Deep Web sources can be indexed and searched in a Google-like manner. However, a vast number of Deep Web sources are not available to crawler-based search engines. The number of Deep Web content sources available to indexers is relatively small, because it depends on content vendors allowing an indexer to crawl their data. Federated search engines, such as Explorit Everywhere!, search Deep Web sources more comprehensively without the use of an index. Explorit Everywhere! connects directly to each Deep Web content source to perform a real-time search, without relying on an index to store data. Because results come straight from the source, they match what the source currently holds, and because no index has to be built first, setup is faster.

What type of content can I find in the Deep Web via federated search that Google won’t provide?

There are many scientific, technical, business, and multilingual databases whose contents are not available to Google. Many of these are subscription or premium databases; others may be located within an organization, behind its firewall, or may be private databases requiring user authentication. Google only indexes publicly available websites and databases that have agreed to be crawled. Explorit Everywhere! integrates with authentication schemes and can access subscription databases through specialized connectors.

Explorit Everywhere! and My Business

How does Explorit Everywhere! fit in with enterprise search?

As knowledge workers demand applications that can search more sources of content from fewer search pages, it is only a matter of time before the distinction between federated search and enterprise search disappears. Deep Web Technologies’ Explorit Everywhere! federated search applications can be configured with custom connectors to search repositories normally accessed via enterprise search, as well as repositories that enterprise search cannot reach, providing seamless access to more content than enterprise search alone provides.

How does Explorit Everywhere! improve my bottom line?

Federated search solutions reduce the time it takes to find relevant information, decrease the chance of missing relevant content, and improve the utilization of paid content. Additionally, researchers no longer need to learn the quirks of numerous search interfaces. The time saved by not searching multiple sources one by one, plus the improved quality of the documents found, translates into labor and cost savings.

How does Explorit Everywhere! connect me to my knowledge?

Federated search connects directly to your knowledge sources, whether they are subscription, internal, or public. Instead of you going to each source and searching it separately, federated search uses connectors to query those sources, retrieve the results, and return them to you. Searching ten, twenty, or more of your knowledge sources in real time is therefore simply a matter of typing in your query. Because the search runs against the sources themselves, the information returned is the same information that is on the source.

Technical Questions about Explorit Everywhere! Federated Search

What is a connector?

In general, a connector makes it possible to talk to another data source. A connector is a piece of software written to access a specific content source. It must know the URL of the source, how to send search commands, what the search syntax is, and how to process the search results the source returns. Connectors can be challenging to write if access to a source requires multiple steps, URL redirection, cookies, sessions, or authentication. Not all connectors are the same. Deep Web Technologies takes great pride in our connector technology! We build robust connectors to retrieve the best, most relevant, up-to-date data from a source, not just any results a source offers. This is the difference between finding the needle and finding the whole haystack.
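To make those responsibilities concrete, here is a sketch of a connector for one imaginary source, using only Python’s standard library. Everything source-specific is an assumption for illustration: the endpoint `https://search.example.org/api`, the `q` and `rows` parameters, the need for explicit `AND` between terms, and a JSON response with `hits`, `headline`, and `link` keys. `ExampleJournalConnector` is a hypothetical name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


class ExampleJournalConnector:
    """Hypothetical connector for one source. The URL, parameter names and
    JSON layout are illustrative assumptions, not a real service's API."""

    BASE_URL = "https://search.example.org/api"  # assumed endpoint

    def translate_query(self, query: str) -> str:
        # Assumed source syntax: terms must be joined with an explicit AND.
        return " AND ".join(query.split())

    def build_url(self, query: str, max_results: int = 100) -> str:
        # How to send the search command: assumed GET parameters "q" and "rows".
        params = {"q": self.translate_query(query), "rows": max_results}
        return f"{self.BASE_URL}?{urlencode(params)}"

    def parse(self, body: str) -> list[dict]:
        # Map the source's own field names onto a common result schema.
        data = json.loads(body)
        return [{"title": hit.get("headline", ""), "url": hit.get("link", "")}
                for hit in data.get("hits", [])]

    def search(self, query: str) -> list[dict]:
        # The only network step; everything source-specific lives in the methods above.
        with urlopen(self.build_url(query), timeout=10) as response:
            return self.parse(response.read().decode("utf-8"))
```

Keeping URL building and parsing apart from the network call makes the source-specific logic easy to test, and when a source changes its access method, usually only those methods need rewriting.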

How do you handle duplicates?

Deep Web Technologies’ Explorit Everywhere! is flexible in its approach to de-duplication of results. Duplicate results from multiple sources should be removed to improve the user’s search experience. Results with identical URLs can be considered duplicates, as can results that have the same title and author. The de-duplication algorithm should be configured for the particular databases being searched, based on how duplicates manifest themselves in the results, i.e., which fields are duplicated. As with most deployment decisions, when it comes to duplicates, no one solution is right for everyone.
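A minimal sketch of configurable de-duplication, assuming each result is a dictionary with `url`, `title`, and `author` keys (hypothetical field names). The two flags mirror the two rules described above, so each can be switched on or off for a given set of databases; case and whitespace are normalized before titles and authors are compared.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't hide duplicates."""
    return " ".join(text.lower().split())


def deduplicate(results: List[Dict[str, str]],
                match_url: bool = True,
                match_title_author: bool = True) -> List[Dict[str, str]]:
    """Keep the first occurrence of each result and drop later duplicates."""
    seen_urls = set()
    seen_title_author = set()
    unique = []
    for result in results:
        url = result.get("url", "").strip()
        key = (normalize(result.get("title", "")), normalize(result.get("author", "")))
        if match_url and url and url in seen_urls:
            continue  # same URL as an earlier result
        if match_title_author and key[0] and key in seen_title_author:
            continue  # same title and author as an earlier result
        if url:
            seen_urls.add(url)
        if key[0]:
            seen_title_author.add(key)
        unique.append(result)
    return unique
```

Because the first occurrence wins, the order of the incoming list decides which source’s copy is kept; a deployment might instead prefer the copy from a favored source.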

Why is Explorit Everywhere! faster than other federated searches?

Explorit Everywhere! searches in real time, so it depends on the speed of the content sources it searches to complete user queries. Some sources are lightning quick, while others drag their search out for 5 to 10 seconds. Explorit Everywhere! doesn’t wait for the slowest source to respond before it returns results. It speeds up the search process with a feature called “incremental search,” which displays the results from the fastest sources immediately while the search continues in the background on the slower sources. Once all of the results are in, they can be included in the displayed list of results. In this way, real-time quality results are not sacrificed for speed.

How do you access subscription content that requires authentication?

Deep Web Technologies has built extensive expertise in developing connectors to content that requires cookies, sessions, username/password authentication, or IP-based authentication. Our connectors perform the authentication steps just as they occur when a user accesses an authenticated database with a browser.

What are “incremental results”?

To answer this question, it’s important to keep in mind that Explorit Everywhere! is a federated search solution that searches other collections in real-time. These other collections are operated by other companies and organizations, and are all very different in their age, value, capability, speed and overall performance.

Explorit Everywhere! is a rare breed of federated search in that it won’t make you wait forever while results are compiled from all the collections being searched. Imagine if it did and one or more collections were offline: you could wait a couple of minutes (or more) before seeing any results.

To speed up the search process, Explorit Everywhere! displays results immediately from the sources that return them quickly. When the slower collections have provided their results, Explorit Everywhere! asks whether you want to incorporate them into your search results; if you do, it re-ranks all of the results to account for the new, relevant ones. You do not have to add the remaining results immediately. If you are still reviewing results, simply choose “Ignore” when Explorit Everywhere! asks. Later, you can add the remaining results to your list by clicking the “XXXX additional results found” link on the left side of the results page.
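The incremental behavior can be sketched with Python’s `concurrent.futures`. This is an illustration under stated assumptions, not Explorit Everywhere!’s implementation: connectors are plain functions, `first_window` (a hypothetical parameter) is how long to wait before showing the first batch, and each slower source is yielded as its own later batch. Merging a later batch and re-ranking would happen only when the user accepts it; error handling is omitted.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, wait
from typing import Callable, Dict, Iterator, List, Tuple

Connector = Callable[[str], List[str]]


def incremental_search(query: str, connectors: Dict[str, Connector],
                       first_window: float = 1.0) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (source names, results) batches as sources finish."""
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in connectors.items()}
        # First batch: whatever finished within the window, shown immediately.
        done, pending = wait(futures, timeout=first_window)
        yield sorted(futures[f] for f in done), [r for f in done for r in f.result()]
        # Later batches: each slow source as soon as it completes.
        for future in as_completed(pending):
            yield [futures[future]], future.result()


def fast_source(query: str) -> List[str]:
    return [f"fast:{query}"]


def slow_source(query: str) -> List[str]:
    time.sleep(0.5)  # simulate a sluggish remote collection
    return [f"slow:{query}"]


if __name__ == "__main__":
    for names, results in incremental_search("graphene",
                                             {"fast": fast_source, "slow": slow_source},
                                             first_window=0.1):
        print(names, results)
```

Yielding batches rather than returning one list is what lets the first page render while slower collections are still working.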

What is “Collection Status”?

The Collection Status list displays every collection searched for a particular query and shows the responsiveness of each source or collection. You can view this list by clicking the “XX of XX sources complete” link under Search Summary on the left side of the results page.

The list shows two numbers for each collection: “Top Results” and “Found Results.” “Top Results” is the number of relevant results retrieved from that collection for your search. “Found Results” is the number of results the collection reports exist in its database for your query. Not all collections provide result totals, so this column may be blank for some sources.

The Status column tells you the “health” of the particular source, or if Explorit Everywhere! experienced problems while retrieving the results.

  • A green check mark indicates the search at that particular collection was successfully completed with no timeouts or errors.
  • A yellow check mark indicates that some results were retrieved, but there was a subsequent timeout or other error before completion of the search.
  • A timeout icon indicates that the source timed out and did not respond within the time allowed to complete the search.
  • A red X indicates that the collection returned an error. Hover the mouse over the icon to see the error message.

Using the Collection Status feature can help users to discover which individual sources with large numbers of total results should be searched further, or to eliminate sources that don’t have many results applicable to a query.

Some collections may show tens or even hundreds of thousands of results in the “Found Results” column. Explorit Everywhere! usually limits the results it retrieves from each collection to 100 or fewer, depending on the collection. This is done for performance reasons. If your Explorit Everywhere! application pulled in all of the results from all of the collections, it could have millions of results to retrieve, sort, de-duplicate, rank, and display for every search performed. That would take a long time, and those results are rarely useful. Read more about Getting the Best Results vs. Getting All of the Results.
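The status icons and the per-collection cap can be modeled with a small record type. This sketch is illustrative only: `CollectionStatus`, `top_count`, and the default cap of 100 (taken from the limit mentioned above, which really varies by collection) are hypothetical, and the status strings stand in for the green check, yellow check, timeout icon, and red X.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RESULTS_PER_COLLECTION = 100  # assumed default; the real limit varies by collection


def top_count(available: int, cap: int = MAX_RESULTS_PER_COLLECTION) -> int:
    """Number of results actually pulled into the application from one collection."""
    return min(available, cap)


@dataclass
class CollectionStatus:
    name: str
    top_results: int               # results retrieved for this query (capped)
    found_results: Optional[int]   # total the source reports; None if it gives no total
    timed_out: bool = False
    error: Optional[str] = None    # error message shown on hover, if any

    @property
    def status(self) -> str:
        if (self.timed_out or self.error) and self.top_results > 0:
            return "partial"    # yellow check: some results, then a timeout or error
        if self.timed_out:
            return "timeout"    # timeout icon: no response in the allowed time
        if self.error:
            return "error"      # red X: the collection returned an error
        return "complete"       # green check: finished with no timeouts or errors
```

Keeping `found_results` separate from `top_results` is what lets the list point you toward collections worth searching directly.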

The information in “Collection Status” should be used as a discovery tool to understand what sources are relevant to your research and either limit your search to those sources within Explorit Everywhere! or continue your search directly at that content source.

What is clustering?

Explorit Everywhere! clustering groups results based on real-time semantic analysis of the results. Beyond simple sorting, which organizes search results by author, date, or publication, clustering lets users view search topics in text or visual format. Our smart clustering software organizes the results into topics and subtopics as they are returned, allowing researchers to drill down into the details of a topic to find relevant results. Read more about our clusters.
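The semantic analysis behind Explorit Everywhere! clusters is not spelled out here, so the sketch below swaps in a much simpler technique, keyword grouping, purely to illustrate building groups from returned results on the fly. Terms that appear in at least two result titles become cluster labels, and each title joins every cluster whose label it contains. The stopword list, `max_clusters`, and `min_size` are illustrative assumptions; there is no stemming, so “cell” and “cells” stay separate.

```python
import re
from collections import Counter
from typing import Dict, List

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def terms(text: str) -> List[str]:
    """Lower-cased words longer than two letters, minus stopwords, in order of appearance."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS and len(w) > 2]


def cluster(titles: List[str], max_clusters: int = 5, min_size: int = 2) -> Dict[str, List[str]]:
    """Group titles under the terms shared by the most titles (clusters may overlap)."""
    # Count each term once per title; dict.fromkeys keeps first-seen order for ties.
    doc_freq = Counter(t for title in titles for t in dict.fromkeys(terms(title)))
    labels = [t for t, n in doc_freq.most_common() if n >= min_size][:max_clusters]
    return {label: [title for title in titles if label in terms(title)] for label in labels}
```

For example, titles about solar cells and wind turbines would fall under the labels “solar”, “wind”, and “turbine”; a semantic approach would additionally merge “wind” and “turbine” into one topic with subtopics.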

How do you handle advanced searches if a particular source doesn’t support all the fields filled in?

Explorit Everywhere! has sophisticated support for mapping user search fields to the fields supported by each remote source. On a per-source basis, the customer can decide which field or fields to search on the remote host, in a number of flexible ways. Fields not available on the remote source can be ignored, or the search engine can search different fields instead.
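Per-source field mapping can be expressed as a small lookup table. In this hypothetical sketch, `source_a` supports every form field, while `source_b` has no author or year field: author terms are redirected to its full-text field and the year is ignored. The source names, remote field names, and `map_fields` function are all assumptions for illustration.

```python
from typing import Dict, Optional

# Hypothetical per-source field maps: user field -> remote field.
# None means the source cannot search that field, so it is ignored.
FIELD_MAPS: Dict[str, Dict[str, Optional[str]]] = {
    "source_a": {"title": "ti", "author": "au", "keywords": "kw", "year": "py"},
    # source_b has no author or year field; author terms go to full text instead.
    "source_b": {"title": "title", "author": "fulltext", "keywords": "fulltext", "year": None},
}


def map_fields(source: str, user_fields: Dict[str, str]) -> Dict[str, str]:
    """Translate an advanced-search form into the fields one remote source supports."""
    mapping = FIELD_MAPS[source]
    remote: Dict[str, str] = {}
    for field, value in user_fields.items():
        target = mapping.get(field)
        if target is None:
            continue  # unsupported (or unmapped) field: leave it out for this source
        # Several user fields may land on one remote field; join their terms.
        remote[target] = f"{remote[target]} {value}" if target in remote else value
    return remote
```

Keeping the mapping as data rather than code means a customer’s choices can change per source without touching the connector itself.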

How do you perform relevance ranking?

Deep Web Technologies has developed a number of sophisticated relevance ranking algorithms. Our algorithms take into account the document source, the frequency of user search terms in various fields of the search results, and other factors. We compare search results from different sources against one another to rank each result within the combined result set. Customers find our relevance ranking to be quite good, often better than that provided by the content provider.

By default, results are displayed in a relevance-ranked list, with the highest-ranking results on top. Several factors contribute to a higher ranking, including the length of the title, the occurrence of the search terms within the title and snippet, and how often they occur. If a collection returns highly relevant results for a query, Explorit Everywhere! automatically retrieves additional results from that collection instead of returning only the first page of its results. Because native collection relevance ranking engines vary, Explorit Everywhere! normalizes results from multiple collections into a consistent, relevance-ranked results list.
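The ranking signals named above can be combined in many ways; the formula is not given here. The sketch below shows one simple, hypothetical scoring function: it measures the density of query terms in the title and in the snippet, so a short title full of query terms beats a long title that mentions them once, and it weights title matches more heavily. The weights are arbitrary assumptions. Because each score depends only on the result’s own text, results from different sources become comparable regardless of how each source ranked them natively.

```python
import re
from typing import Dict, List


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, title: str, snippet: str,
          title_weight: float = 3.0, snippet_weight: float = 1.0) -> float:
    """Weighted density of query terms in the title and snippet (hypothetical formula)."""
    q = set(tokens(query))
    t, s = tokens(title), tokens(snippet)
    title_density = sum(w in q for w in t) / max(len(t), 1)
    snippet_density = sum(w in q for w in s) / max(len(s), 1)
    return title_weight * title_density + snippet_weight * snippet_density


def rank(query: str, results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort the merged list, highest score first, whichever source a result came from."""
    return sorted(results, key=lambda r: score(query, r["title"], r.get("snippet", "")),
                  reverse=True)
```

A real ranker would add further signals, such as the document source mentioned above, but the normalization idea is the same: score every result on one shared scale before sorting.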

Where would my application run? Would you host it? Would I host it? Who monitors and maintains it?

Deep Web Technologies provides flexible hosting and maintenance options. For some customers, we host their applications in our state-of-the-art data center, providing backups as well as application, operating system, hardware, and connector monitoring and support. With a hybrid installation, customers provide the hardware and operating system, and we install, monitor, and maintain the application on their behalf. Other customers license the application and host and maintain it themselves, without any access by Deep Web Technologies. Read more about our Hosting Solutions.

What are alerts and how do they work?

Alerts are saved searches that run on a schedule, with the results delivered directly to your email inbox or RSS feed. Explorit Everywhere! alerts can be set up for daily, weekly, or monthly delivery and always deliver the newest results from your knowledge sources, whether or not your sources have their own alerts system.
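An alert only needs to remember what it has already delivered: re-running the saved search and filtering out known results is what lets it work even when a source has no alerts system of its own. In this hypothetical sketch, results are identified by their `url`, the search function is passed in, and scheduling and email/RSS delivery are left out; the `Alert` class and its fields are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

SearchFn = Callable[[str], List[Dict[str, str]]]


@dataclass
class Alert:
    query: str
    frequency: str  # "daily", "weekly" or "monthly"; the scheduler itself is not shown
    seen_urls: Set[str] = field(default_factory=set)

    def run(self, search: SearchFn) -> List[Dict[str, str]]:
        """Re-run the saved search and return only results not delivered before."""
        new_results = []
        for result in search(self.query):
            if result["url"] not in self.seen_urls:
                self.seen_urls.add(result["url"])
                new_results.append(result)
        return new_results
```

Each scheduled run would pass the current federated search function and deliver whatever `run` returns; an empty list means nothing new has appeared since the last delivery.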