Category Archive for: ‘Federated Search’

  • Federated Search – The Killer iWatch App?

    The Beagle Research Group Blog posted “Apple iWatch: What’s the Killer App” on March 10, including this line: “An alert might also come from a pre-formed search that could include a constant federated search and data analysis to inform the wearer of a change in the environment that the wearer and only a few others might care about, such as a buy or sell signal for a complex derivative.” While this enticing suggestion is just a snippet in a full post, we thought we’d consider the possibilities this one-liner presents. Could federated search become the next killer app?

    Well no, not really. Federated search in and of itself isn’t an application; it’s more of a supporting technology.  It supports real-time searching, rather than indexing, and provides current information on fluctuating data such as weather, stocks, flights, etc.  And that is exactly why it’s killer: federated search finds new information of any kind, anywhere, singles out the most precise data to display, and notifies the user to take a look.

    In other words, it’s a great technology for mobile apps to use.  Federated search connects directly to the source of the information, whether medical, energy, academic journals, social media, weather, etc., and finds information as soon as it’s available.  Rather than storing information away, federated search links a person to the data circulating that minute, passing on the newest details as soon as they are available, which makes a huge difference with need-to-know information.  In addition, alerts can be set up to notify the person, researcher, or iWatch wearer of critical data, such as a buy or sell signal, as The Beagle Research Group suggests.
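    As a rough illustration of the alerting idea above, here is a minimal Python sketch of a federated alert loop. The feed functions, record fields and IDs are all invented for the example; a real deployment would query live sources.

    ```python
    # Hypothetical sketch: federated sources are polled in real time, and
    # only brand-new results trigger a notification to the wearer.

    def check_alert(sources, query, seen_ids, notify):
        """Run the query against every live source; notify on unseen results."""
        new_hits = []
        for source in sources:
            for hit in source(query):          # each source is searched on-the-fly
                if hit["id"] not in seen_ids:  # only new information alerts the user
                    seen_ids.add(hit["id"])
                    new_hits.append(hit)
                    notify(hit)
        return new_hits

    # Two toy "sources" standing in for live feeds (stocks, journals, weather).
    def stock_feed(query):
        return [{"id": "stk-1", "title": "Sell signal: XYZ derivative"}]

    def journal_feed(query):
        return [{"id": "jnl-7", "title": "New article matching: " + query}]

    alerts = []
    seen = set()
    check_alert([stock_feed, journal_feed], "derivatives", seen, alerts.append)
    # A second pass finds nothing new, so the wearer is not re-alerted.
    repeat = check_alert([stock_feed, journal_feed], "derivatives", seen, alerts.append)
    ```

    The `seen_ids` set is what makes the alert useful on a tiny display: the user is pinged once per new item, never re-shown stale results.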

    Of course, there’s also the issue of real estate to keep in mind – the iWatch wraps less than 2 inches of display on a wrist.  That’s not much room for a hefty list of information, much less junky results.  What’s important is that the single most accurate piece of information, hand-picked (so to speak) just for you, pops up on the screen.  Again, federated search can make that happen thanks to its direct connections to the sources.

    There is a world of possibility when it comes to using federated search technology to build applications, whether mobile or desktop. Our on-demand lifestyles require federating, analyzing, and applying all sorts of data, from health, to environment, to social networking. Federated search is not just for librarians finding subscription content anymore.  The next-generation federated search is for everyone in need of information on-the-fly. Don’t worry about missing information (you won’t).  Don’t worry if information is current (it is).  In fact, don’t worry at all. Relax, sit back and get alert notifications to buy that stock, watch the weather driving home, or check out an obscure tweet mentioning one of your hobbies. Your world reports to you what you need to know.  And that, really, is simply killer.

  • Is Vendor-Neutral Searching Important?

    As a matter of fact, YES!  In doing research, being vendor-neutral implies, in part, that the order of the results set is impartial and not biased toward any one information provider. A results set partial to an information vendor, source or database could cripple a serious research endeavour, particularly if a vital component of the research is missed, hidden, or dropped because a biased display lets a particular vendor’s results bubble to the top.  In addition, being vendor-neutral implies a design approach compatible with different technologies, and a company philosophy that is willing to integrate with a broad spectrum of sources, technologies and products.

    Back in 2010, our CEO and CTO Abe Lederman questioned the bias of information vendors in light of Google’s possible “preferential placement” in search results — “If Google Might Be Doing It…”  He asks, “If Google is being accused of such bias, might not EBSCO or ProQuest also have a bias?”

    Deep Web Technologies recently received an email inquiry from someone looking, very specifically, for a vendor-neutral search service.

    “We’re looking for a no vendor biased, easily applied and customised, reliable federated search. Can you help?”

    The short answer is ABSOLUTELY!  Vendor-neutrality is a Deep Web Technologies specialty and a core value.  In addition to the Deep Web Technologies simple setup and deployment process and our next-generation single-search technology, we believe strongly that no vendor, information provider, database or source should be weighted differently, return results more frequently, or appear highlighted in any way UNLESS requested by you, our customer.  The choice to assign any database a higher or lower ranking should be dictated by you.  It is your search, after all.
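    To make the neutrality point concrete, here is a small, hypothetical Python sketch of a merge step in which every source receives the same default weight unless the customer explicitly assigns one. The field names and the boost mechanism are assumptions for illustration, not DWT’s actual ranking code.

    ```python
    # Illustrative sketch of vendor-neutral result merging: no source is
    # boosted unless the customer opts in via the `boosts` mapping.

    def merge_results(results_by_source, boosts=None):
        """Merge per-source result lists into one impartial ranking.

        Every source gets a boost of 1.0 (no preference) unless the
        customer explicitly assigns one in `boosts`.
        """
        boosts = boosts or {}
        merged = []
        for source, results in results_by_source.items():
            factor = boosts.get(source, 1.0)   # neutral by default
            for r in results:
                merged.append({**r, "source": source,
                               "score": r["relevance"] * factor})
        return sorted(merged, key=lambda r: r["score"], reverse=True)

    results = {
        "VendorA": [{"title": "A1", "relevance": 0.9}],
        "VendorB": [{"title": "B1", "relevance": 0.8}],
    }
    neutral = merge_results(results)                    # impartial order
    boosted = merge_results(results, {"VendorB": 2.0})  # the customer's choice
    ```

    The point of the default factor of 1.0 is that bias only enters the ranking when the customer requests it, never silently.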

  • 3 Reasons an Indexed Discovery Service Doesn’t Work For Serious Researchers

    Last week we alluded to three obstacles serious researchers face when using a Discovery Service index to retrieve information from their premium sources. Deep Web Technologies uses a real-time approach, choosing to search information sources on-the-fly rather than compiling information into an index as Discovery Services do, for reasons we’ll discuss in a moment.  But we should get a few definitions out of the way before we jump into why an index doesn’t work for all researchers.

    What is an index as used by Discovery Services?  An index holds all of the metadata, and sometimes full text, for one or more sources of information, usually external public or premium sources.  For example, Google has an index; its spiders compile information from millions of public websites across the Internet, funneling the information into a single, unified database for users to search. On a much smaller scale, Discovery Services compile information from an organization’s internal catalog, as well as a subset of the customer’s premium and public information sources, if these sources permit their information to be indexed. Discovery Services establish relationships with the information sources directly (if possible), procure the metadata (if they can) and add it to their index.

    What is real-time (on-the-fly) searching?  Real-time, or “on-the-fly” searching, sends queries directly to the source – the catalog, premium databases, public sources – retrieves the metadata of top ranked results from each source relative to the user’s search,  then normalizes and aggregates the information into a single display of results.  The results are listed with direct links to the original source of information so users can explore the record and the full text.  There is no index built and the information is not stored.
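    The on-the-fly flow just described can be sketched in a few lines of Python. The connector stubs and record fields below are invented for illustration and stand in for real source connectors:

    ```python
    # A minimal sketch of real-time federated search: fan the query out to
    # each source in parallel, take each source's top-ranked hits, normalize
    # the records, and aggregate them into a single list. No index is built
    # and nothing is stored between queries.
    from concurrent.futures import ThreadPoolExecutor

    def search_realtime(connectors, query, top_k=3):
        def query_source(connector):
            name, search = connector
            raw = search(query)[:top_k]        # top results from this source
            return [{"source": name,           # normalize to a common record
                     "title": r["title"],
                     "url": r["link"]} for r in raw]

        with ThreadPoolExecutor() as pool:     # sources are searched in parallel
            batches = pool.map(query_source, connectors)
        results = [hit for batch in batches for hit in batch]
        return sorted(results, key=lambda r: r["title"])

    # Toy connectors standing in for a premium database and a catalog.
    def pubmed_stub(q):
        return [{"title": "Trial of " + q, "link": "https://example.org/1"}]

    def catalog_stub(q):
        return [{"title": "Book on " + q, "link": "https://example.org/2"}]

    hits = search_realtime([("PubMed", pubmed_stub),
                            ("Catalog", catalog_stub)], "aspirin")
    ```

    Each normalized record keeps its `source` and `url`, so every result in the aggregated display still links straight back to the original source.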

    To be perfectly clear, both of these approaches have their pros and their cons.  And, we certainly love the Google and Bing indexes for the everyday quick search for the nearest coffee house.  But, as we’ve heard from our customers time and again, “We’re serious about our research, and a single index just doesn’t work for us.”  (Note that in this post we’re singling out Discovery Service indexes, but will need to address how this applies to Enterprise Search indexes in a separate post.)

    Abe Lederman, our CEO and CTO, addressed this issue before in other blog posts, but this is a good opportunity to reiterate that not all researchers benefit from a single master index such as those used by Discovery Services.  Let’s look at some of the obstacles that serious researchers face when searching an index through a Discovery Service.

    1. Limited access to content
      Serious researchers like to search for critical information on sources they trust.  In fact, some researchers choose to search only their chosen few information sources, excluding all other databases to which their organization may subscribe.  So, a Discovery Service had better, by gum, include all of the “trusted sources” that its serious researchers use, or it may not see much use from those researchers.  And herein lies the problem: some information vendors, trusted sources with critical information, simply don’t want to share their information with a Discovery Service.  That information is theirs. This is particularly true of sources in industries such as legal and medical. When an information vendor doesn’t permit its information to be indexed, the Discovery Service index won’t contain information from that source.  Bottom line: serious researchers may not use a Discovery Service because it’s incomplete.  It just doesn’t contain what they need, or want to search.
    2. Frequency of content changes and updates
      Information vendors, or content sources, vary in how frequently they update their own database of information.  Some sources update daily or even hourly while other sources may update their database with new information every few weeks.  Depending on the need, a serious researcher may require up-to-the-minute, just-published information for a critical, time-sensitive topic. The drive for data may not allow for old information, no, not even one day old.  Could you imagine Kayak, a travel metasearch site, showing day-old data?  It just wouldn’t work.   Current information can make or break a researcher, and a Discovery Service index that is updated weekly from some, or even half, of the trusted data sources it indexes may not give serious researchers what they need to stay current.  Bottom line: Serious researchers require current information, not stale data.
    3. Muddiness about source searching and clickthrough
      In some Discovery Services, searching one specific source (or even just a handful of sources) is difficult, even impossible. An index contains vast amounts of information, but doesn’t always allow researchers to limit a search to a specific source easily, pinpoint the source they need, or click through to the metadata or full text of a document directly at the source.  Discovery Services often create a uniform-looking record for results to maintain a consistent look and feel, but provide no way to click through directly to the record at the source.  Knowing where information comes from and clicking through to the result directly at the source can be an important step in the research process.  Bottom line: a serious researcher can’t afford to spend extra time narrowing down their information sources, or performing extra clicks to reach the source itself.  They need to be able to search one or more of their trusted sources rather than the entire index, and click directly through to the source for further review and additional information.

    At Deep Web Technologies we perform a real-time search, offering an alternative to the Discovery Service indexed approach. Our federated search retrieves information from sources that can’t be accessed through an index, sources that update their metadata frequently, and sources that users want to search individually because of their more extensive, trusted databases of information.  If your organization already uses a Discovery Service but you need more precise information for your serious researchers, federated search can complement your Discovery Service by adding real-time results from content sources you don’t currently include in your Discovery Service.

    We’d love to hear from researchers about this – What has your experience been with Discovery Services?

  • Do You Mean Federated Search?

    Occasionally, DWT employees will test the water when describing what we do by dropping the term “federated search” and trying a more generic description.

    “We search all of the databases from a single search box in real-time.”

    “We perform a single search of subscribed, public and deep web databases…”

    “We capture results from your premium sources, the stuff Google can’t reach, and return them to you ranked and deduplicated…”

    If a person is plugged in to the search world, we usually get this response: “Oh!  Do you mean you do Federated Search?”  Bingo.

    Over the years, the concept of Federated Search has gone by many different names.  We’ve heard our technology called:

    • Distributed Search
    • Broadcast Search
    • Unified Search
    • Data Fusion
    • Meta-Search
    • Parallel Search
    • Cross-Database Search
    • Single Search
    • One-Search
    • Integrated Search
    • Universal Search

    For the most part, all of these mean about the same thing: An application or a service that allows users to submit their query to search multiple, distributed information sources, and retrieve aggregated, ranked and deduplicated results.
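    The “aggregated, ranked and deduplicated” part of that definition can be sketched as follows, assuming (purely for illustration) that normalized records carry a title and a rank:

    ```python
    # Sketch of the deduplication step: when several sources return the
    # same record, keep only the highest-ranked copy.

    def deduplicate(results):
        """Keep the best-ranked copy of each duplicate record."""
        best = {}
        for r in results:
            key = r["title"].strip().lower()   # naive duplicate key
            if key not in best or r["rank"] < best[key]["rank"]:
                best[key] = r
        return sorted(best.values(), key=lambda r: r["rank"])

    merged = deduplicate([
        {"title": "Deep Web Search",  "rank": 2, "source": "A"},
        {"title": "deep web search ", "rank": 1, "source": "B"},  # duplicate
        {"title": "Metasearch 101",   "rank": 3, "source": "C"},
    ])
    ```

    Real systems key duplicates on stronger signals than a lowercased title (DOIs, fuzzy author/year matching), but the principle is the same: the user sees one entry per document, ranked across all sources.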

    But the question remains: Is “federated search” a master index, or a real-time (on-the-fly) search? And this is a very good question, given our familiarity with Google and their enormous public index.  Sol Lederman raised this question back in 2007 on the Federated Search Blog, What’s in a Name?

    “The distinction, of course, is crucial. If you’re meeting with a potential customer who believes you’re discussing an approach where all content is harvested, indexed and accessed from one source but you think you’re discussing live search of heterogeneous sources then you’re talking apples and oranges.”

    Deep Web Technologies’ real-time approach gives us an advantage over building a master index which we’ll discuss in our next blog post.   In the meantime, can you think of any other names for what we do?  We’d love to hear from you!

  • Google Part 2: Don’t Risk the Google Hazard

  • Google: The World’s Greatest Marketing Company

    Google has completely changed the way most of the civilized world gets its information. Most know that Google wasn’t the first, but thanks to their effective branding, few realize that Google isn’t the best. For a long time, I’ve made the claim that Google will not be remembered as the greatest technology company of all time, but the greatest marketing company. The name Google was inspired by the term “googol”, which means the number 1 followed by 100 zeros, and was intended to refer to the number of results that are returned with each search. However, this philosophy is diametrically opposed to the first 30 years of online information retrieval, when librarians were trained to create search queries that were very specific, so that only a few search results would be returned. If too many results were delivered, librarians considered it a bad search, because the large set of results was too difficult to manage. Effective research was about accuracy, not quantity.  Like Heinz, which turned a huge problem (the ketchup wouldn’t come out of the bottle) into a marketing success, Google has convinced the world that large numbers of search results are a good thing. But the fact is, when it comes to search, more is not better. Google is a victim of its own success. As more and more content pours into the Google index, search results are as diverse as they are voluminous.

    Make no mistake, Google’s search technology has improved significantly, but the problem is that its index is growing at a rate of 100% per year. It’s too broad, covers too many subject areas, and is too dependent on the most popular links.  This is because Google is intended to be all things to all people.  If I want to see the menu of a Chinese restaurant, or the show times for a movie, or the hours of operation of my nearest Target, or the phone number of my optometrist, there is nothing better than Google. But if you want to do serious research, whether it’s chemical engineering or art history, Google should never be your first choice. Unfortunately, Google has become the first choice for many professionals. While hospitals spend hundreds of thousands of dollars per year on peer-reviewed medical information, 83% of physicians go to Google first to do medical research.  While academic libraries spend tens of thousands on the finest collections of digital content, students choose Google as their first and only source of information for research.

    Deep Web Technologies has spent the past 15 years aggregating content in real time, in specific subject areas, so that users could find the information they need quickly and effectively. We enable users to simultaneously search hundreds of content repositories specifically related to their subject discipline, delivering the most relevant results from the latest publications. So, when veterinarians are researching jaguars, their result sets don’t include articles about automobiles or football teams.  With Deep Web Tech, in addition to getting only relevant information, users get the most current information that has been published. DWT’s search technology does not require indexing, as our technology accesses the original source of the content. As soon as it is published, it is accessible to DWT customers. Just as important is the access to multiple sources from a single interface, which enables articles from other sources of content to be compared, side by side, without jumping from one site to another.

    Google has become the most popular search engine in the world. But popularity doesn’t always translate into quality, just take a look at prime time television.


  • Sixty Seconds to Find the Right Article – Mount Carmel’s Story

    We love it when our customers write about their search solution!  Mount Carmel Health Sciences Library (MCHSL) implemented their new federated search solution for a five-site hospital system and nursing college in October of 2012.  In October 2013, Mount Carmel published their initial review, titled “Implementation of a Federated Search in a Multi-Hospital System”, through the Journal of the Medical Library Association (JMLA).

    The most enthusiastic feedback that we have received was from a Mount Carmel College of Nursing instructor who used eSearcher to find articles for his course. He had allotted an hour in his schedule for the article search, but by using eSearcher his task was completed in sixty seconds! Other customer feedback has commended eSearcher’s authoritative results, fast response time, visual appeal, multiple filters, and overall presentation.

    One very important aspect that Deep Web Technologies offers Mount Carmel (and all of our customers!) is a neutral search engine. Many of our competitors weight their own information sources to rank higher on the results list, skewing result ranking in their favor.  “Medical Searcher offered impartial and seamless access to almost all of our premier medical databases.”  

    We hope that you are as successful as Mount Carmel Health Sciences Library in your searching!  And contact us if an impartial, strong search fits your needs.

  • Do Your Research – Seahawks vs. Broncos

    DWT’s own Frank Bilotto comes to you live from New York City, where fans of the Seahawks and Broncos are gathering to watch Super Bowl XLVIII at MetLife Stadium in New Jersey.  See his take on how to educate yourself with Deep Web Technologies before the big game.

  • Happy New Year!

    2014 has turned the corner and Deep Web Technologies is moving quickly toward some exciting goals.  Our developers are draining the java and pounding the keyboards to produce a new version of Explorit that will allow greater flexibility and more robust features.  We’re very excited to see the nimble and feature-rich product of much planning and work right around the corner.

    Throughout 2014, we’ll be lifting the veil on new features and functionality, some of which will automatically be included in our updated Explorit.  Other features will be available to customers with more specialized solutions. Here’s a sneak peek!

    Multilingual Searching: We’ve had great success with multilingual searching, so we thought we’d take it to the next level by opening up the languages we can include.  Languages are complicated, and we’re updating Explorit to deal with that complexity.  Multilingual searchers who want precise results translated from other languages into their own will find our updates simple and easy to use.

    Mini-Explorit: Many of our customers have asked about getting a smaller “widgetized” version of Explorit for a page in their website.  Introducing…Mini-Explorit!  Mini-Explorit can sit on a webpage, search and show results quickly without going to the full application. It’s self-contained. With this new feature, we’ve got you covered for those fast homepage searches.

     My (Personal) Library: With the My Library feature, users can select results and save them to their My Library.  When they log out, their results will be saved, safe and snug, awaiting their return.  Of course, at any time, users can email, print, export or just keep saving their results. We’re not picky, and neither is our My Library.

    Oh, there are plenty of other slick features coming down the pike.  Interested in knowing more about these and other features? Let us know!

    Here’s hoping you find a fulfilling 2014!

  • Discovery Services: The Good, the Bad and the Ugly

    It’s been 3 years now since the epic (I jest) showdown at the Charleston Conference between EDS and Summon, the EBSCO and ProQuest Discovery Services. I wrote extensively about these Discovery Services and the issues they raise in one of our company’s most popular blog articles:

    Discovery Services: Over-Hyped and Under-Performed

    Three years later, many of the concerns I raised are starting to be seen as real issues with Discovery Services by more and more people, and a plethora of blog articles have been published recently, prompting me to update our readers on what has been going on.

    OK, so what good things can I say about Discovery Services? Discovery Services have become popular because they address issues with early generations of federated search products from competitors who shall remain nameless. Discovery Services return results quickly, don’t have issues with connectors breaking, are perceived to be better at ranking results, and are better able to filter results. Since I really don’t want to say too many good things about the competition, let me point you to the following blog article, which presents one of the most balanced reviews of Discovery Services that I have read:

    8 Things we know about Web Scale Discovery systems in 2013

    I would of course be remiss if I didn’t point out that our Explorit product is a next generation federated search product that has solved the issues with these early generation products. That’s the subject of another blog post.

    Now for some of the bad. I’m going to focus on two areas: transparency and “one size fits all”. Perhaps my biggest issue with Discovery Services is their lack of transparency. Discovery Services don’t make it easy (maybe even possible) to determine what content is in their index. Just because content from a particular publisher is included in their index doesn’t mean that all content from that publisher is included. Just because content from a particular journal is included in their index doesn’t mean that all content from that journal is included. Recently I was speaking with a prospect who was evaluating Explorit vs. a Discovery Service and was surprised to find that some specific physics content that was very important to them only went back 10 years. And what about Discovery Services indexing metadata only vs. metadata and full text? Do you know when full text is indexed?

    Discovery Services are aimed primarily at academic libraries. Their “one size fits all” approach means that it is hard for users to limit their searches to a particular set of content, e.g. engineering content, music content or history content. Also, because of this “one size fits all” approach, Discovery Services are much less appropriate for specialized research areas such as medical, legal, energy or pharmaceutical/life sciences research.

    Now for the ugly. I’ve been writing and speaking at conferences for a while now on the issues that Discovery Services have, especially their reliance on the relationships they establish with content owners, which determine what content they are, and are not, able to get into their indices.

    So just a few weeks ago Thomson Reuters, publisher of the popular database Web of Science, announced that it was pulling Web of Science content from all the major Discovery Services. Although Thomson Reuters has, at least for now, backed off this decision, the episode highlighted one of the ugly things about Discovery Services: they lack control over what goes into their services. I wrote about this in the recent blog post:

    There’s Another Chink in the Discovery Services’ Armor

    Continuing with the ugly, it is really problematic that the two major Discovery Services are run by companies that fiercely compete for libraries’ database subscription budgets and thus have no interest in including their competitors’ content in their indexes. When they reluctantly do so, EBSCO will make sure that ProQuest and Gale content ranks low in EDS, and ProQuest will make sure that its competitors’ content ranks low in Summon. Back in 2010 I wrote a blog article:

    If Google might be doing it …

    which addresses the concerns that every librarian should have with the lack of vendor neutrality by the Discovery Services.

    Ed Chamberlain, a librarian in the UK, laments the fact that Discovery Services’ owners don’t play nicely with each other in the recent blog article:

    Discovery Vendors and pre-indexed data. What can be done?

    Also you might want to read the following editorial asking Discovery Services to play nice:

    Discovering Reciprocity

    In conclusion, enough concerns have been raised about Discovery Services that NISO, the National Information Standards Organization, has recently released a draft recommended practice as part of their Open Discovery Initiative:

    Promoting Transparency in Discovery
