Archives

Category Archive for: ‘Federated Search’

  • Navigating the World of Explorit Everywhere!

    As much as we like to think that Explorit Everywhere! is simple to use, it still holds the junior heavyweight championship title for feature-rich technologies. Throw on top of that the concepts of “federated search”, “Deep Web”, and “discovery services”, and it’s easy to get lost in a maze of information. Since Deep Web Technologies is all about pulling the needle out of the haystack for you, we thought it was time to create an easy reference post on the world of Explorit Everywhere!. Want to know where you can find out about how we rank results? How about federated search or the Deep Web? Take a gander through some of these posts:

    Explorit Everywhere! Features

    Federated Search

    The Deep Web

    Discovery Services

    Multilingual Searching

    You may also enjoy reading this post from the Federated Search Blog enumerating informational posts about federated search and discovery services.

  • Relying on Google for Science Information is Bad for Your Health

    People tend to think of Google as the authority in search. Increasingly, we hear people use “google” as a verb, as in, “I’ll just google that.” General users, students and even professional researchers are using Google more and more for their queries, both mundane and scholarly, perpetuating the Google myth: If you can’t find it on Google, it probably doesn’t exist. Google’s ease of use, fast response time and simple interface gives users exactly what they need…or does it?

    Teachers say that 94% of their students equate “Research” with “Google”. (Search Engine Land)

    “Another concern is the accuracy and trustworthiness of content that ranks well in Google and other search engines. Only 40 percent of teachers say their students are good at assessing the quality and accuracy of information they find via online research. And as for the teachers themselves, only five percent say ‘all/almost all’ of the information they find via search engines is trustworthy — far less than the 28 percent of all adults who say the same.” (Student researchers, Pew)

    Do teachers have a point here? Is it possible that information found via search engines is less than trustworthy, and if so, where do teachers and other serious researchers need to go to find quality information? Deep Web Technologies did a little research of our own to see just how the quality of science sources in Google results differs from results on popular Explorit Everywhere! search engines.

    How Google Works
    Google, and other popular search engines such as Bing and Yahoo, search the surface web for information. The surface web, as opposed to the Deep Web, consists of public websites that are open to crawlers, which read each website’s information and store it in a giant database called an index. When users search for information, they are actually searching the index, not the websites themselves. The results that are returned are the ones that people seemed to like in the past: the most popular results for the query. That’s right...the most popular...not necessarily the most relevant information or quality resources.
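The crawl-and-index model described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Google's actual implementation: the pages, URLs, and tokenizer are made up, but the shape is the point — a query consults only the stored index, never the live website.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
pages = {
    "example.com/a": "jaguar habitat and diet",
    "example.com/b": "jaguar car dealership prices",
    "example.com/c": "rainforest habitat conservation",
}

# Build a tiny inverted index: token -> set of URLs containing it.
index = defaultdict(set)
for url, text in pages.items():
    for token in text.split():
        index[token].add(url)

def search(query):
    """Look the query terms up in the index (the stored snapshot)
    rather than fetching any live page."""
    result_sets = [index.get(t, set()) for t in query.split()]
    if not result_sets:
        return set()
    # Only pages containing every query term match.
    return set.intersection(*result_sets)

print(search("jaguar habitat"))  # only the page with both terms
```

If a page changes after it was crawled, the index (and therefore the search results) won't reflect the change until the next crawl — which is exactly the freshness gap that real-time searching avoids.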

    We should probably also mention those sneaky ads at the top of the page that look informative, but can be quite deceptive. A JAMA article states this about medical search ads:

    “Many of the ads, the researchers noted, are very informational — with ‘graphs, diagrams, statistics and physician testimonials’ — and therefore not identifiable to patients as promotional material.

    This kind of ‘incomplete and imbalanced information’ is particularly dangerous, they note, because of its deceptively professional appearance: ‘Although consumers who are bombarded by television commercials may be aware that they are viewing an advertisement, hospital websites often have the appearance of an education portal.'”

    Researchers thinking that Google reads their mind and magically returns the right information on the first page of results should think again. The #1 position on a Google results page gets 33% of the traffic, so it is a highly sought-after spot. Unfortunately, with SEO tricks inflating page rank and ads vying for the top spot, that number one result, or even the top page of results, may not be entirely germane or even contain much scholarly content. Those results rank high because they’ve worked the Google system.
    Science.gov, developed and maintained by the DOE Office of Scientific and Technical Information, uses Explorit Everywhere! to search over 60 databases and over 2200 selected websites from 15 federal agencies. The results are from authoritative, government sources, and extraordinarily relevant. When you perform a search on Science.gov, there is no question about the sources you are searching. Explore the difference!

    Whether you are a student or scientist, knowing where to start your science search is very important. In most cases, serious research doesn’t start with Google. A 2014 IDC study shows that only 56% of the time do knowledge workers find the information required to do their jobs. Having the right sources available through an efficient Deep Web search like Explorit Everywhere! is critical to finding significant scientific information and staying ahead of the game.

  • Getting the Best Results vs. Getting All of the Results

    A question we hear regularly is, “Why doesn’t Explorit Everywhere! return all of the results from every source that is searched?” For example, if a user goes directly to a source to search, they may find thousands of results for their query.  But, performing the same search on an Explorit Everywhere! application may only return 100 results from that source.  Why aren’t we returning the thousands of results like the source does?

    DWT specifically returns up to 100 of the top results (unless our customer specifies that we return more) from each source to ultimately avoid overloading the user with information that may not be relevant to their search. Because the majority of Explorit Everywhere! applications have at least 10 sources of information, if each of those 10 sources returns 100 results, then the user will see 1000 results for their query, divided into a default of 20 results per page, or 50 pages of results total. Each of those results has been ranked as relevant to the user’s query with the most relevant results across all of the sources on page 1. Of course, we hope that the gold nugget is right at the top, front and center.  (You can read more about how DWT ranks results on this post –  Ranking: The Secret Sauce for Searching the Deep Web.) But, if we returned all of the results from all of the sources, then the total number of results and total number of pages increases to a dizzying number.
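The capping and pagination arithmetic above can be sketched in a few lines. This is an illustrative sketch, not DWT's actual implementation: the source data and relevance scores are hypothetical, while the 100-result cap and 20-per-page default come from the text.

```python
PER_SOURCE_CAP = 100   # top results kept from each source
PAGE_SIZE = 20         # results shown per page

def merge_results(results_by_source, cap=PER_SOURCE_CAP):
    """Keep only the top `cap` results from each source, then rank
    the merged list by relevance score, highest first."""
    merged = []
    for source, results in results_by_source.items():
        top = sorted(results, key=lambda r: r["score"], reverse=True)[:cap]
        merged.extend(top)
    return sorted(merged, key=lambda r: r["score"], reverse=True)

def page(merged, number, size=PAGE_SIZE):
    """Return one page of the merged result list (1-indexed)."""
    start = (number - 1) * size
    return merged[start:start + size]

# Ten hypothetical sources, each holding 500 results for the query.
fake = {f"source{i}": [{"score": (i * 100 + j) % 997, "id": (i, j)}
                       for j in range(500)]
        for i in range(10)}

merged = merge_results(fake)
print(len(merged), len(merged) // PAGE_SIZE)  # 1000 results, 50 pages
```

Ten sources capped at 100 results each yields the 1000 results and 50 pages described above; without the cap, the same ten sources would produce 5000 results and 250 pages.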

    We know from the countless SEO studies on Google’s results placement that the majority of consumers using Google rarely click to the second page of results. In fact, page 2 and 3 may get only around 6% of the clicks on any given search. (Marketing Land).  While the likelihood of next page clicks does go up with age and education, we’ve found that it is unlikely even for our erudite Explorit Everywhere! users to click through 50 pages of results.  Rather, most researchers will perform a new search or refine their search if they don’t find what they are looking for within the first couple of pages.

    Google does something similar. Search a broad query, skip to the last page of results, and you’ll see a notice that entries very similar to those already displayed have been omitted, along with a link to “repeat the search with the omitted results included.” Try clicking that link to see how many results you actually get – probably around 900 results, or 9 pages. So much for the millions that are available! Even Google tries to avoid overloading users by limiting the vast number of results for broad queries.

    For those researchers preferring to narrow their research to just one or two relevant sources on Explorit Everywhere!, it’s possible that 100 results from each source may not be enough. If a source has particularly relevant results, then we suggest capitalizing on that information by going directly to that source to continue searching. Use Explorit Everywhere! as a tool to not only find relevant results, but to find relevant sources of information to further your information discovery.

  • Federated Search – The Killer iWatch App?

    The Beagle Research Group Blog posted “Apple iWatch: What’s the Killer App” on March 10, including this line: “An alert might also come from a pre-formed search that could include a constant federated search and data analysis to inform the wearer of a change in the environment that the wearer and only a few others might care about, such as a buy or sell signal for a complex derivative.” While this enticing suggestion is just a snippet in a full post, we thought we’d consider the possibilities this one-liner presents. Could federated search become the next killer app?

    Well no, not really. Federated search in and of itself isn’t an application; it’s more of a supporting technology. It supports real-time searching, rather than indexing, and provides current information on fluctuating data such as weather, stocks, and flights. And that is exactly why it’s killer: Federated Search finds new information of any kind, anywhere, singles out the most precise data to display, and notifies the user to take a look.

    In other words, it’s a great technology for mobile apps to use. Federated search connects directly to the source of the information, whether medical, energy, academic journals, social media, or weather, and finds information as soon as it’s available. Rather than storing information away, federated search links a person to the data circulating that minute, passing on the newest details as soon as they are available, which makes a huge difference with need-to-know information. In addition, alerts can be set up to notify the person, researcher, or iWatch wearer of critical data such as a buy or sell signal, as The Beagle Research Group suggests.

    Of course, there’s also the issue of real estate to keep in mind: the iWatch wraps less than 2 inches of display around a wrist. That’s not much room for a hefty list of information, much less junky results. What’s important is that the single most accurate piece of information, hand-picked (so to speak) just for you, pops up on the screen. Again, federated search can make that happen quite easily...it has connections.

    There is a world of possibility when it comes to using federated search technology to build applications, whether mobile or for desktop uses. Our on-demand lifestyles require federating, analyzing, and applying all sorts of data, from health, to environment, to social networking. Federated search is not just for librarians finding subscription content anymore.  The next-generation federated search is for everyone in need of information on-the-fly. Don’t worry about missing information (you won’t).  Don’t worry if information is current (it is).  In fact, don’t worry at all. Relax, sit back and get alert notifications to buy that stock, watch the weather driving home, or check out an obscure tweet mentioning one of your hobbies. Your world reports to you what you need to know.  And that, really, is simply killer.

  • Is Vendor-Neutral Searching Important?

    As a matter of fact, YES! In doing research, being vendor-neutral implies, in part, that the order of the results set is impartial, and not biased toward any one information provider. A results set partial to an information vendor, source or database could cripple a serious research endeavour, particularly if a vital component of research is missed, hidden, or dropped due to a biased display of results, with a particular vendor’s results bubbling to the top. In addition, being vendor-neutral implies a design approach compatible with different technologies, and a company philosophy that is willing to integrate with a broad spectrum of sources, technologies and products.

    Back in 2010, our CEO and CTO Abe Lederman questioned the bias of information vendors in light of Google’s possible “preferential placement” in search results — “If Google Might Be Doing It…”  He asks, “If Google is being accused of such bias, might not EBSCO or ProQuest also have a bias?”

    Deep Web Technologies recently received an email inquiry from someone looking, very specifically, for a vendor-neutral search service.

    “We’re looking for a no vendor biased, easily applied and customised, reliable federated search. Can you help?”

    The short answer is ABSOLUTELY!  Vendor-neutrality is a Deep Web Technologies specialty and a core value.  In addition to the Deep Web Technologies simple setup and deployment process and our next-generation single-search technology, we believe strongly that no vendor, information provider, database or source should be weighted differently, return results more frequently, or appear highlighted in any way UNLESS requested by you, our customer.  The choice to assign different databases a higher or lower ranking should be dictated by you.  It is your search, after all.
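The neutrality principle above can be illustrated with a tiny ranking sketch. The function name, scores, and weight table are hypothetical, not DWT's code: by default every source carries the same weight, so ordering is purely by relevance, and only an explicit customer-supplied weight changes it.

```python
def rank(results, source_weights=None):
    """Order results purely by relevance score; per-source weights
    apply only when the customer explicitly configures them."""
    weights = source_weights or {}
    def weighted(r):
        # Every source defaults to 1.0 -- the vendor-neutral baseline.
        return r["score"] * weights.get(r["source"], 1.0)
    return sorted(results, key=weighted, reverse=True)

results = [
    {"source": "VendorA", "score": 0.90, "id": 1},
    {"source": "VendorB", "score": 0.85, "id": 2},
]

# Neutral default: pure relevance order.
print([r["id"] for r in rank(results)])                    # [1, 2]
# A customer-configured boost for VendorB flips the order.
print([r["id"] for r in rank(results, {"VendorB": 1.2})])  # [2, 1]
```

The design point is that any deviation from the 1.0 baseline lives in customer configuration, never in the ranking code itself.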

  • 3 Reasons an Indexed Discovery Service Doesn’t Work For Serious Researchers

    Last week we alluded to three obstacles serious researchers face when using a Discovery Service index to retrieve information from their premium sources. Deep Web Technologies uses a real-time approach, choosing to search information sources on-the-fly rather than compiling information into an index as Discovery Services do, for some reasons we’ll discuss in a moment. But we should get a few definitions out of the way before we jump in. We’ve covered these terms in other blog posts, but this is a good opportunity to reiterate that not all researchers benefit from a single master index such as those used by Discovery Services. Let’s look at some of the obstacles that serious researchers face when searching an index through a Discovery Service.

    1. Limited access to content
      Serious researchers like to search for critical information on sources they trust. In fact, some researchers choose to search only their chosen few information sources, excluding all other databases to which their organization may subscribe. So, a Discovery Service had better, by gum, include all of the “trusted sources” that its serious researchers use, or it may not get much use from those researchers. And herein lies the problem: there are information vendors, trusted sources with critical information, that simply don’t want to share their information with a Discovery Service. That information is theirs. This is particularly true of sources within industries such as legal and medical. When an information vendor doesn’t permit their information to be indexed, the Discovery Service index won’t contain information from that source. Bottom line: Serious researchers may not use a Discovery Service because it’s incomplete. It just doesn’t contain what they need, or want to search.
    2. Frequency of content changes and updates
      Information vendors, or content sources, vary in how frequently they update their own database of information.  Some sources update daily or even hourly while other sources may update their database with new information every few weeks.  Depending on the need, a serious researcher may require up-to-the-minute, just-published information for a critical, time-sensitive topic. The drive for data may not allow for old information, no, not even one day old.  Could you imagine Kayak, a travel metasearch site, showing day-old data?  It just wouldn’t work.   Current information can make or break a researcher, and a Discovery Service index that is updated weekly from some, or even half, of the trusted data sources it indexes may not give serious researchers what they need to stay current.  Bottom line: Serious researchers require current information, not stale data.
    3. Muddiness about source searching and clickthrough
      In some Discovery Services, searching one specific source (or even just a handful of sources) is difficult, even impossible. An index contains vast amounts of information, but doesn’t always allow researchers to limit or search a specific source easily, pinpoint the source they need, or click through to the metadata or full text of a document directly at the source.  Discovery Services often create a uniform-looking record for results to create a consistent look and feel, but contain no way to click directly to the record available at the source.  Knowing where information is coming from and clicking through to the result directly at the source can be an important step in the research process.  Bottom line:  A serious researcher can’t afford to spend extra time narrowing down their information sources, or performing extra clicks to go directly to the source itself.  They need to be able to search one or more of their trusted sources rather than the entire index, and click directly to the source for further review and additional information.

    At Deep Web Technologies we perform a real-time search, offering an alternative to the Discovery Service indexed approach. Our federated search retrieves information from sources that can’t be accessed through an index, sources that perform frequent updates to metadata, and sources that users want to search individually due to their more extensive, trusted databases of information. If your organization already uses a Discovery Service but you need more precise information for your serious researchers, federated search can complement your Discovery Service by adding real-time results from content sources you don’t currently include.

    We’d love to hear from researchers about this – What has your experience been with Discovery Services?

  • Do You Mean Federated Search?

    Occasionally, DWT employees will test the water when describing what we do by dropping the term “federated search” and trying a more generic description.

    “We search all of the databases from a single search box in real-time.”

    “We perform a single search of subscribed, public and deep web databases…”

    “We capture results from your premium sources, the stuff Google can’t reach, and return them to you ranked and deduplicated…”

    If a person is plugged in to the search world, we usually get this response: “Oh!  Do you mean you do Federated Search?”  Bingo.

    Over the years, the concept of Federated Search has gone by many different names. We’ve heard our technology called:

    • Distributed Search
    • Broadcast Search
    • Unified Search
    • Data Fusion
    • Meta-Search
    • Parallel Search
    • Cross-Database Search
    • Single Search
    • One-Search
    • Integrated Search
    • Universal Search

    For the most part, all of these mean about the same thing: An application or a service that allows users to submit their query to search multiple, distributed information sources, and retrieve aggregated, ranked and deduplicated results.
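That definition — one query fanned out to multiple distributed sources, results aggregated, ranked, and deduplicated — can be sketched in a few lines of Python. The two stub sources and their scores are hypothetical; the shape of the pipeline is what the definition describes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source connectors: each takes a query, returns results.
def source_a(query):
    return [{"title": "Jaguar Habitats", "score": 0.9},
            {"title": "Big Cats Overview", "score": 0.6}]

def source_b(query):
    return [{"title": "Jaguar Habitats", "score": 0.7},   # duplicate
            {"title": "Rainforest Ecology", "score": 0.8}]

def federated_search(query, sources):
    # Fan the query out to every source concurrently (live search,
    # no central index).
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda s: s(query), sources))
    # Deduplicate, keeping the highest-scoring copy of each title.
    best = {}
    for result in (r for batch in batches for r in batch):
        key = result["title"].lower()
        if key not in best or result["score"] > best[key]["score"]:
            best[key] = result
    # Rank the merged, deduplicated results by relevance.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)

hits = federated_search("jaguar", [source_a, source_b])
print([h["title"] for h in hits])
```

Real connectors would translate the query into each source's native syntax and normalize the returned records, but the aggregate-rank-deduplicate core stays the same.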

    But the question remains: Is “federated search” a master index, or a real-time (on-the-fly) search? And this is a very good question, given our familiarity with Google and their enormous public index.  Sol Lederman raised this question back in 2007 on the Federated Search Blog, What’s in a Name?

    “The distinction, of course, is crucial. If you’re meeting with a potential customer who believes you’re discussing an approach where all content is harvested, indexed and accessed from one source but you think you’re discussing live search of heterogeneous sources then you’re talking apples and oranges.”

    Deep Web Technologies’ real-time approach gives us an advantage over building a master index which we’ll discuss in our next blog post.   In the meantime, can you think of any other names for what we do?  We’d love to hear from you!

  • Google Part 2: Don’t Risk the Google Hazard

  • Google: The World’s Greatest Marketing Company

    Google has completely changed the way most of the civilized world gets its information. Most know that Google wasn’t the first, but thanks to their effective branding, few realize that Google isn’t the best. For a long time, I’ve made the claim that Google will not be remembered as the greatest technology company of all time, but the greatest marketing company. The name Google was inspired by the term “googol”, which means the number 1 followed by 100 zeros, and was intended to refer to the number of results that are returned with each search. However, this philosophy is diametrically opposed to the first 30 years of online information retrieval, when librarians were trained to create search queries that were very specific, so that only a few search results would be returned. If too many results were delivered, librarians considered it a bad search, because the large set of results was too difficult to manage. Effective research was about accuracy, not quantity. Like Heinz, which turned a huge problem (the ketchup wouldn’t come out of the bottle) into a marketing success, Google has convinced the world that large numbers of search results are a good thing. But the fact is, when it comes to search, more is not better. Google is a victim of its own success. As more and more content pours into the Google index, search results are as diverse as they are voluminous.

    Make no mistake, Google’s search technology is significantly improved, but the problem is that its index is growing at a rate of 100% per year. It’s too broad, covers too many subject areas, and it is too dependent on the most popular links. This is because Google is intended to be all things to all people. If I want to see the menu of a Chinese restaurant, or the show times for a movie, or the hours of operation of my nearest Target, or the phone number of my optometrist, there is nothing better than Google. But if you want to do serious research, whether it’s chemical engineering or art history, Google should never be your first choice. Unfortunately, Google has become the first choice for many professionals. While hospitals spend hundreds of thousands of dollars per year on peer-reviewed medical information, 83% of physicians go to Google first to do medical research. While academic libraries spend tens of thousands on the finest collections of digital content, students choose Google as their first and only source of information for research.

    Deep Web Technologies has spent the past 15 years aggregating content in real time, in specific subject areas, so that users could find the information they need quickly and effectively. We enable users to simultaneously search hundreds of content repositories specifically related to their subject discipline, delivering the most relevant results from the latest publications. So, when veterinarians are researching jaguars, their result sets don’t include articles about automobiles or football teams.  With Deep Web Tech, in addition to getting only relevant information, users get the most current information that has been published. DWT’s search technology does not require indexing, as our technology accesses the original source of the content. As soon as it is published, it is accessible to DWT customers. Just as important is the access to multiple sources from a single interface, which enables articles from other sources of content to be compared, side by side, without jumping from one site to another.

    Google has become the most popular search engine in the world. But popularity doesn’t always translate into quality, just take a look at prime time television.

     

  • Sixty Seconds to Find the Right Article – Mount Carmel’s Story

    We love it when our customers write about their search solution!  Mount Carmel Health Sciences Library (MCHSL) implemented their new federated search solution for a five-site hospital system and nursing college in October of 2012.  In October 2013, Mount Carmel published their initial review, titled “Implementation of a Federated Search in a Multi-Hospital System”, through the Journal of the Medical Library Association (JMLA).

    The most enthusiastic feedback that we have received was from a Mount Carmel College of Nursing instructor who used eSearcher to find articles for his course. He had allotted an hour in his schedule for the article search, but by using eSearcher his task was completed in sixty seconds! Other customer feedback has commended eSearcher’s authoritative results, fast response time, visual appeal, multiple filters, and overall presentation.

    One very important aspect that Deep Web Technologies offers Mount Carmel (and all of our customers!) is a neutral search engine. Many of our competitors weight their own information sources to rank higher on the results list, skewing result ranking in their favor.  “Medical Searcher offered impartial and seamless access to almost all of our premier medical databases.”  

    We hope that you are as successful as Mount Carmel Health Sciences Library in your searching!  And contact us if an impartial, strong search fits your needs.
