On October 6, 2015, ProQuest announced its intent to acquire Ex Libris, the Israeli company that provides a number of library automation products and services, including the Primo Discovery Service.
So, soon all three of the major Discovery Services – EDS, Summon and Primo – will be owned by EBSCO and ProQuest, two multi-billion-dollar companies that are fierce competitors and whose primary business is selling content, not Discovery Services.
In the past I have written a number of blog articles, including “Discovery Services: The Good, the Bad and the Ugly”, in which I raised concerns that EBSCO and ProQuest don’t play nice together, each limiting how much of its content the competitor may include in its Discovery Service. And when they do include content from competitors (not just the other Discovery Service vendor), they rank that content poorly.
Primo, owned by Ex Libris, was different: Ex Libris did not sell content and thus had no need to bias the order in which it presented results to the user.
In an October 6, 2014 article on its blog, Ex Libris shared its frustrations about trying to reach a fair agreement with EBSCO that would allow Ex Libris to index all of EBSCO’s content in its Primo Discovery Service. I’m sure those discussions will not become more fruitful now that Primo will soon be owned by ProQuest.
Carl Grant, in his October 8, 2015 blog post, comments on the ProQuest acquisition of Ex Libris, and in a paragraph on content neutrality he underscores what I have been saying for years:
Let’s not lose sight of the fact that we’ve lost another “content-neutral” discovery vendor as a result of this acquisition. That’s not a good thing for libraries, although most librarians ignore this reality. In the end, I believe they’ll regret doing so. We’ve had yet another check-and-balance removed from our supply chain. This post explains why content neutrality is so important and why that loss carries a potentially high price for libraries. So, in this regard, this is not good news. OCLC with their WorldCat offering remain our only content-neutral discovery solution at this point outside of open source solutions (which don’t have an aggregated metadata database like Primo Central, which provides important functionality for libraries).
From a Deep Web Technologies perspective, I believe that this announcement is positive. Over time, Primo, one of our competitors, will be absorbed into Summon, and libraries will increasingly see the advantages (transparency, neutrality, and comprehensiveness of the content searched) that our next-generation Explorit Everywhere! federated search service has over Discovery Services.
Lots of libraries are doing it, sneaking off one day and subscribing to a discovery service. And after all, they’re catering to a new generation of researchers. The Google generation obsessively searches for everything – from restaurants to their next date. Many libraries are adopting discovery services as their new search of choice because it works fast and it’s good enough to give their undergrads something to work with that feels familiar. But, obsessive searching doesn’t necessarily mean good research.
Romanticized as the tool that will change researchers’ lives, discovery services are certainly intoxicating to think about. However, as some libraries are learning (and many of them are now our customers), subscribing to a discovery service doesn’t always shine the light deep enough into all of a library’s subscription information.
Fact: Discovery services arose, in part, from dissatisfaction with those clunky old federated search tools of the early part of this century. Federated search garnered a tough rap back in those days. However, just as the music industry evolved from cassette tapes to CDs to MP3s to iPods, federated search has evolved as well. We consider Explorit Everywhere! the “next-generation” federated search. Explorit Everywhere! gives you a solid research foundation with plenty of bling to make your heart go pitter-patter. But, first and foremost, it’s about serious research.
Marshall Breeding, in his NISO whitepaper “The Future of Library Resource Discovery”, states “While there has been a major shift toward reliance on central indexes in support of discovery and away from technologies such as federated search, the change is not universal. Some institutions and projects have made deliberate choices to not adopt the index-based discovery model.
Stanford University, for example, has opted not to implement one of the commercial index-based discovery services. Even as one of the top research libraries in the world, it has not seen a great deal of interest from its patrons in having it acquire one of the commercial products.”
Stanford happens to be an Explorit Everywhere! customer. It’s no coincidence that Stanford, as well as other top-notch research organizations such as University College Cork, abstains from discovery services. We’ve spoken of many reasons to steer clear of discovery services in previous blog posts, but for those still undecided, here is a good question to ask:
Who is your audience?
If your audience expects a vanilla solution that performs like Google and searches “enough” of your resources to get some decent-looking results, then by all means, take the blue pill. But if your audience is made up of serious researchers who need a comprehensive search of all of your sources, then we suggest abstaining from discovery services and finding out how deep your information sources go.
Oh, and vendor neutrality? Try swallowing this pill: now that ProQuest is purchasing Ex Libris, all three major discovery service vendors are owned by content publishers. Imagine where that puts results from your third-party sources or even information from competing discovery service vendors (the EBSCO-ProQuest conundrum).
One of our customers recently made this comparison: Discovery services are like a key to a storeroom full of goodies while Federated Search is like an archaeological dig. The more you dig, the more you find.
So here’s your tough choice: Obsessive searching, or serious finding?
One of Deep Web Technologies’ favorite customers is heading to the Enterprise Search Europe conference, October 20 and 21. Anita Wilcox, E-resources Librarian at the University College Cork Boole Library, will be speaking on their experience identifying requirements for federated search or discovery services, and then choosing Explorit Everywhere! as their best solution.
CASE STUDY: Meeting User Requirements – Does One Size Fit All?
Anita Wilcox, E-resources Librarian, University College Cork Boole Library
Academic libraries manage very diverse collections of documents and information. Four years ago UCC had to decide between implementing an e-discovery application and adopting a federated search approach. This presentation tells the story of that development, from business case to the way it now meets user requirements.
The full Enterprise Search Europe program may be viewed here. We’ll be posting a followup after the conference! Stay tuned!
Deep Web Technologies proudly powers the technology behind many public search portals, some built for clients to surface their information, others built by DWT to showcase our technology. While most public portals don’t have the subscription or premium resources available to private clients, many of them have deep web resources otherwise unavailable through popular search engines.
Try one of these public portals:
Science.gov – Science.gov searches over 60 databases and over 2200 selected websites from 15 federal agencies, offering 200 million pages of authoritative U.S. government science information including research and development results. Science.gov is governed by the interagency Science.gov Alliance.
WorldWideScience.org – WorldWideScience.org is a global science gateway comprised of national and international scientific databases and portals. WorldWideScience.org accelerates scientific discovery and progress by providing one-stop searching of databases from around the world (Architecture: What is under the Hood). Multilingual WorldWideScience.org provides real-time searching and translation of globally-dispersed multilingual scientific literature.
National Library of Energy – The Department of Energy (DOE) National Library of Energy Beta (NLE Beta) is a national resource to advance energy literacy, innovation and security. NLE Beta is a new search tool designed to make it easier for American citizens to find and access information from across the DOE complex nationwide, without knowing DOE’s organizational structure. Such information is offered by DOE to advance its four broad mission areas:
- Science and R&D
- Energy and Technology for Industry and Homeowners
- Energy Market Information and Analysis
- Nuclear Security and Environmental Management
Missouri Digital Heritage – Through the Missouri Digital Heritage Initiative, the Missouri State Archives and the Missouri State Library, in partnership with the State Historical Society of Missouri, are assisting institutions across the state in digitizing their records and placing them online for easy access.
UNECA ASKIA – The Access to Scientific and Socio-economic Information in Africa (ASKIA) Initiative operates under the Public Information & Knowledge Management Division of the United Nations Economic Commission for Africa. It defines a framework for bringing together scientific and socio-economic information for the African community – including scientists, researchers, academics, students, economists, and policy-makers – through an interactive online portal acting as a one-stop shop for such knowledge and associated information from and about Africa. The overall goal of the initiative is to strengthen knowledge discovery and access by tapping into global scientific and socio-economic knowledge on and from Africa.
Biznar – Searches 107 business sources for market and industry research, including patents and social media.
Mednar – A deep web medical search of over 60 public medical resources.
Environar – Environar searches scientific and government databases in support of environmental, chemical and biological research.
Forecasting and scouting of new technologies is integral to innovative companies. The need to expertly and systematically identify emerging tech is increasing as more and more digital records are added to the ever-growing pool of knowledge assets.
With our newly integrated service, DWT and Wellspring offer a simple approach to find emerging technologies. Rather than the hunter-gatherer method of looking at each source serially, we’ve created a solution to simultaneously search patents, publications, scientific journals and invention databases for fast tech discovery by innovation-driven companies. Within seconds of a query, organizations will have an aggregated list of relevant results gleaned from over 100 million records, which can then be routed for evaluation and action.
Dr. Robert Lowe, Wellspring’s CEO, said, “DWT’s tools and content integration are logical strategic steps of Wellspring’s product evolution. DWT represents another industry-leading data service we have integrated into our products this year, along with CPA Global and DocuSign. The seamless integration enables our clients to find and import critical information without any data entry. Ultimately, the DWT partnership results in better informed decisions for our clients.”
Abe Lederman, Deep Web Technologies’ CEO and CTO, added, “We are very excited to work with Wellspring in the technology scouting and corporate venturing market. We worked closely with Wellspring to provide their customers with access to a curated set of high-quality science and technology information sources, including Wellspring’s Flintbox, a leading technology marketplace for technology transfer and technology scout professionals.”
Spend Matters’ July 29, 2015 post, Accenture and the Future of Procurement Technology: The Virtual Company Mall, discusses one of five focus areas for advancement in e-procurement technologies identified in the Accenture report Procurement’s Next Frontier – The Future Will Give Rise to an Organization of One. According to Accenture, five focus areas – the Virtual category room, the Virtual supplier room, the Supplier network, the Virtual company mall, and Supply analytics – are required for future procurement success. These solutions will unite “cross-functional teams, internal stakeholders, business users, other buying companies, logistics providers, unchartered suppliers, transactional suppliers, strategic suppliers and innovators.”
Accenture describes the Virtual Company Mall like this:
Owned and managed by procurement, the Company Mall will feature a cloud-based set of pre-approved private and public ‘shops’ (i.e., including content from outside the company) from which internal customers can select goods and services, supported with business logic that guides their purchasing based on policies, preferred suppliers and contracts. The mall will include a robust service desk that directs customers to the right shops and provides spot-buy services as required, as well as virtual agents delivering consistent and automated buying support. These services will be enabled by a mix of digital disruptors, including cognitive systems through intelligence augmentation and, where possible, intelligent automation.
What will it take to make the Virtual Company Mall a reality? You guessed it: Federated Search. Spend Matters states: “It will require a strong e-procurement capability inclusive of powerful federated search, including internal catalog, supplier hosted catalogs, traditional web storefronts and dynamic storefronts.”
The existing Explorit Everywhere! federated search capabilities could easily be applied to future procurement technologies and the Virtual Company Mall vision. Integrating federated search technologies like Explorit Everywhere! will smooth the transition from current procurement practices to the single-search Virtual Mall. Consider:
- Searching disparate sources, such as internal catalogs, supplier catalogs, and multiple storefronts in real-time for up-to-date prices, descriptions and codes.
- Aggregating results for top relevant products and services to maximize efficiency.
- Flagging duplicate products and services for easy vendor/product comparison.
- Narrowing results to small selections to save to the supply chain or to purchase products or services outright.
- Retrieving information about products or services from obscure companies or little known vendors ensuring comprehensive source research, comparison and vendor discovery.
- Integrating best practices for search and retrieval of products, as well as e-commerce technologies, thereby creating a seamless experience for the procurement team.
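For illustration, the aggregation and duplicate-flagging steps described above might look something like this minimal sketch. All supplier names, fields and prices here are hypothetical, and this is not Explorit Everywhere! code – just one way the idea could be expressed:

```python
# Sketch: aggregate results from several supplier catalogs and flag
# likely duplicate products for side-by-side vendor comparison.
# Vendor names, item fields, and prices are invented for illustration.
from collections import defaultdict

def normalize(name):
    """Crude product-name normalization used for duplicate detection."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def aggregate(catalogs):
    """Merge per-catalog result lists, grouping likely duplicates."""
    groups = defaultdict(list)
    for vendor, items in catalogs.items():
        for item in items:
            groups[normalize(item["name"])].append({**item, "vendor": vendor})
    return groups

catalogs = {
    "SupplierA": [{"name": "USB-C Cable 2m", "price": 7.99}],
    "SupplierB": [{"name": "USB-C cable 2M", "price": 6.49}],
    "SupplierC": [{"name": "HDMI Cable 1m", "price": 4.99}],
}

for key, offers in aggregate(catalogs).items():
    if len(offers) > 1:  # flagged duplicate: same product, multiple vendors
        best = min(offers, key=lambda o: o["price"])
        print(f"{len(offers)} vendors sell '{offers[0]['name']}'; "
              f"cheapest: {best['vendor']} at ${best['price']}")
```

A real system would use far more robust matching (product codes, fuzzy matching), but the shape of the problem – merge, group, compare – is the same.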
Federated search is Deep Web Technologies’ specialty. We are the experts in single search technology. Aggressive adopters of new procurement technologies should contact Deep Web Technologies to discuss partnership opportunities.
Explorit Everywhere! supports all kinds of fancy search logic, such as Boolean operators, nested parentheses and advanced search fields. This is a big selling point for organizations with researchers who want to search for a precise bit of information, retrieve a small result set, and then narrow that set of results even further with filters and sorts.
Serious researchers know where their information might be hiding, how to search for their information, and what search string may bump their information out of the source and into their lap. Using Explorit Everywhere! can save these researchers time and effort; one search across all of the resources they need to examine takes just a quick click of the search button.
Researchers familiar with a wide range of resources also know that some database search engines are stuck in the stone age. These engines simply do not process advanced logic such as parentheses, wildcards, many advanced search fields or even Boolean operators. When researchers search these directly, they must use basic search strings to retrieve any results at all. For these engines, searching with broad queries and iteratively searching and reviewing results is just part of the package. In contrast, a few modern search engines can handle extremely complex queries, replete with parentheses, quotation marks, Booleans and wildcards, handing the user a golden platter of relevant results.
Explorit Everywhere! usually includes both types of sources, with search engines supporting everything from the very simple to the most complex queries. When a user submits a search string, Explorit Everywhere! must first evaluate the query. The resource connector, a bit of code that submits the query string from Explorit Everywhere! to the source, is programmed to “know” the source’s parameters and limitations. The connector acts as a proxy for the researcher by submitting the query to the resource, then retrieving the results for Explorit Everywhere! to rank against results from other sources. In many cases, the query string is submitted to the source exactly as the user entered it. In other cases, however, the query must be reshaped to accommodate source idiosyncrasies. And although Explorit Everywhere! connectors are very good at understanding how a source works and optimizing the query they submit to it, there is only so much they can do when faced with a complex query and a Neanderthal source. It’s like trying to fit a square peg into a round hole.
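As a rough illustration of the kind of query reshaping a connector performs, here is a minimal sketch. The function name and the simplification rules are our own invention for this post, not DWT’s actual connector logic:

```python
import re

# Sketch: a connector degrading a complex Boolean query for a source
# that only understands plain keywords. Capable sources get the query
# unchanged; primitive ones get bare terms.

def simplify_query(query, supports_boolean):
    """Pass the query through for capable sources; otherwise strip
    Boolean operators, parentheses, quotes, and wildcards to bare terms."""
    if supports_boolean:
        return query
    # Drop operators and punctuation the source cannot parse.
    terms = re.sub(r'\b(AND|OR|NOT)\b', ' ', query)
    terms = re.sub(r'[()"*?]', ' ', terms)
    return " ".join(terms.split())

q = '("climate change" OR warming) AND polic*'
print(simplify_query(q, supports_boolean=True))   # sent to the source as-is
print(simplify_query(q, supports_boolean=False))  # climate change warming polic
```

A production connector would do much more – map field names, rewrite wildcard syntax, split phrases – but this captures the square-peg-round-hole problem: once the Booleans are gone, the source returns a broader result set than the researcher asked for.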
To ensure consistent results retrieval, complex queries may not be the best place to start a search. If a complex search through Explorit Everywhere! yields results from only a few sources, researchers should consider rephrasing a query into simpler terms. To ferret information out of selected sources, including the more primitive sources, start with a simpler search to retrieve a broad results set from most or all of the selected resources.
Once the results are retrieved, researchers can view the whole playing field – all the results, across all of their resources. Starting from this vantage point, the informed researcher can narrow that field with laser precision, using filters, sorts, tabs and clustering, or iterate their search based on their initial findings.
Of course, users can still search with complex Boolean strings, taping together a montage of brackets and parentheses with a patchwork of wildcards and quotation marks. Explorit Everywhere! will dutifully perform the search and no doubt return results, often with great success. For those elusive results, consider broadening the search query and narrowing the results set after all sources have returned successfully. While this process may seem backward and less efficient at first, it ultimately delves deeper into those entrenched databases containing pertinent information.
As much as we like to think that Explorit Everywhere! is simple to use, it still holds the junior heavyweight championship title for feature-rich technologies. Throw on top of that the concepts of “federated search”, “Deep Web”, and “discovery services”, and it’s easy to get lost in a maze of information. Since Deep Web Technologies is all about pulling the needle out of the haystack for you, we thought it was time to create an easy reference post on the world of Explorit Everywhere!. Want to know where you can find out about how we rank results? How about federated search or the Deep Web? Take a gander through some of these posts:
Explorit Everywhere! Features
- Clusters that Think
- Explorit Everywhere! Goes Mobile
- The Art and Science of Deduping Results
- Connectors: Federated Search’s Strength or Achilles Heel?
- Getting the Best Results vs. Getting All of the Results
- Alerts: Automatic, Efficient and Connected
- Analyze This!
- A Little Thing Called “Source Order”
- Explorit now features visual clustering
- Ranking: The Secret Sauce for Searching the Deep Web
- Federated Search Finds Content that Google Can’t Reach – Part I of III
- A Federated Search Primer – Part II of III
- A Federated Search Primer – Part III of III
- Do You Mean Federated Search?
- No Alternative to Federated Search
- Next-generation federated search
- The Origins of Federated Search
The Deep Web
- The “Deep Web” is not all Dark
- Relying on Google for Science Information is Bad for Your Health
- Deep Web: Legal Due Diligence
- Google: The World’s Greatest Marketing Company
- The Deep Web isn’t all drugs, porn, and murder
- Discovering Discovery Services
- Discovery Services: The Good, the Bad and the Ugly
- Discovery Services: Overhyped and Underperformed
- Is Vendor-Neutral Searching Important?
- 3 Reasons an Indexed Discovery Service Doesn’t Work For Serious Researchers
- Is Speed Worth It?
People tend to think of Google as the authority in search. Increasingly, we hear people use “google” as a verb, as in, “I’ll just google that.” General users, students and even professional researchers are using Google more and more for their queries, both mundane and scholarly, perpetuating the Google myth: If you can’t find it on Google, it probably doesn’t exist. Google’s ease of use, fast response time and simple interface gives users exactly what they need…or does it?
Teachers say that 94% of their students equate “Research” with “Google”. (Search Engine Land)
“Another concern is the accuracy and trustworthiness of content that ranks well in Google and other search engines. Only 40 percent of teachers say their students are good at assessing the quality and accuracy of information they find via online research. And as for the teachers themselves, only five percent say ‘all/almost all’ of the information they find via search engines is trustworthy — far less than the 28 percent of all adults who say the same.”
Do teachers have a point here? Is it possible that information found via search engines is less than trustworthy, and if so, where do teachers and other serious researchers need to go to find quality information? Deep Web Technologies did a little research of our own to see just how results from Google and from popular Explorit Everywhere! search portals differ in the quality of their science sources.
How Google Works
Google, and other popular search engines such as Bing and Yahoo, search the surface web for information. The surface web, as opposed to the Deep Web, consists of public websites that are open to crawlers, which read each website’s information and store it in a giant database called an index. When a user searches for information, they are actually searching that index, not the websites themselves. The results that are returned are the ones people seemed to like in the past – the most popular results for the query. That’s right… the most popular… not necessarily the most relevant information or quality resources.
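The crawl-and-index model can be sketched in a few lines. The pages and terms here are hypothetical stand-ins for crawled websites – the point is simply that a “search” consults the stored index, never the live site:

```python
# Sketch of the crawl-and-index model: a crawler stores page text in an
# inverted index, and search consults only that index.
from collections import defaultdict

pages = {  # url -> crawled text (hypothetical stand-ins)
    "example.org/a": "climate change and sea level rise",
    "example.org/b": "recipes for apple pie",
}

index = defaultdict(set)  # term -> set of urls (the inverted index)
for url, text in pages.items():
    for term in text.lower().split():
        index[term].add(url)

def search(query):
    """Return urls whose indexed text contains every query term."""
    result = None
    for term in query.lower().split():
        urls = index.get(term, set())
        result = urls if result is None else result & urls
    return result or set()

print(search("climate change"))  # answered from the index, not the live site
```

Real engines then layer popularity-based ranking on top of this lookup – which is exactly where the “most popular, not most relevant” problem comes in.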
We should probably also mention those sneaky ads at the top of the page that look informative, but can be quite deceptive. A JAMA article states this about medical search ads:
“Many of the ads, the researchers noted, are very informational — with ‘graphs, diagrams, statistics and physician testimonials’ — and therefore not identifiable to patients as promotional material.
This kind of ‘incomplete and imbalanced information’ is particularly dangerous, they note, because of its deceptively professional appearance: ‘Although consumers who are bombarded by television commercials may be aware that they are viewing an advertisement, hospital websites often have the appearance of an education portal.'”
Researchers thinking that Google reads their mind and magically returns the right information on the first page of results should think again. The #1 position on a Google results page gets 33% of the traffic, so is a highly sought-after spot on a Google page. Unfortunately, with SEO tricks inflating page-rank on Google and ads vying for top spot, that number one result, or even the top page of results, may not be entirely germane or even contain much scholarly content. But those results rank high because they’ve worked the Google system.
So, a search performed on Google may return educational results, but the source itself may be unreliable, pure opinion or even company marketing as in the example above. For those needing credible information from recognized, authoritative sources, Google results just don’t cut it. For example, searching for the term “Climate Change” and organizing the top 25 results into categories – Opinions, News, Government, Ads, Wiki Sources, Peer Reviewed and Education – we find that the two biggest categories are News and Opinions. This doesn’t support Google as an authoritative source of information for scientific research.
Where are Quality Science Sources?
Scholarly researchers may need some publicly available information, but more often than not they need information that is not available through Google. Much of what they look for is in password-protected repositories, subscription databases, or an organization’s internal collection of information. These sources are not available to Google’s crawlers, so they are not available through Google. Databases and sources of information like these are part of what is known as the Deep Web. The Deep Web contains 95% of the information on the Internet, such as scientific reports, medical records, academic information, subscription information and multilingual databases. You can read more about the Deep Web here.
How is a Deep Web Search Better than Google for Scholars?
For scholars needing to go deeper into their research, Deep Web databases often contain key information and current data unavailable through Google.
Deep Web sources must be searched through specialized search engines, like Explorit Everywhere! by Deep Web Technologies. Explorit Everywhere! combines all of the Deep Web resources, making them available to search from a single search box, kind of like Google. But there are no gimmicks, no SEO tactics to push results higher up the page, and no sly ranking systems that websites can use to maneuver themselves into the number one position. It’s a simple matter of good sources and good results, aggregated and ranked so the best results are at the top. Don’t worry about wading through ads or junky opinions; if you’re searching through Explorit Everywhere!, you are searching high-quality, relevant sources.
Explorit Everywhere! outperforms Google by eliminating the clutter and providing dependable, scholarly sources of current information to the user. Time and again, Explorit Everywhere! has proven itself to find the needle in the haystack for serious researchers.
Do Your Own Comparison – Google vs. Science.gov
Most Deep Web search engines are, well, deep. They aren’t freely available because the sources themselves are private or available only to registered users. Most academic libraries subscribe to premium sources of information, for example, and those databases are considered part of the Deep Web since they aren’t available to search through Google. And while some reputable sources of information that once existed only on the Deep Web, such as PubMed and NASA, are now publicly available through Google, these sources tend to get buried amidst other results, so they aren’t always easy to find. Many libraries feature these authoritative databases in guides, links or search portals like Explorit Everywhere! simply to highlight the source rather than forcing users to wade through irrelevant results.
There are a few publicly available search engines where you can test drive a Deep Web search and see the difference for yourself. Science.gov, developed and maintained by the DOE Office of Scientific and Technical Information, uses Explorit Everywhere! to search over 60 databases and over 2200 selected websites from 15 federal agencies. The results are from authoritative, government sources, and extraordinarily relevant. When you perform a search on Science.gov, there is no question about the sources you are searching. Explore the difference!
Whether you are a student or scientist, knowing where to start your science search is very important. In most cases, serious research doesn’t start with Google. A 2014 IDC study shows that only 56% of the time do knowledge workers find the information required to do their jobs. Having the right sources available through an efficient Deep Web search like Explorit Everywhere! is critical to finding significant scientific information and staying ahead of the game.
A question we hear regularly is, “Why doesn’t Explorit Everywhere! return all of the results from every source that is searched?” For example, if a user goes directly to a source to search, they may find thousands of results for their query. But, performing the same search on an Explorit Everywhere! application may only return 100 results from that source. Why aren’t we returning the thousands of results like the source does?
DWT specifically returns up to 100 of the top results from each source (unless our customer specifies that we return more) to avoid overloading the user with information that may not be relevant to their search. Because the majority of Explorit Everywhere! applications have at least 10 sources of information, if each of those 10 sources returns 100 results, the user will see 1000 results for their query, divided into a default of 20 results per page, or 50 pages of results total. Each of those results has been ranked as relevant to the user’s query, with the most relevant results across all of the sources on page 1. Of course, we hope that the gold nugget is right at the top, front and center. (You can read more about how DWT ranks results in this post – Ranking: The Secret Sauce for Searching the Deep Web.) But if we returned all of the results from all of the sources, the total number of results and the total number of pages would increase to a dizzying number.
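The cap-merge-paginate arithmetic above can be sketched as follows. Source names and relevance scores are randomly generated for illustration; this is the general pattern of federated merging, not DWT’s internal code:

```python
# Sketch: take up to 100 top results from each source, merge them by
# relevance score, and paginate at 20 results per page.
import random

PER_SOURCE_CAP = 100
PAGE_SIZE = 20

random.seed(1)  # deterministic fake relevance scores
sources = {f"Source{i}": [{"title": f"Source{i} doc {j}",
                           "score": random.random()}
                          for j in range(250)]       # each source has many hits
           for i in range(10)}

merged = []
for name, hits in sources.items():
    top = sorted(hits, key=lambda h: h["score"], reverse=True)[:PER_SOURCE_CAP]
    merged.extend(top)                               # cap at 100 per source

merged.sort(key=lambda h: h["score"], reverse=True)  # global relevance ranking

print(len(merged))               # 1000 results (10 sources x 100 each)
print(len(merged) // PAGE_SIZE)  # 50 pages at 20 results per page
```

Without the per-source cap, the merged list in this toy example would already be 2500 results long – which is the “dizzying number” problem in miniature.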
We know from the countless SEO studies on Google’s results placement that the majority of consumers using Google rarely click to the second page of results. In fact, page 2 and 3 may get only around 6% of the clicks on any given search. (Marketing Land). While the likelihood of next page clicks does go up with age and education, we’ve found that it is unlikely even for our erudite Explorit Everywhere! users to click through 50 pages of results. Rather, most researchers will perform a new search or refine their search if they don’t find what they are looking for within the first couple of pages.
And while we’re being honest, although Google may say that they have found millions of results for your search, they actually don’t return all of those results for you to view. If you set up Google to display 100 results per page, and then perform a search for “climate change” you may see that Google found about 142,000,000 results, and that there are 7 pages available to scroll through. However, once you get to page 4, you are unable to scroll further, and will see this message: “In order to show you the most relevant results, we have omitted some entries very similar to the 354 already displayed. If you like, you can repeat the search with the omitted results included.” Try clicking that link to see how many results you actually get – probably around 900 results, or 9 pages. So much for the millions that are available! Even Google tries to avoid overloading users by limiting the vast number of results for broad queries.
For those researchers preferring to narrow their research to just one or two relevant sources on Explorit Everywhere!, it’s possible that 100 results from each source may not be enough. If a source has particularly relevant results, then we suggest capitalizing on that information by going directly to that source to continue searching. Use Explorit Everywhere! as a tool to not only find relevant results, but to find relevant sources of information to further your information discovery.