DWT’s own Frank Bilotto comes to you live from New York City, where fans of the Seahawks and Broncos are gathering to watch Super Bowl XLVIII at MetLife Stadium in New Jersey. See his take on how to educate yourself about Deep Web Technologies before the big game.
2014 has turned the corner and Deep Web Technologies is moving quickly toward some exciting goals. Our developers are draining the java and pounding the keyboard to produce a new version of Explorit that will allow greater flexibility and more robust features. We’re very excited to see the nimble and feature-rich product of much planning and work right around the corner.
Throughout 2014, we’ll be lifting the veil on new features and functionality, some of which will automatically be included in our updated Explorit. Other features will accompany customers with more specialized solutions. Here’s a sneak peek!
Multilingual Searching: We’ve had great success with multilingual searching in WorldWideScience.org, so we’re taking it to the next level by opening up the languages we can include. Languages are complicated, and we’re building Explorit to handle that complexity. Multilingual searchers who want precise results translated from other languages into their own will find our updates simple and easy to use.
Mini-Explorit: Many of our customers have asked about getting a smaller “widgetized” version of Explorit for a page in their website. Introducing…Mini-Explorit! Mini-Explorit can sit on a webpage, search and show results quickly without going to the full application. It’s self-contained. With this new feature, we’ve got you covered for those fast homepage searches.
My (Personal) Library: With the My Library feature, users can select results and save them to their My Library. When they log out, their results will be saved, safe and snug, awaiting their return. Of course, at any time, users can email, print, export or just keep saving their results. We’re not picky, and neither is our My Library.
Oh, there are plenty of other slick features coming down the pike. Interested in knowing more about these and other features? Let us know!
Here’s hoping you find a fulfilling 2014!
It’s been 3 years now since the epic (I jest) showdown at the Charleston Conference between EDS and Summon, the EBSCO and ProQuest Discovery Services. I wrote extensively about these Discovery Services and the issues that they raise in one of our company’s most popular blog articles:
Three years later, a lot of the concerns that I raised are starting to be seen as real issues with Discovery Services by more and more people: 8 Things we know about Web Scale Discovery systems in 2013
I would of course be remiss if I didn’t point out that our Explorit product is a next generation federated search product that has solved the issues with these early generation products. That’s the subject of another blog post.
Now for some of the bad. I’m going to focus on two areas: transparency and the “one shoe fits all” approach. Perhaps my biggest issue with Discovery Services is their lack of transparency. Discovery Services don’t make it easy (and maybe make it impossible) to determine what content is in their index. Just because content from a particular publisher is included in their index doesn’t mean that all content from that publisher is included. The same goes for journals: inclusion of some content from a journal doesn’t mean that all content from that journal is indexed. Recently I was speaking with a prospect who was evaluating Explorit against a Discovery Service and was surprised to find that some physics content that was very important to them only went back 10 years. And what about Discovery Services indexing metadata only vs. metadata plus full text? Do you know when full text is indexed?
Discovery Services are aimed primarily at academic libraries. Their “one shoe fits all” approach means that it is hard for users to limit their searches to a particular set of content, e.g., engineering, music, or history content. It also makes Discovery Services much less appropriate for specialized research areas such as medical, legal, energy, or pharmaceutical/life sciences research.
Now for the ugly. I’ve been writing and speaking at conferences for a while now on the issues that Discovery Services have, especially their reliance on the relationships they establish with content owners to determine what content they can, and cannot, get into their indices.
So just a few weeks ago Thomson Reuters, publisher of the popular database Web of Science, announced that it was pulling Web of Science content from all the major Discovery Services. Although Thomson Reuters has, at least for now, backed off this decision, the episode highlighted one of the ugly things about Discovery Services: they lack control over what goes into their own services. I wrote about this in a recent blog post:
Continuing with the ugly, it is really problematic that the two major Discovery Services are owned by companies that fiercely compete for libraries’ database subscription budgets and thus have no interest in including each other’s content in their indices. When they reluctantly do so, EBSCO will make sure that ProQuest and Gale content ranks low in EDS, and ProQuest will make sure that its competitor’s content ranks low in Summon. Back in 2010 I wrote a blog article:
which addresses the concerns that every librarian should have with the lack of vendor neutrality by the Discovery Services.
Ed Chamberlain, a librarian in the UK, laments the problem that Discovery Services’ owners don’t play nice with each other in a recent blog article:
Also you might want to read the following editorial asking Discovery Services to play nice:
In conclusion, enough concerns have been raised about Discovery Services that NISO, the National Information Standards Organization, has recently released a draft recommended practice as part of their Open Discovery Initiative:
Dr. Walter Warnick, Director of the U.S. Department of Energy Office of Scientific and Technical Information (OSTI), is retiring in early January 2014 after 17 years as Director. Dr. Warnick embraced the idea of knowledge dissemination, and championed aggressive efforts to capitalize on technological advances such as federated search. Abe Lederman, DWT’s President and CTO is attending a retirement celebration today at OSTI in Oak Ridge, Tennessee. Abe’s remarks on his years working with Dr. Warnick are below.
I recall vividly my first meeting with Dr. Warnick, Walt to me, in his office in Germantown in 1999. A few months earlier we had launched our first Explorit application at OSTI, the Environmental Science Network, and were working on our second, Science.gov, and other OSTI portals. Walt was always focused on making more and more content available to, as he called them, the “science attentive citizen,” and on helping to accelerate scientific progress. Walt has been a visionary and a very uncommon Federal Government executive. He always challenged his staff, was always willing to take risks, and was never patient with things that took too long to accomplish.
The science portal that I am most proud of having developed is WorldWideScience.org, a portal that now searches over 100 of the best science portals from all over the world, including databases in multiple languages. WorldWideScience got its start as a collaboration between the DOE Office of Science under Ray Orbach and the British Library under Dame Lynne Brindley. I recall fondly attending a number of major events for WorldWideScience.org in London, Seoul, and Helsinki. Walt also managed to establish a close relationship with Tony Hey, a senior Vice President at Microsoft who has become a strong supporter of OSTI.
Walt has promoted WorldWideScience globally. At the highest levels of the State Department WorldWideScience is used as an example of scientific collaboration between the U.S. and the Chinese and Russian governments. In a presentation to the U.N. in Geneva in 2011 Walt spoke about how WorldWideScience had become an important resource to help bridge the digital divide between countries that have access to the latest scientific information and those that don’t.
Three years ago I wrote a post for my company’s blog, Reminiscing on a 12 year partnership with OSTI, inspired by a wonderful article that Walt had written, Federated Search as a Transformational Technology Enabling Knowledge Discovery: The Role of WorldWideScience.org. In that blog post I promised to highlight a few of the accomplishments of our partnership with OSTI in my next post. Although a bit late, I now have the content for that follow-on article and promise to post this talk to our blog tomorrow.
Walt, best of luck with whatever you do next. I’m sure that you will continue to promote the causes that are dear to you exemplified in the OSTI Corollary – Accelerating the sharing of scientific knowledge accelerates scientific discovery.
GCN gave the Science.gov Mobile app (which runs on the Android and on the Mobile Web) scores of 7 for usefulness, 8 for ease of use, and 8 for coolness factor.
The Science.gov website has this to say about the accolade:
Coolness? Check. Usefulness? Check. Ease of Use? Check. The Science.gov Mobile application has been named among the Top Ten in Best Federal Apps by Government Computer News (GCN). The recognition is timely, too. The Administration recently issued Digital Government: Building a 21st Century Platform to Better Serve the American People, the strategy which calls on all federal agencies to begin making mobile applications to better serve the American public. GCN called its Top Ten “ahead of the curve” with apps already in place.
I downloaded the application to my Droid X. The install was effortless and the app has a very intuitive user interface, which allows for emailing of search results for later review.
While we didn’t have any involvement in creating the mobile app we did develop the search technology that powers Science.gov as well as the web services API that enables searches by Science.gov Mobile.
We’re quite delighted to see Science.gov serve the mobile web.
The other day Abe received in the mail the document from the US Patent and Trademark Office (USPTO) granting us a trademark and service mark on the multilingual version of Explorit®, our federated search system. As any of you who have filed patents, trademarks or service marks surely know, the process is arduous and time consuming. So, needless to say, we’re delighted to have received the USPTO document.
Curious to see when we first began promoting Explorit®, I took a journey back in time, courtesy of Archive.org, aka the Wayback Machine. Deepwebtech.com was first crawled by Archive.org on August 2, 2002.
This was our original logo:
And, here’s a piece of the original description of Explorit®:
Explorit provides the capability to deploy small to large-scale collections of information on the web – fully searchable and easily navigable – to a wide range of user communities. Large organizations or information purveyors with many collections of heterogeneous information benefit from the consistency and usability of the Explorit user interface: whether they deploy one collection or one hundred, users quickly learn that all Explorit applications operate essentially the same way, and variances are determined by content rather than inconsistent design.
While Explorit® has greatly evolved over the past ten years, some things never change. Yes, the architecture, the user interface, and the back end software were completely rewritten years ago to exploit modern programming technologies and web services standards. And, yes, the features have evolved to keep up with market demand. But the values which drive the development of our software haven’t changed. Explorit® has been and always will be about helping libraries and research organizations mine the deep web for the most useful information from dozens or even hundreds of high quality sources.
We’re proud to have that piece of paper; we’ve framed it. But, more important than the document is what it represents – a commitment to serving research by being on the leading edge of information retrieval.
The Universidad Católica de Temuco (UC Temuco) in southern Chile is proud of their new search portal that we built for them. They recently shared some of their excitement in their official announcement. (While the announcement is in Spanish, Google Translate does a decent job of producing readable English.)
My version of the translated text, in part, reads:
The UC Temuco Library System implements a powerful metasearch engine for academic information. 2012-05-16 16:08:55.
With the aim of modernizing and implementing new technologies that support education, the new metasearch engine, “Metabuscador DeepWeb,” will benefit students, teachers, and researchers in their searches for bibliographic material.
A joint effort of the General Directorate of Research and Graduate Studies and the UC Temuco Library System produced the implementation of Metabuscador DeepWeb, a tool that will provide access from a single search interface to all the academic resources to which the University subscribes, as well as repositories, the library catalog, and free-access databases such as SciELO.
UC Temuco cites a number of reasons for valuing the contribution of Metabuscador DeepWeb:
- Improving the time to find documents
- Improving the search experience through customization
- Meeting the demand for increased technology
- Increasing the utilization of lesser known resources
Metabuscador will be available to teachers and students beginning the week of May 22nd (2012), during which time training will be provided.
The appeal of a unified index to search all of an organization’s collections is undeniable. Lightning-fast results make you feel as though you are searching Google, which is particularly appealing to those accustomed to researching on some of the more popular, less credible search engines. There are, however, some problems with the unified index approach, which mirrors the approach taken in traditional web indexing.
The Need for Speed
Most researchers understand a unified index as an aggregate of records from multiple databases that quickly returns results for a user query. Unlike popular search engines such as Google, a unified index does not crawl publishers’ content; it is updated on a schedule from metadata repositories. This creates an information lag that depends on how often the publisher content is updated and how long it takes to get the information into the repository. For example, if content is updated weekly, researchers needing up-to-the-minute information will miss relevant results because the unified index is not current. Federated search, on the other hand, searches repositories in real time, with a time lapse of usually less than thirty seconds, returning articles just published to the database as well as other relevant results. This does rely on each publisher’s index being healthy and ready for search, but the majority of publishers monitor their applications to minimize downtime. For many researchers, real-time results are the critical factor for success.
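To make the real-time contrast concrete, here is a minimal sketch of how a federated engine might fan a query out to its sources in parallel and merge whatever arrives before a deadline. The connector names, delays, and result counts are invented for illustration; this is not DWT’s actual implementation.

```python
import concurrent.futures
import time

def make_connector(name, delay, hits):
    """Simulate a source connector with a fixed response time."""
    def search(query):
        time.sleep(delay)
        return [(name, f"{query} result {i}") for i in range(hits)]
    return search

# Two pretend sources: one fast, one too slow for the deadline.
CONNECTORS = {
    "fast_source": make_connector("fast_source", 0.05, 3),
    "slow_source": make_connector("slow_source", 1.5, 2),
}

def federated_search(query, timeout=0.5):
    """Query every source in parallel; keep whatever arrives in time."""
    merged = []
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(CONNECTORS))
    futures = [pool.submit(fn, query) for fn in CONNECTORS.values()]
    try:
        # as_completed yields each source's results incrementally, fastest first
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            merged.extend(fut.result())
    except concurrent.futures.TimeoutError:
        pass  # deadline reached: return what we have so far
    pool.shutdown(wait=False)
    return merged

print(federated_search("blue light semiconductor"))
```

A slow source simply misses the deadline and contributes nothing, rather than holding up the whole search.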
Every index decays over time: sources change their data structures, and the metadata must be reloaded into the index. The less frequently an index is updated, the greater the lag between the index results and the changes made by a publisher. This may not be a problem if you do not need real-time information, but it can present problems for researchers needing time-sensitive data. Anyone who has done a search on the web and encountered a 404 or 403 error has seen an example of index decay.
Part of the beauty of federated search is the opportunity to discover new sources of information. If a particular source returns 2000 results for the term “blue light semiconductor” directly on the source page, and 100 relevant results to the federated search engine, it may be worthwhile to investigate that source further. In a unified index, the water is muddied because the index serves as the source, and it isn’t clear which source returned which results, or how many. This severely cripples the source discovery process of so-called “unified discovery services”. Not knowing what you are searching can be very problematic for researchers or business people.
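The 2000-versus-100 comparison above can even be automated: because federated search preserves per-source hit counts, a large gap between a source’s native count and what reaches the engine is itself a discovery signal. The source names and counts below are made up for illustration.

```python
# Hit counts as reported by each source's own search page vs. what the
# federated engine received; all numbers are invented for illustration.
native_counts = {"PhysicsArchive": 2000, "ChemIndex": 150}
federated_counts = {"PhysicsArchive": 100, "ChemIndex": 140}

def worth_investigating(threshold=5.0):
    """Flag sources whose native coverage far exceeds the federated sample."""
    flagged = []
    for src, native in native_counts.items():
        fetched = federated_counts.get(src, 0)
        if fetched and native / fetched >= threshold:
            flagged.append(src)
    return flagged

print(worth_investigating())
```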
Most unified index discovery engines encourage publisher relationships at the expense of broadening their source lists. For organizations using a small set of publishers, this may not present a problem. For those depending on other publishers for important data, this is no molehill, but a mountain. By eliminating the need for “connectors” and the maintenance associated with them, a unified index has created just a single “connector” to information: the index itself. Inflexibility in adding new collections indicates that what you are getting is another boxed solution. The unified index presents a viable option for those desiring search speed and a cost-efficient approach to distributed search, but the lack of customization may present significant problems for organizations needing up-to-date information from the publishers they specify. Is speed worth it to your organization?
We’ve started using the phrase “next-generation” in reference to our federated search product, because it seems to us that the phrase “federated search” has become somewhat maligned and doesn’t quite capture the new capabilities and features in our product. What are those new capabilities and features, you ask, that raise our product to the level of what we think is “next-generation”? Here is a partial list. They have two general themes: first, they directly confront some of the shortcomings many people think about when they discuss federated search as an abstract concept; second, they take advantage of the features and benefits commonly associated with Web 2.0 apps.
- Simultaneously search more than 10 collections at a time. Incredibly, there are quite a few solutions out there that maintain a limit of 10 or so collections at any one time. They will try to bury this fact in the details, for example by enabling many more collections but grouping them in such a way that no more than 10 collections are actually searched simultaneously.
- Instantaneous results. Impossible, you say, because a federated search must wait for all collections to return their results? Our technology provides incremental results: when a search is performed, it gathers (plus de-dupes and ranks) and displays results from those collections that return quickly, and it continues gathering until every collection has either returned its results or timed out. This approach greatly speeds the delivery of results to a researcher and reduces the frustration of waiting for slow collections.
- Greatest common denominator. One of the earlier deficiencies of federated search was its complete reliance on the search engines (i.e., collections) it searches. Because some search engines have poor support for Boolean logic, don’t support particular operators, or don’t provide clean, clear, or consistent metadata, older federated search platforms would build connectors that served “the least common denominator.” This basically means that if a particular collection being federated didn’t support the “not” operator, the federated search engine wouldn’t support it either. Our technology is different: we do not dumb down our connectors, but make them intelligent enough to feed the correct parameters into each collection, regardless of the limitations of the other collections. If one collection doesn’t support the “not” operator, that’s okay, because we will still feed the “not” operator to those collections that do. This provides better results.
- As good as an index. Impossible, you say, because an index is always faster and has more than metadata to search? Not impossible, I say. First off, many solutions are merely an index of metadata, nothing more; they search the metadata only and miss articles that contain your search query only in the full text. Compare that with federated search: it queries collections that do have full articles, and therefore obtains a list of all articles relevant to your search. This provides a more complete search. Second, federated search, through incremental results, can provide results very quickly.
- Deeper is better. Older federated search technologies have very low limits on the number of results obtained from federated collections (some as low as 10). This limits the utility of federated search, because many relevant articles can be missed. Our result limits can be managed on a collection-by-collection basis, ensuring you don’t miss important, relevant material from your favored collections.
- Social network. Social networking is here to stay, and for some research needs, an important source of information. Regardless of the social network, collection or source of information, our connectors can connect. This increases the value of your federated search.
- Web 2.0. Our technology follows Web 2.0 standards and concepts, enabling federated search to become a component of just about any web-based application (through web services), and in the near future, we intend to provide additional Web 2.0 functionality, including but not limited to themes, collection and article ranking and comments, collaboration and many other features.
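The “greatest common denominator” idea in the list above can be sketched as capability-aware query translation: each connector declares which operators its source supports, and the engine degrades the query per source instead of dumbing it down for everyone. The collection names and capability flags here are illustrative assumptions, not our actual connector code.

```python
# Per-source operator support; purely illustrative flags.
CAPABILITIES = {
    "rich_source": {"not": True},
    "basic_source": {"not": False},
}

def translate(terms, not_terms, supports_not):
    """Build a source-specific query string from already-parsed terms."""
    parts = list(terms)
    if supports_not and not_terms:
        # Only sources that understand NOT receive the exclusions; the
        # rest could have excluded hits post-filtered after retrieval.
        parts += [f"NOT {t}" for t in not_terms]
    return " ".join(parts)

def per_source_queries(terms, not_terms):
    """Translate one user query into the best form each source supports."""
    return {
        src: translate(terms, not_terms, caps["not"])
        for src, caps in CAPABILITIES.items()
    }

print(per_source_queries(["semiconductor"], ["gallium"]))
```

The point is that the richer source still gets the full Boolean query even though the weaker source cannot handle it.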
For these reasons and more, we feel we can no longer talk about “federated search” without saying “next-generation federated search.”