Finding the web content not indexed by search engines

Why your online research is falling short

Access to the most current and reliable information is a prerequisite to sound decision-making and innovative thinking.

Yet too many knowledge workers are left to find information on their own on the open internet, and increasingly that includes workers with little experience or training in information research and content discovery.

We all know that content uncovered by popular search engines like Google or Bing is often neither the most current nor the most reliable. The “surface web” that search engines have indexed is just the tip of the information iceberg. And internet searches have other downsides:

  • Sifting through pages of search results that are not quite relevant is inefficient, time-consuming and disheartening.
  • The results are the same information available to everyone, so they offer no real competitive advantage.

Researching the deep web versus the surface web

Perhaps you have seen the terms “deep web,” “surface web” and even “dark web” used to categorize the seemingly infinite ocean of online information. The following are commonly shared definitions of these three categories:

  • Surface web

As already mentioned, this is the content that has been discovered and indexed by popular search engines. Estimates are that no single search engine indexes more than 15-20% of the content available on the Web. And even when content is indexed, some applicable results may not appear in the top one or two pages, and few people scroll deeper than a few pages.

  • Deep web

This is the information lying below the surface, estimated to be hundreds of times larger than the surface web, much better in quality, and growing faster. Some deep web content lies behind paywalls or membership barriers, but essentially the term refers to content that is not indexed and is therefore invisible in most internet searches.

  • Dark web

The dark web is a subset of the deep web, requiring special software and user processes to access. Because it allows people and website operators to remain anonymous and untraceable, the dark web is where much illegal online activity takes place.

It’s the deep web (excluding the dark web) where LibSource researchers and other trained researchers do their work. This is where the best information can be found – information that leads to true insights, sound decisions and more creative thinking.

Webinar: Deep Web, Deep Insights

I will be presenting my thoughts on mining the deep web for deep insights on February 28 at 10:00 am PT / 1:00 pm ET. If you cannot make that date and time, I encourage you to register anyway, as we will make the recording available to all registrants. For more information and to sign up, click here.

I will discuss why it’s so important in law and business to focus your research efforts on the deep web, and I will share a couple of examples from our own virtual research team. And while lawyers and law librarians are familiar with it, the deep web remains mostly hidden from other user groups, such as marketing, sales and business development, product research and development, finance and other critical areas.

Skilled researchers and news aggregation/current awareness platforms – the one-two punch

The solution to the deep web research dilemma is twofold, encompassing both skilled researchers and technology. This ideal combination increases productivity, alleviates stress and enhances the value of information to the entire organization. In my years of experience working with LibSource clients, I haven’t found anything more effective.

Experienced researchers can pay for themselves in the value they offer as a shared resource, bringing guidance and uncovering the most reliable, meaningful, hard-to-find information. As for technology, the options for platforms that aggregate news and other content into one portal are growing as organizations realize their worth.

Together, they offer three essential value propositions:

  1. Streamlining and consolidating content from a variety of curated sources.
  2. Improving search capabilities and visualization of results.
  3. Reporting on and distributing select content to various end users.

On the technology front – content/news aggregation and current awareness portals

We were recently asked to do a high-level comparison of several features across five aggregator options, and we decided to delve a bit deeper and create a more extensive report. We did not attempt to make it an exhaustive list of all options, nor should it be construed as a “top 5” list. These are simply five of the largest platforms currently on the market:

  1. InfoNgen

  2. LexisNexis Newsdesk (formerly Moreover Newsdesk)

  3. Manzama

  4. Meltwater

  5. Vable (formerly Linex Systems)

While not a detailed feature comparison across all five platforms, we touch on:

  • Needs, utilization and reasons to implement an aggregator.
  • A high-level overview of content sources and portal capabilities.
  • Caveats and considerations for the evaluation and selection process.
  • Overview of the five vendors behind the platforms.

The report is titled “Nothing but the relevant content” and yes, for those of you who noticed, we were inspired by the “nothing but the truth” affirmation from the courtroom oath. It was written with law firms in mind, yet the information is useful to corporations and other large organizations.

All five platforms offer the same basic functionality: delivering relevant content, and only the information that users need. They all have agreements with publishers in law, business and other industry sectors, as well as trusted general media outlets, and they even include some social media networks and blogs. Amazingly, although they cull the universe of online information, all of them are pulling from thousands of resources. The volume of content is so large that it still requires effort to sort the irrelevant from the meaningful. One of our researchers summed up the dilemma by saying that these platforms “…cover a staggering amount of content, and in my experience, there still can be a lot of junk to wade through.”

If you are interested in viewing or downloading the report, click here.

How law firms and corporations gain the knowledge advantage

Today’s markets are competitive across every sector, in business and law, and information drives every decision, plan and strategy. And need I say more than “information overload” to represent the challenges of managing the intense pace and growth of online content?

That’s why law firms and corporations are realizing that it’s no longer enough to rely so heavily on the surface web, or to have all users, especially those not proficient in content discovery and in judging source reliability, do all their own research.

Too much information and too little access to the most reliable, timely information are both obstacles facing every law firm and corporation. Yet finding ways to break through these obstacles is critical, because information drives every decision, plan, strategy and activity.

Think of the many needs for reliable, steady information:

  • Tracking and understanding competitors and clients, established and new.
  • Finding new business opportunities.
  • Finding ideas for blog posts and thought leadership articles.
  • Fostering knowledge across the organization.

I could go on, but I think you get the point.

The day may come when artificial intelligence does everything and all of us move on to bigger, better things, but I’ll save that discussion for another day. Meanwhile, I believe large law firms and corporations need the assistance of people with research training and experience, along with technology like news/content aggregators and current awareness platforms. It’s the one-two punch for efficient operations and competing successfully in today’s markets.

Once again, here are those links:

Deep Web, Deep Insights webinar

Nothing but the relevant content

John DiGilio

John DiGilio is the Senior Director of Research & Intelligence at LibSource. He has written for numerous regional and national publications as well as taught college and graduate courses in such topics as business ethics, e-commerce, fair employment practices, research methodology and business law.