
Part 2 of 3: Why Semantic Searching Fails for FTO

This three-part series explains why conventional techniques, particularly “semantics-based” searching, fall short for freedom-to-operate (FTO) searching and analysis. It then puts forth a solution for avoiding these problems. Part 1 was an introduction to the differences between FTO searches and other patent searches. Part 2 identifies the deficiencies of semantic searching in relation to FTO analysis. Part 3 explains how these deficiencies can be overcome.

 

I.  How Semantic Search Platforms Work

There are countless patent searching software platforms available. Each has unique features, but broad commonalities exist. Available platforms tend to offer some combination of natural language, Boolean, classification and semantic searching. Semantic searching is the primary focus of this discussion, as it is the most evolved.

Semantic patent searching generally refers to automatically enhancing a text-based query to better represent its underlying meaning, thereby better identifying conceptually related references. This process generally includes: (1) supplementing terms of a text-based query with their synonyms; and (2) assessing the proximity of resulting patents to the determined underlying meaning of the text-based query. Semantic platforms are often touted as critical add-ons to natural language searching. They are said to account for discrepancies in word form and lexicography between the text of queries and patent disclosures.
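The two steps above can be pictured with a minimal sketch. It is a toy, not a description of how any commercial platform works: the synonym table, the sample documents, and the overlap-based score are all assumptions made up for illustration, and real engines use far richer language models.

```python
# Toy illustration only: the synonym table and documents are invented.
SYNONYMS = {
    "fastener": {"screw", "bolt", "rivet"},
    "attach": {"secure", "couple", "join"},
}

DOCUMENTS = {
    "A": "a bolt is used to secure the panel",
    "B": "a fastener configured to attach a bracket",
    "C": "a method of brewing coffee",
}


def expand_query(query):
    """Step 1: supplement each query term with its synonyms."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def proximity(expanded_terms, text):
    """Step 2: score a document by the share of expanded terms it contains."""
    words = set(text.lower().split())
    return len(expanded_terms & words) / len(expanded_terms)


def semantic_search(query, documents):
    """Return ids of documents with a non-zero score, best score first."""
    expanded = expand_query(query)
    scored = [(proximity(expanded, text), doc_id)
              for doc_id, text in documents.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


RESULTS = semantic_search("attach fastener", DOCUMENTS)
```

In this sketch, document A shares no literal word with the query “attach fastener,” yet it is retrieved because “bolt” and “secure” appear in the expanded query. That is the kind of discrepancy in wording that semantic platforms are said to bridge.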

Based on this, it would seem that semantic searching is powerful and effective. Well, it is… for some types of searches (e.g., patentability or invalidity searches). However, it is surprisingly ineffective for FTO. And this has everything to do with the distinctiveness of FTO, as discussed in Part 1.

II.  The Effect of Semantic Platforms on FTO

Semantic platforms, by their nature, assume a certain paradigm. They purport to infer the underlying meaning of a text-based query. This is great in cases where an analyst knows which technical concepts are relevant. For example, in a patentability or invalidity search, the analyst has a specific claim under review with specifically recited elements. FTO searches do not fit this paradigm.

Consider the distinctions discussed in Part 1 of this series:

(1) In FTO, relevance of patent results is determined by claim scope, not description. The technical aspects described by a patent’s disclosure are distinct from its claims.

In a patentability search, the semantic platform will return precisely what the searcher desires – patents describing the subject concept of the query.

For FTO, the platform will not. Some patents describing a product feature under review may contain claims covering that feature. However, the vast majority will not. Their claims will instead be drawn more narrowly, requiring additional aspects and specificity. Accordingly, semantic engines necessarily output a high proportion of non-relevant patents (i.e., they are “noisy”).

The reverse scenario is also problematic. Many patents will exist that do not describe a specific product feature, yet will have claims sufficiently broad to cover the feature. Semantic engines will rarely identify these types of patents. Even if identified, they are likely to be assigned a low relevancy rank given their much broader scope. This makes sense in a patentability search, but not in an FTO context.

For this reason, semantic platforms suffer two deficiencies at opposite ends of the spectrum: (1) they are under-inclusive as they are prone to missing relevant broad patents; and (2) they are over-inclusive due to their noisiness with respect to patents with narrow or otherwise non-relevant claims.
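Both deficiencies can be seen in a small sketch. Everything in it is hypothetical: the product, the three patents, and their claim elements are invented, and the check that every claim element appears in the product is a deliberately simplified stand-in for real claim analysis.

```python
# Toy data: the product, patents, and claim elements are all invented.
PRODUCT_FEATURES = {"handle", "steel head", "rubber grip", "claw"}

PATENTS = {
    # Describes the rubber grip, but its claim also requires a sensor
    # that the product does not have.
    "narrow": {"description": {"rubber grip", "handle", "sensor"},
               "claim": {"handle", "rubber grip", "sensor"}},
    # Describes and claims a rubber grip on a handle.
    "on_point": {"description": {"rubber grip", "handle"},
                 "claim": {"handle", "rubber grip"}},
    # Never mentions a rubber grip, yet its broad claim covers any
    # handled tool with a steel head.
    "broad": {"description": {"handle", "steel head", "axe"},
              "claim": {"handle", "steel head"}},
}


def description_search(feature):
    """Patentability-style retrieval: match on what the patent describes."""
    return {pid for pid, p in PATENTS.items() if feature in p["description"]}


def claims_cover_product(claim_elements):
    """Simplified FTO relevance: every claim element is found in the product."""
    return claim_elements <= PRODUCT_FEATURES


retrieved = description_search("rubber grip")
relevant = {pid for pid, p in PATENTS.items() if claims_cover_product(p["claim"])}

noise = retrieved - relevant   # over-inclusive: retrieved but not relevant
missed = relevant - retrieved  # under-inclusive: relevant but never retrieved
```

Under these assumptions, the description-based query on “rubber grip” retrieves the narrow patent as noise and never surfaces the broad patent, even though only the broad and on-point patents have claims that read on the product.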

(2) Products tell a thousand stories. Products, due to their physical existence, can be described in thousands of ways. Each way could be a basis for infringement. Patentability searching, instead, is more discrete.

Semantic search tools force analysts to play an arbitrary game of “guess the element.” They require that analysts examine features of a product and pick out just the right ones worthy of review. Even for experienced analysts, this exercise is more conjury than skill. It is simply impossible to accurately predict which aspects of a product are likely to be the basis of infringement in an FTO analysis.

In practical terms, semantic platforms unduly force analysts to pit accuracy against timeliness. If an analyst is selective, many relevant references will inevitably be missed. If, on the other hand, the analyst is cautious and queries many product features, the results will be unworkably noisy.

(3) Missing patents in an FTO search could be dire. Finding relevant patents in an FTO search is no indication of whether additional relevant patents exist. An entire technology space must be cleared. In patentability searching, producing a few close results is more acceptable.

Because of points (1) and (2) above, semantic-based results are likely to contain a large number of patents, perhaps ranked by purported relevance. In a patentability search, an analyst may be comfortable reviewing only the first tier of patent references (e.g., the top one hundred or so). However, the purpose of FTO is to assess and minimize liability risk. Reviewing only an arbitrary first tier of references would undermine this mission. FTO is not concerned with which patents most predictably cover a product; FTO means ensuring that no patents cover the product.

III.  Summing Up Semantics

Conducting FTO searches using semantic platforms produces noisy results that are also prone to significant omission of relevant patents. This presents the analyst with a dilemma. The analyst must choose between: (1) reviewing a compact set of references that is likely incomplete; or (2) reviewing a comprehensive set of references that likely contains a significant amount of noise.

If you are curious whether these findings apply to you, perform a simple test. Dig up your last comprehensive FTO search. Review the patent references that you ultimately deemed relevant. Do they generally fall within the same patent classes (as opposed to being scattered over the classification map)? Do they all pertain to a predictable technical feature (as opposed to relating to the product in unexpected ways)? Do you believe they could have all been retrieved using just a few keywords? If your responses are generally “no,” then your experience is quite typical. If your responses are generally “yes,” you’ve experienced a surprising amount of luck. I suggest buying a lottery ticket.

The illustration below shows how semantic search platforms handle patentability and FTO searches differently in terms of accuracy and cost (“cost” essentially being a proxy measure for work time). A high proportion of missed references results in an inaccurate search. A high proportion of noise results in a costly search. The darker shaded regions represent where industry cases typically fall.

[Figure: accuracy versus cost of semantic search platforms for patentability and FTO searches]

The point here is that semantic platforms can deliver effective results for patentability searches at a reasonable cost but, when it comes to FTO searching, the effectiveness of the platforms is limited even at great cost.

This all leads to the question of whether FTO searches are innately high-cost, low-accuracy processes or whether we are just not handling them correctly. Many in the patent industry seem resigned to the belief that improving FTO is a futile endeavor. This point of view is understandable but incorrect. FTO can be made accurate and low-cost. It just takes a fresh approach.

More on streamlining FTO in Part 3: What You Should be Doing Instead.
