How It Works

Overview

All the Patents uses a four-step process to generate exhaustive prior art from existing patent data, leveraging LLM-backed vector embeddings for semantic analysis.

Step 1: Data Collection

The project begins by gathering all patents that have ever been filed. This creates a comprehensive dataset of existing intellectual property – the raw material from which all combinations will be generated.
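In practice, a dataset at this scale is typically consumed from bulk exports rather than fetched record by record. As a minimal sketch (the file path, field names, and JSONL layout here are illustrative assumptions, not the project's actual format), loading such an export might look like:

```python
import json
import tempfile

def load_patents(path):
    """Load one patent record per line from a JSONL bulk-data export."""
    patents = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                patents.append(json.loads(line))
    return patents

# Demonstrate with a tiny synthetic export (real bulk files are far larger).
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as fh:
    fh.write('{"id": "US1234567", "title": "Widget", "claims_text": "1. A widget."}\n')
    fh.write('{"id": "US7654321", "title": "Gadget", "claims_text": "1. A gadget."}\n')
    path = fh.name

records = load_patents(path)
print(len(records))  # 2
```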

Step 2: Claims Extraction

Individual claims are parsed from each patent, isolating the specific inventive elements that define each patent’s scope. Patent claims are the legal boundaries of what the patent protects, making them the fundamental unit for combination analysis.
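One simple way to isolate individual claims is to split the claims section on the numbered-claim convention ("1.", "2.", ...) that patent documents follow. This is a hypothetical sketch, not the project's actual parser:

```python
import re

def extract_claims(claims_text):
    """Split a patent's claims section into individually numbered claims.

    Assumes the common drafting convention that each claim begins with
    'N.' at the start of a line.
    """
    parts = re.split(r"(?m)^\s*(\d+)\.\s+", claims_text)
    # With one capture group, re.split yields:
    # [preamble, number, body, number, body, ...]
    claims = {}
    for i in range(1, len(parts) - 1, 2):
        claims[int(parts[i])] = parts[i + 1].strip()
    return claims

sample = """1. A method for encoding data, comprising hashing the data.
2. The method of claim 1, wherein the hash is cryptographic.
"""
claims = extract_claims(sample)
print(len(claims))  # 2
```

Real claims parsing is messier (multi-paragraph claims, cancelled claims, OCR noise), but the numbered-claim structure is the usual anchor.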

Step 3: Vector Space Analysis

Using LLM-backed vector embeddings, all extracted claims are placed into a high-dimensional vector space. Claims are then clustered by semantic similarity, revealing relationships between patents across different fields, time periods, and jurisdictions.

This approach allows the system to understand not just the text of each claim, but its meaning and relationship to every other claim in the dataset.
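The embed-then-cluster step above can be sketched as follows. The toy bag-of-words `embed` function stands in for a real LLM embedding model, and the greedy threshold clustering is only one of many ways to group vectors by cosine similarity; both are assumptions for illustration:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for an LLM embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(claims, threshold=0.5):
    """Greedy clustering: each claim joins the first cluster whose
    representative (first member) it resembles closely enough."""
    clusters = []
    for claim in claims:
        vec = embed(claim)
        for group in clusters:
            if cosine(vec, embed(group[0])) >= threshold:
                group.append(claim)
                break
        else:
            clusters.append([claim])
    return clusters

claims = [
    "a method for compressing video frames",
    "a method for compressing video streams",
    "a pharmaceutical composition comprising aspirin",
]
groups = cluster(claims)
print(len(groups))  # 2: the two compression claims land together
```

With real embeddings, semantically related claims cluster together even when they share no vocabulary, which is what lets the system surface relationships across fields and jurisdictions.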

Step 4: Combinatorial Generation

The final step generates every possible combination of existing patent claims. Publishing these combinations establishes them as prior art, creating a comprehensive defensive dataset.
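At its core, this is enumeration of claim subsets, which in Python is a direct application of `itertools.combinations`. A minimal sketch (the two-claim pairing and the claim strings are illustrative; the real dataset makes the combination count astronomically large):

```python
from itertools import combinations

def generate_prior_art(claims, max_size=2):
    """Yield every grouping of existing claims, from pairs up to
    max_size-element combinations. Each grouping becomes one
    defensive prior-art record."""
    for size in range(2, max_size + 1):
        yield from combinations(claims, size)

claims = ["claim A", "claim B", "claim C"]
records = list(generate_prior_art(claims))
print(len(records))  # 3 pairs from 3 claims
```

Note the combinatorial growth: n claims yield n(n-1)/2 pairs alone, which is why the clustering in Step 3 matters for organizing the output.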

“If anyone in the future tries to recombine an existing claim into a new thing, they can point to our thing as prior art, to be able to say ‘No, no, no… they already did that. You can’t do that again, because they did that as prior art.’”

– Damien Riehl

Technical Infrastructure

  • Vector Embeddings: LLM-backed embeddings for semantic patent claim analysis
  • Platform: Originally hosted on Kelvin Legal Data OS (by 273 Ventures)
  • Frontend: Streamlit (Python web framework for data applications)