How to compute forward citations in PATSTAT

patstat
data-science
Published

January 31, 2024

There are two types of patent forward citations in patstat: patent applications and patent publications (that’s why there are 2 steps below).

I didn’t put any conditions in both queries because according to PATSTAT data catalog (page 57), when a publication is not cited CITED_PAT_PUBLN_ID = 0, and when an application is not cited CITED_APPLN_ID = 0.

I start from the premise that I want to compute citations just for a set of patent applications, stored in a table called applnids_w_uk_inv with the following columns: appln_id, docdb_family_id, inpadoc_family_id, appln_filing_year, earliest_filing_year.

Here, I use SQL queries to compute citations to show the logic, but it is easily adaptable to any other language: R, Python, etc…

List of forward citations

As patent applications

We first create a table with the citations.

CREATE TABLE citations AS
SELECT
  a.appln_id,
  a.docdb_family_id,
  a.inpadoc_family_id,
  a.appln_filing_year,
  a.earliest_filing_year,
  t212.pat_publn_id as `citing_pat_publn_id`,
  t211.appln_id as `citing_appln_id`,
  t201.docdb_family_id as `citing_docdb_family_id`,
  t201.inpadoc_family_id as `citing_inpadoc_family_id`,
  t201.appln_filing_year as `citing_appln_filing_year`,
  t201.earliest_filing_year as `citing_earliest_filing_year`
FROM applnids_w_uk_inv a
JOIN tls212_citation t212 ON a.appln_id = t212.cited_appln_id
JOIN tls211_pat_publn t211 ON t212.pat_publn_id = t211.pat_publn_id
JOIN tls201_appln t201 ON t211.appln_id = t201.appln_id;

As patent publications

Our list of patent are cited as patent publications, we add them to the first results.

INSERT INTO TABLE citations
SELECT
  a.appln_id,
  a.docdb_family_id,
  a.inpadoc_family_id,
  a.appln_filing_year,
  a.earliest_filing_year,
  t212.pat_publn_id as `citing_pat_publn_id`,
  t211b.appln_id as `citing_appln_id`,
  t201.docdb_family_id as `citing_docdb_family_id`,
  t201.inpadoc_family_id as `citing_inpadoc_family_id`,
  t201.appln_filing_year as `citing_appln_filing_year`,
  t201.earliest_filing_year as `citing_earliest_filing_year`
FROM applnids_w_uk_inv a
JOIN tls211_pat_publn t211 ON t211.appln_id = a.appln_id
JOIN tls212_citation t212 ON t211.pat_publn_id = t212.cited_pat_publn_id
JOIN tls211_pat_publn t211b ON t212.pat_publn_id = t211b.pat_publn_id
JOIN tls201_appln t201 ON t211b.appln_id = t201.appln_id;

We remove duplicates if any.

CREATE TABLE citations2
SELECT DISTINCT * FROM citations;

Compute the number of forward citations

There are a lot of different forward citations: patent applications citing patent applications, patent families, in a window of 3 years, 5 years, etc…

Here is a generic query to illustrate how to compute citations. we do it in two steps: first we identify the citations that match the criterias (1 if it matches, 0 otherwise), then we sum them.

CREATE TABLE citations3
SELECT
  *,
  IF(citing_appln_filing_year - appln_filing_year <= 3, 1, 0) AS cit.appln.3yrs,
  IF(citing_inpadoc_filing_year - inpadoc_filing_year <= 3, 1, 0) AS cit.indapoc.3yrs,
  IF(citing_appln_filing_year - appln_filing_year <= 5, 1, 0) AS cit.appln.5yrs,
  ....
FROM citations2;

-- Compute the Ncitations (by appln_id, inpadoc, etc)
-- Example for the appln ids
SELECT
  appln_id,
  SUM(cit.appln.3yrs) AS Ncit.appln.3yrs,
  SUM(cit.appln.5yrs) AS Ncit.appln.5yrs,
  ...
FROM citations3
GROUP BY appln_id;