How to compute forward citations in PATSTAT
There are two types of patent forward citations in patstat: patent applications and patent publications (that’s why there are 2 steps below).
I didn’t put any conditions in both queries because according to PATSTAT data catalog (page 57), when a publication is not cited CITED_PAT_PUBLN_ID = 0, and when an application is not cited CITED_APPLN_ID = 0.
I start from the premise that I want to compute citations just for a set of patent applications, stored in a table called applnids_w_uk_inv
with the following columns: appln_id, docdb_family_id, inpadoc_family_id, appln_filing_year, earliest_filing_year.
Here, I use SQL queries to compute citations to show the logic, but it is easily adaptable to any other language: R, Python, etc…
List of forward citations
As patent applications
We first create a table with the citations.
CREATE TABLE citations AS
SELECT
a.appln_id,
a.docdb_family_id,
a.inpadoc_family_id,
a.appln_filing_year,
a.earliest_filing_year,as `citing_pat_publn_id`,
t212.pat_publn_id as `citing_appln_id`,
t211.appln_id as `citing_docdb_family_id`,
t201.docdb_family_id as `citing_inpadoc_family_id`,
t201.inpadoc_family_id as `citing_appln_filing_year`,
t201.appln_filing_year as `citing_earliest_filing_year`
t201.earliest_filing_year FROM applnids_w_uk_inv a
JOIN tls212_citation t212 ON a.appln_id = t212.cited_appln_id
JOIN tls211_pat_publn t211 ON t212.pat_publn_id = t211.pat_publn_id
JOIN tls201_appln t201 ON t211.appln_id = t201.appln_id;
As patent publications
Our list of patent are cited as patent publications, we add them to the first results.
INSERT INTO TABLE citations
SELECT
a.appln_id,
a.docdb_family_id,
a.inpadoc_family_id,
a.appln_filing_year,
a.earliest_filing_year,as `citing_pat_publn_id`,
t212.pat_publn_id as `citing_appln_id`,
t211b.appln_id as `citing_docdb_family_id`,
t201.docdb_family_id as `citing_inpadoc_family_id`,
t201.inpadoc_family_id as `citing_appln_filing_year`,
t201.appln_filing_year as `citing_earliest_filing_year`
t201.earliest_filing_year FROM applnids_w_uk_inv a
JOIN tls211_pat_publn t211 ON t211.appln_id = a.appln_id
JOIN tls212_citation t212 ON t211.pat_publn_id = t212.cited_pat_publn_id
JOIN tls211_pat_publn t211b ON t212.pat_publn_id = t211b.pat_publn_id
JOIN tls201_appln t201 ON t211b.appln_id = t201.appln_id;
We remove duplicates if any.
CREATE TABLE citations2
SELECT DISTINCT * FROM citations;
Compute the number of forward citations
There are a lot of different forward citations: patent applications citing patent applications, patent families, in a window of 3 years, 5 years, etc…
Here is a generic query to illustrate how to compute citations. we do it in two steps: first we identify the citations that match the criterias (1 if it matches, 0 otherwise), then we sum them.
CREATE TABLE citations3
SELECT
*,
IF(citing_appln_filing_year - appln_filing_year <= 3, 1, 0) AS cit.appln.3yrs,
IF(citing_inpadoc_filing_year - inpadoc_filing_year <= 3, 1, 0) AS cit.indapoc.3yrs,
IF(citing_appln_filing_year - appln_filing_year <= 5, 1, 0) AS cit.appln.5yrs,
....
FROM citations2;
-- Compute the Ncitations (by appln_id, inpadoc, etc)
-- Example for the appln ids
SELECT
appln_id,SUM(cit.appln.3yrs) AS Ncit.appln.3yrs,
SUM(cit.appln.5yrs) AS Ncit.appln.5yrs,
...
FROM citations3
GROUP BY appln_id;