DynamoDB tables

paulalbert1 edited this page Nov 16, 2018 · 2 revisions

Introduction

DynamoDB is Amazon Web Services' NoSQL database service for storing and retrieving data. It has the virtue of being very fast. One of its limitations, though, is that a single stored item can be no larger than 400 KB.

This article walks through the different DynamoDB tables that ReCiter uses and how it uses them. We can group the tables into three categories.

Tables a ReCiter administrator needs to maintain

Identity

The Identity table contains everything we know and can use about our target person including name(s), email, departmental affiliations, organizational affiliations, known relationships, grant identifiers, years of bachelor and doctoral degree, and person type designations.

These data are used to:

  • Construct the searches necessary for retrieval
  • Designate one author per article as a targetAuthor
  • Score the evidence

Having as much information as possible is key to returning highly accurate scores. Note that, for brevity, the sample data below omits ~50 known relationships, including co-investigators, managers, mentees, and HR relationships; for longstanding Weill Cornell faculty, it is common to have that many potential co-authors.

Sample data:

{
  "identity": {
    "alternateNames": [
      {
        "firstInitial": "J",
        "firstName": "Jochen",
        "lastName": "Buck"
      }
    ],
    "degreeYear": {
      "bachelorYear": 0,
      "doctoralYear": 1983
    },
    "emails": [
      "jobuck@med.cornell.edu"
    ],
    "grants": [
      "TR000457",
      "HD040560",
      "HD059913",
      "AI057158",
      "NS055255",
      "GM073546",
      "GM107442",
      "Y025810",
      "CA062948",
      "HD038722",
      "CA29502",
      "CA029502",
      "AI064842",
      "GM062328"
    ],
    "institutions": [
      "Eberhard-Karls University Faculty of Medicine (Germany)",
      "Eberhard-Karls University (Germany)",
      "Weill Cornell Medical College, Cornell University"
    ],
    "knownRelationships": [
      {
        "name": {
          "firstInitial": "K",
          "firstName": "Kerry",
          "lastName": "Purtell"
        },
        "type": "CO_INVESTIGATOR",
        "uid": "kep2002"
      },
      {
        "name": {
          "firstInitial": "N",
          "firstName": "Neil",
          "lastName": "Harrison"
        },
        "type": "CO_INVESTIGATOR",
        "uid": "neh2001"
      },
      {
        "name": {
          "firstInitial": "L",
          "firstName": "Lavoisier",
          "lastName": "Ramos"
        },
        "type": "CO_INVESTIGATOR",
        "uid": "lar2018"
      }
    ],
    "organizationalUnits": [
      {
        "organizationalUnitLabel": "Jochen Buck and Lonny R Levin Lab",
        "organizationalUnitType": "DEPARTMENT"
      },
      {
        "organizationalUnitLabel": "Pharmacology",
        "organizationalUnitType": "DEPARTMENT"
      }
    ],
    "personTypes": [
      "academic",
      "academic-faculty",
      "academic-faculty-weillfulltime",
      "academic-faculty-fullprofessor",
      "employee-academic",
      "employee-exempt",
      "employee"
    ],
    "primaryName": {
      "firstInitial": "J",
      "firstName": "Jochen",
      "lastName": "Buck"
    },
    "pubMedAlias": [],
    "title": "Professor of Pharmacology",
    "uid": "jobuck"
  },
  "uid": "jobuck"
}

GoldStandard

The GoldStandard table is used to track:

  • PMIDs that have been accepted on behalf of the target individual
  • PMIDs that have been rejected on behalf of the target individual
  • An audit log tracking who accepted which articles, who rejected which articles, who undid an accept or reject action, and when those actions occurred. Note that these actions can be taken by a person or a system.

The GoldStandardRetrievalStrategy returns all the PMIDs for a user from this table. This ensures that PMIDs are retrieved and scored in cases where the lastName firstInitial search falls short, for example when a user's surname is not properly indexed in PubMed, or when the person is credited only as part of a collaboration group.

Sample data for a user with uid=paa2013:

{
  "auditLog": [
    {
      "action": "ACCEPTED",
      "dateTime": "2018-11-09T19:34:30.476Z",
      "pmids": [
        19393200
      ],
      "uid": "mcc2001",
      "userVerbose": "Michael Carter"
    },
    {
      "action": "REJECTED",
      "dateTime": "2018-11-09T21:34:30.476Z",
      "pmids": [
        1231231
      ],
      "uid": "paa2013",
      "userVerbose": "Paul J Albert"
    },

    {
      "action": "ACCEPTED",
      "dateTime": "2018-11-06T22:27:58.008Z",
      "pmids": [
        23370376,
        19758226,
        19758159,
        21278764,
        23576059,
        19393200,
        19758177,
        19393196,
        24882717,
        24694772
      ],
      "uid": "reciter-inst-client",
      "userVerbose": "Institutional Client"
    }
  ],
  "knownpmids": [
    23370376,
    19758226,
    19758159,
    21278764,
    23576059,
    19393200,
    19758177,
    19393196,
    24882717,
    24694772
  ],
  "rejectedpmids": [
    1231231
  ],  
  "uid": "paa2013"
}
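
To illustrate how the audit log relates to the knownpmids and rejectedpmids lists, here is a minimal Python sketch that replays a log oldest entry first. It is not ReCiter's own code: the function name replay_audit_log is hypothetical, and it assumes that a later action on a PMID overrides an earlier one. Only the ACCEPTED and REJECTED actions shown in the sample are handled.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches "2018-11-09T19:34:30.476Z"


def replay_audit_log(audit_log):
    """Rebuild accepted and rejected PMID sets from audit log entries.

    Assumption: the most recent action on a PMID wins.
    """
    accepted, rejected = set(), set()
    # The sample log is not stored in time order, so sort it first.
    ordered = sorted(
        audit_log,
        key=lambda entry: datetime.strptime(entry["dateTime"], TIME_FORMAT),
    )
    for entry in ordered:
        pmids = set(entry["pmids"])
        if entry["action"] == "ACCEPTED":
            accepted |= pmids
            rejected -= pmids
        elif entry["action"] == "REJECTED":
            rejected |= pmids
            accepted -= pmids
        # Any other action (such as an undo) is ignored here, because
        # the sample data does not show what those entries look like.
    return accepted, rejected
```

Applied to the sample above, this yields the ten PMIDs in knownpmids as accepted and 1231231 as rejected.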

Tables a ReCiter administrator can update if need be

The following tables come pre-populated with data that are seeded by Weill Cornell but are relevant to all institutions. As we'll discuss, there may be occasions where you will want to modify them.

InstitutionAfid

For sites that have Scopus set up, the ReCiter application uses the InstitutionAfid table. These data are used for clustering and for scoring individual articles, both for targetAuthor and non-targetAuthors.

This table maps strings to Scopus Institution Identifiers. The string is the primary key, and there can be multiple IDs associated with a single string. These strings correspond to names used in various Weill Cornell identity systems. Weill Cornell has 276,666 institutional affiliations in our Identity table, representing 3,861 unique institutions. We've looked up the Scopus Institution IDs for the 1,786 institutions that are most often cited as a faculty member's current or historical affiliation. Collectively, these account for 273,006 affiliations. In other words, ~99% of the time we can predict what a Scopus Institution ID could be. Note that Scopus makes its fair share of splitting errors: a given institution such as Weill Cornell might have multiple institution IDs.

If you notice that a target person has an affiliation in the Identity table and in the article affiliation table, but they aren't matching because they are different strings, you can manually create a new record in DynamoDB tying the ID to the string.

Sample data:

{
  "afids": [
    "60006404",
    "60000305"
  ],
  "institution": "Case Western Reserve University School of Law"
}
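
As a rough illustration of how records shaped like this can be used for matching, the Python sketch below builds an in-memory index from institution strings to afids and checks it against the afids found on an article. The function names build_afid_index and matching_institutions are hypothetical and this is not the application's actual logic. Note that afids are strings in this table but numbers in the ScopusArticle sample, so the sketch normalizes both to strings.

```python
def build_afid_index(records):
    """Map each institution string to the set of afids recorded for it."""
    index = {}
    for record in records:
        afids = {str(afid) for afid in record["afids"]}
        index.setdefault(record["institution"], set()).update(afids)
    return index


def matching_institutions(index, identity_institutions, article_afids):
    """Return the identity institutions that share an afid with the article."""
    article = {str(afid) for afid in article_afids}
    return [
        name
        for name in identity_institutions
        if index.get(name, set()) & article
    ]
```

An institution string with no record in the table simply never matches, which is the situation where adding a record by hand helps.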

MeshTerm

These data are relatively static. The table contains a list of all MeSH terms and, for each term, the number of results returned when you search PubMed with that term as a MeSH major search (for example). These data are used at two different points during clustering.

Sample:

{
  "count": 24413,
  "mesh": "Attention Deficit Disorder with Hyperactivity"
}

ScienceMetrix

Science Metrix is a research outfit that offers a Creative Commons-licensed journal classification system, last updated in 2016, that maps some 15,000 journals to one of 176 topical "subfields." This is useful for our purposes for several reasons:

  1. Broad coverage: 15,000 exceeds the number of journals that, say, Journal Citation Reports offers by a factor of 3.
  2. The defined subfields correspond to well-known medical specialties, which makes them a good fit for PubMed. For example: Orthopedics, Otorhinolaryngology, Pathology, Pediatrics, Pharmacology & Pharmacy, and Psychiatry.
  3. Also, the journals are mapped to one and only one field, so we don't have to worry about splitting fields.

We have found that all but the most obscure journals in PubMed have a record in ScienceMetrix. Unless you have people publishing a great deal in venues not included by ScienceMetrix analysis, there is little reason to manually update these data.

Sample record:

{
  "issn": "0163-8343",
  "publicationName": "General Hospital Psychiatry",
  "scienceMatrixSubfieldId": "123",
  "scienceMetrixDomain": "Health Sciences",
  "scienceMetrixField": "Clinical Medicine",
  "scienceMetrixSubfield": "Psychiatry",
  "smsid": 106634
}

ScienceMetrixDepartmentCategory

As described in the relevant section in How ReCiter Works, this table contains a mapping between a ScienceMetrix journal subfield identifier and department string.

This table matches departments (which are stored as strings) to ScienceMetrix subfields (which are also stored as strings, although the application matches on SubfieldID). The logOddsRatio value indicates the strength of the relationship between a department, or other organizational unit, and a journal subfield.

An odds ratio is a statistical measure in which a value of 1 means that two events are unrelated, that is, they co-occur no more often than chance would predict. Higher values indicate increasingly significant and positively correlated relationships. The value stored in this table is the log of the odds ratio. The strongest relationship (the one between authors in the Library department and journals in Information & Library Sciences) had an odds ratio of > 1,100. As a first pass, we only included department-journal category relationships if such a relationship had an odds ratio (as opposed to a log of the odds ratio) of > 6.

If your institution has an org unit that is very similar to one listed here (e.g., suppose you have a department of Oral Surgery and Weill Cornell has Oral and Maxillofacial Surgery), we recommend duplicating that row and substituting the name of your department for the existing one. Try to make sure that all of your institution's org units that correspond to major specialties are represented in this table.

You can also compute these values yourself using these variables:

  • primaryDepartmentCount - number of articles written by members of a department
  • subfieldCount - number of articles written in a particular subfield as defined by ScienceMetrix
  • countFieldDepartment - number of articles that are both written by members of the department and published in the subfield (the intersection of the two sets counted above), where this number is 9 or greater
  • totalArticles - total count of articles in the corpus

Here is the formula:

  • logOddsRatio = log((countFieldDepartment * totalArticles) / (primaryDepartmentCount * subfieldCount))
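
The formula above can be sketched in Python as follows. The function name log_odds_ratio is hypothetical, and the use of the natural log is an assumption, since the base of the logarithm is not stated here. The sketch returns None when countFieldDepartment is below the minimum of 9 given in the variable list.

```python
import math

MIN_COUNT_FIELD_DEPARTMENT = 9  # minimum intersection size from the list above


def log_odds_ratio(count_field_department, total_articles,
                   primary_department_count, subfield_count):
    """Compute logOddsRatio for one department-subfield pair.

    Assumption: natural log. Returns None when the intersection is too
    small to be included.
    """
    if count_field_department < MIN_COUNT_FIELD_DEPARTMENT:
        return None
    ratio = (count_field_department * total_articles) / (
        primary_department_count * subfield_count
    )
    return math.log(ratio)
```

For example, with made-up counts of 20 shared articles, 100,000 total articles, 500 department articles, and 400 subfield articles, the ratio inside the log is 10.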

Going forward, we hope to use additional institutions' data to increase the number of organizational units and the corpus of publications used for this approach.

For more on how these data are used, see here

Sample record:

{
  "logOddsRatio": 3.52,
  "pk": 154,
  "primaryDepartment": "Sports Medicine",
  "scienceMetrixJournalSubfield": "Sport Sciences",
  "scienceMetrixJournalSubfieldId": 125
}

Tables the application independently updates and maintains

ESearchResult

The ESearchResult table contains a list of PMIDs that come as a result of different searches.

Sample data:

{
  "esearchpmids": [
    {
      "pmids": [
        24786648,
        22874398,
        23578816,
        15360861,
        20478738,
        17238381,
        18999229,
        22465355,
        30009991,
        26911828
      ],
      "retrievalDate": "2018-09-17T13:12:01.962Z",
      "retrievalStrategyName": "GoldStandardRetrievalStrategy"
    },
    {
      "pmids": [
        12769474,
        22456661,
        10266777,
        24373624,
        11605838
      ],
      "retrievalDate": "2018-09-17T13:12:28.447Z",
      "retrievalStrategyName": "FirstNameInitialRetrievalStrategy"
    }
  ],
  "retrievalDate": "2018-09-17T13:12:28.469Z",
  "uid": "ccole"
}
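
Because the same PMID can be returned by more than one search, it can be handy to see which strategies produced each one. The Python sketch below does this for a record shaped like the sample above. The function name pmids_by_strategy is hypothetical, and this is an illustration rather than the application's own code.

```python
def pmids_by_strategy(esearch_record):
    """Map each PMID to the list of retrieval strategies that returned it."""
    strategies = {}
    for result in esearch_record["esearchpmids"]:
        name = result["retrievalStrategyName"]
        for pmid in result["pmids"]:
            strategies.setdefault(pmid, []).append(name)
    return strategies
```

The keys of the returned dictionary are the de-duplicated set of candidate PMIDs for the person.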

PubMedArticle

This table stores data from PubMed-based XML as JSON. The only logic applied here is the design decision about which fields to include and which to leave out. Note that these data are stored without reference to the targetAuthor, so a given record can be re-used by multiple targetAuthors.

Sample record:

{
  "pmid": 449710,
  "pubmedarticle": {
    "medlinecitation": {
      "article": {
        "articletitle": "The response of bone apposition rate to some nonphysiologic conditions.",
        "authorlist": [
          {
            "forename": "C S",
            "initials": "CS",
            "lastname": "Tam"
          },
          {
            "forename": "B",
            "initials": "B",
            "lastname": "Cruickshank"
          },
          {
            "forename": "D R",
            "initials": "DR",
            "lastname": "Swinson"
          },
          {
            "forename": "W",
            "initials": "W",
            "lastname": "Anderson"
          },
          {
            "forename": "H A",
            "initials": "HA",
            "lastname": "Little"
          }
        ],
        "journal": {
          "isoAbbreviation": "Metab. Clin. Exp.",
          "issn": [
            {
              "issn": "0026-0495",
              "issntype": "Print"
            },
            {
              "issn": "0026-0495",
              "issntype": "Linking"
            }
          ],
          "journalissue": {
            "issue": "7",
            "pubdate": {
              "month": "07",
              "year": "1979"
            },
            "volume": "28"
          },
          "title": "Metabolism: clinical and experimental"
        },
        "pagination": {
          "medlinepgns": [
            "751-5"
          ]
        }
      },
      "medlinecitationpmid": {
        "pmid": 449710
      },
      "meshheadinglist": [
        {
          "descriptorname": {
            "descriptorname": "Animals",
            "majortopicyn": {
              "val": "N"
            }
          },
          "qualifiernamelist": []
        },
        {
          "descriptorname": {
            "descriptorname": "Bone Development",
            "majortopicyn": {
              "val": "N"
            }
          },
          "qualifiernamelist": [
            {
              "majortopicyn": {
                "val": "Y"
              },
              "qualifiername": "drug effects"
            }
          ]
        },
        {
          "descriptorname": {
            "descriptorname": "Bone and Bones",
            "majortopicyn": {
              "val": "N"
            }
          },
          "qualifiernamelist": [
            {
              "majortopicyn": {
                "val": "N"
              },
              "qualifiername": "drug effects"
            },
            {
              "majortopicyn": {
                "val": "Y"
              },
              "qualifiername": "physiology"
            }
          ]
        },
        {
          "descriptorname": {
            "descriptorname": "Hydrocortisone",
            "majortopicyn": {
              "val": "N"
            }
          },
          "qualifiernamelist": [
            {
              "majortopicyn": {
                "val": "Y"
              },
              "qualifiername": "pharmacology"
            }
          ]
        },
        {
          "descriptorname": {
            "descriptorname": "Male",
            "majortopicyn": {
              "val": "N"
            }
          },
          "qualifiernamelist": []
        },
        {
          "descriptorname": {
            "descriptorname": "Organ Specificity",
            "majortopicyn": {
              "val": "N"
            }
          },
          "qualifiernamelist": []
        },
        {
          "descriptorname": {
            "descriptorname": "Rabbits",
            "majortopicyn": {
              "val": "N"
            }
          },
          "qualifiernamelist": []
        }
      ]
    },
    "pubmeddata": {
      "history": {
        "pubmedPubDate": [
          {
            "pubMedPubDate": {
              "day": "1",
              "month": "7",
              "year": "1979"
            },
            "pubStatus": "pubmed"
          },
          {
            "pubMedPubDate": {
              "day": "1",
              "month": "7",
              "year": "1979"
            },
            "pubStatus": "medline"
          },
          {
            "pubMedPubDate": {
              "day": "1",
              "month": "7",
              "year": "1979"
            },
            "pubStatus": "entrez"
          }
        ]
      }
    }
  }
}

ScopusArticle

This table stores records from Scopus in which ReCiter was able to match a PubMed record to a Scopus record. Matching occurs by looking for PMID and if that's not available, then DOI. The primary key for this table is PMID.

These data are used for affiliation statements and for retrieving a more complete name.

The Scopus data has several bugs which ReCiter attempts to manage:

  • Some Scopus Institution IDs have an identifier of 0.
  • Sometimes the same author with the same sequence number is listed more than once.
  • Scopus also has duplicate records, where one publication from PubMed has two records in Scopus: one with the matching PMID and a second with a matching DOI. We've noticed this is true for around 2% of records in Scopus, but it should not come into play.
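
As an illustration of the first two issues, the Python sketch below cleans an authors list shaped like the one in the sample record that follows. The function names are hypothetical, and the choices made (keeping the first occurrence of a repeated author, dropping afids equal to 0) are assumptions rather than a description of what ReCiter does internally.

```python
def dedupe_authors(authors):
    """Keep the first entry for each (seq, authid) pair, preserving order."""
    seen = set()
    unique = []
    for author in authors:
        key = (author["seq"], author["authid"])
        if key not in seen:
            seen.add(key)
            unique.append(author)
    return unique


def usable_afids(author):
    """Return the author's afids without the placeholder identifier 0."""
    return [afid for afid in author.get("afids", []) if afid != 0]
```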

Sample record:

{
  "id": "10393308",
  "scopusarticle": {
    "affiliations": [
      {
        "affiliationCity": "Cambridge",
        "affiliationCountry": "United States",
        "affilname": "Massachusetts Institute of Technology",
        "afid": 60022195
      }
    ],
    "authors": [
      {
        "afids": [
          60022195
        ],
        "authid": 7203077959,
        "authname": "Schwartz T.",
        "givenName": "Thomas",
        "initials": "T.",
        "seq": 1,
        "surname": "Schwartz"
      },
      {
        "afids": [
          60022195
        ],
        "authid": 8769503600,
        "authname": "Shafer K.",
        "givenName": "Karen",
        "initials": "K.",
        "seq": 2,
        "surname": "Shafer"
      },
      {
        "afids": [
          60022195
        ],
        "authid": 7003903924,
        "authname": "Lowenhaupt K.",
        "givenName": "Ky",
        "initials": "K.",
        "seq": 3,
        "surname": "Lowenhaupt"
      },
      {
        "afids": [
          60022195
        ],
        "authid": 7004521236,
        "authname": "Hanlon E.",
        "givenName": "Eugene",
        "initials": "E.",
        "seq": 4,
        "surname": "Hanlon"
      },
      {
        "afids": [
          60022195
        ],
        "authid": 7103273576,
        "authname": "Herbert A.",
        "givenName": "Alan",
        "initials": "A.",
        "seq": 5,
        "surname": "Herbert"
      },
      {
        "afids": [
          60022195
        ],
        "authid": 57203048538,
        "authname": "Rich A.",
        "givenName": "Alexander",
        "initials": "A.",
        "seq": 6,
        "surname": "Rich"
      }
    ],
    "doi": "10.1107/S090744499900582X",
    "pubmedId": 10393308
  }
}

Analysis

The Analysis table has person-specific records containing:

  • Suggested articles including article metadata, the evidence, the scores for that evidence, and the overall score per article
  • Precision, recall, and overall accuracy
  • A flag - usingS3 - for whether the data for a given individual are stored in the Analysis table or S3. S3 is used in cases where the record in the Analysis table would exceed 400 KB.
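
The decision behind the usingS3 flag can be approximated with a size check like the Python sketch below. This is only a rough estimate under stated assumptions: the function name should_use_s3 is hypothetical, the size is measured on compact JSON rather than with DynamoDB's own item-size accounting, and 400 KB is taken to mean 400 × 1024 bytes.

```python
import json

ITEM_LIMIT_BYTES = 400 * 1024  # assumption: 400 KB read as 400 * 1024 bytes


def should_use_s3(record, limit=ITEM_LIMIT_BYTES):
    """Estimate whether a record is too large to store as a DynamoDB item.

    The estimate is the UTF-8 length of the compact JSON serialization,
    which is an approximation of the size DynamoDB would compute.
    """
    serialized = json.dumps(record, separators=(",", ":"))
    return len(serialized.encode("utf-8")) > limit
```

In practice it would be prudent to leave some headroom below the limit, since the estimate and the real item size can differ.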

Sample record:

{
  "personIdentifier": "paa2013",
  "dateAdded": "2018-11-16T21:47:29.483+0000",
  "dateUpdated": "2018-11-16T21:47:29.483+0000",
  "mode": "FOR_TESTING_ONLY",
  "overallAccuracy": 0.9811612903225806,
  "precision": 0.9603225806451613,
  "recall": 1,
  "countSuggestedArticles": 31,
  "reCiterArticleFeatures": [
    {
      "pmid": 19393200,
      "totalArticleScoreStandardized": 10,
      "totalArticleScoreNonStandardized": 28.65,
      "userAssertion": "ACCEPTED",
      "publicationDateDisplay": "2009 Feb 12",
      "publicationDateStandardized": "2009-02-12",
      "datePublicationAddedToEntrez": "2009-04-28",
      "journalTitleVerbose": "Autoimmunity reviews",
      "issn": [
        {
          "issntype": "Electronic",
          "issn": "1873-0183"
        },
        {
          "issntype": "Linking",
          "issn": "1568-9972"
        }
      ],
      "journalTitleISOabbreviation": "Autoimmun Rev",
      "articleTitle": "Vitamin D: the alternative hypothesis.",
      "reCiterArticleAuthorFeatures": [
        {
          "rank": 1,
          "lastName": "Albert",
          "firstName": "Paul J",
          "initials": "P",
          "targetAuthor": true
        },
        {
          "rank": 2,
          "lastName": "Proal",
          "firstName": "Amy D",
          "initials": "A",
          "targetAuthor": false
        },
        {
          "rank": 3,
          "lastName": "Marshall",
          "firstName": "Trevor G",
          "initials": "T",
          "targetAuthor": false
        }
      ],
      "volume": "8",
      "issue": "8",
      "pages": "639-44",
      "doi": "10.1016/j.autrev.2009.02.011",
      "evidence": {
        "authorNameEvidence": {
          "institutionalAuthorName": {
            "firstName": "Paul",
            "firstInitial": "P",
            "middleName": "J",
            "middleInitial": "J",
            "lastName": "Albert"
          },
          "articleAuthorName": {
            "firstName": "PaulJ",
            "firstInitial": "P",
            "lastName": "Albert"
          },
          "nameScoreTotal": 7.7,
          "nameMatchFirstType": "full-exact",
          "nameMatchFirstScore": 4.2,
          "nameMatchMiddleType": "exact-singleInitial",
          "nameMatchMiddleScore": 1.5,
          "nameMatchLastType": "full-exact",
          "nameMatchLastScore": 2,
          "nameMatchModifierScore": 0
        },
        "emailEvidence": {
          "emailMatch": "palbert1@gmail.com",
          "emailMatchScore": 40
        },
        "journalCategoryEvidence": {
          "journalSubfieldScienceMetrixLabel": "Immunology",
          "journalSubfieldScienceMetrixID": 111,
          "journalSubfieldDepartment": "NO_MATCH",
          "journalSubfieldScore": -1
        },
        "affiliationEvidence": {
          "scopusTargetAuthorAffiliation": [
            {
              "targetAuthorInstitutionalAffiliationSource": "SCOPUS",
              "targetAuthorInstitutionalAffiliationIdentity": "Weill Cornell Medical College",
              "targetAuthorInstitutionalAffiliationArticleScopusLabel": "Weill Cornell Medical College",
              "targetAuthorInstitutionalAffiliationArticleScopusAffiliationId": 60007997,
              "targetAuthorInstitutionalAffiliationMatchType": "POSITIVE_MATCH_INDIVIDUAL",
              "targetAuthorInstitutionalAffiliationMatchTypeScore": 3
            }
          ],
          "pubmedTargetAuthorAffiliation": {
            "targetAuthorInstitutionalAffiliationArticlePubmedLabel": "Weill Cornell Medical College, New York, NY 10065, USA. paa2013@med.cornell.edu",
            "targetAuthorInstitutionalAffiliationMatchTypeScore": 0
          }
        },
        "educationYearEvidence": {
          "identityBachelorYear": 1997,
          "articleYear": 2009,
          "discrepancyDegreeYearBachelor": 12,
          "discrepancyDegreeYearBachelorScore": 0,
          "discrepancyDegreeYearDoctoralScore": 0
        },
        "personTypeEvidence": {
          "personType": "academic-faculty-weillfulltime",
          "personTypeScore": 2
        },
        "articleCountEvidence": {
          "countArticlesRetrieved": 663,
          "articleCountScore": 0.3425
        },
        "averageClusteringEvidence": {
          "totalArticleScoreWithoutClustering": 52.04,
          "clusterScoreAverage": 16.6,
          "clusterReliabilityScore": 1,
          "clusterScoreModificationOfTotalScore": -23.39
        }
      }
    },
    {
      "pmid": 24694772,
      "totalArticleScoreStandardized": 10,
      "totalArticleScoreNonStandardized": 23.4,
      "userAssertion": "ACCEPTED",
      "publicationDateDisplay": "2014 Mar 30",
      "publicationDateStandardized": "2014-03-30",
      "datePublicationAddedToEntrez": "2014-04-04",
      "journalTitleVerbose": "Journal of biomedical informatics",
      "issn": [
        {
          "issntype": "Electronic",
          "issn": "1532-0480"
        },
        {
          "issntype": "Linking",
          "issn": "1532-0464"
        }
      ],
      "journalTitleISOabbreviation": "J Biomed Inform",
      "articleTitle": "Automatic generation of investigator bibliographies for institutional research networking systems.",
      "reCiterArticleAuthorFeatures": [
        {
          "rank": 1,
          "lastName": "Johnson",
          "firstName": "Stephen B",
          "initials": "S",
          "targetAuthor": false
        },
        {
          "rank": 2,
          "lastName": "Bales",
          "firstName": "Michael E",
          "initials": "M",
          "targetAuthor": false
        },
        {
          "rank": 3,
          "lastName": "Dine",
          "firstName": "Daniel",
          "initials": "D",
          "targetAuthor": false
        },
        {
          "rank": 4,
          "lastName": "Bakken",
          "firstName": "Suzanne",
          "initials": "S",
          "targetAuthor": false
        },
        {
          "rank": 5,
          "lastName": "Albert",
          "firstName": "Paul J",
          "initials": "P",
          "targetAuthor": true
        },
        {
          "rank": 6,
          "lastName": "Weng",
          "firstName": "Chunhua",
          "initials": "C",
          "targetAuthor": false
        }
      ],
      "volume": "51",
      "pages": "8-14",
      "doi": "10.1016/j.jbi.2014.03.013",
      "evidence": {
        "authorNameEvidence": {
          "institutionalAuthorName": {
            "firstName": "Paul",
            "firstInitial": "P",
            "middleName": "J",
            "middleInitial": "J",
            "lastName": "Albert"
          },
          "articleAuthorName": {
            "firstName": "PaulJ",
            "firstInitial": "P",
            "lastName": "Albert"
          },
          "nameScoreTotal": 7.7,
          "nameMatchFirstType": "full-exact",
          "nameMatchFirstScore": 4.2,
          "nameMatchMiddleType": "exact-singleInitial",
          "nameMatchMiddleScore": 1.5,
          "nameMatchLastType": "full-exact",
          "nameMatchLastScore": 2,
          "nameMatchModifierScore": 0
        },
        "journalCategoryEvidence": {
          "journalSubfieldScienceMetrixLabel": "Medical Informatics",
          "journalSubfieldScienceMetrixID": 36,
          "journalSubfieldDepartment": "Library",
          "journalSubfieldScore": 4.56
        },
        "affiliationEvidence": {
          "scopusTargetAuthorAffiliation": [
            {
              "targetAuthorInstitutionalAffiliationSource": "SCOPUS",
              "targetAuthorInstitutionalAffiliationIdentity": "Weill Cornell Medical College",
              "targetAuthorInstitutionalAffiliationArticleScopusLabel": "Weill Cornell Medical College",
              "targetAuthorInstitutionalAffiliationArticleScopusAffiliationId": 60007997,
              "targetAuthorInstitutionalAffiliationMatchType": "POSITIVE_MATCH_INDIVIDUAL",
              "targetAuthorInstitutionalAffiliationMatchTypeScore": 3
            }
          ],
          "pubmedTargetAuthorAffiliation": {
            "targetAuthorInstitutionalAffiliationArticlePubmedLabel": "Samuel J. Wood Library, Weill Cornell Medical College, New York, United States.",
            "targetAuthorInstitutionalAffiliationMatchTypeScore": 0
          },
          "scopusNonTargetAuthorAffiliation": {
            "nonTargetAuthorInstitutionalAffiliationSource": "SCOPUS",
            "nonTargetAuthorInstitutionalAffiliationMatchKnownInstitution": [
              "Weill Cornell Medical College, 60007997, 5",
              "Columbia University in the City of New York, 60030162, 5"
            ],
            "nonTargetAuthorInstitutionalAffiliationScore": 3
          }
        },
        "relationshipEvidence": [
          {
            "relationshipName": {
              "firstName": "Michael",
              "firstInitial": "M",
              "middleName": "Eliot",
              "middleInitial": "E",
              "lastName": "Bales"
            },
            "relationshipType": [
              "Organizational unit"
            ],
            "relationshipMatchType": "verbose",
            "relationshipMatchingScore": 2.2,
            "relationshipVerboseMatchModifierScore": 0.6,
            "relationshipMatchModifierMentor": 0,
            "relationshipMatchModifierMentorSeniorAuthor": 0,
            "relationshipMatchModifierManager": 0,
            "relationshipMatchModifierManagerSeniorAuthor": 0
          }
        ],
        "educationYearEvidence": {
          "identityBachelorYear": 1997,
          "articleYear": 2014,
          "discrepancyDegreeYearBachelor": 17,
          "discrepancyDegreeYearBachelorScore": 0,
          "discrepancyDegreeYearDoctoralScore": 0
        },
        "personTypeEvidence": {
          "personType": "academic-faculty-weillfulltime",
          "personTypeScore": 2
        },
        "articleCountEvidence": {
          "countArticlesRetrieved": 663,
          "articleCountScore": 0.3425
        },
        "averageClusteringEvidence": {
          "totalArticleScoreWithoutClustering": 23.4,
          "clusterScoreAverage": 23.4,
          "clusterReliabilityScore": 0,
          "clusterScoreModificationOfTotalScore": 0
        }
      }
    },
...
  },
  "uid": {
    "S": "paa2013"
  },
  "usingS3": {
    "N": "0"
  }

}