Nation, Genre & Gender: Data

As part of the Nation, Genre & Gender project, literary experts at the UCD School of English, Drama and Film have manually annotated a corpus of 19th- and 20th-century Irish and British novels. Sample data for our three core case study novels is provided below.

Data Format

Each annotated novel contains the following data files and directories:

  • fulltext.txt: a single file containing a version of the novel text with manual annotations to aid character identification.
  • dictionary.txt: a file containing a list of all unique characters in the novel, along with their aliases.
  • stopwords.txt: a file containing a list of words and phrases that should not be identified as characters.
  • attributes.txt: a file containing a list of attributes for the characters in the novel.
  • notes.txt: notes regarding the edition and annotation process for the specific novel.
  • networks: directory containing the individual chapter and overall character networks for the novel, in GEXF format.
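
The layout above can be checked programmatically once an archive has been extracted. The following is a minimal Python sketch, not part of the project's own tooling. It assumes that the files sit directly inside one directory per novel and that the network files carry a .gexf extension; neither detail is stated on this page. The function names (local_name, count_gexf, summarise_novel) are hypothetical. Because GEXF is an XML format, the sketch counts node and edge elements with the standard library only, and makes no assumption about the internal format of the .txt files.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# File names are taken from the list above.
# Assumption: they sit directly inside the extracted novel directory.
EXPECTED_FILES = [
    "fulltext.txt",
    "dictionary.txt",
    "stopwords.txt",
    "attributes.txt",
    "notes.txt",
]
NETWORKS_DIR = "networks"


def local_name(tag):
    """Strip an XML namespace, e.g. '{http://...}node' becomes 'node'."""
    return tag.rsplit("}", 1)[-1]


def count_gexf(path):
    """Return (node_count, edge_count) for one GEXF file."""
    nodes = edges = 0
    for element in ET.parse(str(path)).iter():
        name = local_name(element.tag)
        if name == "node":
            nodes += 1
        elif name == "edge":
            edges += 1
    return nodes, edges


def summarise_novel(novel_dir):
    """Report missing files and the size of each network for one novel."""
    root = Path(novel_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    networks = {}
    networks_path = root / NETWORKS_DIR
    if networks_path.is_dir():
        # Assumption: network files use the .gexf extension.
        for gexf in sorted(networks_path.glob("*.gexf")):
            networks[gexf.name] = count_gexf(gexf)
    else:
        missing.append(NETWORKS_DIR)
    return {"missing": missing, "networks": networks}
```

Calling summarise_novel on an extracted directory (the path is whatever location you unpacked the archive to) returns a dictionary with a "missing" list and a "networks" mapping from file name to a (nodes, edges) pair. Stripping the namespace before comparing tag names keeps the count independent of which GEXF version the files declare.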

Downloads

Data for each of the three case study novels is provided in a separate ZIP archive:

This data is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available here.