Computer Science Department

Lehrstuhl für Dienstleistungsinformatik / e-Services Research Group

Enron Spreadsheet Error Finder

A tool for locating errors in the spreadsheets of the Enron corpus

Screenshot of the Enron Spreadsheet Error Finder

The Enron Spreadsheet Error Finder is a tool for finding faulty spreadsheets in the publicly available collection of Enron emails. The tool constructs a conversation graph from the emails and lets the user quickly search these communications for keywords. Users can then navigate through the related emails and view the differences between spreadsheets attached within the same conversation or between spreadsheets with similar filenames.
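The page does not say how the conversation graph is constructed. Purely as an illustration, the following sketch assumes that conversations are reconstructed from the standard Message-ID and In-Reply-To email headers. It groups messages into conversations (the connected components of the reply graph) with a union-find structure. The class and method names (ConversationGrouper, addMessage, conversationOf, conversations) are hypothetical and do not come from the tool's source code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: groups e-mails into conversations by linking each
 * message to the message it replies to (Message-ID / In-Reply-To headers).
 * This is NOT the Enron Spreadsheet Error Finder's actual implementation.
 */
public class ConversationGrouper {

    // Union-find parent pointers, keyed by Message-ID.
    private final Map<String, String> parent = new HashMap<>();

    /** Registers a message and, if it is a reply, links it to its parent message. */
    public void addMessage(String messageId, String inReplyTo) {
        find(messageId); // makes sure the message exists as its own set
        if (inReplyTo != null && !inReplyTo.isEmpty()) {
            // A parent missing from the corpus still becomes a placeholder node.
            union(messageId, inReplyTo);
        }
    }

    /** Returns a representative ID that identifies the conversation of a message. */
    public String conversationOf(String messageId) {
        return find(messageId);
    }

    /** Returns all conversations as lists of Message-IDs. */
    public List<List<String>> conversations() {
        Map<String, List<String>> groups = new HashMap<>();
        for (String id : new ArrayList<>(parent.keySet())) {
            groups.computeIfAbsent(find(id), k -> new ArrayList<>()).add(id);
        }
        return new ArrayList<>(groups.values());
    }

    private String find(String id) {
        if (!parent.containsKey(id)) {
            parent.put(id, id);
            return id;
        }
        // Walk up to the root of the set.
        String root = id;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the path directly at the root.
        String current = id;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    private void union(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (!rootA.equals(rootB)) {
            parent.put(rootA, rootB);
        }
    }
}
```

Because a referenced but missing message still becomes a placeholder node, replies to the same lost message end up in one conversation. A more robust variant could also link each message to every ID in its References header, which lists the earlier messages of the thread.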

The software was developed by the e-Services Research Group as part of the joint DEOS project, which is funded by the Austrian Science Fund and the German Research Foundation. The project aims to develop new techniques for finding and fixing errors in spreadsheets.

The tool can be freely used and shared by everyone.

Software prerequisites

  1. A 64-bit Java 8 runtime environment (a 32-bit JVM cannot provide the 6 GB heap needed to build the database)
  2. A running PostgreSQL 9.4 instance with a database that is owned by the user account the tool connects with
  3. The Enron email files (.eml) as extracted by Hermans and Murphy-Hill

Installation

  1. Download and unzip the Enron Spreadsheet Error Finder tool.
  2. Edit the configuration file conf/database_config.xml so that it points to your database, and enter the database user name and password.
  3. Edit the configuration file conf/directory_config.xml and specify the following directories:
    • maildir: The path to the directory containing the extracted .eml mail files. All subdirectories of this path are included automatically.
    • tmpdir: The path to a temporary directory used while extracting the spreadsheets. Warning: this directory is deleted before the database is built!
    • corpusdir: The path to the directory in which the final spreadsheet corpus is stored. Warning: this directory is deleted before the database is built!
  4. Open a console and start the tool with java -Xmx6g -jar EnronSpreadsheetErrorFinder.jar. On the first start, the tool builds the database. This takes several hours and requires at least 6 GB of free system memory; the -Xmx6g option allows the Java heap to grow to 6 GB.
  5. Once the database has been built, the tool starts automatically. To launch the tool again later, run the same command; the database is not rebuilt.
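Step 3 states that all subdirectories of maildir are considered. A minimal sketch of such a recursive scan with the standard java.nio.file API (available since Java 8) is shown below, for example to check how many mail files the tool will find before starting the hours-long build. The class name EmlScanner and the case-insensitive .eml filter are illustrative assumptions; they are not taken from the tool, whose source code is still in preparation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: collects all .eml files below a mail directory. */
public class EmlScanner {

    /** Returns every regular file ending in ".eml" in maildir and all of its subdirectories. */
    public static List<Path> findEmlFiles(Path maildir) throws IOException {
        // Files.walk visits the directory tree recursively; the stream must be closed.
        try (Stream<Path> paths = Files.walk(maildir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".eml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java EmlScanner /path/to/maildir
        Path maildir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> files = findEmlFiles(maildir);
        System.out.println(files.size() + " .eml files found below " + maildir);
    }
}
```

Closing the stream returned by Files.walk matters here: it holds open directory handles, and the Enron corpus spans many directories.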

Downloads

Enron Spreadsheet Error Finder v0.174 (.zip)
A collection of errors identified with the tool
Source code (in preparation)

Contributors

Thomas Schmitz
Dietmar Jannach
Tom-Philipp Seifert
Stephan Abel

Feedback

If you have any questions or comments, please feel free to contact Thomas Schmitz.