CIS 5930 (Fall 2008):
Advanced Topics in Data Management


Project description is available here!

Matlab script and tex samples are available here!

TPIE Installation

TPIE is a software environment that facilitates the implementation of I/O-efficient algorithms and data structures. It was initiated at Duke University and is still under active development, with additional features being incorporated.

TPIE consists of a small core library (libtpie.a) and a large collection of function templates. Please follow the instructions below to install TPIE on your Linux box:

  1. Download the latest snapshot from here.
  2. Extract the files: tar zxf tpie.tgz
    This will create a directory "tpie" with all TPIE files in it.
  3. Set the environment variable AMI_SINGLE_DEVICE to a directory on a local disk. TPIE uses this directory as scratch space for temporary files. Create such a directory on a local disk that has been allocated to you, for example /da/chinglau/tmp. Then set the environment variable (assuming you are using the C shell):
    setenv AMI_SINGLE_DEVICE /da/chinglau/tmp
    You may want to set this variable in your login scripts.
  4. cd tpie
  5. ./configure
  6. Build the core library: make lib
    Now the TPIE core library libtpie.a is placed under lib/
  7. Build the sample program:
    cd test
    make sample_pgm
    The executable of the sample program will be placed under ../bin/
  8. Test the sample program:
    cd ../bin
    ./sample_pgm -l 50M -m 32M
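
For sh or bash users, the steps above can be sketched as the two helper functions below. This is only a sketch under stated assumptions: the function names setup_scratch and build_tpie are hypothetical, tpie.tgz is assumed to be in the current directory, and export replaces the C shell's setenv.

```sh
#!/bin/sh
# Sketch only: setup_scratch and build_tpie are hypothetical helper names,
# not part of TPIE.

# Step 3: create the scratch directory and export AMI_SINGLE_DEVICE.
# $1: a directory on a local disk allocated to you.
setup_scratch() {
    mkdir -p "$1" || return 1
    if [ ! -w "$1" ]; then
        echo "scratch directory $1 is not writable" >&2
        return 1
    fi
    AMI_SINGLE_DEVICE=$1
    export AMI_SINGLE_DEVICE   # sh/bash equivalent of csh's setenv
}

# Steps 2 and 4-8: extract, configure, build, and run the sample program.
# Assumes tpie.tgz is in the current directory. The body runs in a
# subshell, so the caller's working directory is left unchanged, and the
# && chain stops at the first command that fails.
build_tpie() (
    tar zxf tpie.tgz &&
    cd tpie &&
    ./configure &&
    make lib &&
    cd test &&
    make sample_pgm &&
    cd ../bin &&
    ./sample_pgm -l 50M -m 32M
)
```

A typical call, using the example path from step 3, would be: setup_scratch /da/chinglau/tmp && build_tpie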

Use TPIE to Write Your Own Programs

The easiest way to write a TPIE program is to "steal" the skeleton of the sample program and its Makefile. You should also read the relevant sections from the TPIE manual. For the MapReduce assignment you only have to read the following sections in the manual: Ch 1-3, Ch 4.1, 4.2, 4.3, Ch 5.1, 5.2, 5.10

Bear in mind that TPIE is constantly changing and the manual is a bit outdated. For example, the manual still talks about .C and .H files, which have been replaced by .cpp and .h files. The sorting function now has an optional "indicator" parameter, which lets you view the progress of the sort. These small changes actually improve TPIE and make programming with it easier. Whenever in doubt, please read the comments in the .h files under include/.

Do not create TPIE temporary files in an NFS-mounted directory. Create them in a directory on the local hard disk instead, so that the files are not sent over the network.
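
One way to check whether a candidate scratch directory sits on NFS is sketched below. It assumes GNU coreutils stat, which is standard on Linux; the function names is_nfs_type and is_on_nfs are hypothetical and are not part of TPIE.

```sh
#!/bin/sh
# Sketch only: assumes GNU coreutils stat (standard on Linux).

# Succeeds if the filesystem type name in $1 denotes NFS.
is_nfs_type() {
    case "$1" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Returns 0 if directory $1 is on NFS, 1 if it is not,
# and 2 if the filesystem type could not be determined.
is_on_nfs() {
    fstype=$(stat -f -c %T "$1") || return 2
    is_nfs_type "$fstype"
}
```

For example, running is_on_nfs "$AMI_SINGLE_DEVICE" before an experiment and choosing a different directory when it returns 0 keeps the temporary files off the network.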

Facilities

When you program and debug, you can use any Linux machine and install TPIE on it. When you want to test your code on large data sets, you can use either your own machine (if you have Linux installed) or Linprog. Please coordinate with other teams so that no more than two experiments run at the same time on the same machine. The experiments are going to take some time, so don’t wait until the last minute!

Important: You should back up important files and code to a safer place (e.g., your CS home directory) on a regular basis.

Last updated 10/06/08