Tools, Datasets and Resources
EE 380L - A Practicum in Data Mining
Spring 2002

Resources

SVD Code and More (NEW!)
  • Software to compute the SVD of a large, sparse matrix. This code is essentially the same as Michael Berry's las2 subroutine from SVDPACK.
  • C++ classes implementing various data structures, such as hash tables, may be found at SGI's STL site.
  • A standard list of English stopwords is available from the SMART ftp site. Other stopword lists may vary slightly.
Sparse Matrices (NEW!)
  • Here is a simple way of representing sparse matrices, very similar to the Harwell-Boeing format. Please use this format for this course.
  • More general sparse matrix formats are given here.
  • Even more sparse matrix resources.
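
For orientation, here is a minimal sketch of compressed column storage, the layout that underlies the Harwell-Boeing format. It is not the course format itself (follow the linked description for that). The function names dense_to_ccs and ccs_matvec are hypothetical, and the sketch uses 0-based indices, whereas Harwell-Boeing files use 1-based indices.

```python
# Sketch only: compressed column storage (CCS) with 0-based indices.
# Function names are illustrative and not part of any course code.

def dense_to_ccs(dense):
    """Convert a dense matrix (list of rows) to (values, row_idx, col_ptr)."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    values, row_idx, col_ptr = [], [], [0]
    for j in range(n_cols):              # walk the matrix column by column
        for i in range(n_rows):
            v = dense[i][j]
            if v != 0:                   # store nonzero entries only
                values.append(v)
                row_idx.append(i)
        col_ptr.append(len(values))      # where the next column starts
    return values, row_idx, col_ptr


def ccs_matvec(n_rows, values, row_idx, col_ptr, x):
    """Compute y = A x for a matrix A held in CCS form."""
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * x[j]
    return y


A = [[1, 0, 2],
     [0, 3, 0],
     [4, 0, 5]]
values, row_idx, col_ptr = dense_to_ccs(A)
print(values)    # [1, 4, 3, 2, 5]
print(row_idx)   # [0, 2, 1, 0, 2]
print(col_ptr)   # [0, 2, 3, 5]
print(ccs_matvec(3, values, row_idx, col_ptr, [1, 1, 1]))  # [3.0, 3.0, 9.0]
```

Storing only the nonzeros plus one pointer per column keeps memory proportional to the number of nonzero entries, which is what makes large term-document matrices manageable.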
Information Retrieval (NEW!)
  • Useful URLs:
  • Reference Books
    • R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison-Wesley, Essex, England, 1999. (Book website can be found here. It does not have the book online, but it contains many useful resources and the errata. Includes a list of general resources.)
    • W.B. Frakes and R. Baeza-Yates (Eds.), Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ, 1992. This web site contains the source code used in the book.
    • C. J. van Rijsbergen, Information Retrieval. An online book (Preface).
Visualization
  • Introduction to Visualization: Vis '96 Tutorial
  • GTM (Generative Topographic Mapping)
  • Hierarchical Probabilistic PCA
  • CVIZ Manual - IBM SurfAid
Data Reduction
  • Principal Component Analysis (PCA)
  • Synopsis 1
  • Synopsis 2 
  • Notes on Spectral Decomposition
  • MATLAB procedure comparing SVD and PCA
  • Principal Curves (nonlinear)
  • Web-based statistical analyzer
Classification
  • ROC curve notes (with figures)
    roc.ps roc.ps.gz roc.zip
  • Support Vector Machines
Bayesian Networks
  • Tutorial on Learning Bayesian Networks
  • Bayesian Belief Networks
Clustering
  • METIS
Miscellaneous
  • EM Algorithm
  • White paper on OLAP
  • Data Warehouse basics and links
  • KDNuggets
Tools

    1. SAS and Enterprise Miner
    2. UT WNT page w/screenshots
    3. Notes on connecting to published Applications.
    4. Aladdin Expander (.gz tool)
    5. Ghostview (.ps tool)
    6. Acrobat Reader (.pdf tool)

Datasets

    Links (original) Local copy
    1. UCI KDD Archive
    2. UCI Machine Learning database
    3. Delve
    4. ELENA
    5. PRNN
    6. PROBEN1
    7. StatLib
    8. Statlog
    LANS benchmarks Datasets
    KDD Sisyphus I (local copy: yes)
    KDNuggets Datasets
    Financial Time Series Data
    The Data Mine
     


    Last updated 11/2001