1 Introduction
2 High-Dimensional Space
2.1 Properties of High-Dimensional Space
2.2 The High-Dimensional Sphere
2.2.1 The Sphere and the Cube in Higher Dimensions
2.2.2 Volume and Surface Area of the Unit Sphere
2.2.3 The Volume is Near the Equator
2.2.4 The Volume is in a Narrow Annulus
2.2.5 The Surface Area is Near the Equator
2.3 Volumes of Other Solids
2.4 Generating Points Uniformly at Random on the Surface of a Sphere
2.5 Gaussians in High Dimension
2.6 Bounds on Tail Probability
2.7 Random Projection and the Johnson-Lindenstrauss Theorem
2.8 Bibliographic Notes
2.9 Exercises
3 Random Graphs
3.1 The G(n, p) Model
3.1.1 Degree Distribution
3.1.2 Existence of Triangles in G(n, d/n)
3.2 Phase Transitions
3.3 The Giant Component
3.4 Branching Processes
3.5 Cycles and Full Connectivity
3.5.1 Emergence of Cycles
3.5.2 Full Connectivity
3.5.3 Threshold for O(ln n) Diameter
3.6 Phase Transitions for Monotone Properties
3.7 Phase Transitions for CNF-sat
3.8 Nonuniform and Growth Models of Random Graphs
3.8.1 Nonuniform Models
3.8.2 Giant Component in Random Graphs with Given Degree Distribution
3.9 Growth Models
3.9.1 Growth Model Without Preferential Attachment
3.9.2 A Growth Model with Preferential Attachment
3.10 Small World Graphs
3.11 Bibliographic Notes
3.12 Exercises
4 Singular Value Decomposition (SVD)
4.1 Singular Vectors
4.2 Singular Value Decomposition (SVD)
4.3 Best Rank k Approximations
4.4 Power Method for Computing the Singular Value Decomposition
4.5 Applications of Singular Value Decomposition
4.5.1 Principal Component Analysis
4.5.2 Clustering a Mixture of Spherical Gaussians
4.5.3 An Application of SVD to a Discrete Optimization Problem
4.5.4 Spectral Decomposition
4.5.5 Singular Vectors and Ranking Documents
4.6 Bibliographic Notes
4.7 Exercises
5 Random Walks and Markov Chains
5.1 Stationary Distribution
5.2 Electrical Networks and Random Walks
5.3 Random Walks on Undirected Graphs with Unit Edge Weights
5.4 Random Walks in Euclidean Space
5.5 The Web as a Markov Chain
5.6 Markov Chain Monte Carlo
5.6.1 Metropolis-Hastings Algorithm
5.6.2 Gibbs Sampling
5.7 Convergence of Random Walks on Undirected Graphs
5.7.1 Using Normalized Conductance to Prove Convergence
5.8 Bibliographic Notes
5.9 Exercises
6 Learning and VC-Dimension
6.1 Learning
6.2 Linear Separators, the Perceptron Algorithm, and Margins
6.3 Nonlinear Separators, Support Vector Machines, and Kernels
6.4 Strong and Weak Learning-Boosting
6.5 Number of Examples Needed for Prediction: VC-Dimension
6.6 Vapnik-Chervonenkis or VC-Dimension
6.6.1 Examples of Set Systems and Their VC-Dimension
6.6.2 The Shatter Function
6.6.3 Shatter Function for Set Systems of Bounded VC-Dimension
6.6.4 Intersection Systems
6.7 The VC Theorem
6.8 Bibliographic Notes
6.9 Exercises
7 Algorithms for Massive Data Problems
7.1 Frequency Moments of Data Streams
7.1.1 Number of Distinct Elements in a Data Stream
7.1.2 Counting the Number of Occurrences of a Given Element
7.1.3 Counting Frequent Elements
7.1.4 The Second Moment
7.2 Sketch of a Large Matrix
7.2.1 Matrix Multiplication Using Sampling
7.2.2 Approximating a Matrix with a Sample of Rows and Columns
7.3 Sketches of Documents
7.4 Exercises
8 Clustering
8.1 Some Clustering Examples
8.2 A Simple Greedy Algorithm for k-clustering
8.3 Lloyd's Algorithm for k-means Clustering
8.4 Meaningful Clustering via Singular Value Decomposition
8.5 Recursive Clustering Based on Sparse Cuts
8.6 Kernel Methods
8.7 Agglomerative Clustering
8.8 Communities, Dense Submatrices
8.9 Flow Methods
8.10 Linear Programming Formulation
8.11 Finding a Local Cluster Without Examining the Whole Graph
8.12 Axioms for Clustering
8.12.1 An Impossibility Result
8.12.2 A Satisfiable Set of Axioms
8.13 Exercises
9 Graphical Models and Belief Propagation
9.1 Bayesian or Belief Networks
9.2 Markov Random Fields
9.3 Factor Graphs
9.4 Tree Algorithms
9.5 Message Passing Algorithm
9.6 Graphs with a Single Cycle
9.7 Belief Update in Networks with a Single Loop
9.8 Maximum Weight Matching
9.9 Warning Propagation
9.10 Correlation Between Variables
9.11 Exercises
10 Other Topics
10.1 Rankings
10.2 Hare System for Voting
10.3 Compressed Sensing and Sparse Vectors
10.3.1 Unique Reconstruction of a Sparse Vector
10.3.2 The Exact Reconstruction Property
10.3.3 Restricted Isometry Property
10.4 Applications
10.4.1 Sparse Vector in Some Coordinate Basis
10.4.2 A Representation Cannot be Sparse in Both Time and Frequency Domains
10.4.3 Biological
10.4.4 Finding Overlapping Cliques or Communities
10.4.5 Low Rank Matrices
10.5 Exercises
11 Appendix
11.1 Asymptotic Notation
11.2 Useful Inequalities
11.3 Sums of Series
11.4 Probability
11.4.1 Sample Space, Events, Independence
11.4.2 Variance
11.4.3 Variance of Sum of Independent Random Variables
11.4.4 Covariance
11.4.5 The Central Limit Theorem
11.4.6 Median
11.4.7 Unbiased Estimators
11.4.8 Probability Distributions
11.4.9 Maximum Likelihood Estimation (MLE)
11.4.10 Tail Bounds
11.4.11 Chernoff Bounds: Bounding of Large Deviations
11.4.12 Hoeffding's Inequality
11.5 Generating Functions
11.5.1 Generating Functions for Sequences Defined by Recurrence Relationships
11.5.2 Exponential Generating Function
11.6 Eigenvalues and Eigenvectors
11.6.1 Eigenvalues and Eigenvectors
11.6.2 Symmetric Matrices
11.6.3 Extremal Properties of Eigenvalues
11.6.4 Eigenvalues of the Sum of Two Symmetric Matrices
11.6.5 Norms
11.6.6 Important Norms and Their Properties
11.6.7 Linear Algebra
11.6.8 Distance Between Subspaces
11.7 Miscellaneous
11.7.1 Variational Methods
11.7.2 Hash Functions
11.7.3 Catalan Numbers
11.7.4 Sperner's Lemma
11.8 Exercises
Index
References
Excerpt:
In the vector space representation of data, properties of vectors such as dot products, distance between vectors, and orthogonality often have natural interpretations, and this is what makes the vector representation more important than just a bookkeeping device. For example, the squared distance between two 0-1 vectors representing links on web pages is the number of web pages to which only one of the pages is linked. In Figure 2.3, pages 4 and 5 both have links to pages 1, 3, and 6, but only page 5 has a link to page 2. Thus, the squared distance between the two vectors is one. We have seen that dot products measure similarity. Orthogonality of two nonnegative vectors says that they are disjoint. Thus, if a document collection, e.g., all news articles of a particular year, contains documents on two or more disparate topics, politics, cars, entertainment, etc., then vectors corresponding to two documents from different topics will be nearly orthogonal.
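A minimal Python sketch of this calculation, with link vectors reconstructed from the description of Figure 2.3 above (entries not mentioned in the text are assumed to be zero):

    import numpy as np

    # Hypothetical 0-1 link vectors for pages 4 and 5, following the text:
    # both link to pages 1, 3, and 6; only page 5 links to page 2.
    # Coordinates for the remaining pages are assumed to be 0.
    page4 = np.array([1, 0, 1, 0, 0, 1])
    page5 = np.array([1, 1, 1, 0, 0, 1])

    # Squared Euclidean distance counts the coordinates where exactly one
    # vector has a 1, i.e., the pages to which only one of the two links.
    print(int(np.sum((page4 - page5) ** 2)))  # 1

    # The dot product counts common links (a similarity measure); orthogonal
    # nonnegative vectors share no links at all.
    print(int(page4 @ page5))  # 3: both link to pages 1, 3, and 6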
While distance, dot product, cosine of the angle, etc. are all measures of similarity or dissimilarity, there are important mathematical and algorithmic differences between them. The random projection theorem presented in this chapter states that a collection of vectors can be projected to a lower-dimensional space while approximately preserving all pairwise distances. Thus, the nearest neighbors of each vector in the collection can be computed in the projected lower-dimensional space, saving time. However, such a saving in time is not possible for computing pairwise dot products using a simple projection.
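A minimal Python sketch of such a random projection; the dimensions n, d, k, the Gaussian data, and the scaling by 1/sqrt(k) are illustrative choices, not taken from the text:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 50, 10_000, 400          # number of vectors, original and projected dimensions
    X = rng.standard_normal((n, d))    # a hypothetical collection of high-dimensional vectors

    # Project by multiplying with a d x k matrix of i.i.d. Gaussians, scaled by 1/sqrt(k),
    # so that squared distances are preserved in expectation.
    R = rng.standard_normal((d, k)) / np.sqrt(k)
    Y = X @ R

    # Compare a few pairwise distances before and after projection;
    # they should agree to within a small relative error.
    for i, j in [(0, 1), (2, 3), (4, 5)]:
        original = np.linalg.norm(X[i] - X[j])
        projected = np.linalg.norm(Y[i] - Y[j])
        print(f"pair ({i},{j}): original {original:.1f}, projected {projected:.1f}")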
Our aim in this book is to present the reader with the mathematical foundations needed to deal with high-dimensional data. There are two important parts of this foundation. The first is high-dimensional geometry, along with vectors, matrices, and linear algebra. The second, more modern, aspect is the combination of the above with probability.
High dimensionality is a common characteristic in many models, and for this reason we devote much of this chapter to the geometry of high-dimensional space, which is quite different from our intuitive understanding of two and three dimensions. We focus first on volumes and surface areas of high-dimensional objects like hyperspheres. We will not present in this book the details of any one application, but rather present the fundamental theory useful to many applications.