Seminars & Colloquia Calendar
Data analysis in high-dimensional spaces
Adi Ben Israel, Rutgers University (RUTCOR)
Date & time: Thursday, 06 May 2021 at 5:00PM - 6:00PM
1. The unreliability of the Euclidean distance in high dimensions, which makes proximity queries meaningless and unstable because the distances to the nearest and to the furthest neighbor become nearly indistinguishable [3]; see also [1].
2. The uniform probability distribution on the \(n\)-dimensional unit sphere \(S_n\), and some counterintuitive results for large \(n\). For example, if \(x\) is any point of \(S_n\), taken as the "north pole", then most of the area of \(S_n\) is concentrated near the "equator", the set of points of \(S_n\) orthogonal to \(x\).
3. The advantage of the \(\ell_1\) distance, which is less sensitive to high dimensionality and has been shown to "provide the best discrimination in high-dimensional data spaces" [1, p. 427].
4. Clustering high-dimensional data using the \(\ell_1\) distance [2].
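Items 1 and 3 can be illustrated with a small simulation. This is a minimal sketch, not taken from the talk: it measures the relative contrast \((D_{\max} - D_{\min})/D_{\min}\) between the furthest and the nearest of a set of random points to a random query, in the spirit of [1], using the \(\ell_p\) distance for \(p = 1\) and \(p = 2\). The data model (points drawn uniformly from the unit cube), the sample size, the seed and the helper names are assumptions chosen for illustration. One would expect the \(\ell_2\) contrast to shrink as the dimension grows and, following [1], the \(\ell_1\) contrast to stay somewhat larger than the \(\ell_2\) one.

```python
import random


def minkowski(x, y, p):
    """l_p distance between two equal-length sequences (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def relative_contrast(query, points, p):
    """(D_max - D_min) / D_min for the l_p distances from query to points.

    Assumes the query does not coincide with any point (otherwise D_min = 0).
    """
    dists = [minkowski(query, pt, p) for pt in points]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def contrast_experiment(dim, n_points=500, seed=0):
    """Relative contrast for p = 1 and p = 2 on uniform random data in [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    return {p: relative_contrast(query, points, p) for p in (1, 2)}


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        rc = contrast_experiment(dim)
        print(f"dim={dim:5d}  l1 contrast={rc[1]:.3f}  l2 contrast={rc[2]:.3f}")
```

Running the script prints one line per dimension; the exact values depend on the seed and on the assumed uniform data, so they indicate a trend rather than reproduce the figures of [1].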
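The concentration phenomenon in item 2 can be checked numerically. The sketch below assumes that \(S_n\) denotes the unit sphere in \(\mathbb{R}^n\) (conventions differ) and samples uniformly from it by normalizing a vector of independent standard Gaussians, a standard technique. By rotational symmetry the "north pole" \(x\) can be taken to be the first coordinate vector, so the band around the "equator" is the set of points with \(|x_1| < \varepsilon\); the band width \(\varepsilon = 0.1\), the sample count and the function names are illustrative choices.

```python
import math
import random


def random_unit_vector(n, rng):
    """Uniform sample from the unit sphere in R^n (normalized i.i.d. Gaussians)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def equator_fraction(n, eps=0.1, samples=1000, seed=0):
    """Fraction of uniform samples on the sphere with |x_1| < eps.

    The "north pole" is taken to be e_1 = (1, 0, ..., 0); by rotational
    symmetry any other choice of pole gives the same distribution.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if abs(random_unit_vector(n, rng)[0]) < eps)
    return hits / samples


if __name__ == "__main__":
    for n in (3, 10, 100, 1000):
        print(f"n={n:5d}  fraction with |x_1| < 0.1: {equator_fraction(n):.3f}")
```

For \(n = 3\) the expected fraction is \(\varepsilon\) itself, since \(x_1\) is then uniform on \([-1, 1]\) (Archimedes' hat-box theorem). For large \(n\) almost every sample falls in the band, because \(x_1\) has mean \(0\) and standard deviation \(1/\sqrt{n}\).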
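As a point of comparison for item 4, here is a minimal sketch of plain k-medians, which clusters with the \(\ell_1\) distance by alternating two steps: assign each point to its \(\ell_1\)-nearest center, then move each center to the coordinatewise median of its cluster (which minimizes the sum of \(\ell_1\) distances to the members). This is a swapped-in textbook technique, not the probabilistic \(\ell_1\) method of [2]; the function names, the random initialization and the iteration cap are assumptions made for illustration.

```python
import random
import statistics


def l1(x, y):
    """l_1 (Manhattan) distance between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def assign(points, centers):
    """Index of the l_1-nearest center for each point (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda j: l1(pt, centers[j])) for pt in points]


def k_medians(points, k, iters=50, seed=0):
    """Plain k-medians: alternate l_1 assignment and coordinatewise-median update."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]  # k distinct data points
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if not members:
                # Empty cluster: keep its previous center.
                new_centers.append(centers[j])
                continue
            # The sum of l_1 distances separates by coordinate, so the
            # coordinatewise median is the optimal center for this cluster.
            new_centers.append(
                [statistics.median(m[i] for m in members) for i in range(len(members[0]))]
            )
        if new_centers == centers:
            break  # converged: assignments will no longer change
        centers = new_centers
    return centers, assign(points, centers)


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    centers, labels = k_medians(pts, 2)
    print(centers, labels)
```

Like k-means, this alternation only reaches a local optimum that depends on the initial centers; in practice one would run it from several seeds and keep the result with the smallest total \(\ell_1\) cost.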
[1] C.C. Aggarwal et al., On the surprising behavior of distance metrics in high dimensional space, Lecture Notes in Computer Science, vol. 1973 (2001), Springer, https://doi.org/10.1007/3-540-44503-X_27
[2] T. Asamov and A. Ben-Israel, A probabilistic \(\ell_1\) method for clustering high-dimensional data, Probability in the Engineering and Informational Sciences (2021), 1-16.
[3] K. Beyer et al., When is "nearest neighbor" meaningful?, Lecture Notes in Computer Science, vol. 1540 (1999), Springer, https://doi.org/10.1007/3-540-49257-7_15
[4] J.M. Hammersley, The distribution of distance in a hypersphere, The Annals of Mathematical Statistics 21 (1950), 447-452.