Experimental Mathematics Seminar

Data analysis in high-dimensional spaces

Adi Ben Israel, Rutgers University (RUTCOR)

Location:  Zoom
Date & time: Thursday, 06 May 2021 at 5:00PM - 6:00PM

 

1. The unreliability of the Euclidean distance in high dimensions, which makes proximity queries meaningless and unstable because the nearest and furthest neighbors are poorly discriminated [3]; see also [4].

2. The uniform probability distribution on the \(n\)-dimensional unit sphere \(S_n\), and some non-intuitive results for large \(n\). For example, if \(x\) is any point of \(S_n\), taken as the "north pole", then most of the area of \(S_n\) is concentrated near the "equator".

3. The advantage of the \(\ell_1\)-distance, which is less sensitive to high dimensionality and has been shown to "provide the best discrimination in high-dimensional data spaces" [1, p. 427].

4. Clustering high-dimensional data using the \(\ell_1\)-distance [2].
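
The first three topics above can be explored numerically. The following is a minimal sketch, not material from the talk: it assumes data drawn uniformly from the unit cube, uses the relative contrast \((D_{\max} - D_{\min})/D_{\min}\) as the measure of discrimination between the furthest and nearest neighbor, and samples the unit sphere in \(\mathbb{R}^n\) by normalizing Gaussian vectors. The function names `lp_distance`, `relative_contrast` and `equator_fraction` are hypothetical helpers introduced for illustration.

```python
import math
import random


def lp_distance(x, y, p):
    """l_p distance between two equal-length sequences (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def relative_contrast(dim, n_points, p, rng):
    """(D_max - D_min) / D_min for one random query against random data.

    Query and data are drawn uniformly from the unit cube [0, 1]^dim.
    This is an assumption of the sketch, not a choice made in the talk.
    """
    query = [rng.random() for _ in range(dim)]
    dists = [
        lp_distance(query, [rng.random() for _ in range(dim)], p)
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def equator_fraction(dim, n_samples, band, rng):
    """Fraction of uniform samples on the unit sphere in R^dim with |x_1| <= band.

    Uniform points are obtained by normalizing standard Gaussian vectors;
    the first coordinate axis plays the role of the "north pole".
    """
    hits = 0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if abs(g[0] / norm) <= band:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for dim in (2, 20, 200):
        c1 = relative_contrast(dim, 500, 1, rng)
        c2 = relative_contrast(dim, 500, 2, rng)
        print(f"dim={dim:4d}  contrast l1={c1:.3f}  l2={c2:.3f}")
    for dim in (3, 50, 500):
        frac = equator_fraction(dim, 1000, 0.1, rng)
        print(f"dim={dim:4d}  fraction with |x_1| <= 0.1: {frac:.3f}")
```

Under these assumptions the relative contrast should shrink as the dimension grows, and the fraction of sphere samples falling in a fixed band around the equator should approach 1. A single random query is noisy, so comparing the \(\ell_1\) and \(\ell_2\) columns is more informative when averaged over many queries.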

References

[1] C.C. Aggarwal et al., On the surprising behavior of distance metrics in high dimensional space, Lecture Notes in Computer Science, vol. 1973 (2001), Springer, https://doi.org/10.1007/3-540-44503-X_27

[2] T. Asamov and A. Ben-Israel, A probabilistic \(\ell_1\) method for clustering high-dimensional data, Probability in the Engineering and Informational Sciences, 2021, 1-16.

[3] K. Beyer et al., When is "nearest neighbor" meaningful?, Lecture Notes in Computer Science, vol. 1540 (1999), Springer, https://doi.org/10.1007/3-540-49257-7_15

[4] J.M. Hammersley, The distribution of distance in a hypersphere, The Annals of Mathematical Statistics 21 (1950), 447-452.