Physical meaning of eps parameter in DBScan #467
It's the minimal misorientation angle in radians between two points in a cluster. I.e., if two points are further away than …
So, I just had a look at the usage in the paper I recently received reviewer comments on. eps was set to 15 degrees (converted to radians). But the 3σ value for the deviation of orientations within each cluster varied from 1.4 degrees to 4.2 degrees. (I calculated the standard deviation of the angle in the axis-angle pair between each quaternion in the cluster and the average orientation of the cluster, which means that 99.7% of orientations in any cluster lie within somewhere between 1.4 and 4.2 degrees of the cluster average.) In other words, the actual clusters are far tighter than the eps threshold. Similar analysis could (and probably should) be done in other cases. I think eps has to be treated as a useful adjustable parameter, but it's better to then analyse the final results to see the real scatter in the data.

And looking again, the definition suggests that it is the maximal misorientation angle, not the minimal one.

Best wishes
Ian
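For anyone wanting to repeat this check, here is a rough sketch of that kind of calculation. It assumes the cluster's orientations are unit quaternions in an (N, 4) NumPy array ordered (w, x, y, z), and it ignores crystal symmetry entirely (a real EBSD analysis should reduce misorientations by the crystal symmetry first). The mean orientation is taken as the principal eigenvector of the summed quaternion outer products, a standard quaternion-averaging method that is not necessarily what was used in the paper. The function names are hypothetical.

```python
import numpy as np


def mean_quaternion(q: np.ndarray) -> np.ndarray:
    """Average unit quaternions as the principal eigenvector of sum(q q^T).

    This is insensitive to the q / -q sign ambiguity but ignores crystal symmetry.
    """
    m = q.T @ q                            # 4x4 sum of outer products
    eigvals, eigvecs = np.linalg.eigh(m)   # eigenvalues in ascending order
    return eigvecs[:, -1]                  # eigenvector of the largest eigenvalue


def angles_to_mean(q: np.ndarray) -> np.ndarray:
    """Rotation angle (rad) between each orientation and the cluster mean."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    mean = mean_quaternion(q)
    # |q . mean| = cos(angle / 2); abs() treats q and -q as the same rotation
    dots = np.clip(np.abs(q @ mean), 0.0, 1.0)
    return 2.0 * np.arccos(dots)


def three_sigma_deg(q: np.ndarray) -> float:
    """3 x standard deviation of the angles to the mean, in degrees."""
    return float(np.degrees(3.0 * np.std(angles_to_mean(q))))
```

Comparing `np.degrees(angles_to_mean(q)).max()` and `three_sigma_deg(q)` with eps converted to degrees gives a quick feel for how much tighter a cluster is than the threshold.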
You are right, it's the maximum distance (sklearn docs). Sorry for being too quick there.
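A small illustration of why the clusters can look so different from eps. scikit-learn's `DBSCAN` treats `eps` as the neighbourhood radius around each sample, and its documentation notes that this is not a maximum bound on the distances between points within a cluster: clusters grow by chaining neighbourhoods together. The toy below uses 1D angles rather than real misorientations, an assumption made only to keep it short; the same chaining happens with any distance.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 1D "angles": 51 values from 0 to 0.5 rad, spaced 0.01 rad apart.
angles = (np.arange(51) * 0.01).reshape(-1, 1)

# Every point has neighbours within eps = 0.05 rad, so all points are core
# points and the neighbourhoods chain together into one cluster.
labels = DBSCAN(eps=0.05, min_samples=3).fit_predict(angles)

print(np.unique(labels))                # a single cluster label
print(angles.max() - angles.min())      # cluster extent: 10x larger than eps
```

The reverse also holds: with dense data, as in Ian's case, the spread inside a cluster can end up much smaller than eps.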
I've found this Wikipedia section to be quite useful: https://en.wikipedia.org/wiki/DBSCAN#Abstract_algorithm

Just a note that there are two parameters which are important for DBSCAN: eps and min_samples. The … For eps, there are a couple of different estimation methods described here which might be of interest: https://en.wikipedia.org/wiki/DBSCAN#Parameter_estimation. Additionally, methods like OPTICS might be of use; they tend to work better if you have clusters of varying densities.
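The k-distance graph from the Wikipedia parameter-estimation section can be sketched with scikit-learn's `NearestNeighbors`: for each point, take the distance to its k-th nearest neighbour with k = min_samples - 1, sort those distances from largest to smallest and look for an "elbow". The helper name is hypothetical, and the default Euclidean metric is only a placeholder; for orientation data you would want a misorientation distance instead.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def k_distance_curve(X: np.ndarray, min_samples: int) -> np.ndarray:
    """Distance from each point to its (min_samples - 1)-th nearest neighbour,
    sorted from largest to smallest, for picking eps at the 'elbow'."""
    # When querying the fitted data itself, each point is returned as its own
    # nearest neighbour at distance 0, so asking for min_samples neighbours
    # makes the last column the (min_samples - 1)-th *other* point.
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    distances, _ = nn.kneighbors(X)
    return np.sort(distances[:, -1])[::-1]
```

Plotting the returned array (e.g. with matplotlib) and reading off the distance at the elbow gives a candidate eps to start from.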
Hi Ian, sorry I'm a bit late to this one. When we wrote Density Based Clustering ... (Johnstone et al.), we thought about this a bit and in the end decided to start with a number that accurately answers the question: "Given our (perceived) measurement errors, what's the furthest away two data points (i.e. the smallest rotation that maps one onto the other) […]". For our dataset this was 0.05 rad (i.e. ~3 degrees), given the high-quality EBSD map we had at hand. Then run the algorithms, inspect the real-space and orientation-space maps, and see where that leaves you.

15 degrees does seem a bit too generous though, unless you're working with really, really noisy data. You may well end up merging two physically distinct clusters if you're unlucky.
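A minimal sketch of that workflow, assuming unit quaternions in an (N, 4) array and again ignoring crystal symmetry: build the pairwise misorientation-angle matrix, then pass it to scikit-learn's `DBSCAN` with `metric="precomputed"` and eps in radians. The default `min_samples=40` is borrowed from the value mentioned in this thread and is only a placeholder; the function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def pairwise_misorientation(q: np.ndarray) -> np.ndarray:
    """Pairwise rotation angle (rad) between unit quaternions, ignoring crystal symmetry."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    # |q_i . q_j| = cos(angle / 2); abs() treats q and -q as the same rotation
    dots = np.clip(np.abs(q @ q.T), 0.0, 1.0)
    d = 2.0 * np.arccos(dots)
    np.fill_diagonal(d, 0.0)  # remove round-off on the diagonal
    return d


def cluster_orientations(q: np.ndarray, eps_rad: float = 0.05, min_samples: int = 40) -> np.ndarray:
    """Label each orientation with a DBSCAN cluster id (-1 = noise)."""
    d = pairwise_misorientation(q)
    model = DBSCAN(eps=eps_rad, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(d)


print(np.rad2deg(0.05))  # about 2.86 degrees
print(np.deg2rad(3.0))   # about 0.052 rad
```

A full N x N distance matrix grows quickly with the number of map points, so this direct approach only suits modest datasets or subsamples.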
@pc494 I took a break for the holidays there. I initially took the naive view that a small misorientation threshold should be sensible, but on some datasets I merely ended up with lots of clusters that were all versions of the same thing, with just minor misorientations between them. Even with the 15 degree criterion, this dataset splits some laths of the same orientation into two clusters, probably due to some sample bending. It is trivial to see that they belong to the same cluster in a pole figure, however. A minor issue which may influence this, though it is unproven at this point, is that any orientation has two possible habit planes, and in real samples there could be a slight misorientation between two laths with the two different habit planes.

On this specific dataset I played with eps and found that reducing it to 3 degrees (in radians) made no difference to the results: exactly the same clusters were found. I played with min_samples and found that increasing it from 40 to 100 just deleted the smallest cluster and that lath, with no other change, so that was unhelpful. Decreasing it from 40 to 10 found one extra cluster, which was a variant indexing of an existing cluster with a slight tilt, and just filled in a few points around the edges of one of the laths, so there was no significant benefit to the overall interpretation.

With the benefit of this, I will revise the paper text slightly and update the eps parameter, seeing as the smaller one works here.

Ian
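The kind of sensitivity check described above can be automated with a simple grid over eps and min_samples, counting clusters and noise points for each combination. This sketch assumes you already have a precomputed misorientation-angle matrix `d` in radians, however it was computed; `sweep_dbscan` is a hypothetical helper name, not part of any library.

```python
import itertools

import numpy as np
from sklearn.cluster import DBSCAN


def sweep_dbscan(d: np.ndarray, eps_values, min_samples_values):
    """Run DBSCAN on a precomputed distance matrix for every (eps, min_samples)
    pair and return (eps, min_samples, n_clusters, n_noise) tuples."""
    results = []
    for eps, min_samples in itertools.product(eps_values, min_samples_values):
        model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
        labels = model.fit_predict(d)
        n_clusters = len(set(labels.tolist()) - {-1})  # -1 marks noise
        n_noise = int(np.sum(labels == -1))
        results.append((eps, min_samples, n_clusters, n_noise))
    return results
```

For the values discussed here, something like `sweep_dbscan(d, np.deg2rad([3, 15]), [10, 40, 100])` gives a compact table; the counts only flag where results change, so the maps still need inspecting by eye.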
Does anyone have any insight into the physical meaning of the eps parameter in the DBSCAN algorithm and how it relates to the angular spread in a cluster?