"Address boundary error" with bwtool matrix -cluster #51

ghuls opened this issue Apr 19, 2016 · 2 comments

@ghuls
ghuls commented Apr 19, 2016

On some input files, bwtool matrix -cluster=10 crashes with the following error:

./bwtool matrix -keep-be…” terminated by signal SIGSEGV (Address boundary error)

I traced the error to this code in beato/cluster.c

static int *k_means(struct cluster_bed_matrix *cbm, double t)
{
    /* output cluster label for each data point */
    int *labels; /* Labels for each cluster (size n) */
    int h, i, j; /* loop counters, of course :) */
    double old_error;
    double error = DBL_MAX; /* sum of squared euclidean distance */
    double **tmp_centroids; /* centroids and temp centroids (size k x m) */   
    int n = cbm->n;
    int m = cbm->m;
    int k = cbm->k;
    AllocArray(labels, n);
    AllocArray(tmp_centroids, k);
printf("k_means: 0\n");
    for (i = 0; i < k; i++)
        AllocArray(tmp_centroids[i], m);
    /* assert(data && k > 0 && k <= n && m > 0 && t >= 0); /\* for debugging *\/ */
    /* initialization */
printf("k_means: 1\n");
    for (i = 0, h = cbm->num_na; i < k; h += (cbm->n-cbm->num_na) / k, i++)
    {
printf("k_means: 1:%d\n", i);
        /* pick k points as initial centroids */
        for (j = 0; j < m; j++) {
printf("k_means: 1:%d %d %d %d\n", i, j, m, h);
            cbm->centroids[i][j] = cbm->pbm->matrix[h][j];
        }
    }
...

For a working file:

do_kmeans_sort
do_kmeans_sort: 0
do_kmeans_sort float: 0.001000
k_means: 0
k_means: 1
k_means: 1:0
k_means: 1:0 0 10000 19982
k_means: 1:0 1 10000 19982
k_means: 1:0 2 10000 19982
k_means: 1:0 3 10000 19982
...
k_means: 1:9 9993 10000 19991
k_means: 1:9 9994 10000 19991
k_means: 1:9 9995 10000 19991
k_means: 1:9 9996 10000 19991
k_means: 1:9 9997 10000 19991
k_means: 1:9 9998 10000 19991
k_means: 1:9 9999 10000 19991
k_means: 2
k_means: 3
do_kmeans_sort: 1
do_kmeans_sort: 2
do_kmeans_sort: 3
do_kmeans_sort: 4
do_kmeans_sort: 5
output_matrix

For a non-working file:

do_kmeans_sort
do_kmeans_sort: 0
do_kmeans_sort float: 0.001000
k_means: 0
k_means: 1
k_means: 1:0
k_means: 1:0 0 10000 20000
Segmentation fault (core dumped)

I think h (= 20000) is calculated incorrectly: I have a window size of 10000 and 20000 regions in my BED file, so if the matrix has one row per region, index 20000 is one past the last row.
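Reading the failing log against the loop above: h is initialized to cbm->num_na, so a first printed h of 20000 means num_na is 20000. Assuming cbm->n equals the 20000 BED regions (the thread does not confirm this), the step (n - num_na) / k is 0 and no row without NA values is left to seed from. Below is a minimal sketch of a bounds check for that index computation. The helper name initial_centroid_row is hypothetical and not part of bwtool, and this is not the maintainer's fix.

```c
/* Hypothetical helper, not part of bwtool.
 * Returns the matrix row that the initialization loop in k_means would use
 * to seed centroid i, or -1 when the rows without NA values cannot supply
 * k distinct seeds. */
static int initial_centroid_row(int n, int num_na, int k, int i)
{
    int usable = n - num_na;   /* rows remaining after the num_na NA rows */

    if (k <= 0 || i < 0 || i >= k)
        return -1;             /* invalid cluster count or centroid index */
    if (num_na < 0 || usable < k)
        return -1;             /* step would be 0; with usable == 0 the row
                                  index would equal n, which is out of bounds */

    /* Same stepping as the loop: start at num_na, advance by usable / k. */
    return num_na + i * (usable / k);
}
```

With the numbers from the logs, initial_centroid_row(20000, 19982, 10, 9) gives 19991, matching the working run, while initial_centroid_row(20000, 20000, 10, 0) gives -1 instead of indexing past the matrix. Rejecting every case where fewer than k usable rows remain is a design choice of this sketch; it is stricter than what is needed to avoid the crash.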

@andypohl
Thanks for finding this. Man, I used the cluster feature a lot in my research and never came across this. I'll investigate. Thanks for the debugging information. Is it possible to put data on the web that can reproduce the error? I understand if that's not possible with real data. Maybe you have toy/fake data. I just haven't been in this code in a while and without data it'll be hard to dissect the problem. I imagine it's something where if all the numbers involved are perfect multiples of the size of an array, something gets calculated off-by-one. Well... that's what I imagine. Of course I won't rule out something more sinister. I've made a lot of mistakes here and there. But I thought I would've noticed if the algorithm was producing incorrect results.

@ghuls
ghuls commented Apr 19, 2016

bwtool matrix -keep-bed -cluster=10 5000:5000 bwtool_matrix_cluster.bed bwtool_matrix_cluster.bw bwtool_matrix_cluster.txt

http://aertslab.org/temp/bwtool/bwtool_matrix_cluster.bed
http://aertslab.org/temp/bwtool/bwtool_matrix_cluster.bw
