How do I use your pretrained model for unlabeled face images? #9
Comments
@SharharZ Hi, you can (1) use pretrained face recognition models to extract face features, and (2) use the clustering methods provided in this repo to group the face features.
@yl-1993 Thanks for your reply! Should I use generate_proposal.py to extract features and dsgcn/main.py to cluster? Can you provide your pretrained model on Baidu Yun? How many images does the code support? I may have a million images.
@SharharZ Yes, you can follow the pipeline in scripts/pipeline.sh.
@SharharZ The pretrained model has already been shared through Baidu Yun. Check out the Setup and get data section for more details.
@yl-1993 Thank you! I'm sorry, maybe I didn't describe it clearly. I mean the pretrained model of hfsoftmax. I analyzed the code and downloaded your data. I am not sure how to generate the .bin file and .npz file for my own face image data. In other words, I extract face features with 512 dimensions; how do I convert them into your file format?
@SharharZ I think you can store your features with
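(The exact suggestion above was lost in extraction. As a minimal sketch, assuming the repo reads features as a flat float32 binary that can be loaded with np.fromfile, a 512-dim feature matrix could be stored like this; the file name and dimension are placeholders.)

```python
import numpy as np

# Placeholder: an (N, 512) array of face embeddings you extracted yourself.
feats = np.random.rand(1000, 512).astype(np.float32)

# Write a flat float32 binary; the expected format here is an assumption.
feats.tofile('my_feats.bin')

# Reading it back requires knowing the feature dimension.
loaded = np.fromfile('my_feats.bin', dtype=np.float32).reshape(-1, 512)
assert np.allclose(feats, loaded)
```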
@yl-1993 Thank you very much!
@yl-1993 There are several different pretrained models for extracting face features in the link you provided. Which feature-extraction model matches the pretrained clustering model?
@SharharZ Pretrained models for feature extraction have been uploaded to BaiduYun. You can find the link in the hfsoftmax wiki.
@jxyecn For the pretrained clustering model, we use ResNet-50 as the feature extractor.
@yl-1993 Thanks for the reply! My understanding, though, is that if a different model is used to extract the face features, the clustering model should need to be retrained. So I want to confirm which feature-extraction model matches the released pretrained clustering model.
@jxyecn Yes, that is why the reply above says that if you want to extract your own features and train your own clustering model, you can choose any feature-extraction model. Also, the parameters of this ResNet-50 model differ slightly from those used by the pretrained clustering model; if you find this has a noticeable impact, feel free to leave another comment in this issue.
The question is how to use your main.py file. I want to provide extracted face features (face embeddings), but your config file seems to require the training-related files. I suppose I should put the directory of the embeddings in the test path location (of the file "cfg_test_0.7_0.75.yaml"), but I can't figure out how this is going to work, since it also takes a training file path. Can you explain this part a bit?
@engmubarak48 Thanks for pointing this out. In the testing phase, the training file path is read but not used. I will refine this part to make it clearer. For now, I think you can set a dummy training path or simply set the training path to be the same as the testing path.
@yl-1993 Thanks for your quick reply. I would like to ask which part of your code extracts/generates the image features. I have read your generate_proposals.py file, and it seems to take .bin files. Do we have to extract the features on our own, or is there a file that extracts the features and saves them as a .bin file? Thanks.
@engmubarak48 Since this repo focuses on the clustering framework, the face recognition training and feature extraction are not included. You can check out hfsoftmax for the pretrained model and feature extraction. A similar discussion can be found in #4.
@yl-1993 Since the data is unlabeled, I can have only one file that consists of the extracted features (assuming I extracted my features and saved them as a .bin file). But in your test config file there is a path pointing to a .meta file (which, as I understand it, holds the labels). What type of labels are they, and why do we need them if we are clustering unlabeled images? Or is the .meta file used only for evaluation, and can it be removed if evaluation is not needed? Dear @yl-1993, what I intend to do is the following.
Also, I realized that your extract_feat.py in hfsoftmax reads images from a .bin file, so I think I should save my numpy image array as a .bin file too. Could you please clarify, step by step, what format my data should be in and what needs to be filled in the config file, for both extract.py and main.py? I would really appreciate it.
Dear @yl-1993 The main question I asked is what I should put in the .meta file if I don't have labels for the data. In your "cfg_test_0.7_0.75.yaml" config file there is a path pointing to the file "part1_test.meta". In general, I only want to cluster the images, put each cluster in a folder, and then check the clusters manually. Thanks
@engmubarak48 Sorry for not fully understanding your question. As a quick fix, you can simply use a dummy meta for testing, which will not influence the clustering result. The meta file is currently used for measuring the difference between the predicted score and the ground-truth score; it is only a reference value in the test phase. This is a good point. We will support an empty meta during inference soon.
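(As an illustration of the dummy-meta workaround: a minimal sketch, assuming the meta format is simply one integer label per line, one line per feature; compare with the provided part1_test.meta before relying on this.)

```python
# Write a dummy meta file: one placeholder label per feature.
# Assumption: the meta format is one integer label per line (check part1_test.meta).
num_feats = 1000  # number of feature rows in your .bin file

with open('my_dummy.meta', 'w') as f:
    for _ in range(num_feats):
        f.write('0\n')  # value is only used for reference metrics, not for clustering
```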
@engmubarak48 #17 removes unnecessary inputs during inference. For now, you only need to feed features and proposals into the trained network.
Can I use one of the models you provide to extract face features and then use the clustering model (pretrained_gcn_d.pth.tar) you provide to process my own images?
@felixfuu You can use
Thanks, @yl-1993, I already made it work back when I was checking the performance. Do you have any further plans to improve the performance? I am working in this area (face clustering), so let me know if you are planning further research here; we might exchange some ideas.
How do I make an annotation file (.meta) for new data?
@felixfuu For clustering, you only need to feed features and proposals into the trained network.
The results of my experiment are not very good. I used 940 faces (many sharing the same IDs) and the clustering produced about 900 labels, so almost every picture got its own label. @yl-1993
@yl-1993 I use resnet50-softmax as the feature extractor and follow the pipeline in scripts/pipeline.sh. Is there an error in this process?
@felixfuu The overall procedure is correct. I think there are two ways to check your results. (1) Check the extracted features. You can use the
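(The rest of this reply was lost in extraction. As one way to sanity-check the extracted features, here is a sketch that compares cosine similarities for a same-identity pair and a different-identity pair; the file name, dimension, and indices are placeholders and assume the flat float32 .bin layout described above.)

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

feats = np.fromfile('my_feats.bin', dtype=np.float32).reshape(-1, 512)

# Hypothetical indices: pick pairs whose identities you already know.
same_a, same_b = 0, 1      # two images of the same person
diff_a, diff_b = 0, 500    # two images of different people

print('same identity:', cosine(feats[same_a], feats[same_b]))  # expect a high value
print('diff identity:', cosine(feats[diff_a], feats[diff_b]))  # expect a low value
```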
@yl-1993 Following your suggestion, I visualized the cluster proposals, and the result of the clustering is not good, so the problem should be the features. In my experiment, k = 20 and max = 100.
@felixfuu
@yl-1993 Feature extraction should not be the problem; I also checked it with pairs (the cosine similarity is over 0.7 for a pair with the same identity and below 0.5 for different identities).
By the way, I used knn_hnsw. @yl-1993
I use
@felixfuu @MrHwc It seems both of you have encountered problems with proposal generation. The basic rule is to reduce
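(The rest of this advice was truncated. One quick diagnostic, as a rough sketch: look at the size distribution of the generated proposals. This assumes each proposal is stored as a text file listing one member index per line in a proposals/ folder, which is only a guess about the layout; adapt the glob and parsing to however your proposals are actually written out.)

```python
import glob

sizes = []
for path in glob.glob('proposals/*.txt'):
    with open(path) as f:
        sizes.append(sum(1 for line in f if line.strip()))

if sizes:
    print('num proposals:', len(sizes))
    print('min/avg/max  :', min(sizes), sum(sizes) / len(sizes), max(sizes))
```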
@yl-1993 I checked the proposals; it seems the features are not robust enough, as there is no obvious gap between the same identity and different identities.
My training set is about 100,000 images; each ID has at least 3 feature vectors and at most 381. K = {30, 60, 80}, th = {0.5, 0.55, 0.6, 0.65}. I use
@MrHwc There are several things that may help. (1) Have you checked the distribution of the generated clusters? Empirically, a large proportion of clusters may only have 2 images. (2) What are the results of single proposals, for example, K=80, th=0.6? If the clustering model is well trained, it will surpass the result of single proposals. (3) Proposals with a low threshold help recall, and proposals with a high threshold may improve precision. Given your results, you can try to include proposals with a higher threshold, e.g., th=0.7.
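(For point (1), a quick way to inspect the cluster-size distribution, assuming the predicted labels end up in a text file with one label per line such as pred_labels.txt; the file name is a placeholder.)

```python
from collections import Counter

with open('pred_labels.txt') as f:
    labels = [line.strip() for line in f if line.strip()]

# Map each cluster size to the number of clusters that have that size.
size_hist = Counter(Counter(labels).values())
for size, count in sorted(size_hist.items()):
    print(f'{count} clusters of size {size}')
```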
@yl-1993 Hello, I ran into some problems when using your code. I used the feature-extraction code you provided to extract features for 55 images, but after running the clustering code the resulting pred_labels.txt contains 584013 rows. My understanding is that each row corresponds to the label of one image, but this is far more than my number of images. Do I need to modify the program when using my own features? This number seems to correspond to the features you provide.
@luhengjie Hello, could you list exactly how you invoked the code? I suspect the default part1_test data is being used somewhere. To help others with the same question, I will also reply in English: when Hengjie uses the repo on his own features, the number of predicted results does not match the number of his features. I guess the problem may lie in the part1_test data being used somewhere. We can identify the problem once more details are posted.
@yl-1993 Thank you for your reply. The way I use your code is to replace part1_test in the features with my own features and delete all files in the label folder to avoid interference. The last step is sh scripts/pipeline.
@luhengjie Thanks. If you name it as
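(The rest of this reply was truncated. When this kind of count mismatch shows up, a quick sanity check is to compare the number of feature rows against the number of predicted labels; the sketch below assumes the flat float32 .bin layout and the one-label-per-line output file used earlier, with placeholder names and dimension.)

```python
import numpy as np

feat_dim = 512  # your feature dimension
num_feats = np.fromfile('my_feats.bin', dtype=np.float32).size // feat_dim

with open('pred_labels.txt') as f:
    num_preds = sum(1 for line in f if line.strip())

print('features   :', num_feats)
print('predictions:', num_preds)
# A mismatch usually means the pipeline is still reading the default part1_test data.
```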
Hi all, PR #28 simplifies the pipeline of training and testing. To apply the pretrained model to your own unlabeled features, you only need to:
Hi, I followed the code in your extract.py: it saves the extracted features as a .npy file rather than a binary .bin file. How can I save them as a .bin file instead?
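(A sketch of one way to do the conversion, assuming the downstream code expects the flat float32 binary layout used above; file names are placeholders. Note that .bin files do not record the array shape, so keep track of the feature dimension yourself.)

```python
import numpy as np

# Convert an .npy feature file into a flat float32 .bin file.
feats = np.load('my_feats.npy').astype(np.float32)
feats.tofile('my_feats.bin')
```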
Hi, I want to use this method to preprocess many unlabeled face images. How do I use your pretrained model to classify and label them? Thank you very much!