How to work with result (.pk) file #4

asptutorial · 2016-05-28T15:30:40Z

Hi,

First of all, thank you for the great work and the nice implementation!

The tool works fine for me, and I will use it for document comparison in the social media context. Can you please give me some advice on how to work with the resulting "...wmd_d.pk" file? At first I thought the result would be a text file containing a readable matrix, but now I think I need some additional software?

Thank you very much!

budhiraja · 2016-09-15T11:06:34Z

I am facing the same issue. How do we use the "...wmd_d.pk" file?
Thanks for the great work though! (Y)

asptutorial · 2016-09-27T15:33:41Z

Hi,

I have already found a solution for this. The Python code below transforms the WMD result file into a CSV-structured file. Pass the name of the result file as the first command-line argument and the name of the new CSV file as the second. Alternatively, you can edit the code: the for loops iterate through the result matrix.

Hope this helps

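import pickle
import sys

load_file = sys.argv[1]  # WMD result file, e.g. twitter_wmd_d.pk
fileName = sys.argv[2]   # CSV file to write

def main():
    # pickle files must be opened in binary mode ("rb")
    with open(load_file, "rb") as f:
        WMD_D = pickle.load(f)
    # "a" appends, so delete an existing output file before re-running
    with open(fileName, "a") as myfile:
        for valx in WMD_D:          # one row of the matrix
            for valy in valx:       # one value in that row
                myfile.write('%f;' % (valy))
            myfile.write('\n')      # new line after each row

if __name__ == "__main__":
    main()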
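For reference, here is a hedged Python 3 sketch of the same conversion that swaps the hand-written loops for numpy's np.savetxt. It assumes the .pk file holds a single numeric matrix (or a nested list that numpy can turn into one); the function name convert and the script name export_np.py are hypothetical. The encoding="latin1" argument is an assumption covering the case where the .pk file was written by Python 2, which is the usual way to read Python 2 numpy pickles under Python 3.

```python
import pickle
import sys

import numpy as np


def convert(pk_path, csv_path):
    """Load a pickled WMD distance matrix and write it as ';'-separated text.

    Returns the shape of the matrix that was written.
    """
    # Pickle files must be opened in binary mode under Python 3.
    with open(pk_path, "rb") as f:
        # Assumption: the .pk file may have been written by Python 2.
        # encoding="latin1" lets Python 3 read numpy arrays pickled by
        # Python 2 and is harmless for pickles written by Python 3.
        wmd_d = pickle.load(f, encoding="latin1")
    wmd_d = np.asarray(wmd_d, dtype=float)
    # Same '%f' formatting as the loop version, but no trailing ';' per line,
    # and the output file is overwritten instead of appended to.
    np.savetxt(csv_path, wmd_d, fmt="%f", delimiter=";")
    return wmd_d.shape


# Run as a script only when both paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    print(convert(sys.argv[1], sys.argv[2]))
```

Usage would be, for example, python export_np.py twitter_wmd_d.pk matrix.csv. Because the file is opened in write mode, re-running the script replaces matrix.csv rather than appending a second copy of the matrix.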

renaud · 2016-09-27T16:31:13Z

Hi @asptutorial2016, could you create a pull request, please?
thanks, Renaud

budhiraja · 2016-10-11T15:38:29Z

Hi @asptutorial2016, can you please explain the code?
Thanks,
-A

loretoparisi · 2017-07-07T15:20:37Z

@budhiraja it just loads the pickle file (WMD_D = pickle.load(f)) and then runs two nested loops over the matrix: one over the rows (for valx in WMD_D:) and one over the values in each row (for valy in valx:). For every value it writes one column followed by a ';' (myfile.write('%f;' % (valy))), and after each row it starts a new line (myfile.write('\n')).
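To make this concrete, here is a small sketch of the same two loops run on a tiny hand-written matrix (made-up values, not real WMD output), writing to an in-memory buffer instead of a file. The function name rows_to_text is hypothetical. It also shows why every line of the resulting file ends with a ';'.

```python
import io


def rows_to_text(matrix):
    """Mimic the export script: one line per row, every value followed by ';'."""
    out = io.StringIO()
    for row in matrix:            # outer loop: one iteration per row
        for value in row:         # inner loop: one iteration per column
            out.write("%f;" % value)
        out.write("\n")           # end of the row -> start a new line
    return out.getvalue()


print(repr(rows_to_text([[0.0, 2.5], [2.5, 0.0]])))
# -> '0.000000;2.500000;\n2.500000;0.000000;\n'
```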

loretoparisi · 2017-07-07T16:13:46Z

BTW I just fixed the indentation here:

import pickle
import sys

load_file = sys.argv[1]
fileName = sys.argv[2]

def main():
    # binary mode ("rb") is required to unpickle under Python 3
    with open(load_file, "rb") as f:
        WMD_D = pickle.load(f)
        with open(fileName, "a") as myfile:
            for valx in WMD_D:
                for valy in valx:
                    myfile.write('%f;' % (valy))
                myfile.write('\n')

if __name__ == '__main__':
    main()

and you run it like this:

$ python export.py twitter_wmd_d.pk matrix.csv

That said, I'm still not sure about this output, since I would expect one WMD value for each text in the dataset. The output looks like this:

# head -n1 matrix.csv 
0.000000;2.662555;3.469390;2.638741;3.003526;3.436190;3.354545;3.355945;2.980919;3.000695;3.253323;3.182629;3.106654;3.078974;3.315462;2.870222;3.350217;2.973588;3.092783;3.114992;2.838299;2.645446;3.046859;2.940506;nan;3.286307;2.886182;3.659952;3.191685;3.501301;1.776840;2.541942;3.142671;2.721650;3.302513;2.941633;2.707120;2.351888;3.005273;3.305617;3.528918;3.125243;2.898311;3.343573;2.176667;2.928992;2.981162;2.931301;3.664851;3.310865;3.015806;2.000753;2.553579;2.239903;3.353682;2.766366;3.157034;3.025401;3.315156;3.101092;2.484381;3.020051;2.835375;3.714346;3.125150;3.092783;3.034500;3.379948;3.844170;2.669132;3.101865;2.950110;3.410166;3.399071;3.399071;3.636564;2.683887;3.484905;nan;2.738641;3.041693;3.171469;3.351616;3.001618;3.066837;2.979639;2.256675;3.407525;2.838299;2.700281;3.558868;nan;3.092783;3.040104;3.095895;4.065932;3.421955;3.160233;2.346177;3.564308;2.729474;3.488503;3.450735;3.145845;3.028299;3.588741;2.985653;2.870175;3.045858;3.576804;3.193545;3.111908;3.067753;2.689224;3.114125;2.963947;2.950037;2.644434;2.369771;2.820261;3.253356;3.777590;4.032024;2.934441;3.088626;3.564733;3.137017;3.091637;3.235543;3.129742;3.237264;3.063123;2.983043;3.411296;2.980262;3.315646;3.138268;2.803821;2.895465;3.557537;2.828279;2.708696;2.949570;1.752965;2.964379;2.700281;3.075949;3.429963;3.241878;2.895290;2.794623;3.223408;3.585769;3.187804;3.548501;nan;2.967911;3.185351;3.945765;3.785060;2.728175;3.213646;3.448902;2.976206;2.938305;3.129154;2.714573;3.544060;2.764415;3.008341;2.620735;3.364275;3.252821;3.129372;3.031745;3.259631;2.889915;3.220967;3.496291;3.758972;3.542420;3.126512;3.398939;2.869977;3.472169;
...
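One possible reading of this output, offered as an assumption rather than something confirmed in this thread: the leading 0.000000 suggests the matrix is pairwise, with row i holding the WMD from document i to every document j, so the diagonal is each document's distance to itself. Under that assumption, the sketch below reads matrix.csv back (dropping the empty field created by the trailing ';') and, for every document, picks the closest other document while skipping the diagonal and the nan entries; the cause of those nan values is not established here. The function names load_semicolon_matrix and nearest_documents are hypothetical.

```python
import numpy as np


def load_semicolon_matrix(path, trailing_delimiter=True):
    """Read a ';'-separated matrix such as the matrix.csv written by the export script.

    That script ends every line with ';', so genfromtxt sees one extra, empty
    field per row and fills it with nan; that column is dropped here.
    """
    m = np.atleast_2d(np.genfromtxt(path, delimiter=";"))
    return m[:, :-1] if trailing_delimiter else m


def nearest_documents(dist):
    """For each row i, return the column j != i with the smallest distance.

    nan entries are ignored; rows with no usable distance get -1.
    """
    d = np.array(dist, dtype=float)   # copy, so the caller's matrix is untouched
    np.fill_diagonal(d, np.inf)       # skip each document's distance to itself
    d[np.isnan(d)] = np.inf           # treat nan as "no distance available"
    best = np.argmin(d, axis=1)
    # argmin returns 0 for an all-inf row, so mark those rows explicitly.
    best[~np.isfinite(d[np.arange(d.shape[0]), best])] = -1
    return best
```

If the matrix was exported without the trailing ';' (for example with np.savetxt), pass trailing_delimiter=False so that no real column is dropped.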

loretoparisi mentioned this issue on Jul 10, 2017 in: Current wmd implementation does not match GenSim #7 (Open)
