Implementing new graph file parsers for graphchi #1

Closed
clstaudt opened this issue Jul 25, 2013 · 11 comments
Comments

@clstaudt
Contributor

I would like to try graphchi with a collection of graphs in the so-called METIS format, a simple adjacency list format that differs from the adjacency list formats already supported.

http://www.cc.gatech.edu/dimacs10/downloads.shtml

The Introduction to Example Applications states that "it is fairly easy to write your own parsers", but it is not apparent how this works. Looking at the source code did not get me far. The documentation should include some hints on how to create a new parser.

@akyrola
Member

akyrola commented Jul 25, 2013

The documentation is quite sparse, admittedly.

Here are some tips on how to get started:

The parsers are implemented in src/preprocessing/conversions.hpp
https://github.com/GraphChi/graphchi-cpp/blob/master/src/preprocessing/conversions.hpp

First, look around line 541 to see how the parsers for the different filetypes are called.

Then you need to write your own parser method similar to convert_adjlist(basefilename, sharderobj), which starts on line 285.

After that, you should be done. You just need to specify "--filetype=myfiletype" on the command line, where myfiletype is the identifier for the format you want to implement.
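
To make the parser step more concrete, here is a minimal, self-contained sketch of reading an unweighted METIS file. It is not the code from this thread and it does not use GraphChi's API: the names parse_metis and EdgeSink are made up for illustration, and the callback only stands in for whatever call convert_adjlist uses to hand edges to sharderobj. It assumes the usual METIS layout (a header line "n m [fmt]", '%' comment lines, then one line of 1-based neighbor ids per vertex) and rejects weighted files.

```cpp
#include <cstdint>
#include <functional>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical callback type: stands in for the call that hands one edge
// to the sharder. It is not a GraphChi type.
using EdgeSink = std::function<void(std::uint64_t from, std::uint64_t to)>;

// Reads an unweighted METIS graph from `in` and reports every adjacency
// entry to `sink`. Returns the number of entries reported.
// METIS vertex ids are 1-based; they are shifted to 0-based here.
inline std::uint64_t parse_metis(std::istream& in, const EdgeSink& sink) {
    std::string line;

    // Header: "<n> <m> [fmt]". Lines starting with '%' are comments.
    std::uint64_t n = 0, m = 0;
    int fmt = 0;
    bool have_header = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '%') continue;
        std::istringstream header(line);
        if (!(header >> n >> m)) throw std::runtime_error("bad METIS header");
        header >> fmt;  // optional field; fmt is 0 when it is absent
        have_header = true;
        break;
    }
    if (!have_header) throw std::runtime_error("missing METIS header");
    if (fmt != 0) throw std::runtime_error("weighted METIS files are not handled in this sketch");

    // One line per vertex; an empty line is a vertex without neighbors.
    std::uint64_t emitted = 0;
    std::uint64_t vertex = 0;
    while (vertex < n && std::getline(in, line)) {
        if (!line.empty() && line[0] == '%') continue;
        std::istringstream neighbors(line);
        std::uint64_t neighbor = 0;
        while (neighbors >> neighbor) {
            if (neighbor == 0 || neighbor > n) throw std::runtime_error("neighbor id out of range");
            sink(vertex, neighbor - 1);
            ++emitted;
        }
        ++vertex;
    }
    return emitted;
}
```

Because METIS lists each undirected edge once per endpoint, the sink sees every edge twice. Whether to forward both directions or only those with from < to depends on how the application treats edge direction.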

@clstaudt
Contributor Author

Thanks, this was very helpful. I think I might be able to implement a parser for the METIS format.

Is there a specific reason why the C++ standard library is so sparsely used for the parsers? Is it okay to work with std::ifstream, std::stringstream, std::getline etc?

Kind regards
Christian Staudt

@akyrola
Member

akyrola commented Jul 25, 2013

It is OK to use the C++ standard library. I just found that the C functions were a bit faster, and with billions of edges that can make a difference.
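
As an illustration of the C-style approach, the following sketch tokenizes one line of vertex ids with std::strtoul instead of std::stringstream. The function name parse_ids_c is hypothetical and this is not the code in conversions.hpp; it only shows the general technique, and no speed difference is measured or claimed here.

```cpp
#include <cstdlib>
#include <vector>

// Parses whitespace-separated unsigned integers from a NUL-terminated line
// using strtoul, stopping at the first position where no number can be read.
// Note: strtoul itself accepts a leading sign, so "-1" is not rejected.
inline std::vector<unsigned long> parse_ids_c(const char* line) {
    std::vector<unsigned long> ids;
    const char* cursor = line;
    while (true) {
        char* end = nullptr;
        const unsigned long value = std::strtoul(cursor, &end, 10);
        if (end == cursor) break;  // nothing consumed: end of line or non-numeric text
        ids.push_back(value);
        cursor = end;              // continue right after the parsed number
    }
    return ids;
}
```

The loop avoids constructing a stream object per line, which is the kind of overhead that can add up over billions of edges.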

@clstaudt
Contributor Author

Implemented the METIS format parser, see:

https://algohub.iti.kit.edu/parco/Prototypes/PLPgraphchi/changeset/1aa8e6ef1373f1eceef31694b050bff4e91be3aa

However, when trying to test the new format with the community detection example, I get a crash:

cls ~/workspace/Prototypes/graphchi-cpp $ ./bin/example_apps/communitydetection --filetype=metis file /Users/cls/workspace/Data/DIMACS/Clustering/pgp.graph
[filetype] => [metis]
INFO: conversions.hpp(convert_if_notexists:742): Did not find preprocessed shards for /Users/cls/workspace/Data/DIMACS/Clustering/pgp.graph
INFO: conversions.hpp(convert_if_notexists:744): (Edge-value size: 8)
INFO: conversions.hpp(convert_if_notexists:745): Will try create them now...
INFO: sharder.hpp(determine_number_of_shards:393): Determining number of shards automatically.
INFO: sharder.hpp(determine_number_of_shards:396): Assuming available memory is 800 megabytes.
INFO: sharder.hpp(determine_number_of_shards:397): (This can be defined with configuration parameter 'membudget_mb')
INFO: sharder.hpp(determine_number_of_shards:403): Determining maximum shard size: 100 MB.
INFO: sharder.hpp(determine_number_of_shards:416): Number of shards to be created: 2
INFO: sharder.hpp(execute_sharding:358): Max vertex id: 0
INFO: sharder.hpp(start_phase:488): Starting phase: 1
DEBUG: binary_adjacency_list.hpp(read_edges:133): 100%
Assertion failed: (a>0), function preada, file ./src/util/ioutil.hpp, line 50.
Abort trap: 6

This is hard for me to diagnose. Am I doing anything obviously wrong?

Kind regards
Chris

@akyrola
Member

akyrola commented Jul 28, 2013

Sorry I had not noticed your message.

It seems your interim file is empty: see the message "Max vertex id: 0". You can send me the code and I am happy to have a look.

@clstaudt
Contributor Author

You should be able to view and pull the code from here:
https://algohub.iti.kit.edu/parco/Prototypes/PLPgraphchi

Alternatively, I append the source file. Thank you for having a look at this.

Chris

@akyrola
Member

akyrola commented Jul 29, 2013

Hmm, I notice that none of the output that convert_metis writes to logstream is shown.

I don't see anything obviously wrong in your code. I suggest you add std::cout << "debug ... " << std::endl; in many places and hunt down why no edges are read from the file.

@clstaudt
Contributor Author

On 29.07.2013 at 19:06, Aapo Kyrola wrote:

I don't see anything obviously wrong in your code. I suggest you add std::cout << "debug ... " << std::endl; to many places and hunt down why no edges are read from the file.

No edges are read from the file because, starting from the community detection example app, the control flow never reaches my convert_metis function. The main function of the example calls graphchi_init(argc, argv), which I suppose reads the --filetype=metis option. It then calls set_argc, puts the key-value pair into the configuration, and prints it, right? I guess get_option_string_interactive is supposed to fetch the value. I cannot figure out where convert is actually called; the example only calls convert_if_notexists explicitly. Any idea how to fix this?
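
The option flow described here can be sketched in isolation. This is a hypothetical illustration, not GraphChi's implementation: parse_options and get_option are made-up names, and only the "--key=value" form seen in the command line above is modelled.

```cpp
#include <map>
#include <string>

// Hypothetical helper: collects "--key=value" arguments into a map and
// ignores everything else (for example the "file <path>" pair).
inline std::map<std::string, std::string> parse_options(int argc, const char* const* argv) {
    std::map<std::string, std::string> options;
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg.compare(0, 2, "--") != 0) continue;        // not an option
        const std::string::size_type eq = arg.find('=');
        if (eq == std::string::npos || eq == 2) continue;  // no '=' or empty key
        options[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
    }
    return options;
}

// Returns the stored value, or `fallback` when the key was never given.
inline std::string get_option(const std::map<std::string, std::string>& options,
                              const std::string& key, const std::string& fallback) {
    const std::map<std::string, std::string>::const_iterator it = options.find(key);
    return it == options.end() ? fallback : it->second;
}
```

With a lookup like this, a conversion routine would compare the "filetype" value against each known identifier and call the matching parser, so printing the looked-up value right before that comparison is a quick way to check whether the dispatch is reached at all.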

@akyrola
Member

akyrola commented Aug 3, 2013

convert_if_notexists calls convert.... what's happening there?

@clstaudt
Contributor Author

clstaudt commented Aug 4, 2013

For some reason, it did not enter the if (!sharderobj.preprocessed_file_exists()) block. I tried it with a new file, and now reading a graph in METIS format seems to work. Community detection on a large web graph 1 runs in 269 seconds.

Are you interested in adding the parser code to graphchi?

Kind regards
Chris
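
One reading consistent with the behaviour above is that preprocessed output left over from an earlier run made the existence check succeed, so the parser was skipped. The thread does not confirm this. The sketch below shows the general "convert only if the cached output is missing" pattern with hypothetical names (file_exists, convert_if_missing); it is not GraphChi's convert_if_notexists.

```cpp
#include <fstream>
#include <functional>
#include <string>

// Returns true when `path` can be opened for reading.
inline bool file_exists(const std::string& path) {
    std::ifstream probe(path.c_str());
    return probe.good();
}

// Hypothetical helper illustrating the pattern: run `convert` only when the
// cached output is missing. Returns true if the converter actually ran.
inline bool convert_if_missing(const std::string& cached_output,
                               const std::function<void()>& convert) {
    if (file_exists(cached_output)) return false;  // cache present, valid or not: skip
    convert();
    return true;
}
```

The check only asks whether the file is there, not whether its contents are valid, so an empty or stale cache from a failed run keeps the converter from being called until the file is removed or a fresh input path is used.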

@akyrola
Member

akyrola commented Aug 4, 2013

Great! Just make a pull request and I will add it. Thanks!

@akyrola akyrola closed this as completed Oct 10, 2013
antoine-de pushed a commit to antoine-de/graphchi-cpp that referenced this issue Feb 21, 2018