Setup required before starting indexing. #1107

Closed
rossbrown9879 opened this issue Apr 20, 2020 · 10 comments

Comments

@rossbrown9879

I'm new to this project and I want to build an index on the MS-MARCO document dataset for the document ranking task.

The README gives the following command to start indexing:

nohup sh target/appassembler/bin/IndexCollection -collection TrecCollection \
 -generator LuceneDocumentGenerator -threads 1 -input msmarco-doc/collection \
 -index lucene-index.msmarco-doc.pos+docvectors+rawdocs -storePositions -storeDocvectors -storeRawDocs \
 >& log.msmarco-doc.pos+docvectors+rawdocs &

I don't know much about this script. I ran the same command and got the following in a newly created file named log.msmarco-doc.pos+docvectors+rawdocs:

nohup: ignoring input
sh: 0: Can't open target/appassembler/bin/IndexCollection

From this I understood that something was missing, such as appassembler, so I ran the following command:

$ mvn clean package appassembler:assemble

This gave me the following:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/maven/lib/guice.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  0.078 s
[INFO] Finished at: 2020-04-20T15:33:58+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/ruchit/Desktop). Please verify you invoked Maven from the correct directory. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MissingProjectException

I don't understand this error and don't know how to fix it. Can anyone provide the steps needed to build an inverted index?

I'd also like to know whether there are any prebuilt indexes for the MS-MARCO document ranking dataset. I saw one for the passage ranking dataset. If an index has already been built for the document ranking dataset, please point me to its source. Thanks in advance.
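
The MissingProjectException in the log above says Maven was started in a directory without a `pom.xml` (here `/home/ruchit/Desktop`), that is, outside a cloned copy of the repository. A minimal sketch of a guard that checks for the POM before building; the function name `has_pom` and the checkout directory name `anserini` are assumptions for illustration:

```shell
# Return success if the given directory (default: current) contains a Maven POM.
has_pom() {
  [ -f "${1:-.}/pom.xml" ]
}

# Hypothetical checkout location; adjust to wherever the repo was cloned.
repo_dir="anserini"

if has_pom "$repo_dir"; then
  echo "Found $repo_dir/pom.xml; run the build from inside $repo_dir:"
  echo "  cd $repo_dir && mvn clean package appassembler:assemble"
else
  echo "No pom.xml in $repo_dir; clone the repository first and cd into it." >&2
fi
```

The check only confirms that a POM is present; it does not say anything about whether the build itself will succeed.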

@lintool
Member

lintool commented Apr 20, 2020

What java version are you using?

@rossbrown9879
Author

@lintool thanks for the quick reply.
I'm using the following Java version:

openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment (build 11.0.6+10-post-Ubuntu-1ubuntu119.10.1)
OpenJDK 64-Bit Server VM (build 11.0.6+10-post-Ubuntu-1ubuntu119.10.1, mixed mode, sharing)

It looks like I've made some progress. Sorry for the confusion.
What I've done so far:
(1) Cloned this repo to Desktop
(2) cd Desktop/anserini
(3) Created a target folder using mvn clean appassembler:assemble
(4) Created a directory msmarco-doc/collection and downloaded the zipped corpus into it.
(5) Ran the indexing script.

But I'm still getting the following in the output file:

nohup: ignoring input
Error: Could not find or load main class io.anserini.index.IndexCollection
Caused by: java.lang.ClassNotFoundException: io.anserini.index.IndexCollection

When I first reported this issue, I had not cloned the repository; I was just running mvn clean appassembler:assemble, so the target folder could not be created. I have now done that, but the ClassNotFoundException still occurs. Thank you again.
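
One possible explanation, offered as an assumption rather than a confirmed diagnosis: `mvn clean appassembler:assemble` does not run the `package` phase, so the launcher scripts get generated while the project's own classes are never packaged into a jar, which would be consistent with this `ClassNotFoundException`. A small sketch that checks for both artifacts before indexing; the function name `build_is_complete` is hypothetical, and the paths follow the logs in this thread:

```shell
# Succeed only if both the launcher script and a packaged Anserini jar exist
# under the given project directory (default: current directory).
build_is_complete() {
  dir="${1:-.}"
  [ -f "$dir/target/appassembler/bin/IndexCollection" ] || return 1
  # The jar name carries the version (e.g. anserini-0.9.1-SNAPSHOT.jar).
  for jar in "$dir"/target/anserini-*.jar; do
    if [ -f "$jar" ]; then
      return 0
    fi
  done
  return 1
}

if build_is_complete .; then
  echo "Build outputs found; the indexing command should be able to start."
else
  echo "Build incomplete; try: mvn clean package appassembler:assemble" >&2
fi
```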

@rossbrown9879
Author

rossbrown9879 commented Apr 20, 2020

When I deleted the previously created target folder and ran `mvn clean package appassembler:assemble`, my build failed and I got the following:


Results :

Tests run: 251, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] --- jacoco-maven-plugin:0.8.2:report (report) @ anserini ---
[INFO] Loading execution data file /home/ruchit/Desktop/anserini/target/jacoco.exec
[INFO] Analyzed bundle 'Anserini' with 265 classes
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ anserini ---
[INFO] Building jar: /home/ruchit/Desktop/anserini/target/anserini-0.9.1-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-javadoc-plugin:3.1.0:jar (attach-javadocs) @ anserini ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  03:54 min
[INFO] Finished at: 2020-04-20T17:58:28+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.1.0:jar (attach-javadocs) on project anserini: MavenReportException: Error while generating Javadoc: Unable to find javadoc command: The environment variable JAVA_HOME is not correctly set. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[1]+  Exit 1                  nohup sh target/appassembler/bin/IndexCollection -collection TrecCollection -generator LuceneDocumentGenerator -threads 1 -input msmarco-doc/collection -index lucene-index.msmarco-doc.pos+docvectors+rawdocs -storePositions -storeDocvectors -storeRawDocs &> log.msmarco-doc.pos+docvectors+rawdocs

@lintool
Member

lintool commented Apr 20, 2020

Your build is failing:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  03:54 min
[INFO] Finished at: 2020-04-20T17:58:28+05:30
[INFO] ------------------------------------------------------------------------

Scroll further up in your Maven output to see why.
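
Maven reports the cause of a failed build on `[ERROR]` lines, so filtering a saved log for them can be quicker than scrolling. A minimal sketch; the function name `maven_errors` and the log file name `build.log` are illustrative assumptions:

```shell
# Print only the non-empty [ERROR] lines of a saved Maven log, with line numbers.
maven_errors() {
  grep -n '^\[ERROR\] .' "$1"
}

# Example usage (hypothetical log file name):
#   mvn clean package appassembler:assemble 2>&1 | tee build.log
#   maven_errors build.log
```

The pattern requires at least one character after `[ERROR] `, so the empty `[ERROR]` separator lines are skipped.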

@rossbrown9879
Author

@lintool The part above Results: in the log from my previous comment contains a list of tests like this:

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.181 sec
Running io.anserini.analysis.EnglishStemmingAnalyzerTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running io.anserini.analysis.TweetTokenizationTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec
Running io.anserini.doc.JDIQ2018EffectivenessDocsTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.034 sec
Running io.anserini.doc.GenerateRegressionDocsTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.542 sec
Running io.anserini.kg.FreebaseTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec
Running io.anserini.kg.FreebaseNodeTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec
Running io.anserini.util.ExtractAverageDocumentLengthTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.254 sec
Running io.anserini.util.ExtractDocumentLengthsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.255 sec
Running io.anserini.util.FeatureVectorTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.259 sec
Running io.anserini.util.ExtractTopDfTermsTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.261 sec
Running io.anserini.util.ExtractNormsTest

I see that none of the tests have any failures. The part before those tests contains the following warnings:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/maven/lib/guice.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[INFO] Scanning for projects...
[INFO] 
[INFO] ------------------------< io.anserini:anserini >------------------------
[INFO] Building Anserini 0.9.1-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ anserini ---
[INFO] Deleting /home/ruchit/Desktop/anserini/target
[INFO] 
[INFO] --- jacoco-maven-plugin:0.8.2:prepare-agent (default) @ anserini ---
[INFO] argLine set to -javaagent:/home/ruchit/.m2/repository/org/jacoco/org.jacoco.agent/0.8.2/org.jacoco.agent-0.8.2-runtime.jar=destfile=/home/ruchit/Desktop/anserini/target/jacoco.exec
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ anserini ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 200 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ anserini ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 147 source files to /home/ruchit/Desktop/anserini/target/classes
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ anserini ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 40 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ anserini ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 91 source files to /home/ruchit/Desktop/anserini/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ anserini ---
[INFO] Surefire report directory: /home/ruchit/Desktop/anserini/target/surefire-reports

-------------------------------------------------------
 T E S T S
-------------------------------------------------------

I do not see any log entry giving the cause of the build failure. I cannot understand the following, which is logged at the end along with BUILD FAILURE:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  05:57 min
[INFO] Finished at: 2020-04-20T18:12:06+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.1.0:jar (attach-javadocs) on project anserini: MavenReportException: Error while generating Javadoc: Unable to find javadoc command: The environment variable JAVA_HOME is not correctly set. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[1]+  Exit 1                  nohup sh target/appassembler/bin/IndexCollection -collection TrecCollection -generator LuceneDocumentGenerator -threads 1 -input msmarco-doc/collection -index lucene-index.msmarco-doc.pos+docvectors+rawdocs -storePositions -storeDocvectors -storeRawDocs &> log.msmarco-doc.pos+docvectors+rawdocs

@rossbrown9879
Author

rossbrown9879 commented Apr 20, 2020

I think the build failed because of Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.1.0:ja but I'm not sure about this. Is it related to my Java version?

I think this is because the environment variable JAVA_HOME is not set. Can you tell me what value I should set it to?

@lintool
Member

lintool commented Apr 20, 2020

What's the error associated with org.apache.maven.plugins:maven-javadoc-plugin:3.1.0?

@rossbrown9879
Author

rossbrown9879 commented Apr 20, 2020

@lintool The following is the error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.1.0:jar (attach-javadocs) on project anserini: MavenReportException: Error while generating Javadoc: Unable to find javadoc command: The environment variable JAVA_HOME is not correctly set. -> [Help 1]
[ERROR]

@lintool
Member

lintool commented Apr 20, 2020

Yes, you should set your JAVA_HOME appropriately then. Since I don't know the specifics of your setup, it would be easier to search online for how to do so.
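
On a Linux system with the usual `<JAVA_HOME>/bin/<tool>` layout, one common way to find a suitable value is to resolve where the `javadoc` binary actually lives and take the directory two levels up. This is a sketch under that layout assumption, not a universal recipe; the function name `java_home_for` is hypothetical, and `readlink -f` assumes GNU coreutils:

```shell
# Print the JDK root that owns a given tool (e.g. javadoc), assuming the
# usual <JAVA_HOME>/bin/<tool> layout. Symlinks are resolved first.
java_home_for() {
  tool_path=$(readlink -f "$1") || return 1
  dirname "$(dirname "$tool_path")"
}

if javadoc_path=$(command -v javadoc); then
  JAVA_HOME=$(java_home_for "$javadoc_path")
  export JAVA_HOME
  echo "JAVA_HOME=$JAVA_HOME"
else
  # javadoc ships with a JDK, not with a runtime-only installation.
  echo "javadoc not found on PATH; a full JDK may need to be installed" >&2
fi
```

The export only lasts for the current shell session; making it permanent depends on the shell and its startup files.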

@lintool
Member

lintool commented Apr 23, 2020

Having heard no follow-up, closing this issue. Reopen if necessary.
