Page.getPlainText broken - PlainTextConverter struggles to discriminate candidate methods and ends in 'VisitorException' #160

Closed
mawiesne opened this issue Jun 22, 2018 · 26 comments

Comments

@mawiesne
Contributor

mawiesne commented Jun 22, 2018

With the introduction of Sweble 3.1.7 to the JWPL 1.2.0-SNAPSHOT line, I can no longer fetch plain text data from Wikipedia backends via Page.getPlainText. The stack trace is documented here:

ERROR - de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:189)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
	at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
	at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:610)
	at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:591)
	at ...
	at java.lang.Thread.run(Thread.java:748)
Caused by: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:130)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:346)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	... 79 more
Caused by: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:186)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:168)
	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
	at java.util.TimSort.sort(TimSort.java:220)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1462)
	at java.util.Collections.sort(Collections.java:175)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.findVisit(VisitorLogic.java:167)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:105)
	... 90 more

It seems there is a mismatch of method signatures and/or incompatible libraries being used at runtime. I consider this a major bug, as parts of the main functionality are affected. Therefore, this bug should be fixed before releasing JWPL 1.2.0 (Final).

Dependencies involved:

  • sweble 3.1.7
  • de.fau.cs.osr.ptk.ptk-common 3.0.8
  • de.fau.cs.osr.utils 3.0.8
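
One way to test the "incompatible libraries at runtime" hypothesis is to print which JAR each involved class is actually loaded from. The following is a minimal sketch under stated assumptions: `ClassOrigin` is a hypothetical helper, not part of JWPL or Sweble, and the class names are simply copied from the stack trace above.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of JWPL or Sweble.
class ClassOrigin {

    /** Describes where the named class was loaded from, without initializing it. */
    static String describe(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            // Classes that belong to the JDK itself usually report no code source.
            String location = (source == null || source.getLocation() == null)
                    ? "(no code source, e.g. JDK class)"
                    : source.getLocation().toString();
            return className + " <- " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " <- (not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in this issue.
        String[] names = {
            "de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter",
            "org.sweble.wikitext.parser.nodes.WtText",
            "de.fau.cs.osr.ptk.common.ast.AstText",
            "de.fau.cs.osr.utils.visitor.VisitorLogic"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If two entries point at unexpected or duplicate JAR versions, the classpath is the likely culprit; if all of them match the versions listed above, the problem is more likely in the visitor code itself.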

System environment:

  • Affected OS: MacOS 10.13, Linux / Ubuntu 16.04 LTS+, most likely Windows 7/8/10, cc: @rzo1
  • Java version: Oracle JDK 1.8 (build 1.8.0_161-b12), most likely every 1.8.0-x release

Any ideas @ferschke / @reckart ? Can somebody contact the colleagues at FAU Erlangen to investigate this issue?

@mawiesne
Contributor Author

mawiesne commented Jun 22, 2018

This seems to be a regression introduced with the changes of #152 and #155 .

@mawiesne
Contributor Author

@tgalery as you contributed the changes of #155, can you also have a look into this issue?

@rzo1
Contributor

rzo1 commented Jun 22, 2018

I can confirm that this also affects Windows 10. The stack trace is similar to the one posted by @mawiesne in a Java 8 environment (Oracle or OpenJDK does not matter).

@reckart
Member

reckart commented Jun 22, 2018

@mawiesne No idea. I hope @tgalery maybe has some insight.

@tgalery
Contributor

tgalery commented Jun 22, 2018

Can someone post a bit of code that generates the stack trace above?

@rzo1
Contributor

rzo1 commented Jun 22, 2018

import de.tudarmstadt.ukp.wikipedia.api.Page;
import de.tudarmstadt.ukp.wikipedia.api.Wikipedia;
import de.tudarmstadt.ukp.wikipedia.api.exception.WikiApiException;

public class Main {

    public static void main(String[] args) throws  WikiApiException {

        Wikipedia wikipedia = new Wikipedia(new CustomDataSource("host", "dbname", "user", "password", "com.mysql.jdbc.Driver", false));

        //German Wikipedia for example, page with title "Gesundheit"
        Page page = wikipedia.getPage("Gesundheit");

        //Exception will be thrown...
        page.getPlainText();

    }
}

with this implementation as CustomDataSource:

import de.tudarmstadt.ukp.wikipedia.api.DatabaseConfiguration;
import de.tudarmstadt.ukp.wikipedia.api.WikiConstants;
import de.tudarmstadt.ukp.wikipedia.api.WikiConstants.Language;
import org.slf4j.Logger;

import java.sql.*;

public class CustomDataSource extends DatabaseConfiguration {
    private static final Logger logger = org.slf4j.LoggerFactory.getLogger(CustomDataSource.class);

    private String jdbcURL;
    private String databaseDriver;

    /*
     * needed to please frameworks like Spring... parameter injection is done
     * via setters there
     */
    public CustomDataSource() {
        super();
    }

    public CustomDataSource(String hostName, String dbName, String user, String password, String driverClassName, boolean useSSL)  {
        this();
        setDbName(dbName);
        setHostName(hostName);
        setPassword(password);
        setUserName(user);
        // check if the DB driver is available in the classpath
        try {
            Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            logger.error(e.getLocalizedMessage(), e);
            throw new RuntimeException(e.getLocalizedMessage(), e);
        }
        String baseJdbcURL = "jdbc:mysql://" + getHostName() + "/" + getDbName();
        if(!hasExternalSSLParams(baseJdbcURL)) {
            if (useSSL) {
                setJdbcURL(baseJdbcURL + "?verifyServerCertificate=false&useSSL=true");
            } else {
                setJdbcURL(baseJdbcURL + "?useSSL=false");
            }
        } else {
            setJdbcURL(baseJdbcURL);
        }

        Language lang = requestWikiLangFromDB(hostName, dbName, user, password);
        setLanguage(lang);
    }

    private boolean hasExternalSSLParams(String baseJdbcURL) {
        return baseJdbcURL.contains("useSSL=");
    }

    /*
     * Although the JWPL database knows its Wikipedia language (described as
     * <code>language</code> in the table <code>MetaData</code>), the
     * {@link DatabaseConfiguration} needs to know the specified
     * {@link Language}. Hence, it will be requested by this method so the user
     * does not have to configure the {@link Language} manually.
     *
     * @param hostName
     * @param dbName
     * @param user
     * @param password
     * @return the language found in the <code>MetaData</code>-table, as
     * enumeration instance of {@link Language}
     * @throws RuntimeException if no language entry exists or the query fails
     */
    private Language requestWikiLangFromDB(String hostName, String dbName, String user, String password)  {

        try (Connection connection = DriverManager.getConnection(getJdbcURL(), user, password)){

            Statement stmnt = connection.createStatement();
            ResultSet result = stmnt.executeQuery("Select language from MetaData");
            if (result.next()) {
                String languageString = result.getString(1);

                logger.info("The language found at {}:{} is '{}' and will be set to this Wiki-DB connection", hostName, dbName, languageString);
                if (languageString.equals("türkçe")) {
                    languageString = "turkish";
                }
                return WikiConstants.Language.valueOf(languageString);
            } else {
                throw new RuntimeException("No language could be found for this Wikipedia DB. This is very strange, check your DB setup!");
            }

        } catch (SQLException e) {
            logger.error(e.getLocalizedMessage());
            throw new RuntimeException(e);
        }
    }

    public void setDbName(String dbName) {
        assert dbName!=null;
        assert dbName.trim().length() > 0;

        super.setDatabase(dbName);
    }

    public String getDbName() {
        return super.getDatabase();
    }

    public void setHostName(String hostName) {
        assert hostName!=null;
        assert hostName.trim().length() > 0;

        super.setHost(hostName);
    }

    public String getHostName() {
        return super.getHost();
    }

    public String getUserName() {
        return super.getUser();
    }

    public void setUserName(String user) {
        assert user!=null;
        assert user.trim().length() > 0;
        super.setUser(user);
    }

    /**
     * @param databaseDriver the databaseDriver to set
     */
    public void setDatabaseDriver(String databaseDriver) {
        assert databaseDriver!=null;
        assert databaseDriver.trim().length() > 0;
        this.databaseDriver = databaseDriver;
    }

    public String getDatabaseDriver() {
        return databaseDriver;
    }

    /**
     * @param jdbcURL the jdbcURL to set
     */
    public void setJdbcURL(String jdbcURL) {
        assert jdbcURL!=null;
        assert jdbcURL.trim().length() > 0;
        this.jdbcURL = jdbcURL;
    }

    public String getJdbcURL() {
        return jdbcURL;
    }

    @Override
    public String getPassword() {
        return super.getPassword();
    }

    @Override
    public void setPassword(String password) {
        super.setPassword(password);
    }

    @Override
    public WikiConstants.Language getLanguage() {
        return super.getLanguage();
    }

    @Override
    public void setLanguage(WikiConstants.Language language) {
        assert language != null;

        super.setLanguage(language);
    }

}

Will output:

Exception in thread "main" de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
	at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
	at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:610)
	at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:591)
	at de.hshn.mi.shc.etl.wiki.Main.main(Main.java:19)
Caused by: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:130)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:189)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	... 8 more
Caused by: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:186)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:168)
	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
	at java.util.TimSort.sort(TimSort.java:220)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1462)
	at java.util.Collections.sort(Collections.java:175)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.findVisit(VisitorLogic.java:167)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:105)
	... 19 more

@tgalery
Contributor

tgalery commented Jun 22, 2018

Can you help me understand something in your code? Looking at the JWPLDataSource, would that connect to a database which contains the relevant wiki pages? The creds look funny to me.

@rzo1
Contributor

rzo1 commented Jun 22, 2018

Basically:

1.) Create a connection to a database. In our case: a MySQL DB containing the Wikipedia Dumps and therefore the wikipedia pages.

2.) I left out the real credentials ;)

3.) Retrieve a page of interest (it does not matter which one).

4.) Try to retrieve the full text via getPlainText()

@tgalery
Contributor

tgalery commented Jun 22, 2018

Gotcha, sorry for being a pain; I use this in the context of json wikipedia. Is the MySQL database populated by downloading and importing SQL files from here https://dumps.wikimedia.org/enwiki/20180320/ (if so, could you let me know which ones), or is there a transformation from the full XML dump into SQL that is done by some CLI tool in advance?

@mawiesne
Contributor Author

mawiesne commented Jun 22, 2018

We make use of the DataMachine tool provided by the JWPL project, see here: https://dkpro.github.io/dkpro-jwpl/DataMachine/

The resulting files are then imported into a MySQL 5.7 installation.

For a German version of Wikipedia dumps, we basically use:

java -Xmx2g -jar JWPLDataMachine.jar german !Hauptkategorie Begriffsklärung ~/dewiki/$date-of-snapshot$/

as given in the examples section of the how-to.

@tgalery
Contributor

tgalery commented Jun 22, 2018

Cool, could I get the exact command you guys used to produce the German (or any other language) dump? I will try to replicate the bug and see if there's an easy fix.

@rzo1
Contributor

rzo1 commented Jun 22, 2018

I updated the code-snippet above to not use internal classes / provided related code to execute it.

@mawiesne
Contributor Author

mawiesne commented Jun 22, 2018

@tgalery Thanks a ton for looking into this! I will upload a dump of a transformed version of the German Wikipedia DB dating from Jan 2018. Stay tuned, the next comment with instructions will follow shortly.

@tgalery
Contributor

tgalery commented Jun 22, 2018

@mawiesne that would be extremely helpful

@mawiesne
Contributor Author

@tgalery Download one or both of the two mysql dumps from here:

  1. German version (4.5G):
    https://download.mi.hs-heilbronn.de/tulum/wikipedia_de_jwpl_Jan2018.sql.gz
  2. Spanish version (2.7G):
    https://download.mi.hs-heilbronn.de/tulum/wikipedia_es_jwpl_Jan2018.sql.gz

Re-Import them on your local dev-machine via:

  1. In a MySQL shell/tool:
    CREATE DATABASE wikipedia_de_jwpl_Jan2018 CHARACTER SET UTF8;
  2. In a MySQL shell/tool:
    GRANT ALL ON wikipedia_de_jwpl_Jan2018.* TO username@'%' IDENTIFIED BY "password";
  3. From a command line/shell:
    gunzip < wikipedia_de_jwpl_Jan2018.sql.gz | mysql --quick --user=root -p

The same procedure applies to the smaller Spanish (es) version; just replace 'de' with 'es'. If you decide to use es, you could, for instance, fetch a page such as "Salud".

@tgalery
Contributor

tgalery commented Jun 22, 2018

Cheers, will give you guys an update as soon as I can.

@tgalery
Contributor

tgalery commented Jun 22, 2018

Some updates. I've been trying to debug this using the Spanish dump, as it's slightly smaller.
But it seems I get an exception when instantiating the Wikipedia class. I'm using Scala and I get the following:

scala> import de.tudarmstadt.ukp.wikipedia.api._
import de.tudarmstadt.ukp.wikipedia.api._

scala> val source = new CustomDataSource("host", "dbname", "user", "password", "com.mysql.jdbc.Driver", false)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
source: de.tudarmstadt.ukp.wikipedia.api.CustomDataSource = de.tudarmstadt.ukp.wikipedia.api.CustomDataSource@3ac02398

scala> val wikipedia = new Wikipedia(source)
log4j:WARN No appenders could be found for logger (de.tudarmstadt.ukp.wikipedia.api.Wikipedia).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
org.hibernate.tool.schema.spi.SchemaManagementException: Schema-validation: missing column [version] in table [MetaData]
  at org.hibernate.tool.schema.internal.AbstractSchemaValidator.validateTable(AbstractSchemaValidator.java:136)
  at org.hibernate.tool.schema.internal.GroupedSchemaValidatorImpl.validateTables(GroupedSchemaValidatorImpl.java:42)
  at org.hibernate.tool.schema.internal.AbstractSchemaValidator.performValidation(AbstractSchemaValidator.java:89)
  at org.hibernate.tool.schema.internal.AbstractSchemaValidator.doValidation(AbstractSchemaValidator.java:68)
  at org.hibernate.tool.schema.spi.SchemaManagementToolCoordinator.performDatabaseAction(SchemaManagementToolCoordinator.java:191)
  at org.hibernate.tool.schema.spi.SchemaManagementToolCoordinator.process(SchemaManagementToolCoordinator.java:72)
  at org.hibernate.internal.SessionFactoryImpl.<init>(SessionFactoryImpl.java:312)
  at org.hibernate.boot.internal.SessionFactoryBuilderImpl.build(SessionFactoryBuilderImpl.java:462)
  at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:710)
  at de.tudarmstadt.ukp.wikipedia.api.hibernate.WikiHibernateUtil.getSessionFactory(WikiHibernateUtil.java:51)
  at de.tudarmstadt.ukp.wikipedia.api.Wikipedia.__getHibernateSession(Wikipedia.java:761)
  at de.tudarmstadt.ukp.wikipedia.api.MetaData.<init>(MetaData.java:44)
  at de.tudarmstadt.ukp.wikipedia.api.Wikipedia.<init>(Wikipedia.java:87)
  ... 42 elided

Is there something wrong with the Spanish dump I downloaded above?

@tgalery
Contributor

tgalery commented Jun 22, 2018

Commenting out the Hibernate automatic schema validation gives me this:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'metadata0_.version' in 'field list'
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
  at com.mysql.jdbc.Util.getInstance(Util.java:386)
  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
  at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
  at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2281)
  at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:60)
  ... 60 more

So maybe there's something funny with the dump?

@mawiesne
Contributor Author

mawiesne commented Jun 24, 2018

@tgalery I think I know what went wrong, and I'll provide two fresh dumps next Monday.

UPDATE:
Re-Download one of the two files and check sha1sum afterwards:

  1. German version (4.5G):
    https://download.mi.hs-heilbronn.de/tulum/wikipedia_de_jwpl_Jan2018.sql.gz
    sha1sum should match f837788b0fe5c5b564fd22f11213be9d718190f4

  2. Spanish version (2.7G):
    https://download.mi.hs-heilbronn.de/tulum/wikipedia_es_jwpl_Jan2018.sql.gz
    sha1sum should match dc33b2975e4243217e13658685de2bcf3677975a

Remove all previous files / imported DBs and conduct a re-import. It should work now as I've dumped it from one of our production systems in which no DB schema errors are present.
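
For anyone without a `sha1sum` binary at hand (for example on Windows), the checksum comparison can also be done from Java. This is a minimal sketch using only the JDK; the `Sha1Check` class and its command-line arguments are assumptions made for illustration and are not part of JWPL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of JWPL.
class Sha1Check {

    /** Streams the input through SHA-1 and returns the lowercase hex digest. */
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage: java Sha1Check <path-to-dump.sql.gz> <expected-sha1>
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha1Hex(in);
            System.out.println(actual.equalsIgnoreCase(args[1]) ? "OK" : "MISMATCH: " + actual);
        }
    }
}
```

The file is read in chunks rather than loaded into memory, which matters for dumps of several gigabytes.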

Again, sorry for any inconvenience.

@rzo1
Contributor

rzo1 commented Jun 27, 2018

It seems to be a problem with the reflection code in de.fau.cs.osr.utils.visitor.VisitorLogic, which cannot decide between the candidate visit methods at runtime.

Line 361 ff.:

public Object invoke(VisitorInterface<?> visitor, Object node)
        throws IllegalArgumentException,
            IllegalAccessException,
            InvocationTargetException
{
    touch();
    return method.invoke(visitor, node);
}

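Both candidate parameter types

Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

extend the same interfaces, and a WtText node is assignable to both, so neither candidate is more specific than the other. This leads to the issue.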
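
To illustrate the kind of ambiguity described here, the following self-contained sketch mimics a reflective "most specific overload" lookup. It is not the actual `VisitorLogic` implementation: `Node`, `Text`, `WikiText`, `DemoVisitor` and `AmbiguousDispatchDemo` are hypothetical stand-ins for `WtNode`, `AstText`, `WtText` and `PlainTextConverter`, and the resolution rule is an assumption chosen to keep the example short.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the Sweble types named in the stack trace.
interface Node {}                               // plays the role of WtNode
class Text {}                                   // plays the role of AstText
class WikiText extends Text implements Node {}  // plays the role of WtText

// Hypothetical visitor exposing the two overloads reported as candidates.
class DemoVisitor {
    public String visit(Node n) { return "visit(Node)"; }
    public String visit(Text t) { return "visit(Text)"; }
}

class AmbiguousDispatchDemo {

    /** Collects every one-argument visit method that accepts the node class. */
    static List<Method> candidates(Class<?> visitorClass, Class<?> nodeClass) {
        List<Method> result = new ArrayList<>();
        for (Method m : visitorClass.getMethods()) {
            if (m.getName().equals("visit")
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0].isAssignableFrom(nodeClass)) {
                result.add(m);
            }
        }
        return result;
    }

    /** Returns the candidate whose parameter type is a subtype of all others, or throws. */
    static Method resolve(Class<?> visitorClass, Class<?> nodeClass) {
        List<Method> found = candidates(visitorClass, nodeClass);
        for (Method m : found) {
            Class<?> param = m.getParameterTypes()[0];
            boolean mostSpecific = true;
            for (Method other : found) {
                if (!other.getParameterTypes()[0].isAssignableFrom(param)) {
                    mostSpecific = false;
                    break;
                }
            }
            if (mostSpecific) {
                return m;
            }
        }
        throw new IllegalStateException(
                "No unique visit() for " + nodeClass.getName() + ", candidates: " + found);
    }

    public static void main(String[] args) {
        try {
            resolve(DemoVisitor.class, WikiText.class);
        } catch (IllegalStateException e) {
            // Neither Node nor Text is a subtype of the other, so no candidate wins.
            System.out.println(e.getMessage());
        }
    }
}
```

A plain `Text` node resolves without trouble because only `visit(Text)` applies; the failure appears only for a node type that matches both overloads.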

@rzo1
Contributor

rzo1 commented Jun 27, 2018

At Heilbronn University, our group managed to reproduce this bug with the existing test case PageTest#testPlainText() and the test DB provided in #2:

org.junit.internal.AssumptionViolatedException: got: <de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)
>, expected: null


	at org.junit.Assume.assumeThat(Assume.java:95)
	at org.junit.Assume.assumeNoException(Assume.java:142)
	at de.tudarmstadt.ukp.wikipedia.api.PageTest.testPlainText(PageTest.java:100)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:346)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:189)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
	at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
	at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:610)
	at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:591)
	at de.tudarmstadt.ukp.wikipedia.api.PageTest.testPlainText(PageTest.java:98)
	... 23 more
Caused by: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:130)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:210)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	... 53 more
Caused by: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:186)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:168)
	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
	at java.util.TimSort.sort(TimSort.java:220)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1462)
	at java.util.Collections.sort(Collections.java:175)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.findVisit(VisitorLogic.java:167)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:105)
	... 64 more

CI did not complain because of #161

@mawiesne mawiesne mentioned this issue Jun 27, 2018
4 tasks
@tgalery
Contributor

tgalery commented Jul 2, 2018

Cool, I'm assuming this will be reproducible once #162 gets merged?

@rzo1
Contributor

rzo1 commented Jul 2, 2018

Yes

@mawiesne mawiesne added the severe label Jul 3, 2018
@rzo1
Contributor

rzo1 commented Jul 16, 2018

@tgalery Any updates here? :)

@mawiesne
Contributor Author

@rzo1 @tgalery It seems I have found a fix for this issue locally. I will push a branch and open a PR once the related test case works as expected.

mawiesne added a commit that referenced this issue Jul 16, 2018
…iscriminate candidate methods and ends in 'VisitorException'

- Fixes this issue by commenting out unused/empty candidate method `public void visit(WtNode n)`.
- Un-ignores and adapts test cases in `PageTest`. This way, `testGetPlainText` can now work correctly. No more ignored tests \o/
- Adds minor fix in `PlainTextConverter` to parse/handle standalone line breaks strings correctly.
- Adapts demo data to the version from 2010 (initial DB import had "*" and multi-line breaks). I screwed them up slightly in June 2018 when bringing #2 to master.
- Simplifies 2/3 `PlainTextConverter` constructors to reduce duplicate code.

Moreover, this kind of fixes #161, as no other problems remain once this commit is merged.
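
The first bullet of this commit (dropping the empty `visit(WtNode n)` overload) can be pictured with the sketch below. It is a simplified illustration, not the code from the commit: all type names are hypothetical stand-ins, and the dispatcher assumes a rule of "exactly one applicable overload", which is stricter than what the real visitor framework does.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; the real types are WtNode, AstText and WtText.
interface FixNode {}
class FixText {
    final String content;
    FixText(String content) { this.content = content; }
}
class FixWtText extends FixText implements FixNode {
    FixWtText(String content) { super(content); }
}

// Before: the empty catch-all overload competes with visit(FixText).
class VisitorBeforeFix {
    public void visit(FixNode n) { /* empty, unused */ }
    public void visit(FixText t) { /* would emit text */ }
}

// After: the catch-all is gone, so only one overload accepts a FixWtText.
class VisitorAfterFix {
    private final StringBuilder out = new StringBuilder();
    public void visit(FixText t) { out.append(t.content); }
    String result() { return out.toString(); }
}

class SingleCandidateDemo {

    /** Lists the public one-argument visit methods that accept the given node. */
    static List<Method> applicable(Object visitor, Object node) {
        List<Method> found = new ArrayList<>();
        for (Method m : visitor.getClass().getMethods()) {
            if (m.getName().equals("visit")
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0].isInstance(node)) {
                found.add(m);
            }
        }
        return found;
    }

    /** Simplified dispatch: succeeds only when exactly one overload applies. */
    static void dispatch(Object visitor, Object node) {
        List<Method> found = applicable(visitor, node);
        if (found.size() != 1) {
            throw new IllegalStateException("Expected one candidate, got " + found);
        }
        try {
            found.get(0).invoke(visitor, node);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        VisitorAfterFix visitor = new VisitorAfterFix();
        dispatch(visitor, new FixWtText("Gesundheit"));
        System.out.println(visitor.result());
    }
}
```

An alternative design would be to keep the catch-all and add an overload for the concrete text node type, but removing an unused method is the smaller change.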
mawiesne added a commit that referenced this issue Jul 17, 2018
…iscriminate candidate methods and ends in 'VisitorException'

- Addresses LF problems in `PlainTextConverter` on Windows platform, now related test passes. thx for helping: @rzo1
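
The commit contents are not shown in this thread, so the following is only a guess at the general technique: when expected test strings use `\n` but the converter output contains platform-specific separators, normalizing line breaks before comparing avoids Windows-only failures. `LineBreaks` is a hypothetical helper and not code from `PlainTextConverter`.

```java
// Hypothetical helper, not taken from the JWPL code base.
class LineBreaks {

    /** Converts Windows (CRLF) and old Mac (CR) line breaks to LF. */
    static String normalize(String text) {
        // Replace CRLF first so that a CRLF pair does not become two LFs.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String windowsText = "Line one\r\nLine two\r\n";
        System.out.println(normalize(windowsText).equals("Line one\nLine two\n"));
    }
}
```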
@mawiesne mawiesne self-assigned this Jul 17, 2018
mawiesne added a commit that referenced this issue Jul 17, 2018
…iscriminate candidate methods and ends in 'VisitorException'

- Addresses comments by @reckart in PR #185
mawiesne added a commit that referenced this issue Jul 17, 2018
…-VisitorException

#160 - Page.getPlainText broken - `PlainTextConverter` struggles to discriminate candidate methods and ends in 'VisitorException'
@mawiesne
Contributor Author

Finally fixed via PR #185
