Page.getPlainText broken - PlainTextConverter struggles to discriminate candidate methods and ends in 'VisitorException' #160

mawiesne · 2018-06-22T06:56:08Z

With the introduction of Sweble 3.1.7 to the JWPL 1.2.0-SNAPSHOT line, I can no longer fetch plain text data from Wikipedia backends via Page.getPlainText. The stack trace is documented here:

ERROR - de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:189)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
	at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
	at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:610)
	at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:591)
	at ...
	at java.lang.Thread.run(Thread.java:748)
Caused by: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:130)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:346)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	... 79 more
Caused by: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:186)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:168)
	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
	at java.util.TimSort.sort(TimSort.java:220)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1462)
	at java.util.Collections.sort(Collections.java:175)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.findVisit(VisitorLogic.java:167)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:105)
	... 90 more

It seems there is a mismatch of method signatures and/or incompatible libraries being used at runtime. I consider this a major bug, as parts of the main functionality are affected. Therefore, this bug should be fixed before releasing JWPL 1.2.0 (Final).

Dependencies involved:

sweble 3.1.7
de.fau.cs.osr.ptk.ptk-common 3.0.8
de.fau.cs.osr.utils 3.0.8

System environment:

Affected OS: MacOS 10.13, Linux / Ubuntu 16.04 LTS+, most likely Windows 7/8/10, cc: @rzo1
Java version: Oracle JDK 1.8 (build 1.8.0_161-b12), most likely every 1.8.0-x release

Any ideas @ferschke / @reckart ? Can somebody contact the colleagues at FAU Erlangen to investigate this issue?


mawiesne · 2018-06-22T07:03:11Z

This seems to be a regression introduced with the changes of #152 and #155 .

mawiesne · 2018-06-22T07:19:51Z

@tgalery as you contributed the changes of #155, can you also have a look into this issue?

rzo1 · 2018-06-22T07:22:39Z

I can confirm that this also affects Windows 10. The stack trace in a Java 8 environment is similar to the one posted by @mawiesne (Oracle JDK or OpenJDK does not matter).

reckart · 2018-06-22T08:23:04Z

@mawiesne No idea. I hope @tgalery maybe has some insight.

tgalery · 2018-06-22T08:26:48Z

can someone post a bit of code that generates the stacktrace above ?

rzo1 · 2018-06-22T08:49:19Z

import de.tudarmstadt.ukp.wikipedia.api.Page;
import de.tudarmstadt.ukp.wikipedia.api.Wikipedia;
import de.tudarmstadt.ukp.wikipedia.api.exception.WikiApiException;

public class Main {

    public static void main(String[] args) throws WikiApiException {

        Wikipedia wikipedia = new Wikipedia(new CustomDataSource("host", "dbname", "user", "password", "com.mysql.jdbc.Driver", false));

        //German Wikipedia for example, page with title "Gesundheit"
        Page page = wikipedia.getPage("Gesundheit");

        //Exception will be thrown...
        page.getPlainText();

    }
}

with this implementation as CustomDataSource:

import de.tudarmstadt.ukp.wikipedia.api.DatabaseConfiguration;
import de.tudarmstadt.ukp.wikipedia.api.WikiConstants;
import de.tudarmstadt.ukp.wikipedia.api.WikiConstants.Language;
import org.slf4j.Logger;

import java.sql.*;

public class CustomDataSource extends DatabaseConfiguration {
    private static final Logger logger = org.slf4j.LoggerFactory.getLogger(CustomDataSource.class);

    private String jdbcURL;
    private String databaseDriver;

    /*
     * needed to please frameworks like Spring... parameter injection is done
     * via setters there
     */
    public CustomDataSource() {
        super();
    }

    public CustomDataSource(String hostName, String dbName, String user, String password, String driverClassName, boolean useSSL)  {
        this();
        setDbName(dbName);
        setHostName(hostName);
        setPassword(password);
        setUserName(user);
        // check if the DB driver is available in the classpath
        try {
            Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            logger.error(e.getLocalizedMessage(), e);
            throw new RuntimeException(e.getLocalizedMessage(), e);
        }
        String baseJdbcURL = "jdbc:mysql://" + getHostName() + "/" + getDbName();
        if(!hasExternalSSLParams(baseJdbcURL)) {
            if (useSSL) {
                setJdbcURL(baseJdbcURL + "?verifyServerCertificate=false&useSSL=true");
            } else {
                setJdbcURL(baseJdbcURL + "?useSSL=false");
            }
        } else {
            setJdbcURL(baseJdbcURL);
        }

        Language lang = requestWikiLangFromDB(hostName, dbName, user, password);
        setLanguage(lang);
    }

    private boolean hasExternalSSLParams(String baseJdbcURL) {
        return baseJdbcURL.contains("useSSL=");
    }

    /**
     * Although the JWPL database knows its Wikipedia language (stored as
     * <code>language</code> in the table <code>MetaData</code>), the
     * {@link DatabaseConfiguration} needs to be told the
     * {@link Language} explicitly. Hence, this method requests it from the
     * database so the user does not have to configure the {@link Language} manually.
     *
     * @param hostName host of the database server (used for logging only)
     * @param dbName name of the database (used for logging only)
     * @param user database user
     * @param password password of the database user
     * @return the language found in the <code>MetaData</code> table, as
     * enumeration instance of {@link Language}
     * @throws RuntimeException if no language entry is found or the query fails
     */
    private Language requestWikiLangFromDB(String hostName, String dbName, String user, String password)  {

        // Statement and ResultSet are closed together with the connection
        try (Connection connection = DriverManager.getConnection(getJdbcURL(), user, password);
             Statement stmnt = connection.createStatement();
             ResultSet result = stmnt.executeQuery("Select language from MetaData")) {
            if (result.next()) {
                String languageString = result.getString(1);

                logger.info("The language found at {}:{} is '{}' and will be set to this Wiki-DB connection", hostName, dbName, languageString);
                if (languageString.equals("türkçe")) {
                    languageString = "turkish";
                }
                return WikiConstants.Language.valueOf(languageString);
            } else {
                throw new RuntimeException("No language could be found for this Wikipedia DB. This is very strange, check your DB setup!");
            }

        } catch (SQLException e) {
            logger.error(e.getLocalizedMessage());
            throw new RuntimeException(e);
        }
    }

    public void setDbName(String dbName) {
        assert dbName!=null;
        assert dbName.trim().length() > 0;

        super.setDatabase(dbName);
    }

    public String getDbName() {
        return super.getDatabase();
    }

    public void setHostName(String hostName) {
        assert hostName!=null;
        assert hostName.trim().length() > 0;

        super.setHost(hostName);
    }

    public String getHostName() {
        return super.getHost();
    }

    public String getUserName() {
        return super.getUser();
    }

    public void setUserName(String user) {
        assert user!=null;
        assert user.trim().length() > 0;
        super.setUser(user);
    }

    /**
     * @param databaseDriver the databaseDriver to set
     */
    public void setDatabaseDriver(String databaseDriver) {
        assert databaseDriver!=null;
        assert databaseDriver.trim().length() > 0;
        this.databaseDriver = databaseDriver;
    }

    public String getDatabaseDriver() {
        return databaseDriver;
    }

    /**
     * @param jdbcURL the jdbcURL to set
     */
    public void setJdbcURL(String jdbcURL) {
        assert jdbcURL!=null;
        assert jdbcURL.trim().length() > 0;
        this.jdbcURL = jdbcURL;
    }

    public String getJdbcURL() {
        return jdbcURL;
    }

    @Override
    public String getPassword() {
        return super.getPassword();
    }

    @Override
    public void setPassword(String password) {
        super.setPassword(password);
    }

    @Override
    public WikiConstants.Language getLanguage() {
        return super.getLanguage();
    }

    @Override
    public void setLanguage(WikiConstants.Language language) {
        assert language != null;

        super.setLanguage(language);
    }

}

Will output:

Exception in thread "main" de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
	at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
	at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:610)
	at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:591)
	at de.hshn.mi.shc.etl.wiki.Main.main(Main.java:19)
Caused by: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:130)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:189)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	... 8 more
Caused by: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:186)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:168)
	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
	at java.util.TimSort.sort(TimSort.java:220)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1462)
	at java.util.Collections.sort(Collections.java:175)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.findVisit(VisitorLogic.java:167)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:105)
	... 19 more

tgalery · 2018-06-22T08:58:48Z

Can you help me understand something in your code ? Looking at the JWPLDataSource would that connect to a database which contains the relevant wikipages ? The creds look funny to me.

rzo1 · 2018-06-22T09:01:04Z

Basically:

1.) Create a connection to a database. In our case: a MySQL DB containing the Wikipedia Dumps and therefore the wikipedia pages.

2.) I left out the real credentials ;)

3.) Retrieve a page of interest (it does not matter which one).

4.) Try to retrieve the full text via getPlainText()

tgalery · 2018-06-22T09:13:10Z

Gotcha. Sorry for being a pain; I ask because I use this in the context of json wikipedia. Is the MySQL database populated by downloading and importing SQL files from https://dumps.wikimedia.org/enwiki/20180320/ (if so, could you let me know which ones), or is the full XML dump transformed into SQL by some CLI tool in advance?

mawiesne · 2018-06-22T09:20:38Z

We make use of the DataMachine tool, provided by JWPL project, see here: https://dkpro.github.io/dkpro-jwpl/DataMachine/

The resulting files are then imported into a MySQL 5.7 installation.

For a German version of Wikipedia dumps, we basically use:

java -Xmx2g -jar JWPLDataMachine.jar german !Hauptkategorie Begriffsklärung ~/dewiki/$date-of-snapshot$/

as given in the examples section of the how-to.

tgalery · 2018-06-22T09:32:39Z

Cool, could I get the exact command you used to produce the German (or any other language) dump? I will try to replicate the bug and see if there's an easy fix.

rzo1 · 2018-06-22T09:34:47Z

I updated the code-snippet above to not use internal classes / provided related code to execute it.

mawiesne · 2018-06-22T09:36:48Z

@tgalery Thanks a ton for looking into this! I will upload a dump of a transformed version of the German Wikipedia DB dating from Jan 2018. Stay tuned, the next comment with instructions will follow shortly.

tgalery · 2018-06-22T09:50:42Z

@mawiesne that would be extremely helpful

mawiesne · 2018-06-22T09:52:50Z

@tgalery Download one or both of the two mysql dumps from here:

German version (4.5G):
https://download.mi.hs-heilbronn.de/tulum/wikipedia_de_jwpl_Jan2018.sql.gz
Spanish version (2.7G):
https://download.mi.hs-heilbronn.de/tulum/wikipedia_es_jwpl_Jan2018.sql.gz

Re-Import them on your local dev-machine via:

In a MySQL shell/tool:
CREATE DATABASE wikipedia_de_jwpl_Jan2018 CHARACTER SET UTF8;
In a MySQL shell/tool:
GRANT ALL ON wikipedia_de_jwpl_Jan2018.* TO username@'%' IDENTIFIED BY "password";
From a command line/shell:
gunzip < wikipedia_de_jwpl_Jan2018.sql.gz | mysql --quick --user=root -p

Same procedure with smaller Spanish (es) version, just exchange 'de' with 'es'. When you decide to use es, you could, for instance, fetch a page such as "Salud".

tgalery · 2018-06-22T10:04:23Z

Cheers, will give you guys an update as soon as I can.

tgalery · 2018-06-22T16:23:48Z

Some updates. I've been trying to debug this using the Spanish dump, as it's slightly smaller.
But it seems I get an exception when instantiating the Wikipedia class. I'm using Scala and I get the following:

scala> import de.tudarmstadt.ukp.wikipedia.api._
import de.tudarmstadt.ukp.wikipedia.api._

scala> val source = new CustomDataSource("host", "dbname", "user", "password", "com.mysql.jdbc.Driver", false)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
source: de.tudarmstadt.ukp.wikipedia.api.CustomDataSource = de.tudarmstadt.ukp.wikipedia.api.CustomDataSource@3ac02398

scala> val wikipedia = new Wikipedia(source)
log4j:WARN No appenders could be found for logger (de.tudarmstadt.ukp.wikipedia.api.Wikipedia).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
org.hibernate.tool.schema.spi.SchemaManagementException: Schema-validation: missing column [version] in table [MetaData]
  at org.hibernate.tool.schema.internal.AbstractSchemaValidator.validateTable(AbstractSchemaValidator.java:136)
  at org.hibernate.tool.schema.internal.GroupedSchemaValidatorImpl.validateTables(GroupedSchemaValidatorImpl.java:42)
  at org.hibernate.tool.schema.internal.AbstractSchemaValidator.performValidation(AbstractSchemaValidator.java:89)
  at org.hibernate.tool.schema.internal.AbstractSchemaValidator.doValidation(AbstractSchemaValidator.java:68)
  at org.hibernate.tool.schema.spi.SchemaManagementToolCoordinator.performDatabaseAction(SchemaManagementToolCoordinator.java:191)
  at org.hibernate.tool.schema.spi.SchemaManagementToolCoordinator.process(SchemaManagementToolCoordinator.java:72)
  at org.hibernate.internal.SessionFactoryImpl.<init>(SessionFactoryImpl.java:312)
  at org.hibernate.boot.internal.SessionFactoryBuilderImpl.build(SessionFactoryBuilderImpl.java:462)
  at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:710)
  at de.tudarmstadt.ukp.wikipedia.api.hibernate.WikiHibernateUtil.getSessionFactory(WikiHibernateUtil.java:51)
  at de.tudarmstadt.ukp.wikipedia.api.Wikipedia.__getHibernateSession(Wikipedia.java:761)
  at de.tudarmstadt.ukp.wikipedia.api.MetaData.<init>(MetaData.java:44)
  at de.tudarmstadt.ukp.wikipedia.api.Wikipedia.<init>(Wikipedia.java:87)
  ... 42 elided

Is there something wrong with the spanish dump I downloaded above ?

tgalery · 2018-06-22T17:15:26Z

Commenting out hibernate auto validation gives me this:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'metadata0_.version' in 'field list'
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
  at com.mysql.jdbc.Util.getInstance(Util.java:386)
  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
  at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
  at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2281)
  at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:60)
  ... 60 more

so ... maybe there's something funny with the dump ?

mawiesne · 2018-06-24T21:47:27Z

@tgalery I think I know what went wrong, and I'll provide two modified/fresh dumps on next Monday.

UPDATE:
Re-Download one of the two files and check sha1sum afterwards:

German version (4.5G):
https://download.mi.hs-heilbronn.de/tulum/wikipedia_de_jwpl_Jan2018.sql.gz
sha1sum should match f837788b0fe5c5b564fd22f11213be9d718190f4
Spanish version (2.7G):
https://download.mi.hs-heilbronn.de/tulum/wikipedia_es_jwpl_Jan2018.sql.gz
sha1sum should match dc33b2975e4243217e13658685de2bcf3677975a
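
For anyone who prefers to verify the download from Java rather than with the `sha1sum` tool, the check can be sketched as follows. This is a minimal sketch, not part of JWPL: the class name `Sha1Check` and its command-line arguments are hypothetical, and only the standard `java.security.MessageDigest` API is used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of JWPL.
final class Sha1Check {

    /** Streams the input through SHA-1 and returns the lowercase hex digest. */
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Sha1Check <dump-file> <expected-sha1>");
            return;
        }
        // args[0]: path to the downloaded dump, args[1]: checksum listed above
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha1Hex(in);
            System.out.println(actual.equalsIgnoreCase(args[1])
                    ? "checksum OK"
                    : "checksum MISMATCH: " + actual);
        }
    }
}
```

The file is read in chunks, so the multi-gigabyte dumps do not have to fit into memory.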

Remove all previous files / imported DBs and conduct a re-import. It should work now as I've dumped it from one of our production systems in which no DB schema errors are present.

Again, sry for any inconveniences.

rzo1 · 2018-06-27T07:45:57Z

It seems to be a problem with the reflection code in de.fau.cs.osr.utils.visitor.VisitorLogic, which cannot differentiate between the correct visit methods at runtime.

Line 361ff

	public Object invoke(VisitorInterface<?> visitor, Object node)
			throws IllegalArgumentException,
				IllegalAccessException,
				InvocationTargetException
	{
		touch();
		return method.invoke(visitor, node);
	}

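Both candidate methods

Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

accept a WtText node, and neither parameter type is more specific than the other, so the visitor logic cannot decide which method to call. This leads to the issue.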
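
The ambiguity can be illustrated with a small, self-contained sketch. All type names below are hypothetical stand-ins (`MarkupNode` for `WtNode`, `TextBase` for `AstText`, `MarkupText` for `WtText`), and the type relationships are an assumption derived from the stack trace, which lists both overloads as candidates for a `WtText` node. The selection rule is a simplification and not Sweble's actual `VisitorLogic` code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class AmbiguityDemo {

    /** Collects the parameter type of every one-argument visit(...) overload that accepts nodeClass. */
    static List<Class<?>> applicableParamTypes(Class<?> visitorClass, Class<?> nodeClass) {
        List<Class<?>> result = new ArrayList<>();
        for (Method m : visitorClass.getMethods()) {
            if (m.getName().equals("visit") && m.getParameterCount() == 1
                    && m.getParameterTypes()[0].isAssignableFrom(nodeClass)) {
                result.add(m.getParameterTypes()[0]);
            }
        }
        return result;
    }

    /** True if candidates exist but none of them is a subtype of all the others. */
    static boolean isAmbiguous(List<Class<?>> candidates) {
        for (Class<?> c : candidates) {
            boolean mostSpecific = true;
            for (Class<?> other : candidates) {
                if (!other.isAssignableFrom(c)) {
                    mostSpecific = false;
                    break;
                }
            }
            if (mostSpecific) {
                return false;
            }
        }
        return !candidates.isEmpty();
    }

    public static void main(String[] args) {
        List<Class<?>> candidates = applicableParamTypes(SketchVisitor.class, MarkupText.class);
        System.out.println("candidates: " + candidates.size()); // prints "candidates: 2"
        System.out.println("ambiguous: " + isAmbiguous(candidates)); // prints "ambiguous: true"
    }
}

// Hypothetical stand-ins for the types named in the stack trace.
interface MarkupNode {}                                     // plays the role of WtNode
class TextBase {}                                           // plays the role of AstText
class MarkupText extends TextBase implements MarkupNode {}  // plays the role of WtText

// Hypothetical visitor with the same pair of overloads as PlainTextConverter.
class SketchVisitor {
    public void visit(MarkupNode n) {}  // catch-all overload
    public void visit(TextBase t) {}    // text overload
}
```

Because `MarkupNode` and `TextBase` are unrelated, a comparison by specificity has no winner, which matches the point in the trace where the comparator inside `VisitorLogic.findVisit` throws.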

rzo1 · 2018-06-27T08:23:29Z

At Heilbronn University Group we managed to reproduce this bug with the existing test-cases PageTest#testPlainText() and the test-DB provided in #2, see

org.junit.internal.AssumptionViolatedException: got: <de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)
>, expected: null


	at org.junit.Assume.assumeThat(Assume.java:95)
	at org.junit.Assume.assumeNoException(Assume.java:142)
	at de.tudarmstadt.ukp.wikipedia.api.PageTest.testPlainText(PageTest.java:100)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:346)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:189)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
	at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
	at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:610)
	at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:591)
	at de.tudarmstadt.ukp.wikipedia.api.PageTest.testPlainText(PageTest.java:98)
	... 23 more
Caused by: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:130)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
	at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
	at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
	at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
	at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:210)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
	... 53 more
Caused by: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:186)
	at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:168)
	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
	at java.util.TimSort.sort(TimSort.java:220)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1462)
	at java.util.Collections.sort(Collections.java:175)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.findVisit(VisitorLogic.java:167)
	at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:105)
	... 64 more

CI did not complain because of #161

tgalery · 2018-07-02T14:43:17Z

Cool, I'm assuming this will be reproducible once #162 gets merged ?

rzo1 · 2018-07-02T14:57:20Z

Yes

rzo1 · 2018-07-16T14:17:35Z

@tgalery Any updates here? :)

mawiesne · 2018-07-16T20:31:23Z

@rzo1 @tgalery It seems I found a fix for this issue locally. I will push a branch and open a PR once the related test case works as expected.

…iscriminate candidate methods and ends in 'VisitorException'
- Fixes this issue by commenting out unused/empty candidate method `public void visit(WtNode n)`.
- Un-ignores and adapts test cases in `PageTest`. This way, `testGetPlainText` can now work correctly. No more ignored tests \o/
- Adds minor fix in `PlainTextConverter` to parse/handle standalone line breaks strings correctly.
- Adapts demo data to the version from 2010 (initial DB import had "*" and multi-line breaks). I screwed them up slightly in June 2018 when bringing #2 to master.
- Simplifies 2/3 `PlainTextConverter` constructors to reduce duplicate code.

Moreover, this kind of fixes #161, as no other problems remain once this commit is merged.

@rzo1

…iscriminate candidate methods and ends in 'VisitorException'
- Addresses LF problems in `PlainTextConverter` on Windows platform, now related test passes. thx for helping: @rzo1

@reckart

…iscriminate candidate methods and ends in 'VisitorException' - Addresses comments by @reckart in PR #185

…-VisitorException #160 - Page.getPlainText broken - `PlainTextConverter` struggles to discriminate candidate methods and ends in 'VisitorException'

mawiesne · 2018-07-17T12:38:46Z

Finally fixed via PR #185

mawiesne added 🐛Bug 🆘 help wanted labels Jun 22, 2018

mawiesne added this to the 1.2.0 milestone Jun 22, 2018

mawiesne mentioned this issue Jun 22, 2018

generateWikiConfig won't work for some languages #159

Closed

rzo1 mentioned this issue Jun 27, 2018

PageTest - setupWikipedia() fails but Maven Build still succeeds #161

Closed

mawiesne mentioned this issue Jun 27, 2018

Release JWPL 1.2.0 #158

Closed

4 tasks

mawiesne added the severe label Jul 3, 2018

mawiesne mentioned this issue Jul 7, 2018

Hibernate hbm2ddl policy should be "validate" instead of "update" #34

Closed

rzo1 mentioned this issue Jul 17, 2018

#160 - Page.getPlainText broken - PlainTextConverter struggles to discriminate candidate methods and ends in 'VisitorException' #185

Merged

mawiesne self-assigned this Jul 17, 2018

mawiesne added a commit that referenced this issue Jul 17, 2018

#160 - Page.getPlainText broken - PlainTextConverter struggles to d…

d51f3a7

…iscriminate candidate methods and ends in 'VisitorException' - Addresses comments by @reckart in PR #185

mawiesne closed this as completed Jul 17, 2018
