LA-PDFText is a system for extracting accurate text from PDF-based research articles, with an interface for improving performance where needed. The system is open source and provides a simple baseline function for extracting text from primary research articles using rules that developers can customize. This means that the system works qu…

TanujRohatgi/lapdftext

 
 

Languages

  • Java 82.4%
  • XSLT 12.3%
  • HTML 4.6%
  • Other 0.7%