Proselytism

Document converter and text and image extractor, built on a headless OpenOffice server (JOD or PYOD converter), pdf_tools and net_pbm.

Supported formats for document conversion: odt, doc, rtf, sxw, docx, txt, html, htm, wps, pdf

Note

This gem was originally written as a Rails 3.2 engine running on Ruby 1.8.7.

It is nonetheless framework agnostic and has been tested on Ubuntu and Mac OS X.

Installation

Install the required external libraries:

# aptitude install netpbm
# aptitude install xpdf
# aptitude install libreoffice

Add this line to your application's Gemfile:

gem 'proselytism'

Note: for Ruby 1.9, use the 1.9 branch:

gem 'proselytism', :git => "git://github.com/itkin/proselytism.git", :branch => "1.9"

And then execute:

$ bundle

Configuration

With a YAML config file:

rails g proselytism:config

As a Rails engine, Proselytism automatically loads /config/proselytism.yml (if the file exists) and sets its config params according to the current Rails environment.
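The idea of one settings section per environment can be sketched in plain Ruby. This is only an illustration of the pattern, not the gem's loading code: `settings_for` is a hypothetical helper, and `example_option` is a placeholder key rather than a documented Proselytism setting (run the generator above to see the real ones).

```ruby
require 'yaml'

# Illustrative file content. `example_option` is a placeholder key,
# not a documented Proselytism setting.
example_yaml = <<-YAML
development:
  example_option: dev-value
production:
  example_option: prod-value
YAML

# Hypothetical helper: returns the settings of one environment, or an
# empty hash when the file is empty or has no section for it.
def settings_for(yaml_text, env)
  all = YAML.load(yaml_text) || {}
  all.fetch(env.to_s, {})
end

production_settings = settings_for(example_yaml, :production)
missing_settings    = settings_for(example_yaml, :staging)
```

Here `production_settings` holds only the production section, and `missing_settings` is an empty hash because the file has no staging section.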

With an initializer (optional for Rails apps):

You can override the configuration file params by adding a custom initializer to /config/initializers. By default Proselytism logs to a separate log file; to use the Rails logger instead:

#/config/initializers/proselytism.rb
Proselytism.config do |config|
  config.logger = Rails.logger
end

To generate a full config initializer:

rails g proselytism:initializer

Usage

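Proselytism.convert source_file_path, :to => :pdf do |converted_file_path|
  # use or copy the converted file here
end

Proselytism.extract_text source_file_path do |extracted_text|
  # use the extracted text here
end

Proselytism.extract_images source_file_path do |image_files_paths|
  # use or copy the extracted image files here
end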

Proselytism creates its converted files in temporary folders.

If you pass a block to the methods above, the folders are automatically deleted once the block returns, so use or copy the file content within the block.
If you don't pass a block, the folder and its content remain on disk, so don't forget to remove them safely yourself:

pdf_file_path = Proselytism.convert source_file_path, :to => :pdf
#my code
FileUtils.remove_entry_secure File.dirname(pdf_file_path)
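The two calling styles can be tried without the gem or its external tools. The sketch below is a stand-in, not Proselytism's implementation: `with_temp_output` is a hypothetical helper that follows the cleanup rules described above using Ruby's standard `Dir.mktmpdir`, so the difference between the block and non-block forms can be seen in isolation.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for a Proselytism call: writes a file into a
# fresh temporary folder and applies the cleanup rules described above.
def with_temp_output(content)
  dir  = Dir.mktmpdir('proselytism-sketch')
  path = File.join(dir, 'output.txt')
  File.open(path, 'w') { |f| f.write(content) }

  # Without a block, the caller owns the folder and must remove it.
  return path unless block_given?

  begin
    yield path
  ensure
    # With a block, the folder is removed as soon as the block finishes,
    # even if the block raises.
    FileUtils.remove_entry_secure(dir)
  end
end

# Block form: copy what you need before the block ends.
kept = File.join(Dir.tmpdir, "proselytism-kept-#{Process.pid}.txt")
block_path = nil
with_temp_output('hello') do |path|
  block_path = path
  FileUtils.cp(path, kept)
end

# Non-block form: the file survives until you remove its folder yourself.
manual_path    = with_temp_output('hello')
manual_existed = File.exist?(manual_path)
FileUtils.remove_entry_secure(File.dirname(manual_path))
```

After this runs, the copy at `kept` is the only file left: the block form removed its folder on its own, and the non-block folder was removed by the explicit `remove_entry_secure` call.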

Add your own converters

Add your own converter by extending Proselytism::Converters::Base.

Your converter will be selected automatically according to the formats passed to its from and to class methods.
Add a perform method which:
- calls the execute method with your custom command
- returns the converted file(s) path(s)

Proselytism::Converters::Base takes care of:
- raising an error if the command execution fails
- logging the command output

class MyConverter < Proselytism::Converters::Base
  class Error < parent::Base::Error; end
  
  from :ext1, :ext2
  to :ext3, :ext4

  def perform(origin, options={})
    destination = destination_file_path(origin, options)
    command = "mycommand #{origin} #{destination} 2>&1"
    execute command
    destination
  end
end
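
How from and to declarations could drive automatic selection is sketched below. This is a self-contained illustration, not the code of Proselytism::Converters::Base: `SketchConverter`, `registry`, `handles?` and `select_converter` are hypothetical names, and the real gem may match converters differently.

```ruby
# Hypothetical base class: each subclass declares the extensions it
# reads (from) and writes (to); a lookup then picks a matching class.
class SketchConverter
  class << self
    # Every subclass registers itself here when it is defined.
    def registry
      @registry ||= []
    end

    def inherited(subclass)
      super
      SketchConverter.registry << subclass
    end

    def from(*exts)
      @from_exts = exts.map { |e| e.to_sym }
    end

    def to(*exts)
      @to_exts = exts.map { |e| e.to_sym }
    end

    # True when this class declared both the source and target extension.
    def handles?(source_ext, target_ext)
      (@from_exts || []).include?(source_ext.to_sym) &&
        (@to_exts || []).include?(target_ext.to_sym)
    end
  end
end

class DocToPdf < SketchConverter
  from :doc, :odt
  to :pdf
end

class PdfToText < SketchConverter
  from :pdf
  to :txt
end

# Picks the first registered converter matching the file extension and
# the requested target format, or nil when none matches.
def select_converter(source_path, target_ext)
  source_ext = File.extname(source_path).sub('.', '')
  SketchConverter.registry.find { |c| c.handles?(source_ext, target_ext) }
end

select_converter('report.odt', :pdf)  # => DocToPdf
select_converter('report.pdf', :txt)  # => PdfToText
select_converter('report.xyz', :pdf)  # => nil
```

Registering subclasses through `inherited` means a new converter only has to be defined to become selectable, which matches the "automatically selected" behaviour described above.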

Contributing

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request
