The ETL Connector extension for Alfresco allows mass import of documents in an Alfresco repository by using compatible ETL Tools (for now Talend). It also provides an ETL client library that makes it easy to integrate in any ETL tool.

Open-Wide/alfresco-etl-connector

======================================================
ETL Connector for Alfresco - Alfresco Server Extension
http://knowledge.openwide.fr/bin/view/Main/AlfrescoETLConnector
Copyright (C) 2008-2012 Open Wide SA

About ETL Connector

The ETL Connector extension for Alfresco lets you import documents into an Alfresco repository using compatible ETL tools (for now, Talend). It also provides an ETL client library that makes it easy to integrate with any ETL tool.

Features

  • works by simple REST HTTP interactions with Alfresco, content provided as fully compliant ACP (Alfresco Content Package) XML
  • imports to any kind of Alfresco content (not only file and folder but also custom types or aspects, any properties and associations, any document tree)
  • configures permissions on imported content
  • supports create vs update modes on documents and containers
  • provides import result logs

Team

License

  • Alfresco Server Extension : GPL
  • ETL client library : LGPL

Getting Started

Client side (ETL tool) : get a compatible ETL tool release

Server-side (Alfresco repository) : get the release compatible with your Alfresco server at https://github.com/OpenWide-SI/alfresco-etl-connector/downloads

  • etlconnector-alfresco2.1 : validated with 2.1 Enterprise for Tomcat, should work with all 2.x Alfresco releases, reported to work on Labs 2.9b
  • etlconnector-alfresco3.1 : tested with 3.1 Enterprise for Tomcat
  • etlconnector-alfresco3.2 : tested with 3.2 Community for Tomcat
  • etlconnector-alfresco4.1 : tested with 4.1.1 Enterprise for Tomcat
  • Alternatively, it may be provided in compatible ETL release bundles.

Server-side installation

  • extract the WEB-INF subdirectory from the release and put it in your Alfresco webapp, e.g. $ALF_HOME/tomcat/webapps/alfresco/WEB-INF/lib
  • restart Alfresco. If the extension has been correctly installed, the startup logs (alfresco.log) should contain a line like this one :

19:20:49,635 INFO [org.alfresco.config.source.UrlConfigSource] Found META-INF/web-client-config-custom.xml in file:/C:/dev/workspace/etlconnector-alfresco-deploy/tomcat/webapps/alfresco/WEB-INF/lib/etlconnector-alfresco_1.0.jar
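The check above can be automated. The following is a minimal sketch, assuming the startup log contains a line shaped like the sample above; the regular expression and the function names are illustrative and not part of the connector, and the log file path depends on your installation.

```python
import re

# Pattern derived from the sample log line above. The jar file name varies
# by release, so only its "etlconnector-alfresco" prefix is matched.
INSTALL_MARKER = re.compile(
    r"Found META-INF/web-client-config-custom\.xml in "
    r"\S*etlconnector-alfresco[^/\s]*\.jar"
)


def connector_loaded(log_lines):
    """Return True if any line reports the connector's config being found."""
    return any(INSTALL_MARKER.search(line) for line in log_lines)


def check_log_file(path):
    """Scan a log file on disk; the path is whatever your installation uses."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return connector_loaded(handle)


# The sample line from this README, used here as a self-check.
sample = (
    "19:20:49,635 INFO [org.alfresco.config.source.UrlConfigSource] "
    "Found META-INF/web-client-config-custom.xml in "
    "file:/C:/dev/workspace/etlconnector-alfresco-deploy/tomcat/webapps/"
    "alfresco/WEB-INF/lib/etlconnector-alfresco_1.0.jar"
)
print(connector_loaded([sample]))
```

Call `check_log_file` with the location of your own alfresco.log after restarting the server.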

Test

  • You can test it by using the samples provided in the companion project etlconnector-samples, with a compatible ETL such as Talend 3.1 or greater on the client side.

For the Quitus sample, using Talend :

  • put the etlconnector-samples*jar in WEB-INF/lib in your Alfresco web application
  • start the Alfresco server (after having installed the ETL Connector extension)
  • import the etlconnector-samples/quitus/GED_TECHNIQUE.acp document package in a new "GED TECHNIQUE" folder within the company home folder, using the custom action wizard in the Alfresco web interface
  • start Talend
  • import the etlconnector-samples/quitus/talend/ALFRESCO_ETLCONNECTOR_QUITUS as a Talend workspace project
  • open the single Talend document import job ("ALFRESCO IMPORT_QUITUS 0.1")
  • click in the left panel on Context > PATHS 0.1 to open the configuration dialog and there set the PATH_SOURCE variable of the job to the location of the etlconnector-samples/quitus folder
  • run it in Talend : once the job completes, the complex document tree is imported into Alfresco, including custom metadata and associations

Documentation

ETL Connector

Using ETL Connector with Talend

FAQ

What is ETL Connector useful for ?

  • ETL Connector's main benefits stem from the productivity gains inherent to ETL tools : you can graphically design how existing information maps to Alfresco metadata, and use the ETL's raw power to access data sources across the Information System. Moreover, an ETL provides all kinds of tools to first partition data into smaller batches, and afterwards handle errors.
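The "partition into smaller batches, then handle errors" pattern mentioned above is normally designed graphically in the ETL tool. As a rough illustration only, here is a sketch in plain Python; `send_batch` is a hypothetical callable standing in for whatever actually submits a batch, and nothing here is the connector's API.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_in_batches(records, size, send_batch):
    """Send each batch; collect failed batches instead of aborting the run."""
    imported = 0
    failed = []
    for chunk in batches(records, size):
        try:
            send_batch(chunk)  # hypothetical: submits one batch
            imported += len(chunk)
        except Exception as error:
            # Keep the failed batch and its reason for later handling.
            failed.append((chunk, str(error)))
    return imported, failed


def demo_sender(chunk):
    """Demo stand-in that rejects any batch containing the record 5."""
    if 5 in chunk:
        raise ValueError("record 5 is invalid")


imported, failed = import_in_batches(range(10), 3, demo_sender)
print(imported, failed)
```

Failed batches can then be retried with a smaller size to isolate the offending records.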

Performance

  • successfully tested on an import job creating 4000 nodes in 30 minutes, using the Talend ETL to target an Alfresco 2.1 Enterprise server running on Oracle.
  • in some deployment environments, a sustained speed of 12 nodes per second has even been observed.
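To compare the two figures above, it helps to put them in the same unit. The short calculation below uses only the numbers quoted in this section; the function names are illustrative.

```python
def nodes_per_second(nodes, minutes):
    """Average throughput for a job of `nodes` nodes lasting `minutes`."""
    return nodes / (minutes * 60)


def minutes_for(nodes, rate_per_second):
    """Time in minutes needed to import `nodes` at a sustained rate."""
    return nodes / rate_per_second / 60


baseline_rate = nodes_per_second(4000, 30)  # the tested 4000-node job
fast_duration = minutes_for(4000, 12)       # same job at 12 nodes per second

print(f"{baseline_rate:.2f} nodes/s, {fast_duration:.1f} minutes")
```

The tested job therefore averaged a little over 2 nodes per second, while a sustained 12 nodes per second would bring the same 4000-node import down to under 6 minutes.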

For developers

Alfresco Server Extension architecture

  • builds on the existing Alfresco Content Package (ACP) import code
  • enriches it with : import of each node in its own transaction, better name path addressing, full error logs, custom import strategies allowing creation vs update import modes
  • XML REST / HTTP server implemented as Alfresco web Commands (though a Java webscript would be a viable alternative today)
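The ideas of "each node in its own transaction" and "create vs update modes" can be illustrated with a toy model. The sketch below is not the connector's code: the repository is a plain dictionary, and the mode semantics (create rejects an existing node, update merges properties into it) are assumptions made for the example.

```python
def import_nodes(repository, nodes, mode="create"):
    """Import nodes one by one; a failure affects only the node concerned.

    repository: dict mapping a name path to a dict of properties
                (a stand-in for the Alfresco repository).
    nodes:      iterable of (name_path, properties) pairs.
    mode:       "create" or "update" (assumed semantics, see above).
    Returns a result log with one (name_path, status, message) per node.
    """
    log = []
    for path, properties in nodes:
        try:
            if mode == "create" and path in repository:
                raise ValueError("node already exists")
            # Merge into existing properties (update) or start fresh (create).
            merged = {**repository.get(path, {}), **properties}
            repository[path] = merged  # "commit" for this node only
            log.append((path, "ok", ""))
        except ValueError as error:
            # The error is logged and the remaining nodes are still imported.
            log.append((path, "error", str(error)))
    return log


repo = {"/a": {"title": "old"}}
create_log = import_nodes(
    repo, [("/a", {"title": "new"}), ("/b", {"title": "b"})], mode="create"
)
update_log = import_nodes(repo, [("/a", {"title": "new"})], mode="update")
print(create_log, update_log)
```

In create mode the existing node is left untouched and reported in the log while the second node still goes through, which is the benefit of isolating each node.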

Building the Alfresco Server Extension

  • provide the etlconnector-alfresco Eclipse project with the Alfresco SDK (see which one in .classpath) and Java 1.5 dependencies
  • run Ant on the given build.xml
  • the ETL Connector Server release is in build/export/ , ready to be added to an Alfresco installation
  • to support a newer version of Alfresco : update the source overrides to the latest Alfresco source code (ImporterComponent -> ContentImporterComponentBase, ViewParser -> ViewParserBase, CommandServlet -> ContentImporterCommandServlet) and reapply the changes on them (see javadoc) ; also update their Spring bean definitions (contentImporterComponent & contentViewParser) in talendalfresco-services-context.xml according to their newer versions ; update the other Alfresco configuration (web.xml, web-client-config-custom.xml and in samples).

Building the ETL client library

Alfresco ETL connector plugin for Talend

Release Notes - 1.3

server

  • migrated to and tested with Alfresco 3.2 Community
  • disabled command servlet request-wide transactions for ImportCommand, which allows using propagating transactions and fully transactional repository services. This is done by also overriding the CommandServlet code and replacing it in web.xml (it could coexist with the original one, but that would require changing client ETL code)
  • reapplying rules & behaviours is now done in its own transaction. However, a separate result line is returned only in case of error.

client

  • a custom CommandServlet URL path can now be specified (typically when deploying etlconnector-alfresco3.1.3+'s overridden ContentImporterCommandServlet alongside the original one).
  • patched cm:folder / view:associations / cm:container hierarchy which was missing view:associations (harmless in most cases)
  • improved test framework, added test of duplicate child name error case

Release Notes - 1.2

server

  • migrated to and tested with Alfresco 3.1 Enterprise

Release Notes - 1.1

server

  • namePath is now resolved using the database rather than Lucene (since custom Lucene analyzers may make it fail, e.g. those for the French locale in 3.1, which remove a trailing "s")
  • tested with Alfresco 2.1.1 Enterprise
  • build script now outputs alfresco server version, the right build time & user
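To illustrate why resolving a name path by exact lookup is more robust than going through a full-text index, here is a toy sketch. It is not the connector's implementation: the tree is a nested dictionary, and the "/" separator and function name are assumptions made for the example.

```python
def resolve_name_path(root, name_path):
    """Walk the tree one exact child name at a time; return None if missing."""
    node = root
    for name in (part for part in name_path.split("/") if part):
        children = node.get("children", {})
        if name not in children:
            return None
        node = children[name]
    return node


# Two sibling folders whose names differ only by a trailing "s".
tree = {
    "children": {
        "Archive": {
            "children": {
                "Report": {"id": 1, "children": {}},
                "Reports": {"id": 2, "children": {}},
            }
        }
    }
}

singular = resolve_name_path(tree, "/Archive/Report")
plural = resolve_name_path(tree, "/Archive/Reports")
missing = resolve_name_path(tree, "/Archive/Invoices")
print(singular["id"], plural["id"], missing)
```

An exact walk keeps "Report" and "Reports" distinct, whereas an analyzer that strips a trailing "s" could treat them as the same term.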

client

  • fully compatible with 1.0
  • ACP XML : more robust writing of properties and associations. null is now allowed as "no value" and a String is allowed for single-valued associations, instead of outputting malformed XML because of an exception.
  • removed warning from org.apache.commons.httpclient.HttpMethodBase getResponseBody
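The more tolerant handling of values described above can be sketched as follows. This is an illustration only: the element names (`node`, `properties`, `association`, `target`) are hypothetical and are not the ACP schema, and skipping a null property entirely is an assumption about what "no value" means.

```python
import xml.etree.ElementTree as ET


def add_properties(node, properties):
    """Write one child element per property; None means "no value"."""
    container = ET.SubElement(node, "properties")
    for name, value in properties.items():
        if value is None:
            continue  # assumption: a null property is simply omitted
        ET.SubElement(container, name).text = str(value)


def add_association(node, name, targets):
    """Accept None, a single string, or an iterable of strings as targets."""
    if targets is None:
        targets = []
    elif isinstance(targets, str):
        targets = [targets]  # single-valued association given as a String
    container = ET.SubElement(node, "association", {"name": name})
    for target in targets:
        ET.SubElement(container, "target").text = target


node = ET.Element("node")
add_properties(node, {"title": "Invoice 42", "description": None})
add_association(node, "related", "/contracts/c1")
add_association(node, "attachments", None)
xml_text = ET.tostring(node, encoding="unicode")
print(xml_text)
```

Normalizing the inputs before writing means the output stays well-formed whatever mix of null, single and multiple values the ETL job supplies.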

overall

  • improved README (Talend plugin developer doc), build, copyright, moved links from forge to github & addons

Release Notes - 1.0

First release. Tested with Alfresco 2.1 Enterprise and Talend 3.1.
