DataProviders deliver their data as name-value pairs, one set of pairs per iteration. This contract is general enough to allow nearly any kind of data source to be integrated into TMHarvest.
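The contract can be pictured with a small Java sketch. The names `NameValueSource`, `InMemorySource` and `describe` are hypothetical and chosen only for illustration; the actual DataProvider interface of TMHarvest is not reproduced here and may be shaped differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class ContractSketch {

    // Hypothetical stand-in for the provider contract (not the TMHarvest API):
    // each iteration yields one set of name-value pairs.
    interface NameValueSource extends Iterable<Map<String, String>> { }

    // Simplest possible source: rows that are already held in memory.
    static final class InMemorySource implements NameValueSource {
        private final List<Map<String, String>> rows;

        InMemorySource(List<Map<String, String>> rows) {
            this.rows = rows;
        }

        @Override
        public Iterator<Map<String, String>> iterator() {
            return rows.iterator();
        }
    }

    // Consumes a source and renders each iteration as "name=value; name=value".
    static List<String> describe(NameValueSource source) {
        List<String> lines = new ArrayList<>();
        for (Map<String, String> pairs : source) {
            StringJoiner joiner = new StringJoiner("; ");
            for (Map.Entry<String, String> pair : pairs.entrySet()) {
                joiner.add(pair.getKey() + "=" + pair.getValue());
            }
            lines.add(joiner.toString());
        }
        return lines;
    }
}
```

Any source that can be walked row by row (a database cursor, a file, a directory listing) fits this shape, which is why the contract covers so many kinds of data.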
TMHarvest currently ships with support for the following data sources:
The <sqlProvider> performs an SQL query. For each row of the result set, the template is processed once. The names of the placeholders are derived from the column names of the result set.
The body of the element contains the SELECT statement. The JDBC source is defined in the <context> as a <datasource> and is referenced through the datasource attribute of the <sqlProvider>.
The <csvProvider> reads data from a CSV file.
The first line of the file is assumed to contain the column labels. These labels identify the corresponding placeholders in the template.
The <csvProvider>-element defines the following attributes:
The file that contains the data. This attribute is required. The filename is resolved relative to the location of the modelfile.
A string that separates consecutive fields. This attribute is optional. The default is the tab character.
The number of rows read from the CSV file in one pass. Higher values need more memory but yield better performance. There is rarely any need to change this attribute. This attribute is optional. The default is 800.
Transforms the column labels to upper or lower case. This attribute is optional. By default, no transformation is applied.
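The behaviour described above can be approximated in a few lines of Java. This is a minimal sketch, not the TMHarvest implementation: the class `CsvRowsSketch`, the enum `LabelCase` and the method `read` are hypothetical names, quoting and escaping of fields are ignored, and reading in batches is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TMHarvest: turns CSV lines into name-value pairs.
final class CsvRowsSketch {

    // Assumed stand-in for the optional label-case attribute.
    enum LabelCase { NONE, UPPER, LOWER }

    static List<Map<String, String>> read(List<String> lines, String separator, LabelCase labelCase) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        // A missing separator falls back to the tab character; quoting makes
        // characters such as '|' literal instead of regex operators.
        String regex = Pattern.quote(separator == null ? "\t" : separator);

        // The first line carries the column labels.
        String[] labels = lines.get(0).split(regex, -1);
        for (int i = 0; i < labels.length; i++) {
            if (labelCase == LabelCase.UPPER) {
                labels[i] = labels[i].toUpperCase(Locale.ROOT);
            } else if (labelCase == LabelCase.LOWER) {
                labels[i] = labels[i].toLowerCase(Locale.ROOT);
            }
        }

        // Every following line becomes one set of name-value pairs.
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(regex, -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < labels.length && i < fields.length; i++) {
                row.put(labels[i], fields[i]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Normalising the label case is useful when the placeholders in the template are written in one fixed case but the CSV export is not consistent about it.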
The <xpathProvider> selects a subset of nodes (the nodeset) from the XML source. For each of these nodes, the template is processed once.
To identify placeholders and substitute them with concrete values, the <xpathProvider> can contain an arbitrary number of <property>-elements. Each of these properties defines a name and a query. The query is executed on the current node. The result is converted to a string and used as the value in subsequent placeholder processing.
The <xpathProvider>-element defines the following attributes:
The XML file to be processed. This attribute is required.
The XPath query that returns the initial nodeset. This attribute is required.
The <xpathProvider>-element contains the following children:
The <property>-element allows you to define a name and a query. It defines the attributes name and expression.
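The two-stage evaluation (one query for the nodeset, then one query per property on each node) can be sketched with the standard `javax.xml.xpath` API. The class `XPathRowsSketch` and its method `select` are hypothetical; the sketch takes the XML as a string instead of a file and does not claim to mirror how TMHarvest is implemented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of TMHarvest.
final class XPathRowsSketch {

    // properties maps a property name to the XPath expression evaluated on each node.
    static List<Map<String, String>> select(String xml, String nodesetQuery, Map<String, String> properties) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<Map<String, String>> rows = new ArrayList<>();
        try {
            // Stage 1: select the initial nodeset from the document.
            NodeList nodes = (NodeList) xpath.evaluate(
                    nodesetQuery, new InputSource(new StringReader(xml)), XPathConstants.NODESET);

            // Stage 2: run every property query relative to the current node.
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                Map<String, String> row = new LinkedHashMap<>();
                for (Map.Entry<String, String> property : properties.entrySet()) {
                    // The two-argument evaluate() returns the result as a string.
                    row.put(property.getKey(), xpath.evaluate(property.getValue(), node));
                }
                rows.add(row);
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath expression", e);
        }
        return rows;
    }
}
```

With the nodeset query `/lib/book` and the properties `id` (expression `@id`) and `title` (expression `title`), each `<book>` element yields one set of name-value pairs.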
The <metaDataProvider> iterates over a set of files and extracts and returns meta data.
Currently the following filetypes are supported:
The <metaDataProvider>-element defines the following attributes:
The base directory from which the files are read. The directory is traversed recursively. Which kinds of files are included depends on the implementation of the finderclass.
Fully qualified name of a class that implements the org.tm4j.tmharvest.md.MetaDataFinder interface.
Implementations of the org.tm4j.tmharvest.md.MetaDataFinder interface differ in the kinds of files they are able to process and in the meta data they return. In addition, every implementation defines a set of file extensions that is applied as a filter to all files in the given directory.
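The combination of a recursive directory walk and an extension filter might look roughly like this in Java. The class `ExtensionFilterSketch` and its methods are hypothetical and are not taken from the MetaDataFinder interface; matching extensions case-insensitively is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of TMHarvest.
final class ExtensionFilterSketch {

    // True if the file name ends with one of the given extensions, e.g. ".mp3".
    // Assumption: the comparison ignores case.
    static boolean accepts(String fileName, Set<String> extensions) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String extension : extensions) {
            if (lower.endsWith(extension.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // Walks baseDir recursively and keeps only regular files that pass the filter.
    static List<Path> find(Path baseDir, Set<String> extensions) {
        try (Stream<Path> paths = Files.walk(baseDir)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> accepts(p.getFileName().toString(), extensions))
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```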
TMHarvest comes with the following implementations:
classname: org.tm4j.tmharvest.md.MP3MDFinder
extensions: .mp3
meta data: location, bitrate, album, artist, title, year, genre, comment
classname: org.tm4j.tmharvest.md.OpenOfficeOMDFinder
extensions: .sxi, .sxd, .sxc, .sxw, .sxm
meta data: location, application, language, title, subject, description, creationDate, keywords (currently only comma separated)
classname: org.tm4j.tmharvest.md.MSOfficeMDFinder
extensions: .mpp, .vsd, .mdb, .ppt, .doc, .xls
meta data: location, title, author, lastAuthor, lastEditDate, keywords, creationDate, application, comments, template, pageCount, charCount, version
This meta data provider looks for <meta>-tags whose name begins with the prefix DC. (including the dot). For each such tag, it uses the name without the DC. prefix as the key.
classname: org.tm4j.tmharvest.md.HtmlMDFinder
extensions: .htm, .html
meta data: the name of every <meta>-tag whose name starts with DC.
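The prefix handling for the HTML case can be illustrated with a short Java sketch. It is deliberately simplified and is not how HtmlMDFinder is implemented: the class `DublinCoreMetaSketch` is hypothetical, it uses a regular expression instead of an HTML parser, and it only recognises tags in which a double-quoted `name` attribute comes directly before a double-quoted `content` attribute.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TMHarvest.
final class DublinCoreMetaSketch {

    // Tag and attribute names match case-insensitively; the "DC." prefix does not.
    // Group 1 is the name without the prefix, group 2 is the content value.
    private static final Pattern META = Pattern.compile(
            "<(?i:meta)\\s+(?i:name)\\s*=\\s*\"DC\\.([^\"]+)\"\\s+(?i:content)\\s*=\\s*\"([^\"]*)\"");

    static Map<String, String> extract(String html) {
        Map<String, String> metaData = new LinkedHashMap<>();
        Matcher matcher = META.matcher(html);
        while (matcher.find()) {
            // The key is the tag name with the "DC." prefix removed.
            metaData.put(matcher.group(1), matcher.group(2));
        }
        return metaData;
    }
}
```

A tag such as `<meta name="DC.title" content="Topic Maps">` therefore contributes the pair `title` / `Topic Maps`, while `<meta name="keywords" ...>` is ignored.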
The <customProvider> refers to a Java class that implements the org.tm4j.harvest.data.DataProvider interface.
The <customProvider>-element defines the following attributes:
The fully qualified name of a class that implements the org.tm4j.harvest.data.DataProvider interface.
The <customProvider>-element contains the following children:
The <param>-element is used to set properties of the custom class. The class must define JavaBean-style setters for all properties that are set this way. The <param>-element defines two attributes: name and value.
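How name and value pairs could be mapped onto bean setters is sketched below using Java reflection. This is an illustration under stated assumptions, not the TMHarvest mechanism: the class `ParamInjectionSketch`, the method `apply` and the `DemoProvider` bean are hypothetical, and only setters that take a single `String` argument are supported.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical helper, not part of TMHarvest.
final class ParamInjectionSketch {

    // For each param "foo" with value "bar", calls target.setFoo("bar").
    static void apply(Object target, Map<String, String> params) {
        for (Map.Entry<String, String> param : params.entrySet()) {
            String name = param.getKey();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                // Assumption: every settable property has a public setter taking a String.
                Method method = target.getClass().getMethod(setter, String.class);
                method.invoke(target, param.getValue());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("No usable setter for param '" + name + "'", e);
            }
        }
    }

    // Hypothetical custom class with bean-style accessors.
    public static final class DemoProvider {
        private String url;
        private String user;

        public void setUrl(String url) { this.url = url; }
        public void setUser(String user) { this.user = user; }
        public String getUrl() { return url; }
        public String getUser() { return user; }
    }
}
```

A param whose name has no matching setter fails fast with an exception in this sketch, which makes misspelled names visible immediately.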