folge2: Search Services Topicmap Hamburg 2005-2006

Using a local configuration with Nutch

When you use Nutch in your own projects, you face the problem that both nutch-site.xml and nutch-default.xml are stored inside nutch.jar. You therefore cannot edit them without unpacking the jar.

The solution is to add a new configuration resource to NutchConf. This resource overrides the values from nutch-default.xml and is in turn overridden by nutch-site.xml.

Since the nutch-site.xml inside nutch.jar is empty, your local configuration is effectively the last one applied, so its values are the ones Nutch actually uses.
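The precedence described above (defaults first, then the local resource, then nutch-site.xml, with later layers winning) can be illustrated with a small stand-alone sketch. The class LayeredConf and the property names example.dir and example.threads are hypothetical and only model the ordering; this is not how NutchConf is implemented internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of layered configuration lookup (not NutchConf's code).
 * Layers are applied in the order they are added, so a later layer overrides
 * an earlier one: nutch-default.xml, then local resources, then nutch-site.xml.
 */
public class LayeredConf {
    private final List<Map<String, String>> layers = new ArrayList<>();

    /** Appends a layer; it overrides every layer added before it. */
    public void addLayer(Map<String, String> layer) {
        layers.add(layer);
    }

    /** Returns the value from the last layer that defines the key, or null. */
    public String get(String key) {
        String value = null;
        for (Map<String, String> layer : layers) {
            if (layer.containsKey(key)) {
                value = layer.get(key);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Illustrative property names and values, not real Nutch defaults.
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("example.dir", ".");
        defaults.put("example.threads", "10");

        Map<String, String> local = new LinkedHashMap<>();
        local.put("example.dir", "/data/crawl");

        // Empty, like the nutch-site.xml shipped in nutch.jar.
        Map<String, String> site = new LinkedHashMap<>();

        LayeredConf conf = new LayeredConf();
        conf.addLayer(defaults);
        conf.addLayer(local);
        conf.addLayer(site);

        System.out.println(conf.get("example.dir"));     // /data/crawl
        System.out.println(conf.get("example.threads")); // 10
    }
}
```

Because the site layer is empty, the local value for example.dir wins, while keys the local layer does not mention fall through to the defaults.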

To pass your local configuration settings to Nutch, call one of the two addConfResource() methods on an instance of NutchConf:


   public synchronized void addConfResource(String);
   public synchronized void addConfResource(File);

The first looks up a resource with the given name on the classpath, while the second reads the resource from the given file.
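The difference between the two lookups can be sketched with standard JDK calls. The class ResourceLookup and its open() overloads are hypothetical stand-ins that only mirror the String/File split of addConfResource(); they do not reproduce NutchConf's internals. A name is resolved through the class loader, which returns null if nothing on the classpath matches, whereas a File is opened directly from the file system.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical sketch of the two lookup strategies (not NutchConf's code). */
public class ResourceLookup {

    /** Mirrors addConfResource(String): search the classpath by name; null if absent. */
    public static InputStream open(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ResourceLookup.class.getClassLoader();
        }
        return loader.getResourceAsStream(name);
    }

    /** Mirrors addConfResource(File): read the given file from the file system. */
    public static InputStream open(File file) throws IOException {
        return new FileInputStream(file);
    }

    public static void main(String[] args) throws IOException {
        // Write a throwaway local configuration file to read back.
        File local = File.createTempFile("my-nutch", ".xml");
        local.deleteOnExit();
        Files.write(local.toPath(),
                "<nutch-conf>\n</nutch-conf>\n".getBytes(StandardCharsets.UTF_8));

        try (InputStream in = open(local)) {
            String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(content.startsWith("<nutch-conf>")); // true
        }

        // A name that is not on the classpath yields null instead of a stream.
        System.out.println(open("no-such-resource.xml") == null); // true
    }
}
```

The String variant suits configuration bundled in your own jar or classes directory; the File variant suits a configuration file kept outside the application, for example next to a deployment.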

Like nutch-default.xml and nutch-site.xml, the local configuration is an XML file with the root element

  
<nutch-conf>
</nutch-conf>
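A local configuration file fills this root element with property entries. The sketch below assumes the property/name/value layout used in nutch-default.xml (compare against the copy inside your nutch.jar) and parses such a file with the JDK's DOM parser. The class LocalConfReader and the property example.dir are hypothetical; the code only shows the file structure and a plausible way to read it, not NutchConf's own parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for a local nutch-conf file (not NutchConf's parser). */
public class LocalConfReader {

    // Assumed layout: <property> entries with <name> and <value> under <nutch-conf>.
    static final String LOCAL_CONF =
            "<?xml version=\"1.0\"?>\n"
            + "<nutch-conf>\n"
            + "  <property>\n"
            + "    <name>example.dir</name>\n"
            + "    <value>/data/crawl</value>\n"
            + "  </property>\n"
            + "</nutch-conf>\n";

    /** Parses name/value pairs; rejects documents whose root is not nutch-conf. */
    public static Map<String, String> read(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        if (!"nutch-conf".equals(root.getTagName())) {
            throw new IllegalArgumentException(
                    "expected <nutch-conf> root, got <" + root.getTagName() + ">");
        }
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = root.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element prop = (Element) properties.item(i);
            String name = text(prop, "name");
            if (name != null) {
                props.put(name, text(prop, "value"));
            }
        }
        return props;
    }

    /** Returns the trimmed text of the first child element with the tag, or null. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(read(LOCAL_CONF)); // {example.dir=/data/crawl}
    }
}
```

Only the properties you want to change need to appear in the local file; everything else keeps the value from nutch-default.xml.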