About

citeproc-hs is a Haskell implementation of the Citation Style Language (CSL).

citeproc-hs adds to pandoc, the famous Haskell text processing tool, a Bibtex like citation and bibliographic formatting and generation facility.

CSL is an XML language for specifying citation and bibliographic formatting, similar in principle to BibTeX .bst files or the binary style files in commercial products like Endnote or Reference Manager.

CSL is used by Zotero for bibliographic style formatting, and the huge number of CSL styles developed by the Zotero community can can be downloaded from here:

http://www.zotero.org/styles

There are plans to use CSL for adding bibliographic support to future releases of OpenOffice.

citeproc-hs can process and format citations according to a CSL style, given a collection of references.

Natively citeproc-hs can read JSON1 and MODS2 XML formatted bibliographic databases.

bibutils can be used to convert Bibtex and other bibliographic databases to MODS collections, which can be thus read by citeproc-hs.

bibutils also exports a library and this library can be used by citeproc-hs for reading the most widely used bibliographic databases. This requires the installation of hs-bibutils, the Haskell bindings to bibutils.

citeproc-hs is a library that exports functions to parse CSL styles and MODS collections, to process lists of citation groups and to format the processed output. The output is a Haskell data type that can be further processed for conversion to any kind of formats (at the present time plain ASCII and the pandoc native format)

Download

citeproc-hs can be downloaded from Hackage:

http://hackage.haskell.org/package/citeproc-hs

To get the darcs source run:

    darcs get http://gorgias.mine.nu/repos/citeproc-hs/

Installation

citeproc-hs depends on a few Haskell packages. Most of them come with the Haskell Platform and are usually included in every Haskell tool-chain.

In order to install citeproc-hs you need to install its dependencies. You can choose to manually download and install everything from Hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hxt

Every package downloaded form Hackage can be installed with these simple commands:

    runhaskell Setup.lhs configure
    runhaskell Setup.lhs build
    runhaskell Setup.lhs install

This last step requires root privileges.

If you don’t have root privileges you can install citeproc-hs and all its dependencies locally with these commands:

    runhaskell Setup.lhs configure --user --prefix=$HOME
    runhaskell Setup.lhs build
    runhaskell Setup.lhs install --user

Alternatively you can use cabal-install to install citeproc-hs and all the needed dependencies:

    cabal update
    cabal install citeproc-hs

Installing without bibutils or network support

bibutils and network support may be suppressed with cabal flags:

    runhaskell Setup.lhs configure -f'-bibutils'

or

    runhaskell Setup.lhs configure -f'-bibutils -network'

and then build and install with:

    runhaskell Setup.lhs build
    runhaskell Setup.lhs install

It is possible to pass the flags also too cabal-install.

Installing with ICU support

Sorting Unicode strings containing non ASCII characters is not supported by the standard Haskell libraries and requires the installation of the ICU libraries, available here:

    http://site.icu-project.org/

It is then necessary to install the Haskell bindings to these libraries. These bindings are available here:

    http://hackage.haskell.org/package/text-icu

You then need to configure the citeproc-hs package with the appropriate cabal flag:

    runhaskell Setup.lhs configure -funicode_collation

and then build as usual.

It is possible to pass the flags also too cabal-install. In this case the installation of the ICU libraries is the only prerequisite.

Using citeproc-hs with Pandoc

To use citeproc-hs with pandoc you need to install citeproc-hs-pandoc-filter, a drop-in replacement for pandoc-citeproc.

The you can run the filter with something like:

pandoc --filter citeproc-hs -s -S -f markdown -t html your_markdown_file.txt

Please refer to pandoc’s documentation for more information on inserting citations in pandoc’s documents and processing them with a filter.

Documentation

Haddock documentation for the exported API is available on Hackage:

http://hackage.haskell.org/package/citeproc-hs

Name parsing

The MODS parser has been optimized for bibtex input, especially for parsing names with affixes , dropping and non-dropping particles.

Suffixes should come after the family name:

    Brown, Jr., John W.

If a comma is needed before the suffix, an exclamation mark may be used:

    Brown,! Jr., John W.

Non-dropping particles are placed before the family name:

    von Hicks,! Jr., Michael

Dropping particles are placed after the given name:

    la Martine,! III, Martin B. de

See also the CSL specification:

http://citationstyles.org/downloads/specification.html#name-particles

Date parsing

The MODS parser, which is used to read all bibliograhic databases supported by bibutils, tries to parse dates, including seasons (expressed in English). An example of supported formats:

2010-01-31   (January 31, 2010)

2004-05      (May, 2004)

2001         (the year only)

Summer, 2001 (the season)

The DOI variable

If the DOI variable is prefixed by a doi: like:

    doi = {doi:10.1038/171737a0}

the processor will generate a link and produce this pandoc native representation:

   Link [Str "10.1038/171737a0"] ("http://dx.doi.org/10.1038/171737a0", "10.1038/171737a0")

that produces a link like:

   <a href="http://dx.doi.org/10.1038/171737a0">10.1038/171737a0</a>

Running the test-suite

To run the test suite, you first need to grab it with mercurial by running, from the root directory of the citeproc-hs source tree:

    hg clone https://bitbucket.org/bdarcus/citeproc-test

You then need to grind human-readable test code into machine-readable form by running, in the citeproc-test directory, the following commands:

    cd citeproc-test
    ./processor.py -g
    cd ..

Then, from the root directory of citeproc-hs source tree, run:

    runhaskell test/test.hs

You may also specify a test group:

    runhaskell test/test.hs date

or a single test in a group:

    runhaskell test/test.hs date IgnoreNonexistentSort

To increase the debug messages edit test/test.hs and increase the Int parameter of runTS:

    runTS args 1 testDir

Known Issues

The CSL implementation is mostly but not entirely complete. Some of the missing features are meaningless in pandoc, the main target of citeproc-hs at the present time. Specifically the display attribute has not been implemented yet.

The citeproc-hs-0.3.9 release passes 586 out of 757 tests of the citeproc-test suite. The test-suite has been developed along with citeproc-js, and the failure of some of those tests is not meaningful for citeproc-hs.

The MODS parser may need some refinement.

Bug Reports

To submit bug reports you can use the Google code bug tracking system available at the following address:

http://code.google.com/p/citeproc-hs/issues

Credits

Bruce D’Arcus, the man behind CSL, Rintze Zelle, one of the main CSL developer, and Frank Bennett, the citeproc-js author, have been very kind and provided ideas, comments and suggestions that made it easier coding citeproc-hs.

John MacFarlane, the author of pandoc, has been very supportive of the project and provided a lot of useful feed back, comments and suggestions.

Author

Andrea Rossato

andrea.rossato at unitn.it

CSL
http://xbiblio.sourceforge.net/csl/
citeproc-js
https://bitbucket.org/fbennett/citeproc-js/wiki/Home
citeproc-test
http://bitbucket.org/bdarcus/citeproc-test
Zotero
http://www.zotero.org
Pandoc
http://johnmacfarlane.net/pandoc/
Bibutils
http://www.scripps.edu/~cdputnam/software/bibutils/
MODS
http://www.loc.gov/mods/

This software is released under a BSD-style license. See LICENSE for more details.

This is an early, “alpha” release. It carries no warranties of any kind.

Copyright © 2008–2012 Andrea Rossato


  1. The JSON format is basically documented by citeproc implementations and is derived by the CSL scheme. More information can be read in the citeproc-js documentation:

    http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#data-input

  2. The Metadata Object Description Schema (MODS) is an XML format which is used by bibutils to interconvert many different bibliographic database formats, like Bibtex, Endnote, and others.