citeproc-hs is a Haskell implementation of the Citation Style Language (CSL).
citeproc-hs adds a BibTeX-like citation and bibliography formatting facility to pandoc, the well-known Haskell text-processing tool.
CSL is an XML language for specifying citation and bibliographic formatting, similar in principle to BibTeX .bst files or the binary style files in commercial products like Endnote or Reference Manager.
CSL is used by Zotero for bibliographic style formatting, and the huge number of CSL styles developed by the Zotero community can be downloaded from here:
There are plans to use CSL for adding bibliographic support to future releases of OpenOffice.
citeproc-hs can process and format citations according to a CSL style, given a collection of references.
Natively, citeproc-hs can read JSON[1] and MODS[2] XML formatted bibliographic databases.
bibutils can be used to convert BibTeX and other bibliographic databases to MODS collections, which can then be read by citeproc-hs.
bibutils also exports a library, which citeproc-hs can use to read the most widely used bibliographic database formats directly. This requires installing hs-bibutils, the Haskell bindings to bibutils.
citeproc-hs is a library that exports functions to parse CSL styles and MODS collections, to process lists of citation groups, and to format the processed output. The output is a Haskell data type that can be further processed for conversion to other formats (at present, plain ASCII and the pandoc native format).
citeproc-hs can be downloaded from Hackage:
http://hackage.haskell.org/package/citeproc-hs
To get the darcs source run:
darcs get http://gorgias.mine.nu/repos/citeproc-hs/
citeproc-hs depends on a few Haskell packages. Most of them come with the Haskell Platform and are usually included in every Haskell tool-chain.
In order to install citeproc-hs you need to install its dependencies. You can choose to manually download and install everything from Hackage:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hxt
Every package downloaded from Hackage can be installed with these simple commands:
runhaskell Setup.lhs configure
runhaskell Setup.lhs build
runhaskell Setup.lhs install
This last step requires root privileges.
If you don’t have root privileges you can install citeproc-hs and all its dependencies locally with these commands:
runhaskell Setup.lhs configure --user --prefix=$HOME
runhaskell Setup.lhs build
runhaskell Setup.lhs install --user
Alternatively you can use cabal-install to install citeproc-hs and all the needed dependencies:
cabal update
cabal install citeproc-hs
bibutils and network support may be suppressed with cabal flags:
runhaskell Setup.lhs configure -f'-bibutils'
or
runhaskell Setup.lhs configure -f'-bibutils -network'
and then build and install with:
runhaskell Setup.lhs build
runhaskell Setup.lhs install
The same flags can also be passed to cabal-install.
Sorting Unicode strings containing non-ASCII characters is not supported by the standard Haskell libraries and requires the installation of the ICU libraries, available here:
http://site.icu-project.org/
It is then necessary to install the Haskell bindings to these libraries. These bindings are available here:
http://hackage.haskell.org/package/text-icu
You then need to configure the citeproc-hs package with the appropriate cabal flag:
runhaskell Setup.lhs configure -funicode_collation
and then build as usual.
The flag can also be passed to cabal-install. In this case the installation of the ICU libraries is the only prerequisite.
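To illustrate why the ICU bindings are needed, here is a minimal sketch using only the standard `Data.List.sort`. It is not citeproc-hs code, and the names in the list are made up for the example. The default `Ord` instance for strings compares characters by code point, so an accented capital letter sorts after every unaccented one.

```haskell
import Data.List (sort)

-- Illustrative data only: three names, one starting with a non-ASCII letter.
sampleNames :: [String]
sampleNames = ["Zoe", "Émile", "Eve"]

-- The standard sort compares strings character by character, by code point.
-- 'É' (U+00C9) has a higher code point than 'Z' (U+005A), so "Émile" ends up
-- last instead of next to "Eve", where a reader would look for it.
codePointSorted :: [String]
codePointSorted = sort sampleNames
```

Evaluating `codePointSorted` in GHCi gives "Eve", "Zoe", "Émile", in that order. A locale-aware collation, which is what the ICU libraries provide, is needed to place "Émile" among the names starting with E.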
To use citeproc-hs with pandoc you need to install citeproc-hs-pandoc-filter, a drop-in replacement for pandoc-citeproc.
Then you can run the filter with something like:
pandoc --filter citeproc-hs -s -S -f markdown -t html your_markdown_file.txt
Please refer to pandoc’s documentation for more information on inserting citations in pandoc’s documents and processing them with a filter.
Haddock documentation for the exported API is available on Hackage:
http://hackage.haskell.org/package/citeproc-hs
The MODS parser has been optimized for BibTeX input, especially for parsing names with affixes and with dropping and non-dropping particles.
Suffixes should come after the family name:
Brown, Jr., John W.
If a comma is needed before the suffix, an exclamation mark may be used:
Brown,! Jr., John W.
Non-dropping particles are placed before the family name:
von Hicks,! Jr., Michael
Dropping particles are placed after the given name:
la Martine,! III, Martin B. de
See also the CSL specification:
http://citationstyles.org/downloads/specification.html#name-particles
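The conventions above can be sketched as a small stand-alone parser. This is an illustration only, not the citeproc-hs MODS parser: the `SketchName` type, its field names, and the rule that treats words starting with a lower-case letter as particles are all assumptions made for the example.

```haskell
import Data.Char (isLower, isSpace)

-- Hypothetical record for the example; not a citeproc-hs type.
data SketchName = SketchName
  { family      :: String
  , given       :: String
  , suffix      :: String
  , nonDropping :: String  -- particle written before the family name
  , dropping    :: String  -- particle written after the given name
  , commaSuffix :: Bool    -- True when the suffix was marked with '!'
  } deriving (Eq, Show)

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Remove leading and trailing white space.
trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

startsLower :: String -> Bool
startsLower (c : _) = isLower c
startsLower []      = False

-- Leading lower-case words are taken as particles (assumption).
splitLeading :: [String] -> ([String], [String])
splitLeading ws = case span startsLower ws of
  (ps, [])   -> ([], ps)  -- no capitalised word: keep everything as the name
  (ps, rest) -> (ps, rest)

-- Trailing lower-case words are taken as particles (assumption).
splitTrailing :: [String] -> ([String], [String])
splitTrailing ws =
  let (ps, rest) = splitLeading (reverse ws)
  in (reverse rest, reverse ps)

-- Accepts "Family, Given" and "Family, Suffix, Given"; a '!' right after
-- the first comma marks a suffix that needs a comma before it.
parseName :: String -> Maybe SketchName
parseName input =
  case map trim (splitOn ',' input) of
    [fam, giv] -> Just (build fam "" False giv)
    [fam, suf, giv] -> case suf of
      '!' : rest -> Just (build fam (trim rest) True giv)
      _          -> Just (build fam suf False giv)
    _ -> Nothing
  where
    build fam suf comma giv =
      let (ndp, famWords) = splitLeading (words fam)
          (givWords, dp)  = splitTrailing (words giv)
      in SketchName
           { family = unwords famWords, given = unwords givWords
           , suffix = suf, nonDropping = unwords ndp
           , dropping = unwords dp, commaSuffix = comma }
```

With these definitions, `parseName "la Martine,! III, Martin B. de"` yields the family name "Martine", the given name "Martin B.", the suffix "III", the non-dropping particle "la", the dropping particle "de", and the comma flag set.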
The MODS parser, which is used to read all bibliographic databases supported by bibutils, tries to parse dates, including seasons (expressed in English). Some examples of supported formats:
2010-01-31 (January 31, 2010)
2004-05 (May, 2004)
2001 (the year only)
Summer, 2001 (the season)
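As a rough illustration of how the four formats above can be told apart, here is a minimal stand-alone sketch. It is not the citeproc-hs parser: the `SketchDate` type, the `parseDate` function, and the list of accepted season names are assumptions made for the example, and no range checks are done on months or days.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical date type for the example; not a citeproc-hs type.
data SketchDate
  = YearMonthDay Int Int Int
  | YearMonth Int Int
  | YearOnly Int
  | SeasonYear String Int
  deriving (Eq, Show)

-- English season names accepted by this sketch (assumption).
seasons :: [String]
seasons = ["Spring", "Summer", "Autumn", "Fall", "Winter"]

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Remove leading and trailing white space.
trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Read a non-empty, all-digit string as a number.
number :: String -> Maybe Int
number s
  | not (null s) && all isDigit s = Just (read s)
  | otherwise                     = Nothing

-- "Season, YYYY" is recognised by its comma; everything else is split
-- on '-' into year, month and day components.
parseDate :: String -> Maybe SketchDate
parseDate input =
  case splitOn ',' input of
    [season, year]
      | trim season `elem` seasons ->
          SeasonYear (trim season) <$> number (trim year)
    [plain] -> case mapM number (splitOn '-' (trim plain)) of
      Just [y, m, d] -> Just (YearMonthDay y m d)
      Just [y, m]    -> Just (YearMonth y m)
      Just [y]       -> Just (YearOnly y)
      _              -> Nothing
    _ -> Nothing
```

For instance, `parseDate "2004-05"` gives `Just (YearMonth 2004 5)` and `parseDate "Summer, 2001"` gives `Just (SeasonYear "Summer" 2001)`. Input that matches none of the four shapes gives `Nothing`.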
If the DOI variable is prefixed by doi:, as in:
doi = {doi:10.1038/171737a0}
the processor will generate a link and produce this pandoc native representation:
Link [Str "10.1038/171737a0"] ("http://dx.doi.org/10.1038/171737a0", "10.1038/171737a0")
that produces a link like:
<a href="http://dx.doi.org/10.1038/171737a0">10.1038/171737a0</a>
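The prefix check and link construction described above can be sketched in a few lines. This is a stand-alone illustration, not the code citeproc-hs uses: `doiLink` and `renderHtml` are hypothetical helpers, and the sketch does no HTML escaping.

```haskell
import Data.List (stripPrefix)

-- Hypothetical helper: returns (url, label) only when the field carries
-- the "doi:" prefix; an unprefixed field yields Nothing.
doiLink :: String -> Maybe (String, String)
doiLink field = do
  doi <- stripPrefix "doi:" field
  return ("http://dx.doi.org/" ++ doi, doi)

-- Hypothetical helper: render the pair as an HTML anchor.
renderHtml :: (String, String) -> String
renderHtml (url, label) = "<a href=\"" ++ url ++ "\">" ++ label ++ "</a>"
```

Here `fmap renderHtml (doiLink "doi:10.1038/171737a0")` produces the anchor shown above, wrapped in `Just`, while `doiLink "10.1038/171737a0"` returns `Nothing` because the prefix is missing.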
To run the test suite, you first need to grab it with mercurial by running, from the root directory of the citeproc-hs source tree:
hg clone https://bitbucket.org/bdarcus/citeproc-test
You then need to grind human-readable test code into machine-readable form by running, in the citeproc-test directory, the following commands:
cd citeproc-test
./processor.py -g
cd ..
Then, from the root directory of citeproc-hs source tree, run:
runhaskell test/test.hs
You may also specify a test group:
runhaskell test/test.hs date
or a single test in a group:
runhaskell test/test.hs date IgnoreNonexistentSort
To get more verbose debug messages, edit test/test.hs and increase the Int parameter of runTS:
runTS args 1 testDir
The CSL implementation is mostly but not entirely complete. Some of the missing features are meaningless in pandoc, the main target of citeproc-hs at the present time. Specifically the display attribute has not been implemented yet.
The citeproc-hs-0.3.9 release passes 586 out of 757 tests of the citeproc-test suite. The test-suite has been developed along with citeproc-js, and the failure of some of those tests is not meaningful for citeproc-hs.
The MODS parser may need some refinement.
To submit bug reports you can use the Google code bug tracking system available at the following address:
http://code.google.com/p/citeproc-hs/issues
Bruce D’Arcus, the man behind CSL, Rintze Zelle, one of the main CSL developers, and Frank Bennett, the citeproc-js author, have been very kind and provided ideas, comments and suggestions that made coding citeproc-hs easier.
John MacFarlane, the author of pandoc, has been very supportive of the project and provided a lot of useful feedback, comments and suggestions.
Andrea Rossato
andrea.rossato at unitn.it
This software is released under a BSD-style license. See LICENSE for more details.
This is an early, “alpha” release. It carries no warranties of any kind.
Copyright © 2008–2012 Andrea Rossato
[1] The JSON format is documented mainly by the citeproc implementations themselves and is derived from the CSL schema. More information can be found in the citeproc-js documentation:
http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#data-input
[2] The Metadata Object Description Schema (MODS) is an XML format which is used by bibutils to interconvert many different bibliographic database formats, like BibTeX, Endnote, and others.