#!/usr/local/bin/perl -w

# last modified 25 October 2000

=pod

=head1 NAME

isi2bibtex - convert ISI database files to BibTeX format

=cut

# COPYRIGHT INFORMATION

# isi2bibtex version 0.40
# ISI SCI to BibTeX database format converter
# Copyright (C) 2000 Jonathan Swinton, Ben Bolker, Anthony Stone, John J. Lee

# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 2 of the License, or (at your option)
# any later version.

# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
# more details.

# You should have received a copy of the GNU General Public License along with
# this program; if not, write to the Free Software Foundation, Inc., 59 Temple
# Place, Suite 330, Boston, MA 02111-1307 USA

# The maintainer of this program, John J. Lee, may be reached by email at
# phrxy@csv.warwick.ac.uk

# A copy of the GNU General Public License is available from, for example,
# http://www.eff.org/

=head1 SYNOPSIS

B<isi2bibtex> [B<OPTIONS>] B<inputfile> [B<outputfile>]

If no output file is specified, inputfile.bib is used.  Records are appended
if the output file exists.

=head1 DESCRIPTION

Isi2bibtex converts an ISI (Institute for Scientific Information)
bibliographic database file to a BibTeX file for use in TeX and LaTeX
documents.  Both formats hold bibliographic data on scientific and other
academic documents.  (In the UK, the ISI databases are commonly known as
'BIDS' or 'MIMAS WoS'.)

Another way to do the same job is with bp, which has the advantage of
converting between many different bibliographic formats and character sets.
If you don't need that, isi2bibtex understands BIDS standard format in
addition to the others, and is stand-alone and so presumably easier to get
working.

=head2 Options

B<-h>, B<--help>       display help and exit

B<-v>, B<--version>    display version information and exit

B<-q>, B<--quiet>      no informational output

B<-a>, B<--abstract>   include abstract in output file

B<-c>, B<--check>      make some checks on field contents (default)

B<-n>, B<--nocheck>    don't make checks on field contents

=head2 Input databases

Although isi2bibtex was written for SCI (Science Citation Index), all the
ISI databases should work (SCI, SSCI, A&HCI, ISTP).  Isi2bibtex will
probably make a bad job of editing the content of these other databases, and
would have to be changed a bit (not difficult), but you may be lucky.

In the UK, most of the other databases on BIDS should probably work either
straight away or with a small amount of modification of the script.  BIDS
Pascal, for instance, works with downloading format, and would work with
standard format after a small amount of modification.

=head2 Input formats

If you use a web interface to ISI, isi2bibtex will only convert text output
(whether emailed or saved directly), not saved web pages or other
bibliography formats such as Procite or Reference Manager.

Specifically, the formats that are understood are:

ISI generic output format version 1.0: I presume this is the format used in
most of the world.  In the UK, this is output by MIMAS WoS 'save records' or
'email records'.

BIDS standard format: (any of: Title only; Title, authors & journal; Full
record excluding citations; Full record)

BIDS downloading format: (any of: Title, authors & journal in downloading
format; Full record in downloading format)

You can mix and match record formats in one file.  Deleting fields and / or
record numbers from records should be okay.  Any fields may be present in
any order.  Don't change the indentation of records: because field labels
and field contents can be ambiguous, isi2bibtex ignores records that differ
too much from the usual layout.

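As a rough illustration of how the three formats are told apart, the sketch
below classifies a single input line using the same patterns that the
script's convert() routine tests, in the same order.  The helper name
guess_line_type and the sample lines are made up for this example and are
not part of isi2bibtex; the real routine also tracks which record it is in,
which this sketch leaves out.

```perl
    use strict;
    use warnings;

    # Hypothetical helper (not part of isi2bibtex): classify one input line
    # with the patterns convert() uses, tried in the same order.
    sub guess_line_type {
        my ($line) = @_;
        # record separator: "Record - n" or a blank line
        return 'separator'
            if $line =~ /^\s*Record - / || $line =~ /^\s*$/;
        # BIDS standard format: indented (or numbered) two-letter label + colon
        return 'standard'
            if $line =~ /(?: {6}|\(\d\) |\(\d\d\) |\(\d\d\d\) )([A-Z]{2}): (.*)/;
        # BIDS downloading format: two-letter label followed by "- "
        return 'downloading' if $line =~ /^([A-Z]{2})- /;
        # ISI generic (MIMAS WoS) format: two-letter label, or J9, then a space
        return 'generic'     if $line =~ /^((?:[A-Z]{2})|J9) /;
        # anything else continues the current field (or is header text)
        return 'continuation';
    }

    # Made-up sample lines, one per case
    for my $line ('      TI: A made-up title',
                  'TI- A made-up title',
                  'TI A made-up title',
                  '   continued text') {
        printf "%-14s %s\n", guess_line_type($line), $line;
    }
```

The ordering matters: the standard-format pattern is not anchored to the
start of the line, which is one reason the indentation of records must be
left alone.
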
=head2 Output fields

For ISI generic format (eg MIMAS WoS output) and BIDS downloading format,
fields other than title (TI), author (AU), source ie journal (SO), page
range (BP, EP), year (PY) and abstract (AB) are ignored.

For standard BIDS format, fields other than Title (TI:), author (AU:),
journal (JN:) and abstract (AB:) are ignored (the JN: field contains the
page numbers, volume, date, etc as well as the journal name).

At the moment isi2bibtex only outputs the more useful fields, but this may
change in the future (when someone gets round to it).

Which fields are output can be controlled with a configuration file (see
below).  Those that don't correspond to the standard BibTeX fields (such as
abstract) won't be recognised by BibTeX by default, but they'll be there if
you need to use them.

=head2 Output formatting

Output is tidied up as much as possible, but some editing is still required.

Output formatting defaults can be modified with a configuration file.
/etc/isi2bibtexrc and ~/.isi2bibtexrc (or whatever you set in the @CONFIG
variable in the script) are looked at in that order for configuration
settings, with the latter overriding the former.  See the example
configuration file provided.

If they are switched on, things like journal title abbreviations and acronym
capitalisations can be added and removed at the end of the script (very easy
to do by looking at what's there already).

Newer ISI entries have lower case as well as upper, and isi2bibtex always
leaves the capitalization as-is for those records.

=cut

# TODO
# add field include configs (hash) (check which are standard BibTeX fields)
# check record is an article, (DT: field in BIDS), and allow records for
#   books, proceedings, etc.
# add a properly unique but not very human readable key option, for relatively
#   big databases
# warn of duplicate keys
# Rewrite to detect format of record first, then convert it, rather than
#   guessing each field.  Would make it easy to add BIDS Pascal format etc.

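
=pod

For orientation, a configuration file might look something like the sketch
below.  It assumes the C<name = value> line syntax and the setting names
that the script's read_config() routine matches (for example header,
isotitle, jabbrev and formulacase); the values are only illustrative, and
the example configuration file provided remains the reference.  Blank lines
and lines starting with # are skipped.

```
    header      = 0
    isotitle    = 0
    jabbrev     = 1
    formulacase = 1
```

With these values the informational header is left out, the full journal
title is used and then abbreviated by the script's own rules rather than
taken from the ISO abbreviated title field, and the case of chemical
formulas in upper-case titles is guessed.

=cut
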
=head1 DOESN'T WORK?

Remember to:

=head2 Unix

set the first line of this script to point to your copy of perl (eg.
/usr/bin/perl or /usr/local/bin/perl)

make it executable (eg. chmod +x ./isi2bibtex)

put it somewhere your OS can find it (eg ~/bin or /usr/local/bin): you may
need to change your PATH environment variable

=head2 Microsoft Windows

change the name of this script to isi2bibtex.pl (you'll have to use it as

    isi2bibtex.pl [OPTIONS] inputfile.txt outputfile.bib

rather than

    isi2bibtex [OPTIONS] inputfile.txt outputfile.bib

because the Windows DOS shell doesn't know about the #! line)

put it in your Perl bin directory, eg C:\Perl\bin (obviously if you don't
have perl installed, you need to do that first: see below)

if that doesn't work, check that the PATH environment variable contains your
perl bin directory, and as a last resort try

    perl C:\Perl\bin\isi2bibtex.pl inputfile.txt

=head2 Other platforms

I have no idea, but it should work on any platform that runs Perl 5, perhaps
with a few small modifications.  Isi2bibtex has been tested on Windows (95
and NT) and Unix.  Please send me any portability changes.

Email if it still doesn't work.

=head1 KNOWN PROBLEMS

It only does articles, not books, proceedings etc, and won't notice if a
record isn't an article.

It ignores some fields (mostly those that don't correspond to the standard
BibTeX 'article' fields).

Output from ISI access providers other than BIDS and MIMAS WOS hasn't been
tested.  Send me an output file and I'll make it work with that format (for
SCI).

Please tell me about any bugs you find.

=head1 SEE ALSO

If you don't have Perl installed, it can be got (free) from
http://www.perl.com/ .

LaTeX, BibTeX and everything else TeX can be downloaded from
http://www.CTAN.org/ .

bp converts between many bibliography database formats (including
conversion from BIDS downloading and ISI generic formats to BibTeX), and
many character sets.

Ben Bolker (author of the BIDS / MIMAS WOS specification for bp) has a page
describing his modifications to bp: http://www.zoo.ufl.edu/bolker/bp.html

Dana Jacobsen (author of bp) has a web page with lots of bibliography
software information and details of his bp package:
http://www.ecst.csuchico.edu/~jacobsd/bib/index.html

Some other BibTeX utilities that may or may not be useful (these are just
the ones I've got round to looking at):

bibclean checks and formats BibTeX databases

bibsort sorts BibTeX databases

bib2dvi converts BibTeX databases to DVI files (DVI files are output by
LaTeX and are DeVice Independent typeset documents, a bit like postscript or
pdf -- you can read them on most computer systems)

bibextract, citefind and citetags respectively extract BibTeX records from a
BibTeX database, extract LaTeX references from a LaTeX document, and look up
LaTeX references in a BibTeX database.

bibindex makes an index for fast lookup by biblook, if you have a huge
database that needs it I suppose

BibTool is an all-singing all-dancing general purpose BibTeX management
utility

bibview is a simple interactive searching utility for BibTeX files

findbib gets BibTeX records corresponding to references in a LaTeX file from
a preprint server (don't know if it still works)

both refer2bibtex and r2bib convert refer files (whatever they are) to
BibTeX files

tkbibtex is a graphical tool for editing and searching BibTeX databases

Text::BibTeX is a Perl module for doing things to BibTeX databases.

=head1 COPYRIGHT

Copyright (C) 2000 Jonathan Swinton, Ben Bolker, Anthony Stone, John J. Lee

Isi2bibtex replaces and is derived from bids.to.bibtex (as of 29 Jan 1998)
and isi2bib 0.1.

bids.to.bibtex was based on a perl script written by Jonathan Swinton, and
subsequently modified by Ben Bolker and Anthony Stone.

isi2bib 0.1 was written by John J. Lee.

This script is covered by the GPL.  See the script for copyright
information.

=head1 FILES

/etc/isi2bibtexrc, ~/.isi2bibtexrc (or whatever you set in the @CONFIG
variable) are looked at in that order for default configuration settings,
with the latter overriding the former.  See the example configuration file
provided.

=head1 VERSION

0.40

=cut

# HISTORY
#
# 0.3   first released version
# 0.31  oops forgot to change version number, was labelled as 0.3
#       mostly bug fixes
# 0.32  mostly bug fixes, slightly better formatting
#       name changed to isi2bibtex from isi2bib because the bp converter
#       for ISI is also called isi2bib
# 0.33  bugfix: now insists on indentation of input file being standard due
#       to ambiguity otherwise (misinterpretation of field contents as
#       field label)
# 0.40  added configuration file and some command line switches; worked around
#       ISI SCI database sometimes having missing JI field; some error checking
#       for missing fields; bug fixes
# 0.41  change of email address; tiny bug fixes

use Getopt::Long;
use Config;
use FileHandle;
use Text::Wrap;

$VERSION = '0.40';    # a string, so the trailing zero survives printing
$SCRIPT = 'isi2bibtex';

# Test for the Windows and classic Mac OS names exactly: a bare /win/ would
# also match 'darwin' and 'cygwin', where ~ expansion works.
if ($Config{'osname'} !~ /^(?:MSWin32|MacOS)$/i) {
    # system-wide file first, then the user's own, so the latter overrides
    # (read_config() silently skips files that can't be opened)
    @CONFIG = ('/etc/isi2bibtexrc', glob("~/.isi2bibtexrc"));
}
else {
    # put your configuration file name in quotes inside the brackets below:
    @CONFIG = ();
}

# Default options to alter output formatting
# -------------------------------------------

#************************************
# NOTE: USE THE CONFIG FILE INSTEAD!
#************************************

# can't 'use constant' as it isn't installed everywhere

$AUTHORKEY = 1;     # use key for record generated from author's
                    # names and publication date rather than key
                    # from input file and line number
$HEADER = 1;        # attach header information to output file
                    # If you leave several email header sections
                    # (or other non-record text) in the file,
                    # header() will run several times.
# case is only ever guessed for fields that are all in upper case:
$TITLECASE = 1;     # guess case of title
$AUTHORCASE = 1;    #   "           authors
$FORMULACASE = 0;   #   "           chemical formulas
                    #               and crystal planes (badly)
$JOURNALCASE = 1;   #   "           journal
$SPECIAL = 1;       # do some special case cases
$JABBREV = 1;       # do some journal abbreviations

# MIMAS WOS only option
$ISOTITLE = 1;      # Use the ISO abbreviated title field (JI)
                    # instead of SO, the full title.
                    # If JI is missing, this will use J9 instead.
                    # If you set this to 0 and use SO, the script
                    # will abbreviate according to journalabbrev()
                    # if $JABBREV is set.

# fields to include
$AUTHOR = 1;
$TITLE = 1;
$JOURNAL = 1;
$YEAR = 1;
$VOLUME = 1;
$PAGES = 1;
$ABSTRACT = 0;
# NOT DONE YET:
#$ISSUE = 0;
#$MONTH = 0;
#$ADDRESS = 0;
#$REFERENCES = 0;
#$DOCTYPE = 0;

$LINELENGTH = 78;
$INDENT = " "x8;
$INDENTX = " "x2;
# field indenting strings, like so:
#@ARTICLE{bidstest64,
#        author = {Braun, J. and Bishop, G. G. and Ermakov, A. V. and
#          Goncharova, L. V. and Hinch, B. J.},
#        title = {Adsorption of pf3 on cu(001): ordered overlayer
# ...
# note the spaces before the author field ($INDENT) plus the extra
# spaces before the next line ($INDENTX)
$INDENT2 = $INDENT.$INDENTX;    # alternative to setting $INDENTX
# following are for lining up your equals signs and / or '{'s.
$INDENTB = " "x4;   # indent before padding of field name
$INDENTA = "= ";    #   "    after    "

$ADASHES = 2;       # join up long words that have been split at the end
                    # of the line in abstract
                    #   0: leave as-is
                    #   1: cut space
                    #   2: remove the dash as well
                    # note this won't have an effect unless $ABSFORMAT
                    # is set to 1
$TDASHES = 2;       # same as $ADASHES for title
$ABSFORMAT = 1;     # if unset, leave the abstract exactly as-is, and
                    # don't try to reformat it to fit your line length
                    # - this is useful because there are no blank
                    # lines to mark paragraph breaks in ISI format so
                    # isi2bibtex can only guess where they are
$ABSPARAS = 1;      # guess paragraphs in abstract when reformatting
$PARAGAP = 10;      # how many spaces at end of line after end of
                    # sentence before guessing this is a para end.

# abstract indentation (only when $ABSFORMAT = 0)
$ABSLENGTH = 63;    # abstract field line length in the ISI database
$FORLUCK = 5;
$ABSINDENT = ' 'x($LINELENGTH - $ABSLENGTH - $FORLUCK);

$CHECK = 1;
$QUIET = 0;

# /^MSWin32$/ rather than /win/, which would also match 'darwin' and 'cygwin'
$USAGE = 'Usage: '.$SCRIPT.($Config{'osname'} =~ /^MSWin32$/i ? '.pl' : '' ).
    " [OPTIONS] inputfile [outputfile]\n".
    "isi2bibtex - convert ISI database files to BibTeX format\n\n".
    "OPTIONS:\n".
    " -h, --help\t\tdisplay this help and exit\n".
    " -v, --version\t\tdisplay version number and exit\n".
    " -q, --quiet\t\tno informational output\n".
    " -a, --abstract\tinclude abstract in output file\n".
    " -c, --check\t\tmake some checks on field contents (default)\n".
    " -n, --nocheck\t\tdon't make any checks on field contents\n\n".
#    " -, --\n".
    "Try `perldoc isi2bibtex' or 'man isi2bibtex' for more information.\n\n".
    'Report bugs to the maintainer (address in the copyright notice at the '.
    'top of the script).'."\n";

#************************************
# NOTE: USE THE CONFIG FILE INSTEAD!
#************************************

# End of options to alter output formatting
# -----------------------------------------

# You probably don't need to worry about what's below this point, other
# than the lists of acronyms, abbreviations, etc. near the end of the
# script.

sub qwarn ($) {
    $warning = shift;
    warn $warning if not $QUIET;
}

for $config_file (@CONFIG) {
    read_config($config_file);
}

Getopt::Long::config('bundling', 'auto_abbrev');
# -n / --nocheck is listed separately so that the documented short form -n
# is accepted as well as the long form
@option_spec = (
    "version|v",
    "help|h",
    "quiet|q",
    "abstract|a",
    "check|c",
    "nocheck|n",
);
GetOptions(\%options, @option_spec) or exit;
#for (@option_spec) {
#    s/\|.*//;
#    if (not defined($options{$_})) {
#        $options{$_} = '';
#    }
#}

if ($options{'version'}) {
    print STDERR "$SCRIPT version $VERSION\n\n".
        "Copyright (C) 2000 Jonathan Swinton, Ben Bolker, Anthony Stone, ".
        "John J. Lee\n".
        "This is free software; see the source for copying conditions.  There ".
        "is NO\nwarranty; not even for MERCHANTABILITY or FITNESS FOR A ".
        "PARTICULAR PURPOSE.\n";
    exit;
}
if ($options{'help'}) {
    print STDERR $USAGE;
    exit;
}
if (defined $options{'abstract'}) {
    $ABSTRACT = ($options{'abstract'} ? 1 : 0);
}
if (defined $options{'quiet'}) {
    $QUIET = ($options{'quiet'} ? 1 : 0);
}
if ($options{'check'})   { $CHECK = 1 }
if ($options{'nocheck'}) { $CHECK = 0 }

$date = gen_date('long', 0);

$Text::Wrap::columns = $LINELENGTH;

$line = "";     # current line of field data
                # actually, bad name: this ends up with a whole field in it
# $_ is current line as usual

# BIDS format field identifiers
%name = (
    'TI'=>'title', 'AU'=>'author', 'NA'=>'address', 'JN'=>'journal',
    'PY'=>'year', 'VO'=>'volume', 'NO'=>'issue', 'PG'=>'pages',
    'AB'=>'abstract', 'KP'=>'keywords'
    #'CR'=>'', 'RF'=>'', 'PA'=>''   # don't know, don't care
);
# The journal field includes volume, issue, pages and year in BIDS
# standard format.

# Following are the MIMAS WOS format field identifiers, which claims to be
# ISI Generic Export Format version 1.0 at the moment.
# Most are ignored by this script.
# Can't believe anybody would want to use all of these, but here they are
# anyway
%mname = (
#    'FN'=>'filetype',          # File type
#    'VR'=>'version',           # File format version number
#    'PT'=>'pubtype',           # Publication type, eg.
#                               #   book, journal, book in series
    'AU'=>'author',
    'TI'=>'title',              # Article title
    'SO'=>'sourcetitle',        # eg. journal title, in full
#    'LA'=>'language',
#    'DT'=>'doctype',           # eg. review, book, article
#    'NR'=>'refcount',
#    'SN'=>'ISSN',
#    'PU'=>'publisher',
#    'C1'=>'addresses',         # Research addresses (of all authors)
#    'DE'=>'authkeywords',      # Author keywords
#    'ID'=>'keywordsplus',      # KeyWords Plus
    'AB'=>'abstract',
#    'CR'=>'citedrefs',
#    'TC'=>'citetimes',         # Times cited
    'BP'=>'1stpage',
    'EP'=>'lastpage',
#    'PG'=>'pagecount',
    'JI'=>'abbrtitle',          # ISO source title abbr.
#    'SE'=>'seriestitle',       # Book series title
#    'BS'=>'seriessub',         # Book series subtitle
    'PY'=>'year',               # Publication year
#    'PD'=>'pubdate',           # Publication date eg. JUN 8
    'VL'=>'volume',
#    'IS'=>'issue',
#    'PN'=>'partno',            # Part number
#    'SU'=>'supplement',
#    'SI'=>'special',           # Special issue
#    'GA'=>'ISIno',             # ISI document delivery number
#    'PI'=>'pubcity',           # Publisher city
#    'WP'=>'pubURL',            # Publisher web address
#    'RP'=>'reprintaddr',       # Reprint address
#    'CP'=>'patent',            # Cited patent
    'J9'=>'titleabbr',          # 29-character source title abbr.
#    'PA'=>'pubaddr',           # Publisher address
#    'ER'=>'endrecord',
);

# hash to store fields and also to remember where we are
# (fifteen keys, so fifteen empty strings)
@record{'header', 'separator', 'key', 'title', 'author', 'journal',
        'isojournal', 'j9journal', 'volume', 'year', 'issue', 'pages',
        'keywords', 'abstract', 'other'} = ('')x15;
for (keys %record) { $record{$_} = '' }

# count of records in total and under the last header
@recordcount{'total', 'header'} = (0, 0);
$field = 'header';
$std = 0;       # BIDS standard format flag
$mimas = 0;     # MIMAS WOS flag
$temp = '';     # general purpose temp string
$fileout = '';  # output filename
$filein = '';   # input filename

# If there's only one argument, "file", read that in and output
# "file.bib".
# If there are two arguments, read in the first as input, and
# output to the second:
if ($#ARGV == 1) { $fileout = $ARGV[1] }
elsif ($#ARGV == 0) { $fileout = join ("",$ARGV[0],'.bib') }
else { die $USAGE }
$filein = $ARGV[0];

$temp = '>';
if (-e $fileout) {
    # qwarn() already stays silent when $QUIET is set
    qwarn "isi2bibtex: file $fileout exists: appending records\n";
    $temp = '>>';
}
open(IPTBIDS, $filein) or
    die "isi2bibtex: couldn't open $filein for input: $!\n";
open(OPTBIB, $temp.$fileout) or
    die "isi2bibtex: couldn't open $fileout for output: $!\n";

print STDERR "isi2bibtex: converting BIDS file $filein to BibTeX output ".
    "$fileout...\n" if not $QUIET;

# read BIDS file and convert it
convert();

# output BibTeX file
print OPTBIB $output;
close OPTBIB;

# end of main program


sub convert {
    # input loop for ISI/BIDS file
    my $temp = '';
    # read the input file opened above, one line at a time
    while (<IPTBIDS>) {
        chomp;
        if (/^\s*Record - / or /^\s*$/) {
            # match ISI format record
            if (/^\s*$/ and $field eq 'header') {
                # we're still in the header,
                # and we want to keep this blank line
                # so don't match yet
            }
            else {
                end_field() if $field eq 'header';
                if ($field ne 'separator') {
                    end_record() if $field ne 'header';
                    $field = 'separator';
                }
                else {
                    # twiddle thumbs - in record separator
                }
                next;
            }
        }
        elsif (/(?: {6}|\(\d\) |\(\d\d\) |\(\d\d\d\) )([A-Z]{2}): (.*)/) {
            # match BIDS standard format field
            if (!$std and $field ne 'separator' and $field ne 'header') {
                # not a standard format record, don't match here
            }
            else {
                end_field();
                $std = 1;
                $line = $2;
                $temp = $1;
                for ($line) {
                    s/^\s+//;
                    s/\s+$//;
                }
                if (defined($name{$temp})) {
                    $field = $name{$temp};
                }
                else {
                    $field = 'other';
                }
                next;
            }
        }
        elsif (/^([A-Z]{2})- /) {
            # match BIDS downloading format field
            end_field();
            $temp = $1;
            if (defined($name{$temp})) {
                $field = $name{$temp};
                $line = strip($_);      # extract data
            }
            else {
                $field = 'other';
            }
            next;
        }
        elsif (/^((?:[A-Z]{2})|J9) /) {
            # match MIMAS WOS format field
            end_field();
            $mimas = 1;
            $temp = $1;
            if (defined($mname{$temp})) {
                $field = $mname{$temp};
                $line = strip($_);      # extract data
                mimas();                # map to output fields
            }
            else {
                $field = 'other';
            }
            next;
        }
        else {
            # match mid-field line
            # for multiline headers eg Subject: line split over two lines
            /^\s/ ? ($singleline = 0) : ($singleline = 1);
            # cut whitespace
            s/^\s+//;
            s/\s+$//;
            # if we're not in a record and we didn't recognise it, it's a header
            if ($field eq 'separator' and $_ ne '') {
                $field = 'header';
                next;
            }
            # keep header exactly as it is (other than leading and trailing space)
            # - and the abstract as well if required
            elsif ($field eq 'header') {
                if ($singleline) {
                    $line = join("\n", $line, $_);
                }
                else {
                    $line = join(" ", $line, $_);
                }
            }
            # make MIMAS author format look like BIDS downloading format
            elsif ($field eq 'author' and $mimas) {
                $line = join(';', $line, $_);
            }
            elsif ($field eq 'title') {
                $line = join(' ', $line, $_) unless title_dashes();
            }
            elsif ($field eq 'abstract') {
                $line = join("\n".$ABSINDENT, $line, $_);
            }
            # join everything else with a space
            else { $line = join(' ', $line, $_) }
        }
    }
}

sub title_dashes {
    # cut dashes from title if required
    # abstract dashes are dealt with in abstract() - kludge
    if ( ($line =~ /\b-$/) and ($TDASHES and $field eq 'title') ) {
        if ($TDASHES == 2) { $line =~ s/\b-$// }
        $line = join('', $line, $_);
    }
}

sub mimas {
    # map MIMAS fields onto output fields
    # first page should be put in page field
    if ($field eq '1stpage') { $field = 'pages' }
    # last page no.
    # should be appended to first
    if ($field eq 'lastpage') {
        $field = 'pages';
        $line = $record{'pages'}.'-'.$line;
    };
    # remember all forms of journal title, decide in end_record() which to use
    if ($field eq 'sourcetitle') {      # full journal title
        $field = 'journal';
    }
    if ($field eq 'abbrtitle') {        # ISO abbr
        $field = 'isojournal';
    }
    if ($field eq 'titleabbr') {        # other abbr
        $field = 'j9journal';
    }
}

sub strip {
    # get field data out of first line of BIDS downloading / ISI format field
    my ($string) = $_[0];
    for ($string) {
        # chop off initial field identifier and whitespace
        s/^\s*(?:(?:[A-Z]{2})|(?:J9))-?\s*//;
        # chop off trailing whitespace
        s/\s+$//;
    }
    $string;
}

sub end_field {
    # stuff to do at end of each field
    # put the field we have built up in the appropriate part of %record
    for ($line) {
        s/^\s+//;
        s/\s+$//;
    }
    $record{$field} = $line;
    $line = '';
    if ($field eq 'header') {
        header() if $HEADER;
        $recordcount{'header'} = 0;
    }
}

sub end_record {
    # stuff to do at end of each record
    # reached end of record, so must have just reached end of the last field
    end_field();
    $recordcount{'total'}++;
    $recordcount{'header'}++;

    # do some editing
    # substitute one of the other journal title fields if ISO abbr missing
    if ($mimas and $ISOTITLE) {
        if ($record{'isojournal'} ne '') {
            $record{'journal'} = $record{'isojournal'};
        }
        elsif ($record{'j9journal'} ne '') {
            $record{'journal'} = $record{'j9journal'};
        }
    }
    makejournal();
    makeauthor();

    # more editing and output record to file
    key();
    author() if $AUTHOR;
    title() if $TITLE;
    journal() if $JOURNAL;
    year() if $YEAR;
    volume() if $VOLUME;
    pages() if $PAGES;
    abstract() if $ABSTRACT;
    terminator();

    # reinitialize variables
    $field = 'separator';
    for (keys %record) { $record{$_} = '' }
    $std = 0;
    $mimas = 0;
}

sub makekey {
    # generate unique-ish key for BibTeX to refer to record by
    if ($AUTHORKEY) {
        # make unique-ish key out of first surname + first letter of subsequent
        # authors names, followed by last two digits of year
        # names already in key field
        # from makeauthor()
        $record{'key'} =~ s/\s*//g;
        # append last two digits of year to flag
        $record{'key'} .= substr($record{'year'}, -2);
        $record{'key'} =~ s/ //g;
    }
    else {
        # make unique-ish key out of input filename (minus extension) with the
        # current source file line number appended
        # this is probably more likely to be unique than author key, but less
        # intelligible
        $record{'key'} = $fileout;
        $record{'key'} =~ s/\.bib//;
        $record{'key'} .= $.;
    }
}

sub check_field($) {
    # check field for sense
    my $field = shift;
    my $warn = '';
    return 1 if not $CHECK;
    if ($field eq 'pages') {
        if ($record{$field} !~
            /^[A-Z]?[ \d]{0,5}-?[A-Z]?[ \\&\d]{1,5}(?:\s*\(\d{1,3}\s+pages?\))?$/) {
            $warn = "$SCRIPT: warning: missing or unusual page number at ";
            $warn .= "line ${.}, record $recordcount{'total'}\n";
            qwarn $warn;
            return 0;
        }
        else { return 1 }
    }
    @check_ignore{'abstract', 'year', 'volume'} = (1)x3;
    return 1 if $check_ignore{$field};
    if ($record{$field} =~ /(?:^\s*$)|(?:^.{0,2}$)/) {
        $warn = "$SCRIPT: warning: missing or very short $field field at ";
        $warn .= "line ${.}, record $recordcount{'total'}\n";
        qwarn $warn;
        return 0;
    }
    return 1;
}

sub makeauthor {
    # convert author field to BibTeX format
    my ($author,$surname,$firstnames,$authsep,$namesep);
    my @authors;
    # set author and name separators for appropriate format
    if (! $std) {
        $authsep = q/;/;
        $namesep = q/, /;
    }
    elsif ($std) {
        $authsep = q/, /;
        $namesep = q/_/;
    }
    @authors = split(/$authsep/, $record{'author'});
    $record{'key'} = '';
    foreach $author (@authors) {
        ($surname, $firstnames) = split(/$namesep/, $author);
        if ( not (defined($surname) and defined($firstnames)) ) {
            qwarn "$SCRIPT: badly formed author field at line $.\n";
            $surname = $author;
            $firstnames = 'unknown';
        }
        for ($firstnames) {
            s/(\w)/$1. /g;      # add full stops to initials
            s/^\s+//;           # cut whitespace
            s/\s+$//;           #  "
        }
        if ($AUTHORCASE and ($record{'author'} !~ /[a-z]/)) {
            $surname = initupper($surname);                 # capitalise
            $surname =~ s/^(Mac|Mc|O')([a-z])/$1\u$2/;      # special cases
        }
        $author = "$surname, $firstnames";
        # get surname of first author, and first letter of other authors' surnames
        if ($record{'key'} eq '') { $record{'key'} = $surname }
        else { $record{'key'} .= substr($surname,0,1); }
    }
    $record{'author'} = join (' and ', @authors);
    makekey();      # make BibTeX unique record key
}

sub makejournal {
    # convert journal field to BibTeX format
    if ($std) {
        # separate out BibTeX fields from the BIDS journal field
        # (only use the captures if the match succeeded: otherwise $1..$4
        # would still hold values left over from an earlier match)
        if ($record{'journal'} =~
            /(.*),\s*(\d{4}),\s*Vol\.\s*(.+?),.*p{1,2}\.\s*(.*)/) {
            $record{'journal'} = $1;
            $record{'year'} = $2;
            $record{'volume'} = $3;
            $record{'pages'} = $4;
        }
        for (keys %record) {
            $record{$_} = '' unless defined($record{$_});
        }
    }
    # set the case for the journal if it's all in upper caps
    if ($JOURNALCASE and ($record{'journal'} !~ /[a-z]/)) {
        $record{'journal'} = initupper($record{'journal'});
    }
    $record{'journal'} = journalabbrev($record{'journal'})
        if ($JABBREV and ! ($ISOTITLE and $mimas));
}

# In the output subs below, each field name is padded with spaces to the
# width of the longest name ('abstract') so that the equals signs line up.

sub key {
    # output record key
    my $fld = "\@ARTICLE";
    check_field('key');
    $output .= pastewrap("{$record{'key'}", "", $fld, $INDENT2);
}

sub author {
    # output author field
    my $fld = $INDENT.'author'.$INDENTB.'  '.$INDENTA;
    $record{'author'} = texsafety($record{'author'});
    check_field('author');
    $output .= ",\n".pastewrap("{$record{'author'}}", "", $fld, $INDENT2);
}

sub title {
    # output title field
    my $fld = $INDENT.'title'.$INDENTB.'   '.$INDENTA;
    # set the case for the title if it's all in upper caps
    if ($TITLECASE and ($record{'title'} !~ /[a-z]/)) {
        $record{'title'} = firstupper($record{'title'});
        $record{'title'} = formulas($record{'title'}) if $FORMULACASE;
        $record{'title'} = special($record{'title'}) if $SPECIAL;
    }
    $record{'title'} = texsafety($record{'title'});
    check_field('title');
    $output .= ",\n".pastewrap("{$record{'title'}}", "", $fld, $INDENT2);
}

sub journal {
    # output journal field to file
    my $fld = $INDENT.'journal'.$INDENTB.' '.$INDENTA;
    $record{'journal'} = texsafety($record{'journal'});
    check_field('journal');
    $output .= ",\n".pastewrap("{$record{'journal'}}", "", $fld, $INDENT2);
}

sub year {
    # output year field to file
    my $fld = $INDENT.'year'.$INDENTB.'    '.$INDENTA;
    $record{'year'} = texsafety($record{'year'});
    check_field('year');
    $output .= ",\n".pastewrap("{$record{'year'}}", "", $fld, $INDENT2);
}

sub volume {
    # output volume field to file
    my $fld = $INDENT.'volume'.$INDENTB.'  '.$INDENTA;
    $record{'volume'} = texsafety($record{'volume'});
    check_field('volume');
    $output .= ",\n".pastewrap("{$record{'volume'}}", "", $fld, $INDENT2);
}

sub pages {
    # output pages field to file
    my $fld = $INDENT.'pages'.$INDENTB.'   '.$INDENTA;
    $record{'pages'} = texsafety($record{'pages'});
    check_field('pages');
    $output .= ",\n".pastewrap("{$record{'pages'}}", "", $fld, $INDENT2);
}

sub abstract {
    # output abstract field to file
    my $fld = $INDENT.'abstract'.$INDENTB.''.$INDENTA;
    my $abs = '';
    $record{'abstract'} = texsafety($record{'abstract'});
    if ($ABSFORMAT) {
        $record{'abstract'} =~ s/^\s+//mg;      # unindent
        # stick abstract together, guessing paragraph positions if required
        for (split(/\n/, $record{'abstract'})) {
            s/^\s+//;
            s/\s+$//;
            if ( /[.?!]$/ and length() < ($ABSLENGTH - $PARAGAP) and $ABSPARAS )
                 { $abs .= $_."\n" }
            else { $abs .= $_.' ' }
        }
        $abs =~ s/ $//;     # cut off final space
        $abs = ",\n".wrap($fld, $INDENT2, '{'.$abs.'}');
        # rejoin every word split at a line end, not just the first one
        for ($abs) {
            s/\b- \b/-/g if ($ADASHES == 1);
            s/\b- \b//g  if ($ADASHES == 2);
        }
    }
    else {
        if ($record{'abstract'} eq '') {
            $abs = ",\n".$fld."{}";
        }
        else {
            $abs = ",\n".$fld."{\n";
            $abs .= $ABSINDENT.$record{'abstract'}."}";
        }
    }
    check_field('abstract');
    $output .= $abs;
}

sub terminator {
    # add record terminator to file
    $output .= "\n}\n\n";
}

sub header {
    # add informational header to BibTeX output file
    # modify this to insert whatever comments are helpful to you
    # see flags at top of script to turn header off
    my $temp;
    $temp =
        "This file was automatically generated from entries from the ISI\n"
        ."(Institute for Scientific Information) databases of scientific and\n"
        ."other academic documents, using isi2bibtex version $VERSION, a perl\n"
        ."script which converts ISI or BIDS format files to BibTeX format files\n"
        ."for inclusion in documents typeset using the LaTeX document processor.";
    $output .= pastewrap($temp)."\n\n";
    $temp = "Try perldoc isi2bibtex for instructions, or read the script.";
    $output .= pastewrap($temp)."\n\n";

    # output the whole header
    #$output .= $record{'header'}."\n\n";

    # get subject line from header
    $temp = "This file generated on $date, from file '$filein', which has ".(
        ($record{'header'} =~ /Subject:\s*(.*)/) ?
        "the subject line '$1'." : "no subject line."
    );
    $output .= pastewrap($temp)."\n\n";
}

sub initupper {
    my $string = $_[0];
    # capitalise initial letter of every word, lower-case the rest
    $string =~ s/(\w+)/\u\L$1/g;
    # if you have words like "don't" in your references, try this
    # (pinched from the perlfaq):
    # $string =~ s/ (
    #                (^\w+)     # at the beginning of the line
    #                |          # or
    #                (\s\w+)    # preceded by whitespace
    #               )
    #             /\u\L$1/xg;
    # $string =~ /([\w']+)/\u\L$1/g;
    $string;
}

sub firstupper {
    my $string = $_[0];
    # capitalise initial letter of every sentence, lower-case the rest
    $string =~ s/\b(\w)(.*)/\U$1\E\L$2/;
    $string =~ s/([.?!]\s+)(\l\w)/$1\u$2/g;
    $string;
}

sub lowercase {
    my $string = $_[0];
    # decapitalise everything
    $string =~ s/(.*)/\L$1/;
    $string;
}

sub lowertrivial {
    my $string = $_[0];
    # decapitalise short words
    for ($string) {
        # s/\bA\b/a/g;      # causes trouble eg. Phys. Rev. A
        s/\bAn\b/an/g;
        s/\bAnd\b/and/g;
        s/\bThe\b/the/g;
        s/\bOf\b/of/g;
        s/\bTo\b/to/g;
        s/\bFrom\b/from/g;
        s/\bIn\b/in/g;
        s/\bWith\b/with/g;
    }
    $string;
}

sub journalabbrev {
    my $journal = $_[0];
    # substitute journal abbreviations
    # for MIMAS you can just use the pre-abbreviated title field (see options
    # at top of script)
    # obviously, put stuff in here that makes sense for you
    for ($journal) {
        s/\bjournal\b/J./gi;
        s/\b(chemical|chemistry)\b/Chem./gi;
        s/\b(physics|physical)\b/Phys./gi;
        s/\bsociety\b/Soc./gi;
        s/\bcommunications\b/Comm./gi;
        s/\btransactions\b/Trans./gi;
        s/\breviews\b/Rev./gi;
        s/-chemical\b/ Chem./gi;
        s/-faraday\b/ Faraday/gi;
        s/\bdiscussions\b/Disc./gi;
        s/\bamerican\b/Amer./gi;
        s/\bapplied\b/Appl./gi;
        s/\bresearch\b/Res./gi;
        s/\bcrystallograph[a-z]+\b/Crystallog./gi;
        s/\bletters\b/Lett./gi;
        s/\b(surface|surfaces)\b/Surf./gi;
        s/\b(science|sciences)\b/Sci./gi;
        s/^Sci\.$/Science/;
        s/\bphilosoph\l\w+\b/Philos./gi;
        s/\bengineer\l\w+\b/Engin./gi;
        s/\bphenomena\b/Phenom./gi;
        s/\bspectroscop\l\w+\b/Spectrosc./gi;
        s/\bproceedings\b/Proc./gi;
        s/\bnational\b/Nat./gi;
        s/\bacademy\b/Acad./gi;
        s/\broyal\b/Roy./gi;
        s/\bopinion\b/Opin./gi;
        s/\b(material|materials)\b/Mater./gi;
        s/\bcondensed\b/Cond./gi;
        s/\bmolec\l\w+\b/Mol./gi;
        s/\bstructur\l\w+\b/Struct./gi;
        s/\bmatter\b/Matt./gi;
        s/\binternational\b/Int./gi;
        s/\bbulletin\b/Bull./gi;
        s/\bannual\b/Ann./gi;
        s/\bcatalysis\b/Catal./gi;
        s/\breview\b/Rev./gi;
        s/\btechnolog\l\w+\b/Technol./gi;
        s/\bprogress\b/Prog./gi;
        s/\bscientific\b/Sci./gi;
        s/\binstrument\l\w+\b/Instrum./gi;
        s/\bvacuum\b/Vac./gi;
        s/^Vac\.$/Vacuum/;
        # you may or may not want to remove these words from journal titles
        # s/\ba\b//g;
        # s/\ban\b//gi;
        # s/\band\b//gi;
        s/\bthe\b//gi;
        s/\bof\b//gi;
        # s/\bto\b//gi;
        # s/\bfrom\b//gi;
        # s/\bin\b//gi;
        # s/\bwith\b//gi;
        $journal = lowertrivial($journal);
        s/\s+/ /g;
    }
    $journal
}

sub special {
    my $string = $_[0];
    # set case of some special words
    # put your own words here
    # note that the upper-case replacement is enclosed in braces so that BibTeX
    # doesn't put it back into lower-case
    for ($string) {
        s/\bscf\b/{SCF}/g;
        s/\bmc(-*)scf\b/{MC$1SCF}/g;
        s/\bci\b/{CI}/g;
        s/\b([0-9]-[0-9]+)g\b/{$1G}/g;
        s/\bdna\b/{DNA}/g;
        s/\bvanderwaals\b/{Van} der {Waals}/g;
        s/\bmoller-plesset\b/{M}{\\o}ller--{Plesset}/g;
        # s/\b\b/{}/g;
    }
    $string;
}

sub formulas {
    my $string = $_[0];
    # attempt to set case of chemical formulas
    $string =~ s/(\b\l[a-z]+[1-9]+[a-z0-9]+\b)/{\U$1}/gi;
    # attempt to set crystal planes eg Al(111)
    $string =~ s/\b(\l[a-z]{1,2}\(\d\d\d\))/{\u$1}/gi;
    $string;
}

sub texsafety {
    my $string = $_[0];
    # escape TeX special characters
    # ie. replace % and & with \% and \&.
    $string =~ s:\&:\\&:g;
    $string =~ s:\%:\\%:g;
    $string;
}

sub pastewrap {
    my @strings = @_;
    # paste together lines separated by newlines then wrap text
    # parameters: string, final string, initial tab, tab
    for ($strings[0]) {
        s/^\s+//;
        s/\s+$//;
        s/\n/ /mg;
    }
    foreach $i (1,2,3) {
        if (!
defined $strings[$i]) { $strings[$i] = '' }
    };
    $strings[0] .= $strings[1]; # add final newlines or whatever
    wrap($strings[2], $strings[3], $strings[0]);
}

sub gen_date ($$) {
    # generate current date string
    # 1st parameter is format:
    #  'long': Monday 23rd December 2000
    #  'short': 2000-12-23
    # 2nd parameter is time flag
    my ($format, $time) = @_;
    my ($day_name, $month_name);
    my @td = localtime(time);
    my ($min, $hour, $day, $month, $year, $weekday) = @td[1..6];
    $year += 1900;
    # zero-pad the time so that eg. 09:05 isn't printed as 9:5
    if ($time) { $time = sprintf("%02d:%02d", $hour, $min) }
    # 11th, 12th and 13th are exceptions to the st/nd/rd rule
    if ($day >= 11 && $day <= 13) { $ord = 'th' }
    elsif ($day % 10 == 1) { $ord = 'st' }
    elsif ($day % 10 == 2) { $ord = 'nd' }
    elsif ($day % 10 == 3) { $ord = 'rd' }
    else { $ord = 'th' }
    # localtime counts weekdays from Sunday = 0 and months from January = 0
    $day_name = (qw(
        Sunday Monday Tuesday Wednesday Thursday Friday Saturday
    ))[$weekday];
    $month_name = (qw(
        January February March April May June
        July August September October November December
    ))[$month];
    if ($format eq 'long') {
        $date = "$day_name $day$ord $month_name $year";
    } elsif ($format eq 'short') {
        $date = sprintf("%04d-%02d-%02d", $year, $month + 1, $day);
    }
    if ($time) { $date .= " $time" };
    $date;
}

sub read_config ($) {
    my $file = shift;
    # read configuration file
    return if not ($fh = new FileHandle($file));
    while(<$fh>) {
        next if /^#/;
        next if /^\s*$/;
        SWITCH: {
            /^\s*authorkey\s*=\s*(.*)\s*$/ and $AUTHORKEY = $1, last;
            /^\s*header\s*=\s*(.*)\s*$/ and $HEADER = $1, last;
            /^\s*titlecase\s*=\s*(.*)\s*$/ and $TITLECASE = $1, last;
            /^\s*authorcase\s*=\s*(.*)\s*$/ and $AUTHORCASE = $1, last;
            /^\s*journalcase\s*=\s*(.*)\s*$/ and $JOURNALCASE = $1, last;
            /^\s*specialcase\s*=\s*(.*)\s*$/ and $SPECIAL = $1, last;
            /^\s*formulacase\s*=\s*(.*)\s*$/ and $FORMULACASE = $1, last;
            /^\s*jabbrev\s*=\s*(.*)\s*$/ and $JABBREV = $1, last;
            /^\s*isotitle\s*=\s*(.*)\s*$/ and $ISOTITLE = $1, last;
            /^\s*author\s*=\s*(.*)\s*$/ and $AUTHOR = $1, last;
            /^\s*title\s*=\s*(.*)\s*$/ and $TITLE = $1, last;
            /^\s*journal\s*=\s*(.*)\s*$/ and $JOURNAL = $1, last;
            /^\s*year\s*=\s*(.*)\s*$/ and $YEAR = $1, last;
            /^\s*volume\s*=\s*(.*)\s*$/ and $VOLUME = $1, last;
            /^\s*pages\s*=\s*(.*)\s*$/ and $PAGES = $1, last;
            /^\s*abstract\s*=\s*(.*)\s*$/ and $ABSTRACT = $1, last;
            /^\s*linelength\s*=\s*(.*)\s*$/ and $LINELENGTH = $1, last;
            /^\s*indent\s*=\s*"(.*)"\s*$/ and $INDENT = $1, last;
            /^\s*indentx\s*=\s*"(.*)"\s*$/ and $INDENTX = $1, last;
            /^\s*indentb\s*=\s*"(.*)"\s*$/ and $INDENTB = $1, last;
            /^\s*indenta\s*=\s*"(.*)"\s*$/ and $INDENTA = $1, last;
            /^\s*adashes\s*=\s*(.*)\s*$/ and $ADASHES = $1, last;
            /^\s*tdashes\s*=\s*(.*)\s*$/ and $TDASHES = $1, last;
            /^\s*absformat\s*=\s*(.*)\s*$/ and $ABSFORMAT = $1, last;
            /^\s*absparas\s*=\s*(.*)\s*$/ and $ABSPARAS = $1, last;
            /^\s*paragap\s*=\s*(.*)\s*$/ and $PARAGAP = $1, last;
            /^\s*abslength\s*=\s*(.*)\s*$/ and $ABSLENGTH = $1, last;
            /^\s*quiet\s*=\s*(.*)\s*$/ and $QUIET = $1, last;
            /^\s*check\s*=\s*(.*)\s*$/ and $CHECK = $1, last;
            # /^\s*\s*=\s*(.*)\s*$/ and $ = $1, last;
            die "$SCRIPT: syntax error in config file at line $.\n";
        }
    }
}
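
# Example configuration file: a rough sketch of the syntax that read_config
# accepts. The keys are ones matched in read_config; the values shown are
# illustrative assumptions only, not documented defaults.
#
#     # lines starting with '#' in the first column, and blank lines,
#     # are skipped
#     header     = 1
#     quiet      = 0
#     linelength = 72
#     adashes    = 1
#     indent     = "  "
#
# The indent, indentx, indentb and indenta values have to be enclosed in
# double quotes, presumably so that leading and trailing spaces survive.
# Any line that matches none of the keys stops the script with a
# "syntax error in config file" message giving the line number.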