:::: MENU ::::
Browsing posts in: Little scripts for formatting

Zotero csl style for Protist

Like most other citation managers Zotero doesn’t have a style for the algae journal. For a while now I’ve been wanting to create some for J.Phyc and Protist and the like. I just never got around to it and ended up manually editing the bibliography created with an already available style. This is of course quite tedious and borring. So today I decided to spend a couple of hours and create a Protist citation style for Zotero. It really wasn’t that hard to use their visual style editor. Its all gui and the automatic update of the modifications I was making was really helpful.

If any Zotero users outhere ever need to format something for Protist I’ll keep the style available here. Just download it, save it as a “protist.csl” and install it as usual. I anticipate the style code will need updates – I already noticed something that doesn’t look quite right. But as it is now its almost identical to the format of some Protist papers from 2013.

Creating a partition file based on secondary structure alignment in WUSS format

Recently I’ve been using Erik Nawrocki’s nice program SSU-ALIGN to line up a bunch of ribosomal sequences to covariance models of secondary structure of ribosomal RNA. The program is really well documented, fast, and creates very informative output. The consensus secondary structure of the aligned rRNA sequences is captured in WUSS format which looks something like this:


The various brackets represent paired sites (in stems), whereas dashes, dots, underscores unpaired sites (in loops) in the molecule. Different types of brackets denote different types of intramolecular pairing between nucleotides (more here). Since I wanted to partition the rRNA data based on secondary structure (stems vs. loops) I needed to parse this output. My really simplistic way of doing this in an interactive terminal session follows.

cat SS_cons | tr “[]{}()<>” “1” | tr “:.,~_” “2” | sed ‘s/-/2/g’ | sed ‘s/\(.\)/\1 \n/g’ | awk ‘{print NR “\t” “c”$0}’ > out

In this rather long oneliner I am replacing all the characters denoting paired sites with ”1” (first tr command) and all characters denoting unpaired sites with ”2” (second tr command). The first sed command is for the dashes (which for some reason are not replaced by tr). The second sed command I ripped off the internet and is to print a newline character after each character in the string. The awk command is to prepend a line number to each character and a ”c” before the number for partition assignment (I needed this for easier grep in the next step). Finally, I am redirecting output to a file. The first six lines of output from this messy oneliner looks like this:

1 c1

2 c1

3 c1

4 c1

5 c1

6 c2


So for each character in the alignment we have the position (first column) and the partition prefaced with a ”c” (second column). Now to find all the character positions that need to go in the paired partition I did:

grep “c1” out | cut -f 1 | tr “\n” “,” > paired_sites

In this command I grep all lines that have c1 (meaning first partition), then cut only the first column (discarding c1) and finally replacing the newline characters with a comma (tr command). The output is redirected to a file and ready paste in a RAxML-type partition file. Change the comma to a space for a Nexus-type partition. Rinse and repeat the last command for the characters in the second partition (unpaired sites).

And that’s it. I am sure there is a much simpler way of doing this. But it worked for me and I easily moved on to the more interesting part of my work. Building the phylogeny.