Software

Here I've begun collecting software I have created, or that I help develop and maintain collaboratively. The list includes only open source software and is organised into broad categories to ease navigation:

Machine Learning
Genomics/Bioinformatics
Data Structures and Algorithms

Machine Learning

Recursive Neural Networks - Neural networks for general data structures

Recursive neural networks are a generalisation of recurrent networks, designed to process more general data types than fixed n-dimensional vectors or sequences, e.g. n-ary trees or directed acyclic graphs whose structure represents causal relationships among the elements of a particular domain.
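To make the idea concrete, here is a minimal C sketch of the forward pass of a recursive network over an n-ary tree. Each node's hidden state is computed from its own input label and the states of its children, visiting children first, so the state of the root summarises the whole structure. The names (`rnn_node`, `rnn_params`, `rnn_forward`), the tiny fixed dimensions, the summing of child states through one shared weight matrix and the softsign activation are all illustrative assumptions, not the API or architecture of the software described here.

```c
#include <assert.h>
#include <stddef.h>

#define IN_DIM 2       /* size of each node's input label (illustrative) */
#define STATE_DIM 3    /* size of each node's hidden state (illustrative) */
#define MAX_CHILDREN 4

/* Hypothetical tree node: input label, child pointers, and a slot for
   the hidden state computed by the forward pass. */
typedef struct rnn_node {
    double x[IN_DIM];
    struct rnn_node *child[MAX_CHILDREN];
    size_t n_children;
    double h[STATE_DIM];
} rnn_node;

/* Parameters shared by every node of every tree. */
typedef struct {
    double W[STATE_DIM][IN_DIM];    /* input-to-state weights */
    double U[STATE_DIM][STATE_DIM]; /* child-state-to-state weights, shared by all children */
    double b[STATE_DIM];            /* bias */
} rnn_params;

/* Softsign squashing function v / (1 + |v|), bounded in (-1, 1). */
static double softsign(double v) {
    return v / (1.0 + (v < 0 ? -v : v));
}

/* Post-order traversal: children are encoded first, then
   h = softsign(W x + U * (sum of child states) + b).
   Leaves see an all-zero child sum. */
void rnn_forward(rnn_node *node, const rnn_params *p) {
    assert(node != NULL && node->n_children <= MAX_CHILDREN);
    double sum[STATE_DIM] = {0};
    for (size_t c = 0; c < node->n_children; c++) {
        rnn_forward(node->child[c], p);
        for (size_t i = 0; i < STATE_DIM; i++)
            sum[i] += node->child[c]->h[i];
    }
    for (size_t i = 0; i < STATE_DIM; i++) {
        double a = p->b[i];
        for (size_t j = 0; j < IN_DIM; j++)
            a += p->W[i][j] * node->x[j];
        for (size_t j = 0; j < STATE_DIM; j++)
            a += p->U[i][j] * sum[j];
        node->h[i] = softsign(a);
    }
}
```

Because every node reuses the same parameters, one network can encode trees of any shape and size. On a DAG this naive recursion recomputes shared sub-structures (a node with several parents is visited once per parent), and graphs with cycles need additional machinery that this sketch does not attempt.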

This software provides a common framework for learning with different types of data structures, possibly containing cycles, thus extending the classical theory. It is a rewrite of the code I implemented in support of my PhD and post-doctoral work at the Università degli Studi di Firenze, the University of California, Irvine and University College Dublin. It has been applied successfully to several problems in structural bioinformatics, as described in many publications.

GitHub

Genomics/Bioinformatics

Bio::DB::HTS - High-throughput sequencing data handling in Perl

A Perl module providing an interface to the HTSlib library for reading and writing high-throughput sequencing data, such as indexed and unindexed SAM/BAM/CRAM sequence alignment databases, Tabix-indexed files and VCF/BCF variant files. It provides support for retrieving information on individual alignments, read pairs, and alignment coverage across large regions. It also provides callback functionality for calling SNPs and performing other base-by-base functions.
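As an illustration of what per-base coverage across a region means (not of the Bio::DB::HTS interface, which computes it directly from alignment files), the following C sketch counts how many aligned reads cover each base using a difference array and a prefix sum. The `read_span` type and `compute_coverage` function are hypothetical, and reads are assumed to be contiguous 0-based, half-open reference spans, so deletions, spliced alignments and quality filtering are ignored.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical aligned read: 0-based, half-open reference span [start, end). */
typedef struct {
    long start;
    long end;
} read_span;

/* Fills depth[0 .. region_end - region_start) with the number of reads
   covering each base of [region_start, region_end). Returns 0 on success,
   -1 on allocation failure. Runs in O(n_reads + region length). */
int compute_coverage(const read_span *reads, size_t n_reads,
                     long region_start, long region_end, int *depth) {
    assert(region_start < region_end);
    size_t len = (size_t)(region_end - region_start);
    /* Difference array: +1 where a read starts, -1 just past where it ends. */
    int *diff = calloc(len + 1, sizeof *diff);
    if (diff == NULL)
        return -1;
    for (size_t i = 0; i < n_reads; i++) {
        /* Clip each read to the region; skip reads that fall outside it. */
        long s = reads[i].start > region_start ? reads[i].start : region_start;
        long e = reads[i].end < region_end ? reads[i].end : region_end;
        if (s >= e)
            continue;
        diff[s - region_start] += 1;
        diff[e - region_start] -= 1;
    }
    /* A running (prefix) sum turns the difference array into per-base depth. */
    int running = 0;
    for (size_t i = 0; i < len; i++) {
        running += diff[i];
        depth[i] = running;
    }
    free(diff);
    return 0;
}
```

The cost is linear in the number of reads plus the region length, rather than proportional to the total number of aligned bases, which matters when scanning large regions.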

GitHub MetaCPAN

Ensembl Core API

The goal of the Ensembl project is to automatically annotate available genomes, integrate these annotations with other available biological data and make all this publicly available via the web. The range of available data includes comparative genomics, variation and regulatory data.

The Ensembl core API provides a level of abstraction over the Ensembl Core databases and is used by the Ensembl web interface, pipeline, and gene-build systems. External users may find the API useful for automating the extraction of particular data, customizing Ensembl for a particular purpose, or storing additional data in Ensembl.

GitHub API Documentation

Ensembl REST API

A Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server makes it easy to retrieve a wide range of Ensembl data from most programming languages, using standard formats such as JSON and FASTA while minimizing client-side work. It also provides bindings to the popular Ensembl Variant Effect Predictor (VEP) tool, enabling large-scale programmatic variant analysis independently of any specific programming language.

GitHub API Documentation

Disulfinder

Disulfinder is the standalone version of DISULFIND, a popular server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone.

The application is available on Ubuntu and Debian.

Download package

Data Structures and Algorithms

libitree - An interval tree library in C

A simple C library for handling interval trees. An interval tree is a tree data structure that holds intervals, allowing the efficient retrieval of all intervals that overlap with any given interval or point.
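A minimal sketch of the idea in C, assuming closed integer intervals: a binary search tree keyed on each interval's low end, where every node also stores the largest high end found in its subtree. That extra field lets a query skip whole subtrees that cannot contain an overlap. The type and function names are hypothetical and are not the libitree API, and the tree is left unbalanced for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Closed interval [low, high]. */
typedef struct {
    long low;
    long high;
} interval;

/* BST node keyed on iv.low; max_high is the largest high end in this subtree. */
typedef struct itnode {
    interval iv;
    long max_high;
    struct itnode *left, *right;
} itnode;

static long max_long(long a, long b) { return a > b ? a : b; }

/* Inserts iv and returns the subtree root. On allocation failure the
   interval is not inserted and the tree is left unchanged. Not rebalanced. */
itnode *it_insert(itnode *root, interval iv) {
    assert(iv.low <= iv.high);
    if (root == NULL) {
        itnode *n = malloc(sizeof *n);
        if (n != NULL) {
            n->iv = iv;
            n->max_high = iv.high;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (iv.low < root->iv.low)
        root->left = it_insert(root->left, iv);
    else
        root->right = it_insert(root->right, iv);
    /* Recompute the augmented field from this node and its children. */
    root->max_high = root->iv.high;
    if (root->left != NULL)
        root->max_high = max_long(root->max_high, root->left->max_high);
    if (root->right != NULL)
        root->max_high = max_long(root->max_high, root->right->max_high);
    return root;
}

static void query_rec(const itnode *node, interval q,
                      interval *out, size_t cap, size_t *n) {
    /* No interval in this subtree ends at or after q.low: nothing can overlap. */
    if (node == NULL || node->max_high < q.low)
        return;
    query_rec(node->left, q, out, cap, n);
    if (node->iv.low <= q.high && q.low <= node->iv.high) {
        if (*n < cap)
            out[*n] = node->iv;
        (*n)++;
    }
    /* Right-subtree intervals start at or after node->iv.low, so if that
       is already past q.high none of them can overlap. */
    if (node->iv.low <= q.high)
        query_rec(node->right, q, out, cap, n);
}

/* Stores up to cap overlapping intervals in out, ordered by low end,
   and returns the total number of stored intervals that overlap q. */
size_t it_query(const itnode *root, interval q, interval *out, size_t cap) {
    size_t n = 0;
    query_rec(root, q, out, cap, &n);
    return n;
}

void it_free(itnode *node) {
    if (node == NULL)
        return;
    it_free(node->left);
    it_free(node->right);
    free(node);
}
```

A point query is simply an interval with equal ends, e.g. `it_query(root, (interval){35, 35}, out, 6)`. Without rebalancing, inserting intervals in sorted order degrades the tree into a list; a production library would typically keep the tree balanced (for example as a red-black or AVL tree) so that its height stays O(log n).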

I wrote this library while working at Ensembl on problems involving mapping between different genome assemblies, which typically involve intervals representing genomic regions.

GitHub

Tree::Interval::Fast - Efficient interval tree algorithms in Perl

This is a Perl XS wrapper around the libitree library above, providing a simple and fast implementation of mutable interval trees in Perl. I wrote the module because the Ensembl core Perl API needs C-like time and memory behaviour from this kind of data structure and, to the best of my knowledge, no such module is currently available in Perl.

GitHub MetaCPAN

AVLTree - Efficient balanced binary trees in Perl

An AVL tree is a self-balancing binary search tree in which the basic operations (lookup, insertion and deletion) take O(log n) time in both the average and worst cases. As with the interval tree module above, I wrote this because I found no C-based module implementing AVL trees in Perl. It uses the Perl XS extension mechanism to provide a thin wrapper around an efficient C library that does the core of the work. Preliminary benchmarks show this module to be an order of magnitude faster than a pure-Perl implementation.
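The balancing itself can be sketched in a few dozen lines of C. Each node caches the height of its subtree; after an insertion, any node whose two subtrees differ in height by more than one is repaired with a single or double rotation. This is a generic textbook AVL insertion with hypothetical names (`avl_insert`, `avl_contains`, `avl_check`), assuming plain `int` keys; it is not the interface of the C library that the Perl module wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct avl_node {
    int key;
    int height;                  /* subtree height; a leaf has height 1 */
    struct avl_node *left, *right;
} avl_node;

int avl_height(const avl_node *n) { return n ? n->height : 0; }

static void update_height(avl_node *n) {
    int hl = avl_height(n->left), hr = avl_height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

static avl_node *rotate_right(avl_node *y) {
    avl_node *x = y->left;
    y->left = x->right;
    x->right = y;
    update_height(y);            /* y is now below x, so update it first */
    update_height(x);
    return x;
}

static avl_node *rotate_left(avl_node *x) {
    avl_node *y = x->right;
    x->right = y->left;
    y->left = x;
    update_height(x);
    update_height(y);
    return y;
}

/* Restores |height(left) - height(right)| <= 1 at n and returns the new root. */
static avl_node *rebalance(avl_node *n) {
    update_height(n);
    int balance = avl_height(n->left) - avl_height(n->right);
    if (balance > 1) {
        if (avl_height(n->left->left) < avl_height(n->left->right))
            n->left = rotate_left(n->left);    /* left-right case */
        return rotate_right(n);                /* left-left case */
    }
    if (balance < -1) {
        if (avl_height(n->right->right) < avl_height(n->right->left))
            n->right = rotate_right(n->right); /* right-left case */
        return rotate_left(n);                 /* right-right case */
    }
    return n;
}

/* Inserts key (duplicates are ignored) and returns the new subtree root.
   On allocation failure the tree is left unchanged. */
avl_node *avl_insert(avl_node *root, int key) {
    if (root == NULL) {
        avl_node *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->height = 1;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (key < root->key)
        root->left = avl_insert(root->left, key);
    else if (key > root->key)
        root->right = avl_insert(root->right, key);
    else
        return root;
    return rebalance(root);
}

bool avl_contains(const avl_node *n, int key) {
    while (n != NULL) {
        if (key == n->key)
            return true;
        n = key < n->key ? n->left : n->right;
    }
    return false;
}

/* Debug helper: asserts the balance invariant and cached heights, returns the height. */
int avl_check(const avl_node *n) {
    if (n == NULL)
        return 0;
    int hl = avl_check(n->left), hr = avl_check(n->right);
    assert(hl - hr <= 1 && hr - hl <= 1);
    assert(n->height == 1 + (hl > hr ? hl : hr));
    return n->height;
}

void avl_free(avl_node *n) {
    if (n == NULL)
        return;
    avl_free(n->left);
    avl_free(n->right);
    free(n);
}
```

Inserting the keys 1 to 1000 in ascending order would turn an unbalanced binary search tree into a chain of height 1000; with the rotations above the height stays between 10 and 14, consistent with the AVL bound of roughly 1.44 log2(n).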

GitHub MetaCPAN