Welcome to the Home Page of GeNML!

GeNML is a lossless compression program developed specifically for nucleotide sequences. It targets maximum compression performance combined with fast sequence processing, especially on the decoder side. As of May 2005, GeNML achieves the best published lossless compression rates for DNA sequences (see the Results page for a comparison with other encoders). GeNML was designed and developed by Prof. Ioan Tabus and Gergely Korodi, who are currently with the Institute of Signal Processing, Tampere University of Technology, in Tampere, Finland. The Authors would also like to express their gratitude to Prof. Jorma Rissanen, whose inventions were pivotal to their studies and initiated their research on genomic signal compression.

GeNML is derived from an earlier project called NMLComp, developed by Prof. Ioan Tabus, Prof. Jorma Rissanen and Gergely Korodi. NMLComp is a low-complexity DNA encoder with a minimal model, yet it achieves good compression results, comparable to those of encoders of much higher complexity. Its success is primarily attributed to the novel Normalized Maximum Likelihood (NML) model for discrete regression, which has proven to be a very efficient tool for processing approximate sequence matches, a capability that is vital for DNA compression.

The good results of NMLComp prompted the Authors to investigate the full potential of the NML model by relaxing the requirement of low model complexity. The published GeNML algorithm is a complex combination of different encoders, each with its own set of models. This composite method significantly improves on the compression performance of NMLComp, while keeping the processing speed at a comparable level. The GeNML algorithm was introduced and discussed in the Special Section of the January 2005 issue of the journal ACM TOIS (ACM Transactions on Information Systems).

The GeNML Home Page.
Last modified: Sun May 15 05:41:00 EEST 2005