Tab Separated Values (TSV): a format for tabular data exchange

TSV is a very simple textual data format which allows tabular data to be exhanged between applications that use different internal data formats. This document briefly explains this standardized format and gives simple examples of using it.


The TSV format

The Tab Separated Values has been officially registered as an Internet media type (MIME type) under the name text/tab-separated-values. Note that major media type text means that the data is in textual format which can be viewed and edited using a normal text editor; in practice however this is seldom done - instead, the format is programmatically written and read.

The abbreviation TSV (or tsv) is commonly used, and usually the file name suffix for TSV files is ".tsv".

The registry entry is very short, since the format is very simple, but the description is paraphrased here in plain English (as opposite to the partly formalized notation in the registration):

The last rule could be taken just as allowing the labeling of columns that way. Mostly if you wish to include just tabular data as such, it probably works OK. But it's usually recommendable to use some labeling there.

Since TAB is used as a separator between fields, a field cannot contain a TAB. However, TABs usually don't appear in data items that you wish to tabulate, so this is seldom a restriction. There are various other formats which are very similar to TSV but use a different separator, such as Comma Separated Values (CSV). Note that commas, spaces, and other characters often used as separators in such formats appear rather often in data to be tabulated, at least in header fields. Special conventions can be made to allow the inclusion of the separator into data fields, but the extreme simplicity of the data format is then lost.

The example given in the registration is the following, where <TAB> denotes a TAB character:

Paul<TAB>23<TAB>1115 W Franklin
Bessy the Cow<TAB>5<TAB>Big Farm Way
Zeke<TAB>45<TAB>W Main St

Not to be shown as is!

Although TSV is a text format, a TSV file is not expected to appear in a nice tabular format when displayed or printed as such, e.g. by an editor. In special cases it might do that, if tab stops are set suitably in the environment and if the fields in a column have roughly the same width. But even the simple example probably fails when embedded into an HTML document (inside a PRE element, which is the best shot):

Name	Age	Address
Paul	23	1115 W Franklin
Bessy the Cow	5	Big Farm Way
Zeke	45	W Main St

Using TABs in HTML is unreliable, but more fundamentally, tabbing with TAB is in itself not structural. It just takes to some predefined tab stop; typically the tab stops are set at 1st, 9th, 17th, etc. character position. This means that the data would look OK by accident only. And when using TSV, you are not supposed to try to "fix" things by using multiple TABs; that would mean that there are empty fields between the TABs, defeating the very idea of TSV.

What happens if you ask your Web browser to show a TSV file, e.g. by following a link to a file containing our example? It really depends on many things; it's a matter of handling media types, which can be a bit confusing. It most essentially depends on the settings of your browser, but also on how the server sends it. But in any case, you as a user have at least as one option the possibility of saving the file onto local disk, to be processed separately using some suitable program (e.g. MS Excel). So TSV is one way of making tabular data available on a network in a universally accessible way, though it may require some skills on the user side to know what could be done with the data and do something useful with it.

Importing TSV data to MS Excel

We could give a TSV file as input to a spreadsheet application, like MS Excel. Such programs generally accept TSV format data. The data can then be viewed, printed, and manipulated in tabular format. And you can save the data in MS Excel's "native" format (which cannot be processed by other programs unless they have been specifically written to process it) or in TSV format, or in a few other formats. Our simple example data would get displayed roughly as follows:

The data appears in columns labeled A, B, C, and
rows labeled 1, 2, 3, 4 in framed cells. The cells
that contain numbers only are right-aligned, other cells
left-aligned. Cell (A,3) appears as truncated to "Bessy the".

Cells with long content may look truncated. This is Excel's feature and often useful when manipulating spreadsheets. The presentation, both on screen and on paper, can be tuned as needed, using various tools in Excel.

When you have Excel in use, you can open the the File menu, then select Open, and pick up the file to be opened. For finding the file, you may need to set value of the "Files of type:" setting (which is effectively a file type filter) to "All files (*.*)". (The details may vary depending on the version and configuration of the program.)

But you can make things easier by setting things up so that your system (we're assuming some flavor of Windows here) automatically recognizes .tsv files as something to be opened in Excel, when you click on their icons. Assuming there is no such association for .tsv in your system yet, you'll be prompted for one when you first click on the icon of a .tsv file. When prompted for "Description of '.tsv' files:", you could reply e.g. "text/tab-separated-values". Then pick up Excel from the list of programs and click OK.

When TSV data is imported into Excel, some data transformation and interpretation may take place. For example, some strings might be interpreted as dates and converted to a different date format, and Excel generally treats a string of digits as a number and right-justifies it. In such cases, "opening by clicking" could be inadequate, since it might not give you the option of specifying how the data is to be interpreted. Explicitly opening the file in Excel normally lets you use "Text import wizard" where you can specify the interpretation of each column.

Generating TSV data

For example, if you have a program of your own (say, a Fortran or C program, or a Perl script) that writes out some tabular data, and you'd like to be able to open (or let others open) the data in Excel, the TSV approach is rather simple. You just need to know how to write data in TSV format, which is a lot easier than trying to generate MS Excel format. Usually you can just use TAB characters in a string used to format the output, but see next paragraph as regards to editors; in C and Perl for example, you can alternatively use the notation \t inside a character constant or string constant.

And, of course, you could type TSV data "by hand", using a text editor. Note that typing the TAB character may require some extra trick, since many editors process it as command-like, instead of entering it into data as a character. For example, in some modes Emacs turns TABs to sequences of spaces, which won't do in TSV; to prevent this, type control-Q before hitting the TAB key.

As an example, here is Fortran 77 code that writes a matrix is TSV format:

      real a(m,n)
... code that writes data to matrix a ... 
      do 200 i=1,m
 200     print 210, (a(i,j),j=1,n)
 210     format(F10.6,4('	',F10.6))

The code is not particularly elegant, since the number of columns is hard-wired into the format statement, which causes the first number to printed as such and 4 more with a tab character before each of them. Between the apostrophes (single quotation marks), there is exactly one TAB character and nothing more; the appearance on your browser may vary.

Exporting data in TSV format from Excel

When you save a spreadsheet in Excel (using Save in the File menu, or using control-S), it normally uses by default its own internal data format. You can override this by using Save As... (in the File menu) instead and selecting the desired format. Among the available formats, you can pick up "Text (Tab delimited) (*.txt)", which means TSV. (Here, too, you may encounter some variation between versions of Excel.) Note that the default file name suffix will be .txt when; you may wish to override this, reserving .txt for plain text files.

Saving as TSV generally loses formatting information. Quite often this is desirable, since if you e.g. need to E-mail some data or put a table onto the Web, information about the specific font faces, cell widths, and other presentational details are not needed and can cause problems. Note that Excel's own "Save As HTML" includes a horrendous amount of such data. There is software called Office 2000 HTML Filter 2.0 available from Microsoft for free, but it fixes only part of the problem (for Excel files - for Word files, it's pretty good).

You could use this possibility for the job of making data available on the Internet, or in an intranet. If you have the data in Excel format, it will be less universally accessible than e.g. TSV format. Although there are Excel viewers available free of charge for Windows, not everyone wants to install such software just to casually view some tabular data, and not everyone uses Windows at all. And even if people have Excel installed, they might be unwilling to open documents from unknown origin, in fear of macro viruses for example. So TSV might be a good idea at least as an alternative. You could always make the data available both in Excel format and as TSV, especially since saving as TSV may remove some formatting information.

Converting from TSV format to HTML table format

For putting data onto the Web in particular, you could additionally convert the data from TSV format into an HTML table and make it a separate HTML document, or insert it into an HTML document, e.g. as follows:

Paul231115 W Franklin
Bessy the Cow5Big Farm Way
Zeke45W Main St

Since TSV is a very simple format, and since the basic structure of HTML tables is very simple, such a conversion is rather trivial as soon as you have any programming tool. At the simplest, you could simply add the <table> tag before the data and the </table> tag after it, and insert <tr><td> at the start of each data line, and replace each tab with a <td>. That way, each line would be turned to a table row and each field to a table cell. However, this would misrepresent the first line, which is a line of header fields, at least by TSV definition, so it would be better to use th rather than td there. There are some additional considerations which make a somewhat more advanced strategy better. The following Perl code (also available as a separate file) tries to produce nice HTML markup and uses very simple heuristics: it assumes that any field which contains only digits and spaces should be right-aligned in a cell.

print <<END;
<table border="1" cellspacing="0" cellpadding="4">
chomp ($_ = <STDIN>);
print "$_</th></tr>\n";
print "</tr>\n</thead>\n<tbody>\n";
while (<>) {
    print "<tr>\n";
    my @fields = split('\t');
    for $cell(@fields) {
	if($cell =~ /^(\d|\s)+$/) {
	    print "<td align=\"right\">$cell</td>"; }
	else {
	    print "<td>$cell</td>"; } }
    print "</tr>\n"; }
print "</tr>\n</tbody>\n</table>\n";

The code creates just the table markup, so the result can be simply inserted into an existing HTML document. The example table above was created that way.

You don't need to know Perl in order to use that program. (Of course you'd need some understanding of Perl in order to modify the program for your needs. But you might also consider editing the result using an editor, if you just wish to e.g. add an align="left" attribute into the first <tr> tag, to make the header cells left-aligned.) But you need a Perl interpreter, and you know how to run it. (Typically, you would, after saving the program under the name in a file, and having TSV data in a file tsv.tsv, go to command prompt level and just give a command like perl <tsv.tsv >tsvex.html to get the converted data into file tsv.html.) There are free Perl interpreters available for virtually every platform. For Win32, many people recommend ActivePerl.

Date of creation: 2000-09-01. Last update: 2005-02-12.

Jukka Korpela.