File input (or “upload”) in HTML forms

A form in an HTML document (Web page) can contain an input element with type="file". This may let the user include one or more files into the form submission. The form is often processed so that such files are stored onto the disk of the Web server; this is why file input (or file submission) is often called “file upload.” File input opens interesting possibilities, but browser support is still limited and generally of poor quality even in the newest versions. Moreover, users are often puzzled by it, since most people use file input rather rarely.

This is a legacy document, with many references to outdated browser versions. It does not cover features such as the File API or the new features in file upload in HTML5.

This document presents

The basics

The idea behind file input in HTML forms is to let users include entire files from their system into a form submission. The files could be text files, image files, or other data. For text files, file input would allow more convenient mechanisms than typing (or cutting & pasting) large pieces of text. For binary data, such as images, file input would be not just more convenient but usually the only practical way. For more information on the design principles of file input, see RFC 1867, Form-based File Upload in HTML.

Writing an HTML form with a file input field is rather simple. The difficult thing is actually to find or write a server-side script which can do something useful when it receives data in such a format. And the really difficult thing is to make such processing robust and controlled so that all data is processed properly and so that someone won’t e.g. fill your server’s disk space with gigabytes of junk, by ignorance or by malevolence.
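As a rough illustration of the “controlled” processing mentioned above, here is a minimal Python sketch of reading a submission with an upper size limit. The names read_limited, UploadTooLarge and MAX_UPLOAD_BYTES are made up for this example, and the limit value is arbitrary; a real form handler would normally rely on the limits offered by its server or framework.

```python
import io

MAX_UPLOAD_BYTES = 1_000_000  # hypothetical limit; choose one that suits your server


class UploadTooLarge(Exception):
    """Raised when a submission exceeds the configured size limit."""


def read_limited(stream, limit=MAX_UPLOAD_BYTES, chunk_size=64 * 1024):
    """Read a binary stream to its end, refusing to accept more than `limit` bytes."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of input
            break
        total += len(chunk)
        if total > limit:
            # Stop as soon as the limit is passed instead of reading the rest.
            raise UploadTooLarge(f"submission exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


# Usage: a small in-memory stream stands in for the request body.
data = read_limited(io.BytesIO(b"some submitted data"), limit=100)
```

The point of reading in chunks is that the handler can give up early, before a huge submission has been copied into memory or onto disk.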

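You need to know the general basics of writing HTML forms; if you need links to tutorials and references on forms, consult How to write HTML forms. Then, what you need to do in HTML is to write a form so that the form element has the attributes method="post" and enctype="multipart/form-data", and the form contains an input element with type="file" and a name attribute.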

Minimally, the form needs to contain a submit element too. It may also contain any other fields you like, and explanatory texts, images, etc.

A common problem with file input in forms is that form data gets sent but only the name of the file is included. The reason is typically that the form element does not contain the attributes mentioned above.

Since browser support for file input is still problematic, consider providing alternative methods of submitting data, too.

It is hopefully evident that what happens in file input is the submission of a copy of the file content. The file on the user’s disk remains intact, and the server-side script cannot change it, only the copy of the data.

Setting up a server-side script

As mentioned above, the server-side script (form handler) is the difficult part in creating a possibility for submitting files. There are useful brief notes on that in the FAQ entry, but it is a difficult programming issue, and outside the scope of this document of mine. I just wish to emphasize, in addition to the security issues discussed below, that what happens to the data after submission is in the hands of the server-side script. It could “upload” it, i.e. save it onto the server’s disk under some name, but it might just as well process the data only by extracting some information from it, or send the data by E-mail somewhere, or even send it to a printer. For example, the WDG HTML Validator provides, as one alternative, a page containing a form for submitting a file for validation.

There are different server-side techniques for processing forms, so you need to consult documentation applicable to the technique you use, which is usually dictated by the characteristics of the server software. In particular, if you use CGI, it can be useful to check section Programs and Scripts: Perl: File Uploading in CGI Resource Index. (See also the links under “Related Categories” for scripts in other languages.) You might find a script suitable for your purposes, or at least ideas for writing your own script. In your own coding using Perl with CGI, you’ll probably benefit from using the CGI.pm module; see especially section Creating a file upload field in its documentation, and my Fool’s Guide to CGI.pm. As another example, if PHP is what you use, see section Handling file uploads in PHP Manual. For ASP, see e.g. Pure ASP File Upload by Jacob Gilley.

Example

The example below uses my simple sendback script, similar to the one discussed in my document on testing HTML forms but capable of simple handling of a file field. It simply echoes back the data it gets, but presented so that your browser will display it nicely; for a file field, only the first 40 octets (bytes) are shown.

The HTML markup is:

<form action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/echo.cgi"
enctype="multipart/form-data" method="post">
<p>
Type some text (if you like):<br>
<input type="text" name="textline" size="30">
</p>
<p>
Please specify a file, or a set of files:<br>
<input type="file" name="datafile" size="40">
</p>
<div>
<input type="submit" value="Send">
</div>
</form>

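And on your browser, with its current settings, and as possibly affected by my stylesheet, this is what the form looks like:

(The rendered form: a text input field labeled “Type some text (optionally):” and a file input field labeled “Please specify a file, or a set of files:”, followed by a submit button.)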

How it was intended to work

RFC 1867 describes, in section 3 Suggested implementation, how file input was intended to take place in a typical situation:

3.1 Display of FILE widget

When a[n] INPUT tag of type FILE is encountered, the browser might show a display of (previously selected) file names, and a “Browse” button or selection method. Selecting the “Browse” button would cause the browser to enter into a file selection mode appropriate for the platform. Window-based browsers might pop up a file selection window, for example. In such a file selection dialog, the user would have the option of replacing a current selection, adding a new file selection, etc. Browser implementors might choose let the list of file names be manually edited.

If an ACCEPT attribute is present, the browser might constrain the file patterns prompted for to match those with the corresponding appropriate file extensions for the platform.

Upon form submit, the contents of the files would then be included into the data set sent, as defined by the specification of the multipart/form-data data type (data format, data encoding).

Browser support to file input

Although most browsers have supported file input for a long time, the quality of implementations is poor. Therefore users easily get confused with file input.

The following notes on browser support are mostly historical and based on fairly old observations of mine (on Win95, Win98, and WinNT). These notes are followed by more interesting notes on users’ problems, especially those caused by the poor quality of support on modern browsers.

Internet Explorer

IE 3.0 displays an input box and lets the user type a filename there—and it sends the name as part of the form data! Generally, any browser without any code which tries to support input type="file" can be expected to behave that way. (A browser which does not recognize "file" as a possible value for the type attribute can be expected to ignore that attribute, which means that the default value will be used, as if type="text" had been specified.)

IE 4 has an input box and a “Browse” capability, and it actually sends the file content, but it still allows one file only to be selected. The “Browse” function display is unfiltered, i.e. all files which are normally visible are selectable. There does not seem to be any improvement in this respect in IE 5, or IE 6, or IE 7.

Netscape

According to Netscape’s documentation on file input, support to it exists already in Netscape 2.

Netscape 4 support to file input has a “Browse” capability, too, but the browsing has by default a filter which limits selectability to “HTML files”. The user can manually change this, though it is questionable how familiar users are with such things. Only one file can be specified. There does not seem to be any improvement in this respect in Netscape 4.5. Here is an example of the user interface:

(A popup window, titled "File Upload", with a typical
Windows-style set of directory and file icons, and basic
functionality for navigation in the directory hierarchy.
There is a field (initially empty) for File name, and a pulldown
menu named Files of type, initially set to HTML Files.)

Mozilla

The above-mentioned strange feature of Netscape has been fixed in Mozilla, which uses no filter (i.e. displays all files); on the other hand it (at least in several versions) gives no user option to switch to a filtered view!

Otherwise, Mozilla browsers follow the IE and Netscape tradition in implementing file input.

Opera

Opera supports file input rather well. It provides a “Browse” menu, though the button for activating it carries the label “...”, which might be somewhat confusing. It lets the user specify several files from the menu:

It isn’t perfect though. The Browse window is rather small, and it is impossible to pick up several ranges, i.e. you must click on the files individually unless you want to select just one contiguous range. And the box for file names is quite small too, and its size is not affected by the size attribute. See also notes on setting the default filename.

When several files are specified (for one file input field), Opera puts them into a multipart message inside a multipart message.

Safari

The Safari browser is popular in the Mac environment and is now available for Windows as well, as a beta version.

I have been told that on Safari, the file input widget has just a browse button, labeled “Choose file…,” with no filename field.

Users’ problems with file input

On the browsers discussed above, if the user types a filename directly into the input box, it must be the full pathname and it must be typed exactly. If the input is not a name of an existing file (e.g. due to a typo), then the form will be sent as if an empty file had been specified (though with the name given by the user), and no warning is given. People who encounter file input for the first time might be expected to get very confused, since the filename box appears first and looks like an area where the user should type something.

The user probably often wishes to view the contents of files in the dialog, since it is difficult to select the file on the basis of its name only. On Windows systems, the browsers discussed here seem to use widgets where normal clicking on a file icon selects it, and to open it (in some program) one needs to use right click and select a suitable action. I guess most users won’t find that out without being helped. The following screen capture presents the dialogue on IE 4 (on WinNT) in a situation where the user has right clicked on an icon and an action menu has popped up and the user is about to select the Open action (which would, in this case, probably open the .jpg file in a graphics program or in a new browser window).

(A Choose file dialog, with a popup menu on top of it,
with Select as the first and highlighted alternative.
The Open alternative has been focused on. There are
other alternatives below it, e.g. Add to Zip, Send To,
and Properties.)

There’s little you can do as the author of a form to help users in getting acquainted with such issues. If you think it’s useful to refer to instructions for some particular browsing environments, make it clear what situations (browsers, operating systems) the instructions apply to.

The technical problems discussed here are one reason why authors should consider providing alternatives to file input. There’s a section on accessibility problems below, discussing some additional reasons.

The appearance of the Browse button and the filename box

All the browsers mentioned above use essentially similar appearance for the widget used to implement a file input element: a text input box for the filename looks similar to normal text input elements (<input type="text">), and the Browse button resembles submit buttons (thus, is often grey), and it has the text “Browse” or its equivalent in another language.

That text is under the control of the browser, not the author. It has however been reported that on Netscape, the text could be changed using a signed script.

This similarity in appearance is somewhat problematic, since it does not make the essential difference between submit and browse buttons visually obvious. Cf. similar problems with reset buttons.

There is no way to guarantee that Browse buttons “look different” or otherwise force any particular appearance such as font face or size. See the document Affecting the presentation of form fields on Web pages for an overview and examples. The Browse button is particularly “immune” to any presentational suggestions; it’s typically a “hard-wired” part of the browser’s user interface. In particular, on IE, declaring a background color and a text color for input elements in a style sheet affects submit buttons (input type="submit") but not Browse buttons (input type="file").

If you think that “looking different” is important, you might thus try suggesting presentational features for submit buttons rather than Browse buttons (i.e., for input type="file" elements). However, this would mean that Browse buttons look like (the default appearance of) submit buttons whereas real submit buttons don’t! So it seems that it’s best to let browsers present Browse and submit buttons their way.

The input box for the filename, on the other hand, seems to be affected by similar factors as normal text input boxes. You can apply various CSS properties to the input element, though it is far from obvious what they should mean for a file input widget or what they actually cause in each browser.

Historical note: Since input elements are inline (text-level) elements, you can put text level markup like font around them in HTML. However, such markup is often ignored when rendering form fields. For example, <font size="4" face="Courier"><input type="file" ...></font> might increase the font size and set the font to Courier. Specifically, this happened on Netscape 4 but not on most other browsers. (As a side effect, on Netscape 4, such a font size change affected the dimensions of the Browse button but not the font size of the text “Browse”. Note that if you included a color attribute there, Netscape ignored it.)

You could suggest presentational properties in a style sheet too, e.g.
<input type="file" ... style="color:#f00; background:#ccc">
and these in turn would be ignored e.g. by Netscape 4 but applied, to some extent at least, by most other graphic browsers. It is difficult to say how CSS rules should affect the widget, since it is an open question whether e.g. the text of the Browse button (which is not part of the textual content of the HTML document) should be formatted according to the font properties of the input element. (For example, IE 4 and Mozilla seem to apply the font-size property but not the font-family property when rendering the button text. IE 6 applies font-family too.)

The following example demonstrates how your browser treats a file input element where we suggest presentational properties both in HTML and in CSS:

The example has the HTML markup
<b><tt><big><input type="file" ...></big></tt></b>
and the following CSS declarations applied to that input element:
color:#630; background:#ffc none; font-size:160%; font-family:Courier,monospace; font-weight:bold
Such suggestions might help in making it clearer to users that there is a special input box. But try to avoid making it look too special, since there is then the risk of not getting intuitively recognized as an input box at all.

At Quirksmode.org, there is a longish article that discusses fairly complex CSS techniques for changing the appearance of file input elements, in a sense: Styling an input type="file". I would however advise against any substantial changes in the appearance. Any esthetic improvement over browser defaults (in addition to being a matter of taste) has a price: it makes even the experienced user uncertain of what the widget is.

File input is a challenge to many users

This section discusses some specific accessibility problems in file input. For an overview of what accessibility is and why it is important, please refer to the Guide to Web Accessibility and Design for All.

It has been reported that some special-purpose browsing software, such as some versions of the JAWS screen reader, has serious difficulties in file input. This is understandable, since the common implementation in browsers is oriented towards visual interaction.

Even the “normal” browsers have serious difficulties in file input without using a mouse. (There are different reasons, including physiological and neurological problems, why the user may need to work without a mouse or other pointing device.) In Internet Explorer 6, you can select the Browse button by tabbing, but if you try to use the keyboard to activate it, hitting the Enter key, the browser submits the form instead! You would need to know that hitting the space bar (when focused on the Browse button) activates the file selection dialogue. Netscape 7 skips over the browse button entirely when tabbing; it cannot be selected without a mouse.

Not surprisingly, on Opera things work reasonably. The user can select the Browse button using the tab key and activate it by pressing the enter key, then select a file for upload from the file system; you would use the arrow keys to move around in the file selection.

On the Lynx text browser, at least on Lynx 2.8.4 on Unix, there is no Browse button, and there is no dialogue for accessing the computer’s file system. Thus, the user needs to know the exact path name and syntax to type in the file name for upload, as is apparently also the case for IE and Netscape.

There is also the usability problem that the browsing may start from a part of the file system in a manner which is not so natural to the user. The initial selection might be e.g. that of the directory where the Web browser itself resides! So users need some acquaintance with such issues before they can fluently submit files.

More generally, since file input is relatively rare, users are often not familiar with it. They might not recognize the Browse button, and might have difficulties in understanding what’s going on when they click on it (or fail to click on it).

Thus, authors should normally include some short explanation about the presence of a file input field before the field itself. This can usually be done in a natural way, explaining simultaneously what kind and type of file should be submitted.

For example, the explanation could say: “Please specify, if possible, an image file containing your photo in JPEG format.” Such a note may not help much when a user encounters such a field for the first time in his life, but it helps him to associate the eventual problems with a concept of file input and to explain his problems when seeking help. And if he has tried to use file input before, it tells him to stay tuned to something special, and perhaps at this point, before entering the file input field, to access the file system outside the browser and find the exact path name of the file he wants to submit.

Character encoding problems

The file is submitted as such, without code conversions. A plain text file is submitted without information about character encoding, so the recipient needs to guess the encoding or infer it somehow.

For example, suppose that you have a UTF-8 encoded form and that it is used to submit a plain text file. If the user wrote the file using Notepad, it will (by default) be in windows-1252 encoding, and its content is sent as such, declared just as text/plain (no charset attribute), even though contents of normal fields are UTF-8 encoded. The server-side form handler has no direct way of knowing what the encoding is, so how can it meaningfully process the data?

In general, the browser cannot tell the encoding, so it can neither declare it nor code-convert the data. The reason is that commonly used file systems lack indication of the character encoding of a plain text file; it just needs to be known.

Thus, if your form is meant for submitting plain text files, your best option is probably to ask users to save their text files in UTF-8 encoding with BOM (Byte Order Mark). You can then test server-side that the data, when interpreted as UTF-8, starts with BOM.
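The server-side test suggested above could look roughly like the following Python sketch. The function name decode_utf8_with_bom is made up for this illustration; the sketch assumes that the form handler already has the file content available as a byte string.

```python
import codecs


def decode_utf8_with_bom(data: bytes) -> str:
    """Decode submitted bytes as UTF-8, insisting on a leading byte order mark."""
    if not data.startswith(codecs.BOM_UTF8):
        raise ValueError("no UTF-8 byte order mark at the start of the data")
    # Drop the three BOM octets and decode the rest strictly, so that
    # data that is not valid UTF-8 is rejected rather than silently mangled.
    return data[len(codecs.BOM_UTF8):].decode("utf-8")


# Usage: a file saved as UTF-8 with BOM is accepted.
text = decode_utf8_with_bom(codecs.BOM_UTF8 + "café".encode("utf-8"))
```

A file saved in windows-1252 would lack the BOM and be rejected, at which point the script could ask the user to save the file again in the requested encoding.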

How to provide alternatives

There are several possible ways to let people submit their files even when their browsers do not support file fields in forms (or the support is of so poor quality that they don’t want to use it).

You could include a TEXTAREA element into the form. This would work especially for text files in the sense that a user could open his file in an editor and cut & paste the data into the textarea. Naturally, this becomes awkward for large files, but it might still be a good idea to have a textarea along with a file input field. Your server side script would need some more code to handle both.

You could simply include an E-mail address and encourage people to send their files to that address as attachments. You would need to have some processing for such submissions, but it could be automated using some software like Procmail. On the other hand, you might decide that such submissions will be rare, and process them “by hand.” Make sure the address is visible on the page itself. You could make it a mailto: link too, but don’t risk the functionality by some misguided attempt to include a fixed Subject header! Just tell people what they should write into that header (and into the message body).

Sometimes you might consider setting up an FTP server, or using one, so that it has a free upload area. You would then just specify the server and the area, and people could use their favorite FTP clients. Note that for the submission of a large number of files, FTP would be more comfortable than using a form with a file input field.

Especially for local users, you could just give a physical address to which people can bring or send their files e.g. on diskettes or CD roms. Make it clear to them in advance which media and formats you can handle that way.

References

See also notes on RFC 1867.

Notes on client-side scripting issues

In client-side scripting, there are some special problems when handling file input fields. The JavaScript Form FAQ contains answers to such questions:

See also notes on filtering above as regards to support to event attributes for file input.

Technical notes

The HTML 4.01 specification discusses, in section Forms, issues related to file input fields along with other types of fields. The notes below hopefully help in locating and interpreting the relevant portions.

The enctype attribute

The HTML 4.01 specification defines an enctype attribute for the form element. Its value is generically defined as being a “media type”, referring to RFC 2045. (That RFC is actually just one part of a large set of documents which specify what media types are. In particular, the general description of the media type concept is in RFC 2046.)

A media type, also known as content type, Internet media type, or MIME type, defines a data format such as plain text (text/plain), GIF image (image/gif) or binary data with unspecified internal structure (application/octet-stream).

But in the context of form submission, the use of a media type as the value of the enctype attribute is meaningful only if there is a definition of the conversion to be done. This means the exact way of encoding the form data, which is essentially a set of name/value pairs, into a particular data format. The definition must be rigorous, since otherwise it is impossible to process the data in a useful, robust way by computer programs.

The HTML specification defines two possible values for enctype:

enctype="application/x-www-form-urlencoded" (the default)
This implies a simple encoding which presents the fields as name=value strings separated by ampersands (&) and uses some special “escape” mechanisms for characters, such as %28 for the “(” character. It’s confusing if people try to read it—it was meant to be processed by programs, not directly read by humans!
enctype="multipart/form-data"
This implies that the form data set is encoded so that each form field (more exactly, each “control”) is presented in a format suitable for that field, and the data set as a whole is a multipart message containing those presentations as its components. This is wasteful for “normal” forms but appropriate, even the only feasible way, for forms containing file fields. The multipart structure means that each file comes in a nice “package” inside a larger package, with a suitable “label” (content type information) on the inner “package.” This type was originally defined in RFC 1867 but it is also discussed in RFC 2388 (see notes on the RFCs later).
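To give an idea of what the default encoding looks like, the following Python sketch encodes two fields with the standard library routine urllib.parse.urlencode and decodes them again. The field names and values are just examples, and the exact set of characters that a given browser escapes may differ in details from what this routine does.

```python
from urllib.parse import parse_qs, urlencode

# Example form data set: a list of name/value pairs.
fields = [("textline", "a (test) & more"), ("lang", "en")]

encoded = urlencode(fields)
print(encoded)  # textline=a+%28test%29+%26+more&lang=en

# Decoding gives back the original values (as lists, since a name may repeat).
decoded = parse_qs(encoded)
print(decoded["textline"][0])  # a (test) & more
```

Note how the ampersand inside a value is escaped as %26, so that it cannot be confused with the ampersand that separates the fields.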

Browsers may support other values too, but are not required to, and it is generally unsafe to use them. Sometimes people use enctype="text/plain", and text/plain is per se a well-defined media type; but there is no specification of the exact method of encoding a form data set into such a format, and browsers are not required to support such an attribute—so anything may happen if you use it.

Normally you should not try to re-invent the wheel by writing code which interprets (decodes) the encoded form data. Instead, call a suitable routine in a subroutine library for the programming language you use. It typically decodes the data into a convenient format for you to process in your own code.
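As one illustration of using a library routine for the decoding, the following Python sketch hands a multipart/form-data body to the email package of the Python standard library, which understands MIME multipart structure. This is only a sketch: the function name parse_form_data is made up, the email package was designed for mail messages rather than for form submissions, and in practice you would probably use the parser that comes with your server framework.

```python
from email.parser import BytesParser
from email.policy import default


def parse_form_data(content_type: str, body: bytes):
    """Return a list of (field name, filename or None, content bytes) tuples."""
    # The boundary is given in the Content-Type header of the request, so we
    # put that header in front of the body to get a parseable MIME message.
    header = f"Content-Type: {content_type}\r\nMIME-Version: 1.0\r\n\r\n"
    message = BytesParser(policy=default).parsebytes(header.encode("ascii") + body)
    fields = []
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()  # None for ordinary (non-file) fields
        fields.append((name, filename, part.get_payload(decode=True)))
    return fields


# Usage with a hand-written body resembling what a browser might send.
body = (
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="textline"\r\n'
    b"\r\n"
    b"hello\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="datafile"; filename="foo.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"file content\r\n"
    b"--XyZ--\r\n"
)
for name, filename, content in parse_form_data("multipart/form-data; boundary=XyZ", body):
    print(name, filename, content)
```

Each part carries its own headers, which is how the field name and the optional filename parameter reach the script.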

It seems that the HTML 4.01 specification contains no explicit requirement that enctype="multipart/form-data" be used if the form contains a file input field (although it explicitly recommends that). But e.g. IE 4 and Netscape 4 handle form submissions incorrectly if the enctype is defaulted in such a case: they send the name of the file instead of its content!

Submitting several files?

The HTML 4.01 specification uses the term file select for the “control” (i.e. form field) created by an input type="file" element. It specifies file select so that this control type allows the user to select files so that their contents may be submitted with a form. Note the plural “files”—the idea is clearly that one such field should allow the inclusion of several files.

Note that there is nothing an author needs to do, and nothing he can do, to make a browser allow the selection of several files per input field. It depends on the browser whether that is possible.

However, as described above, the current browser support is poor: only some versions of Opera support multi-selection, and these do not include the newest versions. And in fact, even if a browser allows users to pick up several files for one input type="file" field, users might not know that they can do that, or how they can do that!

Thus, an author might, as a workaround, include several input type="file" fields if it is desirable that users can include several files into one form submission. Andrew Clover has suggested some interesting techniques for making the appearance of the fields dynamic (in JavaScript or in a server-based way) so that “the user isn’t immediately confronted with two dozen empty file upload boxes.”

Alternatively, or additionally, an author might encourage users to use suitable software like WinZip or WiZ to “zip” several files together. Naturally the server-side script must then be somehow prepared to handle zipped files.
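Being “prepared to handle zipped files” could mean something like the following Python sketch, which reads the members of a submitted zip archive in memory using the standard zipfile module. The function name read_zip_members and the size limit are assumptions made for this example. The declared sizes in an archive are not fully trustworthy, so this is not a complete protection against maliciously constructed archives.

```python
import io
import zipfile

MAX_TOTAL_UNCOMPRESSED = 10_000_000  # hypothetical limit on the expanded size


def read_zip_members(data: bytes, max_total=MAX_TOTAL_UNCOMPRESSED):
    """Return {member name: content} for the regular files in a zip archive."""
    members = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = [info for info in archive.infolist() if not info.is_dir()]
        # Refuse archives that claim to expand to more than we accept.
        if sum(info.file_size for info in infos) > max_total:
            raise ValueError("archive expands to more than the allowed size")
        for info in infos:
            members[info.filename] = archive.read(info)
    return members


# Usage: build a small archive in memory and read it back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", "alpha")
    archive.writestr("docs/b.txt", "beta")
print(sorted(read_zip_members(buffer.getvalue())))
```

The sketch deliberately keeps the contents in memory instead of extracting them onto disk under the names given in the archive, since those names come from the user and should not be trusted as such.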

Setting the default filename

The HTML 4.01 specification describes the value attribute for a file input field by saying that browsers (user agents) “may use the value of the value attribute as the initial file name.” This however is usually not supported by browsers. The usual explanation is “security reasons.” And indeed it would be a security risk if files from the user’s disk were submitted without the user’s consent. It might be all too easy to lure some users into submitting some password files! But in fact RFC 1867 duly notes this problem; in section 8 Security Considerations it says:

It is important that a user agent not send any file that the user has not explicitly asked to be sent. Thus, HTML interpreting agents are expected to confirm any default file names that might be suggested with <INPUT TYPE=file VALUE="yyyy">.

It also mentions (in section 3.4) that the use of value “is probably platform dependent” but then goes on: “It might be useful, however, in sequences of more than one transaction, e.g., to avoid having the user prompted for the same file name over and over again.” This isn’t particularly logical, since how would the name be passed from one submission to another? (The mechanism for getting the original file name would be quite unreliable for such purposes.) A more useful application could be this: Assume that your form is for reporting a problem with a particular program, say Emacs, and that program uses a configuration file with some specific name, say .emacs, so that you would very much like to get the user’s config file for problem analysis. Setting the default name, if supported by the browser, might be an extra convenience to the user.

Thus, security is not a valid reason for leaving the attribute unsupported; browsers have simply failed to implement it. This isn’t a very important flaw, however. The situations where it would make sense to suggest a default file name are rare.

Netscape’s old HTML Tag Reference says, in the description of input type="file", that “VALUE=filename specifies the initial value of the input element,” but no actual support to this in Netscape browsers has been reported. Similar considerations apply to the corresponding item in Microsoft’s HTML Elements reference. It additionally messes things up by describing the intended meaning wrong: “Sets or retrieves the value of the <INPUT type=file>.” The description links to a description of the value attribute which says: “The value, a file name, typed by the user into the control. Unlike other controls, this value is read-only.” This probably relates to using the value property in client-side scripting. And in fact, one can read the value in JavaScript (and get the filename entered by the user) but setting it is unsuccessful (without an error message); the same applies to Netscape (but on Opera, even an attempt to read the value seems to confuse the browser). Note that the examples in the above-mentioned documentation do not contain an input type="file" element with a value attribute.

However, support to file input in several versions of Opera handles the value attribute in the following way:

Such support, however, is absent in Opera 7.54, for some reason.

The following form contains a file input field with value="C:\.emacs". Your browser probably just ignores that attribute, but some browsers may use it to set the initial file name:

An example of Opera’s security alert in the situation discussed above:
! The files listed below have been selected, without your
intervention, to be sent to another computer. Do you want to
send these files?
Destination  http://yucca.hut.fi/cgi-bin/sendback.pl
Form URL     http://www.hut.fi/u/jkorpela/forms/filedemo.html
C:\emacs
             OK    Cancel    Help

There was a short-time bug in Opera 6 that created a security hole, which would have let authors grab users’ files without their knowing, i.e. bypassing the dialogue described above.

Getting the original name

RFC 1867 says:

The original local file name may be supplied as well, either as a ‘filename’ parameter either of the ‘content-disposition: form-data’ header or in the case of multiple files in a ‘content-disposition: file’ header of the subpart. The client application should make best effort to supply the file name; if the file name of the client’s operating system is not in US-ASCII, the file name might be approximated or encoded using the method of RFC 1522. This is a convenience for those cases where, for example, the uploaded files might contain references to each other, e.g., a TeX file and its .sty auxiliary style description.

But note that this appears in subsection 3.3 of section 3, Suggested Implementation. Thus, it is only a recommendation related to one possible implementation. You shouldn’t count on having a filename included.

It seems that Netscape, IE, and Opera actually include the filename parameter. However, only Opera uses the format which seems to be the intended one, as deduced from the examples in RFC 1867 (section 6), namely a relative name like foo.txt, not a full pathname like C:\mydocs\foo.txt. Internet Explorer 7 beta preview behaves similarly, and this has been explained as a security improvement.

Is the Netscape and IE behavior really incorrect? Well, since most computers have some sort of path name system for file names, one would expect to see path names in examples if the intent had been that path names are sent. This is consistent with the fact that actually using the file names for some meaningful purpose (like the one mentioned in RFC 1867: “the uploaded files might contain references to each other, e.g., a TeX file and its .sty auxiliary style description”) clearly calls for relative file names. When path names are sent, things get much more complicated, since their specific syntax (and interpretation) is strongly system-specific, and there is not even a provision for telling the server what the browser’s file system is. Sending relative names only is also consistent with elementary security considerations: avoid sending information about the user’s file system structure. Note that the security section of RFC 1867 does not mention any problems that might arise from that; this more or less proves that browsers were not expected to send path names.

The idea of including a filename parameter makes sense, of course, and would apply e.g. to a file submission containing a set of HTML documents referring to each other with relative URLs. However, it’s clear that the processing script would need to strip off the path part of the names (which is in principle risky since C:\mydocs\foo.txt could be a relative filename on many systems!). Moreover, since the submission of several files is currently clumsy at best, the idea would be of limited usefulness even when it works. (Collections of files that refer to each other by names would be best handled as packaged into formats such as application/zip, leaving the file name issue to be handled by zipping and unzipping programs, which can preserve relative names as well as relative directory structures.)
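As a minimal sketch of what stripping off the path part could look like, consider the following JavaScript function. The name stripPath is hypothetical, and the function simply assumes that both the slash and the backslash act as directory separators, since the form handler cannot know which system the name came from:

```javascript
// Hypothetical helper: reduce a client-supplied filename parameter
// to its last component.
// Assumption: both "/" and "\" are treated as directory separators,
// because the client's file system is unknown to the form handler.
function stripPath(filename) {
  var parts = String(filename).split(/[\\\/]/);
  return parts[parts.length - 1];
}
```

With this sketch, both C:\mydocs\foo.txt and /home/user/foo.txt would be reduced to foo.txt. The risk mentioned above remains: on a system where the backslash is an ordinary character in file names, a legitimate name would be truncated.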

The size attribute

Although the user is not expected to type the filename(s) into a filename box but to use the Browse function, the size of the box matters. When the user selects a file by clicking on it, the browser puts the filename into the filename box, and the name is a full pathname which can be quite long. It may confuse users if they see the name badly truncated.

The definition of input type="file" in the HTML 3.2 specification said:

Just like [for] type=text you can use the size attribute to set the visible width of this field in average character widths.

And most browsers seem to treat the size attribute that way.

But the HTML 4.01 specification defines the size attribute for an input element as follows:

This attribute tells the user agent the initial width of the control. The width is given in pixels except when type attribute has the value "text" or "password". In that case, its value refers to the (integer) number of characters.

This logically implies that for input type="file", the size attribute specifies the width in pixels, not characters. This is probably an oversight, and the risk of a browser acting literally according to it is negligible.

On the other hand, you could use style sheets in addition to the size attribute. Using e.g. the attribute style="width:25em" could override the size attribute; this currently seems to happen on IE 4 and newer only, but it should do no harm on browsers which don’t support it. However, note that although it might seem attractive to use style="width:100%", asking the browser to use as wide a box as possible, there’s the problem that at least IE 4 puts the Browse button on the same line as the box. Thus you would in effect force horizontal scrolling! Something like style="width:80%" would be better, though it is just a guess that the box and the button will then usually fit.

Setting restrictions on the file size

Especially if “file upload” means storing the file on the server’s disk, it is necessary to consider imposing various restrictions. It would be nasty if some user filled the disk with gigabytes of junk, by ignorance, or by misclicking, or by malevolence. See section Avoiding Denial of Service Attacks in the documentation of CGI.pm; even if it isn’t directly applicable to you since you use other techniques than CGI and Perl, it gives some food for thought in general.

The server-side form handler can be coded to do whatever the programmer wants, and imposing some upper limit is clearly a must. (That is, the code should check for the input size, and discard, or otherwise process in a special way, submissions exceeding a reasonable limit.)
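The kind of check meant here might be sketched as follows, in JavaScript for uniformity with the other examples in this document. The names MAX_UPLOAD_BYTES, declaredSizeAcceptable and makeByteCounter are hypothetical, and the limit is an arbitrary assumption. The sketch assumes that the handler can see the Content-Length value of the request (which covers the entire form data set, not just the file) and can count the bytes as it reads them:

```javascript
// Hypothetical limit; choose a value that suits the application.
var MAX_UPLOAD_BYTES = 2 * 1024 * 1024;

// First check: reject early on the basis of the declared length.
// A missing or malformed value is not accepted.
function declaredSizeAcceptable(contentLength, limit) {
  var n = parseInt(contentLength, 10);
  return !isNaN(n) && n >= 0 && n <= limit;
}

// Second check: count the bytes that actually arrive, since the
// declared length comes from the client and cannot be trusted.
// The returned function is called once per chunk read; a false
// result means that reading should stop and the data be discarded.
function makeByteCounter(limit) {
  var received = 0;
  return function (chunkLength) {
    received += chunkLength;
    return received <= limit;
  };
}
```

A handler could call declaredSizeAcceptable(value, MAX_UPLOAD_BYTES) before reading anything and then call the counter for each chunk. Doing both is a design choice in this sketch: the first check gives an early rejection, the second one enforces the limit.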

Any client-side restrictions, i.e. checks done by a browser prior to form submission, are unreliable and should be considered as extra comfort to users only—so that they get a rejection message earlier.

RFC 1867 says:

If the INPUT tag includes the attribute MAXLENGTH, the user agent should consider its value to represent the maximum Content-Length (in bytes) which the server will accept for transferred files.

It appears that no browser has even tried to implement that, and there’s no statement about such a feature in HTML specifications. On the contrary, the HTML 3.2 specification says something quite different:

You can set an upper limit to the length of file names using the maxlength attribute.

Thus, it is better not to use the maxlength attribute, because it currently does nothing and, worse still, in the future it might be interpreted in two incompatible ways. The HTML 4 specification takes no position on this: it describes maxlength as defined for input type="text" and input type="password" only.

Filtering (through a file type filter)

The HTML 4.01 specification defines an accept attribute for use with input type="file" as follows:

This attribute specifies a comma-separated list of content types that a server processing this form will handle correctly. User agents may use this information to filter out non-conforming files when prompting a user to select files to be sent to the server.

Thus you could specify, for example, accept="image/gif,image/jpeg", if you are willing to get image files in GIF or JPEG format only. Browsers might use this information to set up the Browse menu so that only such files are selectable, at least initially. And the HTML 3.2 specification even claims: “Some user agents support the ability to restrict the kinds of files to those matching a comma separated list of MIME content types given with the ACCEPT attribute[;] e.g. accept="image/*" restricts files to images.” (Note that "image/*" is not a MIME content type. Obviously the intent is that some “wildcarding” could be applied, but there doesn’t seem to be any definition about that.)

But it seems that browser support is currently nonexistent. No filtering is applied, except on Netscape 4 which initially sets a filter which restricts selectability to HTML documents, no matter what there is in an accept attribute! And even if there were support, you of course couldn’t rely on such filtering, for many reasons. If it worked, it would be basically for user comfort, not for setting effective restrictions (which must be imposed by the form handler).
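To illustrate what such a restriction in the form handler might involve, here is a small JavaScript sketch that compares a declared content type against a comma-separated list like the one in an accept attribute. The function name matchesAccept is hypothetical, and the treatment of "image/*" as matching any image subtype is only an assumption, since no definition of wildcarding seems to exist. Note also that the declared type of a submitted file comes from the browser, so a check like this is a first screening only:

```javascript
// Hypothetical helper: does a declared content type match any item
// in a comma-separated list such as "image/gif,image/jpeg"?
function matchesAccept(contentType, acceptValue) {
  // Ignore case and any parameters after ";" in the declared type.
  var type = String(contentType).toLowerCase().split(';')[0].trim();
  if (type === '') return false;
  var patterns = String(acceptValue).toLowerCase().split(',');
  for (var i = 0; i < patterns.length; i++) {
    var p = patterns[i].trim();
    if (p === type) return true;
    // Assumed wildcard rule: "image/*" matches any "image/..." type.
    if (p.slice(-2) === '/*' && type.indexOf(p.slice(0, -1)) === 0) {
      return true;
    }
  }
  return false;
}
```

For example, matchesAccept('image/png', 'image/gif,image/jpeg') yields false, whereas matchesAccept('image/png', 'image/*') yields true under the assumed wildcard rule.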

Using client-side scripting, you might help some users so that they won’t submit data of the wrong type. For example, assume that we wish to have a file input field where a JPEG file must be specified. We might take the simplistic view that this means a file name which ends with jpg, and check, in a client-side script, that the value of the field matches that. Note that the value is the filename, not the file content. However, one must be extra careful here. Although the event attributes onfocus, onchange and onblur for input type="file" are supported even in the earliest JavaScript implementations (from version 1.0), there are limitations and problems. In particular, onblur seems to be treated strangely, and the obvious idea (associating checking code with onblur) seems to make Netscape run in an eternal loop. Thus, it is probably best to associate the checks with file submission only. This means using the onsubmit attribute in the form tag. Example:

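<script type="text/javascript" language="JavaScript">
// Checks that the file name in the "pic" field of form "f" ends
// with "jpg" (in any case). Only the name is checked, not the
// content of the file. Returns false to cancel the submission.
function check() {
  var name = document.f.pic.value;
  var ext = name.substring(name.length-3, name.length).toLowerCase();
  if (ext != 'jpg') {
    alert('You selected a .'+ext+
          ' file; please select a .jpg file instead!');
    return false;
  }
  return true;
}
</script>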

<form method="post" name=f
enctype="multipart/form-data"
onsubmit="return check();"
action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/echo.cgi">
<p>
Please select a JPEG (.jpg) file to be sent:
<br>
<input type="file" name="pic" size="40"
accept="image/jpeg">
<p>
Please include a short explanation:<br>
<textarea name="expl" rows="3" cols="40"
onfocus="check();">
</textarea>
<p>
<input type="submit" value="Send">
</form>
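The check above looks at the last three characters only. A slightly less simplistic variant could be sketched as follows; the function name hasJpegExtension is hypothetical, and accepting .jpeg in addition to .jpg is an assumption about what one would want. It is written as a pure function of the name, so that it could be called from a check function like the one above with document.f.pic.value as the argument:

```javascript
// Hypothetical variant of the check: look at the part after the
// last period, and accept both "jpg" and "jpeg" (in any case).
// As before, only the name is examined, not the file content.
function hasJpegExtension(filename) {
  var name = String(filename);
  var dot = name.lastIndexOf('.');
  // No period at all, or a period as the last character: no extension.
  if (dot < 0 || dot === name.length - 1) return false;
  var ext = name.substring(dot + 1).toLowerCase();
  return ext === 'jpg' || ext === 'jpeg';
}
```

Unlike the three-character comparison, this rejects a name like foojpg and an empty field, and accepts photo.jpeg.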

The status of RFC 1867

The status of the original description of input type="file", namely RFC 1867, Form-based File Upload in HTML, is vague. The HTML 4.01 specification makes only an informative reference to it, and mentions a “work in progress” in this area:
ftp://ftp.ietf.org/internet-drafts/draft-masinter-form-data-01.txt
This is, however, outdated information; the URL does not work, and the draft has expired. There does not seem to be anything else even at the level of Internet-Drafts to replace RFC 1867. There is, however, RFC 2388, Returning Values from Forms: multipart/form-data, which might be related to the process. However, it is not specified to obsolete RFC 1867.

In the HTML 4.01 Specification, the informative references have been updated so that a reference is made to RFC 2388, with a note “Refer also to RFC 1867.”

In June 2000, RFC 2854, The 'text/html' Media Type, was issued. Its basic purpose was “to remove HTML from IETF Standards Track” officially, i.e. to make it explicit that work on HTML specifications has been moved from IETF to W3C. It explicitly obsoletes RFC 1867, together with some other HTML related RFCs. But note that there is very little in HTML specifications by the W3C that defines what file input really is; they refer to RFC 1867 instead.

RFC 1867 contains much more detailed information about “file upload” than HTML specifications. It explains the original idea and how it might be implemented. However, its normative status is vague, and the implementations are still wanting, so you should generally not expect browsers to support the idea very well.