Jukka Koskinen
Tampere University of Technology
P.O.Box 553, FIN-33101 Tampere, Finland
Extended abstract

http://www.cs.tut.fi/~jak/ec00.html
Eurocrypt 2000
Poster session
16 May 2000

Security for an auxiliary human memory

Introduction

It is obvious that in the future people will carry with them, and perhaps also wear, more and more electronic devices. Because computers are so readily available, they will start assisting humans also in the area of personal memory: they will not only serve as a source of external information but also help in remembering, far beyond the current use of calendars and notebooks. At the same time, communications technology is starting to make it possible to be "in contact" almost all the time. Putting these two trends together implies that there will most likely be applications that distribute the personal memory over some network.

It is also important that the mere fact of being networked in a mobile way will create a lot of new information that can be stored, and hence in many cases will be, such as location and the parties or even the contents of communications. This situation raises many security issues, which combine the usual confidentiality and authentication issues of communications with anonymity, and which extend to availability over long periods of time.

There are many questions concerning, e.g., the interface, data representation and retrieval techniques that need to be solved before any useful system can be built. These questions may be so challenging that security is overlooked at first. It therefore seems worthwhile to specify the target system somewhat beyond what is currently practical and to address its development from the security point of view. Could the security approach direct the development? In particular, could it keep the systems more coherent, if it is applied before any practical systems become common?

The starting point for this poster is a simple model where a user communicates with a memory system by storing, recalling and possibly also forgetting. We assume that the requirements for long-term availability and reliability dictate that a major part of the memory resides on some service outside the security perimeter of the user, that is, the user has only minimal trust in this service. In particular, we assume that confidentiality of the user's memories must be provided by encrypting everything he sends to this service. This conflicts badly with the desired ability to do content-based search, which is quite central to remembering. We are led to a compromise where a certain part of every memory item of a user is encrypted with the same fixed key. Besides the user's ubiquitous interface there are likely to be other parts of the system that are within the perimeter, but we concentrate on the outside service and present protocols for secure communication with it.
 

General aspects of the security mechanisms

From now on we do not deal with memories but just "data". A data item is a unit of data that is addressed as a whole. We denote the user by U and the server by S, and start to design protocols for storage, retrieval (by address) and search (by content). We deal with erasure later. In this section we do not yet specify the protocols, but investigate general properties required of them. All protocols obviously involve a request from U to S and a reply from S to U.

Anonymity and payment

We set two requirements for anonymity: first, S must not be able to identify the user behind a request, and second, S must not be able to link together the data items stored by one user. The second requirement may seem rather stringent, but its motivation comes from our postulate that each user has a fixed encryption key for his data (or at least for a part of it). The requirement forces us to mix U's data with that of the others. Since S may provide its service for a large number of users, it may have to divide them into groups. It seems best to allow the users to choose their group themselves. This may allow some useful categorization for them as well. Care must be taken that S cannot advertise different lists of available groups to different users. Making such lists disjoint would give S a complete linkage of each user's data and consequently a lot of cryptotext with the same key.

The first requirement implies that the request must be routed or otherwise managed so that it does not identify U. Similarly, S cannot actually send any reply to U; the message must be routed to him. A recent survey of such methods is [CPV99]. In a wireless environment a non-addressed reply can be achieved by including in each request a random number N and broadcasting the reply with the same N, cf. [JLC99]. In the near future, General Packet Radio Service (GPRS) may provide a continuously connected platform where the origin of the messages need not be revealed to the service. In such a system a denial-of-service attack against the server is not so likely, because the sender must pay for each packet. However, the contacts may also come through some other medium where this does not hold.

We do not go into any details of anonymizing the underlying communications, but we have to address the problem of S getting paid for its services, and here we come back to anonymity. Payment for the service may include an initial fee and a fixed annual fee, but obviously the payment must also depend on usage. Because the service should not know which data items belong to a user, it seems that the payment for a store operation should cover storage for the entire intended time. Each item should then have an expiry time attached to it. If the user erases an item earlier, some refund is possible.

One approach to paying anonymously for the service would be to use one blindly signed ticket to access a pseudoaccount maintained by the server, as in [JLC99], where it is used for phone service. The account can be replenished by anonymous payments. This method would provide good granularity for payments, but it would link several items to the same user. It seems that we must use tickets that carry the payment.

Before running the protocols, U can purchase suitable tickets directly from S. We envision that the service could be so common that it does not matter that S identifies U at this point. When administering tickets of its own, S must keep directories that prevent double-spending, but this still saves a lot of effort compared with general-purpose electronic cash systems managed by a third party. However, using such a general system for paying for the tickets makes it possible that S never identifies U. A ticket system administered by S places more trust in S, but a fraudulent S may only gain some money, and would soon lose its reputation and future income. Such frauds cannot compromise the more important security objectives.

Authenticity and integrity

The service does not need to authenticate the user. It is enough that the user provides a payment. When fetching data, successful decryption assures the user that the data is his, but it could still be the wrong data. To achieve full integrity, one needs to compare the data to a saved hash value. When storing data, the user needs to authenticate the service. This can be done by encrypting something in the request with the server's public key and verifying that the responder knows the corresponding private key.

We may allow any data to be retrieved by anyone knowing its address, because only the owner can decrypt it. The others do not even know whose encrypted data they have got. In the same way we may allow them to get any data their search criteria happen to match.

Availability and accountability

There will obviously be a need for different levels of service, especially concerning how long and how well S should take care of the data (e.g. frequency of back-up and the degree of redundancy). Besides some general security audits there can be no direct assurance on these issues. But as there will be different payment tariffs based on the service level, there will be a different penalty for the server when it cannot reproduce a stored item that has not expired. For this purpose S must sign a receipt for the data, and the receipt must indicate the expiry time and the service level for that data. If U does not get correct data, he can present the receipt to a third party, who can charge S for the error without disclosing U's identity, and then pay U.

Such a receipt will work conveniently only for address-based retrieval. When looking for data with content-based search, the user cannot hold the service accountable for finding anything. Since the server is paid also for searches, it is of course encouraged to work properly. Otherwise users will develop their own search methods and use only direct addressing. They could do this also by using the service to store the index structure.

Some form of direct addressing seems to be important for all users, because it allows controlling the entirety of the data that a user has stored. That is, a user should maintain a directory structure that also includes expiry dates, so that data close to expiry can be revived if desired. The server will of course have this information attached to each stored data item, but it is not reasonable for the user to access his data this way, because lots of other users' data with similar expiry dates reside in the same area.

Confidentiality and search structures for the data

We have already postulated that we want to do some kind of content-based search even if our data is encrypted. This requires that the user has one eternal master key that he uses to encrypt those parts of the data he wants to use in searches. To protect these cryptotexts from analysis, at least as long as they are not used in a search, they should be spread among the rest of the data item. As a consequence, the server must search for the queried bit strings over the whole data item. Different users may apply different strategies for their data, but to ensure availability it seems necessary to embed in the data item the variable encryption key, which is used to encrypt the non-searchable part of the item. We next outline one way of doing this.

Assume U has a string of data to be stored and has chosen a set of keywords, as well as time, place and other attributes, that he may want to use for content-based search. These search items are encrypted separately with the master key. Then U chooses a random key k and encrypts the data as a string t. The encrypted search items are inserted at randomly chosen locations in the string t, giving a string v. The pointers to these locations and their lengths, as well as the key k, are coded as a string w, which is encrypted with the master key. The resulting bits are inserted into the string v in a fixed way, which covers some sufficiently large range in comparison to the length of w (which is also fixed, of course). The final result is the X that we use in the protocols. When U receives such an X from S, he first collects and decrypts the string w and then extracts the search items according to the pointers and lengths given in w. What remains of X can now be decrypted using the key found in w.
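A minimal Python sketch of this encoding follows, with all names illustrative. It uses the cryptography library's Fernet as a stand-in for the user's ciphers and keeps the encrypted w as a length-prefixed header at the front instead of spreading it over a fixed range. Note that Fernet is randomized; for the content-based search of the later protocols the search items would have to be encrypted deterministically.

import json, os
from cryptography.fernet import Fernet

def encode_item(master_key: bytes, payload: bytes, search_items: list) -> bytes:
    master = Fernet(master_key)
    k = Fernet.generate_key()                  # fresh key k for the non-searchable part
    v = bytearray(Fernet(k).encrypt(payload))  # string t, into which items are interleaved
    pointers = []                              # (offset, length) of each inserted item
    for item in search_items:
        c = master.encrypt(item)               # search item under the master key
        off = int.from_bytes(os.urandom(2), "big") % (len(v) + 1)
        v[off:off] = c                         # insert at a random location
        pointers.append((off, len(c)))
    w = json.dumps({"k": k.decode(), "ptrs": pointers}).encode()
    header = master.encrypt(w)                 # pointers, lengths and k, under the master key
    return len(header).to_bytes(4, "big") + header + bytes(v)

def decode_item(master_key: bytes, X: bytes):
    master = Fernet(master_key)
    hlen = int.from_bytes(X[:4], "big")
    w = json.loads(master.decrypt(X[4:4 + hlen]))
    v = bytearray(X[4 + hlen:])
    items = []
    for off, length in reversed(w["ptrs"]):    # undo the insertions in reverse order
        items.append(master.decrypt(bytes(v[off:off + length])))
        del v[off:off + length]
    return Fernet(w["k"].encode()).decrypt(bytes(v)), list(reversed(items))

Decoding removes the inserted pieces in the reverse order of insertion, so the recorded offsets stay valid even though later insertions shift earlier ones.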

Basic protocols

In the protocols following notation is used:
X is the data item to be stored or having been stored.
G is one of the groups announced by S.
L is the desired service level. This includes also the expiry time.
B is a ticket that has a signature by S (valuable enough for the requested service). The signature or verification keys for the tickets are not shown.
E_S is the public encryption key of S.
K_S^-1 is the private signature key of S, used to sign the storage receipt.
k is a fresh symmetric encryption key chosen by U.
A_X is the address of X; it will contain sufficient information for S to find X later (in particular, the group G is not needed).
Q is a query.

To send any request, U generates a fresh symmetric encryption key k and uses the normal digital envelope technique with the server's public key E_S. It is assumed that S need not complete the decryption with k before seeing the first parts of the plaintext. S sends all replies encrypted with k.
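The envelope step could look roughly as follows in Python. This is only a sketch: RSA-OAEP stands in for { k } E_S and Fernet for the symmetric layer, and the function names are made up.

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

server_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
E_S = server_key.public_key()
OAEP = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()), algorithm=hashes.SHA256(), label=None)

def make_request(body: bytes):
    k = Fernet.generate_key()                   # fresh symmetric key k chosen by U
    return E_S.encrypt(k, OAEP), Fernet(k).encrypt(body), k

def open_request(enc_k: bytes, enc_body: bytes):
    k = server_key.decrypt(enc_k, OAEP)         # S recovers k ...
    return k, Fernet(k).decrypt(enc_body)       # ... and the request body

def reply(k: bytes, body: bytes) -> bytes:
    return Fernet(k).encrypt(body)              # all replies go back under k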

Store. User U wants to store data X in group G with service level L and pays B for it.

U --> S: { k } E_S , { G, B, L, X } k
S --> U: { A_X , { L, h(X), h(A_X) } K_S^-1 } k
Server S checks whether B has enough value for the level L and the size of X. If so, S stores X at an address A_X in group G and signs a receipt containing L and the hashes of X and A_X. The reply by S contains the receipt and the address A_X. After this U should verify the signature and store both the address and the receipt.
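A sketch of the receipt and of U's check of it, assuming an RSA-PSS key pair for K_S^-1 and a JSON encoding of the receipt fields (both assumptions made only for illustration):

import hashlib, json
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def sign_receipt(sig_key, L: str, X: bytes, A_X: bytes):
    receipt = json.dumps({"L": L, "hX": h(X), "hA": h(A_X)}).encode()
    return receipt, sig_key.sign(receipt, PSS, hashes.SHA256())

def check_receipt(verify_key, receipt: bytes, sig: bytes, L: str, X: bytes, A_X: bytes) -> bool:
    if json.loads(receipt) != {"L": L, "hX": h(X), "hA": h(A_X)}:
        return False                                      # receipt does not match what U sent
    verify_key.verify(sig, receipt, PSS, hashes.SHA256())  # raises on a bad signature
    return True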

Retrieve. User wants to fetch the data that was stored at A_X.

U --> S: { k } E_S , { A_X } k
S --> U: { X' } k
If X' is missing, or h(X') is not equal to the h(X) that U received in the storage receipt, then U resorts to some rectification procedure by presenting the receipt { L, h(X), h(A_X) } K_S^-1.
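U's check amounts to a hash comparison against the receipt, along these lines (names illustrative, hashes assumed to be hex strings as in the earlier receipt sketch):

import hashlib, hmac

def retrieved_ok(X_prime: bytes, hX_from_receipt: str) -> bool:
    # Compare h(X') with the h(X) recorded in the signed storage receipt;
    # on failure U falls back to the rectification procedure with the receipt.
    return hmac.compare_digest(hashlib.sha256(X_prime).hexdigest(), hX_from_receipt)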

Search. User wants to find data that matches some search criterion Q, which contains bit strings that should match. The data is expected to be in group G and B is a payment for the operation.

U --> S: { k } E_S , { G, B, Q } k
S --> U: { X } k
Instead of one item X, there may be none or many that match the search criterion. If this is expected to be common, the reply could be changed to give a list of addresses, together with those predetermined parts of each data item from which U can find the decryption key, and some content that hints at which item is the one searched for.
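On the server side the match is a plain substring search over the opaque blobs of the requested group, assuming the search items were embedded as deterministic ciphertexts so that equal keywords produce equal bit strings. A sketch with illustrative names:

def matches(X: bytes, query_strings) -> bool:
    # Q is a set of already-encrypted bit strings; S only checks for their occurrence in X.
    return all(q in X for q in query_strings)

def search(group_items: dict, query_strings) -> list:
    # group_items maps address A_X -> stored blob X within the requested group G
    return [A for A, X in group_items.items() if matches(X, query_strings)]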

Extensions

Erasure and update

Besides expiry there is no way to get rid of data. Erasure could be a desirable feature, if not for forgetting things then for updating them. An update or replacement is conceptually just an erasure followed by a store. If the suggested once-for-all type of payment is used, then a combined transaction may be provided to make updating content less expensive.

If erasure is allowed, the accountability situation changes. Since the user must remain anonymous, a signature by him is not possible, and hence non-repudiation of an erasure order cannot be achieved without compromising anonymity in some way. We envision two ways to do this. First, a third party could sign the erasure order and thus tie it to an identified and responsible person U. We do not pursue this method further here. The second method is to include in the storage request a hash value H of a random number r, which is kept secret by U. The value H is included also in the storage receipt signed by S. This ties it to the hashes of the data and its address. When erasing the item, U gives the preimage r to S. If U later tries to accuse S of not providing an item that matches the receipt { h(X), h(A_X), H } K_S^-1, then S can free itself from responsibility by presenting an r that hashes to H.
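The erasure authenticator is an ordinary hash commitment; a minimal sketch:

import hashlib, hmac, os

def new_erasure_secret():
    r = os.urandom(32)                       # kept secret by U
    H = hashlib.sha256(r).digest()           # sent in the storage request, echoed in the receipt
    return r, H

def erasure_authorized(H_stored: bytes, r_revealed: bytes) -> bool:
    # S erases only if the revealed pre-image hashes to the H it stored with the item
    return hmac.compare_digest(hashlib.sha256(r_revealed).digest(), H_stored)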

Erasure. To erase an item, which was stored at A_X using H = h(r) in the storage request, the user sends the pre-image r and includes in the request a blinded string b.

U --> S: { k } E_S , { A_X , r, b } k
S --> U: { b' } k
If r hashes to the value H that S has stored with the other data at A_X, then S can erase that data, but it must save r and H (with H as a quick index to r). It replies to U with b', which it has produced by blindly signing the string b. After unblinding, U gets a valid ticket ("B") out of it, obviously with less value than the ticket that U used when storing the erased item. If no refund is used, then the content of the reply from S is largely irrelevant to U, unless of course some reasons are invented why S should be responsible for proper erasure.
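The refund could be produced with a textbook RSA blind signature. The following sketch uses raw RSA arithmetic through the cryptography library's key objects and ignores the message formatting and padding a real ticket scheme would need:

import hashlib, secrets
from cryptography.hazmat.primitives.asymmetric import rsa

ticket_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
n = ticket_key.public_key().public_numbers().n
e = ticket_key.public_key().public_numbers().e
d = ticket_key.private_numbers().d

def blind(ticket: bytes):
    m = int.from_bytes(hashlib.sha256(ticket).digest(), "big")
    r = secrets.randbelow(n - 2) + 2          # blinding factor chosen by U
    return (m * pow(r, e, n)) % n, r          # b = m * r^e mod n, sent to S

def blind_sign(b: int) -> int:
    return pow(b, d, n)                       # b' = b^d mod n; S never sees m

def unblind(b_signed: int, r: int) -> int:
    return (b_signed * pow(r, -1, n)) % n     # s = m^d mod n, a valid signature on the ticket

def verify_ticket(ticket: bytes, s: int) -> bool:
    m = int.from_bytes(hashlib.sha256(ticket).digest(), "big")
    return pow(s, e, n) == m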

Update. User wants to replace the data at A_X with new content X', using the same service level as earlier and without having to pay more. The request is a combination of those for storage and erasure. Of course a new erasure-hash H' is used, and no payment or refund information is exchanged.

U --> S: { k } E_S , { A_X , r, H', X' } k
S --> U: { A_X' , { L, h(X'), h(A_X'), H' } K_S^-1 } k
The server replies exactly as it does to a storage request. It is possible that A_X' = A_X.

More steps and pieces for store

The user may want to get an earlier assurance that the other party is the legitimate server, i.e., before sending possibly large amounts of information to a wrong place. This can be done by splitting the digital envelope into two parts. To comply with the granularity of the ticket values, or to provide either for large quantities of related data or for data that is continuously being created, the protocol can be continued with simplified storage requests that use only symmetric encryption and that are linked by an identifier I chosen by S. Here we also show the different nonces N, N' and N'' that enable U to receive the replies. We put the erasure-hash H in brackets to show that it is optional.

When splitting the original digital envelope, something is needed to accompany the symmetric key in the publicly encrypted message 1. We choose to put the service level L there. Now, if U can decrypt L from message 2 with the key k, he can believe that the message came from S and that S knows k.

1. U --> S: N, { k, L } E_S
2. S --> U: N, { L, I } k
3. U --> S: N', I, { G, B, [H], X } k
4. S --> U: N', { A_X , { L, h(X), h(A_X), [H] } K_S^-1 } k
The above could cover all of the data, but if it was just the first part, then the i-th request and reply (i > 1) would look like this:
2i+1. U --> S: N'', I, { A_X , B_i, X_i } k
2i+2. S --> U: N'', { A_X , { L, h(X, X_2, ..., X_i), h(A_X), [H] } K_S^-1 } k
At any time the receipt from S accounts for all the data that has been stored until then, using the same address.
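The cumulative hash h(X, X_2, ..., X_i) can be kept as a running digest on both sides; a sketch follows, in which the length prefix is an assumed framing detail that keeps fragment boundaries unambiguous.

import hashlib

class CumulativeHash:
    def __init__(self):
        self._h = hashlib.sha256()

    def add(self, fragment: bytes) -> bytes:
        self._h.update(len(fragment).to_bytes(8, "big"))
        self._h.update(fragment)
        return self._h.copy().digest()     # the value to put into the next receipt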

In the case of long data, retrieval can also be fragmented, but as the retrieval protocol is so simple, this does not appear to require any changes at this level of abstraction.

Discussion

Security. Since all the private data of the users is encrypted, it is difficult for outsiders to interfere with the system except through denial-of-service attacks. These are hard to avoid in any communication system.

Using the service for communication between two or more users is possible and cannot be considered misuse. Instead the system may offer interesting opportunities when applied in this way.

Complexity. One public-key encryption is required of the user in every protocol, but these can be precomputed. The user can also postpone the verification of signatures. It seems that more computation will be needed for other things than for the cryptographic algorithms.

Accountability requires several things to be stored on both sides. In addition to taking good care of the data items, the server needs to store used tickets until they expire. If erasure is supported, the server also needs to store the authenticators of any erased data items, i.e. the pre-images of the hashes or the signatures by the third party. The user needs to store the addresses and storage receipts. If erasure with the hash method is used, the user also needs to store the pre-image.

If the user does not care very much and trusts the search facility, he can do without storing anything, except of course the tickets and S's public encryption key. If some specific data is not so valuable, the user may well discard the storage receipt and save only the address. Also, if he is sure he will not erase some data, he may discard the pre-image of the hash H. Of course, in such a case H could be omitted from the message altogether.

Future work. There are many open issues regarding the organization of the data in an efficient and searchable way, and these may also have a bearing on the security issues. A more immediate security question requiring closer study is the anonymity of the underlying communication mechanisms. The proposed protocols also leave room for more thorough analysis. A detailed question would be to evaluate the risks of using a fixed key to encrypt both the search items and the encryption key of the other data.

References

[CPV99] Joris Claessens, Bart Preneel, Joos Vandewalle. Solutions for Anonymous Communication on the Internet. Proceedings of the IEEE 33rd Annual 1999 International Carnahan Conference on Security Technology (ICCST'99), Madrid, Spain, Oct 5-7, 1999, 98-303. http://www.esat.kuleuven.ac.be/~joclaess/pub/iccst99.ps.gz

[JLC99] W.-S. Juang, C.-L. Lei, C.-Y. Chang. Anonymous channel and authentication in wireless communications. Computer Communications 22 (1999), 1502-1511.