Spring 2001

March 5, 2001

Policy is increasingly used as a means for constructing flexible and secure communication services. However, the application of existing policy frameworks to group communication is difficult. This difficulty is largely due to the size, lifetime, and dynamic nature of group sessions. This talk presents Antigone, a communication architecture and policy language in which the diverse and dynamic security requirements of group sessions can be identified and enforced. An Antigone policy defines the requirements of the dependent issues of group access control and security mechanism provisioning. The policy governing each session is the result of the reconciliation of the policies of all participants. The mechanisms and configuration of the communication service supporting the application are constructed from the reconciled policy. Several non-trivial application policies are identified, and their use and enforcement are discussed.

March 8, 2001

The explosive growth of communications networks has stimulated much interest in algorithms for problems at the interface between networking and combinatorial optimization. In this talk, we present provably-good algorithmic strategies for a family of such problems that involve routing, wavelength assignment, network design and (distributed) resource allocation. Randomization plays a key role in our approaches; we will also spend a little time discussing the value of randomization in the broader computational context. Linear programming and multi-commodity flow are other key ingredients of the approaches presented in the talk.

March 9, 2001

In this talk I will describe new algorithms and tools for generating paintings, illustrations, and animation using a computer. The long-term goal of this work is to combine the beauty and expressiveness of traditional media with the flexibility of computer graphics, with potential applications ranging from animated films to technical illustration.

I will begin with a painterly rendering algorithm for processing images and video sequences. This method produces images with a much greater subjective impression of “looking hand-made” than do earlier methods.

I will then present a new style of line art illustration for smooth 3D surfaces. This style is designed to clearly convey surface shape, even for surfaces without predefined material properties or hatching directions.

Finally, I will describe a new machine learning framework for processing images by example, called “image analogies.” Given an example of a painting or drawing (e.g. scanned from a hand-painted source), we can process new images with some approximation to the style of the painting, without requiring any explicit definition of the style. The image analogies framework supports many other novel image processing operations by example, such as “texture-by-numbers,” where we synthesize new realistic imagery from example photographs.
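
As a concrete illustration, here is a heavily simplified, single-scale sketch of the “by example” idea in Python. The function names, the grayscale-only treatment, and the brute-force nearest-neighbor search are simplifications of mine, not the multi-scale algorithm presented in the talk.

```python
import numpy as np

def neighborhoods(img, radius):
    """Return one flattened (2r+1)x(2r+1) patch per pixel of a 2-D array."""
    padded = np.pad(img, radius, mode="edge")
    h, w = img.shape
    size = 2 * radius + 1
    patches = np.empty((h * w, size * size))
    for i in range(h):
        for j in range(w):
            patches[i * w + j] = padded[i:i + size, j:j + size].ravel()
    return patches

def image_analogy(A, Ap, B, radius=2):
    """Given an example pair (A, A') and a new image B, synthesize B'.

    Single-scale, brute-force nearest-neighbor matching on grayscale
    arrays; the published framework adds multi-scale synthesis and a
    coherence term, both omitted here.  Only suitable for tiny images.
    """
    feats_A = neighborhoods(A, radius)
    feats_B = neighborhoods(B, radius)
    Ap_flat = Ap.ravel()
    Bp = np.empty(B.size)
    for k, f in enumerate(feats_B):
        dists = np.sum((feats_A - f) ** 2, axis=1)   # compare neighborhoods
        Bp[k] = Ap_flat[np.argmin(dists)]            # copy the "styled" pixel
    return Bp.reshape(B.shape)
```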

March 13, 2001

Models are important in science. Appropriate models have allowed scientists to make progress. We illustrate this using two examples:

  1. In 1979 Shamir discussed how to avoid misuse of digital signatures within a company. His scheme was implemented by RSA Inc. We analyze the model and its weaknesses. The proposal of an alternative model to this problem has been the cornerstone for what is now called threshold cryptography, which allows parties to co-sign messages.
  2. Perfect secrecy is likely the oldest model in information security. It was used to prove the security of the one-time pad, sketched below. A consequence of the model is that modern ciphertexts are (indistinguishable from) uniform. This implies that the presence of encrypted traffic is immediately apparent to a network sniffer. Information hiding addresses that problem. We look back at models used in modern information hiding and conclude that these are inadequate. We propose a new model for perfect information hiding which is inspired by Shannon’s 50-year-old model. We analyze this model in detail and propose information hiding schemes.
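
For reference, a minimal one-time-pad sketch (illustrative only; the pad must be uniformly random, as long as the message, and never reused):

```python
import os

def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    """XOR the message with a uniformly random, never-reused pad."""
    assert len(pad) == len(message)
    return bytes(m ^ p for m, p in zip(message, pad))

msg = b"attack at dawn"
pad = os.urandom(len(msg))        # one-time, uniformly random key
ct = otp_encrypt(msg, pad)
assert otp_encrypt(ct, pad) == msg   # decryption is the same XOR
# For a uniform pad, ct is uniformly distributed regardless of msg -- which
# is exactly why encrypted traffic stands out as random-looking data.
```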

Part of this presentation is based on joint work with Blackburn, Burmester, De Santis, Di Crescenzo, Frankel, Jajodia, Le, Yung, Wild.

March 16, 2001

Many important applications must run continuously and without interruption, yet must be changed to fix bugs or upgrade functionality. To date, no existing dynamic updating system has achieved a practical balance between flexibility, correctness, ease-of-use, and low overhead. We present a new approach that provides type-safe dynamic updating of native code in an extremely flexible manner (functions and types may be updated, and at any time) and permits the use of automated tools to aid the programmer in the updating process. Our system is based around “dynamic patches” made up of proof-carrying code that contain both the updated code and the code needed to transition from the old version to the new. We discuss how patches are generated using a semiautomatic tool, how they are applied using dynamic-linking technology, and how code is compiled to make it updateable. To concretely illustrate our system, we have implemented a dynamically-updateable web server, FlashEd. We discuss our experience building and maintaining FlashEd. Performance experiments show that the overhead due to updating for FlashEd is between 2 and 6 percent.
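
To make the flavor of dynamic patching concrete, here is a toy Python analogue with hypothetical names. The actual system operates on compiled native code, with type checking and proof-carrying code providing the safety guarantees that are not modeled here; only the shape of a patch (new code plus a state transformer, applied through an indirection table) is illustrated.

```python
class FunctionTable:
    """Calls are routed through this table so they can be rebound at runtime."""
    def __init__(self, **funcs):
        self._funcs = dict(funcs)

    def call(self, name, *args):
        return self._funcs[name](*args)

    def apply_patch(self, patch, state):
        new_state = patch.transform_state(state)   # old state -> new representation
        self._funcs.update(patch.new_functions)    # swap in the updated code
        return new_state

class Patch:
    def __init__(self, new_functions, transform_state):
        self.new_functions = new_functions
        self.transform_state = transform_state

# Version 1: tracks a request count as a plain integer.
table = FunctionTable(handle=lambda state: state + 1)
state = table.call("handle", 0)

# Patch to version 2: the state becomes a dict with extra bookkeeping.
patch = Patch(
    new_functions={"handle": lambda s: {**s, "hits": s["hits"] + 1}},
    transform_state=lambda old: {"hits": old, "errors": 0},
)
state = table.apply_patch(patch, state)
state = table.call("handle", state)   # callers keep working, now on new code
```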

March 26, 2001

In the Storage Tank project at IBM research, we have built a distributed file system that leverages high-speed storage networks to achieve performance and scale. Following a brief overview of Storage Tank’s architecture and goals, I will present two technologies for concurrency control that we have implemented and discuss how we employed simulation and analysis for performance evaluation and parameter selection.

First, Storage Tank provides a consistency model, called publish consistency, customized for the needs of scalable Web serving. Publish consistency composes the optimistic concurrency control techniques used in Web caching with the callback-invalidation protocols typically used in file systems. By slightly relaxing consistency constraints, publish consistency greatly improves the performance of a file system for Web-serving workloads.

Also, Storage Tank must be deployed on unreliable networks and commodity hardware and therefore must maintain data consistency and availability during communication errors, system crashes, and other failures. Storage Tank employs leases for failure detection and state recovery of its clients. To reduce network overheads, we have added two techniques in lease management, called opportunistic renewal and stateless serving. Based on these techniques, we are able to implement a sub-second lease, making our system very responsive in the presence of failures.
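
A hedged sketch of the opportunistic-renewal idea, with hypothetical names and a toy interface rather than the Storage Tank protocol itself: any message the client already sends renews its lease as a side effect, so dedicated renewal traffic is needed only when the client has been idle.

```python
import time

LEASE_DURATION = 0.5   # a sub-second lease, as in the talk

class LeaseClient:
    """Toy client-side lease manager illustrating opportunistic renewal."""
    def __init__(self, send):
        self.send = send                    # function that talks to the server
        self.expires = time.time() + LEASE_DURATION

    def request(self, msg):
        reply = self.send(msg)              # server renews the lease as a side effect
        self.expires = time.time() + LEASE_DURATION
        return reply

    def maybe_renew(self):
        # Called periodically; generates traffic only if the lease is about to
        # lapse and no regular request has renewed it opportunistically.
        if time.time() > self.expires - LEASE_DURATION / 4:
            self.request({"op": "renew"})

client = LeaseClient(send=lambda msg: {"granted": True})   # stand-in transport
client.request({"op": "open", "file": "/tmp/x"})
client.maybe_renew()   # no-op here: the open call already renewed the lease
```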

April 2, 2001

Can user-perceived latencies in the World Wide Web be improved by pre-loading a cache with resources that are predicted to be accessed in the future by the user? If so, how? And how can we measure the effectiveness of a system that performs such Web cache pre-loading? This talk will provide some answers to these questions, focusing on the design and development of two different tools for evaluating both pre-loading and non-pre-loading Web-caching systems. For early evaluation in the development process, one tool simulates expected performance. The other tests fully implemented systems in a real-world environment. No previous knowledge of Web caching will be assumed.
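
As a rough illustration of the kind of prediction such a pre-loading system evaluates, here is a toy first-order Markov prefetcher; the class name and interface are assumptions of mine, not the tools described in the talk.

```python
from collections import defaultdict, Counter

class MarkovPrefetcher:
    """After a request for resource r, prefetch the resource that most often
    followed r in a training trace.  Real pre-loading systems use richer
    predictors; this only illustrates the idea being evaluated."""
    def __init__(self):
        self.following = defaultdict(Counter)

    def train(self, trace):
        for prev, nxt in zip(trace, trace[1:]):
            self.following[prev][nxt] += 1

    def predict(self, current):
        counts = self.following.get(current)
        return counts.most_common(1)[0][0] if counts else None

p = MarkovPrefetcher()
p.train(["/", "/news.html", "/", "/about.html", "/", "/news.html"])
print(p.predict("/"))     # "/news.html" -- the candidate to pre-load
```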

April 3, 2001

Suppose that Bob has a database D and that Alice wants to perform a search query q on D (e.g., “is q in D?”). There are elegant cryptographic techniques for solving this problem under various constraints such as “Bob should know neither q nor the answer to q and Alice should learn nothing about D other than the answer to q” while optimizing various performance criteria (e.g., amount of communication). We consider the version of this problem where the query is of the type “is q approximately in D?” for a number of different notions of “approximate”, some of which arise in image processing and template matching, while others are of the string-edit type that arise in biological sequence comparisons. New techniques are needed in this framework of approximate searching, because each notion of “approximate equality” introduces its own set of difficulties; using encryption is more problematic in this framework because two items that are approximately equal cease to be so after encryption or cryptographic hashing. Practical protocols for solving such problems make possible new forms of e-commerce between proprietary database owners and customers who seek to query the database, with privacy and security.

Joint work with Wenliang (Kevin) Du
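
The point made above, that cryptographic transformations destroy approximate equality, can be seen in a few lines of Python (illustrative only):

```python
import hashlib

def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

s1 = b"ACCTGGTTACGT"
s2 = b"ACCTGGTTACGA"          # differs from s1 in a single character

print(hamming(s1, s2))        # only a few bits apart before hashing
h1 = hashlib.sha256(s1).digest()
h2 = hashlib.sha256(s2).digest()
print(hamming(h1, h2))        # roughly half of the 256 digest bits differ
```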

April 9, 2001

A surface reconstruction algorithm takes as input a set of sample points from an unknown closed and smooth surface in 3-d space, and produces a piece-wise linear approximation of the surface that contains the sample points. This problem has received considerable attention in computer graphics and more recently in computational geometry. In the latter area, four different algorithms (by Amenta and Bern ’98; Amenta, Choi, Dey and Leekha ’00; Amenta, Choi and Kolluri ’00; Boissonnat and Cazals ’00) have been proposed. These algorithms have a correctness guarantee: if the sample is sufficiently dense then the output is a good approximation to the original surface. Unfortunately, they have a worst-case running time that is quadratic in the size of the input. This is because they are based on the construction of 3-d Voronoi diagrams or Delaunay tetrahedrizations, which can have quadratic size. Furthermore, according to recent work (Erickson ’01), this can be the case for some surfaces even when the sample set is “locally uniform”. In my talk, I will describe a new algorithm that also has a correctness guarantee but whose worst-case running time is almost a linear function of the input size. As in some of the previous algorithms, the piece-wise linear approximation produced by the new algorithm is a subset of the 3-d Delaunay tetrahedrization; however, this is obtained by computing only the relevant parts of the 3-d Delaunay structure. The algorithm first estimates for each sample point the surface normal and a parameter that is then used to “decimate” the set of samples. The resulting subset of sample points is locally uniform and so a reconstruction based on it can be computed using a simple and fast algorithm. In a final step, the decimated points are incorporated into the reconstruction.
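
For illustration, a sketch of the first step mentioned above, estimating a normal at each sample from its nearest neighbors. This uses a standard PCA estimate (smallest-eigenvalue direction of the neighborhood covariance) as a stand-in for the talk’s estimator.

```python
import numpy as np

def estimate_normals(points, k=10):
    """Estimate a normal (up to sign) at each sample from its k nearest neighbors."""
    normals = np.empty_like(points)
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        nbrs = points[np.argsort(d)[1:k + 1]]   # skip the point itself
        cov = np.cov(nbrs.T)                    # 3x3 neighborhood covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        normals[i] = eigvecs[:, 0]              # smallest-eigenvalue direction
    return normals

# Example: samples from a unit sphere; estimated normals should be ~radial.
pts = np.random.randn(500, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
n = estimate_normals(pts)
```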

April 10, 2001

Traditional cryptography relies on unproven complexity assumptions that certain problems, such as integer factorization, are computationally hard. However, advances in algorithms and computing technology may render current cryptosystems insecure, and lead to decryption of past secret messages.

In this talk, I will present cryptographic protocols in a new setting, where provable everlasting security can be achieved without any complexity assumption. My first result, obtained in joint work with Michael Rabin, is an efficient encryption and authentication scheme that is provably secure against strong adaptive attacks by an adversary who is computationally unbounded. Our encryption scheme further enjoys the surprising property that even if the secret key is captured by the adversary after the transmission of the ciphertext, the secrecy of the message is assured.

Oblivious transfer is a cryptographic protocol that can serve as a basic primitive for all of cryptography. I shall present an implementation of oblivious transfer in the same setting, which is again provably secure without complexity assumption.

April 13, 2001

Minimizing power and increasing performance are today’s primary microprocessor design goals, for both the high-end and embedded markets. In this talk, I will discuss two cache design optimization techniques which exploit the data reference characteristics of applications written in high-level programming languages. In the first part of my talk, I will present a new organization of the data cache hierarchy called region-based cachelets, which are capable of improving memory performance of embedded applications while significantly reducing dynamic energy consumption. Following this, I will discuss a new cache-like structure we call the stack value file (or SVF), which boosts performance of applications by routing stack data references to a separate storage structure optimized for the unusual characteristics of this reference substream. By utilizing a custom structure for stack references, we are able to increase memory level parallelism, reduce memory latency, and reduce off-chip memory activity.

April 16, 2001

Leveraging techniques such as boosting and bagging combine the classification rules produced by many runs of a simpler learning algorithm. The rules produced are often far superior to those produced by the simpler algorithm alone.

These techniques are appealing for several reasons: they are simple to implement, computationally efficient and empirically successful. Perhaps the most successful and best known such algorithm is Freund and Schapire’s AdaBoost.

The incredible practical success of AdaBoost motivated our efforts to understand and generalize it.

In this talk I will introduce the AdaBoost algorithm and discuss how it may be viewed as gradient descent on a potential function. I will discuss how this viewpoint leads to generalizations of the approach, provide guarantees on the performance of such algorithms, and discuss how we adapted the gradient descent framework to the regression setting.
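
For concreteness, a minimal AdaBoost with threshold “decision stumps” (a sketch of mine, not the exact formulation used in the talk). Each round can be read as a greedy step that decreases the exponential potential $$\sum_i \exp(-y_i F(x_i))$$, which is the gradient-descent view mentioned above.

```python
import numpy as np

def adaboost(X, y, rounds=50):
    """Minimal AdaBoost with threshold stumps; y is a numpy array in {-1,+1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []                                  # (alpha, feature, threshold, sign)
    for _ in range(rounds):
        best = None
        for j in range(d):                         # exhaustive weak-learner search
            for t in np.unique(X[:, j]):
                for s in (+1, -1):
                    pred = s * np.where(X[:, j] <= t, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, t, s, pred)
        err, j, t, s, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this weak rule
        w *= np.exp(-alpha * y * pred)             # re-weight the examples
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1, 1, -1, -1])
ens = adaboost(X, y, rounds=5)
assert (predict(ens, X) == y).all()
```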

April 19, 2001

The gene finding research community has focused considerable effort on human and bacterial genome analysis. This has left some small eukaryotes without a system to address their needs. We focused our attention on this category of organisms, and designed several algorithms to improve the accuracy of gene detection for them. We considered three alternatives for gene searching. The first one identifies a coding region by searching for signals surrounding the coding region. This technique is used by GeneSplicer – a program that predicts putative locations for the splice sites. The system combines decision trees with Markov chain models in order to detect signal patterns. A second alternative is to identify a protein region by analyzing the nucleotide distribution within the coding region. Complex gene finders like GlimmerM combine both the above alternatives to discover genes. The basis of GlimmerM is a dynamic programming algorithm that considers all combinations of possible exons for inclusion in a gene model, and chooses the best of these combinations. The decision about which gene model is best is a combination of the strength of the splice sites and the score of the exons produced by an interpolated Markov model (IMM). The third alternative carefully combines the predictions of existing gene finders to produce a significantly improved gene detection system.
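
As a drastically simplified illustration of the dynamic programming step, the following sketch chooses the highest-scoring set of non-overlapping candidate exons (the classic weighted interval scheduling recurrence). Unlike GlimmerM, it ignores reading frames and splice-site compatibility; scores here stand in for combined splice-site and IMM scores.

```python
import bisect

def best_exon_chain(exons):
    """Choose the highest-scoring set of non-overlapping (start, end, score) exons."""
    exons = sorted(exons, key=lambda e: e[1])              # by end coordinate
    ends = [e[1] for e in exons]
    best = [0.0] * (len(exons) + 1)
    choice = [None] * (len(exons) + 1)
    for i, (start, end, score) in enumerate(exons, 1):
        j = bisect.bisect_right(ends, start, 0, i - 1)     # exons ending at or before `start`
        take = best[j] + score
        if take > best[i - 1]:
            best[i], choice[i] = take, (j, i - 1)
        else:
            best[i], choice[i] = best[i - 1], None
    chain, i = [], len(exons)
    while i > 0:                                           # backtrack
        if choice[i] is None:
            i -= 1
        else:
            j, idx = choice[i]
            chain.append(exons[idx])
            i = j
    return best[-1], list(reversed(chain))

score, model = best_exon_chain([(0, 120, 4.2), (100, 300, 6.0), (320, 450, 3.1)])
# -> picks the (100, 300) and (320, 450) exons, total score 9.1
```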

April 20, 2001

From simple message exchanges among friends, to purchases made with a credit card, to the transmission of sensitive documents, the Internet is now being used for many different purposes. However, the presence of malicious users on the Internet has created the need for security and, more specifically, the need for privacy and authenticity. While encryption is used to achieve privacy, authenticity can be achieved by means of digital signature schemes.

In this talk, I present DHIES, a public-key encryption scheme based on the Diffie-Hellman problem, which deals with the issue of privacy. DHIES is built in a generic way from lower-level primitives: it uses a symmetric encryption scheme, a message authentication code, group operations in an arbitrary group, and a hash function. The scheme is very efficient and has strong security properties. Furthermore, these security properties are proven to hold under appropriate assumptions on the underlying primitives. DHIES is now embodied in two draft standards, IEEE P1363a and ANSI X9.63EC, and the corporate standard SECG.
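
A hedged Python sketch of the DHIES shape: ephemeral Diffie-Hellman, with the shared secret hashed (together with the ephemeral public value) into a symmetric encryption key and a MAC key. The group parameters are toy values and the key-derivation encoding is simplified; this is not the format fixed by the standards and must not be used as real cryptography.

```python
import hashlib, hmac, os

P = 2**61 - 1          # toy prime -- far too small to be secure
G = 2                  # toy generator

def _derive_keys(ephemeral_pub: int, shared: int):
    material = hashlib.sha512(ephemeral_pub.to_bytes(32, "big") +
                              shared.to_bytes(32, "big")).digest()
    return material[:32], material[32:]          # (encryption key, MAC key)

def _keystream(key: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:                          # hash-based stream, stand-in
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def keygen():
    v = int.from_bytes(os.urandom(16), "big") % P
    return v, pow(G, v, P)                       # (secret, public)

def encrypt(pub: int, message: bytes):
    u = int.from_bytes(os.urandom(16), "big") % P
    U = pow(G, u, P)                             # ephemeral public value
    enc_key, mac_key = _derive_keys(U, pow(pub, u, P))
    c = bytes(m ^ k for m, k in zip(message, _keystream(enc_key, len(message))))
    tag = hmac.new(mac_key, c, hashlib.sha256).digest()
    return U, c, tag

def decrypt(secret: int, ciphertext):
    U, c, tag = ciphertext
    enc_key, mac_key = _derive_keys(U, pow(U, secret, P))
    if not hmac.compare_digest(tag, hmac.new(mac_key, c, hashlib.sha256).digest()):
        raise ValueError("MAC check failed")
    return bytes(m ^ k for m, k in zip(c, _keystream(enc_key, len(c))))

sk, pk = keygen()
assert decrypt(sk, encrypt(pk, b"hello")) == b"hello"
```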

I will also briefly mention some of my work in other areas of cryptography, including broadcast encryption and forward-secure digital signatures.

April 23, 2001

One of the most fundamental questions in cryptography is the study of the minimal cryptographic primitives needed to implement secure computation among two or more players. The issue of minimal complete primitives for the case of two players has been completely resolved. (A primitive is called complete if any computation can be carried out by the players having access (only) to the primitive and local computation. A primitive is called minimal if any primitive involving fewer players is not complete.)

However, in the multi-party setting, when there are n > 2 players and t of them are corrupted, the question of what the simplest complete primitives are remained open for t > n/3. In this talk we consider this question, and introduce complete primitives of minimal cardinality for secure multi-party computation. In particular, we show that “Oblivious Cast,” a natural generalization of Oblivious Transfer to the three-party case, is minimal and complete for t < n/2.

The cardinality issue is essential in settings where the primitives are implemented by some other means, and the simpler the primitive, the easier it is to realize it.

This is joint work with Matthias Fitzi and Ueli Maurer (ETH Zurich), and Rafail Ostrovsky (Telcordia).

April 25, 2001

Graphs and graph transformations are proposed as a uniform and precise framework for the specification of access control policies. In modeling Role Based Access Control, the formalism provides a specification of static and dynamic consistency conditions, a uniform treatment of user roles and administrative roles, and a symmetric treatment of user-role assignments and permission-role assignments. A methodology is described to systematically generate conditions to ensure the consistency of the state graph after applying operations on assignments. The uniform framework allows the comparison of different policy models and the precise description of the evolution of a policy, as well as the analysis of the interaction between two policies and the integration of two policies.

(joint work with L.V. Mancini and M. Koch)
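
A toy illustration of the underlying idea: an RBAC state graph with users, roles, and permissions as nodes, assignments as edges, and one static consistency condition (mutually exclusive roles) re-checked after each operation. The class names and the ad-hoc check are mine and stand in for the graph-transformation rules of the formalism described in the talk.

```python
class RBACGraph:
    def __init__(self, exclusive_pairs=()):
        self.user_roles = {}            # user -> set of roles
        self.role_perms = {}            # role -> set of permissions
        self.exclusive = set(frozenset(p) for p in exclusive_pairs)

    def assign_role(self, user, role):
        roles = self.user_roles.setdefault(user, set())
        roles.add(role)
        if not self._consistent(user):
            roles.discard(role)         # reject the inconsistent transformation
            raise ValueError(f"{user}: {role} violates role exclusion")

    def grant_permission(self, role, perm):
        self.role_perms.setdefault(role, set()).add(perm)

    def _consistent(self, user):
        roles = self.user_roles[user]
        return not any(frozenset(pair) <= roles for pair in self.exclusive)

g = RBACGraph(exclusive_pairs=[("purchaser", "approver")])
g.assign_role("alice", "purchaser")
g.grant_permission("purchaser", "create-order")
# g.assign_role("alice", "approver")   # would raise: roles are mutually exclusive
```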

April 27, 2001

Statistical models of natural language are an important component of several applications such as speech recognition, handwriting and optical character recognition, spelling correction and machine translation. A language model, in most cases, assigns probabilities to the “next” word $$w_k$$ in a sentence based upon the preceding words, or “history” $$w_1,w_2,\cdots,w_{k-1}$$. N-gram models, which predict $$w_k$$ based on a few preceding words, have been the mainstay of language modeling so far. Due to their local (Markov) dependence assumption, sentences preferred by N-gram models typically lack semantic coherence and syntactic well-formedness.
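
For reference, a minimal maximum-likelihood trigram model, i.e., $$P(w_k \mid w_{k-2}, w_{k-1})$$ estimated by counting. Real N-gram models add smoothing and back-off for unseen histories, which this sketch omits; it only illustrates the local dependence the talk contrasts with syntactic and topic dependencies.

```python
from collections import defaultdict

class TrigramModel:
    """Maximum-likelihood trigram model: P(w_k | w_{k-2}, w_{k-1})."""
    def __init__(self):
        self.trigram = defaultdict(int)
        self.bigram = defaultdict(int)

    def train(self, sentences):
        for words in sentences:
            padded = ["<s>", "<s>"] + words + ["</s>"]
            for i in range(2, len(padded)):
                self.trigram[tuple(padded[i - 2:i + 1])] += 1
                self.bigram[tuple(padded[i - 2:i])] += 1

    def prob(self, w, history):
        h = tuple((["<s>", "<s>"] + list(history))[-2:])
        if self.bigram[h] == 0:
            return 0.0
        return self.trigram[h + (w,)] / self.bigram[h]

lm = TrigramModel()
lm.train([["the", "cat", "sat"], ["the", "cat", "ran"]])
print(lm.prob("sat", ["the", "cat"]))     # 0.5 under maximum likelihood
```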

A new statistical language model is presented which combines collocational dependencies with two important sources of long-range statistical dependence in natural language: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy technique. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. A detailed analysis of the performance of this language model is provided in order to characterize the manner in which it performs better than a standard N-gram model. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies on the other hand are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods individually enhance an N-gram model in complementary ways and the overall improvement from their combination is nearly additive.
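
In generic maximum entropy form (the notation here is mine, not necessarily the talk’s), such a combined model can be written as

$$P(w_k \mid h_k) = \frac{1}{Z(h_k)} \exp\Big(\sum_i \lambda_i f_i(w_k, h_k)\Big),$$

where the history $$h_k$$ collects the preceding words, the exposed syntactic heads, and the estimated topic; the binary features $$f_i$$ encode the N-gram, syntactic, and topic constraints; the weights $$\lambda_i$$ are trained to match the empirical feature expectations; and $$Z(h_k)$$ normalizes the distribution.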

May 2, 2001

The introduction of one or more needles into a patient’s anatomy is a common component in a broad class of minimally invasive therapies. Systems which implement this capability vary widely, in terms of imaging, planning, and placement techniques. In the first part of this talk, I will present work on several of these systems we have developed, utilizing CT, fluoroscopic, and ultrasonic imaging in conjunction with robotic and free-hand placement.

These procedures, however, share a common structure and, unlike many surgical procedures, the actual execution of a needle placement is a very constrained task. It is this restricted nature which makes needle placement an inviting target, both for such particular, application-specific systems and for a more general treatment. The second part of the presentation focuses on the development of a unified representation for this class of systems. Representing the geometric, device, and algorithmic components of these systems, this framework allows for the configuration, implementation, and analysis of particular systems assembled from modular components. I will present an overview of this representation system and discuss its implementation and the progress of its validation.