Industry Insights

Scientists Find Optimal Balance of Data Storage and Time

February 8, 2024

105

About 70 years ago, an engineer at IBM named Hans Peter Luhn quietly changed the course of computer science. Luhn already held several patents, including one for a device that could measure a cloth’s thread count and another for a guide that determined what mixed drinks you could make from the ingredients in your kitchen. But in a 1953 internal IBM paper, he proposed a new technique for storing and retrieving information that is now built into just about all computational systems: the hash table.

Hash tables are a major class of data structures. They offer an especially convenient method for accessing and altering information in massive databases. But this technology comes with an unavoidable trade-off.

In a 1957 paper published in the IBM Journal of Research and Development, W. Wesley Peterson identified the main technical challenge that hash tables pose: They need to be fast, meaning that they can quickly retrieve the necessary information. But they also need to be compact, using as little memory as possible. These twin objectives are fundamentally at odds. Accessing and modifying a database can be done more quickly when the hash table has more memory; and operations become slower in hash tables that use less space. Ever since Peterson laid out this challenge, researchers have tried to find the best balance between time and space.

Computer scientists have now mathematically proved that they have found the optimal trade-off. The solution came from a pair of recent papers that complemented each other. “These papers resolve the long-standing open question about the best possible space-time trade-offs, yielding deeply surprising results that I expect will have a significant impact for many years to come,” said Michael Mitzenmacher, a computer scientist at Harvard University who was not involved in either study.

“I would definitely say it is a big deal,” added Rasmus Pagh, a computer scientist at the University of Copenhagen. “A lot of people have worked on this problem, trying to see how much you can squeeze space, while also having time-efficient operations. This is the one I would have loved to solve.”

Making a Hash of It

Hash tables are among the oldest, simplest, fastest and most widely used data structures today. They’re designed to perform three basic operations: insertions, which add new items to the database; queries, which access an item or check to see whether it exists; and deletions. A hash table can be ephemeral — existing only as long as a particular program runs — or it can be a permanent part of your computer’s operating system. A web browser such as Chrome or Safari may have multiple built-in hash tables intended to keep track of different kinds of data.

Entries in a hash table are stored as pairs, with the item — the information itself — connected to a key that identifies the information. Plug a key into a hash table’s query algorithm, and it takes you directly to the item. This may not sound so extraordinary, but for enormous databases it can be a great time-saver.

Around the TUT GLOBAL

Scientists Find Optimal Balance of Data Storage and Time

Making a Hash of It

EDITOR PICKS

POPULAR CATEGORY

ABOUT US

FOLLOW US

Making a Hash of It

RELATED ARTICLESMORE FROM AUTHOR

Why Computer Scientists Consult Oracles

Quantum Computers Cross Critical Error Threshold

The AI Pioneer With Provocative Plans for Humanity

Around the TUT GLOBAL

EDITOR PICKS

POPULAR CATEGORY

ABOUT US

FOLLOW US

RELATED ARTICLES MORE FROM AUTHOR