Research and Advances
Artificial Intelligence and Machine Learning

Median split trees: a fast lookup technique for frequently occuring keys

Posted

Split trees are a new technique for searching sets of keys with highly skewed frequency distributions. A split tree is a binary search tree each node of which contains two key values—a node value which is a maximally frequent key in that subtree, and a split value which partitions the remaining keys (with respect to their lexical ordering) between the left and right subtrees. A median split tree (MST) uses the lexical median of a node's descendents as its split value to force the search tree to be perfectly balanced, achieving both a space efficient representation of the tree and high search speed. Unlike frequency ordered binary search trees, the cost of a successful search of an MST is log n bounded and very stable around minimal values. Further, an MST can be built for a given key ordering and set of frequencies in time n log n, as opposed to n2 for an optimum binary search tree. A discussion of the application of MST's to dictionary lookup for English is presented, and the performance obtained is contrasted with that of other techniques.

View this article in the ACM Digital Library.

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More