Opinion
Data and Information

Upstart Puzzles: Selective Sponges

Wringing out an array of options.

Posted
curved sponges

In human body cells, there are special RNA molecules called microRNAs that do not code for proteins but rather degrade messengerRNAs (mRNAs) mostly by binding to parts of them either causing the mRNAs’ destruction or preventing them from translating to proteins. Sometimes microRNAs can be harmful (for example, by degrading useful mRNAs), so a research direction is to inject therapeutic “sponges” (or colloquially “honeypots”) that attract the harmful microRNAs based on preferential Watson-Crick pairing to neutralize them without attracting helpful microRNAs. The puzzle in this column abstracts the problem to its core algorithmic issues.

Given a set of target strings t1, … tn and off-target strings o1, …, om, we want to find a sponge string s such that every target string ti is a (left-to-right) substring of s and no off-target string oj is a substring of s. Hereafter when we say substring, we mean left-to-right substring, so BCD is a left-to-right substring of ABCD, but CBA is not and therefore not a substring for our purposes.

To start, we want the string s to match the targets without overlaps. For example, if s is ABCD, then s matches targets AB and D without overlap. By contrast, ABCD does not match ABC and CD without overlaps. ABCCD or CDABC do.

Figure.  All the targets should be either vertical or horizontal substrings of the sponge, but none of the non-targets should be.

Warm-Up: Consider a four-letter alphabet A, B, C, I. Here are the target strings: CBI, CCC, and BIAI. Here are the non-target strings: BIC and AIC. Find a string s of length 10 such that the target strings match s in a non-overlapping way but neither BIC nor AIC is a substring of s.

Solution to Warm-Up:

Every length 10 solution will be a permutation of the target strings, because overlaps are not allowed. In this case, the only such permutation that does not contain any of the non-target strings as a substring is: CCCCBIBIAI. The reason is that AIC prevents BIAI from being first or second in the permutation and BIC prevents CBI from being first.

Now that you understand the principle, please try the following:

Question: Here are the targets: CBI CCC BIAI CIII IBCA BCCI

Here are the non-targets: CCB, ICB, IBA, CIBI, BBBCI, CCBBA. Please try to find a string s of length 22 without overlaps that matches the targets but not the non-targets

Solution:

Here is one solution: CBIIBCACIIIBIAIBCCICCC.

There are several ways to generalize this. First, suppose we allow targets to overlap in the solution string. For example, allowing overlaps would imply that the solution string ABCD would provide a match for all of the following target strings: A, B, C, D, AB, BC, CD, ABC, BCD, and ABCD itself. Second, the principle of sponge can extend to more dimensions. Let us say that we want to create sponge squares that match targets, but do not match off-targets. Here, matching means left-to-right matching in the horizontal direction and top-to-bottom matching in the vertical direction.

Given the possibility of overlaps as well as the ability to match up-to-down as well as left-to-right, the sponge square

ABC
DCD
CAD

matches ABC, ADC, DCD, CAD, BCA, CDD, CA, CD, DD, CDD, …

Question: Try to find a four-by-four sponge square for the following targets: BBA BIC, BIBA, CBB, BIB, CBC, and CAC while avoiding these non-targets: AAC, BBI, III, AII, CAB, BBB, ABA, CII, and BBC. Please remember that overlaps are ok.

Solution:

CBIB
BICB
CBBA
CACA

Here is a much more difficult challenge.

Question: Try to find a six-by-six sponge square for the following targets: CAI, AIBA, CCII, BCIAAI, IIBIIA, BCI, AIB, AAI, CCIIB, IIA, CBA, IBABI, CBCBAI, and BIC while avoiding these non-targets: IIBA, ACACI, BCCC, III, CABCC, BABAB, BCICI, BAAIA, AABCC, IAAAI, and CCCI.

Solution:

CBBICB
ICCBBA
BICACA
CAIBBI
AAIIAB
IIBIIA

Call a square sponge target-perfect if each of its side lengths equals the length of the longest target string. The square sponges we have already found are target-perfect.

Question: Can you find a set of five four-letter strings that can make it impossible to find a target-perfect (in this case, 4-by-4) square sponge, even in the absence of non-targets?

Solution: Here is one. There are others. AAAA, BBBB, CCCC, DDDD, AABB

Upstart: What are the conditions that determine whether a target-perfect square sponge exists or not?

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More