Updated December 19, 2023
Introduction to Shortest Common Supersequence
The Shortest Common Supersequence (SCS) is the minor possible sequence that contains both given sequences, X and Y, as subsequences. In other words, one can derive both input sequences by deleting some elements from the shortest common supersequence without altering the order of the remaining elements. Computer science often employs this concept, especially in the context of dynamic programming algorithms, for solving string-related problems. The goal is to minimize the length of the supersequence while ensuring it encompasses all elements of the original sequences.
Table of Content
- Introduction of Shortest Common Supersequence
- Problem Statement
- Examples
- Solving Shortest Common Supersequence
Problem Statement: Shortest Common Supersequence (SCS)
Analyze a given set of sequences S1, S2,…, and Sn; the task is to find the shortest common supersequence SCS that contains all the sequences Si as subsequences.
The supersequence SCS should be of the minimum possible length and include all the elements of each input sequence without repetitions.
Input:
- n: The number of input sequences.
- S1, S2,…, Sn: The individual sequences.
Output:
- The shortest common supersequence SCS that includes all the input sequences.
Example:
Input:
n = 3
S1 = “ABC”
S2 = “ACF”
S3 = “BCD”
Output: Shortest Common Supersequence: “ABCD”
To solve the problem, you must determine the order to combine the input sequences to form the shortest supersequence while ensuring that each sequence appears exactly once in the final result. Commonly used algorithms, like dynamic programming, efficiently solve this optimization problem.
Common Supersequence
A common supersequence is a sequence that contains two or more given sequences as subsequences. In other words, if you have two or more sequences, a common supersequence is a new sequence that includes all the elements of the original sequences in the same order while maintaining their relative order within each sequence.
For example, let’s consider two sequences:
- Sequence 1: “ABC”
- Sequence 2: “ACF”
A common supersequence for these sequences could be “ABCF” because it contains both “ABC” and “ACF” as subsequences.
Examples
Consider the following input strings:
Sequence 1 (S1): “edu”
Sequence 2 (S2): “cba”
We aim to find the shortest common supersequence (SCS) containing both strings.
If you concat both the strings S1 and S2, you will get another string, i.e., S3.
S3: “educba”
In this supersequence, both S1 and S2 strings are added side by side.
You can generate another supersequence by adding characters in between these subsequences while maintaining their order. For example,
S4: “edumicba”
In this supersequence, we first add “edu” from S1, then “mi” extra characters, and then “cba” from S2. Note that this is also a supersequence of strings S1 and S2, but this is not the shortest common supersequence.
You can also generate another supersequence, for example:
S5: “cbaedu”
This supersequence has a shortened length like “educba.” You have only two shorted common super sequences of strings S1 and S2, which have a length of 6.
Consider another example of the shortest common supersequence. Consider these two strings:
Sequence 1: “educba”
Sequence 2: “baeduc”
You can find its supersequence by combining them, but that will not be the shortest one. If you analyze these strings carefully, you will notice that.
To find the shortest common subsequence of these two strings. Your “educ” is common in both strings. The substring “educ” is the last of string S2, and the substring “educ” is from the start of string S1.
So, if you merge string S1 at the end of string S2, you will get “baeduceducba.” Now, you can delete those four repeated characters from this supersequence. Hence, it will become “baeducba,” which is the shortest common supersequence of the string “educba” and “baeduc”.
The final string, “baeducba,” has both strings in it.
Solving Shortest Common Supersequence
You can solve and implement shortest common subsequence problems in various ways. Some of these are discussed as follows.
1. Brute Force Approach
In this approach, you will find all possible super sequences of strings A and B. You will find the shortest among these supersequences. This approach, which sees all possibilities, is recursive. Due to this, it has exponential time complexity.
Steps:
- If one string is empty, the other becomes a supersequence.
- For matched characters, they are included in the supersequence. The process continues with the remaining parts of both strings.
- If there is no match, two options are explored. Use the first character from each string.
- Choose the shorter supersequence between the two options.
- This process continues until it explores all possible supersequences.
Implementation in C++ Code:
#include <iostream>
#include <cstring>
using namespace std;
string shortestCommonSupersequence(const string& str1, const string& str2) {
if (str1.empty()) return str2;
if (str2.empty()) return str1;
if (str1[0] == str2[0]) {
return str1[0] + shortestCommonSupersequence(str1.substr(1), str2.substr(1));
} else {
string scs1 = str1[0] + shortestCommonSupersequence(str1.substr(1), str2);
string scs2 = str2[0] + shortestCommonSupersequence(str1, str2.substr(1));
return scs1.length() < scs2.length() ? scs1 : scs2;
}
}
int main() {
string str1 = "xyz";
string str2 = "zxy";
string result = shortestCommonSupersequence(str1, str2);
cout << "Shortest Common Supersequence: " << result << endl;
cout << "Length of SCS: " << result.length() << endl;
return 0;
}
The output will be,
The Time Complexity of this code will be exponential, i.e., O(2^(m+n)), where m and n are the lengths of the input strings.
Space complexity will be O(m+n) due to the recursive nature of the algorithm. The additional space is used for the function call stack.
2. Memorization (Top-Down DP)
This technique is also known as Top-Down Dynamic Approach. This technique uses a recursive approach with caching to store and retrieve already calculated results. So, we avoid redundant results in calculations.
Step:
- A memoization table stores and retrieves previously calculated results. This avoids redundant calculations.
- In recursive calls, check if the result is already memoized. If not, the result is calculated and then memoized.
- Base cases handle scenarios where one string is empty.
Implementation in C++ Code:
#include <iostream>
#include <vector>
#include <unordered_map>
using namespace std;
// Memoization helper function
string memoizedSCS(const string& str1, const string& str2, int i, int j, unordered_map<string, string>& memo) {
// Base cases
if (i == 0) return str2.substr(0, j);
if (j == 0) return str1.substr(0, i);
// Check if result is already memoized
string key = to_string(i) + "|" + to_string(j);
if (memo.find(key) != memo.end()) {
return memo[key];
}
// Recursive cases
if (str1[i - 1] == str2[j - 1]) {
memo[key] = memoizedSCS(str1, str2, i - 1, j - 1, memo) + str1[i - 1];
} else {
string scs1 = memoizedSCS(str1, str2, i - 1, j, memo) + str1[i - 1];
string scs2 = memoizedSCS(str1, str2, i, j - 1, memo) + str2[j - 1];
memo[key] = (scs1.length() < scs2.length()) ? scs1 : scs2;
}
return memo[key];
}
// Wrapper function for Memoization
string shortestCommonSupersequence(const string& str1, const string& str2) {
unordered_map<string, string> memo;
return memoizedSCS(str1, str2, str1.length(), str2.length(), memo);
}
int main() {
string str1 = "xyz";
string str2 = "zxy";
string result = shortestCommonSupersequence(str1, str2);
cout << "Shortest Common Supersequence: " << result << endl;
cout << "Length of SCS: " << result.length() << endl;
return 0;
}
The output will be,
This memorization technique improves time compared to the brute force technique. However, it also depends on a number of unique subproblems. The time complexity of the dynamic approach will be O(m*n), where m and n are the lengths of input strings. This is because of two nested loops to fill the DP table.
Space complexity will be O(m*n), which is the size of the DP table.
3. Dynamic Approach (Bottom-Up DP)
You can optimize time complexity by using a dynamic approach. You will create a table to store lengths of common supersequences for different prefixes of the input strings. The dynamic technique uses a table to store intermediate results.
This approach initializes a DP table of size (m+1) x (n+1). The table is filled by iterating through the strings and comparing characters. The length of the Shortest Common Supersequence is calculated. The supersequence is reconstructed using this DP table.
Steps:
- Initialization:
- Create dp table (size: (m+1) x (n+1)).
- Set the first-row dp[i][0] and the first column dp[0][j] from 0 to m and 0 to n.
- Fill DP Table:
- Iterate from dp[1][1].
- If str1[i-1] and str2[j-1] are equal, set dp[i][j] = dp[i-1][j-1] + 1.
- Else, set dp[i][j] to max of dp[i-1][j] and dp[i][j-1].
- Calculate SCS Length:
- SCS length: scsLength = m + n – dp[m][n].
- Reconstruct SCS:
- Initialize pointers i and j to str1 and str2 lengths.
- While i and j are greater than 0: If str1[i-1] and str2[j-1] are equal, prepend character to supersequence and decrement both. Otherwise, prepend the character from the string with a larger DP value and decrease the index.
- Return Result:
- The constructed string is the Shortest Common Supersequence.
Implementation in C++ Code:
#include <iostream>
#include <vector>
using namespace std;
string shortestCommonSupersequence(const string& str1, const string& str2) {
int m = str1.length();
int n = str2.length();
// Initialize DP table
vector<vector> dp(m + 1, vector(n + 1, 0));
// Fill DP table
for (int i = 1; i <= m; ++i) {
for (int j = 1; j <= n; ++j) { if (str1[i - 1] == str2[j - 1]) { dp[i][j] = dp[i - 1][j - 1] + 1; } else { dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]); } } } // Length of SCS int scsLength = m + n - dp[m][n]; // Reconstruct SCS string scs; int i = m, j = n; while (i > 0 && j > 0) {
if (str1[i - 1] == str2[j - 1]) {
scs = str1[i - 1] + scs;
--i;
--j;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
scs = str1[i - 1] + scs;
--i;
} else {
scs = str2[j - 1] + scs;
--j;
}
}
while (i > 0) {
scs = str1[i - 1] + scs;
--i;
}
while (j > 0) {
scs = str2[j - 1] + scs;
--j;
}
return scs;
}
int main() {
string str1 = "xyz";
string str2 = "zxy";
string result = shortestCommonSupersequence(str1, str2);
cout << "Shortest Common Supersequence: " << result << endl;
cout << "Length of SCS: " << result.length() << endl;
return 0;
}
The output will be,
The time complexity of the dynamic approach will be O(m*n), where m and n are the lengths of input strings. This is because of two nested loops to fill the DP table.
Space complexity will be O(m*n), which is the size of the DP table.
4. Using the LCS (Longest Common Subsequence) technique
You can use the LCS method to solve SCS problems. This technique first finds the LCS of the given strings and then uses it to construct SCS.
The length of the LCS is found using a DP table. Then, the SCS is reconstructed based on the LCS.
Steps
- lcsLength Function
- Initialize a DP table.
- Fill the DP table to find the length of the LCS.
- Return the length of the LCS.
- shortestCommonSupersequence Function:
- Initialize a DP table.
- Call lcsLength to find the length of the LCS.
- Calculate the length of the SCS.
- Reconstruct the SCS using the LCS and DP table.
- Main Function:
- Create input strings.
- Call shortestCommonSupersequence.
- Print the Shortest Common Supersequence and its length.
Implementation of C++ Code:
#include <iostream>
#include <vector>
using namespace std;
// Function to find the LCS and the DP table
int lcsLength(const string& str1, const string& str2, vector<vector>& dp) {
int m = str1.length();
int n = str2.length();
// Fill DP table
for (int i = 1; i <= m; ++i) {
for (int j = 1; j <= n; ++j) {
if (str1[i - 1] == str2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1;
} else {
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
return dp[m][n];
}
// Function to find the Shortest Common Supersequence using LCS
string shortestCommonSupersequence(const string& str1, const string& str2) {
int m = str1.length();
int n = str2.length();
// Initialize DP table
vector<vector> dp(m + 1, vector(n + 1, 0));
int lcsLen = lcsLength(str1, str2, dp);
// Length of SCS
int scsLen = m + n - lcsLen;
// Reconstruct SCS
string scs;
int i = m, j = n;
while (i > 0 && j > 0) {
if (str1[i - 1] == str2[j - 1]) {
scs = str1[i - 1] + scs;
--i;
--j;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
scs = str1[i - 1] + scs;
--i;
} else {
scs = str2[j - 1] + scs;
--j;
}
}
while (i > 0) {
scs = str1[i - 1] + scs;
--i;
}
while (j > 0) {
scs = str2[j - 1] + scs;
--j;
}
return scs;
}
int main() {
string str1 = "xyz";
string str2 = "zxy";
string result = shortestCommonSupersequence(str1, str2);
cout << "Shortest Common Supersequence: " << result << endl;
cout << "Length of SCS: " << result.length() << endl;
return 0;
}
The output will be,
The time complexity of this approach is O(m*n), where m and n are the lengths of input strings. You need to iterate two DP tables here, but the time complexity will be only O(m*n).
The space complexity of this approach will be O(m*n).
5. Using LCS and Concatenation method
You can use LCS and concatenation to find LCS and then concat the remaining characters from the strings to SCS. This technique is similar to the LCS method but reconstructs supersequences differently.
To find the length of the LCS, one utilizes a DP table and subsequently reconstructs the supersequence by concatenating the remaining characters from both strings.
Steps
- lcsLength
- Fill the DP table for LCS length.
- Return LCS length.
- shortestCommonSupersequence
- Initialize the DP table.
- Find LCS length using lcsLength.
- Reconstruct LCS into string scs.
- Concat remaining str1 and str2 to scs.
- Return scs as SCS.
- Main Function
- Create input strings.
- Call shortestCommonSupersequence.
- Print SCS and its length.
Implementation in C++ Code:
#include <iostream>
#include <vector>
using namespace std;
// Function to find the LCS and the DP table
int lcsLength(const string& str1, const string& str2, vector<vector>& dp) {
int m = str1.length();
int n = str2.length();
// Fill DP table
for (int i = 1; i <= m; ++i) {
for (int j = 1; j <= n; ++j) {
if (str1[i - 1] == str2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1;
} else {
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
return dp[m][n];
}
// Function to find the Shortest Common Supersequence using LCS and Concatenation
string shortestCommonSupersequence(const string& str1, const string& str2) {
int m = str1.length();
int n = str2.length();
// Initialize DP table
vector<vector> dp(m + 1, vector(n + 1, 0));
// Call lcsLength to find the length of the LCS
int lcsLen = lcsLength(str1, str2, dp);
// Reconstruct LCS
int i = m, j = n;
string scs;
while (i > 0 && j > 0) {
if (str1[i - 1] == str2[j - 1]) {
scs = str1[i - 1] + scs;
--i;
--j;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
scs = str1[i - 1] + scs;
--i;
} else {
scs = str2[j - 1] + scs;
--j;
}
}
// Concatenate remaining characters from str1 and str2
while (i > 0) {
scs = str1[i - 1] + scs;
--i;
}
while (j > 0) {
scs = str2[j - 1] + scs;
--j;
}
return scs;
}
int main() {
string str1 = "ABC";
string str2 = "BCA";
string result = shortestCommonSupersequence(str1, str2);
cout << "Shortest Common Supersequence: " << result << endl;
cout << "Length of SCS: " << result.length() << endl;
return 0;
}
The output will be,
The time complexity of this approach is O(m*n), where m and n are the lengths of input strings. You need to iterate only one DP table.
The space complexity of this approach will be O(m*n) for the DP table.
6. Using Binary Search with LCS method
You can use a binary search algorithm to find the length of a common subsequence. Then, construct a supersequence based on this length. Binary search reduces the number of iterations.
We use binary search to find the maximum length with a common subsequence. The size reconstructs the Shortest Common Supersequence.
Steps
- Binary Search for Length
- Binary search to find max length with common subsequence.
- Utilize lcsLength in each iteration.
- Reconstruct SCS
- Reconstruct SCS based on found length.
- Use lcsLength to find LCS.
- Append characters until LCS, then the remaining characters.
- Main Function
- Create input strings str1 and str2.
- Call shortestCommonSupersequence.
- Print SCS and its length.
Implementation in C++ Code:
#include <iostream>
#include <vector>
using namespace std;
// Function to find the LCS (similar to LCS and Concatenation method)
int lcsLength(const string& str1, const string& str2, vector<vector>& dp) {
int m = str1.length();
int n = str2.length();
// Fill DP table
for (int i = 1; i <= m; ++i) {
for (int j = 1; j <= n; ++j) {
if (str1[i - 1] == str2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1;
} else {
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
return dp[m][n];
}
// Function to check if there exists a common subsequence of length 'len'
bool hasCommonSubsequence(const string& str1, const string& str2, int len) {
int m = str1.length();
int n = str2.length();
vector<vector> dp(m + 1, vector(n + 1, 0));
// Calculate LCS length
int lcsLen = lcsLength(str1, str2, dp);
// Check if LCS length is greater than or equal to 'len'
return lcsLen >= len;
}
// Function to find the Shortest Common Supersequence using Binary Search with LCS
string shortestCommonSupersequence(const string& str1, const string& str2) {
int low = 0;
int high = str1.length() + str2.length();
while (low < high) {
int mid = low + (high - low) / 2;
// Check if there exists a common subsequence of length 'mid'
if (hasCommonSubsequence(str1, str2, mid)) {
high = mid;
} else {
low = mid + 1;
}
}
// Reconstruct the Shortest Common Supersequence
int m = str1.length();
int n = str2.length();
int i = m, j = n;
vector<vector> dp(m + 1, vector(n + 1, 0));
lcsLength(str1, str2, dp);
// Reconstruct SCS
string scs;
while (i > 0 && j > 0) {
if (str1[i - 1] == str2[j - 1]) {
scs = str1[i - 1] + scs;
--i;
--j;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
scs = str1[i - 1] + scs;
--i;
} else {
scs = str2[j - 1] + scs;
--j;
}
}
// Concatenate remaining characters from str1 and str2
while (i > 0) {
scs = str1[i - 1] + scs;
--i;
}
while (j > 0) {
scs = str2[j - 1] + scs;
--j;
}
return scs;
}
int main() {
string str1 = "EDUCBA";
string str2 = "BAEDUC";
string result = shortestCommonSupersequence(str1, str2);
cout << "Shortest Common Supersequence: " << result << endl;
cout << "Length of SCS: " << result.length() << endl;
return 0;
}
The output will be
The binary search algorithm is used to find a range of possible lengths. It takes O(log (m+n)) iterations. Because of dynamic programming, you must call the lcsLength function in each iteration, which takes O(m*n) time complexity. Hence, the total time complexity will be O(log (m+n) * (m*n)).
Space complexity will be O(m*n) to store one DP table.
Conclusion
The shortest common Supersequence (SCS) is the superstring of a given two strings. SCS will contain both given strings inside it, and the length will be the shortest among all possible SCSs. You can solve and implement this problem in various ways. The brute force technique is a straightforward technique to solve, but it has exponential time complexity. However, you can optimize time complexity using a dynamic approach and other possible methods. The time complexity of the Dynamic approach of this problem will be polynomial, i.e., O(m*n), where m and n are lengths of input strings.
Recommended Articles
We hope this EDUCBA information on “Shortest Common Supersequence” benefited you. You can view EDUCBA’s
recommended articles for more information.