优秀的拆分

信息

此题目来自洛谷，原始题目与提交代码请前往 P1117 [NOI2016] 优秀的拆分 - 洛谷。

题目描述

如果一个字符串可以被拆分为 $AABB$ 的形式，其中 $A$ 和 $B$ 是任意非空字符串，则我们称该字符串的这种拆分是优秀的。例如，对于字符串 $\mathsf{aabaabaa}$ ，如果令 $A=\mathsf{aab}$ ， $B=\mathsf{a}$ ，我们就找到了这个字符串拆分成 $AABB$ 的一种方式。

一个字符串可能没有优秀的拆分，也可能存在不止一种优秀的拆分。比如我们令 $A=\mathsf{a}$ ， $B=\mathsf{baa}$ ，也可以用 $AABB$ 表示出上述字符串；但是，字符串 $\mathsf{abaabaa}$ 就没有优秀的拆分。

现在给出一个长度为 n 的字符串 S，我们需要求出，在它所有子串的所有拆分方式中，优秀拆分的总个数。这里的子串是指字符串中连续的一段。

以下事项需要注意：

出现在不同位置的相同子串，我们认为是不同的子串，它们的优秀拆分均会被记入答案。
在一个拆分中，允许出现 $A=B$ 。例如 $\mathsf{cccc}$ 存在拆分 $A=B=\mathsf{c}$ 。
字符串本身也是它的一个子串。

输入格式

每个输入文件包含多组数据。

输入文件的第一行只有一个整数 $T$ ，表示数据的组数。

接下来 $T$ 行，每行包含一个仅由英文小写字母构成的字符串 $S$ ，意义如题所述。

输出格式

输出 $T$ 行，每行包含一个整数，表示字符串 $S$ 所有子串的所有拆分中，总共有多少个是优秀的拆分。

输入输出样例

输入 #1

4
aabbbb
cccccc
aabaabaabaa
bbaabaababaaba

输出 #1

说明/提示

样例解释

我们用 $S[i,j]$ 表示字符串 $S$ 第 $i$ 个字符到第 $j$ 个字符的子串（从 $1$ 开始计数）。

第一组数据中，共有三个子串存在优秀的拆分： $S[1,4]=\mathsf{aabb}$ ，优秀的拆分为 $A=\mathsf{a}$ ， $B=\mathsf{b}$ ； $S[3,6]=\mathsf{bbbb}$ ，优秀的拆分为 $A=\mathsf{b}$ ， $B=\mathsf{b}$ ； $S[1,6]=\mathsf{aabbbb}$ ，优秀的拆分为 $A=\mathsf{a}$ ， $B=\mathsf{bb}$ 。而剩下的子串不存在优秀的拆分，所以第一组数据的答案是 3。

第二组数据中，有两类，总共四个子串存在优秀的拆分：对于子串 $S[1,4]=S[2,5]=S[3,6]=\mathsf{cccc}$ ，它们优秀的拆分相同，均为 $A=\mathsf{c}$ ， $B=\mathsf{c}$ ，但由于这些子串位置不同，因此要计算三次；对于子串 $S[1,6]=\mathsf{cccccc}$ ，它优秀的拆分有两种： $A=\mathsf{c}$ ， $B=\mathsf{cc}$ 和 $A=\mathsf{cc}$ ， $B=\mathsf{c}$ ，它们是相同子串的不同拆分，也都要计入答案。所以第二组数据的答案是 $3+2=5$ 。

第三组数据中， $S[1,8]$ 和 $S[4,11]$ 各有两种优秀的拆分，其中 $S[1,8]$ 是问题描述中的例子，所以答案是 $2+2=4$ 。

第四组数据中， $S[1,4]$ ， $S[6,11]$ ， $S[7,12]$ ， $S[2,11]$ ， $S[1,8]$ 各有一种优秀的拆分， $S[3,14]$ 有两种优秀的拆分，所以答案是 $5+2=7$ 。

数据范围

对于全部的测试点，保证 ${1}\le{T}\le{10}$ 。以下对数据的限制均是对于单组输入数据而言的，也就是说同一个测试点下的 $T$ 组数据均满足限制条件。

我们假定 $n$ 为字符串 $S$ 的长度，每个测试点的详细数据范围见下表：

测试点编号	${n}\le$	特殊性质
${1}\sim{2}$	$300$	$S$ 中所有字符相同
${3}\sim{4}$	$2000$	$S$ 中所有字符相同
${5}\sim{6}$	$10$
${7}\sim{8}$	$20$
${9}\sim{10}$	$30$
${11}\sim{12}$	$50$
${13}\sim{14}$	$100$
$15$	$200$
$16$	$300$
$17$	$500$
$18$	$1000$
$19$	$2000$
$20$	$30000$

题目解答

注意

本题题解尚未完成/尚未完善，不足以 AC 通过评判。

这道题目要我们拆分字符串，我们第一时间可以想到枚举所有可能性。

/**
 * 洛谷 P1117 解答程序。
 * @author CoolCLK
 */
#include <iostream>
#include <string>
#include <vector>
#define repeat(n) for (size_t _ = 0; _ < n; _++)
typedef unsigned short u_short;
typedef unsigned int u_int;
using namespace std;

u_short T;

/**
 * string 工具类，截取片段。
 * @author CoolCLK
 */
class string_part {
private:
    string* str = nullptr;
    size_t index;
    size_t length;

public:
    string_part(string* str, size_t index, size_t length) {
        this->str = str;
        this->index = index;
        this->length = length;
    }

    size_t get_index() {
        return this->index;
    }

    size_t back() {
        return this->index + this->length;
    }

    string to_string() {
        return this->str->substr(this->index, this->length);
    }
};

/**
 * 寻找字符串中 AABB 格式。
 * @author CoolCLK
 */
u_int findAABB(string str) {
    u_int ans = 0;
    vector<string_part> strs;
    for (size_t i = 0; i < str.length(); i++) {
        for (size_t len = 1; i + (2 * len) <= str.length(); len++) {
            if (str.substr(i, len) == str.substr(i + len, len) && len >= 1) {
                strs.emplace_back(&str, i, 2 * len);
            }
        }
    }
    for (auto s : strs) {
        for (auto ns : strs) {
            if (ns.get_index() == s.back()) {
                ans++;
            }
        }
    }
    return ans;
}

int main() {
    cin >> T;
    vector<u_int> answers;
    repeat(T) {
        string S;
        cin >> S;
        answers.emplace_back(findAABB(S));
    }
    for (auto ans : answers) {
        cout << ans << endl;
    }
    return 0;
}

代码中，我通过枚举 $S$ 每个的一位置并向后扩展，直到产生 $AA$ 串。并很容易地想到， $BB$ 也可以当作 $AA$ 串处理，只需要找到有几个紧邻的 $AA$ 串即可。

但是，这种方法的平均时间复杂度为 $O(n^2)$ ，最坏时间复杂度为 $O(n^4)$ 。

也可以继续优化到 $O(n^3)$ ，略微改动一下：

u_int findAABB(string str) {
    vector<u_int> L(n, 0);
    vector<u_int> R(n, 0);

    for (size_t i = 0; i < n; i++) {
        for (size_t len = 1; i + 2 * len <= n; len++) {
            bool match = true;
            for (size_t k = 0; k < len; k++) {
                if (str[i + k] != str[i + len + k]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                size_t end_index = i + 2 * len - 1;
                if (end_index < n) {
                    L[end_index]++;
                }
                R[i]++;
            }
        }
    }

    u_int ans = 0;
    for (size_t t = 0; t < n - 1; t++) {
        ans += L[t] * R[t + 1];
    }

    return ans;
}

但还是不够，经测试，至少要达到 $<O({n^2})$ 才可能通过，否则就会 TLE 。

怎么办呢？

题目描述​

输入格式​

输出格式​

输入输出样例​

说明/提示​

样例解释​

数据范围​

题目解答​