Give My Text Back

Description:
Problem 1: Give My Text Back

To prepare for the English exam Little Ho collected many digital reading materials. Unfortunately the materials are messed up by a malware.

It is known that the original text contains only English letters (a-zA-Z), spaces, commas, periods and newlines, conforming to the following format:

  1. Each sentence contains at least one word, begins with a letter and ends with a period.
  2. In a sentence the only capitalized letter is the first letter.
  3. In a sentence the words are separated by a single space or a comma and a space.
  4. The sentences are separated by a single space or a single newline.

It is also known the malware changes the text in the following ways:

  1. Changing the cases of letters.
  2. Adding spaces between words and punctuations.

Given the messed text, can you help Little Ho restore the original text?

Input:

A string containing no more than 8192 English letters (a-zA-Z), spaces, commas, periods and newlines which is the messed text.

Output:

The original text.

Sample input:

my Name is Little Hi.
His name IS Little ho , We are friends.

Sample output:

My name is little hi.
His name is little ho, we are friends.

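An easy problem.

Analysis:

First, be clear that the text changes in only two ways:

  1. Letter case changes.
  2. Spaces are added between words, or between a word and a punctuation mark. Spaces are only added, never removed, so input such as hias.HIS cannot occur (the input there would be hias. HIS , with the space kept).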

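Approach:

  1. First, read the data line by line (each line stored as a string). Each line can be processed right after it is read, or all lines can be read first (stored in a vector<string>) and then processed one by one; the code here takes the latter approach.
  2. Split each line on the space character ' ' and store the resulting substrings in a vector<string>.
  3. Iterate over that vector<string> and handle each substring according to its form.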
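Step 2 can be sketched as follows. This is only a minimal illustration, and `splitOnSpace` is a hypothetical name that is not part of the solution below. It shows why empty substrings appear: every extra space added by the malware produces an empty token.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this sketch): split a line on ' '.
// Two adjacent spaces produce an empty token between them.
std::vector<std::string> splitOnSpace(const std::string& line) {
    std::vector<std::string> tokens;
    std::stringstream ss(line);
    std::string item;
    while (std::getline(ss, item, ' ')) {
        tokens.push_back(item);
    }
    return tokens;
}
```

For the input `"ho  , We"` (two spaces after `ho`), this yields the four tokens `"ho"`, `""`, `","` and `"We"`, which is why step 3 has to skip empty substrings and treat a standalone comma as its own case.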

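Notes:

  1. In the code, a substring obtained from the split can take one of the forms listed below. The last form needs special care.
      Take the text hias. HIS : the correct result is hias. His , but simply splitting and lowercasing gives hias. his . The first letter of a word is capitalized only when a . has been seen before it, so besides the standalone "." substring, the code must also check whether a word substring itself contains a . and, if so, capitalize the next word.
      The second-to-last form can simply be lowercased and appended to the result string; the attached comma causes no problems.
    • ""
    • ","
    • "."
    • "hELLo"
    • "hELLo,"
    • "hELLo."
  2. The case above is the one I missed in my first analysis. My submissions kept getting WA, which was painful, so analyze a problem completely before writing code.
  3. This problem can also be solved by reading and processing the input one character at a time, which is easier to analyze and implement than the approach in this post. Reference code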
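The character-by-character alternative might look like the sketch below. It is not the reference code mentioned above, only one possible version; the function name `restore` and its two flags are assumptions made for this illustration. It assumes the input contains only the characters listed in the problem statement.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: restore the messed text one character at a time.
std::string restore(const std::string& messed) {
    std::string out;
    bool startOfSentence = true;  // the next letter begins a sentence
    bool pendingSpace = false;    // a separator was seen since the last letter

    for (char raw : messed) {
        unsigned char c = static_cast<unsigned char>(raw);
        if (std::isalpha(c)) {
            // Emit exactly one space before a word, never at the start of a line.
            if (pendingSpace && !out.empty() && out.back() != '\n') out += ' ';
            out += static_cast<char>(startOfSentence ? std::toupper(c)
                                                     : std::tolower(c));
            startOfSentence = false;
            pendingSpace = false;
        } else if (c == ' ') {
            pendingSpace = true;      // remember it; repeated spaces collapse
        } else if (c == ',') {
            out += ',';               // any spaces before the comma are dropped
            pendingSpace = true;
        } else if (c == '.') {
            out += '.';
            startOfSentence = true;   // the next word must be capitalized
            pendingSpace = true;
        } else if (c == '\n') {
            out += '\n';
            pendingSpace = false;     // no leading space on the new line
        }
    }
    return out;
}
```

Because a space is only written when the next letter arrives, runs of extra spaces and spaces before punctuation disappear without any splitting, and the period case from note 1 needs no special handling.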

The code is as follows:

#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

class Solution {
private:
    // Split s on delim; consecutive delimiters yield empty strings.
    vector<string> split(const string &s, char delim) {
        vector<string> elems;
        stringstream ss(s);
        string item;
        while (getline(ss, item, delim)) {
            elems.push_back(item);
        }
        return elems;
    }

public:
    // Restore one line of the messed text.
    string process(const string &s) {
        string originSen;
        bool isCapital = true;  // every line starts a new sentence
        vector<string> rs = split(s, ' ');

        for (size_t i = 0; i < rs.size(); i++) {
            if (rs[i].empty()) continue;  // produced by extra spaces

            if (rs[i] == ",") {
                originSen.append(",");
            } else if (rs[i] == ".") {
                originSen.append(".");
                isCapital = true;
            } else {
                // Lowercase the whole token, then capitalize if it starts a sentence.
                transform(rs[i].begin(), rs[i].end(), rs[i].begin(),
                          [](unsigned char c) { return static_cast<char>(tolower(c)); });
                if (isCapital) {
                    rs[i][0] = static_cast<char>(toupper(static_cast<unsigned char>(rs[i][0])));
                }
                if (!originSen.empty()) originSen.append(" ");
                originSen.append(rs[i]);

                // Note 1: a token such as "hello." also ends a sentence.
                isCapital = rs[i].find('.') != string::npos;
            }
        }

        return originSen;
    }
};

int main() {
    vector<string> data;
    string tem;
    Solution s;

    // Read every line until EOF, then process them one by one.
    while (getline(cin, tem)) {
        data.push_back(tem);
    }

    for (size_t i = 0; i < data.size(); i++) {
        cout << s.process(data[i]) << endl;
    }

    return 0;
}