website/content/ChapterFour/0100~0199/0187.Repeated-DNA-Sequences.md
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for Example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]
所有 DNA 由一系列缩写为 A,C,G 和 T 的核苷酸组成,例如:“ACGAATTCCG”。在研究 DNA 时,识别 DNA 中的重复序列有时会对研究非常有帮助。编写一个函数来查找 DNA 分子中所有出现超多一次的10个字母长的序列(子串)。
package leetcode
// 解法一
func findRepeatedDnaSequences(s string) []string {
if len(s) < 10 {
return nil
}
charMap, mp, result := map[uint8]uint32{'A': 0, 'C': 1, 'G': 2, 'T': 3}, make(map[uint32]int, 0), []string{}
var cur uint32
for i := 0; i < 9; i++ { // 前9位,忽略
cur = cur<<2 | charMap[s[i]]
}
for i := 9; i < len(s); i++ {
cur = ((cur << 2) & 0xFFFFF) | charMap[s[i]]
if mp[cur] == 0 {
mp[cur] = 1
} else if mp[cur] == 1 { // >2,重复
mp[cur] = 2
result = append(result, s[i-9:i+1])
}
}
return result
}
// 解法二
func findRepeatedDnaSequences1(s string) []string {
if len(s) < 10 {
return []string{}
}
ans, cache := make([]string, 0), make(map[string]int)
for i := 0; i <= len(s)-10; i++ {
curr := string(s[i : i+10])
if cache[curr] == 1 {
ans = append(ans, curr)
}
cache[curr]++
}
return ans
}