
Codility - Genomic Range Query // C++

mang_dev 2020. 2. 18. 23:33

Problem

 

A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?

The DNA sequence is given as a non-empty string S = S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).

For example, consider string S = CAGCCTA and arrays P, Q such that:

P[0] = 2    Q[0] = 4
P[1] = 5    Q[1] = 5
P[2] = 0    Q[2] = 6

The answers to these M = 3 queries are as follows:

  • The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.
  • The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.
  • The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1.

Write a function:

vector<int> solution(string &S, vector<int> &P, vector<int> &Q);

that, given a non-empty string S consisting of N characters and two non-empty arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.

Result array should be returned as a vector of integers.

For example, given the string S = CAGCCTA and arrays P, Q such that:

P[0] = 2    Q[0] = 4
P[1] = 5    Q[1] = 5
P[2] = 0    Q[2] = 6

the function should return the values [2, 4, 1], as explained above.

Write an efficient algorithm for the following assumptions:

  • N is an integer within the range [1..100,000];
  • M is an integer within the range [1..50,000];
  • each element of arrays P, Q is an integer within the range [0..N − 1];
  • P[K] ≤ Q[K], where 0 ≤ K < M;
  • string S consists only of upper-case English letters A, C, G, T.

 


 

Solution

 

We are given a string S and vectors P and Q.

 

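Treating P[i] and Q[i] as the start and end indices into the string, we need to find the smallest letter (in ASCII order) among A, C, G, T within that range and store its value in the answer vector. (A, C, G, T count as 1, 2, 3, 4 respectively.)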

 

The problem itself is not difficult, but scanning the string character by character for every P[i], Q[i] pair frequently results in a timeout.
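To make the timeout concrete, here is a minimal sketch of the character-by-character scan described above. The names `impact` and `naive_solution` are hypothetical and do not come from the original post. With N = 100,000 and M = 50,000, the nested loops can visit up to 5,000,000,000 characters in the worst case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: maps a nucleotide to its impact factor.
int impact(char c) {
    switch (c) {
        case 'A': return 1;
        case 'C': return 2;
        case 'G': return 3;
        default:  return 4;  // 'T'
    }
}

// Brute force: scan S[P[k]..Q[k]] for every query.
// Worst case O(N * M), which is what tends to time out.
std::vector<int> naive_solution(const std::string &S,
                                const std::vector<int> &P,
                                const std::vector<int> &Q) {
    assert(P.size() == Q.size());
    std::vector<int> ans;
    ans.reserve(P.size());
    for (std::size_t k = 0; k < P.size(); ++k) {
        int best = 4;  // largest possible impact factor
        for (int i = P[k]; i <= Q[k]; ++i) {
            best = std::min(best, impact(S[i]));
        }
        ans.push_back(best);
    }
    return ans;
}
```

The result is correct on the sample input; only the running time is the problem.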

 

So I declared a vector holding (letter, index) pairs and sorted it in ascending order by letter, and by index in ascending order when the letters are equal.
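As an illustration of that sorted vector, the sketch below builds it for a given string. The function name `sorted_pairs` is hypothetical. It relies on the default ordering of `std::pair`, which compares `first` and then `second`, so it produces the same order as a hand-written comparator on (letter, index).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (impact factor, index) pairs sorted by
// impact factor first and by index second.
std::vector<std::pair<int, int>> sorted_pairs(const std::string &S) {
    std::vector<std::pair<int, int>> v;
    v.reserve(S.size());
    for (std::size_t i = 0; i < S.size(); ++i) {
        int ch = 4;  // 'T'
        if (S[i] == 'A') ch = 1;
        else if (S[i] == 'C') ch = 2;
        else if (S[i] == 'G') ch = 3;
        v.emplace_back(ch, static_cast<int>(i));
    }
    // std::pair's operator< is lexicographic: first, then second.
    std::sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}
```

For S = CAGCCTA this gives (1,1), (1,6), (2,0), (2,3), (2,4), (3,2), (4,5): both A positions first, then the three C positions, then G, then T.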

 

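Then, for each query, I walked through that vector from the front and, as soon as an entry whose index lies between P[i] and Q[i] appeared, pushed that entry's value (1 to 4) into the answer vector. Because the vector is sorted by letter, the first match is the minimum.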

 

I had gotten stuck on this problem and hadn't touched codility for a while, but this time I solved it in 20 minutes 8ㅅ8


 

Code

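#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using namespace std;

// Order by impact factor first, then by index.
bool compare(const pair<int, int> &a, const pair<int, int> &b) {
	if (a.first == b.first)
		return a.second < b.second;
	return a.first < b.first;
}

vector<int> solution(string &S, vector<int> &P, vector<int> &Q) {
	vector<int> ans;

	// (impact factor, index) for every character of S
	vector<pair<int, int>> v;

	for (size_t i = 0; i < S.size(); i++) {
		int ch;
		if (S[i] == 'A')
			ch = 1;
		else if (S[i] == 'C')
			ch = 2;
		else if (S[i] == 'G')
			ch = 3;
		else
			ch = 4;
		v.push_back(make_pair(ch, static_cast<int>(i)));
	}

	sort(v.begin(), v.end(), compare);

	// For each query, the first sorted entry whose index falls inside
	// [P[i], Q[i]] carries the minimal impact factor for that range.
	for (size_t i = 0; i < P.size(); i++) {
		for (size_t j = 0; j < v.size(); j++) {
			if (P[i] <= v[j].second && v[j].second <= Q[i]) {
				ans.push_back(v[j].first);
				break;
			}
		}
	}

	return ans;
}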
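
The code above restarts its walk over the sorted vector for every query, so a single query can still touch up to N entries. For comparison, here is a sketch of a different technique, prefix counts, which is not the method used in this post. It answers each query in constant time after an O(N) pass. The name `prefix_solution` and the `count` layout are assumptions made for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Alternative sketch using prefix counts (not the approach from the post).
// count[t][i] = how many of S[0..i-1] have impact factor t + 1.
// Only A, C, G are tracked; if none occurs in a range, the answer is 4 (T).
std::vector<int> prefix_solution(const std::string &S,
                                 const std::vector<int> &P,
                                 const std::vector<int> &Q) {
    assert(P.size() == Q.size());
    const std::size_t n = S.size();
    std::array<std::vector<int>, 3> count;
    for (auto &c : count) c.assign(n + 1, 0);

    for (std::size_t i = 0; i < n; ++i) {
        for (int t = 0; t < 3; ++t) count[t][i + 1] = count[t][i];
        if (S[i] == 'A') ++count[0][i + 1];
        else if (S[i] == 'C') ++count[1][i + 1];
        else if (S[i] == 'G') ++count[2][i + 1];
    }

    std::vector<int> ans;
    ans.reserve(P.size());
    for (std::size_t k = 0; k < P.size(); ++k) {
        int result = 4;
        for (int t = 0; t < 3; ++t) {
            // A positive difference means letter t occurs in S[P[k]..Q[k]].
            if (count[t][Q[k] + 1] - count[t][P[k]] > 0) {
                result = t + 1;
                break;
            }
        }
        ans.push_back(result);
    }
    return ans;
}
```

On the sample input (S = CAGCCTA, P = {2, 5, 0}, Q = {4, 5, 6}) this returns {2, 4, 1}, matching the expected answers in the problem statement. The trade-off is three extra arrays of N + 1 integers in exchange for O(N + M) total time.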