Search Algorithms

📝📓 record search algorithms

they are: linear search、binary search、linear selection、Top K problem

Introduction

Searching Algorithms are designed to check for an element or retrieve an element from any dta structure where it is stored. Based on the type of search operation, thesealgorithms are generally classified into two categories:

Sequential Search: In this, the list or array is traversed sequentially and every element is checked. For example: Linear Search
Interval Search: These algorithms are specifically designed for searching in sorted data-structures. These type of searching algorithms are much more efficient than Linear Search as they repeatedly target the center of the search structure and divide the search space in half. For example: Binary Search.

Linear Search

time complexity: O(N)

Linear search is uesed on a collections of items, it relies on the technique of traversing a list from start to end by exploring properties of all the elements that are found on the way.

For example, consider an array of integers of size N, you should find and print the position of all the elements with value X. Here, the linear search is based on the idea of matching each element from the begaining of the list to the end of the list with the integer X, and then printing the position of element if the condition is 'True'.

pseudocode in C:

void find(int *a, int len){
	for(int i = 0; i < len; i ++){
		if(a[i] == '5')
			print("%d \n", i);
	}
}

Binary Search

time complexity: O(lgN)

Binary search works only on a sorted set of elements. To use binary search on a collection, the collection must first be sorted.

When binary search is used to perform operations on a sorted set, the number of iterations can always be reduced on the basis of the value that is being searched.

pseudocode in C:

int bin_search(int *a, int low, int high, const int x){
	int index = -1;
	int mid;
	if(low <= high){
		mid = (low + high) / 2;
		if(a[mid] == x)
			return mid;
		if(a[mid] < x)
			index = bin_search(a, mid + 1, high, x);
		else
			index = bin_search(a, low, mid - 1, x);
	}
	return index;
}

Linear Selection

time complexity: O(N)

Now, we need to find the k-th minimum/maximum item in an unsorted array, as we know, the selection problem can be solved in O(N * lgN) time, since we can sort the array using heap sort or merge sort and then simply index the k-th element in the output array. There are faster algorithms that the work can be done in expected linear time.

As in QuickSort algorithms, the idea is to partition the input array recursively. But unlike quicksort, which recursively process both sides of the partition, in linear selection, it only works on one side of the partition. This different shows up in the analysis: whereas quicksort has an expected running time of O(N * lg N), the expected time of linear selection is O(N).

pseudocode in C:

int linear_selection(int *a, int low, int high, const in k){
	int count = 0;
	if(low == high)
		return a[low];
	int pivot = partition(a, low, high);
	count = oivot - low + 1;
	if(count >= k)
		return linear_selection(a, low, count, k);
	else
		return linear_selection(a, count + 1, high, k - count);

Top K Selection problem

Similar with linear selction problem, now, we need to find the top k elements in an unsorted array, of course, you can write down such code as below:

results = sorted(array, reverse=True)[: 10]

Anything involving a sort will usually take O(N * lgN) time, which will keep you waiting around for several seconds or even minutes when dealing with lots of items. An O(N * lgN) algorithm, for large N, simply cannot be run in realtime when users are waiting.

There are two ways to resolve the problem:

1. solution 1: using heap

time complexity: O(N * lgK), space complecity: O(K)

Luckily, finding the top k items can be done in O(N * lgK) time, which is much, much faster than O(N * lgN), using a heap, which is actually a priority queue. Assuming that we need to select top k maximum items in a list, as in heap sort, First, accordingly, we should build a minimum heap of size K to store the target items, and the top is always minimum item in the heap. Second, adding the rest of items in array recursively to heap if it is larger than top element in the heap, Third, adjusting the heap and always place the minimum item to the top in the heap.

pseudocode in C:

void adjust_heap(int *a, int root, int len){
	int left_child = 2 * root + 1;
	int right_child = 2 * root + 2;
	int max_child;
	if(left_child >=len)
		return;
	else{
		if(right_child >= len)
			max_child = left_child;
		else
			max_child = a[left_child] > a[right_child]? left_child: right_child;
		if(a[root] > a[max_child]){
			int tmp = a[root];
			a[root] = a[max_child];
			a[max_child] = tmp;
			adjust_heap(a, max_child, len);
		}
	}
}

void build_heap(int *a, int len){
	for(int i = len / 2 - 1; i >=0; i --)
		adjust_heap(a, i, len);
}

void top_k(int *a, int len, const in k){
	int heap[k], i;
	for(i = 0; i < k; i ++)
		heap[i] = a[i];
	build_heap(heap, k);
	for(i = k; i < len; i ++){
		if(a[i] > heap[0]){
			heap[0] = a[i];
			adjust(heap, 0, k); 
		}
	}
	
	// print the top k items;
	for(i = 0; i < k; i ++)
		printf("%d ", heap[i]);
}

2. solution 2: using linear selection

time complexity: O(N), space complexity: O(N)

as we can see, the linear selection are faster than heap, however, it needs more space and may not wok well when space is limited.

we have learned about the linear selection algorithm in previous section, it uses partition function in QuickSort to get the pivot item and divide the array into two sublists, if the amount of left sublist equals to K, the left sublist will be the final Top K items, if amount is less than K, it will continue to find the rest (K - amount) items in right sublists, otherwise, it will continue to find the pivot that equals to K in left sublists.

blog