What is Hashing

  • A hash function is any function that can be used to map data of arbitrary size to fixed-size values.
  • The values returned by a hash function are called hashes or hashcodes or hash values.


Consider the above hash function. Here all keys are passed through a hash function and are mapped to integers from 0 to 15. So this has used some algorithm for the mapping, which is called the hash function.

But we can see the problem here. Two names ‘John Smith’ and ‘Sandra Dee’ are being mapped to the same hash(integer 02). This is called as hash-collision.

Common solutions to avoid this collision are Chaining and Open Addressing.
Types of Hash Functions:
1. Cryptographic
2. Non-Cryptographic

Hash Tables (Hash Maps)

Hash functions are very commonly used with a combination of hash tables. Suppose we have to perform the following operations in Students records efficiently: 

a. Insert new Student Record
b. Search a Student record based on email-address
c. Delete a student record

Now let's compare commonly used data-structures. 


Insertion
Search
Deletion
Arrays
Worst-case O(n)
Worst-Case O(n)
This can be optimized to O(log n) by using sorted data and binary tree
Worst-Case O(n)

Linked List
Worst-case O(1)
Worst-case O(n)
Worst-case O(n)
Binary Search Tree
Worst-case O(log n)
Worst-case O(log n)
Worst-case O(log n)

We can see for most of the data structures, search is dependent on n (number of items). So if n is very large, obviously our search time will be large. So how can we improve it? 

Hash Function with Hash Table
Hash Table with Hash Function will provide constant time for all the above three operations.
Now let's see how.

We use hash-table any we know fixed size of it. Suppose the hash-table which we are using allows a maximum of 5 entries in it.
Now hash function  will use email-address as key and crate hash of it . Then we will perform modulo operation to map to a the index.


//BUCKET

Distributed Hashing

First of all, I will tell you this kind of hashing is very common. We can see in Memcached or Redis. 
As the number of students is continuously growing it will be very difficult to store all the students' information in a hashtable in a single machine.
So in order to store a large amount of data , we have to distribute the hashtable to the multiple servers.
But the problem is how will we determine which server will hold that key, or if we want to retrieve value of a key, which server should be asked for?


References: 

Post a Comment

Previous Post Next Post