Hashing Table


Hashing

Hashing is the process of converting any length into a fixed sized string of text using a mathematical function. So no matter how long the text is, it can be converted into an array of numbers and letters through an algorithm. 

Understanding the concept of Hash Tables

A hash table is a data structure that is used to store information, the information in hash table usually has 2 components which is key and value or some sort of record. A key could be something like name and value could be something like phone number. Basically we're mapping the key to the value here. 
In this example, we're representing the hash in shapes.




every time the same key is entered it will spit out the same index as for this example in the picture above.

Hash Function

There are some important methods to hash a string into a key ;
  • Mid- Square
  • Division
  • Folding
  • Digit Extraction
  • Rotating Hash


Mid-Square

To understand more about mid-square technique we can see from the example below. 
Consider a hash table of size 11 which is designated as L and consider this set A of the possible values inserted into it.

 Consider a hash function being h1(n) as mentioned above, applying h1 to the set of values we can get indices where each of them get stored which is the respective key but there's a problem in 75 and 42. Both are getting the same key, which in this case we use mid-square to correct this step.
As seen above, h(2) formula has a r on it which is selected beforehand and then the r becomes the power of n.  Therefore the key of 75 and 42 is now different.

Division

Division is the simplest method of hashing an integer. Divide the string or identifier by using the modulus operator.
Suppose the table is to store strings. One of the ways would be to add up ASCII values of all the characters in the string and take modulo of table size in this case 97.

Folding

Function below takes string as an input, and it processes the string four bytes at a time and interprets each of the four-byte chunks as a single long integer value. The int values for the four-byte chunks are added together. Then resulting sum is converted to the range of 0 to M-1 using the modulus operator. 
For example, if string "aaaabbbb" is passed to sfold, then the first four bytes "aaaa" will be interpreted as the integer value 1,633,711,873 and the next four bytes will be interpreted as the int value 1,650,614,882. Their sum is 3,284,386,755 (unsigned int). If the table size is 101 then the modulus function will cause this key to hash the slot 75 in the table.
int sfold(String s, int M) {
  long sum = 0, mul = 1;
  for (int i = 0; i < s.length(); i++) {
    mul = (i % 4 == 0) ? 1 : mul * 256;
    sum += s.charAt(i) * mul;
  }
  return (int)(Math.abs(sum) % M);
}

Digit Extraction

Predefined digit of the given number is considered as the hash address. For example :
x = 14,568, then we extract the 1st, 3rd, and 5th digit, we'll get the hash code of 158.
Above is an example of digit extraction from an integer.

Rotating Hash

Any hash method can be used here such as division or mid-square. After getting the hash code/address from the hash method and do rotation. Rotation is performed by shifting the digits to get a new hash address. For example :
Hash address given : 20021
Result ; 12002 ( fold the digits)

Collision

Hash collision are practically unavoidable when hashing a random subset of a large set of possible keys. There are two general ways to handle collisions ; Linear Probing and Chaining.

Above is an example of linear probing in C.
 
Above is an example of chaining. We see the Name1 and Name3 getting the same index. But we're just storing it as a chain using linked list. 

Above is a code example of chaining. 

resource : 







Comments