This implementation utilizes vectors to improve performance compared to current byte by byte implementation for POWER7. The performance improvement is upto 4x. This patch is tested on powerpc64 and powerpc64le.