Access Control and Security of Datasets by Usage Tracking Using Block Chain Technology

Bhatt, Piyush
This thesis proposes a novel approach to the threat of sensitive data being exposed from large datasets. Organizations capture data and convert it into datasets, and two or more datasets may be combined to derive critical or sensitive information, which can then be misused for a variety of purposes. An unauthorized user may also be able to access the data. To address this, the thesis uses a blockchain to track data usage, recording when and by whom each dataset is accessed. Each blockchain represents one dataset. When a dataset is accessed, a data usage tracker captures the user name, the name of the dataset, the method of access (Command Line Interface or MapReduce), and the command-line operation performed (cat, copy, move, put), and these values are recorded in a new block. Blockchains are well suited to storing such records because the data inside a blockchain is effectively immutable: the data in each block is stored using hash values, and each block is connected to the others through the hash value of the previous block. If the data in a block is modified without authorization, its hash value changes, which renders all following blocks invalid. The blockchain validation process detects this mismatch and reports to the administrator that the recorded information has been modified.
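The hash-linking and validation described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `Block`, `compute_hash`, `append_access`, and `first_invalid_block`, the use of SHA-256, the JSON serialization, and the all-zero hash for the first block are all assumptions made for illustration. Only the four recorded fields (user name, dataset name, access method, operation) come from the abstract.

```python
import hashlib
import json
from dataclasses import dataclass, replace

# Assumption: the first block links to a fixed all-zero placeholder hash.
GENESIS_PREV_HASH = "0" * 64


@dataclass(frozen=True)
class Block:
    """One access record; field names are hypothetical."""
    index: int
    username: str
    dataset: str
    method: str      # "CLI" or "MapReduce"
    operation: str   # "cat", "copy", "move" or "put"
    prev_hash: str
    hash: str


def compute_hash(index, username, dataset, method, operation, prev_hash):
    """SHA-256 over the record fields plus the previous block's hash."""
    payload = json.dumps([index, username, dataset, method, operation, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_access(chain, username, dataset, method, operation):
    """Append a block for one dataset access; one chain per dataset."""
    prev_hash = chain[-1].hash if chain else GENESIS_PREV_HASH
    index = len(chain)
    block_hash = compute_hash(index, username, dataset, method, operation, prev_hash)
    block = Block(index, username, dataset, method, operation, prev_hash, block_hash)
    chain.append(block)
    return block


def first_invalid_block(chain):
    """Return the index of the first block that fails validation, or None."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        expected = compute_hash(block.index, block.username, block.dataset,
                                block.method, block.operation, block.prev_hash)
        if block.index != i or block.prev_hash != prev_hash or block.hash != expected:
            return i
        prev_hash = block.hash
    return None


if __name__ == "__main__":
    chain = []
    append_access(chain, "alice", "patients", "CLI", "cat")
    append_access(chain, "bob", "patients", "MapReduce", "put")
    print(first_invalid_block(chain))   # untouched chain

    # Simulate an unauthorized edit of the stored user name in block 0.
    chain[0] = replace(chain[0], username="mallory")
    print(first_invalid_block(chain))   # validation now flags a block
```

In this sketch, editing a stored field makes that block's recomputed hash disagree with its stored hash. If the stored hash is also rewritten to match, the following block's `prev_hash` no longer agrees, so the break surfaces one block later. Detecting a rewrite of the entire chain would additionally require the latest hash to be kept somewhere the attacker cannot modify, which this sketch does not address.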