Privacy-handling Techniques and Algorithms for Data Mining
Paper Type: Free Essay | Subject: Computer Science | Wordcount: 1088 words | Published: 18th Apr 2018
- VIVEK UNIYAL
ABSTRACT
Data mining can extract previously unknown patterns from vast collections of data. Rapid advances in networking, hardware and software technology have greatly increased the amount of data being collected. Organizations hold huge amounts of data drawn from many heterogeneous databases, and this data includes private and sensitive information about individuals. Data mining extracts novel patterns from such data, which can be used for decision making in various domains. However, the output of data mining may also reveal sensitive, private or personal information about a particular person. Such information can be misused, and that misuse can harm the data owner. Privacy is therefore becoming an important issue in many data mining applications, particularly in distributed environments. Privacy-preserving data mining (PPDM) techniques provide a new direction for solving these issues. With PPDM, valid data mining results can be obtained without learning the underlying data values.
In this dissertation we present two algorithms that address privacy handling. The first is k-anonymization, in which the information corresponding to any individual in the released data cannot be distinguished from that of at least k-1 other individuals whose information also appears in the released data. To achieve k-anonymity, some values in the database must be suppressed or generalized. The second is l-diversity. k-anonymity addresses the record linkage attack model, while l-diversity addresses the attribute linkage attack model.
KEYWORDS: Data Mining, Advantages and Disadvantages of Data Mining, Privacy Handling, K-anonymization Algorithm, L-diversity.
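The two privacy conditions named in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's algorithm: the function names, the toy hospital-style table, the choice of JOB, SEX and AGE as the quasi-identifier (QID), and the fixed-width age generalization are all assumptions made here for illustration. The sketch checks whether every QID combination is shared by at least k records, and whether every such group contains at least l distinct sensitive values (the "distinct" form of l-diversity).

```python
from collections import defaultdict


def group_by_qid(records, qid):
    """Group records by their quasi-identifier (QID) values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[attr] for attr in qid)].append(rec)
    return groups


def is_k_anonymous(records, qid, k):
    """True if every QID combination is shared by at least k records."""
    return all(len(g) >= k for g in group_by_qid(records, qid).values())


def is_distinct_l_diverse(records, qid, sensitive, l):
    """True if every QID group holds at least l distinct sensitive values."""
    return all(
        len({rec[sensitive] for rec in g}) >= l
        for g in group_by_qid(records, qid).values()
    )


def generalize_age(rec, width=10):
    """Replace an exact age with a range, e.g. 34 -> '30-39' (illustrative only)."""
    low = (rec["age"] // width) * width
    return {**rec, "age": f"{low}-{low + width - 1}"}


# Hypothetical toy table; these records are NOT data from the dissertation.
raw = [
    {"job": "Engineer", "sex": "M", "age": 34, "disease": "Flu"},
    {"job": "Engineer", "sex": "M", "age": 37, "disease": "Hepatitis"},
    {"job": "Lawyer", "sex": "F", "age": 41, "disease": "HIV"},
    {"job": "Lawyer", "sex": "F", "age": 45, "disease": "HIV"},
]
QID = ["job", "sex", "age"]

# Generalize the age attribute before release.
released = [generalize_age(r) for r in raw]

print(is_k_anonymous(raw, QID, 2))                         # exact ages make every record unique
print(is_k_anonymous(released, QID, 2))                    # age ranges merge records into groups of two
print(is_distinct_l_diverse(released, QID, "disease", 2))  # one group has a single disease value
```

In this toy table, generalizing age is enough to make the release 2-anonymous, but the second group still exposes its sensitive value because both of its records share the same disease. That gap is the attribute linkage problem that l-diversity is meant to address.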
ACKNOWLEDGEMENTS
I wish to take this opportunity to express my deep gratitude to all the people who extended their cooperation in various ways during my dissertation. It is my pleasure to acknowledge the help of all those individuals.
First of all, I would like to express my deepest gratitude to my dissertation supervisor, Mr. Govind Kamboj, without whom none of this would have been possible. He always provided me with essential direction and advice during the work. I am grateful to him for helping shape this dissertation through to completion. Without his supervision and support, this work would not have been completed successfully in time.
I am grateful to the President, Vice President, Chancellor, Vice Chancellor and Head of the Department of Graphic Era University for providing an excellent working environment with ample facilities and academic freedom. I would also like to thank the teaching and non-teaching staff for their valuable support during my M.Tech.
Last but not least, I am grateful to all my teachers and friends for their cooperation and encouragement throughout this task.
(Vivek Uniyal)
M.Tech (Computer Science & Engineering)
TABLE OF CONTENTS
CANDIDATE'S DECLARATION
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
1. INTRODUCTION
1.1 Problem Statement
1.2 Overview
1.3 Advantages of data mining
1.4 Disadvantages of data mining
1.5 Why privacy-handling is required in data mining
1.6 Motivation
1.7 Organization
2. BACKGROUND AND LITERATURE SURVEY
3. METHODS AND METHODOLOGIES
3.1 Randomization method
3.2 Group-based anonymization methods
3.2.1 K-anonymity framework
3.2.2 Personalized privacy-preservation
3.2.3 Utility-based privacy-preservation
3.2.4 Sequential releases
3.2.5 The l-diversity method
3.3 Distributed privacy-preserving data mining
3.4 Detailed description of k-anonymity and l-diversity
3.4.1 Data collection and data publishing
3.4.2 Privacy Data publishing
3.4.3 Algorithm of k-anonymity
3.4.4 l-diversity
3.4.1.1 Lack of diversity
3.4.1.2 Strong background knowledge
4. EXPERIMENTAL RESULT
4.1 Introduction
4.2 Experimental result
4.2.1 Result of proposed k-anonymity and l-diversity
5. CONCLUSION AND SCOPE FOR FUTURE WORK
5.1 Conclusion
5.2 Scope for Future Work
PUBLICATION OUT OF THIS WORK
REFERENCES
LIST OF ABBREVIATIONS
PPDP Privacy-preserving data publishing
PPDM Privacy-preserving data mining
QID Quasi-Identifier
LIST OF FIGURES
Figure 1.1: Data mining as a step in the process of knowledge discovery
Figure 1.2: Typical data mining system architecture
Figure 1.3: Record Owner, Data Collection and Data Publishing
Figure 1.4: Hospital Database
Figure 1.5: Taxonomy tree for JOB, SEX, AGE (QID attributes)
Figure 1.6: Hospital table, original record in database
Figure 1.7: Table of Sensitive record (Publishing data)
Figure 1.8: Table of External Data ppt table
Figure 1.9: Resulting data after linking the sensitive and ppl table
Figure 1.10: Research table (generalized with k-anonymous published data)
Figure 1.11: Extended table (for linking, like a generalized voter list)
Figure 1.12: For checking the k-anonymity
Figure 1.13: Result of linking the research table to the extended table
Figure 1.14: Hospital original data record Project
Figure 1.15: Comparing the Un-Generalized published and extended data tables
Figure 1.16: Comparing Generalized Extended and Sensitive table records
Figure 1.17: Table for k-anonymity and l-diversity
Figure 1.18: Plotting exact l-value and distinct l-diversity value in Weka
Figure 1.19: Plotting exact l-value and entropy l-diversity value in Weka