Privacy-handling Techniques and Algorithms for Data Mining

18th Apr 2018 Computer Science

Disclaimer: This work has been submitted by a university student.

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UKEssays.com.

  • VIVEK UNIYAL

ABSTRACT

Data mining extracts previously unknown patterns from vast collections of data. Rapid growth in networking, hardware and software technology has greatly increased the amount of data being collected. Organizations hold huge amounts of data drawn from many heterogeneous databases, and this data includes private and sensitive information about individuals. Data mining extracts novel patterns from such data, which can then support decision making in various domains. However, the output of data mining can also reveal sensitive, private or personal information about a particular person. Such information can be misused, and that misuse can harm the data owner. Privacy is therefore becoming an important issue in many data mining applications, particularly in distributed environments. Privacy-preserving data mining (PPDM) techniques provide a new direction for addressing these issues. With PPDM, valid data mining results can be obtained without learning the underlying data values.

In this dissertation we introduce two algorithms that address privacy handling: k-anonymization and l-diversity. Under k-anonymization, the information corresponding to any individual in the released data cannot be distinguished from that of at least k-1 other individuals whose information also appears in the released data. To achieve k-anonymization, some values in the database must be suppressed or generalized. K-anonymity addresses the record linkage attack model, while l-diversity addresses the attribute linkage attack model.
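The k-anonymity condition described above can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the toy records, the 10-year age bands, and the helper names (`generalize_age`, `is_k_anonymous`, `distinct_l`) are all assumptions made for illustration. The sketch generalizes one quasi-identifier attribute and then checks that every quasi-identifier combination is shared by at least k records. It also reports the smallest number of distinct sensitive values in any group, which is the usual "distinct l-diversity" measure.

```python
from collections import Counter, defaultdict

# Hypothetical toy records (all values invented for illustration).
# The first three fields (job, sex, age) form the quasi-identifier (QID);
# the last field is the sensitive attribute.
RECORDS = [
    ("Engineer", "M", 34, "Flu"),
    ("Engineer", "M", 36, "HIV"),
    ("Lawyer", "F", 41, "Flu"),
    ("Lawyer", "F", 47, "Hepatitis"),
]


def generalize_age(age, width=10):
    """Replace an exact age with a half-open band such as '[30-40)'."""
    low = (age // width) * width
    return f"[{low}-{low + width})"


def generalize(records):
    """Generalize only the age column; other QID attributes stay as they are."""
    return [(job, sex, generalize_age(age), disease)
            for job, sex, age, disease in records]


def is_k_anonymous(records, k, qid_size=3):
    """True if every QID combination is shared by at least k records."""
    groups = Counter(rec[:qid_size] for rec in records)
    return all(count >= k for count in groups.values())


def distinct_l(records, qid_size=3):
    """Smallest number of distinct sensitive values within any QID group."""
    sensitive = defaultdict(set)
    for rec in records:
        sensitive[rec[:qid_size]].add(rec[-1])
    return min(len(values) for values in sensitive.values())


published = generalize(RECORDS)
print(is_k_anonymous(RECORDS, 2))    # raw table: every QID is unique
print(is_k_anonymous(published, 2))  # after generalizing age
print(distinct_l(published))
```

In this toy table the raw records fail the check for k = 2 because each quasi-identifier combination is unique. After age is generalized into bands, each combination covers two records, and each group still contains two distinct sensitive values.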

KEYWORDS: Data Mining, Advantages and Disadvantages of Data Mining, Privacy Handling, K-anonymization Algorithm, L-diversity.

ACKNOWLEDGEMENTS

I wish to take this opportunity to express my deep gratitude to all the people who have extended their cooperation in various ways during my dissertation. It is my pleasure to acknowledge the help of all those individuals.

First of all, I would like to express my deepest gratitude to my dissertation supervisor, Mr. Govind Kamboj, without whom none of this would have been possible. He always provided me with essential direction and advice during the work. I am grateful to him for helping shape this dissertation towards completion. Without his supervision and support, this work would not have been completed successfully in time.

I am grateful to the President, Vice President, Chancellor, Vice Chancellor and Head of the Department of Graphic Era University for providing an excellent environment for work with ample facilities and academic freedom. I would also like to thank the teaching and non-teaching staff for their valuable support during my M.Tech.

Last but not least, I am grateful to all my teachers and friends for their cooperation and encouragement throughout this task.

(Vivek Uniyal)

M.Tech (Computer Science & Engineering)

TABLE OF CONTENTS

CANDIDATE'S DECLARATION

ABSTRACT

ACKNOWLEDGEMENTS

LIST OF ABBREVIATIONS

LIST OF FIGURES

1. INTRODUCTION

1.1 Problem Statement

1.2 Overview

1.3 Advantages of data mining

1.4 Disadvantages of data mining

1.5 Why privacy-handling is required in data-mining

1.6 Motivation

1.7 Organization

2. BACKGROUND AND LITERATURE SURVEY

3. METHODS AND METHODOLOGIES

3.1 Randomization method

3.2 Group based anonymization methods

3.2.1 K-Anonymity framework

3.2.2 Personalized privacy-preservation

3.2.3 Utility based privacy-preservation

3.2.4 Sequential releases

3.2.5 The l-diversity method

3.3 Distributed privacy-preserving data mining

3.4 Detailed description about K-anonymity and l-diversity

3.4.1 Data collection and Data publishing

3.4.2 Privacy Data publishing

3.4.3 Algorithm of k-anonymity

3.4.4 l-diversity

3.4.1.1 Lack of diversity

3.4.1.2 Strong background knowledge

4. EXPERIMENTAL RESULT

4.1 Introduction

4.2 Experimental result

4.2.1 Result of proposed k-anonymity and l-diversity

5. CONCLUSION AND SCOPE FOR FUTURE WORK

5.1 Conclusion

5.2 Scope for Future Work

PUBLICATION OUT OF THIS WORK

REFERENCES

LIST OF ABBREVIATIONS

PPDP Privacy-preserving data publishing

PPDM Privacy-preserving data mining

QID Quasi-Identifier

LIST OF FIGURES

Figure 1.1: Data mining as a step in the process of knowledge discovery

Figure 1.2: Typical data mining system architecture

Figure 1.3: Record Owner, Data Collection and Data Publishing

Figure 1.4: Hospital Database

Figure 1.5: Taxonomy tree for JOB, SEX, AGE (QID attributes)

Figure 1.6: Hospital table Original record in database

Figure 1.7: Table of Sensitive record (Publishing data)

Figure 1.8: Table of External Data ppt table

Figure 1.9: Resulting data after linking the sensitive and ppl table

Figure 1.10: Research table (generalized with k-anonymous published data)

Figure 1.11: Extended table (For linking like generalized voter list)

Figure 1.12: For checking the k-anonymity

Figure 1.13: Result of linking the table research to extended

Figure 1.14: Hospital original data record Project

Figure 1.15: Comparing the Un-Generalized published and extended data tables

Figure 1.16: Comparing Generalized Extended and Sensitive table records

Figure 1.17: Table for k-anonymity and l-diversity

Figure 1.18: Plotting exact l-value and distinct l-diversity value in Weka

Figure 1.19: Plotting exact l-value and entropy l-diversity value in Weka
