Privacy preserving data mining is the practice of carrying out collaborative data mining without exposing confidential details held in any of the databases being mined. It is traditionally used when an individual or organization is working with industry competitors. While competitors can sometimes benefit from sharing resources, every party has a stake in protecting private or confidential information about its current projects. Privacy preserving data mining protects the confidences of all parties by producing the results of the data mining without disclosing the source of any specific piece of information.
Data mining is the process of scanning a large body of data for overall trends. One basic example of data mining would be to look through a sales database to find out during which seasons sales of a particular product are highest. The business intelligence derived from this mining would help a company stimulate sales during off-peak times and make other adjustments to increase its gross profits. A more complex example would be to scan databases for trends in consumer purchasing decisions. This would allow manufacturers to predict more accurately which types of products are becoming popular, so that they know where to focus their limited resources.
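The seasonal sales example can be sketched in a few lines of Python. This is only an illustration: the sales records, the function names, and the Northern Hemisphere mapping of months to seasons are all assumptions made for the example, not part of any particular data mining product.

```python
from collections import defaultdict

# Hypothetical sales records for one product: (month number, units sold).
SALES = [
    (1, 120), (2, 95), (4, 210), (6, 480),
    (7, 530), (8, 455), (10, 180), (12, 260),
]


def season_of(month):
    """Map a month number (1-12) to a season (Northern Hemisphere assumed)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"


def units_by_season(records):
    """Total the units sold in each season."""
    totals = defaultdict(int)
    for month, units in records:
        totals[season_of(month)] += units
    return dict(totals)


def peak_season(records):
    """Return the season with the highest total sales."""
    totals = units_by_season(records)
    return max(totals, key=totals.get)


print(units_by_season(SALES))
print(peak_season(SALES))
```

With these made-up figures the peak falls in summer, which is the kind of trend a company could use to plan promotions for the slower seasons.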
Pooling the information stored in one database with the information stored in competitors' databases makes data mining far more effective. The more data there is to study, the easier it becomes to find and exploit trends. In other words, when an organization has 10,000 examples to draw from, it can typically catch patterns that would not become evident with only 100 examples of the same type. Naturally, however, there is always some information companies are reluctant to share with their competitors. That is where privacy preserving data mining comes into play.
Privacy preserving data mining works by allowing competing companies to feed only the data they wish to share into a central “communal” database. By limiting the data mining to strictly voluntary information, privacy is maintained on both sides without undermining the central purpose of the data mining efforts. Privacy can also be protected by using a disinterested intermediary party to conduct the actual mining, allowing the companies to pool their database resources without either company having direct access to the other company’s private data.
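A minimal sketch of this arrangement, assuming two hypothetical companies and an intermediary that only counts purchases per product category, might look like the following. The field names, the sample records, and the functions `shareable` and `mine_pooled` are invented for illustration. The sketch shows only the voluntary-sharing idea described above and makes no attempt at cryptographic protection.

```python
from collections import Counter

# Hypothetical fields both companies agree to share; all others stay private.
SHARED_FIELDS = ("category", "quarter")


def shareable(records, shared_fields=SHARED_FIELDS):
    """Run by each company: keep only the voluntarily shared fields."""
    return [{field: row[field] for field in shared_fields} for row in records]


def mine_pooled(*contributions):
    """Run by the intermediary: pool every contribution and count purchases
    per category, keeping no record of which company supplied which row."""
    pooled = [row for contribution in contributions for row in contribution]
    return Counter(row["category"] for row in pooled)


# Each company's full records include private fields (customer, price).
company_a = [
    {"category": "tablets", "quarter": "Q1", "customer": "Acme Corp", "price": 310},
    {"category": "laptops", "quarter": "Q1", "customer": "Birch Ltd", "price": 900},
]
company_b = [
    {"category": "tablets", "quarter": "Q1", "customer": "Cole Inc", "price": 295},
    {"category": "tablets", "quarter": "Q2", "customer": "Dunn LLC", "price": 305},
]

# Only the filtered records ever leave each company.
trends = mine_pooled(shareable(company_a), shareable(company_b))
print(trends)
```

Both companies receive the same pooled category counts, while customer names and prices never reach the communal data set and the result does not indicate which company contributed a given purchase.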