The Netflix Prize: Difference between revisions
KINGofINCELS (talk | contribs) just created a summarised version of the event netflix prize which screwed over privacy of about 500000 users |
seems entirely AI generated, no sources provided |
{{SloppyAI}}
The '''Netflix Prize''' was a competition announced by Netflix in 2006 to improve its movie recommendation algorithm, [https://www.algorithmhalloffame.org/algorithms/cinematch/ Cinematch]. The company offered a '''US$1 million prize''' to the first team that could improve the algorithm’s prediction accuracy by at least '''10%'''.
==Background==
Netflix’s recommendation engine, '''[https://www.algorithmhalloffame.org/algorithms/cinematch/ Cinematch]''', used collaborative filtering to predict user ratings for movies based on previous ratings and patterns across similar users. By 2006, [https://www.algorithmhalloffame.org/algorithms/cinematch/ Cinematch] had been refined internally for several years, but Netflix sought a significant improvement.
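The idea of predicting a rating from "patterns across similar users" can be illustrated with a small neighborhood-based sketch. This is a generic textbook form of collaborative filtering and is not a description of how Cinematch itself was implemented; the function names, the cosine similarity measure, and the toy ratings are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts (movie -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`.

    Returns None when no other user with a shared movie has rated it.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        weight_total += sim
    if weight_total == 0:
        return None
    return weighted_sum / weight_total


# Hypothetical toy data: u1 agrees with u2 and disagrees with u3.
toy_ratings = {
    "u1": {"A": 5, "B": 1},
    "u2": {"A": 5, "B": 1, "C": 4},
    "u3": {"A": 1, "B": 5, "C": 2},
}
prediction = predict_rating(toy_ratings, "u1", "C")
print(prediction)
```

Because u1's ratings resemble u2's more than u3's, the prediction for movie C is pulled toward u2's rating of 4 rather than u3's rating of 2.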
To enable the competition, Netflix released an anonymized dataset on '''October 2, 2006''', which included:
*'''480,189 users'''
*'''17,770 movies'''
*'''100,480,507 ratings'''
*Ratings on a '''1 to 5 scale''', in whole-star (integer) increments
*Ratings dated between '''October 1998 and December 2005'''
Netflix stated that all personally identifiable information had been removed, replacing user names with numeric IDs.
The dataset was split into training and test sets for evaluation, and submissions were scored against a hidden test set so that teams could not overfit to the data used for scoring.
==Competition Structure==
*Start Date: '''October 2, 2006'''
*Target: Reduce [[wikipedia:Root_mean_square_deviation|RMSE]] by at least '''10%''' relative to the [https://www.algorithmhalloffame.org/algorithms/cinematch/ Cinematch] baseline
*Evaluation Metric: '''[[wikipedia:Root_mean_square_deviation|RMSE (Root Mean Squared Error)]]'''
*Prize: '''US$1,000,000''' for the first qualifying team
*Duration: The competition officially ended on '''September 21, 2009'''
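The target and metric listed above can be made concrete with a short sketch. It computes RMSE over a set of predictions and the relative improvement of a candidate over a baseline. The function names and the sample numbers are hypothetical and chosen only for illustration; they are not actual Cinematch or contest scores.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def improvement_percent(baseline_rmse, candidate_rmse):
    """Relative RMSE reduction of the candidate over the baseline, in percent."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


def qualifies(baseline_rmse, candidate_rmse, target_percent=10.0):
    """True if the candidate reduces RMSE by at least target_percent."""
    return improvement_percent(baseline_rmse, candidate_rmse) >= target_percent


# Hypothetical example: true ratings and two sets of predictions.
true_ratings = [4, 3, 5, 1, 2]
baseline_predictions = [3.5, 3.5, 4.0, 2.0, 3.0]
candidate_predictions = [3.9, 3.2, 4.6, 1.5, 2.4]

baseline = rmse(baseline_predictions, true_ratings)
candidate = rmse(candidate_predictions, true_ratings)
print(improvement_percent(baseline, candidate), qualifies(baseline, candidate))
```

Lower RMSE is better, so an "improvement" means a reduction in the score, and the percentage is always taken relative to the baseline's RMSE.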
Over the course of the contest, thousands of teams worldwide participated, including academic groups, independent researchers, and corporate teams.
==Re-identification Concerns==
In '''December 2007''', researchers from the '''[[wikipedia:University_of_Texas_at_Austin|University of Texas at Austin]]''' demonstrated that Netflix’s anonymization was insufficient. By comparing Netflix ratings with publicly available ratings from '''[[wikipedia:IMDb|IMDb]]''', they re-identified some users in the dataset. This process, known as a '''[[linkage attack]]''', used overlapping movie ratings and timestamps to match identities.
The researchers noted that rating patterns are distinctive enough that knowing only a few of a subscriber’s ratings and their approximate dates could uniquely identify that subscriber’s record. This raised concerns about the privacy of Netflix subscribers and the risks of releasing large-scale datasets, even when anonymized.
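The matching step of a linkage attack can be sketched on toy data. The example below is a deliberately simplified illustration and is not the scoring method used in the study: it counts movies where the rating agrees exactly and the dates fall within a tolerance, and accepts a match only if one record clearly stands out. All record IDs, movie names, dates, thresholds, and function names are hypothetical.

```python
from datetime import date


def match_score(public_ratings, record, max_days=14):
    """Count movies rated identically in both sources within max_days of each other."""
    score = 0
    for movie, (rating, rated_on) in public_ratings.items():
        if movie in record:
            other_rating, other_date = record[movie]
            if other_rating == rating and abs((other_date - rated_on).days) <= max_days:
                score += 1
    return score


def best_match(public_ratings, anonymized, min_score=2, max_days=14):
    """Return the anonymized ID that uniquely best matches, or None if ambiguous."""
    scores = sorted(
        ((match_score(public_ratings, rec, max_days), uid) for uid, rec in anonymized.items()),
        reverse=True,
    )
    if not scores:
        return None
    top_score, top_id = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0
    if top_score >= min_score and top_score > runner_up:
        return top_id
    return None


# Hypothetical "anonymized" release: numeric ID -> {movie: (rating, date)}.
anonymized = {
    1001: {"Movie A": (5, date(2005, 3, 1)), "Movie B": (2, date(2005, 3, 9)), "Movie C": (4, date(2005, 4, 2))},
    1002: {"Movie A": (5, date(2005, 3, 3)), "Movie D": (1, date(2005, 5, 20)), "Movie E": (3, date(2005, 6, 11))},
    1003: {"Movie D": (1, date(2005, 5, 18)), "Movie E": (3, date(2005, 6, 15)), "Movie F": (4, date(2005, 7, 1))},
}

# Hypothetical public profile of a named reviewer on another site.
public_profile = {
    "Movie A": (5, date(2005, 3, 4)),
    "Movie D": (1, date(2005, 5, 21)),
    "Movie E": (3, date(2005, 6, 12)),
}

print(best_match(public_profile, anonymized))
```

A single popular movie is shared by many records and identifies no one, which is why the sketch requires a minimum number of agreeing ratings and a clear gap over the runner-up before reporting a match.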
==Regulatory Action and Lawsuit==
In '''2009''', Netflix announced plans for a '''second Netflix Prize''', which would use an even larger and more detailed dataset, incorporating demographic and behavioral data. However, before its release:
*The '''[[wikipedia:Federal_Trade_Commission|Federal Trade Commission (FTC)]]''' launched an inquiry into privacy implications.
*In '''December 2009''', a '''[https://privacylaw.proskauer.com/2009/12/articles/invasion-of-privacy/netflix-sued-for-largest-voluntary-privacy-breach-to-date/?utm_source=chatgpt.com class-action lawsuit]''' was filed in U.S. District Court, alleging that Netflix had violated the '''[[wikipedia:Video_Privacy_Protection_Act|Video Privacy Protection Act (VPPA)]]''' by releasing data that could potentially identify subscribers.
As a result:
*Netflix canceled the second competition in '''March 2010'''.
*In '''March 2010''', Netflix agreed to settle the lawsuit for '''US$9,000,000''', which was allocated for privacy education and research programs.