summaryrefslogtreecommitdiff
path: root/Assignment 2/op_spam_v1.4/README.md
diff options
context:
space:
mode:
authorHunter <h.sterk@students.uu.nl>2020-10-23 13:14:32 +0200
committerHunter <h.sterk@students.uu.nl>2020-10-23 13:14:32 +0200
commit34d42d581ee5f5385c823385835182029b9eb2ff (patch)
treeecbed0902f0466feb7bd461e7c43619cf1ddbc6f /Assignment 2/op_spam_v1.4/README.md
parent6bf5c09a9bcb5f9d3f47d795b0539cb8c19fb089 (diff)
Training- en testdata toegevoegd, inclusief een functie om deze data in een dataframe te zetten
Diffstat (limited to 'Assignment 2/op_spam_v1.4/README.md')
-rw-r--r--Assignment 2/op_spam_v1.4/README.md87
1 files changed, 87 insertions, 0 deletions
diff --git a/Assignment 2/op_spam_v1.4/README.md b/Assignment 2/op_spam_v1.4/README.md
new file mode 100644
index 0000000..a84e3ae
--- /dev/null
+++ b/Assignment 2/op_spam_v1.4/README.md
@@ -0,0 +1,87 @@
+Deceptive Opinion Spam Corpus v1.4
+==================================
+
+Overview
+--------
+
+This corpus consists of truthful and deceptive hotel reviews of 20 Chicago
+hotels. The data is described in two papers according to the sentiment of the
+review. In particular, we discuss positive sentiment reviews in [1] and negative
+sentiment reviews in [2].
+
+While we have tried to maintain consistent data preprocessing procedures across
+the data, there *are* differences which are explained in more detail in the
+associated papers. Please see those papers for specific details.
+
+This corpus contains:
+
+* 400 truthful positive reviews from TripAdvisor (described in [1])
+* 400 deceptive positive reviews from Mechanical Turk (described in [1])
+* 400 truthful negative reviews from Expedia, Hotels.com, Orbitz, Priceline,
+ TripAdvisor and Yelp (described in [2])
+* 400 deceptive negative reviews from Mechanical Turk (described in [2])
+
+Each of the above datasets consist of 20 reviews for each of the 20 most popular
+Chicago hotels (see [1] for more details). The files are named according to the
+following conventions:
+
+* Directories prefixed with `fold` correspond to a single fold from the
+ cross-validation experiments reported in [1] and [2].
+
+* Files are named according to the format `%c_%h_%i.txt`, where:
+
+ * %c denotes the class: (t)ruthful or (d)eceptive
+
+ * %h denotes the hotel:
+
+ * affinia: Affinia Chicago (now MileNorth, A Chicago Hotel)
+ * allegro: Hotel Allegro Chicago - a Kimpton Hotel
+ * amalfi: Amalfi Hotel Chicago
+ * ambassador: Ambassador East Hotel (now PUBLIC Chicago)
+ * conrad: Conrad Chicago
+ * fairmont: Fairmont Chicago Millennium Park
+ * hardrock: Hard Rock Hotel Chicago
+ * hilton: Hilton Chicago
+ * homewood: Homewood Suites by Hilton Chicago Downtown
+ * hyatt: Hyatt Regency Chicago
+ * intercontinental: InterContinental Chicago
+ * james: James Chicago
+ * knickerbocker: Millennium Knickerbocker Hotel Chicago
+ * monaco: Hotel Monaco Chicago - a Kimpton Hotel
+ * omni: Omni Chicago Hotel
+ * palmer: The Palmer House Hilton
+ * sheraton: Sheraton Chicago Hotel and Towers
+ * sofitel: Sofitel Chicago Water Tower
+ * swissotel: Swissotel Chicago
+ * talbott: The Talbott Hotel
+
+ * %i serves as a counter to make the filename unique
+
+Questions
+---------
+
+Please direct questions to Myle Ott (<myleott@cs.cornell.edu>)
+
+References
+----------
+
+[1] M. Ott, Y. Choi, C. Cardie, and J.T. Hancock. 2011. Finding Deceptive
+Opinion Spam by Any Stretch of the Imagination. In Proceedings of the 49th
+Annual Meeting of the Association for Computational Linguistics: Human Language
+Technologies.
+
+[2] M. Ott, C. Cardie, and J.T. Hancock. 2013. Negative Deceptive Opinion Spam.
+In Proceedings of the 2013 Conference of the North American Chapter of the
+Association for Computational Linguistics: Human Language Technologies.
+
+License
+-------
+
+This work is licensed under the Creative Commons
+Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of
+this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a
+letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View,
+California, 94041, USA.
+
+If you use any of this data in your work, please cite the appropriate associated
+paper (described above).