Clean the data and, based on the tweets, answer the following questions:
- Dataset: Airlines.csv
1) 1p: Create the SentiBar plots. (Refer to the Practice Session)
2) 1p: Create the WordBar plots. (Refer to the Practice Session)
3) 2p: Create a geo map of the origin of the tweets, with each airline shown as a dot of a different color.
4) 2p for each model: Use Decision Tree, Random Forest and KNN models to predict the Positive class (either Positive or Not Positive). airline_sentiment is the class you are predicting. Calculate the accuracy, recall and precision of each model. Submit the code for preprocessing, model creation, and the accuracy, recall and precision calculations. Hint: Treat Negative and Neutral as Negative (you may want to convert the Neutral class to Negative for simplicity).
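The hint and the three metrics in task 4 can be sketched in plain Python as below. This is only an illustration under stated assumptions: the label strings ("positive", "neutral", "negative") and the toy predictions are made up here, and the helper names binarize and binary_metrics are hypothetical. In the actual submission the predictions would come from the Decision Tree, Random Forest and KNN models.

```python
def binarize(labels):
    """Map 'positive' to 1 and every other label (negative, neutral) to 0."""
    return [1 if label.strip().lower() == "positive" else 0 for label in labels]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy data for illustration only; real labels come from the airline_sentiment column.
y_true = binarize(["positive", "neutral", "negative", "positive", "negative"])
y_pred = [1, 0, 1, 0, 0]  # assumed model output, not a real result
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because positive tweets are likely a minority class after merging Neutral into Negative, accuracy alone can look good for a model that rarely predicts Positive, which is why precision and recall are requested as well.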
Please note:
1) Comment your code.
2) Add a short description of your results (for example, what kind of polarity distribution you observe for each airline in the SentiBar plots, and similarly for the WordBar plots). You can create one SentiBar and one WordBar for the tweets of all airlines together, and then 6 SentiBars and 6 WordBars, one per airline, to perform a microanalysis.
3) State which of the 3 models (Decision Tree, Random Forest and KNN) gives you the best results.
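The polarity distribution mentioned in note 2 can be tabulated before plotting, as in the sketch below. It assumes that a SentiBar is a bar chart of sentiment counts (the Practice Session defines the actual plot), that each tweet reduces to an (airline, sentiment) pair, and it uses invented toy rows; the airline names and the helper polarity_by_airline are hypothetical.

```python
from collections import Counter, defaultdict

# Toy rows for illustration only: (airline, airline_sentiment).
rows = [
    ("United", "negative"),
    ("United", "positive"),
    ("Delta", "neutral"),
    ("Delta", "negative"),
    ("United", "negative"),
]


def polarity_by_airline(rows):
    """Count how many tweets of each sentiment every airline received."""
    counts = defaultdict(Counter)
    for airline, sentiment in rows:
        counts[airline][sentiment] += 1
    return {airline: dict(counter) for airline, counter in counts.items()}


# Overall distribution (all airlines together) and the per-airline breakdown.
overall = Counter(sentiment for _, sentiment in rows)
per_airline = polarity_by_airline(rows)

print(dict(overall))
for airline, distribution in per_airline.items():
    print(airline, distribution)
```

The overall counts correspond to the single combined plot, and each per-airline dictionary corresponds to one of the separate plots used for the microanalysis.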