Train a Sentiment Classifier

This example shows how to train a classifier for sentiment analysis using an annotated list of positive and negative sentiment words and a pretrained word embedding.

The pretrained word embedding plays several roles in this workflow. It converts words into numeric vectors and forms the basis for a classifier. You can then use the classifier to predict the sentiment of other words from their vector representations, and use these classifications to calculate the sentiment of a piece of text. There are four steps in training and using the sentiment classifier:

  • Load a pretrained word embedding.

  • Load an opinion lexicon listing positive and negative words.

  • Train a sentiment classifier using the word vectors of the positive and negative words.

  • Calculate the mean sentiment scores of the words in a piece of text.

To reproduce the results in this example, set rng to 'default'.

rng('default')

Load Pretrained Word Embedding

Word embeddings map words in a vocabulary to numeric vectors. These embeddings can capture semantic details of the words so that similar words have similar vectors. They also model relationships between words through vector arithmetic. For example, the relationship "king is to queen as man is to woman" is described by the equation king - man + woman = queen.

Load a pretrained word embedding using the fastTextWordEmbedding function. This function requires the Text Analytics Toolbox™ Model for fastText English 16 Billion Token Word Embedding support package. If this support package is not installed, then the function provides a download link.

emb = fastTextWordEmbedding;
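
As a rough illustration of the vector arithmetic described above, the following sketch queries the embedding with word2vec and vec2word. The query words and the number of neighbors (5) are assumptions made for illustration; the returned words depend on the embedding and are not shown here.

```matlab
% Sketch: explore word-vector arithmetic with the pretrained embedding.
% Requires the fastText support package mentioned above.
emb = fastTextWordEmbedding;

% Look up the vectors for the three words in the analogy.
king  = word2vec(emb,"king");
man   = word2vec(emb,"man");
woman = word2vec(emb,"woman");

% Find the 5 vocabulary words closest to king - man + woman.
% (k = 5 is an arbitrary choice for illustration.)
nearest = vec2word(emb,king - man + woman,5)
```

Asking for several neighbors rather than one makes it easier to judge whether the analogy holds, because the nearest word can be one of the query words themselves.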

Load Opinion Lexicon

Load the positive and negative words from the opinion lexicon (also known as a sentiment lexicon) available from https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html. First, extract the files from the .rar file into a folder named opinion-lexicon-English, and then import the text.

Load the data using the function readLexicon, listed at the end of this example. The output data is a table with variables Word, containing the words, and Label, containing a categorical sentiment label, Positive or Negative.

data = readLexicon;

View the first few words labeled as positive.

idx = data.Label == "Positive";
head(data(idx,:))
ans=8×2 table
        Word         Label  
    ____________    ________

    "a+"            Positive
    "abound"        Positive
    "abounds"       Positive
    "abundance"     Positive
    "abundant"      Positive
    "accessable"    Positive
    "accessible"    Positive
    "acclaim"       Positive

View the first few words labeled as negative.

idx = data.Label == "Negative";
head(data(idx,:))
ans=8×2 table
        Word          Label  
    _____________    ________

    "2-faced"        Negative
    "2-faces"        Negative
    "abnormal"       Negative
    "abolish"        Negative
    "abominable"     Negative
    "abominably"     Negative
    "abominate"      Negative
    "abomination"    Negative

Prepare Data for Training

To train the sentiment classifier, convert the words to word vectors using the pretrained word embedding emb. First remove the words that do not appear in the word embedding emb.

idx = ~isVocabularyWord(emb,data.Word);
data(idx,:) = [];

Set aside 10% of the words at random for testing.

numWords = size(data,1);
cvp = cvpartition(numWords,'HoldOut',0.1);
dataTrain = data(training(cvp),:);
dataTest = data(test(cvp),:);
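
The partition above is drawn without regard to the labels. If the lexicon is imbalanced, one possible alternative is to pass the label vector to cvpartition, which stratifies the holdout split by class. The sketch below uses made-up labels (200 positive, 400 negative) instead of the lexicon, purely to show the call.

```matlab
rng('default')   % for reproducibility

% Hypothetical imbalanced labels, standing in for data.Label.
labels = categorical([repmat("Positive",200,1); repmat("Negative",400,1)]);

% Passing the grouping variable makes the holdout split stratified.
cvp = cvpartition(labels,'HoldOut',0.1);

labelsTrain = labels(training(cvp));
labelsTest  = labels(test(cvp));

% Class counts in each set, in the order given by categories(labels).
trainCounts = countcats(labelsTrain);
testCounts  = countcats(labelsTest);
```

Comparing trainCounts and testCounts is a quick way to confirm that both sets keep roughly the same class proportions.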

Convert the words in the training data to word vectors using word2vec.

wordsTrain = dataTrain.Word;
XTrain = word2vec(emb,wordsTrain);
YTrain = dataTrain.Label;

Train Sentiment Classifier

Train a support vector machine (SVM) classifier that classifies word vectors into positive and negative categories.

mdl = fitcsvm(XTrain,YTrain);
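
For a two-class SVM, predict returns one score column per class, in the order given by the ClassNames property of the model. The sketch below uses synthetic two-dimensional points (an assumption, standing in for word vectors) and a hypothetical model named toyMdl to show how one might check that order before indexing into the scores.

```matlab
rng('default')

% Synthetic data: two well-separated clusters standing in for word vectors.
X = [randn(50,2) + 2; randn(50,2) - 2];
Y = categorical([repmat("Positive",50,1); repmat("Negative",50,1)]);

toyMdl = fitcsvm(X,Y);

% Column j of the scores corresponds to toyMdl.ClassNames(j).
classNames = toyMdl.ClassNames;
[label,score] = predict(toyMdl,[3 3]);

% Pick the score column by class name rather than by position.
positiveScore = score(:,classNames == "Positive");
```

Selecting the column by name avoids relying on the order in which the categories happened to be created.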

Test Classifier

Convert the words in the test data to word vectors using word2vec.

wordsTest = dataTest.Word;
XTest = word2vec(emb,wordsTest);
YTest = dataTest.Label;

Predict the sentiment labels of the test word vectors.

[YPred,scores] = predict(mdl,XTest);

Visualize the classification accuracy in a confusion matrix.

figure
confusionchart(YTest,YPred);
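
The confusion chart can be complemented by a single accuracy figure. The sketch below uses five made-up label pairs rather than the actual test results; with the real data, the same expressions apply to YTest and YPred.

```matlab
% Hypothetical true and predicted labels (not the example's results).
yTrue = categorical(["Positive";"Negative";"Negative";"Positive";"Negative"]);
yHat  = categorical(["Positive";"Negative";"Positive";"Positive";"Negative"]);

% Fraction of labels predicted correctly.
accuracy = mean(yHat == yTrue);

% Confusion matrix: rows are true classes, columns are predicted classes.
% The second output lists the class order used for the rows and columns.
[cm,order] = confusionmat(yTrue,yHat);
```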

Visualize the classifications in word clouds. Plot the words with positive and negative sentiments in word clouds, with word sizes corresponding to the prediction scores.

figure
subplot(1,2,1)
idx = YPred == "Positive";
wordcloud(wordsTest(idx),scores(idx,1));
title("Predicted Positive Sentiment")

subplot(1,2,2)
wordcloud(wordsTest(~idx),scores(~idx,2));
title("Predicted Negative Sentiment")

Calculate Sentiment of Collections of Text

To calculate the sentiment of a piece of text, for example a review, predict the sentiment score of each word in the text and take the mean sentiment score.

Download the Airbnb Review Summary Data (Boston, Massachusetts, United States, 06 October 2017) from http://insideairbnb.com/get-the-data.html. Read the data into a table and specify to read the text data as strings.

filename = "reviews.csv";
dataReviews = readtable(filename,'TextType','string');

Extract the text data from the comments variable and view the first few reviews.

textData = dataReviews.comments;
textData(1:10)
ans = 10×1 string array
    "Pretty nice, quiet, cozy place to stay. Toiletries, snacks, coffee, WiFi, cable TV, iron was all included. One of the best things for me is how quiet it was even in the daytime. Coded door locks so no need for keys, my belongings were always safe and Andre and his wife are really good host. I stayed 7 days and never had a problem. I'll stay again if and when I had the chance."
    "The host was extremely welcoming and obliging. The neighborhood is quiet and charming, perfect for a quiet visit. Short walk to MBTA transportation."
    "Nice and easy stay - with good accommodations especially the cable TV "
    "The host has been very accommodating and helpful. The description in the ad is accurate. The room is very clean and the neighborhood is quiet."
    "It's a great quiet stay."
    "Couldn't have been happier. The apartment was well renovated, very clean and convenient to great spots. The kitchen was stocked with all the basics and a huge grocery store was around the corner so we were able to easily cook at the house. Estee also provided some great local recommendations. Wine, snacks, coffee and games were great extras. Uber ride to downtown was $8. Would most definitely stay here again."
    "The apartment is very nice- as described and very convenient. The real superstar of the listing though is the host; Estee was  phenomenal. She was very responsive and even let us know when she might not be able to be reached for a short duration of time. She provided great recommendations and tips for getting around. We had a MINOR issue, which she went out of her way to resolve very quickly. ↵↵Both bedrooms are a good size, and one has a lovely vanity. Everything is brand new - bathroom and kitchen. Estee had the kitchen stocked with staples (salt, pepper, olive oil, ketchup) and treats too! There are so many details throughout the place where she goes above and beyond. Parking on the street was easy. We hardly needed to move the car though because there was so much within walking distance. The description of a 10 minute walk to the T is accurate. ↵↵100% would stay here again. Thank you for a wonderful stay, Estee!"
    "This is a brand new gorgeous place, very clean, bright and welcoming. Estee especially knows how to make guests comfortable, there were many thoughtful touches and she recommended a delicious Indian restaurant.    There is a supermarket within 5 minutes walking distance and we used Uber to get around - downtown Boston took less than 15 minutes. Best place I have stayed in so far. Thank you Estee!"
    "Estee and Josh are great hosts. Very welcoming. Made us feel like we were staying with long time friends. Apartment very centrally located. Off street parking surprisingly easy (for Boston). Loads of restaurants within walking distance"
    "Estee was super sweet and so very accommodating! The apartment was nicely renovated and the kitchen had all our basic needs + treats as well! My family and I stayed here because of a college graduation and because street parking in front of her place was easy and everything was within walking distance, it made our stay a lot easier! Would definitely stay here again!  "

Create a function that tokenizes and preprocesses the text data so that it can be used for analysis. The function preprocessText, listed at the end of the example, performs the following steps in order:

  1. Tokenize the text using tokenizedDocument.

  2. Erase punctuation using erasePunctuation.

  3. Remove stop words (such as "and") using removeStopWords.

  4. Convert to lowercase using lower.

Use the preprocessing function preprocessText to prepare the text data. This step can take a few minutes to run.

documents = preprocessText(textData);
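
To see what the preprocessing steps do, it can help to run them on a single sentence. The sketch below applies the same four functions, in the same order, to a made-up review; the input text is an assumption and the resulting tokens are not shown.

```matlab
% Hypothetical single review, used only to illustrate the steps.
str = "The host was extremely welcoming, and the room was VERY clean!";

doc = tokenizedDocument(str);   % 1. tokenize
doc = erasePunctuation(doc);    % 2. erase punctuation
doc = removeStopWords(doc);     % 3. remove stop words
doc = lower(doc);               % 4. convert to lowercase

% Inspect the remaining tokens as a string array.
tokens = string(doc)
```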

Remove the words from the documents that do not appear in the word embedding emb.

idx = ~isVocabularyWord(emb,documents.Vocabulary);
documents = removeWords(documents,idx);

To visualize how well the sentiment classifier generalizes to the reviews, classify the sentiments of the words that occur in the reviews but not in the training data, and visualize them in word clouds. Use the word clouds to manually check that the classifier behaves as expected.

words = documents.Vocabulary;
words(ismember(words,wordsTrain)) = [];

vec = word2vec(emb,words);
[YPred,scores] = predict(mdl,vec);

figure
subplot(1,2,1)
idx = YPred == "Positive";
wordcloud(words(idx),scores(idx,1));
title("Predicted Positive Sentiment")

subplot(1,2,2)
wordcloud(words(~idx),scores(~idx,2));
title("Predicted Negative Sentiment")

To calculate the sentiment of a given piece of text, compute the sentiment score for each word in the text and calculate the mean sentiment score.

Calculate the mean sentiment score for a selection of documents. For each document, convert the words to word vectors, predict the sentiment scores of the word vectors, and then calculate the mean sentiment score. The code below averages the raw SVM scores; it does not apply a score-to-posterior transform.

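idx = [7 34 331 1788 1820 1831 2185 21892 63734 76832 113276 120210];
sentimentScore = zeros(1,numel(idx));   % preallocate one score per document
for i = 1:numel(idx)
    % Words remaining in the document after preprocessing
    words = string(documents(idx(i)));
    vec = word2vec(emb,words);
    [~,scores] = predict(mdl,vec);
    % Column 1 holds the score for the first class in mdl.ClassNames
    sentimentScore(i) = mean(scores(:,1));
end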
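
The loop above averages raw SVM scores, which are unbounded. If probabilities are preferred, one option is to fit a score-to-posterior transform with fitPosterior, after which predict returns posterior probabilities instead of raw scores. The sketch below shows this on synthetic data (an assumption, standing in for word vectors) with a hypothetical model named toyMdl; it is an optional variation, not a step of this example.

```matlab
rng('default')

% Synthetic, partially overlapping clusters standing in for word vectors.
X = [randn(100,2) + 1; randn(100,2) - 1];
Y = categorical([repmat("Positive",100,1); repmat("Negative",100,1)]);

toyMdl = fitcsvm(X,Y);

% Fit the score-to-posterior transform and store it in the model.
toyMdl = fitPosterior(toyMdl);

% With the transform in place, predict returns posterior probabilities.
[~,posterior] = predict(toyMdl,[0.5 0.5; -2 -2]);

% Mean posterior for the Positive class over the two query points.
meanPositive = mean(posterior(:,toyMdl.ClassNames == "Positive"));
```

With posteriors, a neutral text would sit near 0.5 rather than near 0, so any threshold used to interpret the mean score would need to change accordingly.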

View the predicted sentiment scores with the text data. Scores greater than 0 correspond to positive sentiment, scores less than 0 correspond to negative sentiment, and scores close to 0 correspond to neutral sentiment.

[sentimentScore' textData(idx)]
ans = 12×2 string array
    "0.85721"      "The apartment is very nice- as described and very convenient. The real superstar of the listing though is the host; Estee was  phenomenal. She was very responsive and even let us know when she might not be able to be reached for a short duration of time. She provided great recommendations and tips for getting around. We had a MINOR issue, which she went out of her way to resolve very quickly. ↵↵Both bedrooms are a good size, and one has a lovely vanity. Everything is brand new - bathroom and kitchen. Estee had the kitchen stocked with staples (salt, pepper, olive oil, ketchup) and treats too! There are so many details throughout the place where she goes above and beyond. Parking on the street was easy. We hardly needed to move the car though because there was so much within walking distance. The description of a 10 minute walk to the T is accurate. ↵↵100% would stay here again. Thank you for a wonderful stay, Estee!"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
    "2.0453"       "Estee was the perfect Airbnb host. The apartment was comfortable, spacious, and convenient, and Estee went to great lengths to make sure that we felt at home. She also provided great tips for us about the area. Would definitely love to stay here again."                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
    "-0.37918"     "The apartment is not apropriate for 5 people. Is too little and We were no comfortable. The bathroom was no clean. There was a door in the kitchen Broken.  Is Too noisy. The elevator is Too small just for 2 people. "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
    "0.94799"      "Truly a quaint place in Beacon Hill. Comfortable walking distance from MGH, Boston Common, and Suffolk University. The studio type place is great for a couple's weekend.  The wifi was excellent as was the tv and comfort of the bed.  ↵The limitations and recommendations for improvement include:↵1- improving in cleanliness as the floor was dirty enough that you couldn't walk around without shoes↵2- would recommend bringing your own basic toiletries as there was no hand soap in the bathroom.  (We were too busy to contact jj but he is quick to respond to other matters so maybe he would have made arrangements to provide you with it.)↵3- storage space is limited so a prolonged stay would be challenging↵Overall, this a very functional stay."                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
    "-0.077053"    "the neibourhood is perfect!!!!!. as it is very close to Bowdoin T STATION and Park T station, walking distance from everywhere we wanted to go, quincy market, downtown, chinatown, newsbury street and every thing. the appartment IT IS NOT on Temple street rear... it is on Coolidge st, facing a quite silent and lonely and big parking lot. (:/)  it was ok...though. coming and going was easy. JJ was really quick responsive when internet and CAble TV suddenly stopped but he was very helpful trying to solve it. that was very good. ... Other issues: By the house roules and the descriptions of previews guests I supposed the appartment was inmaculated and the cleaning was really fond... BUT IT WASN´T. we found previous litter in the trash bins... kitchen and bathroom... the brown chocolate cuchions didn´t smell as if they were clean. There were uncovered sheets, and blanquets and who knows what else under the bed, that I tried not to  sweep the floor in its direction in case  I made them dirty with the dust and gravel that was already inside when we got into the apparment. I went to the closet looking for the broom and shovel and I found them... the broom plenty of dirt and lint and entangled long hairs and stuff, and the shovel broken... very discusting. It is a pitty such an amazing location dealing with all these ackward details that are not ok at all. I think everything I mentioned can be  solved easily in a very simple and cheap and loving form so the place becomes the perfect spot to spent your vacations in Boston."
    "0.17846"      "Although we didn't meet JJ, we felt he was very quick to respond. Checking in and out was a breeze. The location is convenient and walkable to everything we wanted to see.  Studio was small but certainly comfortable for the 2 of us. ↵We didn't see any info regarding wifi (nor did we ask) so we didn't use it. The bathroom sink drain looked pretty dirty, no hand soap or wash cloths. It's a basement so there is not much natural light which was fine but even the lights in the space still made it feel very dark. We also found no place to hang our stuff or stash our clothes. A lot of the drawers had clothes in them already. ↵All in all, we liked the place. Would recommend it if a few changeable things were addressed. "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
    "-0.31603"     "In the apartment it was very dirty .↵we walked in and there instincts to rotten melon .↵the sink was full of dirty dishes.↵the microwave and mini oven were dirty you could make it nothing to eat.↵on the herd was a coffeepot with moldy coffee.↵on the first day of our "vacation" we first had to make everything clean that we can feel comfortable.↵carissa wanted the city to show us what to do something .↵But in the week she was not at home , and when they came home she was for days in her room she only came out to make himself something to eat and dishes piled up again."                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
    "-4.0895"      "Blackmail!"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
    "1.658"        "Outstanding stay.  The apartment is world-class - very, very nice.  Tremendous views of the harbor from a very cool apartment in a very cool, brand new retro building.  I had not spent time in the South Boston waterfront neighborhood previously and loved it - great cafes, restaurants, pubs, renovated lofts.  A terrific area.↵↵In addition, John was an ideal host.  Incredibly responsive and helpful.  Provided excellent recommendations in terms of spots to visit in the neighborhood as well as very clear directions relating to the logistics of checking in, wifi access, heating/cooling, etc.↵↵Finally, John is an engaging and interesting person who is an absolute pleasure to spend time with."                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
    "1.7102"       "I had an amazing stay at Carney's. The hosts are friendly and very meticulous. They made sure everything was proper from the kitchen needs to the bedroom needs. Also, they made us feel like we are at home. My parents had come for my graduation and they were pleased that I did not book a hotel and instead chose to stay here. I would recommend everyone to book a room if ever they plan to come to Boston and enjoy an enriching confortable experience. "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
    "0.67654"      "My husband and I came to Boston for our 1 year anniversary. We're so glad we found Elisabeth's place! Immediately when we got dropped off by our cab, Elisabeth came outside and walked us to the house.  She's incredibly nice, personable, and her place was beautiful and very clean.  We felt very comfortable staying with her and she was nice enough to give us a few recommendations around town.  Her place is a short walk to the Red Line T station and the neighborhood where she lives has everything you need close by.  Gas station and convenience store was literally down the street and lots of smaller mom and pop restaurants.  We hated to leave so early but would definitely love to come back! "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
    "-0.21651"     "My fiancé and I had just gotten engaged and wanted to stay somewhere a bit more upscale for our last night in Boston. We looked and found this "penthouse" and from arrival were let down. While the host was pleasant enough, she was hard to contact, the address was wrong, and she even had to have a neighbor show us around the place. Which would not have been weird if he wasn't doing laundry during our stay. We were promised the entire condo but the host stopped by as well, not that we minded that part, but it added to the weirdness. We were not able to use the refrigerator to store leftovers due to the HORRIBLE smell coming from it. It was so bad we turned the air off and opened the little balcony door. The condo looked too loved in to justify paying what we did. Also very confused about the $50 cleaning fee that was obviously not used before our stay, so a bit unhappy that we overpaid for a dirty place. "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              

Sentiment Lexicon Reading Function

This function reads the positive and negative words from the sentiment lexicon and returns a table. The table contains variables Word and Label, where Label contains the categorical values Positive and Negative corresponding to the sentiment of each word.

function data = readLexicon

% Read positive words
fidPositive = fopen(fullfile('opinion-lexicon-English','positive-words.txt'));
C = textscan(fidPositive,'%s','CommentStyle',';');
wordsPositive = string(C{1});

% Read negative words
fidNegative = fopen(fullfile('opinion-lexicon-English','negative-words.txt'));
C = textscan(fidNegative,'%s','CommentStyle',';');
wordsNegative = string(C{1});
fclose all;

% Create table of labeled words
words = [wordsPositive;wordsNegative];
labels = categorical(nan(numel(words),1));
labels(1:numel(wordsPositive)) = "Positive";
labels(numel(wordsPositive)+1:end) = "Negative";

data = table(words,labels,'VariableNames',{'Word','Label'});

end

Preprocessing Function

The function preprocessText performs the following steps:

  1. Tokenize the text using tokenizedDocument.

  2. Erase punctuation using erasePunctuation.

  3. Remove stop words (such as "and") using removeStopWords.

  4. Convert to lowercase using lower.

function documents = preprocessText(textData)

% Tokenize the text.
documents = tokenizedDocument(textData);

% Erase punctuation.
documents = erasePunctuation(documents);

% Remove a list of stop words.
documents = removeStopWords(documents);

% Convert to lowercase.
documents = lower(documents);

end
