I'm having trouble with this problem: I want to map topics to a body of text based on the keywords that occur in it. Could anyone help?

Table: Keywords

+-------------+------+
| Column Name | Type |
+-------------+------+
| topic       | str  |
| word        | str  |
+-------------+------+

Each row of this table maps a word to a topic.

Table: Content

+-------------+------+
| Column Name | Type |
+-------------+------+
| content_uuid| str  |
| body        | str  |
+-------------+------+

Each row of this table maps a content_uuid to a paragraph of content.

I want to generate a new table that tags each content_uuid with every topic found in its body, ordered by descending keyword (Keywords.word) count. For example, given the following:

Input:

Keywords table:

+-----------+-----------+
| topic     | word      |
+-----------+-----------+
| food      | pie       |
| food      | apple     |
| food      | kiwi      |
| game      | pac man   |
| game      | hang man  |
| game      | tetris    |
| math      | compass   |
| math      | calculator|
+-----------+-----------+

Content table:
+---------------+----------------------------------------------+
| content_uuid  |                 body                         |
+---------------+----------------------------------------------+
| uuid1         | The boy is eating his apple pie.             |
| uuid2         | They are playing hang man on the calculator. |
| uuid3         | She's holding a compass and a calculator.    |
| uuid4         | They are having fun.                         |
+---------------+----------------------------------------------+

Output:
+---------------+------+------+------+--------------+
| content_uuid  | food | game | math | found_topics |
+---------------+------+------+------+--------------+
| uuid1         | 2    | 0    | 0    | [food]       |  -- the words apple and pie
| uuid2         | 0    | 1    | 1    | [game, math] |  -- the words hang man and calculator
| uuid3         | 0    | 0    | 2    | [math]       |  -- the words compass and calculator
| uuid4         | 0    | 0    | 0    | None         |  -- no words were found
+---------------+------+------+------+--------------+
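To make the expected result concrete, here is a minimal sketch in plain Python that reproduces the output above from the sample data. It rests on several assumptions that the tables do not settle: matching is case-insensitive and on whole words, every occurrence of a keyword counts (not just distinct keywords), and topics with equal counts keep the order in which they first appear in the Keywords table. The function name `tag_topics` and the list-of-tuples layout are hypothetical, chosen only for illustration.

```python
import re

# Sample rows copied from the Keywords and Content tables above.
keywords = [
    ("food", "pie"), ("food", "apple"), ("food", "kiwi"),
    ("game", "pac man"), ("game", "hang man"), ("game", "tetris"),
    ("math", "compass"), ("math", "calculator"),
]
content = [
    ("uuid1", "The boy is eating his apple pie."),
    ("uuid2", "They are playing hang man on the calculator."),
    ("uuid3", "She's holding a compass and a calculator."),
    ("uuid4", "They are having fun."),
]


def tag_topics(keywords, content):
    """Count keyword hits per topic for each body and list the topics found."""
    # Topics in order of first appearance (dict keys preserve insertion order).
    topics = list(dict.fromkeys(topic for topic, _ in keywords))
    # Assumption: whole-word, case-insensitive matching; re.escape keeps
    # multi-word keywords such as "hang man" literal.
    patterns = [
        (topic, re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE))
        for topic, word in keywords
    ]
    rows = []
    for uuid, body in content:
        counts = dict.fromkeys(topics, 0)
        for topic, pattern in patterns:
            # Assumption: every occurrence counts, not just distinct keywords.
            counts[topic] += len(pattern.findall(body))
        # sorted() is stable, so ties keep the Keywords-table topic order.
        ranked = sorted(topics, key=lambda t: -counts[t])
        found = [t for t in ranked if counts[t] > 0]
        rows.append({"content_uuid": uuid, **counts, "found_topics": found or None})
    return rows


rows = tag_topics(keywords, content)
for row in rows:
    print(row)
```

Each returned dictionary corresponds to one row of the output table, with `found_topics` set to `None` when no keyword matches. If a different tie-breaking rule or substring matching is wanted, only the sort key and the regular expression would need to change.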
a_horse_with_no_name
v_coder12