
I'm creating a blank spaCy model and trying to train it from a DataFrame, which I'm converting to a list.

This is my DataFrame, in which I have built the training data as needed:

 | id |  tdata                                             |
 |----|----------------------------------------------------|
 | 1  | ("sample text record",{"entities":[(0,5,"Name")]}), |
 | 2  | ("TypeA text record",{"entities":[(0,4,"Name")]}),|
 | 3  | ("TypeB text record",{"entities":[(0,4,"Name")]}),|

Then I try to prepare my training data using the code below:

TRAIN_DATA = df['tdata'].tolist()

This gives me the output below, and spaCy complains with `ValueError: too many values to unpack (expected 2)`:

['("sample text record",{"entities":[(0,5,"Name")]}),','("TypeA text record",{"entities":[(0,4,"Name")]}),','("TypeB text record",{"entities":[(0,4,"Name")]}),']

What I need for a successful run is the list below, without the single quotes around each item:

[("sample text record",{"entities":[(0,5,"Name")]}),("TypeA text record",{"entities":[(0,4,"Name")]}),("TypeB text record",{"entities":[(0,4,"Name")]}),]
RData
    The values in the `tdata` column are stored as strings, rather than as tuples. The "right" way to fix this would be to rewrite the code that builds your training DataFrame. If you really need this to just work right now in this specific case, you can apply `eval` to each cell: `TRAIN_DATA = df['tdata'].apply(eval).tolist()`. Use of `eval` should be considered a last resort, though: https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice – Peter Leimbigler Aug 30 '21 at 16:13
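
As a possible alternative to `eval`, here is a minimal sketch using the standard library's `ast.literal_eval`, which only accepts Python literals. It assumes every cell looks like the strings shown in the question, including the trailing comma. That comma has to be stripped first, otherwise each string parses as a one-element tuple wrapping the record. The helper name `parse_cell` and the list `raw_cells` are made up for illustration; `raw_cells` stands in for `df['tdata'].tolist()`.

```python
import ast


def parse_cell(cell):
    """Turn one string cell into a (text, annotations) tuple."""
    # Drop surrounding whitespace and the trailing comma; with the comma left
    # in, the string would parse as a one-element tuple wrapping the record.
    cleaned = cell.strip().rstrip(",")
    # literal_eval only evaluates Python literals (strings, numbers, tuples,
    # lists, dicts, ...), so it cannot run arbitrary code the way eval can.
    return ast.literal_eval(cleaned)


# Hypothetical stand-in for df['tdata'].tolist(), using the question's strings.
raw_cells = [
    '("sample text record",{"entities":[(0,5,"Name")]}),',
    '("TypeA text record",{"entities":[(0,4,"Name")]}),',
    '("TypeB text record",{"entities":[(0,4,"Name")]}),',
]

TRAIN_DATA = [parse_cell(cell) for cell in raw_cells]

# Each item now unpacks into exactly two values.
for text, annotations in TRAIN_DATA:
    print(text, annotations["entities"])
```

With the real DataFrame, `TRAIN_DATA = df['tdata'].apply(parse_cell).tolist()` should give the same result, as long as every cell follows this format. A cell that is not a valid literal makes `ast.literal_eval` raise an exception rather than execute anything.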
