
I have a PDF file that I converted to .txt using an online tool. Now I want to parse the data in it and split it using a regular expression. I am almost done but stuck on one point.

Example of the data:

00 41 53 Bid Form – Design/Build (Single-Prime Contract)

27 05 13.23 T1 Services

I want to split it into two entries: "00 41 53 Bid Form – Design/Build (Single-Prime Contract)" and "27 05 13.23 T1 Services".

The regular expression I'm using is [0-9](\d|\ |\.)*(\D)*

Each entry starts with a number made of digits separated by spaces and/or dots, followed by text that can contain letters, dots, commas, parentheses, hyphens, and digits.

My expression cannot match a whole entry when its text contains a digit, like "T1 Services" above: the match stops at the 1.
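The cause is the last group: (\D)* can only consume non-digits, so it stops at the 1 in "T1" and the remainder starts a new match. The following C# sketch compares the original pattern with a revised one. The revised pattern is an editorial suggestion, not from the question, and it assumes each entry sits on its own line: after the number it simply takes everything up to the end of the line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question, assuming one entry per line.
string text = "00 41 53 Bid Form – Design/Build (Single-Prime Contract)\r\n"
            + "27 05 13.23 T1 Services";

// Original pattern: (\D)* stops at the first digit in the text part,
// so the second entry is cut into "27 05 13.23 T" and "1 Services".
string[] original = Regex.Matches(text, @"[0-9](\d|\ |\.)*(\D)*")
    .Cast<Match>().Select(m => m.Value).ToArray();

// Revised pattern (a sketch): digit groups joined by single spaces or dots,
// whitespace, then every character up to the end of the line.
// RegexOptions.Multiline makes ^ match at the start of every line.
string[] entries = Regex.Matches(text, @"^\d+(?:[ .]\d+)*[ \t]+[^\r\n]*", RegexOptions.Multiline)
    .Cast<Match>().Select(m => m.Value).ToArray();

foreach (string entry in entries)
{
    Console.WriteLine(entry);
}
```

The .Cast<Match>() call is there because on older .NET Framework versions MatchCollection only implements the non-generic IEnumerable.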

Alan Moore
Naupad Doshi
    (Paperclip voice impression) "It looks like you're trying to split text into individual lines that doesn't require Regular Expressions. Would you like help with that?" – Simon Whitehead Apr 12 '13 at 04:08

2 Answers


If I understand correctly, you are trying to split on newline characters. This is in C#:

using System.Text.RegularExpressions;
string[] result = Regex.Split(inputText, "[\r\n]+");
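(Editorial addition.) A self-contained sketch of this answer applied to the question's sample data. The sample string, including its blank line and trailing line break, and the variable names are assumptions for illustration. The + in [\r\n]+ collapses a run of line breaks into a single separator, so the blank line between entries produces no empty element, but a trailing line break still leaves an empty last element.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question, with a blank line between the entries
// and (assumed) a trailing line break at the end of the file.
string inputText = "00 41 53 Bid Form – Design/Build (Single-Prime Contract)\r\n\r\n"
                 + "27 05 13.23 T1 Services\r\n";

// [\r\n]+ treats any run of CR/LF characters as one separator.
string[] parts = Regex.Split(inputText, "[\r\n]+");

// The trailing line break still yields one empty string at the end,
// so drop empty elements before using the result.
string[] entries = Array.FindAll(parts, s => s.Length > 0);

foreach (string entry in entries)
{
    Console.WriteLine(entry);
}
```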
Mudassir Hasan
  • I am already using Regex.Split, but it does not split properly because something is wrong with my regular expression. My question is really about the regular expression I wrote above. – Naupad Doshi Apr 12 '13 at 04:50
  • Then this should help: http://stackoverflow.com/questions/1547476/easiest-way-to-split-a-string-on-newlines-in-net – Mudassir Hasan Apr 12 '13 at 04:56
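(Editorial addition.) If the converted text has no line breaks between entries (an assumption; the question does not say), splitting on newlines cannot work, and the split point has to come from the data itself. One hedged option is to split on whitespace that is followed by a section number, using a lookahead so the number stays with the next entry. The rule that a section number starts with three two-digit groups separated by spaces is inferred only from the two sample lines and may not hold for the rest of the file.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: both entries ended up on one line after the PDF conversion.
string text = "00 41 53 Bid Form – Design/Build (Single-Prime Contract) 27 05 13.23 T1 Services";

// Split on whitespace only when it is followed by something that looks like
// a section number (assumed: three two-digit groups separated by spaces).
// The lookahead (?=...) checks without consuming, so the number is kept.
string[] entries = Regex.Split(text, @"\s+(?=\d{2} \d{2} \d{2})");

foreach (string entry in entries)
{
    Console.WriteLine(entry);
}
```

Because the lookahead requires three digit groups, the "1" in "T1 Services" is never treated as the start of a new entry.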

You can also do it without a regex, like this:

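using System;

string phrase = ".......\n,,,,.ll..\r\n....";
// Split on either line-break character; RemoveEmptyEntries drops the empty
// string that "\r\n" would otherwise produce between "\r" and "\n".
string[] words = phrase.Split(new string[] { "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);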

If you want to use a regex, use @mhasan's solution instead.

Civa