
I have a huge text file in which each line contains 12 characters of data. I need to find a 5-character substring in that file using a MapReduce job.

Input file.

abcdefghijkl
kahfdjshjsdh
sdfkjsdjkjks

value to search

cdefg

The value 'cdefg' can occur anywhere in the file, and it can span two lines. So I don't know how to create a map entry from the last two characters of the current line and the first three characters of the next line.

Ibrar Ahmed
  • @bouteillebleu I wrote a simple program that gives me the input line by line, splits each line into 5-character pieces, and creates a map for them, but I don't know how to handle the last 2 characters of each line. – Ibrar Ahmed May 06 '17 at 14:13
  • Do you want to return the line which contains `cdefg`? – YCF_L May 06 '17 at 14:16
  • I create the map in the mapper and will later match it against the string "cdefg", but my question is how I can create the 5-character entries. – Ibrar Ahmed May 06 '17 at 14:19
  • Use an array of characters, `Character[]`! – YCF_L May 06 '17 at 14:23
  • @YCF_L I don't understand that; do you have an example? – Ibrar Ahmed May 06 '17 at 14:38
  • I really don't understand your question either; can you explain more? – YCF_L May 06 '17 at 14:39
  • I have a file containing lines of 12 characters, and I want to find a 5-character string in that file. In the mapper I get a 12-character line and can create two 5-character map entries, which leaves 2 characters. I want to take the next 3 characters from the next line and create a map entry from them too, so that in the reducer I can compare those entries with my string. – Ibrar Ahmed May 06 '17 at 14:47
  • So your inputs should look like this: `[abcde, fghij, klkah, fdjsh, jsdhs, dfkjs, djkjk, s]`? – YCF_L May 06 '17 at 15:12
  • Check my answer; I hope it gives you an idea. – YCF_L May 06 '17 at 15:28

1 Answer


I have a file containing lines of 12 characters, and I want to find a 5-character string in that file. In the mapper I get a 12-character line and can create two 5-character map entries, which leaves 2 characters. I want to take the next 3 characters from the next line and create a map entry from them too, so that in the reducer I can compare those entries with my string.

You can concatenate all your lines together and then split the result every 5 characters (see "Splitting a string at every n-th character"):

abcdefghijklkahfdjshjsdhsdfkjsdjkjks
[abcde, fghij, klkah, fdjsh, jsdhs, dfkjs, djkjk, s]

You can base your solution on this piece of code:

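import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;

public class SplitEveryFive {
    public static void main(String[] args) {
        File file = new File("myFile.txt");
        // try-with-resources closes the Scanner automatically
        try (Scanner scanner = new Scanner(file)) {
            // Concatenate every line into one string (line breaks are dropped)
            StringBuilder sb = new StringBuilder();
            while (scanner.hasNextLine()) {
                sb.append(scanner.nextLine());
            }
            String result = sb.toString();
            System.out.println(result);
            // Split after every 5th character; the last chunk may be shorter
            String[] spl = result.split("(?<=\\G.....)");

            System.out.println(Arrays.toString(spl));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}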

Output

abcdefghijklkahfdjshjsdhsdfkjsdjkjks
[abcde, fghij, klkah, fdjsh, jsdhs, dfkjs, djkjk, s]

EDIT

I want to create a map like this: abcdefghijklkahfdjshjsdhsdfkjsdjkjks [abcde, bcdef, cdefg, defgh... ]

You can solve this problem like so:

import java.util.ArrayList;
import java.util.List;

String str = "abcdefghijklkahfdjshjsdhsdfkjsdjkjks";
List<String> list = new ArrayList<>();

// Slide a 5-character window over the string, one position at a time
for (int i = 0; i < str.length() - 4; i++) {
    list.add(str.substring(i, i + 5));
}

System.out.println(list);

Output

[abcde, bcdef, cdefg, defgh, efghi, fghij, ghijk, ...., djkjk, jkjks]
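
If the file is too large to concatenate in memory, the same sliding window can probably be applied line by line by carrying over only the last 4 characters (pattern length minus 1) of the text seen so far. The following is a minimal sketch of that idea, not a MapReduce job; the class name `StreamingWindowSearch` and the method `countMatches` are hypothetical names chosen for illustration, and the lines are passed in as an `Iterable<String>` so that any line-by-line reader could feed it.

```java
import java.util.List;

class StreamingWindowSearch {

    // Counts occurrences of pattern in the text formed by joining all lines
    // (line breaks removed) while holding only one line plus a short carry.
    static long countMatches(Iterable<String> lines, String pattern) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        int overlap = pattern.length() - 1; // 4 for a 5-character pattern
        String carry = "";                  // tail of the text seen so far
        long count = 0;
        for (String line : lines) {
            String window = carry + line;
            // The carry is shorter than the pattern, so every match found here
            // ends inside the new line and nothing is counted twice.
            int idx = window.indexOf(pattern);
            while (idx >= 0) {
                count++;
                idx = window.indexOf(pattern, idx + 1);
            }
            // Keep only the last (pattern length - 1) characters for the next line.
            carry = window.substring(Math.max(0, window.length() - overlap));
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("abcdefghijkl", "kahfdjshjsdh", "sdfkjsdjkjks");
        // "klkah" straddles the first and second line.
        System.out.println(countMatches(lines, "klkah"));
    }
}
```

Only the current line and a carry of at most 4 characters are kept at any time, so memory use does not grow with the file size.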
YCF_L
  • I want to create a map like this: abcdefghijklkahfdjshjsdhsdfkjsdjkjks [abcde, bcdef, cdefg, defgh... ] – Ibrar Ahmed May 06 '17 at 15:36
  • Secondly, I need to concatenate the last 2 characters with the first characters of the next line. – Ibrar Ahmed May 06 '17 at 15:38
  • So the last value of your array should be `jkjks`, right @IbrarAhmed? – YCF_L May 06 '17 at 15:50
  • My next line is "hasjfdkjksjkdfjsalkdjf", so the next entries should be [jkjks, kjksh, jksha, kshas, shasj, ...] – Ibrar Ahmed May 06 '17 at 16:33
  • Yes, my solution does what you want. Did you try it, @IbrarAhmed? – YCF_L May 06 '17 at 16:34
  • Yes, I tried it, but it starts again for each line. I want the scan to carry over from the previous line: for example the last 4 characters of the first line plus one character from the second line, then the last 3 characters of the first line plus the first two characters of the second line. – Ibrar Ahmed May 06 '17 at 16:56
  • I cannot concatenate all the lines because there are too many; I need a solution that only looks at the next line. [I need it in the mapper.] – Ibrar Ahmed May 06 '17 at 17:08
  • Hmm, OK, I will try to find a solution when I am free; it needs more concentration @IbrarAhmed – YCF_L May 06 '17 at 17:10
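
One way the mapper requirement from the last comments might be approached: each map call emits its line under its own line number, and also emits the last 4 characters of that line under the next line number, so the reducer for a given line can stitch the previous tail onto the line before searching. The sketch below only simulates that map/shuffle/reduce flow with plain Java collections and does not use the Hadoop API; `BoundaryStitchSketch`, `map`, `reduce` and `run` are hypothetical names. In a real job the line number would have to come from somewhere; with Hadoop's `TextInputFormat` the mapper key is the byte offset of the line, so with fixed 12-character lines it could plausibly be derived as offset / 13, assuming single-byte characters and a single `\n` per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BoundaryStitchSketch {

    // "Map" step: emit the line under its own number ("L" prefix) and its
    // last (pattern length - 1) characters under the next number ("T" prefix).
    static void map(long lineNo, String line, int patternLength,
                    Map<Long, List<String>> out) {
        int overlap = patternLength - 1;
        String tail = line.substring(Math.max(0, line.length() - overlap));
        out.computeIfAbsent(lineNo, k -> new ArrayList<>()).add("L" + line);
        out.computeIfAbsent(lineNo + 1, k -> new ArrayList<>()).add("T" + tail);
    }

    // "Reduce" step: prepend the previous line's tail to this line and count
    // matches. Assumes every line has at least (pattern length - 1) characters,
    // which holds for the fixed 12-character lines in the question.
    static int reduce(List<String> values, String pattern) {
        String tail = "";
        String line = "";
        for (String v : values) {
            if (v.charAt(0) == 'T') {
                tail = v.substring(1);
            } else {
                line = v.substring(1);
            }
        }
        if (line.isEmpty()) {
            return 0; // key past the last line: only a tail arrived
        }
        String window = tail + line;
        int count = 0;
        for (int i = window.indexOf(pattern); i >= 0; i = window.indexOf(pattern, i + 1)) {
            count++;
        }
        return count;
    }

    // Simulates map, shuffle (grouping by key) and reduce in a single process.
    static int run(List<String> lines, String pattern) {
        Map<Long, List<String>> shuffled = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            map(i, lines.get(i), pattern.length(), shuffled);
        }
        int total = 0;
        for (List<String> values : shuffled.values()) {
            total += reduce(values, pattern);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("abcdefghijkl", "kahfdjshjsdh", "sdfkjsdjkjks");
        System.out.println(run(lines, "cdefg")); // match inside the first line
        System.out.println(run(lines, "klkah")); // match across lines 1 and 2
    }
}
```

Because the tail is shorter than the pattern, a match is never counted by two different keys: matches inside a line and matches that straddle into it are both counted only by that line's reducer call.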