Hi guys, I am working on a web-scraping project in which I fetch product details from 12 websites. I have an .xml file that contains the HTML details of those 12 websites (price class, price tag, div, etc.), which I use to extract the product price with Jsoup.
Scenario 1: This is time-consuming because I iterate over the 12 websites sequentially.
onQuerySubmit:
InputStream is = getAssets().open("websitePriceData.txt"); // open the XML asset
webData = xmlParser.readXMLFromAsset(is);
new Content().execute(webData); // start the AsyncTask
doInBackground:
try {
    // Group the XML child nodes in threes so that each website stays with its HTML details
    DataPartition<String> filteredData = DataPartition.ofSize(webData, 3);
    for (int i = 0; i < filteredData.size(); i++) {
        String[] websiteContent = filteredData.get(i).toArray(new String[3]);
        String websiteName = websiteContent[0];
        String priceClass = websiteContent[1];
        String priceAttribute = websiteContent[2];
        priceExtractionLogic();
    }
} catch (Exception e) {
    // The original excerpt omitted the catch block; this one only logs the error
    e.printStackTrace();
}
onPostExecute:
adapter = new Adapter(getApplicationContext(), MainActivity.this, recyclerViewList, searchText);
recyclerView.setAdapter(adapter);
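DataPartition is not shown in the snippets above. The following is only a minimal sketch of what such a helper might look like, assuming it splits a list into consecutive chunks of a fixed size; the fields, the constructor and the bounds handling are assumptions and not the actual class from the project.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the DataPartition helper: a read-only snapshot that splits
// a list into consecutive chunks of at most chunkSize elements.
final class DataPartition<T> extends AbstractList<List<T>> {
    private final List<T> source;
    private final int chunkSize;

    private DataPartition(List<T> source, int chunkSize) {
        this.source = new ArrayList<>(source); // defensive copy
        this.chunkSize = chunkSize;
    }

    static <T> DataPartition<T> ofSize(List<T> source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return new DataPartition<>(source, chunkSize);
    }

    @Override
    public List<T> get(int index) {
        int start = index * chunkSize;
        if (index < 0 || start >= source.size()) {
            throw new IndexOutOfBoundsException("chunk index " + index);
        }
        int end = Math.min(start + chunkSize, source.size());
        return new ArrayList<>(source.subList(start, end));
    }

    @Override
    public int size() {
        // Ceiling division: the last chunk may hold fewer than chunkSize elements
        return (source.size() + chunkSize - 1) / chunkSize;
    }
}
```

With 36 entries (12 websites with 3 values each) and a chunk size of 3, such a helper would yield 12 chunks, one per website.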
Scenario 2: I decided to split that one .xml file of 12 websites into 4 .xml files of 3 websites each.
onQuerySubmit:
websiteDataText = new String[]{"websitePriceData1.txt", "websitePriceData2.txt", "websitePriceData3.txt", "websitePriceData4.txt"};
for (int i = 0; i < websiteDataText.length; i++) {
    InputStream is = getAssets().open(websiteDataText[i]);
    webData = xmlParser.readXMLFromAsset(is);
    filteredData = DataPartition.ofSize(webData, 3);
    // Expected to start 4 doInBackground threads, each extracting data for three websites
    new Content().execute(filteredData);
}
doInBackground() and onPostExecute() are the same as in Scenario 1.
A screenshot of the Scenario 2 output is attached. I cannot work out why this is happening: items are repeated in the list. If Scenario 2 is completely wrong, please suggest performance changes or point me to any reading material on improving performance in this case.
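To make the goal concrete, here is a minimal plain-Java sketch of fetching several websites in parallel on a fixed thread pool and collecting all results before anything is shown. It is not the app's code: fetchPrice is a hypothetical stand-in for priceExtractionLogic(), the class name and the pool size are assumptions, and no Jsoup or Android calls are made.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFetchSketch {

    // Hypothetical stand-in for the Jsoup-based priceExtractionLogic();
    // it only joins the three values so the sketch runs without network access.
    static String fetchPrice(List<String> site) {
        return site.get(0) + ":" + site.get(1) + ":" + site.get(2);
    }

    // Runs one task per website on a fixed pool and returns the results in input order.
    static List<String> fetchAll(List<List<String>> sites, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (List<String> site : sites) {
                tasks.add(() -> fetchPrice(site));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has finished; the futures keep the task order
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because invokeAll blocks, on Android this would have to run off the main thread, with the adapter set once from the combined result list rather than once per task. Note also that AsyncTask.execute() has run tasks serially on a single background thread since Android 3.0, so four execute() calls do not by themselves give four parallel threads.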