
I am working on a web-scraping project in which I fetch product details from 12 websites. I have an .xml file containing the HTML details of those 12 websites (price class, price tag, div, etc.), which I use with Jsoup to extract the product price.

Scenario 1: This is time-consuming because I iterate over the 12 websites sequentially.

onQuerySubmit:

```java
InputStream is = getAssets().open("websitePriceData.txt"); // fetch XML
webData = xmlParser.readXMLFromAsset(is);
new Content().execute(webData);                             // start the AsyncTask
```

doInBackground:

```java
try {
    /* Create sublists of the XML child nodes so that each website is grouped
       together with its HTML details: [websiteName, priceClass, priceAttribute] */
    DataPartition<String> filteredData = DataPartition.ofSize(webData, 3);
    for (int i = 0; i < filteredData.size(); i++) {
        String[] websiteContent = filteredData.get(i).toArray(new String[3]);

        String websiteName = websiteContent[0];
        String priceClass = websiteContent[1];
        String priceAttribute = websiteContent[2];

        priceExtractionLogic(); // Jsoup extraction for this website
    }
} catch (Exception e) {
    e.printStackTrace();
}
```
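`DataPartition` is the asker's own helper and its code is not shown. Assuming it behaves like Guava's `Lists.partition`, splitting the flat list of XML values into consecutive chunks of three (`[websiteName, priceClass, priceAttribute]`), a hypothetical stand-in (`PartitionDemo.ofSize`, with made-up sample data) could look like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionDemo {

    // Hypothetical stand-in for DataPartition.ofSize: splits a flat list into
    // consecutive sublists of at most `size` elements.
    static <T> List<List<T>> ofSize(List<T> source, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < source.size(); start += size) {
            int end = Math.min(start + size, source.size());
            // Copy so each chunk does not change if `source` is modified later.
            chunks.add(new ArrayList<>(source.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Flat values as they might come out of the XML (illustrative data only).
        List<String> webData = Arrays.asList(
                "siteA", "price-class-a", "content",
                "siteB", "price-class-b", "data-price");

        for (List<String> chunk : ofSize(webData, 3)) {
            String[] websiteContent = chunk.toArray(new String[3]);
            System.out.println(websiteContent[0] + " | " + websiteContent[1] + " | " + websiteContent[2]);
        }
    }
}
```

Copying each sublist is a deliberate choice: `subList` returns a view, so a chunk handed to a background task would otherwise still depend on the original list.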

onPostExecute:

```java
adapter = new Adapter(getApplicationContext(), MainActivity.this, recyclerViewList, searchText);
recyclerView.setAdapter(adapter);
```

Scenario 2: I split the single .xml file of 12 websites into four .xml files of 3 websites each.

onQuerySubmit:

```java
websiteDataText = new String[]{"websitePriceData1.txt", "websitePriceData2.txt",
        "websitePriceData3.txt", "websitePriceData4.txt"};

for (int i = 0; i < websiteDataText.length; i++) {
    InputStream is = getAssets().open(websiteDataText[i]);
    webData = xmlParser.readXMLFromAsset(is);
    filteredData = DataPartition.ofSize(webData, 3);
    // Expected to start 4 doInBackground threads, each extracting data for 3 websites.
    new Content().execute(filteredData);
}
```

                        

doInBackground() and onPostExecute() are the same as in Scenario 1. The Scenario 2 output screenshot is attached: I am getting repeated items and I cannot work out why. If Scenario 2 is completely wrong, please suggest performance changes or point me to reading material on improving performance in this case.
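One plausible cause of the repeated items, assuming `webData` and `filteredData` are fields of the activity: `doInBackground` is unchanged from Scenario 1, so it appears to read the `webData` field rather than the argument passed to `execute(...)`. The Scenario 2 loop reassigns that field four times on the main thread, and a task that reads the field later sees whichever file was loaded most recently, not the file it was started for. This plain-Java sketch (no Android; the class and method names are hypothetical) models the worst case, where all tasks run after the loop has finished, and compares reading a shared field with using each task's own parameter:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SharedFieldDemo {

    // Plays the role of the activity field that the onQuerySubmit loop reassigns.
    static List<String> webData;

    // Buggy pattern: each queued task reads the shared field when it runs.
    static List<String> runReadingSharedField(List<List<String>> files) {
        List<String> scraped = new ArrayList<>();
        List<Runnable> queued = new ArrayList<>();
        for (List<String> file : files) {
            webData = file;                            // like webData = xmlParser.readXMLFromAsset(is)
            queued.add(() -> scraped.addAll(webData)); // field is read later, at run time
        }
        queued.forEach(Runnable::run);                 // tasks run after the loop has finished
        return scraped;
    }

    // Fixed pattern: each task only uses the data handed to it.
    static List<String> runReadingParameter(List<List<String>> files) {
        List<String> scraped = new ArrayList<>();
        List<Runnable> queued = new ArrayList<>();
        for (List<String> file : files) {
            List<String> params = file;                // like execute(params) -> doInBackground(params)
            queued.add(() -> scraped.addAll(params));  // value captured when the task is created
        }
        queued.forEach(Runnable::run);
        return scraped;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("site1", "site2", "site3"),
                Arrays.asList("site4", "site5", "site6"));
        System.out.println("shared field: " + runReadingSharedField(files));
        System.out.println("parameter:    " + runReadingParameter(files));
    }
}
```

With the shared field, the last file's websites are processed once per task and the earlier files are never processed; with the parameter, each file is processed exactly once. In the real `AsyncTask`, the analogous change would be to take the data from the `doInBackground` parameters and pass `websiteName`, `priceClass` and `priceAttribute` into `priceExtractionLogic` instead of having it read fields. This is an inference from the posted snippets, since the full `Content` class is not shown.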


  • You should not create a new adapter in post execute. Instead, create an empty adapter in `onCreate`, and in post execute add the new items to the existing adapter and call `notifyDataSetChanged()` on it. Also keep in mind that by default AsyncTask runs on only [one thread](https://stackoverflow.com/questions/28966320/why-are-asynctask-threads-running-one-after-the-other), so this won't actually speed it up. You would have to explicitly set a larger thread pool for your approach to help. – Tyler V Jun 05 '22 at 01:44
  • See [this question](https://stackoverflow.com/questions/29937556/asynctask-execute-or-executeonexecutor) for some more details and examples – Tyler V Jun 05 '22 at 01:53
  • Tried `executeOnExecutor()` and `notifyDataSetChanged()` on the adapter, but I am still getting repeated items. – Pranit Vishwakarma Jun 05 '22 at 10:16
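Following Tyler V's point that `AsyncTask.execute()` uses a single background thread by default, here is a minimal sketch of a parallel alternative using plain `java.util.concurrent` instead of `AsyncTask` (which is deprecated as of Android API level 30). A fixed pool of 4 threads scrapes the websites concurrently, each task receives its own website rather than reading a shared field, and all results come back as one list that could be added to the existing adapter in a single update. `ParallelScrapeDemo` and `fetchPrice` are hypothetical; `fetchPrice` stands in for `priceExtractionLogic()` and only sleeps to simulate network latency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelScrapeDemo {

    // Hypothetical stand-in for priceExtractionLogic(): in the app this would
    // perform the Jsoup request for one website and return the parsed price.
    static String fetchPrice(String websiteName) throws InterruptedException {
        Thread.sleep(200); // simulate network latency
        return websiteName + ": price";
    }

    // Runs one task per website on a fixed-size pool and returns the results
    // in the same order as the input list. Blocks until all tasks finish.
    static List<String> scrapeAll(List<String> websites, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String site : websites) {
                tasks.add(() -> fetchPrice(site)); // each task captures its own site
            }
            // invokeAll waits for every task; futures come back in task order.
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> websites = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            websites.add("site" + i);
        }

        long start = System.nanoTime();
        List<String> prices = scrapeAll(websites, 4);
        long ms = (System.nanoTime() - start) / 1_000_000;

        // One batch of results, ready to hand to the existing adapter once.
        System.out.println(prices.size() + " results in about " + ms + " ms");
    }
}
```

With 12 simulated 200 ms requests on 4 threads, the total should be roughly three rounds of latency rather than twelve. On Android, `scrapeAll` must not be called on the main thread because it blocks, and the returned list would still need to be posted back to the main thread (for example with `runOnUiThread`) before the adapter is updated.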
