
I am developing a train schedule website. In it I am using the following URL pattern:

https://example.com/search/61-205/2023-12-28/stationA-to-stationC

Here 61-205 is the from-station ID and the to-station ID, and the date is the schedule date. The last section, "stationA-to-stationC", is added for SEO purposes. As you can see, there are 365 days per year, so the date field alone adds 365 combinations per route, and once the station pairs vary the number of combinations grows very large.

I am also concerned about the SEO aspect. My concern is that if a dated URL ranks on Google, after a few days it will give the wrong impression that this site serves old schedules. So I thought of not passing the date in the URL at all and instead passing it via ViewBag or something similar (I am using ASP.NET Core).
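A rough sketch of what I mean (the controller, route, and property names here are only illustrative, not my real code):

    using Microsoft.AspNetCore.Mvc;

    public class SearchController : Controller
    {
        // Today's schedule, no date in the URL (the version I would like indexed):
        //   /search/61-205/stationA-to-stationC
        [Route("search/{pair}/{slug}")]
        // Explicit date, only for sharing a specific day:
        //   /search/61-205/2023-12-28/stationA-to-stationC
        [Route("search/{pair}/{date}/{slug}")]
        public IActionResult Index(string pair, string slug, string? date = null)
        {
            // Fall back to today when the URL carries no date segment.
            var scheduleDate = DateTime.TryParse(date, out var parsed)
                ? parsed.Date
                : DateTime.Today;

            ViewBag.ScheduleDate = scheduleDate;      // the date reaches the view without being in the URL
            ViewBag.HasDateInUrl = date is not null;  // so the view can treat dated URLs differently
            return View();
        }
    }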

Or can I tell Google not to index the dated path? Then the URL would be:

https://example.com/search/61-205/stationA-to-stationC

If this URL were the one indexed on Google, I could automatically set the current date and show the results. But in that case, if a user copies the URL and sends it to another user via email, the recipient cannot view the exact day the original user viewed; instead they will see the schedule for whatever day they open the link, because the date part is not in the URL. This could be ignored if the SEO benefit is low.

If I do pass the date field in the URL, will it add unnecessary pages for Google?

What should I do so that only the pattern https://example.com/search/61-205/stationA-to-stationC gets indexed (I can show today's schedule if the date part is not included in the URL), while the dated URL below still works?

https://example.com/search/61-205/2023-12-28/stationA-to-stationC

  • You shouldn't use the word "search" in your URLs. Google doesn't like having search results indexed in their search results because it is poor user experience for users to click off Google search results only to land on other search results. In this case, I don't expect the pages to actually look like search results, so it might not be a problem, but to be safe, I'd use some other term such as "schedule" or "schedules" in the URL. In addition to keeping Google happy, it also would make more semantic sense. – Stephen Ostermiller Dec 28 '23 at 22:31
  • If you are planning to have 205 stations that each have a page to every other station, that is 41,820 pages. Even without dates, that is a LOT of pages to ask Google to index. Is there any way to only show Google the popular station pairs? – Stephen Ostermiller Dec 28 '23 at 22:33
  • What would you plan to do with dates in the past? Does anybody need to look at train schedules for last week, or a year ago? – Stephen Ostermiller Dec 28 '23 at 22:36
  • @StephenOstermiller 1) Will change "search" to "schedule". 2) I thought of adding the date part in the URL but setting the canonical URL to the non-date version; that way a user can share the URL with someone else while Google does not index the dated one (see the sketch after these comments). How about that? 3) No one needs to check past dates. I only thought of including the date part so that if someone passes the URL to another person via FB or Skype, the recipient sees the same result. Also, I am not specifically telling Google to index anything: I am not putting station pairs in the sitemap, only individual stations. How about that? – Prageeth Liyanage Dec 29 '23 at 03:55
  • @StephenOstermiller What should I do so that only the pattern https://example.com/search/61-205/stationA-to-stationC gets indexed instead of the dated URL below (I can show today's schedule if the date part is not included in the URL), while the dated URL still works? The reason is that there are some popular routes that users search for on Google.

    https://example.com/search/61-205/2023-12-28/stationA-to-stationC. Or shall I disallow the whole search page?

    – Prageeth Liyanage Dec 29 '23 at 08:43
  • Sitemaps have almost nothing to do with what gets indexed. Google can and will index pages not in your sitemap, and it will choose not to index pages listed in your sitemap. See The Sitemap Paradox Unless you decide to block Google from indexing your station pair URLs, they should probably go in a sitemap so that you can get visibility into them in search console. – Stephen Ostermiller Dec 29 '23 at 09:35
  • I like your canonical idea. You could even self-answer your question with that. – Stephen Ostermiller Dec 29 '23 at 09:36
  • Are people searching on Google for station pairs? If so, it makes sense to make popular pairs available on Google. If not, just the stations is probably a better plan. – Stephen Ostermiller Dec 29 '23 at 09:39
  • @StephenOstermiller If I disallow the whole search page, how can I keep the popular pairs available? As the other answer in this thread says, allowing all pairs will badly affect the crawl budget. I also saw your reply on another question: even though we use a canonical tag, the bot still needs to spend its budget crawling the URL, so it is a waste of resources too. – Prageeth Liyanage Dec 29 '23 at 10:12
  • Google is usually willing to crawl 100 times the number of pages you have indexed, so even 100,000 pages might not be too much if you have 1,000 indexed already. You can control which Google crawls by which you tell Google about. Only list some in the sitemap, and don't link to the others either. – Stephen Ostermiller Dec 29 '23 at 10:15
  • So what you are saying is to disallow the whole search page but include the essential pairs in the sitemap, and even though I explicitly say not to index them, Google will decide what to do with them. Am I correct? – Prageeth Liyanage Dec 29 '23 at 10:21
  • If you disallow, the sitemap doesn't override. You allow everything, but you only link to the ones you want Google to know about. – Stephen Ostermiller Dec 29 '23 at 10:23
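A minimal Razor sketch of the canonical idea discussed in these comments, assuming the controller sets a ViewBag.CanonicalUrl property to the non-date URL (the property name is only illustrative):

    @* Emitted in the <head> of the schedule view. Dated and undated versions of the
       same route both point Google at the single undated URL. *@
    <link rel="canonical" href="@ViewBag.CanonicalUrl" />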

1 Answer


From an SEO perspective: Google has stated that not all pages are worth indexing, and that by design it does not index the majority of the pages it knows about, which number in the trillions.

Duplicate content is a problem.

https://example.com/search/61-205/2023-12-28/

https://example.com/search/61-205/2023-12-29/

https://example.com/search/61-205/2023-12-30/

Unless these are significantly different, search engines will only be interested in indexing one of them. Determining which one is the best to index is also a problem, even if query strings are used instead:

https://example.com/search/61-205/?date=20231228

https://example.com/search/61-205/?date=20231229

Many e-commerce sites are now having problems because Google removed the option in Google Search Console to tell Google not to index query-string URLs.

This is a related Webmasters question, though it is not specific to that Search Console change:

How to keep crawler robots from indexing my page when there is a query string present in the URL?
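Since this site runs on ASP.NET Core (per the question), one way to do that, sketched here as an assumption rather than the linked answer's method, is middleware that adds an X-Robots-Tag response header whenever a query string is present:

    // Registered in Program.cs before the routing/controller endpoints are mapped.
    app.Use(async (context, next) =>
    {
        // Any request whose URL carries a query string gets a noindex signal,
        // equivalent to a <meta name="robots" content="noindex"> tag in the page.
        if (context.Request.QueryString.HasValue)
        {
            context.Response.Headers["X-Robots-Tag"] = "noindex, follow";
        }
        await next();
    });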

When search engines are willing to index only one of many duplicates, the problem becomes which page. If the indexed page keeps changing, and with a schedule it would, that page never matures and never gets the benefits from Google's algorithms related to interactions.

Adding another level does not fix the duplicate problem

https://example.com/search/61-205/2023-12-28/stationA-to-stationC

https://example.com/search/61-205/2023-12-28/stationA-to-stationB

What is the difference between these pages? Having only the page title, heading, and one line of text differ makes them duplicates in the eyes of search engines.

https://example.com/search/61-205/2023-12-28/stationA-to-stationC

https://example.com/search/61-205/2023-12-29/stationA-to-stationC

These are also duplicates in the eyes of search engines, as only the date has changed.

Possible Solution

https://example.com/schedule/61-205/

Can be the indexable page for the train 61-205. "Schedule" can be used in the URL, as it is a possible user query keyword, and having it in the URL is helpful for SEO.

https://example.com/schedule/61-205/itinerary?date=&station1=&station2=

Can be the fixed URL that people can share, but make the itinerary page noindex, as it will be duplicate content for SEO.

Disallow: /schedule/61-205/itinerary?
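The Disallow line above covers only the 61-205 itinerary. Assuming Google's documented wildcard support in robots.txt, a single pattern could cover the shareable itinerary URLs for every train:

    User-agent: *
    # Block crawling of every shareable itinerary URL, whatever the train ID
    Disallow: /schedule/*/itinerary?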

https://example.com/stations/StationA

Would also be useful SEO fodder as they are all unique and people may search for them.

https://example.com/stations/StationA/61-205

https://example.com/trains/61-205

May be useful if there is enough information that differs between trains. They would also be useful places to put the quotation/search box that leads to https://example.com/schedule/61-205/itinerary?date=&station1=&station2=, which people can share, email, etc.


An additional consideration on SEO structure

A structure of

https://www.example.com/trains/61-205

This page tells people about the train.

https://www.example.com/trains/61-205/schedule/

Provides a gain of information about the schedule and stations, maybe some specific points of interest or places for photography. The provided schedule would be for today, with the option to search for a different date.

https://www.example.com/station/StationA/

provides information about the station.

https://www.example.com/station/StationA/schedule/

Provides a gain of information as to which trains are coming and going and destinations of interest. The provided schedule would be for today, with a box to search another day.

https://www.example.com/itinerary/

Would be the search or quotation portal.

https://www.example.com/itinerary/results

would be the user-shareable links, which would have the duplicate problem for SEO, so they are noindex.
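A minimal ASP.NET Core routing sketch of this structure, with hypothetical controller and action names; the itinerary results action would be the one that emits the noindex signal:

    var builder = WebApplication.CreateBuilder(args);
    builder.Services.AddControllersWithViews();
    var app = builder.Build();

    // Hypothetical routes mirroring the structure above.
    app.MapControllerRoute("train",             "trains/{trainId}",             new { controller = "Trains",    action = "Details" });
    app.MapControllerRoute("train-schedule",    "trains/{trainId}/schedule",    new { controller = "Trains",    action = "Schedule" });
    app.MapControllerRoute("station",           "station/{stationId}",          new { controller = "Stations",  action = "Details" });
    app.MapControllerRoute("station-schedule",  "station/{stationId}/schedule", new { controller = "Stations",  action = "Schedule" });
    app.MapControllerRoute("itinerary",         "itinerary",                    new { controller = "Itinerary", action = "Index" });
    app.MapControllerRoute("itinerary-results", "itinerary/results",            new { controller = "Itinerary", action = "Results" }); // this view is noindex

    app.Run();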

For additional fodder.

https://www.example.com/holidays

https://www.example.com/points-of-interest

https://www.example.com/photographers

https://www.example.com/photographers/shared-photos/

https://www.example.com/discounts

https://www.example.com/specials

The SEO fodder list goes on and on.

Wayne Smith
  • Thank you very much for the detailed answer. In my question, 61-205 is the from-station ID and to-station ID, not the train number. Also, I thought of adding the date part in the URL but setting the canonical URL to the non-date version; that way a user can share the URL with someone else while Google does not index the dated one. How about that? Also, I am not specifically telling Google to index anything: I am not putting station pairs in the sitemap, only individual stations. How about that? – Prageeth Liyanage Dec 29 '23 at 03:59
  • The page which is to be shared but not indexed would need a meta name="robots" content="noindex"; a canonical is not good enough. Using the robots meta tag will stop the duplicate problem, but the page will still be read by bots and use up the site's bot budget, leaving other pages that have an update unread ... To optimize site-wide for bots, use robots.txt to block the URLs of the pages Google should not spend time on. – Wayne Smith Dec 29 '23 at 06:58
  • What should I do so that only the pattern https://example.com/search/61-205/stationA-to-stationC gets indexed instead of the dated URL below (I can show today's schedule if the date part is not included in the URL), while the dated URL still works? The reason is that there are some popular routes that users search for on Google.

    https://example.com/search/61-205/2023-12-28/stationA-to-stationC. Or shall I disallow the whole search page?

    – Prageeth Liyanage Dec 29 '23 at 08:48
  • You can provide the content to Google, but don't create duplicate pages where the only thing different is the date. You can link to the different dates from today's-date pages, both for navigation and SEO, but if the only difference is the date on the page, it is a duplicate and needs noindex. Google will still see the link and the date on today's page. For a date like the New Year's schedule, maybe make a page that is different in terms of adding more content (big-letter holiday schedule, happy new year, etc.) so it is not a duplicate. – Wayne Smith Dec 30 '23 at 14:41
  • So you suggest totally disallowing the search page, right? So even though there are popular routes, they will not be indexed on Google. My idea is: if the user selects today's date, no date part is added to the URL; if the user selects another date, I add it (in case the user shares it) but add a canonical tag pointing to the non-date path. I am also adding a few popular routes to the sitemap, not all of them. But mainly, I am not disallowing the search page. Is my approach good or bad? – Prageeth Liyanage Dec 30 '23 at 15:56
  • If the approach adds duplicates to Google search, it is a technical problem, one of the problems fixed with technical SEO. The solution is to add a noindex meta tag to those pages (canonical tags may work but are often ignored, hence the need for noindex). Crawling those pages, if they exist in quantity, would also be a technical SEO problem (it uses up the bot's budget and other pages get ignored); the fix there is to use robots.txt to tell the robot not to spend its budget on those pages. – Wayne Smith Dec 31 '23 at 14:52
  • Thank you very much for this detailed answer, much appreciated. I accepted your answer. – Prageeth Liyanage Jan 01 '24 at 03:46