Does the number of columns been selected affect query speed?

Question

I know some sql but don't quite understand the things behind the scene. And this question came to me the other day while I am taking shower...

If we compare these two queries below:

a. select * from table_abc;

b. select id, name, timestamp from table_abc;

which query is (theoretically) faster? My guess is a because in query b it takes time to pick out the 3 columns I want. But that's counterintuitive at the same time because that means getting less information takes longer. All comments/answers are highly appreciated!

If you are interested in deeper understanding of what affects query performance in a [row-based relational] database engine, you can take a look at this classic paper. — mustaccio, Apr 21 '21 at 21:21

score 2 · Answer 1 · answered Apr 22 '21 at 00:35

With most modern relational database management systems, there are three important things to consider relating to your question:

The amount of data being returned to the client. As you've already started to hint at, more data being returned to the client consumer generally means the longer it'll take for the query to complete. (This is generally globally true regardless of where the data is coming from, relational database, non-relational database, etc.)
The amount of data needed to be loaded from the server. This is arguably a subset of #1, but besides longer times to return the data to the client, there's also an increased time in locating and serving the data. Most relational database management systems store data rows in an object called pages on the disk. In Microsoft SQL Server, for example, the default size of a data page is 8 KB. If the amount of data in a single row exceeds 8 KB, then that row will be stored across multiple pages. When executing a query with SELECT *, that will instruct the server to locate all the pages for the rows being returned which can be multiple pages per row and thus extra work besides additional data being loaded from the server which isn't necessary if you're only using a subset of columns from the dataset being queried. The disk is typically the slowest hardware component to a server, so minimizing the amount of data the server needs to locate and load off disk is substantially beneficial.
Indexes usually won't be able to be used as efficiently when executing a SELECT * query because it's not usual to have a covering index on all the fields of a table. That means instead of being able for the server to efficiently seek out the data on the most performant index for your data needs, it'll likely instead need to scan the entire clustered index which is generally a slower operation.

Long story short, it's bad practice and an anti-pattern to use SELECT * which will 99% of the time be slower than specifying only the columns you need in the SELECT list.

Does the number of columns been selected affect query speed?

1 Answers1