PostgreSQL: speed up GROUP BY over 30 million rows

Is there any way to increase the speed of a dynamic GROUP BY query? I have a table with 30 million rows.



    create table if not exists tb
    (
        id serial not null constraint tb_pkey primary key,
        week integer,
        month integer,
        year integer,
        starttime varchar(20),
        endtime varchar(20),
        brand smallint,
        category smallint,
        value real
    );


The query below takes 8.5 seconds.



    SELECT category FROM tb GROUP BY category;



Is there any way to increase the speed? I have tried it with and without an index.

postgresql performance rdbms database-performance

asked Nov 10 at 19:06 by Viswanath Lekshmanan


1 Answer

For that exact query, not really; answering it requires scanning every row, and there is no way around that.
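
You can sanity-check this with EXPLAIN; a minimal sketch against the table above (expect either a Seq Scan or a full Index Only Scan feeding the aggregate node, and either way all ~30 million entries are read):

    -- Inspect the plan PostgreSQL chooses for the grouping query.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT category FROM tb GROUP BY category;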



But if you just need the set of unique categories quickly, and you have an index on that column, you can use a variation of the WITH RECURSIVE example (a technique often called a loose index scan) shown in the edit to the question linked below; look toward the end of the question:

Counting distinct rows using recursive cte over non-distinct index

You'll need to change it to return the set of unique values instead of counting them, but it's a simple change:



    testdb=# create table tb(id bigserial, category smallint);
    CREATE TABLE
    testdb=# insert into tb(category) select 2 from generate_series(1, 10000);
    INSERT 0 10000
    testdb=# insert into tb(category) select 1 from generate_series(1, 10000);
    INSERT 0 10000
    testdb=# insert into tb(category) select 3 from generate_series(1, 10000);
    INSERT 0 10000
    testdb=# create index on tb(category);
    CREATE INDEX
    testdb=# WITH RECURSIVE cte AS
    (
        (SELECT category
         FROM tb
         WHERE category >= 0
         ORDER BY 1
         LIMIT 1)
        UNION ALL
        SELECT (SELECT category
                FROM tb
                WHERE category > c.category
                ORDER BY 1
                LIMIT 1)
        FROM cte c
        WHERE category IS NOT NULL
    )
    SELECT category
    FROM cte
    WHERE category IS NOT NULL;
     category
    ----------
            1
            2
            3
    (3 rows)


And here's the EXPLAIN ANALYZE:

                                                                QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------------------------
     CTE Scan on cte  (cost=40.66..42.68 rows=100 width=2) (actual time=0.057..0.127 rows=3 loops=1)
       Filter: (category IS NOT NULL)
       Rows Removed by Filter: 1
       CTE cte
         ->  Recursive Union  (cost=0.29..40.66 rows=101 width=2) (actual time=0.052..0.119 rows=4 loops=1)
               ->  Limit  (cost=0.29..0.33 rows=1 width=2) (actual time=0.051..0.051 rows=1 loops=1)
                     ->  Index Only Scan using tb_category_idx on tb tb_1  (cost=0.29..1363.29 rows=30000 width=2) (actual time=0.050..0.050 rows=1 loops=1)
                           Index Cond: (category >= 0)
                           Heap Fetches: 1
               ->  WorkTable Scan on cte c  (cost=0.00..3.83 rows=10 width=2) (actual time=0.015..0.015 rows=1 loops=4)
                     Filter: (category IS NOT NULL)
                     Rows Removed by Filter: 0
                     SubPlan 1
                       ->  Limit  (cost=0.29..0.36 rows=1 width=2) (actual time=0.016..0.016 rows=1 loops=3)
                             ->  Index Only Scan using tb_category_idx on tb  (cost=0.29..755.95 rows=10000 width=2) (actual time=0.015..0.015 rows=1 loops=3)
                                   Index Cond: (category > c.category)
                                   Heap Fetches: 2
     Planning time: 0.224 ms
     Execution time: 0.191 ms
    (19 rows)


The number of loops for the WorkTable Scan node equals the number of unique categories plus one (in the demo above, 3 categories gives loops=4), so this should stay very fast up to, say, hundreds of unique values.
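
If you're unsure how many distinct values the column actually holds, the planner's statistics give a cheap estimate first (a sketch; n_distinct is maintained by ANALYZE and is an estimate, not an exact count):

    -- Estimated number of distinct categories, per planner statistics.
    -- A positive value is an absolute estimate; a negative value is a
    -- fraction of the row count. Requires the table to have been ANALYZEd.
    SELECT n_distinct
    FROM pg_stats
    WHERE tablename = 'tb' AND attname = 'category';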



Another route is to maintain a separate table that stores only the unique values of tb.category, with application logic inserting into it whenever that column is inserted or updated. This can also be done database-side with triggers; that approach is also discussed in the answers to the linked question. A sketch of the trigger variant follows.
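
A minimal sketch of that trigger variant; the names tb_categories and tb_record_category are hypothetical, and ON CONFLICT requires PostgreSQL 9.5+:

    -- Hypothetical lookup table holding every category value seen so far.
    CREATE TABLE IF NOT EXISTS tb_categories (
        category smallint PRIMARY KEY
    );

    -- Trigger function: record the category of each inserted/updated row,
    -- silently skipping values that are already present (or NULL).
    CREATE OR REPLACE FUNCTION tb_record_category() RETURNS trigger AS $$
    BEGIN
        IF NEW.category IS NOT NULL THEN
            INSERT INTO tb_categories (category)
            VALUES (NEW.category)
            ON CONFLICT (category) DO NOTHING;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER tb_categories_sync
        BEFORE INSERT OR UPDATE OF category ON tb
        FOR EACH ROW EXECUTE PROCEDURE tb_record_category();

Reading the distinct set then becomes a trivially fast "SELECT category FROM tb_categories". The trade-offs: a small overhead on every write to tb, and the lookup table does not shrink when the last row of a category is deleted.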






answered Nov 10 at 21:04 by AdamKG, edited Nov 10 at 21:13