What is difference between $group and $project in mongodb?

I have read the docs and still not quite following it. According to it, it returns me specific documents according to my own specifications inside a collection. For grouping, it pretty much says the same thing: "Groups documents by some specified expression and outputs to the next stage a document for each distinct grouping"

So, what does this following code is actually doing? It seems redundant to me.

BillingCycle.aggregate([{
    $project: {credit: {$sum: "$credits.value"}, debt: {$sum: "debts.value"}}

}, {
    $group: {
        _id: null,
        credit: {$sum: "$credit"}, debt: {$sum: "debt"}
    }
}, {
    $project: {_id: 0, credit: 1, debt: 1   }
}]});

asked Nov 3, 2018 at 19:55

What is difference between $group and $project in mongodb?

"Groups documents by some specified expression and outputs to the next stage a document for each distinct grouping"

The purpose of $group is not only to push some fields to next stage but to gather some element on the basis of input criteria passed in the _id attribute.

On the other, hand $project function will exclude/include some field(or custom field) to next stage. As per document you can see the definition "Passes along the documents with the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields."

There is one case if we suppress the _id from $group then it will calculate accumulated values for all the input documents as a whole. Which seems to act like $project.

For the query on $project stage is redundant

BillingCycle.aggregate([ {
    $group: {
        _id: null,
        credit: {$sum: "$credit.value"}, debt: {$sum: "debt.value"}
    }
}, {
    $project: {_id: 0, credit: 1, debt: 1   }
}]});

answered Nov 4, 2018 at 4:46

What is difference between $group and $project in mongodb?

Get MongoDB - The Complete Developer's Guide now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.

Start your free trial

Docs HomeMongoDB Manual

$group

The $group stage separates documents into groups according to a "group key". The output is one document for each unique group key.

A group key is often a field, or group of fields. The group key can also be the result of an expression. Use the _id field in the $group pipeline stage to set the group key. See below for usage examples.

In the $group stage output, the _id field is set to the group key for that document.

The output documents can also contain additional fields that are set using accumulator expressions.

Note

$group does not order its output documents.

The $group stage has the following prototype form:

{
$group:
{
_id: <expression>, // Group key
<field1>: { <accumulator1> : <expression1> },
...
}
}

Field

Description

_id

Required. The _id expression specifies the group key. If you specify an _id value of null, or any other constant value, the $group stage returns a single document that aggregates values across all of the input documents. See the Group by Null example.

field

Optional. Computed using the accumulator operators.

The _id and the accumulator operators can accept any valid expression. For more information on expressions, see Expressions.

The <accumulator> operator must be one of the following accumulator operators:

Changed in version 5.0.

Name

Description

$accumulator

Returns the result of a user-defined accumulator function.

$addToSet

Returns an array of unique expression values for each group. Order of the array elements is undefined.

Changed in version 5.0: Available in $setWindowFields stage.

$avg

Returns an average of numerical values. Ignores non-numeric values.

Changed in version 5.0: Available in $setWindowFields stage.

$bottom

Returns the bottom element within a group according to the specified sort order.

New in version 5.2.

Available in $group and $setWindowFields stages.

$bottomN

Returns an aggregation of the bottom n fields within a group, according to the specified sort order.

New in version 5.2.

Available in $group and $setWindowFields stages.

$count

Returns the number of documents in a group.

Distinct from the $count pipeline stage.

New in version 5.0: Available in $group and $setWindowFields stages.

$first

Returns a value from the first document for each group. Order is only defined if the documents are sorted.

Distinct from the $first array operator.

Changed in version 5.0: Available in $setWindowFields stage.

$firstN

Returns an aggregation of the first n elements within a group. Only meaningful when documents are in a defined order. Distinct from the $firstN array operator.

New in version 5.2: Available in $group, expression and $setWindowFields stages.

$last

Returns a value from the last document for each group. Order is only defined if the documents are sorted.

Distinct from the $last array operator.

Changed in version 5.0: Available in $setWindowFields stage.

$lastN

Returns an aggregation of the last n elements within a group. Only meaningful when documents are in a defined order. Distinct from the $lastN array operator.

New in version 5.2: Available in $group, expression and $setWindowFields stages.

$max

Returns the highest expression value for each group.

Changed in version 5.0: Available in $setWindowFields stage.

$maxN

Returns an aggregation of the n maximum valued elements in a group. Distinct from the $maxN array operator.

New in version 5.2.

Available in $group, $setWindowFields and as an expression.

$mergeObjects

Returns a document created by combining the input documents for each group.

$min

Returns the lowest expression value for each group.

Changed in version 5.0: Available in $setWindowFields stage.

$push

Returns an array of expression values for documents in each group.

Changed in version 5.0: Available in $setWindowFields stage.

$stdDevPop

Returns the population standard deviation of the input values.

Changed in version 5.0: Available in $setWindowFields stage.

$stdDevSamp

Returns the sample standard deviation of the input values.

Changed in version 5.0: Available in $setWindowFields stage.

$sum

Returns a sum of numerical values. Ignores non-numeric values.

Changed in version 5.0: Available in $setWindowFields stage.

$top

Returns the top element within a group according to the specified sort order.

New in version 5.2.

Available in $group and $setWindowFields stages.

$topN

Returns an aggregation of the top n fields within a group, according to the specified sort order.

New in version 5.2.

Available in $group and $setWindowFields stages.

The $group stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $group returns an error. To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.

Tip

See also:

This section describes optimizations to improve the performance of $group. There are optimizations that you can make manually and optimizations MongoDB makes internally.

If a pipeline sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field which matches the sort order. In some cases, the $group stage can use the index to quickly find the first document of each group.

Example

If a collection named foo contains an index { x: 1, y: 1 }, the following pipeline can use that index to find the first document of each group:

db.foo.aggregate([
{
$sort:{ x : 1, y : 1 }
},
{
$group: {
_id: { x : "$x" },
y: { $first : "$y" }
}
}
])

Starting in version 5.2, MongoDB uses the slot-based execution query engine to execute $group stages if either:

  • $group is the first stage in the pipeline.

  • All preceding stages in the pipeline can also be executed by the slot-based engine.

For more information, see $group Optimization.

In mongosh, create a sample collection named sales with the following documents:

db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

The following aggregation operation uses the $group stage to count the number of documents in the sales collection:

db.sales.aggregate( [
{
$group: {
_id: null,
count: { $count: { } }
}
}
] )

The operation returns the following result:

{ "_id" : null, "count" : 8 }

This aggregation operation is equivalent to the following SQL statement:

SELECT COUNT(*) AS count FROM sales

Tip

The following aggregation operation uses the $group stage to retrieve the distinct item values from the sales collection:

db.sales.aggregate( [ { $group : { _id : "$item" } } ] )

The operation returns the following result:

{ "_id" : "abc" }
{ "_id" : "jkl" }
{ "_id" : "def" }
{ "_id" : "xyz" }

The following aggregation operation groups documents by the item field, calculating the total sale amount per item and returning only the items with total sale amount greater than or equal to 100:

db.sales.aggregate(
[
// First Stage
{
$group :
{
_id : "$item",
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }
}
},
// Second Stage
{
$match: { "totalSaleAmount": { $gte: 100 } }
}
]
)

First Stage:The $group stage groups the documents by item to retrieve the distinct item values. This stage returns the totalSaleAmount for each item.Second Stage:The $match stage filters the resulting documents to only return items with a totalSaleAmount greater than or equal to 100.

The operation returns the following result:

{ "_id" : "abc", "totalSaleAmount" : NumberDecimal("170") }
{ "_id" : "xyz", "totalSaleAmount" : NumberDecimal("150") }
{ "_id" : "def", "totalSaleAmount" : NumberDecimal("112.5") }

This aggregation operation is equivalent to the following SQL statement:

SELECT item,
Sum(( price * quantity )) AS totalSaleAmount
FROM sales
GROUP BY item
HAVING totalSaleAmount >= 100

Tip

See also:

In mongosh, create a sample collection named sales with the following documents:

db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

The following pipeline calculates the total sales amount, average sales quantity, and sale count for each day in the year 2014:

db.sales.aggregate([
// First Stage
{
$match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
},
// Second Stage
{
$group : {
_id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
averageQuantity: { $avg: "$quantity" },
count: { $sum: 1 }
}
},
// Third Stage
{
$sort : { totalSaleAmount: -1 }
}
])

First Stage:The $match stage filters the documents to only pass documents from the year 2014 to the next stage.Second Stage:The $group stage groups the documents by date and calculates the total sale amount, average quantity, and total count of the documents in each group.Third Stage:The $sort stage sorts the results by the total sale amount for each group in descending order.

The operation returns the following results:

{ "_id" : "2014-04-04", "totalSaleAmount" : NumberDecimal("200"), "averageQuantity" : 15, "count" : 2 }
{ "_id" : "2014-03-15", "totalSaleAmount" : NumberDecimal("50"), "averageQuantity" : 10, "count" : 1 }
{ "_id" : "2014-03-01", "totalSaleAmount" : NumberDecimal("40"), "averageQuantity" : 1.5, "count" : 2 }

This aggregation operation is equivalent to the following SQL statement:

SELECT date,
Sum(( price * quantity )) AS totalSaleAmount,
Avg(quantity) AS averageQuantity,
Count(*) AS Count
FROM sales
GROUP BY Date(date)
ORDER BY totalSaleAmount DESC

The following aggregation operation specifies a group _id of null, calculating the total sale amount, average quantity, and count of all documents in the collection.

db.sales.aggregate([
{
$group : {
_id : null,
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
averageQuantity: { $avg: "$quantity" },
count: { $sum: 1 }
}
}
])

The operation returns the following result:

{
"_id" : null,
"totalSaleAmount" : NumberDecimal("452.5"),
"averageQuantity" : 7.875,
"count" : 8
}

This aggregation operation is equivalent to the following SQL statement:

SELECT Sum(price * quantity) AS totalSaleAmount,
Avg(quantity) AS averageQuantity,
Count(*) AS Count
FROM sales

In mongosh, create a sample collection named books with the following documents:

db.books.insertMany([
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])

The following aggregation operation pivots the data in the books collection to have titles grouped by authors.

db.books.aggregate([
{ $group : { _id : "$author", books: { $push: "$title" } } }
])

The operation returns the following documents:

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

The following aggregation operation groups documents by author:

db.books.aggregate([
// First Stage
{
$group : { _id : "$author", books: { $push: "$$ROOT" } }
},
// Second Stage
{
$addFields:
{
totalCopies : { $sum: "$books.copies" }
}
}
])

First Stage:

$group uses the $$ROOT system variable to group the entire documents by authors. This stage passes the following documents to the next stage:

{ "_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
]
},
{ "_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
]
}

Second Stage:

$addFields adds a field to the output containing the total copies of books for each author.

Note

The operation returns the following documents:

{
"_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
],
"totalCopies" : 20
}
{
"_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
],
"totalCopies" : 5
}

Tip

See also:

The Aggregation with the Zip Code Data Set tutorial provides an extensive example of the $group operator in a common use case.

What does $group do in MongoDB?

The $group stage separates documents into groups according to a "group key". The output is one document for each unique group key. A group key is often a field, or group of fields. The group key can also be the result of an expression.

What is $project in MongoDB?

Definition. $project. Passes along the documents with the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.

Why We Use unwind in MongoDB?

MongoDB $unwind transforms complex documents into simpler documents, which increase readability and understanding. This also allows us to perform additional operations, like grouping and sorting on the resulting output.

What is $$ root in MongoDB?

The $$ROOT variable contains the source documents for the group. If you'd like to just pass them through unmodified, you can do this by $pushing $$ROOT into the output from the group.