Calculate Percentile in MongoDB using aggregation

Calculate Percentile in MongoDB:

MongoDB versions before 7.0 do not provide a built-in operator for calculating percentiles (version 7.0 added $percentile). In this post, you will find a way to calculate a percentile in MongoDB using the aggregation framework, which also works on older versions.

Let’s use the sample document below to compute the percentile.

{
"_id" : ObjectId("5dc06d2d5e1ffa18e2728b56"),
"company" : "apple",
"brand" : "iphone",
"cin" : "asxrr201022j3jd",
"unit" : "USD",
"region" : "USA",
"prices" : [ 
    {"retail" : "id_1","price" : 649}, 
    {"retail" : "id_2","price" : 755}, 
    {"retail" : "id_3","price" : 650}, 
    {"retail" : "id_4","price" : 700}, 
    {"retail" : "id_5","price" : 710}
 ]
}

The query below uses the aggregation framework to calculate the 60th percentile of the prices for each company.

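db.getCollection("random").aggregate([
    // Stage 1: emit one document per element of the "prices" array
    { "$unwind": "$prices" },
    // Stage 2: sort the unwound documents by price, ascending
    { "$sort": { "prices.price": 1 } },
    // Stage 3: collect the sorted prices per company
    {
        "$group": {
            "_id": "$company",
            "doc": { "$first": "$$ROOT" },
            "listvalues": { "$push": "$prices.price" }
        }
    },
    // Stage 4: pick the element at zero-based index floor(0.6 * size)
    {
        "$project": {
            "percentile": {
                "$arrayElemAt": [
                    "$listvalues",
                    { "$floor": { "$multiply": [0.6, { "$size": "$listvalues" }] } }
                ]
            }
        }
    }
])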
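
Output (for a collection that holds only the sample document above):

{
"_id" : "apple",
"percentile" : 710
}

The constant 0.6 selects the 60th percentile. The five sorted prices are [649, 650, 700, 710, 755], so floor(0.6 × 5) = 3, and the element at zero-based index 3 is 710.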
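
Because the percentile is only a constant inside the $project stage, the same pipeline can be generated for any fraction. The sketch below is one possible way to do that in plain JavaScript. The helper name buildPercentilePipeline and its groupField parameter are made up for illustration and are not part of MongoDB, and the unused "doc" field is left out for brevity.

```javascript
// Hypothetical helper (not a MongoDB API): builds the four-stage pipeline
// for a percentile fraction p, grouping by the given field.
function buildPercentilePipeline(p, groupField = "company") {
  if (typeof p !== "number" || p < 0 || p >= 1) {
    // p = 1 would produce an index equal to the array size,
    // which is one past the last element.
    throw new RangeError("p must satisfy 0 <= p < 1");
  }
  return [
    { $unwind: "$prices" },
    { $sort: { "prices.price": 1 } },
    {
      $group: {
        _id: "$" + groupField,
        listvalues: { $push: "$prices.price" },
      },
    },
    {
      $project: {
        percentile: {
          $arrayElemAt: [
            "$listvalues",
            { $floor: { $multiply: [p, { $size: "$listvalues" }] } },
          ],
        },
      },
    },
  ];
}

// Build the pipeline for the 90th percentile.
const pipeline = buildPercentilePipeline(0.9);
console.log(JSON.stringify(pipeline[3], null, 2));
```

In the mongo shell, the returned array could then be passed to db.getCollection("random").aggregate(pipeline).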

Stage1: As we are trying to find the percentile of the price data, the first step is to unwind the ‘prices’ array. This flattens the array so that each price ends up in its own document.

Stage2: In the next aggregation stage, sort the unwound documents by price in ascending order.

Stage3: In this stage, group the data by ‘company’ and push all the sorted prices to a list. This step is memory-intensive, because every price for a company is accumulated into a single array.
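
To make the first three stages concrete, here is a rough plain-JavaScript imitation of what they do to the sample document. It runs without a database and only mimics the behaviour of $unwind, $sort and $group with $push; the variable names (docs, unwound, grouped) are chosen for this illustration.

```javascript
// Sample document from above, reduced to the fields the pipeline touches.
const docs = [
  {
    company: "apple",
    prices: [
      { retail: "id_1", price: 649 },
      { retail: "id_2", price: 755 },
      { retail: "id_3", price: 650 },
      { retail: "id_4", price: 700 },
      { retail: "id_5", price: 710 },
    ],
  },
];

// Stage 1 ($unwind): one record per element of the prices array.
const unwound = docs.flatMap((d) =>
  d.prices.map((p) => ({ company: d.company, prices: p }))
);

// Stage 2 ($sort): ascending by price.
unwound.sort((a, b) => a.prices.price - b.prices.price);

// Stage 3 ($group with $push): collect prices per company in sorted order.
const grouped = new Map();
for (const rec of unwound) {
  if (!grouped.has(rec.company)) grouped.set(rec.company, []);
  grouped.get(rec.company).push(rec.prices.price);
}

console.log(grouped.get("apple")); // [ 649, 650, 700, 710, 755 ]
```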

Stage4: This is the stage that actually picks the percentile value, and it involves several steps.

  1. Get the size of the sorted array.
  2. Multiply the size by the percentile fraction (0.6 here) to get the relative position.
  3. The position may not be a whole number, so round it to one. I have used the nearest lower number ($floor).
  4. Use that position as a zero-based index to get the element from the sorted list. This is the value you need.
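
The four steps above can be sketched in plain JavaScript as follows. The function names are invented for this illustration. The second function shows the common nearest-rank convention purely for comparison; it is an alternative, not what the query in this post does.

```javascript
// Sorted prices from the sample document.
const sorted = [649, 650, 700, 710, 755];

// Same arithmetic as the $project stage: zero-based index floor(p * n).
function percentileFloorIndex(values, p) {
  const idx = Math.floor(p * values.length);
  return values[idx]; // undefined when idx === values.length (p = 1)
}

// Nearest-rank variant (alternative convention, shown for comparison):
// one-based rank ceil(p * n), converted to a zero-based index.
function percentileNearestRank(values, p) {
  const rank = Math.max(1, Math.ceil(p * values.length));
  return values[rank - 1];
}

console.log(percentileFloorIndex(sorted, 0.6));  // 710 (index 3)
console.log(percentileNearestRank(sorted, 0.6)); // 700 (rank 3, index 2)
```

The two conventions differ by one position whenever p × n is a whole number, so it is worth deciding which one you want before reusing the pipeline.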

Hope this brings a smile….

We have used multiple aggregation stages to calculate a percentile in MongoDB.
