Python with MongoDB Tutorial

PyMongo, the official MongoDB driver for Python, provides a set of packages for working with MongoDB from Python. For the following tutorial, start by creating a virtual environment and activating it.

python -m venv env
source env/bin/activate

Now that you are in your virtual environment, you can install PyMongo. In your terminal, type:

python -m pip install "pymongo[srv]"
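To double-check that the virtual environment is active and that PyMongo was installed into it, a small standard-library check like the one below can help. It is a sketch: the helper names are hypothetical, and it relies only on importlib.metadata (Python 3.8+) and on the fact that sys.prefix differs from sys.base_prefix inside a venv.

```python
import sys
from importlib import metadata
from typing import Optional


def in_virtualenv() -> bool:
    """True when running inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("pymongo version:", installed_version("pymongo") or "not installed")
```

If the second line prints "not installed", the pip command most likely ran outside the activated environment.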

We can now use PyMongo in our code with an import statement.


Creating a MongoDB database in Python

The first step to connect Python to Atlas is to create a cluster. You can follow the instructions from the documentation to learn how to create and set up your cluster.

Next, create a file named pymongo_get_database.py in any folder to write PyMongo code. You can use any simple text editor, such as Visual Studio Code.

Create the MongoDB client by adding the following:

from pymongo import MongoClient

def get_database():

    # Provide the MongoDB Atlas URL to connect Python to MongoDB using PyMongo
    CONNECTION_STRING = "mongodb+srv://user:pass@cluster.mongodb.net/myFirstDatabase"

    # Create a connection using MongoClient. You can import MongoClient or use pymongo.MongoClient
    client = MongoClient(CONNECTION_STRING)

    # Create the database for our example (we will use the same database throughout the tutorial)
    return client['user_shopping_list']

# This is added so that many files can reuse the function get_database()
if __name__ == "__main__":

    # Get the database
    dbname = get_database()

To create a MongoClient, you need a connection string to your database. If you are using Atlas, you can follow the steps from the documentation to get that connection string. Use CONNECTION_STRING to create the MongoClient and get the MongoDB database connection. Replace the username, password, and cluster name with your own.
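Hard-coding credentials is fine for a demo, but PyMongo's documentation notes that a username or password containing special characters such as @, : or / must be percent-escaped, for example with urllib.parse.quote_plus. The sketch below builds the connection string that way; the environment variable names MONGODB_USER, MONGODB_PASSWORD, and MONGODB_HOST are hypothetical placeholders, not anything Atlas or PyMongo defines.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(user: str, password: str, host: str,
                            db: str = "myFirstDatabase") -> str:
    """Build an Atlas-style mongodb+srv URI, percent-escaping the credentials."""
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/{db}"


def connection_string_from_env() -> str:
    # Hypothetical variable names; adapt them to your own configuration.
    return build_connection_string(
        os.environ["MONGODB_USER"],
        os.environ["MONGODB_PASSWORD"],
        os.environ["MONGODB_HOST"],
    )


if __name__ == "__main__":
    # A password with '@' and ':' would otherwise break the URI.
    print(build_connection_string("user", "p@ss:word", "cluster0.example.mongodb.net"))
```

With this in place, get_database() could pass connection_string_from_env() to MongoClient instead of a literal string, which also keeps the password out of source control.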

In this tutorial, we will create a shopping list and add a few items. For this, we created a database named user_shopping_list.

MongoDB doesn’t create a database until you have collections and documents in it. So, let’s create a collection next.


Creating a collection in Python

To create a collection, pass the collection name to the database. In a new file called pymongo_test_insert.py, add the following code.

# Get the database using the method we defined in the pymongo_get_database file
from pymongo_get_database import get_database
dbname = get_database()
collection_name = dbname["user_1_items"]

This gives us a collection named user_1_items in the user_shopping_list database. As noted above, MongoDB actually creates it when the first document is inserted.


Inserting documents in Python

To insert many documents at once, use the PyMongo insert_many() method.

item_1 = {
  "_id" : "U1IT00001",
  "item_name" : "Blender",
  "max_discount" : "10%",
  "batch_number" : "RR450020FRG",
  "price" : 340,
  "category" : "kitchen appliance"
}

item_2 = {
  "_id" : "U1IT00002",
  "item_name" : "Egg",
  "category" : "food",
  "quantity" : 12,
  "price" : 36,
  "item_description" : "brown country eggs"
}
collection_name.insert_many([item_1,item_2])

Let’s insert a third document without specifying the _id field. This time, we add a field of data type ‘date’. PyMongo stores Python datetime objects as BSON dates; to turn a date string into a datetime, use the Python dateutil package.

Start by installing the package using the following command:

python -m pip install python-dateutil

Add the following to pymongo_test_insert.py:

from dateutil import parser
expiry_date = '2021-07-13T00:00:00.000Z'
expiry = parser.parse(expiry_date)
item_3 = {
  "item_name" : "Bread",
  "quantity" : 2,
  "ingredients" : "all-purpose flour",
  "expiry_date" : expiry
}
collection_name.insert_one(item_3)

We use the insert_one() method to insert a single document.
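If you would rather avoid the extra dependency, the standard library can parse this particular timestamp format as well: since Python 3.7, the %z directive of strptime accepts a trailing Z as UTC. This is a sketch for this one fixed format only (parse_expiry is a hypothetical helper name); dateutil's parser remains the more flexible choice for arbitrary date strings.

```python
from datetime import datetime


def parse_expiry(value: str) -> datetime:
    """Parse a timestamp like '2021-07-13T00:00:00.000Z' into a timezone-aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


expiry = parse_expiry("2021-07-13T00:00:00.000Z")
print(expiry)  # 2021-07-13 00:00:00+00:00
```

The resulting datetime can go into item_3 exactly like the dateutil result. Keep in mind that BSON dates are stored in UTC with millisecond precision, and by default PyMongo returns them as naive UTC datetimes when you read them back.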

Open the command line and navigate to the folder where you saved pymongo_test_insert.py. Execute the file with the following command:

python pymongo_test_insert.py

Let’s open the MongoDB Atlas UI and check what we have so far.

Log in to your Atlas cluster and click the Collections button.

On the left side, you can see the database and collection that we created. Click the collection name to view the data.

By default, the _id field is of type ObjectId: if we don’t specify an _id, as with item_3, one is generated automatically, while item_1 and item_2 keep the string values we gave them. Not every document has the same fields, but MongoDB doesn’t stop you from inserting them; this is the essence of a schemaless database.
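An ObjectId is 12 bytes long, and its first 4 bytes are a big-endian count of seconds since the Unix epoch, so a generated _id also records roughly when it was created. PyMongo’s bson.ObjectId exposes this as generation_time; the standard-library sketch below decodes it from the 24-character hex form instead. The helper name and the sample id are arbitrary illustrations.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time stored in the first 4 bytes of an ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int.from_bytes(bytes.fromhex(oid_hex[:8]), "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_timestamp("507f1f77bcf86cd799439011"))  # 2012-10-17 21:13:27+00:00
```

Only whole seconds are encoded, so the timestamp is useful for rough ordering, not for precise event times.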

If we insert a fresh item_3 (a document without an _id) again, MongoDB stores it as a new document with a new _id value. Inserting item_1 and item_2 again, however, throws a duplicate key error, because _id is the unique identifier and those values already exist. This also means that running pymongo_test_insert.py a second time stops with an error at insert_many(). Note that insert_one() adds the generated _id to the item_3 dictionary itself, so re-inserting that same dictionary object would fail as well.


Querying in Python

Let’s view all the documents together using find(). For that, we will create a separate file (we’ll call it pymongo_test_query.py):

# Get the database using the method we defined in the pymongo_get_database file
from pymongo_get_database import get_database
dbname = get_database()

# Get the collection we created earlier
collection_name = dbname["user_1_items"]

item_details = collection_name.find()
for item in item_details:
    # This does not give a very readable output
    print(item)

Open the command line and navigate to the folder where you saved pymongo_test_query.py. Execute the file with the following command:

python pymongo_test_query.py

Each document is printed as a Python dictionary.

We can view the data, but the format is not very readable. So, let’s print the item names and their categories by replacing the print(item) line with the following:

print(item['item_name'], item['category'])

Although MongoDB returns all the documents, we get a Python KeyError on the third document, because item_3 has no category field.

One way to handle missing fields in Python is to use a pandas DataFrame. DataFrames are 2D data structures used for data processing tasks. PyMongo’s find() method returns dictionary objects, which can be converted into a DataFrame in a single line of code.

Install the pandas library:

python -m pip install pandas

Now import the pandas DataFrame class by adding the following line at the top of the file:

from pandas import DataFrame

Then replace the for loop with the following, which handles the KeyError in one step:

# Convert the dictionary objects returned by find() into a DataFrame
items_df = DataFrame(item_details)
print(items_df)

Instead of raising errors, the missing values show up as NaN (or NaT for missing dates).
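To see what the DataFrame conversion does with documents that have different fields, here is a rough standard-library analogue (not how pandas is implemented): collect the union of keys in first-seen order and fill the gaps with None, where pandas uses NaN or NaT. The helper name to_table and the trimmed sample documents are illustrative.

```python
from typing import Any, Iterable, List, Tuple


def to_table(documents: Iterable[dict]) -> Tuple[List[str], List[List[Any]]]:
    """Return (columns, rows): the union of keys in first-seen order, None for missing fields."""
    docs = list(documents)
    columns: List[str] = []
    for doc in docs:
        for key in doc:
            if key not in columns:
                columns.append(key)
    rows = [[doc.get(column) for column in columns] for doc in docs]
    return columns, rows


# Trimmed-down versions of the shopping-list items
items = [
    {"_id": "U1IT00001", "item_name": "Blender", "category": "kitchen appliance"},
    {"_id": "U1IT00002", "item_name": "Egg", "category": "food", "quantity": 12},
    {"item_name": "Bread", "quantity": 2},
]
columns, rows = to_table(items)
print(columns)  # ['_id', 'item_name', 'category', 'quantity']
for row in rows:
    print(row)
```

The same dict.get() idea also fixes the original loop without pandas, for example print(item.get('item_name'), item.get('category', 'n/a')).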

Indexing in Python MongoDB

The number of documents and collections in a real-world database typically keeps growing. Searching a very large collection for specific documents (for example, documents that have “all-purpose flour” among their ingredients) can take a long time. Indexes make database search faster and more efficient, and reduce the cost of query operations such as sort, count, and match.

In MongoDB, indexes are defined at the collection level.

For the index to make more sense, add more documents to our collection. Insert many documents at once using the insert_many() method. For sample documents, copy the code from GitHub and run it with python in your terminal.

Let’s say we want the items that belong to the category ‘food’:

item_details = collection_name.find({"category" : "food"})

Without an index, MongoDB has to scan every document in the collection to execute this query. To verify this, download MongoDB Compass, connect to your cluster using the connection string, open the collection, and go to the Explain Plan tab. In the filter field, enter the criteria above and view the results.

Note that the query scans 14 documents to get five results.
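The difference between examining every document and examining only the matching ones can be reproduced with a toy model. This is only an analogy: MongoDB indexes are B-tree structures, whereas the sketch below uses a plain dict that maps each category to the positions of its documents, and its 14 documents are generated, not the tutorial’s sample data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def full_scan(docs: List[dict], category: str) -> Tuple[List[dict], int]:
    """Collection scan: examine every document; return (matches, documents examined)."""
    matches = [doc for doc in docs if doc.get("category") == category]
    return matches, len(docs)


def build_category_index(docs: List[dict]) -> Dict[str, List[int]]:
    """Map each category value to the positions of the documents that have it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, doc in enumerate(docs):
        if "category" in doc:
            index[doc["category"]].append(position)
    return index


def indexed_lookup(docs: List[dict], index: Dict[str, List[int]],
                   category: str) -> Tuple[List[dict], int]:
    """Index lookup: examine only the documents the index points to."""
    positions = index.get(category, [])
    return [docs[p] for p in positions], len(positions)


# 14 generated documents, 5 of them in the "food" category
docs = [{"item_name": f"food item {i}", "category": "food"} for i in range(5)]
docs += [{"item_name": f"other item {i}", "category": "kitchen appliance"} for i in range(9)]

scan_matches, scanned = full_scan(docs, "food")
index = build_category_index(docs)
index_matches, examined = indexed_lookup(docs, index, "food")
print(scanned, examined)  # 14 5
```

The price of the faster lookup is that the index has to be updated on every insert, which is why indexes are usually created only on fields that queries actually filter or sort on.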

Let's create a single-field index on the ‘category’ field. In a new file (we’ll call it pymongo_index.py), add the following code:

from pymongo_get_database import get_database
collection_name = get_database()["user_1_items"]
# Create an ascending single-field index on "category"
category_index = collection_name.create_index("category")

Run Explain on the same filter again in Compass.

This time, only five documents are scanned, thanks to the category index. Because the collection is small, the execution time barely changes, but the number of documents scanned for the query drops sharply. Indexes also help optimize aggregations, which are out of scope for this tutorial.