MongoDB+Python = PyMongo NoSQL Data-Analysis! {Part-2}

Saikiran Dasari
10 min read · Jan 4, 2023


Installation of pymongo and Doing Analysis | Mongo ATLAS Cloud Cluster Creation and Integrating it with MongoDB Compass | Importing CSV using MongoDB compass and doing pymongo data analysis |

In the previous blog, I showed an overview of MongoDB and the Installation of all requirements to work on MongoDB.

Here, we will walk through all the necessary steps: setting up the connection, doing data analysis with pymongo, connecting to a free MongoDB Atlas cloud cluster, and importing a local CSV file so that you can read it with pymongo and do your regular data analysis. I have added screenshots that are easy to follow.

Photo by Desirae Hayes-Vitor on Unsplash

MongoDB is a NoSQL database that stores data as documents in BSON (Binary JSON) format.

Overview of BSON format

There are different types of NoSQL databases; MongoDB falls under Document-based databases.

It’s called NoSQL because its properties differ from those of SQL databases. The detailed differences and other salient features of MongoDB are covered in my previous blog, which I highly recommend reading first. The storage structure of MongoDB is hierarchical: a database holds collections, and each collection holds documents.

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python.

Installing with conda: conda install -c conda-forge pymongo

Installing with pip: python -m pip install pymongo

1. Import the pymongo library now!

2. Create a Connection between the MongoDB server and Pymongo

3. Create a database called ‘Employee’

4. Create a collection of ‘employee information’ inside the ‘Employee’ db

A new database does not show up on the client side (i.e. in the MongoDB Compass interface) until it holds at least one collection with data.
So start by creating one collection and accessing it from the database.

5. Insert Sample Documents into the Collection

5.1 Adding one Document to the collection

Using “insert_one( )” method:

This method inserts a single document into a collection in MongoDB. If the collection does not exist, this method creates a new collection and inserts the data into it. It takes a dictionary as a parameter containing the name and value of each field in the document you want to insert in the collection.
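A small sketch of insert_one( ) in use. The helper name add_employee and every field in the sample document are invented for illustration. The collection argument is assumed to be a pymongo collection such as the one created in step 4.

```python
def add_employee(collection, document):
    """Insert one dict as a document and return the _id MongoDB assigned."""
    result = collection.insert_one(document)  # pymongo returns an InsertOneResult
    return result.inserted_id


# Illustrative document: every field name and value here is invented.
employee = {
    "firstname": "Asha",
    "lastname": "Rao",
    "department": "Analytics",
    "age": 29,
}

# With a live collection: new_id = add_employee(collection, employee)
```

If you do not supply an _id field yourself, MongoDB generates one, and that is the value returned in inserted_id.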

5.2 Inserting Multiple Records now!!

Using “insert_many( )” method:

This method inserts multiple documents into a collection in MongoDB. Its parameter is a list of dictionaries holding the data that we want to insert in the collection.

Make sure all the documents are placed in a list (a list of JSON-like documents, separated by commas).

Here, 4 documents are stored at once in the ‘record’ object.

Now pass the record object to the ‘insert_many( )’ method to push all 4 documents at once.
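The two steps above (build a list, then pass it to insert_many( )) might look like this. The four documents are invented sample data, and add_employees is a hypothetical helper name. Only the ‘record’ name comes from the text.

```python
# Four illustrative documents (invented values) gathered in one list.
record = [
    {"firstname": "Asha", "department": "Analytics", "age": 29},
    {"firstname": "Ben", "department": "Finance", "age": 41},
    {"firstname": "Chen", "department": "Analytics", "age": 35},
    {"firstname": "Dina", "department": "Testing", "age": 23},
]


def add_employees(collection, documents):
    """Insert a list of dicts in one call and return the generated _ids."""
    result = collection.insert_many(documents)  # returns an InsertManyResult
    return result.inserted_ids

# With a live collection: ids = add_employees(collection, record)
```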

6. Query Things as a Data Scientist (Simple Way of Querying)

A. find_one() function

This function returns only one document if matching data is found in the collection; otherwise it returns None. It is ideal for situations where we need to search for only one document.

It’s like retrieving the very first document from the collections
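A hedged sketch of both ways to call find_one( ): with no argument, and with a filter. The helper names are hypothetical, and collection is assumed to be a pymongo collection.

```python
def first_document(collection):
    """Return the first document in the collection, or None if it is empty."""
    return collection.find_one()


def first_match(collection, query):
    """Return the first document matching `query`, or None if nothing matches."""
    return collection.find_one(query)

# With a live collection:
#   first_document(collection)
#   first_match(collection, {"department": "Analytics"})  # assumed field name
```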

B. find All the Records:

I’m using a for loop to retrieve all the records from the collection.

C. Query the JSON documents based on Equality conditions

Note: Since I have inserted more records, my output will be slightly larger than yours!!
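Parts B and C can be sketched together. find( ) returns a cursor, so we loop over it (or wrap it in list( )) to get the documents. The helper names and the department field in the usage comment are assumptions for illustration.

```python
def all_documents(collection):
    """find() with no filter returns a cursor over every document."""
    return [doc for doc in collection.find()]


def documents_where(collection, field, value):
    """Equality condition: keep the documents whose `field` equals `value`."""
    return list(collection.find({field: value}))

# With a live collection:
#   for doc in all_documents(collection):
#       print(doc)
#   documents_where(collection, "department", "Analytics")
```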

7. Query documents using “Query Operators”

Types of Query Operators:

A. $in (Matches any of the values in an array)

B. $lt (Less than)

C. $gt (Greater than)

D. AND Operator (Both the Conditions should be satisfied)

E. OR Operator (At least one of the conditions should be satisfied)

A. $in operator

Matches any of the values in an array.
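A minimal $in sketch. The department field and its values are assumptions, so swap in the fields of your own documents.

```python
# Match documents whose 'department' is any one of the listed values.
# The field name and the values are assumptions made for this sketch.
in_filter = {"department": {"$in": ["Analytics", "Finance"]}}


def find_matching(collection, query):
    """Run a filter and return the matching documents as a list."""
    return list(collection.find(query))

# With a live collection: find_matching(collection, in_filter)
```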

B. $lt Operator (less than)

Matches if values are less than the given value.

C. $gt Operator (greater than)

Matches if values are greater than the given value.
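Both comparison operators ($lt and $gt) follow the same pattern, so one sketch covers them. The age field is an assumption, and count_matching is a hypothetical helper built on pymongo's count_documents( ).

```python
# Comparison filters on an assumed numeric field called 'age'.
younger_than_30 = {"age": {"$lt": 30}}
older_than_30 = {"age": {"$gt": 30}}

# Both bounds can sit in one sub-document to express a range (25 < age < 40).
between_25_and_40 = {"age": {"$gt": 25, "$lt": 40}}


def count_matching(collection, query):
    """Count the documents that satisfy the filter."""
    return collection.count_documents(query)

# With a live collection: count_matching(collection, younger_than_30)
```

Note that a document with age exactly 30 matches neither of the first two filters, because both operators are strict.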

D. AND Operator (Both the Conditions should be satisfied)

$and performs a logical AND operation on an array of one or more expressions (<expression1>, <expression2>, and so on) and selects the documents that satisfy all the expressions.

E. OR Operator (At least one of the conditions should be satisfied)

The $or operator performs a logical OR operation on an array of one or more <expressions> and selects the documents that satisfy at least one of the <expressions>.
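The logical operators D and E can be sketched as plain filter dictionaries. The field names (department, age, firstname) are assumptions, and names_matching is a hypothetical helper.

```python
# $and: every expression in the list must hold.
and_filter = {"$and": [{"department": "Analytics"}, {"age": {"$gt": 30}}]}

# Listing conditions on different fields in one dict is an implicit AND
# and selects the same documents as and_filter.
implicit_and = {"department": "Analytics", "age": {"$gt": 30}}

# $or: at least one expression in the list must hold.
or_filter = {"$or": [{"department": "Analytics"}, {"age": {"$lt": 25}}]}


def names_matching(collection, query):
    """Return the 'firstname' of every document that satisfies the filter."""
    return [doc.get("firstname") for doc in collection.find(query)]

# With a live collection: names_matching(collection, or_filter)
```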

8. How to work with NESTED JSON documents?

Querying plays an important role in fetching a particular record from a MongoDB collection. Getting the right data quickly is necessary for making the right decision, and a ‘nested query’ helps when the value we need sits inside an embedded document.

#Let’s create a collection ‘inventory’ inside the ‘Employee’ database

# Now, Insert many JSON documents and Nested JSON documents inside the collection
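A hedged sketch of nested documents and how to query them. The sample values are illustrative stand-ins, not the exact documents from my notebook, and load_inventory is a hypothetical helper. The db argument is assumed to be the ‘Employee’ database object.

```python
# Each document embeds a nested 'size' document (illustrative values).
inventory_docs = [
    {"item": "journal", "qty": 25, "size": {"h": 14, "w": 21, "uom": "cm"}, "status": "A"},
    {"item": "notebook", "qty": 50, "size": {"h": 8.5, "w": 11, "uom": "in"}, "status": "A"},
    {"item": "postcard", "qty": 45, "size": {"h": 10, "w": 15.25, "uom": "cm"}, "status": "A"},
]

# Dot notation reaches into the embedded 'size' document.
by_unit = {"size.uom": "cm"}
short_items = {"size.h": {"$lt": 12}}

# Matching a whole embedded document is an exact match, field order included.
exact_size = {"size": {"h": 14, "w": 21, "uom": "cm"}}


def load_inventory(db):
    """Create the 'inventory' collection by inserting the sample documents."""
    inventory = db["inventory"]
    inventory.insert_many(inventory_docs)
    return inventory

# With a live database: list(load_inventory(db).find(by_unit))
```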

9. UPDATE JSON Documents

For update_one() and update_many(), the second argument is the update parameter: it contains the information to be updated in the documents that match the filter.

## Functions to Discuss:

A. pymongo.collection.Collection.update_one()

B. pymongo.collection.Collection.update_many()

C. pymongo.collection.Collection.replace_one()

# Let’s Start from the Beginning i.e. importing the library, making connections, etc.

Inserted 10 documents using insert_many( )

A. pymongo.collection.Collection.update_one()

Update one document in the inventory collection, selected with the filter item: ‘sketch pad’ (the filter says which item you want to update).

Use the ‘$set’ operator to update/add fields

Use the ‘$currentDate’ operator to add a datetime field
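Put together, the update_one( ) call might look like the sketch below. The new values for size.uom and status, and the field name lastModified, are assumptions chosen for illustration. update_sketch_pad is a hypothetical helper.

```python
def update_sketch_pad(collection):
    """Update the first document whose item is 'sketch pad'."""
    result = collection.update_one(
        {"item": "sketch pad"},  # filter: which document to update
        {
            "$set": {"size.uom": "cm", "status": "P"},  # assumed new values
            "$currentDate": {"lastModified": True},     # server-side timestamp
        },
    )
    # An UpdateResult reports how many documents matched and how many changed.
    return result.matched_count, result.modified_count

# With a live collection: matched, modified = update_sketch_pad(collection)
```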

B. pymongo.collection.Collection.update_many()

Wherever the quantity (qty) is less than 50, update the size to ‘in’ and the status to ‘P’, and add the current date as a datetime field.

Note: journal, mousepad and postcard have qty less than 50
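A sketch of that bulk update. It assumes the size unit lives in a nested field size.uom and that the timestamp field is called lastModified. Both names are assumptions, and mark_low_stock is a hypothetical helper.

```python
def mark_low_stock(collection, threshold=50):
    """Update every document whose qty is below the threshold."""
    result = collection.update_many(
        {"qty": {"$lt": threshold}},  # filter: all low-quantity items
        {
            "$set": {"size.uom": "in", "status": "P"},  # assumed nested field name
            "$currentDate": {"lastModified": True},     # assumed timestamp field
        },
    )
    return result.modified_count

# With a live collection: mark_low_stock(collection)
```

Unlike update_one( ), this touches every matching document, so it is worth running the same filter through find( ) first to see what will change.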


C. pymongo.collection.Collection.replace_one()
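replace_one( ) swaps the whole document (except its _id) for a new one, instead of changing individual fields. A minimal sketch, where the item ‘paper’ and the replacement fields are assumptions for illustration and replace_paper is a hypothetical helper:

```python
def replace_paper(collection):
    """Replace the first document whose item is 'paper' with a new document."""
    replacement = {
        "item": "paper",
        # Assumed new shape: stock per warehouse instead of qty/size/status.
        "instock": [
            {"warehouse": "A", "qty": 60},
            {"warehouse": "B", "qty": 40},
        ],
    }
    # The replacement is a plain document: no $set or other update operators.
    result = collection.replace_one({"item": "paper"}, replacement)
    return result.matched_count

# With a live collection: replace_paper(collection)
```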

MongoDB Atlas — CLOUD Cluster Operations:

MongoDB ATLAS for Production-based NoSQL Data Analysis Connection

Overview:

MongoDB ATLAS is a ‘Database As A Service’, aka DBaaS, which offers a flexible, scalable, and on-demand platform that eliminates the need to set up costly physical hardware, install software, or configure for performance.

MongoDB Atlas is the global Cloud DBaaS for modern apps.

We can deploy fully managed MongoDB across AWS, Azure, or GCP.

MongoDB Atlas also provides the dual benefit of flexibility and scalability.

Dynamic schemas let users change the structure of new documents without modifying the data already stored, providing flexibility.

Registration: You can Register and Choose AWS Free Cluster

I preferred signing up with my Google Account!

Choose the SHARED Cluster of AWS

Once you add the cluster, just click Connect. This is how the screen looks:

1st: Click on “ADD to IP address”, and then

2nd: Connect to the cluster using MongoDB Compass

Choose ‘I have MongoDB Compass’ and copy the connection string

Open your MongoDB Compass

Click ‘New Connection’ and paste the connection string into the URL box

Replace <password> in the string with your own password and click on CONNECT.

So, we have 1 Cluster and a Replica set of 3 Nodes i.e. inside a cluster there are 3 Nodes.

Next, we pass this ATLAS connection string to the MongoClient() of the pymongo library in Python, create a database and a collection, and finally store records inside the collection.
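A hedged sketch of the Atlas connection from Python. The username, password and cluster host in the usage comment are placeholders, not real values. build_atlas_uri and store_records are hypothetical helper names. Percent-escaping the credentials matters when the password contains characters such as @ or /.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username, password, host):
    """Build a mongodb+srv URI, percent-escaping the credentials."""
    return (
        f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )


def store_records(client, db_name, coll_name, records):
    """Reference a database and collection, then insert the records."""
    collection = client[db_name][coll_name]
    result = collection.insert_many(records)
    return len(result.inserted_ids)

# Usage sketch (needs network access and your own credentials):
#   from pymongo import MongoClient
#   uri = build_atlas_uri("<username>", "<password>", "<your-cluster-host>")
#   client = MongoClient(uri)
#   store_records(client, "Employee", "employee_information", [{"firstname": "Asha"}])
```

Depending on the PyMongo version, mongodb+srv URIs may also need the dnspython package installed.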

Here is the MongoDB ATLAS documentation.

Importing a CSV File into MongoDB Compass and Fetching the Data using ‘pymongo’ for Python Data Analysis on Jupyter Notebook

Open MongoDB Compass and Connect to any host

Here I chose the default ‘admin’ database. If you hover over it, there is a + icon; just click on it to create a collection.

Click on the Collection and Add Data -> Import File

Select your File Type: In this case I’m selecting a CSV file and Import the dataset from your location.

In this Case, I have imported “IRIS” dataset

Open Your Jupyter Notebook

Import pymongo, connect with MongoClient(), access the ‘admin’ database, and then access the collection you have created!

Getting ready for Data Science!!

We are at the final stage, which connects everything so far to the data science/analytics tasks further down the line.

We need to create a DataFrame using pandas for our MongoDB Collection.
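One way to build that DataFrame, as a minimal sketch. It assumes pandas is installed and that collection is the pymongo collection holding the imported CSV. The collection name ‘iris’ in the usage comment is an assumption, and both helper names are hypothetical.

```python
def fetch_rows(collection):
    """Return every document as a plain dict, without MongoDB's _id field."""
    # The projection {"_id": 0} asks the server to leave _id out of each result.
    return list(collection.find({}, {"_id": 0}))


def to_dataframe(rows):
    """Turn a list of dicts into a pandas DataFrame (one row per document)."""
    import pandas as pd  # third-party; imported lazily so fetch_rows works alone
    return pd.DataFrame(rows)

# Usage sketch with a live connection:
#   collection = client["admin"]["iris"]   # assumed collection name
#   df = to_dataframe(fetch_rows(collection))
#   df.head()
```

Dropping _id keeps the frame limited to the original CSV columns. Keep it if you need to write changes back to specific documents.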

Now, further down the line, you can write the same code as any other data science/analytics task.

From this point onwards, you can be as flexible as you want with your data science skills!

If you want to practice or refine all of the above, plus some additional content, you can access and download my free MongoDB material from my GitHub page. I have uploaded the notebook with the answers as a reference, along with Word documents. Make sure you download the whole zip file with the images.

What Have We Achieved?

A Complete Understanding of the Integration of MongoDB Software with Python and doing NoSQL Data Analysis

Part-1: MongoDB | Update your DS Resume with [NoSQL]|

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — -

I have tried to write a detailed article and I believe that I am successful in doing so. I’ll constantly add more content that will link to each other!!

Thanks a lot for your time, and I will get back with another interesting topic shortly! Till then, bye! — Saikiran

— — — — — — — — — — — — — — — — — — — — — — — — — — -


The media shown in this article is not owned by MEDIUM and is used at the Author’s discretion.
