Summary of MongoDB learning notes (including errors, problems, techniques)

Environment

OS: Ubuntu 20.04

MongoDB: v5.0.2

1. Introduction to MongoDB

MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling, offering a scalable, high-performance data storage solution for web applications.

MongoDB sits between relational and non-relational databases: among non-relational databases it is the most feature-rich and the most similar to a relational database.

A document's data structure is essentially the same as JSON; all data stored in collections is in BSON format.

A record in MongoDB is a document: a data structure consisting of field-value pairs (key => value). MongoDB documents are similar to JSON objects, and field values may include other documents, arrays, and arrays of documents.

The basic concepts in MongoDB are the document, the collection, and the database.

SQL Term/Concept      MongoDB Term/Concept    Explanation
database              database                database
table                 collection              table / collection
row                   document                data record (row) / document
column                field                   data field
index                 index                   index
table joins           (not supported)         table joins are not supported by MongoDB
primary key           primary key             MongoDB automatically sets the _id field as the primary key

2. Installing MongoDB 5.0 on Ubuntu

1. Import the MongoDB public GPG key (it is recommended to check the official website, https://www.mongodb.com/ , for the latest GPG key)
wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -

The command returns OK on success. If gnupg is reported as not installed, install it first:

sudo apt-get install gnupg
2. Create the /etc/apt/sources.list.d/mongodb-org-5.0.list file

The command differs between Ubuntu releases. Check the current Ubuntu version with:

lsb_release -dc

For Ubuntu 20.04 (focal):

echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
3. Update apt-get
sudo apt-get update
4. Install MongoDB

Install a specific version (check the official website for the latest stable version):

sudo apt-get install -y mongodb-org=5.0.2 mongodb-org-database=5.0.2 mongodb-org-server=5.0.2 mongodb-org-shell=5.0.2 mongodb-org-mongos=5.0.2 mongodb-org-tools=5.0.2

Install the latest version (this may install an older version if the Ubuntu source list has not been updated):

sudo apt-get install -y mongodb-org

View the MongoDB version: first switch to the MongoDB installation directory, then run:

./mongo --version
5. Pin the packages to prevent accidental automatic upgrades
echo "mongodb-org hold" | sudo dpkg --set-selections

echo "mongodb-org-database hold" | sudo dpkg --set-selections

echo "mongodb-org-server hold" | sudo dpkg --set-selections

echo "mongodb-org-shell hold" | sudo dpkg --set-selections

echo "mongodb-org-mongos hold" | sudo dpkg --set-selections

echo "mongodb-org-tools hold" | sudo dpkg --set-selections
6. Start running

The following data and log directories are created by default after installation:

Default data directory: /var/lib/mongodb, default log directory: /var/log/mongodb

You can store data and logs in different directories by modifying the configuration file (you need to restart MongoDB after the change):

vim /etc/mongod.conf
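As an illustration (not the full file), the relevant settings in /etc/mongod.conf look like this, here pointing both directories at the /home/hadoop/mongodb directory used below; the exact paths are examples only:

```yaml
# Illustrative /etc/mongod.conf fragment (YAML format); adjust paths to your setup.
storage:
  dbPath: /home/hadoop/mongodb/data          # custom data directory
systemLog:
  destination: file
  logAppend: true
  path: /home/hadoop/mongodb/log/mongod.log  # custom log file
net:
  port: 27017
  bindIp: 127.0.0.1
```

After saving, restart the service with sudo systemctl restart mongod.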

MongoDB runs as the mongodb user by default. If you change the data and log directories, create the corresponding directories and grant the mongodb user the appropriate permissions; startup fails if the directories do not exist or the permissions are insufficient:

sudo chown -R mongodb /home/hadoop/mongodb

Start mongodb:

sudo systemctl start mongod

Stop mongodb:

sudo systemctl stop mongod

Restart mongodb:

sudo systemctl restart mongod

Check to see if launch succeeded:

sudo systemctl status mongod

On success, the status output shows active (running); on failure it shows failed together with error details.

7. Uninstall MongoDB

Stop the service:

sudo systemctl stop mongod or sudo service mongod stop

Remove the installation package:

sudo apt-get purge mongodb-org*

Remove the data and log directories (below are the default directories; change them to the ones you actually configured):

sudo rm -r /var/log/mongodb

sudo rm -r /var/lib/mongodb
8. Enter MongoDB shell command mode
mongo

Or specify a host and port:

mongo --host 127.0.0.1:27017

Enter the shell interface to view databases and create databases, collections, and so on

Summary of problems

Question 1: when running sudo apt-get update, most packages are ignored or report errors

Solution 1: (Easy)

1. Open Settings, 2. Software & Updates, 3. choose Other..., 4. Select Best Server, 5. re-run the command in the terminal. (On Ubuntu 20 and Ubuntu 16, Settings and Software & Updates can be found in the upper-right corner of the desktop.)

After the progress bar completes, execute the command in the terminal again.

Solution 2: Change Source

If that does not help, try changing the source manually (the Tsinghua and Aliyun mirrors are recommended)

Alibaba Open Source Mirror Station: https://developer.aliyun.com/mirror/

Steps to change the source (using the Aliyun mirror as an example):

Back up the source list first:

sudo cp /etc/apt/sources.list /etc/apt/sources.list_backup

Open the sources.list file to edit it, adding the Aliyun mirror source at the front of the file:

sudo vim /etc/apt/sources.list

Add the Aliyun mirror entries at the top of the file (copy all of them from the mirror site page):
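For reference, these are the standard Aliyun entries for Ubuntu 20.04 (focal) as published on the mirror site; verify them against the page for your own release before pasting:

```
deb https://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
```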

Refresh List

sudo apt-get update

sudo apt-get upgrade

sudo apt-get install build-essential

The Ubuntu release itself may also be too old; Ubuntu 16 is no longer maintained, so consider upgrading to Ubuntu 18 or 20.

But remember to take a snapshot first, so you can revert if something goes wrong.

Question 2: mongod fails to start

Reason: a file-permission issue - the mongodb user does not have write permission on the necessary files, which prevents the database service from starting.

Solution:

sudo systemctl stop mongod

sudo chown -R mongodb /home/hadoop/mongodb   (Grant privileges)

sudo systemctl start mongod

That solves it.

If it still does not work, try rebooting Linux and then starting MongoDB again:

reboot
sudo systemctl start mongod
Question 3:

If an error occurs: Failed to start mongod.service: Unit mongod.service not found.

Solution:

  • First execute:

    sudo systemctl daemon-reload
    
  • Then start the service again; if the error persists, continue as follows:

sudo vim /etc/systemd/system/mongodb.service

Add the following to save and restart mongodb

[Unit]

Description=High-performance, schema-free document-oriented database

After=network.target

 

[Service]

User=mongodb

ExecStart=/usr/bin/mongod --quiet --config /etc/mongod.conf

 

[Install]

WantedBy=multi-user.target
Question 4:

E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable) when running apt install/update

Solution:

First check whether an update task is running or waiting to finish; if not, execute the following commands:

sudo rm /var/lib/dpkg/lock
sudo dpkg --configure -a
sudo apt update

3. Method names, parameter explanations, functions, and usage notes

First enter the mongodb shell

Start mongodb:

sudo systemctl start mongod

Check to see if launch succeeded:

sudo systemctl status mongod

Enter the mongodb shell:

mongo

Basic operations for MongoDB databases, collections, and documents


1. Features
  • Key-value pairs in the document are ordered; the following two documents are therefore different:
    {"sport": "football", "address": "Beijing", "phone": "13989622814"}
    {"sport": "football", "phone": "13989622814", "address": "Beijing"}
  • Values distinguish strings from numbers:
    {"name": "json","age": "18"}
    {"name": "json","age": 18 }
  • Keys are case sensitive:
    {"name": "json","age": 18}
    {"Name": "json","age": 18}
2. Document key (field) naming rules

The naming of the document key (field) requires attention to the following points:

  • _id is a system-reserved keyword.
  • Keys cannot contain \0 (the null character).
  • Keys cannot start with $.
  • Keys cannot contain . (dot).
  • Keys are case sensitive and must not repeat (within the same document).
3. Basic data types
  • Null: A field that represents a null value or does not exist.
    {"x": null }
  • boolean: true and false
    {"x": true }
  • Numbers: MongoDB supports a wide range of numeric types, including 32-bit integers, 64-bit integers, and 64-bit floating point numbers.
    {"x": 3.14}
  • String: All UTF-8 strings can be represented as string type data:
    {"x": "HelloWorld!"}
  • Array: A set or list of values can be represented as an array:
    {"x": ["a","b","c"] }
  • Object: an object (embedded document):
    {"x": Object()}
4. _id and ObjectId
  • Auto-generated _id: if a document is inserted without specifying an _id value, the system automatically creates one; every document has a unique _id value.
  • ObjectId is the default type of _id.
5. Date

MongoDB supports values of type Date.

For example: {"name": "jack", "date": new Date()}

In the document, the date attribute's value is new Date().
new Date() creates a Date object; calling Date() without new instead returns a string representation of the date.
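The distinction can be checked in plain JavaScript, whose Date semantics the mongo shell shares (a small sketch, not MongoDB-specific code):

```javascript
// `new Date()` yields a Date object (the mongo shell prints it as ISODate(...));
// calling Date() without `new` yields a plain string.
const withNew = new Date();
const withoutNew = Date();
console.log(withNew instanceof Date);   // true
console.log(typeof withoutNew);         // "string"
```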

6. Embedded Documents

An embedded document means using an entire MongoDB document as the value of a key in another document.

{
    "name": "jack",
    "address": {
        "street": "Zoo Park Street",
        "city": "London"
    }
}

Enter the mongodb shell

Run:

mongo

or

mongosh

command to enter the MongoDB shell (note: the mongod service must already be started)

The syntax for creating a database in MongoDB is as follows:

use DATABASE_NAME

Create a database if it does not exist, otherwise switch to the specified database.

For example, the following example creates the database Employee:

use Employee

View all databases:

show dbs

The newly created database Employee is not in the list of databases. To display it, you need to insert some data into the Employee database.

db.Employee.insert({"name":"google"})

show dbs

The default database in MongoDB is test. If no new database is created, the collection is stored in the test database.

7. Statistical database information
db.stats()	

{
    "db": "test",          // database name
    "collections": 0,      // number of collections
    "objects": 0,          // number of documents
    "avgObjSize": 0,       // average size per document
    "dataSize": 0,         // data size in bytes, excluding indexes
    "storageSize": 0,      // allocated storage space
    ....
}

MongoDB Delete Database

The syntax for deleting a database in MongoDB is as follows:

db.dropDatabase()

This deletes the current database (test by default); use the db command to view the current database name.

db

First use show dbs to view all databases, then switch to the Employee database with use, then delete it, and finally use show dbs to check whether the deletion succeeded, as in the following steps:

show dbs

use Employee

db.dropDatabase()

show dbs

8. Collection Operations

Command format:

db.createCollection(name, options)

For example, create a myCollection collection under a myDB database.

Execute the following commands:

use myDB

db.createCollection("myCollection")

Query all collections in the database using:

show collections

The renameCollection method renames a collection, as follows:

db.myCollection.renameCollection("myColl")

View Collection Details

db.getCollectionInfos()

[
    {
        "name" : "myColl",
        "type" : "collection",
        "options" : {

        },
        "info" : {
            "readOnly" : false
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "myDB.myColl"
        }
    }
]

Delete a collection with the drop method, as follows:

db.myColl.drop()

show collections

Capped (fixed-length) collections

db.createCollection( "myCollection",{ capped:true,size:10000 } )

capped: true makes the collection fixed-length
size: the maximum size of the collection in bytes (to limit the number of documents, use the max option; size is not a document count)

Capacity in a capped collection is recycled in a circular fashion, which makes it suitable for real-time monitoring.

db.myCollection.isCapped()                // Check whether the collection is capped.

Insert Document

MongoDB uses insert() or save() methods to insert documents into a collection with the following syntax:

db.COLLECTION_NAME.insert(document)

Example: insert the following data into the stuinfo collection of the student database

  • Use database student
use student
  • insert method inserts a single document
db.stuinfo.insert({_id:001,name:'alice',age:18})
  • save method inserts a single document
db.stuinfo.save({_id:002,name:'nancy',age:19})
  • Insert multiple documents
db.stuinfo.insert([{_id:003,name:'harry',age:18},{_id:004,name:'curry',age:19}])

After inserting the documents, use find() to view the collection data.

Note: _id and ObjectId

  • _id is the unique identifier of a document.
  • ObjectId is the default type generated for _id.
Example:
db.myCollection.insert({"x":"10"})    // _id not specified; a value is created automatically
db.myCollection.insert({"_id":"user001","y":"10"})    // _id specified; the given value is used
db.myCollection.find()
Query results:
{ "_id" : ObjectId("5c5ff402eb5725b5d8961b45"), "x" : "10" }
{ "_id" : "user001", "y" : "10" }

In the example above, stuinfo is our collection name. If the collection is not in the database, MongoDB automatically creates the collection and inserts the document.

You can also define a document as a variable before inserting it:

s={_id:5,name:'Zhang San',age:19}

db.stuinfo.insert(s)


Insert multiple nancy records:

db.stuinfo.insert([{_id:006,name:'nancy',age:17},{_id:007,name:'nancy',age:21}])

9. Creating and deleting capped collections
  • There is a special type of collection in MongoDB that deserves our special attention: a capped collection.

  • Fixed collections can declare the size of a collection and behave like circular queues. When data is inserted, new documents are inserted at the end of the queue, and if the queue is full, the oldest document will be overwritten by the document inserted later.

  • Capped collection characteristics: a capped collection works like a circular queue; when space runs out, the earliest documents are deleted to make room for new ones. In general, a capped collection suits any scenario where out-of-date data should be eliminated automatically.

  • Fixed set scenarios: For example, log files, chat logs, call information logs, and so on, MongoDB's fixed set is used when only the most recent set of scenarios is retained.

  • Advantages of fixed sets:

    1. Faster writes. Data in a capped collection is written sequentially to a fixed space on disk, so it is not "interrupted" by random writes from other collections, and writes are very fast (performance is even better without indexes).
    2. Capped collections automatically overwrite the oldest documents, so no extra job needs to be configured to delete old documents; this avoids the performance spikes a scheduled cleanup job would cause.
    Capped collections are therefore useful for scenarios such as logging.

  • Creation of fixed sets: Unlike regular sets, fixed sets must be explicitly created before use.
    For example, create a fixed set coll_testcapped with a size limit of 1024 bytes.

    db.createCollection("coll_testcapped",{capped:true,size:1024});
    
  • Creating a capped collection: in addition to size, you can also limit the number of documents with max.
    For example, create a fixed set coll_testcapped, limited to 1024 bytes in size and 100 documents in number.

    db.createCollection("coll_testcapped2",{capped:true,size:1024,max:100});
    
  • Matters needing attention:

    1. A capped collection cannot be changed after it is created; it can only be dropped and rebuilt.
    2. A normal collection can be converted to a capped collection with convertToCapped, but a capped collection cannot be converted back to a normal collection.
    3. When creating a capped collection with a document-count limit (the max parameter), you must also specify its size (the size parameter). Whichever limit is reached first, each newly inserted document removes the oldest document from the collection.

    4. When using the convertToCapped command to convert a regular collection, existing indexes are lost and must be recreated manually. Also, this conversion command has no option to limit the number of documents (no max option).

    5. Capped collections cannot be sharded.
    6. Documents in a capped collection can be updated, but an update must not grow or shrink the document's size, otherwise it fails.
    For example, if a key's value occupies 100 bytes, any updated value must also occupy exactly 100 bytes - no more, no less.

    7. You cannot delete individual documents from a capped collection, but you can drop the entire collection.
    8. Note also that when estimating size for a collection, base it on the collection's data size rather than its storageSize; storageSize is compressed by the WiredTiger storage engine's compression algorithm.
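The overwrite behavior described above can be sketched in plain JavaScript as a ring buffer (an illustration of the concept, not MongoDB code; the cappedInsert function is made up for this sketch):

```javascript
// A capped collection behaves like a ring buffer: once the document limit is
// reached, each new insert drops the oldest entry.
function cappedInsert(buffer, max, doc) {
  buffer.push(doc);
  if (buffer.length > max) buffer.shift(); // drop the oldest document
  return buffer;
}

const logs = [];
for (let i = 1; i <= 5; i++) cappedInsert(logs, 3, { n: i });
console.log(logs.map(d => d.n)); // [ 3, 4, 5 ]
```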

Update documents (note the distinction between update and save)

MongoDB uses update() and save() methods to update documents in a collection.

  • The update() method is used to update an existing document. The syntax is as follows:
db.collection.update(<criteria>,<objNew>,upsert,multi,writeConcern)

Parameter description:

criteria: the query condition for the update, similar to the WHERE clause of a SQL UPDATE.

objNew: the update object and update operators (such as $set, $inc, ...); it can be understood as the part after SET in a SQL UPDATE.

upsert: optional; whether to insert objNew if no matching record exists. true inserts; false (the default) does not.

multi: optional, false by default, updating only the first record found; if true, all records matching the condition are updated.

writeConcern: optional, the write-concern level (the level at which exceptions are reported).

Example

Update names via update(): change the document whose name is 'curry' so that the name becomes 'Li Si'.

Execute command:

db.stuinfo.update({name:'curry'},{$set:{name:'Li Si'}})

Use find() to check whether the modification succeeded.

The statement above modifies only the first matching document; to modify multiple identical documents, set the multi parameter to true.

Update all documents whose name is "nancy" to "Wang Wu":

db.stuinfo.update({name:'nancy'},{$set:{name:'Wang Wu'}},false,true)


  • The save() method replaces an existing document with the document passed in. The syntax is as follows:
db.collection.save(<document>,{writeConcern:<document>})

Parameter description:

document: document data.

writeConcern: optional, the write-concern level (the level at which exceptions are reported).

This method has been deprecated since version 4.2.

Example: replace the data of the document whose _id is 2:

db.stuinfo.save({_id:2,name:'curry',age:20})


10. Update Document - Document Replacement

Replace a matching document with a new one

Turn

{
    "_id" : ObjectId("5c6005ea0fc42acdb75f74a6"),
    "name" : "foo",
    "nickname" : "bar",
    "friends" : 12,
    "enemies" : 2
}

into

{
	"_id" : ObjectId("5c6005ea0fc42acdb75f74a6"),
	"nickname" : "bar",
	"relations" : {
		"friends" : 12,
		"enemies" : 2
	},
	"username" : "foo"
}

Modification steps:

> var u=db.user.findOne({"name":"foo"})                //Save the document to be modified in object u
> u.relations={"friends":u.friends,"enemies":u.enemies}//New field relations in object u
{ "friends" : 12, "enemies" : 2 }                      //relations nested documents
> u.username=u.name
foo
> delete u.friends
true
> delete u.enemies
true
> delete u.name
true
> db.user.update({"name":"foo"},u)                     //Update Document
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

11. Update Document - Modifier
  • $inc (increase and decrease, only for numeric types)
Insert a document
> db.myColl.insert({title:"first",visites:107})
> db.myColl.find()
{ "_id" : ObjectId("5c887166713719923acc4c92"), "title" : "first", "visites" : 107}
Increase visites by 1:
> db.myColl.update({title:"first"},{$inc:{visites:1}})
> db.myColl.find()
{ "_id" : ObjectId("5c887166713719923acc4c92"), "title" : "first", "visites" : 108}
Decrease visites by 2:
> db.myColl.update({title:"first"},{$inc:{visites:-2}})
> db.myColl.find()
{ "_id" : ObjectId("5c887166713719923acc4c92"), "title" : "first", "visites" : 106}
  • $set (sets a field to a specified value)
> db.author.findOne()
{
	"_id" : ObjectId("5c600dbb0fc42acdb75f74a7"),
	"name" : "foo",
	"age" : 20,
	"gender" : "male",
	"intro" : "student"
}

Modify with $set:

> db.author.update({"name":"foo"},{$set:{"intro":"teacher"}})
> db.author.findOne()
{
	"_id" : ObjectId("5c600dbb0fc42acdb75f74a7"),
	"name" : "foo",
	"age" : 20,
	"gender" : "male",
	"intro" : "teacher"
}

$set can modify not only values but also data types:

db.author.update({name: "foo"},{$set: {intro: ["teacher","programmer"]}}) //String becomes an array
  • $push modifier (can insert arrays)
Original data:
{
        "_id" :  ObjectId("5a1656e656d8db3756cafce8"),
        "title" : "a blog",
        "content" : "...",
        "author" :  "foo"
}
Use $push to insert into an array:
db.posts.update({title: "a blog"},
    {$push: {comments: {name: "leon", email: "leon.email.com", content: "leon reply"}}})
Modification results:
{
    "_id" : ObjectId("5a1656e656d8db3756cafce8"),
    "title" : "a blog",
    "content" : "...",
    "author" : "foo",
    "comments" : [ { "name" : "leon",
        "email" : "leon.email.com",
        "content" : "leon reply" } ]
}
  • $addToSet Modifier
Original data:
{
    "_id" : ObjectId("5a1659a756d8db3756cafce9"),
    "name" : "foo",
    "age" : 12,
    "email" : ["foo@example.com", "foo@163.com"]
}
Add an email address to the email array:
db.user.update({name: "foo"}, {$addToSet: {email: "foo@qq.com"}})
Modification results:
{
    "_id" : ObjectId("5a1659a756d8db3756cafce9"),
    "name" : "foo",
    "age" : 12,
    "email" : ["foo@example.com", "foo@163.com", "foo@qq.com"]
}

Example:

Original data:
{ "_id" : ObjectId("59f00d4a2844ff254a1b68f7"), "x" : 1 }
{ "_id" : ObjectId("59f00d4a2844ff254a1b68f8"), "x" : 1 }
{ "_id" : ObjectId("59f00d4a2844ff254a1b68f9"), "x" : 1 }
{ "_id" : ObjectId("59f00d4a2844ff254a1b68fa"), "x" : 2 }

Requirement: change all documents where x is 1 so that x becomes 99

Command:
db.collection.update({x: 1}, {$set: {x: 99}}, {multi: true})

Modification results:
{ "_id" : ObjectId("59f00d4a2844ff254a1b68f7"), "x" : 99 }
{ "_id" : ObjectId("59f00d4a2844ff254a1b68f8"), "x" : 99 }
{ "_id" : ObjectId("59f00d4a2844ff254a1b68f9"), "x" : 99 }
{ "_id" : ObjectId("59f00d4a2844ff254a1b68fa"), "x" : 2 }

More examples:

Generate multiple records:

for(var i=1;i<10;i++)  db.col.insert({count:i,test2:false,test5:true})

Update the first record found conditionally:

db.col.update( { "count" : { $gt : 1 } } , { $set : { "test2" : "OK"} } );

Update all records found conditionally:

db.col.update( { "count" : { $gt : 3 } } , { $set : { "test2" : "OK"} },false,true );

If no matching record exists, a record is inserted (upsert):

db.col.update( { "count" : { $gt : 14 } } , { $set : { "test5" : "OK"} },true,false );

Update all records found conditionally:

db.col.update( { "count" : { $gt : 1 } } , { $inc : { "count" : 1} },false,true );

Update the first record found conditionally:

db.col.update( { "count" : { $gt : 10 } } , { $inc : { "count" : 1} },false,false );

Deleting documents (note the distinction between the remove and delete methods)

The remove(), deleteOne(), and deleteMany() methods can be used to remove data from a collection.

  • Delete all documents under collection col
db.col.deleteMany({})

or

db.col.remove({})


  • Delete Documents with Specified Conditions

Delete all documents in the stuinfo collection whose name equals Wang Wu:

db.stuinfo.deleteMany({name:'Wang Wu'})

Delete one document whose age equals 18:

db.stuinfo.deleteOne({age:18})


4. MongoDB database query and aggregation operations

1. Use find() method for basic document query

The syntax is as follows:

db.collection.find(query, projection)

Parameter description:

query: optional; uses query operators to specify the query conditions.

projection: optional; uses the projection operators to specify which keys to return. By default (when the parameter is omitted) all key-value pairs in the documents are returned. A key with value 0 is excluded from the result; a key with value 1 is included.
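Roughly how a projection selects keys can be sketched in plain JavaScript (an illustration of the rules above, not MongoDB's actual implementation; the project helper is hypothetical):

```javascript
// 1 includes a key, 0 excludes it; _id is returned unless excluded explicitly.
function project(doc, projection) {
  const included = Object.keys(projection).filter(k => projection[k] === 1);
  const out = {};
  if (included.length > 0) {            // inclusion mode
    if (projection._id !== 0) out._id = doc._id;
    for (const k of included) if (k in doc) out[k] = doc[k];
  } else {                              // exclusion mode
    for (const k of Object.keys(doc)) if (projection[k] !== 0) out[k] = doc[k];
  }
  return out;
}

const doc = { _id: 1, name: "jack", age: 18 };
console.log(project(doc, { name: 1 })); // { _id: 1, name: 'jack' }
console.log(project(doc, { age: 0 }));  // { _id: 1, name: 'jack' }
```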

2. Use of Document Query Conditions

3. Specific types of queries

Query for specific types of documents; for example, querying a key for null matches documents where the key's value is null or where the key is missing.

4. Aggregate queries

Aggregation in MongoDB is primarily used to process data (for example, computing averages and sums) and return a computed result. db.collection.aggregate() is a data-processing pipeline: each document passes through a pipeline of multiple stages, and each stage can group, filter, and otherwise transform documents. After this series of stages, the result is output.

1.mongo query

1. Start the mongo shell

mongo

2. Use the test database and create a new items collection to store order-related information.

use test
db.createCollection("items")

3. Insert document data:

Each document corresponds to one product item in an order, including:

pnumber: product number

quantity: product quantity

price: product unit price

Insert the following product information:

db.items.insert([
    {quantity: 2,  price: 5.0,  pnumber: "p003"},
    {quantity: 2,  price: 8.0,  pnumber: "p002"},
    {quantity: 1,  price: 4.0,  pnumber: "p002"},
    {quantity: 2,  price: 4.0,  pnumber: "p001"},
    {quantity: 4,  price: 10.0, pnumber: "p003"},
    {quantity: 10, price: 20.0, pnumber: "p001"},
    {quantity: 10, price: 20.0, pnumber: "p003"},
    {quantity: 5,  price: 10.0, pnumber: "p002"}
])

4. Query insertion results:

The pretty() method displays all documents in a formatted way

db.items.find().pretty() 

An empty query document {} matches the entire contents of the collection. If no query document is specified, the default is {}. For example:

db.user.find({})                         //That is, query all the contents in the user collection
db.user.find({"age": 18})                //Find all documents with an age value of 18
db.user.find({"name": "jack"})           //To match a string, the value of "name" is "jack"
db.user.find({"name": "jack","age": 18})//Query all users with a user name of "jack" and an age of 18

5. Count how many documents the items collection contains

db.items.count()

6. Query commodity data with price greater than 5

Greater-than uses the $gt operator; operators are prefixed with a $ sign

db.items.find({price:{$gt:5}})

7. Multi-Conditional Query

Example: Query commodity data with quantity 10 and price greater than or equal to 5

db.items.find({quantity:10,price:{$gte:5}})

8. Conditional queries are performed using or in the following format:

db.col.find({$or:[{key1: value1}, {key2:value2}]})

Example: Query commodity data with quantity 10 or price greater than or equal to 5

db.items.find({$or:[{quantity:10},{price:{$gte:5}}]})

9. Use of AND in conjunction with OR

Example: Query commodity data with pnumber p003 and quantity of 10 or price greater than or equal to 5

db.items.find({pnumber:"p003",$or:[{quantity:10},{price:{$gte:5}}]})

10. Query Conditions - contains ($in) and does not contain ($nin)

Example:

  • Query information about students whose nationality is China or the USA

    db.persons.find({country: {$in: ["USA","China"]}})
    
  • Query information about students whose nationality is not Chinese or American.

    db.persons.find({country: {$nin: ["USA","China"]}})
    

11. Query Conditions - $or queries and $not queries

$or query
Query students whose Chinese score (c) is at least 85 or whose English score (e) is at least 90:

db.persons.find({$or: [{c: {$gte: 85}},{e: {$gte: 90}}]})

$not query
Query for information about students whose names do not contain "foo":

db.persons.find({name: {$not: /foo/}})

12. Specific types of queries - Query arrays

Data: db.food.insert({"fruit": ["apple","banana","peach"]})

Next is the specific query:

Each element of the array is matched as if it were the value of the key:
db.food.find({fruit: "banana"})
$all query: used when multiple elements must all be present in the array.
For example, find documents that contain both "apple" and "banana":
db.food.find({fruit: {$all: ["banana","apple"]}})
Use key.index syntax to query a specific array position exactly:
db.food.find({"fruit.2":"peach"})
$size query: query documents whose array has a specified size:
db.food.find({"fruit": {"$size": 3}})

The query results are:

{ "_id" : ObjectId("5b1dd90d1f23e9c34fc030a8"), "fruit" : [ "apple", "banana", "peach" ] }

13. Cursors

The db.collection.find() method returns a cursor, and for document access, we need to iterate through the cursor.

The cursor is used as follows:

  • Declare cursors:

    var cursor = db.collectionName.find({query}, {projection});
    
  • Check whether more documents remain:

cursor.hasNext() // Returns true if the cursor has not reached the end.
  • Read data:

    cursor.next()    // Returns the next document from the cursor.
    
  • Close cursor:

    cursor.close()   // This step can usually be omitted; the cursor normally closes automatically, but it can also be closed explicitly.
    
    //Let's walk through the cursor example with a while loop:
    var mycursor = db.user.find({})
    while(mycursor.hasNext()) {
     	printjson(mycursor.next());
     }
    
  • Cursor - Output Result Set


    1. Output cursor result set using print:

    var cursor = db.user.find()
    while (cursor.hasNext()) {
    	print(tojson(cursor.next()))
    }
    

    2. Output cursor result set using printjson:

    var cursor = db.user.find()
    while (cursor.hasNext()) {
    	printjson(cursor.next())
    }
    
  • Cursor - Iteration

    1. Iteration function: the cursor provides an iteration function that lets us supply a custom callback to process each document individually: cursor.forEach(callback). The steps are: define the callback, open the cursor, iterate.

    Data source:

    { "_id" : ObjectId("5b1dd90d1f23e9c34fc030a8"), "fruit" : [ "apple","banana","peach" ] }
    { "_id" : ObjectId("5b1dddc51f23e9c34fc030a9"), "fruit" : [ "apple","kumquat","orange" ] }
    { "_id" : ObjectId("5b1ddddc1f23e9c34fc030aa"), "fruit" : [ "cherry","banana","apple" ] }
    
    • First define a callback function (prints a document's fruit array)

      var getFruit=function(obj){ print(obj.fruit)}
      
    • Open cursor:

      var cursor=db.food.find();
      
    • Iteration:

      cursor.forEach(getFruit);
      

    Output results:

    apple,banana,peach
    apple,kumquat,orange
    cherry,banana,apple
    

    2. Iteration based on arrays

    Data source:

    { "_id" : ObjectId("5b1dd90d1f23e9c34fc030a8"), "fruit" : [ "apple","banana","peach" ] }
    { "_id" : ObjectId("5b1dddc51f23e9c34fc030a9"), "fruit" : [ "apple","kumquat","orange" ] }
    { "_id" : ObjectId("5b1ddddc1f23e9c34fc030aa"), "fruit" : [ "cherry","banana","apple" ] }
    

    Operation:

    var cursor=db.food.find();
    var documentArray =cursor.toArray();
    printjson (documentArray);
    

    Output results:

    [{ "_id" : ObjectId("5b1dd90d1f23e9c34fc030a8"), "fruit" : [ "apple","banana","peach" ] }
    { "_id" : ObjectId("5b1dddc51f23e9c34fc030a9"), "fruit" : [ "apple","kumquat","orange" ] }
    { "_id" : ObjectId("5b1ddddc1f23e9c34fc030aa"), "fruit" : [ "cherry","banana","apple" ] }]
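The shell cursor API used above (hasNext/next/close, plus forEach and toArray) can be mimicked with a tiny Python class, purely for illustration; in PyMongo a real cursor is simply an iterator you loop over with `for`:

```python
class MiniCursor:
    """Minimal stand-in for a mongo shell cursor (illustration only)."""
    def __init__(self, docs):
        self._docs = list(docs)
        self._pos = 0
        self.closed = False

    def has_next(self):
        return self._pos < len(self._docs)

    def next(self):
        if not self.has_next():
            raise StopIteration("no more documents")
        doc = self._docs[self._pos]
        self._pos += 1
        return doc

    def close(self):
        # Usually optional: cursors are closed automatically when exhausted.
        self.closed = True

cursor = MiniCursor([{"name": "a"}, {"name": "b"}])
results = []
while cursor.has_next():          # same shape as the shell's while loop
    results.append(cursor.next())
cursor.close()
print(results)
```

The while/hasNext pattern maps one-to-one onto the shell examples above.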
    

14. Use aggregate aggregates

Aggregates in MongoDB are primarily used to process data (such as statistical mean, sum, and so on) and return calculated data results.

Grammar:

db.collection.aggregate(pipeline, options)

Example:

db.orders.aggregate([
   { $match: { status: "A" } },
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])

Phase 1: the $match stage filters the query results. Phase 2: the $group stage groups the matching documents.
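A server-free way to see what the two stages do is to re-implement this exact pipeline in plain Python (field names match the example above):

```python
from collections import defaultdict

def match_then_group(orders):
    """Mimic {$match: {status: "A"}} followed by
    {$group: {_id: "$cust_id", total: {$sum: "$amount"}}}."""
    matched = [o for o in orders if o["status"] == "A"]   # $match stage
    totals = defaultdict(int)
    for o in matched:                                     # $group stage
        totals[o["cust_id"]] += o["amount"]
    return [{"_id": k, "total": v} for k, v in totals.items()]

orders = [
    {"cust_id": "c1", "status": "A", "amount": 50},
    {"cust_id": "c1", "status": "A", "amount": 25},
    {"cust_id": "c2", "status": "B", "amount": 75},  # filtered out by $match
]
print(match_then_group(orders))   # [{'_id': 'c1', 'total': 75}]
```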

Example: Count the quantity of all the items in the order, that is, the sum of the statistical quantities.

db.items.aggregate([{$group:{_id:null,total:{$sum:"$quantity"}}}])

Example: Group by product type, and then count the quantity sold

db.items.aggregate([{$group:{_id:"$pnumber",total:{$sum:"$quantity"}}}])

Example: Group by the same product type, and then query the details of the order with the same product type that sells the most.

db.items.aggregate([{$group:{_id:"$pnumber",max:{$max:"$quantity"}}}])

Example: Grouping by the same product type and then querying each order detail for the average price of the same product type sold

db.items.aggregate([{$group:{_id:"$pnumber",price:{$avg:"$price"}}}])

15. Use of pipes

Example: Grouping by the same product type, counting the quantity of each product, and getting the maximum quantity.

db.items.aggregate([{$group:{_id:"$pnumber",total:{$sum:"$quantity"}}},{$group:{_id:null,max:{$max:"$total"}}}])
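A pipeline is simply stage after stage: each stage's output documents become the next stage's input. A plain-Python sketch of this two-$group pipeline (no server needed; field names follow the example):

```python
from collections import defaultdict

def group_sum(docs, key_field, value_field):
    """One {$group: {_id: "$key", total: {$sum: "$value"}}} stage."""
    acc = defaultdict(int)
    for d in docs:
        acc[d[key_field]] += d[value_field]
    return [{"_id": k, "total": v} for k, v in acc.items()]

items = [
    {"pnumber": "p1", "quantity": 2},
    {"pnumber": "p1", "quantity": 3},
    {"pnumber": "p2", "quantity": 4},
]

# Stage 1: total quantity per product type
totals = group_sum(items, "pnumber", "quantity")
# Stage 2: {$group: {_id: null, max: {$max: "$total"}}} over stage-1 output
result = {"_id": None, "max": max(d["total"] for d in totals)}
print(result)   # {'_id': None, 'max': 5}
```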

2. Index

The most effective way to improve query efficiency. It is a special data structure introduced to solve the slow query speed, and stores part of the data content in a form that is easy to traverse. Indexed data is stored in memory, which also speeds up the efficiency of index finding data.

Index features:

  • MongoDB typically greatly improves query efficiency by scanning each file in the collection and selecting records that meet the query criteria when reading data without an index
  • Can speed up queries, but also reduce performance such as modifying inserts
  • Is a special data structure, and an index is a structure that sorts the values of one or more columns in a database table
  • The default is to use btree to organize index files
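The B-tree point can be illustrated with Python's bisect module: keeping index keys sorted turns a lookup into a binary search instead of a scan of every document. This is only a loose analogy for what the storage engine does:

```python
import bisect

# (age, document position) pairs kept sorted by age, like an index on {age: 1}
index = sorted([(31, 0), (25, 1), (40, 2), (25, 3)])

def find_positions(age):
    """Binary-search the sorted keys, then walk the equal range."""
    lo = bisect.bisect_left(index, (age, -1))
    out = []
    while lo < len(index) and index[lo][0] == age:
        out.append(index[lo][1])
        lo += 1
    return out

print(find_positions(25))   # positions of the two age-25 documents
```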

Create an index:

Command format:

db.collection.createIndex( <keys>,<options> )

Example: Create an ascending index by age field:

db.person.createIndex({age: 1})

keys: The name and sorting method of the index you want to create, 1 being in ascending order; - 1 means in descending order.
options: Optional parameter indicating the setting for indexing. Optional values are as follows:

Parameter | Type | Description
background | Boolean | Build the index in the background so that other database activity is not blocked; defaults to false.
unique | Boolean | Create a unique index; defaults to false.
name | string | The name of the index. If not specified, MongoDB generates a name by joining the indexed field names and their sort orders.
partialFilterExpression | document | If specified, MongoDB only indexes documents that satisfy the filter expression.
sparse | Boolean | Do not index documents in which the field does not exist; defaults to false.
expireAfterSeconds | integer | Specify the index's expiration time (TTL) in seconds.
storageEngine | document | Allow users to configure the storage engine for the index.

Type of index

  • Default Index

    For each collection, an index is created on the _id field by default, and this particular index cannot be deleted. The _id field is mandatory and unique and is maintained by the database.

  • One-key Index

    An index created on a single key is a one-key index, which is the most common kind of index; for example, the _id index that MongoDB creates by default is a one-key index.

    db.collection.createIndex(key, options)
    
    db.getCollection('test').createIndex( {"name":1} )
    
    db.getCollection('test').createIndex( {"name":-1} )
    ensureIndex()   //deprecated alias of createIndex()
    

    Create an index in the background:

    db.values.createIndex({open: 1, close: 1}, {background: true})
    
  • Composite Index

    An index built on multiple keys is a composite index

    db.getCollection('test').createIndex( {"name":1,"phone":-1} )
    

    name is in positive order and phone is in reverse order.

    //find operations
    db.getCollection('test').find({name:"qiiq"})
    
    db.getCollection('test').find({name:"qiiq",phone:12512135})
    
    db.getCollection('test').find({phone:12512135,name:"qiiq"})
    //The three queries above can all use the compound index: they include the
    //prefix field name, and the order of fields inside the query document does not matter.
    
    //The following query cannot use the compound index, because it omits the prefix field name:
    db.getCollection('test').find({phone:12512135})
    
  • Multi key Index

    If a document contains a field of array type, that field can be indexed directly, and MongoDB will create a separate index entry for each element in the embedded array

    Note: A multikey index is not equal to creating an index on a multicolumn field (composite index)

  • Composite multikey index

    For a composite multikey index, each index can contain at most one array.
    Creating a composite multikey index in more than one array is not supported.

    Assume the following set exists

    { _id: 1, a: [ 1, 2 ], b: [ 1, 2 ], category: "AB - both arrays" }
    

    Creating the index db.COLLECTION_NAME.createIndex({a:1,b:1}) is not allowed, because both a and b are arrays.

  • Text Index

  • Geographic Location Index

  • Hash indices
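Returning to the composite index above: whether an equality-only query can use such an index comes down to whether it includes the index's leading (prefix) field. A rule-of-thumb sketch (simplified; the real query planner also considers sort order, ranges, and more):

```python
def can_use_compound_index(index_fields, query_fields):
    """An equality query can use a compound index when the fields it filters on
    include the leading field of the index; the order of fields inside the
    query document itself does not matter."""
    return index_fields[0] in set(query_fields)

idx = ["name", "phone"]                                  # index {name:1, phone:-1}
print(can_use_compound_index(idx, ["name"]))             # True
print(can_use_compound_index(idx, ["name", "phone"]))    # True
print(can_use_compound_index(idx, ["phone"]))            # False: prefix missing
```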

Delete Index

There are two main ways:
1. All indexes in the current collection will be deleted except the default index on _id.

db.Collection name.dropIndexes()

2. You can delete an index based on the specified index name or index document, except for the default index on _id.

db.Collection name.dropIndex(index)

Example: Delete the age index you just created:

db.person.dropIndex({age: 1})
Unique Index

A unique index ensures that the specified key for each document in the collection has a unique value.
The syntax is as follows:

db.collection.createIndex(Index name or index document, {unique:true});

For example, if you want to keep the "name" key of a document with different values, create a unique index:

db.people.createIndex({"name": 1},{"unique": true})

That is, create a unique index in the people collection in ascending order by the value of the name key
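The behavior a unique index enforces can be sketched as an insert-time check (plain Python; the error message mimics MongoDB's duplicate-key error but is otherwise hypothetical):

```python
class UniqueIndex:
    """Reject inserts whose key value already exists (sketch of {unique: true})."""
    def __init__(self, field):
        self.field = field
        self.seen = set()

    def insert(self, doc):
        value = doc[self.field]
        if value in self.seen:
            raise ValueError(f"E11000 duplicate key: {self.field}={value!r}")
        self.seen.add(value)
        return doc

idx = UniqueIndex("name")
idx.insert({"name": "alice"})
try:
    idx.insert({"name": "alice"})   # second insert with the same key value
except ValueError as e:
    print(e)                        # duplicate rejected
```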

  • Eliminate duplication
    For example, delete duplicate index values for the "name" key:

    db.people.createIndex({"name": 1},{"unique": true,"dropDups": true})   //Note: dropDups was removed in MongoDB 3.0 and is no longer available in later versions
    
  • Composite Unique Index

    Composite index: An index built on multiple fields. For example:

    db.person.createIndex({"age": 1,"name": 1})
    

    Composite unique index:

    db.person.createIndex({"age": 1,"name": 1},{"unique":true})
    
Index Management
  • Query Index
    Query index size:

    db.Collection name.totalIndexSize();   //That is, the amount of space occupied by the indexes
    


  • Modify Index
    Mongodb does not have a separate way to modify an index. If you need to modify an index, you need to delete the old index before creating a new one.

    db.Collection name.dropIndex("Index Name")
    

5. Python program import data (python operates MongoDB)

pymongo is Python's module for accessing MongoDB. It defines the MongoClient class for operating on MongoDB, covering connection management, collection management, index management, CRUD operations, file operations, aggregation operations, and so on.

1.python operation MongoDB example 1:

Data preparation

1. New python file

vim pydtf.py

2. Importing data

  • Write python program to import data into database Taobao, set as order_info
from pymongo import MongoClient
from random import randint
import datetime

client = MongoClient('localhost',27017) # Set up a connection
db = client.taobao # Connect to taobao database
order = db.order_info # Settings Collection
# Set Document Content List
status = ['A','B','C'] 
cust_id = ['A123','B123','C123']
price = [500,200,250,300]
sku = ['mmm','nnn']
# Cycle to generate random data
for i in range(1,100):
    items = []
    item_count =randint(2,6)
    for n in range(item_count):
        items.append({"sku":sku[randint(0,1)],"qty":randint(1,10),"price":randint(0,5)})
        # Generate new records
    new = {
    "status":status[randint(0,2)],
    "cust_id":cust_id[randint(0,2)],
    "price":price[randint(0,3)],
    "ord_date":datetime.datetime.utcnow(),
    "items":items
    }
    print(new)
    # Insert into Database
    order.insert_one(new)
print(order.estimated_document_count())

  • Run pydtf.py to import data.
python3 pydtf.py

MongoDB aggregate function MapReduce

MongoDB has two aggregate functions: aggregate and mapreduce

The mapreduce function provides an aggregate operation of mapreduce (programming model), and its workflow is illustrated below:

MapReduce in MongoDB has the following main stages:

  • Map: Map an operation to every document in the collection

  • Shuffle: Documents are grouped according to Key and a list of values is generated for each different Key.

  • Reduce: Processes the list of values for each key, combining them until only one value remains. Results may be fed back through the Shuffle/Reduce cycle and processed iteratively, until every key maps to a single value; that value is the MapReduce result.

  • Finalize: This step is not required. After the final MR results are obtained, some data "trimming" processing is performed.

View data formats

mongo # Start mongo shell
use taobao
db.order_info.findOne() 

Query the sum of all prices for each cust_id

1. Define the map function:

var mapFunction1 = function() {
                       emit(this.cust_id, this.price);
                   };

2. Define the reduce function:

var reduceFunction1 = function(keyCustId, valuesPrices) {
                          return Array.sum(valuesPrices);
                      };

3. Execute mapReduce, outputting the result to the map_reduce_example collection in the current db:

db.order_info.mapReduce(
                     mapFunction1,
                     reduceFunction1,
                     { out: "map_reduce_example" }
                   )

4. Query results

db.map_reduce_example.find()
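The same cust_id/price job can be traced in plain Python, making the Map, Shuffle, and Reduce stages explicit. This is a sketch of what the server does, not PyMongo API:

```python
from collections import defaultdict

def map_reduce(docs, map_fn, reduce_fn):
    emitted = []
    for doc in docs:
        map_fn(doc, emitted.append)          # Map: emit (key, value) pairs
    groups = defaultdict(list)               # Shuffle: group values by key
    for key, value in emitted:
        groups[key].append(value)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}  # Reduce

orders = [
    {"cust_id": "A123", "price": 500},
    {"cust_id": "A123", "price": 250},
    {"cust_id": "B123", "price": 200},
]

# Same shape as mapFunction1 / reduceFunction1 in the shell example
result = map_reduce(
    orders,
    lambda doc, emit: emit((doc["cust_id"], doc["price"])),
    lambda key, values: sum(values),
)
print(result)   # {'A123': 750, 'B123': 200}
```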

Calculate average inventory for all items

1. Define map functions

var mapFunction2 = function() {
                       for (var idx = 0; idx < this.items.length; idx++) {
                           var key = this.items[idx].sku;
                           var value = {
                                         count: 1,
                                         qty: this.items[idx].qty
                                       };
                           emit(key, value);
                       }
                    };

2. Define the reduce function

var reduceFunction2 = function(keySKU, countObjVals) {
                     reducedVal = { count: 0, qty: 0 };
                     for (var idx = 0; idx < countObjVals.length; idx++) {
                         reducedVal.count += countObjVals[idx].count;
                         reducedVal.qty += countObjVals[idx].qty;
                     }
                     return reducedVal;
                  };

3. Define finalize functions

var finalizeFunction2 = function (key, reducedVal) {
                       reducedVal.avg = reducedVal.qty/reducedVal.count;
                       return reducedVal;
                    };

4. Execute mapreduce

db.order_info.mapReduce( mapFunction2,
                     reduceFunction2,
                     {
                       out: { merge: "map_reduce_example_2" },
                       finalize: finalizeFunction2
                     }
                   )

5. View execution results

db.map_reduce_example_2.find()

2.python operation MongoDB example 2

Writing python programs

1. Create a python file named pyinsert.py

vim pyinsert.py

2. Write the following code in pyinsert.py:

from pymongo import MongoClient
from random import randint
'''Define a list of random name information to generate'''
name1 = ["yang ", "li ", "zhou "]
name2 = [ "chao","hao","gao","qi gao","hao hao","gao gao","chao hao","ji gao","ji hao","li gao","li hao",]
provinces = ["guang dong", "guang xi", "shan dong","shan xi", "he nan"]
'''Connect MongoDB'''
client = MongoClient('localhost', 27017)
db = client.student
sm = db.smessage
sm.drop()   # remove() is deprecated/removed in newer PyMongo; drop the collection to clear old data
'''Cycle to generate student information'''
for i in range(1, 100):
    name = name1[randint(0, 2)] + name2[randint(0, 10)] 
    province = provinces[randint(0, 4)] 
    '''Student Information Document'''
    new_student = { 
        "name": name, 
        "age": randint(1, 30), 
        "province": province, 
        "subject": [ 
            {"name": "chinese", "score": randint(0, 100)}, 
            {"name": "math", "score": randint(0, 100)}, 
            {"name": "english", "score": randint(0, 100)}, 
            {"name": "chemic", "score": randint(0, 100)}, 
        ]}
    print(new_student) 
    '''insert MongoDB data base'''
    sm.insert_one(new_student)
 
print(sm.estimated_document_count())   # count() is deprecated in newer PyMongo

3. Execute py code

Switch to the directory where pyinsert.py is located and execute the following commands:

python3 pyinsert.py #Run Code

4. View the inserted data

mongo # Start mongo shell
use student
db.smessage.findOne()

Query in mongodb shell terminal

1. Query the average age of Guangdong students.

db.smessage.aggregate({$match: {province: "guang dong"}},{$group: { _id: "$province", age:{$avg:"$age"}}})

2. Query the average age of all provinces.

db.smessage.aggregate({$group: { _id: "$province", age:{$avg:"$age"}}})

3. Query the average results of all subjects in Guangdong Province.

db.smessage.aggregate({$match: {province: "guang dong"}},{$unwind: "$subject"},{$group: { _id: {province:"$province",sujname:"$subject.name"}, per:{$avg:"$subject.score"}}})

4. Sort on the basis of Topic 3.

db.smessage.aggregate({$match: {province: "guang dong"}},{$unwind:"$subject"},{$group:{ _id:{province:"$province",sujname:"$subject.name"}, per:{$avg:"$subject.score"}}},{$sort:{per:1}})
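What the $match/$unwind/$group pipeline in query 3 computes can be checked in plain Python against a few hand-made student documents (field names follow the script above):

```python
from collections import defaultdict

def avg_score_per_subject(students, province):
    """Mimic $match on province, $unwind on subject, $group with $avg."""
    scores = defaultdict(list)
    for s in students:
        if s["province"] != province:        # $match
            continue
        for subj in s["subject"]:            # $unwind: one row per array element
            scores[subj["name"]].append(subj["score"])
    return {name: sum(v) / len(v) for name, v in scores.items()}  # $avg

students = [
    {"province": "guang dong", "subject": [{"name": "math", "score": 80},
                                           {"name": "english", "score": 60}]},
    {"province": "guang dong", "subject": [{"name": "math", "score": 100}]},
    {"province": "he nan", "subject": [{"name": "math", "score": 0}]},
]
print(avg_score_per_subject(students, "guang dong"))
```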

3.python operation MongoDB example 3

Writing python programs

1. Create a python file named pybbs.py

vim pybbs.py

2. Write the following code in pybbs.py:

from pymongo import MongoClient
from random import randint 

name = [
    'yangx',
    'yxxx',
    'laok',
    'kkk',
    'ji',
    'gaoxiao',
    'laoj',
    'meimei',
    'jj',
    'manwang',
]

title = [
    '123',
    '321',
    '12',
    '21',
    'aaa',
    'bbb',
    'ccc',
    'sss',
    'aaaa',
    'cccc',
]

client = MongoClient('localhost', 27017)
db = client.test
bbs = db.bbs
bbs.drop()

for i in range(1, 10000):
    na = name[randint(0, 9)]
    ti = title[randint(0, 9)]
    newcard = { 'author': na, 'title': ti,}
    bbs.insert_one(newcard)

print(bbs.estimated_document_count())

Note: The code in pybbs.py changes two places compared with the earlier examples.

The original remove() and count() methods are deprecated (and removed in newer PyMongo versions), so remove() was changed to drop(), which deletes the collection, and count() was changed to estimated_document_count(), which serves the same purpose.

3. Execute py code

Switch to the directory where pybbs.py is located and execute the following commands:

python3 pybbs.py #Run Code

4. View the inserted data

mongo # Start mongo shell
use test
db.bbs.findOne()

Query in mongodb shell terminal

1. Query the author of each record.

db.bbs.aggregate({"$project":{"author":1}})

2. Grouping author names.

db.bbs.aggregate({"$group":{"_id":"$author","count":{"$sum":1}}})

3. Sort on the basis of Title 2.

db.bbs.aggregate({"$group":{"_id":"$author","count":{"$sum":1}}},{"$sort":{"count":-1}})

4. Limit the output to 5 based on Topic 3

db.bbs.aggregate({"$group":{"_id":"$author","count":{"$sum":1}}},{"$sort":{"count":-1}},{"$limit":5})

6. Core Components

Core Component - Mongod

  • mongod: This program handles all data requests, manages data formats, and performs operations for background management.
  • When mongod runs without any parameters, it uses the default data directory /data/db and listens on the default port 27017 for socket requests.

Mongodb Launch Command mongod Parameter Description
--port arg       # Specify the service port number; default 27017
--bind_ip arg    # Bind a service IP; if 127.0.0.1 is bound, the server can only be accessed locally. If unspecified, all local IPs are bound
--logpath arg    # Specify the MongoDB log file; note that this must be a file, not a directory
--logappend      # Append to the log file instead of overwriting it
--fork           # Run MongoDB as a daemon, creating a server process
--auth           # Enable authentication
--cpu            # Periodically show CPU utilization and CPU iowait
--dbpath arg     # Specify the database path

These parameters can be written to the mongod.conf configuration document, for example:

dbpath = /data/mongodb

logpath = /data/mongodb/mongodb.log

logappend = true

port = 27017

fork = true

auth = true

Core Component - Mongo

Mongo: Provides an interactive JS API for developers to perform test queries and operations directly on databases, and can also be used by system administrators to effectively manage databases.

Core Component - Mongos

Mongos: Slices used for MongoDB. It is equivalent to a routing service that handles query requests from the application tier and determines where the requested data is located in the fragmented cluster set.

mongod.lock file and oplog file

When mongod starts, it creates a mongod.lock file in the data directory. On a clean shutdown this file is cleared; after an unclean shutdown it is left behind, and mongod will refuse to start the next time, which protects you from running on a possibly inconsistent copy of the data. Do not simply delete this file to get past the error: if the data files are corrupted, deleting the lock only hides the problem. If mongod.lock prevents mongod from starting, repair the data files instead of deleting the lock. The mongod.lock file stores the process ID of the running mongod.

"Normal Exit" is mentioned here and described in detail as follows:
MongoDB provides several commands to shut down services, specifically as follows:

  • Close with Ctrl+C

In the foreground terminal, type Ctrl+C to close.
Note: if the MongoDB service was started in foreground mode, Ctrl+C shuts it down. The shutdown waits for any operations in progress to complete, so it is still a clean shutdown.

  • Close with database command
[mongo@redhatB data]$ mongo
> use admin;
> db.shutdownServer();
  • Close with mongod command
[mongo@redhatB data]$ mongod  --shutdown  --dbpath /database/mongodb/data/

Note: The shutdown option of the mongod command cleanly shuts down the MongoDB service.

  • Use kill Command

    • View mongo related processes

      [mongo@redhatB data]$ ps -ef | grep mongo
      root     17573 14213  0 05:10 pts/1    00:00:00 su - mongo
      mongo    17574 17573  0 05:10 pts/1    00:00:00 -bash
      mongo    18288     1  0 06:12 ?        00:00:00 mongod -f /database/mongodb/data/mongodb_27017.conf
      mongo    18300 17574  6 06:13 pts/1    00:00:00 ps -ef
      mongo    18301 17574  0 06:13 pts/1    00:00:00 grep mongo
      
    • kill mongo service process

      [mongo@redhatB data]$ kill 18288
      [mongo@redhatB data]$ ps -ef | grep pmon
      mongo    18304 17574  0 06:13 pts/1    00:00:00 grep pmon
      

      Note: You can use the operating system's kill command to send SIGINT or SIGTERM to the mongod process, i.e. "kill -2 PID" or "kill -15 PID".
      "kill -9 PID" is not recommended, because if mongod is running without journaling enabled it may cause data loss.

In MongoDB, the oplog is the collection that drives master-slave data synchronization. In the local database (run show collections there), you will find the oplog.rs collection. An oplog entry looks like:
{ ts : ..., op: ..., ns: ..., o: ..., o2: ... }
The replication mechanism uses these entries to synchronize nodes and keep data consistent between them. The fields are:

  • ts:8-byte timestamp, represented by a 4-byte unix timestamp + 4-byte self-incremental count. When a new primary is selected, such as when the master is down, the secondary with the largest ts is selected as the new primary.
  • op:1 byte operation type, such as i for insert and d for delete.
  • ns: The namespace where the operation is located.
  • o: The document corresponding to the operation, that is, the content of the current operation (such as the fields and values to be updated during the update operation)
  • o2: where condition when updating is performed, this property is only available for update

Where op, can be one of several situations:
"i": insert
"u": update
"d": delete
"c": db cmd
"db": Declares the current database (ns is set to the database name followed by '.')
"n": no op, which is an empty operation, is executed periodically to ensure timeliness
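What "applying the oplog" means can be sketched by replaying entries with op types i/u/d against a plain dict. This is a toy applier, not the real replication code:

```python
def apply_oplog(store, entry):
    """Replay one oplog entry against a {_id: document} dict."""
    op, o = entry["op"], entry["o"]
    if op == "i":                       # insert
        store[o["_id"]] = o
    elif op == "u":                     # update: o2 holds the where condition
        target = entry["o2"]["_id"]
        store[target].update(o.get("$set", o))
    elif op == "d":                     # delete
        store.pop(o["_id"], None)
    return store

store = {}
apply_oplog(store, {"op": "i", "ns": "test.c", "o": {"_id": 1, "x": 1}})
apply_oplog(store, {"op": "u", "ns": "test.c", "o": {"$set": {"x": 2}}, "o2": {"_id": 1}})
apply_oplog(store, {"op": "i", "ns": "test.c", "o": {"_id": 2, "x": 9}})
apply_oplog(store, {"op": "d", "ns": "test.c", "o": {"_id": 2}})
print(store)   # {1: {'_id': 1, 'x': 2}}
```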

7. Copying

Target of replication

Ensure data redundancy and reliability in production deployments, and ensure data is not lost due to single point of damage by saving copies on different machines. Be able to cope with the risk of data loss and machine damage at any time.
In other words, it also improves the reading ability, the user's reading server and writing server are in different places, and different servers serve different users, increasing the load on the whole system.

Simply put, we need to achieve the following goals:
1. Failover
2. Redundancy (data redundancy)
3. Avoid single points for disaster recovery, report processing, and data availability
4. Separate reading from writing to share reading pressure
5. System Maintenance Upgrade Transparent to Users

Base for replication

MongoDB has offered two high-availability modes: Replica Sets (replica sets) and Master-Slave (master-slave).
The Master-Slave mode was completely deprecated in MongoDB 3.6; replica sets are the supported mode.

The MongoDB replica set architecture is as follows:

In MongoDB, once you have created a replica set, you can use replication.

If the primary server crashes, the backup server automatically upgrades one of its members to a new primary server.

One primary library and two slave libraries; either slave can be elected as the new primary when the primary goes down.

When the primary goes down, the two slaves hold an election and one of them becomes the primary. When the original primary recovers, it can rejoin the current replication cluster as a slave.

MongoDB Oplog
MongoDB's oplog is the replication medium between Primary and Secondary once replication is established: every write operation on the Primary is recorded in the oplog, and each Secondary pulls the oplog from the Primary and applies the operations to its own copy of the data. The oplog is a collection in MongoDB's local database, and it is a capped collection: fixed in size and recycled. As follows:

Introduction to content and fields in MongoDB Oplog:

{
"ts" : Timestamp(1446011584, 2), #Operation time, current timestamp +counter, counter is reset per second
"h" : NumberLong("1687359108795812092"),#Global Unique Identification of Operation
"v" : 2, #oplog version information
"op" : "i", #Operation type i: Insert operation u: Update operation d: Delete operation c: Execute command (e.g. createDatabase, dropDatabase)
"ns" : "test.nosql",#Collection for which the operation is targeted
"o" : { "_id" : ObjectId("563062c0b085733f34ab4129"), "name" : "mongodb", "score" : "100" }
#Operation content, if update operation
}

{- How is data synchronized? -}
MongoDB's Replica Sets architecture stores operations in a log called the oplog.
The cluster relies on the oplog for data synchronization.
oplog features:

  • Full name local.oplog.rs, under the local database.
  • oplog is a Capped Collection type (fixed-length set, following FIFO principles).
  • Each document in the oplog represents an operation performed on the primary node. Oplog only records operations that change the state of the database
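The capped, fixed-size, recycled nature of the oplog behaves much like Python's deque with maxlen. This is only a loose analogy: a real capped collection is capped in bytes, not document count:

```python
from collections import deque

oplog = deque(maxlen=3)        # tiny "capped collection": oldest entries fall off
for n in range(5):
    oplog.append({"op": "i", "o": {"_id": n}})

print(list(oplog))             # only the 3 most recent entries survive
```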

{-Oplog content-}

Implement Replica Set - Create Replica Set

1. Start a mongo shell with the nodb option

     $ mongo --nodb

2. Create a replica set replicaSet(3 nodes)

     > replicaSet = new ReplSetTest({"nodes" : 3})

3. Start mongod server

replicaSet.startSet()     // Start three mongod processes
replicaSet.initiate()     // Configure replication capabilities

4. Open a new shell, and in the second shell, connect to the mongod running on port 31000:

conn1 = new Mongo("localhost:31000")
     connection to localhost:31000
     testReplSet:PRIMARY>

5. Connect to test database for conn1 connection

     testReplSet:PRIMARY> primaryDB = conn1.getDB("test")
     test

6. Execute the isMaster command on the connection connected to the primary node to see the state of the replica set:

     testReplSet:PRIMARY> primaryDB.isMaster()
     isMaster returns a number of fields; the important one here shows that this node is the primary ("ismaster" : true)

1. Write on the primary node: insert 1000 documents

 testReplSet:PRIMARY> for (i=0; i<1000; i++) { primaryDB.coll.insert({count: i}) }

2. Check the number of documents in the collection to ensure that the insertion is really successful

 testReplSet:PRIMARY>primaryDB.coll.count()
 1000    //1000 documents, successfully inserted

3. Check that one of the replica set members has copies of those documents that have just been written. Connect to any of the backup nodes:

testReplSet:PRIMARY> conn2 = new Mongo("localhost:31001")
connection to localhost:31001
testReplSet:SECONDARY>secondaryDB = conn2.getDB("test")
test

4. To read data from a backup node, you must first mark reads from secondaries as acceptable:

testReplSet:SECONDARY> conn2.setSlaveOk()

5. You can now read data from this backup node. Use common queries

testReplSet:SECONDARY> secondaryDB.coll.find()
{ "_id" : ObjectId("5037cac65f3257931833902b"), "count" : 0 }
...
testReplSet:SECONDARY> secondaryDB.coll.count()
1000

6. Attempting to write on a backup node fails:

testReplSet:SECONDARY> secondaryDB.coll.insert({"count" : 1001})  //fail

Implement Replica Set - Close Replica Set

Automatic failover:
1. Turn off the primary node first

 testReplSet:PRIMARY>primaryDB.adminCommand({"shutdown" : 1})

2. Execute isMaster on the backup node to see which new primary node is

 testReplSet:SECONDARY> secondaryDB.isMaster()

Close the replica set:
Close the replica set from the first shell:

> replicaSet.stopSet()

8. Sharding mechanism

The concept of sharding

Sharding is the process of splitting data up and storing it on separate machines; it is sometimes called partitioning.
One of the goals of sharding is to build a cluster of 3, 9, or even 3,000 machines that appears to the application as a single server.
MongoDB supports automatic sharding, which removes the need for manual shard management: the cluster automatically partitions the data and balances the load.

A typical MongoDB cluster structure is as follows:

A sharded cluster consists of shards, mongos routers, and configuration servers.

Roles in a sharded cluster

  • Configuration server: a stand-alone mongod process that holds the cluster's shard metadata.
  • Routing server: mongos, which acts as the router that applications connect to.
  • Shard server: an ordinary, stand-alone mongod process (or a replica set) that stores a subset of the data.

Pain points that sharding addresses

  • Applications with high data volume and throughput put heavy pressure on a single machine's performance.
  • Large queries exhaust a single machine's CPU;
  • Large data sets strain a single machine's storage, eventually exhausting system memory and shifting the pressure onto disk I/O.

How sharding works

{- The principle of sharding -}
The basic idea of MongoDB sharding:

  • Split the collection into small chunks. These chunks are scattered across several shards, each of which is responsible for only a portion of the total data.
  • The application does not need to know which shard holds which data, or even that the data has been split, so a routing process called mongos runs in front of the shards. This router knows where all the data is stored.
  • To the application, it looks like an ordinary mongod connection; the router knows the mapping between data and shards.
  • The router gathers the responses from the shards and sends the combined result back to the application.
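The router's lookup can be sketched as a table mapping shard-key ranges to shards (hypothetical ranges; the real chunk metadata lives on the config servers):

```python
# chunk table: (low, high) shard-key range -> shard name, as a router might cache it
chunks = [
    (("a", "m"), "shard0"),    # keys in ["a", "m") live on shard0
    (("m", "{"), "shard1"),    # "{" sorts just after "z" in ASCII
]

def route(name):
    """Pick the shard whose key range contains this shard-key value."""
    for (low, high), shard in chunks:
        if low <= name < high:
            return shard
    raise KeyError(f"no chunk covers {name!r}")

print(route("alice"), route("qiiq"))   # shard0 shard1
```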

{- When is sharding appropriate? -}

Applicable scenarios

  • A single node has insufficient disk.
  • A single mongod can no longer meet the write-performance requirements.
  • You want to keep a large amount of data in memory to improve performance.

Managing shards

{- MongoDB sharding workflow -}

  • Start the configuration server
  • Start mongos
  • Add mongod instances (shards)
  • Enable sharding for the database
  • Shard a collection

{- MongoDB sharding commands -}

  • Open the config server.
       mongod --dbpath D:\sharding\config_node --port 2222
  • Open the mongos server.

    mongos --port 3333 --configdb 127.0.0.1:2222
    
  • Start the mongod server.

    mongod --dbpath E:\sharding\mongod_node1 --port 4444
    mongod --dbpath E:\sharding\mongod_node2 --port 5555 
    
  • Service configuration.

    mongo localhost:3333/admin
    db.runCommand({"addshard":"127.0.0.1:4444",allowlocal:true})
    db.runCommand({"addshard":"127.0.0.1:5555",allowlocal:true})
    
  • Turn on database fragmentation

    db.runCommand({"enablesharding":"test"})
    
  • Specify the shard key for the collection to be sharded

    db.runCommand({"shardcollection":"test.person","key":{"name":1}})
    
  • Insert 100,000 records through mongos, then view how MongoDB distributed the data across the shards with the printShardingStatus command.

The difference between replication and sharding:
Replication gives multiple servers the same copy of the data, each server being a mirror of the others, whereas each shard holds a different subset of the data from the other shards.

9. MongoDB Replica Set and Sharding Deployment Example

content

The main ways to deploy a MongoDB cluster are master-slave, replica set, and sharded modes. Since version 3.6 the master-slave mode has been abandoned, and shards can only be added as replica sets.

This exercise deploys MongoDB's replica set and sharded modes on your own computer; the deployment can use a single virtual machine, a single Windows machine, or a combination of several virtual machines and Windows machines. The steps below use one Linux virtual machine running on one Windows host.

Suppose you deploy two replica sets, ws and us, each containing three members, with each replica set serving as one shard. The two replica sets use data directories and log names w1, w2, w3 and u1, u2, u3 respectively. The structure is as follows:

That is, two members of replica set ws (w1, w2) and one member of us (u1) are placed on the Windows system, and two members of the config replica set (which holds the configuration metadata) are also placed on Windows. Each machine runs one mongos router.

Replica Set Deployment

1. Determine the IP of windows and linux systems to ensure connectivity

Open the cmd interface in windows, enter ipconfig, and enter ifconfig in Linux to see the IP separately. You can see whether the IP on both sides is connected by the ping command

2. Copy or create a configuration file, or specify the parameters directly at startup

On Linux the configuration file is /etc/mongod.conf. On Windows, after installation, look for the mongod.cfg file in the bin directory under the installation directory.

3. Create corresponding data and log directories

Create the data and log directories at the paths given in the configuration file. Only the directories need to exist, not the files: create folders directly in Windows, and use the mkdir command in Linux.

4. Start replica sets ws and us, respectively

Start command:

mongod -f <config file path> --replSet ws (or us) --shardsvr

or: mongod --config <config file path> --replSet ws (or us) --shardsvr

For example: mongod -f D:\mongodb\config\w1.cfg --replSet ws --shardsvr

Notes:

1. On Windows, you must add the bin directory under the installation directory to the PATH environment variable before you can call mongod directly; otherwise, cd into the bin directory first or call it by its full path

2. --replSet is followed by the replica set name, which must be identical for every member of the same replica set

3. --shardsvr means the process starts as a shard server; it can also be added when deploying the shard servers
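For reference, here is a minimal sketch of what a member configuration file such as w1.cfg might contain in YAML form. The paths and bind address are placeholders to adapt to your own setup; replication.replSetName and sharding.clusterRole replace the --replSet and --shardsvr flags:

```yaml
systemLog:
  destination: file
  path: D:\mongodb\log\w1.log
  logAppend: true
storage:
  dbPath: D:\mongodb\data\w1
net:
  bindIp: 0.0.0.0
  port: 27011
replication:
  replSetName: ws        # same value for every member of the ws set
sharding:
  clusterRole: shardsvr  # equivalent to passing --shardsvr
```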

5. Initialize the two replica sets, connecting to one server in each

For example, to initialize the replica set, connect to port 27011

mongosh 192.168.67.1:27011

or: mongosh --host 192.168.67.1 --port 27011

You can also connect with the mongo command. In the second form, --host specifies the IP of the database to connect to and --port specifies the port.

After entering the shell, initialize the set with the rs.initiate() method; the current server then becomes the primary member. Other members can be added to the replica set with rs.add('IP:PORT'). Here we add the port 27012 service on Windows and the port 27013 service on Linux to the ws replica set.

rs.add('192.168.67.1:27012')

rs.add('192.168.67.131:27013')

Once added, the other two servers are automatically initialized as secondary members. You can view the status and membership of the current replica set with the rs.status() method.

6. Testing copy set replication

Check whether the server you are adding data to is the primary; the shell prompt shows whether the member is primary or secondary. Add some data on the primary, then open the shell on the other members to verify that the data was replicated.

Note: By default, read and write requests to a replica set go through the primary member. If you want to read data from a secondary, it must be configured:

1. On a secondary member, append the readPref() method after find() to read data from the secondary. It must be called on every query, e.g. db.col.find().readPref('secondary')

2. If you do not want to call readPref() every time, run db.getMongo().setReadPref('secondary') once on the secondary; after that you can query by calling find() directly

The parameter to the readPref() and setReadPref() methods can be one of the following:

1. primary: the primary node; the default mode. Reads go only to the primary; if the primary is unavailable, an error is raised or an exception thrown.

2. primaryPreferred: prefer the primary. Reads go to the primary in most cases; if the primary is unavailable (for example during failover), reads go to a secondary.

3. secondary: secondaries only. Reads go only to secondary nodes; if no secondary is available, an error is raised or an exception thrown.

4. secondaryPreferred: prefer secondaries. Reads go to a secondary in most cases; in special cases (such as a deployment with only a primary), reads go to the primary.

5. nearest: the member with the lowest network latency, which may be the primary or a secondary.
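The fallback rules above can be sketched as a small function. This is only an illustration, not the real driver algorithm (actual drivers also weigh latency, tags, and staleness), and the function name and shape are hypothetical:

```javascript
// Sketch of read-preference member selection (simplified).
// primary: the primary member, or null if none is available.
// secondaries: array of currently available secondary members.
function pickMember(mode, primary, secondaries) {
  switch (mode) {
    case 'primary':            // primary only; error if unavailable
      if (!primary) throw new Error('no primary available');
      return primary;
    case 'primaryPreferred':   // primary first, fall back to a secondary
      return primary !== null ? primary : secondaries[0];
    case 'secondary':          // secondaries only; error if none
      if (secondaries.length === 0) throw new Error('no secondary available');
      return secondaries[0];
    case 'secondaryPreferred': // secondary first, fall back to the primary
      return secondaries.length > 0 ? secondaries[0] : primary;
    case 'nearest':            // lowest latency wins; first candidate here
      return primary !== null ? primary : secondaries[0];
    default:
      throw new Error('unknown read preference: ' + mode);
  }
}

// Example: primary down during failover, primaryPreferred falls back.
console.log(pickMember('primaryPreferred', null, ['192.168.67.1:27012']));
// -> '192.168.67.1:27012'
```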

Sharded Cluster Deployment

1. Make sure that when you deployed the replica sets in the previous step, you added the --shardsvr parameter at startup so they run as shard servers

Since version 3.6, a shard can only be a replica set. If you need to deploy a single server as a shard, configure it as a single-member replica set and start it as a shard server

2. Deploy the config servers

This works like the replica set deployment: changing --shardsvr to --configsvr at startup starts the process as a config server. Here we use ports 27031, 27032, and 27033, with the replica set name config

3. Start the mongos service

mongos is a routing server: it reads the metadata on the config servers that maps data to shards, making the whole cluster look like a single database.

mongos --configdb config/192.168.67.1:27031,192.168.67.1:27032,192.168.67.131:27033 --logpath <log path> --logappend --port 27017

Notes:

1. Note that this starts mongos, not mongod. mongos does not need a dbpath configured because it stores no data.

2. logpath and the other parameters can also be placed in a configuration file

3. --port specifies the mongos port (27017 by default if not specified), and the --configdb parameter specifies the list of config servers mongos uses

4. If no bind IP is configured, you can connect through 127.0.0.1

5. mongos itself does not need to be deployed as a replica set. It can also be started with the log file and port placed in a configuration file:

mongos --configdb config/192.168.67.1:27031,192.168.67.1:27032,192.168.67.131:27033 -f <config file path>
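Equivalently, the mongos settings above can be sketched as a configuration file (the log path is a placeholder; sharding.configDB replaces the --configdb flag):

```yaml
systemLog:
  destination: file
  path: D:\mongodb\log\mongos.log
  logAppend: true
net:
  port: 27017
sharding:
  configDB: config/192.168.67.1:27031,192.168.67.1:27032,192.168.67.131:27033
```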

4. Add the shards

Connect to the mongos port via mongo or mongosh to log in to the routing server's shell

mongosh --host 192.168.67.1 --port 27017

Add a shard: sh.addShard('us/IP address list')

or: db.runCommand({addShard: 'us/IP address list', name: 'us_shard', maxSize: 10240})

Notes:

1. The runCommand() form lets you specify the shard's name and the maximum amount of data it stores (maxSize, in MB)

2. The IP list does not need to include every member; one IP is enough, and the remaining replica set members are discovered automatically

5. Set up data sharding

By default, MongoDB does not shard data automatically. To shard data, first enable sharding on the database, then shard the collections inside it. A collection that is not sharded is stored entirely on the database's primary shard.

Enable sharding on a database: sh.enableSharding("test")

Shard the collection col in the test database: sh.shardCollection('test.col', {_id: 1})

The first parameter is the database and collection name, and the second is the shard key used for splitting, similar to a primary key; the data is split into chunks by ranges of the shard key.
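The range split can be pictured with a toy routing table. This is a sketch of the idea only; the chunk boundaries and shard names below are made up, and the real chunk metadata lives in the config servers:

```javascript
// Toy model of range-based chunk routing on a string shard key.
// '' and '\uffff' stand in for MongoDB's MinKey/MaxKey sentinels.
const chunkTable = [
  { min: '',  max: 'm',      shard: 'ws' }, // keys before 'm' live on shard ws
  { min: 'm', max: '\uffff', shard: 'us' }, // keys from 'm' on live on shard us
];

// A chunk owns the half-open shard-key range [min, max).
function routeToShard(key) {
  const chunk = chunkTable.find(c => c.min <= key && key < c.max);
  if (!chunk) throw new Error('no chunk covers key: ' + key);
  return chunk.shard;
}

console.log(routeToShard('alice')); // 'ws'
console.log(routeToShard('zoe'));   // 'us'
```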

6. Test sharding

The default chunk size is 64 MB. After connecting to mongos, you can configure the chunk size in the config database with the following commands

use config

db.settings.insertOne({_id: 'chunksize', value: 1})

Notes:

1. If the settings document already exists, update it with the update method instead

2. Setting _id to 'chunksize' indicates that this document configures the chunk size; the value is in MB and can range from 1 to 1024

Add many documents to the collection. You can then view the shard distribution with sh.status(), or connect to each replica set directly via mongo or mongosh to verify that each shard holds only a subset of the data
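As a rough sanity check for this test, you can estimate how many chunks the inserted data should occupy (the average document size here is an assumption for illustration):

```javascript
// Back-of-the-envelope chunk estimate after lowering chunksize to 1 MB.
const docCount = 100000;            // records inserted through mongos
const avgDocBytes = 1024;           // assumed ~1 KB per document (hypothetical)
const chunkBytes = 1 * 1024 * 1024; // chunk size configured above: 1 MB

const chunkCount = Math.ceil((docCount * avgDocBytes) / chunkBytes);
console.log(chunkCount); // 98 -- so sh.status() should report many chunks
```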

Error Summary

Error 1: "Error parsing YAML config file: yaml-cpp: error at line 2, column 13: illegal map value"

Reason: since MongoDB 3.0, the configuration file is in YAML format. The format is simple: ':' separates keys from values, and nesting is expressed with space indentation. Note that when a value follows ':', the colon must be followed by a space; when a key only introduces a nested level, nothing follows the colon (e.g. after systemLog: no space is required). Indentation must be consistent per level, for example four spaces at the first level, eight at the second, and so on, with no indentation at the top level. If the format is wrong, the error above occurs

Solution:

When you modify a configuration file such as mongod.cfg, fix the indentation so it is consistent per level, for example four spaces at the first level, eight at the second, and so on.
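For example, one common way to trigger this error is putting a nested key on the same line as its parent; the fix is to move it to its own indented line:

```yaml
# Wrong: two "key: value" pairs on one line -> "illegal map value"
systemLog: destination: file

# Right: the nested key goes on its own indented line
systemLog:
  destination: file
```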

Error 2: mongosh command unavailable because mongosh is not installed

Solution:

Install mongosh

mongosh, MongoDB's shell tool, is a full-featured JavaScript and Node.js 14.x REPL for interacting with MongoDB deployments. It lets us query and manipulate databases directly. It must be installed separately after MongoDB itself; download the build for your platform (Ubuntu/Windows/...) from the MongoDB download page:

  • https://www.mongodb.com/try/download/shell?jmp=docs

Tip 1: See which process is occupying a port on Windows

1. Open a command window (run as an administrator)

Start -> Run -> cmd

2. Find all running ports

Enter command:

netstat -ano

Tip 2: Check port usage on Ubuntu with the netstat command

View all connections and listening ports (LISTEN, ESTABLISHED):

netstat -a

Also show the program that owns each socket:

netstat -ap

View the specified port, which can be combined with the grep command:

netstat -ap | grep 8080

You can also use the lsof command:

lsof -i:8888

To kill the program occupying the port, run kill with the corresponding PID:

kill -9 <PID>

ps: kill sends a signal to a process ID. By default it sends SIGTERM, while kill -9 sends SIGKILL. SIGKILL cannot be caught or blocked by the process, so kill -9 reliably kills it.
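The difference between the two signals is easy to observe: a process terminated by SIGTERM exits with status 128 + 15 = 143. A quick demonstration using a throwaway sleep process:

```shell
# Start a throwaway background process, then terminate it with SIGTERM
# (the default signal that plain `kill` sends).
sleep 100 &
pid=$!
kill -TERM "$pid"
wait "$pid"
status=$?
echo "exit status: $status"   # 128 + 15 (SIGTERM) = 143
```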

Tags: Database MongoDB Ubuntu Data Warehouse

Posted on Fri, 26 Nov 2021 15:12:50 -0500 by Cugel