NATIONAL UNIVERSITY HOCHIMINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
ADVANCED PROGRAM IN INFORMATION SYSTEMS

NGUYEN TAN THANH - 15520813

BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS

THESIS ADVISOR
Assoc. Nguyen Dinh Thuan

HO CHI MINH CITY, 2021

Introduction

Today, with technology booming and the Internet succeeding, the number of users accessing the same system keeps increasing. Facebook, for example, serves more than 1,000 billion page views a month and more than 800 million visitors, which gives a sense of how large the information explosion is. To cope with this explosion, server systems have been expanded to very large sizes and divided into many clusters located all over the world.
However, with data currently growing at an exponential rate, increasing the number of servers is not enough. Storage solutions need to be reviewed and upgraded for the future. The demands on the database servers are very high, and servers that cannot keep up will be overloaded. For systems operating at a scale of millions to billions, good performance is a must.
In today's RDBMS systems, performance at this scale is often poor. SQL is an interpreted language, and the constraints defined on tables make the database sluggish when it runs on such a large system. In addition, data distribution and data integrity are very important issues for large systems. NoSQL is designed to address these requirements. Because it avoids SQL query processing and offers high availability, a high degree of distribution, and good stability, NoSQL is suitable for systems with a large number of queries.

In this thesis, I study a fairly popular NoSQL database, MongoDB. MongoDB is an open source database. It offers a flexible data model to meet the requirements of real-world systems.
MongoDB allows developers to build high-performance, low-latency applications quickly and efficiently. MongoDB deserves to be regarded as a reliable database.

Acknowledgment

We would like to express our deep thanks to Mr. Nguyễn Đình Thuân, who helped the group and made it possible to complete this graduation thesis well. He devoted himself to guiding us and gave extremely valuable comments that made the topic more and more complete. His suggestions helped us approach, understand, and solve the problem more easily.

We would also like to express our gratitude to the teachers of the University of Information Technology - National University of Ho Chi Minh City, especially the teachers of the Faculty of Software Technology, who have dedicated themselves to imparting knowledge and experience to us since our first days at the school. Their enthusiasm has given us a solid foundation of knowledge as well as valuable practical experience, so that we can successfully complete our tasks of study, work, and research.

We also thank our family, brothers, sisters, and friends for encouraging and helping us a great deal in our studies as well as in life.
Ho Chi Minh City, 2021
Student
Nguyen Tan Thanh

PROFESSOR'S COMMENTS

TEACHER'S COMMENTS

CHAPTER 1: INTRODUCTION
    1.1 The science and novelty of the topic
    1.2 Reasons for choosing topics
    1.3 Research objectives
    1.4 Subject of study
    1.5 Scope of the study
    1.6 Method of implementation
    1.7 Expected results achieved
    1.8 Expected structure of the thesis
CHAPTER 2: OVERVIEW OF NOSQL DATABASES
    NoSQL storage architecture
    Some characteristics
    Archiving architecture
    Base and ACID
    Final consistency
    Multi-version concurrency control (MVCC)
    Scalability and performance
    2.5 Features of NoSQL database
    2.6 Difference between NoSQL and SQL
    2.7 Advantages and disadvantages
CHAPTER 3: MONGODB ADMINISTRATION SYSTEM
    Development history of MongoDB
    Characteristics of MongoDB
    Database and collection
    Fields and data types
    Design the data model
    Embedded data model
    Reference data model
    Model of relationships between documents
    Compare index types
    Index properties
    The replica set structure
    Fragmentation in MongoDB
    Balanced data distribution
    Configuration steps
CHAPTER 4: COMPARISON OF TWO DBMS MANAGEMENT SYSTEMS MONGODB AND SQL SERVER
    Main differences
    Syntax differences
CHAPTER 5: CONCLUSION
    Some results achieved
    In terms of comparative experiment

CHAPTER 1: INTRODUCTION

As the Covid epidemic spreads all over the world, all industries are tending to go digital.
The impact of IT on human society is enormous. The development and application of the Internet have changed business models and the way business is done. At the same time, social networks allow users to create content freely, so data such as images, blogs, and social media updates is growing rapidly. Every day, electronic documents, music, and video files are created at a rapid rate.
All types of information, including audio and video data, can be digitized so that any computer can store, process, and forward them to many people. As a result, data is increasing very quickly and exceeding the processing limits of traditional database management systems. Storing and exploiting this huge amount of data to filter out useful information is one of the biggest challenges facing modern society.

1.1 The science and novelty of the topic

Current relational database management systems (RDBMSs) show weaknesses in tasks such as indexing large amounts of data, paging, and distributing media streams (movies, pictures, music, etc.). Relational databases are designed for data models that are not too large, whereas social networking services hold a huge amount of data that is constantly updated by a large number of users.
NoSQL databases are especially suitable for extremely large applications (search services, social networks, and so on) as well as small ones. They minimize computation and combine read-write tasks with batch processing in order to meet the data processing requirements of social networking services. Such a database system can store and process data volumes ranging from very small up to petabytes, with high fault tolerance and load capacity, while requiring only modest hardware resources.
To date, there are about 150 NoSQL databases. Each has its own characteristics and is typically used for different kinds of projects. In this thesis, I study the MongoDB database management system in detail and compare it with a traditional database management system, SQL Server. MongoDB is an open source database system developed and supported by 10gen, and a leading NoSQL database used by millions of people. MongoDB is often used for mid-range and large applications, typically social networking sites.
Currently, MongoDB is used by a number of big companies such as MTV Networks, Craigslist, and Foursquare. MongoDB is developing well and is used in many technology projects today. As an open source database system, MongoDB has great development potential.

Performance on MySQL and Cassandra, Facebook Search, > 50 GB of data (Source: itzone.vn)

                  MySQL      Cassandra
Writes average    ~300 ms    0.12 ms
Reads average     ~350 ms    15 ms

1.2 Reasons for choosing topics

Because of the massive data explosion in recent years, and especially the unstructured nature of that data, non-relational database technologies like MongoDB have become useful for solving problems that traditional relational databases have encountered. Given these needs, I decided to choose the thesis topic "Compare MongoDB and SQL Server". This topic is highly relevant to the development of information technology. The thesis provides new insight into a trending database and helps in developing applications with medium or high data volumes, especially social networking sites.

1.3 Research objectives

Find out the features of NoSQL and the differences between NoSQL and SQL. Understand the storage architecture of NoSQL, the types of NoSQL databases, and the pros and cons of NoSQL.
Based on the specific characteristics of NoSQL management systems, choose the NoSQL system that suits each application. Learn MongoDB and compare it with the SQL Server database management system, including the features, schema design, indexing, replication, and querying of MongoDB. The comparison aims to show the differences between a non-relational database management system (MongoDB) and a relational database management system (SQL Server).

1.4 Subject of study

Give an overview of NoSQL databases, then test, compare, and evaluate NoSQL against the traditional relational database model (SQL Server).
Learn about MongoDB and compare it with a relational database management system, SQL Server. The comparison criteria include theoretical and experimental differences.

1.5 Scope of the study

Learn about NoSQL and the MongoDB database management system, and compare MongoDB with SQL Server both theoretically and experimentally. Theoretically, the two systems are compared on features, limitations, computability, integrity, distribution, system requirements, and architecture. Experimentally, the performance of MongoDB is compared with that of SQL Server on the common database operations: Select, Update, Delete, and Insert.

1.6 Method of implementation

- Learn about NoSQL databases, especially the MongoDB database management system, and the traditional SQL Server database management system.
- Run queries such as Select, Update, Delete, and Insert on datasets with different numbers of records and compare their execution time on each database system.
- Conclusion and development direction

1.7 Expected results achieved

Understand an overview of NoSQL and MongoDB, then test, compare, and evaluate MongoDB against the traditional relational database model (SQL Server) on different datasets, so that the appropriate database management system can be chosen in each specific case.

1.8 Expected structure of the thesis

- Chapter 1: Overview
- Chapter 2: NoSQL database
- Chapter 3: MongoDB database management system
- Chapter 4: Results of running experiments comparing two database management systems, MongoDB and SQL Server
- Chapter 5: Conclusion and development direction

CHAPTER 2: OVERVIEW OF NOSQL DATABASES

2.1 NoSQL

NoSQL (Not Only SQL) is a new generation of databases that do not use the relational data model to manage data. NoSQL is schema-free. It is designed for distributed data storage with very large amounts of data, up to petabytes.
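To make the term "schema-free" concrete, the following is a minimal sketch in Python. It assumes that documents are modeled as plain dictionaries held in an in-memory list. The names users, field_names, and find are hypothetical helpers introduced only for this illustration and are not part of MongoDB or any other database product.

```python
import json

# Hypothetical in-memory "collection": a plain list of dict documents.
# This only illustrates the schema-free idea; it is not a database client.
users = []

# The first document has three flat fields.
users.append({"_id": 1, "name": "An", "email": "an@example.com"})

# The second document carries different fields, including a list and a
# nested object, and no table definition has to be altered to store it.
users.append({
    "_id": 2,
    "name": "Binh",
    "phones": ["0901", "0902"],
    "address": {"city": "Ho Chi Minh City"},
})


def field_names(collection):
    """Return the sorted union of top-level field names over all documents."""
    names = set()
    for doc in collection:
        names.update(doc.keys())
    return sorted(names)


def find(collection, **criteria):
    """Return the documents whose top-level fields equal every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    print(field_names(users))
    print(json.dumps(find(users, name="Binh"), indent=2))
```

In a relational table, the second record would require new columns or additional tables. Here each document simply carries its own fields, and a query only matches the documents that contain the requested field.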
According to Eric Evans, "The focus of NoSQL is to solve problems that RDBMS cannot solve" [21]. NoSQL is a new generation of databases with outstanding features: it is non-relational, distributed, open source, horizontally scalable, and schema-free, and it has a simple API. It can store and process data from very small amounts up to petabytes in a system with high load, high fault tolerance, and real-time response [1] [2] [17]. NoSQL encompasses a wide range of database technologies that have been developed in response to the need for greater data volume, access frequency, and processing efficiency, along with economical storage.
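The "distributed" and "horizontally scalable" features can be illustrated with a small sketch in Python. It assumes the simplest possible placement rule, a stable hash of the record key taken modulo the number of nodes. The functions node_for and distribute are hypothetical names for this illustration, and the rule is not claimed to be the one used by any particular NoSQL system.

```python
import hashlib


def node_for(key, node_count):
    """Map a record key to a node index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def distribute(keys, node_count):
    """Group keys into one bucket per node according to node_for."""
    nodes = [[] for _ in range(node_count)]
    for key in keys:
        nodes[node_for(key, node_count)].append(key)
    return nodes


if __name__ == "__main__":
    # Hypothetical record keys, spread over three nodes.
    keys = [f"user:{i}" for i in range(12)]
    for index, bucket in enumerate(distribute(keys, 3)):
        print(index, bucket)
```

Adding a node spreads the same keys over more machines, which is the idea behind scaling out rather than buying a larger server. One known drawback of plain modulo placement is that most keys move to a different node when the node count changes, which is why production systems commonly rely on range-based or consistent-hashing schemes instead.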
Relational databases are not designed to meet the challenges of storage scale and agility posed by these modern applications [17].

Figure 2.1: Chart of data growth [Note: UNECE research source] (X axis: year; Y axis: global data in zettabytes, 1 ZB = 1,000^7 bytes)

2.2 History

NoSQL has many motivations for its development, but it is not a completely new thing. The term "NoSQL" was used by Carlo Strozzi in 1998 as the name of the file-based database he was working on. It was the name of a lightweight open source relational database that did not use SQL for queries.
It was a relational database without an SQL interface. As such, it is not really part of today's NoSQL movement [12] [1] [3]. The term re-emerged in 2009 when Eric Evans, a Rackspace employee and a committer on the Cassandra project, reintroduced the term NoSQL when Last.