Workshop Introducing SQL: A Foundation of Data Analytics Robb Sombach University of Alberta Alberta School of Business 1 Agenda • Introduction • SQL • Why SQL? • Data Definition • What about Python? R? Language (DDL) • Data Analytics • Exercise 2 • Data Manipulation • Relational Database Language (DML) • What is a database? • Exercise 3 • Terminology • SQLite • Open Data Portal • Exercise 1 • How I prepared for today 2 Robb Sombach • Work Experience • 15+ years working in the IT industry • 10+ years Self-Employed IT Consultant • IT Positions • Systems Analyst / Business Analyst • Database Administrator (Oracle / SQL Server) • Network Administrator • Developer 3 Robb Sombach • Teaching Experience • 5 years teaching at NAIT • Computer Systems Technology (CST) • Digital Media and Information Technology (DMIT) • 6+ years teaching at University of Alberta • Technology Training Centre • Alberta School of Business 4 Resources All Workshop files can be downloaded here http://bit.ly/odd_2019 5 Introduction Workshop Introducing SQL: Foundation of Data Analytics 6 Goals • Introduce relational database concepts • Provides hands-on, real world database experience using data from the City of Edmonton Open Data Portal • Foster a collaborative workshop • Please interupt and ask questions 7 Why SQL? • Simple • Accessible • Applicable • Powerful • Pervasive • Valuable • Universal 8 Why not Python? R? • Difficult for beginners • Complicated syntax • Requires programming knowledge (logic, algorithms) • Is SQL better than Python or R? • SQL is good for some things • Python/R is good for other things • Compliment each other • SQL is a great starting point 9 Data Analytics • Analytics is the discovery, interpretation, and communication of meaningful patterns in data; and the process of applying those patterns towards effective decision making • Organizations may apply analytics to business data to describe, predict, and improve business performance • https://en.org/wiki/Analytics 10 Relational Database Workshop Introducing SQL: Foundation of Data Analytics 11 What is a database? • A relational “database” management system (RDBMS) organizes data • The logical structure of the database is based upon the information needs of an organization • Entities (“things” of interest to the organization), AND • Relationships (how the Entities are associated with each other) 12 Advantages of a RDBMS • Establish a centralized, logical view of data • Minimizes data duplication (i. “redundancy”) • Promote data accuracy and integrity • Capacity of database • Superior multi-user or concurrent access • Security • Retrieve information quickly https://www.com/blog/advantages-database-development-business/ • Inter-operability 13 Database Terminology • Table, Entity, Relation, (similar to an Excel Worksheet) • Row, Record, Instance • Column, Field, Attribute • Primary Key – unique and mandatory • Foreign Key – a cross- reference between tables because it references the primary key of another table • Relationship – created though foreign keys 14 How to introduce SQL? • Microsoft Access • https://products.com/en- ca/access • Microsoft SQL Server • https://www.com/en- us/sql-server/sql-server-2017 • MariaDB, MySQL • https://mariadb.org/ • https://www.com/ • Postgresql • https://www.org/ • Oracle • https://www.com/database/ • Hadoop, Spark, Hive, Pig • https://hadoop.org/ 15 A database that … • Has full-featured SQL • Aviation-grade quality and • Has billions and billions of testing deployments • Zero-configuration • Is a single-file database • Has ACID (Atomic, • Has public domain source Consistent, Isolated, and code Durable) transactions, even after power loss • Small footprint • Has a stable, enduring file • Has a max DB size of 140 format terabytes • Is has extensive, detailed • Has a max row size of 1 documentation gigabyte • Has long-term support (to • Is faster than direct file the year 2050) access https://www.html 16 SQLite • “SQLite is the most widely deployed database in the world with more applications than we can count, including several high-profile projects” • https://www.html • “SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine” • https://www.html • Perfect for learning SQL (the foundation of data analytics) 17 Exercise 1: Download and Run SQLite BD Browser • Download SQLite • Download SQLite DB Browser Portable • https://sqlitebrowser.org/dl/ 18 Exercise 1: Download and Run SQLite • Extract the ZIP archive to the Desktop • Start SQLite • SQLiteDatabaseBrowserPortable.exe • Create a New database • open_data_day_2019.db • Save the database in the Data folder • Click Cancel when prompted to create a table • Done! 19 Exercise 1: Completed 20 SQL Workshop Introducing SQL: Foundation of Data Analytics 21 What is SQL? • SQL stands for Structured Query Language • SQL is pronounced S-Q-L or sequel • SQL is a standard language for managing, manipulating and querying databases • Developed at IBM in the early 1970’s • In 1986, ANSI and ISO standard groups officially adopted the standard “Database Language SQL” definition • Most SQL databases have their own proprietary extensions in addition to the SQL standard • SQL is the language used to ask questions (query) of a database which will return answers (results) 22 Why is SQL the foundation of Data Analytics? • Data engineers and database administrators will use SQL to ensure that everybody in their organization has access to the data they need • Data scientists will use SQL to load data into their models • Data analysts will use SQL to query tables of data and derive insights from it 23 Components of SQL • SQL consists of three components which offer everything required to manage, maintain and use a database 1. Data Definition Language 2. Data Manipulation Language 3.
Data Control Language 24 Data Definition Language (DDL) • This component is used to define the structure (or schema) of the database • For tables there are three main commands: • CREATE TABLE table_name • To create a table in the database • ALTER TABLE table_name • To add or remove columns from a table in the database • DROP TABLE table_name • To remove a table from the database 25 Exercise 2: Data Definition Language • Select the Execute SQL tab in SQLite • Type or copy/paste the CREATE TABLE statement into the empty SQLite Execute SQL window • Click the Execute SQL button on the toolbar • If the table is created successfully, you should receive the following message: • Query executed successfully: CREATE TABLE "MOSQUITO_TRAP_DATA“ • Click Write Changes to make commit the changes permanent • View the changes in the Database Structure tab 26 CREATE TABLE "MOSQUITO_TRAP_DATA" ( `SAMPLEID` INTEGER PRIMARY KEY AUTOINCREMENT, `TRAP_DATE` NUMERIC, `GENUS` TEXT, `SPECIES` TEXT, `TYPE` TEXT, `GENDER` TEXT ); https://www.org/lang_createtable.html 27 Exercise 2: Data Definition Language • Select the Execute SQL tab in SQLite • Type or copy/paste the ALTER TABLE statements into the empty SQLite Execute SQL window • Click the Execute SQL button on the toolbar • If the table is created successfully, you should receive the following message: • Query executed successfully: ALTER TABLE "MOSQUITO_TRAP_DATA“ • Click Write Changes to make commit the changes permanent • View the changes in the Database Structure tab 28 ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RURALNORTHWEST` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RURALNORTHEAST` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RURALSOUTHEAST` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RIVERVALLEYEAST` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RIVERVALLEYWEST` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RESIDENTIALNORTH` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RURALSOUTHWEST` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `LAGOON` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `GOLFCOURSE` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `INDUSTRIALPARK` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RESIDENTIALSOUTH` INTEGER; ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `TOTAL` INTEGER; https://www.org/lang_altertable.html 29 Exercise 2: Data Definition Language • Select the Execute SQL tab in SQLite • Type or copy/paste the DROP TABLE statement into the empty SQLite Execute SQL window • Click the Execute SQL button on the toolbar • If the table is created successfully, you should receive the following message: • Query executed successfully: DROP TABLE "MOSQUITO_TRAP_DATA" • Click Write Changes to make commit the changes permanent • View the changes in the Database Structure tab 30 DROP TABLE "MOSQUITO_TRAP_DATA"; https://www.org/lang_droptable.html 31 Exercise 2: Data Definition Language • Create the MOSQUITO_TRAP_DATA table again using the DDL on the next slide • Click Write Changes to make commit the changes permanent • View the changes in the Database Structure tab • Done! 32 CREATE TABLE "MOSQUITO_TRAP_DATA" ( `SAMPLEID` INTEGER PRIMARY KEY AUTOINCREMENT, `TRAP_DATE` NUMERIC, `GENUS` TEXT, `SPECIES` TEXT, `TYPE` TEXT, `GENDER` TEXT, `RURALNORTHWEST` INTEGER, `RURALNORTHEAST` INTEGER, `RURALSOUTHEAST` INTEGER, `RIVERVALLEYEAST` INTEGER, `RIVERVALLEYWEST` INTEGER, `RESIDENTIALNORTH` INTEGER, `RURALSOUTHWEST` INTEGER, `LAGOON` INTEGER, `GOLFCOURSE` INTEGER, `INDUSTRIALPARK` INTEGER, `RESIDENTIALSOUTH` INTEGER, `TOTAL` INTEGER ) https://www.org/lang_createtable.html 33 Exercise 1: Completed 34 Data Manipulation Language • This component is used to manipulate data within a table • There are four main commands: • SELECT • To select rows of data from a table • INSERT • To insert rows of data into a table • UPDATE • To change rows of data in a table • DELETE • To remove rows of data from a table 35 Exercise 3: SELECT Data Manipulation Language • Select the Execute SQL tab in SQLite • Type or copy/paste the SELECT statement into the empty SQLite Execute SQL window • SELECT COUNT(*) FROM MOSQUITO_TRAP_DATA; • Click the Execute SQL button on the toolbar • Do you get an answer? Why not? https://www.org/lang_select.html 36 Exercise 3: INSERT Data Manipulation Language • Add some data to the MOSQUITO_TRAP_DATA table created in Exercise 2 • Type or copy/paste the INSERT statement into the empty SQLite Execute SQL window • Click the Execute SQL button on the toolbar • Click Write Changes to make commit the changes permanent • View the changes in the Browse Data tab • The MOSQUITO_TRAP_DATA table now has seven rows of data 37 INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER, RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST, RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE, INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL) VALUES ('2014- 07-01','Aedes','spencerii','Black legs','Female',0,0,0,0,0,1,0,0,0,1,1,3); INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER, RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST, RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE, INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL) VALUES ('2014- 07-01','Aedes','dorsalis','Banded legs','Female',0,1,0,0,0,0,2,0,0,0,0,3); INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER, RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST, RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE, INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL) VALUES ('2014- 07-01','Aedes','euedes','Banded legs','Female',1,1,0,0,2,0,0,0,0,0,0,4); INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER, RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST, RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE, INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL) VALUES ('2014- 07-01','Aedes','excrucians','Banded legs','Female',1,2,0,0,2,1,0,0,0,1,0,7); INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER, RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST, RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE, INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL) VALUES ('2014- 07-01','Aedes','fitchii','Banded legs','Female',0,2,0,0,1,0,0,0,0,0,4,7); INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER, RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST, RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE, INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL) VALUES ('2014- 07-01','Aedes','flavescens','Banded legs','Female',6,5,8,0,0,0,5,0,0,3,1,28); INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER, RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST, RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE, INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL) VALUES ('2014- 07-01','Aedes','vexans','Banded legs','Female',3,168,1,21,38,8,16,0,0,3,32,290); https://www.org/lang_insert.html 38 Exercise 3: SELECT Data Manipulation Language • Type or copy/paste the SELECT statement into the empty SQLite Execute SQL window • SELECT COUNT(*) FROM MOSQUITO_TRAP_DATA; • Click the Execute SQL button on the toolbar • When you execute the query, you are asking the database a question • Can you tell me the number of rows in the MOSQUITO_TRAP_DATA table? • The database gives you an answer (the result) and you should have received the following message: • 7 rows returned in 1ms from: SELECT * FROM MOSQUITO_TRAP_DATA; https://www.org/lang_select.html 39 Exercise 3: SELECT Data Manipulation Language • What if you want to see all the rows in your database? • SELECT * FROM MOSQUITO_TRAP_DATA; • Returns all columns and rows in a table • What if you only want to see the Genus, Species and Total of each row? • SELECT GENUS, SPECIES, TOTAL FROM MOSQUITO_TRAP_DATA; • Returns only the GENUS, SPECIES, TOTAL columns for each row in a table https://www.org/lang_select.html 40 Data Manipulation Language • The WHERE clause Operator Description • Uses operators to extract = Equal only those records that <> Not equal. Note: In some versions of fulfill a specified condition SQL this operator may be written as !