Many people do not understand what a database is or how it is different than an excel sheet
Data manipulation is neither natural nor visual
Joins, Aggregations, and Subqueries are an acquired taste
Errors are hard to parse
Don't get frustrated!
just understanding the concepts is powerful, even if you aren't creating queries yourself
Why Databases?
Size: "Unlimited"
Accuracy: When updating data, it is much less error prone
Security: Nuanced permissioning settings
Queryable: Allows you to "ask questions" of your data
Database Jargon (labels)
Spreadsheets vs Databases
Spreadsheet
Database
Sheet
Table
Column
Field
Row
Record
This is a table
Field
Field
Field
Record
data
data
Record
data
data
Database Jargon (keys)
Primary Key - used to uniquely identify a record.
Foreign Key - used to refer to a primary key in another table.
Schema - name for whole set of tables and their relationships.
Database Jargon (data types)
VARCHAR: Character string having variable length.
This data type treats everything as text. You can place numbers in here but adding them together would just attach them which is what you would expect with text.
a + b = ab 1 + 2 = 12
INTEGER: Whole numbers.
This data type only allows whole numbers inside of it. All math functions can be applied to it. Text cannot be placed here.
1 + 2 = 3
FLOAT: Numbers with decimal points.
This data type works the same as INTEGER but allows decimal values.
1.1 + 2.2 = 3.3
BOOLEAN: True or False value.
This data type can be either True or False. This is useful for setting binary characteristics or status about some data.
Is the house on fire? True or False (1 or 0)?
Database Jargon (data types - continued)
DATE: Date / Time value.
This data type contains date and/or time information which allows you to perform calculations related to date/time.
How many dats are between 1/1/19 and 2/4/19? (34)
NULL: This means there is no value in the cell (and this can show up in any of the above types)
This data type is useful for showing the absence of data. A blank cell and a Null cell are different. Null is ignored by aggregations and it is easy to filter for. Blank values or 0s would be included in aggregations and can be more difficult to detect.
What is the Average of 1 + NULL + 5? (6 / 2 = 3)
What is the Average of 1 + 0 + 5? (6 / 3 = 2)
Accessing Data
Try your hand at additional practice at The Data School by Chart.io.
Combining Data Tables - SQL Joins Explained
A JOIN clause in SQL is used to combine rows from two or more tables, based on a related column between them.
Inner Join
Left Join
Full Outer Join
Union
Cross Join
Join Practice
For more practice with JOINs, try the Join walkthrough and code on live data at data.world.
Common Syntax Errors
Spelling
Check table spelling
Check column spelling
Check value spelling
SELECT *
FROM Facbook
<- Wrong column name
Where Names = "Jeffrey"
<- Wrong column name, wrong value name
Facebook
Name
# of Friends
Matt
300
Lisa
500
Jeff
600
Sarah
400
Common Syntax Errors
Single vs Double Quotes
Single quotes(' '), double quotes(" "), and backticks (` `) can all have different meanings in SQL.
Single quotes often indicate a text string
SELECT name, age FROM facebook WHERE name = 'Matt'
Single Quotes
facebook
Name
age
Matt
27
Lisa
30
Jeff
31
Sarah
40
Double quotes often indicate a table name or a field
SELECT "age" FROM "facebook"
Double Quotes
facebook
Name
age
Matt
27
Lisa
30
Jeff
31
Sarah
40
Common Syntax Errors
Capitalization
Doesn't matter unless you put it in quotes, then it must match exactly.
SELECT NaMe FROM facebooK
SELECT "Name" FROM "facebook"
Common convention is to put SQL commands in all caps
SELECT, FROM, WHERE, etc...
Data Types
Sometimes when a number is stored in a database it is stored as a string.
This means when we refer to it in a query we must put single quotes around it.
SELECT * FROM facebook WHERE age = '27'
Common Syntax Errors
When writing a SQL query you have to place the clauses you use in this order:
SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY
LIMIT
0 Rows Returned
JOINing on the wrong column
Column names can be misleading, check which you are using.
0 Rows Returned (continued)
Over Filtering
Check to see if your filter is the problem by commenting it out using:
SELECT * FROM facebook -- WHERE # of Friends > 1000