Welcome to MSTC


Unit 1 Database Design
Instructor: Brent Presley

Instructor's Notes: Database Design Steps

Relational Database Development (see Database Design notes for more details)

(152-156)

Goals: Databases that are:
- Adaptable: fields and tables can be added easily
- Flexible: data can be retrieved in an unlimited number of ways
- Accurate: no data redundancy; fields limit data entry where possible

1) Fact Finding
   a) Determine fields required for database
   b) Make sure there aren't multi-part fields
2) Name Tables
   a) Use simple nouns. Use plural or singular for all entity names (don't mix singular and plural)
3) Draw Entity Relationship diagram
4) Determine Primary Keys for Each Entity
   Keys uniquely describe each record of a table
5) Resolve Many-to-Many Relationships
   a) Insert new entity between parents
   b) Name new entity: one instance of parent1 + one instance of parent2 is called what?
   c) Re-evaluate cardinality. Probably 1------M [ ] M--------1
   d) Determine keys for new entity. Probably keys from both parents
6) Determine Foreign Keys (Linking Fields)
   For each child entity (many side of a relationship), ensure the key from its parent(s) has been copied to the child.
7) Remove calculated fields and constants
   a) Make a separate list of calculated fields and equations used to calculate them
   b) Ensure data required to generate calculated fields is available in the field list
   c) Required data can be combined from multiple tables
   d) Constants are fields whose value is the same for all records
8) Name and assign (non-key) fields (attributes) to appropriate table
   a) Assign to only one table (no redundancy)
   b) Linking fields must be redundant
9) For all fields, determine type and size
   a) Consider specifying value ranges and default values as well
   b) Designate logical keys
   c) Create sample records
10) Ensure no data redundancy except for linking fields.

Watch for synonyms: fields with different (though similar) names

Database Design Notes

Activity

Database Design Goals -- Database that is:
Adaptable
- Fields and tables can be added (removed) easily
Flexible
- Data can be retrieved in an unlimited number of ways
Accurate
- No data redundancy
- Validation on fields
- Default values
- Look ups

Step 1 Fact Finding
Determine field (data storage) requirements
Sources:
- Current users (owners)
- Existing databases
- Existing forms or other documents
Don't worry about grouping, simply list
Split multi-part fields into separate fields
- Example: Split Name into FirstName and LastName
- Example: Split Address into Street, City, State and Zip
- Example: Split Phone into AreaCode and Phone, maybe Extension
Handout: Student Enrollment field list

Step 2 Name Tables
List tables for Enrollment Database
Browse through field list, list those tables that are obvious (others might (will) surface later)

Tools and Resources

- XAMPP (First Part of Quarter)
- MySQL Workbench (First Part of Quarter)
- Azure (Second Part of Quarter)
- Visual Studio Community 2013 (Second Part of Quarter)
- SQL (W3Schools)
- SQLCourse.com
- SQLZoo.net
- SQL (TutorialsPoint)
- SQL Tutorial
- SQL (TutsPlus)
- Essential SQL
- Learn SQL The Hard Way
- Udemy Training (Free): Sachin Quickly Learns SQL
- Udemy Training (Free): Database Design
- Udemy Training (Free): MySQL Database for Beginners
- Udemy Training (Free): SQL Server for Beginners

WHAT IS A DATABASE?
What is a database? https://www.youtube.com/watch?v=t8jgX1f8kc4
Introduction to Databases: this previews a lot of information that we will discuss in more detail in the weeks to come.
https://www.youtube.com/watch?v=4Z9KEBexzcM

HISTORY OF DATABASE SYSTEMS
File systems (before mid 1960s)
Problems:
- Data redundancy
- Update anomalies
- No abstract data model
- Requires knowledge of storage details
- No standard query language

HIERARCHICAL DATABASES (MID 1960S)
Developed by North American Rockwell and IBM as the IMS (Information Management System)
Based on a tree structure
Example: A Product assembled from components, which are assembled from subcomponents
Problems:
- Changes in data structure require changes in application programs that access that structure
- No Many-to-Many relationships
- Programmers must be thoroughly familiar with the database structure

NETWORK DATABASES
Extension of the hierarchical data model
Standardized (1971) by the CODASYL group (Conference on Data Systems Languages)
Advantage: Many-to-Many relationships are implemented
Problems: Navigation is even harder

RELATIONAL DATABASES
Proposed in 1970 by E.F. Codd while working at IBM. IBM largely ignored his work, as the company was investing heavily at the time in commercializing IMS databases. It was not until 1978 that Frank T. Cary, then chairman and CEO of IBM, ordered the company to build a product based on Dr. Codd's ideas.

Oracle emerges
But IBM was beaten to the market by Lawrence J. Ellison, a Silicon Valley entrepreneur, who used Dr. Codd's papers as the basis of a product around which he built a start-up company that has since become the Oracle Corporation.
New York Times, April 23, 2003, obituary of E. F. Codd (1923-2003)

RELATIONAL DATABASES
Data Abstraction: allows people to forget unimportant details
- View Level: a way of presenting data to a group of users
- Logical Level: how data is understood to be when writing queries

WHAT IS A NULL?

A NULL marks a value that is missing or undefined. It arises when a column allows NULL and no default value is set for the column: if you insert a row into the table without specifying a value for that column, the value stored will be NULL.

ENTITY
An entity is a real-world object, either animate or inanimate, that is easily identifiable. For example, in a school database, students, teachers, classes, and courses offered can be considered entities. All these entities have some attributes or properties that give them their identity.
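As a rough illustration of the NULL behaviour described above, the sketch below models a student entity as a table and inserts a row that leaves two attributes unspecified. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the table, column names, and the 'WI' default are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption made for portability).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE students (
        roll_number INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,      -- must always be supplied
        email       TEXT,               -- allows NULL, no default
        state       TEXT DEFAULT 'WI'   -- allows NULL, but has a default
    )
    """
)

# Only roll_number and name are supplied; email and state are omitted.
conn.execute("INSERT INTO students (roll_number, name) VALUES (1, 'Ana')")

# email comes back as None (NULL); state falls back to its default.
row = conn.execute(
    "SELECT email, state FROM students WHERE roll_number = 1"
).fetchone()

# NULL is never equal to anything, including NULL, so "= NULL" matches no rows.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM students WHERE email = NULL"
).fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM students WHERE email IS NULL"
).fetchone()[0]

print(row, equals_null, is_null)
```

Note that queries looking for missing values need IS NULL rather than = NULL.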

An entity set is a collection of similar types of entities. An entity set may contain entities whose attributes share similar values. For example, a Students set may contain all the students of a school; likewise a Teachers set may contain all the teachers of a school from all faculties. Entity sets need not be disjoint.

ATTRIBUTES
Entities are represented by means of their properties, called attributes. All attributes have values. For example, a student entity may have name, class, and age as attributes. There exists a domain or range of values that can be assigned to attributes. For example, a student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.

ATTRIBUTE TYPES
- Simple attribute: Simple attributes are atomic values, which cannot be divided further. For example, a student's phone number is an atomic value of 10 digits.
- Composite attribute: Composite attributes are made of more than one simple attribute. For example, a student's complete name may have first_name and last_name.
- Derived attribute: Derived attributes are attributes that do not exist in the physical database; their values are derived from other attributes present in the database. For example, average_salary in a department should not be saved directly in the database; instead it can be derived. As another example, age can be derived from date_of_birth.
- Single-value attribute: Single-value attributes contain a single value. For example, Social_Security_Number.
- Multi-value attribute: Multi-value attributes may contain more than one value. For example, a person can have more than one phone number, email_address, etc.

KEYS
A key is an attribute or collection of attributes that uniquely identifies an entity within an entity set. For example, the roll_number of a student makes him/her identifiable among students.
- Super Key: A set of attributes (one or more) that collectively identifies an entity in an entity set.
- Candidate Key: A minimal super key is called a candidate key. An entity set may have more than one candidate key.
- Primary Key: A primary key is one of the candidate keys chosen by the database designer to uniquely identify each entity in the entity set.

RELATIONSHIPS
The association among entities is called a relationship. For example, an employee works_at a department, a student enrolls in a course.

RELATIONSHIP SET
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have attributes. These attributes are called descriptive attributes.
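The key definitions above can be tried out with a small sketch. It assumes Python's sqlite3 module in place of MySQL, and a hypothetical students table in which roll_number is the candidate key chosen as primary key and email is a second candidate key; the helper try_insert is also made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE students (
        roll_number INTEGER PRIMARY KEY,   -- candidate key chosen as the primary key
        email       TEXT NOT NULL UNIQUE,  -- another candidate key
        name        TEXT NOT NULL          -- not a key: names can repeat
    )
    """
)
conn.execute("INSERT INTO students VALUES (1, 'ana@example.edu', 'Ana Lee')")
# A second student with the same name is accepted, because name is not a key.
conn.execute("INSERT INTO students VALUES (2, 'ben@example.edu', 'Ana Lee')")


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_primary = try_insert((1, "cat@example.edu", "Cat Roy"))    # roll_number 1 is taken
duplicate_candidate = try_insert((3, "ana@example.edu", "Dee Poe"))  # email is taken
new_student = try_insert((3, "dee@example.edu", "Dee Poe"))          # both keys are new

print(duplicate_primary, duplicate_candidate, new_student)
```

Together, (roll_number, email) would be a super key but not a candidate key, since either field alone already identifies the student.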

Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
- Binary = degree 2
- Ternary = degree 3
- n-ary = degree n

Cardinality
- One-to-one: One entity from entity set A can be associated with at most one entity of entity set B and vice versa.
- One-to-many: One entity from entity set A can be associated with more than one entity of entity set B; however, an entity from entity set B can be associated with at most one entity of entity set A.
- Many-to-one: More than one entity from entity set A can be associated with at most one entity of entity set B; however, an entity from entity set B can be associated with more than one entity from entity set A.
- Many-to-many: One entity from A can be associated with more than one entity from B and vice versa.

DATABASE DESIGN GOALS
Adaptable
- Fields and tables can be added (removed) easily
Flexible
- Data can be retrieved in an unlimited number of ways
Accurate
- No data redundancy
- Validation on fields
- Default values
- Look ups

SMALL GROUP PROJECT

You are a known database developer and the parent of a thirteen-year-old son who is actively involved in the local Junior League Baseball program. Your son will be playing on one of the 12 local teams that will be competing in the National Division Junior League Tournament. Each pair of local teams plays twice against each other during the four-month season. With the intention of creating the best conceivable national team, the U.S. Junior Baseball League president, Mr. Henry Zemog, wants to gather appropriate statistics from all team players during the National Division Junior League Tournament. You have been asked by Mr. Zemog to design a database for tracking each team's and player's statistics during the tournament series. The national team will represent the United States in the International Junior League World Series Tournament to be held in Heritage Park in Taylor, Michigan. You will have access to the complete game statistics for each game that is played. You have agreed to fulfill this task.

Using the lessons learned in Chapter 1 about the relational model and your knowledge of basic baseball statistics, use your favorite drawing tool to produce a relational diagram that can serve as a preliminary step toward the final database design. At this stage of the development process, the basic constructs should include only the entities and their relationships. Name the diagram Junior League Baseball Database.

POTENTIAL ANSWER TO GROUP PROJECT
There are many possible solutions

GROUP PROJECT 2
You are in the requirements analysis phase of designing a database for an organization. List the pieces of information that you need to acquire from stakeholders in order to minimize shortcomings and iterations during the preliminary design phase.

POTENTIAL ITEMS FOR GROUP PROJECT 2
- A list of products and services the organization provides
- An organizational chart, a list of stakeholders, and a list of job responsibilities
- Current handling of the information system and record keeping
- Current storage of the data and information, such as forms and reports

- Department that will take ownership of the system
- Personnel responsible for using, entering, and maintaining the data
- Security levels
- Location of the database
- Infrastructure, software, and hardware equipment

ENHANCED BASEBALL TABLE
Add offensive and defensive statistics to the earlier example

DATA NORMALIZATION
Database normalization is a technique for organizing the data in a database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like insertion, update, and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
- Eliminating redundant (useless) data
- Ensuring data dependencies make sense, i.e. data is logically stored

ASSIGNMENT IN GROUPS OF 2-3
Use the Internet to research normal forms and explain any drawbacks to normalizing data. In your own words, write a one-page summary of your findings and any additional recommendations or observations that you may have. Include title and reference page (not to be counted toward total pages).

STEPS IN BUILDING A DATABASE

STEP 1- FACT FINDING
Determine field (data storage) requirements
Sources:
- Current users (owners)
- Existing databases
- Existing forms or other documents
Don't worry about grouping, simply list
Split multi-part fields into separate fields
- Example: Split Name into FirstName and LastName
- Example: Split Address into Street, City, State and Zip
- Example: Split Phone into AreaCode and Phone, maybe Extension
Handout: Student Database field list
Assign: Terminology Worksheet

Student enrollment db fields

- Social Security Number
- Student Name

- Email
- Program Code
- Program Name
- GPA
- Phone number
- Phone type
- Street Address
- City
- State
- Zip Code

- Instructor Number
- Instructor Name
- Instructor Home Phone
- Instructor Business Phone
- Email Address
- Web Site

- Course Grade
- Course Number
- Course Name
- Description
- Credits

- Course Time
- Course Days
- Instructor Number

STEP 2 NAME TABLES
Browse through field list, list those tables that are obvious (others might (will) surface later)

List tables for Enrollment Database

Table Naming Conventions
- Add the tbl prefix to each table name
- Name tables using either plural nouns or singular nouns. Don't mix within a database.
  - E.g. tblCustomers, tblLocations, tblVehicles
  - E.g. tblCustomer, tblLocation, tblVehicle
- Unique and descriptive
- 2012: Lean towards plural nouns
- Ensure abbreviations are clear to everyone, not just those involved in the project.
- Brief, but complete
  - Use minimum words necessary
- Don't include database terminology: Record, File, Table
- Don't include adjectives that restrict data
  - Example: Wisconsin Rapids Employees, Stevens Point Employees
  - Results in duplicate structures. Structures (field lists) of both tables will be identical

STEP 2- NAME TABLES
Make a separate table for multi-value fields.
- Example: a field named Hobbies might contain bowling, fishing
- Create a separate Hobbies entity (each hobby will be listed as a separate record in this table)
- Multi-value fields are difficult to search and nearly impossible to validate or sort.
- Tip: if the field name is plural, it's probably a multi-value field.

STEP 3- DRAW ENTITY RELATIONSHIP DIAGRAM
An Entity Relationship Diagram (ERD) is a picture that shows the relationships between tables of a database

- Helps discover additional tables and defines relationships
- Rectangle used to represent each table in a database
- Line drawn between tables that are directly related
- At end of each line, include cardinality
  - One occurrence in table 1 is related to how many occurrences of table 2 (maximum number)
  - One occurrence in table 2 is related to how many occurrences of table 1 (maximum number)
  - For our purposes, the maximum is listed as 1 or many (M)

ENTITY RELATIONSHIP DIAGRAM
The ERD fragment (Lab 1----M Computer) expresses that:
- One lab contains (M)any computers
- One computer exists in only one (1) lab
Entity Relationship Diagram (ERD) https://www.youtube.com/watch?v=-fQ-bRllhXc

FOR MORE INFORMATION
Data modelling and the ER model https://www.youtube.com/watch?v=IfaqkiHpIjo (60)

ERD CONCEPTS
Crow's feet notation designates the cardinality of the relationship

DRAW THE ERD FOR THIS (GROUP)

As a part of its project management database, the company wants to store information about resources (employees), projects and bookings. For each employee, the following information is stored: Employee ID, First and Last name, Rank, and billing rate. Employees are organized into solution sets; each solution set has a head of the solution set, who is the resource owner for all employees in that SS. For each solution set we record the SS ID and the SS name. For scheduling purposes, we want to store information about the head of each solution set, and about assignment of employees to solution sets. An employee can belong to only one solution set. The scheduling system also stores information about projects. For each project, the following information is stored: Project ID, Status, Location and Client name. As a part of the scheduling system, we store information about each calendar day in a year. When a booking is requested for an employee, the employee is scheduled to work on a particular project, on a particular day, for the specified amount of time (10%-100%). For each booking we also record current status.

SOLUTION

DRAW THE ERD FOR THIS (GROUP)
An on-line payment system stores information about all customers, including name, id, address, e-mail and password. Each customer has set up a specific method of payment, which may be a credit card payment or automated direct withdrawal. For all types of payment we store the following information: an ID and the date the method of payment was set up. For credit card payments we store CC number and type and the expiration date. For automated withdrawal we store the name of financial institution, the routing number, account number and the date of monthly withdrawal.

SOLUTION

STEP 4 DETERMINE PRIMARY KEY
Determine Primary Key for each Entity
The primary key is the field or fields whose value uniquely identifies a record in that table.
- For Lab, it might be Room Number

- For Computer, it might be ID Number

Primary keys can be a combination of two fields
- For Lab, if the building has multiple floors, a combination key might be Room Number plus Floor (e.g. Room 10 on Floor 5)

If you need to combine 3 or more fields to create a unique primary key, consider creating an ID Number field for that table (surrogate key).
- These keys are usually autonumber fields
- Oftentimes these are used in all tables.

Primary key requirements:
- Unique. No two keys will have the same value
- Cannot be null. In multi-field keys, none can be null
- Values in field rarely (if ever) change

PRIMARY KEY CONSIDERATIONS
Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character formats. This matters because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause a ripple effect of changes.
Do NOT use "your problem primary key" as your logic model primary key. For example, avoid passport number, social security number, or employee contract number, as these "primary keys" can change in real-world situations.
http://stackoverflow.com/questions/337503/whats-the-best-practice-for-primary-keys-in-tables

SURROGATE VS NATURAL KEY
On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change, it can be used as a primary key. If the natural key is large or likely to change, I use surrogate keys. If there is no primary key, I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.
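One way to picture the surrogate-key advice is the sketch below. It assumes Python's sqlite3 module in place of MySQL (SQLite's INTEGER PRIMARY KEY plays the role of an autonumber field) and hypothetical employees and bookings tables with made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE employees (
        employee_id     INTEGER PRIMARY KEY,   -- surrogate key, assigned automatically
        contract_number TEXT NOT NULL UNIQUE,  -- natural key, may change in the real world
        last_name       TEXT NOT NULL
    );
    CREATE TABLE bookings (
        booking_id  INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employees (employee_id),
        work_date   TEXT NOT NULL
    );
    """
)

cur = conn.execute(
    "INSERT INTO employees (contract_number, last_name) VALUES ('C-1001', 'Rivera')"
)
employee_id = cur.lastrowid  # the value SQLite generated for the surrogate key
conn.execute(
    "INSERT INTO bookings (employee_id, work_date) VALUES (?, '2024-03-01')",
    (employee_id,),
)

# The natural key changes; the surrogate key, and every row linked to it, stays put.
conn.execute(
    "UPDATE employees SET contract_number = 'C-2050' WHERE employee_id = ?",
    (employee_id,),
)
linked = conn.execute(
    """
    SELECT e.contract_number, b.work_date
    FROM bookings AS b
    JOIN employees AS e ON e.employee_id = b.employee_id
    """
).fetchall()
print(employee_id, linked)
```

Had contract_number been the primary key, the same change would have had to ripple into every bookings row that referenced it.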

EXAMPLE
Define keys for enrollment database

STEP 5 RESOLVE MANY TO MANY RELATIONSHIPS
Many-to-Many (M-M) relationships are those where the cardinality is M (many) in both directions.
The Lab-Computer example above is a 1-M (one-to-many) relationship.
The following represents a M-M relationship: one customer orders many products, and one product is ordered by many customers.

MANY TO MANY RELATIONSHIPS
- M-M relationships are nearly impossible to implement using a database program
- M-M relationships must be resolved into multiple 1-M relationships in order to implement the database

RESOLVING M-M RELATIONSHIPS
- Insert a new entity between the two entities
- Name the new entity. What is one occurrence of table1 combined with one occurrence of table2 called?
  - One customer ordering one product is called? An ordered product.
- Re-evaluate the cardinality of the new relationships
  - Probably 1----M [] M----1 (Manys attached to new entity)
- Determine the primary keys (always at least 2) for the new entity.
  - Usually the keys from the two parents
- Parent entities are those on the 1 side of a relationship (Customer and Product)
- Child entities are those on the M side of a relationship (Ordered Product)
- One entity can be the parent in one relationship and a child in a different relationship.

OTHER RELATIONSHIP ISSUES
What happens to child records when parent records are deleted?
- Restrict delete: Parent record cannot be deleted until all child records (in all child tables) have been deleted. Preferred technique. Requires consideration of the effects of deleting this parent record.
- Cascade delete: When a parent record is deleted, all associated child records (in all child tables) are automatically deleted. Dangerous.

STEP 6 DETERMINE FOREIGN KEYS
For every relationship, the primary key from the parent table must exist in the child table. This is what links the tables together in a relational database.
- Often, the links will already exist because of M-M resolution.
- If the parent's primary key does not exist in the child, copy the field into the child table.
  - This field DOES NOT become part of the child's primary key.
  - Designate the field as a link (L) for data dictionary
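The Customer/Product resolution and the two delete rules can be sketched as follows. This assumes Python's sqlite3 module in place of MySQL; the table and column names are hypothetical, and one parent is given restrict delete while the other is given cascade delete purely so both behaviours show up in one run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- New entity between the two parents: one customer ordering one product.
    CREATE TABLE ordered_products (
        customer_id INTEGER NOT NULL
            REFERENCES customers (customer_id) ON DELETE RESTRICT,
        product_id  INTEGER NOT NULL
            REFERENCES products (product_id) ON DELETE CASCADE,
        quantity    INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (customer_id, product_id)  -- keys from both parents
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO products  VALUES (10, 'Glove'), (20, 'Bat');
    INSERT INTO ordered_products (customer_id, product_id)
        VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Restrict delete: the parent cannot be removed while child records exist.
try:
    conn.execute("DELETE FROM customers WHERE customer_id = 1")
    restrict_blocked = False
except sqlite3.IntegrityError:
    restrict_blocked = True

# Cascade delete: removing the parent silently removes its child records too.
conn.execute("DELETE FROM products WHERE product_id = 10")
remaining = conn.execute(
    "SELECT customer_id, product_id FROM ordered_products "
    "ORDER BY customer_id, product_id"
).fetchall()
print(restrict_blocked, remaining)
```

Deleting the Glove product takes two ordered_products rows with it, which is why the notes call cascade delete dangerous.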

Copy keys from Student, Section, and Instructor into child tables.

STEP 7 REMOVE CALCULATED FIELDS AND CONSTANTS
Because today's computers are so fast, it's better to calculate these values as you need them instead of storing them in the database. Additionally, if you calculate them as you need them, you ensure the values are always up to date.
Make a separate list of the calculated fields you removed. Include the equation used to calculate the value.
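As a sketch of calculating a value on demand rather than storing it, the example below derives a GPA (one of the fields in the enrollment field list) from stored grade points and credits. It assumes Python's sqlite3 module in place of MySQL, a hypothetical enrollments table, made-up sample rows, and that grade_points is already weighted by credits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE enrollments (
        student_id     INTEGER NOT NULL,
        course_number  TEXT    NOT NULL,
        grade_points   REAL    NOT NULL,  -- assumed already weighted by credits
        credits_earned INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_number)
    );
    INSERT INTO enrollments VALUES
        (1, 'CS101', 12.0, 3),  -- an A (4.0) in a 3-credit course
        (1, 'MA110',  9.0, 3),  -- a B (3.0) in a 3-credit course
        (1, 'EN105',  8.0, 4);  -- a C (2.0) in a 4-credit course
    """
)


def gpa(student_id):
    """Total points divided by total credits, computed when asked for, never stored.

    Assumes the student has at least one enrollment row.
    """
    total_points, total_credits = conn.execute(
        """
        SELECT SUM(grade_points), SUM(credits_earned)
        FROM enrollments
        WHERE student_id = ?
        """,
        (student_id,),
    ).fetchone()
    return total_points / total_credits


print(round(gpa(1), 2))
```

Because nothing is stored, adding or correcting an enrollment row changes the result of the next call with no extra update step.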

Ensure all the parts of the equations are stored somewhere in the database.
- Equation parts can be stored in different tables (linking allows you to bring them together)
- If parts can be calculated, don't store them either

Constants are fields that ALWAYS store the same value
- No need to waste storage space
- Print the constant value on reports when needed

There are exceptions to this rule. Values that rarely change, though calculated, may be fields in the database. I've never run into an instance of this though.

UPDATE DATABASE
Remove GPA from Student table
- GPA = Total Points / Total Credits
- Total Points = Sum of all grade points
- Total Credits = Sum of all credits earned
- Grade Points available (determined from letter grade)
- Credits Earned available
Remove State (constant)
Remove City, create ZipCity table to look up city based on zip
- Zip is linking field in Student
Assign fields to entities in Enrollment database

STEP 8- ASSIGN REMAINING FIELDS TO ENTITIES

For all remaining fields (from Step 1), assign to one and only one table.
- Only linking fields may be duplicated in a database

FIELD NAMING STANDARDS
- Apply to primary keys and linking fields as well.
- Use singular nouns
  - If plural makes more sense, this is not a field but another table.
- Unique and descriptive
  - Include table name when field name occurs in two tables (StudentAddress, InstructorAddress) (optional)
- Use minimum number of words
- Use acronyms and abbreviations wisely (only if everyone understands them)
- If the name includes / & - and or, it probably represents two or more fields. Split them.
- Split multipart fields into separate fields
  - If a field can be decomposed into parts, it's probably more than one field.

  - Example: Address (street, city, state, zip), Phone (area code, number, extension)

STEP 9 FOR ALL FIELDS, DETERMINE TYPE AND SIZE
- Use types and sizes available in your database program
- Types and sizes of linking fields (foreign keys) must be identical in each table
- MySQL: int or varchar
  - Varchar(20)
  - Int (if it's automatically assigned)

MYSQL COMMON DATA TYPES
- VARCHAR (variable-length string; 0-255 characters in older MySQL, up to 65,535 bytes since MySQL 5.0.3)
- TEXT (0-65k characters)
- INT
- BIGINT
- DATE
- DATETIME
- BOOLEAN
http://www.cheatography.com/davechild/cheat-sheets/mysql/

MYSQL DATA TYPES
Complete listing

STEP 10 ENSURE NO REDUNDANCY EXCEPT LINKING FIELDS
- Check for synonyms, two fields with different names that are actually the same thing.
  - Example: Social Security Number and Employee ID
- Double-check to ensure non-linking fields only occur in one entity

Field Formatting / Validation Considerations
- Designate digits required for text field
- Lookup: use a lookup for this field
  - All linking fields should be lookups
- Autocap: automatically capitalize the first letter of each word in the field
- Uppercase: automatically capitalize all letters in the field
- N1-n2: numeric value range check
- Auto populate from field: automatically populate this field from another field in the database (credits earned = current credits)
  - Not a lookup
  - User not usually allowed to edit
- Required
  - Keys are automatically required

ADDITIONAL THOUGHTS
- Database design is best done by a group of people unless you have significant experience.
- Don't be afraid of undiscovered errors in your design
  - When you build the database, errors will surface and you can correct them early
  - When you populate the tables with data, other errors might surface.

  - Again, you'll usually catch these early on.
- If you follow these guidelines, your database will be adaptable, flexible and accurate.
  - Any design errors you find after using the database for a while (lots of data entered) should still be relatively easy to correct, especially with Access help

DATA DICTIONARY
- A data dictionary, or data repository, is a central storehouse of information about the system's data
- An analyst uses the data dictionary to collect, document, and organize specific facts about the system
- Also defines and describes all data elements and meaningful combinations of data elements

Documenting the Data Elements
- You must document every data element in the data dictionary
- The objective is the same: to provide clear, comprehensive information about the data and processes that make up the system

DATA DICTIONARIES
Data dictionary must contain the following information:
- Table Name
- Field (attribute) name
- Expanded field name
- Field contents or long description
- Data type and length or size
- Default value(s)
- Format (required or optional digits or characters & sequence of characters if appropriate)
- Domain (range or choices)
- Allow NULL? (Y or N)
- Key (PK or FK)
- Foreign Key referenced table

DATA DICTIONARY DOCUMENT
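Part of a data dictionary can be drafted from the database itself. The sketch below is not the course's document template; it assumes Python's sqlite3 module in place of MySQL, hypothetical programs and students tables, and a made-up helper data_dictionary that reads SQLite's PRAGMA table_info and PRAGMA foreign_key_list. Expanded names, descriptions, formats and domains still have to be written by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE programs (
        program_code TEXT NOT NULL PRIMARY KEY,
        program_name TEXT NOT NULL
    );
    CREATE TABLE students (
        student_id   INTEGER PRIMARY KEY,
        last_name    VARCHAR(30) NOT NULL,
        credits      INTEGER NOT NULL DEFAULT 0 CHECK (credits BETWEEN 0 AND 200),
        program_code TEXT REFERENCES programs (program_code)
    );
    """
)


def data_dictionary(table):
    """Return one dict per field: name, type, null rule, default, key, FK target.

    The table name is interpolated into the PRAGMA, so pass trusted names only.
    """
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    fk_targets = {
        row[3]: row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    }
    entries = []
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, col_type, notnull, default, pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        key = "PK" if pk else ("FK" if name in fk_targets else "")
        entries.append(
            {
                "table": table,
                "field": name,
                "type": col_type,
                # Primary keys are treated as required, matching "cannot be null".
                "allow_null": "N" if (notnull or pk) else "Y",
                "default": default,
                "key": key,
                "references": fk_targets.get(name),
            }
        )
    return entries


for entry in data_dictionary("students"):
    print(entry)
```

The CHECK range on credits is not reported by these pragmas, so the Domain column of the dictionary would still be filled in manually.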
