SQL Server Database Standards
Copyright 2002-2004 David Russell
1.4.1
De-normalization for Performance
1.4.3
Tokens, or Surrogate Keys
2.1.6
Data Transformation Services Packages.
2.1.10
User defined data types
2.1.13
Default and Check constraints
3.1.3
Outer joins, Inner joins, Nested, correlated sub
3.5
Keys: primary, foreign, and alternate.
3.6
Temporary Tables (INSERT...EXEC)
4.4
SP Templates for Error Handling and Transaction Processing
4.4.1
Standard SP Template with Error Handling
4.4.2
Standard Transaction Processing Logic.
5.2
Other Scripts (DTS, Maintenance Plans, etc.)
6.2.2
Consequences of Changing Passwords
6.3.1
Notification from Management
6.3.2
Notification to VSS/PVCS Administrator.
Data modeling is essential to building a well-functioning database that meets the needs of the business. The data model must be built with a good understanding of the current needs of the business, the purpose of the current design effort, and the longer-term needs of the business. This ensures that the model will be extensible and expandable. Data modeling is usually performed in three phases, conceptual, logical, and physical. Data modeling standards are intended to ensure that system has been well thought out and that all major entities and relationships have been defined.
During the conceptual design, an entity relationship diagram (ERD) is created to graphically represent the businesses data and information needs. The ERD shows what the database will be able to store and what information can be retrieved from it. At this level, the model shows what a system can do, not how it does it. Entities are real objects – people, places, and things – that are relevant to the application. Proper entity names would be Student, Teacher, School, or Classroom. Relationships enforce business rules and describe how two entities interact with each other. For example, a Student ‘Goes To’ a School and a Teacher ‘Is Assigned To’ a Classroom.
An ERD must be created for each new system or application. The ERD should be developed in VISIO or ER/Studio and must use Crows-Feet notations. The ERD must be reviewed with a DBA prior to proceeding with the project.
During the logical design phase, the ERD is mapped to tables and each table is tested to determine that it is in third normal form. Data normalization meets four important goals:
Arranging the data into logical groupings where each group represents a small part of the whole database,
Minimizing the amount of duplicate data stored in the database,
Organizing the data so that, when it needs to be modified, it only needs to be changed in one place,
And building a database that allows quick, efficient access to the database without compromising the integrity of the data being stored.
During logical design, the data needs to be examined for likely primary keys within each table and for foreign key relationships that will need to be maintained. All new systems must be designed and normalized to third normal form. This phase is necessary to ensure a well designed and thought out system. During the physical design phase and subsequent testing, specific areas of the design can be de-normalized as necessary to meet design and performance goals. The engineer can perform the logical design phase and have it reviewed by a DBA or the engineer can provide the ERD to the DBA and request assistance in the logical design phase.
Following is a brief description of data normalization at the first, second, and third normal forms. All database designs must be taken to third normal form. As coding and testing progress, the design can be de-normalized as necessary to meet specific design and performance goals.
First Normal Form – The data must be atomic, that is, it cannot contain any repeating groups. Repeating groups are columns that occur more than once. They behave like an array of data.
Second Normal Form – The data must be functionally dependant only on the primary key of the table. For example, a school table would contain the school name because it is dependant upon the school id but would not contain the principal’s name because there will be many principals during the lifetime of the school.
Third Normal Form – The data must not contain any transitive dependencies. This means that no non-key column in the table can define the value of another non-key column. In the school table above it would be correct to include the district id in the school table because it depends on the school id, but the district name should not be included because it depends on the district id.
During the physical design phase, the logical design is mapped to the requirements of the target RDBMS. All logical designs are mapped to Microsoft SQL Server 2000.
During physical design, table and element names and data types are modified as necessary to conform to defined naming standards and Microsoft SQL Server 2000 requirements. The engineer can perform the physical design phase and have it reviewed by a DBA or the engineer can request DBA assistance in this phase.
The process of de-normalizing a database should always begin with a database that has first been fully normalized to third normal form. There are times when de-normalizing a database makes good sense but it must be remembered that you pay a price for de-normalization. Often, a de-normalized database is more difficult to extend in the future and it may be more difficult to ensure full data integrity. Each de-normalization decision must evaluate the benefit against the future costs. In general, a database should be de-normalized only when a performance gain (usually from reducing the number of joins required in a query) outweighs any losses in flexibility and data integrity. User convenience is not a justification for de-normalization. Stored procedures and views can be used to produce the desired results.
In addition to de-normalization, there are many other issues that must be considered in optimizing a database. Most of these will be discussed in the Performance section under Database Objects. However, two critical factors that must be considered during database design are the nature of the joins that must be performed to extract data from the database and indexing requirements to support these joins. As the database is being designed, primary keys, foreign keys, and candidate keys are defined. When optimizing the design, the indexing of foreign keys and candidate keys must be considered. Primary keys are automatically indexed when the primary key is defined.
During the course of designing the database, it may be difficult to identify logical primary keys for some tables. These tables may either have candidate keys that are derived from a complex concatenation of elements or long character values that do not lend themselves to efficient indexing. It may also be necessary to define a key that can be guaranteed to be unique across multiple databases or servers. In these cases, a surrogate key or token should be defined to act as the primary key. The two most common tokens are an integer column defined with the IDENTITY attribute, which will ensure uniqueness within the table, or a GUID column, which will ensure uniqueness across databases and servers.
A Data Dictionary should be completed for each new database, application, or subject area. In its simplest form, the data dictionary is a collection of data element definitions. More advanced data dictionaries contain the database schema with key definitions, entity-relationship models of the data elements and other documentation. The Data Dictionary will contain the following:
Data element name (caption)
Unique data element name from the application domain. This is the real life
name of this data element.
Short description
Description of the element in the application domain.
Related data elements
List of closely related data element names when the relation is important.
Field name
The name of the element in the database schema.
Code format
Data type, length, and any edit masks applied to the element.
Null value allowed
Null or non-existing data value may be or may not be allowed for an
element.
Default value
Data element may have a default value.
Allowed values
Explanation of code tables, validation rules, and other constraints used to
validate this element.
Inter-element validation
details
Validation rules between this element and other elements in the data
dictionary.
Database table references
Reference to tables the element is used and the role of the element in each table.
Special indication when the data element is the key for the table or a part of
the key.
Meaning of the element
Short definition and references to other documents needed to understand the
meaning and use of the data element.
Source of the data
Short description of where the data comes from. Calculation rules are included
here.
Validity dates
Start and possible end dates, when the element is or was used.
History references
Dates when the element was originally defined, defined in its present form,
and/or retired from use.
External references
References to books, other documents, laws, etc.
Version or control number
Version number and configuration management references.
Data element notes
Short notes not included above.
Table definitions
Table name
The name of the table in the database schema.
Short description
Description of the table in the application domain.
Table owner
Database name
List of data elements
List of keys
List of data elements that participate in the primary key, alternate keys, and
foreign keys. All tables must have a
primary key.
Indexes
List of data elements that are indexed.
Describes the index as clustered/non-clustered and unique/non-unique.
Triggers
List of triggers on the table, their function, and other tables they reference.
Meaning of the table
Short definition and references to other documents needed to understand the
meaning and use of the table.
Validity dates
Start and possible end dates, when the table is or was used.
History references
Dates when the table was originally defined, defined in its present form,
and/or retired from use.
External references
References to books, other documents, laws, etc.
Version or control number
Version number and configuration management references.
Table notes
Short notes not included above.
Human Factors concepts explain that the mind is generally capable of comprehending seven objects at one time, give or take two. The ideal way to present an entity relationship diagram (E/RD) is to keep the boxes within this range of five to nine entities. We would like to ultimately get to the point where all of our E/RDs are published on the web. In order to display them on a screen in a meaningful way, a top-level view of our database may refer to seven +/- “subject areas”. Subject Areas will be defined for organizations, people, products, lessons, objectives, customers, schools, etc. Customers and schools may seem redundant in this list; however, these are meant to be representative of way we will be able to slice that data and the actual analysis will have to take place to identify them. The idea is to present a top-level view that we can drill down through, keeping each level in the optimal range of seven objects, plus or minus two. You drill down through the diagram until you get to the level of detail you need. This is one of those ideas that may never get implemented for lack of time in the schedule; but as we move to a single integrated database, it is a goal that we should keep in mind.
For the first release of our single database, there is very little attempt being made to “integrate”. There is no data normalization between the applications, or within the applications if they are not currently normalized, much less within and across subject areas. Each existing application is being created within the single database as a stove-piped set of tables. New development will incorporate what it can from existing applications, and in a few areas we will be creating new structures to support an integrated model. These new structures will be prefixed with “tln_” instead of the application name. Okay, “tln_” is the application prefix for new development. Currently we are implementing three new entities: tln_contactinfo, tln_legalvalue, and tln_legalvaluetype. The tln_contactinfo entity consists of address, email, and phone numbers. A decision was made not to normalize this into address, email and phone numbers; but to keep it as one entity. This is a step in the right direction that can be further normalized in the future. (Read that as “probably won’t happen”.)
None of the database model types in Visio 2000 are totally appropriate. The Backman template is probably the best for my style if you are going to use Visio. The drawings that are currently available have been, or will be, reverse engineered from the physical tables in the SQL Server diagram component. The next step for these drawings is to put them into Embarcadero ERStudio and make them pretty and printable. This also may never happen, and is somewhat non-productive when trying to get to the web-publishing goal, more than likely in an Adobe PDF format.
Keep in mind if you develop diagrams in the SQL Server diagramming tool that changes you make to a drawing will physically affect the database structure. This is potentially dangerous. My personal recommendation is for you to make drawings in Visio and use the Backman template. Other solutions are acceptable. The intent is to define and document the structures and ultimately for these documents to be in a unified format.
Object names should not contain spaces or other special characters that require the use of brackets or quotes.
Object names should not contain underscores within the names, with the exception of an underscore between the prefix, which identifies the subject area, and the main object name. However, when naming indexes, it is appropriate to use underscores to separate table and column names for readability.
Prefixes should be 2 to 6 characters in length and should be meaningful for the related subject area or application. The prefixes sp_, xp_, and dt_ must be reserved for SQL Server system stored procedures and may not be used for a subject area prefix.
Do not use reserved words for object names.
Table names should be in the singular form and should clearly describe the contents of the table. COSObjective, COSStateObjective, and LessonPlan are examples of good table names.
The primary key for a table, when created through the use of the IDENTITY data type or a GUID, should be given the name of the table, minus the prefix, followed by the character value ‘ID’. For example, the primary key for the topten_Game table would be GameID.
Foreign keys that reference an IDENTITY or GUID type primary key should be given the same name as the primary key in the referenced table. For example, a foreign key reference to the topten_Game table would be named GameID. Where a table contains multiple foreign keys that reference the same table, an additional descriptor must be added to the element name. For example, elements may be named PrimaryGameID and SecondaryGameID.
Element names should be consistent throughout all tables and all databases. Other than as discussed above for primary and foreign keys, element names that occur within more than one table should not be prefixed with the name of the table. For example, the element name Comment might occur in several tables. It should be named Comment in each table rather than prefixing it with the table name.
Views should be prefixed with the character ‘v’, followed by an identifying prefix (if necessary), and an underscore. The remainder of the name should identify the nature of the data being retrieved by the view.
Format: v[pfx]_ViewName
Example: vcos_SelectProductsByObjective
Stored procedure names should contain an identifying prefix followed by an underscore. The prefix should never be sp_, xp_, or dt_ as these are reserved for system stored procedures. For security purposes, stored procedures that do not update data must include the word ‘Get’, ‘Sel’ or ‘Select’ at the beginning of the name.
Format: pfx_[Select]StoredProcedureName
Example: srch_SelectContentTypeById
DTS Package names should contain an identifying prefix followed by an underscore. The name should clearly identify the function of the package.
Format: pfx_PackageName
Example: tln_DailyFlashImport
User defined functions should contain an identifying prefix followed by an underscore. The name should clearly identify the purpose of the function.
Format: pfx_FunctionName
Example: tln_SetLastUpdated
Trigger names should be prefixed with the characters ‘tr’, followed by the characters ‘i’, ‘u’, and/or ‘d’ to indicate if the trigger is invoked for in Insert, Update, or Delete operation. The prefix is followed by an underscore and the name of the table.
Format: tr[IUD]_TableName
Example: tri_LessonPlan
tru_LessonPlan
Index names consist of the optional characters ‘u’ or ‘c’ to indicate unique or clustered, followed by ‘idx’, an underscore, and the name of the table, an underscore, and the name of the indexed columns, separated by underscores.
Format: [u][c]idx_TableName_ColumnName1[_ColumnName2]
Example: idx_Content_URL
cidx_Content_ContentTypeId
uidx_Content_SortedIndex
User defined data types should be given a descriptive name, followed by an underscore and the word type.
Example: SocialSecurityNumber_type
Description_type
Primary key names consist of the characters ‘PK’, the optional character ‘C’ to indicate clustered, an underscore, the table name, and the column name.
Format: PK[C]_TableName_ColumnName
Example: PK_LessonPlan_Id
PKC_Registry_UserName
Foreign key names consist of the characters ‘FK’, an underscore, the name of the table, and the name of the referencing column, an underscore, and name of the referenced table.
Format: FK_TableName_ColumnName_TableName
Example: FK_Content_ContentTypeId_ContentType
Default and check constraints should be created as separate objects and then bound to columns. Do not create a default or check constraint as part of the DDL to create the column. SQL Server will assign a system-generated name. Defaults should be prefixed with def_ and check constraints with chk_. The name should be either the name of the column being referenced, if the constraint is column specific or a meaningful name for more generic constraints.
Example: def_UserId
def_Zero
chk_ValidFormat
Variable names often represent data from the columns in a table. For consistency, they should be named with a ‘@’ followed by the name of the column. In cases where variables are created to contain data that is not in a table, they should be clearly named. @ErrorCode, @MessageID, etc. would represent well-named variables.
Column names must always be specified in selection queries. The use of ‘select *’ creates queries which may return significantly more data than is needed by the application. It can also result in application problems when columns are added to or removed from a table.
Performance must be considered when writing any database query or update statement. Query performance and optimization are critical to the overall success and usability of an application. While there are some general guidelines that can be used in optimizing the performance of a query, there are very few hard and fast rules. When in doubt, it is best to consult with a DBA.
When deciding to optimize a query, consider the frequency with which it will be run. The best candidates for optimization are queries that will run frequently within an application. If a query is seldom run, it may not be worth the effort to optimize unless its performance is truly unacceptable.
Consider the impact that optimizing a query will have on other queries. In some cases, changes made to the database schema to optimize one query will have an adverse affect on other queries.
Indexes are an effective way to optimize many queries but care must be taken in adding indexes. As a rule of thumb, once a table has more than five indexes, updates to that table may be adversely affected since the system must also update all of the indexes.
Do not use ‘select distinct’ unless absolutely necessary.
Use ‘union all’ instead of ‘union’ unless absolutely necessary.
Do not use cursors. There is always a way to avoid the use of cursors. Consider using ‘while’ constructs instead of a cursor. Cursors create significant overhead and concurrency issues within the system. (But remember, rules are made to be broken. If you have a valid excuse, please discuss it with the DBAs).
Consider the use of multiple select statements and temporary tables to simplify complex joins in stored procedures.
Avoid the use of ‘in’ and ‘not in.’ A join will return the same results, often with less overhead.
When using LIKE, do not use a wildcard character as the first character in the string. The statement LIKE ‘%dog%’ will prevent any indexes on the field from being used. This will require a full-table scan to satisfy the query.
Avoid the use of the NOT keyword. This keyword will negate the use of any indexes on the column and result in a full-table scan.
Consider all of the options. There are many techniques that can achieve the same results. Don’t get hung up on just one approach.
When attempting to optimize queries involving small (less than one thousand rows) tables, adding indexes to the small table may not change the performance of the query. A small table can be read into memory and scanned more quickly than the index can be read. The index will add overhead without providing any benefits.
Don’t guess. Use the Query Analyzer to display estimated and actual execution plans. Study these to determine where the biggest performance impacts are coming from. Use the Index Analysis feature to have SQL Server recommend indexes to solve the problem. Consider running a profile and index analysis of the entire application during a load test. This will allow the performance of the system to be balanced.
Remember to remove any indexes that you add during testing if they don’t solve your problem. Unused indexes on a table can also be a source of performance problems, especially with update statements.
Look at the statistics pages that SQL Server generates during the execution of the application. These are often clues that will lead to beneficial indexes.
Outer joins return all of the rows from at least one of the tables in the FROM clause, as long as those rows meet the search conditions in the WHERE or HAVING clauses. They are useful when a query needs to retrieve data for every row in a parent table even if there are no rows in the child table.
Inner joins eliminate all rows from one table that do not have a match in the other table. This is the normal join behavior.
A nested query is a query that is nested inside of another query. Nested queries can be used to return a single value for use with a comparison operator, a list of values to be used with the ‘in’ operator, or for existence tests with the ‘exists’ operator. When used with the ‘in’ or ‘exists’ operators, it is almost always possible to achieve better performance with a join. Nested queries are also referred to as sub-queries.
A correlated or repeating sub-query is a special form of nested query that depends upon the outer query for its values. The correlated sub-query is executed repeatedly, once for every value returned by the outer query.
Parameterized queries are implemented through the data-driven-query task, an advanced feature of DTS Designer. This feature has been optimized for situations in which:
Incremental updates need to be applied to a data mart
Maximum flexibility in using queries is needed. Data-driven queries achieve this goal by allowing parameters to be passed to the query at run time.
Ability to handle complexity of the query and need for flexibility outweighs the need for performance (the Bulk Insert task is optimized for performance).
Triggers are special kinds of stored procedures that execute automatically when the specified data modification statement is executed. They are frequently used to implement referential integrity when a field in a table in one database references a field in a table in another database.
Indexes are implemented for the purpose of performance tuning queries to obtain optimum data access speed within the database.
A primary key is a column or group of columns that uniquely identify a row in the database. The primary key must consist of data that will never change over the life of the data. Quite often, an integer field in implemented as the primary key using the IDENTITY data type. This creates a sequentially incremented value that ensures uniqueness.
A foreign key is a column or group of columns that are used to create and enforce a link between two tables in the database. This link ensures that data cannot be inserted in the child table unless a corresponding entry exists in the parent table and that data cannot be deleted from the parent table if any associated child rows exist.
Alternate keys represent a column or columns in a table that could have been chosen as a primary key but were not. The alternate key values must be unique within the table.
Within a stored procedure it is often necessary to create temporary tables in order to hold the interim results of a query. To ensure that multiple instances of the stored procedure can be executed at the same time, all temporary tables must be created in TEMPDB. This is ensured by naming all temporary tables with the # prefix. When creating a temporary table, ‘SELECT INTO’ should not be used. SQL Server locks system tables within TEMPDB until statement that creates the table has completed. This can cause serious concurrency issues within SQL Server. The preferred method for creating temporary tables is to execute the CREATE TABLE statement. As an alternative is to code the SELECT INTO statement with a WHERE clause that will ensure that no data is returned and that the statement will execute quickly. ‘WHERE 1 = 0’ works very well for this. Another alternative is to use a select of the form ‘SELECT TOP 0 … INTO.’
Temporary tables remain in the TEMPDB database until they are explicitly deleted or the connection that created them terminates. Since application connections can exist for an extended period of time, it is important that temporary tables be dropped at the end of the stored procedure that created them. This ensures that TEMPDB is not congested with old temporary tables that are no longer needed by the application.
For readability, comments should be written on their own line. Avoid imbedding comments within a statement or adding comments to the end of a statement. It is very easy to miss these comments when reading the code.
At a minimum, code should have a comment section at the beginning that explains what the code does. The format of this comment section is described elsewhere in this document.
Use comments to define or further describe the purpose of a statement if it is complex or may not be clear to the reader.
Use the /* … */ format for all comments.
Make sure that comments are kept up to date when modifying code. An old comment is worse than no comment at all.
It has just been brought to our
attention that there is a bug in SQL Query
Analyzer that causes it to not always recognize what you are commenting, and
therefore attempts to run commented code.
This has not been fixed in SQL Server 2000. The alternative format that we would prefer
not to use will have to be used. Precede
comment lines with two hyphens (i.e., “-- this is a comment line”).
Do not use GOTO statements, except for error handling.
Use SET NOCOUNT ON in all stored procedures. This eliminates the network traffic involved in sending record counts back to the application.
Return 0 to indicate that a stored procedure succeeded and –1 to indicate failure.
Do not use the return code to return data to the application.
Capitalize SQL Server key words, system variables, and function names – SELECT, FROM, @@ERROR, @@ROWCOUNT, GETDATE(), etc.
SQL Server system variables must be accessed in a specific way to get the desired results since their values change with every statement executed. In order to use the values returned by @@ERROR and @@ROWCOUNT, it is best to select the values from these two system variables into local variables before any additional processing is performed. Otherwise, the values in these system variables will be changed by statements that follow the one the values are needed for.
The comment section at the head of each stored procedure, trigger or view should be formatted as follows:
/*
NAME: xx_StoredProcedureName
AUTHOR(s): Author Name
CREATED: Date Written
PURPOSE: Describe what the code does
PARAMETERS:
List
of input and output parameters
RETURNS: Meaning of values returned in output parameters or return values
Updated By Update Date Description
------------------------- ---------------- ----------------------------------
Programmer Name 01/01/2001 Description of
changes
*/
/*
This is a multiple line comment. The begin and end comment
Tags should be on a separate line.
*/
Due to a bug in SQL Query Analyzer use the alternate method of commenting as follows:
--
-- This is a multiple line comment. Because of a bug in SQL Query
-- analyzer you can not use the /* comment */ method of commenting.
-- This makes it very hard to implement the header comments above
-- hopefully the template tool will take some of this headache away.
--
/* This is a single line comment */
-- This is a single line comment using the alternate method.
Indent lines of code in accordance with their hierarchy in the code. Indents are used to shift nested lines of code to the right for readability. Indent the inner statements of WHILE loops, BEGIN/END blocks and IF/ELSE statements. Also use indentation to line up multi-line DDL and DML statements. Each indentation should be 4 spaces in length. The use of tabs is discouraged because of alignment problems in different software programs.
Examples:
IF/ELSE statements:
IF @@ERROR <> 0
COMMIT
ELSE
ROLLBACK
WHILE loop:
WHILE @x <= 10
BEGIN
SQL statement
SQL statement
END
Multi-line DDL and DML:
CREATE TABLE xx_TableName (
ColumnName1 int,
ColumnName2 int)
SELECT ColumnName1,
ColumnName2
FROM xx_TableName
WHERE condition1 and Condition2
Application errors generally occur in two varieties: logic errors and system errors. With a logic error, the statement executes successfully but does not give the desired results. For example, a statement might attempt to delete a row from a table but that row is not found in the table. System errors are those errors that are generated by SQL Server when it is unable to execute the statement.
Every INSERT, UPDATE, and DELETE statement must be tested for error conditions after it is executed. These statements can fail without interrupting the stored procedure or calling program. It is possible for a stored procedure to run to completion and commit the work that it performed, even though some of the required updates failed. Errors must be checked for immediately after executing the statement. As each statement is executed, the system error code, @@ERROR, gets set to the error code for that statement. If this error code will be needed later in the stored procedure then the very first statement executed after the update statement must assign the value of @@ERROR to another variable.
With logic errors, it is the developer’s responsibility to decide what level of error checking is necessary. If the developer is expecting a statement to affect one row of data every time it is executed then checking the value of @@ROWCOUNT after the statement is executed might be desirable. One area that is especially important is selecting data into a variable. If the SELECT statement does not return any rows, the contents of the variable may be suspect. If it returns more than one row, there is no way to determine which row was used to populate the variable.
The RAISERROR statement should be used to report errors back to the calling program. Depending upon the severity of the error, RAISERROR can also write messages to the NT Event Log. RAISERROR can be called to report both logic errors and system errors back to the calling program.
Generally, the use of GOTO statements is prohibited in code. However, there is one area where they can be used to great advantage. By using the GOTO statement to branch to a specified label in a stored procedure, it is possible to place all error handling in one location in the stored procedure.
Example error handling code:
DECLARE @ErrorNumber int
BEGIN TRANSACTION
UPDATE TableName
SET ColumnName1 = 1
WHERE ColumnName2 = ‘x’
SELECT @ErrorNumber = @@ERROR
If @ErrorNumber <> 0
BEGIN
GOTO ErrorTrap
END
COMMIT
RETURN 0
ErrorTrap:
ROLLBACK
RAISERROR(‘Error updating record in
TableName’,16,1)
RETURN -1
Proper transaction handling is an important part of coding stored procedures, especially where the stored procedure executes more than one update statement. Transactions hold locks in the system until they are released, either through a ROLLBACK or a COMMIT. Once an error has occurred, there is no point in continuing to execute the stored procedure. Use the GOTO statement to jump to the error handling logic, ROLLBACK the transaction, and return to the calling program. This ensures that failed transactions hold locks in the database for a minimum amount of time. The above template provides an example of transaction processing.
The statements necessary to create the tables, keys, indexes, stored procedures, views, triggers and other objects in the database will be checked into VSS in a directory structure defined for the database systems. Each object – table, stored procedure, etc. – will be maintained in a separate script. This will enable the scripts to be assigned appropriate version Ids and labels for the project they are being modified for. Like any code, these scripts should be modified by first being checked out from VSS. Do not modify database objects using the Enterprise Manager tools, as it is possible that the changes will not get included in the scripts. The scripts should be given the name of the object with an extension of .SQL.
Each script should include a DROP statement to ensure that the object is dropped if it already exists. This should be followed by the appropriate CREATE statement to recreate the object. When necessary, the script must also include any GRANT statements to reassign permissions to the object. For tables, the script must also include all statements necessary to create the primary key, indexes, foreign keys, and any check or default constraints that need to be applied. For lookup tables, the script should either contain the insert statements necessary to populate the table or a text file containing the necessary data should be added to VSS.
In addition to the scripts that are necessary to create all of the database objects, a master script should be created. This master script is a batch command file that consists of calls to the OSQL and BCP utilities to execute the SQL scripts to create all of the objects in the database, populate any lookup tables, and add any necessary accounts to the database. This script should be built in such a way as to resolve any dependencies between objects. For example, a stored procedure that calls another stored procedure should be created only after the called stored procedure is created. Tables with foreign key references must be created after the referenced table is created. A master drop script should also be created to allow all objects in the database to be quickly dropped.
Scripts will also be created for any table structure modifications. The script should contain the statements necessary to add columns to a table, delete columns from a table, modify existing columns, and add or delete indexes and keys as necessary.
All scripts will be labeled and given a version number so that they can be matched to the project that the database changes are associated with.
All DTS Packages, scheduled jobs, maintenance plans, and related SQL Server tasks will be scripted out and stored in VSS. When it is necessary to modify a task, scripts will be created that contain the statements needed to drop the existing task, create the new task, and grant any permissions that are needed.
To be determined.
SQL Server allows two different authentication modes, Windows NT and SQL Server.
SQL Server security requires that accounts be created within SQL Server for authorized users. These users must then log into SQL Server with an account name and password that are different than their network account.
Implementing Windows NT security requires that users be on the same domain as the SQL Server they are connecting to or on a domain that is trusted by the domain that SQL Server is in. With Windows NT security, users can be placed in groups at the domain level and these groups can be granted access within SQL Server. In this way, the bulk of the security administration can be performed at the domain level. With SQL Server security, individual accounts must be created at each server. These accounts can then be added to a role and the roles can be granted access.
Good security practices dictate that passwords expire and be changed on a regular basis. Since SQL Server does not provide a mechanism to automatically expire passwords, a manual process will need to be developed to provide password management to individuals with access to the SQL Server databases. This process will expire the passwords every 60 days. Since this is a manual process, the specific mechanism for implementing this has not yet been defined.
In general, SQL Server accounts will be created for individuals and for applications. The consequences of changing passwords for these accounts are very different and need to be addressed separately.
For application accounts, the user ID and password in the connect string must be correct for the application to be able to access the database. Automatic expiration of application accounts would result in the failure of the applications until the connect strings were corrected. Changing application passwords will have to be coordinated so that the passwords and the connect string get changed at the same time and so that the change does not affect customers.
For individual accounts, the normal approach to expiring passwords is to notify the user that their password is about to expire and give them the opportunity to change it. Unfortunately, SQL Server does not have password expiration features built into it. Changing an individual account password would render the account unusable until the new password was delivered to the user. Some mechanism will need to be developed to allow users the ability to change their own passwords and to enforce that passwords get changed on a regular basis.
The Database Administration Group will periodically run password-checking software against the SQL Server databases to check for weak passwords. Any employees who are found to have weak passwords will be asked to change them. Since SQL Server does not provide a means of checking passwords, this process is dependant upon locating and implementing a commercially available product.
User maintenance is an ongoing part of any security plan. Employee accounts must be removed from the system when the employee leaves the company, new employees must be added to the system, and permissions must be changed as job duties and responsibilities change. The success or failure of any user maintenance plan hinges on good communication between all of the affected departments.
When an employee needs to have access granted to a SQL Server database or needs to have existing access levels changed, the employee should submit a request to their manager, detailing the level of security needed. If approved, the manager should forward the request to the Database Administration Group in Internet Engineering. The Manager of Internet Engineering will then review the request and the requesting manager and employee will be notified of the disposition of the request.
When an employee leaves the company the Human Resources Department will notify the Database Administration Group in Internet Engineering of that employee’s last day. The Database Administration Group with then remove that employee’s account from the SQL Server databases.
When employee accounts are removed from the VSS databases, it is the responsibility of the Database Administration Group in Internet Engineering to notify the VSS/PVCS Administrator so that employee’s account may also be removed from those systems.
SQL Server generates errors and messages in response to a number of different events. Many of these events are, or can be, written to the Windows Event Log. The SQL Server Agent reads the Windows Event Log and can be set to fire an alert when a specific condition is met. By default, SQL Server error messages with a priority of 19 or higher are automatically written to the Event Log. However, specific error messages in the sysmessages table can be modified to cause them to be written to the Event Log. In addition to the system error messages, applications can write to the Event Log using the RAISERROR WITH LOG command.
The specific events to be monitored through alerts are yet to be determined. However, this is a powerful tool for automatic notification of problems within SQL Server and should be implemented as soon as possible.
There are a variety of tools, both within SQL Server and Windows, and available from third parties, that can be used to monitor SQL Server. Two Quest products have been purchased for use in monitoring SQL Server. The first one is a 24-by-7 lights out monitoring tool. It tracks and maintains performance statistics and can send alerts when performance goes outside of defined parameters. It can also be used to monitor performance over time and identify possible bottlenecks as they develop. The Quest Spotlight product is a supplement to this product. When a problem is identified, Spotlight can be used to monitor SQL Server and gather additional information about the problem.
In addition, another third-party product, SiteScope, is being used to monitor CPU and disk space utilization and to send out alerts when these values exceed a set threshold. The Windows Performance Monitor can be used to monitor a number of Windows and SQL Server performance variables to identify possible trouble areas. Finally, the SQL Server Profiler can be used to track the exact SQL statements that are being executed on the server. With this tool, poor performing queries can be identified and addressed and applications can be debugged by monitoring how they interact with SQL Server.
As with any software package, SQL Server has periodic patches and revisions issued as problems are identified. These patches and revisions usually take the form of Hot Fixes and Service Packs. While it is Internet Operations responsibility to install these patches on the SQL Server systems, the Database Administration Group in Internet Engineering will first evaluate the patch and determine if it needs to be installed. In general, Hot Fixes do not need to be installed unless they address a known problem in the system. Service Packs, on the other hand, should be installed in order to keep the system current.
All projects that require the creation of new database tables or the modification of existing tables will require a design review. At the design review, the developer should present an entity relationship diagram, a proposed database schema, and the specifications for the project. The design review will look at how the entities and relationships defined in the ERD map to the requirements of the project. The proposed schema will be reviewed to ensure that it has been normalized to third normal form and that it is an accurate physical representation of the ERD. Table and element names will be reviewed for compliance to defined naming standards.
Once the design review has been completed, the developer will be responsible for communicating changes in the specifications or database design to the DBA team. If necessary, additional design reviews will be held to examine changes during the course of the project.
The design review should be scheduled as early as possible, before significant time has been spent coding the application or stored procedures. This will keep the impact of design changes to a minimum.
All database projects will require a code review. In this review, all views, stored procedures, triggers, and other SQL code will be evaluated to ensure that it meets coding standards and that it is readable and understandable. Code will also be reviewed for reliability and proper error handling. A developer who is qualified and able to answer technical questions about the code and design must present the code. The initial code review must be held before the project moves into QA and should be held as early as possible to allow time for any necessary rework. Once the project has moved into QA, additional code reviews will be held to as necessary to address changes to the code. Ideally, code reviews should be scheduled throughout the development phase. Instead of waiting until all the code is written, developers are encouraged to submit stored procedures to the DBA staff for review on a regular schedule. This will ensure that changes have a minimal impact on the overall project.
The team has developed a standardized data access layer (DAL) for use on all Internet Engineering projects. The specifics of the DAL can be found in the documents specific to that project. The DAL defines how the application will connect to SQL Server, how it will retrieve and update data within the database, and where the necessary connection string data will be stored. It is the responsibility of the developers to follow this standard on all projects.
Last Revised: April 2007