Saturday, June 28, 2014

SQL SERVER 2008 Troubleshooting




 SQL Server Architecture


DATABASE TRANSACTIONS
            The most well-known feature of a transaction is that it must complete all of its commands in their entirety or none of them. This feature, called atomicity, is just one of four properties defined in the early days of database theory as requirements for a database transaction, collectively known as the ACID properties.

ACID Properties
1.      atomicity
2.      consistency
3.      isolation
4.      durability


Atomicity
            All the effects of the transaction must complete successfully, or all the changes are rolled back.
An example of an atomic transaction is a withdrawal from an ATM: the machine must both dispense the cash and debit your bank account.


Consistency
            The consistency requirement ensures that the transaction cannot break the integrity rules of the database; it must leave the database in a consistent state.

            An example:
1.      Stock levels cannot be a negative value
2.      Spare part cannot exist without a parent object
3.      Sex field must be Male / Female
In order to be consistent, a transaction must not break any of the constraints or rules defined for the data.
           
Isolation
            Isolation means keeping the changes made by incomplete transactions running at the same time separate from one another.
            The changes a transaction makes must not be readable by any other transaction, although SQL Server does allow you to control the degree of isolation in order to find a balance between business and performance requirements.

Durability
            Once a transaction is committed, it must persist even if there is a system failure – that is, it must be durable.
The information needed to replay changes made in a transaction is written to the transaction log before the transaction is considered to be committed.

           
SQL SERVER TRANSACTIONS
            There are two types of transactions in SQL Server, differentiated only by the way they are created: implicit and explicit.

Implicit transactions are used automatically by SQL Server to guarantee the ACID properties of single commands.

            For example, for an UPDATE statement that modifies 10 rows, SQL Server runs it as an implicit transaction so that the ACID properties apply, and either all 10 rows are updated or none of them are.

Explicit transactions are started by using the BEGIN TRANSACTION command and are stopped by using the COMMIT TRANSACTION or ROLLBACK TRANSACTION commands.

Explicit transactions are used to group together changes to which you want to apply the ACID properties as a whole, which also enables you to roll back the changes at any point if your business logic determines that you should cancel the change.
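As an illustration, here is a minimal sketch of an explicit transaction mirroring the ATM example above; the Accounts and Withdrawals tables are hypothetical, but the BEGIN/COMMIT/ROLLBACK TRANSACTION pattern is the standard one:

BEGIN TRY
    BEGIN TRANSACTION;

    -- Both changes must succeed together (atomicity); the tables are hypothetical
    UPDATE dbo.Accounts
    SET    Balance = Balance - 100
    WHERE  AccountId = 42;

    INSERT INTO dbo.Withdrawals (AccountId, Amount, WithdrawnAt)
    VALUES (42, 100, GETDATE());

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Any error rolls back both changes, so the account is never debited without the cash being dispensed
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;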



THE LIFE CYCLE OF A QUERY (Figure 1-1)
            To introduce the high-level components of SQL Server's architecture, this section uses the example of a query's life cycle to put each component into context, in order to foster your understanding and create a foundation for the rest of the book.

The basic SELECT query life cycle






The Relational & Storage Engines

 SQL Server is split into two main engines:
1.      Relational engine
2.      Storage engine

The Relational engine is also sometimes called the query processor because its primary function is query optimization and execution.

            It contains a command parser to check query syntax and prepare query trees, a Query Optimizer that is arguably the crown jewel of any database system, and a Query Executor responsible for execution.


            The Storage Engine is responsible for managing all I/O to the data.
            It contains the Access Methods code, which handles I/O requests for rows, indexes, pages, allocations, and row versions, and a Buffer Manager, which deals with SQL Server's main memory consumer, the buffer pool.

THE BUFFER POOL
            The largest consumer of memory in SQL Server, the buffer pool contains all the different caches in SQL Server, including the plan cache and the data cache, which is covered as the sections follow the query through its life cycle.

           

A BASIC SELECT QUERY
            Start at the client, where the first component you touch is the SQL Server Network Interface (SNI).

SQL Server Network Interface (SNI)
            SNI is a protocol layer that establishes the network connection between the client and the server.
            You don't interact with SNI directly; instead, you configure a network protocol on both the client and the server.

  • Shared memory: Simple and fast, shared memory is the default protocol used to connect from a client running on the same computer as SQL server.

  • TCP/IP: TCP/IP is the most commonly used access protocol for SQL Server. It enables you to connect to SQL Server by specifying an IP address and a port number.
Typically, this happens automatically when you specify an instance to connect to.
The default TCP port is 1433; for a named instance, the SQL Browser service resolves the correct port over UDP port 1434.

  • Named Pipes: TCP/IP and Named Pipes are comparable protocols in the architectures in which they can be used.
    • Named Pipes was developed for LANs, but it can be inefficient across slower networks such as WANs.
    • To use Named Pipes you first need to enable it in SQL Server Configuration Manager (if connecting remotely) and then create a SQL Server alias, which connects to the server using Named Pipes as the protocol.
    • Named Pipes uses TCP port 445, so ensure that the port is open on any firewalls between the two computers, including the Windows Firewall.

  • VIA: Virtual Interface Adapter is a protocol that enables high-performance communications between two systems. It requires specialized hardware at both ends and a dedicated connection.
    • Like Named Pipes, to use the VIA protocol you first need to enable it in SQL Server Configuration Manager and then create a SQL Server alias that connects to the server using VIA as the protocol.


  • TDS (Tabular Data Stream) Endpoints
TDS is a Microsoft-proprietary protocol, originally designed by Sybase, that is used to interact with a database server. Once a connection has been made using a network protocol such as TCP/IP, a link is established to the relevant TDS endpoint, which then acts as the communication point between the client and the server.

There is one TDS endpoint for each network protocol and an additional one reserved for use by the dedicated administrator connection (DAC).
Once connectivity is established, TDS messages are used to communicate between the client and the server.
The SELECT statement is sent to the SQL Server as a TDS message across a TCP/IP connection (TCP/IP is the default protocol).


  • Protocol Layer
    • When the protocol layer in SQL Server receives your TDS packet, it has to reverse the work of the SNI at the client and unwrap the packet to find out what request it contains.
    • The protocol layer is also responsible for packaging up results and status messages to send back to the client as TDS messages.
    • Our SELECT statement is marked in the TDS packet as a message of type "SQL Command," so it's passed on to the next component, the Command Parser, to begin the path toward execution.

    • Figure 1-2 shows where our query has gone so far. At the client, the statement was wrapped in a TDS packet by the SQL Server Network Interface and sent to the protocol layer on the SQL Server, where it was unwrapped, identified as a SQL command, and the code sent to the Command Parser by the SNI.



Command Parser
            The Command Parser's role is to handle T-SQL language events.
            It first checks the syntax and returns any errors back to the protocol layer to send to the client.
            If the syntax is valid, then the next step is to generate a query plan or find an existing plan.
            A query plan contains the details about how SQL Server is going to execute a piece of code.
            It is commonly referred to as an execution plan.

            To check for a query plan, the Command Parser generates a hash of the T-SQL and checks it against the plan cache to determine whether a suitable plan already exists.
            The plan cache is an area in the buffer pool used to cache query plans.
            If it finds a match, then the plan is read from cache and passed on to the Query Executor for execution.


Plan cache
      Creating execution plans can be time consuming and resource intensive, so it makes sense that if SQL Server has already found a good way to execute a piece of code, it should try to reuse it for subsequent requests.

      The plan cache, part of SQL Server's buffer pool, is used to store execution plans in case they are needed later.

      If no cached plan is found, then the Command Parser generates a query tree based on the T-SQL.
      Our basic query didn't have an existing plan, so a query tree was created and passed to the Query Optimizer.
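To see what is currently in the plan cache, one possible approach (not from the original notes) is to query sys.dm_exec_cached_plans together with sys.dm_exec_sql_text:

-- Most frequently reused plans currently in the plan cache
SELECT TOP 10
    cp.objtype,
    cp.usecounts,
    cp.size_in_bytes / 1024 AS size_kb,
    st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC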





Optimizer
        The Query Optimizer is one of the most prized possessions of the SQL Server team and one of the most complex and secretive parts of the product.

        It is a "cost-based" optimizer: it evaluates multiple ways to execute a query and then picks the method that it deems will have the lowest cost to execute.


Figure 1-4 shows the addition of the query plan as the output from the Optimizer, which is passed to the Query Executor.
        Also introduced is the Storage Engine, which is interfaced to by the Query Executor via OLE DB to the Access Methods (covered next).

       

Access Methods
      It is a collection of code that provides the storage structure for your data and indexes as well as the interface through which data is retrieved and modified.

Our SELECT statement needs to read just a few rows that are all on a single page. The access methods code will ask the Buffer Manager to retrieve the page so that it can prepare an OLE DB rowset to pass back to the Relational Engine.

Buffer Manager
      The Buffer Manager manages the buffer pool, which accounts for the majority of SQL Server's memory usage.
      SQL Server needs to maintain a minimum level of free pages in memory.
Data Cache
      The data cache is the largest part of the buffer pool, and therefore the largest memory consumer within SQL Server.
      Every data page that is read from disk is written to the data cache before being used.

      The sys.dm_os_buffer_descriptors DMV contains one row for every data page currently held in cache.

SELECT COUNT(*)*8/1024 AS 'Cached Size (MB)',
  CASE database_id
    WHEN 32767 THEN 'ResourceDB'
    ELSE DB_NAME(database_id)
  END AS 'Database'
FROM sys.dm_os_buffer_descriptors
GROUP BY DB_NAME(database_id), database_id
ORDER BY 'Cached Size (MB)' DESC

OUTPUT:
Cached Size (MB)    Database
3287                People
34                  tempdb
12                  ResourceDB
4                   msdb

The amount of time that pages stay in cache is determined by a least recently used (LRU) policy.

You can view how long SQL Server expects to be able to keep a page in cache by looking at the MSSQL$<instance>: Buffer Manager\Page Life Expectancy counter in Performance Monitor.
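The same value is also exposed through a DMV, so you can check it with a query instead of Performance Monitor; a simple illustrative example:

-- Page Life Expectancy for the buffer manager (value is in seconds)
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND object_name LIKE '%Buffer Manager%'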






A Basic SELECT Statement Life Cycle Summary (Figure 1-5)








Buffer Manager

The next example follows a simple UPDATE statement through the same components. The key point here is that the UPDATE statement changes the data in the data cache, not in the actual database file on disk.
            This is done for performance reasons, and the page is now what's called a dirty page because it's different in memory than it is on disk.


            The page that needs to be modified is already in cache, so all the Buffer Manager needs to do is modify the page as requested by the Access Methods.


Recovery
            Modifications are written to the transaction log first and are then actioned in memory only.
            This is done for performance reasons and because you can recover the changes from the transaction log should you need to.

Dirty pages
            Once a page has been modified in memory it is marked as a dirty page.
            Clean pages can be flushed from cache using DBCC DROPCLEANBUFFERS, which can be handy when you're troubleshooting in development and test environments because it forces subsequent reads to be fulfilled from disk rather than cache, but it doesn't touch any dirty pages.
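A common pattern in test environments (not something to run in production) is to flush dirty pages to disk first and then drop the clean pages, so that the whole data cache is cleared:

CHECKPOINT;               -- write dirty pages to disk, making them clean
DBCC DROPCLEANBUFFERS;    -- remove clean pages from the data cache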

The sys.dm_os_buffer_descriptors DMV can be used to display how many dirty pages exist in each database:


SELECT DB_NAME(database_id) AS 'DBName', COUNT(page_id) AS 'Dirty Pages'
FROM sys.dm_os_buffer_descriptors
WHERE is_modified = 1
GROUP BY DB_NAME(database_id)
ORDER BY COUNT(page_id) DESC


Dirty pages will be written back periodically whenever the free buffer list is low or a checkpoint occurs.


Lazy writer
            The Lazywriter is a thread that periodically checks the size of the free buffer list.
            When it’s low, it scans the whole data cache to age-out any pages that haven’t been used for a while.
            If it finds any dirty pages that haven’t been used for a while, they are flushed to disk before being marked as free in memory.

Checkpoint Process
            A checkpoint is a point in time created by the checkpoint process at which SQL Server can be sure that any committed transactions have had all their changes written to disk.

            By default, on a busy server, SQL Server will issue a checkpoint roughly every minute, which is marked in the transaction log.

ALL ABOUT TRACE FLAGS
            Trace flags provide a way to change the behavior of SQL Server temporarily and are generally used to help with troubleshooting or for enabling and disabling certain features for testing.
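For example, trace flags can be checked and toggled with DBCC; trace flag 1222 (detailed deadlock output written to the Error Log) is just one commonly used example:

DBCC TRACESTATUS(-1);      -- list trace flags enabled globally
DBCC TRACEON(1222, -1);    -- enable trace flag 1222 for all sessions
DBCC TRACEOFF(1222, -1);   -- disable it again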

Recovery Interval
            By default the recovery interval is set to 0, which allows SQL Server to choose an appropriate interval, which usually equates to roughly one minute between automatic checkpoints.
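The recovery interval is an advanced sp_configure option; a minimal sketch of checking it and setting it back to the default of 0:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'recovery interval (min)', 0;   -- 0 = let SQL Server choose
RECONFIGURE;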

Recovery Models
            SQL Server has three database recovery models:
1.      FULL,
2.      BULK-LOGGED
3.      SIMPLE

Full
      All operations are fully logged in the transaction log, and you must have a backup strategy that includes full backups and transaction log backups.

      SQL Server databases that require the highest level of recoverability should use the FULL recovery model.

Bulk-logged
     Used temporarily to improve the performance of certain bulk operations by minimally logging them.
     Redo information for those bulk operations is not fully logged, which means we also lose point-in-time recovery for log backups that contain them.
     All other operations are fully logged, just like the FULL recovery model.
           
           
Simple
     When the SIMPLE recovery model is set on a database, all committed transactions are truncated from the transaction log every time a checkpoint occurs.
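The recovery model is set per database with ALTER DATABASE and can be checked in sys.databases; the database name below is just an example:

ALTER DATABASE AdventureWorks SET RECOVERY FULL;     -- or BULK_LOGGED / SIMPLE

SELECT name, recovery_model_desc
FROM sys.databases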









Another benefit of having everything in one place is that we can now get better visibility of what's happening at that level than was possible prior to SQLOS.

We can access all this information through dynamic management views (DMVs).

Any DMV that starts with sys.dm_os_ provides an insight into the workings of SQLOS.
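A quick way to see which SQLOS-related DMVs exist on an instance (an illustrative query, not from the original notes):

SELECT name
FROM sys.system_objects
WHERE name LIKE 'dm_os%'
ORDER BY name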

Note: the SQLOS doesn’t replace Windows. Ultimately, everything ends up using the documented Windows system services; SQL Server just uses them in such a way as to optimize for its own specific scenarios.

SQLOS is not a way to port the SQL Server architecture to other platforms like Linux or Mac OS, so it's not an OS abstraction layer.
It doesn’t wrap all the OS APIs like other frameworks such as .NET, which is why it’s referred to as a “thin” user-mode layer. Only the things that SQL Server really needs have been put into SQLOS.

SUMMARY
o       The Query Optimizer’s job is to find a good plan in a reasonable amount of time; not the BEST plan.

o       Anything you want to read or update will need to be read into memory first.
o       Any updates to data will be written to the transaction log on disk before being updated in memory, so transaction log performance is critical; the update isn't written directly to the data file.

o       A database page that is changed in memory but not on disk is known as a dirty page.

o       Dirty pages are flushed to disk by the checkpoint process and the lazywriter.

o       Checkpoints occur automatically, roughly every minute and provide the starting point for recovery.

o       The lazywriter keeps space available in cache by flushing dirty pages to disk and keeping only recently used pages in cache.

o       When a database is using the Full recovery model, full backups will not truncate the transaction log. You must configure transaction log backups.

The SQLOS is a framework used by components in SQL Server for scheduling, I/O, and memory management.



     







Shortcut              Details
Ctrl + D              Results in Grid
Ctrl + T              Results in Text
Ctrl + U              Change database
Ctrl + E or F5        Execute Query
F8                    Show/Hide Object Browser
Ctrl + Q              New Query Window
Ctrl + N              New Query with New Connection
Ctrl + O              Open a .SQL Script file
Shift + Alt + Enter   Full Screen
Ctrl + F5             Parse the query
Ctrl + R              Show/Hide Results Pane
F6                    Switch between query and results panes
Alt + F1              Information about all the objects in the current database


CHAP 2: UNDERSTANDING MEMORY

o       Understanding physical memory and how virtual memory addressing is used.
o       Optimizing 32-bit systems with large amounts of memory
o       Optimizing the memory configuration on 64-bit systems
o       AWE and its uses on 32-bit and 64-bit environments
o       Explaining MemToLeave
o       SQL Server’s memory clerks, caches, and pools
o       Looking at SQL Server’s plan cache
o       Determining a setting for Max Server memory
o       An in-depth look at Query / Workspace memory


Memory, disk, and CPU are the holy trinity of resources in a computer system, and memory is listed first because it's the area where you're most likely to have an issue.

Memory issues can cause both disk and CPU saturation, so when troubleshooting a server issue (or at least a performance issue) you need to start by looking at the memory profile of the system.


Physical Memory
Physical memory is most commonly referred to as RAM.
RAM is also known as primary storage, main memory, or system memory, because it is directly addressable by the CPU.

The peak transfer rate for a mainstream RAM module would be about 5GB/sec.


CHAP 8 : DEFINING YOUR APPROACH TO TROUBLESHOOTING

o       How to approach troubleshooting
o       Defining the problem you're working on
o       Knowing your SLA and baselines
o       Defining the exit criteria
o       Choosing how to gather data
o       Your options for analyzing a data collection
o       Troubleshooting components like failover Clustering, Replication, and Analysis services


Tips for Identifying a Problem
o       Take screenshots or cut and paste pop-up screens indicating errors.
o       Obtain copies of all relevant logs such as the Event Viewer logs, the SQL Server Error Log, dump files, and application-specific logs. Sometimes different logs will have different levels of detail or even different information captured.
o       Understand the steps necessary to reproduce the problem.
o       When you gather your information, make sure you understand the context. Know whether the logs are from a client workstation, a middle-tier server, or a server running SQL Server.

If it is an existing system, look back at your change control documents to see what has changed recently on the system.
Any change, even if seemingly unrelated, should be reviewed.

This includes changes such as a Windows or SQL Server patch, a new policy or removed permission, a configuration option, or an application or database schema change.

Bite-Size Chunks
            Once you understand the overall problem, you can break it down into manageable pieces and isolate the various components. The following is a list of troubleshooting areas by category.

  • Connectivity Issues
Check whether the problem occurs with only one protocol, such as Named Pipes or TCP/IP.

Can you connect using the dedicated administrator connection (DAC)?

  • Performance Issues
For a performance problem you need to determine whether the issue is on the client, the middle tier, the server on which SQL Server runs, or the network.

Another application on the server may be using all the available CPU resources and leaving nothing for SQL Server to use. This can make SQL Server seem unresponsive or slow.


  • Memory or Compilation Issues
If you are able to identify one user query that is slow, the most common causes are insufficient resources (network, memory, I/O, CPU), blocking, out-of-date statistics, or inefficient indexes.

Sometimes the problem is not with the query execution but with the compilation. Often, analyzing the plan cache can lead to resolving the problem.


A basic data collection will capture the following counters:

  • LogicalDisk and PhysicalDisk:
    • Avg. Disk Bytes / {Read, Write, or Transfer}
    • Disk Bytes/sec
    • Disk Transfers/sec
    • Avg. Disk sec/Read
    • Avg. Disk sec/Write

  • Memory
    • Available Mbytes
    • Free System Page Table Entries

  • SQL Server Buffer Node:
    • Page life Expectancy
    • Target Pages
    • Total Pages

  • SQL Server Databases:
    • Active Transactions
    • Log File(s) Size (KB)
    • Log File(s) Used Size (KB)
    • Percent Log Used
    • Transactions / sec
  • SQL Server General Statistics:
    • Processes Blocked
    • User Connections

  • SQL Server Latches:
    • Average Latch Wait Time (ms)
    • Latch Waits /sec

  • SQL Server Locks:
    • Average wait Time (ms)
    • Lock Requests/sec
    • Lock Timeouts/sec
    • Lock Waits/sec
    • Number of Deadlocks/sec

  • SQL Server Memory Manager:
    • Optimizer Memory (KB)
    • SQL Cache Memory (KB)
    • Target Server Memory (KB)
    • Total Server Memory (KB)

  • SQL SERVER SQL Statistics:
    • All counters

  • SQL SERVER Transactions:
    • Free Space in tempdb (KB)
    • Longest Transaction Running Time
    • The other counters if you are using any type of snapshots, including triggers


  • SQL Server Wait Statistics:
    • All counters

  • Process (Choose  at least your SQL Server instance(s) as well as any other applications/process that might compete with SQL Server.):
    • % Processor Time
    • Private Bytes
    • Virtual Bytes
    • Working Set

  • Processor (“<All instances>” and Total):
    • % Processor Time

Storage vendor I/O counters and PerfMon I/O counters rarely match and you have to understand the purpose and scope of each.

A SQLTrace may have different levels of detail. The more events you choose, the bigger the trace file and the greater the overhead of capturing the data. A big trace file gives you more data to analyze but may become too large to realistically manage even with rollover files.
            Some events, such as locks and even some of the performance events, are much more intensive than others.

Some of the events you will commonly see in a baseline are:
o       Some or all of the “completed” events from the RPC and T-SQL categories
o       Most or all of the events from the “Errors and Warnings” category
o       All of the “Database” category events (grow, shrink, etc.)
o       The “Locks” category Deadlock Graph event and possibly the escalation and timeout events but generally NOT any of the other lock related events. There are other ways to troubleshoot blocking that are much less intensive.
o       The “Progress Report” category events
o       Some, all, or none of the Performance events. These events can take a lot of space so consider the trade-offs carefully.

Events and Alerts
            Windows has built-in event logs, SQL Server has an Error Log, and your application may have a log as well.
           
For example:
  1. I/O errors from SQL Server are often accompanied by hardware errors in the windows event logs.
  2. The cause of a cluster failover might be apparent from looking in the cluster log or the Windows event logs.
  3. A series of monitoring tool alerts warning that a disk is approaching capacity might indicate whether an out of space error is due to a sudden problem or steady growth over time.

If you turn off any monitoring while troubleshooting, make sure you turn it back on once the problem is resolved.





TOOLS AND UTILITIES
            The following is a list of some information-gathering tools and utilities that are available.

SQLDiag
            SQLDiag can be used to gather basic environmental information such as the SQL Server Error Logs, event logs, and SQL Server configuration settings with the /X option. It can also capture SQLTrace and PerfMon information.

PSSDiag
            This is a wrapper around SQLDiag, with some extensions, that Microsoft Support configures and sends to customers to help troubleshoot a specific issue.

Windows Event logs (Application, System, and Security)
            These logs are often useful to show you what errors, warnings, and informational messages have occurred in the recent past.

Custom Application Logs
            Useful when the application is instrumented to write errors or diagnostic information.

User Dumps
            If you see an exception in the SQL Server Error Log, you should also see a mini-dump file with the extension .mdmp in the same directory.

PerfMon
            Performance Monitor ships with Windows and can be used to gather information on the resources used on the server.
            It can track system information such as CPU and memory usage, I/O statistics, and network activity.

NetMon         
            Network Monitor is a free network packet analyzer tool that is used to look at data as it is sent over the network.

Management Data Warehouse
            This is used for performance trending and can be used to see how performance has changed over time.


TROUBLESHOOTING OTHER COMPONENTS

Failover Clustering
  • When you collect a full user dump from SQL Server, SQL Server will appear to be unresponsive. If you need to gather a user dump from a clustered instance, you need to increase the time the cluster service waits before it decides to do an automatic failover.
  • Make sure any files you expect to be available are accessible regardless of which node owns the clustered instance.
  • To start the instance in single-user mode, you need to start it outside the cluster so that the cluster service does not take the only available connection.
  • Take the SQL Server, SQL Agent, and Full-Text resources offline.
  • Don't take the SQL Server name, IP address, or disks offline.
  • After this you can start SQL Server from the command line using the -m parameter and whatever other parameters are necessary for your troubleshooting steps.
  • Finally, stop the command-line version of SQL Server and then bring SQL Server, SQL Agent, and Full-Text back online.

  • The SQL Server service depends on an IP address, a name (DNS entry), and at least one disk.

  • If any of these dependent resources are offline, SQL Server cannot be brought online (started).

  • Do other applications (such as other instances of SQL Server) already have resources in use that this instance cannot use?

  • Are the permissions to directories, files, and Registry keys the same on all nodes?


Some queries you may want to use when troubleshooting a clustered instance:
o       Check to see if the instance is clustered:
SELECT SERVERPROPERTY('ISCLUSTERED')


o       Use the following DMV to list the drives that can be used by this clustered instance of SQL Server for database files, transaction logs, and SQL Server backups:
SELECT * FROM SYS.DM_IO_CLUSTER_SHARED_DRIVES


o       Get a list of possible owners of this SQL Server instance:
SELECT * FROM SYS.DM_OS_CLUSTER_NODES





Replication
            In many respects you can treat replication as a third-party product that happens to use SQL Server as a back end.

            Replication activity occurs via "agents," which are SQL Agent jobs that call replication executables. Each agent makes calls to stored procedures that result in data movement between replicated databases, so you can trace and troubleshoot those stored procedure calls just like you would any other application.

            You can use SQLTrace or Profiler to find out what procedure calls replication is making at any given time. In most cases you have at least some network traffic to deal with, as most production systems involve at least two or three different servers.

If you have a performance problem, you can look at SLAs, baselines, and network traces just like with any other application.

If any of the replication databases are on a clustered instance, replication works the same way as it does on a standalone instance; however, just as with any other clustered application, the files that replication references must be on a clustered shared drive. For replication those files are specified in snapshot properties for each type of replication.

For transactional replication latency problems you can use tracer tokens.
A tracer token is basically just a simple bit of data sent through the system. It is used to measure the elapsed time from the publisher to the distributor and from the distributor to the subscriber(s).
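A minimal sketch of posting and listing tracer tokens, assuming the standard transactional replication procedures sp_posttracertoken and sp_helptracertokens; the publication name is hypothetical, and both calls run at the Publisher in the publication database:

DECLARE @tokenId int;
EXEC sys.sp_posttracertoken
    @publication = N'MyPublication',
    @tracer_token_id = @tokenId OUTPUT;   -- post a token into the transaction log

EXEC sys.sp_helptracertokens @publication = N'MyPublication';   -- list posted tokens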

Break down the problem into where it occurs and which agent is running when the problem occurs.

The executable called by the replication agent (a replication job in SQL Server Agent) initiates the connection to SQL Server and passes it data and parameters.
The parameters can be specified directly in the agent job or in an agent profile. The parameters passed into each replication agent can greatly impact performance.

Most replication performance problems can be handled with normal SQL Server performance troubleshooting.
Replication is in many ways just another application that runs on top of SQL Server.
It is important to understand how and when each agent runs; once you do so, it becomes easier to narrow down the problem and concentrate on the correct instance of SQL Server.


To help you understand the role of each agent, here are some descriptions:

  1. SNAPSHOT AGENT
            Reads data on the publisher and writes the output to a file. It is used in all types of replication.
Sometimes it can cause blocking or performance problems on the publisher.
The job that initiates the snapshot has to be able to read the data in the database and write to the designated snapshot folder.

If the snapshot folder is not local to the subscriber, you have to consider whether network bandwidth contributes to any performance issues you may see.


  2. LOG READER AGENT
It is used only for transactional replication, but this includes peer-to-peer replication.
It reads the transaction log on the publishing database and sends the data to the distribution database.
            This adds some load to the publisher, and in particular adds load to the drive where the log file of the publishing database resides.
            When the log reader is running, it changes access to the log from a purely sequential operation to a semi-random operation.
            On a system that is already stressed, the additional movement of the disk read heads may cause some increased response times for that drive.
            The log reader is writing to the distribution database which may be on a different server (a remote distributor) from the publisher so once again you may need to consider network bandwidth in performance problems.


  3. DISTRIBUTION AGENT
It is used in transactional (including peer-to-peer) and snapshot replication. It moves data from the distribution database to the subscriber(s).
Initially this data movement is through the application of a snapshot. If the article is configured so that the object is re-created when a snapshot is applied, the Distribution Agent reads the snapshot folder and finds the CREATE statements that were stored there in various files when the Snapshot Agent ran.
If the article is configured to replace the data on the subscribers (this is almost always true), then the Distribution Agent will bulk copy the data from the snapshot folder into the subscriber tables.
The majority of the time the Distribution Agent is not applying a snapshot; it is applying changes captured by the log reader. Network bandwidth can be an issue with this agent as well.


  4. MERGE AGENT
It is used only in merge replication; it moves data between the publisher and subscribers. Merge replication is not as straightforward as the other types of replication.
The distribution database plays a much smaller role as data is moved directly between the publisher and the various subscribers.

Triggers on the tables involved in merge replication populate these metadata tables on the local database.
Network bandwidth can sometimes contribute to performance issues.

  5. CLEANUP AGENTS
They remove metadata that is no longer needed by replication. The cleanup agents may modify large amounts of data at once, which can cause a surge in log usage and possibly blocking.
These agents run on the local instance, so network bandwidth is not an issue.


  6. REPLICATION MONITOR
SSMS has a Replication Monitor that is used to monitor the health of the replication topology and indicate the status of each replication job.
This is a very useful tool that can provide immediate feedback about the health of your replication topology.
You can use SQLTrace/Profiler to see what commands are issued by Replication Monitor and model those commands to create your own custom monitoring or reports if necessary.


Analysis Services
Many of the same approaches apply to both SQL Server and Analysis Services troubleshooting. For performance troubleshooting you need baselines to compare against, for test systems you need data that is as close to production as possible, and Profiler and PerfMon are two of your main troubleshooting tools.

XMLA Queries
            XMLA is the language used to submit administrative queries to Analysis Services. Processing, for example, is a collection of XMLA commands.
            The key Profiler events to capture for an XMLA query problem are:

o       Command event : ALL
o       Errors and Warnings: ALL
o       Progress Reports: ALL

CHAP 9:
Viewing Server Performance with PerfMon and the PAL Tool

WHAT’S IN THIS CHAPTER
  • When and how to use Performance Monitor
  • How to baseline server performance
  • Using sample thresholds to identify performance problems
  • Automating data analysis with Performance Analysis for Logs (PAL)

PerfMon:
            It is a Windows tool for measuring real-time server performance and recording system performance to a file over time.
            Performance monitor can assist with resolving complex problems, some of which may seem random or intermittent, by narrowing the focus to the root cause.


This chapter addresses these questions in three main ways:
  1. It provides key counters and thresholds for issues
  2. It helps you gather a baseline from a healthy server
  3. It demonstrates tools to assist with analyzing performance logs.


PERFORMANCE MONITOR OVERVIEW
            PerfMon provides server-wide, real-time and logged performance monitoring. In Windows Server 2008, PerfMon has been incorporated into the Reliability and Performance Monitor.

Use Performance Monitor for the following common tasks:
  • View real-time performance data on your server.
  • See performance data represented visually
  • Record performance data over an extended time frame
  • Quantify the performance impact of HW or SW changes
  • Save and export performance data
  • Fire alerts based on performance thresholds
  • Compare performance data from different servers


Reliability and Performance Monitor in Windows Server 2008
Reliability and Performance Monitor comprises three main components:
1.      Monitoring Tools
2.      Data Collector Sets
3.      Reports

Data Collector Sets
            It combine all information necessary for common problem diagnostics, including event tracing, performance counters, and configuration.
            Three predefined system data collector sets are included: LAN Diagnostics, System Diagnostics, and System Performance.

Reliability Monitor
            It provides a system stability chart. Here, events such as hardware failures, application failures, or Windows failures are tracked against a timeline. The chart displayed by Reliability Monitor plots the System Stability Index – a rating system for reliability whereby 10 reflects a stable server and 1 reflects an unstable server.


NEW COUNTERS FOR SQL SERVER 2008 IN PERFMON
            SQL Server 2008 has some new PerfMon counters to provide performance logging for new features such as Resource Governor, and there are new counter instances for improved features, such as monitoring compression in database mirroring.

            There are two new counter objects for Resource Governor, Resource Pool Stats and Workload Group Stats, each containing more than ten counters relating to the new Resource Governor functionality.




New PerfMon Counters for SQL Server 2008




New Counters for DB Mirroring



TROUBLESHOOTING SQL SERVER PROBLEMS

When accepting a problem to investigate, it’s useful to know the following information:

  1. The problem statement, in order to understand the user experience and expected application behavior.

  2. What troubleshooting steps have been carried out already (by the users and first- or second-line support engineers)

  3. What data has already been gathered about the problem


REPRODUCING PROBLEMS
            If possible, take the time to reproduce the problem. When you can reproduce a problem, you're halfway to fixing it, as recreating the exact scenario in which a problem condition occurred requires a thorough understanding of the problem itself.

            PerfMon is a great tool to start troubleshooting most kinds of problems. It can be used to do either of the following:

  1. Prove a problem condition
  2. Narrow the problem scope

If practical, establish and prove the problem condition.  PerfMon can then be used to eliminate areas of doubt or identify specific areas where further investigation is required.

PerfMon is vital in situations where SQL Server is co-resident with another application or a Windows feature such as IIS.


DATA-DRIVEN TROUBLESHOOTING
It refers to an iterative approach to troubleshooting problems whereby each decision or recommendation is based on data gathered about the problem.

Data-driven troubleshooting relies on three components:
  1. Tools to collect data
  2. Engineers to read data and understand the problem
  3. Any one of a number of people to assist with problem resolution (server h/w engineers, storage engineers, or developers)


Note:        
      PerfMon will help you identify the problem area. For example, if you note that CPU is higher than usual, or disk reads are slow, you can focus on high-CPU queries or queries with expensive disk access as necessary.

Here’s a ( non-exhaustive) list of tools available for troubleshooting SQL Server problems:
1.      SQL Server error log
2.      Windows event log
3.      SQL Profiler
4.      PerfMon
5.      Stored procedure
6.      DMV
7.      DBCC
8.      Replication Monitor
9.      Wait stats
10.  Performance Data Warehouse
11.  SQL Nexus
12.  Database Tuning Advisor
13.  PSSBlocker script
14.  PerfStats Script
15.  SQL Server Management Studio Reports
16.  Trace flags
17.  Execution Plans
18.  Stack dumps


PerfMon provides a server-level view of hardware utilization. It is Windows monitoring tool; as such it can be used to monitor any application.

MONITORING REAL-TIME SERVER ACTIVITY
     Within the Reliability and Performance Monitor, the following steps provide an overview of server activity:
1.     Select Performance Monitor from the Monitoring Tools folder.
2.     Right-click anywhere in the chart area and choose Add Counters.
3.     Select "Memory" and click to expand it.
4.     Select "Available MBytes" and click Add.
5.     Scroll to locate the "LogicalDisk" counter and select "Avg. Disk sec/Transfer". If you have multiple disks, select each disk individually from the instance list and click OK.


A couple of pointers to help out:
1.      Organize the columns so you can read each row more easily.
2.      Press Ctrl+H to enable highlighting for the selected counter – this means the line graph is highlighted.
3.      Select all counters, right-click, and choose Scale Selected Counters to ensure that they are all displayed within the screen.
Mostly you'll want to monitor a remote server, which you can do by typing the server name in the Add Counters dialog.


STARTING OUT WITH DATA COLLECTOR SETS
     Data collector sets are groups of data gathering tools, and can include kernel tracing, performance logs, and configuration data.

     To use a predefined data collector set, expand Data Collector Sets > System, right-click System Performance, and choose Start.

     The System Performance collector will run for 60 seconds. Once collection has finished, navigate to Reports > System > System Performance, and choose the most recent report.





Brief summary of the three types of data that can be captured in the data log:

LOG TYPE                             DESCRIPTION
Performance counter                  PerfMon counters provide performance data for most aspects of Windows and SQL Server
Event trace data                     Event Tracing for Windows provides low-level operating system tracing
System configuration information     Captures Registry keys

Give the new data collector set a useful name and create it manually.

Click Add, and select the counters you want to capture.





WHAT TO BE AWARE OF WHEN RUNNING PerfMon
            When monitoring, consider performance implications with a view to reducing overhead and minimizing risk of the following:
o       Making problems worse
o       Affecting monitoring results

THE IMPACT OF RUNNING PERFMON
            PerfMon is a lightweight tool and its impact on any given server is partly related to how PerfMon is configured and partly dependent on the workload of that server while PerfMon is running.


One of the challenges with many performance problems is that we must obtain a PerfMon log to identify the cause of the problem. Without a log, engineers and managers can observe poor application performance and hypothesize about potential causes and remedies; but in order to diagnose the problem and take remedial action, we must obtain performance data.




SERVER WITH VERY POOR PERFORMANCE

  1. Run PerfMon remotely
  2. Reduce the sampling frequency (sample less often)
  3. Include as few counters as possible
  4. Log to disk



COMMON PerfMon PROBLEMS
            In some cases you may have problems with PerfMon itself, specifically counters could be missing or aren’t displayed correctly, or there could be problems connecting to servers remotely.

If you are running SQL Server in Windows-on-Windows (WOW64) mode, i.e., x86 SQL Server on x64 Windows, you'll be unable to run PerfMon against it remotely from other x64 machines because the Remote Registry service is an x64 process; therefore, the counters will only be visible to x86 processes.



REMOTE MONITORING FAILURES
If you’re trying to monitor remote servers and this is failing, the most likely cause is permissions problems. Some common causes:
1.      Ensure that the account is a local administrator on the target server
2.      Confirm NetBIOS access to the target server
3.      Ensure that the Remote Registry service is running on the target server
4.      Ensure that no local security policy or Active Directory group policy is restricting access




CPU PERFORMANCE COUNTERS
High CPU usage is one of the most frequently occurring performance problems for SQL Server.
It is easy to identify whether sqlservr.exe is consuming the CPU using Task Manager; the following table describes counters that will provide further evidence to assist troubleshooting.


COUNTER                                  WHAT TO LOOK FOR                                                             PROBLEM CONDITION
Processor: % Processor Time              Percent of total time the CPUs are busy servicing productive requests       > 80%
Processor: % Privileged Time             Percent of total CPU time spent servicing kernel-mode requests              > 30%
Process: % Processor Time (sqlservr)     Percent of total time SQL Server spent running on CPU (user + privileged)   > 80%
Process: % Privileged Time (sqlservr)    Percent of time SQL Server was executing in privileged mode                 > 30% of % Processor Time (sqlservr)



COMMON CAUSES OF CPU PROBLEMS
      The three common causes of high CPU condition are:

  • Missing or Outdated Statistics:
The Query Optimizer is dependent on relevant statistics to determine a good execution plan – missing or outdated statistics could cause it to select a sub-optimal plan, causing excessive CPU consumption.

  • Missing Indexes:
A lack of useful indexes can result in a high-CPU condition. SQL Server is dependent on meaningful indexes to retrieve data efficiently, and missing indexes can often cause excessive CPU utilization.

A lack of useful indexes can result in expensive operations such as hash joins and sorts that could otherwise be avoided with improved indexes.

  • Excessive Recompilations:
Poor plan reuse can cause a high-CPU condition whereby SQL Server consumes excessive CPU cycles while generating query plans. Recompilations can be caused by ad-hoc or dynamic queries or by a lack of memory (procedure cache), causing plans to be dropped from cache.
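One rough way to gauge plan reuse (an illustrative query, not part of the original notes) is to compare the compilation counters with the batch counter; the DMV values are cumulative since the instance started:

SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
  AND counter_name IN ('Batch Requests/sec', 'SQL Compilations/sec', 'SQL Re-Compilations/sec')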

INVESTIGATING MEMORY-RELATED PROBLEMS

     SQL Server performance is closely related to the availability and performance of sufficient memory.

    • SP_CONFIGURE:
      • Min/max server memory
      • AWE Enabled
      • Min memory per query

    • Windows:
      • /3GB,/USERVA,/PAE
      • Lock Pages in Memory privilege

        Windows Task Manager doesn’t provide a representative measure of the memory consumed by SQL Server because Task Manager only reflects the buffer pool, and SQL Server makes memory allocations outside this space for objects such as extended stored procedures, linked servers, and the CLR.

        Additionally, when SQL Server is using AWE on 32-bit servers, Task Manager will not reflect the correct memory allocation.



TYPES OF MEMORY PRESSURE
   
         SQL Server can suffer from internal or external memory pressure, and understanding how to identify and troubleshoot each will enable more targeted troubleshooting.

       External memory pressure occurs most often when SQL Server is running on a shared computer and several processes are competing for memory.

       The Resource Monitor within the SQL Server Operating System (SQLOS) will receive a signal from Windows requesting SQL Server to reduce its committed memory. This will cause SQL Server to recalculate its target commit level, reducing it if required.

    Internal memory pressure occurs when internal SQL Server resources compete with each other for memory.
   This is typically caused by SQL Server shrinking the buffer pool.
    Use the DBCC MEMORYSTATUS command to gain visibility of SQL Server memory consumption.
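Alongside DBCC MEMORYSTATUS, the memory clerk DMV gives a quick breakdown of which components are consuming memory; this sketch uses the SQL Server 2008 column names:

SELECT TOP 10
    type,
    SUM(single_pages_kb + multi_pages_kb) AS allocated_kb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY allocated_kb DESC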







Disk or Storage Problems
         SQL server read/write performance is closely related to the ability of Windows to retrieve and write data pages to disk efficiently.
COUNTER                               WHAT TO LOOK FOR                                        PROBLEM CONDITION
PhysicalDisk: Avg. Disk sec/Read      Average time in seconds to complete a read from disk   > 0.010 sub-optimal; > 0.020 poor
PhysicalDisk: Avg. Disk sec/Write     Average time in seconds to complete a write to disk    > 0.010 sub-optimal; > 0.020 poor


SQL Server Performance Problems
        Sometimes server HW resources do not cause bottlenecks but application performance is still bad.
        In this situation, internal SQL Server resources can become exhausted or depleted.

Key SQL Server PerfMon Counters

WAIT STATS ANALYSIS

      SQL Server wait stats record the resources and the time SQL Server spends waiting for each resource. A number of these wait types are exposed as PerfMon counters:


  • Lock waits
  • Log write waits
  • Network I/O waits
  • Non-page latch waits
  • Page I/O latch waits
  • Page latch waits

It may be easier to access these wait stats from the DMVs within SQL Server; however, collecting them as part of a system-wide data gathering exercise with PerfMon minimizes the logistics effort involved in collecting the data.
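For example, the cumulative wait stats can be pulled directly from the DMV (an illustrative query; values accumulate from instance startup unless cleared):

SELECT TOP 10
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC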


GETTING STARTED WITH PERFORMANCE ANALYSIS FOR LOGS (PAL)
         Performance Analysis for Logs (PAL) is a tool created by engineers from the product support organization within Microsoft to automate log file analysis and provide rapid identification of problems within log files. PAL works using templates to analyze performance data and will flag values outside the acceptable range as determined in the templates.

       PAL was designed to reduce the labor-intensive log analysis tasks and help engineers provide faster turnaround in log analysis.
      Since its original release PAL has been enhanced to include templates to analyze performance logs for many popular Microsoft server products, such as IIS, BizTalk, Exchange, Active Directory, and SQL Server.
      Templates are based on thresholds used by product support teams at Microsoft when troubleshooting customer problems.
      Using PAL simplifies the task of log analysis, providing a report that highlights key problem areas.
       PAL uses templates and widely-tested thresholds that correlate performance data from multiple counters, which means reports provide a robust and reliable solution for log file analysis.
      PAL is not a replacement for good methodology and usual troubleshooting steps!

There is no need to install PAL on a production server. Performance analysis is resource intensive; therefore, to avoid impacting the performance of a production server, capture the logs on the server suffering the problem and copy them to a desktop or test server for analysis with PAL.

USING SQL Server to Analyze PerfMon Logs
     Analyzing PerfMon logs from within a database has the benefit of data access through the familiar language of T-SQL, meaning problems should be easier to identify and you can write queries looking for specific problem conditions.
In the following example, three counters could be combined to identify a low-memory condition:

1.     Available memory less than 100 MB
2.     Page life expectancy less than 60 seconds
3.     Buffer cache hit ratio less than 98%


The following query could be used to identify any instance during the data capture window when the low memory condition existed:

SELECT * FROM subset WHERE Mem_Avail_Bytes < 104857600 AND Buff_Mgr_PLE < 60 AND Buff_Cache_Hit_Ratio < 98


COMBINING PerfMon Logs and SQL Profiler Traces
    A feature first available in SQL Server 2005 is the capability to combine PerfMon logs with SQL Profiler traces.
   Using Profiler to combine logs in this way enables you to view the T-SQL code that's running on the server alongside the hardware impact of running that code, such as high CPU or low memory.



Extracting Performance Data for a Specific Timeframe
      If a problem occurred at 10:30 a.m. on March 15, it would be useful to extract the time frame from 10:00 a.m. to 11:00 a.m. to provide a manageable log size without losing any data points, as shown in the following example:

Relog Server001_LOG.blg -b 15/03/2009 10:00:00 -e 15/03/2009 11:00:00 -o Server001_LogExtract.blg


The Relog parameter -c enables a subset of counters to be specified:
Relog Server001_LOG.blg -c "\Memory\*" -o Server001Memory_Log.blg


USING LogMan
 LogMan is a command-line tool used to create and manage PerfMon logs. The LogMan tool can be used to schedule the starting and stopping of logs.
This can be a useful alternative to using the Windows AT scheduler or the scheduler functions available within PerfMon.

The great benefit of using LogMan is that you can centrally start & stop performance monitoring.


Using LogParser
    LogParser is a simple-to-use yet powerful tool for log file analysis, popularized for analyzing logs from IIS web servers.
It can be great when searching Windows event logs, IIS logs, or PerfMon logs.



CHAPTER 10

TRACING SQL Server with SQL Trace and Profiler

·        An overview of why tracing is important
·        Deep look into the Architecture of SQL Trace
·        Comprehensive coverage of the Tracing views, functions, and stored procedures
·        An extended analysis of Security and SQL Trace
·        Profiler, Profiler and more profiler

Profiler is just a GUI over SQL Trace. Profiler is a very useful tool and is a great option for ad-hoc performance analysis.

By running a trace the DBA can answer the following questions:

·        How many calls are made to the database?
·        How long are the calls taking?
·        Are the calls synchronous or asynchronous in nature?
·        Has the developer used stored procedures, or are the calls all ad-hoc SQL statements?


WHAT TO TRACE
         The SQL Trace architecture is present to answer three types of questions.

1.     The first, as you have seen, relates principally to performance. Via SQL Trace, SQL Server can tell you what was being executed, when, and by whom.

2.     It can answer not only the "who accessed the system and changed a table" question but also the "who changed the permission for a given user" question.

3.     The third type of question is the “what went wrong” question. SQL Trace enables you to track errors and warnings and can guide you to answer some problems in your solutions.

You can get some of this information from the SQL Server Error Log or the Windows event log. However, the nice thing about capturing this data in SQL Trace is that it makes it easy to see those errors in context.

Knowing that a certain stored procedure had just started prior to an exception being raised may be critical to diagnosing a fault or issue.


How to Trace
      There are only two ways to trace information out of SQL Server and both of these methods use the SQL Trace architecture.
     
1.     Use Profiler &
2.     Build and execute a server-side trace


Profiler:
   Profiler is a GUI that you can use to interactively build and execute a tracing session. You can select any combination of events, and SQL Trace will return them to the GUI so you can see what is happening in real time.

   The server-side trace, however, is the main event, in my opinion, when it comes to tracing SQL Server, especially in production environments. You create a server-side trace by executing a number of system stored procedures against SQL Server.

 The server-side trace then writes the events that you have asked it to capture to a file on the file system. For auditing-style traces you should always use a server-side trace.
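A minimal sketch of a server-side trace using the standard sp_trace_* procedures; the file path is just an example, and a real trace would normally add more events and columns:

DECLARE @TraceID int, @on bit, @maxfilesize bigint;
SET @on = 1;
SET @maxfilesize = 50;   -- MB per file

-- Create the trace definition with file rollover (option 2); .trc is appended automatically
EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\Traces\MyServerSideTrace', @maxfilesize, NULL;

-- Capture SQL:BatchCompleted (event 12) with TextData (column 1), SPID (12), and Duration (13)
EXEC sp_trace_setevent @TraceID, 12, 1, @on;
EXEC sp_trace_setevent @TraceID, 12, 12, @on;
EXEC sp_trace_setevent @TraceID, 12, 13, @on;

-- Start the trace
EXEC sp_trace_setstatus @TraceID, 1;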


THE ARCHITECTURE OF SQL TRACE
      In SQL Server, this tracking information is implemented as a series of events. The various components of the database engine that generate these events are collectively known as event producers.
     SQL Server provides the trace controller (Figure 10-1). The trace controller component tells the rest of the database engine, via a bitmap, which events are being requested by a tracer.
     If someone, anyone, has an open trace running and wants a given event, then the database engine will generate that event when it occurs. An open trace equates to a trace session.
    Once an event has been generated by the event producer, the trace controller decides what to do with it. The event is held in a queue and synchronized while the trace controller determines what it needs to do with it. Because the trace controller knows what events each trace has requested, it passes the complete event down the line to each trace session that has asked for it.



The trace session can now act on the trace data it has received from the controller. Each event received contains a complete set of columns which can be used to further filter the data that is captured in the trace.

For example, you might be interested only in those events generated by the actions of a single user; consequently, you may have created a filter to keep only events from a specific SPID.

After the events have been filtered, the event can be sent to the I/O provider for the trace.

The I/O provider is the determining factor for the endpoint of the trace event. The event is either written to a file or sent to a rowset provider, where it is consumed by an application such as Profiler.



EVENT CLASSIFICATION AND HIERARCHIES

          Each event is classified by a mini-hierarchy that consists of two levels. The top level is the event category, which logically groups the events together. Examples might be Locks or Stored Procedures. Note that there is no root level that contains all event categories.

        There are 21 different categories in SQL Server 2008. The second level of the hierarchy reflects the events themselves and is known as the event class.

     An event class can also be sub-classified. An event subclass consists of a subclass name and a subclass value. The subclasses are grouped by event column and event class to create a unique list of permissible names and values for an event column.
   

[Figure 10-2]

There is one view for each of the six grey boxes seen in fig: 10-2 above. Notice how the binding box wires everything together.
The sys.trace_event_bindings view holds the many-to-many relationships between events and event columns to form the event class.

SELECT cat.name AS Category_Name
      ,ev.name AS Event_Name
      ,col.name AS Column_Name
      ,col.is_filterable AS Column_Is_Filterable
      ,col.max_size AS Column_Max_Size
      ,col.[type_name] AS Column_Type_Name
      ,sub.subclass_name AS Subclass_Name
      ,sub.subclass_value AS Subclass_Value
FROM sys.trace_categories cat
JOIN sys.trace_events ev         ON cat.category_id = ev.category_id
JOIN sys.trace_event_bindings bi ON bi.trace_event_id = ev.trace_event_id
JOIN sys.trace_columns col       ON bi.trace_column_id = col.trace_column_id
LEFT JOIN sys.trace_subclass_values sub
       ON sub.trace_column_id = bi.trace_column_id
      AND sub.trace_event_id = bi.trace_event_id
ORDER BY cat.name, ev.name, col.name, sub.subclass_value

SQL TRACE CATALOG VIEWS
       The preceding section omitted the most important view: sys.traces. This is the view that actually shows you the list of trace definitions on the server.
       All the other tracing views are present to provide the metadata, but sys.traces shows the actual object definitions of user-defined and system-defined traces.

SYS.TRACES
   This is the catalog view to see all traces that have been configured against the SQL Server. A trace will appear in this view when a SQL trace definition has been created using sp_trace_create. A server-side trace is first created with a status of “stopped”.

The stopped state is mentioned here because the status of the trace determines its visibility in sys.traces. A trace is visible when in two states: stopped or started. However, using sp_trace_setstatus you can also close the trace.

If you want to audit the creation and deletion of traces on a SQL Server, you can always create a trace to do this yourself. You can’t use triggers on system catalogs, so creating a trace is your best option here. You can see the mechanics of creating a trace definition with Profiler in the “Exporting a Trace Definition” section.




Is_rowset
   This one is the least interesting. It is used to identify traces that are being consumed by a rowset consumer such as Profiler or the SMO trace provider.
   Using the READER_SPID column, it is easy to see which SPID is responsible for the trace and then track that user courtesy of his or her login information.



SELECT trc.id  as Trace_id
,trc.[status] as Trace_status
,trc.start_time as trace_starttime
,trc.last_event_time as trace_lasteventtime
,trc.event_count as trace_eventcount
,trc.dropped_event_count as trace_droppedeventcount
,ses.login_time as session_logintime
,ses.host_name as Session_Hostname
,ses.program_name as session_programname
,ses.login_name as session_loginname
,ses.nt_domain as sessions_ntDomain
,ses.nt_user_name as sessions_NTusername
,ses.cpu_time as sessions_CPUTime
,ses.Memory_usage as session_memoryusage
,ses.reads as session_reads
,ses.writes as session_writes
,ses.logical_reads as session_logicalReads
,ses.last_request_start_time as session_lastrequeststarttime
,ses.last_request_end_time as session_lastrequestendtime
FROM SYS.TRACES TRC
JOIN SYS.DM_EXEC_SESSIONS SES
ON TRC.READER_SPID=SES.SESSION_ID
WHERE IS_ROWSET=1



Is_rollover
            Tracing can generate a large amount of data. When writing to a file, this can quickly translate into something large and unwieldy. When files grow beyond a certain size they become more difficult to move and administer, making housekeeping more difficult – especially if you need to move these files over unreliable network links.

By keeping files under 2GB you can also use the Windows compression feature to conserve space.

SQL Trace provides a feature in the file provider known as rollover. When enabled, SQL Trace will generate a new linked file when the current file reaches the size set in the MAX_SIZE parameter.

Both the IS_ROLLOVER  and the Max_Size values need to be set for this to work.
If rollover isn’t enabled and the file has reached MAX_SIZE, then SQL Server will simply stop tracing – the trace will be stopped and closed, resulting in the definition being deleted from the system.

Ex: when rolled over, C:\Tmp\MyTrace.trc would become C:\Tmp\MyTrace_2.trc.

SQL Server attempts to maintain the number of files at the MAX_FILES number, but if for some reason it is prevented from deleting the file, then it will just ignore it.
This file would need to be cleaned up manually. To continue our example, if MAX_FILES had been set to 2 and C:\Tmp\MyTrace_2.trc was filled, then SQL Server would delete C:\Tmp\MyTrace.trc before creating C:\Tmp\MyTrace_3.trc.
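
A minimal sketch of creating a server-side trace with rollover enabled, following the example above; the path, size, and file count here are illustrative assumptions only:

DECLARE @TraceID INT;
DECLARE @MaxFileSize BIGINT;
SET @MaxFileSize = 50;                     -- MB per file before rolling over

-- @options = 2 enables TRACE_FILE_ROLLOVER; the .trc extension is appended automatically.
EXEC sp_trace_create
     @traceid     = @TraceID OUTPUT,
     @options     = 2,
     @tracefile   = N'C:\Tmp\MyTrace',
     @maxfilesize = @MaxFileSize,
     @stoptime    = NULL,
     @filecount   = 2;                     -- keep at most two rollover files, as in the example

SELECT @TraceID AS TraceID;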


Is_shutdown
       If this flag is set in the trace definition, then SQL Server will shut itself down if for any reason it isn’t able to write out the trace information successfully.
      Interestingly, the sys.traces catalog view also provides the number of dropped events by a trace. From this value you can infer just how dangerous it would be to use this feature with a given trace configuration.


Is_default
   The default trace is a SQL Trace that is always running and is designed to be a lightweight trace that ticks along in the background. The following query copies its contents into a table for analysis:

SELECT data.* into TraceTable
FROM sys.traces t CROSS APPLY
Fn_trace_gettable (t.path,default) data
WHERE is_default=1

To convert this into a human-readable form, use some of the trace functions and translate the integers they provide into textual descriptions:


DECLARE @vTraceID INT;
SET @vTraceID = 1;

WITH filter AS
(
    SELECT columnid
          ,CASE logical_operator
               WHEN 0 THEN 'AND'
               WHEN 1 THEN 'OR'
               ELSE 'err'
           END AS logical_operator
          ,CASE comparison_operator
               WHEN 0 THEN ' = '
               WHEN 1 THEN ' <> '
               WHEN 2 THEN ' > '
               WHEN 3 THEN ' < '
               WHEN 4 THEN ' >= '
               WHEN 5 THEN ' <= '
               WHEN 6 THEN ' LIKE '
               WHEN 7 THEN ' NOT LIKE '
           END AS comparison_operator
    FROM ::fn_trace_getfilterinfo(@vTraceID)
)
SELECT cat.name AS CategoryName
      ,evt.name AS EventName
      ,col.name AS ColumnName
      ,STUFF((SELECT '|' + child_fil.logical_operator
                         + child_fil.comparison_operator
                         + CAST(child_fil.columnid AS VARCHAR(MAX))
              FROM filter child_fil
              WHERE parent_fil.columnid = child_fil.columnid
              FOR XML PATH('')
             ), 1, 1, ''
       ) AS ColFilters
FROM fn_trace_geteventinfo(@vTraceID) evi
JOIN sys.trace_events evt
  ON evi.eventid = evt.trace_event_id
JOIN sys.trace_categories cat
  ON evt.category_id = cat.category_id
JOIN sys.trace_columns col
  ON evi.columnid = col.trace_column_id
LEFT JOIN filter parent_fil
  ON col.trace_column_id = parent_fil.columnid
GROUP BY cat.name
        ,evt.name
        ,col.name
        ,parent_fil.columnid
ORDER BY cat.name
        ,evt.name
        ,col.name

TRACE CATEGORY           TRACE EVENT
Errors and Warnings      Attention
Errors and Warnings      Exception
Stored Procedures        RPC:Starting
T-SQL                    SQL:BatchStarting

The Attention and Exception events are generic, whereas the default trace captures much more specific events. Furthermore, by capturing the RPC:Starting and SQL:BatchStarting events, the black box trace captures potentially heavily called events.

The black box can only use two files. Each file can be a maximum size of 5MB. Therefore, the maximum amount of information is constrained to 10MB. Given that the black box is capturing RPC:Starting and SQL:BatchStarting events, don’t expect the data in these files to hang around for long!
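
A hedged sketch of turning the black box trace on (option 8 = TRACE_PRODUCE_BLACKBOX); no file name is supplied because SQL Server writes blackbox.trc to the instance’s data directory:

DECLARE @TraceID INT;

-- Create the black box trace definition; the file parameters are not used with option 8.
EXEC sp_trace_create @traceid = @TraceID OUTPUT, @options = 8;

-- Start it.
EXEC sp_trace_setstatus @traceid = @TraceID, @status = 1;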



Sys.trace_categories
            The following code demonstrates how to query this catalog view and also provides the textual descriptions of the category types.

SELECT category_id AS Category_ID
,name AS Category_name
,[TYPE] AS Category_TypeID
,CASE [type]
                WHEN 0 THEN 'Normal'
                WHEN 1 THEN 'Connection'
                WHEN 2 THEN 'Error'
                ELSE 'Unknown'
END AS [Category_TypeName]
FROM sys.trace_categories


Sys.trace_events
            The sys.trace_events catalog view is also primarily used for converting the event ID back to a text-based description.

SELECT trace_event_id AS Event_ID
,evt.name AS Event_name
,evt.category_id AS Event_categoryID
,cat.name            AS Category_name
From sys.trace_events evt
JOIN sys.trace_categories cat
ON evt.category_id=cat.category_id
ORDER BY cat.name,evt.name

Sys.trace_columns
            This view contains the list of all possible columns. A couple of columns on the catalog view do reveal some interesting information. I’d like to focus your attention on two columns in particular: the is_repeatable and is_repeated_base bit flags:

SELECT trace_column_id,name,TYPE_NAME,max_size,
is_filterable,is_repeatable,is_repeated_base
FROM sys.trace_columns
WHERE is_repeatable=1
OR is_repeated_base=1


Sys.trace_subclass_values
            When an event is raised, data is passed down from SQL Server’s Trace Controller to the trace session. This data contains the values for the columns bound to that event.

SELECT cat.name AS Category_name
,ev.name AS event_name
,col.name as column_name
,sub.subclass_name as subclass_name
,sub.subclass_value as subclass_value
,datalength(sub.subclass_name) as subclass_name_length
FROM sys.trace_categories cat
JOIN sys.trace_events ev
ON cat.category_id=ev.category_id
JOIN sys.trace_event_bindings bi
ON bi.trace_event_id=ev.trace_event_id
JOIN sys.trace_columns col
ON bi.trace_column_id=col.trace_column_id
LEFT JOIN sys.trace_subclass_values sub
ON sub.trace_column_id=bi.trace_column_id
AND sub.trace_event_id=bi.trace_event_id
WHERE sub.subclass_name IS NOT NULL
AND cat.name = 'Locks'
AND ev.name ='Lock:Acquired'
AND col.name ='Mode'
ORDER BY cat.name
,ev.name
,col.name
,sub.subclass_value;

                However, subclass_value 0 returns a name of “NULL.” With that information you probably realize what the issue is. The value shown is actually the word NULL, not a NULL value per se.
            The hint is the white background, and the additional field of subclass_name_length confirms this. The value of this field is 8, representing the byte count for the subclass name. Clearly, then, subclass_name is a Unicode data type – hence the double-byte count for every character.

            The following code snippet returns all events that have a subclass_name of ‘NULL’, and fig 10-8 shows all nine examples of this in SQL Server 2008.

SELECT evt.name
,col.name
,sub.subclass_name
,sub.subclass_value
FROM sys.trace_subclass_values sub
JOIN sys.trace_columns col
ON sub.trace_column_id=col.trace_column_id
JOIN sys.trace_events evt
ON sub.trace_event_id=evt.trace_event_id
WHERE subclass_name='NULL'


name                          name    subclass_name   subclass_value
Lock:Deadlock Chain           Mode    NULL            0
Lock:Released                 Mode    NULL            0
Lock:Acquired                 Mode    NULL            0
Lock:Deadlock                 Mode    NULL            0
Lock:Cancel                   Mode    NULL            0
Lock:Timeout                  Mode    NULL            0
Lock:Escalation               Mode    NULL            0
Lock:Timeout (timeout > 0)    Mode    NULL            0
Blocked process report        Mode    NULL            0


It is also worth noting that only eight of the 66 columns use this feature, as can be seen in the following code snippet. However, these eight columns are used across 111 of the available events, as shown in Figure 10-9.

SELECT COUNT(DISTINCT Sub.trace_column_id) AS distinct_trace_columns
,COUNT(DISTINCT sub.trace_event_id) AS distinct_trace_events
FROM sys.trace_subclass_values sub


distinct_trace_columns    distinct_trace_events
8                         111

For reference, the columns that map to specific subclass values are as follows (the results of which can be seen in Figure 10-10):

SELECT col.trace_column_id AS trace_column_id
,col.name AS column_name
,COUNT(DISTINCT sub.subclass_value) AS subclass_value_count
FROM sys.trace_subclass_values sub
JOIN sys.trace_columns col
ON sub.trace_column_id=col.trace_column_id
GROUP BY col.trace_column_id
,col.name
ORDER BY col.name


trace_column_id    column_name      subclass_value_count
31                 Error            4
21                 EventSubClass    23
55                 IntegerData2     1
32                 Mode             22
28                 ObjectType       80
58                 OwnerID          7
30                 State            1
57                 Type             12


sys.trace_event_bindings
            The sys.trace_event_bindings catalog view simply provides the association between all events and all columns to create the permissible list. The best way to visualize this is probably through the Trace Properties dialog box in Profiler.
           
            The duration column is available for SQL:BatchCompleted but not available for SQL:BatchStarting. Since SQL Server hasn’t yet achieved the feat of time travel, it makes sense that it can only tell you the duration of a batch after it has finished.


SQL TRACE PROCEDURES AND FUNCTIONS

  • Sp_trace_create
            To start the process off, a trace definition needs to be created. This is performed by using sp_trace_create. This stored procedure is responsible for setting the ground rules for the trace.

Note that SQL Trace is able to output only to a file (i.e., the flat file I/O provider) when invoked via this stored procedure. Moreover, this flat file should be located on a local drive so as not to incur network I/O.
Don’t forget that the SQL Server service account will need sufficient permission in Windows to write the trace file in the target directory.


Sp_trace_create will not overwrite a file that already exists. If you try to do so, you will get an error.
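
As an illustration (the path and size below are assumptions), the new trace ID is returned through an output parameter and is needed by all the other sp_trace_* procedures:

DECLARE @TraceID INT;
DECLARE @MaxFileSize BIGINT;
SET @MaxFileSize = 20;                     -- MB

-- @options = 0: a plain file trace with no rollover; the .trc extension is appended for you.
EXEC sp_trace_create
     @traceid     = @TraceID OUTPUT,
     @options     = 0,
     @tracefile   = N'C:\Tmp\AuditTrace',
     @maxfilesize = @MaxFileSize;

SELECT @TraceID AS TraceID;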


Sp_trace_setevent
            This stored procedure adds events, and the columns to capture for them, to the trace definition. However, somewhat painfully, it is necessary to call sp_trace_setevent once for every column that you wish to capture for an event.
            A more accurate name for this procedure would have been sp_trace_seteventcolumn.

The SPID column is always included for each event added to a trace definition. You cannot exclude it.

If you simply trace SP:StmtCompleted while executing the preceding SELECT statement, you do not get one row as you might expect.
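
A hedged sketch (the trace ID and column choices are assumptions) showing one call per event/column pair; note that @on must be passed as a bit variable:

DECLARE @TraceID INT, @On BIT;
SET @TraceID = 2;        -- placeholder: the ID returned by sp_trace_create
SET @On = 1;

-- Event 12 = SQL:BatchCompleted; each column is switched on with a separate call.
EXEC sp_trace_setevent @TraceID, 12, 1,  @On;   -- TextData
EXEC sp_trace_setevent @TraceID, 12, 11, @On;   -- LoginName
EXEC sp_trace_setevent @TraceID, 12, 13, @On;   -- Duration
EXEC sp_trace_setevent @TraceID, 12, 14, @On;   -- StartTime
-- SPID (column 12) is captured automatically and cannot be excluded.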


Sp_trace_setfilter
            As the name suggests, SP_TRACE_SETFILTER is used when building a server-side trace to filter the results that are captured to a file.
            The sys.trace_columns catalog view has a column called is_filterable that you can use to check whether you can place a filter against a column:


SELECT is_filterable, COUNT(*) AS filter_count FROM sys.trace_columns GROUP BY is_filterable


The three columns that aren’t filterable are as follows:
1.      EventClass
2.      EventSequence
3.      ServerName


Filtering a Range
            In order to create a range filter with SP_TRACE_SETFILTER it is necessary to execute it twice. The first time sets the lower boundary; the second time creates the upper boundary.
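
A minimal sketch of a range filter on the Duration column (column 13); the trace ID and boundary values are assumptions, and the server reports Duration in microseconds in SQL Server 2005 and later:

DECLARE @TraceID INT, @Lower BIGINT, @Upper BIGINT;
SET @TraceID = 2;          -- placeholder trace ID
SET @Lower = 1000000;      -- >= 1 second
SET @Upper = 10000000;     -- <= 10 seconds

-- First call sets the lower boundary (comparison_operator 4 = >=)...
EXEC sp_trace_setfilter @TraceID, 13, 0, 4, @Lower;
-- ...second call sets the upper boundary (comparison_operator 5 = <=), ANDed with the first.
EXEC sp_trace_setfilter @TraceID, 13, 0, 5, @Upper;
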
Sp_trace_setstatus
            It is the stored procedure used to start, stop, and destroy trace definitions.
The parameters for sp_trace_setstatus are the trace ID and a status value: 0 stops the trace, 1 starts it, and 2 closes it and deletes its definition from the server. The short sketch below illustrates the sequence, again assuming trace ID 2.
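
EXEC sp_trace_setstatus 2, 1;   -- start the trace
EXEC sp_trace_setstatus 2, 0;   -- stop it (the definition remains in sys.traces)
EXEC sp_trace_setstatus 2, 2;   -- close it and delete the definition (it must be stopped first)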

If you are planning to open trace files with SQL Profiler, try to ensure that the file sizes are manageable by rolling the file fairly frequently. Profiler also tries to be helpful by offering to load in subsequent rolled over files for you, which is nice.


Sp_trace_generateevent
            It enables you to instrument your SQL code with user-defined events. It is quite limited, as you are only given 10 distinct user-defined events to play with. Nor is it frequently used by application developers, who normally instrument code outside of the stored procedure layer.
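
A minimal sketch (the marker text is made up); event IDs 82 through 91 map to UserConfigurable:0 through UserConfigurable:9, and a trace must be capturing the corresponding event for the marker to appear:

DECLARE @Data VARBINARY(8000);
SET @Data = CAST('order batch 42' AS VARBINARY(8000));

-- Raise UserConfigurable:0 (event 82) with a descriptive label and an optional binary payload.
EXEC sp_trace_generateevent 82, N'Nightly billing run: start', @Data;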


Fn_trace_geteventinfo
            It returns all the events and columns specified for a currently defined trace. It’s essentially an enabling function that is used by tool providers and DBAs who want to know what events are being captured in trace sessions.

SELECT   evt.name AS EventName
        ,col.name AS ColumnName
FROM fn_trace_geteventinfo(1) inf
JOIN sys.trace_events evt   ON inf.eventid  = evt.trace_event_id
JOIN sys.trace_columns col  ON inf.columnid = col.trace_column_id
ORDER BY eventid
        ,columnid



Fn_trace_gettable
            It is the most interesting and useful in everyday use. This function enables you to read a trace file even while it is open and being written to. It can therefore be used to process trace data collected in trace files and load it into tables in SQL Server for more detailed analysis.

SELECT data.* FROM   sys.traces t
CROSS APPLY
       fn_trace_gettable(t.path,default) data
WHERE  is_default = 1
ORDER BY EventSequence



SECURING SQL TRACE

            SQL Server 2005 changed the security dynamic for tracing. A new permission was created called ALTER TRACE.
GRANT ALTER TRACE TO Trace

TRACING LOGIN CREATION / DELETION
            CREATE LOGIN Trace WITH PASSWORD='XY', CHECK_POLICY=OFF;
           
            CREATE LOGIN [DOMAIN\USERS] FROM WINDOWS;

            DROP LOGIN [DOMAIN\USERS]
            DROP LOGIN Trace;

                       
Logins Changing Passwords
            There are two ways a SQL Login password can be changed:
  • SP_PASSWORD
  • ALTER LOGIN


USE MASTER
CREATE LOGIN TRACE WITH PASSWORD='XYZ', CHECK_POLICY=OFF;
GRANT ALTER TRACE TO TRACE;
EXECUTE AS Login='Trace';

                ALTER LOGIN TRACE
                WITH PASSWORD = 'NEW XYZ'
                OLD_PASSWORD = 'XYZ';

REVERT
DROP LOGIN TRACE





Securing the Output of SQL Trace

There are two I/O providers for SQL Trace events:

  • Flat-file provider
  • Rowset provider


Rowset Provider is used by Profiler
Flat-file Provider is used by Server-side tracing.



Rowset Provider Protection

When the rowset provider is the consumer (as with Profiler), the captured data can be persisted in two ways:
·        Save to a table
·        Save to a file (which may or may not be XML)

PROFILER
            It is a graphical tool that is often used to run traces against production systems.
            It consumes CPU and makes heavy use of the TEMP folder on your production system; it is also an implementation of a rowset provider.


ADVANCED FEATURES OF PROFILER
            Profiler allows you to group by one or many events. However, you can also right-click on the events and select a different view. This view is called the grouped view.

            However, an alternative method would be to group by ClientProcessID, SPID, HostName, NTUserName, LoginName, or SessionLoginName to help isolate a single user in the context of all other event data.


            For SQL statements these options only work for SQL:BatchStarting, SQL:BatchCompleted, RPC:Starting, and RPC: Completed events.

First, create a trace in Profiler that includes the following events and start it running:
            SQL:BatchStarting
            Showplan XML for Query Compile
            Showplan XML Statistics Profile

To make it easier to extract events, you may want to consider grouping the events by EventClass, as shown in the previous sub-section, “Grouping Events.”


           
The following additional events will enable you to drill down to the statement level and compare and review execution plans, albeit at the expense of increased overhead in the data collection:
  1. Performance: Show Plan Statistics
  2. Stored Procedures: SP:StmtCompleted
  3. Stored Procedures: SP:StmtStarting
  4. T-SQL: SQL:StmtCompleted
  5. T-SQL: SQL:StmtStarting




CHAP: 14
Using Management Studio Reports and the Performance Dashboard

Cover:
·        How to get basic troubleshooting information by pointing and clicking in SQL Server Management Studio
·        Where to get more reports from Microsoft for even deeper information
·        The basics of building your own reports



CHAP 14: USING MANAGEMENT STUDIO REPORTS AND THE PERFORMANCE DASHBOARD


In SQL Server 2008, Microsoft shifted focus away from SSMS reports in favour of the new Management Data Warehouse (MDW).
The MDW addresses some weaknesses of the standard reports and the Performance Dashboard.

USING THE STANDARD REPORTS











READING THE SERVER DASHBOARD REPORT
     
      It’s simply a summary of the most basic configuration options on the server and some very rudimentary connection statistics.


CONFIGURATION CHANGES AND SCHEMA CHANGES REPORTS
      If the Server Dashboard report shows a configuration change, the Configuration Changes History report can help the database administrator find out when it was made, and by whom.


      Likewise, the SCHEMA CHANGES HISTORY report shows all DDL changes made in all databases on the instance. Because these reports rely on the default trace, a change will no longer be visible if:
1.      the SQL Server service has been restarted five times since the change was made; or
2.      the default trace has accumulated more than 100MB of data since the change.


JOB monitoring: If the data capture is performed via a SQL Agent job, then a malicious user could simply turn off the job or disable the schedule.


Memory Consumption Report
It provides a current snapshot of how SQL Server is utilizing memory. This report is only a one-time peek into SQL Server’s internals, and so DBAs should not make configuration decisions based on this report.
When DBAs find themselves needing the kind of information in this report, it’s time to consider using the MDW instead. (Management Data Warehouse).



ACTIVITY REPORTS
            The Activity reports give the DBA better insight into CPU and memory issues on the server. However, viewing all current queries through Activity Monitor or by running sp_who can seem like drinking water from a fire hose.

If a user calls the IT department to complain that a server is busy, the DBA can check the following reports:

  1. ACTIVITY – all Blocking Transactions: This report is similar to running sp_who2 and looking at the BlkBy column to see whether any transaction is blocking another transaction.
  2. ACTIVITY – all Cursors: If a slow cursor-based query or, heaven forbid, a nested cursor, is chewing up a great deal of resources on a server, it’s visible in this report. This is also useful for cursors that have not been closed, if a user leaves SSMS open without finishing their running code.
  3. ACTIVITY – Top Sessions: This report, shown in fig: 14-7, breaks out the oldest, most CPU-intensive, memory-intensive, read-intensive, and write-intensive sessions.

These reports get their data from DMVs such as sys.dm_exec_sessions, sys.dm_exec_connections, and sys.dm_exec_requests.
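
A hedged sketch of the kind of query that sits behind these reports – who is connected, what they are running, and who is blocking whom:

SELECT ses.session_id,
       ses.login_name,
       ses.host_name,
       ses.program_name,
       req.status,
       req.command,
       req.wait_type,
       req.blocking_session_id,
       txt.text AS current_sql
FROM sys.dm_exec_sessions ses
LEFT JOIN sys.dm_exec_requests req
       ON ses.session_id = req.session_id
OUTER APPLY sys.dm_exec_sql_text(req.sql_handle) txt
WHERE ses.is_user_process = 1
ORDER BY req.blocking_session_id DESC, ses.session_id;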




PERFORMANCE EXECUTION REPORTS:
            The performance – Object Execution Statistics report is a great example of how the Standard Reports come very close to providing value, yet fall just a bit short.
            It makes a good example of how to build better replacement reports.

The DMV sys.dm_exec_query_stats joins to sys.dm_exec_sql_text to get the exact SQL executed:


SELECT sql_handle, sql_handle AS chart_display_option, sql_handle AS chart_display_optionIO,
       master.dbo.fn_varbintohexstr(sql_handle),
       DENSE_RANK() OVER (ORDER BY s2.dbid, s2.objectid) AS SPrank, s2.dbid, s2.objectid,
       (SELECT TOP 1 SUBSTRING(text, (s1.statement_start_offset + 2) / 2,
               (CASE WHEN s1.statement_end_offset = -1
                     THEN LEN(CONVERT(NVARCHAR(MAX), text)) * 2
                     ELSE s1.statement_end_offset END - s1.statement_start_offset) / 2)
        FROM sys.dm_exec_sql_text(s1.sql_handle)) AS [SQL Statement],
       execution_count, plan_generation_num, last_execution_time,
       ((total_worker_time + 0.0) / execution_count) / 1000 AS [avg_worker_time],
       total_worker_time / 1000.0, last_worker_time / 1000.0, min_worker_time / 1000.0, max_worker_time / 1000.0,
       ((total_logical_reads + 0.0) / execution_count) AS [avg_logical_reads],
       total_logical_reads, last_logical_reads, min_logical_reads, max_logical_reads,
       ((total_logical_writes + 0.0) / execution_count) AS [avg_logical_writes],
       total_logical_writes, last_logical_writes, min_logical_writes, max_logical_writes,
       ((total_logical_writes + 0.0) / execution_count + (total_logical_reads + 0.0) / execution_count) AS [avg_logical_IO],
       total_logical_writes + total_logical_reads, last_logical_writes + last_logical_reads,
       min_logical_writes + min_logical_reads, max_logical_writes + max_logical_reads
FROM sys.dm_exec_query_stats s1
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2
WHERE s2.objectid IS NOT NULL AND DB_NAME(s2.dbid) IS NOT NULL
ORDER BY s1.sql_handle;



Backup and Restore Reports:

          DBAs often troubleshoot backup job times by looking at the runtime durations of the SQL Server Agent jobs. For more granular detail, right-click on an individual database and choose the Backup and Restore Events report.

SELECT t1.database_name,
       t1.type AS [type],
       AVG(DATEDIFF(ss, backup_start_date, backup_finish_date)) / 60.0 AS AverageBackupDuration,
       COUNT(*) AS BackupCount
FROM msdb.dbo.backupset t1
INNER JOIN msdb.sys.databases t3 ON t1.database_name = t3.name
GROUP BY t1.database_name, t1.type
ORDER BY t1.database_name, t1.type


database_name               type    AverageBackupDuration    BackupCount
DB_Restore                  D       1.3                      1
DB_TABLELIST                D       0.05                     7
DB_TABLELIST                L       0                        10
EBILL                       D       10.55                    3
FTR_SR_MASTER               D       23.633333                4
MASTER                      D       0.016666                 2
MIS_APR2014                 D       24.7                     1
MODEL                       D       0                        1
MSBill_2014                 D       6.7                      1
MSDB                        D       0.116666                 1
REPEAT_APR2014              D       25.283333                1
SelfcarePrePaid_APR2014     D       0                        1
SRFTR_APR2014               D       3.05                     1

In the result set, the Type column indicates the type of backup:

TYPE    DETAILS
D       Full
I       Differential
L       Transaction log

Database-level reports can be run for all databases by calling that T-SQL code with the system stored procedure sp_MSforeachdb, as sketched below.
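
sp_MSforeachdb is an undocumented system procedure that substitutes each database name for the ? placeholder; a minimal sketch of the pattern, filtering the backup history to the current database on each pass:

EXEC sp_MSforeachdb N'
    SELECT ''?'' AS database_name,
           t1.type,
           AVG(DATEDIFF(ss, t1.backup_start_date, t1.backup_finish_date)) / 60.0 AS AverageBackupDuration,
           COUNT(*) AS BackupCount
    FROM msdb.dbo.backupset t1
    WHERE t1.database_name = ''?''
    GROUP BY t1.type;';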

On the SSMS report, clicking the plus (+) sign in the Device Type column yields more information about each individual backup file. This comes in handy when troubleshooting a broken LSN chain or problems with log shipping.

            Sometimes end users or developers do their own full backups out of sequence with the DBA’s backup and recovery scheme, and the DBA will need to track down the location of the out-of-control backup file.



Index Usage Statistics Report

            Adding indexes can be the fastest way to solve a performance problem. Repeated applications of that solution can be a pretty quick way to cause a performance problem.

To balance faster reads against slower inserts, updates, and deletes, the database administrator needs to know how each index is used and how much it is being updated.

If an index is heavily used for weeks and then suddenly is no longer needed due to a query change or the presence of a better index, the number of User Seeks and User Scans will still seem high.

To mitigate this issue, the DBA needs to check the Last User Seek, Last User Scan, and Last User Lookup Time columns.

If they are old, then the index has not been used recently even though the Seeks and Scans number will be high.

If a database is only used once per month or once per quarter, the Last User Seek, Scan, and Lookup times won’t be updated.
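
A hedged sketch of the DMV behind this report, showing the usage counters together with the “last used” timestamps for indexes in the current database (note that the counters reset when the service restarts):

SELECT OBJECT_NAME(us.object_id) AS table_name,
       i.name                    AS index_name,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates,
       us.last_user_seek,
       us.last_user_scan,
       us.last_user_lookup
FROM sys.dm_db_index_usage_stats us
JOIN sys.indexes i
  ON us.object_id = i.object_id
 AND us.index_id  = i.index_id
WHERE us.database_id = DB_ID()
ORDER BY us.user_updates DESC;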

         
         
Index Physical Statistics Reports

As data is inserted, updated, and deleted in a table, the table and its indexes become fragmented.
Index fragmentation slows down sequential operations such as range scans and read-ahead because the storage engine needs to hunt around for all the relevant pages.

This report shows all clustered and nonclustered indexes in the database and their depth – the number of levels in the b-tree of the index.

Microsoft’s whitepaper on index fragmentation in SQL Server 2000 indicates that DBAs should not be concerned with objects under 1,000 pages in size:
http://technet.microsoft.com/en-us/library/cc966523.aspx

The easiest way to stay on top of fragmented objects is to use maintenance plans to periodically rebuild or reorganize all indexes in a database. Unfortunately, this causes a lot of logged and locked operations, which affects performance while the maintenance plan runs.
For databases over 100GB, consider writing scripts to reorganize and/or rebuild only the fragmented objects. Rebuilding only the most fragmented objects will reduce the performance impact on database mirroring and log shipping, which both slow down dramatically during index rebuilds.
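
A hedged sketch of such a script’s starting point; the 1,000-page cutoff mirrors the whitepaper guidance above, while the 30 percent fragmentation threshold is a commonly used rule of thumb rather than a value from this document:

SELECT OBJECT_NAME(ps.object_id)       AS table_name,
       i.name                          AS index_name,
       ps.index_type_desc,
       ps.index_depth,
       ps.page_count,
       ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
JOIN sys.indexes i
  ON ps.object_id = i.object_id
 AND ps.index_id  = i.index_id
WHERE ps.page_count > 1000
  AND ps.avg_fragmentation_in_percent > 30
ORDER BY ps.avg_fragmentation_in_percent DESC;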


USING THE PERFORMANCE DASHBOARD

 Microsoft’s Product Support Services (PSS) team for SQL Server created a set of custom reports for SQL Server 2005 SP2 called the Performance Dashboard.

            These reports gave DBAs more powerful troubleshooting information about the health of their server.
            Sadly, the reports were not updated for SQL Server 2008, but with a few easy tweaks they will work fine. Keep in mind, however, that they won’t account for new SQL Server 2008 features like FILESTREAM data or spatial data.


SUMMARY
            In SQL Server 2008 and 2005, Microsoft’s dynamic management views gave DBAs insight into what’s happening inside the Storage Engine, and the SSMS reports helped surface that information. DBAs can take advantage of this by:

  • Using the bundled Standard Reports to get a quick view of basic health metrics
  • Installing the Performance Dashboard Reports to deliver more insightful information on problems
  • Building their own reports modeled on the Performance Dashboard Reports for truly powerful analysis.


15: USING SQL SERVER MANAGEMENT DATA WAREHOUSE

·        Purpose of the Management Data Warehouse (MDW)
·        Configuring the MDW
·        Creating custom collectors
·        Reporting on the MDW



MDW Architecture
            The MDW consists of the following three key components:
1.      Data collection sets
2.      Data warehouse
3.      Reports

The data collection sets are defined as SQL Server Agent jobs used to collect and upload data from target servers to the data warehouse. The data warehouse is a centralized repository for performance data storage. Reports are executed against the data warehouse to present the data stored there.

DATA COLLECTION SETS
            It is necessary to retrieve performance data from the targets and store it in the data warehouse. Performance data collection is carried out by a user-mode process called DCEXEC.EXE that runs on each target server.

            This application is called by a SQL Server Agent job and is responsible for collecting performance data based on the collection items defined in the collection set.
           
            Data collection sets can be configured in either cached mode or non-cached mode.




The Server Activity collection set consists of one Generic T-SQL collector type used to snapshot the following DMVs:
  • dm_os_waiting_tasks
  • dm_os_latch_stats
  • dm_os_process_memory
  • dm_os_memory_nodes
  • dm_os_schedules
  • dm_io_virtual_file_stats
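
A sketch of the kind of snapshot statement such a Generic T-SQL collector item might run (the column list is trimmed and illustrative); each collection adds a timestamp so successive snapshots can be compared in the warehouse:

SELECT GETDATE()        AS collection_time,
       wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.blocking_session_id,
       wt.resource_description
FROM sys.dm_os_waiting_tasks wt
WHERE wt.session_id IS NOT NULL;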