Andras

Software Architect - Red Gate Software

  • Why should I rebuild a table after dropping or adding a column?

    Posted Thursday, February 19, 2009 8:30 AM | 4 Comments

    The amount of space used by table rows can be important, and there are good articles on the web that show ways to estimate the amount of space that a table row uses. However, the actual space used by a table row could be more than what we would have calculated. The problems come when we modify the schema of a table. Sometimes it is necessary to add a column to a table or to drop a column from a table. This operation is apparently straightforward: when you drop a column, it should free up the space, and a new column should use more space. However, when a column is dropped, SQL Server will not go and modify all the database pages for your table, free the space, move rows around, etc. By dropping a column, you may end up just updating the metadata, and your rows will still consume the same amount of space as before. This could have unpleasant consequence, such as in the following example:

    CREATE TABLE dbo.wastedSpace1
        
    (
          
    id INT,
          
    columnWeWillDrop NCHAR(3500),
          
    lastColumn INT DEFAULT 255
        
    )

    We create a table that has three fixed length columns. One of these columns (columnWeWillDrop) will consume 7000 bytes (NCHAR uses UCS-2 encoding, which uses two bytes per character)
    We insert a single row into this table like:

    INSERT INTO wastedSpace1 VALUES (1,REPLICATE(N'a', 3500), 255)

    In this example I put the string of 3500 letter ‘a’ into the columnWeWillDrop column. This will help us to identify it easier on the database pages. Indeed, let’s look at the relevant database page:

    SELECT  first_page
    FROM    
    sys.partitions sp
            
    JOIN sys.system_internals_allocation_units siau
    ON sp.partition_id = siau.container_id
    WHERE  
    sp.OBJECT_ID = OBJECT_ID('dbo.wastedSpace1')
            AND
    sp.index_id = 0

    In my case the result is 0x590100000100. Note that I’m using SQL Server 2008, for earlier version you will need to use different system tables/views. This is the page id of the first page for our table, the page number is 0x159 and the file number 1 (0x590100000100). Since we have not added a clustered index to the table, it will be a heap, and data pages for heaps are doublelinked, so from the first page we will be able to explore the whole table. The content of the first page is (0x159 equals to 345 decimal):

    DBCC TRACEON (3604)
    DBCC PAGE (2,1,345, 3)
    DBCC TRACEOFF (3604)

    The first parameter is the database id (in my case, since I worked in tempdb it is 2), the second parameter is the file id, the third is the page id, and the last parameter influences the details returned by the DBCC PAGE command. We can see that it contains the character ‘a’, the contents of our column ‘columnWeWillDrop’

    Now let’s drop this column:

    ALTER TABLE wastedSpace1 DROP COLUMN columnWeWillDrop

    If we look at the database page, we will be surprised to see the large number of ‘a’s. While a select * from our table will not return the dropped column, it still uses physical space!

    So what will happen if we add a few columns to our table? SQL Server can accommodate fixed length columns with up to 8000 bytes of total length. We currently should use 8 bytes (two times four for the two integers) plust a bit of overhead.

    ALTER TABLE wastedSpace1 ADD  newColumn1 NCHAR(300)

    This works as expected. In theory we should have plenty of space left, as we are using about 608 bytes (well below the 8000 byte limit) plus some overhead. So let's try to add another fixed length column:

    ALTER TABLE wastedSpace1 ADD  newColumn2 NCHAR(300)


    We get an error message:

    Msg 1701, Level 16, State 1, Line 2

    Creating or altering table 'wastedSpace1' failed because the minimum row size would be 8215, including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.

    Unfortunately the space used by our dropped column will remain to be wasted. We need to rebuild the table. In case of a heap, we could select into another table, or create a clustered index. In case of a table that has a clustered index, we could use the ALTER INDEX .. REBUILD

    These operations will be expensive, because likely all the database rows will be modified. However, this is the price for the speed of the ALTER TABLE DROP COLUMN operation, which is just a meta data operation. It is nice to see however, that SQL Server 2005 and 2008 has improved since 2000 in respect to these schema operations, and many repeated column add/drop operations do not always result in wasted space.

    With variable length columns the amount of wasted space is not that significant, but it still exists. It is worth while exploring the system_internals_partition_columns system view and see where fixed length columns start (these are the ones with a positive number in the leaf_offset column), and check if they form a contiguous row data area.

    SELECT  sc.name, sipc.leaf_offset, sipc.max_inrow_length
    FROM    sys.partitions sp
            
    JOIN sys.system_internals_partition_columns sipc
                 ON
    sp.partition_id = sipc.partition_id
            
    JOIN sys.columns sc
                 ON
    sc.column_id = sipc.partition_column_id AND sc.OBJECT_ID = sp.OBJECT_ID
    WHERE   sp.OBJECT_ID = OBJECT_ID('dbo.wastedSpace1')

    If you are interested to hear more about the hows and whys of the above problem, come to my session at SQL Bits 4 in Manchester on 28 March 2009.

    Also, if you happen to be in Neuss, Germany 22 April 2009, do come to my session at the European PASS Conference where I’ll be talking about the various ways to pass parameters in SQL Server.

  • SQL Server 2008: Table-valued parameters

    Posted Wednesday, August 27, 2008 5:10 PM | 4 Comments

    Table-valued types and parameters are among the many exciting new features of the recently released SQL Server 2008. These table-valued types allow you to pass table data to stored procedures and functions.

    But why is this useful? In previous versions of SQL Server one could pass parameters to stored procedures and functions. A single parameter can describe a single value. But what did you do if the information you want to pass to a stored procedure is more dynamic? If not all information is required for the stored procedure? For example a customer order that has optional delivery time or priority information? You could pass null values, but if these make the code look rather unreadable. Previous solutions included optional parameters for stored procedures. Parameters that had a default value need not be specified. This made stored procedure calls nicer, but the parameter list of stored procedures could quickly become unmanageable. The situation was aggravated when new parameters were introduced.

    Furthermore, this solution could not handle a different type of dynamic information, for example a customer order that contains multiple items. People started to use dirty hacks to circumvent this restriction, and used comma separated lists, and later XML. However, a database management system is certainly not ideal for processing strings or XML, especially when using T-SQL.

    Another alternative was using a communication table that was either fixed or created on demand. This solution suffered too, because when the communication table was permanent, it needed regular clean up, concurrency needed to be handled manually, and with temporary tables there could be recompilation issues.

    Using communication tables was also not ideal from the point of view of roundtrips or code simplicity between the applications and the database. Multiple insert statements needed to be executed just to set up the stored procedure call, all of this put into a single transaction, etc.

    Well, this is hopefully the past, welcome the new table-valued types and parameters. So let’s see an example.

    In order to use table-valued parameters one needs to set up a table type. This looks like the crossbreed of a user defined type and a table definition:

    CREATE TYPE myTableType AS TABLE
        (
          id INT NOT NULL
                 PRIMARY KEY,
          data NVARCHAR(100)
        )

    Note that you can specify a primary key, constraints, computed columns, etc. This gives you a lot of flexibility, and this extra control will help you to optimize performance and ensure data integrity.

    Once you have set up a table type, you can use it as a stored procedure or function parameter. In the following example we just select the information inside the table that is passed in as a parameter:

    CREATE PROCEDURE myProcedure
        ( @TableVariable myTableType READONLY )
    AS
        BEGIN
            SELECT  *
            FROM    @TableVariable
        END
    GO

    As the above example shows, you can query the table parameter as if it were an ordinary table variable. However, you cannot modify the table parameter, and this is spelled out explicitly by the mandatory “READONLY” word after the parameter specification.

    To see how the above works in T-SQL we can declare a table variable using our new table type, insert data into this table variable (and I also use the new row constructor in SQL Server 2008 to insert multiple rows in a single statement), and call the stored procedure with the table variable as its parameter:

    DECLARE @table myTableType
    INSERT  INTO @table
    VALUES  ( 1, 'a' ),
            ( 2, 'b' ),
            ( 3, 'c' )
    EXEC dbo.myProcedure @table

    After executing the above we get a resultset like:

    id          data
    ----------- ----------------
    1           a
    2           b
    3           c

    (3 row(s) affected)

    The above may simplify the work of people writing stored procedures and functions, but what about developers. Well, the above works just as well with ADO.Net. When creating a SQLCommand object for a stored procedure, one can now pass in a DataTable object as a parameter, exactly as it is done in the following C# code segment:

    SqlConnection connection = new SqlConnection("server = ...");
     
    SqlCommand command = connection.CreateCommand();
    command.CommandText = "dbo.myProcedure";
    command.CommandType = CommandType.StoredProcedure;
     
    // declare a table to store the parameter values
    DataTable paramTable = new DataTable();
    paramTable.Columns.Add("id", typeof(int));
    paramTable.Columns.Add("data", typeof(string));
     
    // add the table as a parameter to the stored procedure
    SqlParameter sqlParam = command.Parameters.AddWithValue("@TableVariable", paramTable);
    sqlParam.SqlDbType = SqlDbType.Structured;
    sqlParam.TypeName = "dbo.myTableType";
     
    // add rows to the table parameter
    paramTable.Rows.Add(new object[] {1, "a"});
    paramTable.Rows.Add(new object[] {2, "b" });

     Note that there are no insert statements; we can just use a DataTable to pass in table parameters to the stored procedure. Hopefully this solution in SQL Server 2008 will get rid of all the now unnecessary “hacks” that used XML, comma separated lists and temporary tables used in a large number of database applications.

     If you want to hear more about what is new in SQL Server 2008, and you are in the UK or Poland on the following dates, do come to my presentations or talk to me later. For the first half of September my schedule looks like the following:

  • The unexpected behaviour of DirectoryInfo.GetFiles() with three letter extensions

    Posted Friday, August 01, 2008 11:02 AM | 1 Comments

    There is a documented, but certainly counterintuitive issue with the DirectoryInfo.GetFiles() method in .Net. This method returns a list of files that match a particular pattern. For example in the following example it will return us all the files on drive Z: that have the exact extension “.foobar”


    DirectoryInfo folder = new DirectoryInfo(@"z:");

    FileInfo[] files = folder.GetFiles("*.foobar",
        SearchOption.AllDirectories);



    However, the DirectoryInfo.GetFiles method behaves very differently when you use it with an extension that contains exactly three characters.  For example consider the following example:


    FileInfo[] files = folder.GetFiles("*.sql", SearchOption.AllDirectories);


    This will, as expected, return all the files with the extension “.sql”. However, it will also return all the files that have the extension “.sql-backup”, “sqlold”, “sql~”, etc.  Surprisingly this is the behaviour that is documented in Visual Studio’s documentation. A quote from that documentation (http://msdn.microsoft.com/en-us/library/ms143327.aspx):


    “The matching behavior of searchPattern when the extension is exactly three characters long is different from when the extension is more than three characters long. A searchPattern of exactly three characters returns files having an extension of three or more characters. A searchPattern of one, two, or more than three characters returns only files having extensions of exactly that length.



    The following list shows the behavior of different lengths for the searchPattern parameter:
    •    "*.abc" returns files having an extension of.abc,.abcd,.abcde,.abcdef, and so on.
    •    "*.abcd" returns only files having an extension of.abcd.
    •    "*.abcde" returns only files having an extension of.abcde.
    •    "*.abcdef" returns only files having an extension of.abcdef.


    The reason for the above strange behaviour is the support for the 8.3 file name format. A file with the name “alongfilename.longextension” has an equivalent 8.3 filename of “along~1.lon”. If we filter the extensions “.lon”, then the above 8.3 filename will be a match.


    This behaviour has bitten me with a tool I’ve been working on. This tool reads “.sql” files and builds up a database schema from these files. This schema can then be compared with live database schemata. The primary motivation for such a tool is to support database schemata in source control. However, there were two different scenarios when the application started to fail. In one case I used emacs to edit a file, and it left me (as expected) a backup file postfixed with a ~ character. On another occasion, I used a source control system that decided to store caching information in the same folder where the sql scripts were located, and the cached files had an extension that started with sql and were followed by a timestamp. In both of these cases the database schema that built by my application was inconsistent, due to objects being duplicated.


    The only solution to the strange behaviour of the DirectoryInfo.GetFiles() seems to be to check the extension of the file explicitly if you use an extension with exactly three characters. The FileInfo.Extension property returns the full extension of the file, not only the first three characters.


  • SQL Server 2008: new data types and .Net 2 with and without SP1

    Posted Friday, June 20, 2008 1:05 PM | 4 Comments

    SQL Server 2008 has introduced a few new data types, among others the new date types, like date, time, datatime2 and datetimespan. Because .Net 2 was released before SQL Server 2008 has introduced these data types, there are no classes that map to these new types in .Net 2.

    But this has changed with .Net 2 SP1, which introduces the DateTimeOffset structure.

    So, what should we expect when we run an application without SP1?
    When we connect to SQL Server 2008 using .Net 2, the version of the runtime of the connecting application will determine the .Net data type that is returned to the user. For example, for a SQL datetimespan column without  .Net SP1 we get back a string, with SP1 we get a DateTimeOffset. Note that this happens even if you compile your application on a machine with SP1. So even though you have compiled and tested your application on .Net 2 SP1, a customer who runs the client without SP1 may experience a different behaviour. So what are these differences:

    We will use a simple table that contains the new date data types:

    CREATE TABLE someProblematicTypes
        (
          c_date date
        , c_time time
        , c_datatime2 datetime2
        , c_datetimeoffset datetimeoffset
        )

    Insert a single row:

    INSERT  INTO someProblematicTypes
    VALUES  (
              GETDATE()
            , GETDATE()
            , GETDATE()
            , GETDATE() )

    Then we create a C# application that uses an SqlDataReader  to retrieve the information about the data stored in this table. We compile this on a machine with .Net2 SP1, and run it on two machines, one with .Net 2 SP1 and one without. The instance of the SqlDataReader returns an object for the various columns, and the values and their types for the above table are:


    Without SP1:

    Column c_date
       Value:2008-06-20
       Type :System.String
    Column c_time
       Value:2008-06-20
       Type :System.String
    Column c_datatime2
       Value:2008-06-20 12:11:28.6000000
       Type :System.String
    Column c_datetimeoffset
       Value:2008-06-20 12:11:28.6000000 +00:00
       Type :System.String


    With SP1:

    Column c_date
       Value:20/06/2008 00:00:00
       Type :System.DateTime
    Column c_time
       Value:20/06/2008 00:00:00
       Type :System.DateTime
    Column c_datatime2
       Value:20/06/2008 12:11:28
       Type :System.DateTime
    Column c_datetimeoffset
       Value:20/06/2008 12:11:28 +00:00
       Type :System.DateTimeOffset


    That the results are different, and let's not forget that it is the same executable that is being run. Without .Net2 SP1 we are getting strings instead of DateTime and the new DateTimeOffset. This could upset the application, and if our application was expecting a DateTime type, and got a string, we could easily get an invalid cast exception.

    Also, we must be careful when we use the new DateTimeOffset type in our code. The application that is compiled and tested on a machine with SP1 will by default start on a client machine that does not have .Net 2 SP1. However, when it gets to the new data type, it will throw an unknown type exception. Unfortunately, since .Net 2 SP1 is relatively new (released November 2007), many have not upgraded to it. There are two general solutions to this. Either force the your customers to upgrade to .Net 2 SP1, or your application must be prepared for the above type differences and must be tested both with and without .Net 2 SP1.
     

  • SQL Server 2008: New languages and older operating systems

    Posted Monday, April 21, 2008 11:36 AM | 2 Comments

    SQL Server 2008 introduces a few new languages like Welsh, Tibetan and Norwegian. While these new languages map to the codepages in Windows Server 2008, in earlier operating systems (like XP) this is not so, and simple queries like “select * from mytable” can break with the SQL error 4078.


    The problem occurs if you use a collation that uses one of the new languages in a char, varchar or text column. In queries that return such a non-unicode column SQL Server relies on the client to use the appropriate codepage when displaying or converting the returned characters to Unicode. If the codepage does not exist on the client side, the query will fail. For example, in the following we declare a table variable with a varchar column that uses the new FRISIAN_100_CS_AS collation.


    DECLARE @t TABLE (

          id INT PRIMARY KEY IDENTITY(1, 1),

          data VARCHAR(100) COLLATE FRISIAN _100_CS_AS)

                                                                  

    INSERT  INTO @t VALUES  ( N'aaa' )

    SELECT data FROM @t


    The above statement will work fine on the computer on which SQL Server 2008 is installed. If you execute it in Query Analyzer on a computer with Windows XP, then you will get the following error:


    Server: Msg 4078, Level 16, State 1, Line 13

    The statement failed because column 'data' (ID=0) uses collation Welsh_100_CS_AS, which is not recognized by older client drivers. Try upgrading the client operating system or applying a service update to the database client software, or use a different collation. See SQL Server Books Online for more information on changing collations.


    (If it was run from a .Net application, the SqlException exception message is: “The Collation specified by SQL Server is not supported.”)


    (I use Query Analyzer only because with CTP6 of SQL Server 2008 you cannot use Management Studio 2005. However, you can use osql, sqlcmd, Query Analyzer, your own applications, etc.)


    The same problem occurs with .Net applications. Whether an application that executes the above query will actually work will depend on what operating system it is executed on.


    A reasonably simple workaround is not to use select *, and when querying columns that are based on the char, varchar, or text, cast them to nchar, nvarchar or ntext. In this case the data will be returned in Unicode (actually UCS-2), and the query will work on pre Windows Server 2008 systems. In a .Net application you would also need to collate the result to a known collation, like latin1_general.

     

    The new languages in SQL Server 2008 that seem to be affected are:


    Chinese_Traditional_Stroke_Count, Chinese_Traditional_Bopomofo, Chinese_Simplified_Pinyin, Chinese_Simplified_Stroke_Order, Chinese_Traditional_Pinyin, Chinese_Traditional_Stroke_Order, Danish_Greenlandic, Japanese_XJIS, Japanese_Bushu_Kakusu, Norwegian, Romansh, Serbian_Latin, Serbian_Cyrillic, Bosnian_Latin, Bosnian_Cyrillic, Urdu, Persian, Mapudungan, Upper_Sorbian, Bashkir, Maltese, Sami_Norway, Sami_Sweden_Finland, Turkmen, Bengali, Assamese, Pashto, Tibetan, Welsh, Khmer, Lao, Frisian, Tamazight, Nepali, Azeri_Latin, Azeri_Cyrillic

  • SQL Compare 7 Beta and peeking into SQL Server backup files

    Posted Friday, March 28, 2008 10:51 AM | 2 Comments

    I want to restore only a single table from my database backup. We have been hearing this request from many of our SQL Backup customers. The problem is simple: Given a large database backup and a live database that has a partial “corruption”, e.g. the data in a single table has been accidentally modified and committed, the DBA would like to have the table, and nothing else, restored. Or, another example is when a particular stored procedure has been modified, and we would like to get back the original version. This problem is a bit worse if we do not even know which backup file contains the right information.


    In SQL Server the only way to solve the above problem is to restore the whole backup file to a new database, and copy the table data or the stored procedure to the current database.


    Well, this problem is now history. We have just released the first public beta of SQL Compare 7. This tool, which is primarily used for comparing and synchronizing two database schemata, can now read the schema information directly from a backup file. This complements well SQL Data Compare 6.1 which already has this feature at the data level. These two products open up a lot of new recovery opportunities.


    If you want to restore a few stored procedures, you just compare your live database with a backup file, and you can explore the differences in your schema objects. It may be that the modified stored procedure has a few dependent stored procedures that you also wish to restore. After selecting all the schema objects you wish to retrieve from the backup files, you can either restore them or create and save a synchronization script. When it comes to tables, Data Compare can do even more. Not only can you restore the data in a particular table, you can do this at a much finer level of granularity. First, you can see the differences between a table in the backup and the live database, and you can control which individual rows should be restored.


    Anyway, back to SQL Compare 7 Beta 1. You can download it from http://www.red-gate.com/messageboard/viewtopic.php?t=6730


    It was quite a challenge to implement reading schemata information from backup files. First, the core system tables are different in SQL Server 2000, 2005 and 2008. Some objects, like XML schema collections, have very cryptic representations, and there is of course backup compression in SQL Server 2008. We are still working on row and page compressions. Of course we support Red Gate’s SQL Backup files, both compressed and encrypted ones, but the tool works with SQL Server’s native backups just as well.


    Another major feature for this beta is support for SQL Server 2008. Although we will not see SQL Server 2008 RTMed until at least Q3 2008, when it is out, we hope to have SQL Compare 7 ready and waiting for it. In the meantime, you are encouraged to download and try this beta. Please let us know if you find any issues with it, or if you think that there is something we have missed from this product. The forum for this beta is on http://www.red-gate.com/messageboard/viewforum.php?f=78

     

    Regards,

      Andras

  • SQL Server 2008 CTP6 takes a day off

    Posted Friday, February 29, 2008 4:55 PM | 1 Comments

    SQL Server 2008 CTP6 does not work on February 29th. Period.

    On one hand, it is really nice that Microsoft is releasing Community Technology Previews for its latest SQL Server. It helps developers and DBAs to test their solutions and get familiar with the next version of SQL Server. On the other hand, since Microsoft still has many months before the final release, there are many bugs in these CTPs. This is certainly expected, this is the point of these CTPs. But there was one bug that nearly killed me.

    As a last minute preparation for tomorrow’s SQL Bits event in Birmingham, I decided to check that my demo scripts work properly. I tried to connect to my SQL Server instance on my laptop, and was greeted with an unpleasant error message. Initially I was not worried, after a few minutes of fluffing around I started to blame the screen capturing application I’ve been asked to install for the event. After some more headache, I though, it was not worth the time, so I installed SQL Server 2008 CTP6 on a second laptop. Surprise! The installation has failed too. Same happened when I tried to install it on a virtual server, etc. So now, after three hours wasted, I’ve found a relevant bug report https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=330614

     

    Apparently, SQL Server 2008 CTP6 does not work on 29 February. It is ironic that this is just two days after the launch of the product (the launch has nothing to do with the release).

    Changing the system time to 1 March has temporarily resolved this issue :), but the bald patch on my head has grown today.

  • Type less in SQL Server 2008 (variable initialization and multiple row inserts)

    Posted Thursday, February 28, 2008 9:34 AM | 2 Comments

     

    Among the large number of new features in SQL Server 2008 there are two, almost insignificant features, that I use almost constantly. These are variable initialization and the values clause of insert statements that now accept multiple rows.

    When playing with SQL one inevitably declares some variables. After the declaration, and as per the SQL standard, we can use the SET statement to assign a value to a variable:

     

    DECLARE @a int, @b int

    SET @a = 1

    SET @b = 2

     

    If we do not stick to the standard, then we can use a SELECT statement to assign values to several variables in a single statement:


    SELECT @a = 1, @b =2


    Sometimes, using SELECT is indeed our only choice, e.g. when we want to get both the row count and the error code from the last statement. There is no SET alternative to the following statement:

     

    SELECT @RowCount = @@ROWCOUNT, @ErrCode = @@ERROR

     

    Anyway, SQL Server 2008 allows us to declare and initialize variables in a single statement, like:


    DECLARE @a int = 1, @b int = 2

     

    While I still do not understand why saving on the number of lines of code on my screen makes me feel better (two 19’’ screens provide me with plenty of space :) ), somehow I like this feature.

     

    Another small extension to T-SQL that prevents me from including more typos is adding multiple rows in an insert statement:

     

    DECLARE @table TABLE ( a INT, b INT )

    INSERT INTO @table VALUES (1,1), (2,2), (3,3)

     

    The INSERT statement here inserts three rows into the @table. This is something I got used to before, and was missing from SQL Server.

     

    It is nice to see that in SQL Server 2008 Microsoft is adding such syntactic sugar in addition to the number of other improvements, like spatial data types, data compression, transparent data encryption, etc. I’ve been talking about these improvements at VBUG and NxtGen events during the last few weeks, so in my blog I’d also like to say thanks to the people who attended and organized these events.

  • Can I redeclare a variable in T-SQL? Teaser for SQL Bits

    Posted Friday, February 15, 2008 1:17 PM | 2 Comments

    Variables in T-SQL behave differently from what people may be used to in languages like C or C#. The scope of a variable is the batch in which it is declared. This may not be intuitive to people who are used to local variables. So let’s look at an interesting example:

     

    DECLARE @counter INT

    SET @counter = 0

     

    WHILE @counter < 100

        BEGIN

            DECLARE @value INT

            SET @value = ISNULL(@value, 0) + 5

            SET @counter = @counter + 1

        END

     

    SELECT  @value

     

     

    In this example we have a simple loop. The loop counter is a variable @counter. What is interesting that inside the loop we have a variable declaraion for @variable. We assign it a value (if it was null we set it to 5, if it was not null, we increment it by 5). Does this make sense? How could I ever consider the previous value if I’ve just declared the variable? What is unusual for many developers, is that the declaration is evaluated only once. This means that in the second iteration of the loop the @value variable is already declared and set, so we are incrementing it by five in every subsequent iteration.

    At the end of the loop we can access the @value variable (do not forget that the scope is the whole batch), and the value will be 500 (100 iterations, 100 times adding 5).

    Variable declarations are evaluated at the time of parsing, and the scope is the whole batch. Because of this, you cannot have two declare statements for the same variable in a batch, even if the only one declaration is reachable in the code. For example, the following statement will result in a parsing error:

    IF 1 = 1

        DECLARE @a INT

    ELSE

        DECLARE @a INT

     

    The error is:

    Msg 134, Level 15, State 1, Line 4

    The variable name '@a' has already been declared. Variable names must be unique within a query batch or stored procedure.

     

     

    If you would like to know more about how transient data, like the above variable, is handled in SQL Server, come to my session at SQL Bits in Birmingham (UK) on 1 March, 2008. Like the previous SQL Bits event it is free. The last one was held in Reading half a year ago, and with 20 sessions and over 300 people attending it, it provides a unique opportunity to learn about SQL Server, as well as to meet your fellow DBAs and SQL developers in the UK. I will talk about transient data in SQL Server, and you can learn about what is stored in tempdb, how the tempdb can be a performance bottleneck, what the differences between temporary tables and table variables are, what is stored in the transaction log file and how its can size be kept under control.

  • Some interesting, obscure (and absolutely useless) T-SQL syntax

    Posted Wednesday, November 28, 2007 10:05 AM | 3 Comments

    /*
    Writing ugly SQL Statements is an art. Today I've spent
    a few minutes trying to break our parser as well as giving
    a grill to SQL Server's parser.
    So here is some extra syntax that is not behaving as
    I'd expect it to behave.
    */

    select 1.a

    -- Yes, this is legal, surprise surprise it returns a
    -- table with a single column, a single row, the value
    -- is 1, the column name is "a" !

    -- So let's push it a bit further:

    select -1.a

    /*
    Works, the result is:
    a
    ---------------------------------------
    -1
    */

    select -1/2.[-1/2]

    /*
    Wow, this works too, and it gives me
    -1/2
    ---------------------------------------
    -0.500000

    So can I just put anything after the ?decimal? point?
    */

    select 1.SomeString

    /*
    SomeString
    ---------------------------------------
    1
    */


    -- But then there are exceptions:
    select 1.a, 1.e, 0.a

    /*
    a                                  a
    ----------- ---------------------- ----------------
    1           1                      0.0
    sa

    1.e does not produce the alias (e for exponent???),
    0.a seems to use float, so the value displayed is 0.0

    It is a shame that you cannot use this with string literals.
    Unless you want to torture your colleagues I reckon that
    the above is absolutely useless. But if you have an idea why
    the above works, and works the way it works, do let me know.

    PS: the above syntax works on 2000,2005 and 2008.

      Andras
    */

  • SQL Server 2008: Microsoft has given, and Microsoft has taken away (powersum)

    Posted Thursday, November 22, 2007 12:14 PM | 8 Comments

    SQL Server 2008 November CTP came out this week. This CTP finally contains many major features that are worth playing with. However, there are things that are no longer in SQL Server, things that I will be missing (and many things that I’ll be happy to see go).

     

    One of my favourite system functions is powersum. This is not documented, so Microsoft cannot be blamed at all for removing it (Well, not really removing it, but just denying public access to it. It is very useful, so Microsoft keep using it internally). However, I liked using this on 2005 to create bit arrays from numbers. Powersum is an aggregate. It takes numbers, and it returns a varbinary. It takes its input number, and SHIFTs 1 to the left by this number, then ORs this with the aggregated value.


    For example the following query returns 0x16:


    SELECT  powersum(col)

    FROM    ( SELECT    1

              UNION ALL

              SELECT    2

              UNION ALL

              SELECT    4

            ) AS data ( col )

     

    0x16’s binary representation is 10110 (so the bits at position 1, 2 and 4 are set).


    I’m really sorry to see this function go. I used it when returning column permissions in SQL Compare. It allowed me to group column permissions together, consequently my queries were sending less data. It also allowed building query results in 2005 that were similar to SQL Server 2000’s syspermissions table.

     

    However, the access restriction to this useful function reminds me that while there are new features in SQL Server 2008, some obsolete features and syntax will go away, just like it has happened with SQL Server 2005. I’ve seen many people finally migrating to the “new” join syntax, and abandoning the =* syntax in 2005. Since SQL Server 2008 does not support compatibility level 70 and before, this syntax will finally die out.

     

    2008 also seems to remove the DUMP and LOAD keywords. People should use BACKUP and RESTORE instead, and it is a good time to look at all those old maintenance jobs.

     

    Books online mentions a few other changes that we should expect in SQL Server 2008. (The page is Deprecated Database Engine Features in SQL Server 2005). However this list is likely to change. For example this page claims that SETUSER will be removed from 2008, but it still seems to work (let’s not forget though that SQL Server 2008 is not yet released). It is worth however to keep a close eye on this page as well as the "Discontinued Database Engine Functionality in SQL Server 2008" page in 2008’s Books online.

  • SQLBits in Reading and the British language

    Posted Monday, October 08, 2007 9:35 AM | 0 Comments

    SQLBits was held in Reading this Saturday and it certainly superseded my expectations. There were over 300 people, excellent talks and plenty of opportunities to meet others.


    There were four parallel sessions, one for BI, Dev, Katmai and DBA, and since there is only one me, it was, as always, difficult to choose.


    One thing that probably tops my list of interesting findings for the day is the “British language”. Since the event was in the UK, this was of course bound to happen. And it did happen at least four times in Dave McMahon’s entertaining session. He liked using “SET LANGUAGE ‘British’”. This is a statement that certainly increases the comfort level of SQL Server. It is useful when you want to parse date strings in your session, and these strings are following the format of a different culture. Let’s say, they are in the format commonly used in the UK. You may not want to play with SET DATEFORMAT, just type in “SET LANGUAGE ‘British’”.  Note that sys.messages is looked up based on the language_id, which in case of British is still 1033, so do not expect different error messages after using SET LANGUAGE ‘British’ :)


    There were of course more interesting bits at this event, but I leave them for later. For now I just want to say thanks to Simon Sabin and Tony Rogerson for organizing the event, and of course, many thanks to those who attended my session.

  • The GO command can have a parameter?

    Posted Monday, September 24, 2007 1:57 PM | 2 Comments

    I have mixed feelings about the GO command. It is not a T-SQL statement, it is just something that Management Studio and the other SQL Server tools understand as a batch separator command. Indeed, you can change it to whatever you wish in Management Studio under Tools->Options->Query Execution->SQL Server->General->Batch separator.
    So you can write queries like

    SELECT * FROM sys.objects
    foo
    SELECT * FROM sysobjects
    foo

    Of course I'm still struggling to find a reason why someone would change the GO command.

    One thing I've found out recently is its parameter. SQL Server Management Studio seems to accept an integer after the GO command, and this will start an execution loop. For example if you write

    PRINT 'Hello word'
    GO 5

    The result will be:

    Beginning execution loop
    Hello word
    Hello word
    Hello word
    Hello word
    Hello word
    Batch execution completed 5 times.


    This is perfect for lazy moments when I want to populate a test table with some default values like:

    CREATE TABLE foo
        ( a INT PRIMARY KEY IDENTITY
        , b INT DEFAULT 1
        )
    GO

    I usually write something like:
    INSERT TOP (10) INTO foo (b) SELECT 1 FROM sys.objects

    but with the parameter to the GO command the above can be achieved with even less typing:

    INSERT INTO foo DEFAULT VALUES
    GO 10

    The above will also insert 10 rows :)
    Do let me know if you find a more interesting use for this parameter.

        Andras
  • Pivots with Dynamic Columns in SQL Server 2005

    Posted Friday, September 14, 2007 9:28 AM | 28 Comments

    Pivots in SQL Server 2005 can rotate a table, i.e. they can turn rows into columns. PIVOTs are frequently used in reports, and they are reasonably simple to work with. However, I've seen quite a few questions about this operator. Most questions were about the column list in the PIVOT statement. This list is fixed, but many times the new columns are determined by the report at a later stage. This problem is easily solved when we mix pivots with dynamic SQL, so here is a very simple example about how to dynamically generate the pivot statement:

    PIVOT allows you to turn data rows into columns. For example, if you have a table like this (I use only three months here for simplicity):

    CREATE TABLE Sales ([Month] VARCHAR(20) ,SaleAmount INT)
    
    INSERT INTO Sales VALUES ('January', 100)
    INSERT INTO Sales VALUES ('February', 200)
    INSERT INTO Sales VALUES ('March', 300)
    
    SELECT FROM SALES
     
    
    Month             SaleAmount
    ----------------  -----------
    January           100
    February          200
    March             300 
    
    

    Suppose we wanted to convert the above into this:

     
    January     February    March
    ----------- ----------  ----------
    100         200         300
    
    

    We can do this using the PIVOT operator, as follows:

    SELECT  [January]
          [February]
          [March]
    FROM    SELECT    [Month]
                      SaleAmount
              FROM      Sales
            p PIVOT SUM(SaleAmount)
                        FOR [Month] 
    IN ([January],[February],[March])                   ) AS pvt

    However, in the above example, I have the column names fixed as the first three months. If I want to create a result in which the columns are dynamic (for example, they are read from a table), then I need to construct the above query dynamically. To demonstrate this let’s look at the following example:

    In the first table I have the column names I want to use:

    CREATE TABLE Table1 (ColId INT,ColName VARCHAR(10))
    INSERT INTO Table1 VALUES(1, 'Country')
    INSERT INTO Table1 VALUES(2, 'Month')
    INSERT INTO Table1 VALUES(3, 'Day')
    
    

    In the second table I have the data. This consists of a row identifier (tID), a column ID (ColID) that refers to the column type in Table1, and a value:

    CREATE TABLE Table2 (tID INT,ColID INT,Txt VARCHAR(10))
    
    INSERT INTO Table2 VALUES (1,1, 'US')
    INSERT INTO Table2 VALUES (1,2, 'July')
    INSERT INTO Table2 VALUES (1,3, '4')
    INSERT INTO Table2 VALUES (2,1, 'US')
    INSERT INTO Table2 VALUES (2,2, 'Sep')
    INSERT INTO Table2 VALUES (2,3, '11')
    INSERT INTO Table2 VALUES (3,1, 'US')
    INSERT INTO Table2 VALUES (3,2, 'Dec')
    INSERT INTO Table2 VALUES (3,3, '25')
    
    

    Now I would like to retrieve data from these two tables, in the following format:

    
    tID         Country    Day        Month
    ----------- ---------- ---------- ----------
    1           US         4          July
    2           US         11         Sep
    3           US         25         Dec 
    
    

    In other words I want to turn the data rows in Table2 into columns. If I had a fixed set of columns for the result, i.e. the columns Country, Day, and Month were fixed, I could use SQL Server 2005’s PIVOT operator in a query like:

    SELECT  tID
          [Country]
          [Day]
          [Month]
    FROM    SELECT    t2.tID
                      t1.ColName
                      t2.Txt
              FROM      Table1 AS t1
                        JOIN Table2 
    AS t2 ON t1.ColId t2.ColID         p PIVOT MAX([Txt])                     FOR ColName IN [Country][Day],                                      [Month] ) ) AS pvt ORDER BY tID ;

    However I need to construct this query dynamically, because the column names Country, Day, and Month are specified in a table, and can be changed independently from my query. In our case these columns are given in Table1.

    In the first step to generate the final pivot query I need to create the list of columns, in this case [Country], [Day], [Month].

    Since there is no string concatenation aggregator in SQL (a concatenation aggregator would not be deterministic without some order restriction), and since the column names are stored in rows of a table, I need to flatten these columns into a single row or variable. There are various solutions to achieve this. One solution would be to use a query like:

    DECLARE @cols NVARCHAR(2000)
    SELECT  @cols COALESCE(@cols ',[' colName ']',
                             '[' colName ']')
    FROM    Table1
    ORDER BY colName
    
    

    This query works both on SQL Server 2000 and 2005. It is efficient, but some may not like it because it uses the same variable (@cols) on both sides of an assignment. Another solution that works on SQL Server 2005 only is to use XML PATH.

    DECLARE @cols NVARCHAR(2000)
    SELECT  @cols STUFF(( SELECT DISTINCT TOP 100 PERCENT
                                    '],[' t2.ColName
                            FROM    Table1 AS t2
                            ORDER BY '],[' t2.ColName
                            FOR XML PATH('')
                          ), 12'') + ']'
    
    

    This second query (I’ve seen this posted by Peter Larsson) has the advantage that it does not use the @cols variable on the right hand side. I like this solution more, since this can be extended as a general string concatenation aggregate in more complex queries.

    Both of the above queries generate, from Table1, the string: ‘[Country],[Day], [Month]’. This column list is used twice in the pivot query that we aim to construct. Once it is use in the list of columns that we want to retrieve, and once it is used as the list of values that should become columns. Having constructed this list of columns above, we can just concatenate it with the missing parts of the pivot query like:

    DECLARE @query NVARCHAR(4000)
    SET @query N'SELECT tID, '+
    @cols +'
    FROM
    (SELECT  t2.tID
          , t1.ColName
          , t2.Txt
    FROM    Table1 AS t1
            JOIN Table2 AS t2 ON t1.ColId = t2.ColID) p
    PIVOT
    (
    MAX([Txt])
    FOR ColName IN
    ( '+
    @cols +' )
    ) AS pvt
    ORDER BY tID;'
    

    Executing this with

    EXECUTE(@query)

    will give us the expected result: a table that is pivoted and shows columns that were specified in a table:

    tID         Country    Day        Month
    ----------- ---------- ---------- ----------
    1           US         4          July
    2           US         11         Sep
    3           US         25         Dec
    
  • Do you need to be dbo to compare two databases using SQL Compare?

    Posted Monday, July 30, 2007 10:01 AM | 1 Comments

    This question comes up time to time. SQL Compare compares two database schemata, and in order to do so it needs to read the system tables and system views. On the other hand, some organizational policies can be rather restrictive about who and with what permissions are allowed to connect to a production system.

     

    The more complex databases make use of object owners to organise the database objects. Objects like tables, views, user defined types can be owned by individual users/schemas. Under SQL Server 2000, in order to read all the schema information, you do need to be dbo, otherwise you will not be able to see some of the object definitions, or you will not be able to learn about the existence of certain database objects.

     

    Under SQL Server 2005 however there is a permission (VIEW DEFINITION) to control access to viewing object definitions. To add this permission to user UserA in a particular database run:

     

    GRANT VIEW DEFINITION TO UserA;

     

    You can also grant this permission at server level by executing:

     

    GRANT VIEW ANY DEFINITION UserA;

     

    The nice thing about this is that with the help of the above permission you are not granting permission to modify the database schema, so you can allow people to use SQL Compare to compare and monitor schema changes without allowing them to modify the databases in question. You can still generate a synchronization script, but that you can later execute as a different user.

     

    Andras

More Posts Next page »


















<February 2010>
SuMoTuWeThFrSa
31123456
78910111213
14151617181920
21222324252627
28123456
78910111213
Finding Stuff in SQL Server Database DDL
 You'd have thought that nothing would be easier than using SQL Server Management Studio (SSMS) for... Read more...

Mission Critical: SQL Server 2008 Performance Tuning Task List
 In which Buck Woody imagines how the US military would have tackled DBA checklists for... Read more...

Simple Query tuning with STATISTICS IO and Execution plans
 A great deal can be gleaned from the use of the STATISTICS IO and the execution plan, when you are... Read more...

Switching rows and columns in SQL
 When they use SQL Server, one the commoner questions that Ms Access programmers ask is 'Where's the... Read more...

Writing Efficient SQL: Set-Based Speed Phreakery
 Phil Factor's SQL Speed Phreak challenge is an event where coders battle to produce the fastest code to... Read more...