Forums (RSS 2.0)" href="http://www.simple-talk.com/community/forums/rss.aspx?ForumID=-1&Mode=0" />
Click here to monitor SSC

Do Alpacas Dream of farming DBAs?

Last post 09-10-2007, 5:13 AM by Anonymous. 36 replies.
Page 1 of 3 (37 items)   1 2 3 Next >
Sort Posts: Previous Next
  •  08-10-2007, 8:53 AM Post number 71373

    Do Alpacas Dream of farming DBAs?

  •  08-16-2007, 10:40 AM Post number 34851 in reply to post number 71373

    TRUNCATE TABLE dbo.tblOrphans

    Then blame it on the Alpaca that you let loose in the server room. You could always blame the script provided by the vendor. The business obviously isn't impressed with the application if they are replacing it, so you could probably get away with it. The data will be gone when that happens anyway. I wouldn't do something like this with important data, but if it's so unimportant it's being truncated after a month as it is.

    I suppose the solution that would fulfill the requirements would involve partioning the table, moving the partition onto its own file and moving the file onto the drive with the free space. But that seems like more work than the vendor put into their solution, so the hell with them.
  •  08-16-2007, 5:31 PM Post number 34864 in reply to post number 71373

    • KatanaCS is not online. Last active: 09-14-2007, 1:45 PM KatanaCS
    • Not Ranked
    • Joined on 10-17-2006
    • San Diego, California
    • Level 1: Deep thought

    1st question - Version of SQL Server & DB

    What version of SQL Server is this on? And what is the compatibility mode for the specified database?
  •  08-16-2007, 8:15 PM Post number 34872 in reply to post number 71373

    • Rodney is not online. Last active: 01-16-2012, 8:00 PM Rodney
    • Top 25 Contributor
    • Joined on 03-29-2006
    • Pensacola, Florida
    • Pentium Dual Core

    Good to See

    Timothy,
    Good to see this article. I am nursing a fractured right hand so writing has been slow-going. I deal with the same issues daily. I will post my solution for this (don't need the goodies). Deletes are out of the question for this many records. The best thing to do is BCP a million recors out at a time of unwanted data, then move the remaining "wanted" data to a staging table and then truncate the original, move the "wanted" data back and load the historical data off to archive (if needed). Then some clean up for indexes. Do not run the vendor scripts. In fact...burn them. And further...Stop those users from surfing so much at work.
    Anyway...good to see you here and I look forward to more articles in the future. This is the good stuff.
    Rodney Landrum
    Author "Pro SQL Server 2008 Reporting Services (Apress):
  •  08-17-2007, 2:28 AM Post number 34882 in reply to post number 71373

    • WBrewer is not online. Last active: 01-12-2012, 12:25 PM WBrewer
    • Top 50 Contributor
    • Joined on 06-30-2006
    • London UK
    • Level 2: Deep Blue

    My solution.

    If Rodney wins a goodie bag, I want it if he doesn't. The pens really work.

    My favourite solution to this would be a quiet one. I'd put the recovery mode for the database into 'simple', and shrink the log. I'd get enough working space by moving stuff around. Then I would run a routine on the agent that took a chunk out of the two tables every minute. (How much of a chunk would have to be established by experiment) I reckon that there are sufficient indexes on the table to allow this. This chunk would have to be more than the rate of insertions per minute. (I've popped in the figure 2000 as a guess as to what would work without causing problems)

    DELETE FROM tblOrphans
    WHERE fieldx IN (
    SELECT TOP 2000 fieldx
    FROM tblOrphans
    LEFT OUTER JOIN tblParent1
    ON tblOrphans.FieldX = tblParent1.FieldX
    LEFT OUTER JOIN tblParent2
    ON tblOrphans.FieldX = tblParent2.FieldX
    WHERE tblParent2.fieldx IS NULL
    AND tblParent1.fieldx IS NULL )

    The average DBA will want to rush in, having changed in a telephone booth, and do something totally radical to gasps of admiration from the onlookers, but when I came across a similar problem I figured out that, whereas an increasing table size meant misery, a decreasing table size meant happiess.

    Alternatively, one could nip down to the local computer store, grab a nice big hard disk and copy the existing one onto it using one of these clever drive copying applications. (I won't mention names as I'm damned if I'll give them a plug)

    A slightly more radical solution would be to stop the database, copy the mdf off onto a nice big server with plenty of room to breathe. Restart it. Attach the file on the new server and work your wicked way on it without fear of locking, logfile growth etc etc. Once you have the database short-back-and-sides and fully tested, then return it to the errant server.
  •  08-17-2007, 9:34 AM Post number 34893 in reply to post number 71373

    Another possible solution...

    Throw the database in single user mode, to turn off the app (middle of the night, or whatever)

    Create a new filegroup with a file on the C: drive--plenty of room there

    Create a new table with the same columns as tblOrphans, on the new filegroup

    Insert all of the qualifying rows:

    INSERT NewTable
    SELECT *
    FROM tblOrphans
    WHERE FieldX IN (SELECT FieldX FROM tblParent1)
    AND FieldX IN (SELECT FieldX FROM tblParent2)

    Drop tblOrphans.

    Move the data back to the primary filegroup and the table tblOrphans, using SELECT INTO.

    Put the database back into multiuser mode and move on with life.


    Looking for an advanced SQL Server 2005 book?

    Expert SQL Server 2005 Development
  •  08-17-2007, 10:02 AM Post number 34895 in reply to post number 71373

    wait a moment...

    I screwed up the "qualifying rows" query in the last post. Try this instead:


    INSERT NewTable
    SELECT *
    FROM tblOrphans
    WHERE FieldX IN (SELECT FieldX FROM tblParent1)
    OR FieldX IN (SELECT FieldX FROM tblParent2)


    Looking for an advanced SQL Server 2005 book?

    Expert SQL Server 2005 Development
  •  08-17-2007, 1:31 PM Post number 34912 in reply to post number 71373

    My solution

    -- Temporarily drop this space-mongering index
    
    DROP INDEX dbo.tblOrphans.idx1
    GO
    -- Temporarily drop this space-mongering index
    DROP INDEX dbo.tblOrphans.idx2
    GO

    -- Space savings from dropped indexes allow us to
    -- save off the "wanted" records into a new table
    SELECT  o.FieldXo.FieldY
    INTO    dbo.tblOrphans_new
    FROM    dbo.tblOrphans o 
       
    INNER JOIN SELECT  FieldX
                     
    FROM    dbo.tblParent1
                     
    UNION
                     SELECT  
    FieldX
                     
    FROM    dbo.tblParent2
                   

        
    ON t.FieldX o.FieldX
    GO

    -- Kill the old bloated table
    DROP TABLE dbo.tblOrphans
    GO

    -- Rename the new to the old
    EXEC sp_rename 'dbo.tblOrphans_new''tblOrphans''OBJECT'
    GO

    -- Create index on the parent to ease the joins below
    CREATE NONCLUSTERED INDEX IX_tblParent1_FieldX 
      
    ON dbo.tblParent1 FieldX )
      
    WITH PAD_INDEX = OFFSORT_IN_TEMPDB = OFFDROP_EXISTING = OFF,
      
    IGNORE_DUP_KEY = OFFONLINE = OFF ON [PRIMARY] 
    GO
    -- Create index on the parent to ease the joins below
    CREATE NONCLUSTERED INDEX IX_tblParent2_FieldX 
      
    ON dbo.tblParent2 FieldX )
      
    WITH PAD_INDEX = OFFSORT_IN_TEMPDB = OFFDROP_EXISTING = OFF,
      
    IGNORE_DUP_KEY = OFFONLINE = OFF ON [PRIMARY] 
    GO

    SET QUOTED_IDENTIFIER ON
    SET 
    ANSI_NULLS ON
    GO

    -- Cleanup future orphans as appropriate
    CREATE TRIGGER dbo.TR_AD_tblParent1 ON dbo.tblParent1
      AFTER 
    DELETE
    AS
      DELETE  
    o
      
    FROM    dbo.tblOrphans o 
         
    INNER JOIN SELECT  d.FieldX
                          
    FROM    deleted d LEFT JOIN dbo.tblParent2 t 
                          
    ON t.FieldX d.FieldX
                          
    WHERE   t.FieldX IS NULL
                      ) 

      
    ON l.FieldX o.FieldX
    GO

    -- Cleanup future orphans as appropriate
    CREATE TRIGGER dbo.TR_AD_tblParent2 ON dbo.tblParent2
      AFTER 
    DELETE
    AS
      DELETE  
    o
      
    FROM    dbo.tblOrphans o 
         
    INNER JOIN SELECT  d.FieldX
                         
    FROM    deleted d 
                         
    LEFT JOIN dbo.tblParent1 t 
                         
    ON t.FieldX d.FieldX
                         
    WHERE   t.FieldX IS NULL
                     ) 

          
    ON l.FieldX o.FieldX
    GO

    -- Replace dropped index now that space is not an issue
    CREATE UNIQUE NONCLUSTERED INDEX idx1 
      
    ON dbo.tblOrphans FieldX ASCFieldY ASC )
      
    WITH PAD_INDEX = OFFSORT_IN_TEMPDB = OFFDROP_EXISTING = OFF,
      
    IGNORE_DUP_KEY = OFFONLINE = OFF ON [PRIMARY]
    GO
    -- Replace dropped index now that space is not an issue
    CREATE UNIQUE NONCLUSTERED INDEX idx2 
      
    ON dbo.tblOrphans FieldY ASCFieldX ASC )
      
    WITH PAD_INDEX = OFFSORT_IN_TEMPDB = OFFDROP_EXISTING = OFF,
      
    IGNORE_DUP_KEY = OFFONLINE = OFF ON [PRIMARY]
    GO
  •  08-17-2007, 1:32 PM Post number 34913 in reply to post number 71373

    My solution --- take 2

    It butchered my code formatting. I hope you can follow it.

    (Ed: I liked the code so much I went in and reformatted it
    to make it easier to follow. I hope you don't mind!)

  •  08-19-2007, 2:49 PM Post number 35015 in reply to post number 71373

    • LukCAD is not online. Last active: 11-10-2010, 1:42 PM LukCAD
    • Not Ranked
    • Joined on 11-19-2006
    • Belarus
    • Level 1: Deep thought

    good example

    Thank you for your description of problem. Now i can image what will happend with programs where must be purge process executed periodically. It is like half-automatization or better to say manual mode by operator. But on my mind indx2 must be cutted off at all. I think it eats your server's space and executing time. Better to do materialized view (increase space of course of your database) but it will works fast and no needing the purge process many years.
  •  08-19-2007, 5:45 PM Post number 35024 in reply to post number 71373

    • FrankAu is not online. Last active: 30/10/2007, 7:44 AM FrankAu
    • Not Ranked
    • Joined on 08-19-2007
    • Sydney
    • Level 1: Deep thought

    This seems likely to moi ...

    I actually prepared this on ASE 12.5.3 hence the use of ROWCOUNT ... basically the number of rows smacked each iteration controls the impact on your log. I would look at using something like 10,000 although you would have to do some arithmetic to make sure the log demands can be met.

    DECLARE @rc INT
    SET ROWCOUNT 1
    SELECT @rc = 1
    WHILE @rc > 0
    BEGIN
    DELETE #tblOrphan
    FROM (#tblOrphan O
    LEFT OUTER JOIN #tblParent1 P1
    ON O.FieldX = P1.FieldX
    )
    LEFT OUTER JOIN #tblParent2 P2
    ON O.FieldX = P2.FieldX
    WHERE COALESCE(P1.FieldY, P2.FieldY) IS NULL
    SELECT @rc = @@rowcount
    END
    SET ROWCOUNT 0
    GO

    This should be OK to run while the system remains online although if transactions were being processed I would prolly be quite conservative on my ROWCOUNT setting to minimise lock contentions etc!

    Cheers Frank.
  •  08-22-2007, 1:24 PM Post number 35194 in reply to post number 71373

    Quite a few solutions.

    Read through all the solutions and many are forgetting that there is limited space and there is no mention if a new server is available. So based on all the solutions dleteing in small chunks should work but will require quite some effort and time. The other would be BCP out the data as suggested. Either way its a big problem as you have space crunch. Would like to see a solution that is way different and efficient then the regular DBA minds would thing of and do post the solution you will use to accomplish this issue. Good Luck!!!
  •  08-22-2007, 2:45 PM Post number 35199 in reply to post number 71373

    Space considerations

    Figuring on an average data size, the remaining 14 million rows would consume about 3.5GB. Dropping the indexes from the tables and adding them back later should free up plenty of space like David recommended. Moving truncating and replacing would be my best course recommendation and could be done in short order without the need to shuffle data around to different disks.
  •  08-22-2007, 3:01 PM Post number 35200 in reply to post number 71373

    My solution

    1) Backup database holding tblOrphan to C: drive.
    2) Create temporary database on C: drive recovery model simple.
    3) Script out tblOrphan table.
    4) Create tblOrphan table in new database on C: drive.
    5) Insert data to be saved into new tblOrphan table (anticipate 14mil records) at the rate of 100,000 per insert (or more if tempdb can handle it).
    6) Truncate tblOrphan table.
    7) Backup log file (for database with tblOrphan) to C: drive.
    8) Shrink the database (with tblOrphan table) to remove all empty space (minus amount of space needed to put 14mil records back in) + 10% (or whatever your growth rate is).
    9) Import tblOrphan data from tblOrphan table on C: drive (anticipate 14mil rows) 100,000 rows at a time (more if possible).
    10) Delete temporary database on C: drive.
    11) Create nightly job to run tblOrphan delete statement.
  •  08-22-2007, 6:25 PM Post number 35206 in reply to post number 71373

    My solution

    It looks like you need to get rid of at least 8 million records a day to permit 2 additional GB of data.

    I'm concerned about you using an INT for your primary key and using millions of records. I would consider changing it to a big int or even a unique identifier.

    For the future, I would recommend looking into creating a partitioned view and then redefine it when it's time to delete a table. Just add a column to partion on that is filled with a unique number each day (number of days since jan 1, 1970) so that after 30 days, you can drop them.

    To let the database operate on cleaning itself up on its own, I reccomend creating a job to remove data in chunks to prevent minimal performance cost to the end-users.

    Turn on auto-shrink

    Do a Full Backup.

    Create a job to run the following at least every 10 minutes:

    SET ROWCOUNT 60000
    Delete from tblOrphans where FieldX in(
    (
    SELECT TOP 60000 T1.FieldX
    FROM tblOrphans T1 WITH(NOLOCK)
    left outer join tblParent1 T2 WITH(NOLOCK) on
    T1.FieldX = T2.FieldX
    left outer join tblParent2 T3 WITH(NOLOCK) on
    T2.FieldX is null
    and T1.FieldX = T3.FieldX

    where isnull(T2.FieldX, T3.FieldX) is null
    ))

    add an additional step to the job to backup the transaction log for both the current and tempdb databases.

    You can also add the following to help get your schema situated and orphaned data removed in the future:

    ALTER TABLE dbo.tblOrphans ADD CONSTRAINT
    PK_tblOrphans PRIMARY KEY CLUSTERED
    (
    FieldX
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    go
    ALTER TABLE dbo.tblParent1 ADD CONSTRAINT
    PK_tblParent1 PRIMARY KEY CLUSTERED
    (
    FieldX
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    ALTER TABLE dbo.tblParent2 ADD CONSTRAINT
    PK_tblParent2 PRIMARY KEY CLUSTERED
    (
    FieldX
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO
    ALTER TABLE dbo.tblParent1 WITH NOCHECK ADD CONSTRAINT
    FK_tblParent1_tblOrphans FOREIGN KEY
    (
    FieldX
    ) REFERENCES dbo.tblOrphans
    (
    FieldX
    ) ON UPDATE NO ACTION
    ON DELETE NO ACTION
    GO
    ALTER TABLE dbo.tblParent2 WITH NOCHECK ADD CONSTRAINT
    FK_tblParent2_tblOrphans FOREIGN KEY
    (
    FieldX
    ) REFERENCES dbo.tblOrphans
    (
    FieldX
    ) ON UPDATE NO ACTION
    ON DELETE NO ACTION
    GO
    CREATE TRIGGER dbo.tblParent1Delete
    ON dbo.tblParent1
    AFTER DELETE
    AS
    BEGIN
    SET NOCOUNT ON;
    delete tblOrphans where
    exists (select top 1 0 from deleted with(nolock) where tblOrphans.FieldX = deleted.FieldX)
    and not exists(select top 1 0 from tblParent2 WITH(NOLOCK) where tblOrphans.FieldX = tblParent2.FieldX)


    END
    GO
    CREATE TRIGGER dbo.tblParent2Delete
    ON dbo.tblParent2
    AFTER DELETE
    AS
    BEGIN
    SET NOCOUNT ON;
    delete tblOrphans where
    exists (select top 1 0 from deleted with(nolock) where tblOrphans.FieldX = deleted.FieldX)
    and not exists(select top 1 0 from tblParent1 WITH(NOLOCK) where tblOrphans.FieldX = tblParent1.FieldX)


    END
    go


Page 1 of 3 (37 items)   1 2 3 Next >
View as RSS news feed in XML