发现还有300多GB,还以为数据库损坏了(新葡京32450网址:data corruption)

发布时间:2019-12-01  栏目:数据  评论:0 Comments

  
乍一看,还以为数据库损坏了(data corruption),但是在做完DBCC
CHECKDB后,发现其实数据库其实是完好无损的。那么肯定是跟Replication有关。但是在搜索了相关资料,仅仅在The
process could not execute
‘sp_repldone/sp_replcounters”
这篇博客中找到了类似错误的资料:

由于testdb是使用backup还原的测试数据库,没有在master中注册为Publisher
database,必须设置 database 为publish,表示 Database can be used for
other types of publications.

[7.0, 2000] Updated
1-6-2006

 

由于test db的还原模式是Simple,并且没有active
user,最大的可能性是db的Trasaction
log被标记为Replication,使用以下函数统计,发现有大量的log未被LogReader读取。

On the other hand, if most of the
activity in the Distribution database is caused by the Log Reader Agent
that is running often (or even continuously), and not other sources
(such as subscribers), then increasing the size of the CommitBatchSize
parameter makes more sense, as larger batch sizes mean that the Log
Reader Agent doesn’t have to work as often, which in turn reduces
overhead, boosting performance.

 

最后,使用DBCC ShrinkFile命令,Transaction Log File收缩完成。

In transactional replication, the
transaction log of the database involved in replication is both written
to and read from
. Without replication, a transaction log is almost
always written to, and rarely read from. Because a transaction log is
both written to and read from when using transactional replication, this
can cause I/O performance issues on databases that experience large
numbers of transactions.

 

 

While increasing the CommitBatchSize
sounds like a good idea, there is also a corresponding downside to
increasing it. The problem occurs because larger batch sizes mean that
the actual commit takes longer to occur, which in turn causes locks in
the distribution database to be held longer than if the commits were to
take less time. Locks, as you know, can reduce concurrency in a
database, reducing performance.

Common Causes

 

  • The last LSN in Transaction Log is less than what the LSN Log Reader
    is trying to find. An old backup may have been restored on top of
    Published Database. After the restore, the new Transaction Log
    doesn’t contain the data now distributor & subscriber(s) have.

  • Database corruption.

 

exec sys.sp_repltrans

Transactional replication
performance can sometimes be boosted by changing the PollingInterval
parameter of the Log Reader Agent
. By default, the Log Reader Agent
polls the published database’s transaction log every 10 seconds. If the
transaction log of the published database is very busy, then performance
may be boosted if the PollingInterval is increased.

昨天一台SQL
Server 2008R2的数据库在凌晨5点多抛出下面告警信息:

sp_repldone
(Transact-SQL).aspx)

To set this option, run this
command:

 The log scan
number (620023:3702:1) passed to log scan in database ‘xxxx’ is not
valid. This error may indicate data corruption or that the log file
(.ldf) does not match the data file (.mdf). If this error occurred
during replication, re-create the publication. Otherwise, restore from
backup if the problem results in a failure during startup.

sp_repltrans
(Transact-SQL).aspx)

*****

How to fix this

 

  • Ensure database consistency by running DBCC CHECKDB on the
    database. 

  • If an old backup was restored on top of published database then
    use sp_replrestart

  • If going back to the most recent transaction log backup is not an
    option then execute sp_replrestart  on
    publisher in published database. This stored procedure is used when
    the highest log sequence number (LSN) value at the Distributor does
    match the highest LSN value at the Publisher.

  • This stored procedure will insert compensating LSNs (No Operation)
    in the publisher database log file till one the compensating LSN
    becomes more than the highest distributed LSN in distribution
    database for this published database. After this it inserts this new
    high LSN in the msrepl_transactions table in the distribution
    database and executes sp_repldone on published database to update
    the internal structures to mark a new starting point for log reader
    agent.

  • Ensure that the log reader agent is stopped and there is no incoming
    transactions on the published database, when this SP is executed.

  • Since transactions may have been lost, we recommend to reinitialize
    the subscriber(s) and/or recreate publication/subscription(s).  For
    large databases consider using “Initialize from Backup” as discussed
    in SQL Book Online.

 

但是在这个案例当中,
数据库既没有损坏,也没有还原过。 只能是Replication出现了错误,但是在SQL
Server的Replication中又没有找到相关错误信息,本身这个是AWS的DMS自动生成的Replication,很多内部信息不太清楚(例如,是否出现异常),官方也没有找到很详细的介绍这个错误的相关资料。在此记录一下。

 

 

 

 

参考资料:

 

DBCC SQLPERF(LOGSPACE)

If you see a significant increase in
the latency for either of these processes, this should be a signal to
you to find out what new or different action has happened to cause the
increased latency. [6.5, 7.0, 2000, 2005] Updated 1-6-2006

 

注册成功之后,使用 sp_repldone,将所有的Transaction Log
标记为Distributed。

Transactional replication offers an
option called Immediate-Updating Subscribers. This feature provides
for transactional consistency for all of the various Subscribers of a
publisher. By making use of the Microsoft Distributed Transaction
Coordinator (MSDTC), Immediate-Updating Subscribers allows subscribers
to update a copy of the local data, and the Publishers data is also
updated, simultaneously. This change is then replicated from the
Publisher to the other Subscribers, so they all have consistent
data.

When xactid is NULL, xact_seqno is
NULL, and reset is 1, all replicated transactions in the log are
marked as distributed. This is useful when there are replicated
transactions in the transaction log that are no longer valid and you
want to truncate the log,

So this issue is, what is more
important, faster replication or less locks? The ideal size of the
CommitBatchSize parameter depends on your specific situation. For
example, if there is a large amount of activity in the distribution
database coming from both the publisher and many subscribers, then
reducing the batch size can be beneficial because of reduced locking
(improved concurrency) in the distribution database. By increasing
concurrency, transactions that have to wait on locks don’t have to wait
as long, and overall performance if boosted. For example, if there are
100 subscribers, each subscriber needs to be updated from the
Distribution database. This can create a lot of activity in the
distribution database, which be slowed down if there are many locks
occurring.

留下评论

网站地图xml地图