Tips

03月 12日, 2026

24 分程度

Skipped & Deferred Searchを理解する(パート2)

今井浩平 (Kohei Imai)

パート1では、予定されていたサーチが実行されない場合におけるスキップと延期の違いや原因について解説しました。今回は、主な原因やベストプラクティス、トラブルシューティングを解説します。

なぜ検索がスキップ／延期されるのか

代表的な理由と説明

SS = Scheduled Searches（定期検索のみ対象）

S/A = Scheduled & Ad-hoc（定期検索とアドホック検索の両方対象）

理由1
”The maximum number of concurrent historical scheduled searches on this cluster has been reached.”
「このクラスタでの同時実行履歴定期検索（historical scheduled searches）の最大数に達しました。」
- クラスタ全体のスケジュール検索同時実行制限に到達したことを意味します。 (SS)
理由2
"The maximum number of concurrent running jobs for this historical scheduled search on this cluster has been reached."
「このクラスタで、この履歴定期検索の同時実行ジョブ最大数に達しました。」
- 通常、同じ検索は同時に1つしか実行されません。
- もしt0の検索がまだ実行中に t1 のスケジュール時刻が来た場合、t1 はスキップされます。これは以下の原因で発生します：
  - 検索の実行時間がスケジュール間隔より長い（例：毎分実行設定だが90秒かかる）
  - インデクサーのパフォーマンス低下（CPU使用率高、ディスクI/O遅い）
  - 実行開始の遅延（レイテンシ）があり次の予定時刻に重なる(SS)

上記の問題について調査したい場合、以下のサーチをMonitoring Consoleから実行することによって情報を得ることができます。

```note: the below search excludes datamodel accelerations
| rest /servicesNS/-/-/saved/searches splunk_server=<your search head captain>
| where is_scheduled=1
| search disabled=0|
stats count by title next_scheduled_time cron_schedule schedule_priority schedule_window dispatch.earliest_time dispatch.latest_time
| fields title next_scheduled_time cron_schedule schedule_priority schedule_window dispatch.earliest_time dispatch.latest_time
| rename title as savedsearch_name
| join savedsearch_name[ search index=_internal source=*scheduler.log* status=skipped OR status=deferred savedsearch_name=*
| fillnull value=NA reason| stats count by savedsearch_name status reason]
| join savedsearch_name[ search index=_internal search_group=dmc_group_search_head search_group="dmc_searchheadclustergroup_" sourcetype=scheduler status=* dispatch_time=*
| eval runtime_delay= (dispatch_time-scheduled_time)| stats avg(runtime_delay) as avg_runtime_delay avg(run_time) as avg_runtime max(run_time) as max_runtime by savedsearch_name
| eval avg_runtime_delay=round(avg_runtime_delay,0)
| eval avg_runtime=round(avg_runtime,0)
| eval max_runtime=round(max_runtime,0)
| rename avg_runtime_delay as "avg_runtime_delay(sec)" avg_runtime as avg_runtime(sec) max_runtime as max_runtime(sec)
| sort - "max_runtime(sec)"]
| eval scheduler_action = case(status == "skipped", "scheduler will not attempt to re-run these searches for the scheduled_time",status == "delayed", "scheduler will retry to run the search up until the next scheduled_time",status == "deferred", "scheduler will attempt to re-run these searches for their scheduled_time at a future point in time",status == "delegated_remote_error", "scheduler will retry once to delgate, if still fails no further attempt")
| table savedsearch_name status reason count scheduler_action avg_runtime(sec) max_runtime(sec) "avg_runtime_delay(sec)" cron_schedule dispatch.earliest_time dispatch.latest_time
| sort -max_runtime(sec)

理由3
”Search not executed: The maximum disk usage quota for this user has been reached. Use the Job Manager to delete some of your saved search results.”
「検索未実行：このユーザのディスク使用量制限（disk quota）に達しました, Job Manager で保存済み検索結果を削除して容量を確保してください。」(S/A)
理由4
”Search not executed: The maximum number of concurrent historical searches for this user based on their role quota has been reached.”
「検索未実行：このユーザの同時実行履歴検索の最大数に達しました、ロールに設定されている srchJobsQuota 制限に到達しています。」 (S/A)

上記のエラーが出た場合は、サーチヘッドで設定されているユーザやロールの制限関連設定（authorize.conf）を確認します。

srchDiskQuota
- この役割に属するユーザーの検索ジョブが使用できるディスク容量（MB）の最大値。
- サーチヘッドクラスタ環境では、この設定は各メンバー単位で有効。クラスタ全体での一括管理は行わない。
- ディスパッチマネージャーは、サーチのディスパッチ時にクォータをチェックし、さらにサーチプロセスは検索がアクティブな間、limits.confの‘disk_usage_update_period’設定で定義された間隔でクォータを確認。
- クォータは常時チェックされているわけではないため、一時的にクォータを超過する場合がある。
- このクォータを超過すると、返却されていない結果があってもサーチが即座に自動終了する。
- デフォルト値は‘100’（100MB）。
srchJobsQuota
- ロールに属するユーザが同時に実行できる履歴検索の最大数。
- デフォルト値は3。
- リアルタイム検索は対象外（rtSrchJobsQuota で別設定）。

ベストプラクティス（Best Practices）

※以下の検索クエリ例はすべてMonitoring Console（MC）から実行し、SHCインスタンスがMCで監視対象になっていることを前提とします。

1. 孤立したナレッジオブジェクトの解消

孤立 (orphaned)状態のスケジュール検索やナレッジオブジェクトを削除・修正して、無駄な検索実行を防ぎます。

参考：Resolve orphaned searches - Splunk Docs

2. 全期間（All-time）検索の見直し

全期間を対象にする検索 (dispatch.earliest_time="" または 0)は、実行時間が長くなりconcurrencyを圧迫します。

時間範囲をスケジュール周期に合わせて見直す

検索例：

| rest /servicesNS/-/-/saved/searches splunk_server=<captain>
| where is_scheduled=1
| search disabled=0
| search title!="Bucket Copy Trigger" dispatch.earliest_time="" OR dispatch.earliest_time=0
| rename eai:acl.* as *
| table title cron_schedule dispatch.earliest_time dispatch.latest_time app owner search
| search search!=*earliest*
| search search=*index*

3. リアルタイム検索（rt search）の削減

リアルタイム検索はSH・インデクサーのCPUコアを専有し続け、パフォーマンス低下やスキップ増の原因になります。

不要なリアルタイム検索は削除
必要な場合は「Indexed Realtime」に変更（limits.conf設定）

リアルタイム検索の確認

サーチヘッドクラスター側

index=_introspection search_group=dmc_group_search_head search_group=dmc_searchheadclustergroup_ sourcetype=splunk_resource_usage (component=PerProcess data.search_props.sid::*) "data.search_props.mode"=RT*
| dedup "data.search_props.label"
| table data.search_props.label data.search_props.app data.search_props.user data.search_props.type data.search_props.mode data.search_props.sid data.elapsed
| rename data.search_props.label as "savedsearch name" data.search_props.app as app data.search_props.user as user data.search_props.type as type data.search_props.mode as mode data.search_props.sid as sid data.elapsed as "has been running for (seconds)"
index=_introspection search_group=dmc_group_search_head search_group=dmc_searchheadclustergroup_ source=*resource_usage.log* component=PerProcess "data.search_props.sid"=* "data.search_props.mode"=RT*
| timechart span=30s max(data.elapsed) by data.search_props.sid useother=f limit=0

インデクサー側

index=_introspection search_group="dmc_group_indexer" host= source=*resource_usage.log* component=PerProcess "data.search_props.sid"=* "data.search_props.mode"=RT*
| stats max(data.elapsed) as "has been running for (Seconds)" by host data.search_props.search_head data.search_props.app data.search_props.mode data.search_props.type data.search_props.user data.search_props.sid | dedup data.search_props.sid
| rename data.search_props.label as "savedsearch name" data.search_props.app as app data.search_props.mode as mode data.search_props.sid as sid data.elapsed as "has been running for (seconds)"
| sort -"has been running for (seconds)"index=_introspection search_group="dmc_group_indexer" host= source=*resource_usage.log* component=PerProcess "data.search_props.sid"=* "data.search_props.mode"=RT*
| bin _time span=10s
| stats dc(data.search_props.sid) AS distinct_search_count by _time
| timechart bins=200 max(distinct_search_count) AS "max rt search concurrency"

対応1. rt search capability をパワーユーザ、管理者ユーザ以外からは除外することを検討してください。

リアルタイム検索をロールで禁止する設定例(authorize.conf)：

[role_<rolename>]
rtSrchJobsQuota = 0
cumulativeRTSrchJobsQuota = 0
schedule_rtsearch = disabled

対応2. リアルタイム検索を行う場合は indexed realtime を規定で使うように設定してください。

limits.conf 以下の設定はappもしくはユーザーの配下で設定してください。

[realtime]
= true

4. 検索のearliest timeを短縮

周期5分の検索なのに過去1時間を検索している、などの過剰な時間範囲は短縮します。

イベントの遅延を考慮する場合、et/ltに少しバッファを持たせる設定（例:-6m@m -1m@m）が有効です。

| rest /servicesNS/-/-/saved/searches splunk_server=<captain>
| search disabled=0 is_scheduled=1
| table title cron_schedule dispatch.earliest_time dispatch.latest_time eai:acl.app eai:acl.owner
index=_audit sourcetype=audittrail source=audittrail savedsearch_name!="" TERM(action=search) ( TERM(info=completed) OR ( TERM(info=granted) search_et=* "search='search")) NOT "search_id='rsa_*"
| eval timesearched = round((search_lt-search_et),0)
| fields savedsearch_name, timesearched, user
| join savedsearch_name     [| rest splunk_server= "/servicesNS/-/-/saved/searches/" search="is_scheduled=1" search="disabled=0"     | fields title, cron_schedule, eai:acl.app     | rename title as savedsearch_name     | eval pieces=split(cron_schedule, " ")     | eval c_min=mvindex(pieces, 0), c_h=mvindex(pieces, 1), c_d=mvindex(pieces, 2), c_mday=mvindex(pieces, 3), c_wday=mvindex(pieces, 4)     | eval c_min_div=if(match(c_min, "/"), replace(c_min, "^.*/(\d+)$", "\1"), null())     | eval c_mins=if(match(c_min, ","), split(c_min, ","), null())     | eval c_min_div=if(isnotnull(c_mins), abs(tonumber(mvindex(c_mins, 1)) - tonumber(mvindex(c_mins, 0))), c_min_div)     | eval c_hs=if(match(c_h, ","), split(c_h, ","), null())     | eval c_h_div=case(match(c_h, "/"), replace(c_h, "^.*/(\d+)$", "\1"), isnotnull(c_hs), abs(tonumber(mvindex(c_hs, 1)) - tonumber(mvindex(c_hs, 0))), 1=1, null())     | eval c_wdays=if(match(c_wday, ","), split(c_wday, ","), null())     | eval c_wday_div=case(match(c_wday, "/"), replace(c_wday, "^.*/(\d+)$", "\1"), isnotnull(c_wdays), abs(tonumber(mvindex(c_wdays, 1)) - tonumber(mvindex(c_wdays, 0))), 1=1, null())     | eval i_m=case(c_d < 29, 86400 * 28, c_d = 31, 86400 * 31, 1=1, null())     | eval i_h=case(isnotnull(c_h_div), c_h_div * 3600, c_h = "*", null(), match(c_h, "^\d+$"), 86400)     | eval i_min=case(isnotnull(c_min_div), c_min_div * 60, c_min = "*", 60, match(c_min, "^\d+$"), 3600)     | eval i_wk=case(isnotnull(c_wday_div), c_wday_div * 86400, c_wday = "*", null(), match(c_wday, "^\d+$"), 604800)     | eval cron_minimum_freq=case(isnotnull(i_m), i_m, isnotnull(i_wk) AND isnotnull(c_min_div), i_min, isnotnull(i_wk) AND isnull(c_min_div), i_wk, isnotnull(i_h), i_h, 1=1, min(i_min))     | fields - c_d c_h c_hs c_h_div c_mday c_min c_min_div c_mins c_wday c_wdays c_wday_div pieces i_m i_min i_h i_wk     | fields savedsearch_name cron_minimum_freq cron_schedule eai:acl.app] | eval magic=cron_minimum_freq*20 | where timesearched>magic | eval ratio=round(timesearched/cron_minimum_freq,0) . ":" . 1, timesearched=round(timesearched/60,0), cron_minimum_freq=cron_minimum_freq/60 | dedup savedsearch_name | table savedsearch_name, eai:acl.app, user, timesearched, cron_minimum_freq, cron_schedule, ratio | rename savedsearch_name AS "Saved Search Name", eai:acl.app AS "App", user AS "User", timesearched AS "Time Searched (Minutes)", cron_minimum_freq as "Minimum Frequency (Minutes)", cron_schedule AS "Cron Schedule", ratio as Ratio

5. ホットタイム（混雑時間帯）の平準化

毎時0分、30分など検索が集中する時間帯を避けるため、cronスケジュールを分散させます。

全スケジュール検索の分布確認例

| rest /servicesNS/-/-/saved/searches splunk_server=    search="is_scheduled=1" search="disabled=0" earliest_time=-1h@m latest_time=now | table title cron_schedule scheduled_times | mvexpand scheduled_times | rename scheduled_times as _time | timechart span=1m count as "Searches Scheduled" | join splunk_server     [| rest splunk_server= "/services/server/status/limits/search-concurrency?cluster_wide_quota=1"     | stats max(max_hist_scheduled_searches) as max_clusterwide_scheduler_concurrency ]

過去1時間以内にスケジュールされた検索を、アプリ／所有者およびスケジュール時刻ごとに特定します。たとえば、上記の検索で特定の分(minute x)にスケジュールされた検索が多すぎることが判明した場合は、以下の検索を使用して、その分(minute x)にスケジュールされた検索を特定します。これらは、cronスケジュールをその分(minute x)以外にずらす候補となります。

| rest /servicesNS/-/-/saved/searches splunk_server= search="is_scheduled=1" search="disabled=0" earliest_time=-1h@m latest_time=now | mvexpand scheduled_times | table title scheduled_times eai:acl.app eai:acl.owner | rename title as savedsearch_name eai:acl.app as app eai:acl.owner as owner | convert ctime(scheduled_times) as scheduled_times | stats values(savedsearch_name) as savedsearch_name by scheduled_times app owner

6. フォワーダーのイベント分散

フォワーダーから全てのインデクサーへイベントが均等に分散されていることを確認してください。イベントの分散が不均等になると、一部のインデクサーに他のインデクサーよりも多くのイベントが保持されることになり、検索の速度が低下します。これは、より多くのイベントを保持しているインデクサーの処理が、少ないインデクサーよりも遅くなるためです。

イベントの分散調査用の検索

index=_internal search_group="dmc_group_indexer" earliest=-24h host=* source=*metrics.log* group=tcpin_connections | eval sourceHost=if(isnull(hostname),sourceHost,hostname)| stats sum(kb) AS KB dc(sourceHost) AS "forwarder count" min(_time) AS mintime max(_time) AS maxtime by host| convert ctime(mintime)| convert ctime(maxtime)| eval GB = round(KB / 1024 / 1024 ,2)| table host "forwarder count" mintime maxtime KB GB | sort -GB

特定のインデクサーが他のインデクサーよりもはるかに多くのデータをインデックスしている場合、どのホストやソースタイプがそのインデクサーに大量のデータを送信しているのか確認してください。フォワーダーが高ボリュームのデータソースを持っていたり、インデクサーへのロードバランシングが期待通りに機能していない可能性があります。

index=_internal host= source="*metrics.log" sourcetype=splunkd group=per_Sourcetype_thruput| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none")| search ingest_pipe=*| `dmc_timechart_for_metrics_log` per_second(kb) by series useother=false limit=15

ホスト別イベントの取り込み(indexing)レート調査用検索

index=_internal host= source="*metrics.log" sourcetype=splunkd group=per_Host_thruput
| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none")
| search ingest_pipe=*
| `dmc_timechart_for_metrics_log` per_second(kb) by series useother=false limit=15

高ボリュームのホストを特定した場合、そのホストのロードバランシングのメトリクスを確認し、1つのインデクサーに通常より長く(デフォルトでは30秒)留まっていないかどうかを調べてください。

index=_internal host= source=*metrics.log* 
group=tcpout_connections 
| timechart span=30s max(tcp_KBps) by destIp limit=0

もしデータが1つのインデクサー(destIp)に長時間滞留している場合は、その大量データがどのソースタイプから来ているのか特定し、適切なEVENT_BREAKERの適用を検討してください。

デフォルトでは、ユニバーサルフォワーダーはデータストリームを安全に切り替えられる場合にのみ、インデクサー間でロードバランシングを行うことができます。つまり、1つのイベントの一部があるサーバーに、もう一部が別のサーバーに送信されるのを防ぐために、切り替えのタイミングを安全な場所まで待つ必要があります。

EVENT_BREAKERを使うことで、イベントの境目を定義し、より頻繁に送信先インデクサーの切り替えが行われるよう調整する事ができます。

# Use the following settings to handle better load balancing from UF.
# NOTE: The EVENT_BREAKER properties are applicable for Splunk Universal
# Forwarder instances only.

EVENT_BREAKER_ENABLE = <boolean>
* Whether or not a universal forwarder (UF) uses the 'ChunkedLBProcessor'
  data processor to improve distribution of events to receiving
  indexers for a given source type.
* When set to true, a UF splits incoming data with a
  light-weight chunked line breaking processor ('ChunkedLBProcessor')
  so that data is distributed fairly evenly amongst multiple indexers.
* When set to false, a UF uses standard load-balancing methods to
  send events to indexers.
* Use this setting on a UF to indicate that data
  should be split on event boundaries across indexers, especially
  for large files.
* This setting is only valid on universal forwarder instances.
* Default: false

# Use the following to define event boundaries for multi-line events
# For single-line events, the default settings should suffice 

EVENT_BREAKER = <regular expression>

* A regular expression that specifies the event boundary for a
  universal forwarder to use to determine when it can send events
  to an indexer.
* The regular expression must contain a capturing group
  (a pair of parentheses that defines an identified sub-component
  of the match.)
* When the UF finds a match, it considers the first capturing group
  to be the end of the previous event, and the end of the capturing group
  to be the beginning of the next event.
* At this point, the forwarder can then change the receiving indexer
  based on these event boundaries.
* This setting is only active if you set 'EVENT_BREAKER_ENABLE' to
  "true", only works on universal forwarders, and
  works best with multiline events.* Default: "([\r\n]+)

設定例

イベント例：

2019-07-23 12:12:32 some event string here
This data is mutiline event and continues across hundreds of lines.
Hogehoge
~
fugafuga
end of event

以下の正規表現を含むprops.confをユニバーサルフォワーダーに設定する事によって上記のイベントの境界を定義しイベントが分割される事なく、インデクサー間に分散されます。

[mysourcetype]
EVENT_BREAKER_ENABLE=true
EVENT_BREAKER=([\r\n]+)(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d)

7. データ量の増加確認

直近1日、1週間、1ヶ月でインデックスごとのイベント量の増減を確認し、増加傾向を把握。

| tstats count WHERE index=* OR index=_* BY index _time span=1h prestats=t
| timechart span=1h count by index

8. 関連概念の理解

以下の概念を理解すると調整がスムーズになります。

トラブルシューティング(Troubleshooting)

シナリオ1：データモデル加速(DMA)検索の追跡

目的：特定のscheduled_timeにおけるDMA検索について以下のとおり確認します。

時刻通りに実行されたか
遅延（latency）して実行されたか
実行されずスキップされたか
DMA検索の特徴
- スケジューリングモード：リアルタイム(real-time scheduling mode)
- タイプ：自動集計(auto-summarization)
- 優先度が最も低いため、スキップされやすい
- スキップされても、加速時に過去の時間範囲をカバーするため、結果への影響は少ない

調査用サーチ例

scheduler.logの例(リアルタイムモード、DMA検索、遅延あり)

index=_internal host=<search-head-cluster-members> source=*scheduler.log* scheduled_time=<unix time for scheduled search> <datamodel acceleration search name>

ログにはscheduled_timeと実際のdispatch_timeの差(latency)が確認できる
concurrencyリミット到達時、次の時刻まで毎秒再試行メッセージが出続けることがある

audit.logの例(リアルタイムモード、DMA検索、遅延あり)

index=_audit host=<search-head-cluster-members> source-audittrail info=completed <sidfrom scheduler.log>

audit.logには検索の完了や権限許可（granted）、開始・終了時刻が記録される

シナリオ2：サマリーインデックス(SI)検索の追跡

目的：特定のscheduled_timeにおけるSI検索について以下のとおり確認します。

時刻通りに実行されたか
遅延して実行されたか
延期(deferred)されたか
SI検索の特徴
- スケジューリングモード：連続(continuous scheduling mode)
- タイプ：サマリーインデックス
- 優先度は中間(リアルタイムより低い)

調査用サーチ例

scheduler.logの例(連続モード、SI検索、遅延あり)

index=_internal host=<search head member> source*scheduler.log* savedsearch_name=<saved search name> scheduled_time=<unixtime of scheduled search>

concurrencyリミット到達時には「deferred」ステータスが付与される
遅延中もスケジューラは scheduled_time を保持して後で実行

追加リソース

今井浩平 (Kohei Imai)

2023年Splunk Service Japan合同会社入社。テクニカルコンサルティングエンジニアとして、Splunk Enterprise製品を中心に、お客様の課題解決をサポートしている。

Tips 5 分程度

ホストからSplunkにログが送信されていないことをすばやく検出する方法

tstatsを使ってホストを監視し、Splunkにログが送信されていないことを検出する方法について説明します。

Tips 5 分程度

AWS CloudWatch LogsからFirehose経由でログを収集

CloudWatch Logsを使用すれば、AWS環境からすべてのログを1カ所に収集可能。Lambda関数(ラムダ式)などを介して、ログソースをソースとログ形式ごとに複数のロググループにまとめられます。この記事では、ログファイルをCloudWatchからSplunkに取り込む方法についてご紹介します。

Tips 4 分程度

Splunkテクニカルサポート：問題解決のためのチェックリストとヒント

Splunkテクニカルサポートへお問い合わせの際、できるだけ早くお客様とスムーズな問題解決をご提供するために、弊社のサポートエンジニア担当者目線で、お問い合わせの際のチェックリストとヒントを紹介させていただきます。是非、ご一読ください。

Splunkについて

世界をリードする多くの企業が、デジタルレジリエンスを継続的に強化するために、シスコの一員であるSplunkを利用しています。業界屈指のAIを活用したセキュリティとオブザーバビリティの統合プラットフォームです。

Splunkの実績あるセキュリティ製品とオブザーバビリティ製品は、あらゆる規模の複雑なデジタル環境のセキュリティと信頼性を向上させるソリューションとして、お客様から高い評価を得ています。

Splunkの詳細はこちら