LinkChecker

LinkChecker is a tool that checks a site for invalid (broken) links.

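  • It sends requests in a fairly brute-force way, so traffic to the target site is concentrated while a check is running.
  • URLs outside the range of the specified URL are only checked for syntax and no actual request is sent, so running it against your own domain is unlikely to bother anyone else.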
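
If the concentrated traffic is a concern, one option is to lower the number of worker threads with LinkChecker's `--threads` option. The following is a minimal sketch, assuming `linkchecker` is already installed and on the PATH; `build_command` and `run_check` are hypothetical helper names, and the default of 2 threads is an arbitrary choice for illustration.

```python
import subprocess


def build_command(url: str, threads: int = 2) -> list[str]:
    """Build a linkchecker command line with a reduced thread count."""
    # --threads caps the number of worker threads LinkChecker starts.
    # The value 2 is only an illustrative default (assumption).
    return ["linkchecker", "--threads", str(threads), url]


def run_check(url: str, threads: int = 2) -> int:
    """Run the check and return the process exit code."""
    # check=False: a non-zero exit code is returned instead of raised.
    return subprocess.run(build_command(url, threads), check=False).returncode


if __name__ == "__main__":
    print(build_command("https://www.ainoniwa.net/pelican/"))
```

Fewer threads should spread the same requests over a longer run, so the total check takes longer in exchange for a gentler load.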
$ pip install LinkChecker
$ linkchecker https://www.ainoniwa.net/pelican/
INFO linkcheck.cmdline 2021-05-22 15:47:26,940 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 10.0.1
Copyright (C) 2000-2016 Bastian Kleineidam, 2010-2021 LinkChecker Authors
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it under
certain conditions. Look at the file `LICENSE' within this distribution.
Get the newest version at https://linkchecker.github.io/linkchecker/
Write comments and bugs to https://github.com/linkchecker/linkchecker/issues

Start checking at 2021-05-22 15:47:26+009
10 threads active,    29 links queued,   17 links in  56 URLs checked, runtime 1 seconds
(...snip...)
 8 threads active,     0 links queued, 2068 links in 2075 URLs checked, runtime 7 minutes, 36 seconds
 2 threads active,     0 links queued, 2084 links in 2085 URLs checked, runtime 7 minutes, 41 seconds

Statistics:
Downloaded: 8.99MB.
Content types: 937 image, 381 text, 0 video, 0 audio, 14 application, 2 mail and 763 other.
URL lengths: min=8, max=723, avg=101.

That's it. 2097 links in 2096 URLs checked. 0 warnings found. 1 error found.
Stopped checking at 2021-05-22 15:55:12+009 (7 minutes, 45 seconds)
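
When the check runs unattended, it can be handy to pull the counts out of the final summary line. The following is a rough sketch, assuming the summary is worded exactly as in the run above; `parse_summary` is a hypothetical helper, and other LinkChecker versions or locales may phrase the line differently.

```python
import re

# Pattern modeled on the summary line of the run above (assumption:
# the wording is the same in the version you are running).
SUMMARY_RE = re.compile(
    r"(?P<links>\d+) links? in (?P<urls>\d+) URLs? checked\. "
    r"(?P<warnings>\d+) warnings? found\. "
    r"(?P<errors>\d+) errors? found\."
)


def parse_summary(output: str) -> dict[str, int]:
    """Return the counts from the final summary line of a text report."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no summary line found")
    return {name: int(value) for name, value in match.groupdict().items()}


line = "That's it. 2097 links in 2096 URLs checked. 0 warnings found. 1 error found."
counts = parse_summary(line)
print(counts["errors"])  # prints 1
```

The progress lines ("... URLs checked, runtime ...") end with a comma rather than a period after "checked", so the pattern skips them and only matches the summary.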

A malformed link is reported like this.

URL        `http:/'
Name       `http:/'
Parent URL https://www.ainoniwa.net/pelican/2014/0520a.html, line 208, col 28
Real URL   http:///
Info       The URL is outside of the domain filter, checked only
           syntax.
Result     Error: URL has empty hostname

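With --verbose, a log entry is printed for every URL that is checked.
This confirms that URLs within the specified range are actually accessed, while URLs outside it are only checked for syntax.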

URL        `python/fabric/'
Name       `Fabric'
Parent URL https://www.ainoniwa.net/pdoc/, line 1, col 19603
Real URL   https://www.ainoniwa.net/pdoc/python/fabric/
Check time 3.474 seconds
D/L time   0.001 seconds
Size       41.44KB
Result     Valid: 200 OK

URL        `https://pypi.org/project/Fabric3/'
Name       `fabric3'
Parent URL https://www.ainoniwa.net/pdoc/python/fabric/, line 54, col 293
Real URL   https://pypi.org/project/Fabric3/
Info       The URL is outside of the domain filter, checked only
           syntax.
Result     Valid: filtered
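
To separate the entries that were really fetched from the ones that only had their syntax checked, the entries can be parsed into dictionaries. This is a sketch under stated assumptions: the 11-character label column and the "outside of the domain filter" wording are taken from the sample output above, and `parse_entries` and `syntax_only` are hypothetical helpers, not part of LinkChecker.

```python
KEY_WIDTH = 11  # label column width observed in the sample output (assumption)

SAMPLE = """\
URL        `python/fabric/'
Name       `Fabric'
Parent URL https://www.ainoniwa.net/pdoc/, line 1, col 19603
Real URL   https://www.ainoniwa.net/pdoc/python/fabric/
Check time 3.474 seconds
D/L time   0.001 seconds
Size       41.44KB
Result     Valid: 200 OK

URL        `https://pypi.org/project/Fabric3/'
Name       `fabric3'
Parent URL https://www.ainoniwa.net/pdoc/python/fabric/, line 54, col 293
Real URL   https://pypi.org/project/Fabric3/
Info       The URL is outside of the domain filter, checked only
           syntax.
Result     Valid: filtered
"""


def parse_entries(report: str) -> list[dict[str, str]]:
    """Split blank-line-separated URL entries into one dict per entry."""
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    last_key = None
    for line in report.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
            current, last_key = {}, None
            continue
        key, value = line[:KEY_WIDTH].strip(), line[KEY_WIDTH:].strip()
        if key:
            current[key] = value
            last_key = key
        elif last_key is not None:
            # Indented continuation of a wrapped field such as Info.
            current[last_key] += " " + value
    if current:
        entries.append(current)
    return entries


def syntax_only(entry: dict[str, str]) -> bool:
    """True when the entry says the URL was outside the domain filter."""
    return "outside of the domain filter" in entry.get("Info", "")


entries = parse_entries(SAMPLE)
for entry in entries:
    kind = "syntax only" if syntax_only(entry) else "fetched"
    print(entry["Real URL"], kind, entry["Result"])
```

The sketch expects only the per-URL entries as input; banner, progress, and statistics lines from a full report would need to be stripped first.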

Last updated: 2021-05-22 07:13:01