⚗️ trying fastcov

This commit is contained in:
Niels Lohmann 2019-03-30 09:12:32 +01:00
parent b21c04c938
commit b12287b362
No known key found for this signature in database
GPG key ID: 7F3CEA63AE251B69
6 changed files with 494 additions and 1 deletion

Makefile

@@ -76,12 +76,20 @@ check-fast:
 coverage:
 	rm -fr build_coverage
 	mkdir build_coverage
-	cd build_coverage ; CXX=$(COMPILER_DIR)/g++ cmake .. -GNinja -DJSON_Coverage=ON -DJSON_MultipleHeaders=ON
+	cd build_coverage ; CXX=g++-7 cmake .. -GNinja -DJSON_Coverage=ON -DJSON_MultipleHeaders=ON
 	cd build_coverage ; ninja
 	cd build_coverage ; ctest -E '.*_default' -j10
 	cd build_coverage ; ninja lcov_html
 	open build_coverage/test/html/index.html
 
+fast-cov:
+	rm -fr build_coverage
+	mkdir build_coverage
+	cd build_coverage ; CXX=$(COMPILER_DIR)/g++ cmake .. -GNinja -DJSON_Coverage=ON -DJSON_MultipleHeaders=ON
+	cd build_coverage ; ninja
+	cd build_coverage ; ctest -E '.*_default' -j10
+	cd build_coverage ; ninja lcov_html2
+	open build_coverage/test/html/index.html
 ##########################################################################
 # documentation tests

test/CMakeLists.txt

@@ -51,6 +51,17 @@ if(JSON_Coverage)
         COMMAND genhtml --title "JSON for Modern C++" --legend --demangle-cpp --output-directory html --show-details --branch-coverage json.info.filtered.noexcept
         COMMENT "Generating HTML report test/html/index.html"
     )
+
+    # add target to collect coverage information and generate HTML file
+    # (filter script from https://stackoverflow.com/a/43726240/266378)
+    add_custom_target(lcov_html2
+        COMMAND ${CMAKE_SOURCE_DIR}/test/thirdparty/fastcov/fastcov.py --lcov -o json.info --gcov ${GCOV_BIN}
+        COMMAND gsed -i 's%build_coverage/%%g' json.info
+        COMMAND lcov -e json.info ${SOURCE_FILES} --output-file json.info.filtered --rc lcov_branch_coverage=1
+        COMMAND ${CMAKE_SOURCE_DIR}/test/thirdparty/imapdl/filterbr.py json.info.filtered > json.info.filtered.noexcept
+        COMMAND genhtml --title "JSON for Modern C++" --legend --demangle-cpp --output-directory html --show-details --branch-coverage json.info.filtered.noexcept
+        COMMENT "Generating HTML report test/html/index.html"
+    )
 endif()
 
 #############################################################################

test/thirdparty/fastcov/LICENSE vendored Normal file (21 additions)

@@ -0,0 +1,21 @@
The MIT License

Copyright (c) 2018-2019 Bryan Gillespie

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

test/thirdparty/fastcov/README.md vendored Normal file (46 additions)

@@ -0,0 +1,46 @@
# fastcov

A massively parallel gcov wrapper for generating intermediate coverage formats *fast*

The goal of fastcov is to generate code coverage intermediate formats *as fast as possible* (ideally < 1 second), even for large projects with hundreds of gcda objects. The intermediate formats may then be consumed by a report generator such as lcov's genhtml, or a dedicated front end such as coveralls. fastcov was originally designed to be a drop-in replacement for lcov (application coverage only, not kernel coverage).

Currently the only supported intermediate formats are the gcov JSON format and the lcov info format. Adding support for other formats should require just a few lines of Python to transform the gcov JSON into the desired shape.
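For example, here is a minimal sketch of such a transformation (a hypothetical script, not part of fastcov; it assumes the JSON output is a list of per-file records with `file` and `lines` entries, as written by `dumpToGcovJson` in `fastcov.py` below):

```python
#!/usr/bin/env python3
# Hypothetical example: summarize fastcov's gcov JSON output as a per-file CSV.
import csv
import json
import sys

with open(sys.argv[1]) as f:  # e.g. coverage.json produced by fastcov.py
    records = json.load(f)

writer = csv.writer(sys.stdout)
writer.writerow(["file", "lines_total", "lines_hit"])
for record in records:
    hit = sum(1 for line in record["lines"] if int(line["count"]) != 0)
    writer.writerow([record["file"], len(record["lines"]), hit])
```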
In order to achieve the massive speed gains, a few constraints apply (a quick version check follows this list):

1. GCC version >= 9.0.0
   - These versions of gcov support the JSON intermediate format as well as streaming report data straight to stdout
2. Object files must either be built:
   - Using absolute paths for all `-I` flags passed to the compiler, or
   - Invoking the compiler from the same root directory

If you use CMake, you are almost certainly satisfying the second constraint (unless you care about `ExternalProject` coverage).
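A quick way to check the first constraint (the exact version string varies by platform and packaging):

```bash
$ gcov --version | head -1
gcov (GCC) 9.1.0
```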
## Sample Usage:
```bash
$ cd build_dir
$ fastcov.py --zerocounters
$ <run unit tests>
$ fastcov.py --exclude /usr/include --lcov -o report.info
$ genhtml -o code_coverage report.info
```
## Legacy fastcov

It is possible to reap most of the benefits of fastcov with GCC versions >= 7.1.0 and < 9.0.0. However, there is a *potential* loss of correctness for header file coverage.

`fastcov_legacy.py` supports GCC versions from 7.1.0 up to (but not including) 9.0.0, with a few penalties due to gcov limitations: running gcov in parallel generates .gcov header reports in parallel, and these overwrite each other. This isn't a problem unless your header files contain actual logic (i.e. a header-only library) whose coverage you want to measure. Use the `-F` flag to specify which gcda files should not be run in parallel, so that accurate header file data is captured just for those; see the example below. I don't plan on supporting `fastcov_legacy.py` aside from basic bug fixes.
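A hypothetical invocation (the `.gcda` path is only an illustration):

```bash
$ cd build_dir
$ ./fastcov_legacy.py --lcov -o report.info -F src/header_heavy_tests.gcda
$ genhtml -o code_coverage report.info
```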
## Benchmarks

Anecdotal testing on my own projects indicates that fastcov is over 100x faster than lcov and over 30x faster than gcovr:

- Project size: ~250 .gcda files, ~500 .gcov reports generated by gcov
- Time to process all gcda and parse all gcov:
  - fastcov: ~700ms
  - lcov: ~90s
  - gcovr: ~30s

test/thirdparty/fastcov/fastcov.py vendored Executable file (189 additions)

@@ -0,0 +1,189 @@
#!/usr/bin/env python3
"""
Author: Bryan Gillespie

A massively parallel gcov wrapper for generating intermediate coverage formats fast

The goal of fastcov is to generate code coverage intermediate formats as fast as possible
(ideally < 1 second), even for large projects with hundreds of gcda objects. The intermediate
formats may then be consumed by a report generator such as lcov's genhtml, or a dedicated front
end such as coveralls.

Sample Usage:
    $ cd build_dir
    $ ./fastcov.py --zerocounters
    $ <run unit tests>
    $ ./fastcov.py --exclude-gcov /usr/include --lcov -o report.info
    $ genhtml -o code_coverage report.info
"""

import re
import os
import sys
import glob
import json
import argparse
import threading
import subprocess
import multiprocessing

MINIMUM_GCOV = (9,0,0)
MINIMUM_CHUNK_SIZE = 10

# Interesting metrics
GCOVS_TOTAL = []
GCOVS_SKIPPED = []
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def getGcovVersion(gcov):
    p = subprocess.Popen([gcov, "-v"], stdout=subprocess.PIPE)
    output = p.communicate()[0].decode('UTF-8')
    p.wait()
    version_str = re.search(r'\s([\d.]+)\s', output.split("\n")[0]).group(1)
    version = tuple(map(int, version_str.split(".")))
    return version

def removeFiles(files):
    for file in files:
        os.remove(file)

def getFilteredGcdaFiles(gcda_files, exclude):
    def excludeGcda(gcda):
        for ex in exclude:
            if ex in gcda:
                return False
        return True
    return list(filter(excludeGcda, gcda_files))

def getGcdaFiles(cwd, gcda_files):
    if not gcda_files:
        gcda_files = glob.glob(os.path.join(cwd, "**/*.gcda"), recursive=True)
    return gcda_files
def gcovWorker(cwd, gcov, files, chunk, exclude):
    p = subprocess.Popen([gcov, "-it"] + chunk, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    for line in iter(p.stdout.readline, b''):
        intermediate_json = json.loads(line.decode(sys.stdout.encoding))
        intermediate_json_files = processGcovs(intermediate_json["files"], exclude)
        for f in intermediate_json_files:
            files.append(f) # thread safe, there might be a better way to do this though
        GCOVS_TOTAL.append(len(intermediate_json["files"]))
        GCOVS_SKIPPED.append(len(intermediate_json["files"]) - len(intermediate_json_files))
    p.wait()

def processGcdas(cwd, gcov, jobs, gcda_files, exclude):
    chunk_size = max(MINIMUM_CHUNK_SIZE, int(len(gcda_files) / jobs) + 1)

    threads = []
    intermediate_json_files = []
    for chunk in chunks(gcda_files, chunk_size):
        t = threading.Thread(target=gcovWorker, args=(cwd, gcov, intermediate_json_files, chunk, exclude))
        threads.append(t)
        t.start()

    log("Spawned %d gcov processes each processing at most %d gcda files" % (len(threads), chunk_size))
    for t in threads:
        t.join()
    return intermediate_json_files
def processGcov(gcov, files, exclude):
    for ex in exclude:
        if ex in gcov["file"]:
            return
    files.append(gcov)

def processGcovs(gcov_files, exclude):
    files = []
    for gcov in gcov_files:
        processGcov(gcov, files, exclude)
    return files
def dumpToLcovInfo(cwd, intermediate, output):
    with open(output, "w") as f:
        for file in intermediate:
            # Convert to absolute path so it plays nice with genhtml
            sf = file["file"]
            if not os.path.isabs(file["file"]):
                sf = os.path.abspath(os.path.join(cwd, file["file"]))
            f.write("SF:%s\n" % sf)

            fn_miss = 0
            for function in file["functions"]:
                f.write("FN:%s,%s\n" % (function["start_line"], function["name"]))
                f.write("FNDA:%s,%s\n" % (function["execution_count"], function["name"]))
                fn_miss += int(function["execution_count"] == 0) # count functions that were never executed
            f.write("FNF:%s\n" % len(file["functions"]))
            f.write("FNH:%s\n" % (len(file["functions"]) - fn_miss))

            line_miss = 0
            for line in file["lines"]:
                f.write("DA:%s,%s\n" % (line["line_number"], line["count"]))
                line_miss += int(line["count"] == 0) # count lines that were never executed
            f.write("LF:%s\n" % len(file["lines"]))
            f.write("LH:%s\n" % (len(file["lines"]) - line_miss))
            f.write("end_of_record\n")

def dumpToGcovJson(intermediate, output):
    with open(output, "w") as f:
        json.dump(intermediate, f)

def log(line):
    if not args.quiet:
        print(line)
def main(args):
    # Need at least gcov 9.0.0 because that's when gcov JSON and stdout streaming was introduced
    current_gcov_version = getGcovVersion(args.gcov)
    if current_gcov_version < MINIMUM_GCOV:
        sys.stderr.write("Minimum gcov version {} required, found {}\n".format(".".join(map(str, MINIMUM_GCOV)), ".".join(map(str, current_gcov_version))))
        exit(1)

    gcda_files = getGcdaFiles(args.directory, args.gcda_files)
    log("%d .gcda files" % len(gcda_files))

    if args.excludepre:
        gcda_files = getFilteredGcdaFiles(gcda_files, args.excludepre)
        log("%d .gcda files after filtering" % len(gcda_files))

    # We "zero" the "counters" by simply deleting all gcda files
    if args.zerocounters:
        removeFiles(gcda_files)
        log("%d .gcda files removed" % len(gcda_files))
        return

    intermediate_json_files = processGcdas(args.cdirectory, args.gcov, args.jobs, gcda_files, args.excludepost)

    gcov_total = sum(GCOVS_TOTAL)
    gcov_skipped = sum(GCOVS_SKIPPED)
    log("%d .gcov files generated by gcov" % gcov_total)
    log("%d .gcov files processed by fastcov (%d skipped)" % (gcov_total - gcov_skipped, gcov_skipped))

    if args.lcov:
        dumpToLcovInfo(args.cdirectory, intermediate_json_files, args.output)
        log("Created lcov info file '%s'" % args.output)
    else:
        dumpToGcovJson(intermediate_json_files, args.output)
        log("Created gcov json file '%s'" % args.output)
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='A parallel gcov wrapper for fast coverage report generation')
    parser.add_argument('-z', '--zerocounters', dest='zerocounters', action="store_true", help='Recursively delete all gcda files')
    parser.add_argument('-f', '--gcda-files', dest='gcda_files', nargs="+", default=[], help='Specify exactly which gcda files should be processed instead of recursively searching the search directory.')
    parser.add_argument('-E', '--exclude-gcda', dest='excludepre', nargs="+", default=[], help='.gcda filter - Exclude gcda files from being processed via simple find matching (not regex)')
    parser.add_argument('-e', '--exclude-gcov', dest='excludepost', nargs="+", default=[], help='.gcov filter - Exclude gcov files from being processed via simple find matching (not regex)')
    parser.add_argument('-g', '--gcov', dest='gcov', default='gcov', help='which gcov binary to use')
    parser.add_argument('-d', '--search-directory', dest='directory', default=".", help='Base directory to recursively search for gcda files (default: .)')
    parser.add_argument('-c', '--compiler-directory', dest='cdirectory', default=".", help='Base directory compiler was invoked from (default: .)')
    parser.add_argument('-j', '--jobs', dest='jobs', type=int, default=multiprocessing.cpu_count(), help='Number of parallel gcov to spawn (default: %d).' % multiprocessing.cpu_count())
    parser.add_argument('-o', '--output', dest='output', default="coverage.json", help='Name of output file (default: coverage.json)')
    parser.add_argument('-i', '--lcov', dest='lcov', action="store_true", help='Output in lcov info format instead of gcov json')
    parser.add_argument('-q', '--quiet', dest='quiet', action="store_true", help='Suppress output to stdout')
    args = parser.parse_args()
    main(args)

test/thirdparty/fastcov/fastcov_legacy.py vendored Executable file (218 additions)

@@ -0,0 +1,218 @@
#!/usr/bin/env python3
"""
Author: Bryan Gillespie

Legacy version... supports versions 7.1.0 <= GCC < 9.0.0

A massively parallel gcov wrapper for generating intermediate coverage formats fast

The goal of fastcov is to generate code coverage intermediate formats as fast as possible
(ideally < 1 second), even for large projects with hundreds of gcda objects. The intermediate
formats may then be consumed by a report generator such as lcov's genhtml, or a dedicated front
end such as coveralls.

Sample Usage:
    $ cd build_dir
    $ ./fastcov.py --exclude-gcov /usr/include --lcov -o report.info
    $ genhtml -o code_coverage report.info
"""

import re
import os
import glob
import json
import argparse
import subprocess
import multiprocessing
from random import shuffle

MINIMUM_GCOV = (7,1,0)
MINIMUM_CHUNK_SIZE = 10
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def getGcovVersion(gcov):
    p = subprocess.Popen([gcov, "-v"], stdout=subprocess.PIPE)
    output = p.communicate()[0].decode('UTF-8')
    p.wait()
    version_str = re.search(r'\s([\d.]+)\s', output.split("\n")[0]).group(1)
    version = tuple(map(int, version_str.split(".")))
    return version

def removeFiles(files):
    for file in files:
        os.remove(file)

def getFilteredGcdaFiles(gcda_files, exclude):
    def excludeGcda(gcda):
        for ex in exclude:
            if ex in gcda:
                return False
        return True
    return list(filter(excludeGcda, gcda_files))

def getGcdaFiles(cwd, gcda_files, exclude):
    if not gcda_files:
        gcda_files = glob.glob(os.path.join(cwd, "**/*.gcda"), recursive=True)
    if exclude:
        return getFilteredGcdaFiles(gcda_files, exclude)
    return gcda_files
def getGcovFiles(cwd):
    return glob.glob(os.path.join(cwd, "*.gcov"))

# Note: unused helper; the actual filtering happens in processGcov below
# (as written, it relies on the module-level 'args' from the __main__ block)
def filterGcovFiles(gcov):
    with open(gcov) as f:
        path = f.readline()[5:]
        for ex in args.exclude:
            if ex in path:
                return False
        return True
def processGcdasPre9(cwd, gcov, jobs, gcda_files):
    chunk_size = max(MINIMUM_CHUNK_SIZE, int(len(gcda_files) / jobs) + 1) # max, so each gcov gets at least MINIMUM_CHUNK_SIZE files (as in fastcov.py)
    processes = []
    # shuffle(gcda_files) # improves performance by preventing any one gcov from bottlenecking on a list of sequential, expensive gcdas (?)
    for chunk in chunks(gcda_files, chunk_size):
        processes.append(subprocess.Popen([gcov, "-i"] + chunk, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL))
    for p in processes:
        p.wait()

def processGcdasPre9Accurate(cwd, gcov, gcda_files, exclude):
    intermediate_json_files = []
    for gcda in gcda_files:
        subprocess.Popen([gcov, "-i", gcda], cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).wait()
        gcov_files = getGcovFiles(cwd)
        intermediate_json_files += processGcovs(gcov_files, exclude)
        removeFiles(gcov_files)
    return intermediate_json_files
def processGcovLine(file, line):
    line_type, data = line.split(":", 1)
    if line_type == "lcount":
        num, count = data.split(",")
        hit = (int(count) != 0) # count is parsed from text, so compare numerically
        file["lines_hit"] += int(hit)
        file["lines"].append({
            "branches": [],
            "line_number": num,
            "count": count,
            "unexecuted_block": not hit
        })
    elif line_type == "function":
        num, count, name = data.split(",")
        hit = (int(count) != 0)
        file["functions_hit"] += int(hit)
        file["functions"].append({
            "name": name,
            "execution_count": count,
            "start_line": num,
            "end_line": None,
            "blocks": None,
            "blocks_executed": None,
            "demangled_name": None
        })
def processGcov(files, gcov, exclude):
    with open(gcov) as f:
        path = f.readline()[5:].rstrip()
        for ex in exclude:
            if ex in path:
                return False
        file = {
            "file": path,
            "functions": [],
            "functions_hit": 0,
            "lines": [],
            "lines_hit": 0
        }
        for line in f:
            processGcovLine(file, line.rstrip())
    files.append(file)
    return True

def processGcovs(gcov_files, exclude):
    files = []
    filtered = 0
    for gcov in gcov_files:
        filtered += int(not processGcov(files, gcov, exclude))
    print("Skipped %d .gcov files" % filtered)
    return files
def dumpToLcovInfo(intermediate, output):
    with open(output, "w") as f:
        for file in intermediate:
            f.write("SF:%s\n" % file["file"])
            for function in file["functions"]:
                f.write("FN:%s,%s\n" % (function["start_line"], function["name"]))
                f.write("FNDA:%s,%s\n" % (function["execution_count"], function["name"]))
            f.write("FNF:%s\n" % len(file["functions"]))
            f.write("FNH:%s\n" % file["functions_hit"])
            for line in file["lines"]:
                f.write("DA:%s,%s\n" % (line["line_number"], line["count"]))
            f.write("LF:%s\n" % len(file["lines"]))
            f.write("LH:%s\n" % file["lines_hit"])
            f.write("end_of_record\n")

def dumpToGcovJson(intermediate, output):
    with open(output, "w") as f:
        json.dump(intermediate, f)
def main(args):
    # Need at least gcov 7.1.0 because of bug not allowing -i in conjunction with multiple files
    # See: https://github.com/gcc-mirror/gcc/commit/41da7513d5aaaff3a5651b40edeccc1e32ea785a
    current_gcov_version = getGcovVersion(args.gcov)
    if current_gcov_version < MINIMUM_GCOV:
        print("Minimum gcov version {} required, found {}".format(".".join(map(str, MINIMUM_GCOV)), ".".join(map(str, current_gcov_version))))
        exit(1)

    gcda_files = getGcdaFiles(args.directory, args.gcda_files, args.excludepre)
    print("Found %d .gcda files" % len(gcda_files))

    # We "zero" the "counters" by simply deleting all gcda files
    if args.zerocounters:
        removeFiles(gcda_files)
        print("Removed %d .gcda files" % len(gcda_files))
        return

    # If we are less than gcov 9.0.0, convert .gcov files to GCOV 9 JSON format
    processGcdasPre9(args.cdirectory, args.gcov, args.jobs, gcda_files)
    gcov_files = getGcovFiles(args.cdirectory)
    print("Found %d .gcov files" % len(gcov_files))
    intermediate_json_files = processGcovs(gcov_files, args.excludepost)
    removeFiles(gcov_files)
    intermediate_json_files += processGcdasPre9Accurate(args.cdirectory, args.gcov, args.gcda_files_accurate, args.excludepost)

    if args.lcov:
        dumpToLcovInfo(intermediate_json_files, args.output)
    else:
        dumpToGcovJson(intermediate_json_files, args.output)
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='A parallel gcov wrapper for fast coverage report generation')
    parser.add_argument('-z', '--zerocounters', dest='zerocounters', action="store_true", help='Recursively delete all gcda files')
    parser.add_argument('-f', '--gcda-files', dest='gcda_files', nargs="+", default=[], help='Specify exactly which gcda files should be processed instead of recursively searching the search directory.')
    parser.add_argument('-F', '--gcda-files-accurate', dest='gcda_files_accurate', nargs="+", default=[], help='(< gcov 9.0.0) Get accurate header coverage information for just these. These files cannot be processed in parallel')
    parser.add_argument('-E', '--exclude-gcda', dest='excludepre', nargs="+", default=[], help='.gcda filter - Exclude gcda files from being processed via simple find matching (not regex)')
    parser.add_argument('-e', '--exclude-gcov', dest='excludepost', nargs="+", default=[], help='.gcov filter - Exclude gcov files from being processed via simple find matching (not regex)')
    parser.add_argument('-g', '--gcov', dest='gcov', default='gcov', help='which gcov binary to use')
    parser.add_argument('-d', '--search-directory', dest='directory', default=".", help='Base directory to recursively search for gcda files (default: .)')
    parser.add_argument('-c', '--compiler-directory', dest='cdirectory', default=".", help='Base directory compiler was invoked from (default: .)')
    parser.add_argument('-j', '--jobs', dest='jobs', type=int, default=multiprocessing.cpu_count(), help='Number of parallel gcov to spawn (default: %d).' % multiprocessing.cpu_count())
    parser.add_argument('-o', '--output', dest='output', default="coverage.json", help='Name of output file (default: coverage.json)')
    parser.add_argument('-i', '--lcov', dest='lcov', action="store_true", help='Output in lcov info format instead of gcov json')
    args = parser.parse_args()
    main(args)