[Swarming] work around slow calls in archive.py
Apparently, the tarfile Python module spends a lot of time in
grp.getgrgid retrieving a piece of information (the name of the
primary group) which we don't need anyway.

There is no proper way to disable these slow calls, but there's
a workaround which relies on the way in which grp (and pwd) are
used. pwd and grp are imported in this fashion [1]:

  try:
    import grp, pwd
  except ImportError:
    grp = pwd = None

and then used with the following pattern [2]:

  if grp:
    try:
      tarinfo.gname = grp.getgrgid(tarinfo.gid)[0]
    except KeyError:
      pass

By setting grp and pwd to None, thus skipping the calls, I was
able to achieve a 35x speedup on my workstation. The user and
group names are set to test262 when building the tar.

The downside to this approach is that we are relying on an
implementation detail, which is not in the public API. However,
the blamelist shows that the relevant bits of the module have not
been updated since 2003 [3], so we might as well assume that the
workaround will keep working, on CPython 2.x at least.

---

[1] https://hg.python.org/cpython/file/2.7/Lib/tarfile.py#l56
[2] https://hg.python.org/cpython/file/2.7/Lib/tarfile.py#l1933
[3] https://hg.python.org/cpython/rev/f9a5ed092660

BUG=chromium:535160
LOG=N

Review URL: https://codereview.chromium.org/1727773002

Cr-Commit-Position: refs/heads/master@{#34245}
parent 20362a2214
commit 1c1b70c98d
@@ -8,10 +8,15 @@ import tarfile
 
 os.chdir(os.path.dirname(os.path.abspath(__file__)))
 
+# Workaround for slow grp and pwd calls.
+tarfile.grp = None
+tarfile.pwd = None
+
 def filter_git(tar_info):
   if tar_info.name.startswith(os.path.join('data', '.git')):
     return None
   else:
+    tar_info.uname = tar_info.gname = "test262"
     return tar_info
 
 with tarfile.open('data.tar', 'w') as tar:
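For anyone who wants to check the behaviour locally, here is a minimal sketch of the same workaround run against a throwaway directory. It is not part of this change. It assumes Python 3 (for `tempfile.TemporaryDirectory`) and that the interpreter's `tarfile` still guards its name lookups with the module-level `grp` and `pwd` names. The helpers `build_archive` and `demo` and the file names are hypothetical, and the archive is written to memory instead of `data.tar`.

```python
import io
import os
import tarfile
import tempfile

# Same workaround as in the diff: with these set to None, tarfile skips
# the pwd.getpwuid / grp.getgrgid lookups when it builds each TarInfo.
tarfile.grp = None
tarfile.pwd = None


def filter_git(tar_info):
  # Drop everything under data/.git; label every other entry "test262".
  if tar_info.name.startswith(os.path.join('data', '.git')):
    return None
  tar_info.uname = tar_info.gname = "test262"
  return tar_info


def build_archive(root):
  """Tars <root>/data in memory; returns {member name: (uname, gname)}."""
  buf = io.BytesIO()
  with tarfile.open(fileobj=buf, mode='w') as tar:
    tar.add(os.path.join(root, 'data'), arcname='data', filter=filter_git)
  buf.seek(0)
  with tarfile.open(fileobj=buf, mode='r') as tar:
    return {m.name: (m.uname, m.gname) for m in tar.getmembers()}


def demo():
  # Hypothetical layout: one file under data/.git, one regular test file.
  with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'data', '.git'))
    os.makedirs(os.path.join(root, 'data', 'test'))
    for parts in (('data', '.git', 'HEAD'), ('data', 'test', 'a.js')):
      with open(os.path.join(root, *parts), 'w') as f:
        f.write('x')
    return build_archive(root)


if __name__ == '__main__':
  for name, owner in sorted(demo().items()):
    print(name, owner)
```

Because the filter sets `uname` and `gname` itself, the archive's owner names do not depend on the local user database, so the result should be the same whether or not the lookups are disabled. Only the time spent building each entry changes.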