CentOS 7 errors on collectd start using ceph_pool_plugin #28

Open
steve--d opened this issue Apr 15, 2015 · 14 comments

@steve--d

After starting collectd on CentOS 7 (Ceph Giant, now upgraded to Hammer), I'm getting the following errors in the log when using the ceph_pool_plugin.

-- Unit collectd.service has begun starting up.
Apr 15 15:04:18 ceph1.domain systemd[1]: Started Collectd statistics daemon.
-- Subject: Unit collectd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit collectd.service has finished starting up.
-- 
-- The start-up result is done.
Apr 15 15:04:18 ceph1.domain collectd[22862]: Initialization complete, entering read-loop.
Apr 15 15:04:18 ceph1.domain python[22874]: detected unhandled Python exception in '/usr/bin/ceph'
Apr 15 15:04:18 ceph1.domain abrt-server[22881]: Package 'ceph-common' isn't signed with proper key
Apr 15 15:04:18 ceph1.domain abrt-server[22881]: 'post-create' on '/var/tmp/abrt/Python-2015-04-15-15:04:18-22874' exited with 1
Apr 15 15:04:18 ceph1.domain abrt-server[22881]: Deleting problem directory '/var/tmp/abrt/Python-2015-04-15-15:04:18-22874'
Apr 15 15:04:18 ceph1.domain collectd[22862]: Traceback (most recent call last):
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 896, in <module>
Apr 15 15:04:18 ceph1.domain collectd[22862]: retval = main()
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 647, in main
Apr 15 15:04:18 ceph1.domain collectd[22862]: conffile=conffile)
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib/python2.7/site-packages/rados.py", line 212, in __init__
Apr 15 15:04:18 ceph1.domain collectd[22862]: library_path  = find_library('rados')
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 244, in find_library
Apr 15 15:04:18 ceph1.domain collectd[22862]: return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 237, in _findSoname_ldconfig
Apr 15 15:04:18 ceph1.domain collectd[22862]: f.close()
Apr 15 15:04:18 ceph1.domain collectd[22862]: IOError: [Errno 10] No child processes
Apr 15 15:04:18 ceph1.domain python[22884]: detected unhandled Python exception in '/usr/bin/ceph'
Apr 15 15:04:18 ceph1.domain abrt-server[22891]: Not saving repeating crash in '/usr/bin/ceph'
Apr 15 15:04:18 ceph1.domain collectd[22862]: Traceback (most recent call last):
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 896, in <module>
Apr 15 15:04:18 ceph1.domain collectd[22862]: retval = main()
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 647, in main
Apr 15 15:04:18 ceph1.domain collectd[22862]: conffile=conffile)
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib/python2.7/site-packages/rados.py", line 212, in __init__
Apr 15 15:04:18 ceph1.domain collectd[22862]: library_path  = find_library('rados')
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 244, in find_library
Apr 15 15:04:18 ceph1.domain collectd[22862]: return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 237, in _findSoname_ldconfig
Apr 15 15:04:18 ceph1.domain collectd[22862]: f.close()
Apr 15 15:04:18 ceph1.domain collectd[22862]: IOError: [Errno 10] No child processes
Apr 15 15:04:18 ceph1.domain collectd[22862]: ceph: failed to get stats :: No JSON object could be decoded :: Traceback (most recent call last):
                                                      File "/usr/lib64/collectd/base.py", line 114, in read_callback
                                                        stats = self.get_stats()
                                                      File "/usr/lib64/collectd/ceph_pool_plugin.py", line 67, in get_stats
                                                        json_stats_data = json.loads(stats_output)
                                                      File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
                                                        return _default_decoder.decode(s)
                                                      File "/usr/lib64/python2.7/json/decoder.py", line 365, in decode
                                                        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
                                                      File "/usr/lib64/python2.7/json/decoder.py", line 383, in raw_decode
                                                        raise ValueError("No JSON object could be decoded")
                                                    ValueError: No JSON object could be decoded
Apr 15 15:04:18 ceph1.domain collectd[22862]: Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Apr 15 15:04:18 ceph1.domain collectd[22862]: read-function of plugin `python.ceph_pool_plugin' failed. Will suspend it for 120.000 seconds.

collectd.conf:

<LoadPlugin python>
  Globals true
</LoadPlugin>

<Plugin "python">
    ModulePath "/usr/lib64/collectd"

    Import "ceph_pool_plugin"

    <Module "ceph_pool_plugin">
        Verbose "True"
        Cluster "ceph"
        Interval "60"
        TestPool "rbd"
    </Module>
</Plugin>
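
The last two errors in the log above look like one failure followed by a second one that hides it: `get_stats()` raises, the error is logged, and then `stats` is used even though it was never assigned. The source of base.py is not shown in this thread, so the following is only a minimal sketch of that pattern and one possible guard. The class and method names (`BasePlugin`, `read_callback_buggy`, `read_callback_guarded`, `BrokenPoolPlugin`) are hypothetical, and the sketch is written for Python 3 rather than the Python 2 interpreters in the logs.

```python
import json


class BasePlugin:
    """Hypothetical stand-in for the plugin base class; not the real base.py."""

    def get_stats(self):
        raise NotImplementedError

    def log_error(self, message):
        print(message)

    def read_callback_buggy(self):
        try:
            stats = self.get_stats()
        except Exception as exc:
            self.log_error("failed to get stats :: %s" % exc)
        # If get_stats() raised, 'stats' was never bound, so this line
        # raises UnboundLocalError and buries the original failure.
        return stats

    def read_callback_guarded(self):
        try:
            stats = self.get_stats()
        except Exception as exc:
            self.log_error("failed to get stats :: %s" % exc)
            return None  # nothing to dispatch for this interval
        return stats


class BrokenPoolPlugin(BasePlugin):
    def get_stats(self):
        # Simulates the `ceph` command crashing and printing no JSON at all.
        return json.loads("")


if __name__ == "__main__":
    plugin = BrokenPoolPlugin()
    print(plugin.read_callback_guarded())
```

With a guard like this, only the "failed to get stats" message would remain in the log. The crash of `/usr/bin/ceph` itself still has to be fixed separately.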
@brynmathias

I also see this error on RHEL 7.1

[root@tapir2 python]# service collectd status
Redirecting to /bin/systemctl status collectd.service
collectd.service - Collectd statistics daemon
Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled)
Active: active (running) since Fri 2015-05-08 13:08:01 BST; 2min 42s ago
Docs: man:collectd(1)
man:collectd.conf(5)
Main PID: 18995 (collectd)
CGroup: /system.slice/collectd.service
└─18995 /usr/sbin/collectd -C /etc/collectd.conf -f

May 08 13:08:01 tapir2.eng.velocix.com systemd[1]: Started Collectd statistics daemon.
May 08 13:08:01 tapir2.eng.velocix.com collectd[18995]: Initialization complete, entering read-loop.
May 08 13:08:01 tapir2.eng.velocix.com collectd[18995]: Unhandled python exception in read callback: TypeError: Dataset mutex-JOS::ApplyManager::apply_lock not found
May 08 13:08:01 tapir2.eng.velocix.com collectd[18995]: read-function of plugin `python.ceph' failed. Will suspend it for 20.000 seconds.
May 08 13:08:21 tapir2.eng.velocix.com collectd[18995]: Unhandled python exception in read callback: TypeError: Dataset mutex-JOS::ApplyManager::apply_lock not found
May 08 13:08:21 tapir2.eng.velocix.com collectd[18995]: read-function of plugin `python.ceph' failed. Will suspend it for 40.000 seconds.
May 08 13:09:01 tapir2.eng.velocix.com collectd[18995]: Unhandled python exception in read callback: TypeError: Dataset mutex-JOS::ApplyManager::apply_lock not found

My collectd.conf:

Globals true
ModulePath "/usr/lib64/collectd/python"
Import "ceph"
<Module ceph>
    AdminSocket "/var/run/ceph/ceph-*.asok"
</Module>

TypesDB "/usr/share/collectd/types.db" "/usr/lib64/collectd/python/ceph.types.db"

@solune

solune commented Jul 3, 2015

I have the same problem. Have you found a workaround?

@ozhanka

ozhanka commented Jul 7, 2015

Hi, I have the same problem on RHEL 7.1 with the Ceph Hammer release. Does anyone have a fix or workaround for this problem?

@rochaporto
Owner

I should be able to have a look next week.

@ksingh7

ksingh7 commented Jul 20, 2015

I am facing exactly the same issue: [error] Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment

Collectd Logs

[2015-07-20 11:30:29] [info] ceph: collectd new data from service :: took 0 seconds
[2015-07-20 11:30:30] [error] ceph: failed to get stats :: Expecting object: line 2 column 124 (char 124) :: Traceback (most recent call last):
  File "/etc/collectd/plugins/ceph/base.py", line 114, in read_callback
    stats = self.get_stats()
  File "/etc/collectd/plugins/ceph/ceph_pool_plugin.py", line 72, in get_stats
    json_stats_data = json.loads(stats_output)
  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib64/python2.6/json/decoder.py", line 217, in JSONArray
    value, end = iterscan(s, idx=end, context=context).next()
  File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = ac
[2015-07-20 11:30:30] [error] Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
[2015-07-20 11:30:30] [notice] read-function of plugin `python.ceph_pool_plugin' failed. Will suspend it for 240.000 seconds.
[2015-07-20 11:30:41] [info] ceph: collectd new data from service :: took 13 seconds

Did anyone manage to fix this?

@rochaporto Do you have time to check this? I'd appreciate your help.

@gcmalloc

I'm having the same issue here.
The root cause seems to be this:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 896, in <module>
    retval = main()
  File "/usr/bin/ceph", line 647, in main
    conffile=conffile)
  File "/usr/lib/python2.7/site-packages/rados.py", line 212, in __init__
    library_path  = find_library('rados')
  File "/usr/lib64/python2.7/ctypes/util.py", line 244, in find_library
    return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
  File "/usr/lib64/python2.7/ctypes/util.py", line 237, in _findSoname_ldconfig
    f.close()
IOError: [Errno 10] No child processes
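
Errno 10 is ECHILD: a wait on a child process failed because the child was no longer there to be waited on. One way this can arise is a parent process that ignores SIGCHLD, in which case the kernel reaps children automatically and a later wait finds nothing. Whether that is what happens under collectd here is an assumption, suggested by the getsigchld.py workaround discussed in this thread. The sketch below reproduces the effect with `os.fork` and `os.waitpid` in Python 3 on a POSIX system; the tracebacks in this thread come from Python 2.7, where the same condition surfaces from `f.close()` on a pipe. The function name `child_wait_outcome` is hypothetical.

```python
import os
import signal


def child_wait_outcome(handler):
    """Fork a child that exits at once and report how waiting on it ends.

    `handler` is the SIGCHLD disposition to use while the child runs,
    e.g. signal.SIG_IGN or signal.SIG_DFL.
    """
    previous = signal.signal(signal.SIGCHLD, handler)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, skipping cleanup handlers
        try:
            os.waitpid(pid, 0)
            return "reaped"
        except ChildProcessError:
            # ECHILD: the child was already reaped because SIGCHLD is ignored.
            return "ECHILD"
    finally:
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    print("SIGCHLD ignored:", child_wait_outcome(signal.SIG_IGN))
    print("SIGCHLD default:", child_wait_outcome(signal.SIG_DFL))
```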

@roadracer

Any news?
I'm having the same issue on Ubuntu 14.04 with the Ceph Hammer release:

Aug 21 00:07:54 collectd collectd[17115]: ceph: failed to get stats :: No JSON object could be decoded :: Traceback (most recent call last):
  File "/usr/lib/collectd/plugins/ceph/base.py", line 108, in read_callback
    stats = self.get_stats()
  File "/usr/lib/collectd/plugins/ceph/ceph_pool_plugin.py", line 67, in get_stats
    json_stats_data = json.loads(stats_output)
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Aug 21 00:07:54 collectd collectd[17115]: Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Aug 21 00:07:54 collectd collectd[17115]: read-function of plugin `python.ceph_pool_plugin' failed. Will suspend it for 20.000 seconds.

@solune

solune commented Oct 13, 2015

Has any of you succeeded in making it work? Or is there another Ceph collectd plugin that does?

@yashumitsu

Hello!

This is described in a note in the man page.

You can put getsigchld.py in your scripts folder and add this line to the configuration:

<Plugin "python"> 
  ModulePath [..]
  Import "getsigchld"
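
The contents of getsigchld.py are not quoted in this thread. As a rough illustration of the idea, a module imported this way could reset SIGCHLD to its default disposition when collectd initializes the python plugin, so that Python code can wait on the processes it spawns. This is a minimal sketch under that assumption, not the shipped script: the function name `restore_default_sigchld` is hypothetical, and the `ImportError` fallback exists only so the module can be loaded outside collectd.

```python
import signal

try:
    import collectd  # provided by collectd's python plugin at runtime
except ImportError:
    collectd = None  # allows importing this module outside collectd


def restore_default_sigchld():
    """Reset SIGCHLD to the default disposition for this process."""
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)


if collectd is not None:
    # Run once when collectd initializes its plugins.
    collectd.register_init(restore_default_sigchld)
```

Changing the signal disposition affects the whole collectd process, so it may interfere with other plugins that manage child processes themselves.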

@solune

solune commented Nov 30, 2015

It works better, yashumitsu!

But now there is a new error:
Nov 30 20:33:05 cephrr1n4 collectd[19331]: ceph: failed to get stats :: list index out of range :: Traceback (most recent call last):
  File "/opt/collectd-ceph/git/collectd-ceph/plugins/base.py", line 114, in read_callback
    stats = self.get_stats()
  File "/opt/collectd-ceph/git/collectd-ceph/plugins/ceph_latency_plugin.py", line 67, in get_stats
    data[ceph_cluster]['cluster']['stddev_latency'] = results[1]
IndexError: list index out of range
Nov 30 20:33:05 cephrr1n4 collectd[19331]: Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Nov 30 20:33:05 cephrr1n4 collectd[19331]: read-function of plugin `python.ceph_latency_plugin' failed. Will suspend it for 120.000 seconds.

@yashumitsu

No thanks necessary!

The easiest way to get it working is to change the default pool name (data) to a pool that exists.

@solune

solune commented Dec 1, 2015

It works!
Thanks

@mourgaya
Contributor

mourgaya commented Dec 7, 2015

With strace we can see that getsigchld.py is not being found, so try copying it into place:
cp collectd-5.5.0/contrib/python/getsigchld.py /usr/lib64/python2.7/site-packages/

@benh57

benh57 commented Dec 23, 2015

Thanks for posting this fix.
