
Death at onNodeReplacement

A project log for ROSCOE - A Scalable, Platform Independent Robot

A new algebraic machine cognition model and a novel machine vision architecture

J Groff, 08/13/2014 at 17:40

The master server resolves a request for publishers by looking up the node name in a table and comparing the URI of the request to the URI of the node retrieved from the table. If the two URIs do not match, the master attempts to shut down the running node and replace it with a new one listening at the URI of the request. In this way the client drives the creation of the slave server that will service future requests at the known URI, or so it seems to me.
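As I read it, the lookup-and-compare step boils down to something like the sketch below. This is not the rosjava code: `NodeTableSketch`, `ReplacementListener` and `register` are hypothetical names made up for illustration, and only the `onNodeReplacement` callback name is borrowed from the method discussed here.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the master's node table; not the actual rosjava classes. */
class NodeTableSketch {
    /** Fired when a node registered under a name is displaced by a different URI. */
    interface ReplacementListener {
        void onNodeReplacement(String nodeName, URI oldUri, URI newUri);
    }

    private final Map<String, URI> nodes = new HashMap<>();
    private final ReplacementListener listener;

    NodeTableSketch(ReplacementListener listener) {
        this.listener = listener;
    }

    /** Returns true if an existing node at a different URI was replaced. */
    synchronized boolean register(String nodeName, URI requestUri) {
        URI known = nodes.get(nodeName);
        if (known != null && known.equals(requestUri)) {
            return false; // same name, same URI: nothing to do
        }
        nodes.put(nodeName, requestUri);
        if (known != null) {
            // URIs differ: the old node is told to go away in favor of the new one
            listener.onNodeReplacement(nodeName, known, requestUri);
            return true;
        }
        return false; // first registration under this name
    }

    synchronized URI lookup(String nodeName) {
        return nodes.get(nodeName);
    }
}
```

Registering the same name twice with the same URI is a no-op here; registering it with a different URI swaps the table entry and fires the callback once.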

Recall that the slave server is composed of two subsystems: the Apache XmlRpcServer, and TcpRosServer, the persistent P2P socket connection that actually handles topic data traffic. The XmlRpcServer sits on a rudimentary built-in web server, which is necessary because listeners for the HTTP XmlRpc traffic are spawned on random ports as needed, something that is anathema to standalone containers like Apache that need lots of configs and specific port assignments.
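The random-port part is easy to picture with plain `java.net`. A minimal sketch, assuming a loopback bind; `EphemeralPortSketch` and its methods are hypothetical names, not anything from rosjava or Apache XML-RPC:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.URI;

/** Hypothetical illustration of a listener spawned on an OS-assigned port. */
class EphemeralPortSketch {
    /** Binding to port 0 asks the OS to pick any free port. */
    static ServerSocket open() throws IOException {
        return new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
    }

    /** The URI a node would have to advertise once the port is known. */
    static URI advertisedUri(ServerSocket listener) {
        return URI.create("http://" + listener.getInetAddress().getHostAddress()
                + ":" + listener.getLocalPort() + "/");
    }
}
```

The port is only known after the bind, which is why the URI has to be computed at runtime and handed to the master rather than written into a config file.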

TcpRosServer is based on the Netty project, part of JBoss. Essentially it's a framework for protocol server construction that lets you set up Java NIO channel groups and service them asynchronously, plus lots of other neat stuff we used to have to cobble together here and there.

ROS is a big improvement over the PowerKernel architecture I devised, where the connections are static and the compute graph is monolithic, but getting everything to hook up is just as much a beeyotch. It occurred to me that, wow, put some VTUN software on a 1U *nix box, install a ROS master, and you have a ready-made dark web. Lo and behold, I see the guy that wrote ROSJava has just such a project on his site. Great minds...

Anyway, when the client node requests a topic, the master calls a shutdown of the slave XmlRpcServer and TcpRosServer and invokes this onNodeReplacement method, which signals shutdown through the slave client, I believe. Somewhere in the chain of events an attempted connection to the old server or the new one is causing a 'Connection Refused' exception and bombing the slave server. The web server runs its own thread, so most likely there's a lifecycle issue there. Web server shutdown involves killing its listener thread and shutting down the thread pool of requests it maintains. It's possible I may have to integrate it more tightly, perhaps put it under a different Executor.
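To make the lifecycle question concrete, here is a rough sketch of the shutdown order I have in mind, using only the JDK. It is not the Apache web server or the rosjava slave server; `ListenerLifecycleSketch` and everything in it are hypothetical, and the request handler is a stand-in that just closes the socket.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical listener thread plus request pool, shut down in a fixed order. */
class ListenerLifecycleSketch {
    private final ServerSocket listener;
    private final ExecutorService requestPool = Executors.newFixedThreadPool(4);
    private final Thread acceptThread;

    ListenerLifecycleSketch() throws IOException {
        listener = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
        acceptThread = new Thread(this::acceptLoop, "sketch-accept");
        acceptThread.start();
    }

    int port() {
        return listener.getLocalPort();
    }

    private void acceptLoop() {
        while (!listener.isClosed()) {
            try {
                Socket client = listener.accept();
                requestPool.execute(() -> closeQuietly(client)); // stand-in for real request handling
            } catch (IOException e) {
                return; // accept() throws once the listener is closed; end the thread
            }
        }
    }

    /** Stops accepting first, then drains the pool. Returns true if the pool drained in time. */
    boolean shutdown(long timeoutMillis) throws InterruptedException {
        try {
            listener.close(); // unblocks accept() in the listener thread
        } catch (IOException ignored) {
            // nothing useful to do if close itself fails
        }
        acceptThread.join(timeoutMillis); // no new work is submitted after this point
        requestPool.shutdown();           // lets queued requests finish
        return requestPool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    private static void closeQuietly(Socket s) {
        try {
            s.close();
        } catch (IOException ignored) {
            // best effort
        }
    }
}
```

Once `shutdown` has returned, nothing is listening on that port any more, so a client that still holds the old URI and tries to connect will typically get a `java.net.ConnectException` with "Connection refused". That is at least consistent with the failure above if something in the replacement sequence calls back to a server that has already been torn down, though I have not confirmed that is what is happening.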
