1. Session Factory

TensorFlow supports two kinds of session factory: the direct session and the grpc session. Which one is used is determined by the SessionOptions passed in.


1.1 API

virtual Session* NewSession(const SessionOptions& options) = 0;

virtual bool AcceptsOptions(const SessionOptions& options) = 0;

virtual Status Reset(const SessionOptions& options,
                     const std::vector<string>& containers) {
  return errors::Unimplemented("Reset()");
}

virtual ~SessionFactory() {}

static void Register(const string& runtime_type, SessionFactory* factory);

static Status GetFactory(const SessionOptions& options,
                         SessionFactory** out_factory);

1.2 Factory Registration


TensorFlow provides a factory registration mechanism, so that different modules can register their own factories.

void SessionFactory::Register(const string& runtime_type,
                              SessionFactory* factory) {
  mutex_lock l(*get_session_factory_lock());
  if (!session_factories()->insert({runtime_type, factory}).second) {
    LOG(ERROR) << "Two session factories are being registered "
               << "under" << runtime_type;
  }
}

session_factories is an unordered_map; registering a factory amounts to inserting an entry into it. If the insertion succeeds, the second member of the returned pair is true.

typedef std::unordered_map<string, SessionFactory*> SessionFactories;
SessionFactories* session_factories() {
  static SessionFactories* factories = new SessionFactories;
  return factories;
}
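As a minimal standalone illustration (not TensorFlow code) of the insert semantics relied on above:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
  std::unordered_map<std::string, int> factories;
  // insert() returns a pair<iterator, bool>; .second is true only if the key was newly inserted.
  bool first = factories.insert({"DIRECT_SESSION", 1}).second;   // true
  bool second = factories.insert({"DIRECT_SESSION", 2}).second;  // false: key already present
  std::cout << first << " " << second << std::endl;              // prints "1 0"
  return 0;
}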

The code with which the direct session registers its factory is shown below:

class DirectSessionRegistrar {
 public:
  DirectSessionRegistrar() {
    SessionFactory::Register("DIRECT_SESSION", new DirectSessionFactory());
  }
};
static DirectSessionRegistrar registrar;
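
Following the same pattern, a hypothetical module could register its own factory under a new runtime type. The names MY_SESSION and MySessionFactory below are purely illustrative; this is a sketch of the mechanism, not actual TensorFlow code:

// Hypothetical example of the registration pattern shown above.
class MySessionFactory : public SessionFactory {
 public:
  Session* NewSession(const SessionOptions& options) override {
    // Construct and return the module's own Session subclass here (omitted).
    return nullptr;
  }
  bool AcceptsOptions(const SessionOptions& options) override {
    // Claim targets with a custom scheme, e.g. "my://host:port".
    return options.target.find("my://") == 0;
  }
};

class MySessionRegistrar {
 public:
  MySessionRegistrar() {
    SessionFactory::Register("MY_SESSION", new MySessionFactory());
  }
};
static MySessionRegistrar my_registrar;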

1.3 Session Creation


TensorFlow creates a session through the factory's NewSession function, which is invoked in session.cc; the source path is tensorflow/core/common_runtime/session.cc.

1.3.1 API

NewSession has two different interfaces. The first takes a SessionOptions, creates the Session, and returns a pointer to it, as shown below:

Session* NewSession(const SessionOptions& options) {
  SessionFactory* factory;
  const Status s = SessionFactory::GetFactory(options, &factory);
  if (!s.ok()) {
    LOG(ERROR) << s;
    return nullptr;
  }
  return factory->NewSession(options);
}
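
A typical call site of this form wraps the raw pointer and then creates and runs a graph. The following is a minimal sketch; the feed name "input" and fetch name "output" are hypothetical placeholders:

#include <memory>
#include <vector>
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/public/session_options.h"

// Create a session with default options, load a graph, and run it once.
tensorflow::Status RunOnce(const tensorflow::GraphDef& graph_def,
                           const tensorflow::Tensor& input_tensor,
                           std::vector<tensorflow::Tensor>* outputs) {
  tensorflow::SessionOptions options;  // empty target -> direct session
  std::unique_ptr<tensorflow::Session> session(tensorflow::NewSession(options));
  if (!session) {
    return tensorflow::errors::Internal("Failed to create session.");
  }
  TF_RETURN_IF_ERROR(session->Create(graph_def));
  return session->Run({{"input", input_tensor}}, {"output"}, {}, outputs);
}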

The second takes an output parameter that receives the pointer to the created session and returns a Status indicating whether creation succeeded.

Status NewSession(const SessionOptions& options, Session** out_session) {
  SessionFactory* factory;
  const Status s = SessionFactory::GetFactory(options, &factory);
  if (!s.ok()) {
    *out_session = nullptr;
    LOG(ERROR) << s;
    return s;
  }
  *out_session = factory->NewSession(options);
  if (!*out_session) {
    return errors::Internal("Failed to create session.");
  }
  return Status::OK();
}

Note: LoadSavedModel uses the latter form to complete session initialization; a later chapter will describe that flow in detail.

Status LoadMetaGraphIntoSession(const MetaGraphDef& meta_graph_def,
                                const SessionOptions& session_options,
                                std::unique_ptr<Session>* session) {
  Session* session_p = nullptr;
  TF_RETURN_IF_ERROR(NewSession(session_options, &session_p));
  session->reset(session_p);
  return (*session)->Create(meta_graph_def.graph_def());
}

Each factory provides its own implementation of NewSession; dedicated chapters will cover them later.

1.3.2 GetFactory

Before creating a session, GetFactory is called first to decide, based on the passed-in SessionOptions, which factory to use. The code path is tensorflow/core/common_runtime/session_factory.cc. With comments and error-message construction removed, the main code is:

Status SessionFactory::GetFactory(const SessionOptions& options,
                                  SessionFactory** out_factory) {
  mutex_lock l(*get_session_factory_lock());  // could use reader lock

  std::vector<std::pair<string, SessionFactory*>> candidate_factories;
  for (const auto& session_factory : *session_factories()) {
    if (session_factory.second->AcceptsOptions(options)) {
      candidate_factories.push_back(session_factory);
    } 
  }
  if (candidate_factories.size() == 1) {
    *out_factory = candidate_factories[0].second;
    return Status::OK();
  } else if (candidate_factories.size() > 1) {
    // More than one factory accepted the options; collect their names
    // for the error message (message construction omitted here).
    std::vector<string> factory_types;
    factory_types.reserve(candidate_factories.size());
    for (const auto& candidate_factory : candidate_factories) {
      factory_types.push_back(candidate_factory.first);
    }
    return errors::Internal("Multiple session factories registered for the given session options.");
  } else {
    // No registered factory accepted the options.
    return errors::NotFound("No session factory registered for the given session options.");
  }
}

GetFactory iterates over the factories registered in session_factories and calls each factory's AcceptsOptions to pick the candidate factory to return. The direct session's AcceptsOptions implementation is shown below: if no target is specified in SessionOptions, the TensorFlow local runtime implementation is used by default:

bool AcceptsOptions(const SessionOptions& options) override {
  return options.target.empty();
}
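
For comparison, the grpc session's factory accepts targets that use the grpc:// scheme. The check below is a sketch; the upstream implementation may use a string-utility helper rather than a raw prefix comparison:

// Sketch of the grpc factory's check: accept targets such as "grpc://host:port".
bool AcceptsOptions(const SessionOptions& options) override {
  return options.target.compare(0, 7, "grpc://") == 0;
}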

1.3.3 SessionOptions


SessionOptions is defined in session_options.h; the code path is tensorflow/core/public/session_options.h. With comments removed, the main structure is:

struct SessionOptions {
  Env* env;
  string target;
  ConfigProto config;
  std::shared_ptr<SessionResource> sessionResource;
  CustomKernelCreator customKernelCreator;
  SessionOptions();
};
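
As a quick usage sketch, target and config are the fields callers usually touch; the thread counts below are only examples:

#include "tensorflow/core/public/session_options.h"

// Choose the runtime via target and tune the thread pools via config.
// An empty target selects the direct (local) session; a "grpc://host:port"
// target selects the grpc session.
tensorflow::SessionOptions MakeOptions() {
  tensorflow::SessionOptions options;
  options.target = "";  // local runtime; e.g. "grpc://localhost:2222" for grpc
  options.config.set_intra_op_parallelism_threads(4);
  options.config.set_inter_op_parallelism_threads(4);
  return options;
}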

1.3.4 ConfigProto

Most of the session's configuration is set through ConfigProto, which is defined in config.proto; the code path is tensorflow/core/protobuf/config.proto. The definition is as follows:

// Session configuration parameters.
// The system picks appropriate values for fields that are not set.
message ConfigProto {
  // Map from device type name (e.g., "CPU" or "GPU" ) to maximum
  // number of devices of that type to use.  If a particular device
  // type is not found in the map, the system picks an appropriate
  // number.
  map<string, int32> device_count = 1;

  // The execution of an individual op (for some op types) can be
  // parallelized on a pool of intra_op_parallelism_threads.
  // 0 means the system picks an appropriate number.
  int32 intra_op_parallelism_threads = 2;

  // Nodes that perform blocking operations are enqueued on a pool of
  // inter_op_parallelism_threads available in each process.
  //
  // 0 means the system picks an appropriate number.
  //
  // Note that the first Session created in the process sets the
  // number of threads for all future sessions unless use_per_session_threads is
  // true or session_inter_op_thread_pool is configured.
  int32 inter_op_parallelism_threads = 5;

  // If true, use a new set of threads for this session rather than the global
  // pool of threads. Only supported by direct sessions.
  //
  // If false, use the global threads created by the first session, or the
  // per-session thread pools configured by session_inter_op_thread_pool.
  //
  // This option is deprecated. The same effect can be achieved by setting
  // session_inter_op_thread_pool to have one element, whose num_threads equals
  // inter_op_parallelism_threads.
  bool use_per_session_threads = 9;

  // This option is experimental - it may be replaced with a different mechanism
  // in the future.
  //
  // Configures session thread pools. If this is configured, then RunOptions for
  // a Run call can select the thread pool to use.
  //
  // The intended use is for when some session invocations need to run in a
  // background pool limited to a small number of threads:
  // - For example, a session may be configured to have one large pool (for
  // regular compute) and one small pool (for periodic, low priority work);
  // using the small pool is currently the mechanism for limiting the inter-op
  // parallelism of the low priority work.  Note that it does not limit the
  // parallelism of work spawned by a single op kernel implementation.
  // - Using this setting is normally not needed in training, but may help some
  // serving use cases.
  // - It is also generally recommended to set the global_name field of this
  // proto, to avoid creating multiple large pools. It is typically better to
  // run the non-low-priority work, even across sessions, in a single large
  // pool.
  repeated ThreadPoolOptionProto session_inter_op_thread_pool = 12;

  // Assignment of Nodes to Devices is recomputed every placement_period
  // steps until the system warms up (at which point the recomputation
  // typically slows down automatically).
  int32 placement_period = 3;

  // When any filters are present sessions will ignore all devices which do not
  // match the filters. Each filter can be partially specified, e.g. "/job:ps"
  // "/job:worker/replica:3", etc.
  repeated string device_filters = 4;

  // Options that apply to all GPUs.
  GPUOptions gpu_options = 6;

  // Whether soft placement is allowed. If allow_soft_placement is true,
  // an op will be placed on CPU if
  //   1. there's no GPU implementation for the OP
  // or
  //   2. no GPU devices are known or registered
  // or
  //   3. need to co-locate with reftype input(s) which are from CPU.
  bool allow_soft_placement = 7;

  // Whether device placements should be logged.
  bool log_device_placement = 8;

  // Options that apply to all graphs.
  GraphOptions graph_options = 10;

  // Global timeout for all blocking operations in this session.  If non-zero,
  // and not overridden on a per-operation basis, this value will be used as the
  // deadline for all blocking operations.
  int64 operation_timeout_in_ms = 11;

  // Options that apply when this session uses the distributed runtime.
  RPCOptions rpc_options = 13;

  // Optional list of all workers to use in this session.
  ClusterDef cluster_def = 14;

  // If true, any resources such as Variables used in the session will not be
  // shared with other sessions.
  bool isolate_session_state = 15;

  // Next: 16
  // NOTE(yuanman.ym) Specify session handle in client side
  string session_handle = 200;

  // Whether tensor fuse is used in distributed running mode.
  bool tensor_fuse = 201;

  // Some expensive io operations in an op (like MergeV2Checkpoints) can be
  // parallelized on a pool of global_io_parallelism_threads.
  // 0 means the system picks an appropriate number.
  int32 global_io_parallelism_threads = 202;

  bool run_graph_mode = 203;
  // Next: 204
}
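
From C++ these fields are set through the generated protobuf accessors; a brief sketch with example values:

#include "tensorflow/core/protobuf/config.pb.h"

// Set a few commonly used ConfigProto fields via the generated accessors.
tensorflow::ConfigProto MakeConfig() {
  tensorflow::ConfigProto config;
  (*config.mutable_device_count())["CPU"] = 2;            // at most 2 CPU devices
  config.set_allow_soft_placement(true);                  // fall back to CPU when no GPU kernel
  config.set_log_device_placement(true);                  // log where each node is placed
  config.mutable_gpu_options()->set_allow_growth(true);   // grow GPU memory on demand
  return config;
}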

The config mainly describes the thread pools used during execution, which the session uses to create different pools at construction time. The main variants are:

  • global thread pool: multiple sessions share a single pool; because the pool is shared, work from different sessions has to queue behind each other.
  • per-session thread pool: one pool per session, enabled through use_per_session_threads. With multiple heterogeneous compute resources this configuration is recommended, since sessions can then run in parallel, making full use of the compute resources and improving performance.
  • multiple thread pools within a single session: according to the SessionOptions configuration, the settings of several thread pools are read and a vector of pools is built (see the sketch after this list).
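
The sketch below shows the multi-pool configuration mentioned in the last bullet; the pool sizes and the global_name value are only examples:

#include "tensorflow/core/protobuf/config.pb.h"

// One large pool for regular compute plus one small pool for low-priority
// background work. A Run call can then pick a pool through RunOptions.
// (The older use_per_session_threads flag is the deprecated way to get a
// single private pool per session.)
tensorflow::ConfigProto MakeThreadPoolConfig() {
  tensorflow::ConfigProto config;

  tensorflow::ThreadPoolOptionProto* large_pool =
      config.add_session_inter_op_thread_pool();
  large_pool->set_num_threads(16);                    // example size

  tensorflow::ThreadPoolOptionProto* small_pool =
      config.add_session_inter_op_thread_pool();
  small_pool->set_num_threads(2);                     // example size
  small_pool->set_global_name("low_priority_pool");   // example name

  return config;
}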

TODO: how SessionOptions is actually used inside the direct session will be covered in a dedicated article later.