zoo.util package¶

Submodules¶

zoo.util.engine module¶

zoo.util.engine.check_spark_source_conflict(spark_home, pyspark_path)[source]¶

zoo.util.engine.compare_version(version1, version2)[source]¶: Compare two version strings. Return 1 if version1 is after version2, -1 if version1 is before version2, and 0 if the two versions are the same.
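The return convention above can be illustrated with a minimal sketch. This is not the library's implementation: the name compare_version_sketch is hypothetical, and the sketch assumes purely numeric, dot-separated version strings and treats missing trailing components as zero, neither of which is confirmed by this page.

```python
def compare_version_sketch(version1, version2):
    """Return 1, -1, or 0 when version1 is after, before, or equal to version2.

    Hypothetical stand-in for illustration only. Assumes versions such as
    "2.4.3" that contain nothing but integers separated by dots.
    """
    parts1 = [int(p) for p in version1.split(".")]
    parts2 = [int(p) for p in version2.split(".")]

    # Assumption: pad the shorter version with zeros so "2.2" equals "2.2.0".
    width = max(len(parts1), len(parts2))
    parts1 += [0] * (width - len(parts1))
    parts2 += [0] * (width - len(parts2))

    # Lists of ints compare component by component, left to right.
    if parts1 > parts2:
        return 1
    if parts1 < parts2:
        return -1
    return 0


print(compare_version_sketch("2.4.3", "2.2.0"))  # 1
print(compare_version_sketch("2.1", "2.2"))      # -1
print(compare_version_sketch("2.2", "2.2.0"))    # 0
```

Comparing integer components rather than raw strings matters for versions such as "2.10" and "2.9", which sort incorrectly as text.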

zoo.util.engine.exist_pyspark()[source]¶

zoo.util.engine.get_analytics_zoo_classpath()[source]¶: Return the jar path for analytics-zoo if it exists.

zoo.util.engine.is_spark_below_2_2()[source]¶: Check whether the Spark version is below 2.2.

zoo.util.engine.prepare_env()[source]¶

zoo.util.nest module¶

zoo.util.nest.flatten(seq)[source]¶

zoo.util.nest.is_sequence(s)[source]¶

zoo.util.nest.pack_sequence_as(structure, flat_sequence)[source]¶

zoo.util.nest.ptensor_to_numpy(seq)[source]¶

zoo.util.spark module¶

class zoo.util.spark.SparkRunner(spark_log_level='WARN', redirect_spark_log=True)[source]¶

Bases: object

create_sc(submit_args, conf)[source]¶

init_spark_on_k8s(master, container_image, num_executors, executor_cores, executor_memory='2g', driver_memory='1g', driver_cores=4, extra_executor_memory_for_ray=None, extra_python_lib=None, conf=None, jars=None, python_location=None)[source]¶

init_spark_on_local(cores, conf=None, python_location=None)[source]¶

init_spark_on_yarn(hadoop_conf, conda_name, num_executors, executor_cores, executor_memory='2g', driver_cores=4, driver_memory='1g', extra_executor_memory_for_ray=None, extra_python_lib=None, penv_archive=None, additional_archive=None, hadoop_user_name='root', spark_yarn_archive=None, conf=None, jars=None)[source]¶

init_spark_standalone(num_executors, executor_cores, executor_memory='2g', driver_cores=4, driver_memory='1g', master=None, extra_executor_memory_for_ray=None, extra_python_lib=None, conf=None, jars=None, python_location=None, enable_numa_binding=False)[source]¶

standalone_env = None¶

static stop_spark_standalone()[source]¶

zoo.util.spark.enrich_conf_for_spark(conf, driver_cores, driver_memory, num_executors, executor_cores, executor_memory, extra_executor_memory_for_ray=None)[source]¶

zoo.util.spark.gen_submit_args(driver_cores, driver_memory, num_executors, executor_cores, executor_memory, extra_python_lib=None, jars=None)[source]¶

zoo.util.tf module¶

zoo.util.tf.export_tf(sess, folder, inputs, outputs, generate_backward=False, allow_non_differentiable_input=True)[source]¶

Export the frozen TensorFlow graph, together with the inputs/outputs information, to the folder for inference.

This function will:

1. freeze the graph (replace all variables with constants)
2. strip all unused nodes, as specified by inputs and outputs
3. add placeholder nodes as needed
4. write the frozen graph and the inputs/outputs names to the folder

Note: There should not be any queuing operation between inputs and outputs.

Parameters:
    sess – TensorFlow session holding the variables to be saved
    folder – the folder where the graph file and inputs/outputs information are saved
    inputs – a list of TensorFlow tensors that will be fed during inference
    outputs – a list of TensorFlow tensors that will be fetched during inference
Returns:

zoo.util.tf.process_grad(grad)[source]¶

zoo.util.tf.strip_unused(input_graph_def, input_tensor_names, output_tensor_names, placeholder_type_enum)[source]¶

Removes unused nodes from a GraphDef.

Args:
    input_graph_def: A graph with nodes we want to prune.
    input_tensor_names: A list of the nodes we use as inputs.
    output_tensor_names: A list of the output nodes.
    placeholder_type_enum: The AttrValue enum for the placeholder data type, or a list that specifies one value per input node name.

Returns:
    A GraphDef with all unnecessary ops removed, and a map from the old input names to the new input names.

Raises:
    ValueError: If any element in input_node_names refers to a tensor instead of an operation.
    KeyError: If any element in input_node_names is not found in the graph.
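The pruning idea can be sketched on a toy graph without TensorFlow. Everything in the block below is an assumption made for illustration: the graph is a plain dict rather than a GraphDef, names are node names rather than tensor names, strip_unused_sketch is a hypothetical function, and the old-to-new input name map that the real function returns is omitted.

```python
def strip_unused_sketch(graph, input_names, output_names):
    """Keep only the nodes needed to compute the outputs from the inputs.

    `graph` maps a node name to {"op": str, "inputs": [node names]}.
    This toy model is not the zoo.util.tf implementation.
    """
    for name in list(input_names) + list(output_names):
        if name not in graph:
            raise KeyError("node %r is not in the graph" % name)

    inputs = set(input_names)
    pruned = {}
    stack = list(output_names)
    while stack:
        name = stack.pop()
        if name in pruned:
            continue
        if name in inputs:
            # An input becomes a placeholder, so nothing upstream of it is kept.
            pruned[name] = {"op": "Placeholder", "inputs": []}
            continue
        node = graph[name]
        pruned[name] = {"op": node["op"], "inputs": list(node["inputs"])}
        stack.extend(node["inputs"])
    return pruned


toy_graph = {
    "a": {"op": "Const", "inputs": []},
    "b": {"op": "Mul", "inputs": ["a"]},
    "c": {"op": "Relu", "inputs": ["b"]},
    "out": {"op": "Softmax", "inputs": ["c"]},
    "d": {"op": "Const", "inputs": []},  # never reaches "out"
}
pruned = strip_unused_sketch(toy_graph, ["b"], ["out"])
print(sorted(pruned))  # ['b', 'c', 'out']
```

Replacing each input with a placeholder before walking backwards from the outputs is what lets everything upstream of the inputs ("a" here) fall away along with unrelated nodes such as "d".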

zoo.util.tf_graph_util module¶

Helpers to manipulate a tensor graph in Python.

zoo.util.tf_graph_util.convert_variables_to_constants(sess, input_graph_def, output_node_names, variable_names_whitelist=None, variable_names_blacklist=None)[source]¶

Replaces all the variables in a graph with constants of the same values. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.convert_variables_to_constants

If you have a trained graph containing Variable ops, it can be convenient to convert them all to Const ops holding the same values. This makes it possible to describe the network fully with a single GraphDef file, and allows the removal of a lot of ops related to loading and saving the variables.

Args:
    sess: Active TensorFlow session containing the variables.
    input_graph_def: GraphDef object holding the network.
    output_node_names: List of name strings for the result nodes of the graph.
    variable_names_whitelist: The set of variable names to convert (by default, all variables are converted).
    variable_names_blacklist: The set of variable names to omit converting to constants.

Returns:
    GraphDef containing a simplified version of the original.
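The effect of the whitelist and blacklist arguments can be shown with a small TensorFlow-free sketch. The node dicts, the variable_values mapping (standing in for values read from the session), and the function name freeze_variables_sketch are all hypothetical; the real function also prunes the graph down to output_node_names, which is left out here.

```python
VARIABLE_OPS = ("Variable", "VariableV2")


def freeze_variables_sketch(nodes, variable_values, whitelist=None, blacklist=None):
    """Replace selected variable nodes with constant nodes holding their values.

    `nodes` is a list of {"name", "op", "inputs"} dicts and `variable_values`
    maps variable names to their current values. Illustration only.
    """
    frozen = []
    for node in nodes:
        name = node["name"]
        selected = (
            node["op"] in VARIABLE_OPS
            and (whitelist is None or name in whitelist)
            and (blacklist is None or name not in blacklist)
        )
        if selected:
            # The constant carries the value directly and needs no inputs.
            frozen.append({"name": name, "op": "Const", "inputs": [],
                           "value": variable_values[name]})
        else:
            frozen.append({"name": name, "op": node["op"],
                           "inputs": list(node["inputs"])})
    return frozen


toy_nodes = [
    {"name": "w", "op": "VariableV2", "inputs": []},
    {"name": "step", "op": "VariableV2", "inputs": []},
    {"name": "y", "op": "MatMul", "inputs": ["x", "w"]},
]
frozen = freeze_variables_sketch(
    toy_nodes, {"w": [1.0, 2.0], "step": 7}, blacklist={"step"}
)
print([(n["name"], n["op"]) for n in frozen])
# [('w', 'Const'), ('step', 'VariableV2'), ('y', 'MatMul')]
```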

zoo.util.tf_graph_util.extract_sub_graph(graph_def, dest_nodes)[source]¶

Extract the subgraph that can reach any of the nodes in ‘dest_nodes’. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.extract_sub_graph

Args:
    graph_def: A graph_pb2.GraphDef proto.
    dest_nodes: A list of strings specifying the destination node names.

Returns:
    The GraphDef of the sub-graph.

Raises:
    TypeError: If ‘graph_def’ is not a graph_pb2.GraphDef proto.

zoo.util.tf_graph_util.must_run_on_cpu(node, pin_variables_on_cpu=False)[source]¶

Returns True if the given node_def must run on CPU, otherwise False. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.must_run_on_cpu

Args:
    node: The node to be assigned to a device. Could be either an ops.Operation or NodeDef.
    pin_variables_on_cpu: If True, this function will return False if node_def represents a variable-related op.

Returns:
    True if the given node must run on CPU, otherwise False.

zoo.util.tf_graph_util.remove_training_nodes(input_graph, protected_nodes=None)[source]¶

Prunes out nodes that aren’t needed for inference. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.remove_training_nodes

There are nodes like Identity and CheckNumerics that are only useful during training, and can be removed in graphs that will be used for nothing but inference. Here we identify and remove them, returning an equivalent graph. To be specific, CheckNumerics nodes are always removed, and Identity nodes that aren’t involved in control edges are spliced out so that their input and outputs are directly connected.

Args:
    input_graph: Model to analyze and prune.
    protected_nodes: An optional list of names of nodes to be kept unconditionally. This is for example useful to preserve Identity output nodes.

Returns:
    A list of nodes with the unnecessary ones removed.
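The removal and splicing rules described above can be sketched on plain dicts. This is a simplified model, not the library's code: the function name remove_training_nodes_sketch is hypothetical, control inputs are written with a leading "^" as in GraphDef, and dropping every edge that pointed at a removed CheckNumerics node is an assumption of this sketch.

```python
def remove_training_nodes_sketch(nodes, protected_nodes=()):
    """Drop CheckNumerics nodes and splice out eligible Identity nodes.

    `nodes` is a list of {"name", "op", "inputs"} dicts. Illustration only.
    """
    protected = set(protected_nodes)

    # Step 1: remove CheckNumerics nodes and any edge that referenced them.
    removed = {n["name"] for n in nodes
               if n["op"] == "CheckNumerics" and n["name"] not in protected}
    kept = [
        {"name": n["name"], "op": n["op"],
         "inputs": [i for i in n["inputs"] if i.lstrip("^") not in removed]}
        for n in nodes if n["name"] not in removed
    ]

    # Step 2: record which nodes take part in a control edge, at either end.
    in_control_edge = set()
    for n in kept:
        for i in n["inputs"]:
            if i.startswith("^"):
                in_control_edge.update((n["name"], i[1:]))

    # Step 3: map each spliceable Identity node to the input it forwards.
    splice = {n["name"]: n["inputs"][0] for n in kept
              if n["op"] == "Identity" and n["inputs"]
              and n["name"] not in protected
              and n["name"] not in in_control_edge}

    result = []
    for n in kept:
        if n["name"] in splice:
            continue
        rewired = []
        for i in n["inputs"]:
            while i in splice:  # follow chains of spliced Identity nodes
                i = splice[i]
            rewired.append(i)
        result.append({"name": n["name"], "op": n["op"], "inputs": rewired})
    return result


toy_nodes = [
    {"name": "x", "op": "Placeholder", "inputs": []},
    {"name": "check", "op": "CheckNumerics", "inputs": ["x"]},
    {"name": "id1", "op": "Identity", "inputs": ["x"]},
    {"name": "y", "op": "Relu", "inputs": ["id1", "^check"]},
    {"name": "out", "op": "Identity", "inputs": ["y"]},
]
slim = remove_training_nodes_sketch(toy_nodes, protected_nodes=["out"])
print([(n["name"], n["inputs"]) for n in slim])
# [('x', []), ('y', ['x']), ('out', ['y'])]
```

In this example "out" survives only because it is listed in protected_nodes, which matches the use case the docstring mentions for preserving Identity output nodes.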

zoo.util.tf_graph_util.tensor_shape_from_node_def_name(graph, input_name)[source]¶

Convenience function to get a shape from a NodeDef’s input string. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.tensor_shape_from_node_def_name

zoo.util.utils module¶

zoo.util.utils.detect_python_location()[source]¶

zoo.util.utils.get_conda_python_path()[source]¶

zoo.util.utils.get_executor_conda_zoo_classpath(conda_path)[source]¶

zoo.util.utils.get_node_ip()[source]¶: Get the IP address of the current node. This function is ported from Ray; in settings where Ray is not involved, calling ray.services.get_node_ip_address would introduce Ray overhead.
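One common standard-library technique for finding the local IP address is to open a UDP socket towards a public address and read back the socket's own address. The sketch below shows that technique; it is an assumption that the ported function works along these lines, and the name get_node_ip_sketch, the probe_address parameter, and the loopback fallback are all hypothetical.

```python
import socket


def get_node_ip_sketch(probe_address="8.8.8.8"):
    """Return the IPv4 address of the interface used for outbound traffic.

    Illustration only. Falls back to the loopback address when no route
    to `probe_address` is available.
    """
    ip = "127.0.0.1"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS
        # choose the local interface that would be used for this destination.
        sock.connect((probe_address, 80))
        ip = sock.getsockname()[0]
    except OSError:
        pass  # no usable route, keep the loopback fallback
    finally:
        sock.close()
    return ip


node_ip = get_node_ip_sketch()
print(node_ip)
```

Because nothing is actually transmitted, the probe address does not need to be reachable; it only has to be routable from the host's point of view.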

zoo.util.utils.get_zoo_bigdl_classpath_on_driver()[source]¶

zoo.util.utils.pack_conda_main(conda_name, tmp_path)[source]¶

zoo.util.utils.pack_penv(conda_name, output_name)[source]¶

zoo.util.utils.set_python_home()[source]¶

zoo.util.utils.to_sample_rdd(x, y, sc, num_slices=None)[source]¶: Convert x and y into RDD[Sample].

:param sc: SparkContext
:param x: ndarray; the first dimension should be batch
:param y: ndarray; the first dimension should be batch
:param num_slices: The number of partitions for x and y.
:return: RDD[Sample]