SageMaker is a fully managed AWS service that covers the entire machine learning workflow. Using the AWS SageMaker demo, we illustrate the most important relationships, basics and functional principles.
For our experiment, we use the MNIST dataset as training data. The Modified National Institute of Standards and Technology (MNIST) database is a large database of handwritten digits that is commonly used to train image processing systems and is widely used for training and testing in machine learning (ML). It was created by "remixing" samples from the original NIST datasets. The creators felt that the NIST data was not directly suited to machine learning experiments because the training dataset came from American Census Bureau staff while the test dataset came from American students. In the MNIST database, NIST's black-and-white images were normalised to 28 x 28 pixels and anti-aliased, which introduced grayscale values.
The MNIST database of handwritten digits includes a training set of 60,000 examples and a test set of 10,000 examples, a subset of the NIST dataset. It is well suited for trying out learning techniques and pattern recognition methods on real data with minimal pre-processing and formatting.
In the following experiment, we work through the example listed here: https://github.com/prasanjit-/ml_notebooks/blob/master/kmeans_mnist.ipynb
The high-level steps are:
- Prepare training data
- Train a model
- Deploy & validate the model
- Use the result for predictions
Refer to the Jupyter notebook in the GitHub repo for detailed steps: https://github.com/prasanjit-/ml_notebooks/blob/master/MNISTDemo.ipynb
Below is a summary of the steps to create this training model on SageMaker:
1. Create an S3 bucket
Create an S3 bucket to hold the following -
a. The model training data
b. Model artifacts (which Amazon SageMaker generates during model training).
2. Create a Notebook instance
Create a Notebook instance by logging onto: https://console.aws.amazon.com/sagemaker/
3. Create a new conda_python3 notebook
Once created, open the notebook instance and you will be directed to Jupyter Server. At this point create a new conda_python3 notebook.
4. Specify the role
Specify the role and S3 bucket as follows:
from sagemaker import get_execution_role

role = get_execution_role()
bucket = 'bucket-name'
5. Download the MNIST dataset
Download the MNIST dataset to the notebook’s memory.
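The MNIST database of handwritten digits has a training set of 60,000 examples. The pickled copy downloaded here splits these into 50,000 training examples (train_set) and 10,000 validation examples (valid_set), alongside the 10,000 test examples (test_set).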
%%time
import pickle, gzip, numpy, urllib.request, json

# Load the dataset
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
6. Convert to RecordIO Format
For this example, the data needs to be converted to RecordIO format, a file format for storing a sequence of records. Each record is stored as an unsigned varint specifying the length of the data, followed by the data itself as a binary blob.
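To make the length-prefixed layout described above concrete, here is a minimal sketch in plain Python. It only illustrates the framing idea (a varint length followed by the raw bytes); it is not the writer that the SageMaker SDK uses, and the function names are made up for this illustration.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def write_records(records):
    """Concatenate records, each prefixed with its varint-encoded length."""
    buf = io.BytesIO()
    for rec in records:
        buf.write(encode_varint(len(rec)))
        buf.write(rec)
    return buf.getvalue()


def read_records(data: bytes):
    """Yield the original records back from a length-prefixed byte string."""
    pos = 0
    while pos < len(data):
        length, shift = 0, 0
        while True:  # decode the varint length
            byte = data[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        yield data[pos:pos + length]
        pos += length


# Hypothetical payloads standing in for serialized training examples.
records = [b"digit-0", b"", b"x" * 300]
framed = write_records(records)
decoded = list(read_records(framed))
```

Because every record carries its own length, a reader can stream through the file one record at a time without loading everything into memory.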
Algorithms can accept input data from one or more channels. For example, an algorithm might have two channels of input data, training_data and validation_data. The configuration for each channel provides the S3 location where the input data is stored. It also provides information about the stored data: the MIME type, compression method, and whether the data is wrapped in RecordIO format.
Depending on the input mode that the algorithm supports, Amazon SageMaker either copies the input data files from an S3 bucket to a local directory in the Docker container, or makes them available as input streams.
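As a rough illustration of what such a channel configuration looks like, the sketch below builds the two channels mentioned above as plain Python dictionaries, using the field names of the low-level CreateTrainingJob request (InputDataConfig). The helper name make_channel, the bucket name and the S3 prefixes are placeholders chosen for this example, and the values shown are one possible choice rather than what the high-level library necessarily sends.

```python
def make_channel(name, s3_uri, content_type,
                 compression="None", record_wrapper="None"):
    """Build one entry of InputDataConfig for a training job request."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",  # treat S3Uri as a key prefix
                "S3Uri": s3_uri,           # where the input data is stored
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": content_type,          # MIME type of the data
        "CompressionType": compression,       # "None" or "Gzip"
        "RecordWrapperType": record_wrapper,  # "None" or "RecordIO"
    }


bucket = "bucket-name"  # placeholder, as in the steps above
input_data_config = [
    make_channel("training_data",
                 "s3://{}/kmeans_highlevel_example/data/train".format(bucket),
                 "application/x-recordio-protobuf"),
    make_channel("validation_data",
                 "s3://{}/kmeans_highlevel_example/data/validation".format(bucket),
                 "application/x-recordio-protobuf"),
]

# The input mode ("File" to copy the data, "Pipe" to stream it) is set
# separately in the request, under AlgorithmSpecification.TrainingInputMode.
training_input_mode = "File"
```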
Manual transformation is not needed, since this example uses the fit method of Amazon SageMaker’s high-level library.
7. Create a training job
In this example we will use the Amazon SageMaker KMeans module.
From SageMaker, import KMeans as follows:
from sagemaker import KMeans

data_location = 's3://{}/kmeans_highlevel_example/data'.format(bucket)
output_location = 's3://{}/kmeans_example/output'.format(bucket)

print('training data will be uploaded to: {}'.format(data_location))
print('training artifacts will be uploaded to: {}'.format(output_location))

kmeans = KMeans(role=role,
                train_instance_count=2,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                data_location=data_location)
- role — The IAM role that Amazon SageMaker can assume to perform tasks on your behalf (for example, reading training results, called model artifacts, from the S3 bucket and writing training results to Amazon S3).
- output_path — The S3 location where Amazon SageMaker stores the training results.
- train_instance_count and train_instance_type — The type and number of ML EC2 compute instances to use for model training.
- k — The number of clusters to create. For more information, see K-Means Hyperparameters.
- data_location — The S3 location where the high-level library uploads the transformed training data.
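To give a feel for what the k parameter controls, here is a small sketch of the classic k-means loop (Lloyd's algorithm) in plain Python. This is not the SageMaker implementation, which runs a distributed, mini-batch variant on the training instances; the function names, the toy 2-D points and the explicitly supplied starting centroids are assumptions made so that the example stays small and reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm; k is the number of initial centroids."""
    centroids = list(initial_centroids)
    k = len(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids


# Two well-separated toy blobs; MNIST vectors would have 784 values each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids = kmeans(points, initial_centroids=[points[0], points[-1]])
print(centroids)  # [(0.5, 0.5), (10.5, 10.5)]
```

With k=10 on MNIST, the hope is that each of the ten centroids ends up representing one typical digit shape, although k-means itself never sees the labels.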
8. Start Model Training
%%time
kmeans.fit(kmeans.record_set(train_set[0]))
9. Deploy a Model
Deploying a model is a three-step process.
- Create a Model — CreateModel request is used to provide information such as the location of the S3 bucket that contains your model artifacts and the registry path of the image that contains inference code.
- Create an Endpoint Configuration — CreateEndpointConfig request is used to provide the resource configuration for hosting. This includes the type and number of ML compute instances to launch for deploying the model.
- Create an Endpoint — CreateEndpoint request is used to create an endpoint. Amazon SageMaker launches the ML compute instances and deploys the model.
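The three requests above can be sketched as plain Python dictionaries that follow the field names of the low-level SageMaker API. The helper name build_deploy_requests, the resource names and all angle-bracket values are placeholders for this illustration; nothing is sent to AWS here.

```python
def build_deploy_requests(name, image_uri, model_data_url, role_arn,
                          instance_type="ml.m4.xlarge", instance_count=1):
    """Return the CreateModel, CreateEndpointConfig and CreateEndpoint payloads."""
    model = {
        "ModelName": name,
        "PrimaryContainer": {
            "Image": image_uri,              # registry path of the inference image
            "ModelDataUrl": model_data_url,  # S3 location of the model artifacts
        },
        "ExecutionRoleArn": role_arn,
    }
    endpoint_config = {
        "EndpointConfigName": name + "-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,  # number of ML compute instances
            "InstanceType": instance_type,           # type of ML compute instance
        }],
    }
    endpoint = {
        "EndpointName": name + "-endpoint",
        "EndpointConfigName": endpoint_config["EndpointConfigName"],
    }
    return model, endpoint_config, endpoint


# All values below are hypothetical placeholders.
model, endpoint_config, endpoint = build_deploy_requests(
    name="kmeans-mnist-demo",
    image_uri="<registry-path-of-kmeans-inference-image>",
    model_data_url="s3://bucket-name/kmeans_example/output/<job-name>/output/model.tar.gz",
    role_arn="<execution-role-arn>",
)

# With boto3, these would be passed to the SageMaker client in order:
#   client.create_model(**model)
#   client.create_endpoint_config(**endpoint_config)
#   client.create_endpoint(**endpoint)
```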
The high-level Python library's deploy method performs all of these tasks.
%%time
kmeans_predictor = kmeans.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')
The sagemaker.amazon.kmeans.KMeans instance knows the registry path of the image that contains the k-means inference code, so you don’t need to provide it.
This is a synchronous operation. The method waits until the deployment completes before returning. It returns a kmeans_predictor.
10. Validate the Model
Here we get an inference for one handwritten digit, the image at index 30 of the valid_set dataset.
result = kmeans_predictor.predict(valid_set[0][30:31])
print(result)
The result shows the closest cluster and the distance to that cluster's center.
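Conceptually, the closest cluster and its distance come from comparing the input vector with every cluster center. The sketch below shows that computation locally in plain Python, assuming Euclidean distance; the function name and the tiny 2-D centroids are made up for illustration, whereas the deployed endpoint works on 784-value image vectors and the ten centers learned during training.

```python
import math


def closest_cluster(vector, centroids):
    """Return (index of the nearest centroid, Euclidean distance to it)."""
    distances = [
        math.sqrt(sum((v - c) ** 2 for v, c in zip(vector, centroid)))
        for centroid in centroids
    ]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


# Hypothetical cluster centers and query point.
centroids = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
index, distance = closest_cluster((3.0, 0.0), centroids)
print(index, distance)  # 0 3.0
```

Note that the cluster index is only a cluster number; it is not the digit shown in the image, because k-means is trained without labels.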
This video has a complete demonstration of this experiment.
Below is the set of commands that were executed and the results of the execution:
In [1]:
from sagemaker import get_execution_role
role = get_execution_role()
bucket = 'sagemaker-ps-01' # Use the name of your s3 bucket here
In [2]:
role
Out[2]:
'arn:aws:iam::779615490104:role/service-role/AmazonSageMaker-ExecutionRole-20191103T150143'
In [3]:
%%time
import pickle, gzip, numpy, urllib.request, json
# Load the dataset
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
CPU times: user 892 ms, sys: 278 ms, total: 1.17 s
Wall time: 4.6 s
In [6]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (2,10)
def show_digit(img, caption='', subplot=None):
    if subplot is None:
        _, (subplot) = plt.subplots(1,1)
    imgr = img.reshape((28,28))
    subplot.axis('off')
    subplot.imshow(imgr, cmap='gray')
    plt.title(caption)

show_digit(train_set[0][1], 'This is a {}'.format(train_set[1][1]))
In [7]:
from sagemaker import KMeans
data_location = 's3://{}/kmeans_highlevel_example/data'.format(bucket)
output_location = 's3://{}/kmeans_highlevel_example/output'.format(bucket)
print('training data will be uploaded to: {}'.format(data_location))
print('training artifacts will be uploaded to: {}'.format(output_location))
kmeans = KMeans(role=role,
                train_instance_count=2,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                epochs=100,
                data_location=data_location)
training data will be uploaded to: s3://sagemaker-ps-01/kmeans_highlevel_example/data
training artifacts will be uploaded to: s3://sagemaker-ps-01/kmeans_highlevel_example/output
In [8]:
%%time
kmeans.fit(kmeans.record_set(train_set[0]))
2019-11-03 11:45:01 Starting - Starting the training job... 2019-11-03 11:45:03 Starting - Launching requested ML instances...... 2019-11-03 11:46:02 Starting - Preparing the instances for training... 2019-11-03 11:46:43 Downloading - Downloading input data... 2019-11-03 11:47:26 Training - Training image download completed. Training in progress..Docker entrypoint called with argument(s): train [11/03/2019 11:47:28 INFO 140552810366784] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'_enable_profiler': u'false', u'_tuning_objective_metric': u'', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'_kvstore': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'true', u'epochs': u'1', u'init_method': u'random', u'local_lloyd_tol': u'0.0001', u'local_lloyd_max_iter': u'300', u'_disable_wait_to_read': u'false', u'extra_center_factor': u'auto', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'half_life_time_size': u'0', u'_num_slices': u'1'} [11/03/2019 11:47:28 INFO 140552810366784] Reading provided configuration from /opt/ml/input/config/hyperparameters.json: {u'epochs': u'100', u'feature_dim': u'784', u'k': u'10', u'force_dense': u'True'} [11/03/2019 11:47:28 INFO 140552810366784] Final configuration: {u'_tuning_objective_metric': u'', u'extra_center_factor': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'True', u'epochs': u'100', u'feature_dim': u'784', u'local_lloyd_tol': u'0.0001', u'_disable_wait_to_read': u'false', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'_enable_profiler': u'false', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'init_method': u'random', u'half_life_time_size': u'0', u'local_lloyd_max_iter': u'300', u'_kvstore': u'auto', u'k': u'10', u'_num_slices': u'1'} [11/03/2019 11:47:28 WARNING 
140552810366784] Loggers have already been setup. [11/03/2019 11:47:28 INFO 140552810366784] Environment: {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/dc163b99-1521-4ccb-ad30-92ce3ffc3cce', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_PS_ROOT_PORT': '9000', 'DMLC_NUM_WORKER': '2', 'SAGEMAKER_HTTP_PORT': '8080', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.229.182', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'HOME': '/root', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-208-60.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/4055df43-a805-42f4-8085-40b6d8b6ab74', 'DMLC_ROLE': 'worker', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'} Process 1 is a worker. [11/03/2019 11:47:28 INFO 140552810366784] Using default worker. 
[11/03/2019 11:47:28 INFO 140552810366784] Loaded iterator creator application/x-recordio-protobuf for content type ('application/x-recordio-protobuf', '1.0') [11/03/2019 11:47:28 INFO 140552810366784] Create Store: dist_async Docker entrypoint called with argument(s): train [11/03/2019 11:47:29 INFO 140169171593024] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'_enable_profiler': u'false', u'_tuning_objective_metric': u'', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'_kvstore': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'true', u'epochs': u'1', u'init_method': u'random', u'local_lloyd_tol': u'0.0001', u'local_lloyd_max_iter': u'300', u'_disable_wait_to_read': u'false', u'extra_center_factor': u'auto', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'half_life_time_size': u'0', u'_num_slices': u'1'} [11/03/2019 11:47:29 INFO 140169171593024] Reading provided configuration from /opt/ml/input/config/hyperparameters.json: {u'epochs': u'100', u'feature_dim': u'784', u'k': u'10', u'force_dense': u'True'} [11/03/2019 11:47:29 INFO 140169171593024] Final configuration: {u'_tuning_objective_metric': u'', u'extra_center_factor': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'True', u'epochs': u'100', u'feature_dim': u'784', u'local_lloyd_tol': u'0.0001', u'_disable_wait_to_read': u'false', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'_enable_profiler': u'false', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'init_method': u'random', u'half_life_time_size': u'0', u'local_lloyd_max_iter': u'300', u'_kvstore': u'auto', u'k': u'10', u'_num_slices': u'1'} [11/03/2019 11:47:29 WARNING 140169171593024] Loggers have already been setup. 
[11/03/2019 11:47:29 INFO 140169171593024] Launching parameter server for role scheduler [11/03/2019 11:47:29 INFO 140169171593024] {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'PWD': '/', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'} [11/03/2019 11:47:29 INFO 140169171593024] envs={'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_NUM_WORKER': '2', 'DMLC_PS_ROOT_PORT': '9000', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': 
'/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.229.182', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'DMLC_ROLE': 'scheduler', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'} [11/03/2019 11:47:29 INFO 140169171593024] Launching parameter server for role server [11/03/2019 11:47:29 INFO 140169171593024] {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 
'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'PWD': '/', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'} [11/03/2019 11:47:29 INFO 140169171593024] envs={'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_NUM_WORKER': '2', 'DMLC_PS_ROOT_PORT': '9000', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.229.182', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'DMLC_ROLE': 'server', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'} [11/03/2019 11:47:29 INFO 140169171593024] Environment: {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/86d7c856-2158-4dd0-a0f9-7e34716c8d05', 
'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_PS_ROOT_PORT': '9000', 'DMLC_NUM_WORKER': '2', 'SAGEMAKER_HTTP_PORT': '8080', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'MXNET_KVSTORE_BIGARRAY_BOUND': '400000000', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.229.182', 'AWS_REGION': 'eu-west-1', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'kmeans-2019-11-03-11-45-00-997', 'HOME': '/root', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '18', 'HOSTNAME': 'ip-10-0-229-182.eu-west-1.compute.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/05ea5ad8-333f-415c-981c-e8b507b70f15', 'DMLC_ROLE': 'worker', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:eu-west-1:779615490104:training-job/kmeans-2019-11-03-11-45-00-997', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'} Process 109 is a shell:scheduler. Process 118 is a shell:server. Process 1 is a worker. [11/03/2019 11:47:29 INFO 140169171593024] Using default worker. 
[11/03/2019 11:47:29 INFO 140169171593024] Loaded iterator creator application/x-recordio-protobuf for content type ('application/x-recordio-protobuf', '1.0') [11/03/2019 11:47:29 INFO 140169171593024] Create Store: dist_async [11/03/2019 11:47:30 INFO 140552810366784] nvidia-smi took: 0.0252320766449 secs to identify 0 gpus [11/03/2019 11:47:30 INFO 140552810366784] Number of GPUs being used: 0 [11/03/2019 11:47:30 INFO 140552810366784] Setting up with params: {u'_tuning_objective_metric': u'', u'extra_center_factor': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'True', u'epochs': u'100', u'feature_dim': u'784', u'local_lloyd_tol': u'0.0001', u'_disable_wait_to_read': u'false', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'_enable_profiler': u'false', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'init_method': u'random', u'half_life_time_size': u'0', u'local_lloyd_max_iter': u'300', u'_kvstore': u'auto', u'k': u'10', u'_num_slices': u'1'} [11/03/2019 11:47:30 INFO 140552810366784] 'extra_center_factor' was set to 'auto', evaluated to 10. 
[11/03/2019 11:47:30 INFO 140552810366784] Number of GPUs being used: 0 [11/03/2019 11:47:30 INFO 140552810366784] number of center slices 1 #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Batches Since Last Reset": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Records Since Last Reset": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Total Batches Seen": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Total Records Seen": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Max Records Seen Between Resets": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Reset Count": {"count": 1, "max": 0, "sum": 0.0, "min": 0}}, "EndTime": 1572781650.394244, "Dimensions": {"Host": "algo-2", "Meta": "init_train_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781650.394209} [2019-11-03 11:47:30.417] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 0, "duration": 87, "num_examples": 1, "num_bytes": 15820000} [2019-11-03 11:47:30.596] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 1, "duration": 178, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:30 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:30 INFO 140552810366784] #progress_metric: host=algo-2, completed 1 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Total Records Seen": {"count": 1, "max": 30000, "sum": 30000.0, "min": 30000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 1, "sum": 
1.0, "min": 1}}, "EndTime": 1572781650.596894, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 0}, "StartTime": 1572781650.417194} [11/03/2019 11:47:30 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=139020.312597 records/second [11/03/2019 11:47:30 INFO 140169171593024] nvidia-smi took: 0.025279045105 secs to identify 0 gpus [11/03/2019 11:47:30 INFO 140169171593024] Number of GPUs being used: 0 [11/03/2019 11:47:30 INFO 140169171593024] Setting up with params: {u'_tuning_objective_metric': u'', u'extra_center_factor': u'auto', u'local_lloyd_init_method': u'kmeans++', u'force_dense': u'True', u'epochs': u'100', u'feature_dim': u'784', u'local_lloyd_tol': u'0.0001', u'_disable_wait_to_read': u'false', u'eval_metrics': u'["msd"]', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'_enable_profiler': u'false', u'_num_gpus': u'auto', u'local_lloyd_num_trials': u'auto', u'_log_level': u'info', u'init_method': u'random', u'half_life_time_size': u'0', u'local_lloyd_max_iter': u'300', u'_kvstore': u'auto', u'k': u'10', u'_num_slices': u'1'} [11/03/2019 11:47:30 INFO 140169171593024] 'extra_center_factor' was set to 'auto', evaluated to 10. 
[11/03/2019 11:47:30 INFO 140169171593024] Number of GPUs being used: 0 [11/03/2019 11:47:30 INFO 140169171593024] number of center slices 1 #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Batches Since Last Reset": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Records Since Last Reset": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Total Batches Seen": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Total Records Seen": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Max Records Seen Between Resets": {"count": 1, "max": 5000, "sum": 5000.0, "min": 5000}, "Reset Count": {"count": 1, "max": 0, "sum": 0.0, "min": 0}}, "EndTime": 1572781650.390149, "Dimensions": {"Host": "algo-1", "Meta": "init_train_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781650.390114} [2019-11-03 11:47:30.413] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 0, "duration": 88, "num_examples": 1, "num_bytes": 15820000} [2019-11-03 11:47:30.610] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 1, "duration": 196, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:30 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:30 INFO 140169171593024] #progress_metric: host=algo-1, completed 1 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Total Records Seen": {"count": 1, "max": 30000, "sum": 30000.0, "min": 30000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 1, "sum": 
1.0, "min": 1}}, "EndTime": 1572781650.611, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 0}, "StartTime": 1572781650.413488} [11/03/2019 11:47:30 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=126474.646596 records/second [2019-11-03 11:47:30.732] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 3, "duration": 120, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:30 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:30 INFO 140169171593024] #progress_metric: host=algo-1, completed 2 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 11, "sum": 11.0, "min": 11}, "Total Records Seen": {"count": 1, "max": 55000, "sum": 55000.0, "min": 55000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 2, "sum": 2.0, "min": 2}}, "EndTime": 1572781650.732486, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 1}, "StartTime": 1572781650.611256} [11/03/2019 11:47:30 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=206017.191414 records/second [2019-11-03 11:47:30.853] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 5, "duration": 120, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:30 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:30 INFO 140169171593024] #progress_metric: host=algo-1, completed 3 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": 
{"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 16, "sum": 16.0, "min": 16}, "Total Records Seen": {"count": 1, "max": 80000, "sum": 80000.0, "min": 80000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 3, "sum": 3.0, "min": 3}}, "EndTime": 1572781650.854186, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 2}, "StartTime": 1572781650.732736} [11/03/2019 11:47:30 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=205620.877095 records/second [2019-11-03 11:47:30.962] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 7, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:30 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:30 INFO 140169171593024] #progress_metric: host=algo-1, completed 4 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 21, "sum": 21.0, "min": 21}, "Total Records Seen": {"count": 1, "max": 105000, "sum": 105000.0, "min": 105000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 4, "sum": 4.0, "min": 4}}, "EndTime": 1572781650.96329, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 3}, "StartTime": 1572781650.856089} 
[11/03/2019 11:47:30 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=232952.697479 records/second [2019-11-03 11:47:31.061] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 9, "duration": 97, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 5 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 26, "sum": 26.0, "min": 26}, "Total Records Seen": {"count": 1, "max": 130000, "sum": 130000.0, "min": 130000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 5, "sum": 5.0, "min": 5}}, "EndTime": 1572781651.061609, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 4}, "StartTime": 1572781650.963495} [11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=254481.560222 records/second [2019-11-03 11:47:31.176] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 11, "duration": 114, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 6 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 
25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 31, "sum": 31.0, "min": 31}, "Total Records Seen": {"count": 1, "max": 155000, "sum": 155000.0, "min": 155000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 6, "sum": 6.0, "min": 6}}, "EndTime": 1572781651.177087, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 5}, "StartTime": 1572781651.061859} [11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=216692.257301 records/second [2019-11-03 11:47:30.711] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 3, "duration": 114, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:30 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:30 INFO 140552810366784] #progress_metric: host=algo-2, completed 2 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 11, "sum": 11.0, "min": 11}, "Total Records Seen": {"count": 1, "max": 55000, "sum": 55000.0, "min": 55000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 2, "sum": 2.0, "min": 2}}, "EndTime": 1572781650.712005, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 1}, "StartTime": 1572781650.597101} [11/03/2019 11:47:30 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=217345.775486 records/second [2019-11-03 11:47:30.825] [tensorio] [info] epoch_stats={"data_pipeline": 
"/opt/ml/input/data/train", "epoch": 5, "duration": 113, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:30 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:30 INFO 140552810366784] #progress_metric: host=algo-2, completed 3 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 16, "sum": 16.0, "min": 16}, "Total Records Seen": {"count": 1, "max": 80000, "sum": 80000.0, "min": 80000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 3, "sum": 3.0, "min": 3}}, "EndTime": 1572781650.826047, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 2}, "StartTime": 1572781650.712255} [11/03/2019 11:47:30 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=219428.877551 records/second [2019-11-03 11:47:30.942] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 7, "duration": 114, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:30 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:30 INFO 140552810366784] #progress_metric: host=algo-2, completed 4 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 21, "sum": 21.0, "min": 21}, "Total Records Seen": {"count": 1, "max": 105000, "sum": 105000.0, "min": 105000}, "Max Records Seen Between 
Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 4, "sum": 4.0, "min": 4}}, "EndTime": 1572781650.943047, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 3}, "StartTime": 1572781650.826549} [11/03/2019 11:47:30 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=214336.728541 records/second [2019-11-03 11:47:31.046] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 9, "duration": 102, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 5 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 26, "sum": 26.0, "min": 26}, "Total Records Seen": {"count": 1, "max": 130000, "sum": 130000.0, "min": 130000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 5, "sum": 5.0, "min": 5}}, "EndTime": 1572781651.046523, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 4}, "StartTime": 1572781650.943299} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=241870.421288 records/second [2019-11-03 11:47:31.143] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 11, "duration": 96, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 
140552810366784] #progress_metric: host=algo-2, completed 6 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 31, "sum": 31.0, "min": 31}, "Total Records Seen": {"count": 1, "max": 155000, "sum": 155000.0, "min": 155000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 6, "sum": 6.0, "min": 6}}, "EndTime": 1572781651.144019, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 5}, "StartTime": 1572781651.046998} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=257288.957374 records/second [2019-11-03 11:47:31.244] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 13, "duration": 99, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 7 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 36, "sum": 36.0, "min": 36}, "Total Records Seen": {"count": 1, "max": 180000, "sum": 180000.0, "min": 180000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 7, "sum": 7.0, "min": 7}}, "EndTime": 1572781651.244924, "Dimensions": {"Host": "algo-2", "Meta": 
"training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 6}, "StartTime": 1572781651.144272} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=248028.100718 records/second [2019-11-03 11:47:31.344] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 15, "duration": 99, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 8 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 41, "sum": 41.0, "min": 41}, "Total Records Seen": {"count": 1, "max": 205000, "sum": 205000.0, "min": 205000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 8, "sum": 8.0, "min": 8}}, "EndTime": 1572781651.345334, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 7}, "StartTime": 1572781651.245178} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=249264.503124 records/second [2019-11-03 11:47:31.437] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 17, "duration": 91, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 9 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last 
Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 46, "sum": 46.0, "min": 46}, "Total Records Seen": {"count": 1, "max": 230000, "sum": 230000.0, "min": 230000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 9, "sum": 9.0, "min": 9}}, "EndTime": 1572781651.437796, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 8}, "StartTime": 1572781651.345584} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=270515.787328 records/second [2019-11-03 11:47:31.544] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 19, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 10 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 51, "sum": 51.0, "min": 51}, "Total Records Seen": {"count": 1, "max": 255000, "sum": 255000.0, "min": 255000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 10, "sum": 10.0, "min": 10}}, "EndTime": 1572781651.54472, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 9}, "StartTime": 1572781651.438118} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, 
train throughput=234205.612486 records/second [2019-11-03 11:47:31.299] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 13, "duration": 120, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 7 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 36, "sum": 36.0, "min": 36}, "Total Records Seen": {"count": 1, "max": 180000, "sum": 180000.0, "min": 180000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 7, "sum": 7.0, "min": 7}}, "EndTime": 1572781651.300212, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 6}, "StartTime": 1572781651.179075} [11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=206112.356017 records/second [2019-11-03 11:47:31.417] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 15, "duration": 117, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 8 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 41, "sum": 
41.0, "min": 41}, "Total Records Seen": {"count": 1, "max": 205000, "sum": 205000.0, "min": 205000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 8, "sum": 8.0, "min": 8}}, "EndTime": 1572781651.418261, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 7}, "StartTime": 1572781651.300484} [11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=212013.425533 records/second [2019-11-03 11:47:31.537] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 17, "duration": 118, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 9 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 46, "sum": 46.0, "min": 46}, "Total Records Seen": {"count": 1, "max": 230000, "sum": 230000.0, "min": 230000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 9, "sum": 9.0, "min": 9}}, "EndTime": 1572781651.537545, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 8}, "StartTime": 1572781651.41851} [11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=209763.02597 records/second [2019-11-03 11:47:31.659] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 19, "duration": 119, "num_examples": 5, 
"num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 10 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 51, "sum": 51.0, "min": 51}, "Total Records Seen": {"count": 1, "max": 255000, "sum": 255000.0, "min": 255000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 10, "sum": 10.0, "min": 10}}, "EndTime": 1572781651.659652, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 9}, "StartTime": 1572781651.539169} [11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=207255.491823 records/second [2019-11-03 11:47:31.766] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 21, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 11 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 56, "sum": 56.0, "min": 56}, "Total Records Seen": {"count": 1, "max": 280000, "sum": 280000.0, "min": 280000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, 
"Reset Count": {"count": 1, "max": 11, "sum": 11.0, "min": 11}}, "EndTime": 1572781651.766884, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 10}, "StartTime": 1572781651.66} [11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=233601.411532 records/second [2019-11-03 11:47:31.880] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 23, "duration": 113, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140169171593024] #progress_metric: host=algo-1, completed 12 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 61, "sum": 61.0, "min": 61}, "Total Records Seen": {"count": 1, "max": 305000, "sum": 305000.0, "min": 305000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 12, "sum": 12.0, "min": 12}}, "EndTime": 1572781651.882341, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 11}, "StartTime": 1572781651.767134} [11/03/2019 11:47:31 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=216771.099282 records/second [2019-11-03 11:47:32.005] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 25, "duration": 122, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 13 % of 
epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 66, "sum": 66.0, "min": 66}, "Total Records Seen": {"count": 1, "max": 330000, "sum": 330000.0, "min": 330000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 13, "sum": 13.0, "min": 13}}, "EndTime": 1572781652.006303, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 12}, "StartTime": 1572781651.882572} [11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=201845.642106 records/second [2019-11-03 11:47:32.126] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 27, "duration": 120, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 14 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 71, "sum": 71.0, "min": 71}, "Total Records Seen": {"count": 1, "max": 355000, "sum": 355000.0, "min": 355000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 14, "sum": 14.0, "min": 14}}, "EndTime": 1572781652.12742, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": 
"AWS/KMeansWebscale", "epoch": 13}, "StartTime": 1572781652.006544} [11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=206517.890027 records/second [2019-11-03 11:47:31.671] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 21, "duration": 126, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 11 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 56, "sum": 56.0, "min": 56}, "Total Records Seen": {"count": 1, "max": 280000, "sum": 280000.0, "min": 280000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 11, "sum": 11.0, "min": 11}}, "EndTime": 1572781651.671943, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 10}, "StartTime": 1572781651.544972} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=196678.558433 records/second [2019-11-03 11:47:31.777] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 23, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 12 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, 
"Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 61, "sum": 61.0, "min": 61}, "Total Records Seen": {"count": 1, "max": 305000, "sum": 305000.0, "min": 305000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 12, "sum": 12.0, "min": 12}}, "EndTime": 1572781651.778138, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 11}, "StartTime": 1572781651.672195} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=235702.324034 records/second [2019-11-03 11:47:31.885] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 25, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 13 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 66, "sum": 66.0, "min": 66}, "Total Records Seen": {"count": 1, "max": 330000, "sum": 330000.0, "min": 330000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 13, "sum": 13.0, "min": 13}}, "EndTime": 1572781651.885934, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 12}, "StartTime": 1572781651.778343} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=232080.829543 records/second 
[2019-11-03 11:47:31.995] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 27, "duration": 107, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:31 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:31 INFO 140552810366784] #progress_metric: host=algo-2, completed 14 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 71, "sum": 71.0, "min": 71}, "Total Records Seen": {"count": 1, "max": 355000, "sum": 355000.0, "min": 355000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 14, "sum": 14.0, "min": 14}}, "EndTime": 1572781651.996102, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 13}, "StartTime": 1572781651.887734} [11/03/2019 11:47:31 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=230376.771092 records/second [2019-11-03 11:47:32.102] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 29, "duration": 103, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 15 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 76, "sum": 76.0, "min": 76}, "Total Records Seen": 
{"count": 1, "max": 380000, "sum": 380000.0, "min": 380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 15, "sum": 15.0, "min": 15}}, "EndTime": 1572781652.102514, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 14}, "StartTime": 1572781651.998441} [11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=239888.906428 records/second [2019-11-03 11:47:32.208] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 31, "duration": 104, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 16 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 81, "sum": 81.0, "min": 81}, "Total Records Seen": {"count": 1, "max": 405000, "sum": 405000.0, "min": 405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 16, "sum": 16.0, "min": 16}}, "EndTime": 1572781652.209094, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 15}, "StartTime": 1572781652.10273} [11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=234803.482498 records/second [2019-11-03 11:47:32.323] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 33, "duration": 112, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 
11:47:32 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 17 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 86, "sum": 86.0, "min": 86}, "Total Records Seen": {"count": 1, "max": 430000, "sum": 430000.0, "min": 430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 17, "sum": 17.0, "min": 17}}, "EndTime": 1572781652.324047, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 16}, "StartTime": 1572781652.210925} [11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=220762.137353 records/second [2019-11-03 11:47:32.416] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 35, "duration": 90, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 18 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 91, "sum": 91.0, "min": 91}, "Total Records Seen": {"count": 1, "max": 455000, "sum": 455000.0, "min": 455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 
18, "sum": 18.0, "min": 18}}, "EndTime": 1572781652.417163, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 17}, "StartTime": 1572781652.325707} [11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=272922.387384 records/second [2019-11-03 11:47:32.526] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 37, "duration": 107, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 19 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 96, "sum": 96.0, "min": 96}, "Total Records Seen": {"count": 1, "max": 480000, "sum": 480000.0, "min": 480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 19, "sum": 19.0, "min": 19}}, "EndTime": 1572781652.527196, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 18}, "StartTime": 1572781652.417384} [11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=227432.165709 records/second [2019-11-03 11:47:32.626] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 39, "duration": 97, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 20 % of epochs #metrics {"Metrics": {"Max 
Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 101, "sum": 101.0, "min": 101}, "Total Records Seen": {"count": 1, "max": 505000, "sum": 505000.0, "min": 505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 20, "sum": 20.0, "min": 20}}, "EndTime": 1572781652.62669, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 19}, "StartTime": 1572781652.528697} [11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=254831.607036 records/second [2019-11-03 11:47:32.248] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 29, "duration": 120, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 15 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 76, "sum": 76.0, "min": 76}, "Total Records Seen": {"count": 1, "max": 380000, "sum": 380000.0, "min": 380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 15, "sum": 15.0, "min": 15}}, "EndTime": 1572781652.249401, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 
14}, "StartTime": 1572781652.127715} [11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=205183.516847 records/second [2019-11-03 11:47:32.377] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 31, "duration": 125, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 16 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 81, "sum": 81.0, "min": 81}, "Total Records Seen": {"count": 1, "max": 405000, "sum": 405000.0, "min": 405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 16, "sum": 16.0, "min": 16}}, "EndTime": 1572781652.378822, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 15}, "StartTime": 1572781652.2497} [11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=193302.289226 records/second [2019-11-03 11:47:32.496] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 33, "duration": 116, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 17 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last 
Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 86, "sum": 86.0, "min": 86}, "Total Records Seen": {"count": 1, "max": 430000, "sum": 430000.0, "min": 430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 17, "sum": 17.0, "min": 17}}, "EndTime": 1572781652.496576, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 16}, "StartTime": 1572781652.379179} [11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=212693.332035 records/second [2019-11-03 11:47:32.615] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 35, "duration": 118, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 18 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 91, "sum": 91.0, "min": 91}, "Total Records Seen": {"count": 1, "max": 455000, "sum": 455000.0, "min": 455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 18, "sum": 18.0, "min": 18}}, "EndTime": 1572781652.616174, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 17}, "StartTime": 1572781652.496823} [11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=209236.884482 records/second [2019-11-03 11:47:32.737] 
[tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 37, "duration": 119, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 19 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 96, "sum": 96.0, "min": 96}, "Total Records Seen": {"count": 1, "max": 480000, "sum": 480000.0, "min": 480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 19, "sum": 19.0, "min": 19}}, "EndTime": 1572781652.738183, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 18}, "StartTime": 1572781652.616413} [11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=205061.132536 records/second [2019-11-03 11:47:32.857] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 39, "duration": 119, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 20 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 101, "sum": 101.0, "min": 101}, "Total Records Seen": {"count": 1, "max": 
505000, "sum": 505000.0, "min": 505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 20, "sum": 20.0, "min": 20}}, "EndTime": 1572781652.858275, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 19}, "StartTime": 1572781652.738444} [11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=208381.972142 records/second [2019-11-03 11:47:32.966] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 41, "duration": 107, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140169171593024] #progress_metric: host=algo-1, completed 21 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 106, "sum": 106.0, "min": 106}, "Total Records Seen": {"count": 1, "max": 530000, "sum": 530000.0, "min": 530000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 21, "sum": 21.0, "min": 21}}, "EndTime": 1572781652.966966, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 20}, "StartTime": 1572781652.858526} [11/03/2019 11:47:32 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=229766.962848 records/second [2019-11-03 11:47:33.074] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 43, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 
140169171593024] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 22 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 111, "sum": 111.0, "min": 111}, "Total Records Seen": {"count": 1, "max": 555000, "sum": 555000.0, "min": 555000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 22, "sum": 22.0, "min": 22}}, "EndTime": 1572781653.075226, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 21}, "StartTime": 1572781652.967602} [11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=231852.986895 records/second [2019-11-03 11:47:33.183] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 45, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 23 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 116, "sum": 116.0, "min": 116}, "Total Records Seen": {"count": 1, "max": 580000, "sum": 580000.0, "min": 580000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 23, 
"sum": 23.0, "min": 23}}, "EndTime": 1572781653.183966, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 22}, "StartTime": 1572781653.075554} [11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=230295.815882 records/second [2019-11-03 11:47:32.733] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 41, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 21 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 106, "sum": 106.0, "min": 106}, "Total Records Seen": {"count": 1, "max": 530000, "sum": 530000.0, "min": 530000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 21, "sum": 21.0, "min": 21}}, "EndTime": 1572781652.733428, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 20}, "StartTime": 1572781652.626892} [11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=234418.188728 records/second [2019-11-03 11:47:32.852] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 43, "duration": 118, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 22 % of epochs #metrics {"Metrics": {"Max 
Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 111, "sum": 111.0, "min": 111}, "Total Records Seen": {"count": 1, "max": 555000, "sum": 555000.0, "min": 555000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 22, "sum": 22.0, "min": 22}}, "EndTime": 1572781652.85269, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 21}, "StartTime": 1572781652.73363} [11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=209777.294079 records/second [2019-11-03 11:47:32.963] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 45, "duration": 110, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:32 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:32 INFO 140552810366784] #progress_metric: host=algo-2, completed 23 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 116, "sum": 116.0, "min": 116}, "Total Records Seen": {"count": 1, "max": 580000, "sum": 580000.0, "min": 580000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 23, "sum": 23.0, "min": 23}}, "EndTime": 1572781652.963672, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", 
"epoch": 22}, "StartTime": 1572781652.852899} [11/03/2019 11:47:32 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=225459.002118 records/second [2019-11-03 11:47:33.079] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 47, "duration": 114, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 24 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 121, "sum": 121.0, "min": 121}, "Total Records Seen": {"count": 1, "max": 605000, "sum": 605000.0, "min": 605000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 24, "sum": 24.0, "min": 24}}, "EndTime": 1572781653.080286, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 23}, "StartTime": 1572781652.963875} [11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=214553.817697 records/second [2019-11-03 11:47:33.194] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 49, "duration": 113, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 25 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records 
Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 126, "sum": 126.0, "min": 126}, "Total Records Seen": {"count": 1, "max": 630000, "sum": 630000.0, "min": 630000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 25, "sum": 25.0, "min": 25}}, "EndTime": 1572781653.194447, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 24}, "StartTime": 1572781653.080736} [11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=219639.386018 records/second [2019-11-03 11:47:33.304] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 51, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 26 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 131, "sum": 131.0, "min": 131}, "Total Records Seen": {"count": 1, "max": 655000, "sum": 655000.0, "min": 655000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 26, "sum": 26.0, "min": 26}}, "EndTime": 1572781653.304679, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 25}, "StartTime": 1572781653.194648} [11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=226936.503505 records/second [2019-11-03 
11:47:33.408] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 53, "duration": 103, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 27 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 136, "sum": 136.0, "min": 136}, "Total Records Seen": {"count": 1, "max": 680000, "sum": 680000.0, "min": 680000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 27, "sum": 27.0, "min": 27}}, "EndTime": 1572781653.408885, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 26}, "StartTime": 1572781653.305177} [11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=240719.926538 records/second [2019-11-03 11:47:33.506] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 55, "duration": 97, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 28 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 141, "sum": 141.0, "min": 141}, "Total Records Seen": {"count": 
1, "max": 705000, "sum": 705000.0, "min": 705000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 28, "sum": 28.0, "min": 28}}, "EndTime": 1572781653.507087, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 27}, "StartTime": 1572781653.409142} [11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=254881.161309 records/second [2019-11-03 11:47:33.616] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 57, "duration": 108, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 29 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 146, "sum": 146.0, "min": 146}, "Total Records Seen": {"count": 1, "max": 730000, "sum": 730000.0, "min": 730000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 29, "sum": 29.0, "min": 29}}, "EndTime": 1572781653.616641, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 28}, "StartTime": 1572781653.507342} [11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=228445.939469 records/second [2019-11-03 11:47:33.285] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 47, "duration": 100, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 
INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 24 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 121, "sum": 121.0, "min": 121}, "Total Records Seen": {"count": 1, "max": 605000, "sum": 605000.0, "min": 605000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 24, "sum": 24.0, "min": 24}}, "EndTime": 1572781653.285798, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 23}, "StartTime": 1572781653.184255} [11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=245898.125588 records/second [2019-11-03 11:47:33.395] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 49, "duration": 107, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 25 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 126, "sum": 126.0, "min": 126}, "Total Records Seen": {"count": 1, "max": 630000, "sum": 630000.0, "min": 630000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 25, 
"sum": 25.0, "min": 25}}, "EndTime": 1572781653.395731, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 24}, "StartTime": 1572781653.287535} [11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=230465.887587 records/second [2019-11-03 11:47:33.507] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 51, "duration": 111, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 26 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 131, "sum": 131.0, "min": 131}, "Total Records Seen": {"count": 1, "max": 655000, "sum": 655000.0, "min": 655000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 26, "sum": 26.0, "min": 26}}, "EndTime": 1572781653.507964, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 25}, "StartTime": 1572781653.396171} [11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=223359.803688 records/second [2019-11-03 11:47:33.614] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 53, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 27 % of epochs #metrics {"Metrics": {"Max 
Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 136, "sum": 136.0, "min": 136}, "Total Records Seen": {"count": 1, "max": 680000, "sum": 680000.0, "min": 680000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 27, "sum": 27.0, "min": 27}}, "EndTime": 1572781653.615178, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 26}, "StartTime": 1572781653.508207} [11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=233372.133136 records/second [2019-11-03 11:47:33.734] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 55, "duration": 119, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 28 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 141, "sum": 141.0, "min": 141}, "Total Records Seen": {"count": 1, "max": 705000, "sum": 705000.0, "min": 705000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 28, "sum": 28.0, "min": 28}}, "EndTime": 1572781653.735415, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", 
"epoch": 27}, "StartTime": 1572781653.615447} [11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=208149.086275 records/second [2019-11-03 11:47:33.843] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 57, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 29 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 146, "sum": 146.0, "min": 146}, "Total Records Seen": {"count": 1, "max": 730000, "sum": 730000.0, "min": 730000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 29, "sum": 29.0, "min": 29}}, "EndTime": 1572781653.843744, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 28}, "StartTime": 1572781653.737312} [11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=234438.104777 records/second [2019-11-03 11:47:33.945] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 59, "duration": 100, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140169171593024] #progress_metric: host=algo-1, completed 30 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records 
Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 151, "sum": 151.0, "min": 151}, "Total Records Seen": {"count": 1, "max": 755000, "sum": 755000.0, "min": 755000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 30, "sum": 30.0, "min": 30}}, "EndTime": 1572781653.946883, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 29}, "StartTime": 1572781653.845438} [11/03/2019 11:47:33 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=246162.514174 records/second [2019-11-03 11:47:34.066] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 61, "duration": 119, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140169171593024] #progress_metric: host=algo-1, completed 31 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 156, "sum": 156.0, "min": 156}, "Total Records Seen": {"count": 1, "max": 780000, "sum": 780000.0, "min": 780000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 31, "sum": 31.0, "min": 31}}, "EndTime": 1572781654.067035, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 30}, "StartTime": 1572781653.94709} [11/03/2019 11:47:34 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=208220.592586 records/second [2019-11-03 
11:47:34.171] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 63, "duration": 102, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140169171593024] #progress_metric: host=algo-1, completed 32 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 161, "sum": 161.0, "min": 161}, "Total Records Seen": {"count": 1, "max": 805000, "sum": 805000.0, "min": 805000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 32, "sum": 32.0, "min": 32}}, "EndTime": 1572781654.171523, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 31}, "StartTime": 1572781654.068661} [11/03/2019 11:47:34 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=242749.526574 records/second [2019-11-03 11:47:33.717] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 59, "duration": 100, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 30 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 151, "sum": 151.0, "min": 151}, "Total Records Seen": {"count": 
1, "max": 755000, "sum": 755000.0, "min": 755000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 30, "sum": 30.0, "min": 30}}, "EndTime": 1572781653.71781, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 29}, "StartTime": 1572781653.616888} [11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=247224.029801 records/second [2019-11-03 11:47:33.821] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 61, "duration": 103, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 31 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 156, "sum": 156.0, "min": 156}, "Total Records Seen": {"count": 1, "max": 780000, "sum": 780000.0, "min": 780000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 31, "sum": 31.0, "min": 31}}, "EndTime": 1572781653.821933, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 30}, "StartTime": 1572781653.718127} [11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=240509.56349 records/second [2019-11-03 11:47:33.916] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 63, "duration": 94, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:33 INFO 
140552810366784] processed a total of 25000 examples [11/03/2019 11:47:33 INFO 140552810366784] #progress_metric: host=algo-2, completed 32 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 161, "sum": 161.0, "min": 161}, "Total Records Seen": {"count": 1, "max": 805000, "sum": 805000.0, "min": 805000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 32, "sum": 32.0, "min": 32}}, "EndTime": 1572781653.916884, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 31}, "StartTime": 1572781653.822185} [11/03/2019 11:47:33 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=263612.98335 records/second [2019-11-03 11:47:34.010] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 65, "duration": 93, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 33 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 166, "sum": 166.0, "min": 166}, "Total Records Seen": {"count": 1, "max": 830000, "sum": 830000.0, "min": 830000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 33, "sum": 
33.0, "min": 33}}, "EndTime": 1572781654.011389, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 32}, "StartTime": 1572781653.917124} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=264847.430143 records/second [2019-11-03 11:47:34.105] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 67, "duration": 93, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 34 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 171, "sum": 171.0, "min": 171}, "Total Records Seen": {"count": 1, "max": 855000, "sum": 855000.0, "min": 855000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 34, "sum": 34.0, "min": 34}}, "EndTime": 1572781654.106247, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 33}, "StartTime": 1572781654.011634} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=263838.502784 records/second [2019-11-03 11:47:34.205] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 69, "duration": 98, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 35 % of epochs #metrics {"Metrics": {"Max Batches 
Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 176, "sum": 176.0, "min": 176}, "Total Records Seen": {"count": 1, "max": 880000, "sum": 880000.0, "min": 880000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 35, "sum": 35.0, "min": 35}}, "EndTime": 1572781654.205973, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 34}, "StartTime": 1572781654.106499} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=250973.784295 records/second [2019-11-03 11:47:34.306] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 71, "duration": 99, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 36 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 181, "sum": 181.0, "min": 181}, "Total Records Seen": {"count": 1, "max": 905000, "sum": 905000.0, "min": 905000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 36, "sum": 36.0, "min": 36}}, "EndTime": 1572781654.306714, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 35}, 
"StartTime": 1572781654.206226} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=248451.820189 records/second [2019-11-03 11:47:34.400] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 73, "duration": 91, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 37 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 186, "sum": 186.0, "min": 186}, "Total Records Seen": {"count": 1, "max": 930000, "sum": 930000.0, "min": 930000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 37, "sum": 37.0, "min": 37}}, "EndTime": 1572781654.400918, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 36}, "StartTime": 1572781654.308464} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=269996.163423 records/second [2019-11-03 11:47:34.509] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 75, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 38 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last 
Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 191, "sum": 191.0, "min": 191}, "Total Records Seen": {"count": 1, "max": 955000, "sum": 955000.0, "min": 955000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 38, "sum": 38.0, "min": 38}}, "EndTime": 1572781654.50983, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 37}, "StartTime": 1572781654.402811} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=233186.856197 records/second [2019-11-03 11:47:34.606] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 77, "duration": 94, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 39 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 196, "sum": 196.0, "min": 196}, "Total Records Seen": {"count": 1, "max": 980000, "sum": 980000.0, "min": 980000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 39, "sum": 39.0, "min": 39}}, "EndTime": 1572781654.606595, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 38}, "StartTime": 1572781654.511681} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=263049.548069 records/second [2019-11-03 11:47:34.280] 
[tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 65, "duration": 108, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140169171593024] #progress_metric: host=algo-1, completed 33 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 166, "sum": 166.0, "min": 166}, "Total Records Seen": {"count": 1, "max": 830000, "sum": 830000.0, "min": 830000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 33, "sum": 33.0, "min": 33}}, "EndTime": 1572781654.280882, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 32}, "StartTime": 1572781654.171763} [11/03/2019 11:47:34 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=228820.324144 records/second [2019-11-03 11:47:34.382] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 67, "duration": 101, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140169171593024] #progress_metric: host=algo-1, completed 34 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 171, "sum": 171.0, "min": 171}, "Total Records Seen": {"count": 1, "max": 
855000, "sum": 855000.0, "min": 855000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 34, "sum": 34.0, "min": 34}}, "EndTime": 1572781654.3829, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 33}, "StartTime": 1572781654.281128} [11/03/2019 11:47:34 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=245078.566704 records/second [2019-11-03 11:47:34.495] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 69, "duration": 111, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140169171593024] #progress_metric: host=algo-1, completed 35 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 176, "sum": 176.0, "min": 176}, "Total Records Seen": {"count": 1, "max": 880000, "sum": 880000.0, "min": 880000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 35, "sum": 35.0, "min": 35}}, "EndTime": 1572781654.495952, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 34}, "StartTime": 1572781654.383263} [11/03/2019 11:47:34 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=221599.585786 records/second [2019-11-03 11:47:34.602] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 71, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 
140169171593024] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140169171593024] #progress_metric: host=algo-1, completed 36 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 181, "sum": 181.0, "min": 181}, "Total Records Seen": {"count": 1, "max": 905000, "sum": 905000.0, "min": 905000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 36, "sum": 36.0, "min": 36}}, "EndTime": 1572781654.603045, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 35}, "StartTime": 1572781654.496192} [11/03/2019 11:47:34 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=233684.187068 records/second [2019-11-03 11:47:34.716] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 73, "duration": 112, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140169171593024] #progress_metric: host=algo-1, completed 37 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 186, "sum": 186.0, "min": 186}, "Total Records Seen": {"count": 1, "max": 930000, "sum": 930000.0, "min": 930000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 37, 
"sum": 37.0, "min": 37}}, "EndTime": 1572781654.716717, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 36}, "StartTime": 1572781654.603287} [11/03/2019 11:47:34 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=220101.342133 records/second [2019-11-03 11:47:34.835] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 75, "duration": 118, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140169171593024] #progress_metric: host=algo-1, completed 38 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 191, "sum": 191.0, "min": 191}, "Total Records Seen": {"count": 1, "max": 955000, "sum": 955000.0, "min": 955000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 38, "sum": 38.0, "min": 38}}, "EndTime": 1572781654.836292, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 37}, "StartTime": 1572781654.716984} [11/03/2019 11:47:34 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=209318.750287 records/second [2019-11-03 11:47:34.942] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 77, "duration": 104, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140169171593024] #progress_metric: host=algo-1, completed 39 % of epochs #metrics {"Metrics": {"Max 
Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 196, "sum": 196.0, "min": 196}, "Total Records Seen": {"count": 1, "max": 980000, "sum": 980000.0, "min": 980000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 39, "sum": 39.0, "min": 39}}, "EndTime": 1572781654.942638, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 38}, "StartTime": 1572781654.836537} [11/03/2019 11:47:34 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=235349.463572 records/second [2019-11-03 11:47:35.055] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 79, "duration": 110, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140169171593024] #progress_metric: host=algo-1, completed 40 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 201, "sum": 201.0, "min": 201}, "Total Records Seen": {"count": 1, "max": 1005000, "sum": 1005000.0, "min": 1005000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 40, "sum": 40.0, "min": 40}}, "EndTime": 1572781655.055605, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", 
"epoch": 39}, "StartTime": 1572781654.944466} [11/03/2019 11:47:35 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=224686.993098 records/second [2019-11-03 11:47:35.176] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 81, "duration": 119, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140169171593024] #progress_metric: host=algo-1, completed 41 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 206, "sum": 206.0, "min": 206}, "Total Records Seen": {"count": 1, "max": 1030000, "sum": 1030000.0, "min": 1030000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 41, "sum": 41.0, "min": 41}}, "EndTime": 1572781655.177454, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 40}, "StartTime": 1572781655.057545} [11/03/2019 11:47:35 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=208179.253468 records/second [2019-11-03 11:47:34.712] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 79, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 40 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records 
Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 201, "sum": 201.0, "min": 201}, "Total Records Seen": {"count": 1, "max": 1005000, "sum": 1005000.0, "min": 1005000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 40, "sum": 40.0, "min": 40}}, "EndTime": 1572781654.712592, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 39}, "StartTime": 1572781654.606835} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=236098.233163 records/second [2019-11-03 11:47:34.811] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 81, "duration": 98, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 41 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 206, "sum": 206.0, "min": 206}, "Total Records Seen": {"count": 1, "max": 1030000, "sum": 1030000.0, "min": 1030000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 41, "sum": 41.0, "min": 41}}, "EndTime": 1572781654.811942, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 40}, "StartTime": 1572781654.712832} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=251942.229281 records/second 
[2019-11-03 11:47:34.908] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 83, "duration": 96, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:34 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:34 INFO 140552810366784] #progress_metric: host=algo-2, completed 42 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 211, "sum": 211.0, "min": 211}, "Total Records Seen": {"count": 1, "max": 1055000, "sum": 1055000.0, "min": 1055000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 42, "sum": 42.0, "min": 42}}, "EndTime": 1572781654.909221, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 41}, "StartTime": 1572781654.812146} [11/03/2019 11:47:34 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=257259.920411 records/second [2019-11-03 11:47:35.013] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 85, "duration": 104, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] #progress_metric: host=algo-2, completed 43 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 216, "sum": 216.0, "min": 216}, "Total Records 
Seen": {"count": 1, "max": 1080000, "sum": 1080000.0, "min": 1080000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 43, "sum": 43.0, "min": 43}}, "EndTime": 1572781655.014094, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 42}, "StartTime": 1572781654.909413} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=238581.673887 records/second [2019-11-03 11:47:35.110] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 87, "duration": 95, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] #progress_metric: host=algo-2, completed 44 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 221, "sum": 221.0, "min": 221}, "Total Records Seen": {"count": 1, "max": 1105000, "sum": 1105000.0, "min": 1105000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 44, "sum": 44.0, "min": 44}}, "EndTime": 1572781655.110599, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 43}, "StartTime": 1572781655.01429} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=259303.973233 records/second [2019-11-03 11:47:35.203] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 89, "duration": 92, "num_examples": 5, "num_bytes": 79100000} 
[11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] #progress_metric: host=algo-2, completed 45 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 226, "sum": 226.0, "min": 226}, "Total Records Seen": {"count": 1, "max": 1130000, "sum": 1130000.0, "min": 1130000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 45, "sum": 45.0, "min": 45}}, "EndTime": 1572781655.203846, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 44}, "StartTime": 1572781655.11079} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=268355.078287 records/second [2019-11-03 11:47:35.301] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 91, "duration": 96, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] #progress_metric: host=algo-2, completed 46 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 231, "sum": 231.0, "min": 231}, "Total Records Seen": {"count": 1, "max": 1155000, "sum": 1155000.0, "min": 1155000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset 
Count": {"count": 1, "max": 46, "sum": 46.0, "min": 46}}, "EndTime": 1572781655.301512, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 45}, "StartTime": 1572781655.204047} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=256218.311012 records/second [2019-11-03 11:47:35.402] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 93, "duration": 100, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] #progress_metric: host=algo-2, completed 47 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 236, "sum": 236.0, "min": 236}, "Total Records Seen": {"count": 1, "max": 1180000, "sum": 1180000.0, "min": 1180000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 47, "sum": 47.0, "min": 47}}, "EndTime": 1572781655.402659, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 46}, "StartTime": 1572781655.301705} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=247390.26318 records/second [2019-11-03 11:47:35.497] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 95, "duration": 94, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] #progress_metric: host=algo-2, completed 48 % of 
epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 241, "sum": 241.0, "min": 241}, "Total Records Seen": {"count": 1, "max": 1205000, "sum": 1205000.0, "min": 1205000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 48, "sum": 48.0, "min": 48}}, "EndTime": 1572781655.498125, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 47}, "StartTime": 1572781655.402851} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=262123.030158 records/second [2019-11-03 11:47:35.597] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 97, "duration": 97, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] #progress_metric: host=algo-2, completed 49 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 246, "sum": 246.0, "min": 246}, "Total Records Seen": {"count": 1, "max": 1230000, "sum": 1230000.0, "min": 1230000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 49, "sum": 49.0, "min": 49}}, "EndTime": 1572781655.597852, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", 
"Algorithm": "AWS/KMeansWebscale", "epoch": 48}, "StartTime": 1572781655.499737} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=254351.928665 records/second [2019-11-03 11:47:35.291] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 83, "duration": 111, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140169171593024] #progress_metric: host=algo-1, completed 42 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 211, "sum": 211.0, "min": 211}, "Total Records Seen": {"count": 1, "max": 1055000, "sum": 1055000.0, "min": 1055000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 42, "sum": 42.0, "min": 42}}, "EndTime": 1572781655.291625, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 41}, "StartTime": 1572781655.179141} [11/03/2019 11:47:35 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=221949.970155 records/second [2019-11-03 11:47:35.408] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 85, "duration": 116, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140169171593024] #progress_metric: host=algo-1, completed 43 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 
5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 216, "sum": 216.0, "min": 216}, "Total Records Seen": {"count": 1, "max": 1080000, "sum": 1080000.0, "min": 1080000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 43, "sum": 43.0, "min": 43}}, "EndTime": 1572781655.408899, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 42}, "StartTime": 1572781655.291992} [11/03/2019 11:47:35 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=213604.075804 records/second [2019-11-03 11:47:35.513] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 87, "duration": 103, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140169171593024] #progress_metric: host=algo-1, completed 44 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 221, "sum": 221.0, "min": 221}, "Total Records Seen": {"count": 1, "max": 1105000, "sum": 1105000.0, "min": 1105000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 44, "sum": 44.0, "min": 44}}, "EndTime": 1572781655.513764, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 43}, "StartTime": 1572781655.40914} [11/03/2019 11:47:35 INFO 140169171593024] #throughput_metric: host=algo-1, train 
throughput=238626.738367 records/second [2019-11-03 11:47:35.637] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 89, "duration": 121, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140169171593024] #progress_metric: host=algo-1, completed 45 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 226, "sum": 226.0, "min": 226}, "Total Records Seen": {"count": 1, "max": 1130000, "sum": 1130000.0, "min": 1130000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 45, "sum": 45.0, "min": 45}}, "EndTime": 1572781655.637705, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 44}, "StartTime": 1572781655.514016} [11/03/2019 11:47:35 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=201894.610373 records/second [2019-11-03 11:47:35.743] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 91, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140169171593024] #progress_metric: host=algo-1, completed 46 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 231, 
"sum": 231.0, "min": 231}, "Total Records Seen": {"count": 1, "max": 1155000, "sum": 1155000.0, "min": 1155000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 46, "sum": 46.0, "min": 46}}, "EndTime": 1572781655.744241, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 45}, "StartTime": 1572781655.637956} [11/03/2019 11:47:35 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=234933.94992 records/second [2019-11-03 11:47:35.854] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 93, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140169171593024] #progress_metric: host=algo-1, completed 47 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 236, "sum": 236.0, "min": 236}, "Total Records Seen": {"count": 1, "max": 1180000, "sum": 1180000.0, "min": 1180000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 47, "sum": 47.0, "min": 47}}, "EndTime": 1572781655.854611, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 46}, "StartTime": 1572781655.744479} [11/03/2019 11:47:35 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=226738.744973 records/second [2019-11-03 11:47:35.963] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 95, "duration": 108, 
"num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140169171593024] #progress_metric: host=algo-1, completed 48 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 241, "sum": 241.0, "min": 241}, "Total Records Seen": {"count": 1, "max": 1205000, "sum": 1205000.0, "min": 1205000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 48, "sum": 48.0, "min": 48}}, "EndTime": 1572781655.96422, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 47}, "StartTime": 1572781655.854849} [11/03/2019 11:47:35 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=228308.159928 records/second [2019-11-03 11:47:36.080] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 97, "duration": 116, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 49 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 246, "sum": 246.0, "min": 246}, "Total Records Seen": {"count": 1, "max": 1230000, "sum": 1230000.0, "min": 1230000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, 
"sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 49, "sum": 49.0, "min": 49}}, "EndTime": 1572781656.081254, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 48}, "StartTime": 1572781655.964466} [11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=213826.658999 records/second [2019-11-03 11:47:36.195] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 99, "duration": 113, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 50 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 251, "sum": 251.0, "min": 251}, "Total Records Seen": {"count": 1, "max": 1255000, "sum": 1255000.0, "min": 1255000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 50, "sum": 50.0, "min": 50}}, "EndTime": 1572781656.196875, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 49}, "StartTime": 1572781656.081495} [11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=216472.608961 records/second [2019-11-03 11:47:35.695] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 99, "duration": 95, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] 
#progress_metric: host=algo-2, completed 50 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 251, "sum": 251.0, "min": 251}, "Total Records Seen": {"count": 1, "max": 1255000, "sum": 1255000.0, "min": 1255000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 50, "sum": 50.0, "min": 50}}, "EndTime": 1572781655.69541, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 49}, "StartTime": 1572781655.599632} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=260737.322147 records/second [2019-11-03 11:47:35.811] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 101, "duration": 114, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] #progress_metric: host=algo-2, completed 51 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 256, "sum": 256.0, "min": 256}, "Total Records Seen": {"count": 1, "max": 1280000, "sum": 1280000.0, "min": 1280000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 51, "sum": 51.0, "min": 51}}, "EndTime": 1572781655.811704, "Dimensions": {"Host": "algo-2", 
"Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 50}, "StartTime": 1572781655.697071} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=217885.922078 records/second [2019-11-03 11:47:35.926] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 103, "duration": 112, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:35 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:35 INFO 140552810366784] #progress_metric: host=algo-2, completed 52 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 261, "sum": 261.0, "min": 261}, "Total Records Seen": {"count": 1, "max": 1305000, "sum": 1305000.0, "min": 1305000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 52, "sum": 52.0, "min": 52}}, "EndTime": 1572781655.926947, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 51}, "StartTime": 1572781655.813594} [11/03/2019 11:47:35 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=220338.142528 records/second [2019-11-03 11:47:36.025] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 105, "duration": 96, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 53 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of 
Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 266, "sum": 266.0, "min": 266}, "Total Records Seen": {"count": 1, "max": 1330000, "sum": 1330000.0, "min": 1330000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 53, "sum": 53.0, "min": 53}}, "EndTime": 1572781656.026239, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 52}, "StartTime": 1572781655.928995} [11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=256794.960963 records/second [2019-11-03 11:47:36.130] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 107, "duration": 102, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 54 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 271, "sum": 271.0, "min": 271}, "Total Records Seen": {"count": 1, "max": 1355000, "sum": 1355000.0, "min": 1355000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 54, "sum": 54.0, "min": 54}}, "EndTime": 1572781656.131204, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 53}, "StartTime": 1572781656.027974} [11/03/2019 11:47:36 INFO 
140552810366784] #throughput_metric: host=algo-2, train throughput=241924.550851 records/second [2019-11-03 11:47:36.221] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 109, "duration": 90, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 55 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 276, "sum": 276.0, "min": 276}, "Total Records Seen": {"count": 1, "max": 1380000, "sum": 1380000.0, "min": 1380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 55, "sum": 55.0, "min": 55}}, "EndTime": 1572781656.222293, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 54}, "StartTime": 1572781656.131432} [11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=274815.754437 records/second [2019-11-03 11:47:36.316] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 111, "duration": 93, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 56 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 
25000}, "Total Batches Seen": {"count": 1, "max": 281, "sum": 281.0, "min": 281}, "Total Records Seen": {"count": 1, "max": 1405000, "sum": 1405000.0, "min": 1405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 56, "sum": 56.0, "min": 56}}, "EndTime": 1572781656.317027, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 55}, "StartTime": 1572781656.222511} [11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=264206.794548 records/second [2019-11-03 11:47:36.408] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 113, "duration": 89, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 57 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 286, "sum": 286.0, "min": 286}, "Total Records Seen": {"count": 1, "max": 1430000, "sum": 1430000.0, "min": 1430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 57, "sum": 57.0, "min": 57}}, "EndTime": 1572781656.40873, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 56}, "StartTime": 1572781656.318606} [11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=277057.301919 records/second [2019-11-03 11:47:36.506] [tensorio] [info] epoch_stats={"data_pipeline": 
"/opt/ml/input/data/train", "epoch": 115, "duration": 97, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 58 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 291, "sum": 291.0, "min": 291}, "Total Records Seen": {"count": 1, "max": 1455000, "sum": 1455000.0, "min": 1455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 58, "sum": 58.0, "min": 58}}, "EndTime": 1572781656.507134, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 57}, "StartTime": 1572781656.408955} [11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=254313.064948 records/second [2019-11-03 11:47:36.611] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 117, "duration": 102, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 59 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 296, "sum": 296.0, "min": 296}, "Total Records Seen": {"count": 1, "max": 1480000, "sum": 1480000.0, "min": 1480000}, 
"Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 59, "sum": 59.0, "min": 59}}, "EndTime": 1572781656.61187, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 58}, "StartTime": 1572781656.508761} [11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=242210.667584 records/second [2019-11-03 11:47:36.315] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 101, "duration": 118, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 51 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 256, "sum": 256.0, "min": 256}, "Total Records Seen": {"count": 1, "max": 1280000, "sum": 1280000.0, "min": 1280000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 51, "sum": 51.0, "min": 51}}, "EndTime": 1572781656.316061, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 50}, "StartTime": 1572781656.19707} [11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=209839.005013 records/second [2019-11-03 11:47:36.435] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 103, "duration": 118, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 
examples [11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 52 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 261, "sum": 261.0, "min": 261}, "Total Records Seen": {"count": 1, "max": 1305000, "sum": 1305000.0, "min": 1305000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 52, "sum": 52.0, "min": 52}}, "EndTime": 1572781656.435687, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 51}, "StartTime": 1572781656.316337} [11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=209205.157825 records/second [2019-11-03 11:47:36.550] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 105, "duration": 113, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 53 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 266, "sum": 266.0, "min": 266}, "Total Records Seen": {"count": 1, "max": 1330000, "sum": 1330000.0, "min": 1330000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 53, "sum": 53.0, "min": 53}}, "EndTime": 
1572781656.550555, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 52}, "StartTime": 1572781656.436036} [11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=218056.742633 records/second [2019-11-03 11:47:36.667] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 107, "duration": 116, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 54 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 271, "sum": 271.0, "min": 271}, "Total Records Seen": {"count": 1, "max": 1355000, "sum": 1355000.0, "min": 1355000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 54, "sum": 54.0, "min": 54}}, "EndTime": 1572781656.668235, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 53}, "StartTime": 1572781656.550799} [11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=212650.198033 records/second [2019-11-03 11:47:36.783] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 109, "duration": 114, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 55 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": 
{"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 276, "sum": 276.0, "min": 276}, "Total Records Seen": {"count": 1, "max": 1380000, "sum": 1380000.0, "min": 1380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 55, "sum": 55.0, "min": 55}}, "EndTime": 1572781656.783669, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 54}, "StartTime": 1572781656.668477} [11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=216786.336732 records/second [2019-11-03 11:47:36.893] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 111, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140169171593024] #progress_metric: host=algo-1, completed 56 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 281, "sum": 281.0, "min": 281}, "Total Records Seen": {"count": 1, "max": 1405000, "sum": 1405000.0, "min": 1405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 56, "sum": 56.0, "min": 56}}, "EndTime": 1572781656.894322, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 55}, "StartTime": 
1572781656.783985} [11/03/2019 11:47:36 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=226315.925788 records/second [2019-11-03 11:47:37.002] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 113, "duration": 107, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 57 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 286, "sum": 286.0, "min": 286}, "Total Records Seen": {"count": 1, "max": 1430000, "sum": 1430000.0, "min": 1430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 57, "sum": 57.0, "min": 57}}, "EndTime": 1572781657.002531, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 56}, "StartTime": 1572781656.894559} [11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=231273.599887 records/second [2019-11-03 11:47:37.117] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 115, "duration": 112, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 58 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": 
{"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 291, "sum": 291.0, "min": 291}, "Total Records Seen": {"count": 1, "max": 1455000, "sum": 1455000.0, "min": 1455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 58, "sum": 58.0, "min": 58}}, "EndTime": 1572781657.117718, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 57}, "StartTime": 1572781657.004481} [11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=220530.454552 records/second [2019-11-03 11:47:37.229] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 117, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 59 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 296, "sum": 296.0, "min": 296}, "Total Records Seen": {"count": 1, "max": 1480000, "sum": 1480000.0, "min": 1480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 59, "sum": 59.0, "min": 59}}, "EndTime": 1572781657.229763, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 58}, "StartTime": 1572781657.119333} [11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=226135.826937 records/second [2019-11-03 11:47:37.346] 
[tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 119, "duration": 114, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 60 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 301, "sum": 301.0, "min": 301}, "Total Records Seen": {"count": 1, "max": 1505000, "sum": 1505000.0, "min": 1505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 60, "sum": 60.0, "min": 60}}, "EndTime": 1572781657.346454, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 59}, "StartTime": 1572781657.231374} [11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=217004.377024 records/second [2019-11-03 11:47:37.458] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 121, "duration": 111, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 61 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 306, "sum": 306.0, "min": 306}, "Total Records Seen": {"count": 1, 
"max": 1530000, "sum": 1530000.0, "min": 1530000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 61, "sum": 61.0, "min": 61}}, "EndTime": 1572781657.459096, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 60}, "StartTime": 1572781657.346694} [11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=222127.695632 records/second [2019-11-03 11:47:37.564] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 123, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 62 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 311, "sum": 311.0, "min": 311}, "Total Records Seen": {"count": 1, "max": 1555000, "sum": 1555000.0, "min": 1555000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 62, "sum": 62.0, "min": 62}}, "EndTime": 1572781657.56538, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 61}, "StartTime": 1572781657.459356} [11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=235513.85916 records/second [2019-11-03 11:47:37.674] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 125, "duration": 108, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 
INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 63 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 316, "sum": 316.0, "min": 316}, "Total Records Seen": {"count": 1, "max": 1580000, "sum": 1580000.0, "min": 1580000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 63, "sum": 63.0, "min": 63}}, "EndTime": 1572781657.674904, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 62}, "StartTime": 1572781657.565617} [11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=228475.307499 records/second [2019-11-03 11:47:37.780] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 127, "duration": 104, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 64 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 321, "sum": 321.0, "min": 321}, "Total Records Seen": {"count": 1, "max": 1605000, "sum": 1605000.0, "min": 1605000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, 
"max": 64, "sum": 64.0, "min": 64}}, "EndTime": 1572781657.780621, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 63}, "StartTime": 1572781657.675158} [11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=236744.830825 records/second [2019-11-03 11:47:37.902] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 129, "duration": 121, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140169171593024] #progress_metric: host=algo-1, completed 65 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 326, "sum": 326.0, "min": 326}, "Total Records Seen": {"count": 1, "max": 1630000, "sum": 1630000.0, "min": 1630000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 65, "sum": 65.0, "min": 65}}, "EndTime": 1572781657.903117, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 64}, "StartTime": 1572781657.78087} [11/03/2019 11:47:37 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=204263.409598 records/second [2019-11-03 11:47:38.008] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 131, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 66 % of epochs #metrics 
{"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 331, "sum": 331.0, "min": 331}, "Total Records Seen": {"count": 1, "max": 1655000, "sum": 1655000.0, "min": 1655000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 66, "sum": 66.0, "min": 66}}, "EndTime": 1572781658.009231, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 65}, "StartTime": 1572781657.90343} [11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=235951.071548 records/second [2019-11-03 11:47:38.115] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 133, "duration": 104, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 67 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 336, "sum": 336.0, "min": 336}, "Total Records Seen": {"count": 1, "max": 1680000, "sum": 1680000.0, "min": 1680000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 67, "sum": 67.0, "min": 67}}, "EndTime": 1572781658.115497, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": 
"AWS/KMeansWebscale", "epoch": 66}, "StartTime": 1572781658.009521} [11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=235577.882222 records/second [2019-11-03 11:47:38.218] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 135, "duration": 102, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 68 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 341, "sum": 341.0, "min": 341}, "Total Records Seen": {"count": 1, "max": 1705000, "sum": 1705000.0, "min": 1705000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 68, "sum": 68.0, "min": 68}}, "EndTime": 1572781658.218867, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 67}, "StartTime": 1572781658.115755} [11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=242118.947176 records/second [2019-11-03 11:47:36.720] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 119, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 60 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, 
"min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 301, "sum": 301.0, "min": 301}, "Total Records Seen": {"count": 1, "max": 1505000, "sum": 1505000.0, "min": 1505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 60, "sum": 60.0, "min": 60}}, "EndTime": 1572781656.720562, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 59}, "StartTime": 1572781656.613444} [11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=233106.50539 records/second [2019-11-03 11:47:36.829] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 121, "duration": 108, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 61 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 306, "sum": 306.0, "min": 306}, "Total Records Seen": {"count": 1, "max": 1530000, "sum": 1530000.0, "min": 1530000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 61, "sum": 61.0, "min": 61}}, "EndTime": 1572781656.829433, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 60}, "StartTime": 1572781656.720761} [11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train 
throughput=229819.839565 records/second [2019-11-03 11:47:36.947] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 123, "duration": 115, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:36 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:36 INFO 140552810366784] #progress_metric: host=algo-2, completed 62 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 311, "sum": 311.0, "min": 311}, "Total Records Seen": {"count": 1, "max": 1555000, "sum": 1555000.0, "min": 1555000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 62, "sum": 62.0, "min": 62}}, "EndTime": 1572781656.94762, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 61}, "StartTime": 1572781656.831255} [11/03/2019 11:47:36 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=214647.367204 records/second [2019-11-03 11:47:37.049] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 125, "duration": 99, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 63 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 316, 
"sum": 316.0, "min": 316}, "Total Records Seen": {"count": 1, "max": 1580000, "sum": 1580000.0, "min": 1580000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 63, "sum": 63.0, "min": 63}}, "EndTime": 1572781657.050018, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 62}, "StartTime": 1572781656.949825} [11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=249187.496138 records/second [2019-11-03 11:47:37.145] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 127, "duration": 95, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 64 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 321, "sum": 321.0, "min": 321}, "Total Records Seen": {"count": 1, "max": 1605000, "sum": 1605000.0, "min": 1605000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 64, "sum": 64.0, "min": 64}}, "EndTime": 1572781657.146114, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 63}, "StartTime": 1572781657.050269} [11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=260495.066231 records/second [2019-11-03 11:47:37.247] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 129, "duration": 100, 
"num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 65 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 326, "sum": 326.0, "min": 326}, "Total Records Seen": {"count": 1, "max": 1630000, "sum": 1630000.0, "min": 1630000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 65, "sum": 65.0, "min": 65}}, "EndTime": 1572781657.247753, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 64}, "StartTime": 1572781657.14635} [11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=246215.691385 records/second [2019-11-03 11:47:37.343] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 131, "duration": 95, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 66 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 331, "sum": 331.0, "min": 331}, "Total Records Seen": {"count": 1, "max": 1655000, "sum": 1655000.0, "min": 1655000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, 
"sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 66, "sum": 66.0, "min": 66}}, "EndTime": 1572781657.344179, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 65}, "StartTime": 1572781657.248007} [11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=259551.727125 records/second [2019-11-03 11:47:37.451] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 133, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 67 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 336, "sum": 336.0, "min": 336}, "Total Records Seen": {"count": 1, "max": 1680000, "sum": 1680000.0, "min": 1680000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 67, "sum": 67.0, "min": 67}}, "EndTime": 1572781657.451658, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 66}, "StartTime": 1572781657.344442} [11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=232884.920767 records/second [2019-11-03 11:47:37.550] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 135, "duration": 98, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] 
#progress_metric: host=algo-2, completed 68 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 341, "sum": 341.0, "min": 341}, "Total Records Seen": {"count": 1, "max": 1705000, "sum": 1705000.0, "min": 1705000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 68, "sum": 68.0, "min": 68}}, "EndTime": 1572781657.551124, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 67}, "StartTime": 1572781657.451914} [11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=251663.4746 records/second [2019-11-03 11:47:37.652] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 137, "duration": 100, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 69 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 346, "sum": 346.0, "min": 346}, "Total Records Seen": {"count": 1, "max": 1730000, "sum": 1730000.0, "min": 1730000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 69, "sum": 69.0, "min": 69}}, "EndTime": 1572781657.652952, "Dimensions": {"Host": "algo-2", 
"Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 68}, "StartTime": 1572781657.551352} [11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=245806.472786 records/second [2019-11-03 11:47:37.745] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 139, "duration": 91, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 70 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 351, "sum": 351.0, "min": 351}, "Total Records Seen": {"count": 1, "max": 1755000, "sum": 1755000.0, "min": 1755000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 70, "sum": 70.0, "min": 70}}, "EndTime": 1572781657.745565, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 69}, "StartTime": 1572781657.653178} [11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=270211.850321 records/second [2019-11-03 11:47:37.839] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 141, "duration": 93, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 71 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of 
Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 356, "sum": 356.0, "min": 356}, "Total Records Seen": {"count": 1, "max": 1780000, "sum": 1780000.0, "min": 1780000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 71, "sum": 71.0, "min": 71}}, "EndTime": 1572781657.839737, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 70}, "StartTime": 1572781657.745817} [11/03/2019 11:47:37 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=265818.946941 records/second [2019-11-03 11:47:37.946] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 143, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:37 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:37 INFO 140552810366784] #progress_metric: host=algo-2, completed 72 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 361, "sum": 361.0, "min": 361}, "Total Records Seen": {"count": 1, "max": 1805000, "sum": 1805000.0, "min": 1805000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 72, "sum": 72.0, "min": 72}}, "EndTime": 1572781657.94681, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 71}, "StartTime": 1572781657.839979} [11/03/2019 11:47:37 INFO 
140552810366784] #throughput_metric: host=algo-2, train throughput=233723.773457 records/second [2019-11-03 11:47:38.048] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 145, "duration": 100, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 73 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 366, "sum": 366.0, "min": 366}, "Total Records Seen": {"count": 1, "max": 1830000, "sum": 1830000.0, "min": 1830000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 73, "sum": 73.0, "min": 73}}, "EndTime": 1572781658.048629, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 72}, "StartTime": 1572781657.947054} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=245784.578458 records/second [2019-11-03 11:47:38.153] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 147, "duration": 104, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 74 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 
25000}, "Total Batches Seen": {"count": 1, "max": 371, "sum": 371.0, "min": 371}, "Total Records Seen": {"count": 1, "max": 1855000, "sum": 1855000.0, "min": 1855000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 74, "sum": 74.0, "min": 74}}, "EndTime": 1572781658.153764, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 73}, "StartTime": 1572781658.048879} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=238061.139932 records/second [2019-11-03 11:47:38.257] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 149, "duration": 103, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 75 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 376, "sum": 376.0, "min": 376}, "Total Records Seen": {"count": 1, "max": 1880000, "sum": 1880000.0, "min": 1880000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 75, "sum": 75.0, "min": 75}}, "EndTime": 1572781658.25836, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 74}, "StartTime": 1572781658.154126} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=239560.073017 records/second [2019-11-03 11:47:38.361] [tensorio] [info] epoch_stats={"data_pipeline": 
"/opt/ml/input/data/train", "epoch": 151, "duration": 101, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 76 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 381, "sum": 381.0, "min": 381}, "Total Records Seen": {"count": 1, "max": 1905000, "sum": 1905000.0, "min": 1905000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 76, "sum": 76.0, "min": 76}}, "EndTime": 1572781658.362082, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 75}, "StartTime": 1572781658.258831} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=241788.993576 records/second [2019-11-03 11:47:38.461] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 153, "duration": 99, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 77 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 386, "sum": 386.0, "min": 386}, "Total Records Seen": {"count": 1, "max": 1930000, "sum": 1930000.0, "min": 1930000}, 
"Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 77, "sum": 77.0, "min": 77}}, "EndTime": 1572781658.462326, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 76}, "StartTime": 1572781658.362336} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=249697.812534 records/second [2019-11-03 11:47:38.572] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 155, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 78 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 391, "sum": 391.0, "min": 391}, "Total Records Seen": {"count": 1, "max": 1955000, "sum": 1955000.0, "min": 1955000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 78, "sum": 78.0, "min": 78}}, "EndTime": 1572781658.573086, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 77}, "StartTime": 1572781658.462574} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=225957.962151 records/second [2019-11-03 11:47:38.327] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 137, "duration": 107, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 
examples [11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 69 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 346, "sum": 346.0, "min": 346}, "Total Records Seen": {"count": 1, "max": 1730000, "sum": 1730000.0, "min": 1730000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 69, "sum": 69.0, "min": 69}}, "EndTime": 1572781658.327902, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 68}, "StartTime": 1572781658.219121} [11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=229541.126148 records/second [2019-11-03 11:47:38.434] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 139, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 70 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 351, "sum": 351.0, "min": 351}, "Total Records Seen": {"count": 1, "max": 1755000, "sum": 1755000.0, "min": 1755000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 70, "sum": 70.0, "min": 70}}, "EndTime": 
1572781658.435265, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 69}, "StartTime": 1572781658.328147} [11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=233102.359754 records/second [2019-11-03 11:47:38.545] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 141, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 71 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 356, "sum": 356.0, "min": 356}, "Total Records Seen": {"count": 1, "max": 1780000, "sum": 1780000.0, "min": 1780000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 71, "sum": 71.0, "min": 71}}, "EndTime": 1572781658.54583, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 70}, "StartTime": 1572781658.435508} [11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=226338.885807 records/second [2019-11-03 11:47:38.658] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 143, "duration": 112, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 72 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": 
{"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 361, "sum": 361.0, "min": 361}, "Total Records Seen": {"count": 1, "max": 1805000, "sum": 1805000.0, "min": 1805000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 72, "sum": 72.0, "min": 72}}, "EndTime": 1572781658.658848, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 71}, "StartTime": 1572781658.546075} [11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=221397.458284 records/second [2019-11-03 11:47:38.770] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 145, "duration": 111, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 73 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 366, "sum": 366.0, "min": 366}, "Total Records Seen": {"count": 1, "max": 1830000, "sum": 1830000.0, "min": 1830000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 73, "sum": 73.0, "min": 73}}, "EndTime": 1572781658.771435, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 72}, "StartTime": 
1572781658.659151} [11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=222358.50457 records/second [2019-11-03 11:47:38.893] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 147, "duration": 119, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140169171593024] #progress_metric: host=algo-1, completed 74 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 371, "sum": 371.0, "min": 371}, "Total Records Seen": {"count": 1, "max": 1855000, "sum": 1855000.0, "min": 1855000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 74, "sum": 74.0, "min": 74}}, "EndTime": 1572781658.893545, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 73}, "StartTime": 1572781658.771688} [11/03/2019 11:47:38 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=204919.669885 records/second [2019-11-03 11:47:39.014] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 149, "duration": 119, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140169171593024] #progress_metric: host=algo-1, completed 75 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": 
{"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 376, "sum": 376.0, "min": 376}, "Total Records Seen": {"count": 1, "max": 1880000, "sum": 1880000.0, "min": 1880000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 75, "sum": 75.0, "min": 75}}, "EndTime": 1572781659.01457, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 74}, "StartTime": 1572781658.893806} [11/03/2019 11:47:39 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=206791.172816 records/second [2019-11-03 11:47:39.134] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 151, "duration": 118, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140169171593024] #progress_metric: host=algo-1, completed 76 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 381, "sum": 381.0, "min": 381}, "Total Records Seen": {"count": 1, "max": 1905000, "sum": 1905000.0, "min": 1905000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 76, "sum": 76.0, "min": 76}}, "EndTime": 1572781659.134788, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 75}, "StartTime": 1572781659.014817} [11/03/2019 11:47:39 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=207656.90476 records/second [2019-11-03 11:47:38.679] 
[tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 157, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 79 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 396, "sum": 396.0, "min": 396}, "Total Records Seen": {"count": 1, "max": 1980000, "sum": 1980000.0, "min": 1980000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 79, "sum": 79.0, "min": 79}}, "EndTime": 1572781658.679666, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 78}, "StartTime": 1572781658.57351} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=235247.558516 records/second [2019-11-03 11:47:38.783] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 159, "duration": 102, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 80 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 401, "sum": 401.0, "min": 401}, "Total Records Seen": {"count": 1, "max": 
2005000, "sum": 2005000.0, "min": 2005000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 80, "sum": 80.0, "min": 80}}, "EndTime": 1572781658.783399, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 79}, "StartTime": 1572781658.68004} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=241622.405081 records/second [2019-11-03 11:47:38.878] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 161, "duration": 95, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 81 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 406, "sum": 406.0, "min": 406}, "Total Records Seen": {"count": 1, "max": 2030000, "sum": 2030000.0, "min": 2030000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 81, "sum": 81.0, "min": 81}}, "EndTime": 1572781658.879164, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 80}, "StartTime": 1572781658.783612} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=261350.800321 records/second [2019-11-03 11:47:38.987] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 163, "duration": 108, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:38 INFO 
140552810366784] processed a total of 25000 examples [11/03/2019 11:47:38 INFO 140552810366784] #progress_metric: host=algo-2, completed 82 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 411, "sum": 411.0, "min": 411}, "Total Records Seen": {"count": 1, "max": 2055000, "sum": 2055000.0, "min": 2055000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 82, "sum": 82.0, "min": 82}}, "EndTime": 1572781658.988273, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 81}, "StartTime": 1572781658.879363} [11/03/2019 11:47:38 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=229286.649859 records/second [2019-11-03 11:47:39.083] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 165, "duration": 95, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140552810366784] #progress_metric: host=algo-2, completed 83 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 416, "sum": 416.0, "min": 416}, "Total Records Seen": {"count": 1, "max": 2080000, "sum": 2080000.0, "min": 2080000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 
83, "sum": 83.0, "min": 83}}, "EndTime": 1572781659.084386, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 82}, "StartTime": 1572781658.988487} [11/03/2019 11:47:39 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=260354.065798 records/second [2019-11-03 11:47:39.195] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 167, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140552810366784] #progress_metric: host=algo-2, completed 84 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 421, "sum": 421.0, "min": 421}, "Total Records Seen": {"count": 1, "max": 2105000, "sum": 2105000.0, "min": 2105000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 84, "sum": 84.0, "min": 84}}, "EndTime": 1572781659.196119, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 83}, "StartTime": 1572781659.086359} [11/03/2019 11:47:39 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=227550.123586 records/second [2019-11-03 11:47:39.306] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 169, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140552810366784] #progress_metric: host=algo-2, completed 85 % of epochs #metrics {"Metrics": 
{"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 426, "sum": 426.0, "min": 426}, "Total Records Seen": {"count": 1, "max": 2130000, "sum": 2130000.0, "min": 2130000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 85, "sum": 85.0, "min": 85}}, "EndTime": 1572781659.306819, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 84}, "StartTime": 1572781659.196313} [11/03/2019 11:47:39 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=226019.330419 records/second [2019-11-03 11:47:39.418] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 171, "duration": 111, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140552810366784] #progress_metric: host=algo-2, completed 86 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 431, "sum": 431.0, "min": 431}, "Total Records Seen": {"count": 1, "max": 2155000, "sum": 2155000.0, "min": 2155000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 86, "sum": 86.0, "min": 86}}, "EndTime": 1572781659.418981, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": 
"AWS/KMeansWebscale", "epoch": 85}, "StartTime": 1572781659.307034} [11/03/2019 11:47:39 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=223112.669583 records/second [2019-11-03 11:47:39.535] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 173, "duration": 113, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140552810366784] #progress_metric: host=algo-2, completed 87 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 436, "sum": 436.0, "min": 436}, "Total Records Seen": {"count": 1, "max": 2180000, "sum": 2180000.0, "min": 2180000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 87, "sum": 87.0, "min": 87}}, "EndTime": 1572781659.535613, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 86}, "StartTime": 1572781659.421016} [11/03/2019 11:47:39 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=217954.308781 records/second [2019-11-03 11:47:39.653] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 175, "duration": 116, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140552810366784] #progress_metric: host=algo-2, completed 88 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, 
"min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 441, "sum": 441.0, "min": 441}, "Total Records Seen": {"count": 1, "max": 2205000, "sum": 2205000.0, "min": 2205000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 88, "sum": 88.0, "min": 88}}, "EndTime": 1572781659.654219, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 87}, "StartTime": 1572781659.537496} [11/03/2019 11:47:39 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=213978.507789 records/second [2019-11-03 11:47:39.242] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 153, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140169171593024] #progress_metric: host=algo-1, completed 77 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 386, "sum": 386.0, "min": 386}, "Total Records Seen": {"count": 1, "max": 1930000, "sum": 1930000.0, "min": 1930000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 77, "sum": 77.0, "min": 77}}, "EndTime": 1572781659.242942, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 76}, "StartTime": 1572781659.135533} [11/03/2019 11:47:39 INFO 140169171593024] #throughput_metric: host=algo-1, train 
throughput=232437.345108 records/second [2019-11-03 11:47:39.362] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 155, "duration": 117, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140169171593024] #progress_metric: host=algo-1, completed 78 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 391, "sum": 391.0, "min": 391}, "Total Records Seen": {"count": 1, "max": 1955000, "sum": 1955000.0, "min": 1955000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 78, "sum": 78.0, "min": 78}}, "EndTime": 1572781659.36302, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 77}, "StartTime": 1572781659.244852} [11/03/2019 11:47:39 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=211332.314873 records/second [2019-11-03 11:47:39.468] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 157, "duration": 104, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140169171593024] #progress_metric: host=algo-1, completed 79 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 396, 
"sum": 396.0, "min": 396}, "Total Records Seen": {"count": 1, "max": 1980000, "sum": 1980000.0, "min": 1980000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 79, "sum": 79.0, "min": 79}}, "EndTime": 1572781659.468665, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 78}, "StartTime": 1572781659.363264} [11/03/2019 11:47:39 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=236905.829698 records/second [2019-11-03 11:47:39.579] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 159, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140169171593024] #progress_metric: host=algo-1, completed 80 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 401, "sum": 401.0, "min": 401}, "Total Records Seen": {"count": 1, "max": 2005000, "sum": 2005000.0, "min": 2005000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 80, "sum": 80.0, "min": 80}}, "EndTime": 1572781659.579519, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 79}, "StartTime": 1572781659.468902} [11/03/2019 11:47:39 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=225727.398758 records/second [2019-11-03 11:47:39.694] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 161, "duration": 114, 
"num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140169171593024] #progress_metric: host=algo-1, completed 81 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 406, "sum": 406.0, "min": 406}, "Total Records Seen": {"count": 1, "max": 2030000, "sum": 2030000.0, "min": 2030000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 81, "sum": 81.0, "min": 81}}, "EndTime": 1572781659.695116, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 80}, "StartTime": 1572781659.579765} [11/03/2019 11:47:39 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=216469.033856 records/second [2019-11-03 11:47:39.808] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 163, "duration": 112, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140169171593024] #progress_metric: host=algo-1, completed 82 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 411, "sum": 411.0, "min": 411}, "Total Records Seen": {"count": 1, "max": 2055000, "sum": 2055000.0, "min": 2055000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, 
"sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 82, "sum": 82.0, "min": 82}}, "EndTime": 1572781659.80906, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 81}, "StartTime": 1572781659.695364} [11/03/2019 11:47:39 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=219606.72616 records/second [2019-11-03 11:47:39.924] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 165, "duration": 114, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140169171593024] #progress_metric: host=algo-1, completed 83 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 416, "sum": 416.0, "min": 416}, "Total Records Seen": {"count": 1, "max": 2080000, "sum": 2080000.0, "min": 2080000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 83, "sum": 83.0, "min": 83}}, "EndTime": 1572781659.925035, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 82}, "StartTime": 1572781659.809357} [11/03/2019 11:47:39 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=215857.64515 records/second [2019-11-03 11:47:40.033] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 167, "duration": 107, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140169171593024] 
#progress_metric: host=algo-1, completed 84 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 421, "sum": 421.0, "min": 421}, "Total Records Seen": {"count": 1, "max": 2105000, "sum": 2105000.0, "min": 2105000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 84, "sum": 84.0, "min": 84}}, "EndTime": 1572781660.033847, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 83}, "StartTime": 1572781659.925312} [11/03/2019 11:47:40 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=230064.900587 records/second [2019-11-03 11:47:40.160] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 169, "duration": 124, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140169171593024] #progress_metric: host=algo-1, completed 85 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 426, "sum": 426.0, "min": 426}, "Total Records Seen": {"count": 1, "max": 2130000, "sum": 2130000.0, "min": 2130000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 85, "sum": 85.0, "min": 85}}, "EndTime": 1572781660.160738, "Dimensions": {"Host": "algo-1", 
"Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 84}, "StartTime": 1572781660.034084} [11/03/2019 11:47:40 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=197187.113905 records/second [2019-11-03 11:47:39.753] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 177, "duration": 97, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140552810366784] #progress_metric: host=algo-2, completed 89 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 446, "sum": 446.0, "min": 446}, "Total Records Seen": {"count": 1, "max": 2230000, "sum": 2230000.0, "min": 2230000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 89, "sum": 89.0, "min": 89}}, "EndTime": 1572781659.754287, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 88}, "StartTime": 1572781659.656077} [11/03/2019 11:47:39 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=254284.695766 records/second [2019-11-03 11:47:39.844] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 179, "duration": 90, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140552810366784] #progress_metric: host=algo-2, completed 90 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of 
Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 451, "sum": 451.0, "min": 451}, "Total Records Seen": {"count": 1, "max": 2255000, "sum": 2255000.0, "min": 2255000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 90, "sum": 90.0, "min": 90}}, "EndTime": 1572781659.845219, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 89}, "StartTime": 1572781659.75448} [11/03/2019 11:47:39 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=275145.303451 records/second [2019-11-03 11:47:39.953] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 181, "duration": 107, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:39 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:39 INFO 140552810366784] #progress_metric: host=algo-2, completed 91 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 456, "sum": 456.0, "min": 456}, "Total Records Seen": {"count": 1, "max": 2280000, "sum": 2280000.0, "min": 2280000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 91, "sum": 91.0, "min": 91}}, "EndTime": 1572781659.953676, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 90}, "StartTime": 1572781659.845461} [11/03/2019 11:47:39 INFO 
140552810366784] #throughput_metric: host=algo-2, train throughput=230744.313781 records/second [2019-11-03 11:47:40.060] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 183, "duration": 104, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 92 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 461, "sum": 461.0, "min": 461}, "Total Records Seen": {"count": 1, "max": 2305000, "sum": 2305000.0, "min": 2305000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 92, "sum": 92.0, "min": 92}}, "EndTime": 1572781660.060468, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 91}, "StartTime": 1572781659.955789} [11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=238536.083788 records/second [2019-11-03 11:47:40.165] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 185, "duration": 102, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 93 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 
25000}, "Total Batches Seen": {"count": 1, "max": 466, "sum": 466.0, "min": 466}, "Total Records Seen": {"count": 1, "max": 2330000, "sum": 2330000.0, "min": 2330000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 93, "sum": 93.0, "min": 93}}, "EndTime": 1572781660.165692, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 92}, "StartTime": 1572781660.062342} [11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=241597.353105 records/second [2019-11-03 11:47:40.269] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 187, "duration": 101, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 94 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 471, "sum": 471.0, "min": 471}, "Total Records Seen": {"count": 1, "max": 2355000, "sum": 2355000.0, "min": 2355000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 94, "sum": 94.0, "min": 94}}, "EndTime": 1572781660.269801, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 93}, "StartTime": 1572781660.167789} [11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=244587.508631 records/second [2019-11-03 11:47:40.373] [tensorio] [info] epoch_stats={"data_pipeline": 
"/opt/ml/input/data/train", "epoch": 189, "duration": 101, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 95 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 476, "sum": 476.0, "min": 476}, "Total Records Seen": {"count": 1, "max": 2380000, "sum": 2380000.0, "min": 2380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 95, "sum": 95.0, "min": 95}}, "EndTime": 1572781660.374038, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 94}, "StartTime": 1572781660.271758} [11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=244131.376699 records/second [2019-11-03 11:47:40.472] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 191, "duration": 98, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 96 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 481, "sum": 481.0, "min": 481}, "Total Records Seen": {"count": 1, "max": 2405000, "sum": 2405000.0, "min": 2405000}, 
"Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 96, "sum": 96.0, "min": 96}}, "EndTime": 1572781660.473244, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 95}, "StartTime": 1572781660.374283} [11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=252290.783452 records/second [2019-11-03 11:47:40.584] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 193, "duration": 109, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 97 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 486, "sum": 486.0, "min": 486}, "Total Records Seen": {"count": 1, "max": 2430000, "sum": 2430000.0, "min": 2430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 97, "sum": 97.0, "min": 97}}, "EndTime": 1572781660.585374, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 96}, "StartTime": 1572781660.475085} [11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=226414.149157 records/second [2019-11-03 11:47:40.268] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 171, "duration": 107, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140169171593024] processed a total of 25000 
examples [11/03/2019 11:47:40 INFO 140169171593024] #progress_metric: host=algo-1, completed 86 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 431, "sum": 431.0, "min": 431}, "Total Records Seen": {"count": 1, "max": 2155000, "sum": 2155000.0, "min": 2155000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 86, "sum": 86.0, "min": 86}}, "EndTime": 1572781660.269015, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 85}, "StartTime": 1572781660.160977} [11/03/2019 11:47:40 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=231130.351597 records/second [2019-11-03 11:47:40.371] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 173, "duration": 102, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140169171593024] #progress_metric: host=algo-1, completed 87 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 436, "sum": 436.0, "min": 436}, "Total Records Seen": {"count": 1, "max": 2180000, "sum": 2180000.0, "min": 2180000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 87, "sum": 87.0, "min": 87}}, "EndTime": 
1572781660.372365, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 86}, "StartTime": 1572781660.269253} [11/03/2019 11:47:40 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=242147.462543 records/second [2019-11-03 11:47:40.486] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 175, "duration": 113, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140169171593024] #progress_metric: host=algo-1, completed 88 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 441, "sum": 441.0, "min": 441}, "Total Records Seen": {"count": 1, "max": 2205000, "sum": 2205000.0, "min": 2205000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 88, "sum": 88.0, "min": 88}}, "EndTime": 1572781660.487108, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 87}, "StartTime": 1572781660.372609} [11/03/2019 11:47:40 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=217679.211637 records/second [2019-11-03 11:47:40.595] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 177, "duration": 106, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140169171593024] #progress_metric: host=algo-1, completed 89 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": 
{"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 446, "sum": 446.0, "min": 446}, "Total Records Seen": {"count": 1, "max": 2230000, "sum": 2230000.0, "min": 2230000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 89, "sum": 89.0, "min": 89}}, "EndTime": 1572781660.596009, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 88}, "StartTime": 1572781660.489044} [11/03/2019 11:47:40 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=233424.603864 records/second [2019-11-03 11:47:40.715] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 179, "duration": 118, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140169171593024] #progress_metric: host=algo-1, completed 90 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 451, "sum": 451.0, "min": 451}, "Total Records Seen": {"count": 1, "max": 2255000, "sum": 2255000.0, "min": 2255000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 90, "sum": 90.0, "min": 90}}, "EndTime": 1572781660.716018, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 89}, "StartTime": 
1572781660.596258} [11/03/2019 11:47:40 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=208517.47562 records/second [2019-11-03 11:47:40.854] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 181, "duration": 138, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140169171593024] #progress_metric: host=algo-1, completed 91 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 456, "sum": 456.0, "min": 456}, "Total Records Seen": {"count": 1, "max": 2280000, "sum": 2280000.0, "min": 2280000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 91, "sum": 91.0, "min": 91}}, "EndTime": 1572781660.855434, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 90}, "StartTime": 1572781660.716272} [11/03/2019 11:47:40 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=179375.916273 records/second [2019-11-03 11:47:40.981] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 183, "duration": 124, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140169171593024] #progress_metric: host=algo-1, completed 92 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": 
{"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 461, "sum": 461.0, "min": 461}, "Total Records Seen": {"count": 1, "max": 2305000, "sum": 2305000.0, "min": 2305000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 92, "sum": 92.0, "min": 92}}, "EndTime": 1572781660.981819, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 91}, "StartTime": 1572781660.855922} [11/03/2019 11:47:40 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=198300.994743 records/second [2019-11-03 11:47:41.087] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 185, "duration": 102, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 93 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 466, "sum": 466.0, "min": 466}, "Total Records Seen": {"count": 1, "max": 2330000, "sum": 2330000.0, "min": 2330000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 93, "sum": 93.0, "min": 93}}, "EndTime": 1572781661.087967, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 92}, "StartTime": 1572781660.984266} [11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=240775.75379 records/second [2019-11-03 11:47:41.189] 
[tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 187, "duration": 99, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 94 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 471, "sum": 471.0, "min": 471}, "Total Records Seen": {"count": 1, "max": 2355000, "sum": 2355000.0, "min": 2355000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 94, "sum": 94.0, "min": 94}}, "EndTime": 1572781661.189626, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 93}, "StartTime": 1572781661.089645} [11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=249742.415979 records/second [2019-11-03 11:47:40.697] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 195, "duration": 110, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 98 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 491, "sum": 491.0, "min": 491}, "Total Records Seen": {"count": 1, "max": 
2455000, "sum": 2455000.0, "min": 2455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 98, "sum": 98.0, "min": 98}}, "EndTime": 1572781660.698355, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 97}, "StartTime": 1572781660.587506} [11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=225269.616479 records/second [2019-11-03 11:47:40.838] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 197, "duration": 137, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 140552810366784] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 99 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 496, "sum": 496.0, "min": 496}, "Total Records Seen": {"count": 1, "max": 2480000, "sum": 2480000.0, "min": 2480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 99, "sum": 99.0, "min": 99}}, "EndTime": 1572781660.838742, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 98}, "StartTime": 1572781660.700431} [11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=180615.201237 records/second [2019-11-03 11:47:40.938] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 199, "duration": 99, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:40 INFO 
140552810366784] processed a total of 25000 examples [11/03/2019 11:47:40 INFO 140552810366784] #progress_metric: host=algo-2, completed 100 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 501, "sum": 501.0, "min": 501}, "Total Records Seen": {"count": 1, "max": 2505000, "sum": 2505000.0, "min": 2505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 100, "sum": 100.0, "min": 100}}, "EndTime": 1572781660.939309, "Dimensions": {"Host": "algo-2", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 99}, "StartTime": 1572781660.838951} [11/03/2019 11:47:40 INFO 140552810366784] #throughput_metric: host=algo-2, train throughput=248843.324315 records/second [2019-11-03 11:47:41.293] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 189, "duration": 101, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 95 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 476, "sum": 476.0, "min": 476}, "Total Records Seen": {"count": 1, "max": 2380000, "sum": 2380000.0, "min": 2380000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, 
"max": 95, "sum": 95.0, "min": 95}}, "EndTime": 1572781661.294615, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 94}, "StartTime": 1572781661.189859} [11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=238360.94119 records/second [2019-11-03 11:47:41.400] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 191, "duration": 105, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 96 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 481, "sum": 481.0, "min": 481}, "Total Records Seen": {"count": 1, "max": 2405000, "sum": 2405000.0, "min": 2405000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 96, "sum": 96.0, "min": 96}}, "EndTime": 1572781661.401357, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 95}, "StartTime": 1572781661.294858} [11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=234471.130948 records/second [2019-11-03 11:47:41.505] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 193, "duration": 103, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 97 % of epochs #metrics 
{"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 486, "sum": 486.0, "min": 486}, "Total Records Seen": {"count": 1, "max": 2430000, "sum": 2430000.0, "min": 2430000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 97, "sum": 97.0, "min": 97}}, "EndTime": 1572781661.506414, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 96}, "StartTime": 1572781661.401588} [11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=238201.746913 records/second [2019-11-03 11:47:41.607] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 195, "duration": 99, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 98 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 491, "sum": 491.0, "min": 491}, "Total Records Seen": {"count": 1, "max": 2455000, "sum": 2455000.0, "min": 2455000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 98, "sum": 98.0, "min": 98}}, "EndTime": 1572781661.608669, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": 
"AWS/KMeansWebscale", "epoch": 97}, "StartTime": 1572781661.506654} [11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=244752.499282 records/second [2019-11-03 11:47:41.708] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 197, "duration": 99, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 99 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 496, "sum": 496.0, "min": 496}, "Total Records Seen": {"count": 1, "max": 2480000, "sum": 2480000.0, "min": 2480000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 99, "sum": 99.0, "min": 99}}, "EndTime": 1572781661.70883, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 98}, "StartTime": 1572781661.608911} [11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=249809.6486 records/second [2019-11-03 11:47:41.810] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 199, "duration": 100, "num_examples": 5, "num_bytes": 79100000} [11/03/2019 11:47:41 INFO 140169171593024] processed a total of 25000 examples [11/03/2019 11:47:41 INFO 140169171593024] #progress_metric: host=algo-1, completed 100 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 5, "sum": 5.0, "min": 5}, "Number of Batches Since Last Reset": {"count": 1, "max": 5, "sum": 5.0, "min": 
5}, "Number of Records Since Last Reset": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Total Batches Seen": {"count": 1, "max": 501, "sum": 501.0, "min": 501}, "Total Records Seen": {"count": 1, "max": 2505000, "sum": 2505000.0, "min": 2505000}, "Max Records Seen Between Resets": {"count": 1, "max": 25000, "sum": 25000.0, "min": 25000}, "Reset Count": {"count": 1, "max": 100, "sum": 100.0, "min": 100}}, "EndTime": 1572781661.810584, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KMeansWebscale", "epoch": 99}, "StartTime": 1572781661.70911} [11/03/2019 11:47:41 INFO 140169171593024] #throughput_metric: host=algo-1, train throughput=246070.664214 records/second [11/03/2019 11:47:41 INFO 140169171593024] shrinking 100 centers into 10 [11/03/2019 11:47:41 INFO 140169171593024] local kmeans attempt #0. Current mean square distance 12.902647 [11/03/2019 11:47:41 INFO 140169171593024] local kmeans attempt #1. Current mean square distance 11.803318 [11/03/2019 11:47:41 INFO 140169171593024] local kmeans attempt #2. Current mean square distance 12.321064 [11/03/2019 11:47:41 INFO 140169171593024] local kmeans attempt #3. Current mean square distance 12.036984 [11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #4. Current mean square distance 12.555333 [11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #5. Current mean square distance 12.615070 [11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #6. Current mean square distance 11.918087 [11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #7. Current mean square distance 12.279174 [11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #8. Current mean square distance 12.339795 [11/03/2019 11:47:42 INFO 140169171593024] local kmeans attempt #9. Current mean square distance 12.555266 [11/03/2019 11:47:42 INFO 140169171593024] finished shrinking process. 
Mean Square Distance = 12 [11/03/2019 11:47:42 INFO 140169171593024] #quality_metric: host=algo-1, train msd <loss>=11.8033180237 [11/03/2019 11:47:42 INFO 140169171593024] batch data loading with context took: 38.6209%, (4.388304 secs) [11/03/2019 11:47:42 INFO 140169171593024] compute all data-center distances: point norm took: 19.0106%, (2.160087 secs) [11/03/2019 11:47:42 INFO 140169171593024] gradient: cluster center took: 13.1121%, (1.489863 secs) [11/03/2019 11:47:42 INFO 140169171593024] compute all data-center distances: inner product took: 9.4443%, (1.073109 secs) [11/03/2019 11:47:42 INFO 140169171593024] collect from kv store took: 5.5164%, (0.626799 secs) [11/03/2019 11:47:42 INFO 140169171593024] predict compute msd took: 4.7494%, (0.539646 secs) [11/03/2019 11:47:42 INFO 140169171593024] gradient: cluster size took: 3.1338%, (0.356081 secs) [11/03/2019 11:47:42 INFO 140169171593024] splitting centers key-value pair took: 1.9277%, (0.219037 secs) [11/03/2019 11:47:42 INFO 140169171593024] compute all data-center distances: center norm took: 1.5278%, (0.173592 secs) [11/03/2019 11:47:42 INFO 140169171593024] gradient: one_hot took: 1.4084%, (0.160024 secs) [11/03/2019 11:47:42 INFO 140169171593024] update state and report convergance took: 1.3147%, (0.149378 secs) [11/03/2019 11:47:42 INFO 140169171593024] update set-up time took: 0.1200%, (0.013640 secs) [11/03/2019 11:47:42 INFO 140169171593024] predict minus dist took: 0.1141%, (0.012959 secs) [11/03/2019 11:47:42 INFO 140169171593024] TOTAL took: 11.3625204563 [11/03/2019 11:47:42 INFO 140169171593024] Number of GPUs being used: 0 #metrics {"Metrics": {"finalize.time": {"count": 1, "max": 387.3600959777832, "sum": 387.3600959777832, "min": 387.3600959777832}, "initialize.time": {"count": 1, "max": 42.871952056884766, "sum": 42.871952056884766, "min": 42.871952056884766}, "model.serialize.time": {"count": 1, "max": 0.2219676971435547, "sum": 0.2219676971435547, "min": 0.2219676971435547}, 
"update.time": {"count": 100, "max": 197.33190536499023, "sum": 11322.939395904541, "min": 97.9759693145752}, "epochs": {"count": 1, "max": 100, "sum": 100.0, "min": 100}, "state.serialize.time": {"count": 1, "max": 0.5171298980712891, "sum": 0.5171298980712891, "min": 0.5171298980712891}, "_shrink.time": {"count": 1, "max": 384.3569755554199, "sum": 384.3569755554199, "min": 384.3569755554199}}, "EndTime": 1572781662.199495, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781650.32371} [11/03/2019 11:47:42 INFO 140169171593024] Test data is not provided. #metrics {"Metrics": {"totaltime": {"count": 1, "max": 13017.530918121338, "sum": 13017.530918121338, "min": 13017.530918121338}, "setuptime": {"count": 1, "max": 30.853986740112305, "sum": 30.853986740112305, "min": 30.853986740112305}}, "EndTime": 1572781662.202104, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781662.199603} [11/03/2019 11:47:41 INFO 140552810366784] shrinking 100 centers into 10 [11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #0. Current mean square distance 12.250052 [11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #1. Current mean square distance 12.186016 [11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #2. Current mean square distance 12.200719 [11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #3. Current mean square distance 11.887745 [11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #4. Current mean square distance 12.341534 [11/03/2019 11:47:41 INFO 140552810366784] local kmeans attempt #5. Current mean square distance 12.504448 [11/03/2019 11:47:42 INFO 140552810366784] local kmeans attempt #6. Current mean square distance 12.133743 [11/03/2019 11:47:42 INFO 140552810366784] local kmeans attempt #7. 
Current mean square distance 12.772625 [11/03/2019 11:47:42 INFO 140552810366784] local kmeans attempt #8. Current mean square distance 12.143409 [11/03/2019 11:47:42 INFO 140552810366784] local kmeans attempt #9. Current mean square distance 12.344214 [11/03/2019 11:47:42 INFO 140552810366784] finished shrinking process. Mean Square Distance = 12 [11/03/2019 11:47:42 INFO 140552810366784] #quality_metric: host=algo-2, train msd <loss>=11.8877449036 [11/03/2019 11:47:42 INFO 140552810366784] batch data loading with context took: 31.9681%, (3.320623 secs) [11/03/2019 11:47:42 INFO 140552810366784] compute all data-center distances: point norm took: 20.7105%, (2.151268 secs) [11/03/2019 11:47:42 INFO 140552810366784] collect from kv store took: 13.6408%, (1.416910 secs) [11/03/2019 11:47:42 INFO 140552810366784] gradient: cluster center took: 11.5084%, (1.195417 secs) [11/03/2019 11:47:42 INFO 140552810366784] compute all data-center distances: inner product took: 9.2459%, (0.960398 secs) [11/03/2019 11:47:42 INFO 140552810366784] predict compute msd took: 4.4798%, (0.465329 secs) [11/03/2019 11:47:42 INFO 140552810366784] gradient: cluster size took: 3.0899%, (0.320962 secs) [11/03/2019 11:47:42 INFO 140552810366784] gradient: one_hot took: 1.5796%, (0.164074 secs) [11/03/2019 11:47:42 INFO 140552810366784] update state and report convergance took: 1.2818%, (0.133143 secs) [11/03/2019 11:47:42 INFO 140552810366784] splitting centers key-value pair took: 1.1349%, (0.117886 secs) [11/03/2019 11:47:42 INFO 140552810366784] compute all data-center distances: center norm took: 1.1272%, (0.117085 secs) [11/03/2019 11:47:42 INFO 140552810366784] predict minus dist took: 0.1201%, (0.012476 secs) [11/03/2019 11:47:42 INFO 140552810366784] update set-up time took: 0.1130%, (0.011741 secs) [11/03/2019 11:47:42 INFO 140552810366784] TOTAL took: 10.3873124123 [11/03/2019 11:47:42 INFO 140552810366784] Number of GPUs being used: 0 [11/03/2019 11:47:42 INFO 140552810366784] No 
model is serialized on a non-master node #metrics {"Metrics": {"finalize.time": {"count": 1, "max": 291.3999557495117, "sum": 291.3999557495117, "min": 291.3999557495117}, "initialize.time": {"count": 1, "max": 41.98312759399414, "sum": 41.98312759399414, "min": 41.98312759399414}, "model.serialize.time": {"count": 1, "max": 0.07700920104980469, "sum": 0.07700920104980469, "min": 0.07700920104980469}, "update.time": {"count": 100, "max": 179.54707145690918, "sum": 10432.80816078186, "min": 89.97201919555664}, "epochs": {"count": 1, "max": 100, "sum": 100.0, "min": 100}, "state.serialize.time": {"count": 1, "max": 0.4820823669433594, "sum": 0.4820823669433594, "min": 0.4820823669433594}, "_shrink.time": {"count": 1, "max": 288.4190082550049, "sum": 288.4190082550049, "min": 288.4190082550049}}, "EndTime": 1572781662.107717, "Dimensions": {"Host": "algo-2", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781650.328628} [11/03/2019 11:47:42 INFO 140552810366784] Test data is not provided. #metrics {"Metrics": {"totaltime": {"count": 1, "max": 13907.652139663696, "sum": 13907.652139663696, "min": 13907.652139663696}, "setuptime": {"count": 1, "max": 16.698122024536133, "sum": 16.698122024536133, "min": 16.698122024536133}}, "EndTime": 1572781662.109637, "Dimensions": {"Host": "algo-2", "Operation": "training", "Algorithm": "AWS/KMeansWebscale"}, "StartTime": 1572781662.107824} 2019-11-03 11:47:54 Uploading - Uploading generated training model 2019-11-03 11:47:54 Completed - Training job completed Training seconds: 142 Billable seconds: 142 CPU times: user 7.93 s, sys: 394 ms, total: 8.33 s Wall time: 3min 21s
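The end of the log shows how each host reduces its 100 intermediate centers to the requested 10: it makes ten local k-means attempts, and the reported `train msd` equals the lowest mean square distance among them (11.803318 on algo-1, 11.887745 on algo-2). Below is a minimal, standard-library sketch of that "best of several attempts" idea. It is not SageMaker's actual implementation: the function names `mean_square_distance`, `lloyd` and `best_of_attempts` are our own, and the random initialisation and fixed iteration count are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_square_distance(points, centers):
    """Average squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def lloyd(points, k, rng, n_iter=20):
    """One plain k-means (Lloyd) run starting from k randomly sampled points."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def best_of_attempts(points, k, attempts=10, seed=0):
    """Run several independent attempts and keep the one with the lowest MSD."""
    rng = random.Random(seed)
    best_centers, best_msd = None, float("inf")
    for _ in range(attempts):
        centers = lloyd(points, k, rng)
        msd = mean_square_distance(points, centers)
        if msd < best_msd:
            best_centers, best_msd = centers, msd
    return best_centers, best_msd
```

In the log, the input to this step would be the 100 intermediate centers rather than the raw images, so `points` here stands in for whatever set is being shrunk.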
In [9]:
%%time
kmeans_predictor = kmeans.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')
--------------------------------------------------------------------------------------------------!
CPU times: user 482 ms, sys: 38.7 ms, total: 521 ms
Wall time: 8min 14s
In [10]:
%%time
result = kmeans_predictor.predict(valid_set[0][0:100])
clusters = [r.label['closest_cluster'].float32_tensor.values[0] for r in result]
CPU times: user 34.1 ms, sys: 353 µs, total: 34.5 ms
Wall time: 334 ms
In [11]:
for cluster in range(10):
    print('\n\n\nCluster {}:'.format(int(cluster)))
    # Collect the validation images that were assigned to this cluster
    digits = [ img for l, img in zip(clusters, valid_set[0]) if int(l) == cluster ]
    # Lay the images out in a grid that is five columns wide
    height = ((len(digits)-1)//5) + 1
    width = 5
    plt.rcParams["figure.figsize"] = (width,height)
    _, subplots = plt.subplots(height, width)
    subplots = numpy.ndarray.flatten(subplots)
    for subplot, image in zip(subplots, digits):
        show_digit(image, subplot=subplot)
    # Hide the axes of any unused grid cells
    for subplot in subplots[len(digits):]:
        subplot.axis('off')
    plt.show()
[Output: ten image grids, labelled Cluster 0 to Cluster 9, each showing the validation digits assigned to that cluster]
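Because k-means is unsupervised, the cluster numbers printed above are arbitrary ids and need not match the digit they mostly contain. One way to check what each cluster represents is to count the true labels inside it. The helper below is a small sketch of that idea; `majority_digit_per_cluster` is a hypothetical name of our own, and the toy inputs are made up purely to show the shape of the result.

```python
from collections import Counter


def majority_digit_per_cluster(cluster_ids, true_labels):
    """Map each cluster id to (most common true label, share of that label)."""
    by_cluster = {}
    for c, y in zip(cluster_ids, true_labels):
        by_cluster.setdefault(int(c), []).append(int(y))
    summary = {}
    for c, labels in sorted(by_cluster.items()):
        digit, count = Counter(labels).most_common(1)[0]
        summary[c] = (digit, count / len(labels))
    return summary


# Toy example: cluster ids arrive as floats, as in the prediction output.
toy_clusters = [0.0, 0.0, 1.0, 1.0, 1.0]
toy_labels = [7, 7, 3, 3, 8]
toy_summary = majority_digit_per_cluster(toy_clusters, toy_labels)
print(toy_summary)
```

In the notebook, the equivalent call would presumably be `majority_digit_per_cluster(clusters, valid_set[1][0:100])`, since `valid_set[1]` holds the labels for the images in `valid_set[0]`.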
In [12]:
result = kmeans_predictor.predict(valid_set[0][230:231])
print(result)
[label { key: "closest_cluster" value { float32_tensor { values: 4.0 } } } label { key: "distance_to_cluster" value { float32_tensor { values: 6.309240818023682 } } } ]
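The response carries two values per record: `closest_cluster` (here 4.0) and `distance_to_cluster` (here about 6.31). Conceptually, the endpoint compares the input vector with every learned center and reports the nearest one. The sketch below illustrates that lookup on toy data, assuming the reported distance is a plain Euclidean distance; `closest_cluster` is our own helper name and the centers are invented for the example, not taken from the trained model.

```python
import math


def closest_cluster(point, centers):
    """Return (index of the nearest center, Euclidean distance to it)."""
    distances = [math.dist(point, c) for c in centers]
    idx = min(range(len(centers)), key=distances.__getitem__)
    return idx, distances[idx]


# Two made-up 2-D centers; the real model has 10 centers of 784 values each.
toy_centers = [(0.0, 0.0), (3.0, 4.0)]
idx, dist = closest_cluster((3.0, 0.0), toy_centers)
print(idx, dist)
```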
In [13]:
show_digit(valid_set[0][230], 'This is a {}'.format(valid_set[1][230]))