PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices: Predictor Analysis

by Bayesian Inference, April 2nd, 2024

Too Long; Didn't Read

This paper investigates how the configuration of on-device hardware affects energy consumption for neural network inference with regular fine-tuning.


This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Minghao Yan, University of Wisconsin-Madison;

(2) Hongyi Wang, Carnegie Mellon University;

(3) Shivaram Venkataraman, [email protected].


D PREDICTOR ANALYSIS

We vary the latency SLO to assess how the predictor schedules fine-tuning requests. We replay a 60-second stream in which the latency SLO is set to 250 ms for the first 30 seconds and then raised to 700 ms for the remaining 30 seconds. As shown in Figure 14, under the stringent SLO the predictor determines that fine-tuning requests cannot be scheduled without violating the latency SLO, so none are scheduled. Once the SLO is relaxed, the predictor determines that scheduling is feasible and schedules the requests sequentially, starting each one only after the preceding request has completed and issued its completion signal.
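The scheduling behavior described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the paper's predictor: the latency constants, the fine-tuning duration, and the function names `can_schedule` and `replay` are all hypothetical and chosen for illustration. Only the SLO schedule (250 ms for the first 30 seconds, 700 ms afterwards) comes from the experiment.

```python
from typing import List

# Illustrative assumptions, not measurements from the paper.
COLOCATED_LATENCY_MS = 600.0  # assumed inference latency while fine-tuning shares the device
FINETUNE_DURATION_S = 5       # assumed time until a fine-tuning request signals completion


def latency_slo_ms(t_s: int) -> float:
    """SLO schedule from the experiment: 250 ms for the first 30 s, then 700 ms."""
    return 250.0 if t_s < 30 else 700.0


def can_schedule(slo_ms: float) -> bool:
    """Hypothetical predictor: admit fine-tuning only if the predicted
    co-located inference latency still fits within the SLO."""
    return COLOCATED_LATENCY_MS <= slo_ms


def replay(stream_s: int = 60, pending: int = 10) -> List[int]:
    """Replay the stream second by second and return the start times
    (in seconds) of the fine-tuning requests that get scheduled."""
    starts: List[int] = []
    busy_until = 0  # time at which the running request signals completion
    for t in range(stream_s):
        if pending == 0:
            break
        if t < busy_until:
            continue  # previous request has not signalled completion yet
        if can_schedule(latency_slo_ms(t)):
            starts.append(t)
            busy_until = t + FINETUNE_DURATION_S
            pending -= 1
    return starts


if __name__ == "__main__":
    print(replay())
```

With these assumed numbers, nothing is scheduled during the first 30 seconds, and afterwards requests start back to back, each one only after the previous one has finished. A real predictor would estimate the co-located latency from profiling data rather than use a fixed constant.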
