Introduction
- Warehouse-scale computer (WSC)
  - Provides Internet services
    - Search, social networking, online maps, video sharing, online shopping, email, cloud computing, etc.
  - Differences with HPC “clusters”:
    - Clusters have higher-performance processors and networks
    - Clusters emphasize thread-level parallelism, WSCs emphasize request-level parallelism
  - Differences with datacenters:
    - Datacenters consolidate different machines and software into one location
    - Datacenters emphasize virtual machines and hardware heterogeneity in order to serve varied customers

Introduction
- Important design factors for WSC:
  - Cost-performance
    - Small savings add up
  - Energy efficiency
    - Affects power distribution and cooling
    - Work per joule
  - Dependability via redundancy
  - Network I/O
  - Interactive and batch processing workloads
  - Ample computational parallelism is not a concern
    - Most jobs are totally independent
    - “Request-level parallelism”
  - Operational costs count
    - Power consumption is a primary, not secondary, constraint when designing the system
  - Scale and its opportunities and problems
    - Can afford to build customized systems since WSCs require volume purchases

Programming Models and Workloads
- Batch processing framework: MapReduce
  - Map: applies a programmer-supplied function to each logical input record
    - Runs on thousands of computers
    - Provides a new set of key-value pairs as intermediate values
  - Reduce: collapses values using another programmer-supplied function

Programming Models and Workloads
- Example:

    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
        EmitIntermediate(w, "1");  // Produce list of all words

    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);  // get integer from key-value pair
      Emit(AsString(result));
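
The word-count pseudocode can be tried on a single machine. The Python sketch below mimics the map, shuffle, and reduce phases inside one process. The names `map_fn`, `reduce_fn`, and `run_mapreduce`, and the in-memory grouping step, are illustrative assumptions and not part of any real MapReduce runtime, which would spread these steps across thousands of servers.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: key is a document name, value is the document contents."""
    for word in value.split():
        yield word, "1"  # emit one intermediate (word, count) pair per word


def reduce_fn(key, values):
    """Reduce: key is a word, values is an iterable of count strings."""
    return key, sum(int(v) for v in values)


def run_mapreduce(documents):
    """Run map, shuffle (group by key), and reduce in a single process."""
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)  # shuffle: group counts by word
    return dict(reduce_fn(word, counts) for word, counts in intermediate.items())


if __name__ == "__main__":
    docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
    print(run_mapreduce(docs))
```

Because each call to `map_fn` depends only on its own document, and each call to `reduce_fn` only on its own word, both phases could in principle run on different machines, which is what makes the model a good fit for a WSC.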

Programming Models and Workloads
- MapReduce runtime environment schedules map and reduce tasks to WSC nodes
- Availability:
  - Use replicas of data across different servers
  - Use relaxed consistency:
    - No need for all replicas to always agree
- Workload demands
  - Often vary considerably

Computer Architecture of WSC
- WSCs often use a hierarchy of networks for interconnection
- Each 19” rack holds 48 1U servers connected to a rack switch
- Rack switches are uplinked to a switch higher in the hierarchy
- Uplink has 48/n times lower bandwidth, where n = number of uplink ports
  - “Oversubscription”
- Goal is to maximize locality of communication relative to the rack
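
The 48/n oversubscription figure can be checked with a few lines of Python. This is a minimal sketch that assumes every switch port runs at the same speed; the function names and the 1 Gbit/s port speed are hypothetical values chosen for illustration.

```python
def oversubscription_ratio(servers_per_rack=48, uplink_ports=8):
    """Ratio of server-facing ports to uplink ports on a rack switch.

    Assumes every port runs at the same speed.
    """
    if servers_per_rack <= 0 or uplink_ports <= 0:
        raise ValueError("port counts must be positive")
    return servers_per_rack / uplink_ports


def per_server_uplink_gbps(port_gbps, servers_per_rack=48, uplink_ports=8):
    """Bandwidth each server gets out of the rack when all servers send at once."""
    return port_gbps / oversubscription_ratio(servers_per_rack, uplink_ports)


if __name__ == "__main__":
    for n in (2, 4, 8):
        ratio = oversubscription_ratio(uplink_ports=n)
        share = per_server_uplink_gbps(1.0, uplink_ports=n)  # hypothetical 1 Gbit/s ports
        print(f"{n} uplinks: {ratio:.0f}:1 oversubscription, {share:.3f} Gbit/s per server")
```

The fewer the uplink ports, the less bandwidth each server has for traffic leaving the rack, which is why software tries to keep communication inside the rack.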

Storage
- Storage options:
  - Use disks inside the servers, or
  - Network attached storage through InfiniBand
  - WSCs generally rely on local disks
- Google File System (GFS) uses local disks and maintains at least three replicas

Array Switch
- Switch that connects an array of racks
- Array switch should have 10x the bisection bandwidth of a rack switch
  - Cost of an n-port switch grows as n^2
- Often utilize content-addressable memory chips and FPGAs
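
To get a feel for the n^2 rule of thumb, the sketch below compares switch costs relative to a baseline. It takes the quadratic growth stated on the slide at face value; the function names and the 48-port baseline are assumptions for illustration, and real switch pricing will not follow the curve exactly.

```python
def relative_switch_cost(ports, baseline_ports=48):
    """Cost relative to a baseline switch, assuming cost grows as ports**2."""
    if ports <= 0 or baseline_ports <= 0:
        raise ValueError("port counts must be positive")
    return (ports / baseline_ports) ** 2


def relative_cost_per_port(ports, baseline_ports=48):
    """Per-port cost relative to the baseline under the same quadratic model."""
    return relative_switch_cost(ports, baseline_ports) / (ports / baseline_ports)


if __name__ == "__main__":
    for ports in (48, 96, 480):
        total = relative_switch_cost(ports)
        per_port = relative_cost_per_port(ports)
        print(f"{ports} ports: {total:.0f}x total cost, {per_port:.0f}x cost per port")
```

Under this model, doubling the port count quadruples the switch cost and doubles the cost per port, which helps explain why bandwidth higher in the hierarchy is scarce.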

WSC Memory Hierarchy
- Servers can access DRAM and disks on other servers using a NUMA-style interface

Infrastructure and Costs of WSC
- Location of WSC
  - Proximity to Internet backbones, electricity cost, property tax rates, low risk from earthquakes, floods, and hurricanes
- Power distribution

Infrastructure and Costs of WSC
- Cooling
  - Air conditioning used to cool server room
  - 64 F – 71 F
    - Keep temperature higher (closer to 71 F)
  - Cooling towers can also be used
    - Minimum temperature is “wet bulb temperature”

Infrastructure and Costs of WSC
- Cooling system also uses water (evaporation and spills)
  - E.g. 70,000 to 200,000 gallons per day for an 8 MW facility
- Power cost breakdown:
  - Chillers: 30-50% of the power used by the IT equipment
  - Air conditioning: 10-20% of the IT power, mostly due to fans
- How many servers can a WSC support?
  - Each server:
    - “Nameplate power rating” gives maximum power consumption
    - To get actual, measure power under actual workloads
  - Oversubscribe cumulative server power by 40%, but monitor power closely
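
The server-count question can be worked through numerically. The sketch below assumes that "oversubscribe by 40%" means the summed nameplate ratings of the deployed servers may exceed the IT power budget by 40%. The function names, the 1 MW budget, and the per-server wattages are hypothetical values for illustration.

```python
def servers_supported(it_budget_w, nameplate_w, oversubscription_pct=40):
    """Servers to deploy when summed nameplate power may exceed the budget.

    Integer arithmetic (whole watts, whole percent) avoids rounding surprises.
    """
    if it_budget_w < 0 or nameplate_w <= 0 or oversubscription_pct < 0:
        raise ValueError("invalid power figures")
    return (it_budget_w * (100 + oversubscription_pct)) // (100 * nameplate_w)


def within_budget(measured_server_w, it_budget_w):
    """Monitoring check: does the measured total draw stay within the budget?"""
    return sum(measured_server_w) <= it_budget_w


if __name__ == "__main__":
    budget_w = 1_000_000   # hypothetical 1 MW available for IT equipment
    nameplate_w = 500      # hypothetical nameplate rating per server
    count = servers_supported(budget_w, nameplate_w)
    print(f"deploy {count} servers")
    # Hypothetical measurement: every server draws 350 W under real load.
    print("within budget:", within_budget([350] * count, budget_w))
```

Oversubscription only works because servers rarely draw their nameplate power at the same time, so the monitoring check matters as much as the sizing calculation.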

Measuring Efficiency of a WSC
- Power Utilization Effectiveness (PUE)
  - = Total facility power / IT equipment power
  - Median PUE in a 2006 study was 1.69
- Performance
  - Latency is an important metric because it is seen by users
  - Bing study: users will use search less as response time increases
  - Service Level Objectives (SLOs)/Service Level Agreements (SLAs)
    - E.g. 99% of requests must be below 100 ms
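
Both metrics on this slide reduce to short calculations. The sketch below computes PUE from two power readings and checks a latency SLO of the "99% of requests below 100 ms" form. The function names and all sample numbers are hypothetical; the power readings are chosen only so that the result matches the 1.69 median quoted above.

```python
def pue(total_facility_kw, it_equipment_kw):
    """PUE = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def meets_slo(latencies_ms, threshold_ms=100.0, target_fraction=0.99):
    """True if at least target_fraction of requests finish below threshold_ms."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    fast = sum(1 for t in latencies_ms if t < threshold_ms)
    return fast / len(latencies_ms) >= target_fraction


if __name__ == "__main__":
    # Hypothetical readings: 1690 kW at the facility meter, 1000 kW at the servers.
    print(f"PUE = {pue(1690, 1000):.2f}")
    # Hypothetical latency samples: 99 fast requests and one slow outlier.
    samples = [20.0] * 99 + [450.0]
    print("SLO met:", meets_slo(samples))
```

A PUE close to 1.0 means almost all facility power reaches the IT equipment. The SLO check looks at the fraction of fast requests and not the average, since a handful of slow requests can hide behind a good mean.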

Cost of a WSC
- Capital expenditures (CAPEX)
  - Cost to build a WSC
- Operational expenditures (OPEX)
  - Cost to operate a WSC

Cloud Computing
- WSCs offer economies of scale that cannot be achieved with a datacenter:
  - 5.7 times reduction in storage costs
  - 7.1 times reduction in administrative costs
  - 7.3 times reduction in networking costs
- This has given rise to cloud services such as Amazon Web Services
  - “Utility Computing”
- Based on using open source virtual machine and operating system software