BIG DATA
Contents:
1. Introduction
2. What is big data
3. Characteristics of big data
4. Storing, selecting and processing of big data
5. Why big data
6. How it is different
7. Big data sources
8. Tools used in big data
9. Applications of big data
10. Risks of big data
11. How big data impacts IT
12. Benefits of big data
13. Future of big data
1. Introduction:
a. Big data may well be the next big thing in the IT world.
b. Big data burst upon the scene in the first decade of the 21st century.
c. The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning.
d. Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings.
2. What is big data?
a. ‘Big data’ is similar to ‘small data’, but bigger in size.
b. But having bigger data requires different approaches:
   i. Techniques, tools and architecture
c. It aims to solve new problems, or old problems in a better way.
d. Big data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
e. Examples of the scale involved:
   i. Walmart handles more than 1 million customer transactions every hour.
   ii. Facebook handles 40 billion photos from its user base.
   iii. Decoding the human genome originally took 10 years to process; now it can be achieved in one week.
3. Three characteristics of big data (the 3 Vs):
a. Volume: data quantity
b. Velocity: data speed
c. Variety: data types
a. First characteristic of big data, volume:
   - A typical PC might have had 10 gigabytes of storage in 2000.
   - Today, Facebook ingests 500 terabytes of new data every day.
   - A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US.
   - Smartphones, the data they create and consume, and sensors embedded into everyday objects will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.
b. Second characteristic of big data, velocity:
   - Clickstreams and ad impressions capture user behavior at millions of events per second.
   - High-frequency stock trading algorithms reflect market changes within microseconds.
   - Machine-to-machine processes exchange data between billions of devices.
   - Infrastructure and sensors generate massive log data in real time.
   - Online gaming systems support millions of concurrent users, each producing multiple inputs per second.
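To make "events per second" concrete, here is a minimal Python sketch of a tumbling-window counter. It assumes events arrive as timestamps in seconds, already sorted by time. The function name `tumbling_counts` and the one-second default window are illustrative choices and do not come from any particular streaming product. Real stream processors also handle out-of-order events and spread the work across many machines.

```python
import math
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_counts(
    timestamps: Iterable[float], width: float = 1.0
) -> Iterator[Tuple[int, int]]:
    """Yield (window_index, event_count) for a time-ordered event stream.

    Window k covers [k * width, (k + 1) * width). Because the input is assumed
    to be sorted by time, a window is emitted as soon as the first event of a
    later window arrives, so memory use stays constant.
    """
    current: Optional[int] = None
    count = 0
    for ts in timestamps:
        window = math.floor(ts / width)
        if current is not None and window != current:
            yield current, count  # the previous window is complete
            count = 0
        current = window
        count += 1
    if current is not None:
        yield current, count  # flush the last, still-open window
```

For example, `list(tumbling_counts([0.1, 0.4, 0.9, 1.2, 1.3, 2.8]))` returns `[(0, 3), (1, 2), (2, 1)]`. Windows that contain no events are simply not emitted.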
c. Third characteristic of big data, variety:
   - Big data isn't just numbers, dates, and strings. Big data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
   - Traditional database systems were designed to address smaller volumes of structured data, fewer updates, or a predictable, consistent data structure.
   - Big data analysis includes all of these different types of data.
4. Storing, selecting and processing of big data:
a. Storing big data:
   i. Analyzing your data characteristics
      1. Selecting data sources for analysis
      2. Eliminating redundant data
      3. Establishing the role of NoSQL
   ii. Overview of big data stores
      1. Data models: key-value, graph, document, column-family
      2. Hadoop Distributed File System (HDFS)
      3. HBase
      4. Hive
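Eliminating redundant data can be illustrated with a minimal sketch that drops exact duplicate records by hashing their contents. The sketch assumes that records are strings and that "redundant" means byte-for-byte identical. Catching near-duplicates would need normalization or a similarity measure. The function name `deduplicate` and the sample readings are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, Set


def deduplicate(records: Iterable[str]) -> Iterator[str]:
    """Yield each distinct record once, in first-seen order.

    Only a fixed-size SHA-256 digest of every record is kept in memory, not
    the record itself, which helps when records are large.
    """
    seen: Set[bytes] = set()
    for record in records:
        digest = hashlib.sha256(record.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield record


readings = ["sensor-1,20.5", "sensor-2,19.8", "sensor-1,20.5"]
print(list(deduplicate(readings)))  # ['sensor-1,20.5', 'sensor-2,19.8']
```

Because the function is a generator, it can filter a stream that is too large to hold in memory. Only the set of digests grows.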
b. Selecting big data stores:
   i. Choosing the correct data stores based on your data characteristics
   ii. Moving code to data
   iii. Implementing polyglot data store solutions
   iv. Aligning business goals to the appropriate data store
c. Processing big data:
   i. Integrating disparate data stores:
      1. Mapping data to the programming framework
      2. Connecting and extracting data from storage
      3. Transforming data for processing
      4. Subdividing data in preparation for Hadoop MapReduce
   ii. Employing Hadoop MapReduce:
      1. Creating the components of Hadoop MapReduce jobs
      2. Distributing data processing across server farms
      3. Executing Hadoop MapReduce jobs
      4. Monitoring the progress of job flows
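The subdivide, map, shuffle and reduce steps above can be sketched in a single Python process using the classic word-count example. This models only the programming pattern, under simplifying assumptions. Real Hadoop MapReduce jobs are typically written against Hadoop's Java `Mapper` and `Reducer` classes and run on a cluster. The helper names here (`split_input`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def split_input(lines: List[str], num_splits: int) -> List[List[str]]:
    """Subdivide the input round-robin into num_splits splits, one per map task.

    (Hadoop splits files into contiguous blocks; round-robin keeps this short.)
    """
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(split: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all counts for one word into a total."""
    return key, sum(values)


def word_count(lines: List[str], num_splits: int = 2) -> Dict[str, int]:
    splits = split_input(lines, num_splits)
    # On a real cluster each map task would run on a different node;
    # here they simply run one after another in this process.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["big data", "Big ideas", "data data"])` returns `{'big': 2, 'data': 3, 'ideas': 1}`. Each map task only sees its own split. Only the shuffle needs to see all keys, which is why the framework handles that step.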
d. The structure of big data:
   i. Structured:
      1. Most traditional data sources
   ii. Semi-structured:
      1. Many sources of big data
   iii. Unstructured:
      1. Video data, audio data
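The three kinds of structure can be contrasted with a small Python sketch that uses only the standard library. The sample values are made up for illustration:
- a CSV string stands in for structured data;
- JSON records with varying fields stand in for semi-structured data;
- a free-text sentence stands in for unstructured data.

```python
import csv
import io
import json

# Structured: a fixed set of columns; every row has the same fields.
structured = "id,name,amount\n1,Alice,30\n2,Bob,45\n"
rows = list(csv.DictReader(io.StringIO(structured)))
# rows[0] == {"id": "1", "name": "Alice", "amount": "30"} (CSV values are strings)

# Semi-structured: self-describing keys, but each record may have different fields.
semi_structured = '[{"id": 1, "tags": ["sale"]}, {"id": 2, "geo": {"lat": 12.9, "lon": 77.6}}]'
records = json.loads(semi_structured)
fields_per_record = [sorted(r) for r in records]
# fields_per_record == [["id", "tags"], ["geo", "id"]]

# Unstructured: no schema at all; any structure must be extracted by the program.
unstructured = "Great service today, but the delivery was late."
tokens = unstructured.lower().replace(",", "").replace(".", "").split()
# tokens[:3] == ["great", "service", "today"]
```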
5. Why big data:
a. The growth of big data is driven by:
   i. Increased storage capacity
   ii. Increased processing power
   iii. Availability of data (different data types)
b. The scale of data creation:
   i. Every day we create 2.5 quintillion bytes of data.
   ii. IBM claims that 90% of today's stored data was generated in just the last two years.
   iii. Facebook generates 10 TB of data daily.
   iv. Twitter generates 7 TB of data daily.
6. How is big data different?
a. It is often automatically generated by a machine (e.g. a sensor embedded in an engine).
b. It is typically an entirely new source of data (e.g. use of the internet).
c. It is not designed to be friendly (e.g. text streams).
d. It may not have much value:
   i. Need to focus on the important part.
7. Big data sources:
a. Data generation points
b. Big data analytics:
   i. Examining large amounts of data
   ii. Extracting appropriate information
   iii. Identifying hidden patterns and unknown correlations
   iv. Gaining competitive advantage
   v. Making better business decisions: strategic and operational
   vi. Effective marketing, customer satisfaction, and increased revenue
8. Types of tools used in big data:
a. Where is processing hosted?
   i. Distributed servers / cloud (e.g. Amazon EC2)
b. Where is data stored?
   i. Distributed storage (e.g. Amazon S3)
c. What is the programming model?
   i. Distributed processing (e.g. MapReduce)
d. How is data stored and indexed?
   i. High-performance schema-free databases (e.g. MongoDB)
e. What operations are performed on data?
   i. Analytic / semantic processing
10. Risks of big data:
a. Organizations may be overwhelmed:
   i. Need the right people to solve the right problems.
b. Costs may escalate too fast:
   i. It isn't necessary to capture 100% of the data.
c. Many sources of big data raise privacy concerns:
   i. Self-regulation
   ii. Legal regulation
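The point that it isn't necessary to capture 100% of the data can be illustrated with reservoir sampling (Algorithm R). This standard technique keeps a fixed-size uniform random sample of a stream whose length is not known in advance. It is one possible approach; the list above does not prescribe any particular method. The function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    Algorithm R: keep the first k items, then let item i (0-based) replace a
    random slot with probability k / (i + 1). Memory use is O(k), however
    long the stream is.
    """
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Passing a seeded `random.Random` makes the sample reproducible, for example `reservoir_sample(range(1_000_000), 5, random.Random(42))`. If the stream has fewer than `k` items, all of them are returned.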
Leading technology vendors:
- Example vendors:
   - IBM (Netezza)
   - EMC (Greenplum)
   - Oracle (Exadata)
- Commonality:
   - MPP (Massively Parallel Processing) architectures
   - Commodity hardware
   - RDBMS-based
   - Full SQL compliance
11. How big data impacts IT:
a. Big data is a troublesome force, presenting both opportunities and challenges to IT organizations.
   i. By 2015, there will be 4.4 million IT jobs in big data, 1.9 million of them in the US itself.
   ii. India will require a minimum of 1 lakh (100,000) data scientists in the next couple of years, in addition to data analysts and data managers, to support the big data space.
Potential value of big data:
- $300 billion potential annual value to US health care.
- $600 billion potential annual consumer surplus from using personal location data.
- 60% potential increase in retailers' operating margins.
India and big data:
- Gaining traction.
- Huge market opportunities for IT services (82.9% of revenues) and analytics firms (17.1%).
- The current market size is $200 million; it is expected to reach $1 billion by 2015.
- The opportunity for Indian service providers lies in offering services around big data implementation and analytics for global multinationals.
12. Benefits of big data:
a. Real-time big data isn't just a process for storing petabytes or exabytes of data in a data warehouse; it's about the ability to make better decisions and take meaningful actions at the right time.
b. Fast forward to the present: technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it.
c. Technologies such as MapReduce, Hive and Impala enable you to run queries without changing the data structures underneath.
d. Our newest research finds that organizations are using big data to target customer-centric outcomes, tap into internal data and build a better information ecosystem.
e. Big data is already an important part of the $64 billion database and data analytics market.
f. It offers commercial opportunities on a scale comparable to enterprise software in the late 1980s, the internet boom of the 1990s, and the social media explosion of today.
13. Future of big data:
a. $15 billion has been spent on software firms that specialize only in data management and analytics.
b. This industry on its own is worth more than $100 billion and is growing at almost 10% a year, roughly twice as fast as the software business as a whole.
c. In February 2012, the open source analyst firm Wikibon released the first market forecast for big data. It listed $5.1 billion in revenue for 2012, growing to $53.4 billion in 2017.
d. The McKinsey Global Institute estimates that data volume is growing 40% per year and will grow 44x between 2009 and 2020.
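The growth figures above can be checked with compound-growth arithmetic. The sketch below assumes annual compounding, and its function names are hypothetical. Compounding 40% per year over the 11 years from 2009 to 2020 gives roughly a 40x increase. That is close to the quoted 44x, which corresponds to about 41% per year. Wikibon's forecast, from $5.1 billion to $53.4 billion over five years, implies roughly 60% annual growth.

```python
def compound_growth(rate: float, years: int) -> float:
    """Total growth multiple after compounding `rate` once a year for `years` years."""
    return (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# McKinsey: 40% per year, compounded from 2009 to 2020 (11 years).
print(f"{compound_growth(0.40, 2020 - 2009):.1f}x")  # 40.5x

# Annual rate implied by the quoted 44x over the same period.
print(f"{implied_cagr(1.0, 44.0, 2020 - 2009):.1%} per year")

# Wikibon: $5.1 billion in 2012 to $53.4 billion in 2017 (5 years).
print(f"{implied_cagr(5.1, 53.4, 2017 - 2012):.1%} per year")
```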