Continuous Delivery Software Development Q&A

Alex McDonald

Apr 26, 2021

What’s the best way to make a development team lean and agile? It was a question we explored at length during our SNIA Cloud Storage Technologies Initiative live webcast “Continuous Delivery: Cloud Software Development on Speed.” During this session, continuous delivery expert Davis Frank, co-creator of the Jasmine Test Framework, explained why product development teams are adopting a continuous delivery (CD) model. If you missed the live event, you can watch it on-demand here. The webcast audience was highly engaged with this topic and asked some interesting questions. Here are Davis Frank’s answers:

Q. What are a few simple tests you can use to determine your team’s capability to deliver CD?

A. I would ask the team three questions:
  1. Do you want to move to a Continuous Delivery model?
  2. Are you willing to meet to discover what is preventing your team from working in a CD manner, and then hold yourselves accountable to making improvements?
  3. Are you able to repeat the second step regularly?
If the answers to these questions are all yes, then you have the foundation necessary to get started.

Q. If you’re talking about multiple products from different companies, how do you ensure that you can deliver CD-type products?

A. When building cloud software today, you are going to have dependencies: on open-source frameworks, closed-source tooling, and web-based APIs. As these dependencies change, they can affect your products. Automated testing and continuous integration help here. They will catch issues before delivery, just like bugs or issues that come from your own team. Finding issues early means the team can recover, amend, or work around these types of problems so that they have a minimal impact on the team’s delivery pace or the overall business.

Q. When you get to team rotation, how well do software engineers in your experience adapt to moving? As you move engineers to new areas of the code, do you find that they spend time re-writing what’s not broken because they didn’t write it in the first place?

A. Every person is different, but most engineers like new problems. Learning a new domain and applying their knowledge to solve new problems is often highly motivating for engineers. The urge to re-write is often just a way to build understanding. As I mentioned in the webcast, the “rewrite risk” can be mitigated by pair-programming sessions with engineers more familiar with the code and by well-written test suites. The test suites act as documentation of how the code actually works. Both accelerate knowledge transfer, reducing some of the motivation to re-write. There are other reasons to rewrite code: sometimes it is for reuse, to make the code easier to maintain, or to adopt newer patterns. These types of rewrites, or refactorings, are natural and happen every day. They improve the code, and with good test coverage, the product risk of this type of work is low.

Q. Does the Lean Methodology work well for delivering software, or is it still too heavy a process for rapid development by software engineering teams?

A. Any new process will feel heavyweight to a team. I recommend finding things that are not working and using new techniques – whatever their origin – to attempt to fix or optimize them. With short feedback loops, you can experiment, tweak, and improve – this is the Learn cycle of Build-Measure-Learn – until you have fixed a problem. And then pick a new one.

Q. Do you still see a need for a “product release” timeline, or does this move software completely to a place where a feature is enabled as soon as it’s ready? How do you cover “feature regression” if new code breaks an existing feature, or updates the way the feature is supported?

A. We touched on the first part of this question in the webcast. When a CD team is working well, it is always delivering new features to production. Whether those features are available to users is a product decision and can be tied to a planned release timeline. Companies often use feature flags, or similar technology, to hide functionality from users until it is ready to be made available. Hiding functionality could be necessary because of public announcement or marketing concerns, or because partial functionality has been delivered and is waiting until the remaining functionality is ready and the feature flags are removed.
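To make the feature-flag idea concrete, here is a minimal sketch in Python. It is purely illustrative and not tied to any particular flag service; the flag name and the environment-variable convention are assumptions for the example.

```python
import os

def is_enabled(flag_name: str, default: bool = False) -> bool:
    """Return True if a feature flag is switched on.

    This sketch reads flags from environment variables (e.g.
    FEATURE_NEW_CHECKOUT=1); a real system would more likely use a
    configuration store or a dedicated flag service.
    """
    value = os.getenv(f"FEATURE_{flag_name.upper()}", "")
    if not value:
        return default
    return value.lower() in ("1", "true", "yes")

# The code ships to production continuously, but users only see the new
# behavior once the flag is flipped.
if is_enabled("new_checkout"):
    print("render the new checkout flow")
else:
    print("render the existing checkout flow")
```

In practice, teams usually point a check like this at a central flag service so a flag can be flipped for some or all users without redeploying.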
As to “feature regression,” or updating how a feature is supported, automated testing and continuous integration should detect or protect against these cases – which absolutely do happen – but they should happen during the development process and thus before production.

Q. Do you have to do CD using open source, or does it work with closed-source products?

A. I see open-sourcing as a product feature around licensing, transparency, and community. It does not directly have to do with how the software is developed and delivered, so I see no conflict with closed-source software. Said another way, does Amazon release its store platform as open source? Or Google for Gmail or Google Docs?


Continuous Delivery: Cloud Software Development on Speed

Alex McDonald

Mar 23, 2021

It happens with more frequency these days. Two companies merge, and the IT departments breathe a small sigh of relief as they learn that they both use the same infrastructure software, though one is on-premises and one is in the cloud. Their relief slowly dissolves as they discover that the cloud-provisioned workers are using features in the software that have yet to be integrated into the on-prem version. Now both have to adapt, and it seems that no one is happy. So, what’s the best way to get these versions in sync?

A Continuous Delivery model is increasingly being adopted to get software development onto a pace that keeps up with business demands. The Continuous Delivery model results in a development organization that looks much like current manufacturing processes, with effective workers, modern machines, and just-in-time inventory. Even large software companies are starting to embrace this cloud delivery methodology to create a continuous stream of new revisions.

On April 20, 2021, the SNIA Cloud Storage Technologies Initiative will explore why Continuous Delivery is a valuable addition to the software development toolbox at our live webcast “Continuous Delivery: Cloud Software Development on Speed.” By adapting some of the principles of modern manufacturing to software development, a Continuous Delivery methodology ensures that the product is streamlined in its feature set while delivering constant value to the customer via the cloud. Webcast attendees will learn:
  • Structuring development and testing resources for Continuous Delivery
  • A flexible software planning cycle for driving new features throughout the process
  • A set of simple guidelines for tracking success
  • Ways to ensure new features are delivered before moving to the next plan
Register today. Our expert speakers, Davis Frank, Co-creator of the Jasmine Test Framework and former Associate Director at Pivotal Labs, and Glyn Bowden, CTO, AI & Data Practice at HPE, will be on hand to answer your questions.


Cloud Analytics Drives Airplanes-as-a-Service Business

Jim Fister

Feb 25, 2021

On-demand flying through an app sounds like something for only the rich and famous, yet the use of cloud analytics is making flexible flying a reality at start-up airline KinectAir. On April 7, 2021, the CTO of KinectAir, Ben Howard, will join the SNIA Cloud Storage Technologies Initiative (CSTI) for a fascinating discussion of first-hand experiences of leveraging cloud analytics methods to bring new business models to life that are competitive and profitable. And since start-up companies may not have legacy data and analytics to consider, we’ll also explore what established businesses using traditional analytics methods can learn from this use case.

Join us on April 7th for our live webcast “Adapting Cloud Analytics for Practical Business Use” for views from both start-up and established companies on how to revisit the analytics decision process, with a discussion on:
  • How to build and take advantage of a data ecosystem
  • Overcoming challenges and roadblocks
  • How to use cloud resources in unique ways to accomplish business and engineering goals
  • Considerations for business requirements and developing technical metrics
  • Thoughts on when to start new vs. adapt existing analytics processes
  • Real-world examples of cloud analytics and AI
Register today. Our panelists will be on-hand to answer questions. We hope to see you there.


Understanding CDMI and S3 Together

Alex McDonald

Feb 9, 2021

How does the Cloud Data Management Interface (CDMI™) International Standard work? Is it possible to be both S3 and CDMI compliant? What security measures are in place with CDMI? How, and where, is CDMI being deployed? These are just some of the topics we covered at our recent SNIA Cloud Storage Technologies Initiative (CSTI) webcast, “Cloud Data Management & Interoperability: Why A CDMI Standard Matters.” CDMI is intended for application developers who are implementing cloud storage systems, and who are developing applications to manage and consume cloud storage.

Q. Can you compare CDMI to S3? Is it possible to be both CDMI and S3 compliant? Is it too complicated?

A. Yes, this is possible, and it is relatively straightforward. Both protocols are HTTP-based, and while S3 is primarily a data access protocol, CDMI provides both management functionality and standardized access to object data. Many companies that implement CDMI allow management of data namespaces that are accessible via multiple protocols, including NFS, CIFS, and S3. CDMI has several capabilities that ease integration with S3 (a small request sketch follows the list below):
  • CDMI is designed so that any S3 URL can be used as a CDMI URL by specifying an Accept header with a CDMI content type.
  • CDMI allows S3 header-style metadata to be accessed, queried, and managed through CDMI.
  • CDMI supports S3 signed header authentication.
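As a minimal illustration of the first bullet, the sketch below shows the same object URL accessed once as plain S3-style data and once as CDMI by adding an Accept header. The endpoint, bucket and object names are placeholders and authentication is omitted; treat this as an assumption-laden sketch, not a reference client.

```python
import requests

ENDPOINT = "https://storage.example.com"  # hypothetical cloud storage endpoint
PATH = "/mybucket/report.csv"             # hypothetical S3-style object path

# Plain S3-style data access: returns the raw object body.
s3_style = requests.get(ENDPOINT + PATH)

# The same URL accessed as CDMI: the Accept header asks the server for the
# CDMI JSON representation of the object (value plus metadata).
cdmi_style = requests.get(
    ENDPOINT + PATH,
    headers={"Accept": "application/cdmi-object"},
)

print(s3_style.status_code, cdmi_style.headers.get("Content-Type"))
```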
CDMI is also commonly used as a serialization representation for objects, files and LUNs, which eases transport between different storage systems and clouds.

Q. With the new CDMI Object Encryption feature, is it possible to use it with the OASIS Key Management Interoperability Protocol (KMIP)?

A. CDMI does not directly use KMIP, but some organizations have successfully used CDMI and KMIP together. At a basic level, KMIP can be used for key management by a client, and this client can then use the key material in its interactions with CDMI. Also noteworthy, both CDMI and KMIP use RESTful interfaces and depend on the Transport Layer Security (TLS) protocol for communications security.

Q. Do any existing security standards provide guidance on the use of CDMI?

A. ISO/IEC 27040 (Information technology – Security techniques – Storage security) provides security guidance on cloud storage, and on CDMI specifically. An important aspect of the CDMI security guidance is to use capability queries to determine what security capabilities have been implemented, and then to make a risk-based decision on whether the implementation offers adequate security protections.

Q. When users interact with each other in real time, how can we guarantee the information request comes from the safe end? Could you explain this in detail, please?

A. If this question is about user authentication, then the use of TLS can provide some measure of protection; however, user authentication in CDMI will provide the best protection in this situation. See the next question for more details on TLS.

Q. HTTP is not a stateful protocol, but TLS is. Does this create problems?

A. When TLS is used with CDMI, it is important for the client to consistently use the same connection, especially when any load balancing is being employed with the CDMI servers. Unless pre-shared keys (PSK) are being used, switching between servers causes TLS to tear down the connection and start a new session, which imposes needless load on the servers. TLS startup can involve significant calculation as part of the negotiation to establish a session key.

There are multiple CDMI implementations. CDMI is open source and anyone can get involved in its development; you don’t need to be a SNIA member. To learn more, visit https://www.snia.org/cdmi.


5G Streaming Questions Answered

Michael Hoard

Dec 2, 2020

The broad adoption of 5G, the Internet of Things (IoT) and edge computing is reshaping the nature and role of enterprise and cloud storage. Preparing for this significant disruption is important. It’s a topic the SNIA Cloud Storage Technologies Initiative covered in our recent webcast “Storage Implications at the Velocity of 5G Streaming,” where my colleagues, Steve Adams and Chip Maurer, took a deep dive into the 5G journey, streaming data, real-time edge AI, 5G use cases and much more. If you missed the webcast, it’s available on-demand along with a copy of the webcast slides.

As you might expect, this discussion generated some intriguing questions. As promised during the live presentation, our experts have answered them all here.

Q. What kind of transport do you see being used for those (5G) use cases?

A. At a high level, 5G consists of three primary slices: enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC) and massive machine-type communication (mMTC). Each of these is better suited to different use cases; for example, normal smartphone usage relies on eMBB, factory robotics relies on URLLC, and intelligent device or sensor applications like farming, edge computing and IoT rely on mMTC.

The primary 5G standards-making bodies include:

  • The 3rd Generation Partnership Project (3GPP) – formulates 5G technical specifications which become 5G standards. Release 15 was the first release to define 5G implementations, and Release 16 is currently underway.
  • The Internet Engineering Task Force (IETF) partners with 3GPP on the development of 5G and new uses of the technology. Particularly, IETF develops key specifications for various functions enabling IP protocols to support network virtualization. For example, IETF is pioneering Service Function Chaining (SFC), which will link the virtualized components of the 5G architecture—such as the base station, serving gateway, and packet data gateway—into a single path. This will permit the dynamic creation and linkage of Virtual Network Functions (VNFs).
  • The International Telecommunication Union (ITU), based in Geneva, is the United Nations specialized agency focused on information and communication technologies. ITU World Radio communication conferences revise the international treaty governing the use of the radio-frequency spectrum and the geostationary and non-geostationary satellite orbits.

To learn more, see

Q. What if the data source at the Edge is not close to where the signal is good to connect to cloud? And, I wonder how these algorithm(s) / data streaming solutions should be considered?

A. When we look at 5G applications like massive machine-type communications (mMTC), we expect many kinds of devices will connect only occasionally, e.g. battery-operated sensors attached to farming water sprinklers or water pumps. Therefore, long-distance, low-bandwidth, sporadically connected 5G network applications will need to tolerate long stretches of no contact without losing context or connectivity, as well as adapt to variations in signal strength and signal quality.

Additionally, 5G supports three broad ranges of wireless frequency spectrum: Low, Mid and High. The lower frequency range provides lower bandwidth for broader or more wide area wireless coverage.  The higher frequency range provides higher bandwidth for limited area or more focused area wireless coverage. To learn more, check out The Wired Guide to 5G.

On the second part of the question regarding algorithms and data streaming solutions: we anticipate that streaming IoT data from sporadically connected devices can still be treated as a streaming data source from a data ingestion standpoint. It is likely to consist of broad snapshots (pre-stipulated time windows) with potential intervals of null data when compared with other types of data sources. Streaming data, regardless of the interval at which it arrives, has value because of its “last known state” compared with the states known in previous intervals. Calculating trends from that data is one of the most common and meaningful ways to extract value and make decisions (see the sketch below).
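To make that concrete, here is a minimal sketch of extracting a last-known-state value and a simple trend from sporadically arriving readings. The sample data, field layout and window are invented for illustration and are not part of any 5G or SNIA specification.

```python
# Hypothetical readings from a sporadically connected sensor:
# (timestamp_seconds, value); gaps between samples can be long.
readings = [(0, 21.0), (600, 21.4), (4800, 22.9), (5400, 23.1)]

def last_known_state(samples):
    """Return the most recent value, however old it is."""
    return samples[-1][1] if samples else None

def simple_trend(samples):
    """Average rate of change (units per second) across the window."""
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

print("last known state:", last_known_state(readings))
print("trend per hour:", round(simple_trend(readings) * 3600, 2))
```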

Q. Is there an improvement with the latency in 5G from cloud to data center?

A. By 2023, we should see the introduction of 5G ultra-reliable low-latency communication (URLLC) capabilities, which will increase the amount of time-sensitive data ingested into and delivered from wireless access networks. This will increase demand for fronthaul and backhaul bandwidth to move time-sensitive data from remote radio units to baseband stations and aggregation points like metro-area central offices.

As an example, to reduce latency, some hyperscalers have multiple connections out to regional co-location sites, central offices and in some cases sites near cell towers. To save on backhaul transport costs and improve 5G latency, some cloud service providers (CSP) are motivated to locate their networks as close to users as possible.

Independent of CSPs, we expect that backhaul bandwidth will increase to support the growth in wireless access bandwidth of 5G over 4G LTE. But it isn’t the only reason backhaul bandwidth is growing. COVID-19 revealed that many cable and fiber access networks were built to support much more download than upload traffic. The explosion in work and study from home, as well as video conferencing has changed the ratio of upload to download. So many wireline operators (which are often also wireless operators) are upgrading their backhaul capacity in anticipation that not everyone will go back to the office any time soon and some may hardly ever return to the office.

Q. Are 5G speeds assured end-to-end (i.e., from the mobile device to the tower and within the MSP’s infrastructure)? We understand most MSPs have improved low-latency speeds between device and tower.

A. We expect specialized services like 5G ultra-reliable low-latency communication (URLLC) will help deliver low-latency, narrow-jitter communications. As far as “assured” goes, this depends on the service provider SLA. More broadly, 5G mobile broadband and massive machine-type communications are typically best-effort networks, so generally there is no overall guaranteed or assured latency or jitter profile.

5G supports the largest range of radio frequencies. The high frequency range uses millimeter (mm) wave signals to deliver a theoretical maximum of 10 Gbps, which by default means reduced latency along with higher throughput. For more information on deterministic over-the-air network connections using 5G URLLC and TSN (Time Sensitive Networking), see this ITU presentation, “Integration of 5G and TSN.”

To provide a bit more detail, mobile devices communicate via wireless with Remote Radio Head (RRH) units co-located at the antenna tower site, while baseband unit (BBU) processing is typically hosted in local central offices. The local connection between RRHs and BBUs is called the fronthaul network (from antennas to central office). Fronthaul networks are usually fiber optic supporting the eCPRI 7.2 protocol, which provides time-sensitive network delivery. Therefore, this portion of the wireless data path is deterministic even if the over-the-air or backhaul portions of the network are not.

Q. Do we use a lot of matrix calculations in streaming data, and do we have a circuit model for matrix calculations for convenience?

A. We see this applying case by case, based on the type of data. What we often see is that many edge hardware systems include extensive GPU support to facilitate matrix calculations for real-time analytics.

Q. How do you see the deployment and benefits of Hyperconverged Infrastructure (HCI) on the edge?

A. Great question. The software flexibility of HCI can provide many advantages on the edge over dedicated hardware solutions. Ease of deployment, scalability and service provider support make HCI an attractive option. See this very informative article from TechTarget, “Why hyper-converged edge computing is coming into vogue,” for more details.

Q. Can you comment on edge-AI accelerator usage and future potentials? What are the places these will be used?

A. Edge processing capabilities include many resources to improve AI capabilities. Things like computational storage and increased use of GPUs will only serve to improve analytics performance. Here is a great article on this topic.

Q. How important is high availability (HA) for edge computing?

A. For most enterprises, edge computing reliability is mission critical. Therefore, almost every edge processing solution we have seen includes complete and comprehensive HA capabilities.

Q. How do you see Computational Storage fitting into these Edge use cases?  Any recommendations on initial deployment targets?

A. The definition and maturity of computational storage are rapidly evolving, and it is targeted to offer huge benefits for managing and scaling 5G data usage on distributed edge devices. First and foremost, 5G data can be used to train deep neural networks at higher rates due to the parallel operation of “in-storage processing”: petabytes of data may be analyzed in storage devices or within storage enclosures rather than being moved over the network for analysis. Secondly, computational storage may also accelerate the process of conditioning data or filtering out unwanted data.

Q. Do you think that the QUIC protocol will be a standard for the 5G communication?

A. So far, TCP is still the dominant transport layer protocol within the industry. QUIC was initially proposed by Google and is widely adopted in the Chrome/Android ecosystem. QUIC is getting increased interest and adoption due to its performance benefits and ease of implementation (it can be implemented in user space and does not need OS kernel changes).

For more information, here is an informative SNIA presentation on the QUIC protocol.

Please note this is an active area of innovation. There are other methods, including Apple iOS devices using MPTCP, and for inter/intra data center communications RoCE (RDMA over Converged Ethernet) is also gaining traction, as it allows direct memory access without consuming host CPU cycles. We expect TCP, QUIC and RDMA will all co-exist, and other new L3/L4 protocols will continue to emerge for next-generation workloads. The choice will depend on workloads, service requirements and system availability.

 


Why Cloud Standards Matter

Alex McDonald

Nov 18, 2020

Effective cloud data management and interoperability is critical for organizations looking to gain control and security over their cloud usage in hybrid and multicloud environments. The Cloud Data Management Interface (CDMI™), also known as the ISO/IEC 17826 International Standard, is intended for application developers who are implementing or using cloud storage systems, and who are developing applications to manage and consume cloud storage. It specifies how to access cloud storage namespaces and how to interoperably manage the data stored in these namespaces. Standardizing the metadata that expresses the requirements for the data leads to multiple clouds from different vendors treating your data the same way.

First published in 2010, the CDMI standard (ISO/IEC 17826:2016) is now at version 2.0 and will be the topic of our webcast on December 9, 2020, “Cloud Data Management & Interoperability: Why A CDMI Standard Matters,” where our experts, Mark Carlson, Co-chair of the SNIA Technical Council, and Eric Hibbard, SNIA Storage Security Technical Work Group Chair, will provide an overview of the CDMI standard and cover what is new in CDMI 2.0:
  • Support for encrypted objects
  • Delegated access control
  • General clarifications
  • Errata contributed by vendors implementing the CDMI standard
This webcast will be live and Mark and Eric will be available to answer your questions on the spot. We hope to see you there. Register today.


Keeping Up with 5G, IoT and Edge Computing

Michael Hoard

Oct 1, 2020

The broad adoption of 5G, Internet of things (IoT) and edge computing will reshape the nature and role of enterprise and cloud storage over the next several years. What building blocks, capabilities and integration methods are needed to make this happen? That will be the topic of discussion at our live SNIA Cloud Storage Technologies webcast on October 21, 2020 “Storage Implications at the Velocity of 5G Streaming.” Join my SNIA expert colleagues, Steve Adams and Chip Maurer, for a discussion on common questions surrounding this topic, including: 
  • With 5G, IoT and edge computing – how much data are we talking about?
  • What will be the first applications leading to collaborative data-intelligence streaming?
  • How can low latency microservices and AI quickly extract insights from large amounts of data?
  • What are the emerging requirements for scalable stream storage – from peta to zeta?
  • How do yesterday’s object-based batch analytic processing (Hadoop) and today’s streaming messaging capabilities (Apache Kafka and RabbitMQ) work together?
  • What are the best approaches for getting data from the Edge to the Cloud?
I hope you will register today and join us on October 21st. It’s live so please bring your questions!


The Impact and Implications of Internet of Payments

Jim Fister

Sep 2, 2020

Electronic payments, once the purview of a few companies, have expanded to include a variety of financial and technology companies. Internet of Payment (IoP) enables payment processing over many kinds of IoT devices and has also led to the emergence of the micro-transaction. The growth of independent payment services offering e-commerce solutions, such as Square, and the entry of new ways to pay, such as Apple Pay, mean that a variety of devices and technologies have also come into wide use. This is the topic that the SNIA Cloud Storage Technologies Initiative will examine at our live webcast on October 14, 2020, “Technology Implications of Internet of Payments.”

Along with the rise and dispersal of the payment ecosystem, more of the assets we exchange for payment are becoming digitized as well. When digital ownership is equivalent to physical ownership, security and scrutiny of those digital platforms and methods take a leap forward in significance. Assets and funds are now widely distributed across multiple organizations, and physical asset ownership is even being shared between many stakeholders, resulting in more ownership opportunities for less investment, but in a distributed way. In this webcast we will look at the impact of all of these new principles across multiple use cases and how they affect not only the consumers driving this behavior but also the underlying infrastructure that supports and enables it. We’ll examine:
  • The cloud network, applications and storage implications of IoP
  • Use of emerging blockchain capabilities for payment histories and smart contracts
  • Identity and security challenges at the device in addition to point of payment
  • Considerations on architecting IoP solutions for future scale
Register today and please bring your questions for our expert presenters.


Where Does Cyber Insurance Fit in Your Security Strategy?

Paul Talbut

Jul 17, 2020

Protection against cyber threats is recognized as a necessary component of an effective risk management approach, typically based on a well-known cybersecurity framework. A growing area to further mitigate risks and provide organizations with the high level of protection they need is cyber insurance. However, it’s not as simple as buying a pre-packaged policy. In fact, it’s critical to identify what risks and conditions are excluded from a cyber insurance policy before you buy. Determining what kind of cyber insurance your business needs or if the policy you have will really cover you in the event of an incident is challenging. On August 27, 2020 the SNIA Cloud Storage Technologies Initiative (CSTI) will host a live webcast, “Does Your Storage Need a Cyber Insurance Tune-Up?” where we’ll examine how cyber insurance fits in a risk management program. We’ll identify key terms and conditions that should be understood and carefully negotiated as cyber insurance policies may not cover all types of losses. Join this webcast to learn:
  • General threat tactics, risk management approaches, cybersecurity frameworks
  • How cyber insurance fits within an enterprise data security strategy
  • Nuances of cyber insurance – exclusions, exemptions, triggers, deductibles and payouts
  • Reputational damage considerations
  • Challenges associated with data stored in the cloud
There’s a lot to cover when it comes to this topic. In fact, we may need to offer a “Part Two” to this webcast, but hope you will register today to join us on August 27th.


A Q&A on the Impact of AI

Alex McDonald

Jun 15, 2020

It was April Fools’ Day, but the Artificial Intelligence (AI) webcast the SNIA Cloud Storage Technologies Initiative (CSTI) hosted on April 1st was no joke! We were fortunate to have AI experts, Glyn Bowden and James Myers, join us for an interesting discussion on the impact AI is having on data strategies. If you missed the live event, you can watch it here on-demand. The audience asked several great questions. Here are our experts’ answers:

Q. How does the performance requirement of the data change from its capture at the edge through to its use?

A. That depends a lot on what purpose the data is being captured for. For example, consider a video analytics solution capturing real-time activities. The data transfer will need to be low latency to get the frames to the inference engine as quickly as possible. However, there is less of a need to protect that data: if we lose a frame or two it’s not a major issue, and resolution and image fidelity are already likely to have been sacrificed through compression. Now think of financial trading transactions. We may want to do some real-time work against them to detect fraud, or to feed back into a market prediction engine; or we may just want to push them into an archive. In this case, as long as we can push the data through the acquisition function quickly enough that we don’t cause issues for new incoming data – with side effects like filling up caches – we don’t need to be too concerned with performance. However, we MUST protect every transaction. This means that each piece of data, and its use, dictates the performance, protection and other requirements that apply as it passes through the pipeline.
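As a rough sketch of that idea, and purely as an invented illustration (the sources, latency numbers and field names are assumptions, not figures from the webcast), the per-source requirements could be expressed as a small policy table that travels with the pipeline configuration:

```python
from dataclasses import dataclass

@dataclass
class PipelineRequirements:
    max_latency_ms: int              # how quickly data must reach processing
    must_protect_every_record: bool  # whether any loss is acceptable

# Invented policy table: the requirements follow the data and its purpose,
# not the particular storage or transport system it happens to pass through.
POLICIES = {
    "video_frames": PipelineRequirements(max_latency_ms=50,
                                         must_protect_every_record=False),
    "trade_records": PipelineRequirements(max_latency_ms=5000,
                                          must_protect_every_record=True),
}

def requirements_for(source: str) -> PipelineRequirements:
    return POLICIES[source]

print(requirements_for("trade_records"))
```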

Q. Do we need to think about security – who is seeing the data resource?

A. Security and governance are key to building a successful and flexible data pipeline. We can no longer assume that data will only have one use, or that we know in advance all the personas who will access it; hence we won’t know in advance how to protect the data. So, each step needs to consider how the data should be treated and protected. The security model is one where the security profile of the data is applied to the data itself, not to any individual storage appliance that it might pass through. This can be done with the use of metadata and signing to ensure you know exactly how a particular data set, or even an individual object, can and should be treated. The upside is that you can also build very good data dictionaries using this metadata, and make discoverability and audit of use much simpler. And with that sort of metadata, the ability to couple data to locations through standards such as the SNIA Cloud Data Management Interface (CDMI) brings real opportunity.
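A minimal sketch of that “security profile travels with the data” idea, assuming an HMAC shared secret and invented metadata fields purely for illustration (a production system would use a proper key management service and richer schemas):

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # assumption: a signing key managed by a real key service

def sign_metadata(metadata: dict) -> dict:
    """Attach a signature so any consumer can verify that the data's handling
    profile (classification, allowed uses) has not been tampered with."""
    payload = json.dumps(metadata, sort_keys=True).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {**metadata, "signature": signature}

def verify_metadata(signed: dict) -> bool:
    body = {k: v for k, v in signed.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signed.get("signature", ""), expected)

record = sign_metadata({"dataset": "cctv-frames",
                        "classification": "internal",
                        "allowed_uses": ["analytics"]})
print(verify_metadata(record))  # True unless the profile has been altered
```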

Q. Great overview on the inner workings of AI. Would a company’s Blockchain have a role in the provisioning of AI?

A. Blockchain can play a role in AI. There are vendors with patents around Blockchain’s use in distributing training features so that others can leverage trained weights and parameters for refining their own models without needing access to the original data. Now, is blockchain a requirement for this to happen? No, not at all. However, it can provide a method to assess the provenance of those parameters and ensure you’re not being duped into using polluted weights.

Q. It looks like everybody is talking about AI, but thinking about pattern recognition / machine learning. The biggest differentiator for human intelligence is making a decision and acting on its own, without external influence. Little children are a good example. Can AI make decisions on its own right now?

A. Yes and no. Machine Learning (ML) today results in a prediction and a probability of its accuracy. So that’s only one stage of the cognitive pipeline that leads from observation, to assessment, to decision and ultimately action. Basically, ML on its own provides the assessment and decision capability. We then write additional components to translate that decision into actions. That doesn’t need to be a “Switch / Case” or “If this then that” situation: we can plug the outcomes directly into the decision engine so that the ML algorithm is selecting the desired outcome directly, and our extra code just tells it how to go about that. But today’s AI has a very narrow focus. It’s not general intelligence that can assess entirely new features without training and then infer from previous experience how it should interpret them. It is not yet capable of deriving context from past experiences and applying it to new and different experiences.
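A toy sketch of that separation between the ML assessment and the hand-written decision/action layer; the labels, threshold and actions are invented for illustration, not taken from the webcast:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Assessment:
    label: str          # e.g. "fraud", as produced by a hypothetical classifier
    probability: float  # the model's confidence in that label

# The action layer is ordinary code wrapped around the model: the ML output
# is only one stage of the observe -> assess -> decide -> act pipeline.
ACTIONS: Dict[str, Callable[[], None]] = {
    "fraud": lambda: print("hold transaction for review"),
    "ok": lambda: print("approve transaction"),
}

def act_on(assessment: Assessment, threshold: float = 0.8) -> None:
    # Only act automatically when the model is confident; otherwise defer.
    if assessment.probability >= threshold:
        ACTIONS[assessment.label]()
    else:
        print("low confidence: route to a human")

act_on(Assessment(label="fraud", probability=0.93))
```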

Q. Shouldn’t there be a path for the live data (or some cleaned-up version or output of the inference) to be fed back into the training data to evolve and improve the training model?

A. Yes, there should be. Ideally you will capture it in a couple of places. One would be your live pipeline: if you are using something like Kafka to do the pipelining, you can split the data to two different locations, persisting one copy in a data lake or archive and processing the other through your live inference pipeline. You might also want your inference results pushed out to the archive, as this could be a good source of “training data”; it’s essentially labelled and ready to use. Of course, you would need to review this manually, since if there is inaccuracy in the model, a few false positives can reinforce that inaccuracy.
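As a hedged sketch of that fan-out, assuming the kafka-python client and placeholder broker, topic and helper names (a real pipeline would add error handling, batching and real sinks), two independent consumer groups can read the same stream so one copy is archived while the other feeds inference:

```python
from kafka import KafkaConsumer  # assumption: the kafka-python client is installed

BROKER = "localhost:9092"   # placeholder broker address
TOPIC = "sensor-events"     # placeholder topic name

def write_to_data_lake(payload: bytes) -> None:
    print("archived", len(payload), "bytes")  # stand-in for an object-store write

def run_inference(payload: bytes) -> bytes:
    return b"label:ok"                        # stand-in for a model call

def archive_consumer() -> None:
    # One consumer group persists the raw stream to the data lake / archive.
    for msg in KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                             group_id="archive-writer"):
        write_to_data_lake(msg.value)

def inference_consumer() -> None:
    # A second, independent consumer group feeds the live inference pipeline;
    # its results can also be archived as candidate (manually reviewed) training data.
    for msg in KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                             group_id="live-inference"):
        write_to_data_lake(run_inference(msg.value))
```

Because each consumer group tracks its own offsets, the archive and inference paths can fail, lag or scale independently without losing each other's copy of the data.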

Q. Can the next topic focus be on pipes and new options?

A. Great idea. In fact, given the popularity of this presentation, we are looking at a couple more webcasts on AI. There’s a lot to cover! Follow us on Twitter @sniacloud_com for the dates of future webcasts.

