
tribal thinking down to that one app on that one computer in that one person’s office.

       Data capture: The second place to look is the data entering your organization. It can arrive in many forms, but you can use data extraction, metadata extraction, and categorization to supplement data. For example, you can run paper documents, whether handwritten or printed, through an optical character recognition system to digitize them in preparation for processing. Then they can join the rest of the digital data, such as emails, PDF files, Word documents, images, voice mail messages, videos, and other formats to be classified and populate the data store that will feed your AI insights.

       Data as a service (DaaS): If there are still holes in your data requirements, you can turn to third-party data, either commercial datasets available for purchase, such as AccuWeather, or public datasets such as data.gov and Kaggle.com. Broadening your datasets can increase the insights lurking in your own data.

      Cleaning the data

      What’s worse than no data? Dirty data. Dirty data is poorly structured, poorly formatted, inaccurate, or incomplete.


      FIGURE 3-6: Microsoft Excel supports 17 date formats.

Type            Example
Incomplete      Empty or null values (the most prevalent type of bad data)
Incorrect       A date with a 47 in the month or day position
Inaccurate      A date with a valid month value (1-12) but the wrong month
Inconsistent    Different formats or terms for the same meaning
Duplicate       One or more occurrences of the same record
Rule violation  Starting date falls after ending date
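As a rough illustration of the types in the table, the following Python sketch flags four of them in a handful of made-up records. The field names (`id`, `start`, `end`), the sample values, and the `find_issues` helper are all assumptions invented for this example, not part of any particular data-cleaning tool.

```python
from datetime import date

# Hypothetical records; the field names and values are illustrative only.
records = [
    {"id": 1, "start": "2021-03-01", "end": "2021-03-15"},
    {"id": 2, "start": None, "end": "2021-04-01"},          # incomplete
    {"id": 3, "start": "2021-47-01", "end": "2021-05-01"},  # incorrect
    {"id": 4, "start": "2021-06-10", "end": "2021-06-01"},  # rule violation
    {"id": 1, "start": "2021-03-01", "end": "2021-03-15"},  # duplicate
]

def parse_iso(value):
    """Return a date, or None if the text is not a valid ISO date."""
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None

def find_issues(rows):
    """Return a list of (row index, issue type) pairs."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        # An exact repeat of an earlier record counts as a duplicate.
        key = tuple(sorted(row.items()))
        if key in seen:
            issues.append((i, "duplicate"))
            continue
        seen.add(key)
        # Empty or null values make the record incomplete.
        if any(v is None or v == "" for v in row.values()):
            issues.append((i, "incomplete"))
            continue
        start, end = parse_iso(row["start"]), parse_iso(row["end"])
        if start is None or end is None:
            issues.append((i, "incorrect"))       # e.g. a 47 in the month
        elif start > end:
            issues.append((i, "rule violation"))  # start falls after end
    return issues

print(find_issues(records))
# [(1, 'incomplete'), (2, 'incorrect'), (3, 'rule violation'), (4, 'duplicate')]
```

Inaccurate and inconsistent values are harder to catch this way: a valid but wrong month passes every format check, so finding it usually means comparing against a trusted source.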
Why is dirty data worse? Because it costs you more.

      For most companies, bad data costs from 15 to 25 percent of revenue as workers research a valid source, correct errors, and deal with the complications that result from relying on bad data.

      The solution is to focus on data, not models. Not surprisingly, in a recent CrowdFlower survey, data scientists said their two most time-consuming tasks were cleaning and organizing data (60 percent) and collecting datasets (19 percent). In the same survey, they named the same two tasks as the least enjoyable parts of their job: cleaning and organizing data (57 percent) and collecting datasets (21 percent).

      During the one-year grace period, many providers just continued to use the ICD-9 codes rather than transition to the more accurate ICD-10 codes, and their automated claims submissions reflected the less specific data. Claim denials increased, which meant more work for the providers who had to retroactively collect supporting documentation to appeal the denial or face loss of revenue. If they had submitted the claims with the more accurate, although a bit more complex, ICD-10 codes, the extra work wouldn’t have been necessary.

      You can take this anecdote a step further. Imagine that a few years later, the facility that didn’t upgrade to ICD-10 codes decides to transition to an AI-enabled medical records system to not only streamline document intake, but also serve as a database for medical history and diagnosis. It loses all the potential benefit of diagnostic insights from the history for the “dark year.”

      An Alegion study found that two of the top three problems with training data relate to dirty data.

       “I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.”

      — Abraham Maslow, “The Psychology of Science: A Reconnaissance” 1966

      To hear some tell it, AI is the panacea to solve all the problems of the world; to hear others tell it, AI will lead to the singularity and the destruction of human civilization. As usual, the truth lies somewhere in between.

      As with any tool, AI does some things well and other things not so well. If you’re upholstering a chair, a tack hammer is best, but if you’re putting up a circus tent, you need a bigger hammer.

      A → B

      As computer scientist Andrew Ng points out, “Despite AI’s breadth of impact, the types of AI being deployed are still extremely limited. Almost all of AI’s recent progress is through one type, in which some input data (A) is used to quickly generate some simple response (B).”

      Often, response B consists of nothing more than “Is this X or not X?” For example:

       Is this transaction nominal or anomalous?

       Is this document a patient case history?

       Does this image contain a human face?

      This capability is known as supervised learning, in which AI learns the relationship between A and B by processing massive amounts of data, guided by humans to establish the rules that govern the decisions. (See Chapter 1 for more information.)
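      To make the A-to-B idea concrete, here is a minimal Python sketch of a supervised learner for the first question above (“Is this transaction nominal or anomalous?”). It is a toy under stated assumptions: the amounts and labels are invented, and a single learned cutoff stands in for the far richer models and far larger datasets a real system would use.

```python
# Labeled training pairs (A, B): transaction amount -> label.
# The amounts and labels are invented for illustration.
training = [
    (12.0, "nominal"), (30.5, "nominal"), (45.0, "nominal"), (60.0, "nominal"),
    (900.0, "anomalous"), (1500.0, "anomalous"), (2200.0, "anomalous"),
]

def fit_threshold(examples):
    """Pick the cutoff that misclassifies the fewest training examples."""
    amounts = sorted(a for a, _ in examples)
    # Candidate cutoffs sit midway between neighbouring amounts.
    candidates = [(lo + hi) / 2 for lo, hi in zip(amounts, amounts[1:])]

    def errors(cut):
        # Count examples whose predicted side of the cutoff disagrees
        # with the human-supplied label.
        return sum((a > cut) != (label == "anomalous") for a, label in examples)

    return min(candidates, key=errors)

def predict(amount, cutoff):
    """Map a new input A to a response B using the learned cutoff."""
    return "anomalous" if amount > cutoff else "nominal"

cutoff = fit_threshold(training)
print(cutoff)                                          # 480.0
print(predict(75.0, cutoff), predict(1800.0, cutoff))  # nominal anomalous
```

      The human guidance lives entirely in the labels: change which examples are marked anomalous and the learned cutoff moves with them.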

      In other cases, B is a transformation of A, such as when AI is used for transcribing or translating a passage. This capability is known as natural-language processing.

      Good use cases

      A good use case for AI relies on the core enablers of AI as a tool — big data, digitalization, and well-defined classification and rules.

      A 2019 IDC guide on worldwide AI spending through 2023 indicated that the top three use cases are automated customer service agents, automated threat intelligence and prevention systems, and sales process recommendation and automation; the use cases with the biggest growth are human resource automation
