Improving Data Quality: Why is it so difficult?
Slide 1: Improving Data Quality: Why is it so difficult?
Presented by Larissa T. Moss, President, Method Focus, Inc.
DAMA, Oakland, CA, May 7, 2003
Copyright 2003, Larissa T. Moss, Method Focus, Inc.
Slide 2: Larissa T. Moss
Ms. Moss is founder and president of Method Focus Inc., a company specializing in improving the quality of business information systems. She frequently speaks at Data Warehouse, Business Intelligence, CRM, and Information Quality conferences around the world on the topics of information asset management, data quality, data modeling, project management, and organizational realignment. She lectures worldwide on the BI topics of spiral development methodology, data modeling, data audit and control, project management, as well as organizational issues. Her articles are frequently published in DM Review, TDWI Journal of Data Warehousing, Cutter IT Journal, Analytic Edge, and The Navigator. She co-authored the books Data Warehouse Project Management (Addison Wesley, 2000), Impossible Data Warehouse Situations (Addison Wesley, 2002), and Business Intelligence Roadmap: The Complete Project Lifecycle for Decision Support Applications (Addison Wesley, 2003). Ms. Moss is a member of the IBM Gold Group, a Friend of Teradata, a senior consultant at the Cutter Consortium, and a contributing member of Ask The Experts on www.dmreview.com. She has been a lecturer at DCI, TDWI, MISTI, and at the Extension of the California Polytechnic University, Pomona. She can be reached at lmoss@methodfocus.com.
Method Focus Inc., www.methodfocus.com, methodfocus@earthlink.net, (626) 355-8167
Slide 3: Presentation Outline
- What do we mean by data quality? Dirty data categories
- How are we addressing it today? Ineffective technology solutions
- What do we have to change? Approaches and techniques
- How do we change? 12 steps to [DQ] recovery
Slide 4: What do we mean by data quality?
- Data is correct
- Data is accurate
- Data is consistent
- Data is complete
- Data is integrated
- Data values follow the business rules
- Data corresponds to established domains
- Data is well defined and understood
Slide 5: Symptoms of poor-quality data
- Do your programs abend with data exceptions?
- Are your users confused about the meaning of data?
- Is some of your data too stale for reporting?
- Is your data being shared? Is it sharable?
- Are reports inconsistent?
- Does it take your IT staff or the end users hours to reconcile inconsistent reports?
- Does merging data often cause the system to fail?
- Do beepers go off at night?
Slide 6: Dirty data categories (not just data entry errors)
- Dummy (default) values
- "Intelligent" dummy values
- Missing values
- Multi-purpose fields
- Cryptic values
- Free-form address lines
- Contradicting values
- Violation of business rules
- Reused primary key
- Non-unique primary key
- Missing data relationships
- Inappropriate data relationships
Slide 7: Dummy (default) values
Defaults for mandatory fields:
- SSN: 999-99-9999
- Age: 999
- Zip: 99999
- Income: 9,999,999.99
Business Impact:
- Inability to determine customer profiles
- Inability to determine customer demographics
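A minimal sketch of how the dummy defaults listed above could be flagged before analysis. The field names and the record layout are assumptions made for illustration; only the dummy values themselves come from the slide.

```python
# Dummy defaults from the slide. Field names are hypothetical.
DUMMY_VALUES = {
    "ssn": {"999-99-9999"},
    "age": {999},
    "zip": {"99999"},
    "income": {9_999_999.99},
}


def find_dummy_fields(record):
    """Return the sorted names of fields that hold a known dummy default."""
    return sorted(
        field
        for field, value in record.items()
        if value in DUMMY_VALUES.get(field, set())
    )


# Hypothetical customer records: the first is clean, the second is not.
customers = [
    {"ssn": "123-45-6789", "age": 42, "zip": "94105", "income": 55_000.00},
    {"ssn": "999-99-9999", "age": 999, "zip": "94105", "income": 9_999_999.99},
]

flagged = {index: find_dummy_fields(c) for index, c in enumerate(customers)}
```

Records with a non-empty list in `flagged` would be excluded from, or treated as unknown in, customer profiling and demographic queries.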
Slide 8: "Intelligent" dummy values
Defaults with meaning:
- SSN 888-88-8888: Non-resident alien
- Income 999,999.99: Employee
- Age 000: Corporate customer
- Source Code 'FF': Account closed prior to 1991
Business Impact:
- Inability to write straightforward queries without knowing how to filter data
Slide 9: Missing values
Operational systems do not always require informational or demographic data:
- Gender
- Ethnicity
- Age
- Income
- Referring Source
Business Impact:
- Inability to analyze marketing channels
Slide 10: Multi-purpose fields
ONE field explicitly has MANY meanings:
- Which business unit enters the data
- At what time in history it was entered
- A value in one or more other fields
Example:
- Appraisal Amount redefined as Advertised Amount redefined as Sold Date
- Loan Type Code redefined as ...
- 25 redefines = 25 attributes!
- Not mutually exclusive! Only the value of one is known for each record!
Business Impact:
- Inability to judge product profitability
Slide 11: Cryptic values (1)
- Often found in "Kitchen Sink" fields
- Usually one byte (if not one bit)
- Highly cryptic (A, B, C, 1, 2, 3, ...)
- Non-intelligent, non-intuitive codes
- Often not mutually exclusive
Business Impact:
- Inability to empower end users to write their own queries
Slide 12: Cryptic values (2)
Need a CODE TRANSLATION booklet.
ONE field implicitly has MANY meanings:
- Master_Cd {A, B, C, D, E, F, G, H, I}
- {A, B, C}: Type of customer
- {D, E, F}: Type of supplier
- {G, H, I}: Regional constraints
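The "code translation booklet" can be expressed as a lookup that splits the overloaded field into one attribute per meaning. This is only a sketch: the attribute names are invented for illustration, and the slide does not say what each individual letter means, so the letters are passed through untranslated.

```python
# Which Master_Cd values belong to which meaning (from the slide).
# The attribute names on the left are hypothetical.
MASTER_CD_GROUPS = {
    "customer_type": {"A", "B", "C"},
    "supplier_type": {"D", "E", "F"},
    "regional_constraint": {"G", "H", "I"},
}


def split_master_cd(code):
    """Split one Master_Cd value into three attributes.

    Only the attribute that the code belongs to is populated; the other
    two stay None because the source record never carried them.
    """
    result = {name: None for name in MASTER_CD_GROUPS}
    for name, codes in MASTER_CD_GROUPS.items():
        if code in codes:
            result[name] = code
            return result
    raise ValueError(f"unknown Master_Cd value: {code!r}")


example = split_master_cd("E")
```

The `None` entries make the data loss visible: a record coded `E` says nothing about the customer type or regional constraint.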
Slide 13: Free-form address lines
Unstructured text:
- no discernible pattern
- cannot be parsed
Example:
address-line-1: ROSENTHAL, LEVITZ, A
address-line-2: TTORNEYS
address-line-3: 10 MARKET, SAN FRANC
address-line-4: ISCO, CA 95111
Business Impact:
- Inability to perform market analysis
Slide 14: Contradicting values
Values in one field are inconsistent with values in another related field:
- 1488 Flatbush Avenue, New York, NY 75261 (Texas Zip)
- Type of real property: Single Family Residence; Number of rental units: four (Income property)
Business Impact:
- Inability to make reliable business decisions
Slide 15: Violation of business rules
- Business Rule: Adjustable Rate Mortgages must have a Maximum Interest Rate (Ceiling) and a Minimum Interest Rate (Floor)
- Business Rule: A Ceiling is always higher than a Floor
Example (switched?):
ceiling-interest-rate: 8.25
floor-interest-rate: 14.75
Business Impact:
- Inability to calculate product profitability
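The two business rules above translate directly into a validation function. The sketch below assumes a loan record is a dictionary with the hypothetical keys `ceiling_interest_rate` and `floor_interest_rate`; the sample values are the ones shown on the slide.

```python
def check_arm_rates(loan):
    """Return a list of business-rule violations for an adjustable rate mortgage."""
    errors = []
    ceiling = loan.get("ceiling_interest_rate")
    floor = loan.get("floor_interest_rate")

    # Rule 1: an ARM must have both a ceiling and a floor.
    if ceiling is None:
        errors.append("missing ceiling")
    if floor is None:
        errors.append("missing floor")

    # Rule 2: the ceiling is always higher than the floor.
    if ceiling is not None and floor is not None and ceiling <= floor:
        errors.append("ceiling not higher than floor")
    return errors


slide_example = {"ceiling_interest_rate": 8.25, "floor_interest_rate": 14.75}
violations = check_arm_rates(slide_example)
```

The check can only report the violation. Deciding whether the two values were switched at entry, as the slide suspects, still needs someone who knows the loan.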
Slide 16: Reused primary keys
Little history, if any, is stored in operational files:
- primary keys are customarily re-used
- may have a different rollup structure
Example:
- January '94: branch 501 = San Francisco Main, region 1, area SW
- August '97: branch 501 = San Luis Obispo, region 2, area SW
Business Impact:
- Inability to evaluate organizational performance
Slide 17: Non-unique primary keys
Duplicate identification numbers.
Multiple customer numbers:
  Customer Name       Phone Number    Cust. Number
  Philip K. Sherman   818.357.5166    960601
  Philip K. Sherman   818.357.7711    960105
  Philip K. Sherman   818.357.8911    960003
Multiple employee numbers:
  Date            Employee Name   Department   Empl. Number
  July 1995       Bob Smith       213 (HR)     21304762
  January 1996    Bob Smith       432 (SRV)    43218221
  August 1999     Bob Smith       206 (MKT)    20684762
Business Impact:
- Inability to determine customer relationships
- Inability to analyze employee benefits trends
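One simple way to surface candidates like the customer rows above is to group records by a normalized name and report every name that carries more than one customer number. This is a sketch under a strong assumption: matching on name alone is only a first pass, since different people can share a name, and real matching would use more attributes.

```python
from collections import defaultdict

# Rows copied from the slide: (customer name, phone number, customer number).
CUSTOMER_ROWS = [
    ("Philip K. Sherman", "818.357.5166", "960601"),
    ("Philip K. Sherman", "818.357.7711", "960105"),
    ("Philip K. Sherman", "818.357.8911", "960003"),
]


def customers_with_multiple_numbers(rows):
    """Map each normalized name to its customer numbers when it has several."""
    numbers_by_name = defaultdict(set)
    for name, _phone, cust_number in rows:
        # Collapse whitespace and case so trivial variations still match.
        key = " ".join(name.split()).lower()
        numbers_by_name[key].add(cust_number)
    return {
        name: sorted(numbers)
        for name, numbers in numbers_by_name.items()
        if len(numbers) > 1
    }


suspects = customers_with_multiple_numbers(CUSTOMER_ROWS)
```

Each entry in `suspects` is a candidate for review, not a confirmed duplicate.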
Slide 18: Missing data relationships
- Data that should be related to other data in a dependent (parent-child) relationship
- Example: Branch number 0765 does not exist in the BRANCH table
[Diagram: Branch, Employee, Benefit]
Business Impact:
- Inability to produce accurate rollups
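A missing parent can be found with an orphan check: every foreign key in the child data is looked up in the set of parent keys. In the sketch below only branch number 0765 comes from the slide; the other branch numbers, the employee rows, and the column names are hypothetical.

```python
def find_orphans(child_rows, parent_keys, foreign_key):
    """Return the child rows whose foreign key has no matching parent key."""
    return [row for row in child_rows if row[foreign_key] not in parent_keys]


# Hypothetical contents of the BRANCH table (primary keys only).
branch_numbers = {"0501", "0612"}

# Hypothetical EMPLOYEE rows; the second one points at the missing branch.
employees = [
    {"employee": "E1", "branch_number": "0501"},
    {"employee": "E2", "branch_number": "0765"},
]

orphans = find_orphans(employees, branch_numbers, "branch_number")
```

Any row in `orphans` would drop out of, or distort, a rollup by branch.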
Slide 19: Inappropriate data relationships
- Data that is inadvertently related, but should not be
- Two entity types with the same key values
Example:
- Purchaser: Jackie Schmidt, 837221
- Seller: Robert Black, 837221
Business Impact:
- Inability to determine customer or vendor relationships
Slide 20: Impact of erroneous data
- Extra time it takes to correct data problems
- Extra resources needed to correct data problems
- Time and effort required to re-run jobs that abend
- Time wasted arguing over inconsistent reports
- Lost business opportunities due to unavailable data
- Inability to demonstrate business potential in a buyout
- Fines paid for noncompliance with government regulations
- Shipping products to the wrong customers
- Bad public relations with customers, which leads to alienated and lost customers
Slide 21: Cost of erroneous data
Direct Costs of Non-Quality Information: Marketing Campaign (© Larry English, Improving DW and BI Quality)

  Item                                  Per Instance   Number of Instances   Number Per Year   Total Cost Per Year
  Time ($60/hour loaded rate):
  Creating redundant occurrence         2.4 min        167,141               1                 $   401,138
  Researching correct address           10 min         5,000/mo              12                $   600,000
  Correcting address errors             0.3 min        6,000/mo              12                $    21,600
  Handling complaints from customers    5.5 min        974/yr                1                 $     5,357
  Mail preparation                      0.1 min        393,273               4                 $   157,309
  Materials, Facilities, Equipment:
  Marketing brochure                    $1.96          393,273               4                 $ 3,083,260
  Postage                               $0.52          393,273               4                 $   818,008
  Warehouse storage                     $0.01          393,273               4                 $    15,731
  Shipping equipment and maintenance    $5,000/yr      36%                   1                 $     1,800
  Computing resources:
  CPU transactions                      $0.02/trans    393,273               4                 $    31,462
  Data storage                          $0.001/mo      393,273               12                $     4,719
  Data backup                           $0.005/mo      393,273               12                $    23,596
  Total Annual Costs                                                                           $ 5,163,980
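The table's arithmetic can be reproduced as a check: each line is cost per instance times number of instances times occurrences per year, with time converted at the $60/hour loaded rate ($1 per minute). The sketch below uses the figures from the table; the rounding of each line to whole dollars is an assumption that happens to match the published totals.

```python
LOADED_RATE_PER_MINUTE = 60 / 60  # $60/hour loaded rate, i.e. $1 per minute

# (description, cost per instance in dollars, instances, times per year)
LINE_ITEMS = [
    ("Creating redundant occurrence", 2.4 * LOADED_RATE_PER_MINUTE, 167_141, 1),
    ("Researching correct address", 10 * LOADED_RATE_PER_MINUTE, 5_000, 12),
    ("Correcting address errors", 0.3 * LOADED_RATE_PER_MINUTE, 6_000, 12),
    ("Handling complaints from customers", 5.5 * LOADED_RATE_PER_MINUTE, 974, 1),
    ("Mail preparation", 0.1 * LOADED_RATE_PER_MINUTE, 393_273, 4),
    ("Marketing brochure", 1.96, 393_273, 4),
    ("Postage", 0.52, 393_273, 4),
    ("Warehouse storage", 0.01, 393_273, 4),
    # $5,000/yr of equipment cost, 36% of it attributed to this campaign.
    ("Shipping equipment and maintenance", 5_000, 0.36, 1),
    ("CPU transactions", 0.02, 393_273, 4),
    ("Data storage", 0.001, 393_273, 12),
    ("Data backup", 0.005, 393_273, 12),
]


def annual_cost(cost_per_instance, instances, times_per_year):
    """Annual cost of one line item, rounded to whole dollars."""
    return round(cost_per_instance * instances * times_per_year)


annual_costs = {name: annual_cost(*rest) for name, *rest in LINE_ITEMS}
total_annual_cost = sum(annual_costs.values())
```

Swapping in an organization's own volumes and rates gives a first estimate of what one recurring data defect costs per year.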
Slide 22: Impact of redundant data
- Hardware (CPU, disks) and software (program maintenance) costs incurred as a result of uncontrolled redundant data
- Extra time it takes to reconcile inconsistencies
- Extra resources needed to reconcile inconsistencies
- Unwise business decisions made due to redundant and inconsistent data
- Lost opportunities due to unreliable data
- Overcharging or overpayment for products
- Duplicate shipping of products
- Money wasted on sending redundant marketing material
Slide 23: Cost of redundant data
Information Development Cost Analysis (© Larry English, Improving DW and BI Quality)

  Category                                       Portfolio      Relative Weight   Average Unit      Total Dev/Maint   % of
                                                 Total Number   Factor*           Dev/Maint Costs   Expenses**        Budget
  Infrastructure Basis:
  Enterprise architected DBs                     200            0.75              $ 15,000          $  3,000,000
  Enterprise reusable create/update programs+    300            1.50              $ 30,000          $  9,000,000
  Total infrastructure expenses                                                                     $ 12,000,000      24%
  Value Basis:
  Total retrieve equivalent pgms+                300            1.00              $ 20,000          $  6,000,000
  Total value-adding expenses                                                                       $  6,000,000      12%
  Cost-adding Basis:
  Redundant create/update pgms                   500            1.50              $ 30,000          $ 15,000,000
  Interface/extract programs                     400            1.00              $ 20,000          $  8,000,000
  Redundant database files                       600            0.75              $ 15,000          $  9,000,000
  Total cost-adding expenses                     1,500                                              $ 32,000,000      64%
  Lifetime Total**                               3,800                                              $ 50,000,000      100%

* Determine relative effort to develop an average unit of each category, using the effort to develop a retrieve program as "1.00".
+ For programs that retrieve some data and create/update other data, determine the percent of retrieve-only attributes and the percent of create/update attributes (e.g., to retrieve customer data to create an order).
** Based on 3,800 application programs and database files in the portfolio and $50 million in development.
Slide 24: Dirty data – How did it happen?
[Organization chart: Chief Executive Officer, Chief Operating Officer, Chief Information Officer; on the Business side, Business Managers are paired with Technology Managers on the Technology side. Business Units (Marketing, Financial (AP & AR), Product Pricing, Customer Support, Distribution, Inventory, Sales) each act as Client to their own IT unit within the Information Technology Units.]
Result: swim lanes, data redundancy, process redundancy, dirty data.
Slide 25: Major cause for data deficiencies
Wrong priority on project constraints!
Project constraints, from highest to lowest priority:
1. Time
2. Scope
3. Budget
4. People
5. Quality
Industrial Age:
- Cheaper, faster, better
- Automate as quickly as possible
- Cost-based value proposition
Slide 26: Time is getting shorter – scope is getting bigger
Everyone on the business side and in IT wants quality, but rarely is the extra time given or taken to achieve it. Quality and time are polarized constraints. The higher the quality, the more effort (time) it takes to deliver. Companies are driven by shorter and shorter schedules.
[Diagram labels: SCOPE, TIME, YAHDDD]
Slide 27: How are we addressing it today?
Ineffective Technology Solutions:
- Data Warehousing
- Customer Relationship Management
- Enterprise Resource Planning
- Enterprise Application Integration
- Knowledge Management
Why can't technology fix this?
Slide 28: Data Warehousing
DW delivers... a collection of integrated data used to support the strategic decision making process for the enterprise.
The Promise:
- data integration
- no redundancy
- consistency
- historical data
- ad-hoc reporting
- trend analysis reporting
- faster data delivery
- faster data access
The Reality:
- stovepipe marts
- departmental views
- swim lane development approach
- too time consuming to integrate
- too costly to cleanse data
- increased data redundancy
If it sounds too good to be true, it is too good to be true.
Slide 29: Customer Relationship Management
CRM delivers...
- the organizational lifeline, creating competitive advantage through customer service excellence.
- seamless coordination between back-office systems, front-office systems and the Web.
The Promise:
- data integration
- data quality
- customer intimacy
- customer wallet share
- product pricing customization
- knowing your competition
- geographic market potential
The Reality:
- more stovepipe systems
- departmental views
- dirty customer data
- purchased packages not integrated
- focus is too narrow
- privacy issues
If it sounds too good to be true, it is too good to be true.
Slide 30: Enterprise Resource Planning
ERP delivers... a collection of functional modules used to integrate operational data to support seamless operational business processes for the enterprise.
The Promise:
- data integration
- no redundancy
- consistency
- data quality
- easy reporting
- easy maintenance
- Y2K compliance
The Reality:
- system conversion, not cross-organizational analysis
- same dirty data
- operational focus
- poor quality (unusable) reports
- one-size-fits-all data warehouse
- too costly
If it sounds too good to be true, it is too good to be true.
Slide 31: Enterprise Application Integration
EAI delivers... integration of disparate applications into a unified set of business processes through centrally managed rules and middleware technologies.
The Promise:
- fast & automated integration
- leverage existing data
- bridge islands of automation
- easy cross-system reporting
- faster data delivery
- faster data access
The Reality:
- dirty data
- no true integration
- still data redundancy
- still islands of automation
- easier access to the current data mess
If it sounds too good to be true, it is too good to be true.
Slide 32: Knowledge Management
KM delivers... a process for capturing, editing, verifying (for accuracy), disseminating, and utilizing tacit and explicit information about the organization.
The Promise:
- utilize organizational info
- data integration
- historical data
- faster data delivery
- faster data access
- first & only customer contact
- reduction of customer calls
- less re-solving same problems
Reality of KM:
- too difficult to build
- too time consuming
- too costly
- technology challenges
- non-sharing culture
- isolated applications
- difficult to disseminate information
If it sounds too good to be true, it is too good to be true.
Slide 33: What's the lesson?
You cannot keep doing what you have always done and expect the results to be different. Not even with new technology.
"That wouldn't be logical" (Spock, Star Trek)
Slide 34: What do we have to change?
- Assess the current state of data quality at your company
- Understand and fix the root causes for data contamination
- Perform data audits regularly (monthly, quarterly)
- Stop working in isolated "swim lanes"
  - Stop recreating data
- Centrally manage your data like a business asset (Enterprise Information Management [EIM])
  - Assemble data as needed from the data inventory (enterprise data model and meta data)
  - Standardize and reconcile data transformations for BI/DW applications (coordinated ETL staging area)
- Scale down project scopes to incorporate data quality and EIM activities
- Embed data quality and EIM activities in all projects
Slide 35: Business intelligence ...
- ... is a cross-organizational discipline and an enterprise architecture for an integrated collection of operational as well as decision support applications and databases, which provide the business community easy access to their business data and allow them to make accurate business decisions.
- ... is not business as usual
Slide 36: BI goals and objectives
- Data Management: Get control over the existing data chaos
- Data Delivery: Provide intuitive access to business information
- Data Reengineering (Enterprise Information Management)
[Chart labels: 80%, 20%]
Slide 37: Proliferation of data quality problems
[Diagram: Legacy systems (L) feed Data Warehouses (DW) and Data Marts (DM) for Marketing, Finance, Product Sales, Engineering, and Customer Support users. Labels: transformation? cleansing? "LegaMarts" (Doug Hackney); BI?]
Slide 38: Industrial-age mental model
[Diagram: Business Units (Marketing, Financial (AP & AR), Product Pricing, Customer Support, Distribution, Inventory, Sales) each act as Client to their own IT unit within the Information Technology Units.]
Project constraints, from highest to lowest priority:
1. Time
2. Scope
3. Budget
4. People
5. Quality
Scrap and rework.
Slide 39: The game has changed ... but our mental model has not
- Enormous degree of complexity
- Extremely high rate of change
Cheaper, faster, better!!! But how?
"Don't scrap and rework. Reuse what you already have." (John Zachman)
Slide 40: Information-age mental model
Project constraints, from highest to lowest priority:
1. Quality
2. Budget
3. People
4. Time
5. Scope
Reassemble reusable components.
Information Age:
- Reassemble the entire enterprise
- Reuse assets from inventory
- Investment-based value proposition
Slide 41: Software release concept (1)
[Diagram: Projects deliver a First, Second, Third, Fourth, and Fifth Release, up to a Final Release, of a Reusable & Expanding Application. Labels: "Refactoring" - Kent Beck; "Extreme scoping" - Larissa Moss; Project = Application /]
Slide 42: Software release concept (2)
- Requirements can be tested and implemented in small increments
- Scope is very small and manageable
- Technology infrastructure can be tested and proven
- Data volumes (per release) are relatively small
- Project schedules are easier to estimate because the scope is very small
- Development activities can be iteratively refined, honed, and adapted
AND: The quality of the release deliverables (and ultimately the quality of the applications) will be higher!
Slide 43: Cross-organizational development approach (1)
BI/DW Development Steps and Data Quality Touch Points (© Larissa Moss and Shaku Atre, "Business Intelligence Roadmap")

  Step                                          Scope
  1.   Business Case Assessment                 Cross-organizational
  2.A  Enterprise Technical Infrastructure      Cross-organizational
  2.B  Enterprise Non-Technical Infrastructure  Cross-organizational
  3.   Project Planning                         Project-specific
  4.   Project Requirements Definition          Project-specific
  5.   Data Analysis                            Cross-organizational
  6.   Application Prototyping                  Project-specific
  7.   Meta Data Repository Analysis            Cross-organizational
  8.   Database Design                          Cross-organizational
  9.   ETL Design                               Cross-organizational
  10.  Meta Data Repository Design              Cross-organizational
  11.  ETL Development                          Cross-organizational
  12.  Application Development                  Project-specific
  13.  Data Mining                              Cross-organizational
  14.  Meta Data Repository Development         Cross-organizational
  15.  Implementation                           Project-specific
  16.  Release Evaluation                       Cross-organizational
Slide 44: Cross-organizational development approach (2)
- Commitment to data quality embedded in the methodology
- Cross-organizational program management
- Enterprise information management group
- Standards that include a common information architecture (enterprise data model)
  - Involving down-stream information consumers in the requirements definition step
  - Involving data owners in the data analysis step
  - Involving business representatives from all business units to ratify the data models and meta data
- Coordinating the development/ETL processes
  - Disallowing stovepipe development
  - Extracting and cleansing source data only once
  - Reconciling data transformations and storing the reconciliation totals as meta data
[Side labels on the slide: enforcement, governance, resources, policy, principle]
Slide 45: Enterprise information management
[Diagram: Business Units (Marketing, Financial (AP & AR), Product Pricing, Customer Support, Distribution, Inventory, Sales) each act as Client to their own IT unit within the Information Technology Units. A cross-organizational Enterprise Information Management function spans the Operational Environment (Operational Systems) and the Decision Support Environment (BI/DW Databases: ODS, OM, EDW, DM). Its activities: Discover, Coordinate, Integrate, Document, Control.]
Slide 46: EIM responsibilities
IT asset inventory management:
- Business architecture inventory
  - Process models
  - Data models
- Application inventory
  - Programs
  - Databases
- Meta data inventory
  - Business meta data
  - Technical meta data
- Policy inventory
  - Standards
  - Procedures
  - Guidelines
  - ...
Activities: Discover, Coordinate, Integrate, Document, Control
Roles: Architects, Stewards, Managers
Slide 47: Data stewardship
"One who manages another's property."
- Guardians of the data while it is being created or maintained by them
- Create standards and procedures to ensure that policies and business rules are known and followed
- Enforce adherence to policies and business rules that govern the data while the data is in their custody
- Periodically monitor (audit) the quality of the data in their custody
- Also known as custodians
- Can be a business person or an IT person
Slide 48: Data ownership
"One who has the legal right to the possession of a property."
- Authority to establish policies and set business rules for the data under their control
- Decide what the official enterprise definition and domain is for the data under their control
- Monitor and advise other end users on proper usage of their data
- Frequently, but not always, the data originator
- Can be a person or a committee
Slide 49: Enterprise architecture
- Business Architecture: Mission and Objective, Business Principles, Business Functions, Program Management
- Information Architecture: Enterprise Data Model (Data Standardization, Data Integration, Data Reconciliation, Data Quality)
- Application Architecture: Operational Applications, Data Access Applications, Data Analysis Applications, Application Databases
- Technology Architecture: Technology Platform, Network, Middleware, DBMS, Tools, Content Storage & Presentation
1. Data Management: data integration, data cleansing
2. Data Delivery: data access, data manipulation
Slide 50: Enterprise data model (data inventory)
Top-Down. Supported by common data definitions, domains, and business rules.
[Entity diagram: Salesperson (Commissioned Salesperson, Salaried Salesperson), Org Structure, Org Unit, Product, Product Category, Product Part, Part, Supplier, Customer (Potential Customer, Existing Customer), Customer Account, Account, Payment, Payment Method, Product Order, Shipment, Warehouse]
Slide 51: Source data analysis
Bottom-Up: Find the Dirty Data
Domain Violations:
- Dummy values
- Intelligent dummy values
- Missing values
- Multi-purpose fields
- Cryptic values
- Free-form address lines
Integrity Violations:
- Contradicting values
- Violation of business rules
- Reused primary keys
- Non-unique primary keys
- Missing data relationships
- Inappropriate data relationships
Slide 52: To cleanse or not to cleanse ... that is the question
- You probably cannot cleanse it all (takes too long)
- It may not be worth the time and money to cleanse every data element
- Not all data is equally significant
- Not all data can be cleansed
How do you know what to cleanse?
Slide 53: Triaging questions (1)
- Can the data be cleansed?
  - Does the correct data exist anywhere?
  - Is it easily accessible?
- Should the data be cleansed?
  - How extensive is the problem?
  - How elaborate will the cleansing process be?
  - Is it cost-effective?
Slide 54: Triaging questions (2)
- Why are we building the application?
  - What business questions cannot be answered today?
- Why are we not able to answer the business questions?
  - Is it because of this dirty data?
  - Is it because of these missing relationships?
- Will the benefits of cleansing outweigh the cost of the effort?
Slide 55: Categories of data significance (a business decision!)
- Critical data
  - Not all data is equally critical to all end users
  - All critical data must be cleansed
  - Usually includes amount fields
- Important data
  - Important to the organization, but not absolutely critical
  - Further prioritize important data elements
  - Cleanse as many as time allows
  - Those that cannot be cleansed should be bumped to critical for the next release
- Insignificant data
  - Informational data, which is nice to have
  - Cleansing is optional if time allows
Slide 56: Cleansing – repairing – prevention
- Where should the dirty data be cleansed?
  - In the staging area of the BI application?
  - In the source (legacy) files?
- When should it be cleansed?
  - Retroactively?
  - At data entry time?
- How should it be cleansed?
  - Use data cleansing or ETL tools?
  - Write procedural (COBOL/C++) code?
- What will we do to prevent dirty data in the future?
Source Data Reengineering ... Total [Data] Quality Management (TQM)
Slide 57: Coordinated ETL staging
[Diagram: Legacy sources (L) deliver operational reports to clients and feed two staging areas (daily and monthly), each performing cleansing and transformations. The staging areas load the Operational Data Store / Oper Marts (ODS, OM; tactical reports), the Enterprise Data Warehouse (EDW; strategic reports), and the Data Marts (DM; strategic reports) for Finance, Product Pricing, Engineering, Marketing, Analytical CRM, Operational CRM, Customer Support, and Legal (EXW). All are tied to the Enterprise Architecture & Meta Data Repository.]
Slide 58: ETL process flow (coordinated)
Stages: Extract, Cleanse, Transform, Prepare, Load
[Flow diagram. Modules: Extract Accounts, Extract New Sales, Extract Prospects, Filter Accounts, Merge Customers, Merge Prospects, Sort Accts, Sort Customers, Match Accounts, Associate Accounts, Profile Customers. Files: Account Tran File, Customer Info File, Customer Master, Sales File, New Sales, Accounts, New Accounts, Prospects, Customers, All Customers, Sorted Accounts, Sorted Customers, Account Errors.]
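Slide 59: ETL Reconciliation
[Diagram: Legacy sources (L) feed the Monthly Staging Area, which produces Load Files for the ODS (daily), the EDW (monthly), and the Data Marts (DM, monthly). The loads are reconciled!]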
Slide 60: ETL tie-outs: record counts
[Diagram: INPUT RECORDS go into the PROCESS MODULE, which produces OUTPUT RECORDS and REJECTED RECORDS]
# Input Records = # Output Records + # Rejected Records
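The record-count tie-out can be sketched as a control check run after each process module. The module below is a stand-in (it rejects records with a missing amount) and the record layout is hypothetical; the tie-out formula is the one on the slide.

```python
def process_module(records, is_valid):
    """Stand-in ETL step: split the input into output and rejected records."""
    output, rejected = [], []
    for record in records:
        (output if is_valid(record) else rejected).append(record)
    return output, rejected


def record_count_difference(n_input, n_output, n_rejected):
    """Return 0 when # input = # output + # rejected, else the number lost."""
    return n_input - (n_output + n_rejected)


# Hypothetical input: one record has no amount and is rejected.
input_records = [
    {"account": "A1", "amount": 100.0},
    {"account": "A2", "amount": None},
    {"account": "A3", "amount": 75.5},
]
output_records, rejected_records = process_module(
    input_records, lambda record: record["amount"] is not None
)
difference = record_count_difference(
    len(input_records), len(output_records), len(rejected_records)
)
```

In a real ETL run the three counts would be taken independently, for example from the files actually read and written, and stored as meta data, so that a non-zero difference exposes records silently dropped between steps.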
Slide 61: ETL tie-outs: domain counts
[Diagram: INPUT CODES go into the PROCESS MODULE, which produces three sets of OUTPUT CODES and REJECTED CODES]
# Records Per Input Domain = # Records Per First Output Domain + # Records Per Second Output Domain + # Records Per Third Output Domain + # Rejected Data Values
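A domain-count tie-out applies the same idea to a code field that is split into several output domains. The sketch below reuses the Master_Cd groupings from the cryptic values slide as the three output domains; the domain names and the sample input codes are assumptions for illustration.

```python
from collections import Counter

# Three output domains for one overloaded input code field.
OUTPUT_DOMAINS = {
    "customer_type": set("ABC"),
    "supplier_type": set("DEF"),
    "regional_constraint": set("GHI"),
}


def count_by_output_domain(input_codes):
    """Count how many input codes land in each output domain or are rejected."""
    counts = Counter()
    for code in input_codes:
        for domain, allowed in OUTPUT_DOMAINS.items():
            if code in allowed:
                counts[domain] += 1
                break
        else:
            # No domain accepted the code, so it counts as a rejected value.
            counts["rejected"] += 1
    return counts


# Hypothetical input codes, including one unknown and one blank value.
input_codes = ["A", "D", "G", "B", "Z", "", "H"]
domain_counts = count_by_output_domain(input_codes)
balanced = len(input_codes) == sum(domain_counts.values())
```

As with record counts, the comparison is most useful when the per-domain counts come from what was actually written to each target rather than from the routing logic itself.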
Slide 62: ETL tie-outs: amount counts
[Diagram: INPUT AMOUNTS go into the PROCESS MODULE, which produces two sets of OUTPUT AMOUNTS and REJECTED AMOUNTS]
Total $ Input Amounts = Total $ Per First Output Amount + Total $ Per Second Output Amount + Total $ Rejected Amounts
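For amounts, the tie-out compares dollar totals instead of counts. The sketch below assumes the reading in which the input total must equal the output totals plus the rejected total. It uses `Decimal` so that cents add up exactly, and all sample amounts are hypothetical.

```python
from decimal import Decimal


def total(amounts):
    """Sum a sequence of Decimal amounts, returning Decimal('0') when empty."""
    return sum(amounts, Decimal("0"))


def amount_difference(input_amounts, output_streams, rejected_amounts):
    """Return input total minus (all output totals + rejected total)."""
    total_out = total(total(stream) for stream in output_streams)
    return total(input_amounts) - (total_out + total(rejected_amounts))


amounts_in = [Decimal("100.00"), Decimal("250.50"), Decimal("75.25"), Decimal("10.00")]
first_output = [Decimal("100.00"), Decimal("75.25")]
second_output = [Decimal("250.50")]
rejected = [Decimal("10.00")]

# Balanced run: every input dollar is accounted for.
balanced_difference = amount_difference(amounts_in, [first_output, second_output], rejected)

# Unbalanced run: the rejected amount was never recorded.
missing_difference = amount_difference(amounts_in, [first_output, second_output], [])
```

A non-zero difference gives the exact dollar amount that went missing, which is usually easier to trace than a record count.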
Slide 63: Data quality improvements
- Source data repairs
- Increased program edits
- Enhanced data entry procedures
- Improved data quality training
- Regular data audits
- Data usage monitoring
- Enterprise-wide end user surveys
- Continuous validation of enterprise data model
- Continuous validation of meta data, especially definitions and domains
- Involvement of data owners, information consumers, and business sponsors
Slide 64: Data quality maturity
At what level of DQ maturity is your organization? Scale of 1 to 5, from short term to long term:
1. Discovery by accident: program abends
2. Limited data analysis: data profiling, data cleansing during ETL
3. Addressing root causes: repairing source data and programs
4. Proactive prevention: enterprise-wide DQ methods & techniques
5. Optimization: continuous DQ process improvements
Slide 65: DQ capability maturity model (1) (Source: Larry English)
CMM Level 1. Uncertainty - Unconscious and unaware
- Data quality problems are denied.
- No formal data quality processes defined.
- Data quality initiatives are ad hoc and chaotic.
- Any success is dependent on individual efforts.
CMM Level 2. Awakening - The big Aha! and lip service
- Data quality problems are acknowledged.
- Major problems are attacked as they come up.
- Minimum funding for a formal data quality initiative.
- Capability is a characteristic of the individual rather than the organization.
Slide 66: DQ capability maturity model (2) (Source: Larry English)
CMM Level 3. Enlightenment - Let's do something
- Data quality initiative takes off.
- Enterprise-wide data quality assessment is performed.
- Data quality problems are corrected at the source (where possible).
- Data quality improvement process is institutionalized.
CMM Level 4. Wisdom - Making a difference
- Management accepts personal responsibility for data quality.
- Data quality group reports to a chief officer (CIO, CKO, COO).
- Data quality correction changes to data defect prevention.
- All business areas are involved.
Slide 67: DQ capability maturity model (3) (Source: Larry English)
CMM Level 5. Certainty - Nirvana
- Data defect prevention is the main focus.
- Data quality is an integral part of the business processes.
- All business areas are continuously improving the processes.
- The culture of the organization has changed.
Slide 68: Organizational impact
- Cross-organizational tasks and responsibilities are not well defined
- Data quality responsibility is not clear or ignored
- Value of data is not understood or appreciated
- Projects are often cost justified using the industrial-age mental model
- Resource requirements are not well defined
- Impact on application development empire
- No reward for data sharing
- Resistance to change
Slide 69: Organizational changes
- Business and IT collaboration ("partnership")
- Business and business collaboration ("partnership")
- IT and IT collaboration ("partnership")
- Increased end user involvement
- Cross-organizational activities
- Architecture and standardization
- Software release concept
- New charge-back system
- New incentives
- New leadership
Slide 70: New leadership
[Organization chart labels: CEO, CFO, COO, CTO, CKO (Chief Knowledge Officer), LOB Execs, IT Execs, EA, EIM (Enterprise Information Management), DA, DQA, MDA, ...; connectors marked "collaboration"; outcome: quality data]
Slide 71: How do we change? 12 steps to [DQ] recovery (1)
1. Become aware
- Every cultural transformation process begins with an "Aha".
- Understand the root causes for your current data chaos.
2. Accept responsibility
- "Yes, it is our fault" for being in this mess.
- Accepting responsibility is a prerequisite for change.
Slide 72: 12 steps to [DQ] recovery (2)
3. Decide to change
- Now that "you know better", the decision is yours: stay stuck or change.
- There can be no more false hopes for any silver bullet technology solutions.
4. Identify root causes
- What are the specific root causes for non-quality data in your organization?
- Some root causes are common, some are not.
Slide 73: 12 steps to [DQ] recovery (3)
5. Collaborate
- It doesn't matter "whose fault" it is that the root causes exist.
- IT must collaborate with the business community to effect changes.
- Business units must also collaborate with each other.
6. Identify change agents
- Who will be the couriers?
- Changes must be systemic and holistic, not isolated and sporadic.
Slide 74: 12 steps to [DQ] recovery (4)
7. Spread the word
- To embrace changes, there must be "something in it" for everybody.
- Otherwise, changes trigger anxiety, and anxiety results in resistance or rejection.
8. Plan changes
- Big changes do not get implemented in one "Big Bang".
- Involve people in change planning.
- Cross-organizational changes are phased in.
Slide 75: 12 steps to [DQ] recovery (5)
9. Prioritize changes
- Some changes are easier to implement than others.
- Some changes have a higher payback.
10. Implement changes
- Everyone affected by the changes must have an opportunity to review and approve the plan before implementation.
Slide 76: 12 steps to [DQ] recovery (6)
11. Measure effectiveness
- Solicit feedback from "the trenches".
- Are the changes affecting anyone adversely?
12. Refine changes
- Nothing is perfect the first time around.
- What might work in one organization may not work in another.
Slide 77: Bibliography
- Adelman, Sid, and Larissa Terpeluk Moss. Data Warehouse Project Management. Boston, MA: Addison-Wesley, 2000.
- Aiken, Peter H. Data Reverse Engineering: Slaying the Legacy Dragon. New York: McGraw-Hill, 1995.
- Brackett, Michael H. Data Resource Quality: Turning Bad Habits into Good Practices. Boston, MA: Addison-Wesley, 2000.
- Brackett, Michael H. The Data Warehouse Challenge: Taming Data Chaos. New York: John Wiley & Sons, 1996.
- English, Larry P. Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. New York: John Wiley & Sons, 1999.
- Hoberman, Steve. Data Modeler's Workbench: Tools and Techniques for Analysis and Design. New York: John Wiley & Sons, 2001.
- Huang, Kuan-Tsae, Yang W. Lee, and Richard Y. Wang. Quality Information and Knowledge Management. Upper Saddle River, NJ: Prentice Hall, 1998.
- Marco, David. Building and Managing the Meta Data Repository: A Full Lifecycle Guide. New York: John Wiley & Sons, 2000.
- Moss, Larissa T., and Shaku Atre. Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Boston, MA: Addison-Wesley, 2003.
- Reingruber, Michael C., and William W. Gregory. The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models. New York: John Wiley & Sons, 1994.
- Ross, Ronald G. The Business Rule Concepts. Houston, TX: Business Rule Solutions, Inc., 1998.
- Simsion, Graeme. Data Modeling Essentials: Analysis, Design, and Innovation. Boston, MA: International Thomson Computer Press, 1994.
- Von Halle, Barbara. Business Rules Applied: Building Better Systems Using the Business Rules Approach. New York: John Wiley & Sons, 2001.