Wednesday, July 31, 2019
Quiz 7
1. Access controls include which of the following? Answer: Answers 1 and 2 only (requiring employee logouts when workstations are left unattended; prohibiting visitors from roaming the building in which computers are stored).
2. Identity theft can be prevented by which of the following? Answer: All of the above (monitoring credit reports regularly; sending personal information in encrypted form; immediately cancelling missing credit cards; shredding all personal documents after they are used).
3. Which of the following can be used to detect whether confidential information has been disclosed? Answer: A digital watermark.
4. Which of the following is a fundamental control for protecting privacy? Answer: Encryption.
5. Which of the following are internationally recognized best practices for protecting the privacy of customers' personal information? Answer: All of the above (disclosure to third parties only according to the organization's privacy policy; use and retention of customer information as described by the privacy policy; organizations should explain the choices available and obtain customers' consent to the collection of their data prior to its collection).
6. The same key is used to encrypt and decrypt in which type of encryption system? Answer: Symmetric encryption systems.
7. Which of the following represents a process that takes plaintext and transforms it into a short code? Answer: Hashing.
8. Which of the following uses encryption to create a secure pathway to transmit data? Answer: Virtual Private Network (VPN).
9. Which of the following represents an organization that issues documentation as to the validity and authenticity of digital identification such as digital certificates? Answer: Certificate Authority.
10. Which of the following is NOT a factor that can influence encryption strength? Answer: Digital certificate length.
11. What is the first step in protecting the confidentiality of intellectual property and other sensitive business information? Answer: Identify where confidential data resides and who has access to it.
12. Which of the following is a major privacy-related concern? Answer: Answers 1 and 2 (spam; identity theft).

1. These are used to create digital signatures. Answer: Asymmetric encryption and hashing.
2. On March 3, 2008, a laptop computer belonging to Folding Squid Technology was stolen from the trunk of Jiao Jan's car while he was attending a conference in Cleveland, Ohio. After reporting the theft, Jiao considered the implications of the theft for the company's network security and concluded there was nothing to worry about because: Answer: the data stored on the computer was encrypted.
3. Using a combination of symmetric and asymmetric key encryption, Chris Kai sent a report to her home office in Syracuse, New York. She received an email acknowledgement that the document had been received and then, a few minutes later, she received a second email that indicated that the hash calculated from the report differed from that sent with the report. The most likely explanation for this result is that: Answer: the symmetric encryption key had been compromised.
4. Asymmetric key encryption combined with the information provided by a certificate authority allows unique identification of: Answer: the user of encrypted data.
5. These systems use the same key to encrypt and to decrypt. Answer: Symmetric encryption.
6. In a private key system the sender and the receiver have ________, and in the public key system they have ________. Answer: the same key; two separate keys.
7. Encryption has a remarkably long and varied history. Spies have been using it to convey secret messages ever since there were secret messages to convey. One powerful method of encryption uses random digits. Two documents are prepared with the same random sequence of numbers. The spy is sent out with one and the spy master retains the other. The digits are used as follows. Suppose that the word to be encrypted is SPY and the random digits are 352. Then S becomes V (three letters after S), P becomes U (five letters after P), and Y becomes A (two letters after Y, restarting at A after Z). The spy would encrypt a message and then destroy the document used to encrypt it. This is an early example of: Answer: symmetric key encryption.
8. Which of the following is not associated with asymmetric encryption? Answer: Speed.
9. A process that takes plaintext of any length and transforms it into a short code. Answer: Hashing.
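The random-digit cipher in the spy question above (SPY with digits 352) can be sketched in a few lines of Python. This is only an illustration of the idea, assuming uppercase A-Z messages and one pad digit per letter; the function names `shift_letters`, `encrypt`, and `decrypt` are made up for this sketch and do not come from the quiz.

```python
import string

ALPHABET = string.ascii_uppercase  # "A".."Z"


def shift_letters(text, digits, direction):
    """Shift each letter by its matching pad digit, wrapping around after Z."""
    if len(digits) < len(text):
        raise ValueError("the pad must supply one digit per letter")
    shifted = []
    for letter, digit in zip(text, digits):
        index = (ALPHABET.index(letter) + direction * int(digit)) % 26
        shifted.append(ALPHABET[index])
    return "".join(shifted)


def encrypt(plaintext, pad_digits):
    # The sender moves each letter forward by the pad digit.
    return shift_letters(plaintext, pad_digits, 1)


def decrypt(ciphertext, pad_digits):
    # The receiver holds the same pad and moves each letter back.
    return shift_letters(ciphertext, pad_digits, -1)


print(encrypt("SPY", "352"))  # VUA
print(decrypt("VUA", "352"))  # SPY
```

Because the same digits both encrypt and decrypt, the pad plays the role of a shared secret key, which is why the quiz classifies the scheme as symmetric.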
Tuesday, July 30, 2019
Law Enforcement Today Essay
Many police departments are facing budget problems, forcing them to cut their police forces. Many officers are being asked to do things they normally don't do, such as patrolling. Police departments are also facing increasing crime due to the poor economy, as more people engage in criminal activities. Local and small agencies interact with the communities that they patrol on a daily basis. Sharing information between agencies is important not only for homeland security but for the public's safety as well. Law enforcement agencies are using the Homeland Security Information Network (HSIN), which allows them to securely collaborate with partners across the country. Law enforcement professionals also use HSIN to share information including Be on the Lookouts (BOLOs), Requests for Information (RFIs), For Your Information (FYIs), Intelligence Reports, and other Law Enforcement Sensitive documents. HSIN allows users to create and distribute messages to large, mission-specific contact lists. This rapid, secure information exchange provides law enforcement professionals with critical intelligence as they conduct work in the field ("Homeland Security Information Network - Law Enforcement Mission", n.d.).
The purpose of this State and Local Fusion Center Concept of Operations (CONOPS) is to establish a framework for a comprehensive, coordinated and consistent approach for outreach by the Department of Homeland Security (DHS) to State and Local Fusion Centers (SLFCs). This CONOPS outlines DHS processes relating to SLFC support including intelligence and operational information flows and interactions, deployment of officers, component integration, and identification of SLFC requirements, technical assistance and training. DHS will also ensure outreach, communication, and integration with other multidisciplinary partners (i.e., fire service, public health, and emergency management), to further ensure and facilitate information sharing between SLFCs and these disciplines. This CONOPS will be periodically reviewed and modified as additional processes are implemented and refinements identified. The CONOPS provides transparency into DHS support to SLFCs. The CONOPS also:
- Furthers the goals of the Director of National Intelligence (DNI) and the Program Manager Information Sharing Environment (PM-ISE) to develop and support a national information sharing environment and network of fusion centers.
- Underscores the role of the Under Secretary for Intelligence and Analysis as the Executive Agent for the DHS SLFC Program and DHS's representative to various Federal senior-level advisory groups providing guidance and support to fusion centers.
- Defines the roles and responsibilities of the State and Local Program Management Office (SLPO) to execute the DHS SLFC Implementation Plan and to lead DHS outreach to SLFCs, which includes, but is not limited to, the assignment of DHS intelligence analysts and officers and the provision of tools to the fusion centers nationwide. The SLPO serves in the central coordination role for DHS interaction with SLFCs.
- Institutionalizes the Single Point of Service (SPS), a coordinated Office of Intelligence and Analysis/Office of Operations Coordination and Planning business process, developed to ensure all SLFC inquiries are responded to expeditiously by the appropriate elements within DHS and that there is accountability for this transactional activity.
An assumption circulating within information sharing discourse is that the effectiveness of information sharing can be measured in terms of information flow, distribution, timeliness, coordination, and related system performance measures. The Information Sharing Environment's (ISE) stated mission is to ensure the ability of agencies to share information, but just who is responsible for ensuring that such abilities to share information tangibly improve preparedness remains unclear. This study indicates that using system performance measures and capabilities to assess the effectiveness of information sharing is inadequate and potentially wasteful and misleading. In developing metrics to assess the benefits of information sharing, officials must engage in the difficult task of relating system use to tangible improvements in preparedness. Information-sharing initiatives also unfold within varying budgetary constraints and divergent funding priorities. As a result, future research needs to address how financial and structural conditions influence information-sharing processes and practices. This study also suggests the need for comparative and longitudinal research of information sharing. However, future studies that attempt to construct concrete variables for hypothesis testing may similarly confront the contingency of the meanings of information sharing and preparedness. Although information sharing and preparedness are socially defined concepts, their meanings can be mapped within different organizational contexts and across time using both qualitative and quantitative methods. Doing so can potentially help policy makers and practitioners assess the utility of information-sharing strategies and the impact of associated organizational change efforts.
Hamlet and Gatsby Comparison
Love is an essential part of life. Every individual wants to be loved, and needs someone to love. It is an element that is fundamental to the well-being of all humankind; it is the magic that can heal wounds. However, love also has the capacity to traumatize a person if it is taken from their life. While we all wish to experience love, many of us find the often inevitable detachment to be quite painful. In the novel The Great Gatsby, Jay Gatsby's longing for Daisy Buchanan leads him to his own downfall. Similarly, in the play Hamlet, Hamlet's extreme love for his father and his hatred towards his mother play a major role in his tragedy. In these works, there are a number of motivating factors that contribute to the downfall of the main characters (obsession, hatred, and the desire to be accepted), but ultimately it is love that leads to the demise of Gatsby and Hamlet. Hamlet loved his father, King Hamlet, and it was his death that broke young Hamlet's heart. It is the love he had for his father that brought him to his doom. After King Hamlet's death he appeared as a ghost moving through the castle at one o'clock every morning. When the guards and Horatio, Hamlet's best friend, noticed this ghostly figure, Horatio quite intelligently believed that he could get the ghost to speak with Hamlet. The next day the two guards, Horatio, and young Hamlet were present to speak to the ghost of King Hamlet. The ghost told Hamlet that he was murdered by Claudius, his brother, who had been sworn in as the new king and married his wife, Hamlet's mother, Gertrude. After hearing this, young Hamlet was asked to avenge his father's death, but in doing so his mother was to remain unharmed. Hamlet, being the loving and devoted son he was, and unable to accept Claudius as the leader to replace his father, accepted King Hamlet's request.
After this encounter, young Hamlet refused to tell the guardsmen and Horatio what happened but made it known that he would act like a madman and they were not to say why. Claudius soon became suspicious of young Hamlet's moodiness and began to spy on him through Guildenstern and Rosencrantz. They believed his mood arose because Ophelia, his former lover, had left him on her father's word. Hamlet procrastinated in the killing of Claudius as he waited for his confession. Hoping to inspire a confession, young Hamlet puts on a play that resembles what truly happened to King Hamlet to catch the conscience of Claudius. But more truthfully, Hamlet sought to prove to his mother that she was wrong in her actions after her husband's death. Hamlet later sees Claudius alone praying, and although he would have been an easy target, Hamlet refrains from killing him then because he believed that a man killed while at prayer would go to heaven upon his death. In reality, Claudius was not truly praying, and thus Hamlet missed his only opportunity to avenge his father's death. This event ultimately led to Hamlet's own death when Claudius realized Hamlet's motives and wanted to get rid of him. Hamlet's love for his father drove him to lose his love, Ophelia, his friends, and his life. Hamlet and Gatsby are similar in that they are willing to go through so much, to the point where it leads to their death, so as to bring happiness to those whom they love. Being accepted by "old money" was very important to Jay Gatsby. He thought that if he was accepted by this elite group he would be able to win over Daisy, the woman he had come to love. Yet his unwillingness to trust himself and to be proud of who he was led to his downfall. Despite his efforts to fit in, the elites knew that he was nothing more than a bootlegger. They would mock him behind his back, talk about how he did not actually attend Oxford, and laugh at how he really became rich.
Gatsby would try to impress them with his luxurious weekly parties, which he hoped would help him fit in while attracting Daisy. Gatsby shows off his wealth to demonstrate his influence and luxurious lifestyle, while demonstrating that he has plenty of money to spend on Daisy. All the while, he does not see what others truly think of him. For example, Tom once stopped by Gatsby's house with his friends for some drinks, at which Gatsby became nervous and agitated. He tells Tom awkwardly that he knows Daisy, and invites Tom and the Sloanes to dinner. Rejecting his invitation, they ask insincerely if he would like to join them, which Gatsby unknowingly accepts, not realizing that they have no interest in him at all. Gatsby is so eager to be with them, fixated on his goal to be a part of the "old money" group of East Egg in order to show Daisy that he is worthy of her and able to support her. He is so in love with Daisy that it blinds his judgment. If Gatsby had focused on being himself instead of trying to be accepted he would have made fewer enemies, and perhaps won over Daisy. Gatsby was driven by his love for Daisy, and was single-minded about how to get her. He did not realize that loving Daisy was all that he had become concerned with and that it consumed him. Gatsby truly believed that if the "old money" of East Egg accepted him he would win her over, but it was this unhealthy single focus, and his inability to trust that he could simply be himself, which caused his downfall. Meanwhile, Hamlet loved his father, and when he found out he had died it hurt him deeply. But it hurt young Hamlet's heart even more when he found out his mother had married Claudius. It is Hamlet's undying love for his father and his lost love for his mother that brought about his madness, and ultimately his death. In conclusion, Hamlet and Jay Gatsby are very similar to one another in that they both let their emotions control them.
They have no sense of self control and die because of it. Tragically, they could have gotten what they wanted if they just were themselves and if they were able to not let their emotions get the best of them.
Monday, July 29, 2019
Article Summaries Essay Example | Topics and Well Written Essays - 250 words
Although different nations have varying prices, measuring their GDPs requires the use of the same prices (Charles and Klenow 7-9). The authors used Rawls' framework to calculate life expectancy, inequality, and other welfare components (Charles and Klenow 10-11). In constructing welfare over time, Charles and Klenow compared how Rawls valued living in the same country but in different years. Using figure 4 and table 3, they correlated welfare and income growth and displayed summary statistics of the same. Between 1980 and 2000, the US registered average income growth of 2.04% (Charles and Klenow 23-25). The researchers had to make a number of assumptions from the Rawls utility functions. They checked the robustness of their calculations using alternative specifications of utility and welfare measures. The alternatives they used held up well in accounting for the differences between income and welfare (Charles and Klenow 29-34). They used various sources of data to perform their calculations. Consumption and income data for the macro calculations were sourced from the Penn World Tables, and life expectancy data from the World Bank's HNPStats database. In addition, the inequality data was sourced from the UNU-WIDER World Income Database (Charles and Klenow 12-15). The micro data was of immense importance because it analyzed working hours and consumption rates for adults and older children in households. The data collected from the Household Survey enabled the researchers to calculate consumption inequality rather than making assumptions from income inequality (Charles and Klenow 38-41). The researchers, in particular, found that the living standards of Western Europeans were 71% for income and 90% for welfare compared to the U.S. This is because people in these countries live long, have equal consumption
Sunday, July 28, 2019
Assignment Example | Topics and Well Written Essays - 750 words - 86
Hint: There are two null hypotheses for each research question here. (2 pts. each, 8 pts. total)
(d) List the type of variable for the dependent variable and independent variables (categorical or continuous). (0.50 pt. each, 4 pts. total) Recall that researchers often compute total scores or average scores as a composite score when using measures.
1. What statistical test did the researchers conduct to test research question 2 and research question 3? (1 pt.) Was each test the appropriate one to use? Why or why not? (1 pt.) Hint: Consider how many groups they were comparing for school type by looking at the dfbetween in Table 3.
4. What is the Cohen's d effect size for the difference in cyber bullying between males and females? (1 pt.) Interpret the effect size as small, medium, or large. (0.50 pt.) Interpret the effect size in terms of standard deviation. (1 pt.)
1. Did the intervention (KWL group) and control groups statistically significantly differ on the pre-tests for the MAT, MI, or MAS? Support your answer with evidence from the article. Include what alpha level the researchers used as the criterion. (2 pts.)
An Independent Samples t-test is used to draw inferences about two populations by comparing two independent samples on a continuous-level dependent variable. In this case the researchers were drawing conclusions about the pre-test scores of the study group and the control group, which are independent.
1. (a) State one research question from your area of interest that could be answered with an Independent Samples t-test (2 pts.). Identify the (b) independent variable (1 pt.), (c) dependent variable (1 pt.), and types of variables in your research question (2 pts.).
2. (a) State one research question from your area of interest that could be answered with a One-Way Analysis of Variance (2 pts.). Identify the (b) independent variable (1 pt.), (c) dependent variable (1 pt.), and types of variables in your research question (2 pts.).
3. (a) State one
Saturday, July 27, 2019
Customer Acquisition Versus Retention Essay Example | Topics and Well Written Essays - 1250 words
...g one customer through promotional services and sales as ranging from six to ten times higher than the costs that the business would incur in retaining a customer. Other statistics suggest that about 80% of a business's sales come from about 20% of its customers, which spells out the critical need for the business to maintain healthy relations with these loyal customers. It is easier for retained customers to communicate about a business's brands than it is for new customers. They therefore play a critical role in tracking the progress of the business in terms of the quality standards of its commodities as well as its services. Moreover, findings from a report by Reichheld indicate that retained customers eventually buy a lot from the business and, as a result, the business records higher profits. The operating costs of serving retained consumers tend to be lower than the costs of serving new consumers. Retained customers are also better placed to refer and bring other customers on board than new customers are. Businesses therefore analyze the profitability of retaining all their customers, but cost-effectiveness analysis dictates that a business should strive to maintain only the segment of customers that proves more profitable than the rest (2001, p. 1). Nevertheless, globalization trends are seen to adversely affect efforts to retain customers, as the internet has opened a whole new world of advertising and promotion. With just a click of a mouse, it is possible to lose a retained customer. Therefore, the efforts of retaining a customer get more expensive and complicated. On the other hand, the supporters of customer acquisition reason that no more growth can result from an already retained pool of customers than has already been met. This therefore
Friday, July 26, 2019
A critical study of credit risk management in the First Bank of Nigeria Plc Dissertation
Circumstances led to a situation in which banks that incurred giant losses in the subprime crisis had to depend solely on capital flows from Middle Eastern, Chinese and Singaporean investors. The major nucleus of these losses was related to credit risk. Credit risk management is therefore a grave concern in today's complex financial milieu, and it has become essential for financial institutions to contain losses arising from credit for sustained long-run performance. The notorious cases of bank failures, acquisitions and consolidation have steered the focus of the management of financial institutions towards restructuring operations, improving asset quality and building loan portfolios with credit risk management as the base structure (Yo & Yusoff, 2009, p.46).
Influence of credit risk management on the banks
Credit risk management is an overwhelming concern for financial institutions, especially banks. Credit risk, in simple language, can be defined as the potential that a bank borrower or counterparty will fail to meet its obligations in accordance with agreed terms. The basic objective of credit risk management is to maximize the bank's risk-adjusted rate of return while maintaining credit risk exposure within accepted parameters (which may vary from time to time). Banks need to manage the credit risk inherent in the entire portfolio as well as the risk in individual credits or transactions. Banks should also take into account the relationships between credit risk and other risks. The effective management of credit risk is a crucial component of a comprehensive approach to risk management and is essential to the long-term success of any banking organization (Principles for the Management of Credit Risk, 2012, p.1). In the decades leading up to the financial crisis, banks operated in an increasingly competitive market and were in effect forced to take more risks in seeking out higher-margin activities. Securitization and commercial paper created a platform on which banks could generate higher-margin business by converting illiquid loans into marketable securities, thus releasing capital for other investment opportunities. Empirical testing reveals that securitization expands credit, leading banks to hold riskier assets (Casu et al, 2010, p.3). Under the Basel II Accord, banks with securitization exposures have to abide by certain norms, such as proper documentation of the objectives, a summary of the bank's policies for securitization, and whether there are limitations in applying sophisticated credit risk management to the securitization method. Credit risk management can be successfully implemented if banks adopt refined techniques for minimizing the risk of expected losses (Securitization of Credit Exposures: Important Tool of Credit Risk Management under Basel Accord II, 2006, p.598).
Technology enhancing the process of credit risk management
One of the most important parts of credit risk management is quantifying the risks, and it is a crucial part of the risk management process. From
Thursday, July 25, 2019
Sports Laws and Anti-Doping Essay Example | Topics and Well Written Essays - 3000 words
The paper gives detailed information about individuals who find it difficult to adjust to the set rules and hence apply dubious methods to achieve success. Some, however, have gone against the set policies in attempts to overcome impairments experienced during their careers that might otherwise have led to career failure. A country may, however, have the power to intervene in some decisions that these institutions make, especially if they affect welfare in the sport or go against specific sports law provisions. Doping in sport is a practice that various countries have attempted to eradicate, and anti-doping organizations have been formed to monitor sportsmen and sportswomen and curb it. Doping has been witnessed in various instances where athletes use ability-enhancing substances to achieve success. The specific body that controls the sport normally passes judgment on the implications of doping. However, there may be provisions that allow the government to interfere with certain decisions in its attempts to protect its citizens. The independent bodies may be internationally formed, for example FIFA, which oversees football affairs across the globe. Within FIFA, there are certain provisions that limit the ability of a country's government to interfere with its affairs. Although these tribunals are independent, there have been attempts to limit their powers, especially evident in rulings on doping cases.
Sports Laws and Anti-Doping
Sporting activities have grown popular over the past few decades, and many individuals have become perfectionists in their specific area of talent. Many venture into sport for fun and regard it as a leisure activity.
However, recent statistics have shown that the sport industry is increasingly becoming commercialized, with many individuals entering events not only for fame but for the financial packages that modern sports entail (Aketch, 2008). Athletes have applied their talent to make sporting activities a source of livelihood, and the majority have grown up without accessing any form of education to fall back on. The financial side of the sport industry has driven many sports personnel to cheat and apply doping techniques to advance in their careers. The rule in any sporting activity is to be the best among competitors, and by applying various skill-enhancing methods many have gained an advantage over their rivals. However, there have been massive anti-doping measures aimed at rooting out this practice, singling out individuals who have drug addiction problems and still participate in the sport. Doping has led to a loss of interest among most fans, with the majority losing favor towards competitors once known for excelling. This eventually leads to a loss of interest in and favor towards the sport. Various sport organizations have since emerged to promote the eradication of doping and unfair play. Examples are FIFA and the IOC, which oversee football and the Olympics respectively. These organizations have no relation to or direct influence from national law and have set their own independent rules that members must follow to gain merit in participating
The Role of the Derivatives in Credit Default Essay
This is known as the ability of derivatives to soar 100 percent within a few days when the underlying security has risen by only 10 percent. Derivatives are also used to control large blocks of stock for a much smaller sum than would be required for outright purchase (Carter, 2009, p. 67). This means that derivatives give people the ability to control and manage risk. As banking supervisors, central banks are concerned that commercial banks' participation in derivatives markets could lead to a major bank default that could worsen and disrupt financial markets. Default on any derivative or financial contract involves the failure by one party to the contract to make a payment required under the contract. For derivatives, default occurs when two conditions are met simultaneously: a party to the contract is in debt under the contract terms, and the counterparty cannot obtain the money within the given period (Hanson, 2010, p. 58). No regulation of derivatives can work well without a strong mandatory mechanism that exposes raw data to the regulators policing the market for misuse. Credit derivatives are among the causative factors that overwhelmed financial markets and led to the recession. Due to deposit insurance and the reluctance of the government to let banks fail, the credit risk is transferred to the government and then passed on to taxpayers. Bank depositors, who are the main stakeholders, have no incentive to monitor the banks' risk exposure. This allows the banks to load up on risk without attracting additional capital. This means that unregulated credit derivatives offer unprecedented leverage. Since financial markets reflect the real economy, the misuse of derivatives can have a great impact on it (Teslik, 2009, p. 60).
Credit defaults have played a major role in the financial problems that people face. The high volatility and turbulence that financial markets experienced resulted from the misuse of derivative securities. Banks short of operating capital have faced the wrath of fluctuating values in their debt obligations, mortgage-backed securities and credit default swaps.

2) What lessons should be taken by the UK's financial sector and regulators in relation to 'Bear Stearns' and other high profile cases?

An important lesson is that the difference between short-term and long-term liabilities has been neglected, or given insufficient attention, by regulators. In the liability structure of the U.S. banking system, short-term debt formed a clear majority. It took the form of wholesale or deposit funding, which included commercial paper and repurchase agreements. Wholesale funding runs were also witnessed through the refusal of commercial paper or repo creditors to roll over their loans. This played a major role in the demise of Bear Stearns, Northern Rock and Lehman Brothers, among other high profile failures. The UK's financial sector should be able to regulate debt maturity (Kirkpatrick, 2009, p. 78). Another lesson was that the fire sale risk associated with excessive short-term funding does not originate only from depositories but from any financial intermediary with a combination of financing structure and asset choice which may exacerbate a
Wednesday, July 24, 2019
Journals Paper Essay Example | Topics and Well Written Essays - 1500 words
This journal is about the use of a code of ethics and its importance in the American Society for Public Administration (ASPA). According to Terry and Svara, ASPA is facing several challenges and problems. ASPA has a mission that covers a broader scope, and it is much more varied in its membership. ASPA is a unique pan-generalist organization. This professional association seeks to connect its academic and practitioner members across governmental levels, functional specializations and sectors. Woodrow Wilson proposed an active role for public administration in shaping policy decisions. He offered important guidance for the ethical standard. The standing of ASPA as a professional association was affected by the lack of a code of ethics. The journal indicates that the gap was reduced and closed in the year 1984. ASPA followed several significant strategies to close the gap, chief among them the adoption of a code of ethics. The code was received and approved in the year 1994. This code provides optimistic moral authority. It indicates the importance of the principles which it embodies. Section A of the code identifies the public interest. Section B covers respect for the law and the constitution. Section C covers personal integrity [1]. Section D identifies the mission of the organizations. Lastly, Section E covers professional excellence.

Donald C. Menzel, "Public Administration as a Profession"

This article explores the values of public service that help to define public administration as a professional field of practice and study. Public service values and ethics comprise the soul and body of public administration. These approaches have both positive and negative aspects. The negative aspect is that there is limited agreement on what the values are beyond general exhortations.
The journal traces American public administration to the 1880s, when various important events occurred. One such event occurred in the year 1887: Woodrow Wilson's essay "The Study of Administration" sketched a picture of the skills and character of the kind of people required. According to the journal, the conscience of the civil servant was a particular inner spirit of the Wilsonian idealization of governance and government. In the Wilsonian view, the civil servant's honor was vested in the ability to execute carefully the orders of superior authorities. The twentieth century certainly brought significant new ways of looking at governance and government, both internationally and nationally. It also brought a significant transformation in the professionalization of occupations, and it helped in the growth of professional societies. An empirical study of spirituality in the workplace claimed that spirituality always exists. It is the responsibility of the management of the organization to recognize this spirituality and to incorporate it into the workplace culture. The organization should believe in this spirituality in order to bring morality into organizational culture [2].

James L. Perry, "Federalist No. 72: What happened to the Public Service idea"

Federalist No. 72 is an oft-neglected defense of the re-eligibility of the president for election. This journal has concluded
Tuesday, July 23, 2019
Business admin assignment 1 Essay Example | Topics and Well Written Essays - 1250 words
It is also considered one of the most admired companies in the United States, since many people recognize its performance and strategic planning (Dowling, 2008). This is mostly seen in the strategies that are used in enhancing its markets, and it has been widely attributed to globalization. First, globalization has played a major role in expanding the performance of Apple Inc. This has been realized through increased competition from the corporation. With the advent of globalization, many corporations find it easy to penetrate other markets (Berry, 2005). As such, they are legally able to sell their products in markets across the globe. With the introduction of their products in such markets, many people have access to a variety of products. Since the population has a variety of products to choose from, people are free to choose the best quality products. This has a positive impact on the quality of products that are sold in the market. Similarly, it is an opportunity to reduce the chances of corporations monopolizing a market. Monopolies have direct control over the market and are likely to supply substandard products. However, with the entry of Apple Inc. into the market, the corporation has been able to give the market unparalleled products. This has been a wake-up call for other companies that would like to dominate the market. Secondly, globalization has played a major role in ensuring economies of large scale production. Businesses have been stating that the costs of production are escalating with each passing year. However, large corporations have stated that economies of large scale production are the only route to effective production. Apple has demonstrated this, as it benefits from large scale production, which reduces the cost of production, the cost of labor and other miscellaneous expenses.
As such, the corporations are able to bring high quality products to the market at an affordable price. Similarly, the corporation is able to make substantial profits that are used in expansion (Berry, 2005). In many instances, corporations plough profits back into the business with the aim of increasing production. In addition to enhancing the performance of corporations, globalization has helped them to increase their competitiveness. This has helped many corporations to reconsider customer needs and values. Many customers in the market are now realizing the cost effectiveness of globalization through the production of better, higher quality products. As such, customers feel that their needs and preferences are addressed in a satisfactory manner. Thirdly, globalization has helped many corporations, including Apple, to realize the benefits of location flexibility. Under globalization, many corporations find it appropriate to expand their businesses into untapped markets. These markets provide substantial demand for their products. In the end, the corporation benefits substantially as it increases its profits. In addition, the corporation realizes a reduction in the costs of production. For instance, when they start another plant in an area, they use locally available labor, materials and other resources in the same
Monday, July 22, 2019
Canterville Ghost Summary Essay Example for Free
The next morning, when the Otis family met at breakfast, they discussed the ghost at some length. The United States Minister was naturally a little annoyed to find that his present had not been accepted. "I have no wish," he said, "to do the ghost any personal injury, and I must say that, considering the length of time he has been in the house, I don't think it is at all polite to throw pillows at him," a very just remark, at which, I am sorry to say, the twins burst into shouts of laughter. "Upon the other hand," he continued, "if he really declines to use the Rising Sun Lubricator, we shall have to take his chains from him. It would be quite impossible to sleep, with such a noise going on outside the bedrooms." For the rest of the week, however, they were undisturbed, the only thing that excited any attention being the continual renewal of the blood-stain on the library floor. This certainly was very strange, as the door was always locked at night by Mr. Otis, and the windows kept closely barred. The chameleon-like colour, also, of the stain excited a good deal of comment. Some mornings it was a dull (almost Indian) red, then it would be vermilion, then a rich purple, and once when they came down for family prayers, according to the simple rites of the Free American Reformed Episcopalian Church, they found it a bright emerald-green. These kaleidoscopic changes naturally amused the party very much, and bets on the subject were freely made every evening. The only person who did not enter into the joke was little Virginia, who, for some unexplained reason, was always a good deal distressed at the sight of the blood-stain, and very nearly cried the morning it was emerald-green. The second appearance of the ghost was on Sunday night. Shortly after they had gone to bed they were suddenly alarmed by a fearful crash in the hall.
Rushing down-stairs, they found that a large suit of old armour had become detached from its stand, and had fallen on the stone floor, while seated in a high-backed chair was the Canterville ghost, rubbing his knees with an expression of acute agony on his face. The twins, having brought their pea-shooters with them, at once discharged two pellets on him, with that accuracy of aim which can only be attained by long and careful practice on a writing-master, while the United States Minister covered him with his revolver, and called upon him, in accordance with Californian etiquette, to hold up his hands! The ghost started up with a wild shriek of rage, and swept through them like a mist, extinguishing Washington Otis's candle as he passed, and so leaving them all in total darkness. On reaching the top of the staircase he recovered himself, and determined to give his celebrated peal of demoniac laughter. This he had on more than one occasion found extremely useful. It was said to have turned Lord Raker's wig grey in a single night, and had certainly made three of Lady Canterville's French governesses give warning before their month was up. He accordingly laughed his most horrible laugh, till the old vaulted roof rang and rang again, but hardly had the fearful echo died away when a door opened, and Mrs. Otis came out in a light blue dressing-gown. "I am afraid you are far from well," she said, "and have brought you a bottle of Doctor Dobell's tincture. If it is indigestion, you will find it a most excellent remedy." The ghost glared at her in fury, and began at once to make preparations for turning himself into a large black dog, an accomplishment for which he was justly renowned, and to which the family doctor always attributed the permanent idiocy of Lord Canterville's uncle, the Hon. Thomas Horton.
The sound of approaching footsteps, however, made him hesitate in his fell purpose, so he contented himself with becoming faintly phosphorescent, and vanished with a deep churchyard groan, just as the twins had come up to him. On reaching his room he entirely broke down, and became a prey to the most violent agitation. The vulgarity of the twins, and the gross materialism of Mrs. Otis, were naturally extremely annoying, but what really distressed him most was that he had been unable to wear the suit of mail. He had hoped that even modern Americans would be thrilled by the sight of a Spectre in armour, if for no more sensible reason, at least out of respect for their national poet Longfellow, over whose graceful and attractive poetry he himself had whiled away many a weary hour when the Cantervilles were up in town. Besides, it was his own suit. He had worn it with great success at the Kenilworth tournament, and had been highly complimented on it by no less a person than the Virgin Queen herself. Yet when he had put it on, he had been completely overpowered by the weight of the huge breastplate and steel casque, and had fallen heavily on the stone pavement, barking both his knees severely, and bruising the knuckles of his right hand. For some days after this he was extremely ill, and hardly stirred out of his room at all, except to keep the blood-stain in proper repair. However, by taking great care of himself, he recovered, and resolved to make a third attempt to frighten the United States Minister and his family. He selected Friday, August 17th, for his appearance, and spent most of that day in looking over his wardrobe, ultimately deciding in favour of a large slouched hat with a red feather, a winding-sheet frilled at the wrists and neck, and a rusty dagger. Towards evening a violent storm of rain came on, and the wind was so high that all the windows and doors in the old house shook and rattled. In fact, it was just such weather as he loved.
His plan of action was this. He was to make his way quietly to Washington Otis's room, gibber at him from the foot of the bed, and stab himself three times in the throat to the sound of low music. He bore Washington a special grudge, being quite aware that it was he who was in the habit of removing the famous Canterville blood-stain by means of Pinkerton's Paragon Detergent. Having reduced the reckless and foolhardy youth to a condition of abject terror, he was then to proceed to the room occupied by the United States Minister and his wife, and there to place a clammy hand on Mrs. Otis's forehead, while he hissed into her trembling husband's ear the awful secrets of the charnel-house. With regard to little Virginia, he had not quite made up his mind. She had never insulted him in any way, and was pretty and gentle. A few hollow groans from the wardrobe, he thought, would be more than sufficient, or, if that failed to wake her, he might grabble at the counterpane with palsy-twitching fingers. As for the twins, he was quite determined to teach them a lesson. The first thing to be done was, of course, to sit upon their chests, so as to produce the stifling sensation of nightmare. Then, as their beds were quite close to each other, to stand between them in the form of a green, icy-cold corpse, till they became paralyzed with fear, and finally, to throw off the winding-sheet, and crawl round the room, with white, bleached bones and one rolling eyeball, in the character of Dumb Daniel, or the Suicide's Skeleton, a _rôle_ in which he had on more than one occasion produced a great effect, and which he considered quite equal to his famous part of Martin the Maniac, or the Masked Mystery. At half-past ten he heard the family going to bed.
For some time he was disturbed by wild shrieks of laughter from the twins, who, with the light-hearted gaiety of schoolboys, were evidently amusing themselves before they retired to rest, but at a quarter-past eleven all was still, and, as midnight sounded, he sallied forth. The owl beat against the window-panes, the raven croaked from the old yew-tree, and the wind wandered moaning round the house like a lost soul; but the Otis family slept unconscious of their doom, and high above the rain and storm he could hear the steady snoring of the Minister for the United States. He stepped stealthily out of the wainscoting, with an evil smile on his cruel, wrinkled mouth, and the moon hid her face in a cloud as he stole past the great oriel window, where his own arms and those of his murdered wife were blazoned in azure and gold. On and on he glided, like an evil shadow, the very darkness seeming to loathe him as he passed. Once he thought he heard something call, and stopped; but it was only the baying of a dog from the Red Farm, and he went on, muttering strange sixteenth-century curses, and ever and anon brandishing the rusty dagger in the midnight air. Finally he reached the corner of the passage that led to luckless Washington's room. For a moment he paused there, the wind blowing his long grey locks about his head, and twisting into grotesque and fantastic folds the nameless horror of the dead man's shroud. Then the clock struck the quarter, and he felt the time was come. He chuckled to himself, and turned the corner; but no sooner had he done so than, with a piteous wail of terror, he fell back, and hid his blanched face in his long, bony hands. Right in front of him was standing a horrible spectre, motionless as a carven image, and monstrous as a madman's dream! Its head was bald and burnished; its face round, and fat, and white; and hideous laughter seemed to have writhed its features into an eternal grin.
From the eyes streamed rays of scarlet light, the mouth was a wide well of fire, and a hideous garment, like to his own, swathed with its silent snows the Titan form. On its breast was a placard with strange writing in antique characters, some scroll of shame it seemed, some record of wild sins, some awful calendar of crime, and, with its right hand, it bore aloft a falchion of gleaming steel. Never having seen a ghost before, he naturally was terribly frightened, and, after a second hasty glance at the awful phantom, he fled back to his room, tripping up in his long winding-sheet as he sped down the corridor, and finally dropping the rusty dagger into the Minister's jack-boots, where it was found in the morning by the butler. Once in the privacy of his own apartment, he flung himself down on a small pallet-bed, and hid his face under the clothes. After a time, however, the brave old Canterville spirit asserted itself, and he determined to go and speak to the other ghost as soon as it was daylight. Accordingly, just as the dawn was touching the hills with silver, he returned towards the spot where he had first laid eyes on the grisly phantom, feeling that, after all, two ghosts were better than one, and that, by the aid of his new friend, he might safely grapple with the twins. On reaching the spot, however, a terrible sight met his gaze. Something had evidently happened to the spectre, for the light had entirely faded from its hollow eyes, the gleaming falchion had fallen from its hand, and it was leaning up against the wall in a strained and uncomfortable attitude. He rushed forward and seized it in his arms, when, to his horror, the head slipped off and rolled on the floor, the body assumed a recumbent posture, and he found himself clasping a white dimity bed-curtain, with a sweeping-brush, a kitchen cleaver, and a hollow turnip lying at his feet! Unable to understand this curious [...] The whole thing flashed across him. He had been tricked, foiled, and out-witted!
The old Canterville look came into his eyes; he ground his toothless gums together; and, raising his withered hands high above his head, swore according to the picturesque phraseology of the antique school, that, when Chanticleer had sounded twice his merry horn, deeds of blood would be wrought, and murder walk abroad with silent feet. Hardly had he finished this awful oath when, from the red-tiled roof of a distant homestead, a cock crew. He laughed a long, low, bitter laugh, and waited. Hour after hour he waited, but the cock, for some strange reason, did not crow again. Finally, at half-past seven, the arrival of the housemaids made him give up his fearful vigil, and he stalked back to his room, thinking of his vain oath and baffled purpose. There he consulted several books of ancient chivalry, of which he was exceedingly fond, and found that, on every occasion on which this oath had been used, Chanticleer had always crowed a second time. "Perdition seize the naughty fowl," he muttered, "I have seen the day when, with my stout spear, I would have run him through the gorge, and made him crow for me an 'twere in death!" He then retired to a comfortable lead coffin, and stayed there till evening.
Sunday, July 21, 2019
Identifying Clusters in High Dimensional Data
"Ask those who remember, are mindful if you do not know." (Holy Quran, 6:43)

Removal Of Redundant Dimensions To Find Clusters In N-Dimensional Data Using Subspace Clustering

Abstract

Data mining has emerged as a powerful tool for extracting knowledge from huge databases. Researchers have introduced several machine learning algorithms that explore databases to discover information, hidden patterns, and rules which were not known at the time the data was recorded. Owing to remarkable developments in storage capacity, processing power and algorithmic tools, practitioners are developing new and improved algorithms and techniques in several areas of data mining to discover the rules and relationships among attributes in simple and complex higher dimensional databases. Furthermore, data mining is applied in a large variety of areas, ranging from banking to marketing, from engineering to bioinformatics, and from investment to risk analysis and fraud detection. Practitioners are analyzing and implementing artificial neural network techniques for classification and regression problems because of their accuracy and efficiency. The aim of this short research project is to develop a way of identifying the clusters in high dimensional data, as well as the redundant dimensions which can create noise when identifying those clusters. The techniques used in this project use the projections of the data points along the dimensions to measure the intensity of projection along each dimension, in order to find the clusters and the redundant dimensions in high dimensional data.

1 Introduction

In numerous scientific settings, engineering processes, and business applications, ranging from experimental sensor data and process control data to telecommunication traffic observation and financial transaction monitoring, huge amounts of high-dimensional measurement data are produced and stored.
Whereas sensor equipment and large storage devices are getting cheaper day by day, data analysis tools and techniques lag behind. Clustering methods are common solutions to unsupervised learning problems where neither expert knowledge nor helpful annotation for the data is available. In general, clustering groups the data objects so that similar objects fall together in clusters, whereas objects from different clusters are highly dissimilar. However, it is often observed that clustering discloses almost no structure even when it is known that there must be groups of similar objects. In many cases, the reason is that the cluster structure is determined by only a subset of the space's dimensions, and the many additional dimensions contribute nothing other than noise that hinders the discovery of the clusters within the data. As a solution to this problem, clustering algorithms are applied to the relevant subspaces only. The new question is then how to determine the relevant subspaces among the dimensions of the full space. The candidates form the power set of the set of dimensions, so a brute force trial of all subsets is infeasible: their number grows exponentially with the original dimensionality.

In high dimensional data, as the number of dimensions increases, the visualization and representation of the data become more difficult, and the increase in dimensions can sometimes create a bottleneck. More dimensions mean more visualization or representation problems. As the dimensions increase, the data within those dimensions appears to disperse towards the corners of the space. Subspace clustering addresses both problems in parallel. It determines the relevant subspaces, so that the remaining dimensions can be marked as redundant in high dimensional data. It also finds the cluster structures within the dataset which become apparent in these subspaces.
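To make the exponential growth concrete, the following is a minimal sketch that counts and enumerates the candidate subspaces for a given number of dimensions. The function names `count_subspaces` and `enumerate_subspaces` are illustrative assumptions and do not come from the original project.

```python
from itertools import combinations


def count_subspaces(d):
    """Number of non-empty subsets of d dimensions, i.e. 2**d - 1."""
    return 2 ** d - 1


def enumerate_subspaces(dims):
    """Yield every non-empty subset of the given dimension names."""
    for k in range(1, len(dims) + 1):
        for subset in combinations(dims, k):
            yield subset


if __name__ == "__main__":
    # The count doubles (plus one) with every added dimension.
    for d in (3, 10, 20, 50):
        print(d, "dimensions ->", count_subspaces(d), "candidate subspaces")
    # Enumeration is only practical for very small d.
    print(list(enumerate_subspaces(["x", "y", "z"])))
```

Even at 20 dimensions there are more than a million candidate subspaces, which is why a search strategy other than brute force is needed.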
Subspace clustering is an extension of traditional clustering which automatically finds the clusters present in subspaces of a high dimensional data space. These subspaces allow better clustering of the data points than the original space, and the approach works even when the curse of dimensionality occurs. Most clustering algorithms have been designed to discover clusters in the full dimensional space, so they are not effective in identifying clusters that exist within a subspace of the original data space. Most clustering algorithms also produce results that depend on the order in which the input records were processed [2]. Subspace clustering can identify the different clusters within subspaces of a huge amount of sales data, and through it we can find which of the different attributes are related. This can be useful in promoting sales and in planning the inventory levels of different products. It can also be used for finding subspace clusters in spatial databases, and useful decisions can be taken based on the subspace clusters identified [2].

The technique used here for identifying the redundant dimensions, which create noise in the data and so obscure the clusters, begins by plotting the data points in all dimensions. In the second step, the projections of all data points along each dimension are plotted. In the third step, the unions of the projections are plotted for all possible combinations of the dimensions. Finally, the union of all projections along all dimensions is analyzed; it shows the contribution of each dimension to identifying the clusters, which is represented by the weight of projection. If a given dimension contributes very little to the weight of projection, that dimension can be considered redundant, which means it is not important for identifying the clusters in the given data.
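The projection idea can be illustrated with a small, simplified sketch. It is not the project's actual method: the text does not define the weight of projection precisely, so the sketch assumes a histogram-based measure (one minus the normalized entropy of the per-dimension histogram) and looks at each dimension separately rather than at unions of projections. The names `projection_weight`, `redundant_dimensions` and `make_demo_data`, the bin count, and the threshold are all hypothetical.

```python
import math
import random


def projection_weight(values, bins=10):
    """Assumed weight: 1 - normalized entropy of the projected values.

    Near 0 when the projection is spread evenly (noise-like),
    larger when the projection concentrates in a few bins.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant dimension cannot separate clusters.
        return 0.0
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c)
    return 1.0 - entropy / math.log(bins)


def redundant_dimensions(points, threshold=0.2, bins=10):
    """Return indices of dimensions whose weight is below the threshold."""
    n_dims = len(points[0])
    weights = [projection_weight([p[d] for p in points], bins)
               for d in range(n_dims)]
    return [d for d, w in enumerate(weights) if w < threshold]


def make_demo_data(seed=0, n=200):
    """Two tight clusters in dimensions 0 and 1, uniform noise in dimension 2."""
    rng = random.Random(seed)
    points = []
    for i in range(n):
        centre = 0.2 if i % 2 == 0 else 0.8
        points.append((rng.gauss(centre, 0.02),
                       rng.gauss(centre, 0.02),
                       rng.random()))
    return points


if __name__ == "__main__":
    data = make_demo_data()
    print("redundant dimensions:", redundant_dimensions(data))
```

In the synthetic data the third dimension is pure noise, so its projection is spread almost evenly and receives a weight close to zero, while the two clustered dimensions concentrate in a few bins.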
The details of this strategy will be covered in later chapters.

2 Data Mining

2.1 What is Data Mining?

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. The information can be used for many purposes, such as increasing revenue and cutting costs. The data mining process also finds hidden knowledge and relationships within the data which were not known when the data was recorded. Describing the data is the first step in data mining, followed by summarizing its attributes (such as the mean and standard deviation). After that, the data is reviewed using visual tools such as charts and graphs, and then meaningful relations are determined. In the data mining process, the steps of collecting, exploring and selecting the right data are critically important. Users can analyze data from different dimensions, categorize it and summarize it. Data mining finds correlations or patterns among the fields in large databases.

Data mining has great potential to help companies focus on the important information in their data warehouses. It can predict future trends and behaviors and allows businesses to make more proactive, knowledge-driven decisions. It can answer business questions that were traditionally very time consuming to resolve. It scours databases for hidden patterns, finding predictive information that experts may miss because it lies beyond their expectations. Data mining is normally used to transform data into information or knowledge. It is commonly used in a wide range of profiling practices such as marketing, fraud detection and scientific discovery. Many companies already collect and refine their data. Data mining techniques can be implemented on existing platforms to enhance the value of information resources. Data mining tools can analyze massive databases to deliver answers to such questions.
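The first steps described above, describing the data and summarizing its attributes, can be sketched with the Python standard library. The `describe` helper and the sample figures are illustrative assumptions, not data from the project.

```python
import statistics


def describe(values):
    """Summarize one numeric attribute with basic descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Hypothetical monthly sales figures for a single product.
    monthly_sales = [120, 135, 128, 160, 142, 155]
    for name, value in describe(monthly_sales).items():
        print(name, value)
```

A summary like this is usually the starting point before charts are drawn or relationships between attributes are examined.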
Some other terms carry a meaning similar to data mining, such as "knowledge mining", "knowledge extraction" or "pattern analysis". Data mining can also be treated as Knowledge Discovery from Data (KDD). Some people regard data mining simply as an essential step in knowledge discovery from large data. The process of knowledge discovery from data contains the following steps.

* Data cleaning (removing noise and inconsistent data)
* Data integration (combining multiple data sources)
* Data selection (retrieving the data relevant to the analysis task from the database)
* Data transformation (transforming the data into forms appropriate for mining by performing summary or aggregation operations)
* Data mining (applying intelligent methods in order to extract data patterns)
* Pattern evaluation (identifying the truly interesting patterns representing knowledge, based on some measures)
* Knowledge representation (using representation techniques to present the mined knowledge to the user)

2.2 Data

Data can be any type of fact, text, image or number which can be processed by a computer. Today's organizations are accumulating large and growing amounts of data in different formats and in different databases. It can include operational or transactional data, such as costs, sales, inventory, payroll and accounting. It can also include nonoperational data, such as industry sales and forecast data. It can also include metadata, which is data about the data itself, such as logical database design and data dictionary definitions.

2.3 Information

Information can be retrieved from the data via the patterns, associations or relationships that may exist in it. For example, retail point-of-sale transaction data can be analyzed to yield information about which products are being sold and when.

2.4 Knowledge

Knowledge can be retrieved from information via historical patterns and future trends.
For example, analysis of retail supermarket sales data from the point of view of promotional efforts can provide knowledge of customer buying behavior. Hence a manufacturer can easily determine which items are most susceptible to promotional efforts.

2.5 Data warehouse

Advances in data capture, processing power, data transmission and storage technologies are enabling industry to integrate its various databases into data warehouses. The process of centralizing and retrieving the data is called data warehousing. Data warehousing is a new term, but the concept is fairly old. A data warehouse is a store of massive amounts of data in electronic form. Data warehousing represents an ideal way of maintaining a central repository for all organizational data. The purpose of a data warehouse is to maximize user access and analysis. The data from different data sources is extracted, transformed and then loaded into the data warehouse. Users and clients can generate different types of reports and carry out business analysis by accessing the data warehouse.

Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It allows these organizations to evaluate associations between certain internal and external factors. Product positioning, price and staff skills are examples of internal factors. Examples of external factors are economic indicators, customer demographics and competition. It also allows them to calculate the impact on sales, corporate profits and customer satisfaction. Furthermore, it allows them to drill down from summary information to detailed transactional data. Given databases of sufficient size and quality, data mining technology can generate new business opportunities. Data mining automates the procedure of searching for predictive information in huge databases.
Questions that traditionally required extensive hands-on analysis can now be answered directly from the data very quickly. Targeted marketing is an example of a predictive problem: data mining uses data on previous promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Data mining tools also traverse huge databases and discover previously unseen patterns in a single step. An example is the analysis of retail sales data to identify apparently unrelated products that are usually purchased together. Other pattern discovery problems include identifying fraudulent credit card transactions and identifying irregular data that could indicate data entry errors. When data mining tools are run on high-performance parallel processing systems, they can analyze huge databases in very little time. Faster processing means that users can experiment with more details to understand complex data. High speed and quick response make it practical for users to examine huge amounts of data. Larger databases, in turn, give better predictions.

2.6 Descriptive and Predictive Data Mining
Descriptive data mining aims to find patterns in the data that provide some information about what the data contains. It describes patterns in existing data and is generally used to create meaningful subgroups such as demographic clusters. Descriptions take the form of, for example, summaries and visualization, clustering and link analysis.

Predictive data mining is used to forecast explicit values, based on patterns determined from known results. For example, from a database of clients who have already responded to a specific offer, a model can be built that predicts which prospects are most likely to respond to the same offer.
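As a rough illustration of the offer-response example above, the sketch below estimates a response rate per customer segment from past mailings and ranks new prospects by it. The record layout (age band, region, responded flag), the sample data and the function names are hypothetical and do not come from the original text; a real project would more likely use a statistical or neural network model than this simple rate table.

```python
from collections import defaultdict

# Hypothetical past mailing records: (age_band, region, responded)
history = [
    ("18-30", "north", True),
    ("18-30", "north", False),
    ("18-30", "south", False),
    ("31-50", "north", True),
    ("31-50", "north", True),
    ("31-50", "south", False),
]


def fit_response_rates(records):
    """Return {segment: fraction of past customers who responded}."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, total]
    for age_band, region, responded in records:
        segment = (age_band, region)
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    return {seg: hits / total for seg, (hits, total) in counts.items()}


def rank_prospects(prospects, rates, default=0.0):
    """Sort prospects by the estimated response rate of their segment."""
    return sorted(prospects, key=lambda p: rates.get(p, default), reverse=True)


rates = fit_response_rates(history)
prospects = [("18-30", "south"), ("31-50", "north"), ("18-30", "north")]
ranked = rank_prospects(prospects, rates)
print(ranked)
```

The prospects at the front of `ranked` are the ones a future mailing would target first under this simplified model.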
Predictive data mining usually refers to projects whose goal is to identify a statistical or neural network model, or set of models, that can be used to predict some response of interest. For example, a credit card company may want to engage in predictive data mining to derive a (trained) model or set of models that can quickly identify transactions which have a high probability of being fraudulent. Other types of data mining projects may be more exploratory in nature (e.g. determining clusters or segments of customers), in which case drill-down, descriptive and exploratory methods need to be applied. Predictive data mining is goal-oriented. It can be decomposed into the following major tasks.
* Data Preparation
* Data Reduction
* Data Modeling and Prediction
* Case and Solution Analysis

2.7 Text Mining
Text mining is sometimes also called text data mining, which is more or less equivalent to text analytics. Text mining is the process of deriving high-quality information from text. High-quality information is typically obtained by deriving patterns and trends through means such as statistical pattern learning. It usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. "High quality" in text mining usually refers to some combination of relevance, novelty and interestingness. Text mining tasks include text categorization, concept/entity extraction, text clustering, sentiment analysis, production of rough taxonomies, entity relation modeling and document summarization. Text mining is also described as the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources.
Linking the extracted information together is the key element in creating new facts or new hypotheses, which can then be examined further by more conventional means of experimentation. In text mining, the goal is to discover unknown information, something that no one yet knows and so could not yet have written down. The difference between ordinary data mining and text mining is that in text mining the patterns are retrieved from natural language text instead of from structured databases of facts. Databases are designed for programs to process automatically; text is written for people to read. Most researchers think that a full-fledged simulation of how the brain works will be needed before programs that read the way people do can be written.

2.8 Web Mining
Web mining is the technique used to automatically extract and discover information from web documents and services. The interest of various research communities, the tremendous growth of information resources on the Web and the recent interest in e-commerce have made this a very large area of research. Web mining can usually be decomposed into the following subtasks.
* Resource finding: fetching the intended web documents.
* Information selection and pre-processing: automatically selecting and pre-processing specific information from the fetched web resources.
* Generalization: automatically discovering general patterns at individual websites and across multiple websites.
* Analysis: validation and explanation of the mined patterns.

Web mining can be categorized into three main areas of interest, based on which part of the Web is to be mined: Web Content Mining, Web Structure Mining and Web Usage Mining.

Web Content Mining describes the discovery of useful information from web contents, data and documents [10]. In the past, the internet consisted only of different types of services and data resources. Today most data is available over the internet; even digital libraries are available on the Web.
Web contents consist of several types of data, including text, images, audio, video and metadata as well as hyperlinks. Most companies are trying to transform their business and services into electronic form and put them on the Web. As a result, company databases which previously resided on legacy systems are now accessible over the Web, so employees, business partners and even end clients are able to access the company's databases over the Web. Users access applications through web interfaces, which is one reason most companies are moving their business to the Web: the internet can connect to any other computer anywhere in the world [11]. Some web contents are hidden and hence cannot be indexed. Data generated dynamically from the results of database queries, and private data, fall into this area. Web content includes unstructured data such as free text, semi-structured data such as HTML, and fully structured data such as tables or database-generated web pages. However, unstructured text makes up most of the web contents. Work on web content mining is mostly done from two points of view: the information retrieval (IR) view and the database (DB) view. "From IR view, web content mining assists and improves the information finding or filtering to the user. From DB view web content mining models the data on the web and integrates them so that the more sophisticated queries other than keywords could be performed." [10].

In Web Structure Mining, we are more concerned with the structure of hyperlinks within the Web itself, which can be called the inter-document structure [10]. It is closely related to web usage mining [14]. Pattern detection and graph mining are essentially related to web structure mining. Link analysis techniques can be used to determine the patterns in the graph.
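Link analysis can start from something as simple as counting the hyperlinks that point to each page. The sketch below does this on a small, hypothetical link graph (the page names and the dictionary layout are assumptions made for illustration); it is a plain in-link count, not the ranking algorithm of any particular search engine.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}


def in_link_counts(graph):
    """Count how many hyperlinks point to each page."""
    counts = Counter({page: 0 for page in graph})
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def pages_pointing_to(graph, page):
    """List the pages that contain a hyperlink to the given page."""
    return sorted(src for src, targets in graph.items() if page in targets)


counts = in_link_counts(links)
print(counts.most_common(1))               # page with the most in-links
print(pages_pointing_to(links, "c.html"))  # its referring pages
```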
Search engines such as Google make use of web structure mining. For example, the links are mined so that one can determine which web pages point to a particular web page. When a string is searched, the web page with the largest number of links pointing to it may come first in the list. This is why web pages are listed by rank, where the rank of a page is calculated from the ranks of the web pages pointing to it [14]. Based on the structural data used, web structure mining can be divided into two categories. The first kind deals with extracting patterns from the hyperlinks in the Web. A hyperlink is a structural component that connects a web page to a different web page or to a different location. The other kind deals with the document structure, using the tree-like structure to analyze and describe the HTML or XML tags within web pages.

With the continuous growth of e-commerce, web services and web applications, the volume of clickstream and user data collected by web-based organizations in their daily operations has increased. Organizations can analyze such data to determine the lifetime value of clients, design cross-marketing strategies and so on [13]. Web Usage Mining deals with the data generated by users' clickstreams. "The web usage data includes web server access logs, proxy server logs, browser logs, user profile, registration data, user sessions, transactions, cookies, user queries, bookmark data, mouse clicks and scrolls and any other data as a result of interaction" [10]. Web usage mining is therefore regarded as the most important task of web mining [12]. Weblog databases can provide rich information about web dynamics. In web usage mining, web log records are mined to discover user access patterns, through which potential customers can be identified, the quality of internet services can be enhanced and web server performance can be improved.
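A minimal sketch of mining access patterns from web log records is given here. It assumes the log has already been parsed into hypothetical `(client_ip, unix_time, path)` tuples and uses an assumed 30-minute inactivity timeout to split sessions; neither the record layout nor the timeout comes from the original text.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed log records: (client_ip, unix_time, path)
log = [
    ("10.0.0.1", 0, "/home"),
    ("10.0.0.1", 60, "/logo.png"),
    ("10.0.0.1", 120, "/products"),
    ("10.0.0.2", 200, "/home"),
    ("10.0.0.1", 4000, "/home"),
    ("10.0.0.1", 4100, "/cart"),
]

SESSION_GAP = 1800  # assumed 30-minute inactivity timeout, in seconds


def clean(records):
    """Drop requests for static assets, which say little about behaviour."""
    skip = (".png", ".jpg", ".css", ".js")
    return [r for r in records if not r[2].endswith(skip)]


def sessionize(records, gap=SESSION_GAP):
    """Group each client's requests into sessions split by inactivity."""
    by_client = defaultdict(list)
    for ip, ts, path in sorted(records, key=lambda r: (r[0], r[1])):
        by_client[ip].append((ts, path))
    sessions = []
    for ip, hits in by_client.items():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > gap:  # long pause: close the current session
                sessions.append((ip, current))
                current = []
            current.append(path)
        sessions.append((ip, current))
    return sessions


sessions = sessionize(clean(log))
page_counts = Counter(path for _, pages in sessions for path in pages)
print(sessions)
print(page_counts.most_common())
```

The per-session page lists are the kind of user transactions on which association or sequential pattern discovery could then be run.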
Many techniques can be developed to implement web usage mining, but it is important to recognize that the success of such applications depends on what, and how much, valid and reliable knowledge can be discovered from the log data. Most often, the web logs are cleaned, condensed and transformed before any useful and significant information is extracted from them. Web mining can be performed on web log records to find association patterns, sequential patterns and trends in web access. The overall web usage mining process can be divided into three inter-dependent stages: data collection and pre-processing, pattern discovery, and pattern analysis [13]. In the data collection and pre-processing stage, the raw data is collected, cleaned and transformed into a set of user transactions which represent the activities of each user during visits to the website. In the pattern discovery stage, statistical, database and machine learning operations are performed to retrieve hidden patterns representing the typical behavior of users, as well as summary statistics on web resources, sessions and users.

3 Classification

3.1 What is Classification?
As the quantity and variety of available data increase, robust, efficient and versatile data categorization techniques are needed for exploration [16]. Classification is a method of assigning class labels to patterns. It is a data mining methodology used to predict group membership for data instances. For example, one may want to use classification to predict whether the weather on a specific day will be "sunny", "cloudy" or "rainy". Data mining techniques that group similar data objects and separate them from dissimilar ones are called clustering. Classification, by contrast, uses the attribute values found in the data of one class to distinguish it from other classes. Data classification is mainly concerned with the treatment of large datasets.
In classification we build a model by analyzing the existing data and describing the characteristics of the various classes of data. We can then use this model to predict the class of new data. Classification is a supervised machine learning procedure in which individual items are placed into groups based on quantitative information on one or more of their characteristics. Decision trees and Bayesian networks are examples of classification methods. Clustering can be seen as the unsupervised form of classification: it is the process of finding similar data objects within a given dataset. The similarity can be defined in terms of distance measures or any other parameter, depending on the need and the given data.

Classification is an ancient term as well as a modern one, since the classification of animals, plants and other physical objects is still valid today. Classification is a way of thinking about things rather than a study of the things themselves, so it draws its theory and application from the complete range of human experience and thought [18]. In a broader sense, classification can cover medical patients grouped by disease, a set of images containing a red rose from an image database, a set of documents describing "classification" from a document database, equipment malfunctions grouped by cause, and loan applicants grouped by their likelihood of repayment. In the latter case, for example, the problem is to predict a new applicant's eligibility for a loan given old data about customers. Many techniques are used for data classification. The most common are decision tree classifiers and Bayesian classifiers.

3.2 Types of Classification
There are two types of classification: supervised classification and unsupervised classification. Supervised learning is a machine learning technique for discovering a function from training data. The training data consists of pairs of input objects and their desired outputs.
The output of the function can be a continuous value, in which case the task is called regression, or a class label for the input object, in which case it is called classification. The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e. pairs of input and target output). To achieve this goal, the learner needs to generalize from the presented data to unseen situations in a meaningful way.

Unsupervised learning is a class of machine learning problems in which the aim is to determine how the data are organized. It is distinguished from supervised learning in that the learner is given only unlabeled examples. Unsupervised learning is closely related to the problem of density estimation in statistics. However, unsupervised learning also covers many other techniques that are used to summarize and explain key features of the data. One form of unsupervised learning is clustering, which will be covered in the next chapter. Blind source separation based on Independent Component Analysis is another example. Neural network models, adaptive resonance theory and self-organizing maps are among the most commonly used unsupervised learning algorithms.

There are many techniques for implementing supervised classification. We will discuss the two most commonly used: decision tree classifiers and Naïve Bayesian classifiers.

3.2.1 Decision Trees Classifier
There are many alternative ways to represent classifiers. The decision tree is probably the most widely used approach for this purpose, and it is one of the most widely used supervised learning methods for data exploration. It is easy to use, can be represented as if-then-else rules and works well even with noisy data [16]. A decision tree uses a tree-like graph or model of decisions and their possible consequences, including resource costs, chance event outcomes and utilities.
Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a target. In machine learning and data mining, a decision tree is used as a predictive model, that is, a mapping from observations about an item to conclusions about its target value. More descriptive names for such tree models are classification tree or regression tree. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. The machine learning technique for inducing a decision tree from data is called decision tree learning or, simply, decision trees. Decision trees are a simple but powerful form of multiple variable analysis [15]. Classification is done by tree-like structures that have a different test criterion for a variable at each node. New leaves are generated based on the results of the tests at the nodes. A decision tree is a supervised learning system in which classification rules are constructed from the tree. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. A decision tree tries to find a strong relationship between input and target values within the dataset [15].

In classification tasks, decision trees show which steps should be taken to reach a classification. Every decision tree starts with a node called the root node, which is considered to be the parent of every other node. Each node in the tree tests an attribute in the data and decides which path should be followed. Typically the decision test is a comparison of a value against some constant. Classification with a decision tree is done by traversing from the root node down to a leaf node.

Decision trees are able to represent and classify diverse types of data. The simplest form of data is numerical data, which is also the most familiar.
Organizing nominal data is also required in many situations. Nominal quantities are normally represented by a discrete set of symbols. For example, a weather condition can be described in either a nominal or a numeric fashion. Temperature can be quantified by saying that it is eleven degrees Celsius or fifty-two degrees Fahrenheit. The terms cold, cool, mild, warm and hot can also be used. The former is numeric data, while the latter is an example of nominal data. More precisely, cold, cool, mild, warm and hot form a special type of nominal data called ordinal data. Ordinal data carries an implicit assumption of an ordered relationship among the values. In the weather example, purely nominal descriptions such as rainy, overcast and sunny can also be added. These values have no ordering or distance measure among each other.

In a decision tree, each node is a question, each branch is an answer to that question, and each leaf is a result. Here is an example of a decision tree. Roughly, the idea is that we have to make different decisions depending on the number of items in stock. If we do not have much stock, we buy at any cost. If we have a lot of items, we buy only if the price is low. If there are fewer than 10 items in stock, then buy all if the unit price is less than 10, otherwise buy only 10 items. If there are 10 to 40 items in stock, then check the unit price: if it is less than £5, buy only 5 items, otherwise there is no need to buy anything expensive since the stock is already good. If there are more than 40 items in stock, then buy 5 if and only if the price is less than £2, otherwise there is no need to buy items that are too expensive. In this way decision trees help us to make a decision at each level.

Here is another example of a decision tree, representing the risk associated with rash driving.
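Before moving on to the driving-risk example, the stock-purchasing rules above can be written as nested conditionals, which shows how a decision tree maps onto if-then-else rules. This is only a sketch: the function name and the returned labels are hypothetical, and the boundaries follow the wording of the text (fewer than 10, 10 to 40, more than 40 items).

```python
def purchase_decision(stock: int, unit_price: float) -> str:
    """Walk the stock decision tree from the root to a leaf."""
    if stock < 10:
        # Low stock: buy regardless of cost, but limit the quantity if pricey
        return "buy all" if unit_price < 10 else "buy 10"
    if stock <= 40:
        # Moderate stock: buy a few items only if the price is low
        return "buy 5" if unit_price < 5 else "buy nothing"
    # Plenty in stock: buy only at a very low price
    return "buy 5" if unit_price < 2 else "buy nothing"


for stock, price in [(3, 12.0), (20, 4.0), (50, 3.0)]:
    print(stock, price, purchase_decision(stock, price))
```

Each `if` corresponds to an internal node (a question about stock or price) and each returned label corresponds to a leaf.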
The root node at the top of the tree structure shows the feature that is split first for highest discrimination. The internal nodes show decision rules on one or more attributes, while the leaf nodes are class labels. A person aged less than 20 has a very high risk, while a person aged more than 30 has a very low risk. For the middle category, a person aged more than 20 but less than 30, the risk depends on another attribute, the car type. If the car is a sports car then the risk is again high, while if a family car is used then the risk is low.

In science and engineering, and in applied areas including business intelligence and data mining, many useful features have been introduced as a result of the evolution of decision trees.
* With the help of transformation in decision trees, the volume of data can be reduced into more compact form that preserves the major characteristic

Identifying Clusters in High Dimensional Data

"Ask those who remember (are mindful) if you do not know." (Holy Quran, 6:43)

Removal Of Redundant Dimensions To Find Clusters In N-Dimensional Data Using Subspace Clustering

Abstract
Data mining has emerged as a powerful tool for extracting knowledge from huge databases. Researchers have introduced several machine learning algorithms that explore databases to discover information, hidden patterns and rules which were not known at the time the data was recorded. Owing to remarkable developments in storage capacity, processing power and algorithmic tools, practitioners are developing new and improved algorithms and techniques in several areas of data mining to discover the rules and relationships among the attributes of simple and complex higher dimensional databases.
Furthermore, data mining is applied in a large variety of areas, ranging from banking to marketing, from engineering to bioinformatics, and from investment to risk analysis and fraud detection. Practitioners are analyzing and implementing artificial neural network techniques for classification and regression problems because of their accuracy and efficiency. The aim of this short research project is to develop a way of identifying the clusters in high dimensional data, as well as the redundant dimensions which create noise when identifying those clusters. The technique used in this project utilizes the strength of the projections of the data points along the dimensions, identifying the intensity of projection along each dimension in order to find clusters and redundant dimensions in high dimensional data.

1 Introduction
In numerous scientific settings, engineering processes and business applications, ranging from experimental sensor data and process control data to telecommunication traffic observation and financial transaction monitoring, huge amounts of high-dimensional measurement data are produced and stored. While sensor equipment and large storage devices are getting cheaper day by day, data analysis tools and techniques lag behind. Clustering methods are common solutions to unsupervised learning problems, where neither expert knowledge nor helpful annotation of the data is available. In general, clustering groups the data objects in such a way that similar objects are placed together in clusters, whereas objects from different clusters are highly dissimilar. However, it is often observed that clustering discloses almost no structure even when it is known that there must be groups of similar objects.
In many cases, the reason is that the cluster structure is induced by only some subsets of the space's dimensions, and the many additional dimensions contribute nothing other than noise that hinders the discovery of the clusters within the data. As a solution to this problem, clustering algorithms are applied to the relevant subspaces only. The new question is then how to determine the relevant subspaces among the dimensions of the full space. Since the candidates form the power set of the set of dimensions, a brute-force trial of all subsets is infeasible, because their number is exponential in the original dimensionality.

In high dimensional data, as the number of dimensions increases, the visualization and representation of the data become more difficult, and sometimes the increase in dimensions creates a bottleneck. More dimensions mean more visualization and representation problems. As the dimensions increase, the data appear to disperse towards the corners of the space. Subspace clustering addresses both problems in parallel. It identifies the relevant subspaces, so that the remaining dimensions can be marked as redundant in high dimensional data, and it finds the cluster structures within the dataset which become apparent in these subspaces. Subspace clustering is an extension of traditional clustering which automatically finds the clusters present in subspaces of a high dimensional data space, allowing the data points to be clustered better than in the original space, and it works even when the curse of dimensionality occurs. Most clustering algorithms have been designed to discover clusters in the full dimensional space, so they are not effective in identifying clusters that exist within a subspace of the original data space. In addition, most clustering algorithms produce results that depend on the order in which the input records were processed [2].
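To make the earlier point about the exponential number of candidate subspaces concrete, the following sketch enumerates every non-empty subset of a small set of dimensions and counts the subsets for larger dimensionalities. The dimension names and function names are hypothetical; the count itself, 2^d - 1 for d dimensions, follows directly from the size of the power set.

```python
from itertools import combinations


def candidate_subspaces(dimensions):
    """Yield every non-empty subset of the given dimensions."""
    for size in range(1, len(dimensions) + 1):
        yield from combinations(dimensions, size)


def subspace_count(d):
    """Number of non-empty subsets of d dimensions."""
    return 2 ** d - 1


dims = ["x", "y", "z"]
subspaces = list(candidate_subspaces(dims))
print(subspaces)

# The count grows exponentially with the number of dimensions
for d in (3, 10, 20, 50):
    print(d, subspace_count(d))
```

Three dimensions give only seven candidate subspaces, but twenty dimensions already give more than a million, which is why a brute-force search over all subspaces does not scale.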
Subspace clustering can identify the different clusters within the subspaces that exist in a huge amount of sales data, and through it we can find which of the different attributes are related. This can be useful in promoting sales and in planning the inventory levels of different products. It can also be used for finding subspace clusters in spatial databases, and useful decisions can be taken based on the subspace clusters identified [2].

The technique used here for identifying the redundant dimensions, which create noise in the data when identifying the clusters, starts by plotting the data points in all dimensions. In the second step, the projections of all data points along each dimension are plotted. In the third step, the unions of the projections along each dimension are plotted using all possible combinations of the dimensions. Finally, the union of all projections along all dimensions is plotted and analyzed; it shows the contribution of each dimension to identifying the clusters, which is represented by the weight of projection. If any given dimension contributes very little to the weight of projection, that dimension can be considered redundant, which means that it is not important for identifying the clusters in the given data. The details of this strategy will be covered in later chapters.

2 Data Mining

2.1 What is Data Mining?
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. The information can be used for many purposes, such as increasing revenue or cutting costs. The data mining process also finds hidden knowledge and relationships within the data which were not known when the data was recorded. Describing the data is the first step in data mining, followed by summarizing its attributes (such as the mean and standard deviation).
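As a small illustration of this summarizing step, the sketch below computes a few descriptive statistics for one attribute using Python's standard `statistics` module. The `daily_sales` values are made up for the example and are not taken from any dataset in the original text.

```python
import statistics

# Hypothetical attribute values, e.g. daily sales of one product
daily_sales = [12, 15, 11, 14, 18, 20, 15]

summary = {
    "count": len(daily_sales),
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "stdev": statistics.stdev(daily_sales),  # sample standard deviation
    "min": min(daily_sales),
    "max": max(daily_sales),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```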
After that, the data is reviewed using visual tools such as charts and graphs, and then meaningful relationships are determined. In the data mining process, the steps of collecting, exploring and selecting the right data are critically important. Users can analyze data from different dimensions, then categorize and summarize it. Data mining finds correlations or patterns among the fields in large databases.

Data mining has great potential to help companies focus on the most important information in their data warehouses. It can predict future trends and behaviors and allows businesses to make more proactive, knowledge-driven decisions. It can answer business questions that were traditionally too time-consuming to resolve. It scours databases for hidden patterns, finding predictive information that experts may miss because it lies beyond their expectations. Data mining is normally used to transform data into information or knowledge. It is commonly used in a wide range of profiling practices such as marketing, fraud detection and scientific discovery. Many companies already collect and refine their data. Data mining techniques can be implemented on existing platforms to enhance the value of information resources. Data mining tools can analyze massive databases to deliver answers to such questions.

Several other terms carry a meaning similar to data mining, such as "knowledge mining", "knowledge extraction" and "pattern analysis". Data mining can also be treated as Knowledge Discovery from Data (KDD), although some people regard data mining as only one essential step in knowledge discovery from large data sets. Knowledge discovery from data proceeds through a sequence of steps, from data cleaning through to knowledge representation.
* Data cleaning (removing the noise and inconsistent data) * Data Integration (combining multiple data sources) * Data selection (retrieving the data relevant to analysis task from database) * Data Transformation (transforming the data into appropriate forms for mining by performing summary or aggregation operations) * Data mining (applying the intelligent methods in order to extract data patterns) * Pattern evaluation (identifying the truly interesting patterns representing knowledge based on some measures) * Knowledge representation (representing knowledge techniques that are used to present the mined knowledge to the user) 2.2 Data Data can be any type of facts, or text, or image or number which can be processed by computer. Todays organizations are accumulating large and growing amounts of data in different formats and in different databases. It can include operational or transactional data which includes costs, sales, inventory, payroll and accounting. It can also include nonoperational data such as industry sales and forecast data. It can also include the meta data which is, data about the data itself, such as logical database design and data dictionary definitions. 2.3 Information The information can be retrieved from the data via patterns, associations or relationship may exist in the data. For example the retail point of sale transaction data can be analyzed to yield information about the products which are being sold and when. 2.4 Knowledge Knowledge can be retrieved from information via historical patterns and the future trends. For example the analysis on retail supermarket sales data in promotional efforts point of view can provide the knowledge buying behavior of customer. Hence items which are at most risk for promotional efforts can be determined by manufacturer easily. 
2.5 Data warehouse The advancement in data capture, processing power, data transmission and storage technologies are enabling the industry to integrate their various databases into data warehouse. The process of centralizing and retrieving the data is called data warehousing. Data warehousing is new term but concept is a bit old. Data warehouse is storage of massive amount of data in electronic form. Data warehousing is used to represent an ideal way of maintaining a central repository for all organizational data. Purpose of data warehouse is to maximize the user access and analysis. The data from different data sources are extracted, transformed and then loaded into data warehouse. Users / clients can generate different types of reports and can do business analysis by accessing the data warehouse. Data mining is primarily used today by companies with a strong consumer focus retail, financial, communication, and marketing organizations. It allows these organizations to evaluate associations between certain internal external factors. The product positioning, price or staff skills can be example of internal factors. The external factor examples can be economic indicators, customer demographics and competition. It also allows them to calculate the impact on sales, corporate profits and customer satisfaction. Furthermore it allows them to summarize the information to look detailed transactional data. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by its capabilities. Data mining usually automates the procedure of searching predictive information in huge databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data very quickly. The targeted marketing can be an example of predictive problem. 
Data mining uses data on previous promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Data mining tools sweep through huge databases and discover previously unseen patterns in a single step. An example is the analysis of retail sales data to identify apparently unrelated products that are often purchased together. Other pattern discovery problems include identifying fraudulent credit card transactions and identifying irregular data that could indicate data entry errors. When data mining tools are run on high-performance parallel processing systems, they can analyze huge databases in very little time. Faster processing means that users can experiment in more detail to understand complex data. High speed and quick response make it practical for users to examine huge amounts of data. Larger databases, in turn, give better predictions.

2.6 Descriptive and Predictive Data Mining
Descriptive data mining aims to find patterns in the data that provide some information about what the data contains. It describes patterns in existing data and is generally used to create meaningful subgroups such as demographic clusters. Examples of descriptions are summaries and visualization, clustering, and link analysis.

Predictive data mining is used to forecast explicit values, based on patterns determined from known results. For example, from a database of clients who have already responded to a specific offer, a model can be built that predicts which prospects are most likely to respond to the same offer. The term usually refers to data mining projects whose goal is to identify a statistical or neural network model, or set of models, that can be used to predict some response of interest.
For example, a credit card company may want to engage in predictive data mining to derive a (trained) model, or set of models, that can quickly identify transactions which have a high probability of being fraudulent. Other types of data mining projects may be more exploratory in nature (e.g. determining the clusters or divisions of customers), in which case drill-down, descriptive and exploratory methods need to be applied. Predictive data mining is goal oriented. It can be decomposed into the following major tasks.
* Data Preparation
* Data Reduction
* Data Modeling and Prediction
* Case and Solution Analysis

2.7 Text Mining
Text mining is sometimes also called text data mining, which is roughly equivalent to text analytics. Text mining is the process of deriving high-quality information from text. High-quality information is typically obtained by deriving patterns and trends through means such as statistical pattern learning. It usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. "High quality" in text mining usually refers to some combination of relevance, novelty, and interestingness. Text mining tasks include text categorization, concept/entity extraction, text clustering, sentiment analysis, production of rough taxonomies, entity relation modeling and document summarization.

Text mining is also described as the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. Linking the extracted information together is the key step in forming new facts or new hypotheses, to be examined further by more conventional means of experimentation.
In text mining, the goal is to discover unknown information, something that no one yet knows and so could not yet have written down. The difference between ordinary data mining and text mining is that in text mining the patterns are retrieved from natural language text instead of from structured databases of facts. Databases are designed for programs to process automatically; text is written for people to read. Most researchers think that a full-fledged simulation of how the brain works will be needed before programs that read the way people do can be written.

2.8 Web Mining
Web mining is the technique used to automatically extract and discover information from web documents and services. The interest of various research communities, the tremendous growth of information resources on the Web and the recent interest in e-commerce have made this a very large area of research. Web mining can usually be decomposed into the following subtasks.
* Resource finding: fetching the intended web documents.
* Information selection and pre-processing: automatically selecting and preprocessing specific information from the fetched web resources.
* Generalization: automatically discovering general patterns at individual websites and across multiple websites.
* Analysis: validation and explanation of the mined patterns.

Web mining can be divided into three areas of interest based on which part of the Web is mined: Web Content Mining, Web Structure Mining and Web Usage Mining. Web content mining describes the discovery of useful information from web contents, data and documents [10]. In the past the internet consisted only of different types of services and data resources. Today most data is available over the internet; even digital libraries are available on the Web. Web contents consist of several types of data including text, images, audio, video and metadata, as well as hyperlinks.
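As a rough illustration of the information selection step for web content, the sketch below pulls the visible text and the hyperlinks out of a page. It uses Python's standard `html.parser` module; the `ContentExtractor` class, the `extract_content` helper and the sample page are hypothetical names and data chosen for this example, and a real pipeline would also need resource finding (fetching) and handling of scripts, styles and malformed markup.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects text fragments and hyperlink targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Hyperlinks are carried by the href attribute of <a> tags.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only non-empty text between tags.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def extract_content(html):
    """Return (text, links) for one HTML document."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


# Made-up page used only to exercise the sketch.
SAMPLE_PAGE = (
    "<html><body><h1>Catalog</h1>"
    '<p>See <a href="/books">books</a> and <a href="/music">music</a>.</p>'
    "</body></html>"
)

text, links = extract_content(SAMPLE_PAGE)
print(links)
```

The extracted text would feed content mining, while the list of links is the raw material for the structure mining discussed later in this section.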
Most companies are trying to move their business and services into electronic form and put them on the Web. As a result, company databases which previously resided on legacy systems are now accessible over the Web. Employees, business partners and even end clients are thus able to access a company's databases over the Web. Users access applications through their web interfaces, which is why most companies are trying to move their business onto the Web: the internet is capable of connecting to any other computer anywhere in the world [11]. Some web contents are hidden and hence cannot be indexed. Data generated dynamically from the results of database queries, and private data, fall into this area. Unstructured data such as free text, semi-structured data such as HTML, and fully structured data such as tables or database-generated web pages can all be considered in this category. However, most web content is unstructured text. Work on web content mining is mostly done from two points of view: the information retrieval (IR) view and the database (DB) view. "From IR view, web content mining assists and improves the information finding or filtering to the user. From DB view web content mining models the data on the web and integrates them so that the more sophisticated queries other than keywords could be performed." [10].

In Web Structure Mining, we are more concerned with the structure of hyperlinks within the Web itself, which can be called the inter-document structure [10]. It is closely related to web usage mining [14]. Pattern detection and graph mining are essentially related to web structure mining. Link analysis techniques can be used to determine the patterns in the graph. Search engines like Google typically use web structure mining.
For example, the links are mined and one can then determine the web pages that point to a particular web page. When a string is searched, the web page with the largest number of links pointing to it may come first in the list. That is why web pages are listed by rank, which is calculated from the rank of the web pages that point to them [14]. Based on the kind of structural data used, web structure mining can be divided into two categories. The first kind deals with extracting patterns from the hyperlinks in the Web. A hyperlink is a structural component that connects a web page to a different web page or a different location. The other kind deals with the document structure, using the tree-like structure to analyze and describe the HTML or XML tags within web pages.

With the continuous growth of e-commerce, web services and web applications, the volume of clickstream and user data collected by web-based organizations in their daily operations has increased. Organizations can analyze such data to determine the lifetime value of clients, design cross-marketing strategies, etc. [13]. Web usage mining deals with the data generated by users' clickstreams. "The web usage data includes web server access logs, proxy server logs, browser logs, user profile, registration data, user sessions, transactions, cookies, user queries, bookmark data, mouse clicks and scrolls and any other data as a result of interaction" [10]. Web usage mining is therefore regarded as the most important task of web mining [12]. Weblog databases can provide rich information about web dynamics. In web usage mining, web log records are mined to discover user access patterns, through which potential customers can be identified, the quality of internet services can be enhanced and web server performance can be improved.
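The mining of web log records described above can be sketched in a few lines. The example assumes access logs in the Common Log Format; the sample lines, the regular expression, and the choice to treat each client host as one user are all assumptions made for illustration, not something the cited sources prescribe.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Made-up log lines used only to exercise the sketch.
SAMPLE_LOG = [
    '10.0.0.1 - - [31/Jul/2019:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [31/Jul/2019:10:00:05 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.2 - - [31/Jul/2019:10:01:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [31/Jul/2019:10:01:09 +0000] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [31/Jul/2019:10:02:00 +0000] "GET /checkout HTTP/1.1" 200 1024',
    'garbled line that does not parse',
]


def visits_by_host(lines):
    """Cleaning step: keep successful requests, grouped per client host."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # drop lines that do not follow the assumed format
        if match["status"] != "200":
            continue  # drop failed requests
        visits[match["host"]].append(match["path"])
    return dict(visits)


def page_counts(visits):
    """Simple pattern discovery: how often each page was requested."""
    return Counter(path for paths in visits.values() for path in paths)


visits = visits_by_host(SAMPLE_LOG)
counts = page_counts(visits)
print(visits)
print(counts.most_common(1))
```

The per-host page sequences are a crude stand-in for user transactions; association or sequential pattern mining would then run on those sequences rather than on the raw log.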
Many techniques can be developed to implement web usage mining, but the success of such applications depends on how much valid and reliable knowledge can be discovered from the log data. Most often, the web logs are cleaned, condensed and transformed before any useful and significant information is extracted from them. Web mining can be performed on web log records to find association patterns, sequential patterns and trends in web access. The overall web usage mining process can be divided into three inter-dependent stages: data collection and pre-processing, pattern discovery, and pattern analysis [13]. In the data collection and pre-processing stage, the raw data is collected, cleaned and transformed into a set of user transactions which represent the activities of each user during visits to the web site. In the pattern discovery stage, statistical, database and machine learning operations are performed to retrieve hidden patterns representing the typical behavior of users, as well as summary statistics on web resources, sessions and users.

3 Classification

3.1 What is Classification?
As the quantity and variety of available data increase, robust, efficient and versatile data categorization techniques are needed for exploration [16]. Classification is a method of assigning class labels to patterns. It is a data mining methodology used to predict group membership for data instances. For example, one may want to use classification to guess whether the weather on a specific day will be "sunny", "cloudy" or "rainy". Data mining techniques that group similar data objects or points and separate them from dissimilar ones are called clustering. Classification uses attribute values found in the data of one class to distinguish it from other classes. Data classification is mainly concerned with the treatment of large datasets.
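The weather example can be made concrete with a small sketch. The text does not say which algorithm to use here, so the code assumes a naive Bayes-style classifier over categorical attributes with add-one (Laplace) smoothing; the attribute names, the training records and the `train`/`predict` helpers are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and attribute values from (features, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (label, attribute) -> value counts
    attribute_values = defaultdict(set)  # attribute -> values seen in training
    for features, label in examples:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            attribute_values[name].add(value)
    return label_counts, value_counts, attribute_values


def predict(model, features):
    """Return the label with the highest smoothed log-probability."""
    label_counts, value_counts, attribute_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # class prior
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            # Add-one smoothing avoids zero probabilities for unseen values.
            score += math.log((seen + 1) / (count + len(attribute_values[name])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data: past days with known weather.
TRAINING_DAYS = [
    ({"humidity": "low", "pressure": "rising"}, "sunny"),
    ({"humidity": "low", "pressure": "rising"}, "sunny"),
    ({"humidity": "high", "pressure": "falling"}, "rainy"),
    ({"humidity": "high", "pressure": "falling"}, "rainy"),
    ({"humidity": "high", "pressure": "rising"}, "cloudy"),
    ({"humidity": "low", "pressure": "falling"}, "cloudy"),
]

model = train(TRAINING_DAYS)
print(predict(model, {"humidity": "low", "pressure": "rising"}))
```

The important point is the shape of the process rather than the specific algorithm: a model is built from labeled records, then applied to a record whose label is unknown.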
In classification we build a model by analyzing existing data and describing the characteristics of the various classes of data. We can then use this model to predict the class of new data. Classification is a supervised machine learning procedure in which individual items are placed in a group based on quantitative information on one or more of their characteristics. Decision trees and Bayesian networks are examples of classification methods. A related task is clustering, which the next section treats as unsupervised classification. Clustering is the process of finding similar data objects or points within a given dataset. The similarity can be defined by a distance measure or by any other parameter, depending on the need and the given data.

Classification is an ancient term as well as a modern one, since the classification of animals, plants and other physical objects is still valid today. Classification is a way of thinking about things rather than a study of the things themselves, so it draws its theory and application from the complete range of human experience and thought [18]. In a broader view, classification can include grouping medical patients by disease, finding the images containing a red rose in an image database, finding the documents describing "classification" in a document or text database, grouping equipment malfunctions by cause, and grouping loan applicants by their likelihood of repayment. In the last case, for example, the problem is to predict a new applicant's eligibility for a loan given historical data about customers. There are many techniques for data categorization and classification. The most common are the decision tree classifier and Bayesian classifiers.

3.2 Types of Classification
There are two types of classification: supervised classification and unsupervised classification. Supervised learning is a machine learning technique for discovering a function from training data. The training data contains pairs of input objects and their desired outputs.
The output of the function can be a continuous value, in which case the task is called regression, or a class label for the input object, in which case it is called classification. The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e. pairs of input and target output). To achieve this, the learner needs to generalize from the presented data to unseen situations in a meaningful way.

Unsupervised learning is a class of machine learning problems in which the aim is to determine how the data is organized. It is distinguished from supervised learning in that the learner is given only unlabeled examples. Unsupervised learning is closely related to the problem of density estimation in statistics. However, unsupervised learning also covers many other techniques that are used to summarize and explain key features of the data. One form of unsupervised learning is clustering, which will be covered in the next chapter. Blind source separation based on Independent Component Analysis is another example. Among neural network models, adaptive resonance theory and self-organizing maps are commonly used unsupervised learning algorithms.

There are many techniques for implementing supervised classification. We will discuss the two most commonly used ones: decision tree classifiers and Naïve Bayesian classifiers.

3.2.1 Decision Trees Classifier
There are many ways to represent classifiers. The decision tree is probably the most widely used approach for this purpose, and one of the most widely used supervised learning methods for data exploration. It is easy to use, can be represented as if-then-else rules, and works well on noisy data [16]. A decision tree uses a tree-like graph or model of decisions and their possible consequences, including resource costs, chance event outcomes, and utilities.
Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal. In machine learning and data mining, a decision tree is used as a predictive model, that is, a mapping from observations about an item to conclusions about its target value. More descriptive names for such tree models are classification tree and regression tree. In these tree structures, leaves represent classifications and branches represent the conjunctions of features that lead to those classifications. The machine learning technique for inducing a decision tree from data is called decision tree learning, or simply decision trees.

Decision trees are a simple but powerful form of multiple variable analysis [15]. Classification is done by tree-like structures that have a different test criterion for a variable at each node. New leaves are generated based on the results of the tests at the nodes. A decision tree is a supervised learning system in which classification rules are constructed from the tree. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. A decision tree tries to find a strong relationship between input and target values within the dataset [15].

In classification tasks, decision trees visualize the steps that should be taken to reach a classification. Every decision tree starts with a node called the root node, which is the ancestor of every other node. Each node in the tree tests an attribute in the data and decides which path to follow. Typically the decision test is a comparison of a value against some constant. Classification with a decision tree is done by traversing from the root node down to a leaf node.

Decision trees are able to represent and classify diverse types of data. The simplest form of data, and the most familiar, is numerical data.
Nominal data often needs to be organized as well. Nominal quantities are normally represented by a discrete set of symbols. For example, weather conditions can be described in either a nominal or a numeric fashion. Temperature can be quantified by saying that it is eleven degrees Celsius or fifty-two degrees Fahrenheit. The terms cold, cool, mild, warm and hot can also be used. The former is numeric data while the latter is an example of nominal data. More precisely, cold, cool, mild, warm and hot are a special type of nominal data called ordinal data. Ordinal data carries an implicit assumption of an ordered relationship among the values. In the weather example, purely nominal descriptions like rainy, overcast and sunny can also be added. These values have no ordering or distance measure among them.

Decision trees are trees in which each node is a question, each branch is an answer to that question, and each leaf is a result. Here is an example of a decision tree. Roughly, the idea is that different purchasing decisions are made depending on the number of items in stock. If we do not have much stock, we buy at any cost. If we have a lot of items, we only buy if the price is low. If there are fewer than 10 items in stock, buy all if the unit price is less than 10, otherwise buy only 10 items. If there are 10 to 40 items in stock, check the unit price: if it is less than £5, buy only 5 items; otherwise there is no need to buy anything expensive, since the stock is already good. If there are more than 40 items in stock, buy 5 if and only if the price is less than £2; otherwise there is no need to buy expensive items. In this way decision trees help us to make a decision at each level. Here is another example of a decision tree, representing the risk associated with rash driving.
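Before turning to that second example, the stock-purchasing rules above can be written as nested conditionals, which is the if-then-else form that decision trees reduce to. This is only a sketch: the function name and the string labels are invented, "buy all" is returned as a label because the text does not say how many items that means, and the first price threshold is assumed to be in the same currency unit as the others.

```python
def purchase_decision(stock, unit_price):
    """Walk the stock-purchasing tree from the root test to a leaf."""
    if stock < 10:
        # Low stock: always buy, the price only decides how much.
        return "buy all" if unit_price < 10 else "buy 10"
    if stock <= 40:
        # Medium stock: buy a little, and only when the price is low.
        return "buy 5" if unit_price < 5 else "buy none"
    # High stock: buy only at a very low price.
    return "buy 5" if unit_price < 2 else "buy none"


for stock, price in [(5, 8), (5, 12), (20, 4), (20, 6), (50, 1.5), (50, 3)]:
    print(stock, price, purchase_decision(stock, price))
```

Each `if` corresponds to an internal node testing one attribute against a constant, and each returned label corresponds to a leaf.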
The root node at the top of the tree structure shows the feature that is split first for the highest discrimination. The internal nodes show decision rules on one or more attributes, while the leaf nodes are class labels. A person younger than 20 has a very high risk, while a person older than 30 has a very low risk. For the middle category, a person older than 20 but younger than 30, the risk depends on another attribute: the car type. If the car is a sports car the risk is again high, while if it is a family car the risk is low.

In science and engineering, and in applied areas including business intelligence and data mining, many useful features have been introduced as a result of the evolution of decision trees.
* With the help of transformation in decision trees, the volume of data can be reduced into a more compact form that preserves the major characteristic