Knowledge in Data Mining

Tutorial for HADOOP

Uploaded books to help beginners understand Hadoop.

HADOOP and BIG DATA (many books)

Everything about big data is in this clip. #hadoop #bigdata #tutorials #beginners #basicsofhadoop

Data Mining

The attachments below belong to the computer science course and the subject data mining. The documents cover topics such as Data Cleaning, Data Mining, Data Reduction, Data Transformation, Discretization, etc.

SEO Search Engine Optimization

Brief notes on SEO (Search Engine Optimization). Watch the related video at this link: https://www.youtube.com/watch?v=sx98BBEBOMY

How to get data from the World Bank

Walkthrough: Downloading Data from the World Bank

In Data Fundamentals, we address the question of how healthcare spending affects life expectancy around the world. This is one of the answers we can find by looking at data from the World Bank.

1. Open the World Bank data portal at http://data.WorldBank.org
2. Select Data Catalog from the menu at the top.
3. In the long list at the bottom, find “World Development Indicators”.
4. Click the tabular icon on the left. You’ll land on a very different site, the Databank. The Databank is an interface to the World Bank database: you can select what data you want to see, from which countries, and for what period of time.
5. First select the countries. We’re interested in all the countries, so click on select all (the check box icon). You can see how many countries you have selected in the top right corner.
6. Click on Series under the Country view. You’ll now see a long list of data series you can export. We’ll need a few of them.
7. Select “Health expenditure, private (% GDP)”, “Health expenditure, public (% GDP)” and “Health expenditure, total (% GDP)”.
8. Since the expenditure is given in % of GDP, we’ll need the GDP as well, and since we want to compare countries directly we’ll need GDP in US$. Type GDP into the search box and find the entry “GDP, PPP (current international US$)”.
9. To see how healthcare expenditure affects life expectancy, add life expectancy to the data: search for “Life expectancy at birth, total (years)”.
10. Add one more thing, population, so that we can calculate how much is spent by and on an average person. Search for “Population” and select “Population, total”.
11. Click on the selected Series in the top left corner. Bring GDP and Population to the top (drag-and-drop at the side of the list), so that they come first in your selection.
12. Click on Time to select the years we are interested in. To keep things simple, select the 10 most recent years.
13. Click on Apply Changes. You’ll see a preview of the data.
14. At the top left there is a rough layout of what your downloaded file will look like. You’ll see “time” in the columns section and “series” in the rows section; this determines what the spreadsheet will look like. While this might suit some people, the data is a lot easier to handle if all of our “series” are in columns and the years are in rows, so change the arrangement accordingly. You should notice that the preview has changed. This is what your downloaded file will look like.
15. Now export. Click on the Export button and a pop-up will appear asking for the format. Select CSV. The file downloads automatically; store it in a folder under a name that reminds you where it comes from and what it is for.
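
The walkthrough above mentions calculating how much is spent on an average person. A minimal Python sketch of that calculation follows, assuming the exported CSV has the series in columns and the years in rows. The column names, the sample rows, and the use of ".." to mark missing values are assumptions made for illustration; check them against the file you actually download.

```python
import csv
import io

# Hypothetical sample in the "series in columns, years in rows" layout.
# Column names and all numbers are made up for illustration only.
SAMPLE_CSV = """Country,Year,GDP,Population,Health expenditure total (% GDP)
Exampleland,2010,1000000,1000,10
Exampleland,2011,1200000,1000,..
"""


def per_capita_health_spending(gdp, population, health_pct_gdp):
    """Health spending per person: GDP times the share of GDP, divided by population."""
    return gdp * health_pct_gdp / 100 / population


def to_number(text):
    """Convert a cell to float; empty cells and '..' are treated as missing (an assumption)."""
    text = text.strip()
    if text in ("", ".."):
        return None
    return float(text)


def spending_by_row(csv_text):
    """Return (country, year, spending per person) for every row; None where data is missing."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        gdp = to_number(row["GDP"])
        population = to_number(row["Population"])
        pct = to_number(row["Health expenditure total (% GDP)"])
        if gdp is None or population is None or pct is None or population == 0:
            spending = None
        else:
            spending = per_capita_health_spending(gdp, population, pct)
        results.append((row["Country"], row["Year"], spending))
    return results


if __name__ == "__main__":
    for country, year, spending in spending_by_row(SAMPLE_CSV):
        print(country, year, spending)
```

To use it on a real download, read the file's text and pass it to `spending_by_row`, after adjusting the column names to match the header row of your export.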

Data Mining

In simple words, data mining is a process used to extract usable data from a larger set of raw data. It involves analysing patterns in large batches of data using one or more software tools. Data mining has applications in multiple fields, such as science and research. In business, it helps companies learn more about their customers, develop more effective strategies for various business functions, and in turn use resources in a more optimal and insightful manner. This brings businesses closer to their objectives and helps them make better decisions.

Data mining involves effective data collection and warehousing as well as computer processing. It uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD).

Key features of data mining:
• Automatic pattern predictions based on trend and behaviour analysis.
• Prediction based on likely outcomes.
• Creation of decision-oriented information.
• Focus on large data sets and databases for analysis.
• Clustering based on finding and visually documenting groups of facts not previously known.

The Data Mining Process: Technological Infrastructure Required:
1. Database size: the more data that has to be processed and maintained, the more powerful the system required.
2. Query complexity: the more complex the queries and the greater their number, the more powerful the system required.

Uses:
1. Data mining techniques are useful in many research projects, including mathematics, cybernetics, genetics and marketing.
2. With data mining, a retailer could manage and use point-of-sale records of customer purchases to send targeted promotions based on an individual’s purchase history. The retailer could also develop products and promotions to appeal to specific customer segments based on mining demographic data from comment or warranty cards.
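
The retailer use case above can be illustrated with a very small Python sketch: from point-of-sale records, find the category each customer buys most often and use it as the target for a promotion. The records, the customer IDs, and the function name `favourite_category` are hypothetical, and counting purchases is only a stand-in for the more sophisticated algorithms the text refers to.

```python
from collections import Counter, defaultdict

# Hypothetical point-of-sale records: (customer_id, product_category).
PURCHASES = [
    ("c1", "coffee"),
    ("c1", "coffee"),
    ("c1", "tea"),
    ("c2", "books"),
    ("c2", "books"),
    ("c2", "coffee"),
    ("c3", "tea"),
]


def favourite_category(purchases):
    """Map each customer to the category they purchased most often."""
    counts = defaultdict(Counter)
    for customer, category in purchases:
        counts[customer][category] += 1
    # most_common(1) returns a one-element list of (category, count) pairs.
    return {customer: c.most_common(1)[0][0] for customer, c in counts.items()}


if __name__ == "__main__":
    for customer, category in favourite_category(PURCHASES).items():
        print(f"Send {customer} a promotion for {category}")
```

Grouping customers who share the same favourite category would give a first, rough version of the customer segments mentioned above.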

DMDW CBIT

IT syllabus for data mining

My files

PPTs and PDFs for my subjects

Data mining

It contains some information about data mining

Research methodology

Use of discriminant analysis in data mining

Data analysis

ANOVA test using SPSS software

R programming basic concepts and theories with examples and programs

Basic concepts of R programming, along with sample programs and important theory, for the 6th semester end-semester exams at KIIT Deemed to be University, Bhubaneswar.