Hello. There are many implementations of the "find common friends" algorithm on the Internet, in many languages. I had some time today, so I worked out a Scala version myself.
The complete code is available on GitHub: https://github.com/benben7466/SparkDemo/blob/master/spark-test/src/main/scala/testCommendFriend.s ...
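For readers who have not seen the problem before, here is a minimal sketch of one common way to compute common friends, written in plain Python rather than the Scala/Spark code linked above. It is not the author's implementation. The function name `common_friends` and the sample data are hypothetical, and the sketch assumes each person's friends are given as a set.

```python
from collections import defaultdict
from itertools import combinations


def common_friends(friends):
    """Return {(a, b): shared friends} for every pair that shares at least one."""
    # "Map" step: invert the friend lists, so each person maps to
    # everyone who lists that person as a friend.
    listed_by = defaultdict(set)
    for person, their_friends in friends.items():
        for friend in their_friends:
            listed_by[friend].add(person)

    # "Reduce" step: any two people who both list `friend` share that friend.
    result = defaultdict(set)
    for friend, people in listed_by.items():
        for a, b in combinations(sorted(people), 2):
            result[(a, b)].add(friend)
    return dict(result)


# Hypothetical sample data, not taken from the original post.
sample = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
}
pairs = common_friends(sample)
```

Inverting the lists first avoids comparing every pair of people directly; only people who appear together in some friend's list are ever paired up.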
Posted on Sat, 04 Jul 2020 10:58:46 -0400 by EGNJohn
Welcome to follow the WeChat official account: ApacheHudi
Most modern data lakes are built on some kind of distributed file system (DFS) such as HDFS, or on cloud-based storage such as AWS S3. One of the basic principles to follow is the "write once, read many" access model for files. This ...
Posted on Sun, 14 Jun 2020 22:39:40 -0400 by tozanni
1. Spark performance tuning: use of checkpoint
A checkpoint is a saved point in a computation, similar to a snapshot. For example, in a Spark job the computation DAG can be very long, and the server needs to complete the wh ...
Posted on Sun, 14 Jun 2020 00:48:31 -0400 by brokeDUstudent
1. The relationship among artificial intelligence, machine learning and deep learning
What machine learning can do.
Recommended books for learning
Learning objectives
2. What is machine learning
Machine learning automatically analyzes data to obtain rules (models), and uses those rules to predict u ...
Posted on Fri, 12 Jun 2020 00:23:35 -0400 by Goose
Recently, my company has been working on real-time stream processing. The specific requirement is to import tables from relational databases (MySQL, Oracle) into HBase in real time, and to query the data through Hive tables mapped onto HBase. The company uses a big data cluster built on CDH 6.3.1.
1. Configure ...
Posted on Wed, 10 Jun 2020 00:55:16 -0400 by jcleary
In data statistics, we often need to compute duration metrics, such as online time. Some of these are easy to compute and some are a bit more troublesome, for example computing each user's online time from login and logout records.
We can do this with the window functions lead and lag, which ...
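To make the lead idea concrete, here is a small sketch in plain Python instead of SQL. It is an illustration under assumptions, not the post's own query: the `(user, action, timestamp)` row layout, the function name `online_seconds`, and the sample log are all hypothetical. Pairing each row with the next row of the same user plays the role of `lead` over a window partitioned by user and ordered by time.

```python
from collections import defaultdict


def online_seconds(events):
    """Sum login-to-logout durations per user.

    `events` is an iterable of (user, action, timestamp_seconds) tuples,
    where action is "login" or "logout".
    """
    by_user = defaultdict(list)
    for user, action, ts in events:
        by_user[user].append((ts, action))

    totals = {}
    for user, rows in by_user.items():
        rows.sort()  # same role as PARTITION BY user ORDER BY timestamp
        total = 0
        # zip(rows, rows[1:]) pairs each row with the following one,
        # which is what lead(..., 1) provides in SQL.
        for (ts, action), (next_ts, next_action) in zip(rows, rows[1:]):
            if action == "login" and next_action == "logout":
                total += next_ts - ts
        totals[user] = total
    return totals


# Hypothetical log rows, not taken from the original post.
log = [
    ("u1", "login", 100),
    ("u1", "logout", 160),
    ("u2", "login", 120),
    ("u1", "login", 200),
    ("u2", "logout", 150),
    ("u1", "logout", 230),
]
totals = online_seconds(log)
```

Only a login immediately followed by a logout is counted, so a login with no matching logout contributes nothing in this sketch.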
Posted on Tue, 09 Jun 2020 23:56:54 -0400 by dpiland
Reference article: How to submit Spark tasks to a YARN cluster remotely in IDEA
Several modes of running Spark tasks:
1. Local mode: write the code in IDEA and run it directly.
2. Standalone mode: package the program as a jar, upload it to the cluster, and submit it with spark-submit.
3. YARN mode (local, client, cluster): as above, also requires jar packa ...
Posted on Thu, 21 May 2020 20:09:40 -0400 by twilightnights
Hello everyone! Here are the Spark operator notes I put together during the epidemic holiday. I spent the whole afternoon sorting them out to share with you. Writing them up was not easy, so if they help you, remember to like the post!
1. Spark action operators
2. Spark single-value types
3. Spark double-value types
4. spa ...
Posted on Mon, 18 May 2020 04:02:17 -0400 by Mattyspatty
When solving real machine learning engineering problems, training a model on a single machine is often very difficult. Such scenarios include online recommendation, CTR estimation, lookalike marketing, and so on. When there are hundreds of millions of records, tens of thousands of feature dimensions ...
Posted on Mon, 11 May 2020 23:50:09 -0400 by les48
We have run into this problem again. It happens infrequently, but when it does, it causes the ResourceManager service to crash, too many registered ZK watches, and other problems. Not having fully solved it has always been a nagging concern, so based on the previous two rounds of analysis and reading the latest version of Hadoop 3.2.1 c ...
Posted on Sun, 10 May 2020 10:38:53 -0400 by FireWhizzle