MLlib basic data types

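MLlib uses local vectors as its basic storage type on a single machine. There are two kinds: dense and sparse. A dense vector stores every value in an array. A sparse vector stores its size plus two parallel arrays: the zero-based indices of the stored entries and their values.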

Code:

import org.apache.spark.mllib.linalg
import org.apache.spark.mllib.linalg.Vectors

/**
  * Local vectors: dense and sparse
  */
object testVector {

  def main(args: Array[String]): Unit = {

    val vd: linalg.Vector = Vectors.dense(2.0, 0.0, 1.0) // Build a dense vector
    println(s"Element at index 1 of the dense vector: ${vd(1)}")

    /**
      * def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
      * First parameter: size (total length) of the vector
      * Second parameter: zero-based indices of the stored entries, each less than size
      * Third parameter: values of the stored entries, one per index
      */
    // The size is 5 so that index 4 is valid (valid indices run from 0 to size - 1)
    val vs: linalg.Vector = Vectors.sparse(5, Array(1, 2, 3, 4), Array(5.0, 6.0, 7.0, 8.0)) // Build a sparse vector
    println(s"Element at index 2 of the sparse vector: ${vs(2)}")

  }

}

Output:

Element at index 1 of the dense vector: 0.0
Element at index 2 of the sparse vector: 6.0
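
To make the sparse format concrete, here is a minimal sketch in plain Scala of how a (size, indices, values) triple expands into a full array. The helper expand is hypothetical and written only for illustration; it is not MLlib's implementation, and the checks it performs are assumptions about what a well-formed sparse vector should satisfy (indices strictly increasing and smaller than the size).

```scala
object SparseLayoutSketch {

  // Hypothetical helper (not MLlib code): expand a sparse triple into a full array.
  def expand(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(indices.length == values.length, "indices and values must have the same length")
    val out = new Array[Double](size) // every slot starts at 0.0
    var prev = -1
    for ((i, v) <- indices.zip(values)) {
      require(i > prev, s"index $i is not strictly increasing")
      require(i < size, s"index $i is out of bounds for size $size")
      out(i) = v // only the listed positions receive a stored value
      prev = i
    }
    out
  }

  def main(args: Array[String]): Unit = {
    val full = expand(5, Array(1, 2, 3, 4), Array(5.0, 6.0, 7.0, 8.0))
    println(full.mkString("[", ",", "]")) // prints [0.0,5.0,6.0,7.0,8.0]
    println(full(2))                      // prints 6.0
  }
}
```

Under these checks, a size of 4 combined with index 4 is rejected, because valid indices run from 0 to size - 1.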

Labeled point

A labeled point attaches a label to a local vector, and MLlib uses it as the input to supervised learning algorithms. For example, in a classification problem the classes of a data set can be marked with the numbers 0, 1, 2, and so on. The label is stored as a Double.

Code:

import org.apache.spark.mllib.linalg
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

/**
  * LabeledPoint: a feature vector with a label
  */
object testLabeledPoint {

  def main(args: Array[String]): Unit = {
    val vd: linalg.Vector = Vectors.dense(2.0, 0.0, 6.0) // Build a dense vector
    val pos: LabeledPoint = LabeledPoint(1.0, vd) // Label the dense vector with 1
    println(s"Features of the labeled dense vector: ${pos.features}")
    println(s"Label: ${pos.label}")

    // The size is 5 so that index 4 is valid (valid indices run from 0 to size - 1)
    val vs: linalg.Vector = Vectors.sparse(5, Array(1, 2, 3, 4), Array(5.0, 7.0, 8.0, 0.0)) // Build a sparse vector
    val posVs = LabeledPoint(2.0, vs) // Label the sparse vector with 2
    println(s"Features of the labeled sparse vector: ${posVs.features}")
    println(s"Label: ${posVs.label}")
  }

}

Output:

Features of the labeled dense vector: [2.0,0.0,6.0]
Label: 1.0
Features of the labeled sparse vector: (5,[1,2,3,4],[5.0,7.0,8.0,0.0])
Label: 2.0
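
As a further illustration of labels marking classes, the sketch below builds a small in-memory collection of labeled points and counts how many fall under each label. The data values and the helper countByLabel are hypothetical and exist only for this example. It assumes the Spark MLlib jar is on the classpath, but it needs no SparkContext because it only uses local data types.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object LabeledPointSketch {

  // Hypothetical toy data: three classes marked 0.0, 1.0 and 2.0.
  val points: Seq[LabeledPoint] = Seq(
    LabeledPoint(0.0, Vectors.dense(1.0, 0.5)),
    LabeledPoint(1.0, Vectors.dense(3.0, 2.5)),
    LabeledPoint(1.0, Vectors.sparse(2, Array(1), Array(4.0))),
    LabeledPoint(2.0, Vectors.dense(9.0, 8.0))
  )

  // Hypothetical helper: number of points per label.
  def countByLabel(ps: Seq[LabeledPoint]): Map[Double, Int] =
    ps.groupBy(_.label).map { case (label, group) => label -> group.size }

  def main(args: Array[String]): Unit = {
    // Sort by label so the lines print in a stable order.
    countByLabel(points).toSeq.sortBy(_._1).foreach { case (label, n) =>
      println(s"label $label: $n point(s)")
    }
  }
}
```

Dense and sparse feature vectors can be mixed freely in the same collection, since LabeledPoint only requires a Vector.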

 

Tags: Apache Spark

Posted on Tue, 03 Dec 2019 04:44:37 -0500 by tamilmani