Speed: Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing.
Other Memory is reserved for the system itself, since the running program also needs memory; its default fraction is 0.2. On top of that, each region usually has a safetyFraction to guard against OOM. The biggest problem with this allocation scheme is that it is static: no region may exceed its configured cap, and whatever fraction you set is all you get. This is especially painful for Storage Memory and Execution Memory. A diagram borrowed from another post illustrates the static memory layout clearly:
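The static split can be made concrete with a small sketch. The fractions below are the Spark 1.x defaults behind `spark.storage.memoryFraction`, `spark.shuffle.memoryFraction`, and the internal safety fractions; the object and helper names here are ours, for illustration only:

```scala
object StaticMemoryLayout {
  // Spark 1.x static memory manager defaults
  val storageFraction = 0.6 // spark.storage.memoryFraction
  val shuffleFraction = 0.2 // spark.shuffle.memoryFraction
  val storageSafety   = 0.9 // spark.storage.safetyFraction (internal)
  val shuffleSafety   = 0.8 // spark.shuffle.safetyFraction (internal)

  // "Other" memory is whatever the two managed regions do not claim: 0.2
  val otherFraction = 1.0 - storageFraction - shuffleFraction

  // Bytes actually usable for cached blocks, after the safety margin
  def usableStorage(heapBytes: Long): Long =
    (heapBytes * storageFraction * storageSafety).toLong

  // Bytes actually usable for shuffle buffers, after the safety margin
  def usableShuffle(heapBytes: Long): Long =
    (heapBytes * shuffleFraction * shuffleSafety).toLong
}
```

With a 10 GB heap this leaves only about 5.4 GB truly usable for cached blocks and 1.6 GB for shuffle, and neither region can borrow the other's slack, which is exactly the rigidity described above.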
Repartition the RDD according to the given partitioner and, within each resulting partition, sort records by their keys.
This is more efficient than calling repartition and then sorting within each partition because it can push the sorting down into the shuffle machinery.
class MySAMRecordPartitioner(numParts: Int) extends Partitioner {
  override def numPartitions: Int = numParts

  override def getPartition(key: Any): Int = {
    val record: MySAMRecord = key.asInstanceOf[MySAMRecord]
    val code = record.regionId % numPartitions
    if (code < 0) code + numPartitions else code
  }

  override def equals(other: Any): Boolean = other match {
    case records: MySAMRecordPartitioner => records.numPartitions == numPartitions
    case _ => false
  }
}
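The `if (code < 0)` branch exists because `%` in Scala (as in Java) keeps the sign of the dividend, so a negative `regionId` would otherwise yield a negative, invalid partition index. A standalone sketch of that correction (the helper name `nonNegativeMod` is ours; Spark has an equivalent internal utility):

```scala
object PartitionMath {
  // % keeps the dividend's sign in Scala/Java: -3 % 5 == -3.
  // A partitioner must return an index in [0, numPartitions),
  // so negative remainders are shifted up by the modulus.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }
}
```

Assuming the RDD is keyed by `MySAMRecord` and an `Ordering[MySAMRecord]` is in scope, the partitioner is then used as `rdd.repartitionAndSortWithinPartitions(new MySAMRecordPartitioner(8))`, which groups records by region and sorts them inside the shuffle itself.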
if brew list | grep coreutils > /dev/null ; then
  PATH="$(brew --prefix coreutils)/libexec/gnubin:$PATH"
  alias ls='gls -FHG --color=auto'
  eval `gdircolors -b $HOME/.dir_colors`
fi
Error:(45, 66) not found: type SparkFlumeProtocol val transactionTimeout: Int, val backOffInterval: Int) extends SparkFlumeProtocol with Logging {
This happens because IDEA does not automatically download some of the source files that flume-sink needs, so the build fails to compile.
Solution
In IntelliJ IDEA:
Open View -> Tool Windows -> Maven Projects
Right-click Spark Project External Flume Sink
Click Generate Sources and Update Folders; IntelliJ IDEA will then download the packages the Flume sink depends on
SqlBaseParser
Error:(36, 45) object SqlBaseParser is not a member of package org.apache.spark.sql.catalyst.parser import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
The fix is the same kind of thing: SqlBaseParser is generated (by ANTLR) rather than checked in, so run Generate Sources and Update Folders on Spark Project Catalyst in the Maven Projects window.
MPI_Type_size - int MPI_Type_size(MPI_Datatype datatype, int *size): returns, in size, the number of bytes occupied by datatype.
MPI_Send - int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm): MPI blocking send. buf is the starting address of the send buffer, count the number of elements to send, datatype their type, dest the rank of the destination process, tag the message tag, and comm the communicator; a return value of MPI_SUCCESS indicates the send succeeded.
MPI_Recv - int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status): MPI blocking receive (the nonblocking variant MPI_Irecv takes the same parameters except that the final MPI_Status * is replaced by an MPI_Request *). buf is the starting address of the receive buffer, count the maximum number of elements that can be received, datatype their type, source the rank of the sending process, tag the message tag matching that of the corresponding send, comm the communicator containing both processes, and status the returned status.
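A minimal sketch tying the three calls together, assuming an MPI implementation is installed; compile with mpicc and run with mpirun -np 2 (both assumed available):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size_bytes;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* MPI_Type_size: how many bytes one MPI_INT occupies */
    MPI_Type_size(MPI_INT, &size_bytes);

    if (rank == 0) {
        int payload[4] = {1, 2, 3, 4};
        /* blocking send of 4 ints to rank 1, tag 99 */
        MPI_Send(payload, 4, MPI_INT, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int buf[4];
        MPI_Status status;
        /* blocking receive matching the send above */
        MPI_Recv(buf, 4, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d..%d, MPI_INT is %d bytes\n",
               buf[0], buf[3], size_bytes);
    }

    MPI_Finalize();
    return 0;
}
```

The count/datatype pair on the receive side is an upper bound, not an exact match; the actual number of elements delivered can be queried from status with MPI_Get_count.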