File name: Flume+Solr演示demo.pdf
  Category: Java
  File size: 5 MB
  Downloads: 0
  Uploaded: 2019-09-01
  Uploader: qq_38******
 Description: This mind map introduces a Flume + Solr demo, shared for everyone to download.

 Flume agent configuration:

    tier1.sources=source1
    tier1.channels=channel1
    tier1.sinks=sink1
    tier1.sources.source1.type = avro
    #tier1.sources.source1.type=org.apache.flume.source.http.HTTPSource
    tier1.sources.source1.bind =
    tier1.sources.source1.port=45678
    #tier1.sources.source1.handler = org.apache.flume.sink.solr.morphline.BlobHandler
    #tier1.sources.source1.handler.maxBlobLength= 2000000000
    #tier1.sources.source1.interceptors= uuidinterceptor
    #tier1.sources.source1.interceptors.uuidinterceptor.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
    #tier1.sources.source1.interceptors.uuidinterceptor.headerName = id
    tier1.sources.source1.channels=channel1
    tier1.channels.channel1.type=memory
    tier1.channels.channel1.capacity=10000000
    tier1.channels.channel1.transactionCapacity=10000
    tier1.channels.channel1.keep-alive=60
    tier1.sinks.sink1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
    tier1.sinks.sink1.channel = channel1
    tier1.sinks.sink1.morphlineFile =/home/ec2-user/morphline.conf
    tier1.sinks.sink1.morphlineId = morphline1

 (The value of tier1.sources.source1.bind is not legible in the source.)

 3. Prepare the morphline configuration file

    # Specify server locations in a SOLR_LOCATOR variable; used later in
    # variable substitutions:
    SOLR_LOCATOR : {
      # Name of solr collection
      collection : collection1

      # ZooKeeper ensemble
      zkHost : "ip-172-31-12-213:2181/solr"
    }

    # Specify an array of one or more morphlines, each of which defines an ETL
    # transformation chain. A morphline consists of one or more potentially
    # nested commands. A morphline is a way to consume records such as Flume events,
    # HDFS files or blocks, turn them into a stream of records, and pipe the stream
    # of records through a set of easily configurable transformations on its way to
    # Solr.
    morphlines : [
      {
        # Name used to identify a morphline. For example, used if there are multiple
        # morphlines in a morphline config file.
        id : morphline1

        # Import all morphline commands in these java packages and their subpackages.
        # Other commands that may be present on the classpath are not visible to this
        # morphline.
        importCommands : ["org.kitesdk.**", "org.apache.solr.**", "com.cloudera.example.**"]

        commands : [
          {
            readJson {}
          }
          {
            extractJsonPaths {
              flatten : false
              paths : {
                id : /id
                user_name : /user_screen_name
                created_at : /created_at
                text : /text
                text_cn : /text_cn
              }
            }
          }

          # Consume the output record of the previous command and pipe another
          # record downstream.
          #
          # convert timestamp field to native Solr timestamp format
          # such as 2012-09-06T07:14:34Z to 2012-09-06T07:14:34.000Z
          {
            convertTimestamp {
              field : created_at
              inputFormats : ["yyyy-MM-dd'T'HH:mm:ss'Z'", "yyyy-MM-dd"]
              inputTimezone : America/Los_Angeles
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
              outputTimezone : UTC
            }
          }

          # Consume the output record of the previous command and pipe another
          # record downstream.
          #
          # This command deletes record fields that are unknown to Solr
          # schema.xml.
          #
          # Recall that Solr throws an exception on any attempt to load a document
          # that contains a field that is not specified in schema.xml.
          {
            sanitizeUnknownSolrFields {
              # Location from which to fetch Solr schema
              solrLocator : ${SOLR_LOCATOR}
            }
          }

          # log the record at DEBUG level to SLF4J
          { logDebug { format : "output record: {}", args : ["@{}"] } }

          # load the record into a Solr server or MapReduce Reducer
          {
            loadSolr {
              solrLocator : ${SOLR_LOCATOR}
            }
          }
        ]
      }
    ]

 The PDF also shows the same file printed on the host; the output repeats the configuration above and is cut off after outputTimezone:

    ec2-user@ip-172-31-12-213:~> cat morphline.conf

 4. Download the Chinese word-segmentation package that matches your CDH version from https://repository.cloudera.com/artifactory/cdh-releases-rcs/org/apache/lucene/lucene-analyzers-smartcn/ and place the downloaded JAR on every machine in /opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib and /opt/cloudera/parcels/CDH/lib/flume-ng/lib, then restart Solr and Flume (omitted in the demo walkthrough).

    ec2-user@ip-172-31-12-213:/opt/cloudera/parcels/CDH/lib/flume-ng/lib> pwd
    /opt/cloudera/parcels/CDH/lib/flume-ng/lib
    ec2-user@ip-172-31-12-213:/opt/cloudera/parcels/CDH/lib/flume-ng/lib> ll lucene*
    -rwxrwxrwx 1 root root 3602595 4 23 02:20 lucene-analyzers-smartcn-4.10.3-cdh5.6.0.jar
    ec2-user@ip-172-31-12-213:/opt/cloudera/parcels/CDH/lib/flume-ng/lib> ll /opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib/*smartcn*

 5. Create the schema file according to the data format

    <field name="text_cn" type="text_ch" indexed="true" stored="true"/>
    </fields>
    <types>
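
 To make the data flow of the morphline in step 3 easier to follow, here is a minimal Python sketch that imitates three of its commands (readJson, extractJsonPaths, sanitizeUnknownSolrFields) on a single event. It is an illustration only, not the Kite Morphlines implementation: the function names, the sample event, the "host" header and the SCHEMA_FIELDS set are assumptions made for this example, and the convertTimestamp, logDebug and loadSolr steps are left out.

```python
import json

# Field mapping copied from the extractJsonPaths block of morphline.conf.
PATHS = {
    "id": "/id",
    "user_name": "/user_screen_name",
    "created_at": "/created_at",
    "text": "/text",
    "text_cn": "/text_cn",
}

# Assumption: the fields declared in schema.xml. The real schema is only
# partly visible in the source, so this set is hypothetical.
SCHEMA_FIELDS = {"id", "user_name", "created_at", "text", "text_cn"}


def extract_json_paths(doc, paths):
    """Copy the value at each top-level path (such as "/id") into a record."""
    record = {}
    for field, path in paths.items():
        key = path.lstrip("/")
        if key in doc:
            record[field] = doc[key]
    return record


def sanitize_unknown_fields(record, schema_fields):
    """Drop every field that the schema does not declare."""
    return {k: v for k, v in record.items() if k in schema_fields}


def run_pipeline(event_body, headers):
    """Imitate readJson -> extractJsonPaths -> sanitizeUnknownSolrFields."""
    doc = json.loads(event_body)                    # readJson
    record = dict(headers)                          # fields already on the record
    record.update(extract_json_paths(doc, PATHS))   # extractJsonPaths
    return sanitize_unknown_fields(record, SCHEMA_FIELDS)


# Hypothetical event: "lang" has no path and "host" is not in the schema,
# so neither field reaches the final record.
sample = '{"id": 1, "user_screen_name": "alice", "text": "hi", "lang": "en"}'
print(run_pipeline(sample, {"host": "h1"}))
```

 The sketch shows why sanitizeUnknownSolrFields sits just before loadSolr: any field that is not declared in schema.xml is removed first, so the load step does not fail on it.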